Transparent Quality: Building Trust with Psychometrics
Justin Paulsen & Allison BrckaLorenz
Whatever the data might be, someone is always going to ask, “Can you really trust those findings?” Effectively, that is a question about the validity of the data. There are a few different validity frameworks in the literature (Borsboom, Mellenbergh, & van Heerden, 2004; Lissitz & Samuelson, 2007; Messick, 1989; etc.), but at the Center for Postsecondary Research, we subscribe to Messick’s framework of Unified Validity (see figure below) and provide evidence under that framework for our NSSE, FSSE, and BCSSE surveys. We provide some examples in this blog, but more can be found at our NSSE and FSSE psychometric portfolio websites. These web pages are dedicated to studies of validity, reliability, and other indicators of data quality.
Recently, researchers presented guidance on building and maintaining a data quality portfolio at the 2018 Association for Institutional Research Forum in Orlando, FL. Examples of evidence to fill a portfolio are below.
Evidence based on Test Content
Evidence used to show that the content of the survey is representative of the content domain.
NSSE Example: In validating the Civic Engagement Topical Module, we review the work of key scholars in the literature (e.g., Ehrlich, Jacoby), ask experts in the area questions (e.g., Are we measuring what is important? Are these items actionable?), and examine the standards of existing associations (e.g., Institute for Democracy & Higher Education).
Evidence based on Response Process
Evidence that indicates the relationship between the construct and the actual performance on the item.
NSSE Example: In developing the 2013 revised survey, we conducted cognitive interviews with college students to see how they interpreted the questions. An initial draft included items that asked students about both “Synthesizing an idea, experience, or line of reasoning” and “Analyzing an idea, experience, or line of reasoning.” Students were unable to differentiate between synthesizing and analyzing, and so the “Synthesizing…” item was dropped, as it was a more difficult item to read.
Evidence based on Internal Structure
Evidence indicating the degree to which items relate to the conceptual framework.
Examples: NSSE and FSSE provide CFA model fits, inter-item correlations, internal consistency measures, and differential item functioning estimates of the various scales included in those surveys. These measures help users to understand the degree to which hypothesized scale items actually share common variance and measure the same latent construct.
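As one illustration of an internal consistency measure, the sketch below computes Cronbach's alpha for a small set of item responses. It is a minimal example under stated assumptions, not the procedure used for the NSSE or FSSE portfolios: the function name `cronbach_alpha` and the response data are hypothetical, and it assumes complete responses (no missing values) with all items scored in the same direction.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows (one score per item).

    Hypothetical helper for illustration only; assumes no missing data
    and at least two items and two respondents.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Transpose rows (respondents) into columns (items).
    items = list(zip(*responses))
    # Sum of the sample variances of each individual item.
    item_variance_sum = sum(variance(col) for col in items)
    # Sample variance of each respondent's total scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Made-up responses: 3 respondents answering a 2-item scale.
example = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example)
print(round(alpha, 3))
```

Alpha rises as the items share more common variance, which is why it is often reported alongside inter-item correlations and CFA model fit rather than in place of them.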
Evidence based on Relations to Other Variables
Evidence indicating the degree to which relationships with other constructs follow expectations.
NSSE Example: NSSE examined the extent to which the NSSE scales related to second-to-third-semester retention. The study found that, after conditioning on pre-college ability, students who scored highest on the active and collaborative learning scales had significantly higher retention rates than students who scored lower on those scales.
Evidence based on Consequences of Testing
Evidence indicating the degree to which consequences of test score use are related to the conceptual framework of the test.
NSSE Example: NSSE solicits colleges and universities to share how they use their data and to what effect. The University of Nebraska at Lincoln shared:
NSSE findings called attention to the need to revisit UNL’s learning outcomes and the structure of its general education program. UNL provided each college with a detailed report of their students’ NSSE responses. Some colleges shared the results with other constituent groups (students, alumni, faculty members), and all colleges used the results as benchmark data.