Posted by Austin Fossey
To continue last week’s discussion about big themes at the recent NCME annual meeting, I wanted to give an update on conversations about validity.
Validity, which we have frequently discussed on this blog, is a core concept in good assessment development. Even though it is so fundamental, our industry is still passionately debating what constitutes validity and how the term should be used.
Michael Kane (Educational Testing Service) began the discussion with his well-established views on argument-based validity. In this framework, the test developer must make a validity claim about an inference (or interpretation, as Kane puts it) and then support that claim with evidence. Kane argues that validity is not a property of the test or its scores but a property of the inferences we make from those scores.
If you have read some of my previous posts on validity or attended my presentations about psychometric principles, you already know that I am a strong proponent of Kane’s view that validity refers to the interpretations and use cases—not to the instrument.
But not everyone agrees. Keith Markus (City University of New York) suggested that nitpicking about whether the test or the inference is the object of validity causes us to miss the point. The test and the inference work only as a combination, so validity (as a term and as a research goal) should be applied to these as a pair.
Pamela Moss (University of Michigan) argued that we need to shift the focus of validity study away from intended inferences and use cases to the actual use cases. Moss believes that the actual use cases of assessment results can be quite varied and nuanced, and that these real-world impacts are what we are really most interested in. She proposed that we work to validate what she called “conceptual uses.” For example, if we want to use education assessments to improve learning, then we need to research why students earn low scores.
Greg Cizek (University of North Carolina) disagreed with Kane’s approach, saying that the evidence we gather to support an inference says nothing about the use cases, and vice versa. Cizek argued that we make two inferential leaps: one from the score to the inference, and one from the inference to the use case. So we should gather evidence that supports both inferential leaps.
Though I see Cizek’s point, I feel that it would not drastically change how I would approach a validity study in practice. After all, you cannot have a use case without making an inference, so I would likely just tackle the inferences and their associated use cases jointly.
Steve Sireci (University of Massachusetts) felt similarly. Sireci is one of my favorite presenters on the topic of validity, plus he gets extra points for matching his red tie and shirt to the color theme on his slides. Sireci posed this question: can we have an inference without having a use case? If so, then we have a “useless” test, and while there may be useless tests out there, we usually only care about the tests that get used. As a result, Sireci suggested that we must validate the inference, but that this validation must also demonstrate that the inference is appropriate for the intended uses.