Discussing validity at NCME

Posted by Austin Fossey

To continue last week’s discussion about big themes at the recent NCME annual meeting, I wanted to give an update on conversations about validity.

Validity is a core concept in good assessment development, which we have frequently discussed on this blog. Even though this is such a fundamental concept, our industry is still passionately debating what constitutes validity and how the term should be used.

NCME hosted a wonderful coordinated session during which some of the big names in validity theory presented their thoughts on how the term validity should be used today.

Michael Kane (Educational Testing Service) began the discussion with his well-established views on argument-based validity. In this framework, the test developer makes a validity claim about an inference (or interpretation, as Kane puts it) and then supports that claim with evidence. Kane argues that validity is not a property of the test or its scores, but rather a property of the inferences we make about those scores.

If you have read some of my previous posts on validity or attended my presentations about psychometric principles, you already know that I am a strong proponent of Kane’s view that validity refers to the interpretations and use cases—not to the instrument.

But not everyone agrees. Keith Markus (City University of New York) suggested that nitpicking about whether the test or the inference is the object of validity causes us to miss the point. The test and the inference work only as a combination, so validity (as a term and as a research goal) should be applied to these as a pair.

Pamela Moss (University of Michigan) argued that we need to shift the focus of validity studies away from intended inferences and uses toward actual uses. Moss believes that the actual uses of assessment results can be quite varied and nuanced, and that these real-world impacts are what we are ultimately interested in. She proposed that we work to validate what she called “conceptual uses.” For example, if we want to use education assessments to improve learning, then we need to research why students earn low scores.

Greg Cizek (University of North Carolina) disagreed with Kane’s approach, saying that the evidence we gather to support an inference says nothing about the use cases, and vice versa. Cizek argued that we make two inferential leaps: one from the score to the inference, and one from the inference to the use case. So we should gather evidence that supports both inferential leaps.

Though I see Cizek’s point, I feel that it would not drastically change how I would approach a validity study in practice. After all, you cannot have a use case without making an inference, so I would likely just tackle the inferences and their associated use cases jointly.

Steve Sireci (University of Massachusetts) felt similarly. Sireci is one of my favorite presenters on the topic of validity, plus he gets extra points for matching his red tie and shirt to the color theme on his slides. Sireci posed this question: can we have an inference without having a use case? If so, then we have a “useless” test, and while there may be useless tests out there, we usually only care about the tests that get used. As a result, Sireci suggested that we must validate the inference, but that this validation must also demonstrate that the inference is appropriate for the intended uses.

2 Responses to Discussing validity at NCME

  1. Howard Eisenberg says:

    Can you provide some real-world examples of the types of inferences and use cases you are speaking of? What does this all mean for a Questionmark customer and end user who works in the training and development function and designs training and assessments for new hires or existing employees?

  2. Austin Fossey says:

    Hi Howard,

    I would be happy to, though of course there is no limit to the types of inferences and use cases that could exist for assessment results. There are infinite possibilities! This is why it is important for test developers and psychometricians to conduct the validity research for their own program and to document inferences and use cases—both intended and actual.

    To answer your question, I am going to paraphrase Bachman’s article, “Building and Supporting a Case for Test Use” (2005), which is one of my favorite articles on the topic. Bachman is proposing an argument-based validity approach to validating inferences and use cases, and to illustrate his point, he uses the following example:

    “An international hotel company is hiring people to take room reservations over the phone. The hotel requires all employees who must work with customers to use English, and many of the applicants are not native speakers of English. Therefore, the company needs a screening test to determine if applicants have sufficient English ability to take room reservations over the phone. In addition to this language test, an assessment of professional knowledge and skills related to the specific job and to the hotel company’s administrative procedures in general is also given to applicants. Applicants’ scores on both assessments are considered in the employment decision” (Bachman, 2005).

    Bachman then walks through the argument structure for the inferences made about the results of these assessments. He provides an example where, based on the data from an assessment, the managers make a claim that a hypothetical employee has a low level of ability to interpret and record the information needed to make a reservation at the hotel. For this claim to become a valid inference, the managers must be able to show how the assessment response data supports that claim.

    But that is just the inference. The managers then need to act on that inference, which is the use case. In Bachman’s example, the inference now becomes the data for the use case, and the claim for the use case is that the employee will not be given the job of making reservations for the hotel. The managers must then again be able to provide evidence for why this is an appropriate decision given their interpretation of the assessment results.

    Bachman describes four warrants that need to be addressed to decide if an inference about the assessment results is appropriate for a specific use case:

    1. Is the assessment result relevant to the decision being made?
    2. Is the assessment result useful for the decision being made?
    3. Are the intended consequences of the decision beneficial (to the participant, the company, society, etc.)?
    4. Do the assessment results provide sufficient information to make the decision?
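    For anyone who thinks in code, Bachman’s chain (response data supports an inference claim, the inference supports a use-case decision, and the decision is checked against the four warrants) can be sketched loosely as a small data structure. This is purely my own illustration, not anything from Bachman’s article or from Questionmark; every class and field name here is hypothetical:

    ```python
    # Hypothetical sketch of an argument-based validity chain.
    # All names are illustrative; none come from Bachman (2005) or any real API.
    from dataclasses import dataclass, field

    @dataclass
    class Claim:
        """A claim (an inference or a use-case decision) plus its supporting evidence."""
        statement: str
        evidence: list = field(default_factory=list)

        def is_supported(self) -> bool:
            # A claim with no documented evidence cannot anchor the argument.
            return len(self.evidence) > 0

    @dataclass
    class ValidityArgument:
        """Links the two inferential leaps: score -> inference, inference -> use case."""
        inference: Claim   # leap 1: from response data to an interpretation
        use_case: Claim    # leap 2: from the interpretation to a decision
        # Bachman's four warrants, each recorded as addressed (True) or not.
        warrants: dict = field(default_factory=lambda: {
            "relevant": False, "useful": False,
            "beneficial": False, "sufficient": False,
        })

        def is_defensible(self) -> bool:
            return (self.inference.is_supported()
                    and self.use_case.is_supported()
                    and all(self.warrants.values()))

    # Example mirroring the hotel scenario above:
    arg = ValidityArgument(
        inference=Claim("Applicant has a low ability to interpret and record "
                        "reservation information",
                        evidence=["responses on the English screening test"]),
        use_case=Claim("Do not assign the applicant to phone reservations",
                       evidence=["job analysis linking English ability to the role"]),
    )
    arg.warrants.update(relevant=True, useful=True, beneficial=True, sufficient=True)
    print(arg.is_defensible())  # -> True
    ```

    The point of the sketch is simply that the inference and the decision are separate claims, each needing its own evidence, and that the warrants gate whether the pair holds together.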

    So to summarize, inferences are how we interpret the results, and use cases are the decisions that are made based on those interpretations. A Questionmark customer who works in training and development (or really any test developer) should be able to define and defend the inferences and use cases of their assessment. This does not need to be a huge formal process, especially for small-scale, lower stakes assessments, but the principles are the same. Even if an instructor does not launch a big validity study for every classroom assessment, we still expect them to know why they are asking certain questions, how they will interpret their students’ responses, and how they will use those results in the classroom for grading and instructional decisions.

    I hope this is illustrative, and I encourage people to read Bachman’s article if they have time, since I think he does a great job of summarizing the issues and dissecting the different facets of validity arguments.


