Posted by John Kleeman
Is a test better if it has a higher pass (or cut) score?
For example, if you develop a test to check that people know material for regulatory compliance purposes, is it better if the pass score is 60%, 70%, 80% or 90%? And is your organization safer if your test has a high pass score?
To answer this question, you first need to know the purpose of the test – how the results will be used and what inferences you want to make from it. Most compliance tests are criterion-referenced – that is to say they measure specific skills, knowledge or competency. Someone who passes the test is competent for the job role; and someone who fails has not demonstrated competence and might need remedial training.
Before considering a pass score, you need to consider whether questions are substitutable, i.e. whether a participant can get certain questions wrong and others right and still be competent. It could be that getting particular questions wrong implies a lack of competence, even if everything else is answered correctly. (For another way of looking at this, see Comprehensive Assessment Framework: Building the student model.) In that case, a participant who performs well on many items but gets a crucial safety question wrong should still fail the test. See Golden Topics- Making success on key topics essential for passing a test for one way of creating tests that work like this in Questionmark.
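To make the idea of non-substitutable questions concrete, here is a minimal Python sketch of a pass rule in which certain questions must be answered correctly regardless of the overall score. The function name, the question ids and the 70% figure are all hypothetical, chosen for illustration; this is not how Questionmark implements Golden Topics.

```python
def passes(responses, critical_ids, pass_score):
    """Return True only if every critical question is correct and the
    overall fraction correct meets the pass score.

    responses: dict mapping question id -> True/False (answered correctly)
    critical_ids: ids of questions that must be answered correctly (assumed)
    pass_score: required fraction correct, between 0 and 1
    """
    if not responses:
        return False
    # A single wrong (or unanswered) critical question fails the test outright.
    if any(not responses.get(qid, False) for qid in critical_ids):
        return False
    fraction_correct = sum(responses.values()) / len(responses)
    return fraction_correct >= pass_score


# Nine of ten correct (90%), and the one miss is question q3.
responses = {f"q{i}": True for i in range(1, 11)}
responses["q3"] = False

print(passes(responses, critical_ids={"q3"}, pass_score=0.7))  # False
print(passes(responses, critical_ids={"q5"}, pass_score=0.7))  # True
```

The same 90% score fails when the missed question is treated as critical and passes when it is not, which is why the substitutability question has to be settled before a single pass score means anything.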
But assuming questions are substitutable and that a single pass score for a test is viable, how do you work out what that pass score should be? The table below shows 4 possible outcomes:
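| | Pass test | Fail test |
|---|---|---|
| Participant competent | Correct decision | Error of rejection |
| Participant not competent | Error of acceptance | Correct decision |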
Providing that the test is valid and reliable, a competent participant should pass the test and a not-competent one should fail it.
Clearly, picking a pass score as a number “out of a hat” is not the right way to approach this. For a criterion-referenced test, you need to match the pass score to the way your questions measure competence. If your pass score is too high, you increase the number of errors of rejection: competent people are rejected, and you will waste time re-training them and having them re-take the test. If your pass score is too low, you will have too many errors of acceptance: people who are not competent are accepted, with potential consequences for how they do the job.
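The trade-off can be illustrated with a small Python sketch that counts each kind of outcome at different pass scores. The scores and competence labels below are made up for illustration; in practice you do not know each participant's true competence, which is exactly why the pass score has to be set with care.

```python
def classify(results, pass_score):
    """Count the four possible outcomes for a given pass score.

    results: list of (score, is_competent) pairs, with score as a
             fraction between 0 and 1 (hypothetical data)
    """
    counts = {
        "correct_pass": 0,
        "error_of_rejection": 0,
        "error_of_acceptance": 0,
        "correct_fail": 0,
    }
    for score, is_competent in results:
        passed = score >= pass_score
        if is_competent and passed:
            counts["correct_pass"] += 1
        elif is_competent:
            counts["error_of_rejection"] += 1   # competent but failed
        elif passed:
            counts["error_of_acceptance"] += 1  # not competent but passed
        else:
            counts["correct_fail"] += 1
    return counts


# Invented example: four competent and four not-competent participants.
results = [
    (0.92, True), (0.81, True), (0.76, True), (0.68, True),
    (0.74, False), (0.63, False), (0.55, False), (0.41, False),
]

for cut in (0.60, 0.75, 0.90):
    print(cut, classify(results, cut))
```

With this invented data, a 60% pass score produces two errors of acceptance and no errors of rejection, a 90% pass score produces three errors of rejection and no errors of acceptance, and 75% produces only one error of rejection. The highest pass score is not the one with the fewest misclassifications.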
You need to use informed judgement or statistical techniques to choose a pass score that supports valid inferences about the participants’ skills, knowledge or competence in the vast majority of cases. This means the number of errors or misclassifications is tolerable for the intended use case. One technique for doing this is the Angoff method, as described in this SlideShare. Using Angoff, you classify each question by how likely it is that a minimally-competent participant would get it right, and then roll this up to work out the pass score.
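As a rough sketch of the roll-up step, the Python below averages judges' ratings for each question and then averages across questions. This is one common form of the Angoff calculation, written under assumptions: the function name and the ratings are hypothetical, and a real standard-setting exercise involves judge training, discussion and review that no code captures.

```python
def angoff_pass_score(ratings):
    """Roll up Angoff ratings into a pass score (as a fraction, 0 to 1).

    ratings: one list per judge; each entry is that judge's estimate of
             the probability that a minimally-competent participant
             answers the corresponding question correctly.
    """
    if not ratings or not ratings[0]:
        raise ValueError("need at least one judge and one question")
    n_questions = len(ratings[0])
    if any(len(judge) != n_questions for judge in ratings):
        raise ValueError("every judge must rate every question")

    # Average the judges' estimates for each question...
    per_question = [
        sum(judge[i] for judge in ratings) / len(ratings)
        for i in range(n_questions)
    ]
    # ...then average across questions to get the expected fraction
    # correct for a minimally-competent participant.
    return sum(per_question) / n_questions


# Hypothetical ratings from three judges on a five-question test.
ratings = [
    [0.9, 0.8, 0.6, 0.7, 0.5],
    [0.8, 0.8, 0.7, 0.6, 0.6],
    [0.9, 0.7, 0.5, 0.8, 0.4],
]
print(f"{angoff_pass_score(ratings):.0%}")  # 69%
```

Here the pass score comes out of the judged difficulty of the questions rather than being fixed in advance. If every rating were 0.8, the same calculation would give 80%.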
Going back to the original question of whether a better test has a higher pass score, what matters is that your test is valid and reliable and that your pass score is set to the appropriate level to measure competency. You want the right pass score, not necessarily the highest pass score.
So what happens if you set your pass score without going through this process? For instance, you say that your test will have an 80% pass score before you design it. If you do this, you are assuming that on average all the questions in the test will have an 80% chance of being answered correctly by a minimally-competent participant. But unless you have ways of measuring and checking that, you are abandoning logic and trusting to luck.
In general, a lower pass score does not necessarily imply an easier assessment. If the items are very difficult, a low pass score may still yield low pass rates. Pass scores are often set with a consideration for the difficulty of the items, either implicitly or explicitly.
So, is a test better if it has a higher pass score?
The answer is no. A test is best if it has the right pass score. And if one organization has a compliance test where the pass score is 70% and another has a compliance test where the pass score is 80%, this tells you nothing about how good each test is. You need to ask whether the tests are valid and reliable and how the pass scores were determined. There is an issue of “face validity” here: people might find it hard to believe that a test with a very low pass score is fair and reasonable, but in general a higher pass score does not make a better test.
If you want to learn more about setting a pass score, search this blog for articles on “standard setting” or “cut score” or read the excellent book Criterion-Referenced Test Development, by Sharon Shrock and Bill Coscarelli. We’ll also be talking about this and other best practices at our upcoming Users Conferences in Barcelona November 10-12 and San Antonio, Texas, March 4 – 7.