Standard Setting: Compromise and Normative Methods

Austin Fossey


We have discussed the Angoff and Bookmark methods of standard setting, which are two commonly used methods, but there are many more. I would again refer the interested reader to Hambleton and Pitoniak’s chapter in Educational Measurement (4th ed.) for descriptions of other criterion-referenced methods.

Though criterion-referenced assessment is the typical standard-setting scenario, cut scores may also be determined for normative assessments. In these cases, the cut score is often not set to make an inference about the participant, but instead set to help make an operational decision.

A common example of a normative standard is when the pass rate is set based on information that is unrelated to participants’ performance. A company may decide to hire the ten highest-scoring candidates, not because the other candidates are not qualified, but because there are only ten open positions. Of course, if the candidate pool is weak overall, even the ten highest performers may still turn out to be lousy employees.
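To make the hiring example concrete, here is a minimal sketch in Python. The function name `top_k_cut_score` and both candidate pools are hypothetical, made up for illustration. It shows that the cut score under this kind of normative rule is simply whatever the tenth-best candidate happened to score, so it moves with the pool rather than with any fixed idea of competence.

```python
def top_k_cut_score(scores, k):
    """Return the lowest score among the k highest scorers."""
    if k <= 0 or k > len(scores):
        raise ValueError("k must be between 1 and the number of candidates")
    ranked = sorted(scores, reverse=True)
    return ranked[k - 1]


# Hypothetical scores for two candidate pools (illustration only).
strong_pool = [91, 88, 85, 84, 80, 79, 77, 75, 74, 72, 60, 55]
weak_pool = [62, 58, 55, 54, 50, 49, 47, 45, 44, 42, 30, 25]

print(top_k_cut_score(strong_pool, 10))  # 72
print(top_k_cut_score(weak_pool, 10))    # 42
```

Both pools fill the same ten positions, but the implied cut score drops from 72 to 42. Note that if several candidates tie at the cut score, this simple rule would admit more than ten people, so a real process would need a tie-breaking policy.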

We may also set normative standards based on risk tolerance. You may recall from our post about criterion validity that test developers may use a secondary measure that they expect to correlate with performance on the assessment. An employer may wish to set a cut score to minimize Type I errors (false positives) because of the risk involved. For example, ability to fly a plane safely may correlate strongly with aviation test scores, but because of the risk involved if we let an unqualified person fly a plane, we may want to set the cut score high even though we will exclude some qualified pilots.

[Figure: Normative Standard Setting with Secondary Criterion Measure]

The opposite scenario may occur as well. If Type I errors carry little risk, an employer may set the cut score low to make sure that all qualified candidates are identified. Unqualified candidates who happen to pass may be identified for additional training through subsequent assessments or workplace observation.
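One way to picture both scenarios is to attach a cost to each kind of error and see where the cut score lands. The sketch below is only an illustration under stated assumptions: the function names, the cost weights, and the data are all hypothetical, participants pass when their score is at or above the cut, and "qualified" stands for whatever the secondary criterion measure reports.

```python
def error_counts(records, cut):
    """Count false positives and false negatives at a given cut score.

    records: list of (test_score, is_qualified) pairs, where is_qualified
    comes from the secondary criterion measure.
    """
    false_pos = sum(1 for score, ok in records if score >= cut and not ok)
    false_neg = sum(1 for score, ok in records if score < cut and ok)
    return false_pos, false_neg


def pick_cut(records, cost_fp, cost_fn):
    """Return the observed score with the lowest weighted error cost."""
    candidates = sorted({score for score, _ in records})

    def cost(cut):
        fp, fn = error_counts(records, cut)
        return cost_fp * fp + cost_fn * fn

    return min(candidates, key=cost)


# Hypothetical (score, qualified) pairs, for illustration only.
records = [
    (50, False), (55, False), (60, True), (65, False),
    (70, True), (75, False), (80, True), (85, True), (90, True),
]

# False positives are costly (the aviation case): the cut score moves up.
print(pick_cut(records, cost_fp=10, cost_fn=1))  # 80
# False positives are cheap: the cut score moves down.
print(pick_cut(records, cost_fp=1, cost_fn=10))  # 60
```

With the high false-positive cost, the cut of 80 lets no unqualified participant through but excludes the qualified participants who scored 60 and 70. With the low false-positive cost, the cut of 60 passes every qualified participant along with two unqualified ones.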

If we decide to use a normative approach to standard setting, we need to be sure that there is justification, and the cut score should not be used to classify individuals. A normative standard by its nature implies that not everyone will pass the assessment, regardless of their individual abilities, which is why it would be inappropriate for most cases in education or certification assessment.

Hambleton and Pitoniak also describe one final class of standard-setting methods called compromise methods. Compromise methods combine the judgment of the standard setters with information about the political realities of different pass rates. One example is the Hofstee Method, where standard setters define the highest acceptable cut score (1), the lowest acceptable cut score (2), the highest acceptable fail rate (3), and the lowest acceptable fail rate (4). These four values are plotted against a curve of participants’ score data showing the fail rate at each possible cut score. A line is drawn from the point (lowest cut score, highest fail rate) to the point (highest cut score, lowest fail rate), and its intersection with the curve is used as the cut score.

[Figure: Hofstee Method Example. Adapted from Educational Measurement (Ed. Brennan, 2006).]
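The Hofstee intersection can be approximated numerically. The following is a rough sketch, not a reference implementation: the parameter names `k_min`, `k_max`, `f_min`, and `f_max` are my own labels for the four judgments, the scores are invented, and I assume an integer score scale where a participant fails when scoring below the cut. Instead of solving for an exact intersection, it picks the integer cut score where the observed fail-rate curve comes closest to the line.

```python
def fail_rate(scores, cut):
    """Proportion of participants scoring below the cut score."""
    return sum(1 for s in scores if s < cut) / len(scores)


def hofstee_cut(scores, k_min, k_max, f_min, f_max):
    """Approximate the Hofstee cut score on an integer score scale.

    The line runs from (k_min, f_max) to (k_max, f_min). The result is the
    integer between k_min and k_max where the observed fail-rate curve
    comes closest to that line.
    """
    if not (k_min < k_max and f_min < f_max):
        raise ValueError("need k_min < k_max and f_min < f_max")
    slope = (f_min - f_max) / (k_max - k_min)

    def gap(cut):
        line = f_max + slope * (cut - k_min)
        return abs(fail_rate(scores, cut) - line)

    return min(range(k_min, k_max + 1), key=gap)


# Hypothetical scores and panel judgments, for illustration only.
scores = [42, 48, 51, 55, 58, 60, 62, 63, 65, 66,
          68, 70, 71, 73, 75, 78, 80, 84, 88, 93]

cut = hofstee_cut(scores, k_min=55, k_max=70, f_min=0.10, f_max=0.40)
print(cut, fail_rate(scores, cut))  # 61 0.3
```

In this made-up data set the compromise lands at a cut score of 61, which fails 30% of participants: inside both the acceptable cut-score range and the acceptable fail-rate range. If the curve never passes through the rectangle defined by the four judgments, the closest-point rule still returns a value, so the result should be checked against the bounds before anyone relies on it.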

One Response to “Standard Setting: Compromise and Normative Methods”

  1. Matt Barney says:

    Judgment-based approaches are not the only class of approaches that can be considered. I’m fond of decision-theoretic approaches that explicitly model the costs and benefits of a given standard.

    I’ve done some work on this with the Cue See model, looking not only at the tradeoffs of stringency and leniency of the cut score for a single test, but across several. Further, the costs of Type I and Type II errors are not the only costs involved. For employee selection or training standards, standards that are too high or too low can have their own operational costs.

    In one study I did when I was at Infosys to set standards for the senior-most leaders, neither Angoff nor Bookmark (using a Many-Facet Rasch Model) produced standards that the leaders themselves could meet. Only the Cue See approach respected tradeoffs, especially around the fact that a non-compensatory multiple-hurdle approach makes it increasingly difficult to pass, and having a large number of tests makes even a medium standard nearly impossible for a given person to meet on all tests.
