Standard Setting: Compromise and Normative Methods

Austin Fossey

We have discussed the Angoff and Bookmark methods of standard setting, which are two commonly used methods, but there are many more. I would again refer the interested reader to Hambleton and Pitoniak’s chapter in Educational Measurement (4th ed.) for descriptions of other criterion-referenced methods.

Though criterion-referenced assessment is the typical standard-setting scenario, cut scores may also be determined for normative assessments. In these cases, the cut score is often not set to make an inference about the participant, but instead set to help make an operational decision.

A common example of a normative standard is setting the pass rate based on information that is unrelated to participants’ performance. A company may decide to hire the ten highest-scoring candidates, not because the other candidates are unqualified, but because there are only ten open positions. Of course, if the candidate pool is weak overall, even the ten highest performers may still turn out to be lousy employees.
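A quota-based standard like this is easy to compute: the cut score is simply the score of the last candidate who fits into the available openings. The following is a minimal sketch; the helper name `quota_cut_score` and the scores are hypothetical, and it assumes that anyone scoring at or above the cut is selected, so ties at the cut can admit more people than there are openings.

```python
def quota_cut_score(scores: list[float], openings: int) -> float:
    """Return the lowest score among the top `openings` candidates.

    Hypothetical helper: candidates scoring at or above this value are
    selected, so ties at the cut can admit more than `openings` people.
    """
    if not 0 < openings <= len(scores):
        raise ValueError("openings must be between 1 and the number of candidates")
    ranked = sorted(scores, reverse=True)  # highest score first
    return ranked[openings - 1]


# Hypothetical scores for twelve applicants competing for ten positions.
scores = [88, 92, 75, 61, 95, 70, 83, 79, 90, 66, 85, 72]
cut = quota_cut_score(scores, openings=10)
hired = [s for s in scores if s >= cut]
print(cut, len(hired))  # 70 10
```

Note that the cut score here depends only on the number of openings and on how the other candidates performed, which is exactly what makes the standard normative rather than criterion-referenced.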

We may also set normative standards based on risk tolerance. You may recall from our post about criterion validity that test developers may use a secondary measure that they expect to correlate with performance on the assessment. An employer may wish to set a cut score that minimizes Type I errors (false positives) when passing an unqualified candidate would be costly. For example, the ability to fly a plane safely may correlate strongly with aviation test scores, but because letting an unqualified person fly a plane is so dangerous, we may want to set the cut score high even though doing so will exclude some qualified pilots.

Figure: Normative standard setting with a secondary criterion measure

The opposite scenario may occur as well. If Type I errors carry little risk, an employer may set the cut score low to minimize Type II errors (false negatives) and make sure that all qualified candidates are identified. Unqualified candidates who happen to pass may be identified for additional training through subsequent assessments or workplace observation.
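The trade-off between the two error types can be made concrete by counting them at several candidate cut scores. This is a minimal sketch that assumes each participant has an assessment score and a yes/no judgment of qualification from the secondary criterion measure (for example, a supervisor rating); the function name and all data are hypothetical.

```python
def classification_errors(scores, qualified, cut):
    """Count Type I and Type II errors for a given cut score.

    scores    -- assessment scores
    qualified -- True if the (hypothetical) secondary criterion measure
                 says the person is qualified
    cut       -- participants scoring at or above the cut pass
    """
    # Type I: passed the assessment but not qualified on the criterion.
    false_pos = sum(1 for s, q in zip(scores, qualified) if s >= cut and not q)
    # Type II: failed the assessment but qualified on the criterion.
    false_neg = sum(1 for s, q in zip(scores, qualified) if s < cut and q)
    return false_pos, false_neg


# Hypothetical data for ten participants.
scores = [55, 60, 62, 68, 70, 74, 77, 81, 85, 90]
qualified = [False, False, True, False, True, False, True, True, True, True]

for cut in (60, 70, 80):
    fp, fn = classification_errors(scores, qualified, cut)
    print(f"cut={cut}: Type I (false positives)={fp}, Type II (false negatives)={fn}")
# cut=60: Type I (false positives)=3, Type II (false negatives)=0
# cut=70: Type I (false positives)=1, Type II (false negatives)=1
# cut=80: Type I (false positives)=0, Type II (false negatives)=3
```

Raising the cut trades false positives for false negatives: a high-risk program like the aviation example would lean toward the higher cut, while a low-risk program with follow-up training might accept the lower one.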

If we decide to use a normative approach to standard setting, we need to be sure that it is justified, and the cut score should not be used to classify individuals. By its nature, a normative standard implies that not everyone will pass the assessment, regardless of their individual abilities, which is why it would be inappropriate for most educational or certification assessments.

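Hambleton and Pitoniak also describe one final class of standard-setting methods called compromise methods. Compromise methods combine the judgment of the standard setters with information about the political realities of different pass rates. One example is the Hofstee Method, in which standard setters define the highest acceptable cut score (1), the lowest acceptable cut score (2), the highest acceptable fail rate (3), and the lowest acceptable fail rate (4). These four values bound a rectangle on a plot of fail rate against cut score. A line is drawn across the rectangle from the point (lowest acceptable cut score, highest acceptable fail rate) to the point (highest acceptable cut score, lowest acceptable fail rate), and the point where this line intersects the curve of participants’ observed fail rates at each possible cut score is used as the cut score.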

Figure: Hofstee Method example (adapted from Educational Measurement, ed. Brennan, 2006)
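The Hofstee intersection can be approximated with a simple search. This is a minimal sketch, not a reference implementation: it assumes integer cut scores on a percent-correct scale, treats a participant as failing when their score is below the cut, and returns the first cut score at which the observed fail rate reaches the diagonal line. The function names and example scores are hypothetical.

```python
def fail_rate(scores, cut):
    """Proportion of participants who would fail (score below the cut)."""
    return sum(s < cut for s in scores) / len(scores)


def hofstee_cut(scores, k_min, k_max, f_min, f_max):
    """Approximate the Hofstee compromise cut score on an integer scale.

    k_min, k_max -- lowest and highest acceptable cut scores
    f_min, f_max -- lowest and highest acceptable fail rates (proportions)
    Returns None when the observed fail-rate curve does not pass through
    the rectangle defined by the four judgments.
    """
    if fail_rate(scores, k_min) > f_max or fail_rate(scores, k_max) < f_min:
        return None
    for cut in range(k_min, k_max + 1):
        # Diagonal from (k_min, f_max) down to (k_max, f_min).
        line = f_max + (f_min - f_max) * (cut - k_min) / (k_max - k_min)
        # The observed fail rate rises with the cut while the line falls,
        # so the first crossing approximates the intersection.
        if fail_rate(scores, cut) >= line:
            return cut
    return k_max  # guards against floating-point rounding at k_max


# Hypothetical percent-correct scores for 20 participants.
scores = [45, 52, 58, 60, 63, 65, 67, 68, 70, 72,
          73, 75, 76, 78, 80, 82, 85, 88, 91, 95]
cut = hofstee_cut(scores, k_min=60, k_max=80, f_min=0.10, f_max=0.40)
print(cut, fail_rate(scores, cut))  # 67 0.3
```

A finer score scale (or linear interpolation between adjacent cut scores) would locate the intersection more precisely; the integer search keeps the sketch short.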

What is the Angoff Method?

When creating tests that define levels of competency as they relate to performance, it’s essential to use a reliable method for establishing defensible pass/fail scores.

One such method is the Angoff Method, which uses a focus-group approach. It has a strong track record and is widely accepted by testing professionals and courts.

Subject-matter experts (SMEs) review each test question and estimate what proportion of minimally qualified candidates would answer it correctly. The judges’ estimates are averaged for each question, and these averages are combined across the whole test to calculate the passing percentage (cut score).
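The arithmetic behind an Angoff cut score can be sketched in a few lines. This minimal example assumes dichotomously scored items worth one point each and ratings expressed as proportions; the ratings and the function name `angoff_cut` are hypothetical.

```python
from statistics import mean

# Hypothetical ratings: ratings[j][i] is judge j's estimate of the proportion
# of minimally qualified candidates who would answer item i correctly.
ratings = [
    [0.70, 0.55, 0.80, 0.40, 0.65],  # judge 1
    [0.75, 0.50, 0.85, 0.45, 0.60],  # judge 2
    [0.65, 0.60, 0.90, 0.35, 0.70],  # judge 3
]


def angoff_cut(ratings):
    """Return (raw cut score, percentage cut score) for one-point items."""
    # Average the judges' estimates for each item (column).
    item_means = [mean(col) for col in zip(*ratings)]
    # Expected number correct for a minimally qualified candidate.
    raw_cut = sum(item_means)
    pct_cut = 100 * raw_cut / len(item_means)
    return raw_cut, pct_cut


raw, pct = angoff_cut(ratings)
print(f"raw cut = {raw:.2f} of 5 items, percentage cut = {pct:.1f}%")
# raw cut = 3.15 of 5 items, percentage cut = 63.0%
```

Summing the item averages gives the expected raw score of a minimally qualified candidate; dividing by the number of items expresses the same cut as a passing percentage.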

Basing cut scores on empirical data instead of choosing arbitrary passing scores helps test developers produce legally defensible tests that meet the Standards for Educational and Psychological Testing. The Angoff Method offers a practical way to achieve this.

Standard Setting: An Introduction


Standard setting was a topic of considerable interest to attendees at the Questionmark 2010 Users Conference in March. We had some great discussions about standard-setting methods and practical applications in some of the sessions I led, so I thought I would share some details about this topic here.

Standard setting is generally used in summative, criterion-referenced contexts. It is the process of setting a “pass/fail” score that distinguishes participants who have the minimum acceptable level of competence in an area from those who do not. For example, in a crane operation certification course, participants would be expected to have a certain level of knowledge and skill to operate a crane successfully and safely. In addition to a practical test (e.g., operating a crane in a safe environment), candidates may also be required to take a crane certification exam on which they must achieve a minimum score before they are allowed to operate a crane. On this exam, a score of 75% or higher is required to operate a crane; candidates scoring below 75% would need to take the course again. Cut scores are not limited to pass/fail benchmarks, however. For example, organizations may set several cut scores within an assessment to differentiate between “Advanced”, “Acceptable”, and “Failed” levels.
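With several cut scores, classifying a participant amounts to finding which interval their score falls into. This is a minimal sketch: the 75% cut comes from the crane example above, while the 90% cut for “Advanced” and the function name are hypothetical.

```python
from bisect import bisect_right

# Cut scores in ascending order (percent correct). The 90 is hypothetical.
CUTS = [75, 90]
LEVELS = ["Failed", "Acceptable", "Advanced"]  # one more level than cuts


def performance_level(score: float) -> str:
    """Map a percent-correct score to a level.

    A score exactly equal to a cut score falls into the higher level,
    matching a rule such as "75% or higher passes".
    """
    return LEVELS[bisect_right(CUTS, score)]


for s in (62, 75, 89.5, 90, 97):
    print(s, performance_level(s))
# 62 Failed
# 75 Acceptable
# 89.5 Acceptable
# 90 Advanced
# 97 Advanced
```

Using `bisect_right` rather than `bisect_left` is what places scores equal to a cut into the higher level.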

Cut scores are very common in high- and medium-stakes assessment programs, and well-established processes are available for setting these cut scores and maintaining them across administrations. Generally, one would first build the assessment with the cut score in mind. This would entail selecting questions in proportion to the topic areas being covered, ensuring an appropriate distribution of question difficulty, and selecting more questions whose difficulty is near the cut score to maximize the “measurement information” at that point.
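One common way to quantify “measurement information” is item response theory (IRT). The sketch below assumes a two-parameter logistic (2PL) model without the 1.7 scaling constant and a cut score already expressed on the ability (theta) scale; the item bank, the theta value, and the function names are hypothetical. It ranks items by how much information they provide at the cut.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (logistic form, no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical item bank: (item id, discrimination a, difficulty b).
bank = [
    ("Q1", 1.2, -1.5),
    ("Q2", 0.8, 0.0),
    ("Q3", 1.5, 0.4),
    ("Q4", 1.0, 2.0),
    ("Q5", 1.3, 0.6),
]
theta_cut = 0.5  # hypothetical cut score on the theta scale

# Items that are discriminating and whose difficulty is near the cut rank first.
ranked = sorted(bank, key=lambda item: item_information(theta_cut, item[1], item[2]),
                reverse=True)
print([item_id for item_id, _, _ in ranked[:3]])  # ['Q3', 'Q5', 'Q2']
```

In practice, content coverage and difficulty spread would constrain this selection, as described above; ranking by information at the cut is only one input.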

Once a test form is built, it would undergo formal standard-setting procedures to set or confirm the cut score(s). Here is a general overview of a typical Modified Angoff standard-setting process:

Figure: Overview of a typical Modified Angoff standard-setting process

Stay tuned for my next post on this topic, in which I will describe some standard-setting methods for establishing cut scores.