Standard Setting: Compromise and Normative Methods

Posted by Austin Fossey

We have discussed the Angoff and Bookmark methods of standard setting, which are two commonly used methods, but there are many more. I would again refer the interested reader to Hambleton and Pitoniak’s chapter in Educational Measurement (4th ed.) for descriptions of other criterion-referenced methods.

Though criterion-referenced assessment is the typical standard-setting scenario, cut scores may also be determined for normative assessments. In these cases, the cut score is often not set to make an inference about the participant, but instead set to help make an operational decision.

A common example of a normative standard is when the pass rate is set based on information that is unrelated to participants’ performance. A company may decide to hire the ten highest-scoring candidates, not because the other candidates are not qualified, but because there are only ten open positions. Of course, if the candidate pool is weak overall, even the ten highest performers may still turn out to be lousy employees.
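To make the hiring example concrete, here is a minimal Python sketch. The function name `top_n_cut_score` and the candidate scores are made up for illustration; the point is only that the cut score falls wherever the tenth-ranked candidate happens to land, whatever that score says about ability.

```python
def top_n_cut_score(scores, n_positions):
    """Return the lowest score that still places a candidate in the top n."""
    if n_positions <= 0:
        raise ValueError("n_positions must be positive")
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)
    # If there are fewer candidates than positions, everyone is selected.
    return ranked[min(n_positions, len(ranked)) - 1]


# Hypothetical scores for twelve candidates competing for ten positions.
scores = [55, 62, 48, 71, 66, 59, 43, 68, 52, 60, 57, 49]
cut = top_n_cut_score(scores, 10)
selected = [s for s in scores if s >= cut]
print(cut, len(selected))
```

Note that tied scores at the cut would select more than ten candidates, so a real process would need a tie-breaking rule.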

We may also set normative standards based on risk tolerance. You may recall from our post about criterion validity that test developers may use a secondary measure that they expect to correlate with performance on the assessment. An employer may wish to set a cut score to minimize type I errors (false positives) because of the risk involved. For example, ability to fly a plane safely may correlate strongly with aviation test scores, but because of the risk involved if we let an unqualified person fly a plane, we may want to set the cut score high even though we will exclude some qualified pilots.

Normative Standard Setting with Secondary Criterion Measure

The opposite scenario may occur as well. If Type I errors have little risk, an employer may set the cut score low to make sure that all qualified candidates are identified. Unqualified candidates who happen to pass may be identified for additional training through subsequent assessments or workplace observation.
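Both scenarios can be sketched with a small Python example. It assumes we already have, for each past candidate, a test score and a yes/no judgment from the secondary criterion measure; the data and the function names are hypothetical. The sketch picks the lowest cut score whose false positive rate stays within a chosen tolerance.

```python
def false_positive_rate(records, cut):
    """Share of unqualified candidates whose score meets or exceeds the cut."""
    unqualified = [score for score, qualified in records if not qualified]
    if not unqualified:
        return 0.0
    return sum(score >= cut for score in unqualified) / len(unqualified)


def lowest_cut_within_tolerance(records, max_fp_rate):
    """Lowest observed score whose false positive rate is within tolerance."""
    # The false positive rate can only fall as the cut rises, so the first
    # acceptable cut in ascending order is the lowest acceptable cut.
    for cut in sorted({score for score, _ in records}):
        if false_positive_rate(records, cut) <= max_fp_rate:
            return cut
    return None  # no observed score is strict enough


# Hypothetical (test score, qualified on the criterion measure) pairs.
records = [
    (40, False), (50, False), (55, True), (60, False), (65, True),
    (70, True), (75, False), (80, True), (90, True),
]

strict_cut = lowest_cut_within_tolerance(records, 0.0)   # high-risk job
lenient_cut = lowest_cut_within_tolerance(records, 0.5)  # low-risk job
print(strict_cut, lenient_cut)
```

With zero tolerance the cut rises above every unqualified candidate and shuts out some qualified ones; with a looser tolerance the cut drops and every qualified candidate in this made-up sample passes.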

If we decide to use a normative approach to standard setting, we need to be sure that there is justification, and the cut score should not be used to classify individuals. A normative standard by its nature implies that not everyone will pass the assessment, regardless of their individual abilities, which is why it would be inappropriate for most cases in education or certification assessment.

Hambleton and Pitoniak also describe one final class of standard-setting methods called compromise methods. Compromise methods combine the judgment of the standard setters with information about the political realities of different pass rates. One example is the Hofstee Method, where standard setters define the highest acceptable cut score (1), the lowest acceptable cut score (2), the highest acceptable fail rate (3), and the lowest acceptable fail rate (4). These are plotted against a curve of participants’ score data, and the intersection is used as a cut score.

Hofstee Method Example (adapted from Educational Measurement, Ed. Brennan, 2006)
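For readers who like to see the arithmetic, here is a rough Python sketch of one common way to compute a Hofstee cut score. It assumes the four judgments are joined by a straight line running from (lowest acceptable cut score, highest acceptable fail rate) to (highest acceptable cut score, lowest acceptable fail rate), and it takes the whole-number cut score where the observed fail rate comes closest to that line. The function names and the score data are invented for illustration.

```python
def observed_fail_rate(scores, cut):
    """Proportion of participants scoring below the cut."""
    return sum(s < cut for s in scores) / len(scores)


def hofstee_cut_score(scores, k_min, k_max, f_min, f_max):
    """Integer cut in [k_min, k_max] where the fail-rate curve meets the line.

    k_min, k_max: lowest and highest acceptable cut scores.
    f_min, f_max: lowest and highest acceptable fail rates (0 to 1).
    """
    if k_max <= k_min:
        raise ValueError("k_max must be greater than k_min")

    def line(cut):
        # Straight line from (k_min, f_max) down to (k_max, f_min).
        return f_max + (f_min - f_max) * (cut - k_min) / (k_max - k_min)

    # The observed fail rate rises with the cut while the line falls,
    # so the smallest gap marks the (approximate) intersection.
    return min(
        range(k_min, k_max + 1),
        key=lambda cut: abs(observed_fail_rate(scores, cut) - line(cut)),
    )


# Hypothetical scores for twenty participants.
scores = [42, 45, 48, 50, 52, 55, 57, 58, 60, 61,
          63, 64, 66, 68, 70, 72, 75, 78, 82, 88]

cut = hofstee_cut_score(scores, k_min=50, k_max=70, f_min=0.10, f_max=0.50)
print(cut, observed_fail_rate(scores, cut))
```

If the fail-rate curve never crosses the line inside the acceptable range, this sketch simply returns the closest point, which is a simplification a real standard-setting study would need to address.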

Problem Questions and Summary – Item Writing Guide, Part 5

Posted By Doug Peterson

Let’s look at two more item writing problems. These last two are a little controversial.

The stimulus for this question tells a wonderful story. The problem is, the first three sentences contain no information that relates to the question. A long stimulus full of extraneous, unneeded information can easily distract or confuse the test-taker. This item needs a re-write of the stimulus to get directly to the question at hand – and nothing else. Let’s change it to “Which Questionmark video explains how to use assessments in solving business problems?”

But here’s the controversy: This question is just fine if you’re trying to ascertain the test-taker’s ability to recognize pertinent information and ignore extraneous information! Therefore I won’t advise that you *never* use a question like this, only that you make sure you use it in the right situation.

And now, on to our last question in this series of posts.

At first glance, there doesn’t appear to be a problem with this question – no repetition of a keyword, distracters are the same length, no grammar inconsistencies, short and to the point… But note the word “not” in the stimulus.

The other questions we’ve looked at in part 3 and part 4 of the series ask the test-taker to find the *correct* answer, but this question suddenly has them looking for an *incorrect* answer. This requires the test-taker to reverse their approach to the question, which can be very confusing.

That being said, there are some who advocate putting a certain number of negative questions on a *survey* to help ensure that the person filling it out is paying attention and not just flying through the questions. I’m not sure I agree with this approach. I feel that if they’re not interested and not paying attention to what they’re doing, negative questions aren’t going to change that, but they could lead to some very bad data being collected.

When it comes to quizzes, tests and exams, especially high-stakes exams, I strongly advise against using negative questions. If you absolutely must use a negative question, emphasize the negative word by putting it in all capital letters, bolding it, and maybe even underlining it.

So let’s pull it all together. It’s important to be fair to both the test-taker and the testing organization.

  • The test-taker should only be tested for the knowledge, skills or abilities in question, and nothing else.
  • The testing organization needs to be assured that the assessment accurately and reliably measures the test-taker’s knowledge, skills or abilities.

To do this, your assessment needs to be made up of well-written questions. To write good assessment questions:

  • Be careful with your wording so that you don’t create overly long or confusing questions.
  • Be concise. Sentences should be as short as possible while still posing the question clearly.
  • Keep it simple. Avoid compound sentences and use short, commonly used words whenever possible. Technical terminology is acceptable if it is part of what
    the test measures.
  • Make sure each question has a specific focus, and that you’re not actually testing multiple pieces of knowledge in a single question.
  • Always use positive phrasing to avoid confusion. If you have no choice but to use negative phrasing, make sure that the negative word – for example,
    “not” – is emphasized with capital letters, bold font, and/or underlining.
  • When creating distracters:
      • keep them all about the same length,
      • keep them as short as possible,
      • avoid using keywords from the stimulus,
      • watch out for grammatical cues, and
      • make sure that all distracters are reasonable answers within the context of the question.

As always, feel free to leave your comments, or contact me directly at doug.peterson@questionmark.com.

Summer webinars — including tips on better test planning and delivery

Posted by Joan Phaup

Students (and teachers) may be clicking their heels about summer vacation, but the joy of learning continues year-round for us!

Helping our customers understand how to use assessments effectively is as important to us as providing good testing and assessment technologies, so we’re keeping our web seminars going strong during the summer months.

Here’s the current line-up:

Questionmark Customers Online: Using Questionmark and SAP for Effective Learning and Compliance — June 20 at 1 p.m. Eastern Time:

Learn about the use of Questionmark and SAP for a wide array of learning and compliance needs, including safety training, certifications and regulatory compliance testing. This presentation by Kati Sulzberger of BNSF Railway also describes how Questionmark helped the company meet some unique test delivery requirements.

Five Steps to Better Tests: Best Practices for Design and Delivery — July 18 at noon Eastern Time:

Get practical tips for planning tests, creating items, and building, delivering and evaluating tests that yield actionable, meaningful results. Questionmark Product Owner Doug Peterson, who will present this webinar, previously spent more than 12 years in workforce development. During that time, Doug created training materials, taught in the classroom and over the Web, and created many online surveys, quizzes and tests.

Questionmark Customers Online: Achieving a Better Assessment-Development Process — August 22 at 1 p.m. Eastern Time:

Need a better assessment building process? Find out how enterprise architecture principles can help you and your team work more efficiently. Tom Metzler, Knowledge Assessment Administrator at TIBCO Software, Inc., will explain how the company’s certification team uses well-established software architecture principles to continually improve the efficiency of its assessment development process. Find out how using systematic processes and thorough documentation results in better information for subject matter experts, time savings and higher-quality assessments.

Introduction to Questionmark’s Assessment Management System — Choose from a variety of dates and times

This primer explains and demonstrates key features and functions available in Questionmark OnDemand and Questionmark Perception. Spend an hour with a Questionmark expert learning the basics of authoring, delivering and reporting on surveys, quizzes, tests and exams.

Click here for more details and free online registration.

Problems and Fixes — Item Writing Guide, Part 4

Posted By Doug Peterson

In part 3 of this series on item writing, we began taking a look at some “problem questions” to figure out what was wrong with them and how to make them better. Let’s continue doing that.

This is the ol’ “grammar give-away” problem. The stimulus ends in “a”, indicating that the answer begins with a consonant (or at least *should* begin with a consonant, if the assessment author is following standard rules of grammar). There’s only one choice that begins with a consonant, so the participant doesn’t need to know the answer – they just need to know a little grammar.

There are a couple of ways to fix this. One would be to end the stimulus with “a/an”. Another way would be to move the indefinite article (yes, I had to look that up) into the choice: an apple, a banana, an orange, and an eggplant.

Also be sure not to mix a singular in the stimulus with plurals in the choices, or vice versa. And if you’re writing questions in languages with grammatical gender, like Spanish, French, or Italian, be sure to account for masculine and feminine definite and indefinite articles.

This question has a couple of things wrong with it:

The first problem is pretty obvious. One choice is significantly longer than the other three. Typically this would mean that choice (b) is the correct answer, and in this case, that would be true.

Can you spot the other problem? It’s a little more subtle. The stimulus uses an important word – “strings” – and only the correct answer uses this word (in its singular form) as well. Without knowing anything about bass guitars, most people would answer this question correctly simply by noticing the use of the same important word in both the stimulus and one of the choices.

To fix this question, the second choice should be changed to something like “Set the intonation.” At that point the length of the correct choice is about the same as the length of the other choices, and the important word “string(s)” is not being used.
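If you maintain a large item bank, both of these cues can be roughly screened for automatically. The Python sketch below is only an illustration: the function names, the stop-word list, the 1.5x length threshold, and the sample item are all my own assumptions (the sample is not the question discussed above), and a human reviewer still needs to make the final call.

```python
import re
from statistics import median

# Small, hypothetical stop-word list; a real checker would need a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "on", "to", "for", "is", "and", "or", "in"}


def stems(text):
    """Lower-cased content words with a crude plural 's' stripped."""
    found = re.findall(r"[a-z]+", text.lower())
    kept = [w for w in found if w not in STOP_WORDS]
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in kept}


def length_outliers(choices, ratio=1.5):
    """Indexes of choices longer than `ratio` times the median length."""
    typical = median(len(c) for c in choices)
    return [i for i, c in enumerate(choices) if len(c) > ratio * typical]


def keyword_cues(stimulus, choices):
    """Map choice index to the content words it shares with the stimulus."""
    stimulus_stems = stems(stimulus)
    shared = {i: stems(c) & stimulus_stems for i, c in enumerate(choices)}
    return {i: words for i, words in shared.items() if words}


# Invented sample item with both problems planted in the second choice.
stimulus = "What should you do when the strings on a bass guitar sound out of tune?"
choices = [
    "Replace the pickups.",
    "Adjust the tuning peg for each string until the pitch is correct.",
    "Clean the fretboard.",
    "Lower the bridge.",
]

print(length_outliers(choices))
print(keyword_cues(stimulus, choices))
```

Any choice flagged by either check deserves a second look before the item goes live.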

Please feel free to add your comments to this discussion – the more, the merrier! In our next installment, we’ll diagnose two more problems, and then wrap things up with a little summary.

To Your Health! What assessments do regulators require?

Posted by John Kleeman

In Questionmark’s white paper, The Role of Assessments in Mitigating Risk for Financial Services Organizations, we shared advice and requirements from financial services regulators about compliance-related testing for employees.

Do health care regulators also advise or require companies to test their employees to check understanding?

The answer is yes, and here are some examples.

The World Health Organization (WHO) states in its principles for good manufacturing practices for pharmaceutical products:

“Continuing training should also be given, and its practical effectiveness periodically assessed.”

WHO guidance also states:

“If training is conducted to achieve a goal, it is reasonable to ask if the goals of the organization’s training programme and the specific training course have been attained or not. Assessment and evaluation are conducted to determine if the goals have been met.”

The European Commission directive 2005/62/EC requires for organizations handling blood that

“Training programmes shall be in place and shall include good practice. The contents of training programmes shall be periodically assessed and the competence of personnel evaluated regularly.”

The US Department of Health and Human Services in its Compliance Program Guidance for Medicare Contractors states:

“Contractors should consider using tests or other mechanisms to determine the trainees’ comprehension of the training concepts presented.”

Also in the US, the Pharmacy Compounding Accreditation Board (PCAB) gives guidance that

“The pharmacy has SOPs for educating, training, and assessing the competencies of all compounding personnel on an ongoing basis, including documentation that compounding personnel is trained on SOPs.”

Just like in financial services, health care regulators strongly encourage and in some cases require that regulated organizations test their employees to ensure that they have understood training and that they are competent to do their jobs.

One thing health care regulators emphasize more than those overseeing financial services is the merit of giving observational assessments as well as knowledge tests, presumably because skills are often more practical. For example, PCAB guidance says that:

“Staff competency can be evaluated by a combination of … direct observation … written tests [and] … other quality control activities”

Previously, in this series on assessments in health care, I’ve covered good practice in competency testing in the health care industry and shared analysis of why errors are made and how testing can help. I hope these examples of regulator guidance and requirements are also useful.

Come to Barcelona for the European Users Conference November 10 – 12

Posted by Joan Phaup

Getting together with fellow users of Questionmark technologies is one of the best ways to learn best practices and discover new uses for online assessments. So we’re delighted to announce plans for the Questionmark 2013 European Users Conference in Barcelona November 10 – 12.

Mark your calendar now for this important learning event, and register as soon as you can. Here’s what you can expect during this gathering:

  • Real-world case studies by Questionmark users
  • Introductions to new solutions and features
  • Sessions explaining Questionmark features & functions
  • Presentations about testing and assessment best practices
  • Opportunities to influence future solutions
  • One-on-one meetings with Questionmark technicians
  • Plenty of time to network with your peers

The conference will take place at Hotel Fira Palace in the heart of Barcelona — between the famous Plaza de España and Gran Via Avenue and within easy reach of other parts of the city as well as the airport.

The call for presentation proposals is open until July 1st — so take this opportunity to share experiences and generate discussion among colleagues.

Early-bird registration discounts are available until June 30th, 2013, so sign up soon and start making your plans for Barcelona.
