How much do you know about defensible assessments?

Posted by Julie Delazyn

This quiz is a re-post from a very popular blog entry published by John Kleeman.

Readers told us that it was instructive and engaging to take quizzes on using assessments, and we like to listen to you! So here is the second quiz in our series of quizzes on assessment topics. This one was authored in conjunction with Neil Bachelor of Pure Questions. You can see the first quiz on Cut Scores here.

As always, we regard resources like this quiz as a way of contributing to the ongoing process of learning about assessment. In that spirit, please enjoy the quiz below and feel free to comment if you have any suggestions to improve the questions or the feedback.

Is a longer test likely to be more defensible than a shorter one? Take the quiz and find out. Be sure to look for your feedback after you have completed it!

NOTE: Some people commented on the first quiz that they were surprised to lose marks for getting questions wrong. This quiz uses True/False questions and it is easy to guess at answers, so we’ve set it to subtract a point for each question you get wrong, to illustrate that this is possible. Negative scoring like this encourages you to answer “Don’t Know” rather than guess; this is particularly helpful in diagnostic tests where you want participants to be as honest as possible about what they do or don’t think they know.
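The negative scoring scheme described above can be sketched in a few lines. This is a minimal illustration of the idea (right answer gains a point, wrong answer loses a point, "Don't Know" scores zero), not Questionmark's actual scoring logic:

```python
# Sketch of negative scoring for True/False items: +1 for correct, -1 for
# wrong, 0 for "Don't Know". Illustrative only.

def score_response(correct):
    """correct is True (right), False (wrong), or None ("Don't Know")."""
    if correct is None:
        return 0
    return 1 if correct else -1

# A pure guesser on True/False items expects to gain nothing on average:
# 0.5 * (+1) + 0.5 * (-1) = 0, so guessing is discouraged.
responses = [True, False, None, True]   # 2 correct, 1 wrong, 1 Don't Know
total = sum(score_response(r) for r in responses)
print(total)  # 2 - 1 + 0 = 1
```

Because a random guess has an expected value of zero under this scheme, participants who are unsure do no worse by choosing "Don't Know" than by guessing, which is exactly the honesty incentive described above.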

Standard Setting – How Much Does the Ox Weigh?

Posted by Austin Fossey

At the Questionmark 2013 Users Conference, I had an enjoyable debate with one of our clients about the merits and pitfalls underlying the assumptions of standard setting.

We tend to use methods like Angoff or the Bookmark Method to set standards for high-stakes assessments, and we treat the resulting cut scores as fact, but how can we be sure that the results of the standard setting reflect reality?

In his book, The Wisdom of Crowds, James Surowiecki recounts a story about Sir Francis Galton visiting a fair in 1906. Galton observed a game where people could guess the weight of an ox, and whoever was closest would win a prize.

Because guessing the weight of an ox was considered to be a lot of fun in 1906, hundreds of people lined up and wrote down their best guess. Galton got his hands on their written responses and took them home. He found that while no one guess was exactly right, the crowd’s mean guess was pretty darn good: only one pound off from the true weight of the ox.

We cannot expect any individual’s recommended cut score in a standard setting session to be spot on, but if we select a representative sample of experts and provide them with relevant information about the construct and impact data, we have a good basis for suggesting that their aggregated ratings are a faithful representation of the true cut score.
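The aggregation at the heart of both Galton's story and a standard setting panel is simply the mean of individual judgments. A toy sketch, with invented numbers standing in for the crowd's guesses:

```python
# Toy illustration of aggregating individual judgments, in the spirit of
# Galton's ox-weighing crowd: no single guess need be right, but the mean
# tends to be close. The guesses below are invented for the example.
from statistics import mean, stdev

guesses = [1150, 1245, 1080, 1310, 1190, 1225, 1160]  # pounds, hypothetical

print(round(mean(guesses)))   # the crowd's aggregate estimate
print(round(stdev(guesses)))  # spread of the individual judgments
```

The same logic applies to a panel of standard setters: individual cut score recommendations scatter, but with a representative, well-informed panel the aggregate is a defensible estimate of the true cut score.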

This is the nature of educational measurement—our certainty about our inferences depends on the amount and quality of the data we have. Just as we infer something about a student’s true abilities based on their responses to carefully selected items on a test, we have to infer something about the true cut score based on our subject matter experts’ responses to carefully constructed dialogues in the standard setting process.

We can also verify cut scores through validity studies, thus strengthening the case for our stakeholders. So take heart—your standard setters as a group have a pretty good estimate on the weight of that ox.

Webinar: Using the Angoff method to set cut scores

Posted by Joan Phaup

How do you set appropriate pass/fail scores for competency tests?

We learned a lot about this during this year’s Questionmark Users Conference from two customers who have used the Angoff method for setting cut scores and think it’s a practical answer to this question.

Alan H. Wheaton and James R. Parry, who are involved respectively in curriculum management and test development for a large government agency, regard the Angoff method as a systematic, effective approach to establishing pass/fail scores for advancement tests. They will share their experiences and lessons learned during a Questionmark Customers Online webinar at 1 p.m. Eastern Time on Thursday, May 31.

Click here to sign up for Using the Angoff Method to Set Cut Scores, and plan to join us for an hour at the end of this month. The webinar will explain a five-step process for implementing the Angoff method as a way to improve the defensibility of your tests.

How much do you know about assessment? Quiz 4: Trialling questions

Posted by John Kleeman

Here is the fourth of our series of quizzes on assessment subjects, authored in conjunction with Neil Bachelor of Pure Questions. You can see the first quiz, on Cut Scores here, the second, on Validity and Defensibility here, and the third on use of formative quizzes here.

This week’s quiz is on trialling (or piloting) questions, the important process of checking the questions prior to using them in production.

We regard resources like this quiz as a way of contributing to the ongoing process of learning about assessment. In that spirit, please enjoy the quiz below and feel free to comment if you have any suggestions to improve the questions.


Questionmark software is very effective for beta testing and trialling questions. You can easily construct a trial assessment containing selected questions, schedule specific people to take it, and change delivery settings (e.g., requiring or not requiring a secure browser, or requiring or not requiring monitoring) to suit the need. Beta questions can also be included within production exams by setting them as “Experimental”, which means that they gather statistics but do not count toward the participant’s score.
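The key behaviour of experimental items—recorded for statistics, excluded from the score—can be sketched as follows. The field names are illustrative, not Questionmark's actual data model:

```python
# Sketch of scoring an exam that mixes scored and "experimental" (beta)
# items: beta items still record response data for item statistics, but
# contribute nothing to the participant's score. Field names are invented.

def score_exam(responses):
    """responses: list of dicts with 'correct' and 'experimental' flags.

    Returns (score, max_score) over the non-experimental items only.
    """
    scored = [r for r in responses if not r["experimental"]]
    score = sum(1 for r in scored if r["correct"])
    return score, len(scored)

responses = [
    {"correct": True,  "experimental": False},
    {"correct": False, "experimental": False},
    {"correct": True,  "experimental": True},   # beta item: stats only
]
print(score_exam(responses))  # (1, 2) — the beta item is excluded
```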

How much do you know about assessment? Quiz 1: Cut Scores

Posted by John Kleeman

Taking quizzes is a fun way to learn. One of the Questionmark blog’s most popular-ever entries was Howard Eisenberg’s Take our Quiz on Writing Good Test Questions, back in 2009.

Readers told us that it was instructive and engaging to take quizzes on using assessments, and we like to listen to you! So here is the first of a new series of quizzes on assessment topics. This week’s quiz is on setting a cut score (pass score).  The questions, written for us by Neil Bachelor of Pure Questions, are about what to do when designing a diagnostic test for safety procedures.

We regard resources like this quiz as a way of contributing to the ongoing process of learning about assessment. In that spirit, please enjoy the quiz below and feel free to comment if you have any suggestions to improve the questions.

Now on to the quiz! Be sure to look for your feedback after you have completed it!

Standard Setting: An Introduction


Posted by Greg Pope

Standard setting was a topic of considerable interest to attendees at the Questionmark 2010 Users Conference in March. We had some great discussions about standard setting methods and practical applications in some of the sessions I was leading, so I thought I would share some details about this topic here.

Standard setting is generally used in summative, criterion-referenced contexts. It is the process of setting a “pass/fail” score that distinguishes participants who have the minimum acceptable level of competence in an area from those who do not. For example, in a crane operation certification course, participants would be expected to have a certain level of knowledge and skills to operate a crane successfully and safely. In addition to a practical test (e.g., operating a crane in a safe environment), candidates may also be required to take a crane certification exam on which they must achieve a certain minimum score in order to be allowed to operate a crane. If the pass score on that exam is 75%, anything below 75% would mean taking the course again. Cut scores do not only refer to pass/fail benchmarks: organizations may have several cut scores within an assessment that differentiate between “Advanced”, “Acceptable”, and “Failed” levels.
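Classifying a participant against multiple cut scores, as in the “Advanced” / “Acceptable” / “Failed” example above, is a simple threshold lookup. The thresholds below are invented for illustration:

```python
# Minimal sketch of applying multiple cut scores to classify participants.
# The 90% and 75% thresholds are hypothetical, chosen for the example.

def classify(percent_score, cuts=((90, "Advanced"), (75, "Acceptable"))):
    """cuts must be ordered from highest threshold to lowest."""
    for threshold, label in cuts:
        if percent_score >= threshold:
            return label
    return "Failed"

print(classify(92))  # Advanced
print(classify(80))  # Acceptable
print(classify(60))  # Failed
```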

Cut scores are very common in high and medium-stakes assessment programs; well established processes for setting these cut scores and maintaining them across administrations are available. Generally, one would first build/develop the assessment with the cut score in mind. This would entail selecting questions that represent the proportionate topics areas being covered, ensuring an appropriate distribution of difficulty of the questions, and selecting more questions in the cut score range to maximize the “measurement information” near the cut score.

Once a test form is built it would undergo formal standard setting procedures to set or confirm the cut score(s). Here is a general overview of a typical Modified Angoff type standard setting process:

[Figure: a typical Modified Angoff standard setting process]
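The core computation in a Modified Angoff rating round can be sketched briefly: each subject matter expert estimates, per item, the probability that a minimally competent candidate answers correctly, and the cut score is the sum of the mean ratings across items. The ratings below are invented for illustration:

```python
# Sketch of the core Modified Angoff computation. Each expert rates the
# probability that a borderline (minimally competent) candidate answers
# each item correctly; the cut score is the borderline candidate's
# expected raw score. Ratings are hypothetical.
from statistics import mean

# ratings[expert][item] = estimated probability of a correct answer
ratings = [
    [0.70, 0.85, 0.60, 0.90],  # expert 1
    [0.65, 0.80, 0.55, 0.95],  # expert 2
    [0.75, 0.90, 0.50, 0.85],  # expert 3
]

item_means = [mean(col) for col in zip(*ratings)]  # average across experts
cut_score = sum(item_means)  # expected raw score of a borderline candidate

print(round(cut_score, 2))                          # raw cut score
print(round(100 * cut_score / len(item_means), 1))  # as a percentage
```

In a real session this computation is usually followed by discussion, a second rating round, and a review of impact data before the cut score is finalized.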

Stay tuned for my next post on this topic, in which I will describe some standard setting methods for establishing cut scores.