Is a compliance test better with a higher pass score?

Posted by John Kleeman

Is a test better if it has a higher pass (or cut) score?

For example, if you develop a test to check that people know material for regulatory compliance purposes, is it better if the pass score is 60%, 70%, 80% or 90%? And is your organization safer if your test has a high pass score?

To answer this question, you first need to know the purpose of the test – how the results will be used and what inferences you want to make from it. Most compliance tests are criterion-referenced – that is to say, they measure specific skills, knowledge or competency. Someone who passes the test is judged competent for the job role; someone who fails has not demonstrated competence and might need remedial training.

Before considering a pass score, you need to consider whether questions are substitutable, i.e. whether a participant can get certain questions wrong and others right and still be competent. It could be that getting particular questions wrong implies lack of competence, even if everything else is answered correctly. (For another way of looking at this, see Conceptual Assessment Framework: Building the student model.) If a participant performs well on many items but gets a crucial safety question wrong, they still fail the test. See Golden Topics – Making success on key topics essential for passing a test for one way of creating tests that work like this in Questionmark.
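
As a rough sketch of that kind of rule (not how Questionmark's Golden Topics feature is implemented – the question names, cut score and data below are invented for illustration), a pass decision with non-substitutable questions combines an overall cut score with a requirement that every key question is answered correctly:

```python
# Illustrative pass/fail rule when some questions are not substitutable:
# the participant must reach the overall cut score AND answer every
# "golden" (must-pass) question correctly. All data below is invented.
CUT_SCORE_PERCENT = 70
GOLDEN_QUESTIONS = {"Q_safety_1", "Q_safety_2"}

def passes(percent_score: float, correct_questions: set) -> bool:
    """True only if both the overall score and the key questions are satisfied."""
    return (percent_score >= CUT_SCORE_PERCENT
            and GOLDEN_QUESTIONS <= correct_questions)

print(passes(92, {"Q_safety_1", "Q3", "Q4"}))          # False – missed a golden question
print(passes(75, {"Q_safety_1", "Q_safety_2", "Q3"}))  # True
```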

But assuming questions are substitutable and that a single pass score for a test is viable, how do you work out what that pass score should be? The table below shows 4 possible outcomes:

                            Pass test              Fail test
Participant competent       Correct decision       Error of rejection
Participant not competent   Error of acceptance    Correct decision

Providing that the test is valid and reliable, a competent participant should pass the test and a not-competent one should fail it.

Clearly, picking a pass score as a number “out of a hat” is not the right way to approach this. For a criterion-referenced test, you need to match the pass score to the way your questions measure competence. If the pass score is too high, you increase the number of errors of rejection: competent people are rejected, and you waste time re-training them and having them re-take the test. If the pass score is too low, you get too many errors of acceptance: not-competent people are accepted, with potential consequences for how they do the job.

You need to use informed judgement or statistical techniques to choose a pass score that supports valid inferences about participants’ skills, knowledge or competence in the vast majority of cases. This means the number of errors or misclassifications is tolerable for the intended use case. One technique for doing this is the Angoff method, as described in this SlideShare. Using Angoff, you rate each question by how likely it is that a minimally-competent participant would get it right, and then roll these ratings up to work out the pass score.
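
As a rough sketch of that roll-up (the questions and judge ratings below are invented for illustration, and this is not a Questionmark feature), you can average the judges' ratings for each question and then sum them to get a recommended cut score:

```python
# Minimal Angoff roll-up sketch. Each rating is a judge's estimate of the
# probability that a minimally-competent participant answers the item
# correctly. The ratings below are invented.
angoff_ratings = {
    "Q1": [0.90, 0.85, 0.95],   # easy safety question
    "Q2": [0.60, 0.55, 0.65],
    "Q3": [0.75, 0.80, 0.70],
    "Q4": [0.50, 0.45, 0.55],   # hard scenario question
}

# Average the judges' ratings for each item...
item_estimates = {q: sum(r) / len(r) for q, r in angoff_ratings.items()}

# ...then sum them and convert to a percentage to get the recommended cut score.
expected_raw_score = sum(item_estimates.values())            # points out of 4
cut_score_percent = 100 * expected_raw_score / len(item_estimates)

print(f"Recommended pass score: {cut_score_percent:.0f}%")   # ~69% in this example
```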

Going back to the original question of whether a better test has a higher pass score, what matters is that your test is valid and reliable and that your pass score is set to the appropriate level to measure competency. You want the right pass score, not necessarily the highest pass score.

So what happens if you set your pass score without going through this process – for instance, if you decide that your test will have an 80% pass score before you design it? If you do this, you are assuming that, on average, the questions in the test each have an 80% chance of being answered correctly by a minimally-competent participant. Unless you have ways of measuring and checking that, you are abandoning logic and trusting to luck.

In general, a lower pass score does not necessarily imply an easier assessment. If the items are very difficult, a low pass score may still yield low pass rates. Pass scores are often set with a consideration for the difficulty of the items, either implicitly or explicitly.

So, is a test better if it has a higher pass score?

The answer is no. A test is best if it has the right pass score. And if one organization has a compliance test where the pass score is 70% and another has a compliance test where the pass score is 80%, this tells you nothing about how good each test is. You need to ask whether the tests are valid and reliable and how the pass scores were determined. There is an issue of “face validity” here: people might find it hard to believe that a test with a very low pass score is fair and reasonable, but in general a higher pass score does not make a better test.

If you want to learn more about setting a pass score, search this blog for articles on “standard setting” or “cut score” or read the excellent book Criterion-Referenced Test Development, by Sharon Shrock and Bill Coscarelli. We’ll also be talking about this and other best practices at our upcoming Users Conferences in Barcelona November 10-12 and San Antonio, Texas, March 4 – 7.

Boot Camp for Beginners set for March 4, 2014

Posted by Joan Phaup

New customers who attended past Questionmark Users Conferences often told us that some hands-on instruction before the start of the conference would help them get a lot more out of the proceedings.

Enter Questionmark Boot Camp: Basic Training for Beginners – where people learn the basics before they join the throng at the conference. This full-day workshop has become a popular pre-conference option, and we’re bringing it back on Tuesday, March 4, 2014 – right before the Questionmark 2014 Users Conference in San Antonio.

I don’t get to attend Boot Camp, but I did spend a few minutes talking about it with our trainer, Rick Ault, who has as much fun there as his pupils:


What happens at Boot Camp?
We talk about all the different Questionmark tools — how they are used together — and give people a solid understanding of the process and what they can do to build meaningful assessments. It’s hands-on. We ask people to bring their own laptops so that we can give them actual practice using the software. They have the chance to use Questionmark Live to build questions and assessments, then put that content onto a server to see it work.

Who should attend?
Any new users of Questionmark would benefit from it, because it’s designed to give people an understanding of what the product does and how it works.

What should they bring?
They should bring their laptops, plus some ideas for how they might like to use Questionmark. They should also bring ideas for some fun content they might want to create.

How does Boot Camp prepare new customers for the Users Conference?
It gives them exposure to all of the tools, and it helps them understand the process. By getting some hands-on experience with our technologies, they will be able to make better choices about what conference tracks and sessions to attend. They’ll also be able to think of meaningful questions to ask at the conference.

What do YOU like best about Boot Camp?
I like meeting new customers, and I like seeing their happiness when they create something. It’s great to see the birth of their new content as they join the Questionmark family!

Newcomers to Questionmark can join the family in style by attending Boot Camp. You can sign up when you register for the conference.

EATP Keynotes: insights on instructional technologies, learning and assessment

Posted by Jim Farrell

The best part of being a product manager is visiting customers. It is inspiring to see your product solving real-world business problems and contributing to the success of many organizations.

I recently got the opportunity to visit some of our customers while attending the European Association of Test Publishers (E-ATP) conference. I have attended this conference for many years in the US (ATP), but this was the first time attending in Europe. Both conferences bring together thought leaders and display real-world examples of how assessment programs benefit organizations — from formative assessment and quizzing to life-and-limb certification and compliance testing.

The highlights of the conference for me this year were the two keynotes, as I felt they were perfect bookends to the conference program (which included many presentations by Questionmark customers and team members).

The first was by Steve Wheeler (@timbuckteeth), an Associate Professor of Learning Technologies at Plymouth University in South West England. He painted a picture of where we are today with the use of instructional technologies. I have always said it is not the technology but the teaching methods that need to change to improve test scores. The threat of tests does not improve learning: good pedagogy improves learning. Steve compared teaching today to sitting on an airplane — everyone sitting in rows, facing forward, waiting for something to happen. He promoted the use of ipsative assessment (which our chairman recently wrote about) and trans-literacy, which is showing knowledge across many different types of media. The theme that carried through his keynote was feedback. Feedback is vital to learning but often not included in assessment.

The closing keynote was much more application and less blue sky. The leaders of the session were Sue Stanhope and David Rippon from the Sunderland City Council in the UK. They told the story of an economy going through dramatic change, with job losses throughout the region. By using assessments and job matching, they were able to identify people’s strengths and put them into jobs that inspired them. The message is clear: assessment is not just for seeing what you know. It can be used to guide learning and careers, too. The success stories left everyone excited to take what they learned out into the world of learning, achievement, competency and performance.

Conferences really are a great place to share ideas, knowledge and innovation. I look forward to meeting with Questionmark customers either at the European Questionmark Users Conference in Barcelona, Spain, November 10-12 or at the Questionmark 2014 Users Conference in San Antonio, Texas, March 4 – 7.

Conceptual Assessment Framework: Building the Evidence Model

Posted by Austin Fossey

In my previous posts, I introduced the student model and the task model—two of the three sections of the Conceptual Assessment Framework (CAF) in Evidence-Centered Design (ECD).

The student and task models are linked by the evidence model. The evidence model has two components: the evaluation / evidence identification component and the measurement model / evidence accumulation component (e.g., Design and Discovery in Educational Assessment: Evidence-Centered Design, Psychometrics, and Educational Data Mining; Mislevy, Behrens, Dicerbo, & Levy, 2012).

The evaluation component defines how we identify and collect evidence in the responses or work products produced by the participant in the context of the task model. For the evaluation component, we must ask ourselves what we are looking for as evidence of the participant’s ability, and how we will store that evidence.

In a multiple choice item, the evaluation component is simply whether or not the participant selected the item key, but evidence identification can be more complex. Consider drag-and-drop items where you may need to track the options the participant chose as well as their order. In hot spot items, the evaluation component consists of capturing the coordinates of the participant’s selection in relation to a set of item key coordinates.

Some simulation assessments will collect information about the context of the participant’s response (i.e., was it the correct response given the state of the simulation at that moment?), and others consider aspects of the participant’s response patterns, such as sequence and efficiency (i.e., what order did the participant perform the response steps, and were there any extraneous steps?).
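
Going back to the hot spot example above, a minimal sketch of that evaluation component might look like the following; the key coordinates, tolerance radius and function name are assumptions made up for illustration, not Questionmark’s internal representation:

```python
import math

# Hypothetical evidence-identification rule for a hot spot item:
# the work product is the (x, y) coordinate of the participant's click,
# and the evidence is whether it lands within a radius of the key point.
KEY_POINT = (240, 130)      # invented key coordinates (pixels)
TOLERANCE = 25              # invented acceptance radius (pixels)

def identify_evidence(click_x: float, click_y: float) -> dict:
    """Turn a raw work product (a click) into stored evidence."""
    distance = math.dist((click_x, click_y), KEY_POINT)
    return {
        "click": (click_x, click_y),                  # raw observation, kept for audit
        "within_key_region": distance <= TOLERANCE,   # the evidence we score on
    }

print(identify_evidence(250, 120))   # {'click': (250, 120), 'within_key_region': True}
```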

In the measurement model component, we define how evidence is scored and how those scores are aggregated into measures that can be used in the student model.

In a multiple choice assessment using Classical Test Theory (CTT), the measurement model may be simple: if the participant selects the item key, we award one point, then create an overall score measure by adding up the points. Partial credit scoring is another option for a measurement model. Raw scores may be transformed into a percentage score, which is the aggregation method used for many assessments built with Questionmark. Questionmark also provides a Scoring Tool for external measurement models, such as rubric scoring of essay items.
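
A minimal sketch of that simple CTT measurement model (the item keys and responses below are invented), showing dichotomous scoring rolled up into a raw score and a percentage:

```python
# Simple CTT measurement model: 1 point per keyed response, aggregated
# into a raw score and a percentage. The keys and responses are invented.
item_keys = {"Q1": "B", "Q2": "D", "Q3": "A", "Q4": "C"}
responses = {"Q1": "B", "Q2": "D", "Q3": "C", "Q4": "C"}

item_scores = {q: int(responses.get(q) == key) for q, key in item_keys.items()}
raw_score   = sum(item_scores.values())                  # evidence accumulation
percent     = 100 * raw_score / len(item_keys)           # transformed measure

print(item_scores)                                       # {'Q1': 1, 'Q2': 1, 'Q3': 0, 'Q4': 1}
print(f"{raw_score}/{len(item_keys)} = {percent:.0f}%")  # 3/4 = 75%
```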

Measurement models can also be more complex depending on the assessment design. Item Response Theory (IRT) is another commonly used measurement model that provides probabilistic estimates of participants’ abilities based on each participant’s response pattern and the difficulty and discrimination of the items. Some simulation assessments also use logical scoring trees, regression models, Bayes Nets, network analyses or a combination of these methods to score work products and aggregate results.
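
As a hedged illustration of the item-level building block of such a model, here is the two-parameter logistic (2PL) item response function, which gives the probability that a participant of a given ability answers an item correctly; the parameter values are invented:

```python
import math

def p_correct_2pl(theta: float, difficulty: float, discrimination: float) -> float:
    """2PL IRT item response function: P(correct | ability theta)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Invented item parameters: a moderately hard, fairly discriminating item.
print(round(p_correct_2pl(theta=0.0, difficulty=0.5, discrimination=1.2), 2))  # ~0.35
```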


Example of a simple evidence model structure showing the relationships between evidence identification and accumulation.

Catch up with this growing video library

Posted by Joan Phaup

Since we revamped the Questionmark Learning Cafe last year, I’m amazed to see the number of instructional videos now on view there.

Just looking at videos about Questionmark Live browser-based authoring could keep you busy for a while. Here are some of the Questionmark Live tutorials posted there now.

  • Sharing topics
  • Authoring a hot spot question
  • Using numeric questions
  • Adding multimedia to an item
  • Creating surveys
  • Versioning and revisions
  • Embedding videos
  • Creating pull-down questions
  • Adding audio

You will find many other videos about authoring, delivery, reporting and integrations as well as webinars and best practices presentations. If you haven’t visited the Learning Cafe lately, take a look. You might have some catching up to do.

11 Tips to help prevent cheating and ensure test security

Posted by Julie Delazyn

With the summer behind us, it’s officially fall, and that means schools, colleges and universities have launched into a new academic year.

In this time of tests and exams, the security of test results is crucial to the validity of test scores. Today, I’d like to introduce 11 tips to help prevent cheating and ensure assessment security.

1. Screening tests — A small pre-screening can be administered to prevent people from taking an assessment for which they are not yet prepared.

2. Candidate agreements — Candidate agreements or examination honor codes are codes of conduct that a participant must agree to before they start an assessment. They are generally phrased in a personal manner; the participant agrees to the exam’s code of conduct by clicking an “OK” or “Yes” button.

3. Limiting content exposure/leakage — In order to limit the amount of question content being shown to a participant at any given time, consider using question-by-question templates. These present questions one at a time to participants so that exam content is not completely exposed on screen.

4. Screening participants who achieve perfect scores — Many organizations automatically investigate participants who achieve perfect scores on an assessment. Perfect scores are rare and could indicate that a test-taker had access to the answer key. The Questionmark Score List Report provides a fast and easy way to identify participants who obtain 100% on their assessments. An organization can then investigate these participants to ensure that no suspicious behavior has occurred.

5. Verifying expected IP addresses — If assessments are to be taken from a specific location, often the IP address of the computer in that location will be known. Verifying expected IP addresses is a useful way to screen whether participants somehow took an assessment from an unauthorized location.

6. Reviewing time to finish information — The overall time it takes for a participant to complete an assessment can be a useful way to screen for suspicious behavior. If a participant takes a very short time to complete an assessment yet achieves a high score, this could be an indication that they cheated in some way (see the short screening sketch after this list).

7. Using Trojan horse or stealth items — Trojan horse or stealth items can be used to help detect whether a participant has memorized the answer key. Stealth items are inserted into an assessment and look just like the other questions, but they are purposely keyed incorrectly and one of the distracters is marked as the correct answer.

8. Post information that cheating-prevention tactics are used — Inform participants that cheater-detection tactics are regularly employed. This can help to deter low-motivation cheaters.

9. Proper seating arrangements for participants — Implementing a seating plan where participants are equally spaced, with limited ability to see another participant’s screen/paper, is an important strategy.

10. Using unique make-up exams — When offering a make-up exam, make sure to administer it in the same strict proctored environment as the scheduled exam. Also, having another test form available specifically for make-up exams can lessen the risks of cheating and exposure for the actual large-scale exam.

11. Using more constructed response questions — Constructed response questions, like essay or short answer questions, provide less opportunity for participants to cheat because they require them to produce unique answers to questions.
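
As a simple illustration of the time-to-finish screen in tip 6 (the result records and thresholds below are invented, not the output of a Questionmark report), you could flag for follow-up any participant who combines an unusually high score with an unusually short completion time:

```python
# Invented results: (participant, score percent, minutes to finish).
results = [
    ("p001", 95, 12),
    ("p002", 62, 40),
    ("p003", 98, 55),
]

# Invented screening thresholds: a very high score achieved very quickly
# is worth a closer look, not proof of cheating.
MIN_EXPECTED_MINUTES = 20
HIGH_SCORE = 90

flagged = [p for p, score, minutes in results
           if score >= HIGH_SCORE and minutes < MIN_EXPECTED_MINUTES]

print(flagged)   # ['p001']
```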

If you’d like more details about these and other tips on ensuring the security and defensibility of your assessments, you can download our white paper, “Delivering Assessments Safely and Securely.”
