Item Analysis for Beginners – When are very Easy or very Difficult Questions Useful?

Posted by John Kleeman

I’m running a session at the Questionmark user conference next month on Item Analysis for Beginners and thought I’d share the answer to an interesting question in this blog.

[Figure: Item analysis fragment showing a question with difficulty of 0.998 and discrimination of 0.034]

When you run an Item Analysis report, one of the useful statistics you get on a question is its “p-value” or “item difficulty”. This is a number from 0 to 1: the higher the value, the easier the question. An easy question might have a p-value of 0.9 to 1.0, meaning 90% to 100% of participants answer the question correctly. A difficult question might have a p-value of 0.0 to 0.25, meaning 25% or fewer of participants answer the question correctly. For example, the report fragment in the figure shows a question with a p-value of 0.998, which means it is very easy and almost everyone gets it right.
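To make the definition concrete, here is a minimal Python sketch of how a p-value could be calculated from scored responses. It assumes each question is scored 1 (correct) or 0 (incorrect); the function name `item_difficulty` and the sample data are made up for illustration and are not taken from any Questionmark report.

```python
def item_difficulty(responses):
    """Return the p-value of each question.

    `responses` has one row per participant; each row holds a 0/1 score
    for every question. The p-value is the fraction of participants
    who answered the question correctly.
    """
    n_participants = len(responses)
    n_items = len(responses[0])
    return [
        sum(row[i] for row in responses) / n_participants
        for i in range(n_items)
    ]


# Hypothetical scored data: 5 participants, 3 questions.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
]

p_values = item_difficulty(scores)
for number, p in enumerate(p_values, start=1):
    print(f"Question {number}: p-value = {p:.2f}")
```

In this made-up data set everyone answers the first question correctly, so it sits at the very easy end of the scale, while only one participant in five answers the third question correctly.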

Whether such questions are appropriate depends on the purpose of the assessment. Most participants will get difficult questions wrong and easy questions right. In general, very easy and very difficult questions are less helpful than other questions in discriminating between participants, which makes them less useful when the assessment is used for measurement purposes.

Here are three reasons why you might decide to include very difficult questions in an assessment:

  1. Sometimes your test blueprint requires questions on a topic and the only ones you have available are difficult ones – if so, you need to use them until you can write more.
  2. If a job has high performance needs and you need to filter out a few participants from many, then very difficult questions can be useful. This might apply for example if you are selecting potential astronauts or special forces team members.
  3. If you need to assess a wide range of ability within a single assessment, then you may need some very difficult questions to be able to assess abilities within the top performing participants.

And here are five reasons why you might decide to include very easy questions in an assessment:

  1. Answering questions gives retrieval practice and helps participants remember things in future – so including easy questions still helps reduce people’s forgetting.
  2. In compliance or health and safety, you may choose to include basic questions that almost everyone gets right. This is because if someone gets it wrong, you want to know and be able to intervene.
  3. More broadly, sometimes a test blueprint requires you to cover some topics that almost everyone knows, and about which it’s not practical to write difficult questions.
  4. Easy questions at the start of an assessment can build confidence and reduce test anxiety. See my blog post Ten tips on reducing test anxiety for online test-takers for other ways to deal with test anxiety.
  5. If the purpose of your assessment is to measure someone’s ability to process information quickly and accurately at speed, then including many low difficulty questions that need to be answered in a short time might be appropriate.

If you want to learn more about Item Analysis, search this blog for other articles. You might also find the Questionmark user conference useful: as well as my session on Item Analysis, there are many other useful sessions, including setting the cut-score in a fair, defensible way and identifying knowledge gaps. The conference also gives an opportunity to learn and network with other assessment practitioners – I look forward to seeing some of you there.

Should I include really easy or really hard questions on my assessments?


Posted by Greg Pope

I thought it might be fun to discuss something that many people have asked me about over the years: “Should I include really easy or really hard questions on my assessments?” It is difficult to provide a simple “Yes” or “No” answer because, as with so many things in testing, it depends! However, I can provide some food for thought that may help you when building your assessments.

We can define easy questions as those with high p-values (item difficulty statistics) such as 0.9 to 1.0 (90-100% of participants answer the question correctly). We can define hard questions as those with low p-values such as 0 to 0.15 (0-15% of participants answer the question correctly). These ranges are fairly arbitrary: some organizations in some contexts may consider greater than 0.8 easy and less than 0.25 difficult.
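Because the cut-offs are a matter of judgment, it can help to treat them as settings rather than fixed rules. The Python sketch below is one way to do that; the function name `classify_difficulty`, the label “moderate” and the decision to count a p-value exactly on a cut-off as easy or hard are my own assumptions for illustration.

```python
def classify_difficulty(p_value, easy_cutoff=0.9, hard_cutoff=0.15):
    """Label a question as 'easy', 'hard' or 'moderate' from its p-value.

    The default cut-offs follow the ranges in the text (0.9 and 0.15).
    Treating a value exactly on a cut-off as easy/hard is an assumption.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    if p_value >= easy_cutoff:
        return "easy"
    if p_value <= hard_cutoff:
        return "hard"
    return "moderate"  # hypothetical label for everything in between


# The same question can be labelled differently under different cut-offs.
print(classify_difficulty(0.85))
print(classify_difficulty(0.85, easy_cutoff=0.8, hard_cutoff=0.25))
```

With the default cut-offs a p-value of 0.85 falls in the middle band, but an organization using 0.8 as its threshold would call the same question easy.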

When considering how easy or difficult questions should be, start by asking, “What is the purpose of the assessment program and the assessments being developed?” If the purpose of an assessment is to provide a knowledge check and facilitate learning during a course, then maybe a short formative quiz would be appropriate. In this case, one can be fairly flexible in selecting questions to include on the quiz. Having some easier and harder questions is probably just fine. If the purpose of an assessment is to measure a participant’s ability to process information quickly and accurately under duress, then a speed test would likely be appropriate. In that case, a large number of low-difficulty questions should be included on the assessment.

However, in many common situations having very difficult or very easy questions on an assessment may not make a great deal of sense. As a criterion-referenced example, if the purpose of an assessment is to certify participants as knowledgeable and skilful enough to do a certain job competently (e.g., crane operation), the difficulty of questions would need careful scrutiny. The exam may have a cut score that participants need to achieve in order to be considered good enough (e.g., 60+%). Here are a few reasons why having many very easy or very hard questions on this type of assessment may not make sense:

Very easy items won’t contribute a great deal to the measurement of the construct

A very easy item that almost every participant gets right doesn’t tell us a great deal about what the participant knows and can do. A question like: “Cranes are big. Yes/No” doesn’t tell us a great deal about whether someone has the knowledge or skills to operate a crane. Very easy questions, in this context, are almost like “give-away” questions that contribute virtually nothing to the measurement of the construct. One would get almost the same measurement information (or lack thereof) from asking a question like “What is your shoe size?” because everyone (or mostly everyone) would get it correct.

Tricky to balance blueprint

Assessment construction generally requires following a blueprint that needs to be balanced in terms of question content, difficulty, and other factors. It is often very difficult to balance these blueprints for all factors, and using extreme questions makes this all the more challenging because there are generally more questions available that are of average rather than extreme difficulty.

Potentially not enough questions providing information near the cut score

In a criterion-referenced exam with a cut score of 60%, one would want the most measurement information in the exam near this cut score. What do I mean by this? Well, questions with p-values around 0.60 will provide the most information regarding whether participants just have the knowledge and skills to pass or just don’t have the knowledge and skills to pass. This topic requires a more detailed look at assessment development techniques that I will elaborate on soon in an upcoming blog post!

Effect of question difficulty on question discrimination

The difficulty of a question affects its discrimination (item-total correlation) statistic. Extremely easy or extremely hard questions have a harder time obtaining the high discrimination statistics that we look for. In the graph below, I show the relationship between question difficulty p-values and item-total correlation discrimination statistics. Notice that the questions (the little diamonds) that have very low and very high p-values also have very low discrimination statistics and those around 0.5 have the highest discrimination statistics.
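For readers who want to try this on their own data, here is a rough Python sketch of an item-total correlation. It assumes 0/1 scoring and uses the simple (uncorrected) version, where the question itself is included in the total score; reporting tools may calculate discrimination differently. The function names and the sample data are hypothetical. The sketch also shows one reason behind the pattern described above: a 0/1 question with p-value p has variance p × (1 − p), which is largest at p = 0.5 and shrinks towards zero as p approaches 0 or 1, leaving very little variation to correlate with the total score.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists, or None if undefined."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # No variation (e.g. everyone answered correctly): undefined.
        return None
    return cov / sqrt(var_x * var_y)


def item_total_correlations(responses):
    """Uncorrected item-total correlation for each question (0/1 scores)."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [
        pearson([row[i] for row in responses], totals)
        for i in range(n_items)
    ]


def item_variance(p_value):
    """Variance of a 0/1 question with the given p-value: p * (1 - p)."""
    return p_value * (1 - p_value)


# Hypothetical scored data: 5 participants, 3 questions.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 0],
]

discrimination = item_total_correlations(scores)
print(discrimination)
print(item_variance(0.5), item_variance(0.998))
```

In this made-up data the first question is answered correctly by everyone, so its item-total correlation cannot be calculated at all, which is the extreme case of a very easy question telling us nothing about differences between participants.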