Item Analysis Analytics Part 4: The Nitty-Gritty of Item Analysis

Posted by Greg Pope

In my previous blog post I highlighted some of the essential things to look for in a typical Item Analysis Report. Now I will dive into the nitty-gritty of item analysis, looking at example questions and explaining how to use the Questionmark Item Analysis Report in an applied context for a State Capitals Exam.

The Questionmark Item Analysis Report first produces an overview of question performance both in terms of the difficulty of questions and in terms of the discrimination of questions (upper minus lower groups). These overview charts give you a “bird’s eye view” of how the questions composing an assessment perform. In the example below we see that we have a range of questions in terms of their difficulty (“Item Difficulty Level Histogram”), with some harder questions (the bars on the left), most average-difficulty questions (bars in the middle), and some easier questions (the bars on the right). In terms of discrimination (“Discrimination Indices Histogram”) we see that we have many questions that have high discrimination as evidenced by the bars being pushed up to the right (more questions on the assessment have higher discrimination statistics).

Overall, if I were building a typical criterion-referenced assessment with a pass score around 50% I would be quite happy with this picture. We have more questions functioning at the pass score point with a range of questions surrounding it and lots of highly discriminating questions. We do have one rogue question on the far left with a very low discrimination index, which we need to look at.

The next step is to drill down into each question to ensure that each question performs as it should. Let’s look at two questions from this assessment, one question that performs well and one question that does not perform so well.

The question below is an example of a question that performs nicely. Here are some reasons why:

• Going from left to right, first we see that the “Number of Results” is 175, which is a nice sample of participants to evaluate the psychometric performance of this question.
• Next we see thateveryone answered the question (“Number not Answered” = 0), which means there probably wasn’t a problem with people not finishing or finding the questions confusing and giving up.
• The “P Value Proportion Correct” shows us that this question is just above the pass score where 61% of participants ‘got it right.’ Nothing wrong with that: the question is neither too easy nor too hard.
• The “Item Discrimination” indicates good discrimination, with the difference between the upper and lower group in terms of the proportion selecting the correct answer of ‘Salem’ at 48%. This means that of the participants with high overall exam scores, 88% selected the correct answer versus only 40% of the participants with the lowest overall exam scores. This is a nice, expected pattern.
• The “Item Total Correlation” backs the Item Discrimination up with a strong value of 0.40. This means that of all participants who answered the questions, the pattern of high scorers getting the question right more than low scorers holds true.
• Finally we look at the Outcome information to see how the distracters perform. We find that each distracter pulled some participants, with ‘Portland’ pulling the most participants, especially from the “Lower Group.” This pattern makes sense because those with poor state capital knowledge may make the common mistake of selecting Portland as the capital of Oregon.

The psychometricians, SMEs, and test developers reviewing this question all have smiles on their faces when they see the item analysis for this item.

Next we look at that rogue question that does not perform so well in terms of discrimination-–the one we saw in the Discrimination Indices Histogram. When we look into the question we understand why it was flagged:

• Going from left to right, first we see that the “Number of Results” is 175, which is again a nice sample size: nothing wrong here.
• Next we see everyone answered the question, which is good.
• The first red flag comes from the “P Value Proportion Correct” as this question is quite difficult (only 35% of participants selected the correct answer). This is not in and of itself a bad thing so we can keep this in memory as we move on,
• The “Item Discrimination” indicates a major problem, a negative discrimination value. This means that participants with the lowest exam scores selected the correct answer more than participants with the highest exam scores. This is not the expected pattern we are looking for: Houston, this question has a problem!
• The “Item Total Correlation” backs up the Item Discrimination with a high negative value.
• To find out more about what is going on we delve into the Outcome information area to see how the distracters perform. We find that the keyed-correct answer of Nampa is not showing the expected pattern of upper minus lower proportions. We do, however, find that the distracter “Boise” is showing the expected pattern of the Upper Group (86%) selecting this response option much more than the Lower Group (15%). Wait a second…I think I know what is wrong with this one, it has been mis-keyed! Someone accidently assigned a score of 1 to Nampa rather than Boise.

No problem: the administrator pulls the data into the Results Management System (RMS), changes the keyed correct answer to Boise, and presto, we now have defensible statistics that we can work with for this question.

The psychometricians, SMEs, and test developers reviewing this question had a frown on their faces at first but those frowns were turned upside down when they realized it is just a simple mis-keyed question.

In my next blog post I would like share some observations on the relationship between Outcome Discrimination and Outcome Correlation.

Are you ready for some light relief after pondering all these statistics? Then have some fun with our own State Capitals Quiz.