Posted by Greg Pope

In my previous blog post I talked about outcome discrimination and outcome correlation and their relationship to one another. Now I will provide some criteria that can be used for outcome discrimination and outcome correlation coefficients to judge whether a question is making the grade in terms of psychometric quality.

Outcome discrimination (high-low)

Outcome correlation (Point-biserial correlation)

I’ll be back with more juicy psychometrics soon!

Posted August 12, 2009 by Writers - Guest Bloggers

Posted by Greg Pope

In my previous blog post I dived into some details of item analysis, looking at example questions and how to use the Questionmark Perception Item Analysis Report in an applied context. I thought it might be useful in this post to talk about outcome discrimination and outcome correlation, as people sometimes ask me how they differ, how they are alike, and when to use one or the other. The fact of the matter is that you can use either one; it often comes down to preference, as both yield quite similar results.

Outcome discrimination is the proportion of the top 27% of participants (ranked by assessment score) who selected a response option minus the proportion of the bottom 27% of participants (ranked by assessment score) who selected that same response option. It is calculated for each response option to the question. What you would expect is that participants with the highest assessment scores should select the correct response option more often than participants with the lowest assessment scores. Similarly, participants with the highest assessment scores should select the incorrect distracters less often than participants with the lowest assessment scores.
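To make the definition concrete, here is a minimal Python sketch of the upper-minus-lower calculation for a single response option. The function name, the rounding of the 27% group size, and the way tied scores are ordered are all assumptions made for illustration; they are not taken from the Questionmark report.

```python
def outcome_discrimination(selected, scores, fraction=0.27):
    """Upper-minus-lower proportion of participants selecting one option.

    selected: list of bools, True if the participant chose the option.
    scores:   list of assessment scores, in the same order as `selected`.
    fraction: share of participants in each group (0.27 in the post).
    """
    if len(selected) != len(scores) or not scores:
        raise ValueError("selected and scores must be non-empty and equal length")
    # Assumption: group size is 27% of participants, rounded, at least 1.
    n = max(1, round(len(scores) * fraction))
    # Rank participants from lowest to highest assessment score.
    # Assumption: ties keep their original order (sorted() is stable).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    lower, upper = order[:n], order[-n:]
    p_upper = sum(selected[i] for i in upper) / n
    p_lower = sum(selected[i] for i in lower) / n
    return p_upper - p_lower


# Hypothetical data: 10 participants, the five highest scorers chose the option.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
chose_option = [False] * 5 + [True] * 5
print(outcome_discrimination(chose_option, scores))
```

With ten participants each group holds three people, so the example compares the three highest scorers with the three lowest.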

Outcome correlation is a point-biserial correlation that correlates the outcome scores that participants achieve with the assessment scores that they achieve. So rather than comparing only the top and bottom 27% of participants, the outcome correlation looks at all participants using a standard correlation approach.
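A point-biserial correlation is numerically the same as a Pearson correlation in which one variable takes only the values 0 and 1. The sketch below computes it that way for one response option. The function name and the sample data are hypothetical, and whether the assessment score should include or exclude the question being analyzed is left open here, since the post does not say.

```python
import math


def point_biserial(outcome_scores, assessment_scores):
    """Pearson correlation between 0/1 outcome scores and assessment scores."""
    n = len(outcome_scores)
    if n != len(assessment_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    mean_x = sum(outcome_scores) / n
    mean_y = sum(assessment_scores) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(outcome_scores, assessment_scores))
    sxx = sum((x - mean_x) ** 2 for x in outcome_scores)
    syy = sum((y - mean_y) ** 2 for y in assessment_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: 1 = participant selected the option, 0 = did not.
outcome = [0, 0, 1, 1]
assessment = [1, 2, 3, 4]
print(round(point_biserial(outcome, assessment), 3))
```

Unlike the upper-minus-lower statistic, every participant contributes to this value, including those in the middle of the score distribution.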

If you are thinking that outcome discrimination and outcome correlation sound like they might be related to one another, you are right! High outcome discrimination statistics generally will result in high outcome correlations. In other words, outcome discrimination and outcome correlation statistics are highly correlated with one another. How correlated are they? Well, I looked at many real-life questions from Item Analysis Reports that customers have shared with me and found a positive correlation of 0.962, which is really high.

In my next post I will provide some criteria that can be used for outcome discrimination and outcome correlation coefficients to judge whether a question is making the grade in terms of psychometric quality.

Posted August 3, 2009 by Writers - Guest Bloggers

Posted by Greg Pope

In my previous blog post I highlighted some of the essential things to look for in a typical Item Analysis Report. Now I will dive into the nitty-gritty of item analysis, looking at example questions and explaining how to use the Questionmark Item Analysis Report in an applied context for a State Capitals Exam.

The Questionmark Item Analysis Report first produces an overview of question performance both in terms of the difficulty of questions and in terms of the discrimination of questions (upper minus lower groups). These overview charts give you a “bird’s eye view” of how the questions composing an assessment perform. In the example below we see that we have a range of questions in terms of their difficulty (“Item Difficulty Level Histogram”), with some harder questions (the bars on the left), most average-difficulty questions (bars in the middle), and some easier questions (the bars on the right). In terms of discrimination (“Discrimination Indices Histogram”) we see that we have many questions that have high discrimination as evidenced by the bars being pushed up to the right (more questions on the assessment have higher discrimination statistics).

Overall, if I were building a typical criterion-referenced assessment with a pass score around 50% I would be quite happy with this picture. We have more questions functioning at the pass score point with a range of questions surrounding it and lots of highly discriminating questions. We do have one rogue question on the far left with a very low discrimination index, which we need to look at.

The next step is to drill down into each question to ensure that each question performs as it should. Let’s look at two questions from this assessment, one question that performs well and one question that does not perform so well.

The question below is an example of a question that performs nicely. Here are some reasons why:

- Going from left to right, first we see that the “Number of Results” is 175, which is a nice sample of participants to evaluate the psychometric performance of this question.
- Next we see that everyone answered the question (“Number not Answered” = 0), which means there probably wasn’t a problem with people not finishing or finding the question confusing and giving up.
- The “P Value Proportion Correct” shows us that this question is just above the pass score where 61% of participants ‘got it right.’ Nothing wrong with that: the question is neither too easy nor too hard.
- The “Item Discrimination” indicates good discrimination, with the difference between the upper and lower group in terms of the proportion selecting the correct answer of ‘Salem’ at 48%. This means that of the participants with high overall exam scores, 88% selected the correct answer versus only 40% of the participants with the lowest overall exam scores. This is a nice, expected pattern.
- The “Item Total Correlation” backs the Item Discrimination up with a strong value of 0.40. This means that across all participants who answered the question, the pattern of high scorers getting the question right more often than low scorers holds true.
- Finally we look at the Outcome information to see how the distracters perform. We find that each distracter pulled some participants, with ‘Portland’ pulling the most participants, especially from the “Lower Group.” This pattern makes sense because those with poor state capital knowledge may make the common mistake of selecting Portland as the capital of Oregon.
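The Outcome information in the last bullet amounts to an upper-group and lower-group proportion for every response option. A rough sketch of how such a table could be built from raw responses follows. The function name, the group-size rounding, and the ten sample responses are assumptions for illustration only; they are not the 175 results from the report.

```python
from collections import Counter


def outcome_table(responses, scores, fraction=0.27):
    """Return {option: (p_upper, p_lower, p_upper - p_lower)}.

    responses: the option each participant selected.
    scores:    assessment scores, in the same order as `responses`.
    """
    if len(responses) != len(scores) or not scores:
        raise ValueError("responses and scores must be non-empty and equal length")
    # Assumption: each group is 27% of participants, rounded, at least 1.
    n = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    lower = Counter(responses[i] for i in order[:n])
    upper = Counter(responses[i] for i in order[-n:])
    table = {}
    for option in sorted(set(responses)):
        p_up = upper[option] / n  # Counter returns 0 for unseen options
        p_lo = lower[option] / n
        table[option] = (p_up, p_lo, p_up - p_lo)
    return table


# Hypothetical responses, listed from lowest to highest assessment score.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
responses = ["Portland", "Portland", "Salem", "Eugene", "Salem",
             "Portland", "Salem", "Salem", "Salem", "Salem"]
for option, row in outcome_table(responses, scores).items():
    print(option, row)
```

In this made-up data the correct answer, Salem, has a positive upper-minus-lower difference and the distracter Portland has a negative one, which is the same shape as the pattern described in the bullets above.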

The psychometricians, SMEs, and test developers reviewing this question all have smiles on their faces when they see the item analysis for this item.

Next we look at that rogue question that does not perform so well in terms of discrimination: the one we saw in the Discrimination Indices Histogram. When we look into the question we understand why it was flagged:

- Going from left to right, first we see that the “Number of Results” is 175, which is again a nice sample size: nothing wrong here.
- Next we see everyone answered the question, which is good.
- The first red flag comes from the “P Value Proportion Correct” as this question is quite difficult (only 35% of participants selected the correct answer). This is not in and of itself a bad thing, so we can keep it in mind as we move on.
- The “Item Discrimination” indicates a major problem, a negative discrimination value. This means that participants with the lowest exam scores selected the correct answer more than participants with the highest exam scores. This is not the expected pattern we are looking for: Houston, this question has a problem!
- The “Item Total Correlation” backs up the Item Discrimination with a high negative value.
- To find out more about what is going on we delve into the Outcome information area to see how the distracters perform. We find that the keyed-correct answer of Nampa is not showing the expected pattern of upper minus lower proportions. We do, however, find that the distracter “Boise” is showing the expected pattern of the Upper Group (86%) selecting this response option much more than the Lower Group (15%). Wait a second…I think I know what is wrong with this one: it has been mis-keyed! Someone accidentally assigned a score of 1 to Nampa rather than Boise.
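The reasoning in these bullets can be turned into a simple screening rule: if the keyed answer discriminates negatively while some distracter discriminates positively, the question may be mis-keyed. The sketch below is one hedged way to express that rule. The function name is hypothetical, only the Boise figures (86% and 15%) come from the report, and the values for the other options are placeholders because the post does not give them.

```python
def suspect_miskey(discrimination_by_option, keyed_option):
    """Return the distracter that behaves like the real key, or None.

    discrimination_by_option: {option: upper-minus-lower discrimination}.
    keyed_option: the option currently scored as correct.
    """
    keyed_value = discrimination_by_option[keyed_option]
    # The option chosen most disproportionately by the upper group.
    best = max(discrimination_by_option, key=discrimination_by_option.get)
    if (keyed_value < 0 and best != keyed_option
            and discrimination_by_option[best] > 0):
        return best
    return None


discrimination = {
    "Boise": 0.86 - 0.15,  # upper and lower proportions from the report
    "Nampa": -0.30,        # placeholder value, not from the report
    "Idaho Falls": -0.10,  # placeholder option and value, not from the report
}
print(suspect_miskey(discrimination, keyed_option="Nampa"))
```

A rule like this only flags candidates for review. A subject matter expert still has to confirm that the flagged option really is the correct answer before the key is changed.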

No problem: the administrator pulls the data into the Results Management System (RMS), changes the keyed correct answer to Boise, and presto, we now have defensible statistics that we can work with for this question.

The psychometricians, SMEs, and test developers reviewing this question had a frown on their faces at first but those frowns were turned upside down when they realized it is just a simple mis-keyed question.

In my next blog post I would like to share some observations on the relationship between Outcome Discrimination and Outcome Correlation.

Are you ready for some light relief after pondering all these statistics? Then have some fun with our own State Capitals Quiz.

Posted July 24, 2009 by Writers - Guest Bloggers