Item Analysis Analytics Part 7: The psychometric good, bad and ugly


Posted by Greg Pope

A few posts ago I showed an example item analysis report for a question that performed well statistically and one that did not; the latter turned out to be a mis-keyed item. I thought it might be interesting to drill into a few more cases of questions with noteworthy psychometric performance. I hope this will help all of you out there recognize the patterns of the psychometric good, bad and ugly in terms of question performance.

The question below is an example of a question that is borderline in terms of psychometric performance. Here are some reasons why:

  • Going from left to right, first we see that the “Number of Results” is 116, which is a decent sample of participants to evaluate the psychometric performance of this question.
  • Next we see everyone answered the question (“Number not Answered” = 0), which means there probably wasn’t a problem with people not finishing or finding the question confusing and giving up.
  • The “P Value Proportion Correct” shows us that this question is average to easy, with 65% of participants “getting it right.”
  • The “Item Discrimination” indicates mediocre discrimination at best: the difference between the Upper and Lower groups in the proportion selecting the correct answer of ‘Leptokurtic’ is only 20%. This means that 75% of the participants with the highest overall exam scores selected the correct answer versus 55% of the participants with the lowest overall exam scores. I would have liked to see a larger difference between the Upper and Lower groups.
  • The “Item Total Correlation” backs up the Item Discrimination with a lacklustre value of 0.14. A value like this would likely not meet many organizations’ internal criteria for what is considered a “good” item. (A rough way to compute these statistics is sketched just after this list.)
  • Finally, we look at the Outcome information to see how the distracters perform. We find that each distracter pulls some participants, with ‘Platykurtic’ pulling the most and quite a large share of the Upper group (22%) selecting it. If I were to guess what is happening, I would say that because the correct option and the distracters are so similar, and because this topic is so obscure that you really need to know your material, participants get confused between the correct answer of ‘Leptokurtic’ and the distracter ‘Platykurtic’.
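To make the arithmetic behind these statistics concrete, here is a minimal sketch (in Python with NumPy) of how they can be computed from raw response data. The array names and the 27% Upper/Lower split are assumptions for illustration; the reporting tool may define the groups differently.

```python
# Minimal sketch of the classical item statistics discussed above.
# Assumes `scores` is a 0/1 array (1 = answered this item correctly)
# and `totals` holds each participant's overall exam score. The 27%
# upper/lower split is one common convention, assumed here.
import numpy as np

def item_statistics(scores, totals, tail=0.27):
    scores = np.asarray(scores, dtype=float)
    totals = np.asarray(totals, dtype=float)

    # P Value (Proportion Correct): share of participants getting it right.
    p_value = scores.mean()

    # Item Discrimination: P value among the top overall scorers minus
    # the P value among the bottom overall scorers.
    order = np.argsort(totals)
    k = max(1, int(round(len(scores) * tail)))
    lower, upper = order[:k], order[-k:]
    discrimination = scores[upper].mean() - scores[lower].mean()

    # Item-Total Correlation: Pearson correlation between the 0/1 item
    # score and the total score (a point-biserial correlation).
    item_total_r = np.corrcoef(scores, totals)[0, 1]

    return p_value, discrimination, item_total_r
```

For the borderline question above, a computation like this would reproduce the 0.65 P value, the 0.20 discrimination, and the 0.14 item-total correlation, up to rounding and the exact grouping rule the tool uses.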

The psychometricians, SMEs, and test developers reviewing this question would need to talk with instructors to find out more about how this topic was taught and understand where the problem lies: Is it a problem with the question wording, or with instruction and retention/recall of the material? If it is a wording problem, revisions can be made and the question re-beta tested. If the problem is in how the material is taught, then instructional coaching can occur and the question can be re-beta tested as is to see whether its psychometric performance improves.

[Screenshot: item analysis report for the borderline question discussed above]

The question below is an example of a question that has a classic problem. Here are some reasons why it is problematic:

  • Going from left to right, first we see that the “Number of Results” is 175. That is a fairly healthy sample, nothing wrong there.
  • Next we see everyone answered the question (“Number not Answered” = 0), which means there probably wasn’t a problem with people not finishing or finding the question confusing and giving up.
  • The “P Value Proportion Correct” shows us that this question is easy, with 83% of participants ‘getting it right’. There is nothing immediately wrong with an easy question, so let’s look further.
  • The “Item Discrimination” indicates reasonable discrimination, with the difference between the Upper and Lower group in terms of the proportion selecting the correct answer of ‘Cronbach’s Alpha’ at 38%. This means that of the participants with high overall exam scores, 98% selected the correct answer versus 60% of the participants with the lowest overall exam scores. That is a nice difference between the Upper and Lower groups, with almost 100% of the Upper group choosing the correct answer. Obviously, this question is easy for participants who know their stuff!
  • The “Item Total Correlation,” at 0.39, backs up the Item Discrimination statistics and would meet most organizations’ internal criteria for what is considered a “good” item.
  • Finally, we look at the Outcome information to see how the distracters perform. Well, two of the distracters don’t pull any participants! This is a waste of good question real estate: participants have to read through four alternatives when there are only two they seriously consider as the correct answer. (A rough way to tabulate this is sketched just after this list.)
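For readers who want to reproduce that distracter tabulation, here is a rough sketch in the same spirit as the earlier statistics sketch. The inputs (`choices`, `options`, `key`) are hypothetical names invented for illustration, not fields from any particular reporting tool.

```python
# Rough sketch of a distracter analysis: how often is each option
# chosen, overall and within the Upper/Lower groups? `choices` holds
# each participant's selected option, `options` lists all alternatives
# (so never-chosen distracters still appear), and `key` is the correct
# answer. All names here are illustrative.
import numpy as np

def option_breakdown(choices, totals, options, key, tail=0.27):
    choices = np.asarray(choices)
    totals = np.asarray(totals, dtype=float)
    order = np.argsort(totals)
    k = max(1, int(round(len(choices) * tail)))
    lower, upper = order[:k], order[-k:]

    for option in options:
        picked = choices == option
        note = ""
        if option != key and not picked.any():
            note = "  <- pulls no one: candidate for replacement"
        print(f"{option}: overall {picked.mean():.0%}, "
              f"upper {picked[upper].mean():.0%}, "
              f"lower {picked[lower].mean():.0%}{note}")
```

A distracter sitting at 0% overall, like the two dead alternatives here, is doing no work and is a natural candidate for rewriting.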

The psychometricians, SMEs, and test developers reviewing this question would likely ask the SME who developed it to come up with better distracters that would draw more participants. Clearly, ‘Bob’s Alpha’ is a joke distracter that participants dismiss immediately, as is ‘KR-1,000,000’ (I mean, the Kuder-Richardson formula one million). Let’s get serious here!

[Screenshot: item analysis report for the easy question with two non-functioning distracters]

Soft Scaffolding and Other Patterns for Formative Assessment

Posted by Steve Lay

As someone involved in software development, I’m used to thinking about ‘patterns’ in software design. Design patterns started life as a way of looking at the physical design of buildings; more recently, they’ve been used to identify solutions to common design problems in software. One of the key aspects of pattern use is that patterns are named, and these names form a vocabulary that helps designers implement solutions.

So I was interested to see the technique discussed in the context of formative assessment design by the recent JISC project, Scoping a Vision for Formative e-Assessment. In the final report, the authors document patterns for formative assessment as a way of bridging the gap between practitioners and those implementing software solutions to support them.

The patterns have wonderful names like “Classroom Display,” “Round and Deep” and “Objects To Talk With” that make me want to use them in my own communications.

To give an example of how one might apply the theory, let’s take a design problem identified in the report. Given that the point of formative assessment is to inform future learning activities, it is not surprising that in some environments outcomes are used too rigidly to determine the paths students take, resulting in a turgid experience. What you need, apparently, is “soft scaffolding,” which describes solutions that soften the restrictions on the types of responses or paths a student can take with a resource, for example by providing free-text ‘other’ options in MCQs or replacing rigid navigation with recommendations and warnings.
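As a toy sketch (my own, not code from the report), the difference between hard and soft scaffolding might look like this: instead of the outcome rigidly gating the next step, every path stays open and the outcome merely attaches a recommendation. The threshold and messages are invented for the example.

```python
# Toy illustration of soft vs. hard scaffolding; the 0.6 threshold and
# the messages are invented for this example, not taken from the report.
def next_steps(score, threshold=0.6, soft=True):
    if score >= threshold:
        return {"paths": ["next_topic"], "message": "Well done, carry on."}
    if not soft:
        # Hard scaffolding: the outcome rigidly determines the path.
        return {"paths": ["revision"], "message": "You must revise first."}
    # Soft scaffolding: both paths stay open, with a warning attached.
    return {
        "paths": ["revision", "next_topic"],
        "message": "You may want to revisit this topic before moving on.",
    }
```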

You can jump straight to the patterns themselves on the project wiki.