In my last post I talked a bit about Classical Test Theory (CTT) to lay the foundation for a discussion of item analysis. In this post I will talk about the high-level purpose and process of conducting an item analysis. The general purpose of an item analysis is to find out whether the questions composing an assessment are performing in a manner that is psychometrically appropriate and defensible. It helps us determine whether items need to be improved (sent back to development), sent to the scrap heap, or left as they are because they meet all the criteria for inclusion in an assessment.
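To make the idea concrete, here is a minimal sketch of the two item statistics most often inspected in a CTT item analysis: item difficulty (the proportion answering correctly) and item discrimination (here computed as a corrected item-total correlation). The function names and the tiny data set are illustrative assumptions, not part of any Questionmark product or report:

```python
# Illustrative sketch of two classic CTT item statistics, assuming
# dichotomous (0/1) item scoring. Not an actual Questionmark computation.
from statistics import mean, pstdev

def item_difficulty(item_scores):
    """Proportion of test takers answering the item correctly."""
    return mean(item_scores)

def item_discrimination(item_scores, total_scores):
    """Corrected item-total correlation: the item is removed from the
    total score before correlating, so the item doesn't inflate itself."""
    rest = [t - i for i, t in zip(item_scores, total_scores)]
    mi, mr = mean(item_scores), mean(rest)
    cov = mean((i - mi) * (r - mr) for i, r in zip(item_scores, rest))
    sd_i, sd_r = pstdev(item_scores), pstdev(rest)
    return cov / (sd_i * sd_r) if sd_i and sd_r else 0.0

# Five (made-up) test takers: their score on one item, and their
# total scores on a 10-item assessment.
item = [1, 1, 0, 1, 0]
totals = [9, 8, 4, 7, 3]
print(item_difficulty(item))                          # 0.6
print(round(item_discrimination(item, totals), 2))    # 0.92
```

A very low difficulty or a near-zero (or negative) discrimination is the sort of signal that sends an item back to development, or to the scrap bin.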
I’d like to share a tip about how some of my colleagues decide whether to revise a problematic-looking question or throw it away as “unfixable.” This involves setting a review time limit for each question that needs to be reviewed. In an item analysis review meeting, which may involve psychometricians, subject matter experts, exam developers and other stakeholders, each question is reviewed for no more than a pre-determined period of time, say 10 minutes. If an effective revision for the question does not become apparent within that time, the question goes to the scrap bin and a new question is developed by SMEs to take its place.
Many organizations beta test questions in order to choose those that should be included in an actual assessment. Questionmark Perception offers the delivery status field of “Experimental,” which allows beta questions to be interspersed within an actual assessment form but not scored, and therefore not counted toward participant assessment scores. More on the topic of beta testing another time though…
In my next post I will discuss some essential things to look for in an Item Analysis Report.
Dr. Will Thalheimer of Work-Learning Research spoke with me recently about the role feedback plays in assessments and how it can be used to help learners.
Our conversation touches on the basics of using feedback effectively; if you want to learn more about this subject I recommend you check out Will’s research-to-practice paper: Providing Learners with Feedback. The paper examines the latest research on this complex topic and provides practical recommendations. You can find it in the Work-Learning Research Catalog or Questionmark’s white paper list.
So here’s Will! I hope you enjoy our conversation.
I have been watching the Questionmark Blog with interest and thought that, as Questionmark’s CEO, it was about time that I made a contribution!
The Questionmark Blog was started to keep you in touch with our products, our news releases, learning materials and our Product Owners’ points of view. We’ve been focusing on articles that assist assessment practitioners and instructional designers; recently we previewed how embedding syndicated assessments within wikis, web pages and blogs can support the learning process.
Separately from this initiative, I have been running a personal blog (http://blog.eric.info) to bring you more abstract thoughts, observations from travels, and distillations of conversations that I’ve enjoyed along the way. Not surprisingly, the Tag Cloud quickly shows what I blog about: Assessments, Books, Travel and Questionmark. Here are some links that you might find interesting:
Item analysis is a hot-button topic for social conversation (Okay, maybe just for some people). I thought it might be useful to talk about Classical Test Theory (CTT) and item analysis in a series of blog posts over the next few weeks. This first post will focus on some of the theory and background of CTT. In subsequent posts on this topic I will lay out a high-level overview of item analysis and then drill down into details. Other testing theories include Item Response Theory (IRT), which might be fun to talk about in another post (at least fun for me).
CTT is a body of theory and research regarding psychological testing that predicts/explains the difficulty of questions, provides insight into the reliability of assessment scores, and helps us represent what examinees know and can do. In a similar manner to theories regarding weather prediction or ocean current flow, CTT provides a theoretical framework for understanding educational and psychological measurement. The essential basis of CTT is that many questions combine to produce a measurement (assessment score) representing what a test taker knows and can do.
CTT has been around a long time (since the early 20th century) and is probably the most widely used theory in the area of educational and psychological testing. CTT works well for most assessment applications for reasons such as its ability to work with smaller sample sizes (e.g., 100 or fewer) and the relative simplicity of computing and understanding its statistics.
The general CTT model is based on the notion that the observed score test takers obtain from assessments is composed of a theoretical, un-measurable “true score” plus error. Just as most measurement devices have some error inherent in their measurement (e.g., a thermometer may be accurate to within 0.1 degree 9 times out of 10), so too do assessment scores. For example, if a participant’s observed score (the score reported back to them) on an exam was 86%, their “true score” may actually lie somewhere between 80% and 92%.
Measurement error can be estimated and relates back to reliability: greater assessment score reliability means less error of measurement. Why does error relate so directly to reliability? Well, reliability has to do with measurement consistency. If you could take the average of all the scores a participant obtained — had they taken the same assessment an infinite number of times with no remembering effects — that average would be the participant’s true score. The greater the reliability of the measurement, the less wildly the scores would vary each time the participant took that assessment over eternity. (This would be a great place for an afterlife joke but I digress…)
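The reliability-error link can be written down directly: the standard error of measurement (SEM) is the score standard deviation times the square root of one minus the reliability. The sketch below uses made-up numbers (an SD of 10 points and a reliability of 0.91 are illustrative assumptions) to show how a score band like the 86% example above might be produced:

```python
# Sketch of the standard error of measurement (SEM) from CTT.
# SD and reliability values here are made up for illustration.
import math

def sem(sd, reliability):
    """Standard error of measurement: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

error = sem(10.0, 0.91)   # 10 * sqrt(0.09) = 3.0 points
observed = 86.0
# A rough 95% band around the observed score: observed +/- 2 * SEM
low, high = observed - 2 * error, observed + 2 * error
print(f"true score likely between {low:.0f}% and {high:.0f}%")
```

With these numbers the band comes out to roughly 80%–92%: more reliability means a smaller SEM, and a tighter band around the observed score.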
For a more detailed overview of CTT that won’t make your lobes fall off, try Chapter 5 in Dr. Theresa Kline’s book, “Psychological Testing: A Practical Approach to Design and Evaluation.”
In my next post I will provide a high-level picture of item analysis to continue this conversation.
The first early-bird deadline for the 2009 European Users Conference is June 30th. If you register by then you will save £70 off the full registration fee. Anyone looking to register should visit the conference website.
The European Users Conference will be a great place to get technical training on Questionmark Perception and learn how to write better assessments – all at a fraction of the cost of a training course! Delegates will also get the chance to see the latest Questionmark product developments and preview what’s coming in 2010 and beyond. When you add in the chance to meet other Perception Users and learn from their experiences, the European Users Conference provides an excellent return on investment for anyone looking to further their knowledge of Questionmark and e-Assessment.
Posted by Jim Farrell
Questionmark Live is quickly becoming one of the most popular tools used within the Perception community for authoring questions. This free tool, which is available to all of Questionmark’s software support plan customers, gives the power of question authoring to anyone with an Internet connection. Why is it so popular? Watch this video to see how easy it is for your subject matter experts to create new questions!