How the sample of participants being tested affects item analysis information

Posted by Greg Pope

Ever think about who took the test when you are interpreting your item analysis report? Maybe you should! Classical Test Theory (CTT) item analysis information is very much based on the sample of participants who took the test.

Hold on a second, what is a sample? What is the difference between a sample and a population? Well, a sample is a selection from a population. If your population is composed of all 1.5 million people in the United States who will write a college entrance exam in a year, a sample of this population could be 1,000 people selected based on certain criteria (e.g., age, gender, ethnicity). If we want to beta test questions that we hope to include on an upcoming college entrance exam, it is usually not possible or practical to beta test all 1.5 million people in the population, so one or more representative samples are selected to beta test the questions.
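
To make the idea of a representative sample concrete, here is a minimal sketch of one common approach, proportionate stratified sampling, where each group gets a share of the sample equal to its share of the population. Everything in it is an assumption for illustration: the scaled-down population of 15,000, the three made-up groups "A", "B" and "C", and the helper name `stratified_sample` are hypothetical, not part of any real testing program.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(people, stratum_of, n, rng):
    """Draw roughly n people, giving each stratum a share proportional to its size."""
    strata = defaultdict(list)
    for person in people:
        strata[stratum_of(person)].append(person)

    sample = []
    for members in strata.values():
        # Quota for this stratum, proportional to its share of the population.
        quota = round(n * len(members) / len(people))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Hypothetical, scaled-down population: 15,000 test takers in three made-up groups.
population = (
    [{"id": i, "group": "A"} for i in range(9000)]
    + [{"id": i, "group": "B"} for i in range(9000, 13500)]
    + [{"id": i, "group": "C"} for i in range(13500, 15000)]
)

rng = random.Random(2010)  # fixed seed so the draw is repeatable
sample = stratified_sample(population, lambda p: p["group"], 1000, rng)
group_counts = Counter(p["group"] for p in sample)
print(group_counts)
```

Because the groups make up 60%, 30% and 10% of this made-up population, the 1,000-person sample keeps the same proportions. Real sampling plans usually stratify on several criteria at once, which this sketch does not attempt.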


As I mentioned, the sample of participants taking an assessment has an impact on the difficulty and discrimination statistics that you will obtain in your CTT item analysis. For example, if you administered the college entrance exam beta test to a sample of gifted students who are the best and brightest, the Item Analysis Report is going to come back showing that all your questions are easy (p-values close to 1), and you probably won’t get very high discrimination statistics, because when nearly everyone answers correctly there is little variation in scores for a question to discriminate on. However, we know that the population of people taking college entrance exams is not composed entirely of the best and brightest, so this sample is not an accurate representation of the population (we say the sample is not representative). It would not be wise to build the actual college entrance exam form from the beta test results of only this one sample of bright students, because the item statistics would not reflect the population of students that will be tested.
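
The effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the responses are generated with a simple logistic model, the five question difficulties are invented, the "gifted" sample is simply the top 10% of simulated ability, and discrimination is computed as the correlation between a question score and the total score on the remaining questions (one common CTT discrimination statistic, which may differ from what a particular Item Analysis Report uses).

```python
import math
import random


def simulate_responses(abilities, difficulties, rng):
    """Simulate 0/1 question scores; the logistic model is an illustrative assumption."""
    return [
        [1 if rng.random() < 1 / (1 + math.exp(-(theta - b))) else 0
         for b in difficulties]
        for theta in abilities
    ]


def p_values(rows):
    """Proportion of participants answering each question correctly."""
    return [sum(r[i] for r in rows) / len(rows) for i in range(len(rows[0]))]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def discriminations(rows):
    """Item-rest correlation: question score vs. total score on the other questions."""
    result = []
    for i in range(len(rows[0])):
        item = [r[i] for r in rows]
        rest = [sum(r) - r[i] for r in rows]
        result.append(pearson(item, rest))
    return result


rng = random.Random(42)  # fixed seed so the run is repeatable
difficulties = [-1.5, -0.5, 0.0, 0.5, 1.5]          # hypothetical questions
abilities = [rng.gauss(0, 1) for _ in range(5000)]  # hypothetical population
gifted = sorted(abilities)[-500:]                   # top 10% only

full_rows = simulate_responses(abilities, difficulties, rng)
gifted_rows = simulate_responses(gifted, difficulties, rng)

full_p, gifted_p = p_values(full_rows), p_values(gifted_rows)
full_d, gifted_d = discriminations(full_rows), discriminations(gifted_rows)

print("p-values, full sample:  ", [round(p, 2) for p in full_p])
print("p-values, gifted sample:", [round(p, 2) for p in gifted_p])
print("mean discrimination, full sample:  ", round(sum(full_d) / len(full_d), 2))
print("mean discrimination, gifted sample:", round(sum(gifted_d) / len(gifted_d), 2))
```

In this setup the gifted sample should show a higher p-value on every question and a lower average discrimination than the full sample, even though the questions themselves have not changed.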

Using strong sampling methods will help ensure that the statistics you get are appropriate. Typing a search word like “Sampling” into your favorite online book store will yield numerous suggestions for some fun reading on this subject. If you don’t have the time or inclination to do some light reading on sampling methods in your spare time, start with the obvious: think about the target population of test takers who are going to take the test and, if you are beta testing questions, try to obtain samples that reflect that population. In a previous blog post I talked more about beta testing.

As an aside, Item Response Theory (IRT) advocates will be quick to point out that IRT doesn’t have the same sample dependency challenges as CTT. I’ll discuss that at another time!

Beta Testing Questions: Methods and Best Practices

Posted by Greg Pope

I had the good fortune of presenting a few sessions at the Questionmark 2010 Users Conference in sunny Miami a couple of weeks ago. It was a great opportunity to catch up with customers and learn about the priorities organizations are focusing on.

In one of my best practice sessions there was a great deal of interest in the topic of beta testing, so I thought I would put together a blog article on this in case others were interested.

Beta testing can be defined as gathering psychometric information regarding newly created questions in order to inform the creation of actual exams. Newly developed questions that have gone through the necessary editing and review processes are administered to representative samples of participants, either in advance of or during an actual high-stakes assessment. Psychometric information regarding the new questions is collected and used to build the actual assessments. Questions that have been beta tested are screened to ensure that they meet certain quality benchmarks (e.g., all questions fall into a certain range of difficulty, all questions have acceptable discrimination). These beta tested questions are then used to create assessments built to specific structure criteria (e.g., there is an appropriate spread of question difficulty, a targeted mean test score is achieved, and more questions are included near the pass score if the assessment is criterion referenced).
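
The screening step described above might look something like the following sketch. The cut-offs (p-values between 0.30 and 0.90, discrimination of at least 0.20), the question IDs, and the statistics are all invented for illustration; each organization sets its own benchmarks.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    item_id: str
    p_value: float         # proportion answering correctly in the beta test
    discrimination: float  # e.g., an item-total correlation


def passes_screen(item, p_min=0.30, p_max=0.90, min_disc=0.20):
    """Check a question against hypothetical difficulty and discrimination benchmarks."""
    return p_min <= item.p_value <= p_max and item.discrimination >= min_disc


# Made-up beta test results for four questions.
beta_items = [
    ItemStats("Q1", 0.95, 0.10),  # too easy, weak discrimination
    ItemStats("Q2", 0.62, 0.35),
    ItemStats("Q3", 0.45, 0.12),  # acceptable difficulty, weak discrimination
    ItemStats("Q4", 0.33, 0.41),
]

accepted = [item.item_id for item in beta_items if passes_screen(item)]
print(accepted)
```

Questions that pass the screen would then be candidates for form building, where the structure criteria (spread of difficulty, targeted mean score, and so on) come into play.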

A summary graphic describing the general beta testing process is included below:

[Figure: graphic describing the general beta testing process]

There are a number of common models for beta testing questions. Two of the most common are:


[Figure: models for beta testing questions]

Want more details? Questionmark software support plan customers can learn more about beta testing from our best practice guide on this topic. See our Best Practice Guide Index.