Psychometrics 101: Item Total Correlation

Posted by Greg Pope
I’ll be talking about a subject dear to my heart, psychometrics, at the Questionmark Users Conference, April 5-8. Here’s a sneak preview of one of my topics: item-total correlation! What is it, and what does it mean?
The item-total correlation is the correlation between the question score (e.g., 0 or 1 for multiple choice) and the overall assessment score (e.g., 67%). We expect that participants who get a question correct will, in general, have higher overall assessment scores than participants who get it wrong. The same holds for essay-type questions, where a question might be scored between 0 and 5: participants who did a really good job on the essay (got a 4 or 5) should have higher overall assessment scores (maybe 85-90%). This relationship is shown in an example graph below.
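To make the calculation concrete, here is a minimal sketch in Python using only the standard library. The response data and the function names (`pearson`, `item_total_correlations`) are made up for illustration, and the total here is a simple count of correct answers rather than a percentage, which does not change the correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Undefined when everyone has the same score on the item or the test.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def item_total_correlations(responses):
    """Correlate each question's scores with the overall assessment score."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [
        pearson([row[i] for row in responses], totals)
        for i in range(n_items)
    ]


# Made-up data: each row is a participant, each column a question (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]

correlations = item_total_correlations(responses)
for number, r in enumerate(correlations, start=1):
    print(f"Question {number}: r = {r:.2f}")
```

In this made-up data the fourth question is answered correctly mostly by the lower-scoring participants, so its item-total correlation comes out negative. With only six participants the numbers are purely illustrative; real item analyses need far larger samples.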

In psychometrics this relationship is called ‘discrimination’, referring to how well a question differentiates between participants who know the material and those who do not. Participants who know the material taught to them should get high scores on questions and high overall assessment scores. Participants who did not master the material should get low scores on questions and lower overall assessment scores. This is the relationship that an item-total correlation captures to help evaluate the performance of questions. We want to have lots of highly discriminating questions on our tests because they are the most fine-tuned measurements for finding out what participants know and can do.
When looking at an item-total correlation, negative values are generally a major red flag: it is unexpected that participants who get low scores on a question get high scores on the assessment. This could indicate a mis-keyed question or that the question was highly ambiguous and confusing to participants. Values for an item-total correlation (point-biserial) between 0 and 0.19 may indicate that the question is not discriminating well, values between 0.2 and 0.39 indicate good discrimination, and values of 0.4 and above indicate very good discrimination.
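The ranges above can be turned into a small lookup, sketched here in Python. The function name `discrimination_label` and the wording of the labels are my own for illustration, and the sketch assumes that values falling between the stated ranges (for example 0.195) belong to the lower band.

```python
def discrimination_label(r):
    """Map an item-total correlation to the rough ranges described above."""
    if r < 0:
        return "red flag: check for a mis-keyed or ambiguous question"
    if r < 0.2:
        return "may not be discriminating well"
    if r < 0.4:
        return "good discrimination"
    return "very good discrimination"


# Made-up correlations for four questions.
for r in (-0.12, 0.15, 0.31, 0.52):
    print(f"r = {r:+.2f}: {discrimination_label(r)}")
```

These cut-offs are guidelines rather than fixed rules, so an organization using different ranges would only need to change the two boundary values.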

Greg:
I’d really like to use all the blogs you’ve posted this year as a reference document, but when I try to print them I get the typed info, but not the great graphs and figures. Is there anything you can send me that is more easily printed?
You may recognize my name. I was previously a Colonel in the Canadian Forces in charge of the Directorate of Human Resources Research and Evaluation. You had done some work for us when you were working with Bruno Zumbo. I ran across your name again at NOCA last year and have been enjoying info from Questionmark ever since. Hope all is well with you.
Hi Cheryl, great to hear from you! It is really nice to hear that you have been enjoying my posts and I would be happy to send them to you. I will get them packaged up into one document and email them to you.
I will be at NOCA again this year with several presentations so if you are attending NOCA this year it would be great to see you there!
All the best,
Greg
Hi Greg Pope,
I have a question about this. I used item-total correlation to test unidimensionality, following a study that used the same questionnaire as mine. My supervisor said this is wrong and that unidimensionality can be measured only with factor analysis. Please give me your suggestion. Thanks.
Hello Arthi, yes, your supervisor is correct: exploratory or confirmatory factor analysis (FA; http://en.wikipedia.org/wiki/Factor_analysis) and principal component analysis (PCA; http://en.wikipedia.org/wiki/Principal_component_analysis) are the most typical ways of conducting dimensionality analyses. Statistical programs like SPSS provide these analyses. I was not advocating using item-total correlations directly to do dimensionality research, although one would expect higher item-total correlations for questions on assessments that all measure the same construct.
Hi Greg,
What is the basis for the cutoff ranges (0 to 0.2, 0.2 to 0.4, 0.4 to 1) in item-total correlation? If they are arbitrary, do you know who the source is? Many thanks.
Hi Gregory, thanks for your question. Yes, the cut-offs for item-total correlations are semi-arbitrary in that different organizations can use different ranges. There are also a lot of factors to consider when analyzing Classical Test Theory item statistics. The ranges I stated are fairly common amongst organizations that conduct item analyses for item-total correlations. In previous places that I have worked, the cut-off for an acceptable question (i.e., whether it should continue on into the actual assessment for large scale administration) in terms of discrimination was around 0.300.
Academic references are not always easy to find, as many books don’t suggest ranges but rather state “the higher the better above zero.” For example, in Shrock and Coscarelli’s 2007 book on Criterion-Referenced Test Development (http://www.amazon.com/Criterion-referenced-Test-Development-Technical-Guidelines/dp/0787988502) they discuss item-total correlations as an important part of item analyses but do not provide suggested ranges of values. They are not alone: many well written and well respected books on the subject avoid stating specific values.
However, there are some academic references out there if you dig for them:
• Nunnally & Bernstein (1994). Psychometric Theory. New York: McGraw Hill, 3rd ed.
o Page 304: “A cutoff of .3 is an arbitrary guide to defining a discriminating item.” However, on the next page they suggest that items with item-total correlation values greater than 0.300 are “discriminating.”
o Page 306: The authors state that “…very poorly discriminating items (r 0.2”
• Traub (1994). Reliability for the social sciences: Theory and applications. Thousand Oaks, CA, Sage.
o Page 108: “…relatively large indices of discrimination (say 0.30 or more)…”
• de Vaus (2002). Analyzing social science data: 50 Key Problems In Data Analysis. Thousand Oaks, CA, Sage.
o Page 128: “To remain in a scale an item should have an item-total correlation of at least 0.3.”
• Leong & Austin, eds. (2006). The psychology research handbook: A guide for graduate students and research assistants. Thousand Oaks, CA: Sage. Chapter 9 (Scale Development; Lounsbury, Gibson, Saudargas)
o Page 144: Authors recommend corrected item-total correlations should be 0.400 and higher
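The last reference mentions corrected item-total correlations. As commonly defined, the correction removes the question's own score from the total before correlating, so the question is not being correlated with itself. Here is a minimal sketch in Python; the data and the function names are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Undefined when one of the two score lists has no variation.
        return float("nan")
    return cov / sqrt(var_x * var_y)


def corrected_item_total_correlations(responses):
    """Correlate each question with the total of all the OTHER questions."""
    n_items = len(responses[0])
    results = []
    for i in range(n_items):
        item_scores = [row[i] for row in responses]
        # Total score with the current question's contribution removed.
        rest_totals = [sum(row) - row[i] for row in responses]
        results.append(pearson(item_scores, rest_totals))
    return results


# Made-up data: each row is a participant, each column a question (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
]

corrected = corrected_item_total_correlations(responses)
for number, r in enumerate(corrected, start=1):
    print(f"Question {number}: corrected r = {r:.2f}")
```

The correction matters most on short assessments, where each question makes up a large share of the total score; on long assessments the corrected and uncorrected values tend to be close.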
In a previous post I provided more range suggestions based on my experience and based on discussions with other psychometric professionals from a range of organizations: http://blog.questionmark.com/item-analysis-analytics-part-6-determining-whether-a-question-makes-the-grade
I hope this helps and thanks again for your question!
Greg