# Item Analysis Report – Item Reliability

Posted by Austin Fossey

In this series of posts, we have been discussing the statistics that are reported on the Item Analysis Report, including the difficulty index, correlational discrimination, and high-low discrimination.

The final statistic reported on the Item Analysis Report is the item reliability. Item reliability is simply the product of the standard deviation of item scores and a correlational discrimination index (Item-Total Correlation Discrimination in the Item Analysis Report). So item reliability reflects how much the item is contributing to total score variance. As with assessment reliability, higher values represent better reliability.
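As a rough illustration of that definition, here is a minimal Python sketch that computes item reliability from a matrix of item scores. The response data are made up, and the function name `item_reliability` is hypothetical. The sketch also assumes the population standard deviation (dividing by N) and an uncorrected item-total correlation (the item is included in the total score); the Item Analysis Report may use different conventions.

```python
import numpy as np

def item_reliability(scores):
    """Return (item SDs, item-total correlations, item reliabilities).

    scores: 2-D array, rows = participants, columns = items.
    """
    scores = np.asarray(scores, dtype=float)
    total = scores.sum(axis=1)        # each participant's total score
    item_sd = scores.std(axis=0)      # population SD (ddof=0) per item
    # Uncorrected item-total correlation: the item is part of the total.
    item_total_r = np.array(
        [np.corrcoef(scores[:, j], total)[0, 1] for j in range(scores.shape[1])]
    )
    return item_sd, item_total_r, item_sd * item_total_r

# Made-up responses: 6 participants x 4 dichotomous items (1 = correct).
responses = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])

sd, r, reliability = item_reliability(responses)
for j in range(responses.shape[1]):
    print(f"item {j + 1}: SD={sd[j]:.3f}  r={r[j]:.3f}  reliability={reliability[j]:.3f}")
```

Under these assumptions, the item reliabilities add up to the standard deviation of the total scores, which is why the index can be read as each item's contribution to total score variability.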

Like the other statistics in the Item Analysis Report, item reliability is used primarily to inform decisions about item retention. Crocker and Algina (*Introduction to Classical and Modern Test Theory*) describe three ways that test developers might use the item reliability index.

**1) Choosing Between Two Items in Form Construction**

If two items have similar discrimination values, but one item has a higher standard deviation of item scores, then that item will have higher item reliability and will contribute more to the assessment’s reliability. All else being equal, the test developer might decide to retain the item with higher reliability and save the lower reliability item in the bank as backup.
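To make the comparison concrete, here is a small sketch with two hypothetical dichotomous items that share a discrimination of 0.40 but differ in difficulty. The p-values, the discrimination value, and the helper name are all invented for illustration. For a 0/1 item, the standard deviation of item scores is the square root of p(1 - p).

```python
import math

def dichotomous_item_reliability(p, discrimination):
    """Item reliability for a 0/1 item: SD of item scores times discrimination."""
    sd = math.sqrt(p * (1 - p))   # SD of a dichotomous item with p-value p
    return sd * discrimination

# Two hypothetical items with the same item-total correlation.
item_a = dichotomous_item_reliability(p=0.50, discrimination=0.40)
item_b = dichotomous_item_reliability(p=0.90, discrimination=0.40)

print(f"Item A reliability: {item_a:.2f}")
print(f"Item B reliability: {item_b:.2f}")
```

Item A, with scores spread more evenly between correct and incorrect, ends up with the higher item reliability even though both items discriminate equally well.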

**2) Building a Form with a Required Assessment Reliability Threshold**

As Crocker and Algina demonstrate, Cronbach’s Alpha can be calculated as a function of the standard deviations of items’ scores and items’ reliabilities. If the test developer desires a certain minimum for the assessment’s reliability (as measured by Cronbach’s Alpha), they can use these two item statistics to build a form that will yield the desired level of internal consistency.
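A minimal sketch of that idea follows, using the relationship that the sum of item reliabilities equals the standard deviation of total scores. The item statistics and the 0.80 target are hypothetical, and `alpha_from_item_stats` is an illustrative name rather than anything from the Item Analysis Report.

```python
def alpha_from_item_stats(item_sds, item_reliabilities):
    """Cronbach's Alpha from item SDs and item reliabilities.

    alpha = k/(k-1) * (1 - sum(item variances) / (sum(item reliabilities))^2)
    """
    k = len(item_sds)
    sum_item_variances = sum(sd ** 2 for sd in item_sds)
    total_score_variance = sum(item_reliabilities) ** 2
    return k / (k - 1) * (1 - sum_item_variances / total_score_variance)

# Hypothetical statistics for a four-item candidate form.
item_sds = [0.50, 0.48, 0.45, 0.40]
item_reliabilities = [0.30, 0.28, 0.25, 0.20]

alpha = alpha_from_item_stats(item_sds, item_reliabilities)
target = 0.80  # assumed minimum reliability for the form
print(f"alpha = {alpha:.3f}, meets target: {alpha >= target}")
```

A form this short falls well below the target, so the developer would keep adding items and recomputing until the threshold is met.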

**3) Building a Form with a Required Total Score Variance Threshold**

Crocker and Algina explain that the total score variance is equivalent to the square of the sum of item reliability indices, so test developers may continue to add items to a form based on their item reliability values until they meet their desired threshold for total score variance.
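One way to sketch this selection process is a simple greedy loop that adds the highest-reliability items first. The item bank, the target, and the function name `build_form` are hypothetical. The projection is also only approximate, because an item's reliability depends on the total score of the form on which it was computed.

```python
def build_form(item_reliabilities, target_variance):
    """Add items, highest reliability first, until the projected
    total score variance reaches the target."""
    selected = []
    running_sum = 0.0
    ranked = sorted(item_reliabilities.items(), key=lambda kv: kv[1], reverse=True)
    for item_id, reliability in ranked:
        # Projected total score variance = (sum of item reliabilities)^2
        if running_sum ** 2 >= target_variance:
            break
        selected.append(item_id)
        running_sum += reliability
    return selected, running_sum ** 2

# Hypothetical item bank: item ID -> item reliability.
bank = {"Q1": 0.22, "Q2": 0.18, "Q3": 0.20, "Q4": 0.12, "Q5": 0.15, "Q6": 0.10}

form, projected_variance = build_form(bank, target_variance=0.5)
print(form, round(projected_variance, 4))
```

In practice a developer would also balance content coverage and difficulty, so a loop like this is a starting point rather than a complete form-assembly method.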

Hi Austin,

Could you provide a sample calculation to establish a value for Cronbach’s Alpha using the standard deviations of items’ scores and items’ reliabilities? What is the standard deviation of an item’s scores, and how do you use it in the calculation?

An interesting question, for example, is whether such a calculation can provide a valid basis for the rule of thumb that tests need to contain approximately 40 reasonably well-discriminating four-option multiple-choice questions to be reliable enough (say, a Cronbach’s Alpha of 0.8).

I hope you are willing to respond to this question.

Hi Silvester,

I am happy to! It may be tough to write out the equation in this comment box, but let’s give it a shot. Here is a pseudo-formula, adapted from Crocker and Algina’s “Introduction to Classical and Modern Test Theory” (2008):

alpha = (k / (k - 1)) * (1 - sum(item variances) / (sum(item reliabilities))^2)

where:

k = the number of items on the form

item variance = the variance of the item scores for each item

item reliability = the item reliability for each item

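The standard deviation for item scores is just the regular definition of standard deviation; there is no modification needed for dichotomous data. Specifically, the standard deviation is the square root of the sum of squared deviations from the item's mean score, divided by the sample size. For a dichotomous item scored 0 or 1, this works out to the square root of p(1 - p), where p is the proportion of participants who answered correctly.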
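For a sample calculation, here is a short Python sketch that applies the pseudo-formula to a small made-up data set and compares the result with the usual form of Cronbach's Alpha based on total score variance. The responses are invented. The sketch assumes population variances (dividing by N) and an item-total correlation that includes the item in the total, so treat it as an illustration rather than the exact computation behind the report.

```python
import numpy as np

# Made-up responses: 6 participants x 4 dichotomous items (1 = correct).
responses = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])

k = responses.shape[1]                  # number of items on the form
total = responses.sum(axis=1)           # total score per participant
item_var = responses.var(axis=0)        # item variances (ddof=0)
item_sd = np.sqrt(item_var)             # item standard deviations

# Item reliability = item SD x item-total correlation.
item_total_r = np.array(
    [np.corrcoef(responses[:, j], total)[0, 1] for j in range(k)]
)
item_rel = item_sd * item_total_r

# Alpha from item variances and item reliabilities (the pseudo-formula).
alpha_item_stats = k / (k - 1) * (1 - item_var.sum() / item_rel.sum() ** 2)

# Alpha from item variances and total score variance, for comparison.
alpha_standard = k / (k - 1) * (1 - item_var.sum() / total.var())

print(f"alpha from item statistics: {alpha_item_stats:.3f}")
print(f"alpha from total variance:  {alpha_standard:.3f}")
```

Both lines print 0.667 for this data set, because the squared sum of the item reliabilities equals the total score variance.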

It would be interesting to investigate the rule-of-thumb that you referenced, and I bet it could be done with a simple simulation study. Please let us know if you end up doing this research!

Thanks,

Austin
