How many questions do I need on my assessment?


Posted by Greg Pope

I was recently asked a common question about creating assessments: How many questions are needed on an assessment in order to obtain valid and reliable participant scores? The answer depends on the context and purpose of the assessment and on how the scores are used. For example, if an organization is administering a low-stakes quiz designed to facilitate learning during study, with on-the-spot question-level feedback and no summary scores, then one question would be enough (although more would probably serve the intended purpose better). If no summary scores are calculated (e.g., an overall assessment score), or if those overall scores are not used for anything, then very small numbers of questions are fine.

However, if an organization is administering an end-of-course exam that a participant has to pass in order to complete a course, the number of questions on that exam is important. (A few questions aren't going to cut it!) The psychometric issue is whether a handful of questions provides enough measurement information to support the conclusions drawn from the score (e.g., does this participant know enough to be considered proficient?).

Ever wonder why you have to answer so many questions on a certification or licensing exam? One rarely gets to answer only 2-3 questions on a driving test, and certainly not on a chartered accountant licensing exam; such exams often run close to 100 questions. One reason is that more individual measurements of what a participant knows and can do are needed to ensure that the reliability of the scores obtained is high (and therefore that the error is low). Each question is an individual measurement, and if we asked a participant only one question on an accounting licensing exam, we would be unlikely to get a reliable estimate of the participant's accounting knowledge and skills. Reliability is required for an assessment score to be considered valid, and generally the more questions on an assessment (up to a practical limit), the higher the reliability.

Generally, an organization would have a target reliability value in mind, which helps determine the minimum number of questions needed to achieve the measurement accuracy required in a given context. For example, in a high-stakes testing program where people are certified or licensed based on their assessment scores, a reliability of 0.9 or higher (the closer to 1 the better) would likely be required. Once a minimum reliability target is established, one can estimate how many items are needed to achieve it. An organization could administer a pilot (beta) test of an assessment and run the Test Analysis Report to obtain the Cronbach's Alpha test reliability coefficient. One could then use the Spearman-Brown prophecy formula (described further in "Psychometric Theory" by Nunnally & Bernstein, 1994) to estimate how much the internal consistency reliability will increase if the number of questions on the assessment increases:

r_kk = (k × r_11) / (1 + (k − 1) × r_11)

Where:

  • k = the increase in length of the assessment (e.g., k = 3 would mean the assessment is 3x longer)
  • r_11 = the existing internal consistency reliability of the assessment
  • r_kk = the estimated reliability of the lengthened assessment
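The formula translates directly into code. Here is a minimal sketch in Python (the function name spearman_brown is ours for illustration, not part of any Questionmark product):

```python
def spearman_brown(r11: float, k: float) -> float:
    """Estimate reliability after changing test length by a factor of k.

    r11 -- current internal consistency reliability (e.g., Cronbach's Alpha)
    k   -- factor by which the number of questions changes
           (k = 3 means the assessment becomes 3x longer)
    """
    return (k * r11) / (1 + (k - 1) * r11)
```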

For example, if the Cronbach's Alpha reliability coefficient of a 20-item exam is 0.70 and 40 items are added to the assessment (tripling the length of the test, so k = 3), the estimated reliability of the new 60-item exam is:

r_kk = (3 × 0.70) / (1 + (3 − 1) × 0.70) = 2.10 / 2.40 ≈ 0.88
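Or, using the spearman_brown sketch above:

```python
estimated = spearman_brown(0.70, 3)  # 20-item exam lengthened to 60 items
print(f"{estimated:.2f}")            # prints 0.88
```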

Let's look at this information visually: [chart: estimated reliability plotted against assessment length]
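A quick tabulation shows the same diminishing-returns pattern; this sketch assumes the 20-item, 0.70-reliability exam from the example above:

```python
# Reliability gains shrink as the test gets longer (diminishing returns).
r11 = 0.70                      # reliability of the original 20-item exam
for k in (1, 2, 3, 4, 5):
    r_kk = (k * r11) / (1 + (k - 1) * r11)
    print(f"{20 * k:3d} items -> estimated reliability {r_kk:.2f}")
```

This prints 0.70 at 20 items, 0.82 at 40, 0.88 at 60, 0.90 at 80, and 0.92 at 100: each additional block of questions buys a smaller gain than the one before, which is why there is a practical limit to lengthening a test.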

If you would like to learn more about validity and reliability, see our white paper, "Defensible Assessments: What you need to know."

I hope this helps to shed light on this burning psychometric issue!

3 Responses to “How many questions do I need on my assessment?”

  1. Padmavathy says:

    Sir,

I have collected 43 items for a construct from the literature in various contexts. Now, before pilot testing, I need face and content validity. Should I proceed like this: I have defined the construct for my research; I circulate the questionnaire with the definition and ask the experts to rate each item as essential, useful but not essential, or not essential; and then I apply Lawshe's formula.

My query is: if this is content validity, then which is face validity? If an expert has rated an item as not essential, should I drop it at that juncture? What if an item is rated as useful, not essential?

Please send me details on how to proceed further with my research.
    Thank you.

  2. admin says:

Hello there, I have written a few blog articles on validity that you may want to check out (http://blog.questionmark.com/understanding-assessment-validity-content-validity), but if you require in-depth knowledge in this area I would recommend the book "Test Validity". Face validity is often contrasted with content validity, as the Wikipedia definition of face validity notes. Generally I would say that if a domain expert rates a question as not essential to measuring the construct, then this question does not need to be part of the assessment. Real estate on an assessment is extremely valuable, and generally only questions that are essential to measuring the construct are added. I hope this helps!

  3. Ling says:

    Sir,

Can you only compute Cronbach's Alpha on Likert-scale questions? I have 2 questions on parental influence on a Likert scale and another 2 that are multiple-choice questions. Can I still put them all together and run Cronbach's Alpha?

    Ling
