When and where should I use randomly delivered assessments?


Posted by Greg Pope

I am often asked my psychometric opinion regarding when and where random administration of assessments is most appropriate.

To refresh memories, this is a feature in Questionmark Perception Authoring Manager that allows you to select questions at random from one or more topics when creating an assessment. Rather than administering the same 10 questions to all participants, you can give each participant a different set of questions that are pulled at random from the bank of questions in the repository.
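To make the idea concrete, here is a minimal sketch in Python of drawing a different form for each participant. The bank layout, topic names, and the `draw_form` helper are all hypothetical; this is only an illustration of the concept, not Questionmark Perception's data model or API.

```python
import random

# Hypothetical question bank: topic name -> list of question IDs.
# Illustration only; this is not how Perception stores its repository.
BANK = {
    "safety": [f"S{i}" for i in range(1, 21)],
    "procedures": [f"P{i}" for i in range(1, 21)],
}

def draw_form(bank, per_topic, rng):
    """Pick per_topic[topic] questions at random, without replacement, from each topic."""
    form = []
    for topic, count in per_topic.items():
        form.extend(rng.sample(bank[topic], count))
    return form

rng = random.Random(42)  # seeded only so the sketch is repeatable
form_a = draw_form(BANK, {"safety": 5, "procedures": 5}, rng)
form_b = draw_form(BANK, {"safety": 5, "procedures": 5}, rng)

print("Participant A:", form_a)
print("Participant B:", form_b)
```

Both participants answer 10 questions covering the same two topics, but the specific questions will generally differ between them.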

So when is it appropriate to use random administration? I think that depends on the answer to this question: What are the assessment’s stakes and purpose? If the stakes are low and the assessment scores are used to help reinforce information learned, or to give participants a rough sense of how they are doing in an area, I would say that using random administration is defensible. However, if the stakes are medium/high and the assessment scores are used for advancing or certifying participants, I usually caution against random administration. Here are a few reasons why:

  • Expert review of the assessment form(s) cannot be conducted in advance (each participant gets a unique form)
      • Generally SMEs, psychometricians, and other experts will thoroughly review a test form before it is put into live production. This is to ensure that the form meets difficulty, content and other criteria before being administered to participants in a medium/high stakes context. In the case of randomly administered assessments, this review in advance is not possible as every participant obtains a different set of questions.
  • Issues with the calculation of question statistics using Classical Test Theory (CTT)
      • Smaller numbers of participants will be answering each individual question. (Rather than all 200 participants answering all 50 questions in a fixed form test, randomly administered tests generated from a bank of 100 questions may only have a few participants answering each question.)
      • As we saw in a previous blog post, sample size has an effect on the robustness of item statistics. With fewer participants taking each question it becomes difficult to have confidence in the stability of the statistics generated.
  • Equivalency of assessment scores is difficult to achieve and prove
      • An important assumption of CTT is equivalence of forms or parallel forms. In assessment contexts where more than one form of an exam is administered to participants, a great deal of time is spent ensuring that the forms of the assessment are parallel in every way possible (e.g., difficulty of questions, blueprint coverage, question types, etc.) so that the scores participants obtain are equivalent.
      • With random administration it is not possible to control and verify in advance of an assessment session that the forms are parallel because the questions are pulled at random. This leads to the following problem in terms of the equivalence of participant scores:
          • If one participant got 2/10 on a randomly administered assessment and another participant got 8/10 on the same randomly administered assessment, it would be difficult to know whether the participant who got 2/10 scored low because they (by chance) got harder questions than the participant who got 8/10, or because they actually did not know the material.
      • Using meta tags one can mitigate this issue to some degree (e.g., by randomly administering questions within topics by difficulty ranges and other meta tag data) but this would not completely guarantee randomly equivalent forms.
  • Issues with calculation of test reliability statistics using CTT
      • Statistics such as Cronbach’s Alpha are difficult to calculate for randomly administered assessments. Random administration produces a lot of missing data for questions (i.e., not all participants answer all questions), which psychometric statistics rarely handle well.
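The sample-size and equivalence concerns above can be illustrated with a small simulation. What follows is a rough sketch under stated assumptions, not a psychometric analysis: the 200 participants and 100-question bank come from the example above, while the 10-question form length, the five difficulty bands, and the p-values themselves (drawn uniformly between 0.3 and 0.9) are made up for illustration. It compares a plain random draw with a draw stratified by a hypothetical difficulty meta tag.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # seeded only so the sketch is repeatable

N_PARTICIPANTS = 200  # from the example in the text
BANK_SIZE = 100       # from the example in the text
FORM_LENGTH = 10      # assumed form length
N_BANDS = 5           # assumed number of difficulty bands (a hypothetical meta tag)

# Made-up p-values (proportion of participants answering each question correctly).
p_values = [rng.uniform(0.3, 0.9) for _ in range(BANK_SIZE)]

# Tag questions with a difficulty band by sorting on p-value and slicing.
by_difficulty = sorted(range(BANK_SIZE), key=lambda q: p_values[q])
band_size = BANK_SIZE // N_BANDS
bands = [by_difficulty[i * band_size:(i + 1) * band_size] for i in range(N_BANDS)]

def simple_form():
    """Draw the whole form at random from the full bank."""
    return rng.sample(range(BANK_SIZE), FORM_LENGTH)

def stratified_form():
    """Draw the same number of questions at random from each difficulty band."""
    per_band = FORM_LENGTH // N_BANDS
    form = []
    for band in bands:
        form.extend(rng.sample(band, per_band))
    return form

def simulate(draw):
    """Return responses per question and the mean p-value of each participant's form."""
    counts = [0] * BANK_SIZE
    difficulties = []
    for _ in range(N_PARTICIPANTS):
        form = draw()
        for q in form:
            counts[q] += 1
        difficulties.append(mean(p_values[q] for q in form))
    return counts, difficulties

simple_counts, simple_diff = simulate(simple_form)
strat_counts, strat_diff = simulate(stratified_form)

expected_per_question = N_PARTICIPANTS * FORM_LENGTH / BANK_SIZE
print(f"Expected responses per question: {expected_per_question:.0f}")
print(f"Fewest responses to any question (simple draw): {min(simple_counts)}")
print(f"Form mean p-value, simple draw: {min(simple_diff):.2f} to {max(simple_diff):.2f}")
print(f"Form mean p-value, stratified draw: {min(strat_diff):.2f} to {max(strat_diff):.2f}")
print(f"SD of form mean p-value: simple {pstdev(simple_diff):.3f}, "
      f"stratified {pstdev(strat_diff):.3f}")
```

Under these assumptions each question is answered by about 20 participants on average, rather than by all 200 as in a fixed form. The stratified draw should narrow the spread in form difficulty but not remove it, which is the sense in which meta tags mitigate the equivalence problem without guaranteeing parallel forms.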

There are alternatives to random administration, depending on what the needs are. For example, if random administration is being considered as a way to curb cheating, options such as shuffling answer choices and randomizing presentation order could serve this need, making it very difficult for participants to copy answers from one another.
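As a rough sketch of that alternative, the Python below keeps one fixed form for everyone and varies only the order of questions and answer choices per participant. The question content, the `personalise` helper, and the idea of seeding by participant ID are assumptions made for illustration; they do not describe how Perception implements shuffling.

```python
import random

# Hypothetical fixed form: every participant answers these same questions.
QUESTIONS = [
    {"id": "Q1", "choices": ["red", "green", "blue", "yellow"], "answer": "green"},
    {"id": "Q2", "choices": ["1", "2", "3", "4"], "answer": "3"},
    {"id": "Q3", "choices": ["north", "south", "east", "west"], "answer": "east"},
]

def personalise(questions, participant_id):
    """Return the same questions in a participant-specific order,
    with each question's answer choices shuffled as well."""
    rng = random.Random(participant_id)  # same ID always gives the same layout
    ordered = rng.sample(questions, len(questions))
    return [
        {**q, "choices": rng.sample(q["choices"], len(q["choices"]))}
        for q in ordered
    ]

for participant in ("participant-1", "participant-2"):
    layout = personalise(QUESTIONS, participant)
    print(participant, [(q["id"], q["choices"]) for q in layout])
```

Because the correct answer is stored by value rather than by position, scoring is unaffected by the shuffle, and because the question set is identical for everyone, the form can still be reviewed in advance and scores remain directly comparable.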

It is important for each organization to look at its own context to determine what works best. Questionmark provides many options for our customers when it comes to assessment solutions, and we invite them to work with us in adopting workable solutions.
