Checklists for Test Development

Posted by Austin Fossey

There are many fantastic books about test development, and there are standards systems such as The Standards for Educational and Psychological Testing. There are also principled frameworks for test development and design, such as evidence-centered design (ECD). But it seems that the supply of qualified test developers cannot keep up with the increased demand for high-quality assessment data, leaving many organizations to piece together assessment programs, learning as they go.

As one might expect, this scenario leads to new tools targeted at these rookie test developers: simplified guidance documents, trainings, and resources attempting to make test development foolproof. As a case in point, Questionmark seeks to distill information from a variety of sources into helpful, easy-to-follow white papers and blog posts. At an even simpler level, there appears to be increased demand for checklists that new test developers can use to guide test development or evaluate assessments.

For example, my colleague, Bart Hendrickx, shared a Dutch article from the Research Center for Examination and Certification (RCEC) at the University of Twente describing their Beoordelingssysteem (evaluation system). He explained that this system provides a rubric for evaluating educational assessments in areas like representativeness, reliability, and standard setting. The Buros Center for Testing addresses similar needs for users of mental assessments. In the Assessment Literacy section of their website, Buros has documents with titles like “Questions to Ask When Evaluating a Test,” which are essentially evaluation checklists (though Buros also provides their own professional ratings of published assessments). There are even assessment software packages that seek to operationalize a test development checklist by creating a rigid workflow that guides the test developer through different steps of the design process.
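
To make that last idea concrete, here is a minimal sketch (in Python, with hypothetical class and field names) of how an evaluation checklist like the RCEC’s might be represented in software, so that criteria in areas such as representativeness, reliability, and standard setting can be tracked and reported. It is an illustration of the concept, not any vendor’s actual data model.

    from dataclasses import dataclass, field

    @dataclass
    class Criterion:
        area: str          # e.g., "representativeness", "reliability", "standard setting"
        question: str      # the question the evaluator must answer
        satisfied: bool = False
        notes: str = ""

    @dataclass
    class EvaluationChecklist:
        assessment_name: str
        criteria: list = field(default_factory=list)

        def outstanding(self) -> list:
            """Return the criteria not yet marked as satisfied."""
            return [c for c in self.criteria if not c.satisfied]

    # Hypothetical usage
    checklist = EvaluationChecklist("Certification Exam v2", [
        Criterion("representativeness", "Does the blueprint cover the full domain?"),
        Criterion("reliability", "Is the reliability estimate adequate for the stakes?"),
        Criterion("standard setting", "Was the cut score set with a documented method?"),
    ])
    for c in checklist.outstanding():
        print(f"[{c.area}] {c.question}")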

The benefit of these resources is that they can help guide new test developers through basic steps and considerations as they build their instruments. It is certainly a step up from a company compiling a bunch of multiple-choice questions on the fly and setting a cut score of 70% without any backing theory or test purpose. On the other hand, test development is supposed to be an iterative process, and without the flexibility to explore the nuances and complexities of the instrument, the results and the inferences may fall short of their targets. An overly simple, standardized checklist for developing or evaluating assessments may not consider an organization’s specific measurement needs, and the program may be left with considerable blind spots in its validity evidence.
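
To illustrate what a little backing theory looks like in practice, here is a minimal sketch of one common standard-setting alternative to an arbitrary 70% cut: a modified Angoff calculation, in which judges estimate the probability that a minimally competent candidate would answer each item correctly. All of the ratings below are hypothetical.

    # Rows = judges, columns = items; each value is a judge's estimate of the
    # probability that a minimally competent candidate answers the item correctly.
    angoff_ratings = [
        [0.60, 0.75, 0.40, 0.90, 0.55],  # Judge 1
        [0.65, 0.70, 0.50, 0.85, 0.60],  # Judge 2
        [0.55, 0.80, 0.45, 0.95, 0.50],  # Judge 3
    ]

    num_items = len(angoff_ratings[0])

    # Each judge's expected raw score for a borderline (minimally competent) candidate
    expected_scores = [sum(judge) for judge in angoff_ratings]

    # The recommended cut score is the mean expected score across judges
    raw_cut = sum(expected_scores) / len(expected_scores)

    print(f"Recommended cut score: {raw_cut:.2f} out of {num_items} items "
          f"({100 * raw_cut / num_items:.0f}%)")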

Overall, I am glad to see that more organizations want to improve the quality of their measurements, and it is encouraging to see more training resources that help new test developers tackle the learning curve. Checklists can be a helpful tool in many applications, and test developers frequently create their own checklists to standardize practices within their organizations, such as item reviews.

What do our readers think? Are checklists the way to go? Do you use a checklist from another organization in your test development?

Item Development – Managing the Process for Large-Scale Assessments

Posted by Austin Fossey

Whether you work with low-stakes assessments, small-scale classroom assessments, or large-scale, high-stakes assessments, understanding and applying some basic principles of item development will greatly enhance the quality of your results.

This is the first in a series of posts setting out item development steps that will help you create defensible assessments. Although I’ll be addressing the requirements of large-scale, high-stakes testing, the fundamental considerations apply to any assessment.

You can find previous posts here about item development, including how to write items, review items, increase complexity, and avoid bias. This series will review some of what’s come before, but it will also explore new territory. For instance, I’ll discuss how to organize and execute different steps in item development with subject matter experts. I’ll also explain how to collect information that will support the validity of the results and the legal defensibility of the assessment.

In this series, I’ll take a look at:

[Figure: Item development steps]

These are common steps (adapted from Crocker and Algina’s Introduction to Classical and Modern Test Theory) taken to create the content for an assessment. Each step requires careful planning, implementation, and documentation, especially for high-stakes assessments.

This looks like a lot of steps, but item development is just one slice of assessment development. Before item development can even begin, there’s plenty of work to do!

In their article, Design and Discovery in Educational Assessment: Evidence-Centered Design, Psychometrics, and Educational Data Mining, Mislevy, Behrens, Dicerbo, and Levy provide an overview of Evidence-Centered Design (ECD). In ECD, test developers must define the purpose of the assessment, conduct a domain analysis, model the domain, and define the conceptual assessment framework before beginning assessment assembly, which includes item development.
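
As a rough illustration (my own, not from the authors’ article), the ordering that ECD imposes could be encoded as a simple gate on item development; the phase names below paraphrase the layers described above.

    from enum import IntEnum

    class EcdPhase(IntEnum):
        DEFINE_PURPOSE = 1
        DOMAIN_ANALYSIS = 2
        DOMAIN_MODELING = 3
        CONCEPTUAL_ASSESSMENT_FRAMEWORK = 4
        ASSESSMENT_ASSEMBLY = 5  # item development happens within this phase

    def can_begin_item_development(completed_phases: set) -> bool:
        """Item development should not start until every phase preceding
        assessment assembly is complete and documented."""
        prerequisites = {p for p in EcdPhase if p < EcdPhase.ASSESSMENT_ASSEMBLY}
        return prerequisites <= completed_phases

    # Hypothetical usage: domain modeling has not been completed yet
    done = {EcdPhase.DEFINE_PURPOSE, EcdPhase.DOMAIN_ANALYSIS}
    print(can_begin_item_development(done))  # False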

Once we’ve completed these preparations, we are ready to begin item development. In the next post, I will discuss considerations for training our item writers and item reviewers.

Teaching to the test and testing to what we teach

Austin FosseyPosted by Austin Fossey

We have all heard assertions that widespread assessment creates a propensity for instructors to “teach to the test.” This often conjures images of students memorizing facts without context in order to eke out passing scores on a multiple-choice assessment.

But as Jay Phelan and Julia Phelan argue in their essay, Teaching to the (Right) Test, teaching to the test is usually only problematic when we have a faulty test. When our curriculum, instruction, and assessment are aligned, teaching to the test can be beneficial because we are testing what we taught. We can flip this around and assert that we should be testing to what we teach.

There is little doubt that poorly designed assessments have made their way into some slices of our educational and professional spheres. Bad assessment designs can stem from shoddy domain modeling, improper item types, or poor reporting.

Nevertheless, valid, reliable, and actionable assessments can improve learning and performance. When we teach to a well-designed assessment, we should be teaching what we would have taught anyway, but now we have a meaningful measurement instrument that can help students and instructors improve.

I admit that there are constructs like creativity and teamwork that are more difficult to define, and designing appropriate assessments for these learning goals can be challenging. We may instinctively cringe at the thought of assessing an area like creativity; I would hate to see a percentage score assigned to my creativity.

But if creativity is a learning goal, we should be collecting evidence that helps us support the argument that our students are learning to be creative. A multiple-choice test may be the wrong tool for that job, but we can use frameworks like evidence-centered design (ECD) to decide what information we want to collect (and the best methods for collecting it) to demonstrate our students’ creativity.

Assessments have evolved a lot over the past 25 years, and with better technology and design, test developers can improve the validity of the assessments and their utility in instruction. This includes new item types, simulation environments, improved data collection, a variety of measurement models, and better reporting of results. In some programs, the assessment is actually embedded in the everyday work or games that the participant would be interacting with anyway—a strategy that Valerie Shute calls stealth assessment.
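
As a small example of that “variety of measurement models,” here is a sketch of the one-parameter (Rasch) IRT model, which many modern programs use in place of simple percent-correct scoring. The ability and difficulty values below are hypothetical.

    import math

    def rasch_probability(theta: float, b: float) -> float:
        """Probability that a person with ability theta answers an item of
        difficulty b correctly, under the one-parameter (Rasch) IRT model."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    # Hypothetical values: a slightly above-average candidate (theta = 0.5)
    # attempting an item of average difficulty (b = 0.0)
    print(f"{rasch_probability(0.5, 0.0):.2f}")  # ~0.62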

With a growing number of tools available to us, test developers should always be striving to improve how we test what we teach so that we can proudly teach to the test.