Checklists for Test Development

Posted by Austin Fossey

There are many fantastic books about test development, and there are many standards systems for test development, such as The Standards for Educational and Psychological Testing. There are also principled frameworks for test development and design, such as evidence-centered design (ECD). But it seems that the supply of qualified test developers cannot keep up with the increased demand for high-quality assessment data, leaving many organizations to piece together assessment programs, learning as they go.

As one might expect, this scenario leads to new tools targeted at these rookie test developers—simplified guidance documents, trainings, and resources attempting to idiot-proof test development. As a case in point, Questionmark seeks to distill information from a variety of sources into helpful, easy-to-follow white papers and blog posts. At an even simpler level, there appears to be increased demand for checklists that new test developers can use to guide test development or evaluate assessments.

For example, my colleague, Bart Hendrickx, shared a Dutch article from the Research Center for Examination and Certification (RCEC) at the University of Twente describing their Beoordelingssysteem (evaluation system). He explained that this system provides a rubric for evaluating educational assessments in areas like representativeness, reliability, and standard setting. The Buros Center for Testing addresses similar needs for users of mental assessments. In the Assessment Literacy section of their website, Buros has documents with titles like “Questions to Ask When Evaluating a Test”—essentially an evaluation checklist (though Buros also provides their own professional ratings of published assessments). There are even assessment software packages that seek to operationalize a test development checklist by creating a rigid workflow that guides the test developer through different steps of the design process.
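
To make that "rigid workflow" idea concrete, here is a minimal sketch of how such software might gate each stage of test development behind sign-off on the previous one. The step names are illustrative, not taken from any particular product:

```python
# A minimal sketch of a checklist "operationalized" as a rigid workflow:
# each step unlocks only after the previous one is signed off.
# Step names are illustrative, not from any real product.
WORKFLOW = [
    "define test purpose",
    "domain analysis",
    "test blueprint",
    "item writing",
    "item editing",
    "content review",
    "bias review",
    "field test",
    "standard setting",
]

def next_step(completed: set[str]) -> str | None:
    """Return the first step not yet signed off, enforcing strict order."""
    for step in WORKFLOW:
        if step not in completed:
            return step
    return None  # checklist complete

print(next_step({"define test purpose", "domain analysis"}))  # -> "test blueprint"
```

The trade-off described below follows directly from this design: strict ordering keeps novices on track, but it leaves little room for iteration.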

The benefit of these resources is that they can help guide new test developers through basic steps and considerations as they build their instruments. It is certainly a step up from a company compiling a bunch of multiple choice questions on the fly and setting a cut score of 70% without any backing theory or test purpose. On the other hand, test development is supposed to be an iterative process, and without the flexibility to explore the nuances and complexities of the instrument, the results and the inferences may fall short of their targets. An overly simple, standardized checklist for developing or evaluating assessments may not consider an organization’s specific measurement needs, and the program may be left with considerable blind spots in its validity evidence.

Overall, I am glad to see that more organizations want to improve the quality of their measurements, and it is encouraging to see more training resources that help new test developers tackle the learning curve. Checklists can be very helpful tools in many applications, and test developers frequently create their own checklists to standardize practices within their organizations, such as item reviews.

What do our readers think? Are checklists the way to go? Do you use a checklist from another organization in your test development?

Item Development – Organizing a bias review committee (Part 1)

Posted by Austin Fossey

Once the content review is completed, it is time to turn the items over to a bias review committee. In previous posts, we have talked about methods for detecting bias in item performance using differential item functioning (DIF) analysis, but DIF analysis can only be done after the items have been delivered and item responses are available.
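
For readers curious about what that post-hoc analysis involves, here is a minimal sketch of a Mantel-Haenszel DIF statistic in Python. The data layout, field names, and group labels are assumptions for illustration, not the output of any particular tool:

```python
# A minimal Mantel-Haenszel DIF sketch, assuming dichotomous (0/1) item
# scores. Participants are stratified by total score, and a 2x2 table of
# group x correctness is tallied within each stratum.
import math
from collections import defaultdict

def mh_delta(rows, item):
    """rows: dicts with 'group' ('ref' or 'focal'), 'total' (matching
    score), and a 0/1 entry for the item of interest."""
    # Per stratum: [A, B, C, D] =
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for r in rows:
        correct = r[item] == 1
        if r["group"] == "ref":
            strata[r["total"]][0 if correct else 1] += 1
        else:
            strata[r["total"]][2 if correct else 3] += 1
    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    # Common odds ratio, then the ETS delta scale; |delta| >= 1.5 is
    # conventionally flagged as large DIF (ETS category C).
    alpha = num / den
    return -2.35 * math.log(alpha)
```

Real DIF analyses need more care than this (zero cells, sample-size requirements, purification of the matching criterion), which is exactly why committee review before delivery is so valuable.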

Your bias review committee is tasked with identifying sources of bias before the assessment is ever delivered, so that items can be edited or removed before they are presented to a participant sample (though you can conduct bias reviews at any stage of item development).

The Standards for Educational and Psychological Testing explain that bias occurs when the design of the assessment results in different interpretations of scores for subgroups of participants. This implies that some aspect of the assessment is impacting scores based on factors that are not related to the measured construct. This is called construct-irrelevant variance.
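
One simplified way to picture this (assuming the components are uncorrelated) is as a decomposition of observed score variance:

σ²(observed) = σ²(construct) + σ²(construct-irrelevant) + σ²(error)

A bias review targets the middle term: score variation that is systematic and reproducible, but driven by something other than the construct you intend to measure.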

The Standards emphasize that a lack of bias is critical for supporting the overall fairness of the assessment, so your bias review committee will provide evidence to help demonstrate your compliance with the Standards. Before you convene your bias review committee, you should finalize a set of sensitivity guidelines that define the criteria for identifying sources of bias in your assessment.

As with your other committees, the members of this committee should be carefully selected based on their qualifications and representativeness, and they should not have been involved in any other test development processes, such as domain analysis, item writing, or content review. In his chapter in Educational Measurement (4th ed.), Gregory Camilli suggests building a committee of five to ten or more members who will operate under the principle that “all students should be treated equitably.”

Camilli recommends carefully documenting all aspects of the bias review, including the qualifications and selection process for the committee members. The committee should be trained on the test specifications and the sensitivity guidelines that will inform their decisions. Just like item writing or content review trainings, it is helpful to have the committee practice with some examples before they begin their review.

Camilli suggests letting committee members review items on their own after they complete their training. This gives them each a chance to critique items based on their unique perspectives and understanding of your sensitivity guidelines. Once they have had time to review the items on their own, have your committee reconvene to discuss the items as a group. The committee should strive to reach a consensus on whether items should be retained, edited, or removed completely. If an item needs to be edited, they should document their recommendations for changes. If an item is edited or removed, be sure they document the rationale by relating their decision back to your sensitivity guidelines.

In the next post, I will talk about two facets of assessments that can result in bias (content and response process), and I will share some examples of publications that have recommendations for bias criteria you can use for your own sensitivity guidelines.

Check out our white paper: 5 Steps to Better Tests for best practice guidance and practical advice for the five key stages of test and exam development.

Austin Fossey will discuss test development at the 2015 Users Conference in Napa Valley, March 10-13. Register before Dec. 17 and save $200.

Item Development – Benefits of editing items before the review process

Posted by Austin Fossey

Some test developers recommend a single round of item editing (or editorial review), usually right before items are field tested. When schedules and resources allow for it, I recommend that test developers conduct two rounds of editing—one right after the items are written and one after content and bias reviews are completed. This post addresses the first round of editing, to take place after items are drafted.

Why have two rounds of editing? In both rounds, we will be looking for grammar or spelling errors, but the first round serves as a filter to keep items with serious flaws from making it to content review or bias review.

In their chapter in Educational Measurement (4th ed.), Cynthia Schmeiser and Catherine Welch explain that an early round of item editing “serves to detect and correct deficiencies in the technical qualities of the items and item pools early in the development process.” They recommend that test developers use this round of item editing to do a cursory review of whether the items meet the Standards for Educational and Psychological Testing.

Items that have obvious item writing flaws should be culled in the first round of item editing and either sent back to the item writers or removed. This may include item writing errors like cluing or having options that do not match the stem grammatically. Ideally, these errors will be caught and corrected in the drafting process, but a few items may have slipped through the cracks.

In the initial round of editing, we will also be looking for proper formatting of the items. Did the item writers use the correct item types for the specified content? Did they follow the formatting rules in our style guide? Is all supporting content (e.g., pictures, references) present in the item? Did the item writers record all of the metadata for the item, like its content area, cognitive level, or reference? Again, if an item does not match the required format, it should be sent back to the item writers or removed.
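
Some of these first-pass checks can even be automated before a human editor looks at the items. Here is a hypothetical sketch; the field names, allowed item types, and flaw heuristics are illustrative and would need to match your own style guide:

```python
# A hypothetical first-pass item check. Field names, allowed types, and
# flaw heuristics are illustrative; adapt them to your own style guide.
REQUIRED_METADATA = ("content_area", "cognitive_level", "reference")
ALLOWED_TYPES = {"multiple_choice", "multiple_response"}

def first_pass_flags(item: dict) -> list[str]:
    """Return reasons to send a draft item back to its writer."""
    flags = []
    if item.get("type") not in ALLOWED_TYPES:
        flags.append("item type not allowed for this content")
    options = item.get("options", [])
    if len(set(options)) != len(options):
        flags.append("duplicate options")
    key = item.get("key")
    # A common cluing heuristic: the keyed answer is noticeably longer
    # than every distractor.
    if key and options and all(len(key) > len(o) for o in options if o != key):
        flags.append("key is the longest option (possible cluing)")
    flags += [f"missing metadata: {f}" for f in REQUIRED_METADATA if not item.get(f)]
    return flags
```

Anything flagged goes back to the writer; the human editor can then focus on the judgment calls a script cannot make.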

It is helpful to look for these issues before going to content review or bias review because these types of errors can distract your review committees from their tasks; the committees may waste time reviewing items that should not be delivered anyway due to formatting flaws. You do not want to get all the way through content and bias reviews only to find that a large number of your items have to be returned to the drafting process. We will discuss review committee processes in the following posts.

For best practice guidance and practical advice for the five key stages of test and exam development, check out our white paper: 5 Steps to Better Tests.

Item Development – Five Tips for Organizing Your Drafting Process

Posted by Austin Fossey

Once you’ve trained your item writers, they are ready to begin drafting items. But how should you manage this step of the item development process?

There is an enormous amount of literature about item design and item writing techniques—which we will not cover in this series—but as Cynthia Schmeiser and Catherine Welch observe in their chapter in Educational Measurement (4th ed.), there is very little guidance about the item writing process. This is surprising, given that item writing is critical to effective test development.

It may be tempting to let your item writers loose in your authoring software with a copy of the test specifications and see what comes back, but if you invest time and effort in organizing your item drafting sessions, you are likely to retain more items and better support the validity of the results.

Here are five considerations for organizing item writing sessions:

  • Assignments – Schmeiser and Welch recommend giving each item writer a specific assignment to set expectations and to ensure that you build an item bank large enough to meet your test specifications. If possible, distribute assignments evenly so that no single author has undue influence over an entire area of your test specifications. Set realistic goals for your authors, keeping in mind that some of their items will likely be dropped later in item reviews (see the sketch after this list).
  • Instructions – In the previous post, we mentioned the benefit of a style guide for keeping item formats consistent. You may also want to give item writers instructions or templates for specific item types, especially if you are working with complex item types. (You should already have defined the types of items that can be used to measure each area of your test specifications in advance.)
  • Monitoring – Monitor item writers’ progress and spot-check their work. This is not a time to engage in full-blown item reviews, but periodic checks can help you to provide feedback and correct misconceptions. You can also check in to make sure that the item writers are abiding by security policies and formatting guidelines. In some item writing workshops, I have also asked item writers to work in pairs to help check each other’s work.
  • Communication – With some item designs, several people may be involved in building the item. One team may be in charge of developing a scoring model, another team may draft content, and a third team may add resources or additional stimuli, like images or animations. These teams need to be organized so that materials are
    handed off on time, but they also need to be able to provide iterative feedback to each other. For example, if the content team finds a loophole in the scoring model, they need to be able to alert the other teams so that it can be resolved.
  • Be Prepared – Be sure to have a backup plan in case your item writing sessions hit a snag. Know what you are going to do if an item writer does not complete an assignment or if content is compromised.
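
As promised in the Assignments bullet above, here is a minimal sketch of one way to spread assignments evenly across writers within each blueprint area. The blueprint areas, writer names, and 30% overage factor are illustrative assumptions:

```python
# A minimal sketch of distributing item-writing assignments so that no
# single writer dominates a blueprint area. Blueprint areas, writer
# names, and the overage factor are illustrative assumptions.
from itertools import cycle

blueprint = {"fractions": 12, "geometry": 20, "data analysis": 8}  # items needed
writers = ["writer_a", "writer_b", "writer_c"]
overage = 1.3  # draft ~30% extra, since some items will be cut in review

assignments = {w: [] for w in writers}
for area, needed in blueprint.items():
    pool = cycle(writers)  # rotate writers within each area
    for _ in range(round(needed * overage)):
        assignments[next(pool)].append(area)

for writer, areas in assignments.items():
    print(writer, {a: areas.count(a) for a in blueprint})
```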

Many of the details of the item drafting process will depend on your item types, resources, schedule, authoring software, and availability of item writers. Determine what you need to accomplish, and then organize your item writing sessions as much as possible so that you meet your goals.

In my next post, I will discuss the benefits of conducting an initial editorial review of the draft items before they are sent to review committees.

Item Development – Training Item Writers

Posted by Austin Fossey

Once we have defined the purpose of the assessment, completed our domain analysis, and finalized a test blueprint, we might be eager to jump right in to item writing, but there is one important step to take before we begin: training!

Unless you are writing the entire assessment yourself, you will need a group of item writers to develop the content. These item writers are likely experts in their fields, but they may have very little understanding of how to create assessment content. Even if these experts have experience writing items, it may be beneficial to provide refresher trainings, especially if anything has changed in your assessment design.

In their chapter in Educational Measurement (4th ed.), Cynthia Schmeiser and Catherine Welch note that it is important to consider the qualifications and representativeness of your item writers. It is common to ask item writers to fill out a brief survey to collect demographic information. You should keep these responses on file and possibly add a brief document explaining why you consider these item writers to be a qualified and representative sample.

Shmeiser and Welch also underscore the need for security. Item writers should be trained on your content security guidelines, and your organization may even ask them to sign an agreement stating that they will abide by those guidelines. Make sure everyone understands the security guidelines, and have a plan in place in case there are any violations.

Next, begin training your item writers on how to author items, which should include basic concepts about cognitive levels, drafting stems, picking distractors, and using specific item types appropriately. Schmeiser and Welch suggest that the test blueprint be used as the foundation of the training. Item writers should understand the content included in the specifications and the types of items they are expected to create for that content. Be sure to share examples of good and bad items.

If possible, ask your writers to create some practice items, then review their work and provide feedback. If they are using the item authoring software for the first time, be sure to acquaint them with the tools before they are given their item writing assignments.

Your item writers may also need training on your item data, delivery method, or scoring rules. For example, you may ask item writers to cite a reference for each item, or you might ask them to weight certain items differently. Your instructions need to be clear and precise, and you should spot-check your item writers’ work. If possible, write a style guide that includes clear guidelines about item construction, such as fonts to use, acceptable abbreviations, scoring rules, and acceptable item types.

I know from my own experience (and Schmeiser and Welch agree) that investing more time in training will have a big payoff down the line. Better training leads to substantially better item retention rates when items are reviewed. If your item writers are not trained well, you may end up throwing out many of their items, which may not leave you enough for your assessment design. Considering the cost of item development and the time spent writing and reviewing items, putting in a few more hours of training can equal big savings for your program in the long run.

In my next post, I will discuss how to manage your item writers as they begin the important work of drafting the items.

Item Development – Managing the Process for Large-Scale Assessments

Posted by Austin Fossey

Whether you work with low-stakes assessments, small-scale classroom assessments, or large-scale, high-stakes assessments, understanding and applying some basic principles of item development will greatly enhance the quality of your results.

This is the first in a series of posts setting out item development steps that will help you create defensible assessments. Although I’ll be addressing the requirements of large-scale, high-stakes testing, the fundamental considerations apply to any assessment.

You can find previous posts here about item development including how to write items, review items, increase complexity, and avoid bias. This series will review some of what’s come before, but it will also explore new territory. For instance, I’ll discuss how to organize and execute different steps in item development with subject matter experts. I’ll also explain how to collect information that will support the validity of the results and the legal defensibility of the assessment.

In this series, I’ll take a look at:

[Figure: the item development steps covered in this series, from training item writers and drafting items through editing, content review, bias review, field testing, and item analysis]

These are common steps (adapted from Crocker and Algina’s Introduction to Classical and Modern Test Theory) taken to create the content for an assessment. Each step requires careful planning, implementation, and documentation, especially for high-stakes assessments.

This looks like a lot of steps, but item development is just one slice of assessment development. Before item development can even begin, there’s plenty of work to do!

In their article, Design and Discovery in Educational Assessment: Evidence-Centered Design, Psychometrics, and Educational Data Mining, Mislevy, Behrens, Dicerbo, and Levy provide an overview of Evidence-Centered Design (ECD). In ECD, test developers must define the purpose of the assessment, conduct a domain analysis, model the domain, and define the conceptual assessment framework before beginning assessment assembly, which includes item development.

Once we’ve completed these preparations, we are ready to begin item development. In the next post, I will discuss considerations for training our item writers and item reviewers.