Question Type Report: Use Cases

Posted by Austin Fossey

A client recently asked me if there is a way to count the number of each type of item in their item bank, so I pointed them toward the Question Type Report in Questionmark Analytics. While this type of frequency data can also be easily pulled using our Results API, it can be useful to have a quick overview of the number of items (split out by item type) in the item bank.

The Question Type Report does not need to be run frequently (and Analytics usage stats reflect that observation), but the data can help indicate the robustness of an item bank.

This report is most valuable when applied to the topics for a specific assessment or a set of related assessments. While it might be nice to know that we have a total of 15,000 multiple choice (MC) items in the item bank, system-wide counts like these are only useful when we have a system-wide practical application, such as planning a full program translation or selling content to a partner.

This report can provide a quick profile of the population of the item bank or a topic when needed, though more detailed item tracking by status, topic, metatags, item type, and exposure is advisable for anyone managing a large-scale item development project. Below are some potential use cases for this simple report.
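As a quick illustration, the kind of frequency profile this report produces can be sketched in a few lines of Python. The item records and field names below are invented for the example; they are not the actual Questionmark data model or Results API schema.

```python
from collections import Counter

# Hypothetical item-bank export; records and field names are invented
# for illustration only.
item_bank = [
    {"topic": "Algebra", "type": "multiple_choice"},
    {"topic": "Algebra", "type": "essay"},
    {"topic": "Geometry", "type": "multiple_choice"},
    {"topic": "Algebra", "type": "multiple_choice"},
]

def type_frequencies(items, topic=None):
    """Count items by type, optionally restricted to a single topic."""
    in_scope = (i for i in items if topic is None or i["topic"] == topic)
    return Counter(i["type"] for i in in_scope)

print(type_frequencies(item_bank))                   # whole bank
print(type_frequencies(item_bank, topic="Algebra"))  # one topic
```

The same topic filter is what makes the report useful at the assessment level rather than only bank-wide.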

Test Development and Maintenance:
The Question Type Report’s value is primarily its ability to count the number of each type of item within a topic. If we know we have 80 MC items in a topic for a new assessment, and they all need to be reviewed by a bias committee, then we can plan accordingly.

Form Building:
If we are equating multiple forms using a common-item design, the report can help us determine how many items go on each form and the degree to which the forms can overlap. Even if we only have one form, knowing the number of items can help a test developer check that enough items are available to match the blueprint.
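To make the form-building arithmetic concrete, here is a minimal sketch of the capacity check for a common-item (anchor) design. All numbers are illustrative, not from any real program.

```python
# Capacity arithmetic for a common-item (anchor) equating design:
# F forms of length n sharing a common anchor of a items require
# a + F * (n - a) unique items.

def items_needed(num_forms, form_length, anchor_items):
    """Unique items required when every form shares the same anchor set."""
    return anchor_items + num_forms * (form_length - anchor_items)

def max_forms(bank_size, form_length, anchor_items):
    """How many forms a bank of a given size can support under that design."""
    return (bank_size - anchor_items) // (form_length - anchor_items)

print(items_needed(num_forms=3, form_length=60, anchor_items=20))  # 140
print(max_forms(bank_size=200, form_length=60, anchor_items=20))   # 4
```

Comparing `items_needed` against the per-topic counts from the Question Type Report shows at a glance whether the bank can support the planned design.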

Item Development:
If the report indicates that there are plenty of MC items ready for future publications, but we only have a handful of essay items to cover our existing assessment form, then we might instruct item writers to focus on developing new essay questions for the next publication of the assessment.

Example of a Question Type Report showing the frequency distribution by item type.

Item Development – Organizing a bias review committee (Part 2)

Posted by Austin Fossey

The Standards for Educational and Psychological Testing describe two facets of an assessment that can result in bias: the content of the assessment and the response process. These are the areas on which your bias review committee should focus. You can read Part 1 of this post here.

Content bias is often what people think of when they think about examples of assessment bias. This may pertain to item content (e.g., students in hot climates may have trouble responding to an algebra scenario about shoveling snow), but it may also include language issues, such as the tone of the content, differences in terminology, or the reading level of the content. Your review committee should also consider content that might be offensive or trigger an emotional response from participants. For example, if an item’s scenario described interactions in a workplace, your committee might check to make sure that men and women are equally represented in management roles.

Bias may also occur in the response processes. Subgroups may have differences in responses that are not relevant to the construct, or a subgroup may be unduly disadvantaged by the response format. For example, an item that asks participants to explain how they solved an algebra problem may be biased against participants for whom English is a second language, even though they might be employing the same cognitive processes as other participants to solve the algebra. Response process bias can also occur if some participants provide unexpected responses to an item that are correct but may not be accounted for in the scoring.

How do we begin to identify content or response processes that may introduce bias? Your sensitivity guidelines will depend upon your participant population, applicable social norms, and the priorities of your assessment program. When drafting your sensitivity guidelines, you should spend a good amount of time researching potential sources of bias that could manifest in your assessment, and you may need to periodically update your own guidelines based on feedback from your reviewers or participants.

In his chapter in Educational Measurement (4th ed.), Gregory Camilli recommends the chapter on fairness in the ETS Standards for Quality and Fairness and An Approach for Identifying and Minimizing Bias in Standardized Tests (Office for Minority Education) as sources of criteria that could be used to inform your own sensitivity guidelines. If you would like to see an example of one program’s sensitivity guidelines that are used to inform bias review committees for K12 assessment in the United States, check out the Fairness Guidelines Adopted by PARCC (PARCC), though be warned that the document contains examples of inflammatory content.

In the next post, I will discuss considerations for the final round of item edits that will occur before the items are field tested.

Check out our white paper: 5 Steps to Better Tests for best practice guidance and practical advice for the five key stages of test and exam development.

Austin Fossey will discuss test development at the 2015 Users Conference in Napa Valley, March 10-13. Register before Dec. 17 and save $200.

Item Development – Benefits of editing items before the review process

Posted by Austin Fossey

Some test developers recommend a single round of item editing (or editorial review), usually right before items are field tested. When schedules and resources allow for it, I recommend that test developers conduct two rounds of editing—one right after the items are written and one after content and bias reviews are completed. This post addresses the first round of editing, to take place after items are drafted.

Why have two rounds of editing? In both rounds, we will be looking for grammar or spelling errors, but the first round serves as a filter to keep items with serious flaws from making it to content review or bias review.

In their chapter in Educational Measurement (4th ed.), Cynthia Schmeiser and Catherine Welch explain that an early round of item editing “serves to detect and correct deficiencies in the technical qualities of the items and item pools early in the development process.” They recommend that test developers use this round of item editing to do a cursory review of whether the items meet the Standards for Educational and Psychological Testing.

Items that have obvious item writing flaws should be culled in the first round of item editing and either sent back to the item writers or removed. This may include item writing errors like cluing or having options that do not match the stem grammatically. Ideally, these errors will be caught and corrected in the drafting process, but a few items may have slipped through the cracks.

In the initial round of editing, we will also be looking for proper formatting of the items. Did the item writers use the correct item types for the specified content? Did they follow the formatting rules in our style guide? Is all supporting content (e.g., pictures, references) present in the item? Did the item writers record all of the metadata for the item, like its content area, cognitive level, or reference? Again, if an item does not match the required format, it should be sent back to the item writers or removed.
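Checks like these lend themselves to a simple automated first pass. The sketch below is hypothetical; the field names are invented, and a real screen would follow your own style guide and data model.

```python
# Illustrative first-pass editorial screen: flag draft items missing
# required metadata before they reach content or bias review.
# Field names are invented for this example.

REQUIRED_FIELDS = ("stem", "options", "key",
                   "content_area", "cognitive_level", "reference")

def screen_items(items):
    """Return (item_id, missing_fields) pairs for items that fail the check."""
    failures = []
    for item in items:
        missing = [f for f in REQUIRED_FIELDS if not item.get(f)]
        if missing:
            failures.append((item.get("id"), missing))
    return failures

drafts = [
    {"id": "Q1", "stem": "...", "options": ["a", "b"], "key": "a",
     "content_area": "Algebra", "cognitive_level": "apply", "reference": "Ch. 3"},
    {"id": "Q2", "stem": "...", "options": ["a", "b"], "key": "b",
     "content_area": "Algebra", "cognitive_level": "", "reference": None},
]

print(screen_items(drafts))  # [('Q2', ['cognitive_level', 'reference'])]
```

An automated screen only catches structural gaps; grammar, cluing, and content flaws still need a human editor.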

It is helpful to look for these issues before going to content review or bias review because these types of errors may distract your review committees from their tasks; the committees may be wasting time reviewing items that should not be delivered anyway due to formatting flaws. You do not want to get all the way through content and bias reviews only to find that a large number of your items have to be returned to the drafting process. We will discuss review committee processes in the following posts.

For best practice guidance and practical advice for the five key stages of test and exam development, check out our white paper: 5 Steps to Better Tests.

Item Development – Five Tips for Organizing Your Drafting Process

Posted by Austin Fossey

Once you’ve trained your item writers, they are ready to begin drafting items. But how should you manage this step of the item development process?

There is an enormous amount of literature about item design and item writing techniques—which we will not cover in this series—but as Cynthia Schmeiser and Catherine Welch observe in their chapter in Educational Measurement (4th ed.), there is very little guidance about the item writing process. This is surprising, given that item writing is critical to effective test development.

It may be tempting to let your item writers loose in your authoring software with a copy of the test specifications and see what comes back, but if you invest time and effort in organizing your item drafting sessions, you are likely to retain more items and better support the validity of the results.

Here are five considerations for organizing item writing sessions:

  • Assignments – Schmeiser and Welch recommend giving each item writer a specific assignment to set expectations and to ensure that you build an item bank large enough to meet your test specifications. If possible, distribute assignments evenly so that no single author has undue influence over an entire area of your test specifications. Set realistic goals for your authors, keeping in mind that some of their items will likely be dropped later in item reviews.
  • Instructions – In the previous post, we mentioned the benefit of a style guide for keeping item formats consistent. You may also want to give item writers instructions or templates for specific item types, especially if you are working with complex item types. (You should already have defined the types of items that can be used to measure each area of your test specifications in advance.)
  • Monitoring – Monitor item writers’ progress and spot-check their work. This is not a time to engage in full-blown item reviews, but periodic checks can help you to provide feedback and correct misconceptions. You can also check in to make sure that the item writers are abiding by security policies and formatting guidelines. In some item writing workshops, I have also asked item writers to work in pairs to help check each other’s work.
  • Communication – With some item designs, several people may be involved in building the item. One team may be in charge of developing a scoring model, another team may draft content, and a third team may add resources or additional stimuli, like images or animations. These teams need to be organized so that materials are handed off on time, but they also need to be able to provide iterative feedback to each other. For example, if the content team finds a loophole in the scoring model, they need to be able to alert the other teams so that it can be resolved.
  • Be Prepared – Be sure to have a backup plan in case your item writing sessions hit a snag. Know what you are going to do if an item writer does not complete an assignment or if content is compromised.
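The even distribution described in the Assignments bullet can be sketched as a simple round-robin rotation of writers across blueprint topics, so that no single author dominates one content area. The writer names and quotas below are invented for the example.

```python
from itertools import cycle

# Illustrative round-robin distribution of item-writing assignments:
# rotate writers across blueprint topics so that no single author
# has undue influence over any one content area.

def assign_items(writers, topic_quotas):
    """Return (topic, writer) pairs covering each topic's quota."""
    rotation = cycle(writers)
    return [(topic, next(rotation))
            for topic, quota in topic_quotas.items()
            for _ in range(quota)]

# Quotas per blueprint topic (invented numbers).
plan = assign_items(["Ana", "Ben", "Chao"], {"Algebra": 4, "Geometry": 3})
for topic, writer in plan:
    print(topic, writer)
```

Because the rotation carries over from one topic to the next, the workload stays balanced across writers as well as across topics.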

Many of the details of the item drafting process will depend on your item types, resources, schedule, authoring software, and availability of item writers. Determine what you need to accomplish, and then organize your item writing sessions as much as possible so that you meet your goals.

In my next post, I will discuss the benefits of conducting an initial editorial review of the draft items before they are sent to review committees.

Item Development – Training Item Writers

Posted by Austin Fossey

Once we have defined the purpose of the assessment, completed our domain analysis, and finalized a test blueprint, we might be eager to jump right in to item writing, but there is one important step to take before we begin: training!

Unless you are writing the entire assessment yourself, you will need a group of item writers to develop the content. These item writers are likely experts in their fields, but they may have very little understanding of how to create assessment content. Even if these experts have experience writing items, it may be beneficial to provide refresher training, especially if anything has changed in your assessment design.

In their chapter in Educational Measurement (4th ed.), Cynthia Schmeiser and Catherine Welch note that it is important to consider the qualifications and representativeness of your item writers. It is common to ask item writers to fill out a brief survey to collect demographic information. You should keep these responses on file and possibly add a brief document explaining why you consider these item writers to be a qualified and representative sample.

Schmeiser and Welch also underscore the need for security. Item writers should be trained on your content security guidelines, and your organization may even ask them to sign an agreement stating that they will abide by those guidelines. Make sure everyone understands the security guidelines, and have a plan in place in case there are any violations.

Next, begin training your item writers on how to author items, which should include basic concepts about cognitive levels, drafting stems, picking distractors, and using specific item types appropriately. Schmeiser and Welch suggest that the test blueprint be used as the foundation of the training. Item writers should understand the content included in the specifications and the types of items they are expected to create for that content. Be sure to share examples of good and bad items.

If possible, ask your writers to create some practice items, then review their work and provide feedback. If they are using the item authoring software for the first time, be sure to acquaint them with the tools before they are given their item writing assignments.

Your item writers may also need training on your item data, delivery method, or scoring rules. For example, you may ask item writers to cite a reference for each item, or you might ask them to weight certain items differently. Your instructions need to be clear and precise, and you should spot-check your item writers’ work. If possible, write a style guide that includes clear guidelines about item construction, such as fonts to use, acceptable abbreviations, scoring rules, acceptable item types, et cetera.

I know from my own experience (and Schmeiser and Welch agree) that investing more time in training will have a big payoff down the line. Better training leads to substantially better item retention rates when items are reviewed. If your item writers are not trained well, you may end up throwing out many of their items, which may not leave you enough for your assessment design. Considering the cost of item development and the time spent writing and reviewing items, putting in a few more hours of training can equal big savings for your program in the long run.

In my next post, I will discuss how to manage your item writers as they begin the important work of drafting the items.

Item Development – Managing the Process for Large-Scale Assessments

Posted by Austin Fossey

Whether you work with low-stakes assessments, small-scale classroom assessments, or large-scale, high-stakes assessments, understanding and applying some basic principles of item development will greatly enhance the quality of your results.

This is the first in a series of posts setting out item development steps that will help you create defensible assessments. Although I’ll be addressing the requirements of large-scale, high-stakes testing, the fundamental considerations apply to any assessment.

You can find previous posts here about item development including how to write items, review items, increase complexity, and avoid bias. This series will review some of what’s come before, but it will also explore new territory. For instance, I’ll discuss how to organize and execute different steps in item development with subject matter experts. I’ll also explain how to collect information that will support the validity of the results and the legal defensibility of the assessment.

In this series, I’ll take a look at:

[Diagram: the steps in the item development process]

These are common steps (adapted from Crocker and Algina’s Introduction to Classical and Modern Test Theory) taken to create the content for an assessment. Each step requires careful planning, implementation, and documentation, especially for high-stakes assessments.

This looks like a lot of steps, but item development is just one slice of assessment development. Before item development can even begin, there’s plenty of work to do!

In their article, Design and Discovery in Educational Assessment: Evidence-Centered Design, Psychometrics, and Educational Data Mining, Mislevy, Behrens, Dicerbo, and Levy provide an overview of Evidence-Centered Design (ECD). In ECD, test developers must define the purpose of the assessment, conduct a domain analysis, model the domain, and define the conceptual assessment framework before beginning assessment assembly, which includes item development.

Once we’ve completed these preparations, we are ready to begin item development. In the next post, I will discuss considerations for training our item writers and item reviewers.