Item Development – Organizing a bias review committee (Part 2)

Posted by Austin Fossey

The Standards for Educational and Psychological Testing describe two facets of an assessment that can result in bias: the content of the assessment and the response process. These are the areas on which your bias review committee should focus. You can read Part 1 of this post here.

Content bias is often what people picture when they think of assessment bias. It may pertain to item content (e.g., students in hot climates may have trouble responding to an algebra scenario about shoveling snow), but it may also include language issues, such as the tone of the content, differences in terminology, or the reading level of the content. Your review committee should also consider content that might be offensive or trigger an emotional response from participants. For example, if an item’s scenario described interactions in a workplace, your committee might check that men and women are equally represented in management roles.

Bias may also occur in the response process. Subgroups may differ in their responses in ways that are not relevant to the construct, or a subgroup may be unduly disadvantaged by the response format. For example, an item that asks participants to explain how they solved an algebra problem may be biased against participants for whom English is a second language, even though they may be employing the same cognitive processes as other participants to solve the problem. Response process bias can also occur if some participants provide unexpected responses to an item that are correct but are not accounted for in the scoring.
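To make that last point concrete, here is a minimal sketch (in Python, not taken from the post) of how a scoring key might accept mathematically equivalent answers rather than a single expected string. The item, function names, and scoring rule are illustrative assumptions, not part of any particular assessment program.

```python
# Hypothetical sketch: a scoring key that only matches one surface form of a
# correct answer can penalize participants who give an equivalent but
# unexpected response (e.g. "0.5" instead of "1/2" on a math item).
from fractions import Fraction
from typing import Optional

def normalize(response: str) -> Optional[Fraction]:
    """Convert a free-text numeric response to a canonical value, if possible."""
    try:
        return Fraction(response.strip())
    except (ValueError, ZeroDivisionError):
        return None

def score(response: str, key: str) -> int:
    """Award credit when the response is mathematically equivalent to the key."""
    value = normalize(response)
    return 1 if value is not None and value == Fraction(key) else 0

# "1/2", "0.5", and " 2/4 " all earn credit against a key of "1/2";
# an exact string match would have rejected two of them.
assert all(score(r, "1/2") == 1 for r in ["1/2", "0.5", " 2/4 "])
```

The broader point holds regardless of how scoring is implemented: if the scoring rules cannot recognize valid alternative responses, the resulting score differences reflect the response format rather than the construct being measured.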

How do we begin to identify content or response processes that may introduce bias? Your sensitivity guidelines will depend upon your participant population, applicable social norms, and the priorities of your assessment program. When drafting your sensitivity guidelines, you should spend a good amount of time researching potential sources of bias that could manifest in your assessment, and you may need to periodically update your own guidelines based on feedback from your reviewers or participants.

In his chapter in Educational Measurement (4th ed.), Gregory Camilli recommends the chapter on fairness in the ETS Standards for Quality and Fairness and An Approach for Identifying and Minimizing Bias in Standardized Tests (Office for Minority Education) as sources of criteria that could inform your own sensitivity guidelines. If you would like to see an example of one program’s sensitivity guidelines used to inform bias review committees for K-12 assessment in the United States, check out the Fairness Guidelines Adopted by PARCC, though be warned that the document contains examples of inflammatory content.

In the next post, I will discuss considerations for the final round of item edits that will occur before the items are field tested.

Check out our white paper, 5 Steps to Better Tests, for best-practice guidance and practical advice on the five key stages of test and exam development.

Austin Fossey will discuss test development at the 2015 Users Conference in Napa Valley, March 10-13. Register before Dec. 17 and save $200.

Randy Bennett of ETS says seize the opportunity to improve assessment

Posted by John Kleeman

Randy Bennett, who holds the Frederiksen Chair in Assessment Innovation at ETS (Educational Testing Service), is one of the world’s leading experts on computerizing assessments. I very much enjoyed his recent keynote at the International Computer Assisted Assessment Conference. With Dr. Bennett’s permission, here is a summary of his presentation.

Randy Bennett at International CAA Conference (CAA 2011)

His key proposal is that we should use technology in assessment not because it is cool or efficient, but to make assessment better. “If we focus on efficiency, we may end up with nothing more than the ability to create existing tests faster, cheaper, and in greater numbers without necessarily making them better.”

Here are his 11 propositions for what technology in assessment should do:

1. Give students more meaningful assessment tasks than are feasible through traditional approaches.

2. Model good instructional practice, including encouraging habits of mind common to proficient performers in the domain.

3. Assess important competencies that are not measured well in conventional formats, e.g. through simulations or by using a spreadsheet.

4. Measure “problem-solving with technology”, given that the workplace typically requires use of technology.

5. Collect response information that can enlighten substantive interpretation (e.g. the time taken to answer questions).

6. Make assessment fairer for all students, including those with disabilities and non-native language speakers.

7. Explore new approaches to adaptive testing to assess authentically the full range of important competencies, not just the middle ranges.

8. Measure more frequently, aggregating information over time to form a summative judgement.

9. Improve the substantive aspects of scoring, for instance by using technology to make scoring more effective.

10. Report assessment results in a timely and instructionally actionable manner, including pointing to likely next steps and instructional materials for them.

11. Help teachers and students understand the characteristics of good performance by participating in onscreen marking: for instance, mark work and have others review your marks to help you develop that understanding.

You can see the full keynote presentation here, with several screenshots illustrating what can be done.

I believe that good computerized assessment does much more than simply computerize paper practices, and it’s great to see this thoughtful call to action.