Item Development – Organizing a bias review committee (Part 2)

Posted by Austin Fossey

The Standards for Educational and Psychological Testing describe two facets of an assessment that can result in bias: the content of the assessment and the response process. These are the areas on which your bias review committee should focus. You can read Part 1 of this post here.

Content bias is often what people think of when they think about examples of assessment bias. This may pertain to item content (e.g., students in hot climates may have trouble responding to an algebra scenario about shoveling snow), but it may also include language issues, such as the tone of the content, differences in terminology, or the reading level of the content. Your review committee should also consider content that might be offensive or trigger an emotional response from participants. For example, if an item’s scenario described interactions in a workplace, your committee might check to make sure that men and women are equally represented in management roles.

Bias may also occur in the response processes. Subgroups may have differences in responses that are not relevant to the construct, or a subgroup may be unduly disadvantaged by the response format. For example, an item that asks participants to explain how they solved an algebra problem may be biased against participants for whom English is a second language, even though they might be employing the same cognitive processes as other participants to solve the algebra problem. Response process bias can also occur when some participants give unexpected but correct responses that the scoring does not account for.
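That last point, correct but unexpected responses slipping past the key, is easy to see in code. Here is a minimal sketch (the function names are mine, not from any particular scoring engine) contrasting a brittle exact-match scorer with one that also credits numerically equivalent answers:

```python
from fractions import Fraction

def exact_match_score(response: str, key: str) -> bool:
    # Brittle: only the key's exact surface form earns credit,
    # so "0.5" is marked wrong against a key of "1/2".
    return response.strip() == key.strip()

def value_match_score(response: str, key: str) -> bool:
    # Compare numeric values instead, so "0.5", ".5", and "1/2"
    # all match a key of "1/2".
    try:
        return Fraction(response.strip()) == Fraction(key.strip())
    except (ValueError, ZeroDivisionError):
        # Non-numeric responses fall back to exact matching.
        return response.strip() == key.strip()
```

A bias review that inspects the scoring rules, not just the item text, can catch the first kind of scorer before it penalizes participants for a valid response format.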

How do we begin to identify content or response processes that may introduce bias? Your sensitivity guidelines will depend upon your participant population, applicable social norms, and the priorities of your assessment program. When drafting your sensitivity guidelines, you should spend a good amount of time researching potential sources of bias that could manifest in your assessment, and you may need to periodically update your own guidelines based on feedback from your reviewers or participants.

In his chapter in Educational Measurement (4th ed.), Gregory Camilli recommends the chapter on fairness in the ETS Standards for Quality and Fairness and An Approach for Identifying and Minimizing Bias in Standardized Tests (Office for Minority Education) as sources of criteria that could inform your own sensitivity guidelines. If you would like to see an example of one program's sensitivity guidelines used to inform bias review committees for K-12 assessment in the United States, check out the Fairness Guidelines Adopted by PARCC, though be warned that the document contains examples of inflammatory content.

In the next post, I will discuss considerations for the final round of item edits that will occur before the items are field tested.

Check out our white paper: 5 Steps to Better Tests for best practice guidance and practical advice for the five key stages of test and exam development.

Austin Fossey will discuss test development at the 2015 Users Conference in Napa Valley, March 10-13. Register before Dec. 17 and save $200.

Item Development – Organizing a bias review committee (Part 1)

Posted by Austin Fossey

Once the content review is completed, it is time to turn the items over to a bias review committee. In previous posts, we have talked about methods for detecting bias in item performance using DIF analysis, but DIF analysis must be done after the item has already been delivered and item responses are available.
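One common DIF statistic is the Mantel-Haenszel common odds ratio, computed from 2x2 tables of group-by-correctness counts within each total-score stratum. This is an illustrative sketch (the function name and stratum layout are my own, and the cutoffs are rules of thumb), not a production implementation:

```python
from math import log

def mantel_haenszel(strata):
    """Mantel-Haenszel common odds ratio across score strata.

    Each stratum is a tuple of counts:
    (ref_correct, ref_incorrect, focal_correct, focal_incorrect).
    An odds ratio far from 1.0 flags the item for possible DIF.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n  # reference-correct, focal-incorrect
        den += b * c / n  # reference-incorrect, focal-correct
    or_mh = num / den
    # ETS delta scale; by convention, large absolute values
    # (roughly |delta| > 1.5) indicate substantial DIF.
    delta = -2.35 * log(or_mh)
    return or_mh, delta
```

Because this statistic needs response data, it complements rather than replaces the judgmental review described here: the committee screens items before delivery, and DIF analysis checks them again afterward.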

Your bias review committee is being tasked with identifying sources of bias before the assessment is ever delivered so that items can be edited or removed before presenting them to a participant sample (though you can conduct bias reviews at any stage of item development).

The Standards for Educational and Psychological Testing explain that bias occurs when the design of the assessment results in different interpretations of scores for subgroups of participants. This implies that some aspect of the assessment is impacting scores based on factors that are not related to the measured construct. This is called construct-irrelevant variance.

The Standards emphasize that a lack of bias is critical for supporting the overall fairness of the assessment, so your bias review committee will provide evidence to help demonstrate your compliance with the Standards. Before you convene your bias review committee, you should finalize a set of sensitivity guidelines that define the criteria for identifying sources of bias in your assessment.

As with your other committees, the members of this committee should be carefully selected based on their qualifications and representativeness, and they should not have been involved with any other test development processes like domain analysis, item writing, or content review. In his chapter in Educational Measurement (4th ed.), Gregory Camilli suggests building a committee of at least five to ten members who will be operating under the principle that “all students should be treated equitably.”

Camilli recommends carefully documenting all aspects of the bias review, including the qualifications and selection process for the committee members. The committee should be trained on the test specifications and the sensitivity guidelines that will inform their decisions. Just like item writing or content review trainings, it is helpful to have the committee practice with some examples before they begin their review.

Camilli suggests letting committee members review items on their own after they complete their training. This gives them each a chance to critique items based on their unique perspectives and understanding of your sensitivity guidelines. Once they have had time to review the items on their own, have your committee reconvene to discuss the items as a group. The committee should strive to reach a consensus on whether items should be retained, edited, or removed completely. If an item needs to be edited, they should document their recommendations for changes. If an item is edited or removed, be sure they document the rationale by relating their decision back to your sensitivity guidelines.

In the next post, I will talk about two facets of assessments that can result in bias (content and response process), and I will share some examples of publications that have recommendations for bias criteria you can use for your own sensitivity guidelines.

Guidelines and standards for defensible assessments


Posted by Greg Pope

I have been asked on occasion what guidelines and standards are available to ensure that an assessment program aligns with best practices and is defensible.

Organizations conducting assessments undertake internal reviews of where their assessment program stands in relation to internationally recognized guidelines on assessment. High-stakes organizations will at times hire companies that specialize in psychometric audits to conduct a thorough review of the assessment processes and practices within the organization. This usually yields an audit scorecard outlining where an organization is doing well and where it should improve according to the guidelines.

The guidelines document most commonly used for these sorts of audits is the "Standards for Educational and Psychological Testing". This document is organized into numbered sections, each of which details what is expected of an assessment program in a particular area. For example, section 8.7 states:
“Test takers should be made aware that having someone else take the test for them, disclosing confidential test material, or any other form of cheating is inappropriate and that such behavior may result in sanctions.”

During an audit, an organization may be scored on how it performs against each section of the document, with notes on where it performed well and where it performed poorly and needs to improve. In the example above, an organization with a clearly written candidate agreement that participants must accept before beginning the assessment would receive full marks for this standard. An organization without such an agreement would have an area for improvement.

One can go through the entire standards document and create a checklist recording how the organization fares against each of the standards.
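Such an audit checklist can be kept as simple structured data. A minimal sketch in Python, where the section numbers other than 8.7 and the status labels are hypothetical placeholders, not taken from the Standards:

```python
from collections import Counter

# Each row: (Standards section, requirement summary, audit status).
# Section 8.7 is the candidate-agreement example above; the others
# are hypothetical entries for illustration.
checklist = [
    ("8.7", "Candidate agreement covers cheating and disclosure", "Met"),
    ("6.4", "Test administration conditions are documented", "Met"),
    ("9.2", "Score reports explain intended interpretation", "Needs work"),
]

def summarize(rows):
    # Tally how the organization fares across the audited standards.
    return Counter(status for _, _, status in rows)
```

Summaries like this make it easy to report both the overall scorecard and the specific standards flagged for improvement.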

Other guidelines and standards that you can use to help benchmark and improve your assessment program are:

I hope this article was helpful!