Item Development Tips For Defensible Assessments

Posted by Julie Delazyn

Whether you work with low-stakes assessments, small-scale classroom assessments, or large-scale, high-stakes assessments, understanding and applying some basic principles of item development will greatly enhance the quality of your results.

What began as a popular 11-part blog series has morphed into a white paper, Managing Item Development for Large-Scale Assessment, which offers sound advice on how to organize and execute the item development steps that will help you create defensible assessments. You can download your complimentary copy of the white paper here: Managing Item Development for Large-Scale Assessment

New tools for building questions and assessments

Posted by Jim Farrell

If you are a Questionmark customer and aren’t using Questionmark Live, what are you waiting for?

More than 2000 of our customers have started using Questionmark Live this year, so I think now is a good time to call out some of the features that are making it a vital part of their assessment development processes.

Let’s start with building questions. One new tool our customers are using is the ability to add notes to a question. This allows reviewers to open questions and leave comments for content developers without changing the version of the question.


Now over to the assessment-building side of things. Our new assessment interface allows users to add questions in many different ways, including single questions, entire topics, and random pulls from a topic. You can even prevent participants from seeing repeat questions during retakes when pulling questions at random. Jump blocks allow you to shorten test time or serve extra questions to participants who obtain a certain score. You can also easily tag questions as demographic questions so they can be used as filters in our reporting and analytics tools.
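As a rough illustration of the random-pull-without-repeats behavior described above, here is a minimal sketch in Python. It is not Questionmark's implementation; the function name and data structures are hypothetical.

```python
import random

def pull_random_questions(topic_pool, already_seen, count, rng=None):
    """Select `count` question IDs at random from a topic,
    skipping any the participant saw on a previous attempt."""
    rng = rng or random.Random()
    available = [qid for qid in topic_pool if qid not in already_seen]
    if count > len(available):
        raise ValueError("Not enough unseen questions left in this topic")
    return rng.sample(available, count)

# Example: a retake pulls 3 questions while avoiding the 3 seen on the first attempt
topic_pool = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
first_attempt = pull_random_questions(topic_pool, set(), 3)
retake = pull_random_questions(topic_pool, set(first_attempt), 3)
print(first_attempt, retake)
```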


We have also added more robust outcome capabilities to give your test administrators new tools for controlling how assessments are completed and reported. You can have multiple outcomes for different score bands, and you can also require participants to reach certain scores on particular topics before they can pass a test. For example, suppose you are giving a test on Microsoft Office and you set the pass score at 80%. You probably want to make sure that your participants understand all the products and don’t bomb one of them. You can set a prerequisite of 80% for each topic to make sure participants have knowledge of all areas before passing. If someone gets 100% on the Word questions and 60% on the Excel questions, they would not pass. Powerful outcome controls help ensure you are truly measuring the goals of your learning organization.
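To make the pass/fail logic concrete, here is a minimal sketch based on the Microsoft Office example above. It is an illustration only, not how Questionmark evaluates outcomes.

```python
def passes(overall_score, topic_scores, pass_score=80, topic_prerequisite=80):
    """Pass only if the overall score meets the pass score AND every
    topic score meets the per-topic prerequisite."""
    meets_overall = overall_score >= pass_score
    meets_topics = all(score >= topic_prerequisite for score in topic_scores.values())
    return meets_overall and meets_topics

# 100% on Word but 60% on Excel: the overall average reaches 80%, yet the result is a fail
topic_scores = {"Word": 100, "Excel": 60}
overall = sum(topic_scores.values()) / len(topic_scores)  # 80.0
print(passes(overall, topic_scores))  # False
```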


If you aren’t using Questionmark Live, you are missing out: we are releasing new functionality every month. Get access and start getting your subject matter experts to contribute to your item banks.

Item Development – Managing the Process for Large-Scale Assessments

Posted by Austin Fossey

Whether you work with low-stakes assessments, small-scale classroom assessments, or large-scale, high-stakes assessments, understanding and applying some basic principles of item development will greatly enhance the quality of your results.

This is the first in a series of posts setting out item development steps that will help you create defensible assessments. Although I’ll be addressing the requirements of large-scale, high-stakes testing, the fundamental considerations apply to any assessment.

You can find previous posts about item development here, including posts on how to write items, review items, increase complexity, and avoid bias. This series will review some of what’s come before, but it will also explore new territory. For instance, I’ll discuss how to organize and execute the different steps in item development with subject matter experts. I’ll also explain how to collect information that will support the validity of the results and the legal defensibility of the assessment.

In this series, I’ll take a look at:

[Figure: the item development steps covered in this series]

These are common steps (adapted from Crocker and Algina’s Introduction to Classical and Modern Test Theory) taken to create the content for an assessment. Each step requires careful planning, implementation, and documentation, especially for high-stakes assessments.

This looks like a lot of steps, but item development is just one slice of assessment development. Before item development can even begin, there’s plenty of work to do!

In their article, Design and Discovery in Educational Assessment: Evidence-Centered Design, Psychometrics, and Educational Data Mining, Mislevy, Behrens, DiCerbo, and Levy provide an overview of Evidence-Centered Design (ECD). In ECD, test developers must define the purpose of the assessment, conduct a domain analysis, model the domain, and define the conceptual assessment framework before beginning assessment assembly, which includes item development.

Once we’ve completed these preparations, we are ready to begin item development. In the next post, I will discuss considerations for training our item writers and item reviewers.

Assessment Report Design: Reporting Multiple Chunks of Information

Posted by Austin Fossey

We have discussed aspects of report design in previous posts, but I was recently asked whether an assessment report should report just one thing or multiple pieces of information. My response is that it depends on the intended use of the assessment results, but in general, I find that a reporting tool is more useful for a stakeholder if it can report multiple things at once.

This is not to say that more data are always better. A report that is cluttered or that has too much information will be difficult to interpret, and users may not be able to fish out the data they need from the display. Many researchers recommend keeping simple, clean layouts for reports while efficiently displaying relevant information to the user (e.g., Goodman & Hambleton, 2004; Wainer, 1984).

But what information is relevant? Again, it will depend on the user and the use case for the assessment, but consider the types of data we have for an assessment. We have information about the participants, information about the administration, information about the content, and information about performance (e.g., scores). These data dimensions can each provide different paths of inquiry for someone making inferences about the assessment results.

There are times when we may only care about one facet of this datascape, but these data provide context for each other, and understanding that context provides a richer interpretation.

Hattie (2009) recommended that a report should have a major theme and that this theme should be emphasized with five to nine “chunks” of information. He also recommended that the user have control of the report so they can explore the data as desired.

Consider the Questionmark Analytics Score List Report: Assessment Results View. The major theme for the report is to communicate the scores of multiple participants. The report arguably contains five primary chunks of information: aggregate scores for groups of participants, aggregate score bands for groups of participants, scores for individual participants, score bands for individual participants, and information about the administration of the assessment to individual participants.

Through design elements and onscreen tools that give the user the ability to explore the data, this report with five chunks of information can provide context for each participant’s score. The user can sort participants to find the high- and low-performing participants, compare a participant to the entire sample of participants, or compare the participant to their group’s performance. The user can also compare the performance of groups of participants to see if certain groups are performing better than others.


Assessment Results View in the Questionmark Analytics Score List Report
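As a rough illustration of what score bands and group aggregates mean in this report, the sketch below assigns each participant’s score to a band and averages scores by group. The band labels, cut scores, and data are hypothetical, not the report’s actual configuration.

```python
from statistics import mean

# Hypothetical score bands; cut scores are illustrative only
BANDS = [(90, "Excellent"), (80, "Pass"), (0, "Fail")]

def score_band(score):
    """Return the label of the first band whose cut score the score meets."""
    for cut, label in BANDS:
        if score >= cut:
            return label

results = [
    {"participant": "A", "group": "East", "score": 92},
    {"participant": "B", "group": "East", "score": 78},
    {"participant": "C", "group": "West", "score": 85},
]

# Score band for each individual participant
for r in results:
    print(r["participant"], r["score"], score_band(r["score"]))

# Aggregate score and band for each group
for group in sorted({r["group"] for r in results}):
    group_scores = [r["score"] for r in results if r["group"] == group]
    avg = mean(group_scores)
    print(group, avg, score_band(avg))
```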

Online reporting also makes it easy to let users navigate between related reports, thus expanding the power of the reporting system. In the Score List Report, the user can quickly jump from Assessment Results to Topic Results or Item Results to make comparisons at different levels of the content. Similar functionality exists in the Questionmark Analytics Item Analysis Report, which allows the user to navigate directly from a Summary View comparing item statistics for different items to an Item Detail view that provides a more granular look at item performance through interpretive text and an option analysis table.

Analyzing multiple groups with the JTA Demographic Report

Posted by Austin Fossey

In my previous post, I talked about how the Job Task Analysis (JTA) Summary Report can be used by subject matter experts (SMEs) to inform their decisions about what content to include in an assessment.

In many JTA studies, we might survey multiple populations of stakeholders who may have different opinions about what content should be on the assessment. The populations we select will be guided by theory or previous research. For example, for a certification assessment, we might survey the practitioners who will be candidates for certification, their managers, and their clients—because our subject matter experts theorize that each of these populations will have different yet relevant opinions about what a competent candidate must know and be able to do in order to be certified.

Instead of requiring you to create a separate JTA survey instrument for each population in the study, Questionmark Analytics allows you to analyze the responses from the different groups of survey participants using the JTA Demographic Report.

This report provides demographic comparisons of aggregated JTA responses for each of the populations in the study. Users can simply add a demographic question to their survey so that this information can be used by the JTA Demographic Report. In our earlier example, we might ask survey participants to identify themselves as a practitioner, manager, or client, and then this data would be used to compare results in the report.

As with the JTA Summary Report, there are no requirements for how SMEs must use these data. The interpretations will either be framed out by the test developer using theory or prior research, or the interpretations will be left completely to the SMEs’ expert judgment.

SMEs might wish to investigate topics where populations differed in their ratings, or they may wish to select only those topics where there was universal agreement. They may wish to prioritize or weight certain populations’ opinions, especially if a population is less knowledgeable about the content than others.

The JTA Demographic Report provides a frequency distribution table for each task on the survey, organized by dimension. A chart gives a visual indicator to show differences in response distributions between groups.


Response distribution table and chart comparing JTA responses from nurses and doctors using the Questionmark JTA Demographic Report.
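To show the kind of comparison the report summarizes, here is a minimal pandas sketch that tabulates ratings for one task by respondent group, echoing the nurses-and-doctors comparison in the figure above. It is purely illustrative, not the report’s implementation, and the ratings are invented.

```python
import pandas as pd

# Hypothetical JTA survey responses for one task's "importance" dimension
responses = pd.DataFrame({
    "group":  ["Nurse", "Nurse", "Nurse", "Doctor", "Doctor", "Doctor"],
    "rating": ["High", "High", "Medium", "Medium", "Low", "Medium"],
})

# Frequency distribution of ratings within each group, shown as row percentages
distribution = pd.crosstab(responses["group"], responses["rating"], normalize="index") * 100
print(distribution.round(1))
```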

Job Task Analysis Summary Report

Posted by Austin Fossey

There are several ways to determine content for an assessment blueprint and ultimately for the assessment instrument itself. A job task analysis (JTA) study, as explained by Jim Farrell in a previous post, is one commonly used method to describe potential topics or tasks that need to be assessed to determine if a participant meets minimum qualifications within an area of practice.

In their chapter in Educational Measurement (4th ed.), Clauser, Margolis, and Case describe the work that must go into culling the initial list of topics down to a manageable, relevant list that will be used as the foundation for the test blueprint.

Consider a JTA survey that asks participants to rate the difficulty, importance, and frequency of a list of tasks related to a specific job. Subject matter experts (SMEs) must decide how to interpret the survey results to make decisions about which topics stay and which ones go.

For example, there may be a JTA that surveys employees about potential assessment topics and tasks for an assessment on the safe operation of machinery at the job site. One task relates to being able to hit the emergency shutoff in case something goes wrong. The JTA results may show that respondents think this is very important to know, but it is not something they do very frequently because there are rarely emergency situations that would warrant this action. Similarly, there may be a task related to turning the machine on. The respondents may indicate that this is important and something that is done on a daily basis, but it is also very easy to do.

There is no all-encompassing rule for how SMEs should determine which tasks and topics to include in the assessment. It often comes down to having the SMEs discuss the merits of each task, with each SME making a recommendation informed by their own experience and expertise. Reporting the results of the JTA survey will give the SMEs context for their decision-making, much like providing impact data in a standard-setting study.

Questionmark Analytics currently provides two JTA reports: the JTA Summary Report, and the JTA Demographic Report. Today, we will focus on the JTA Summary Report.

This report uses the same assessment selection and result filtering tools that are used throughout Analytics. Users can report on different revisions of the JTA survey and filter by groups, dates, and special field values.

The current JTA survey item only supports categorical and ordinal response data, so the JTA Summary Report provides a table showing the frequency distribution of responses for each task by each of the dimensions (e.g., difficulty, importance, frequency) defined by the test developer in the JTA item.

These response patterns can help SMEs decide which tasks will be assessed and which ones are not required for a valid evaluation of participants.


Response distribution table for a JTA for medical staff using the Questionmark JTA Summary Report.
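For readers who want to see what such a frequency distribution boils down to, here is a minimal sketch in plain Python that tallies responses for each task and dimension. It is illustrative only, not how the report is generated; the tasks, dimensions, and ratings are invented, loosely following the machinery example above.

```python
from collections import Counter

# Hypothetical JTA responses as (task, dimension, rating) tuples
responses = [
    ("Hit emergency shutoff", "importance", "High"),
    ("Hit emergency shutoff", "frequency", "Low"),
    ("Hit emergency shutoff", "importance", "High"),
    ("Turn machine on", "difficulty", "Low"),
    ("Turn machine on", "importance", "High"),
    ("Turn machine on", "difficulty", "Low"),
]

# Frequency distribution of ratings for each task and dimension
counts = Counter(responses)
for (task, dimension, rating), n in sorted(counts.items()):
    print(f"{task} | {dimension} | {rating}: {n}")
```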