5 Steps to Better Tests

Posted by Julie Delazyn

Creating fair, valid and reliable tests requires starting off right: with careful planning. With that foundation in place, you will save time and effort while producing tests that yield trustworthy results.

Five essential steps for producing high-quality tests:

1. Plan: What elements must you consider before crafting the first question? How do you identify key content areas?

2. Create: How do you write items that increase cognitive load while avoiding bias and stereotyping?

3. Build: How should you build the test form and set accurate pass/fail scores?

4. Deliver: What methods can be implemented to protect test content and discourage cheating?

5. Evaluate: How do you use item-, topic-, and test-level data to assess reliability and improve quality?
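As one illustration of the test-level reliability analysis mentioned in step 5, here is a minimal Python sketch of Cronbach's alpha, a widely used internal-consistency coefficient (for items scored 0/1 it equals KR-20). The white paper may recommend other statistics; the function name and the score-matrix layout (one row per participant, one column per item) are assumptions for illustration only.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (hypothetical helper).

    scores: one row per participant, one column per item,
    e.g. 1/0 for right/wrong. This layout is an assumption.
    """
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item's scores across participants
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of participants' total test scores
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have no variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical example: five participants, three items
responses = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
print(round(cronbach_alpha(responses), 3))  # 0.462
```

Values closer to 1 indicate that the items rank participants consistently; a low value like the one in this tiny made-up example would prompt a closer look at item-level statistics.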

Download this complimentary white paper full of best practices for test design, delivery and evaluation.

 

Agree or disagree? 10 tips for better surveys — part 3

Posted by John Kleeman

This is the third and last post in my “Agree or disagree” series on writing effective attitude surveys. In the first post I explained the process survey participants go through when answering questions and the concept of satisficing – where some participants give what they think is a satisfactory answer rather than stretching themselves to give the best answer.

In the second post I shared these five tips based on research evidence on question and survey design.

Tip #1 – Avoid Agree/Disagree questions

Tip #2 – Avoid Yes/No and True/False questions

Tip #3 – Each question should address one attitude only

Tip #4 – Minimize the difficulty of answering each question

Tip #5 – Randomize the responses if order is not important

Here are five more:

Tip #6 – Pretest your survey

Just as with tests and exams, you need to pretest or pilot your survey before it goes live. Participants may interpret questions differently than you intended, so it’s important to get the language right so that each question triggers the judgement you are trying to measure. Here are some good pretesting methods:

  • Get a peer or expert to review the survey.
  • Pretest with participants and measure the response time for each question (shown in some Questionmark reports). A longer response time may indicate a more confusing question.
  • Allow participants to comment on questions they find confusing.
  • Follow up with your pretesting group by asking them why they gave particular answers or what they thought you meant by your questions.
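If you can export per-question response times from a pretest, one simple screen is to compare each question's median time with that of a typical question. The sketch below is hypothetical: it assumes an export shaped as a dictionary of question IDs to response times in seconds, and the 1.5× threshold is chosen only for illustration. Treat flagged questions as candidates for follow-up rather than proof of confusion, since a long time can also mean a question simply takes longer to read.

```python
from statistics import median


def flag_slow_questions(times_by_question, factor=1.5):
    """Return IDs of questions whose median response time exceeds
    `factor` times the median of all questions' medians.

    times_by_question: dict of question ID -> list of times in seconds
    (a hypothetical export format, not a specific report's layout).
    """
    medians = {q: median(t) for q, t in times_by_question.items() if t}
    if not medians:
        return []
    typical = median(medians.values())  # a "typical" question's median time
    return sorted(q for q, m in medians.items() if m > factor * typical)


pretest = {"Q1": [10, 12, 11], "Q2": [9, 10, 11], "Q3": [30, 28, 35]}
print(flag_slow_questions(pretest))  # ['Q3']
```

Using medians rather than means keeps one distracted participant from making an otherwise clear question look slow.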

Tip #7 – Make survey participants realize how useful the survey is

The more motivated a participant is, the more likely he or she is to answer optimally rather than just satisficing and choosing a good enough answer. To quote Professor Krosnick in his paper The Impact of Satisficing on Survey Data Quality:

“Motivation to optimize is likely to be greater among respondents who think that the survey in which they are participating is important and/or useful”

Communicate the goal of the survey, and make participants feel that answering it thoughtfully will benefit something they believe in or value.

Tip #8 – Don’t include a “don’t know” option

Including a “don’t know” option usually does not improve the accuracy of your survey. In most cases it reduces it. To those of us used to the precision of testing and assessment, this is surprising.

Part of the reason is that providing a “don’t know” or “no opinion” option allows participants to disengage from your survey, which reduces the number of useful responses. Also, people are better at guessing or estimating than they think, so they will tend to choose an appropriate answer if “don’t know” is not available. See this paper by Mondak and Davis, which illustrates this in the political field.

Tip #9 – Ask questions about the recent past only

The further back participants are asked to remember, the less accurately they will answer your questions. We all tend to “telescope” the timing of events, remembering things as having happened earlier or later than they actually did. If you can, ask about the last week or the last month, not about the last year or further back.

Tip #10 – Trends are good

Error can creep into survey results in many ways. Participants can misunderstand the question. They can fail to recall the right information. Their judgement can be influenced by social pressures. And they are limited by the choices available. But if you use the same questions over time with a similar population, you can be pretty sure that changes over time are meaningful.

For example, if you deliver an employee attitude survey with the same questions for two years running, then changes in the results to a question (if statistically significant) probably mean a change in employee attitudes. If you can use the same or similar questions over time and can identify trends or changes in results, such data can be very trustworthy.
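To check whether a year-over-year change in one question is statistically significant, one common option is a two-proportion z-test on the share of favorable responses. The sketch below is illustrative, not a Questionmark feature: it assumes each wave is summarized as "favorable count out of respondents" and treats the two waves as independent samples, which is only approximately true when many of the same employees answer both years.

```python
from math import erfc, sqrt


def two_proportion_z_test(favorable_1, n_1, favorable_2, n_2):
    """Compare the share of favorable answers to one question in two waves.

    Returns (z, two_sided_p). Assumes independent samples.
    """
    p_1 = favorable_1 / n_1
    p_2 = favorable_2 / n_2
    # Pooled share under the "no change" hypothesis
    pooled = (favorable_1 + favorable_2) / (n_1 + n_2)
    se = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    if se == 0:
        return 0.0, 1.0  # everyone answered identically in both waves
    z = (p_2 - p_1) / se
    # erfc(|z| / sqrt(2)) equals the two-sided p-value 2 * (1 - Phi(|z|))
    return z, erfc(abs(z) / sqrt(2))


# Hypothetical data: 120 of 200 agreed last year, 150 of 200 this year
z, p = two_proportion_z_test(120, 200, 150, 200)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A p-value below the threshold you set in advance (0.05 is a common choice) suggests the change is unlikely to be noise alone; it does not, by itself, tell you why attitudes changed.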

I hope you’ve found this series of articles useful. For more information on how Questionmark can help you create, deliver and report on surveys, see www.questionmark.com. I’ll also be presenting at Questionmark’s 2016 Conference: Shaping the Future of Assessment in Miami April 12-15. Check out the conference page for more information.

New white paper: Assessment Results You Can Trust

Posted by John Kleeman

Questionmark published an important white paper about why trustable assessment results matter and about how an assessment management system like Questionmark’s can help you make your assessments valid and reliable — and therefore trustable.

The white paper, which I wrote together with Questionmark CEO Eric Shepherd, explains that trustable assessment results must be both valid (measuring what they are intended to measure) and reliable (measuring it consistently).

The paper draws upon the metaphor of a doctor using results from a blood test to diagnose an illness and then prescribe a remedy. Delays will occur if the doctor orders the wrong test, and serious consequences could result if the test’s results are untrustworthy. Using this metaphor, it is easy to understand the personnel and organizational risks that can stem from making decisions based on untrustworthy results. If you assess someone’s knowledge, skill or competence for health and safety or regulatory compliance purposes, you need to ensure that your assessment instrument is designed correctly and runs consistently.

Engaging subject matter experts to generate questions that measure the knowledge, skills and abilities required to perform the job’s essential tasks is key to creating the initial pool of questions. However, subject matter experts are not necessarily skilled at writing good questions, so an effective authoring system needs a quality control process that allows assessment experts (e.g., instructional designers or psychometricians) to easily review and amend assessment items.

For assessments to be valid and reliable, it’s necessary to follow structured processes at each step from planning through authoring to delivery and reporting.

The white paper covers these six stages of the assessment process:

  • Planning assessment
  • Authoring items
  • Assembling assessment
  • Pilot and review
  • Delivery
  • Analyze results

Following the advice in the white paper and using the capabilities it describes will help you produce assessments that are more valid and reliable — and hence more trustable.
Modern organizations need their people to be competent.

Would you be comfortable in a high-rise building designed by an unqualified architect? Would you fly in a plane whose pilot hadn’t passed a flying test? Would you let someone operate a machine in your factory if they didn’t know what to do if something went wrong? Would you send a salesperson out on a call if they didn’t know what your products do? Can you demonstrate to a regulatory authority that your staff are competent and fit for their jobs if you do not have trustable assessments?

In all these cases and many more, it’s essential to have a reliable and valid test of competence. If you do not ensure that your workforce is qualified and competent, then you should not be surprised if your employees have accidents, cause your organization to be fined for regulatory infractions, give poor customer service or can’t repair systems effectively.

To download the white paper, click here.

John will be talking more about trustable assessments at our 2015 Users Conference in Napa next month. Register today for the full conference, but if you cannot make it, make sure to catch the live webcast.

Item Development – Conducting the final editorial review

Posted by Austin Fossey

Once you have completed your content review and bias review, it is best to conduct a final editorial review.

You may have already conducted an editorial review prior to the content and bias reviews to cull items with obvious item-writing flaws or inappropriate item types—so by the time you reach this second editorial review, your items should only need minor edits.

This is the time to put the final polish on all of your items. If your content review committee and bias review committee were authorized to make changes to the items, go back and make sure they followed your style guide and that they used accurate grammar and spelling. Make sure they did not make any drastic changes that violate your test specifications, such as adding a fourth option to a multiple choice item that should only have three options.

If you have the resources, have professional editors review the items’ content. Ask the editors to identify issues with language, but review their suggestions rather than letting them edit the items directly. The editors may suggest changes that violate your style guide, they may not be familiar with language that is appropriate for your industry, or they may propose a change that would drastically alter the item content. Review each suggested change carefully to make sure it is appropriate.

As with other steps in the item development process, documentation and organization are key. Item authoring software like that provided by Questionmark can help you track revisions to items, document changes, and make sure every item is reviewed.

Do not approve items with a rubber stamp. If an item needs major content revisions, send it back to the item writers and begin the process again. Faulty items can undermine the validity of your assessment and can result in time-consuming challenges from participants. If you have planned ahead, you should have enough extra items to allow for some attrition while retaining enough items to meet your test specifications.
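The arithmetic behind "enough extra items" is simple: divide the number of items your test specifications require by the share you expect to survive review, and round up. The helper below is a hypothetical sketch; the expected attrition rate is an assumption you would estimate from your own past review cycles, not a figure given in this post.

```python
def items_to_draft(required_items: int, attrition_pct: int) -> int:
    """Hypothetical planning helper: how many items to draft so that at least
    `required_items` remain after `attrition_pct` percent are dropped."""
    if not 0 <= attrition_pct < 100:
        raise ValueError("attrition_pct must be between 0 and 99")
    keep_pct = 100 - attrition_pct
    # Integer ceiling division avoids floating-point rounding surprises
    return -(-required_items * 100 // keep_pct)


print(items_to_draft(60, 25))  # 80
```

For example, if your specifications call for 60 items and you expect about a quarter of drafts to be dropped, plan to draft 80.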

Finally, be sure that you have the appropriate stakeholders sign off on each item. Once the item passes this final editorial review, it should be locked down and considered ready to deliver to participants. Ideally, no changes should be made to items once they are in delivery, as this may impact how participants respond to the item and perform on the assessment. (Some organizations require senior executives to review and approve any requested changes to items that are already in delivery.)

When you are satisfied that the items are perfect, they are ready to be field tested. In the next post, I will talk about item try-outs, selecting a field test sample, assembling field test forms, and delivering the field test.

Check out our white paper: 5 Steps to Better Tests for best practice guidance and practical advice for the five key stages of test and exam development.

Austin Fossey will discuss test development at the 2015 Users Conference in Napa Valley, March 10-13. Register before Jan. 29 and save $100.

Item Development – Organizing a bias review committee (Part 2)

Posted by Austin Fossey

The Standards for Educational and Psychological Testing describe two facets of an assessment that can result in bias: the content of the assessment and the response process. These are the areas on which your bias review committee should focus. You can read Part 1 of this post here.

Content bias is often what people think of when they think about examples of assessment bias. This may pertain to item content (e.g., students in hot climates may have trouble responding to an algebra scenario about shoveling snow), but it may also include language issues, such as the tone of the content, differences in terminology, or the reading level of the content. Your review committee should also consider content that might be offensive or trigger an emotional response from participants. For example, if an item’s scenario described interactions in a workplace, your committee might check to make sure that men and women are equally represented in management roles.

Bias may also occur in the response processes. Subgroups may have differences in responses that are not relevant to the construct, or a subgroup may be unduly disadvantaged by the response format. For example, an item that asks participants to explain how they solved an algebra problem may be biased against participants for whom English is a second language, even though they might be employing the same cognitive processes as other participants to solve the algebra. Response process bias can also occur if some participants provide unexpected responses to an item that are correct but may not be accounted for in the scoring.

How do we begin to identify content or response processes that may introduce bias? Your sensitivity guidelines will depend upon your participant population, applicable social norms, and the priorities of your assessment program. When drafting your sensitivity guidelines, you should spend a good amount of time researching potential sources of bias that could manifest in your assessment, and you may need to periodically update your own guidelines based on feedback from your reviewers or participants.

In his chapter in Educational Measurement (4th ed.), Gregory Camilli recommends the chapter on fairness in the ETS Standards for Quality and Fairness and An Approach for Identifying and Minimizing Bias in Standardized Tests (Office for Minority Education) as sources of criteria that could be used to inform your own sensitivity guidelines. If you would like to see an example of one program’s sensitivity guidelines that are used to inform bias review committees for K12 assessment in the United States, check out the Fairness Guidelines Adopted by PARCC (PARCC), though be warned that the document contains examples of inflammatory content.

In the next post, I will discuss considerations for the final round of item edits that will occur before the items are field tested.


Austin Fossey will discuss test development at the 2015 Users Conference in Napa Valley, March 10-13. Register before Dec. 17 and save $200.

Item Development – Five Tips for Organizing Your Drafting Process

Posted by Austin Fossey

Once you’ve trained your item writers, they are ready to begin drafting items. But how should you manage this step of the item development process?

There is an enormous amount of literature about item design and item writing techniques (which we will not cover in this series), but as Cynthia Schmeiser and Catherine Welch observe in their chapter in Educational Measurement (4th ed.), there is very little guidance about the item writing process. This is surprising, given that item writing is critical to effective test development.

It may be tempting to let your item writers loose in your authoring software with a copy of the test specifications and see what comes back, but if you invest time and effort in organizing your item drafting sessions, you are likely to retain more items and better support the validity of the results.

Here are five considerations for organizing item writing sessions:

  • Assignments – Schmeiser and Welch recommend giving each item writer a specific assignment to set expectations and to ensure that you build an item bank large enough to meet your test specifications. If possible, distribute assignments evenly so that no single author has undue influence over an entire area of your test specifications. Set realistic goals for your authors, keeping in mind that some of their items will likely be dropped later in item reviews.
  • Instructions – In the previous post, we mentioned the benefit of a style guide for keeping item formats consistent. You may also want to give item writers instructions or templates for specific item types, especially if you are working with complex item types. (You should already have defined the types of items that can be used to measure each area of your test specifications in advance.)
  • Monitoring – Monitor item writers’ progress and spot-check their work. This is not a time to engage in full-blown item reviews, but periodic checks can help you to provide feedback and correct misconceptions. You can also check in to make sure that the item writers are abiding by security policies and formatting guidelines. In some item writing workshops, I have also asked item writers to work in pairs to help check each other’s work.
  • Communication – With some item designs, several people may be involved in building the item. One team may be in charge of developing a scoring model, another team may draft content, and a third team may add resources or additional stimuli, like images or animations. These teams need to be organized so that materials are handed off on time, but they also need to be able to provide iterative feedback to each other. For example, if the content team finds a loophole in the scoring model, they need to be able to alert the other teams so that it can be resolved.
  • Be Prepared – Be sure to have a backup plan in case your item writing sessions hit a snag. Know what you are going to do if an item writer does not complete an assignment or if content is compromised.
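One simple way to distribute assignments evenly is to deal out each content area's items to writers in rotation. The function and data layout below are hypothetical, a sketch rather than a feature of any authoring tool: it assumes you already know how many items to draft per content area. Because the rotation carries over from one area to the next, each writer gets a near-equal share of every area and of the total workload.

```python
from collections import defaultdict
from itertools import cycle


def assign_items(area_targets, writers):
    """Spread item-writing assignments across writers in rotation.

    area_targets: dict of content area -> number of items to draft
    writers: list of writer names
    Returns dict of writer -> dict of content area -> item count.
    """
    assignments = {w: defaultdict(int) for w in writers}
    rotation = cycle(writers)  # keeps rotating across areas, balancing totals
    for area, count in area_targets.items():
        for _ in range(count):
            assignments[next(rotation)][area] += 1
    return {w: dict(counts) for w, counts in assignments.items()}


print(assign_items({"Safety": 6, "Procedures": 3}, ["Ana", "Ben", "Chen"]))
# {'Ana': {'Safety': 2, 'Procedures': 1}, 'Ben': {'Safety': 2, 'Procedures': 1}, 'Chen': {'Safety': 2, 'Procedures': 1}}
```

In practice you might weight the rotation by each writer's availability or expertise; the even split here is just the simplest starting point.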

Many of the details of the item drafting process will depend on your item types, resources, schedule, authoring software, and availability of item writers. Determine what you need to accomplish, and then organize your item writing sessions as much as possible so that you meet your goals.

In my next post, I will discuss the benefits of conducting an initial editorial review of the draft items before they are sent to review committees.