Planning the Test – Test Design & Delivery Part 1

Posted By Doug Peterson

A lot more goes into planning a test than just writing a few questions. Reliability and validity should be established right from the start. An assessment’s results are considered reliable if they are dependable, repeatable, and consistent. The assessment is deemed to be valid if it measures the specific knowledge and skills that it is meant to measure. Take a look at the following graphic from the Questionmark white paper, Assessments through the Learning Process.

An assessment can be consistent, meaning that a participant will receive about the same score over multiple deliveries and that participants with similar knowledge levels will receive similar scores, yet still not be valid because it doesn’t measure what it’s supposed to measure (Figure 1). This assessment contains well-written questions, but the questions don’t actually measure the desired knowledge, skill or attitude. An example would be a geometry exam that contains questions about European history. They could be absolutely excellent questions, very well-written with a perfect level of difficulty … but they don’t measure the participant’s knowledge of geometry.

If an assessment is not reliable, it can’t be valid (Figure 2). If five participants with similar levels of knowledge receive five very different scores, the questions are probably poorly written, confusing or misleading. In this situation, there’s no way the assessment can be considered to be measuring what it’s supposed to be measuring.

Figure 3 represents the goal of assessment writing – an assessment made up of well-written questions that deliver consistent scores AND accurately measure the knowledge they are meant to measure. In this situation, our geometry exam would contain well-written questions about geometry, and a participant who passes with flying colors would, indeed, possess a high level of knowledge about geometry.
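The “consistent scores” half of that goal can be checked numerically. One common approach (not necessarily the one used in the white paper) is a test-retest correlation: deliver the same assessment twice and correlate the two sets of scores. The minimal Python sketch below assumes invented scores for five participants; a value near 1 means participants keep roughly the same relative standing, while a value near 0 or below means the scores are not repeatable.

```python
# Invented scores for five participants who took the same assessment twice.
# These numbers are hypothetical and exist only to illustrate the calculation.
first_delivery = [72, 85, 90, 64, 78]
second_delivery = [70, 88, 91, 66, 75]   # consistent: similar standing both times
erratic_delivery = [88, 64, 70, 90, 72]  # inconsistent: standings shuffle


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


print(f"consistent retest: r = {pearson(first_delivery, second_delivery):.2f}")
print(f"erratic retest:    r = {pearson(first_delivery, erratic_delivery):.2f}")
```

Note that a high correlation only speaks to reliability: the geometry exam full of European history questions could score well here and still not be valid.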

For an assessment to be valid, the assessment designer needs to know not just the specific purpose of the assessment (e.g., geometry knowledge) but also the target population of participants. Understanding the target population will help the designer ensure that the assessment is assessing what is supposed to be assessed and not extraneous information. Some things to take into account:

  • Job qualifications
  • Local laws/regulations
  • Company policies
  • Language localization
  • Reading level
  • Geographic dispersion
  • Comfort with technology

For example, let’s say you’re developing an assessment that will be used in several different countries. You don’t want to include American slang in a test being delivered in France; at that point you’re not measuring subject matter knowledge, you’re measuring knowledge of American slang. Another example would be if you were developing an assessment to be taken by employees whose positions only require minimal reading ability. Using “fancy words” and complicated sentence structure would not be appropriate; the test should be written at the level of the participants to ensure that their knowledge of the subject matter is being tested, and not their reading comprehension skills.

In my next installment, we’ll take a look at identifying content areas to be tested.

Five faves: Top blog posts cover assessment hot topics and best practices

Posted by Julie Delazyn

Blogging makes it easy for us here at Questionmark to pass along news about assessment and tips about best practices.

As our readership has grown, so has the conversation on LinkedIn, Facebook, Twitter and, most recently, Google+. What stories have generated the most buzz during the last couple of years? We thought it would be helpful to highlight our five most popular blog posts and make them all available to you on the same page:

Topic based feedback goes to the ball

Questionmark Chairman John Kleeman explains the untapped learning value in topic feedback.

12 Tips for Writing Good Test Questions

Writing effective questions takes time and practice. Joan Phaup highlights 12 tips for writing and reviewing reliable and defensible test questions.

How many items are needed for each topic in an assessment? How PwC decide.

John Kleeman takes a look at PwC’s (PricewaterhouseCoopers’) use of a five-stage model for diagnostic assessments and how it works.

Understanding Assessment Validity and Reliability

How can authors make sure they are producing valid, reliable assessments? I offer a few tips from the Questionmark White Paper, Assessments through the Learning Process.

What makes a good diagnostic question?

John Kleeman describes the common use of a diagnostic question as well as tactics for testing the quality of a question.

Feel free to comment and share. Let us know how these have helped you. And what are your favorite kinds of posts?

Webinar: Tips for improving test item quality and reducing time to market

Posted by Joan Phaup

Saving time, money and effort while at the same time improving the quality of assessments: what’s not to like?

Since we like this idea a great deal, we’re looking forward to a webinar next week by Tom Metzler from TIBCO Software, Inc. He will explain how the company has achieved this during a free, hour-long presentation, Using Principles of Enterprise Architecture to Build Assessments.

TIBCO’s knowledge assessments cover everything from general software development concepts to the principles of computer science architecture. The company’s fast-changing technical environment makes for volatile subject matter: a single technical product enhancement can have a significant impact on an existing item bank. Hence the need for a solid but adaptable test creation process.

Tom will share some architecture basics and explain the drivers for using them in assessment development:

  • To improve the assessment development business process
  • To improve the information provided to SMEs
  • To document business processes

He’ll also share the results of doing this:

  • Higher item quality
  • Reduced time-to-market
  • A better business process
  • More relevant information provided to SMEs
  • An enhanced, automated item-level audit trail

If you are looking for a better, more efficient way to produce assessments, please join us at 1 p.m. Eastern Time on Thursday, August 2. Click here for details and registration.

You can get more background on this subject from my interview with Tom prior to this year’s Questionmark Users Conference.

Preparing for exams like preparing for the Olympics?

Posted by John Kleeman

Staring out of the window from my desk in the Questionmark office in London, I can just about see the Olympics stadium. London has been preparing for the Olympics for years, and I hope you enjoy the show. Here’s a better view of the stadium than I have from my office!

It struck me recently that there are many similarities between how Questionmark users prepare for exams and the exacting task of preparing for the Olympics. Questionmark is in no way associated or connected with the London Olympics, but in the spirit of the games I’d like to share some thoughts.

Here are some similarities between preparing for the Olympics and preparing for exams.

Athletes prepare and practice for the Olympics, aiming to do their best at a key opportunity. Similarly, exam candidates prepare and practice, seeking to do their best.

At the Olympics, it’s essential to have a fair and open field, without any athlete being able to cheat or get an unfair advantage. It’s the same with exams.

At the Olympics, organizers need to prepare in case things go wrong and plan for all eventualities. So, too, with computerized exams, we have to plan well in advance and think through contingencies to be sure that everything goes well on the day.

Unfortunately, a few athletes do try to cheat at the Olympic Games, so strong anti-cheating measures are taken. Likewise, in computerized assessment, we need to put measures in place to make it difficult to cheat.

The Olympics need accurate records of results, to prove who won which race and as evidence of achievement. Likewise, users of computerized exams rely on Questionmark or other software to achieve valid, reliable, trustworthy exam results.

And here is one thing that is very different.

Although the Olympics may be the greatest show in the world, they are ultimately about sporting prowess and our entertainment. The stakes at the Olympics are very high for athletes and their fans, but team or individual results do not significantly impact the world at large.

The stakes are far higher for exam takers and those around them. People who pass exams go on to be critical members of our society – medical professionals, university graduates, banking executives, IT specialists and more. Exams are used to qualify people for key roles in our society. Just like the Olympics, most exams are offered for public access and allow top performers to participate and demonstrate their capability. You may not have billions of television viewers for your assessments, but what you are doing in running assessments may be just as important to society as the Olympic Games.

I hope you enjoy and are inspired by the London Olympic Games.

Ten assessment types that can help mitigate risk

Posted by Julie Delazyn

Mitigating risk, most notably the risk of non-compliance, is a key component of success in the financial services industry. Other risks abound, too, such as losing customers and/or good employees.

If employees don’t understand and follow the processes that organizations put in place to mitigate risk and maintain compliance, the risk of non-compliance increases – and a business is less likely to succeed.

Online assessments do a lot to help ensure that employees know the right procedures and follow them. Here are 10 assessment types that play essential roles here:

(1) Internal exams – check that your employees are competent

Some companies administer internal competency exams annually, and some do so more frequently. It’s also good to give these exams when regulations change or new products are introduced. These exams address compliance with competency requirements and at the same time help employees prove they know how to do their jobs.

(2) Knowledge checks – confirm learning and document understanding

Running knowledge checks or post-course tests (also called Level 2s) right after training helps you find out whether the training has been understood. This also helps reduce forgetting.

(3) Needs analysis / diagnostic tests – allow testing out

These tests, which measure current skills and knowledge about particular topics, can be used as training needs assessments and/or prerequisites for training. And if someone already has the critical skills and knowledge, he or she can “test out” and avoid unnecessary and costly training.

(4) Observational assessments – measure skills via role plays, customer visits

When checking practical skills, it’s common to have an observer monitor an employee to see if they are following correct procedures. With so many people using smartphones and tablets, such as the Apple iPad, it’s viable to use a mobile device for these assessments — which are great for measuring behavior, not just knowledge.

(5) Course evaluation surveys

These surveys, also called “level 1” or “smile sheet” surveys, let you check employee reaction following training. They are a key step in evaluating training effectiveness. You can also use them to gather qualitative information on topics, such as how well policies are applied in the field. Here is an example fragment from a course evaluation survey:

(6) Employee attitude surveys

Employee attitude surveys ask questions of your workforce or sections of it. HR departments often use them to measure employee satisfaction, but they can also be used in corporate compliance to determine attitudes about ethical and cultural issues.

(7) Job task analysis surveys – to fairly identify tasks against which to check compliance

How do you know that your competency assessments are valid and that they are addressing what is really needed for competence in a job role? A job task analysis (JTA) survey asks people who are experts in a job how important each task is for the job role and how often it is done. Analysis of JTA data lets you weight the number of questions associated with topics and tasks so that a competency test fairly reflects the importance of different elements of a job role. Here is an extract from a typical JTA:
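The weighting step can be illustrated with a short calculation. This is a minimal Python sketch, not Questionmark’s method: the task names and ratings are hypothetical, and it assumes each task’s weight is its average importance rating multiplied by its average frequency rating, which is only one of several possible weighting schemes. The questions are then split across tasks in proportion to those weights.

```python
# Hypothetical JTA results: (average importance, average frequency) on 1-5 scales.
# Task names, ratings, and the importance x frequency weighting are assumptions.
jta_ratings = {
    "Verify customer identity": (4.8, 4.5),
    "Report suspicious activity": (4.9, 2.0),
    "Process account transfers": (3.5, 4.0),
    "Archive records": (2.0, 3.0),
}


def allocate_questions(ratings, total_questions):
    """Split total_questions across tasks in proportion to importance x frequency.

    Uses largest-remainder rounding so the counts always sum to total_questions.
    """
    weights = {task: imp * freq for task, (imp, freq) in ratings.items()}
    total_weight = sum(weights.values())
    exact = {task: total_questions * w / total_weight for task, w in weights.items()}
    counts = {task: int(q) for task, q in exact.items()}  # round each share down
    leftover = total_questions - sum(counts.values())
    # Hand the remaining questions to the tasks with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for task in by_remainder[:leftover]:
        counts[task] += 1
    return counts


counts = allocate_questions(jta_ratings, 40)
for task, n in counts.items():
    print(f"{task}: {n} questions")
```

With these made-up ratings, a 40-question test would give the frequent, critical identity-verification task far more questions than record archiving. Largest-remainder rounding is used so that the per-task counts always add up to the intended test length.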

(8) Practice tests

These often use questions that are retired from the exam question pool but remain valid. Candidates can take practice tests to assess their study needs and/or gain experience with the technology and user interface before they take a real exam. This helps reduce exam anxiety, which is especially important for less computer-literate candidates. Practice tests are also helpful when deploying new exam delivery technology.

(9) Formative quizzes during learning – to help learning

These are the quizzes we are all familiar with: short quizzes during learning that tell instructors and learners whether the material has been understood or needs deeper instruction. Such quizzes can also diagnose misconceptions and help reduce forgetting.

(10) 360-degree assessments of employees

A 360-degree assessment solicits opinions about an employee’s competencies from his/her superiors, reports and peers. It will usually cover job-specific competencies and general competencies such as integrity and communication skills. In compliance, such surveys allow you to potentially identify issues in people’s behavior and competencies that need review.

For more in-depth coverage of this subject, read our white paper, The Role of Assessments in Mitigating Risk for Financial Services Organizations, which you can download free after login or sign-up.

Test Design and Delivery: Overview

Posted By Doug Peterson

I had the pleasure of attending an ASTD certification class on Test Design and Delivery in Denver, Colorado, several weeks ago (my wife said it was no big deal, as I’ve been certifiable for a long time now). I’m going to use my blog posts for the next couple of months to pass along the highlights of what I learned.

The content for the class was developed by the good folks at ACT. During our two days together we covered the following topics:

  1. Planning the Test
  2. Creating the Test Items
  3. Creating the Test Form
  4. Delivering the Test
  5. Evaluating the Test

Over the course of this blog series, we’ll take a look at the main points from each topic in the class. We’ll look at all the things that go into writing a test before the first question is crafted, like establishing reliability and validity from the beginning and identifying content areas to be covered (as well as the number of questions needed for each area).

Next we’ll discuss some best practices for writing test items, including increasing the cognitive load and avoiding bias and stereotypes. After that we’ll discuss pulling items together into a test form, including developing instructions and setting passing scores.

The last few blogs will focus on some things you need to look at when delivering a test, such as security and controlling item exposure. Then we’ll look at evaluating a test’s performance by examining item-level and test-level data to improve quality and assess reliability.

As we work our way through this series of blogs, be sure to ask questions and share your thoughts in the comments section!

Posts in this series:

  1. Planning the Test
  2. Determining Content
  3. Final Planning Considerations
  4. Writing Test Items
  5. Avoiding Bias and Stereotypes
  6. Preparing to Create the Assessment
  7. Assembling the Test Form
  8. Delivering the Test
  9. Content Protection and Secure Delivery
  10. Evaluating the Test