7 actionable steps for making your assessments more trustable

Posted by John Kleeman

Questionmark has recently published a white paper on trustable assessment, and we blog about this topic frequently. See "Reliability and validity are the keys to trust" and "The key to reliability and validity is authoring" for some recent blog posts about the white paper.

But what can you do today if you want to make your assessments more trustable? Obviously you can read the white paper! But here are seven actionable steps that, if you're not doing them already, you could take today (or at least reasonably quickly) to improve your assessments.

1. Organize questions in an item bank with topic structure

If you are using Questionmark software, you are probably doing this already. Putting questions in an item bank structured by hierarchical topics gives you an easy management view of all questions and assessments under development. It allows you to use the same question in multiple assessments, to add and retire questions easily, and to search questions, for example to find the ones that need updating when laws change or a product is retired.

2. Use questions that apply knowledge in the job context

It is better to ask questions that check how people can apply knowledge in the job context than just to find out whether they have specific knowledge. See my earlier post Test above knowledge: Use scenario questions for some tips on this. If you currently just test on knowledge and not on how to apply that knowledge, make today the day that you start to change!

3. Have your subject matter experts directly involved in authoring

Especially in an area where there is rapid change, you need subject matter experts directly involved in authoring and reviewing questions. Whether you use Questionmark Live or another system, start involving them.

4. Set a pass score fairly

Setting a pass score fairly is critical to being able to trust an assessment’s results. See Is a compliance test better with a higher pass score? and Standard Setting: A Keystone to Legal Defensibility for some starting points on setting good pass scores. And if you don’t think you’re following good practice, start to change.

5. Use topic scoring and feedback

As Austin Fossey explained in his ground-breaking post Is There Value in Reporting Subscores?, you do need to check whether it is sensible to report topic scores. But in most cases, topic scores and topic feedback can be very useful and actionable – they direct people to where there are problems or where improvement is needed.
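To make topic scoring concrete, here is a minimal illustrative sketch (the function, item IDs and topic names are hypothetical, not part of any Questionmark product): a topic subscore is simply the percentage correct among the items tagged with that topic.

```python
# Illustrative sketch: per-topic subscores from one participant's scored
# responses. Each response maps an item ID to 1 (correct) or 0 (incorrect);
# each item is tagged with a topic. All names and data are made up.

def topic_scores(responses, item_topics):
    """Return {topic: percent correct} for one participant's scored responses."""
    totals = {}  # topic -> [points earned, points possible]
    for item_id, score in responses.items():
        t = totals.setdefault(item_topics[item_id], [0, 0])
        t[0] += score
        t[1] += 1
    return {topic: 100.0 * earned / possible
            for topic, (earned, possible) in totals.items()}

item_topics = {"q1": "Safety", "q2": "Safety", "q3": "Products", "q4": "Products"}
responses = {"q1": 1, "q2": 0, "q3": 1, "q4": 1}
print(topic_scores(responses, item_topics))  # Safety: 50%, Products: 100%
```

A report built this way points the participant straight at the weaker topic (Safety here) rather than just giving an overall score.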

6. Define a participant code of conduct

If people cheat, it makes assessment results much less trustable. As I explained in my post What is the best way to reduce cheating?, setting up a participant code of conduct (or honesty code) is an easy and effective way of reducing cheating. What can you do today to encourage your test takers to believe your program is fair and be on your side in reducing cheating?

7. Run item analysis and weed out poor items

This is something that all Questionmark users could do today. Run an item analysis report (it takes just a minute or two from our interfaces) and look at the questions that are flagged as needing review (usually amber or red). Review them to check they are appropriate, and either improve them or retire them from your pool.
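For readers curious what an item analysis computes under the hood, classical item statistics boil down to two numbers per question: difficulty (the proportion of participants answering correctly, the p-value) and discrimination (the correlation between the item score and the rest of the test score). The sketch below is an illustrative approximation with rule-of-thumb flag thresholds, not Questionmark's actual report logic:

```python
# Illustrative classical item analysis: difficulty (p-value) and item-rest
# point-biserial discrimination, with rule-of-thumb review flags.
# Data and thresholds are examples only.
from statistics import mean, pstdev

def analyze(score_matrix):
    """score_matrix[participant][item] is 1 (correct) or 0 (incorrect)."""
    n_items = len(score_matrix[0])
    totals = [sum(row) for row in score_matrix]
    report = []
    for i in range(n_items):
        item = [row[i] for row in score_matrix]
        rest = [t - x for t, x in zip(totals, item)]  # total excluding this item
        p = mean(item)  # difficulty: proportion answering correctly
        sx, sy = pstdev(item), pstdev(rest)
        if sx == 0 or sy == 0:
            r = 0.0  # no variance, correlation undefined
        else:
            cov = mean(x * y for x, y in zip(item, rest)) - mean(item) * mean(rest)
            r = cov / (sx * sy)  # point-biserial discrimination
        flag = "review" if p < 0.3 or p > 0.9 or r < 0.2 else "ok"
        report.append((i + 1, round(p, 2), round(r, 2), flag))
    return report

# Four participants, three items: item 3 (p = 0.25) gets flagged for review.
for row in analyze([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]):
    print(row)
```

Questions flagged here are candidates for exactly the review-and-retire step described above; very easy or very hard items, or items that don't discriminate between strong and weak performers, are the usual suspects.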

Questionmark item analysis report

 

Many of you will probably be doing all the above and more, but I hope that for some of you this post could be a spur to action to make your assessments more trustable. Why not start today?

The key to reliability and validity is authoring

Posted by John Kleeman

In my earlier post I explained how reliability and validity are the keys to trustable assessment results. A reliable assessment is consistent; a valid assessment measures what you need it to measure.

The key to validity and reliability starts with the authoring process. If you do not have a repeatable, defensible process for authoring questions and assessments, then however good the other parts of your process are, you will not have valid and reliable assessments.

The critical value that Questionmark brings is its structured authoring processes, which enable effective planning, authoring and reviewing of questions and assessments and make them more likely to be valid.

Questionmark’s white paper “Assessment Results You Can Trust” suggests 18 key authoring measures for making trustable assessments – here are three of the most important.

Organize items in an item bank with topic structure

There are huge benefits to using an assessment management system with an item bank that structures items by hierarchical topics as this facilitates:

  • An easy management view of all items and assessments under development
  • Mapping of topics to relevant organizational areas of importance
  • Clear references from items to topics
  • Use of the same item in multiple assessments
  • Simple addition of new items within a topic
  • Easy retiring of items when they are no longer needed
  • Version history maintained for legal defensibility
  • Search capabilities to identify questions that need updating when laws change or a product is retired

Some stand-alone e-Learning creation tools and some LMSs do not provide an item bank and require you to insert questions individually within an assessment. If you only have a handful of assessments, or rarely need to update them, such systems can work; but anyone with more than a few assessments needs an item bank to make effective assessments.
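As a rough illustration of what the list above buys you (the class, item IDs and topic paths below are hypothetical, not Questionmark's actual data model), even a minimal item bank structured by hierarchical topic paths supports reuse, retiring and search:

```python
# Minimal illustrative item bank with hierarchical topic paths.
# All names and data are hypothetical.
class ItemBank:
    def __init__(self):
        # item_id -> {"text": ..., "topic": "Parent/Child", "status": ...}
        self.items = {}

    def add(self, item_id, text, topic):
        self.items[item_id] = {"text": text, "topic": topic, "status": "active"}

    def retire(self, item_id):
        # Mark retired rather than deleting, preserving history for defensibility.
        self.items[item_id]["status"] = "retired"

    def search(self, keyword):
        """Active items mentioning a keyword, e.g. when a law or product changes."""
        return [i for i, item in self.items.items()
                if item["status"] == "active" and keyword.lower() in item["text"].lower()]

    def in_topic(self, topic_prefix):
        """All active items under a topic subtree, e.g. 'Compliance'."""
        return [i for i, item in self.items.items()
                if item["status"] == "active" and item["topic"].startswith(topic_prefix)]

bank = ItemBank()
bank.add("q1", "Which form must be filed under the 2014 safety law?", "Compliance/Safety")
bank.add("q2", "Name two features of the Model X widget.", "Products/Widgets")
bank.retire("q2")                   # product discontinued
print(bank.search("safety law"))    # ["q1"]
print(bank.in_topic("Compliance"))  # ["q1"]
```

The same item records can then be referenced from any number of assemblies of assessments, which is exactly what question-per-assessment tools cannot do.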

An authoring tool subject matter experts can use directly

One of the critical factors in making successful items is to get effective input from subject matter experts (SMEs), as they are usually more knowledgeable and better able to construct and review questions than learning technology specialists or general trainers.

If you can use a system like Questionmark Live to harvest or “crowdsource” items from SMEs and have learning or assessment specialists review them, your items will be of better quality.

Easy collaboration for item reviewers to help make items more valid

Items will be more valid if they have been properly reviewed. They will also be more defensible if the past changes are auditable. A track-changes capability, like that shown in the example screenshot below, is invaluable to aid the review process. It allows authors to see what changes are being proposed and to check they make sense.

Screenshot of track changes functionality in Questionmark Live

These three capabilities – an item bank, an authoring tool SMEs can access directly, and easy collaboration with “track changes” – are critical for obtaining reliable, valid and therefore trustable assessments.

For more information on how to make trustable assessments, see our white paper “Assessment Results You Can Trust”.

Reliability and validity are the keys to trust


Not reliable

Posted by John Kleeman

How can you trust assessment results? The two keys are reliability and validity.

Reliability explained

An assessment is reliable if it measures the same thing consistently and reproducibly. If you were to deliver an assessment with high reliability to the same participant on two occasions, you would be very likely to reach the same conclusions about the participant’s knowledge or skills. A test with poor reliability might result in very different scores across the two instances.

An unreliable assessment does not measure anything consistently and cannot be used for any trustable measure of competency. It is useful visually to think of a dartboard: in the diagram to the right, darts have landed all over the board; they are not reliably in any one place.

In order for an assessment to be reliable, there needs to be a predictable authoring process, effective beta testing of items, trustworthy delivery to all the devices used to give the assessment, good-quality post-assessment reporting and effective analytics.
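Reliability is often quantified with an internal-consistency statistic such as Cronbach's alpha, which compares the variance of individual item scores with the variance of total scores. A minimal illustrative computation follows (the data are made up, and alpha is only one of several reliability estimates):

```python
# Illustrative: Cronbach's alpha, a common internal-consistency estimate of
# reliability. score_matrix[participant][item] holds points earned per item.
from statistics import pvariance

def cronbach_alpha(score_matrix):
    k = len(score_matrix[0])  # number of items
    item_vars = sum(pvariance([row[i] for row in score_matrix]) for i in range(k))
    total_var = pvariance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Four participants, three items scored 0/1; higher alpha means more consistency.
print(cronbach_alpha([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```

Values closer to 1 indicate the items hang together and the darts are landing in one place; a low alpha is the statistical equivalent of darts scattered all over the board.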

Validity explained


Reliable but not valid

Being reliable is not good enough on its own. The darts in the dartboard in the figure to the right are in the same place, but not in the right place. A test can be reliable but not measure what it is meant to measure. For example, you could have a reliable assessment that tested for skill in word processing, but this would not be valid if used to test machine operators, as writing is not one of the key tasks in their jobs.

An assessment is valid if it measures what it is supposed to measure. So if you are measuring competence in a job role, a valid assessment must align with the knowledge, skills and abilities required to perform the tasks expected of a job role. In order to show that an assessment is valid, there must be some formal analysis of the tasks in a job role and the assessment must be structured to match those tasks. A common method of performing such analysis is a job task analysis, which surveys subject matter experts or people in the job role to identify the importance of different tasks.

Assessments must be reliable AND valid

Trustable assessments must be reliable AND valid.


Reliable and valid

The darts in the figure to the right are in the same place, and in the right place.

When you are constructing an assessment for competence, you are looking for it to consistently measure the competence required for the job.

Comparison with blood tests

It is helpful to consider what happens if you go to the doctor with an illness. The doctor goes through a process of discovery, analysis, diagnosis and prescription. As part of the discovery process, sometimes the doctor will order a blood test to identify if a particular condition is present, which can diagnose the illness or rule out a diagnosis.

It takes time and resources to do a blood test, but it can be an invaluable piece of information. A great deal of effort goes into making sure that blood tests are both reliable (consistent) and valid (measure what they are supposed to measure). For example, just like exam results, blood samples are labelled carefully, as shown in the picture, to ensure that patient identification is retained.

A blood test that was not reliable would be dangerous—a doctor might think that a disease is not present when it is. Furthermore, a reliable blood test used for the wrong purpose is not useful—for example, there is no point in having a test for blood glucose level if the doctor is trying to see if a heart attack is imminent.

The blood test results are a single piece of information that helps the doctor make the diagnosis in conjunction with other data from the doctor’s discovery process.

In exactly the same way, a test of competence is an important piece of information to determine if someone is competent in their job role.

Using the blood test metaphor, it is easy to understand the personnel and organizational risks that can result from making decisions based on untrustworthy results. If an organization assesses someone’s knowledge, skill or competence for health and safety or regulatory compliance purposes, it needs to ensure that the assessment is designed correctly and runs consistently, which means the assessment must be reliable and valid.

For assessments to be reliable and valid, it is necessary that you follow structured processes at each step from planning through authoring to delivery and reporting. These processes are explained in our new white paper “Assessment Results You can Trust” and I’ll be sharing some of the content in future articles in this blog.

For fuller information, you can download the white paper here.

Assessment Security: 5 Tips from Napa

Posted by John Kleeman

Assessment security has been an important topic at Questionmark, and that was echoed at the Questionmark Users Conference in Napa last week. Here is some advice I heard from attendees:

  •  Tip 1: It’s common to include an agreement page at the start of the assessment, where the participant agrees to follow the assessment rules, to keep exam content confidential and not to cheat. This discourages cheating by reducing people’s ability to rationalize that it’s okay to do so and also removes the potential for someone to claim they didn’t know the rules.
  • Tip 2: It’s a good idea to have a formal agreement with SMEs in your organization who author or review questions, reminding them not to pass the questions to others. If they are employees, involve your HR and legal departments in drafting the agreements or notices; that way, if someone leaks content, you have HR and legal on board to deal with the disciplinary consequences.
  • Tip 3: It’s prudent to use the capabilities of Questionmark software to restrict access to the item bank by topic. Only give authors access to the parts they are working on, to avoid inadvertent or deliberate content leakage.
  • Tip 4: There is increasing interest in, and practical application of, hybrid strategies for proctored and unproctored tests that let you focus anti-cheating resources on risk. For example, you might screen participants with quizzes, then give unproctored tests, and give those who pass a proctored test. Or you might deliver a series of short exams at periodic intervals, to make it harder for people to get proxy test takers to impersonate them. There is also a lot of interest in online proctoring, where people can take exams at home or in the office and be proctored remotely using video monitoring. This reduces travel time and is often more secure than face-to-face proctoring.
  • Tip 5: If your assessment system is on-premises (behind the firewall), check regularly with your IT department that they are providing the security you need and that they are backing up your data. Most internal IT departments are hugely competent, but as people change jobs over time, there is a risk that your IT department loses touch with what the assessment application is used for. One user shared how their IT team failed to back up the Questionmark database, so when the server failed, they lost their data and had to restart from scratch. I’m sure this particular issue won’t happen for others, but IT teams have a wide set of priorities, so it’s good to check in with them.

There was lots more at the conference – iPads becoming mainstream for authoring and administration as well as delivery, people using OData to get access to Questionmark data, Questionmark being used to test the knowledge of soccer referees and some good thinking on balancing questions at higher cognitive levels.

One thing that particularly interested me was anecdotal evidence that having an internal employee certification program reduces attrition. Certification makes employees feel more valued and more satisfied, and so less likely to leave for a new job elsewhere. A couple of attendees shared that their internal statistics showed this.

This mirrors external research I’ve seen – for example, the Aberdeen Group has published research suggesting that best-in-class organizations use assessments around twice as often as laggard organizations, and that first-year retention for best-in-class organizations is around 89% vs. 76% for laggards.

For more information on security, download the white paper: Delivering Assessments Safely and Securely.

 

New white paper: Assessment Results You Can Trust

Posted by John Kleeman

Questionmark has published an important white paper about why trustable assessment results matter and how an assessment management system like Questionmark’s can help you make your assessments valid and reliable — and therefore trustable.

The white paper, which I wrote together with Questionmark CEO Eric Shepherd, explains that trustable assessment results must be both valid (measuring what you are looking for them to measure) and reliable (consistently measuring what you want to be measured).

The paper draws upon the metaphor of a doctor using results from a blood test to diagnose an illness and then prescribe a remedy. Delays will occur if the doctor orders the wrong test, and serious consequences could result if the test’s results are untrustworthy. Using this metaphor, it is easy to understand the personnel and organizational risks that can stem from making decisions based on untrustworthy results. If you assess someone’s knowledge, skill or competence for health and safety or regulatory compliance purposes, you need to ensure that your assessment instrument is designed correctly and runs consistently.

Engaging subject matter experts to generate questions to measure the knowledge, skills and abilities required to perform essential tasks of the job is essential in creating the initial pool of questions. However, subject matter experts are not necessarily experts in writing good questions, so an effective authoring system requires a quality control process which allows assessment experts (e.g. instructional designers or psychometricians) to easily review and amend assessment items.

For assessments to be valid and reliable, it’s necessary to follow structured processes at each step from planning through authoring to delivery and reporting.

The white paper covers these six stages of the assessment process:

  • Planning assessment
  • Authoring items
  • Assembling assessment
  • Pilot and review
  • Delivery
  • Analyze results

Following the advice in the white paper and using the capabilities it describes will help you produce assessments that are more valid and reliable — and hence more trustable.

Modern organizations need their people to be competent.

Would you be comfortable in a high-rise building designed by an unqualified architect? Would you fly in a plane whose pilot hadn’t passed a flying test? Would you let someone operate a machine in your factory if they didn’t know what to do if something went wrong? Would you send a salesperson out on a call if they didn’t know what your products do? Can you demonstrate to a regulatory authority that your staff are competent and fit for their jobs if you do not have trustable assessments?

In all these cases and many more, it’s essential to have a reliable and valid test of competence. If you do not ensure that your workforce is qualified and competent, then you should not be surprised if your employees have accidents, cause your organization to be fined for regulatory infractions, give poor customer service or can’t repair systems effectively.

To download the white paper, click here.

John will be talking more about trustable assessments at our 2015 Users Conference in Napa next month. Register today for the full conference, but if you cannot make it, make sure to catch the live webcast.

Item Development – Organizing a bias review committee (Part 2)

Posted by Austin Fossey

The Standards for Educational and Psychological Testing describe two facets of an assessment that can result in bias: the content of the assessment and the response process. These are the areas on which your bias review committee should focus. You can read Part 1 of this post here.

Content bias is often what people think of when they think about examples of assessment bias. This may pertain to item content (e.g., students in hot climates may have trouble responding to an algebra scenario about shoveling snow), but it may also include language issues, such as the tone of the content, differences in terminology, or the reading level of the content. Your review committee should also consider content that might be offensive or trigger an emotional response from participants. For example, if an item’s scenario described interactions in a workplace, your committee might check to make sure that men and women are equally represented in management roles.

Bias may also occur in the response processes. Subgroups may have differences in responses that are not relevant to the construct, or a subgroup may be unduly disadvantaged by the response format. For example, an item that asks participants to explain how they solved an algebra problem may be biased against participants for whom English is a second language, even though they might be employing the same cognitive processes as other participants to solve the algebra. Response process bias can also occur if some participants provide unexpected responses to an item that are correct but may not be accounted for in the scoring.

How do we begin to identify content or response processes that may introduce bias? Your sensitivity guidelines will depend upon your participant population, applicable social norms, and the priorities of your assessment program. When drafting your sensitivity guidelines, you should spend a good amount of time researching potential sources of bias that could manifest in your assessment, and you may need to periodically update your own guidelines based on feedback from your reviewers or participants.

In his chapter in Educational Measurement (4th ed.), Gregory Camilli recommends the chapter on fairness in the ETS Standards for Quality and Fairness and An Approach for Identifying and Minimizing Bias in Standardized Tests (Office for Minority Education) as sources of criteria that could be used to inform your own sensitivity guidelines. If you would like to see an example of one program’s sensitivity guidelines that are used to inform bias review committees for K12 assessment in the United States, check out the Fairness Guidelines Adopted by PARCC (PARCC), though be warned that the document contains examples of inflammatory content.

In the next post, I will discuss considerations for the final round of item edits that will occur before the items are field tested.

Check out our white paper: 5 Steps to Better Tests for best practice guidance and practical advice for the five key stages of test and exam development.

Austin Fossey will discuss test development at the 2015 Users Conference in Napa Valley, March 10-13. Register before Dec. 17 and save $200.