Ten Key Considerations for Defensibility and Legal Certainty for Tests and Exams

Posted by John Kleeman

In my previous post, Defensibility and Legal Certainty for Tests and Exams, I described the concepts of Defensibility and Legal Certainty for tests and exams. Making a test or exam defensible means ensuring that it can withstand legal challenge. Legal certainty relates to whether laws and regulations are clear and precise and people can understand how to conduct themselves in accordance with them. Lack of legal certainty can provide grounds to challenge test and exam results.

Questionmark has just published a new best practice guide on Defensibility and Legal Certainty for Tests and Exams. This blog post describes ten key considerations when creating tests and exams that are defensible and encourage legal certainty.

1. Documentation

Without documentation, it will be very hard to defend your assessment in court, as you will have to rely on people’s recollections. It is important to keep records of the development of your tests and ensure that these records are updated so that they accurately reflect what you are doing within your testing programme. Such records will be powerful evidence in the event of any dispute.

2. Consistent procedures

Testing is more a process than a project. Tests are typically created and then updated over time, and it’s important that procedures stay consistent. For example, a question added to the test after its initial development should go through the same procedures as the questions written when the test was first developed. If you adopt an ad hoc approach to test design and delivery, you expose yourself to an increased risk of successful legal challenge.

3. Validity

Validity, reliability and fairness are the three generally accepted principles of good test design. Broadly speaking, validity is how well the assessment matches its purpose. If your tests and exams lack validity, they will be open to legal challenge.

4. Reliability

Reliability is a measure of the precision and consistency of an assessment, and it is also critical. There are many posts explaining reliability and validity on this blog; a useful one is Understanding Assessment Validity and Reliability.
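As one concrete illustration (a standard statistic, given here as an example; the guide does not prescribe a particular measure), the internal consistency of a test is often estimated with Cronbach’s alpha:

\[
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right)
\]

where \(k\) is the number of questions, \(\sigma_i^2\) is the variance of scores on question \(i\), and \(\sigma_X^2\) is the variance of total test scores. Values closer to 1 indicate more consistent measurement; a common rule of thumb for high-stakes tests is to aim for around 0.8 or above.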

5. Fairness (or equity)

Probably the biggest cause of legal disputes over assessments is whether they are fair or not. The international standard ISO 10667-1:2011 defines equity as the “principle that every assessment participant should be assessed using procedures that are fair and, as far as possible, free from subjectivity that would make assessment results less accurate”. A significant part of fairness/equity is that a test should not advantage or disadvantage individuals because of characteristics irrelevant to the competence or skill being measured.

6. Job and task analysis

The skills and competences needed for a job change over time. Job and task analysis are techniques used to analyse a job and identify the key tasks performed and the skills and competences needed. If you use a test for a job without some kind of analysis of job skills, it will be hard to demonstrate and defend that the test is actually appropriate to measure someone’s competence and skills for that job.

7. Set the cut or pass score fairly

It is important that you have evidence to reasonably justify that the cut score used to divide pass from fail does genuinely distinguish the minimally competent from those who are not competent. You should not just choose a score of 60%, 70% or 80% arbitrarily, but instead you should work out the cut score based on the difficulty of questions and what you are measuring.
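For example, one well-established standard-setting approach is the Angoff method (named here as an illustration; the guide does not mandate a particular method): subject matter experts estimate, for each question, the probability that a minimally competent candidate would answer it correctly, and the cut score is derived from those estimates. A minimal sketch:

```python
# Minimal sketch of an Angoff-style cut score calculation.
# Each value is one SME's estimate of the probability that a minimally
# competent candidate answers that question correctly.
angoff_ratings = [
    [0.70, 0.65, 0.75],  # question 1, three SMEs
    [0.40, 0.50, 0.45],  # question 2
    [0.90, 0.85, 0.80],  # question 3
]

# Average the SME estimates per question, then sum across questions to get
# the expected score of a minimally competent candidate.
question_means = [sum(r) / len(r) for r in angoff_ratings]
cut_score = sum(question_means)

print(f"Cut score: {cut_score:.2f} out of {len(angoff_ratings)}")
# -> 2.00 out of 3, i.e. a pass mark of about 67% for this tiny example
```

Documenting ratings like these is itself part of the defensibility evidence: it shows the pass mark was derived from expert judgement rather than chosen arbitrarily.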

8. Test more than just knowledge recall

Most real-world jobs and skills need more than just knowing facts. Questions that test remember/recall skills are easy to write, but they only measure knowledge. For most tests, it is important to include a wider range of skills. This can be done with conventional questions that test above the knowledge level, or with other kinds of tests such as observational assessments.

9. Consider more than just multiple choice questions

Multiple choice tests can assess well; however, in some regions multiple choice questions sometimes get a “bad press”. As you design your test, you may want to consider including enhanced stimulus material and a variety of question types (e.g. matching or fill-in-blanks) to reduce measurement error and enhance stakeholder satisfaction.

10. Robust and secure test delivery process

A critical part of the chain of evidence is to be able to show that the test delivery process is robust, that the scores are based on answers genuinely given by the test-taker and that there has been no tampering or mistakes. This requires that the software used to deliver the test is reliable and dependably records evidence including the answers entered by the test-taker and how the score is calculated. It also means that there is good security so that you have evidence that the right person took the test and that risks to the integrity of the test have been mitigated.
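To make this concrete, here is a minimal sketch of the kind of session record such a chain of evidence implies. The field names are hypothetical, for illustration only, and do not represent any vendor’s actual data format:

```python
import hashlib
import json

# Hypothetical audit record for one test session (illustrative fields only).
session_record = {
    "test_taker_id": "C-10482",
    "identity_check": "photo ID verified by proctor",
    "started_at": "2019-05-01T09:00:00Z",
    "submitted_at": "2019-05-01T10:25:00Z",
    "responses": {"Q1": "B", "Q2": "D", "Q3": "A"},
    "answer_key_version": 3,
    "score": 2,
}

# Storing a cryptographic hash of the record alongside it lets you show
# later that the stored answers and score have not been tampered with.
record_bytes = json.dumps(session_record, sort_keys=True).encode("utf-8")
print(hashlib.sha256(record_bytes).hexdigest())
```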

For more on these considerations, please check out our best practice guide on Defensibility and Legal Certainty for Tests and Exams, which also contains some legal cases to illustrate the points. You can download the guide HERE – it is free with registration.

Defensibility and Legal Certainty for Tests and Exams

Posted by John Kleeman

Questionmark has just published a new best practice guide on Defensibility and Legal Certainty for Tests and Exams. Download the guide HERE.

We are all familiar with the concept of a chain of custody for evidence in a criminal case. If the prosecution seeks to provide evidence to a court of an object found at a crime scene, they will carefully document its provenance and what has happened to it over time, to show that the object offered as evidence at court is the object recovered from the crime scene.

There is a useful analogy between this concept and defensibility and legal certainty in tests and exams. Assessments have a “purpose” or a “goal”, for example, the need to check a person’s competence before allowing them to perform a job task. It is important that an assessment programme defines its purpose clearly, ensures that this purpose is then enshrined in the design of the test or exam, and checks that the assessment and its delivery are consistent with the defined purpose. Essentially, there should be a chain from purpose to design to delivery to decision, which makes the end decision defensible. If you follow that chain, your assessments may be defensible and legally certain; if that chain has breaks or gaps, then your assessments are likely to become less certain and more legally vulnerable.

Defensibility of assessments

Defensibility, in the context of assessments, concerns the ability of a testing organisation to withstand legal challenges. These legal challenges may come from individuals or groups who claim that the organisation itself, the processes followed (e.g. administration, scoring or setting pass scores), or the outcomes of the testing (e.g. whether a person is certified or not) are not legally valid. Essentially, defensibility has to do with the question: “Are the assessment results, and more generally the testing program, defensible in a court of law?”

Ensuring that assessments are defensible means ensuring that they are valid, reliable and fair, and that you have evidence and documentation available to demonstrate this in case of a challenge.

Legal certainty for assessments

Legal certainty (“Rechtssicherheit” in German) means that the law (or other rules) must be certain, in that the law is clear and precise, and its legal implications foreseeable. If there is legal certainty, people should understand how to conduct themselves in accordance with the law. This contrasts with legal indeterminacy, where the law is unclear and may require a court’s ruling to determine what it means.

  • Lack of legal certainty can provide grounds to challenge assessment results. For instance, many organisations have rules for how they administer assessments or make decisions based on the results. A test-taker might claim that the organisation has not followed its own rules or that the rules are ambiguous.
  • Some public bodies are constrained by law and can only deliver assessments in ways that laws and regulations permit; if they veer from this, they can be challenged on legal certainty grounds.
  • Legal certainty issues can also arise if the exam process goes awry. For example, someone might claim that their answers were swapped with those of another test-taker, or that the exam was unfair because the user interface was confusing, e.g. they pressed a button that submitted their answers and finished the exam before they intended to.

The best practice guide describes the principles and key steps to make assessments that are defensible and that provide legal certainty, and which are less likely to be successfully challenged in courts. The guide focuses primarily on assessments used in the workplace and in certification. It focuses particularly on legal cases and issues in Europe but will also be relevant in other regions.

You can download the guide HERE – it is free with registration.

FAQ – “Testing Out” of Training

Posted by Kristin Bernor

Let’s explore what it means to “test out”, what the business benefits are, and how Questionmark enables you to do this in a simple, timely and valid manner.

“Testing out” of training saves time and money by allowing participants to forego unneeded training. It also makes training more valid and respected, and so more likely to impact behavior, because it focuses training on the people who need it and frees those who already know the material to learn additional knowledge, skills and abilities.

The key to “testing out” of training is that the test properly measures what the training teaches. If it does, then someone who passes the test has demonstrated that they already know the material and doesn’t need to do the training. Testing out can be a hard sell when the test doesn’t really measure the same outcomes as the training: passing the test then doesn’t show that you know the material. So, the key is to write a good test.

Online assessments are about both staying compliant with regulatory requirements AND giving business value. Assessments help ensure your workforce is competent and reduce risk, but they also give business value in improved efficiency, knowledge and customer service.

What does it mean to “test out” of training?

Many organizations create tests that allow participants to “test out” of training if they pass. Essentially, if you already know the material being taught, then you don’t need to spend time in the training. Testing people on material they already know is a waste of time, value and resources. Directing them to the training that is necessary ensures the candidate is motivated and feels they are spending their time wisely. Everyone wins!

Why is this so important? What are the advantages of incorporating “testing out”?

The key advantage of this approach is that you save time when people don’t have to attend the training that they don’t need. Time is money for most organizations, and saving time is an important benefit.

Suppose, for example, you have 1,000 people who need to take some training that lasts 2 hours. This is 2,000 hours of people’s time. Now, suppose you can give a 20-minute test that 25% of people pass and therefore skip the training. The total time taken is about 333 hours for the test and 1,500 hours for the training, which adds up to about 1,833 hours. So having one-fourth of the test-takers skip the training saves roughly 8% of the time that would have been required for everyone to attend the training.
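The arithmetic is easy to adapt to your own numbers. A quick sketch (the helper function is hypothetical, written just to make the trade-off reusable):

```python
# Hypothetical helper: hours saved by letting people test out of training.
def hours_saved(people, training_hours, test_minutes, pass_rate):
    baseline = people * training_hours                # everyone attends
    test_time = people * test_minutes / 60            # everyone takes the test
    training_time = people * (1 - pass_rate) * training_hours
    return baseline - (test_time + training_time)

saved = hours_saved(people=1000, training_hours=2, test_minutes=20, pass_rate=0.25)
print(f"{saved:.0f} hours saved ({saved / 2000:.0%} of the baseline)")
# -> 167 hours saved (8% of the baseline)
```

As the formula suggests, the savings grow with the pass rate and shrink with the length of the test, so a short, well-targeted test pays off fastest.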

In addition to saving time, using diagnostic tests in this way helps people who attend training courses focus their attention on areas they don’t know well and be more receptive to the training that is beneficial.

Is it appropriate to allow “testing out” of all training?

Obviously if you follow this approach, you’ll need to ensure that your tests are appropriate and sufficient – that they measure the right knowledge and skills that the training would otherwise cover.

You’ll need to check your regulations to confirm that this is permissible for you, but most regulators will see sense here.

How Questionmark can be used to “test out”

Online assessments are a consistent and cost-effective means of validating that your workforce knows the law, your procedures and your products. If you are required to document training, they are the most reliable way of doing so. When creating and delivering assessments within Questionmark, it’s quite simple to qualify a candidate once they reach a score threshold. If they correctly answer a series of items and pass the assessment, this indicates that further training is not needed. It is imperative that the assessment accurately tests for the requisite knowledge and skills that are part of the training objectives.

The candidate can then focus on training that is pertinent, worthwhile and beneficial to both themselves and the company. If they answer incorrectly and are unable to pass the assessment, then training is necessary until they are able to master the information and demonstrate this in a test.

How many errors can you spot in this survey question?

Posted by John Kleeman

Tests and surveys are very different. In a test, you look to measure participant knowledge or skill; you know what answer you are looking for, and generally participants are motivated to answer well. In a survey, you look to measure participant attitude or recollection; you don’t know what answer you are looking for, and participants may be uninterested.

Writing good surveys is an important skill. If you’re interested in how to write good opinion and attitude surveys for training, learning, compliance and certification, based on research evidence, you might be interested in a webinar I gave titled “Designing Effective Surveys.” Click HERE for the webinar recording and slides.

In the meantime, here’s a sample survey question. How many errors can you spot in the question?

The material and presentation qualty at Questionmark webinars is always excellent. Strongly Agree Agree Slightly agree Neither agree nor disagree Disagree Strongly disagree

There are quite a few errors. Try to count them before you look at my explanation below!

I count seven errors:

  1. I am sure you got the mis-spelling of “quality”. If you mis-spell something in a survey question, it indicates to the participant that you haven’t taken time and trouble writing your survey, so there is little incentive for them to spend time and trouble answering.
  2. It’s not usually sensible to use the word “always” in a survey question. Some participants may take the statement literally, and it’s much more likely that webinars are usually excellent than that every single one is excellent.
  3. The question is double-barreled: it asks about material AND presentation quality, which might differ. It really should be two questions to get a consistent answer.
  4. The “Agree” in “Strongly Agree” is capitalized, but the word is not capitalized elsewhere, e.g. in “Slightly agree”. Capitalization should be consistent across the whole scale.

You can see these four errors highlighted below.

[Image: the sample question with the four errors above marked in red]

Is that all the errors? I count three more, making a total of seven:

  1. The scale should be balanced. Why is there a “Slightly agree” and not a “Slightly disagree”?
  2. This is a leading or “loaded” question, not a neutral one: it encourages a positive answer. If you genuinely want to get people’s opinion in a survey question, you need to ask it without encouraging the participant to answer a particular way.
  3. Lastly, any agree/disagree question suffers from acquiescence bias. Research evidence suggests that some participants are more likely to agree when answering survey questions, particularly those who are more junior or less educated, who may tend to assume that what they are asked is true. It would be better to word this question to ask people to rate the webinars rather than agree with a statement about them (one possible rewrite is shown below).
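For illustration, here is one possible rewrite (my wording, not taken from the webinar) that splits the item in two, drops the leading language and “always”, and replaces the agree/disagree format with a balanced rating scale:

How would you rate the quality of the material in Questionmark webinars you have attended?
Very poor / Poor / Fair / Good / Very good

How would you rate the quality of the presentation in Questionmark webinars you have attended?
Very poor / Poor / Fair / Good / Very good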

Did you get all of these? I hope you enjoyed this little exercise. If you did, I explain more about this and good survey practice in our Designing Effective Surveys webinar; click HERE for the webinar recording and slides.

Beyond Recall: Taking Competency Assessments to the Next Level

[Figure: Bloom’s revised taxonomy pyramid, from bottom to top: remember/recall, understand, apply, analyze, evaluate, create]

Posted by John Kleeman

A lot of assessments focus on testing knowledge or facts. Questions that ask for recall of facts do have some value: they check someone’s knowledge, and they help counter the forgetting curve for newly learned knowledge.

But for most jobs, knowledge is only a small part of the job requirements. As well as remembering or recalling information, people need to understand, apply, analyze, evaluate and create, as shown in Bloom’s revised taxonomy above. Most real-world jobs require many levels of the taxonomy, and if your assessments focus only on recalling knowledge, they may well not test job competence validly.

Evaluating includes exercising judgement, and using judgement is a critical part of the competence required in many job roles. Yet a lot of assessments don’t assess judgement; our webinar explains how you can do this.

There are many approaches to creating assessments that do more than test recall, including:

  • You can write objective questions which test understanding and application of knowledge, or analysis of situations. For example, you can present questions within real-life scenarios which require understanding the situation and working out how to apply knowledge and skills to answer it. It’s sometimes useful to use media such as video to bring the question closer to the performance environment.
  • You can use observational assessments, which allow an observer to watch someone perform a task and grade their performance. This allows assessment of practical skills as well as higher-level cognitive ones.
  • You can use simulations, which assess performance within a controlled environment closer to the real performance environment.
  • You can set up role-playing assessments, which are useful for customer service and other roles that require interpersonal skills.
  • You can assess people’s actual job performance, using 360-degree assessments or performance appraisal.

In our webinar, we will give an overview of these methods but will focus on a method which has long been used in pre-employment testing and is increasingly being used in post-hire training, certification and compliance testing. This method is Situational Judgement Assessments: questions carefully written to assess someone’s ability to exercise judgement within the domain of their job role.

It’s not just CEOs who need to exercise judgement and make decisions; almost every job requires an element of judgement. Many costly errors in organizations are caused by a failure of judgement. Even if people have the appropriate skill, experience and knowledge, they need to use judgement to apply them successfully; otherwise failures occur or successful outcomes are missed.

Situational Judgement Assessments (SJAs) present a dilemma to the participant (using text or video) and ask them to choose options in response. The dilemma needs to be relevant to the job, i.e. one where using judgement is clearly linked to a needed domain of knowledge, skill or competency in the job role. And the scoring needs to be based on subject matter experts’ agreement that the judgement chosen is the correct one to make.

[Figure: the structure of an SJA: context is defined (text or video); a dilemma that needs judgement; the participant chooses from options; a score or evaluation is made]
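To make the structure concrete, here is a hypothetical SJA item (my example, not one from the webinar):

You are a support engineer. Ten minutes before the end of your shift, a major customer reports a system outage. What is the best first action?
A. Log the issue and leave it for the next shift to pick up.
B. Acknowledge the customer, assess the severity, and hand over with a full briefing if you cannot resolve it before leaving.
C. Tell the customer to raise the issue again tomorrow.
D. Start a lengthy investigation on your own without telling anyone.

Subject matter experts would agree the scoring in advance, for example full credit for B and partial or no credit for the others.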

Situational Judgement Assessments can be a valid and reliable way of measuring judgement and can be presented in a standalone assessment or combined with other kinds of questions. If you’re interested in learning more, check out our webinar titled “Beyond Recall: Taking Competency Assessments to the Next Level.” You can download the webinar recording and slides HERE.

How is the SAP Global Certification program going? A re-interview with SAP’s manager of global certification, part 1.

Posted by Zainab Fayaz

Back in 2016, John Kleeman, Founder and Executive Director of Questionmark, interviewed Ralf Kirchgaessner, Manager of the SAP Global Certification program, about their use of Questionmark software in their Certification in the Cloud program and about their move to online proctoring. You can see the interview on the Questionmark blog here. We thought readers might be interested in an update, so here is a short interview between the two on how SAP are getting on three years later:

John: Could you give us an update on where you are with the Certification in the Cloud program?

Ralf: The uptake and adoption of Certification in the Cloud have been tremendous! Over the years we have seen a significant increase in the volume of candidates taking exams in the cloud; the numbers doubled from 2016 to 2017 and increased by almost 60% in 2018. This means more than 50% of SAP Global Certification exams are now taken remotely!

John: Are all your SAP Global Certification exams now available online in the cloud?

Ralf: Nearly so. By mid-2019 we plan to have the complete portfolio of SAP exams available in the cloud. This is great news for our learners who have invested in a Certification in the Cloud subscription. We will then have Certification in the Cloud not only for SAP SuccessFactors and SAP Ariba, but for all products, including SAP C/4HANA.

John: How many different languages are your exams translated into?

Ralf: This depends on the portfolio. Some of our certifications are available only in English, while others, such as those for SAP Business One, are translated into up to 20 languages.

John: How are you dealing with the fast pace of change within SAP software in a certification context? How do you ensure certifications stay up to date when the software changes?

Ralf: This is of course a challenge. In previous years, it was a case of getting certified once every few years. Now, however, you must keep your skills up to date and stay current with the quarterly release cycles of our SAP Cloud solutions. Also, for people who are first-timers or new entrants to the SAP ecosystem, it is important that they are certified on the latest quarterly release.

To help overcome this challenge, we have developed an agile approach to updating our exams; we use the Questionmark platform to help those who are new to the ecosystem get certified initially. We also have a very good process in place and often use the same subject matter experts to keep up with the pace of software changes.

For already certified professionals, another way to remain up to date is through our “Stay Current” program. For some of our solutions, partners have to come back every three months to show that they are staying current. They do this by taking a short “delta” knowledge assessment. For instance, for certified professionals of SAP SuccessFactors it is mandatory to stay current in order to get provisioning access to the software systems.

In 2018, SAP’s certification approach was recognized with the ITCC Innovation Award, an acknowledgement from industry peers at Microsoft, IBM and other companies.