Ten Key Considerations for Defensibility and Legal Certainty for Tests and Exams

Posted by John Kleeman

In my previous post, Defensibility and Legal Certainty for Tests and Exams, I described the concepts of Defensibility and Legal Certainty for tests and exams. Making a test or exam defensible means ensuring that it can withstand legal challenge. Legal certainty relates to whether laws and regulations are clear and precise and people can understand how to conduct themselves in accordance with them. Lack of legal certainty can provide grounds to challenge test and exam results.

Questionmark has just published a new best practice guide on Defensibility and Legal Certainty for Tests and Exams. This blog post describes ten key considerations when creating tests and exams that are defensible and encourage legal certainty.

1. Documentation

Without documentation, it will be very hard to defend your assessment in court, as you will have to rely on people’s recollections. It is important to keep records of the development of your tests and ensure that these records are updated so that they accurately reflect what you are doing within your testing programme. Such records will be powerful evidence in the event of any dispute.

2. Consistent procedures

Testing is more a process than a project. Tests are typically created and then updated over time. It’s important that procedures remain consistent across those updates. For example, a question added to the test after its initial development should go through the same procedures as questions did during the original development. If you adopt an ad hoc approach to test design and delivery, you expose yourself to an increased risk of successful legal challenge.

3. Validity

Validity, reliability and fairness are the three generally accepted principles of good test design. Broadly speaking, validity is how well the assessment matches its purpose. If your tests and exams lack validity, they will be open to legal challenge.

4. Reliability

Reliability is a measure of precision and consistency in an assessment and is also critical. There are many posts explaining reliability and validity on this blog; a useful one is Understanding Assessment Validity and Reliability.
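
To make “precision and consistency” concrete, here is a minimal sketch (not from the original post; the function name and sample data are invented for illustration) of one widely used internal-consistency statistic, Cronbach’s alpha, computed from a matrix of item scores:

```python
# Illustrative sketch: Cronbach's alpha from an item-score matrix.
# Rows are test-takers, columns are questions; scores here are dichotomous (0/1).

def cronbach_alpha(scores):
    """scores: list of test-taker rows, each a list of per-item scores."""
    n_items = len(scores[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [var([row[i] for row in scores]) for i in range(n_items)]
    total_var = var([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Example: 4 test-takers answering 3 dichotomously scored items
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(scores), 2))  # prints 0.75
```

Values closer to 1 indicate that the items measure consistently; in practice an assessment platform would compute this over far larger response sets.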

5. Fairness (or equity)

Probably the biggest cause of legal disputes over assessments is whether they are fair or not. The International standard ISO 10667-1:2011 defines equity as the “principle that every assessment participant should be assessed using procedures that are fair and, as far as possible, free from subjectivity that would make assessment results less accurate”. A significant part of fairness/equity is that a test should not advantage or disadvantage individuals because of characteristics irrelevant to the competence or skill being measured.

6. Job and task analysis

The skills and competences needed for a job change over time. Job and task analysis are techniques used to analyse a job and identify the key tasks performed and the skills and competences needed. If you use a test for a job without some kind of analysis of job skills, it will be hard to prove and defend that the test actually measures someone’s competence and skills for that job.

7. Set the cut or pass score fairly

It is important that you have evidence to reasonably justify that the cut score used to divide pass from fail does genuinely distinguish the minimally competent from those who are not competent. You should not just choose a score of 60%, 70% or 80% arbitrarily, but instead you should work out the cut score based on the difficulty of questions and what you are measuring.
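
As an illustration of working out a cut score from question difficulty, here is a hedged sketch of one widely used standard-setting approach, a simplified Angoff-style method; the post itself does not prescribe a specific method, and the judges, ratings and question count below are invented:

```python
# Simplified Angoff-style standard setting: judges estimate the probability that
# a minimally competent candidate answers each question correctly; the cut score
# is the average of those estimates.

def angoff_cut_score(ratings):
    """ratings: per-judge lists of probabilities (one per question).
    Returns the cut score as a percentage of the maximum score."""
    per_judge = [sum(r) / len(r) for r in ratings]  # each judge's expected proportion correct
    return 100 * sum(per_judge) / len(per_judge)    # average across judges

ratings = [
    [0.9, 0.6, 0.7, 0.8],  # judge 1's estimates for 4 questions
    [0.8, 0.5, 0.7, 0.8],  # judge 2
    [0.9, 0.7, 0.6, 0.9],  # judge 3
]
print(f"{angoff_cut_score(ratings):.0f}%")  # prints 74%
```

The point is that the resulting threshold is grounded in documented expert judgement about question difficulty rather than a round number picked arbitrarily.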

8. Test more than just knowledge recall

Most real-world jobs and skills need more than just knowing facts. Questions which test remember/recall skills are easy to write, but they only measure knowledge. For most tests, it is important to include a wider range of skills. This can be done with conventional questions that test beyond knowledge recall or with other kinds of tests, such as observational assessments.

9. Consider more than just multiple choice questions

Multiple choice tests can assess effectively; however, in some regions multiple choice questions sometimes get a “bad press”. As you design your test, you may want to consider including enhanced stimulus and a variety of question types (e.g. matching, fill-in-blanks) to reduce the possibility of measurement error and enhance stakeholder satisfaction.

10. Robust and secure test delivery process

A critical part of the chain of evidence is to be able to show that the test delivery process is robust, that the scores are based on answers genuinely given by the test-taker and that there has been no tampering or mistakes. This requires that the software used to deliver the test is reliable and dependably records evidence including the answers entered by the test-taker and how the score is calculated. It also means that there is good security so that you have evidence that the right person took the test and that risks to the integrity of the test have been mitigated.

For more on these considerations, please check out our best practice guide on Defensibility and Legal Certainty for Tests and Exams, which also contains some legal cases to illustrate the points. You can download the guide HERE – it is free with registration.

Defensibility and Legal Certainty for Tests and Exams

Posted by John Kleeman

Questionmark has just published a new best practice guide on Defensibility and Legal Certainty for Tests and Exams. Download the guide HERE.

We are all familiar with the concept of a chain of custody for evidence in a criminal case. If the prosecution seeks to provide evidence to a court of an object found at a crime scene, they will carefully document its provenance and what has happened to it over time, to show that the object offered as evidence at court is the object recovered from the crime scene.

There is a useful analogy between this concept and defensibility and legal certainty in tests and exams. Assessments have a “purpose” or a “goal”, for example, the need to check a person’s competence before allowing them to perform a job task. It is important that an assessment programme defines its purpose clearly, ensures that this purpose is then enshrined in the design of the test or exam, and checks that the assessment and delivery is consistent with the defined purpose. Essentially, there should be a chain from the purpose to design to delivery to decision, which makes the end decision defensible. If you follow that chain, your assessments may be defensible and legally certain; if that chain has breaks or gaps, then your assessments are likely to become less certain and more legally vulnerable.

Defensibility of assessments

Defensibility, in the context of assessments, concerns the ability of a testing organisation to withstand legal challenges. These legal challenges may come from individuals or groups who claim that the organisation itself, the processes followed (e.g., administration, scoring, setting pass scores, etc.), or the outcomes of the testing (e.g., a person is certified or not) are not legally valid. Essentially, defensibility has to do with the question: “Are the assessment results, and more generally the testing program, defensible in a court of law?”.

Ensuring that assessments are defensible means ensuring that assessments are valid, reliable and fair and that you have evidence and documentation available to demonstrate the above, in case of a challenge.

Legal certainty for assessments

Legal certainty (“Rechtssicherheit” in German) means that the law (or other rules) must be certain, in that the law is clear and precise, and its legal implications foreseeable. If there is legal certainty, people should understand how to conduct themselves in accordance with the law. This contrasts with legal indeterminacy, where the law is unclear and may require a court’s ruling to determine what it means.

  • Lack of legal certainty can provide grounds to challenge assessment results. For instance, many organisations have rules for how they administer assessments or make decisions based on the results of assessments. A test-taker might claim that the organisation has not followed its own rules or that the rules are ambiguous.
  • Some public bodies are constrained by law, in which case they can only deliver assessments in a way that laws and regulations permit; if they veer from this, they can be challenged under legal certainty.
  • Legal certainty issues can also arise if the exam process goes awry. For example, someone might claim that their answers have been swapped with those of another test-taker, or that the exam was unfair because the user interface was confusing, e.g. they pressed to submit their answers and finish the exam before actually intending to do so.

The best practice guide describes the principles and key steps to make assessments that are defensible and that provide legal certainty, and which are less likely to be successfully challenged in courts. The guide focuses primarily on assessments used in the workplace and in certification. It focuses particularly on legal cases and issues in Europe but will also be relevant in other regions.

You can download the guide HERE – it is free with registration.

FAQ – “Testing Out” of Training

Posted by Kristin Bernor

Let’s explore what it means to “test out”, what the business benefits include and how Questionmark enables you to do this in a simple, timely and valid manner.

“Testing out” of training saves time and money by allowing participants to forego unneeded training. It makes training more valid and respected, and so more likely to impact behavior, because it focuses training on the people who need it and lets those who already know the material move on to additional knowledge, skills and abilities.

The key to “testing out” of training is that the test properly measures what you are training. If it does, then someone who can demonstrate by passing the test that they already know the material doesn’t need to do the training. Testing out can be a hard sell when the test doesn’t really measure the same outcomes as the training: someone might pass the test without actually knowing what the training covers. So, the key is to write a good test.

Online assessments are about both staying compliant with regulatory requirements AND giving business value. Assessments help ensure your workforce is competent and reduce risk, but they also give business value in improved efficiency, knowledge and customer service.

What does it mean to “test out” of training?

Many organizations create tests that allow participants to “test out” of training if they pass. Essentially, if you already know the material being taught, then you don’t need to spend time in the training. Testing people on material they already know is a waste of time, value and resources. Directing them to training that is necessary ensures the candidate is motivated and feels they are spending their time wisely. Everyone wins!

Why is this so important? What are the advantages of incorporating “testing out”?

The key advantage of this approach is that you save time when people don’t have to attend the training that they don’t need. Time is money for most organizations, and saving time is an important benefit.

Suppose, for example, you have 1,000 people who need to take some training that lasts 2 hours. This is 2,000 hours of people’s time. Now, suppose you can give a 20-minute test that 25% of people pass and therefore skip the training. The total time taken is 333 hours for the test and 1,500 hours for the training, which adds up to 1,833 hours. So having one-fourth of the test-takers skip the training saves about 8% of the time that would have been required for everyone to attend.
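
The arithmetic can be checked in a few lines (all numbers come from the scenario above):

```python
# Checking the testing-out example's arithmetic
people = 1000
training_hours = 2.0
test_hours = 20 / 60            # the 20-minute test
pass_rate = 0.25                # 25% of people test out

baseline = people * training_hours                         # 2,000 hours if everyone attends
test_time = people * test_hours                            # about 333 hours of testing
training_time = people * (1 - pass_rate) * training_hours  # 1,500 hours for the 75% who train
total = test_time + training_time                          # about 1,833 hours
saving = 1 - total / baseline
print(f"total: {total:.0f} hours, saving: {saving:.0%}")   # total: 1833 hours, saving: 8%
```

The saving grows quickly with higher pass rates or longer courses, which is why testing out pays off best for widely required, lengthy training.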

In addition to saving time, using diagnostic tests in this way helps people who attend training courses focus their attention on areas they don’t know well and be more receptive to the training that is beneficial.

Is it appropriate to allow “testing out” of all training?

Obviously if you follow this approach, you’ll need to ensure that your tests are appropriate and sufficient – that they measure the right knowledge and skills that the training would otherwise cover.

You’ll need to check your regulations to confirm that this is permissible for you, but most regulators will see sense here.

How Questionmark can be used to “test out”

Online assessments are a consistent and cost-effective means of validating that your workforce knows the law, your procedures and your products. If you are required to document training, they are the most reliable way of doing so. When creating and delivering assessments within Questionmark, it’s quite simple to qualify a candidate once they reach a score threshold. If they correctly answer a series of items and pass the assessment, this denotes that further training is not needed. It is imperative that the assessment accurately tests for the requisite knowledge and skills that are part of the training objectives.

The candidate can then focus on training that is pertinent, worthwhile and beneficial to both themselves and the company. If they answer incorrectly and are unable to pass the assessment, then training is necessary until they are able to master the information and demonstrate this in a test.
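
The testing-out decision described above boils down to a simple threshold check. This sketch uses an invented 80% cut score and invented candidate names purely for illustration:

```python
# Minimal sketch of routing candidates based on a diagnostic test score.
PASS_THRESHOLD = 80  # hypothetical percentage cut score

def training_required(score: float) -> bool:
    """True if the candidate still needs to attend the training."""
    return score < PASS_THRESHOLD

for name, score in [("Ana", 92), ("Ben", 74)]:
    action = "attend training" if training_required(score) else "test out"
    print(name, "->", action)  # Ana -> test out, Ben -> attend training
```

In practice the threshold itself should come from a defensible standard-setting process, not an arbitrary round number.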

How many errors can you spot in this survey question?

Posted by John Kleeman

Tests and surveys are very different. In a test, you look to measure participant knowledge or skill; you know what answer you are looking for, and generally participants are motivated to answer well. In a survey, you look to measure participant attitude or recollection; you don’t know what answer you are looking for, and participants may be uninterested.

Writing good surveys is an important skill. If you’re interested in how to write good opinion and attitude surveys for training, learning, compliance and certification, based on research evidence, you might be interested in a webinar I’m giving on May 15th on Designing Effective Surveys. Click HERE for more information or to register.

In the meantime, here’s a sample survey question. How many errors can you spot in the question?

The material and presentation qualty at Questionmark webinars is always excellent.

  • Strongly Agree
  • Agree
  • Slightly agree
  • Neither agree nor disagree
  • Disagree
  • Strongly disagree


There are quite a few errors. Try to count the errors before you look at my explanation below!!



I count seven errors:

  1. I am sure you got the mis-spelling of “quality”. If you mis-spell something in a survey question, it indicates to the participant that you haven’t taken time and trouble writing your survey, so there is little incentive for them to spend time and trouble answering.
  2. It’s not usually sensible to use the word “always” in a survey question. Some participants may take the statement literally, and it’s much more likely that webinars are usually excellent than that every single one is excellent.
  3. The question is double-barreled. It’s asking about material AND presentation quality. They might be different. This really should be two questions to get a consistent answer.
  4. The “Agree” in “Strongly Agree” is capitalized but not in other places, e.g. “Slightly agree”. Capitalization should be consistent in every part of the scale.

You can see these four errors highlighted below.

[Image: the survey question with red markings highlighting the four errors above]

Is that all the errors? I count three more, making a total of seven:

  5. The scale should be balanced. Why is there a “Slightly agree” but not a “Slightly disagree”?
  6. This is a leading or “loaded” question, not a neutral one: it encourages a positive answer. If you genuinely want to get people’s opinion in a survey question, you need to ask it without encouraging the participant to answer a particular way.
  7. Lastly, any agree/disagree question has acquiescence bias. Research evidence suggests that some participants are more likely to agree when answering survey questions, particularly those who are more junior or less educated, who may tend to assume that what is asked of them is true. It would be better to word this question to ask people to rate the webinars rather than agree with a statement about them.

Did you get all of these? I hope you enjoyed this little exercise. If you did, I’ll explain more about this and about good survey practice in our Designing Effective Surveys webinar; click HERE to register.

This webinar is based on some sessions I’ve given at past Questionmark user conferences which got high ratings. I’ll do my best to give you interesting material and engaging presentation quality in the webinar!

Beyond Recall: Taking Competency Assessments to the Next Level

[Image: a pyramid showing the levels of Bloom’s revised taxonomy: create, evaluate, analyze, apply, understand, remember/recall]

Posted by John Kleeman

I’d like to share details about a webinar we are running next Tuesday, April 30th, on how to improve your assessments. You can register for the webinar here.

A lot of assessments focus on testing knowledge or facts. Questions that ask for recall of facts do have some value. They check someone’s knowledge and they help reduce the forgetting curve for new knowledge learned.

But for most jobs, knowledge is only a small part of the job requirements. As well as remembering or recalling information, people need to understand, apply, analyze, evaluate and create, as shown in Bloom’s revised taxonomy above. Most real-world jobs require many levels of the taxonomy, and if your assessments focus only on recalling knowledge, they may well not test job competence validly.

Evaluating includes exercising judgement, and using judgement is a critical factor in competence required in a lot of job roles. But a lot of assessments don’t assess judgement, and this webinar will explain how you can do this.

There are many approaches to creating assessments that do more than test recall, including:

  • You can write objective questions which test understanding and application of knowledge, or analysis of situations. For example, you can present questions within real-life scenarios which require understanding a situation and working out how to apply knowledge and skills to answer it. It’s sometimes useful to use media such as videos to make the question closer to the performance environment.
  • You can use observational assessments, which allow an observer to watch someone perform a task and grade their performance. This allows assessment of practical skills as well as higher-level cognitive ones.
  • You can use simulations, which assess performance within a controlled environment closer to the real performance environment.
  • You can set up role-playing assessments, which are useful for customer service or other roles which need interpersonal skills.
  • You can assess people’s actual job performance, using 360 degree assessments or performance appraisal.

In our webinar, we will give an overview of these methods but will focus on one which has long been used in pre-employment testing and which is increasingly being used in post-hire training, certification and compliance testing. This method is Situational Judgement Assessments: questions carefully written to assess someone’s ability to exercise judgement within the domain of their job role.

It’s not just CEOs who need to exercise judgment and make decisions, almost every job requires an element of judgement. Many costly errors in organizations are caused by a failure of judgement. Even if people have appropriate skill, experience and knowledge, they need to use judgement to apply it successfully, otherwise failures occur or successful outcomes are missed.

Situational Judgement Assessments (SJAs) present a dilemma to the participant (using text or video) and ask them to choose options in response. The dilemma needs to be one that is relevant to the job, i.e. one where using judgement is clearly linked to a needed domain of knowledge, skill or competency in the job role. And the scoring needs to be based on subject matter experts’ agreement that the judgement is the correct one to make.

[Diagram: context is defined (text or video); a dilemma that needs judgement; the participant chooses from options; a score or evaluation is made]
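
To make the flow concrete, here is a hypothetical sketch of how an SJA item and its scoring might be represented; all of the scenario content, option scores and names below are invented for illustration:

```python
# Hypothetical situational judgement item: a context, a dilemma, answer options,
# and a score per option agreed by subject matter experts.

sja_item = {
    "context": "You are a support engineer on call. A major customer reports an outage.",
    "dilemma": "A fix is available but untested; the customer wants it deployed now.",
    "options": {
        "A": ("Deploy the untested fix immediately", 0),
        "B": ("Explain the risk and agree a brief test before deploying", 2),
        "C": ("Escalate and wait for next week's release cycle", 1),
    },
}

def score_response(item, choice):
    """Return the SME-agreed score for the chosen option."""
    return item["options"][choice][1]

print(score_response(sja_item, "B"))  # prints 2
```

Partial credit (option C above) reflects that some judgements are defensible but not optimal, which is common in SJA scoring keys.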



Situational Judgement Assessments can be a valid and reliable way of measuring judgement and can be presented in a standalone assessment or combined with other kinds of questions. If you’re interested in learning more, come to our webinar next Tuesday April 30th. You can register here and I look forward to seeing some of you there.

How is the SAP Global Certification program going? A re-interview with SAP’s manager of global certification, part 2.

Posted by Zainab Fayaz

This is the second part of a two-part interview between John Kleeman, Founder and Executive Director at Questionmark, and Ralf Kirchgaessner, Manager of the SAP Global Certification program, continuing the discussion of SAP’s use of Questionmark software in its Certification in the Cloud program. You can read the first part here. In this second part, John asks about the business benefits of certification and what advice Ralf has for other organizations.

John: What are the business benefits to SAP of certification?

Ralf: There are many benefits to the SAP Global Certification. So, let’s begin from the individual learner’s perspective.

Firstly, earning the SAP Global Certification increases your personal value: not only does it drive personal development, which often leads to increased responsibilities and promotion within your organization, but it also proves that you stay current and update your skills to the latest releases. Additionally, since 2018, professionals can gain wider recognition by sharing their SAP Global Certification digital badges.

SAP Global Certification is of great value not only for individuals but also for consultancies in the SAP ecosystem. SAP Global Certifications provide a clear measure of a company’s organizational capabilities, which give a competitive advantage, especially if the company has certified professionals in new and innovative areas, like SAP C/4HANA Cloud.

John: What about the customers? What benefits are there for them?

Ralf: Indeed, the most important benefit is the value for our customers. If SAP can ensure that the consultancy ecosystem is well enabled and certified, it helps reduce the total cost of ownership (TCO) and ensures successful implementations. And in the end, this is of course also important for SAP, as it helps to increase the adoption of our software and reduces implementation risks.

John: Tell me a bit more about the recently introduced digital badges for people who get certified that you just mentioned. How useful is that?

Ralf: The introduction of digital badges for SAP Global Certification has been an absolute success! Making your workforce visible on the market is important, and sharing the digital badge proves that the workforce is current in its knowledge. If you search LinkedIn for ‘certified SAP consultants’, you will find thousands of shared badges. Digital badge claim rates above industry benchmarks show that people were waiting with much anticipation to share their achievements digitally.

We are constantly looking for ways to improve our services, and with the help of Questionmark we will be able to issue badges even faster. In the near future, a candidate passing their SAP Global Certification exam will trigger the issuing of their badge in real time!

We will have reached our ultimate goal, and the overall mission of our certification programme, when customers ask consultants for their digital badges to show their SAP Global Certification status.

John: There seems a slow move across the community from test centers to online proctoring. I know that for SAP, you deliver some exams in your offices but most in the cloud with online proctoring. How do you see this changing in the industry in general? Will all IT exams be done by online proctoring one day soon?

Ralf: SAP very much uses the model of taking exams wherever and whenever it is most convenient. Nevertheless, we use one harmonized infrastructure, for all our exams and these can be taken at our offices, in classrooms or in the cloud.

I think much of this evolves from the changing landscape in learning behaviors and offerings. In terms of the relative advantages of test centers and online proctoring, there is a legitimate reason for test centers to exist, as there are groups of people who will still want to learn together, in one place at one time. However, as the shift moves towards remote learning, both synchronous (live virtual classrooms) and asynchronous, supported by social and peer learning via online learning rooms, online proctoring will of course become more popular.

John: What advice would you give to other high-tech companies who are thinking of setting up or improving their certification program?

Ralf: Two things instantly come to mind – online proctoring and digital badging. Certification programs that do not use online proctoring and digital badging should urgently consider improving their program as the benefits of implementing both features are tremendous.

More on certification
Interested in learning more about certification programs? Find out how you can build your own certification program in 10 easy steps.