Ten Key Considerations for Defensibility and Legal Certainty for Tests and Exams
Posted by John Kleeman
In my previous post, Defensibility and Legal Certainty for Tests and Exams, I described the concepts of Defensibility and Legal Certainty for tests and exams. Making a test or exam defensible means ensuring that it can withstand legal challenge. Legal certainty relates to whether laws and regulations are clear and precise and people can understand how to conduct themselves in accordance with them. Lack of legal certainty can provide grounds to challenge test and exam results.
Questionmark has just published a new best practice guide on Defensibility and Legal Certainty for Tests and Exams. This blog post describes ten key considerations when creating tests and exams that are defensible and encourage legal certainty.
Without documentation, it will be very hard to defend your assessment in court, as you will have to rely on people’s recollections. It is important to keep records of the development of your tests and ensure that these records are updated so that they accurately reflect what you are doing within your testing programme. Such records will be powerful evidence in the event of any dispute.
2. Consistent procedures
Testing is more a process than a project. Tests are typically created and then updated over time. It’s important that procedures are consistent over time. For example, a question added into the test after its initial development should go through similar procedures as those for a question when the test was first developed. If you adopt an ad hoc approach to test design and delivery, you are exposing yourself to an increased risk of successful legal challenge.
Validity, reliability and fairness are the three generally accepted principles of good test design. Broadly speaking, validity is how well the assessment matches its purpose. If your tests and exams lack validity, they will be open to legal challenge.
Reliability is a measure of precision and consistency in an assessment and is also critical.There are many posts explaining reliability and validity on this blog, one useful one is Understanding Assessment Validity and Reliability.
5. Fairness (or equity)
Probably the biggest cause of legal disputes over assessments is whether they are fair or not. The International standard ISO 10667-1:2011 defines equity as the “principle that every assessment participant should be assessed using procedures that are fair and, as far as
possible, free from subjectivity that would make assessment results less accurate”. A significant part of fairness/equity is that a test should not advantage or disadvantage individuals because of characteristics irrelevant to the competence or skill being measured.
6. Job and task analysis
The type of skills and competence needed for a job change over time. Job and task analysis are techniques used to analyse a job and identify the key tasks performed and the skills and competences needed. If you use a test for a job without having some kind of analysis of job skills, it will be hard to prove and defend that the test is actually appropriate to measure someone’s competence and skills for that job.
7. Set the cut or pass score fairly
It is important that you have evidence to reasonably justify that the cut score used to divide pass from fail does genuinely distinguish the minimally competent from those who are not competent. You should not just choose a score of 60%, 70% or 80% arbitrarily, but instead you should work out the cut score based on the difficulty of questions and what you are measuring.
8. Test more than just knowledge recall
Most real-world jobs and skills need more than just knowing facts. Questions which test remember/recall skills are easy to write but they only measure knowledge. For most tests, it is important that a wider range of skills are included in the test. This can be done with conventional questions that test above knowledge or with other kinds of tests such as observational assessments.
9. Consider more than just multiple choice questions
Multiple choice tests can assess well; however in some regions, multiple choice questions sometimes get a “bad press”. As you design your test, you may want to consider including enhanced stimulus and a variety of question types (e.g. matching, fill-in-blanks, etc.) to reduce the possibility of error in measurement and enhance stakeholder satisfaction.
10. Robust and secure test delivery process
A critical part of the chain of evidence is to be able to show that the test delivery process is robust, that the scores are based on answers genuinely given by the test-taker and that there has been no tampering or mistakes. This requires that the software used to deliver the test is reliable and dependably records evidence including the answers entered by the test-taker and how the score is calculated. It also means that there is good security so that you have evidence that the right person took the test and that risks to the integrity of the test have been mitigated.
For more on these considerations, please check out our best practice guide on Defensibility and Legal Certainty for Tests and Exams, which also contains some legal cases to illustrate the points. You can download the guide HERE – it is free with registration.