Ten Key Considerations for Defensibility and Legal Certainty for Tests and Exams

Posted by John Kleeman

In my previous post, Defensibility and Legal Certainty for Tests and Exams, I described the concepts of Defensibility and Legal Certainty for tests and exams. Making a test or exam defensible means ensuring that it can withstand legal challenge. Legal certainty relates to whether laws and regulations are clear and precise, so that people can understand how to conduct themselves in accordance with them. Lack of legal certainty can provide grounds to challenge test and exam results.

Questionmark has just published a new best practice guide on Defensibility and Legal Certainty for Tests and Exams. This blog post describes ten key considerations when creating tests and exams that are defensible and encourage legal certainty.

1. Documentation

Without documentation, it will be very hard to defend your assessment in court, as you will have to rely on people’s recollections. It is important to keep records of the development of your tests and ensure that these records are updated so that they accurately reflect what you are doing within your testing programme. Such records will be powerful evidence in the event of any dispute.

2. Consistent procedures

Testing is more a process than a project. Tests are typically created and then updated over time, and it is important that procedures remain consistent. For example, a question added to the test after its initial development should go through procedures similar to those used when the test was first developed. If you adopt an ad hoc approach to test design and delivery, you expose yourself to an increased risk of successful legal challenge.

3. Validity

Validity, reliability and fairness are the three generally accepted principles of good test design. Broadly speaking, validity is how well the assessment matches its purpose. If your tests and exams lack validity, they will be open to legal challenge.

4. Reliability

Reliability is a measure of precision and consistency in an assessment and is also critical. There are many posts explaining reliability and validity on this blog; a useful one is Understanding Assessment Validity and Reliability.
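To make "precision and consistency" concrete, here is a minimal Python sketch of one widely used internal-consistency statistic, Cronbach's alpha. The response data are entirely hypothetical (rows are test-takers, columns are items scored 0 or 1), and this is an illustration of the general idea rather than a description of how any particular assessment tool computes reliability.

import numpy as np

# Hypothetical item responses: rows = test-takers, columns = items (1 = correct, 0 = incorrect)
scores = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1],
])

def cronbach_alpha(item_scores):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of total scores)."""
    k = item_scores.shape[1]
    item_variances = item_scores.var(axis=0, ddof=1)
    total_score_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_score_variance)

print(f"Cronbach's alpha: {cronbach_alpha(scores):.2f}")

Values closer to 1 indicate that the items hang together consistently; a very low value would make the scores hard to defend as a precise measure.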

5. Fairness (or equity)

Probably the biggest cause of legal disputes over assessments is whether they are fair or not. The international standard ISO 10667-1:2011 defines equity as the "principle that every assessment participant should be assessed using procedures that are fair and, as far as possible, free from subjectivity that would make assessment results less accurate". A significant part of fairness/equity is that a test should not advantage or disadvantage individuals because of characteristics irrelevant to the competence or skill being measured.
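One way fairness can be evidenced statistically is by screening items for differential item functioning (DIF): whether test-takers of comparable overall ability, but from different groups, have different chances of answering an item correctly. The sketch below, in Python with entirely made-up data, shows the basic idea by comparing correct-rates within bands of similar total score; a full analysis would use an established procedure such as Mantel-Haenszel.

import numpy as np

# Hypothetical data for one item: whether each test-taker answered it correctly,
# their total test score, and their group membership.
item_correct = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
total_score  = np.array([28, 12, 30, 25, 14, 27, 29, 11, 26, 31, 13, 24])
group        = np.array(["A", "B", "A", "B", "A", "A", "B", "B", "A", "B", "A", "B"])

# Compare correct-rates for each group within bands of similar total score,
# so that genuine ability differences do not masquerade as unfairness.
for low, high in [(0, 20), (20, 40)]:
    in_band = (total_score >= low) & (total_score < high)
    for g in ("A", "B"):
        mask = in_band & (group == g)
        if mask.any():
            rate = item_correct[mask].mean()
            print(f"Total score {low}-{high}, group {g}: {rate:.0%} correct on this item")

Large gaps within a band would prompt a review of the item for content that is irrelevant to the skill being measured.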

6. Job and task analysis

The skills and competences needed for a job change over time. Job and task analysis are techniques used to analyse a job, identify the key tasks performed and determine the skills and competences needed. If you use a test for a job without some kind of analysis of the job's skills, it will be hard to prove and defend that the test is actually appropriate for measuring someone's competence and skills for that job.
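As an illustration of how a job and task analysis can feed a defensible test design, the Python sketch below weights each task by importance and frequency ratings gathered from subject-matter experts and converts the weights into a test blueprint. The task names, rating scale and exam length are all assumptions made for the example.

# Hypothetical task analysis ratings on a 1-5 scale.
tasks = {
    "Prepare work area":           {"importance": 3, "frequency": 5},
    "Operate equipment safely":    {"importance": 5, "frequency": 5},
    "Perform routine maintenance": {"importance": 4, "frequency": 2},
    "Complete compliance records": {"importance": 4, "frequency": 3},
}

total_items = 40  # hypothetical exam length

# Weight each task, then allocate questions in proportion to its weight.
weights = {name: r["importance"] * r["frequency"] for name, r in tasks.items()}
total_weight = sum(weights.values())

for name, weight in weights.items():
    share = weight / total_weight
    print(f"{name}: {share:.0%} of blueprint, about {round(total_items * share)} questions")

Documenting this mapping from analysed tasks to question coverage is exactly the kind of evidence that helps show the test is appropriate for the job.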

7. Set the cut or pass score fairly

It is important to have evidence that reasonably justifies the cut score used to divide pass from fail, showing that it genuinely distinguishes the minimally competent from those who are not competent. You should not just choose a score of 60%, 70% or 80% arbitrarily; instead, work out the cut score based on the difficulty of the questions and what you are measuring.
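One widely used, defensible way to do this is a standard-setting exercise such as the modified Angoff method, in which subject-matter experts estimate, for each question, the probability that a minimally competent candidate would answer correctly. The Python sketch below uses hypothetical ratings to show how those estimates combine into a recommended cut score.

# Hypothetical Angoff ratings: for each question, each judge's estimate of the
# probability that a minimally competent candidate answers it correctly.
ratings = {
    "Q1": [0.90, 0.85, 0.80],
    "Q2": [0.60, 0.55, 0.65],
    "Q3": [0.75, 0.70, 0.80],
    "Q4": [0.50, 0.45, 0.55],
}

# Average the judges' estimates per question, then sum across questions to get
# the expected raw score of a minimally competent candidate.
question_means = {q: sum(r) / len(r) for q, r in ratings.items()}
cut_score_points = sum(question_means.values())
cut_score_percent = 100 * cut_score_points / len(ratings)

print(f"Recommended cut score: {cut_score_points:.1f} of {len(ratings)} points ({cut_score_percent:.0f}%)")

Keeping the ratings and the resulting calculation on file gives you documented evidence that the pass mark reflects the difficulty of the questions rather than an arbitrary convention.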

8. Test more than just knowledge recall

Most real-world jobs and skills need more than just knowing facts. Questions that test remember/recall skills are easy to write, but they only measure knowledge. For most tests, it is important to include a wider range of skills. This can be done with conventional questions that test above the knowledge level or with other kinds of tests, such as observational assessments.

9. Consider more than just multiple choice questions

Multiple choice tests can assess well; however, in some regions multiple choice questions sometimes get a "bad press". As you design your test, you may want to consider including enhanced stimulus material and a variety of question types (e.g. matching, fill-in-blanks) to reduce the possibility of measurement error and enhance stakeholder satisfaction.

10. Robust and secure test delivery process

A critical part of the chain of evidence is to be able to show that the test delivery process is robust, that the scores are based on answers genuinely given by the test-taker and that there has been no tampering or mistakes. This requires that the software used to deliver the test is reliable and dependably records evidence including the answers entered by the test-taker and how the score is calculated. It also means that there is good security so that you have evidence that the right person took the test and that risks to the integrity of the test have been mitigated.

For more on these considerations, please check out our best practice guide on Defensibility and Legal Certainty for Tests and Exams, which also contains some legal cases to illustrate the points. You can download the guide HERE – it is free with registration.

Effectively Communicating the Measurement of Constructs to Stakeholders

Posted by Greg Pope

I co-wrote this article with Kerry Eades, Assessment Specialist at the Oklahoma Department of Career and Technology Education, a Questionmark user who shares my interest in test security and many other topics related to online assessment.


There are many mentions on websites, blogs, YouTube, etc. about people (employees, students, educators, school administrators, etc.) cheating on tests. Cheating has always been an issue, but the last decade of increased certifications and high-stakes testing seems to have brought about a significant increase in cheating. As a result, some pundits now believe we should redefine cheating and that texting for help, accessing the Web, or using any Web 2.0 resources should be allowed during testing. The basic idea is that a student should no longer be required to learn “facts” that can be easily located on the internet and that instruction should shift to only teaching and testing conceptual content.

There are many reasons for testing (educational, professional certification and licensure, legislative, psychological, etc.), and the pressure that stakeholders feel to succeed at all costs, whether by "teaching to the test" or by condoning some form of cheating, is obviously immense. Those of us in the testing industry should, to the best of our ability, educate stakeholders on the purpose of tests and on the development and measurement of constructs. Having better informed stakeholders would lessen the "need" and "excuses" for cheating and improve the testing environment for all concerned. A key element of this is promoting an understanding of how to match the testing environment to the nature of an assessment: it is appropriate to allow "open book" assessments in some cases but certainly not all. We must keep in mind that education, in general, builds upon itself over time, and for that reason constructs must be assessed in a valid, reliable and appropriate manner.

Tests are usually developed to make a point-in-time decision about the knowledge, ability, or skills of an individual based upon a set of predetermined standards/objectives/measures. The "value" of any test is not only this "point-in-time" reference, but also what it implies for the future. Although examinees may have passed an assessment, they may still have areas of relative weakness that should be remediated in order for them to reach their full potential as students or employees. Instructors should also observe how all their students are performing on tests in order to identify their own instructional weaknesses. For example, does the curriculum match up with the specified standards and the high level of thinking those standards call for? This information can also be aggregated and analyzed at the local, district, or state level to determine program strengths or weaknesses. In order to use scores in a valid way to make decisions about students or programs, we must begin by clearly defining and measuring the psychological/educational constructs or traits that a test purports to measure.

Measuring a construct is certainly complex, but what it boils down to is ensuring that the construct is being measured in a valid way and then reporting/communicating that process to stakeholders. For example, if the construct we are trying to measure in an assessment is "Surgery Procedure" and the candidate passes the test, we expect that the person can recall this information from memory where and when needed. It wouldn't be valid to let the participant look up where the liver is located on the Internet during the assessment, because they would not be able to use the Internet while they are halfway through a surgical procedure.

Another example would be “Crane Operation” knowledge and skills.  If this is the construct being measured and it is expected that candidates who pass the test can operate a crane properly, when and where they need to, then allowing them to tweet or text during their crane certification exam would not be a valid thing to do (it would invalidate the test scores) because they would not be able to do this in real life.

However, if the assessment is a low stakes quiz that is measuring the construct, “Tourist Hot Spots of Arkansas,” and the purpose of the quiz is to help people remember some good tourist places in Arkansas, then an “open book” or an “open source” format where the examinee can search the internet or use Web 2.0 resources is fine.

Effectively communicating the purpose of an assessment and the constructs being measured by it is essential for reducing the instances of cheating. This important communication can also help prevent cheating from being "redefined" to the detriment of test security.

For more information on assessment security issues and best practices, check out the Questionmark White Paper: “Delivering Assessments Safely and Securely.”

Item Analysis Analytics: The White Paper

Posted by Greg Pope

I had a great time putting together an eight-part series on Item Analysis Analytics for this blog and was pleased with the interest it received.

When a reader asked if it would be possible to present all the posts in a single document, I thought: hey, let's present the content of these articles in the form of a Questionmark White Paper! So here it is for you to download, with our compliments.

I hope the paper helps you in your efforts to create test questions that make the grade!