10 Reasons Why Frequent Testing Makes Sense

Posted by John Kleeman

It matters to society, organizations and individuals that test results are trustworthy. Tests and exams are used to make important decisions about people, and each failure of test security reduces that trustworthiness.

There are several risks to test security, but two important ones are identity fraud and getting help from others. With identity fraud, someone asks a friend to take the test for them or pays a professional cheater to take the test and pretend to be them. With getting help from others, a test-taker subverts the process and gets a friend or expert to help them with the test, feeding them the right answers. In both cases, this makes the individual test result meaningless and detracts from the value and trustworthiness of the whole assessment process.

There are lots of mitigations to these risks – checking identity carefully, having well trained proctors, using forensics or other reports and using technical solutions like secure browsers – and these are very helpful. But testing more frequently can also reduce the risk: let me explain.

Suppose you just need to pass a single exam to take an important career step – a certification, qualification or other significant job requirement. Then the incentive to cheat on that one test is large. But if you have a series of smaller tests over a period, it’s more hassle for a test taker to commit identity fraud or get help from others each time. He or she would have to pay the proxy test taker several times, and make sure the same person is available in case photos are captured. For expert help, the test taker must also reach out more often and evade whatever security there is each time.

There are other benefits too; here is a list of ten reasons why more frequent testing makes sense:

  1. More reliable. More frequent testing contributes to more reliable testing. A single large test is vulnerable to measurement error if a test taker is sick or has an off day, whereas an off day is less likely to distort a series of frequent tests (see the formula sketch after this list).
  2. More up to date. With technology and society changing rapidly, more frequent tests can make tests more current. For instance, some IT certification providers create “delta” tests measuring understanding of their latest releases and encourage people to take quarterly tests to ensure they remain up to date.
  3. Less test anxiety. Test anxiety can be a big challenge for some test takers (see Ten tips on reducing test anxiety for online test-takers), and more frequent tests mean less is at stake in each one, which may help test takers feel less anxious.
  4. More feedback. More frequent tests give feedback to test takers on how well they are performing and allow them to identify training or continuing education to improve.
  5. More data for testing organizations. In today’s world of business intelligence and analytics, there is potential for correlations and other valuable insights from the data on people’s performance in a series of tests over time.
  6. Encourages test takers to target retention of learning. We all know of people who cram for an exam and then forget it afterwards. More frequent tests encourage people to plan to learn for the longer term.
  7. Encourages spaced out learning. There is strong evidence that learning at spaced out intervals makes it more likely knowledge and skills will be retained. Periodic tests encourage revision at regular intervals and so make it more likely that learning will be remembered.
  8. Testing effect. There is also evidence that tests themselves give retrieval practice and aid retention and more frequent tests will give more such practice.
  9. More practical. With online assessment software and online proctoring, it’s very practical to test frequently, and no longer necessary to bring test takers to a central testing center for one-off large tests.
  10. Harder to cheat. Finally, as described above, more frequent testing makes it harder to use identity fraud or to get help from others, which reduces cheating.
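As a rough illustration of reason 1, the textbook Spearman-Brown prophecy formula predicts how combining evidence from several comparable tests improves reliability (this assumes the tests behave as parallel forms, so treat it as an idealized sketch rather than a guarantee):

```latex
% Spearman-Brown prophecy formula (textbook psychometrics; illustrative only)
%   \rho : reliability of a single test
%   k    : number of comparable (parallel) tests combined
\rho_k = \frac{k\rho}{1 + (k - 1)\rho}
% Worked example: a single test with \rho = 0.7 combined across k = 3 tests gives
% \rho_3 = (3 \times 0.7) / (1 + 2 \times 0.7) = 2.1 / 2.4 \approx 0.88
```

So three comparable tests, each with individual reliability 0.7, would under these assumptions yield a combined reliability of roughly 0.88.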

I think we’re seeing a slow paradigm shift from larger testing events that happen at a single point in time to smaller, online testing events happening periodically. What do you think?

5 Things I Learned at the European Association of Test Publishers Conference Last Week

Posted by John Kleeman

I just attended the Association of Test Publishers’ European conference (EATP), held last week in Madrid, and wanted to share some of what I learned.

The Association of Test Publishers (ATP) is the trade association for the assessment industry and promotes good practice in assessment. Questionmark have been members for a long time and I am currently on their board of directors. The theme of the conference was “Transforming Assessments: Challenge. Collaborate. Inspire.”

Panel at European Association of Test Publishers

As well as seeing a bit of Madrid (I particularly enjoyed the beautiful Retiro Park), I learned a great deal at the conference. Here are some of those things. (These are all my personal opinions, not endorsed by Questionmark or the ATP.)

1. Skills change. One area of discussion was skills change. Assessments are often used to measure skills, so as skills change, assessments change too. There were at least three strands of opinion. One is that workplace skills are changing rapidly – half of what you learn today will be out of date in five years, less if you work in technology. Another is that many important skills do not change at all – we need to collaborate with others, analyze information and show emotional resilience; these and other important skills were needed 50 years ago and will still be needed in 50 years’ time. And a third suggested by keynote speaker Lewis Garrad is that change is not new. Ever since the industrial revolution, there has been rapid change, and it’s still the case now. All of these are probably a little true!

2. Artificial Intelligence (AI). Many sessions at the conference covered AI. Of course, a lot of what gets called AI is in fact just clever marketing of smart computer algorithms. But machine learning and other things which might genuinely be AI are definitely on the rise and will be a useful tool to make assessments better. The industry needs to be open and transparent in its use of AI. In particular, any use of AI to score people or to identify anomalies that could indicate cheating needs to be very well built to guard against potential bias.

3. Debate is a good way to learn. There were several debates at the conference, where experts debated issues such as performance testing, how to detect fraud, and privacy vs test security, with the audience voting before and after. As the Ancient Greeks knew, this is a good format for learning, as you get to see the arguments on both sides presented with passion. I’d encourage others to use debates for learning.

4. Privacy and test security genuinely need balance. I participated in the privacy vs test security debate, and it’s clear that there is a genuine challenge in balancing the privacy rights of individual test-takers against the need of testing organizations to ensure results are valid and have integrity. There is no single right answer. Test-taker rights are not unlimited, and testing organizations cannot do absolutely anything they want to ensure security. The rise of privacy laws including the GDPR has brought discussion about this to the forefront, as everyone seeks to give test-takers their mandated privacy rights whilst still being able to process data as needed to ensure test results have integrity. A way forward seems to be emerging where test-takers have privacy and yet testing organizations can assert legitimate interests to resist cheating.

5. Tests have to be useful as well as valid, reliable and fair. One of the highlights of the conference was a CEO panel, where Marten Roorda, CEO of ACT, Norihisa Wada, a senior executive at EduLab in Japan, Sangeet Chowfla, CEO of the Graduate Management Admission Council, and Saul Nassé, CEO of Cambridge Assessment gave their views on how assessment is changing. I moderated this panel (see picture above) and it was great to hear these very smart thought leaders talk of the future. There is widespread agreement that validity, reliability and fairness are key tenets for assessments, but also a reminder that we also need “efficacy” – i.e. that tests need to be useful for their purpose and valuable to those who use them.

There were many other conference conversations, including sessions on online proctoring, test translation, the update to the ISO 10667 standard, the production of new guidelines on technology-based assessment and much, much more.

I found it challenging, collaborative and inspiring and I hope this blog gives you a small flavor of the conference.

What time limit is fair to set for an online test or exam?

Posted by John Kleeman

How do you know what time limit to set for a test or exam? I’m presenting a webinar on December 18th on some tips on how you can improve your tests and exams (it’s free of charge, register here) and this is one of the subjects I’ll be covering. In the meantime, this blog gives some good practice on setting a time limit.

Power tests

The first thing to identify is what the test is seeking to measure, and whether this has a speed element. Most tests are “power” tests, in that they seek to measure someone’s knowledge or skill, not how fast it can be demonstrated. In a power test you could set no time limit, but for practical purposes it’s usual to set one. The limit should give most people enough time to answer all the questions.

The best way to set a time limit is to pilot the test, measure how long pilot participants take to answer the questions, and use this to set an appropriate limit. If you have an established testing program, you may have organizational guidelines on time limits – for example, allowing a certain number of seconds or minutes per question – but even if you have such guidelines, you must still check that they are reasonable for each test.
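To make the piloting approach concrete, here is a minimal sketch in Python. The timing data, the 95th-percentile target and the 10% buffer are all hypothetical choices for illustration, not a prescribed method:

```python
# Minimal sketch: derive a power-test time limit from pilot timings.
# The data, percentile and buffer below are hypothetical illustrations.
import math

pilot_minutes = [32, 35, 38, 40, 41, 43, 44, 46, 48, 55]  # time each pilot participant took

def suggested_time_limit(times, percentile=0.95, buffer_factor=1.1):
    """Set the limit near the 95th percentile of pilot times, plus a 10% buffer."""
    ordered = sorted(times)
    rank = max(0, math.ceil(percentile * len(ordered)) - 1)  # nearest-rank percentile index
    return math.ceil(ordered[rank] * buffer_factor)

print(suggested_time_limit(pilot_minutes))  # 61 minutes for the sample data above
```

You would then sanity-check the suggested limit against any organizational per-question guidelines.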

Speed tests

Sometimes, speed is an important part of what you are trying to measure, and you need to check that someone not only can demonstrate knowledge or skill but can do so quickly. In a speed test, failure to answer quickly may mean that the participant does not meet the requirements of what is being measured.

For example, in a compliance test for bank personnel to check their knowledge of anti-bribery and corruption laws, speed is probably not part of what is being measured. It will be rare in practice for people to encounter real-life issues involving bribery and very reasonable for them to think and consider before answering. But if you are testing a medical professional’s ability to react to a critical symptom in a trauma patient and make a decision on a possible intervention, rapid response is likely part of the requirement.

When speed is part of the requirements of what is being measured, the time limit for the test should be influenced by the performance requirements of the job or skill being measured.

Monitoring time limits

For all tests, it is important to review the actual time taken by participants to ensure that the time limit remains appropriate. You should regularly check the proportion of participants who answer all the questions in the test and those who skip or miss out some questions. In a speed test, it is likely that many participants will not finish the test. But if many participants are failing to complete a power test, then this should be investigated and may mean that the time limit is too short and needs extending.

If the time limit for a power test is too short, then essentially it becomes a speed test and is measuring how fast participants can demonstrate their skills. As such, if this is not part of the purpose of the test, it will impact the validity of the test results and it’s likely that the test will mis-classify people and so be unfair.
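A minimal sketch of such a completion check might look like this. The attempt records and the 90% threshold are hypothetical; in practice you would pull this data from your assessment system’s reports:

```python
# Minimal sketch: flag a power test whose time limit may be too short.
# Hypothetical attempt records and threshold; real data would come from
# your assessment system's reporting.

attempts = [
    {"answered": 50, "total": 50},
    {"answered": 50, "total": 50},
    {"answered": 44, "total": 50},  # ran out of time before finishing
    {"answered": 50, "total": 50},
]

completed = sum(1 for a in attempts if a["answered"] == a["total"])
completion_rate = completed / len(attempts)

if completion_rate < 0.90:
    print(f"Only {completion_rate:.0%} finished - investigate whether the time limit needs extending")
else:
    print(f"{completion_rate:.0%} finished - the time limit looks adequate")
```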

A particular point of concern is when you are using computerized tests to test people who are not proficient computer users. They will inevitably be slower than proficient computer users, and unless your test seeks to measure computer proficiency, you need to allow such people enough time.

What about people who need extra time?

It’s common to give extra time as an accommodation for certain kinds of special needs. Extra time is also sometimes given for linguistic reasons, e.g. when taking an assessment in a second language. Make sure that your assessment system lets you override the time limit in such cases. Ideally, base the extra time on piloting, not just a fixed extra percentage.
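As a small sketch of how such an override might be applied with a percentage-based accommodation (the 25% figure is a placeholder only; as noted above, piloting is the better basis):

```python
# Minimal sketch: apply a per-participant extra-time accommodation.
# The 25% default is a placeholder; ideally base extra time on piloting.

def accommodated_limit(base_minutes: int, extra_fraction: float = 0.25) -> int:
    """Return the base time limit extended by the given fraction."""
    return round(base_minutes * (1 + extra_fraction))

print(accommodated_limit(60))  # 75 minutes for a 60-minute test with 25% extra time
```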

When should a time limit start?

My last tip is that the time limit should only start when the questions begin. If you are presenting any of these:

  • Introductory material or explanation
  • Practice questions
  • An honor code to commit to staying honest and not cheating
  • Demographic questions

The time limit should start after these are done. If you are using Questionmark software, you can make this happen by excluding the question block from the assessment time limit.

If you are interested in more tips on improving your tests and exams, register to attend our free webinar on December 18th: 10 Quick Tips to Improve your Tests and Exams.

The Nineteen Responsibilities of an Assessment Data Controller under the GDPR

Posted by John Kleeman

Back in 2014, Questionmark produced a white paper covering what was then a fairly specialist subject – what assessment organizations needed to do to ensure compliance with European data protection law. With the GDPR in place in 2018, with its extra-territorial reach and potential for large fines, data protection law compliance is an issue that all assessment users need to consider seriously.

Data Controller with two Data Processors, one of which has a Sub-Processor

Questionmark Associate Legal Counsel Jamie Armstrong, Questionmark CEO Eric Shepherd and I have now rewritten the white paper to cover the GDPR and published it this week. The white paper is called “Responsibilities of a Data Controller When Assessing Knowledge, Skills and Abilities”. I’m pleased to give you a summary in this blog article.

To remind you, a Data Controller is the organization responsible for making decisions about personal data, whereas a Data Processor is an organization that processes data on behalf of the Data Controller. As shown in the diagram, a Data Processor may have Sub-Processors. In the assessment context, examples of Data Controllers might be:

  • A company that tests its personnel for training or regulatory compliance purposes;
  • A university or college that tests its students;
  • An awarding body that gives certification exams.

Data Processors are typically companies like Questionmark that provide services to assessment sponsors. Data Processors have significant obligations under the GDPR, but the Data Controller has to take the lead. The nineteen responsibilities of an assessment Data Controller under the GDPR are:

  1. Ensure you have a legitimate reason for processing personal data
  2. Be transparent and provide full information to test-takers
  3. Ensure that personal data held is accurate
  4. Review and deal properly with any rectification requests
  5. Respond to subject access requests
  6. Respond to data portability requests
  7. Delete personal data when it is no longer needed
  8. Review and deal properly with any erasure requests
  9. Put in place strong security measures
  10. Use expert processors and contract with them wisely
  11. Adopt privacy by design measures
  12. Notify personal data breaches promptly
  13. Consider whether you need to carry out a Data Protection Impact Assessment
  14. Follow the rules if moving data out of Europe
  15. If collecting “special” data, follow the particular rules carefully
  16. Include meaningful human input as well as assessment results in making decisions
  17. Respond to restriction and objection requests
  18. Train your personnel effectively
  19. Meet organisational requirements

Back in 2014, we considered there were typically 12 responsibilities for an assessment Data Controller. Our new white paper suggests there are now 19. The GDPR significantly expands the responsibilities Data Controllers have, as well as making it clearer what needs to be done and the likely penalties if it is not done.

The 25-page white paper:

  • Gives a summary of European data protection law
  • Describes what we consider to be the 19 responsibilities of a Data Controller (see diagram)
  • Gives Data Controllers a checklist of the key measures they need from a Data Processor to be able to meet these responsibilities
  • Shares how Questionmark helps meet the responsibilities
  • Comments on how the GDPR, by pushing for accuracy of personal data, might encourage more use of valid, reliable and trustworthy assessments and benefit us all

The white paper is useful reading for anyone who delivers tests and exams to people in Europe – whether using Questionmark technology or not. Although we hope it will be helpful, like all our blog articles and white papers, this article and the white paper are not a substitute for legal advice specific to your organization’s circumstances. You can see and download all our white papers at www.questionmark.com/learningresources and you can directly download this white paper here.

Judgement is at the Heart of nearly every Business Scandal: How can we Assess it?

Posted by John Kleeman

How does an organization protect itself from serious mistakes and resultant corporate fines?

An excellent Ernst & Young report on risk reduction explains that an organization needs rules, and that they are immensely important in defining the parameters within which teams and individuals operate. But the report suggests that rules alone are not enough; what matters is how people apply them when making decisions. Culture is a key part of such decision making, and ultimately, when things go wrong, “judgement is at the heart of nearly every business scandal that ever occurred”.

Clearly judgement is important for almost every job role – not just to prevent scandals but to improve results. But how do you measure it? Is it possible to test individuals to identify how they would react to dilemmas and what judgement they would apply? And is it possible to survey an organization to discover what people think their peers would do in difficult situations? One answer to these questions is that you can use Situational Judgement Assessments (SJAs) to measure judgement, both for individuals and across an organization.

Questionmark has published a white paper on Situational Judgement Assessments, written by myself and Eugene Burke. The white paper describes how to assess judgement where you:

  1. Identify job roles and competencies or aspects of that role in your organization or workforce where judgement is important.
  2. Identify dilemmas which are relevant to your organization and each of which requires a choice to be made and where that choice is linked to the relevant job role.
  3. Build questions based on the dilemmas which ask someone to select from the choices – SJA (Situational Judgement Assessment) questions.

There are two ways of presenting such questions: to survey a workforce or to assess individuals on their judgement.

  • You can present the dilemma and survey your workforce on how they think others would behave in such a situation. For example: “Rate how you think people in the organization are likely to behave in a situation like this. Use the following scale to rate each of the options below: 1 = Very Unlikely 2 = Unlikely 3 = Neutral 4 = Likely 5 = Very Likely”.
  • You can present the dilemma and test individuals on what they personally would do in such a situation, for example as shown in the screenshot below.

You work in the back office in the team approving new customers, ensuring that the organization’s procedures have been followed (such as credit rating and know your customer). Your manager is away on holiday this week. A senior manager in the company (three levels above you) comes into your office and says that there is an important new customer who needs to be approved today. They want to place a big order, and he can vouch that the customer is good. You review the customer details, and one piece of information required by your procedures is not present. You tell the senior manager and he says not to worry, he is vouching for the customer. You know this senior manager by reputation and have heard that he got a colleague fired a few months ago when she didn’t do what he asked. You would:
A. Take the senior manager’s word and approve the customer
B. Call your manager’s cellphone and interrupt her holiday to get advice
C. Tell the senior manager you cannot approve the customer without the information needed
D. Ask the senior manager for signed written instructions to override standard procedures to allow you to approve the customer

You can see this question “live” with other examples of SJA questions in one of our example assessments on the Questionmark website at www.questionmark.com/go/example-sja.

Once you deliver such questions, you can easily report on the results segmented by attributes of participants (such as business function, location and seniority as well as demographics such as age, gender and tenure). Such reports can help indicate whether compliance will be acted out in the workplace, evaluate where compliance professionals need to focus their efforts and measure whether compliance programs are gaining traction.
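As an illustration of this kind of segmented reporting, here is a minimal sketch using pandas with made-up results; a real programme would use its assessment system’s reports or a data export:

```python
# Minimal sketch: segment SJA results by participant attributes.
# The data is made up for illustration.
import pandas as pd

results = pd.DataFrame({
    "business_function": ["Sales", "Sales", "Operations", "Operations", "Finance"],
    "location":          ["London", "Madrid", "London", "Madrid", "London"],
    "score":             [72, 65, 80, 78, 90],
})

# Mean SJA score per business function, to show where compliance efforts might focus
print(results.groupby("business_function")["score"].mean())

# Cross-tabulated view: business function by location
print(results.pivot_table(values="score", index="business_function",
                          columns="location", aggfunc="mean"))
```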

SJAs can be extremely useful as a tool in a compliance programme to reduce regulatory risk. If you’re interested in learning more about SJAs, read Questionmark’s white paper “Assessing for Situational Judgment”, available free (with registration) at https://www.questionmark.com/sja-whitepaper.

New White Paper Examines how to Assess for Situational Judgment

Posted by John Kleeman

Is exercising judgment a critical factor in the competence of the employees and contractors who serve your organization? If the answer is yes, as it most likely is, you may be interested in Questionmark’s white paper “Assessing for Situational Judgment”, just published this week.

It’s not just CEOs who need to exercise judgment and make decisions; almost every job requires an element of judgment. Situational Judgment Assessments (SJAs) present a dilemma to the participant and ask them to choose options in response.


Context is defined → There is a dilemma that needs judgment → The participant chooses from options → A score or evaluation is made

Here is an example: 

You work as part of a technical support team that produces work internally for an organization. You have noticed that work is often not performed correctly or that a step has been omitted from a procedure. You are aware that some individuals are more at fault than others, as they do not make the effort to produce high quality results and they work in a disorganized way. What do you see as the most effective and the least effective responses to this situation?
A.  Explain to your team why these procedures are important and what the consequences are of not performing these correctly.
B.  Try to arrange for your team to observe another team in the organization who produce high quality work.
C.  Check your own work and that of everyone else in the team to make sure any errors are found.
D.  Suggest that the team tries many different ways to approach their work to see if they can find a method where fewer mistakes are made.

In this example, option C deals with errors but is time-consuming and doesn’t address the behavior of team members. Option B is reasonable but doesn’t deal with the issue immediately and may not address the team’s disorganized approach. Option D asks a disorganized team to engage in a set of experiments that could increase rather than reduce errors in the work produced; this is likely to be the least effective of the options presented. Option A requires some confidence in dealing with potential pushback from the other team members, but is most likely to have a positive effect.
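For readers curious how such an item might be scored, here is a minimal sketch of one common “most/least effective” scheme. The key (A most effective, D least effective) follows the discussion above; actual SJA keys and scoring rules vary by programme, and this is not presented as Questionmark’s method:

```python
# Minimal sketch of one common SJA scoring scheme for "most/least effective" items.
# The key follows the example discussion (A most effective, D least effective);
# real SJA keys and partial-credit rules vary by programme.

KEY = {"most": "A", "least": "D"}

def score_item(picked_most: str, picked_least: str) -> int:
    """One point for each pick that matches the key, so each item scores 0, 1 or 2."""
    score = 0
    if picked_most == KEY["most"]:
        score += 1
    if picked_least == KEY["least"]:
        score += 1
    return score

print(score_item("A", "D"))  # 2: both picks match the key
print(score_item("A", "B"))  # 1: only the "most effective" pick matches
```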

You can see some more SJA examples at http://www.questionmark.com/go/example-sja.

SJA items assess judgment, and variations can be used pre-hire, in post-hire training, for compliance and for certification. SJAs offer assessment programs the opportunity to move beyond assessments of what people know (knowledge of what) to assessments of how that knowledge will be applied in the workplace (knowledge of how).

Questionmark’s white paper was written as a collaboration between Eugene Burke, a well-known advisor on talent, assessment and analytics, and myself. The white paper is aimed at:

  • Psychometricians, testing professionals, work psychologists and consultants who currently create SJAs for workplace use (pre-hire or post-hire) and want to consider using Questionmark technology for such use
  • Trainers, recruiters and compliance managers in corporations and government looking to use SJAs to evaluate personnel
  • High-tech or similar certification organizations looking to add SJAs to increase the performance realism and validity of their exams

The 40-page white paper includes sections on:

  • Why consider assessing for situational judgment
  • What is an SJA?
  • Pre-hire and helping employers and job applicants make better decisions
  • Post-hire and using SJAs in workforce training and development
  • SJAs in certification programs
  • SJAs in support of compliance programs
  • Constructing SJAs
  • Pitfalls to avoid
  • Leveraging technology to maximize the value of SJAs

Situational Judgment Assessments are an effective means of measuring judgment and the white paper provides a rationale and blueprint to make it happen. The white paper is available free (with registration) from https://www.questionmark.com/sja-whitepaper.

I will also be presenting a session about SJAs in March at the Questionmark Conference 2018 in Savannah, Georgia – visit the conference website for more details.