Six tips to increase reliability in competence tests and exams

Posted by John Kleeman

Reliability (how consistently an assessment measures something) is a vital criterion on which to judge a test, exam or quiz. This blog post explains what reliability is and why it matters, and gives a few tips on how to increase it when using competence tests and exams within regulatory compliance and other work settings.

What is reliability?

An assessment is reliable if it measures the same thing consistently and reproducibly.

If you were to deliver an assessment with high reliability to the same participant on two occasions, you would be very likely to reach the same conclusions about the participant’s knowledge or skills. A test with poor reliability might result in very different scores across the two instances.

It’s useful to think of a kitchen scale. If the scale is reliable, then when you put a bag of flour on it today and the same bag on it tomorrow, it will show the same weight. But if the scale is not working properly and is not reliable, it could give you a different weight each time.

Why does reliability matter?

Just like a kitchen scale that doesn’t work, an unreliable assessment does not measure anything consistently and cannot be used as a trustworthy measure of competency.

As well as reliability, it’s also important that an assessment is valid, i.e. measures what it is supposed to. Continuing the kitchen scale metaphor, a scale might consistently show the wrong weight; in such a case, the scale is reliable but not valid. To learn more about validity, see my earlier post Six tips to increase content validity in competence tests and exams.

How can you increase the reliability of your assessments?

Here are six practical tips to help increase the reliability of your assessment:

  1. Use enough questions to assess competence. Although you need a sensible balance to avoid tests being too long, reliability increases with test length. In their excellent book, Criterion-Referenced Test Development, Shrock and Coscarelli suggest a rule of thumb of 4-6 questions per objective, with more for critical objectives. You can also get guidance from an earlier post on this blog, How many questions do I need on my assessment?
  2. Have a consistent environment for participants. For test results to be consistent, it’s important that the test environment is consistent: try to ensure that all participants have the same amount of time to take the test and a similar environment in which to take it. For example, if some participants are taking the test in a hurry in a public and noisy place and others are taking it at leisure in their office, this could impact reliability.
  3. Ensure participants are familiar with the assessment user interface. If a participant is new to the user interface or the question types, then they may not show their true competence due to the unfamiliarity. It’s common to provide practice tests to participants to allow them to become familiar with the assessment user interface. Practice tests can also reduce test anxiety, which itself influences reliability.
  4. If using human raters, train them well. If you are using human raters, for example in grading essays or in observational assessments that check practical skills, make sure to define your scoring rules very clearly and as objectively as possible. Train your observers/raters, review their performance, give practice sessions and provide exemplars.
  5. Measure reliability. There are a number of ways of doing this, but the most common way is to calculate what is called “Cronbach’s Alpha” which measures internal consistency reliability (the higher it is, the better). It’s particularly useful if all questions on the assessment measure the same construct. You can easily calculate this for Questionmark assessments using our Test Analysis Report.
  6. Conduct regular item analysis to weed out ambiguous or poorly performing questions. Item analysis is an automated way of flagging weak questions for review and improvement. If questions are developed through sound procedures, and so are well crafted and unambiguously worded, they are more likely to discriminate well and so contribute to a reliable test. Running regular item analysis is the best way to identify poorly performing questions. If you want to learn more about item analysis, I recently gave a webinar on “Item Analysis for Beginners”, and you can access the recording of this here.
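
To make tips 5 and 6 a little more concrete, here is a minimal Python sketch of how Cronbach’s Alpha and two common item statistics could be calculated by hand. It is an illustration only, not how the Questionmark Test Analysis Report works internally. The function names and the sample scores are hypothetical, and the sketch assumes every participant answered every question and that each question is scored 0 (wrong) or 1 (right). Here “difficulty” means the proportion of participants who answered a question correctly, and “discrimination” means the correlation between a question’s score and the total score on the remaining questions.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Internal consistency of a test.

    scores: one row per participant, one column per question.
    """
    k = len(scores[0])  # number of questions
    if k < 2:
        raise ValueError("need at least two questions")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("all participants have the same total score")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either list has no spread."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def item_analysis(scores):
    """Difficulty and discrimination for each question."""
    report = []
    for i in range(len(scores[0])):
        item = [row[i] for row in scores]
        # Total score on all the *other* questions, so a question
        # is not correlated with itself.
        rest = [sum(row) - row[i] for row in scores]
        report.append({
            "question": i + 1,
            "difficulty": mean(item),
            "discrimination": pearson(item, rest),
        })
    return report


# Hypothetical results: 6 participants, 4 questions, 1 = correct.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]

print(f"Cronbach's Alpha: {cronbach_alpha(scores):.2f}")
for row in item_analysis(scores):
    print(row)
```

A question with discrimination close to zero or below zero is one that stronger participants are no more likely to get right than weaker ones, so it is a natural candidate for review. With real data you would want far more than six participants before drawing any conclusions from these numbers.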


I hope this blog post reminds you why reliability matters and gives some ideas on how to improve reliability.

Will testing employees reduce fines for compliance errors?

Posted by John Kleeman

If a bank faces a fine of millions for money laundering and then can prove, defensibly, that the ‘accused’ had passed competency tests, would that reduce or eliminate the fine? More generally, suppose employees do something wrong and the corporation is facing a regulatory fine. Does it make a difference if those employees were certified? Is it a defence against regulatory action that you took all the measures you could to prevent error?

We are asked this question from time to time, and the answer varies considerably by regulator and by offence. But in general, having competent/certified people and good compliant processes will reduce the impact on the corporation of making a compliance mistake. In some cases it might eliminate a fine, but usually not.

Here are three specific examples where a good compliance program can reduce or eliminate fines.

Prosecutors should therefore attempt to determine whether a corporation’s compliance program is merely a “paper program” or whether it was designed, implemented, reviewed … in an effective manner. In addition, prosecutors should determine whether the corporation has provided for a staff sufficient to audit, document, analyze, and utilize the results of the corporation’s compliance efforts. Prosecutors also should determine whether the corporation’s employees are adequately informed about the compliance program and are convinced of the corporation’s commitment to it. This will enable the prosecutor to make an informed decision as to whether the corporation has adopted and implemented a truly effective compliance program that … may result in a decision to charge only the corporation’s employees and agents or to mitigate charges or sanctions against the corporation.

  • The UK Ministry of Justice guidance on the Bribery Act recommends communication and training around bribery and says that “it is a full defence for an organisation to prove that despite a particular case of bribery it nevertheless had adequate procedures in place to prevent persons associated with it from bribing.”
  • Similarly, in Spain, the Spanish criminal code has been updated so that companies may avoid criminal prosecution if they have an effective compliance program in effect including evidence that employees have had sufficient training in the compliance program.

[Graph: fines rising to over one billion pounds in 2014 and nearly one billion pounds in 2015]

In general, the issue is more diffuse. For example, the UK Financial Conduct Authority, which has issued many huge fines over the years (see graph), does not seem to explicitly reduce fines based on compliance measures.

But its Penalties Manual does say that fines should be increased if the actions are deliberate or reckless or if the breach resulted from systemic weaknesses in the firm’s procedures. Equally, if the breach was inadvertent and there is no evidence that the breach indicates a widespread problem or weakness, the fine might be lower.

So how best to summarize this?

The biggest benefit of a programme for competency testing for employees is that, in conjunction with other compliance measures, it will reduce the chances of an infraction in the first place.

Having certified or competent people is not a “get out of jail free” card, but as part of a professional compliance programme it will help to mitigate financial penalties with many regulators after an infraction.