Trustworthy Assessment Results – A Question of Transparency

Posted by Austin Fossey

Do you trust the results of your test? As with many questions in psychometrics, the answer is that it depends. And as with trust between two people, trust in assessment results has to be earned by the testing body.

Many of us want to trust the testing body implicitly, be it a certification organization, a department of education, or our HR department. When I fill a car with gas, I don’t want to have to siphon the gas out to make sure the amount matches the volume on the pump; I just assume it’s accurate. We put the same faith in our testing bodies.

Just as gas pumps are certified and periodically calibrated, many high-stakes assessment programs are also reviewed. In the U.S., state testing programs are reviewed by the U.S. Department of Education, peer review groups, and technical advisory boards. Certification and licensure programs are sometimes reviewed by third-party accreditation programs, though these accreditations usually verify only that certain requirements are met, without evaluating how well they were executed.

In her op-ed, Can We Trust Assessment Results?, Eva Baker argues that the trustworthiness of assessment results is dependent on the transparency of the testing program. I agree with her. Participants should be able to easily get information on the purpose of the assessment, the content that is covered, and how the assessment was developed. Baker also adds that appropriate validity studies should be conducted and shared. I was especially pleased to see Baker propose that “good transparency occurs when test content can be clearly summarized without giving away the specific questions.”

For test results to be trustworthy, transparency also needs to extend beyond the development of the assessment to include its maintenance. Participants and other stakeholders should have confidence that the testing body is monitoring its assessments, and that a plan is in place should their results become compromised.

In their article, Cheating: Its Implications for ABFM Examinees, Kenneth Royal and James Puffer discuss cases where widespread cheating affects the statistics of the assessment, which in turn mislead test developers by making items appear easier than they are. The effect can be an assessment that yields invalid results. Though specific security measures should be kept confidential, testing bodies should have a public-facing security plan that explains their policies for addressing improprieties. This plan should cover policies for participants as well as how the testing body will handle test design decisions that have been affected by compromised results.

Even under ideal circumstances, mistakes can happen. Readers may recall that, in 2006, thousands of students received incorrect scores on the SAT, arguably one of the best-developed and most carefully scrutinized assessments in U.S. education. The College Board (the testing body that runs the SAT) handled the situation as well as it could, publicly sharing the impact of the issue, the reasons it happened, and its policies for handling the incorrect results. Others may feel differently, but I trust SAT scores more now that I have seen how the College Board communicated and rectified the mistake.

Most testing programs are well-run, professional operations backed by qualified teams of test developers, but there are occasional junk testing programs, such as predatory certificate programs, that yield useless, untrustworthy results. It can be difficult to tell the difference, but like Eva Baker, I believe that organizational transparency is the right way for a testing body to earn the trust of its stakeholders.

Get trustable results: How many test or exam retakes should you allow?

Posted by John Kleeman

How many times is it fair and proper for a participant to retake an assessment if they fail?

One of our customers asked me about this recently in regard to a certification exam. I did some research and thought I’d share it here.

For a few kinds of assessments, you would normally allow only a single attempt, typically when you are measuring something at a specific point in time. A pre-course or post-course test might only be useful if it is taken right before or right after a training course.

For assessments that simply provide retrieval practice or reinforce learning, you needn’t be concerned: it may be fine to allow as many retakes as people want. The more times they practice answering the questions, the more they will retain the learning.

But how can you decide how many attempts to allow at a certification assessment measuring competence and mastery?

Consider test security

Retakes can jeopardize test security. Someone might take and retake a test in order to harvest the items and share them with others. The more retakes allowed, the greater this risk becomes.

The International Test Commission’s draft security guidelines say:

“Retake policies should be developed to reduce the opportunities for item harvesting and other forms of test fraud. For example, a test taker should not be allowed to retake a test that he or she “passed” or retake a test until a set amount of time has passed.”

Consider measurement error

All assessment scores contain measurement error. A certification exam classifies people as having mastery (pass) or not (fail), but it doesn’t do so perfectly.

If you allow repeated retakes, you increase the risk of classifying someone as a master who is not competent, but you also decrease the risk of classifying a competent person as having failed. This is because someone who is competent can still fail the test through test anxiety, illness, or a careless mistake.
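
To make the trade-off concrete, here is a minimal Python sketch. The per-attempt pass probabilities are invented for illustration, and attempts are treated as independent, which is a simplification (real candidates learn, or memorize items, between attempts).

```python
# Hypothetical illustration: how retakes shift the two misclassification risks.
# Assumes a fixed per-attempt pass probability and independent attempts,
# both simplifications of real assessment behaviour.

P_PASS_NON_MASTER = 0.10  # invented: chance a non-master passes one attempt
P_PASS_MASTER = 0.85      # invented: chance a competent person passes one attempt

def p_pass_at_least_once(p_single: float, attempts: int) -> float:
    """Probability of passing at least one of `attempts` independent tries."""
    return 1 - (1 - p_single) ** attempts

for attempts in (1, 2, 3, 5):
    false_pass = p_pass_at_least_once(P_PASS_NON_MASTER, attempts)
    false_fail = 1 - p_pass_at_least_once(P_PASS_MASTER, attempts)
    print(f"{attempts} attempt(s): "
          f"non-master eventually passes {false_pass:.1%}, "
          f"competent person never passes {false_fail:.2%}")
```

With these illustrative numbers, allowing three attempts raises the chance that a non-master eventually passes from 10% to about 27%, while the chance that a competent person never passes falls from 15% to under 0.5%: exactly the two effects described above.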

Require participants to wait for retakes

It’s usual to require a time period to elapse before a retake. This stops people from using quick, repeated retakes to take unfair advantage of measurement error. It also encourages reflection and re-learning before the next attempt. Standard 13.6 in the Standards for Educational and Psychological Testing says:

“students. . . should have a reasonable number of opportunities to succeed. . . the time intervals between the opportunities should allow for students to have the opportunity to obtain the relevant instructional experiences.”

If we had a perfectly reliable assessment, there would be no concern about multiple attempts. Picking the number of attempts is a compromise between what is fair to the participants and the limitations of our resources as assessment developers.
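
Rules like these are straightforward to enforce in delivery software. The sketch below is a hypothetical policy checker (not any particular product’s API) that combines three of the measures discussed: no retake after a pass, a cap on total attempts, and a mandatory waiting period. All the policy values are invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class RetakePolicy:
    max_attempts: int = 3                   # hypothetical cap on total attempts
    wait_days: int = 30                     # hypothetical waiting period
    allow_retake_after_pass: bool = False   # per the ITC guideline quoted above

def can_retake(policy: RetakePolicy, attempts_used: int,
               last_attempt: date, passed: bool, today: date) -> bool:
    """Check whether a participant is eligible to retake the exam."""
    if passed and not policy.allow_retake_after_pass:
        return False  # blocks item harvesting via retakes of a passed exam
    if attempts_used >= policy.max_attempts:
        return False  # attempt limit reached
    if today < last_attempt + timedelta(days=policy.wait_days):
        return False  # waiting period has not yet elapsed
    return True

# Example: a participant who failed on 1 June is eligible again from 1 July.
policy = RetakePolicy()
print(can_retake(policy, attempts_used=1, last_attempt=date(2014, 6, 1),
                 passed=False, today=date(2014, 6, 15)))  # False: too soon
print(can_retake(policy, attempts_used=1, last_attempt=date(2014, 6, 1),
                 passed=False, today=date(2014, 7, 2)))   # True
```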

Think about test preparation

Could your retake policy affect how people prepare for the exam?

If retakes are easily available, some participants might prepare less effectively, hoping that they can “wing it” since they can retake at will. On the other hand, if retakes are limited, this could increase test anxiety and stress. It could also increase the motivation to cheat.

What about fairness?

Some people suffer test anxiety, some make silly mistakes or manage their time poorly during the test, and some may not be at their full capacity on the day of the exam. It’s usually fair to offer a retake in such situations. If you do not offer sufficient opportunities to retake, this will hurt the face validity of the assessment: people might not consider it fair.

If your exam is open to the public, you may not be able to limit retakes. Imagine a country where you were not allowed to retake your driving test once you’d failed it three times! It might make the roads safer, but most people wouldn’t see it as equitable.

In my next post on this subject, I will share what some organizations do in practice and offer some steps for arriving at an answer that will be suitable for your organization.

South African Users Conference Programme Takes Shape

Posted by Chloe Mendonca

In just five weeks, Questionmark users and other learning professionals will gather in Midrand for the first South African Questionmark Users Conference.

Delegates will enjoy a full programme, from case studies to features and functions sessions on the effective use of Questionmark technologies.

There will also be time for networking during lunches and our Thursday evening event.

Here are some of the sessions you can look forward to:

  • Case Study: Coming alive with Questionmark Live: A mind shift for lecturers – University of Pretoria
  • Case Study: Lessons and Discoveries over a decade with Questionmark – Nedbank
  • Case Study: Stretching the Boundaries: Using Questionmark in a High-Volume Assessment Environment – University of Pretoria
  • Features and Functions: New browser-based tools for collaborative authoring – Overview and Demonstrations
  • Features and Functions: Analysing and Sharing Results with Stakeholders: Overview of Key New Questionmark Reporting and Analytics features
  • Features and Functions: Extending the Questionmark Platform: Updates and overviews of APIs, Standards Support, and integrations with third-party applications
  • Customer Panel Discussion: New Horizons for eAssessment

You can register for the conference online or visit our website for more information.

We look forward to seeing you in August!


Field Test Studies: Taking your items for a test drive

Posted by Austin Fossey

In large-scale assessment, a significant amount of work goes into writing items before a participant ever sees them. Items are drafted, edited, reviewed for accuracy, checked for bias, and usually rewritten several times before they are ready to be deployed. Despite all this work, the true test of an item’s performance comes when it is first delivered to participants.

Even though we work so hard to write high-quality items, some bad items may slip past our review committees. To be safe, most large-scale assessment programs will try out their items with a field test.

A field test delivers items to participants under the same conditions used in live testing, but the items do not count toward the participants’ scores. This allows test developers and psychometricians to harvest statistics that can be used in an item analysis to flag poorly performing items.

There are two methods for field testing items. The first method is to embed your new items into an assessment that is already operational. The field test items will not count against the participants’ scores, but the participants will not know which items are scored items and which items are field test items.

The second method is to give participants an assessment that includes only field test items. The participants will not receive a score at the end of the assessment since none of the items have yet been approved to be used for live scoring, though the form may be scored later once the final set of items has been approved for operational use.

In their chapter in Educational Measurement (4th ed.), Schmeiser and Welch explain that embedding the items in an operational assessment is generally preferred. When items are field tested in an operational assessment, participants are more motivated to perform well on them. The item data are also collected while the operational assessment is being delivered, which can improve the reliability of the item statistics.

When participants take an assessment consisting only of field test items, they may not be motivated to try as hard as they would in an operational assessment, especially if the assessment will not be scored. However, field testing a whole form’s worth of items gives you better content coverage, so more items can be reviewed in the item analysis. If field testing an entire form, Schmeiser and Welch suggest using twice as many items as you will need for the operational form. Many items may need to be discarded or rewritten as a result of the item analysis, so you want to make sure you will still have enough to build an operational form at the end of the process.

Since the value of field testing items is to collect item statistics, it is also important to make sure that a representative sample of participants responds to the field test items. If the sample of participant responses is too small or not representative, then the item statistics may not be generalizable to the entire population.
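
To illustrate the kind of item statistics harvested in a field test, here is a minimal Python sketch of a classical item analysis (an illustration only, not Questionmark’s Item Analysis Report). It computes each item’s difficulty (the proportion of correct responses) and a discrimination index (the correlation between the item score and the score on the remaining items). Items with very low difficulty or near-zero discrimination would be flagged for review. The response data are invented, and a real field-test sample would be far larger.

```python
import statistics  # statistics.correlation requires Python 3.10+

# Each row is one participant's scored responses (1 = correct, 0 = incorrect)
# to the field test items. Data are invented for illustration.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

n_items = len(responses[0])

for item in range(n_items):
    item_scores = [row[item] for row in responses]
    # Rest score: the participant's total on all *other* items, so the item
    # is not correlated with a total that already contains it.
    rest_scores = [sum(row) - row[item] for row in responses]

    difficulty = statistics.mean(item_scores)  # p-value: proportion correct
    discrimination = statistics.correlation(item_scores, rest_scores)

    print(f"Item {item + 1}: difficulty={difficulty:.2f}, "
          f"discrimination={discrimination:.2f}")
```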

Questionmark’s authoring solutions allow test developers to field test items by setting the item’s status to “Experimental.” The item will still be scored, and the statistics will be generated in the Item Analysis Report, but the item will not count toward the participant’s final score.

Setting an item’s status to “Experimental” in Questionmark Live so that it can be field tested.

The future of the eBook

Posted by Steve Lay

Anyone who follows me on Twitter will be aware that I recently attended two events concerning the future of eBooks and how they relate to learning, education and training.

The first session was organized by CETIS, a think-tank concerned with the use of technology in education and, in particular, with technical standards to promote it. The second event was organized in conjunction with the International Digital Publishing Forum (IDPF), the IMS Global Learning Consortium and SC36, a sub-committee of ISO/IEC, the international organization for standards.

The topic of discussion was the future of the eBook. From the point of view of publishers, eBooks, and in particular eTextbooks, are missing one thing: formative quizzes at the end of each chapter. From the point of view of the education sector, eBooks are missing pretty much everything that we’ve come to know and love about the web, such as social interaction, collaboration, and the ability to pick and mix individual resources. Somewhere between these two visions may lie a technical standard around which the industry can organize itself.

One recurring theme in the world of education, and particularly in higher education, is the constant reinvention of the idea of virtual learning. Each time a new technology comes along, the community pounces on the opportunity to start again and design new, more interesting versions of the World Wide Web. Many of the ideas are not new but are simply elements of systems that competed with the web in the 1990s and were thrown away or fell into disuse as modern web browsers took off.

Mercifully, although history is only ever written by the winners, you can still find information about many of these systems on the web itself. For example, some of the papers on Hyper-G/Harmony and Microcosm are worth a look if you are interested in the subject. Interestingly, Microcosm was developed at Southampton University, where Tim Berners-Lee is now a professor. Open Hypermedia and the Web is a good starting point if you want to take a deeper dive.

So what is a book? Is it any different from a website? In an always-connected world, do we even need textbooks? The victory of the World Wide Web over the more complex systems around at the time might still hold a lesson for the developers of eBooks.

You can learn more about the session from a blog post by CETIS’ Wilbert Kraan.

Early-birds: Save now on South African conference registration

Posted by Chloe Mendonca

The first-ever South African Questionmark Users Conference starts about seven weeks from now, and early-bird registration ends on 14th July.

If you want to save R350 on your conference registration, now is the time to register!

We are looking forward to welcoming customers as well as individuals interested in learning more about Questionmark to this event in Midrand, South Africa, on August 21-22.

We’re still building the conference programme and will provide much of the content ourselves, but customers enrich the programme by sharing case studies on successes and lessons learned. We are actively seeking ideas for these sessions, so Questionmark users are invited to check out the conference call for proposals and respond by 14th July.

Here are just a few reasons to attend this conference, hosted by Bytes:

  • Learn more about Questionmark’s Assessment Management System
  • Participate in assessment best practice sessions
  • See the latest Questionmark solutions and features
  • Improve your effectiveness in using Questionmark technologies
  • Hear about the product roadmap and influence future developments
  • Network with other testing and assessment professionals and learn from their experiences
  • Meet with Questionmark and Bytes staff

To learn more about the conference, visit the website.

We hope you will register soon, and we look forward to seeing you in South Africa.

