The key to reliability and validity is authoring

Posted by John Kleeman

In my earlier post I explained how reliability and validity are the keys to trustable assessment results. A reliable assessment is consistent, and a valid assessment measures what you need it to measure.

The key to validity and reliability starts with the authoring process. If you do not have a repeatable, defensible process for authoring questions and assessments, then however good the other parts of your process are, you will not have valid and reliable assessments.

The critical value that Questionmark brings is its structured authoring processes, which enable effective planning, authoring and reviewing of questions and assessments and make them more likely to be valid.

Questionmark’s white paper “Assessment Results You Can Trust” suggests 18 key authoring measures for making trustable assessments – here are three of the most important.

Organize items in an item bank with topic structure

There are huge benefits to using an assessment management system with an item bank that structures items by hierarchical topics, as this facilitates:

  • An easy management view of all items and assessments under development
  • Mapping of topics to relevant organizational areas of importance
  • Clear references from items to topics
  • Use of the same item in multiple assessments
  • Simple addition of new items within a topic
  • Easy retiring of items when they are no longer needed
  • Version history maintained for legal defensibility
  • Search capabilities to identify questions that need updating when laws change or a product is retired

Some stand-alone e-learning creation tools and some LMSs do not provide an item bank and require you to insert questions individually within each assessment. If you only have a handful of assessments, or rarely need to update them, such systems can work; but anyone with more than a few assessments needs an item bank to make effective assessments.
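To make the idea concrete, here is a minimal sketch in Python of an item bank organized as a hierarchical topic tree. The class and field names are hypothetical illustrations, not Questionmark's implementation; the point is how reuse, retirement and search fall naturally out of the structure:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    """A single question, stored once and reusable across many assessments."""
    item_id: str
    stem: str
    status: str = "draft"   # e.g. draft, live, retired
    version: int = 1

@dataclass
class Topic:
    """A node in the item bank's hierarchical topic tree."""
    name: str
    items: List[Item] = field(default_factory=list)
    subtopics: List["Topic"] = field(default_factory=list)

    def find_items(self, keyword: str) -> List[Item]:
        """Search this topic and all subtopics for items mentioning a keyword,
        e.g. to locate questions that need updating when a law changes."""
        hits = [i for i in self.items if keyword.lower() in i.stem.lower()]
        for sub in self.subtopics:
            hits.extend(sub.find_items(keyword))
        return hits

# A tiny example bank: one parent topic with one subtopic
bank = Topic("Compliance", subtopics=[Topic("Data protection")])
bank.subtopics[0].items.append(
    Item("Q001", "Which regulation governs how customer data may be stored?")
)
print([i.item_id for i in bank.find_items("customer data")])  # ['Q001']
```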

An authoring tool subject matter experts can use directly

One of the critical factors in making successful items is to get effective input from subject matter experts (SMEs), as they are usually more knowledgeable and better able to construct and review questions than learning technology specialists or general trainers.

If you can use a system like Questionmark Live to harvest or “crowdsource” items from SMEs and have learning or assessment specialists review them, your items will be of better quality.

Easy collaboration for item reviewers to help make items more valid

Items will be more valid if they have been properly reviewed. They will also be more defensible if past changes are auditable. A track-changes capability, like the one shown in the example screenshot below, is an invaluable aid to the review process: it allows authors to see what changes are being proposed and to check that they make sense.

Screenshot of track changes functionality in Questionmark Live
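As a rough illustration of the underlying idea (not Questionmark Live's actual mechanism), a word-level diff is enough to show a reviewer exactly what is being proposed for an item stem; the stems below are hypothetical examples:

```python
import difflib

# Hypothetical item stem and a reviewer's proposed revision
original = "Which of the following is the capital of Australia?".split()
proposed = "Which of the following cities is the capital of Australia?".split()

# ndiff marks words that would be added (+) or removed (-),
# the same information a track-changes view gives item authors
for token in difflib.ndiff(original, proposed):
    print(token)
```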

These three capabilities – having an item bank, having an authoring tool SMEs can access directly, and allowing easy collaboration with “track changes” – are critical for obtaining reliable, valid and therefore trustable assessments.

For more information on how to make trustable assessments, see our white paper “Assessment Results You Can Trust”.

Trustworthy Assessment Results – A Question of Transparency

Posted by Austin Fossey

Do you trust the results of your test? Like many questions in psychometrics, the answer is that it depends. Like the trust between two people, trustworthy assessment results have to be earned by the testing body.

Many of us want to implicitly trust the testing body, be it a certification organization, a department of education, or our HR department. When I fill a car with gas, I don’t want to have to siphon the gas out to make sure the amount of gas matches the volume on the pump – I just assume it’s accurate. We put the same faith in our testing bodies.

Just as gas pumps are certified and periodically calibrated, many high-stakes assessment programs are also reviewed. In the U.S., state testing programs are reviewed by the U.S. Department of Education, peer review groups, and technical advisory boards. Certification and licensure programs are sometimes reviewed by third-party accreditation programs, though these accreditations usually only look to see that certain requirements are met without evaluating how well they were executed.

In her op-ed, Can We Trust Assessment Results?, Eva Baker argues that the trustworthiness of assessment results is dependent on the transparency of the testing program. I agree with her. Participants should be able to easily get information on the purpose of the assessment, the content that is covered, and how the assessment was developed. Baker also adds that appropriate validity studies should be conducted and shared. I was especially pleased to see Baker propose that “good transparency occurs when test content can be clearly summarized without giving away the specific questions.”

For test results to be trustworthy, transparency also needs to extend beyond the development of the assessment to include its maintenance. Participants and other stakeholders should have confidence that the testing body is monitoring its assessments, and that a plan is in place should their results become compromised.

In their article, Cheating: Its Implications for ABFM Examinees, Kenneth Royal and James Puffer discuss cases where widespread cheating affects the statistics of the assessment, which in turn mislead test developers by making items appear easier. The effect can be an assessment that yields invalid results. Though specific security measures should be kept confidential, testing bodies should have a public-facing security plan that explains their policies for addressing improprieties. This plan should address policies for the participants as well as for how the testing body will handle test design decisions that have been impacted by compromised results.

Even under ideal circumstances, mistakes can happen. Readers may recall that, in 2006, thousands of students received incorrect scores on the SAT, arguably one of the best-developed and most carefully scrutinized assessments in U.S. education. The College Board (the testing body that runs the SAT) handled the situation as well as it could, publicly sharing the impact of the issue, the reasons it happened, and its policies for handling the incorrect results. Others may feel differently, but I trust SAT scores more now that I have observed how the College Board communicated and rectified the mistake.

Most testing programs are well-run, professional operations backed by qualified teams of test developers, but there are occasional junk testing programs, such as predatory certificate programs, that yield useless, untrustworthy results. It can be difficult to tell the difference, but like Eva Baker, I believe that organizational transparency is the right way for a testing body to earn the trust of its stakeholders.

Recommended Reading: Learning on Demand by Reuben Tozman

Posted by Jim Farrell

I don’t know about you, but I often feel spoiled by Twitter.

Being busy forces me to consume mostly short articles and blog posts, with an attention span similar to my 6-year-old son’s. Over the course of the year, the pile of books on my nightstand grows, and I fall behind in the books I want to read. My favorite thing about this time of the year (besides football and eggnog) is catching up on my reading.

One book that I’ve been really looking forward to reading, since hearing rumors of its creation by the author, is Learning on Demand by Reuben Tozman.

For those of you who are regulars at e-learning conferences, the name Reuben Tozman will not be new to you. Reuben is not one for the status quo. Like many of us, he is constantly looking for the disruptive force that will move the “learner” from the cookie-cutter, one-size-fits-all model that many of us have grown up with to a world where everything revolves around the context of performance. I put the word learner in quotes because Reuben hates the word. We are all learners all of the time in the 70:20:10 world. You are not only a learner when you are logged into your LMS.

Learning on Demand takes the reader through the topics of understanding and designing learning material with the evolving semantic web, the new technologies available today to make learning more effective and efficient, structuring content for an on-demand system, and key skills for instructional designers.

Each chapter includes real-world examples that anyone involved in education will connect with. This isn’t a book that tells you to throw the baby out with the bathwater: there are a lot of skills that instructional designers use today that will help them be successful in a learning-on-demand world.

Even the appendix of case studies has nuggets to take forward and expand into your everyday work. My favorite was a short piece on work Reuben did with the Forum for International Trade Training (FITT). They called it a “J3 vision,” which goes beyond training to performance support. The “Js” are: J1 – just enough, J2 – just in time (regardless of time and/or location), and J3 – just for me (delivered in the medium I like to learn in). (Notice I did not say learning style: that is a discussion for another time.) To me, this is the perfect way to define good performance support.

I think it would be good for Instructional Designers to put their Dick and Carey books into the closet and keep Reuben’s book close at hand.

How Topic Feedback Can Give Compliance Assessments Business Value

Posted by John Kleeman

If you need to prove compliance with regulatory requirements, should your training and assessments focus on compliance needs? Or should you train and assess primarily to improve skills that will impact your business, meeting compliance needs as well?

I recently interviewed Frederick Stroebel and Mark Julius from a large South African financial services company, Sanlam Personal Finance, for the SAP blog. Sanlam have used Questionmark Perception for more than a decade and combine it with SAP HR and Learning software. You can see the full interview on the SAP site. Their view was that compliance and business-related needs must be combined:

“We deliver assessments both for compliance and e-learning. It’s a combination of business requirements and legislation. We predominantly started it off thinking that the purpose would be for business reasons, but as soon as the business realized the value for regulatory compliance, we received more and more requests for that purpose.”

One of the key ways in which Sanlam use assessment results to improve feedback is topic feedback, which identifies topics that may be weak points for the participant.

“We set up our assessments so that at the end, the computer gives the participant a summary of the topics and what the score was per topic, so the participant can immediately see where they need further facilitation as well.”

“It is also valuable in providing feedback to the learner, where a facilitator sits with the learner. The facilitator can immediately determine from the coaching report where exactly the learner needs to go for re-training. We have done extremely well in terms of increasing our overall pass mark and per-topic scores by using topic feedback. For example, for brokers and advisers, there’s an initial assessment they do, and because questions are in different topics, once they’ve taken the assessment, the facilitator can immediately see which type of training that person must go on.”

To illustrate, here is part of a Coaching Report that shows a participant has scored 80% in one topic (well above what is needed for competency) and 58% in another (slightly above what is needed).


Questionmark Perception coaching report
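As an illustration of the mechanics, per-topic feedback like that in the coaching report boils down to tallying item scores by topic. The sketch below uses made-up topics, scores and an assumed 70% competency threshold; it is not Sanlam's or Questionmark's actual configuration:

```python
from collections import defaultdict

# Hypothetical per-item results: (topic, score awarded, maximum score)
item_results = [
    ("Regulatory requirements", 1, 1),
    ("Regulatory requirements", 0, 1),
    ("Regulatory requirements", 1, 1),
    ("Product knowledge", 1, 1),
    ("Product knowledge", 1, 1),
]

COMPETENCY_THRESHOLD = 70  # assumed pass level per topic, in percent

totals = defaultdict(lambda: [0, 0])   # topic -> [awarded, maximum]
for topic, awarded, maximum in item_results:
    totals[topic][0] += awarded
    totals[topic][1] += maximum

for topic, (awarded, maximum) in totals.items():
    pct = 100 * awarded / maximum
    note = "competent" if pct >= COMPETENCY_THRESHOLD else "needs further facilitation"
    print(f"{topic}: {pct:.0f}% ({note})")
```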

Topic feedback is a great way of getting value from assessments and I hope Sanlam’s experience and insight can help you.

Assessment Standards 101: SCORM

Posted by John Kleeman

This is the fourth in a series of posts on standards that impact assessment.

The ADL SCORM standard arose out of an initiative by the US Department of Defense (DoD), a large user of e-learning. It wanted to ensure that e-learning content was interoperable and reusable – for instance, that content developed by one vendor could run in another vendor’s environment.

The DoD has a track record of setting technical standards: for instance, in the 1980s it helped popularize TCP/IP and establish it as an effective standard. The DoD is also a very large customer for most companies in the learning technology software industry. So when the DoD announced that it would only purchase e-learning software that worked with SCORM, the industry jumped quickly to support it!

One of the ways in which SCORM was made successful was through a series of Plugfests, where vendors could get together in practical labs and check that interoperability was possible in practice, not just in theory. These were well-run events, a kind of technological speed dating, in which each vendor could try out its compatibility with the others. It was great to have technical experts from each vendor in the room and to see many different LMSs all able to call our assessments.

In Questionmark Perception, to make an assessment run via SCORM, you use the Publish to LMS capability to create a content package, which is a small XML document that references the assessment. And as you can see in the screenshot below, you can choose from AICC and two flavours of SCORM. Once you’ve made the package, you simply upload it to a management system and participants can then be directed to it.

Publish to LMS screenshot with options including AICC, SCORM 1.2 and SCORM 2004
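For readers who have not seen one, a SCORM 1.2 content package is driven by an imsmanifest.xml file. The sketch below is a simplified, hand-written illustration of such a manifest generated from Python; the identifiers and launch URL are placeholders, and the package Questionmark Perception actually produces will differ in detail:

```python
# Placeholder launch URL – a real package references the actual assessment
LAUNCH_URL = "https://example.com/perception/open?assessment=12345"

MANIFEST = f"""<?xml version="1.0" encoding="UTF-8"?>
<manifest identifier="ASSESSMENT_PKG" version="1.0"
          xmlns="http://www.imsproject.org/xsd/imscp_rootv1p1p2"
          xmlns:adlcp="http://www.adlnet.org/xsd/adlcp_rootv1p2">
  <organizations default="ORG">
    <organization identifier="ORG">
      <title>Example Assessment</title>
      <item identifier="ITEM1" identifierref="RES1">
        <title>Example Assessment</title>
      </item>
    </organization>
  </organizations>
  <resources>
    <resource identifier="RES1" type="webcontent"
              adlcp:scormtype="sco" href="{LAUNCH_URL}"/>
  </resources>
</manifest>
"""

# Write the manifest so it can be zipped up with any other package files
with open("imsmanifest.xml", "w", encoding="utf-8") as f:
    f.write(MANIFEST)
```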

SCORM is used widely, both within the military and outside it. If you have a choice between AICC and SCORM, it’s often better to choose AICC (see my earlier post in this series), partly because SCORM has a potential security issue (see our past blog article). However, providing you are aware of this issue, SCORM can be a very effective means of calling assessments.

The ADL are currently reviewing SCORM and working out how to improve it, including potentially making it more useful for assessments. As part of gathering feedback for this review, ADL’s technical advisor, Dan Rehak, who was one of the architects of SCORM, is running a session at Questionmark’s user conference in Miami in March to hear how SCORM could be improved. If you’re interested in helping shape a better standard for the future, this would be a great session to attend. Stay tuned here on the blog for a Questionmark Conference Close-up interview with Dan.