Performance testing is to certifications as simulation is to learning

Posted by Howard Eisenberg

I just attended the Performance Testing Council Summit. Performance testing is “testing by doing.” Exam developers create performance items that require candidates to actually perform real-world, authentic tasks, rather than answer multiple-choice questions that have only one best answer or that allow a low-ability candidate to guess the correct answer. The outcome of the task is then evaluated to determine a score, that is, how well the candidate performed.

All but one of the attendees of this meeting representing certification programs were from software/IT companies. The IT domain lends itself very well to the adoption of performance testing. Advances in virtualization and software-as-a-service make it possible to provision “testing labs” with specific characteristics and traits in minutes and at low cost. Moreover, as these labs can now be hosted in the cloud, there’s no need for a candidate to travel to a specific location to take an exam. This means that performance-based IT certifications can be, and indeed ARE being, delivered online and on demand, with the help of remote proctoring and other technology-enabled security controls.

We’d like to hear from you! The call for proposals is officially open for the Questionmark 2016 Users Conference. If you have an experience you would like to share with the Questionmark community, please submit a presentation proposal.

These labs are the real-world context in which an IT professional works. They are not a simulation of the software, tools, network connections, etc.: they are the real thing. As such, testing in them is arguably a more valid methodology for assessing an IT professional’s ability to perform the tasks required by the job. So using a performance exam for an IT certification makes sense.

Alas, not all certification exams and the professional domains they represent are as well suited to performance testing. It’s not as easy to recreate the environment in which a registered nurse performs his or her daily duties, for example.  In other domains where technology is not center-stage, Questionmark’s customers have historically done the next best thing.  What’s that, you ask?  Well, it’s simulating the performance environment within the test. And if high-fidelity simulation is not cost-effective to develop, then it’s using real-world exhibits, artifacts, and scenarios expressed through multimedia to bring as much of the performance context/environment to the test as is feasible and cost-effective.

Performance testing is to certifications as simulation is to learning.  It’s that “holy grail.”  If we can make the exam look and feel like the job, then it will have the greatest potential to be the truest measure of ability.  If we can make the training look and feel like the job, then it will have the greatest potential to adequately prepare the employee.  (I say “potential” only because the instrument or the simulation must still be well-designed).

I know that many Questionmark customers have struggled to attain this ideal. That is the reality of working with budgets, timelines and other limited resources.  But I’m willing to bet that many customers have creatively worked around these challenges to create valid tests and exams that provide solid measurement value to the programs in which they are used.

If you have a story to tell about such challenges and solutions, then please share it with the Questionmark community at the Questionmark 2016 Users Conference by submitting a presentation proposal. The submission deadline is December 4. Slots are limited.

What do you want for the holidays?

Posted by Howard Eisenberg

All I want for the holidays is …

As the acting product owner for Questionmark’s Reporting and Analytics zone, I would love to hear how you would complete that sentence … with respect to assessment reporting and analytics, of course.

To help stimulate the ideas, I will highlight recent developments in our reporting and analytics suite.

The Introduction of the Results Warehouse and Questionmark Analytics

In version 5.3, we introduced the Results Warehouse. This is a database of assessment results that is separate from the database responsible for delivering assessment content and storing the participant’s responses. Results are extracted, transformed and loaded (ETL’ed) into the Results Warehouse from the delivery database on a recurring schedule. This database is the data source for the Questionmark Analytics reports.

With the advent of Analytics, we have introduced some new reports and we plan to continue building reports in Analytics. In the case of the Item Analysis report, we’ve actually ported that to Analytics entirely, and in so doing have delivered improvements to the visualization of item quality and report output options.

Addition of New Reports in Questionmark Analytics

Here’s a brief inventory of the reports currently available in Analytics. You can read up on the purpose of each of these reports and see sample outputs by consulting the Analytics Help Center.

In the spirit of holiday gift-giving, allow me to expound on a few of these reports.

Results Over Time and Average Score by Demographic

These are two separate reports but they are similar in that each one displays an average assessment score within the context of a 95% confidence interval, and a count of the number of results (sample size).

The “Results Over Time” report plots the assessment mean over a period of time, the interval of which is selected by the user.

The “Demographic” report does the same in terms of displaying a mean score, but it groups the results by a demographic. In this way, it enables the report consumer to compare the mean across different demographic groups.
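To make the idea of a mean shown with a 95% confidence interval and a sample size concrete, here is a minimal Python sketch. It is not how Questionmark Analytics computes its figures, which this post does not describe. It assumes a simple normal approximation (mean plus or minus 1.96 standard errors), and the function names `mean_with_ci` and `mean_by_group` are made up for illustration.

```python
import math
import statistics


def mean_with_ci(scores, z=1.96):
    """Return (mean, lower, upper, n) for a list of scores.

    Assumption: a normal-approximation interval (z = 1.96 for 95%).
    For small samples a t-based interval would be wider.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two results to estimate an interval")
    mean = statistics.mean(scores)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(scores) / math.sqrt(n)
    half_width = z * sem
    return mean, mean - half_width, mean + half_width, n


def mean_by_group(results):
    """Group (demographic, score) pairs and summarise each group."""
    groups = {}
    for demographic, score in results:
        groups.setdefault(demographic, []).append(score)
    return {name: mean_with_ci(scores) for name, scores in groups.items()}


if __name__ == "__main__":
    sample = [("Sales", 70), ("Sales", 90), ("Support", 60), ("Support", 80)]
    for name, (mean, low, high, n) in mean_by_group(sample).items():
        print(f"{name}: mean={mean:.1f}, 95% CI=({low:.1f}, {high:.1f}), n={n}")
```

The wider the interval and the smaller the count, the less weight a difference between two groups, or two time periods, should carry.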

Assessment Completion Time

This report can be used to help investigate misconduct. It plots assessment results on a score axis and a completion time axis. Outliers may represent cases of misconduct. That is, if a participant scores above the mean yet takes an abnormally short time to complete the assessment, this may represent a case of cheating. If a participant takes an abnormally long time to complete the assessment yet scores very poorly, this may represent a case of content theft. The report allows the user to set the range for normal score and completion time.
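The logic behind reading this chart can be sketched in a few lines of Python. This is an illustration only, not the report’s implementation. The function name `flag_result`, the labels, and the example ranges are all hypothetical, and the sketch assumes that “abnormal” simply means falling outside the user-selected normal ranges.

```python
def flag_result(score, minutes, score_range, time_range):
    """Classify one result against user-chosen 'normal' ranges.

    score_range and time_range are (low, high) tuples chosen by the
    person reviewing the report; they are assumptions, not defaults
    taken from the product.
    """
    low_score, high_score = score_range
    low_time, high_time = time_range
    # High score in an abnormally short time: worth a look for cheating.
    if score > high_score and minutes < low_time:
        return "review: possible cheating"
    # Poor score after an abnormally long time: worth a look for content theft.
    if score < low_score and minutes > high_time:
        return "review: possible content theft"
    return "normal"


if __name__ == "__main__":
    normal_scores = (50, 90)   # percent
    normal_times = (20, 60)    # minutes
    for score, minutes in [(95, 5), (20, 110), (75, 40)]:
        print(score, minutes, flag_result(score, minutes, normal_scores, normal_times))
```

A flag is only a prompt for investigation; an outlier on its own does not prove misconduct.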

Item Analysis

Finally, the item analysis report has been improved to provide users with better visualization of the quality of items on an assessment form, as well as more output options.

Suspect items are immediately visible because users can specify acceptable ranges for p-value and item-total correlation. Items that fall within acceptable ranges for each measure are green, those that fall outside of the acceptable range for one of the two measures are orange, and any that miss the mark for both p-value and item-total correlation are red.
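The color rule is easy to express in code. The sketch below is a hypothetical Python illustration of the rule as described here, not the product’s code. The acceptable ranges in the example are made-up values that a user might choose. In classical item analysis, the p-value is the proportion of participants who answered the item correctly.

```python
def in_range(value, bounds):
    """True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


def item_status(p_value, item_total_corr, p_range, corr_range):
    """Return 'green', 'orange' or 'red' for one item.

    green  = both measures inside their acceptable ranges
    orange = exactly one measure outside its range
    red    = both measures outside their ranges
    """
    misses = 0
    if not in_range(p_value, p_range):
        misses += 1
    if not in_range(item_total_corr, corr_range):
        misses += 1
    return ("green", "orange", "red")[misses]


if __name__ == "__main__":
    # Hypothetical acceptable ranges chosen by the report user.
    p_range = (0.30, 0.90)
    corr_range = (0.20, 1.00)
    items = {"Q1": (0.60, 0.35), "Q2": (0.95, 0.35), "Q3": (0.95, 0.05)}
    for name, (p, r) in items.items():
        print(name, item_status(p, r, p_range, corr_range))
```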

Additionally, different sections and levels of detail in this report can be output to PDF and/or comma-separated value files.

So … what’s on your wish list for the holidays?

The Open Assessment Platform in Action


Posted by Howard Eisenberg

I was impressed, during the recent Questionmark European Users Conference, to meet so many people who have been using Questionmark’s Open Assessment Platform to create solutions that address their organizations’ particular needs. These customers have used various elements of this platform, which is built on standard technologies, to address their specific challenges. These solutions make use of the readily available APIs (Application Programming Interfaces), Questionmark Perception version 5 templates and other resources that are available through the Open Assessment Platform.

Some examples:

By incorporating the functionality of jQuery (a cross-browser open source JavaScript library) into Questionmark Perception version 5, the University of Leuven in Belgium has been able to set up client-side form validation. Their case study presenter demonstrated how to differentiate between required and optional questions in a survey. Participants could be required, say, to answer the first and third questions but not the second, and they wouldn’t be able to submit the survey until they had answered the required questions. They also showed how a participant could be required to provide the date in a specific, pre-determined format. And they demonstrated an essay question that includes a paragraph containing misspelled words, which students identify by clicking on them. Customizations like these make creative use of the templates in Perception version 5 and demonstrate that it’s an extensible platform with which users can create their own tailor-made solutions.
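The validation rules themselves are independent of jQuery. As a rough illustration of the logic only, and not the University of Leuven’s client-side code, here is a Python sketch. The question identifiers, the `validate_survey` function, and the day/month/year date format are all assumptions made for the example.

```python
from datetime import datetime


def validate_survey(answers, required, date_fields, date_format="%d/%m/%Y"):
    """Return a list of error messages; an empty list means 'OK to submit'.

    answers     -- dict mapping question id to the text entered
    required    -- question ids that must be answered
    date_fields -- question ids whose answer, if given, must be a date
    date_format -- hypothetical required format (day/month/year here)
    """
    errors = []
    for question in required:
        if not (answers.get(question) or "").strip():
            errors.append(f"{question}: an answer is required")
    for question in date_fields:
        value = (answers.get(question) or "").strip()
        if not value:
            continue  # empty optional date: nothing to check
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            errors.append(f"{question}: date must match {date_format}")
    return errors


if __name__ == "__main__":
    # Questions 1 and 3 are required, question 2 is optional.
    answers = {"q1": "Yes", "q2": "", "q3": "", "q4": "2011-03-14"}
    for message in validate_survey(answers, ["q1", "q3"], ["q4"]):
        print(message)
```

In a browser, the same checks would run in JavaScript before the form is submitted, with the submit action blocked while the list of errors is not empty.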

A staff member from Rotterdam University demonstrated a technique for creating random numeric questions using Microsoft Excel and QML (Question Markup Language). This solution makes it possible to base questions on randomly generated values and other well-chosen variables, allowing for limits on lower and upper boundaries. Formulas in Excel make it possible to generate the numbers that appear in word problems generated using QML, which in turn can be used to create various iterations and clones of typical math question types. QML, because it is complete, well structured and well documented, is proving its worth as a tool for generating large numbers of questions and even for providing “smart” feedback: common mistakes can be diagnosed by establishing certain conditions within a question. For example, if the input is supposed to be a number rounded to the nearest tenth and the correct answer is 55.5, it can be assumed that a person who put down 55.4 as their answer has probably made a rounding error.
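The two ideas in this example, random values within set boundaries and feedback that diagnoses a likely rounding error, can be sketched in Python. This is not the Excel/QML solution that was presented, and it leaves the QML markup out entirely. The function names, the boundaries, and the rule “an answer exactly one tenth away from the correct one is probably a rounding error” are assumptions made for illustration.

```python
import random
from decimal import Decimal, ROUND_HALF_UP

TENTH = Decimal("0.1")


def random_value(rng, low=10, high=100):
    """Pick a random value with two decimal places between low and high."""
    hundredths = rng.randint(low * 100, high * 100)
    return Decimal(hundredths) / 100


def feedback(exact_value, answer):
    """Compare an answer (as text) with the value rounded to the nearest tenth."""
    correct = exact_value.quantize(TENTH, rounding=ROUND_HALF_UP)
    given = Decimal(answer)
    if given == correct:
        return "Correct."
    # Assumed diagnostic rule: one tenth off suggests a rounding mistake.
    if abs(given - correct) == TENTH:
        return f"Probable rounding error. The expected answer is {correct}."
    return f"Incorrect. The expected answer is {correct}."


if __name__ == "__main__":
    rng = random.Random()
    value = random_value(rng)
    print(f"Round {value} to the nearest tenth.")
    print(feedback(Decimal("55.46"), "55.4"))
```

`Decimal` is used rather than `float` so that comparisons such as “exactly one tenth off” are not upset by binary floating-point rounding.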

Random conversations revealed other innovations such as automating the creation of participants, their enrollment in appropriate groups and the scheduling of their assessments, all made possible through the use of QMWISe (Questionmark Web Integration Services environment).

It feels to me as if we have reached a threshold where the Open Assessment Platform is really being embraced and put to imaginative use. The stories I heard at the conference were certainly eye-openers for me; I think that innovations like these will inspire other Questionmark users to come up with equally innovative solutions. So I am looking forward to hearing more great case studies at the 2011 Users Conference in Los Angeles! (The call for proposals is now open, so if you are a Perception user, now is the time to think about how you would like to participate in the 2011 conference program.)

Conference Close-up: Assessments That Measure Knowledge, Skill & Ability

Posted by Joan Phaup

I’ve been having a great time talking to presenters at the Questionmark 2010 Users Conference: customers, our keynote speaker and Questionmark staff. I wanted to find out from Howard Eisenberg about the Best Practices presentation he will deliver at the conference on Effectively Measuring Knowledge, Skill and Ability with Well-crafted Assessments.

Q: Could you explain your role at Questionmark?

A: I manage Training and Consulting, so I work with our customers to get the most out of their assessments and their use of Questionmark Perception. For some that might mean training on how to use the software effectively. For others it might mean providing solutions that allow them to use the software within the context of their current business processes, such as synchronizing the data between the organization’s central user directory and Perception. In some cases we might need to create reports to supplement those that come with the product or do some other custom development. Sometimes we go on site, install the Perception software, set it up within the customer’s LMS and do any troubleshooting right on the spot. Whatever we do, our goal is to ensure customers’ speed to success, getting them operational faster.

Q: What will you be talking about during your Best Practice session in Miami?

A:  Over the years I’ve given presentations on Creating Assessments That Get Results, where I cover the dos and don’ts of writing test items. A question that always comes up during those talks is how to write test content that goes beyond testing information recall…content that tests a person’s ability to perform a task. There are limitations to using software like Perception to do that: certain things simply require that a person perform a task and have someone observe them, so that all the scoring and evaluation is done by an observer or rater. But there are a lot of possibilities for creating computer-scored items that can measure skill and ability rather than just recall of information. This session is designed to give people tools to take their tests to that level. First we need a framework for categorizing knowledge, skills and abilities: what makes a skill a skill and an ability an ability. We’ll help people classify their learning objectives along those lines and look at specific types of questions that can be used to measure skill and ability. The questions that provide this kind of measurement expand upon the question types that are supported in Questionmark Perception—selected response types as well as constructed responses.   We’ll use several real-world examples to illustrate how questions of this nature go beyond recall of knowledge and go to skill and ability.

Q: What are you looking forward to at the conference?

A: I am really looking forward to meeting customers and in some cases reconnecting with customers I’ve gotten to know over the years. That’s really a highlight for me…reconnecting with our great customers. I am consistently amazed and impressed by how passionate our customers are about what they do with our software and how smart they are in using it. Every year, after talking with a customer or sitting in on a case study, I come away thinking, “Wow! That was really clever!” So I’m looking forward to hearing those kinds of stories again this year.

The conference program is nearly finalized and includes case studies, tech training and best practice sessions for every experience level.  Check it out and plan to join us March 14 – 17 in Miami!

Do You Know How to Write Good Test Questions?


Posted by Howard Eisenberg

I had a typical education.  I took lots of tests.  Knowing what I know now about good testing practice, I wonder how many of those tests really provided an accurate measure of my knowledge.

Common testing practices often contradict what is considered best practice.  This piece will focus on four of the most common “myths” or “mistakes” that teachers, subject matter experts, trainers and educators in general make when writing test questions.

1) A multiple choice question must have at least four choices.  False.
Three to five choices are considered sufficient. Of course, the fewer the choices, the greater the chance a test-taker can guess the correct answer. The point, however, is that you don’t need four choices. If you are faced with the decision of adding an implausible or nonsensical distracter just to make four choices, it won’t add any measurement value to the question anyway. You might as well leave it at three choices.
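To put numbers on the guessing trade-off, here is a small Python sketch. It assumes pure blind guessing with all choices equally attractive, which a well-written question with plausible distracters should make unlikely. The function names and the 20-question test are hypothetical.

```python
def guess_chance(num_choices):
    """Probability of picking the one correct answer by blind guessing."""
    return 1 / num_choices


def expected_correct(num_questions, num_choices):
    """Expected number of questions answered correctly by blind guessing."""
    return num_questions * guess_chance(num_choices)


if __name__ == "__main__":
    for choices in (3, 4, 5):
        chance = guess_chance(choices)
        expected = expected_correct(20, choices)
        print(f"{choices} choices: {chance:.0%} per question, "
              f"about {expected:.1f} of 20 by guessing")
```

An implausible fourth choice does not really lower these odds, because test-takers can eliminate it at a glance and are effectively guessing among three.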

2)  The use of “all of the above” as a choice in a multiple choice question is good practice.  False.
It may be widely used but it is poor practice.  “All of the above” is almost always the correct answer.  Why else would it be there?  It is tacked onto a multiple choice question so it can have only one best answer. After all, writing plausible distracters is difficult.  If at least two of the other choices answer the question, then “all of the above” is the answer. No need to consider any more choices.

3) Starting a question with “Which of the following is not …” is considered best practice.  False.

First, the use of negatives in test questions should be avoided (unless you are trying to measure a person’s verbal reasoning ability). Second, the use of the “which of the following …” form usually results in a question that only tests basic knowledge or recall of information presented in the text or in the lecture. You might as well be saying: “Which of the following sentences does not appear exactly as it did in the manual?”

A) Copy > paste (from manual) choice 1
B) Copy > paste choice 2
C) Copy > paste choice 3
D) Make something up

While that may have some measurement value, my experience tells me that most test writers prefer to measure how well a person can apply knowledge to solve novel problems.  This type of question just won’t reach that level of cognition.  If you really want to get to problem-solving, consider using a real-world scenario and then posing a question.

4) To a subject matter expert, the correct answer to a good test question should be apparent.  True.

A subject matter expert knows the content.  A person who really knows the content should be able to identify the best answer almost immediately.  Test writers often hold the misconception that a good test question is one that is tricky and confusing.  No, that’s not the point of a test.  The point is to attain an accurate measure of how well a person knows the subject matter or has mastered the domain.  The question should not be written to trick the test-taker, let alone the expert. That just decreases the value of the measurement.

There are many more “do’s” and “don’ts” when it comes to writing good test questions.  But you can start to improve your test questions now by considering these common misconceptions as you write your next test.