Item writing workshop in San Antonio March 4: Q&A with Mary Lorenz

Posted by Joan Phaup

Last week I shared a conversation with Melissa Fein about her March 4 morning workshop on Test Development Fundamentals in San Antonio, prior to the Questionmark 2014 Users Conference.

Our afternoon workshop that day will give people a chance to drill down into the building blocks of good tests: well-written items. Mary Lorenz, who honed her test writing skills as a program specialist for the Texas State Board for Educator Certification and during 11 years as a classroom teacher, will lead this 3-hour session on The Art and Craft of Item Writing.

Participants will learn how to construct well-written multiple choice items that measure course objectives. They’ll also evaluate their own items, identify flaws and polish up their work to achieve more meaningful results.

I asked Mary about her approach to this subject, which she has taught during workshops for ASTD, The University of Texas at Austin and other organizations.

What makes writing good test questions so challenging?


Mary Lorenz

I find that people write test questions from what they know, but all too often they have only ever known bad test questions! Not many people have training in how to identify and write a good test question.

What are the most common flaws you see in test questions?

A common problem is the lack of a single, clear, correct answer. Another is a poorly written stem that doesn’t provide enough detail. It’s essential to include all of the information necessary to provide a reasonable basis for responding.

Sometimes an author picks a statement out of a textbook and uses it as a test answer. Materials like that often cue test takers about the correct answer. People have become test-wise and can guess the right answer without really knowing the content. So you have to learn to write better test questions in order to accurately assess knowledge.

What’s the most important question to ask yourself when writing test items?

What is it you are asking test takers to do? It’s not about what information they can remember. It’s what you want them to do with that information. One of the best ways to get learners to think beyond what they “know” is to present them with a situation and ask them what they should do next. Make them apply what they’ve learned to a decision they need to make on the job. Each item should focus on an important concept, typically a common or potentially serious problem or issue related to their work. Don’t waste testing time with questions assessing knowledge of trivial facts. Don’t ask them to simply parrot a definition. Focus on problems they would encounter in real life. Avoid trivial, “tricky,” or overly complex questions.

So it’s really about objectives.

Yes! Test questions flow very easily from solid objectives, but people haven’t been well schooled in how to write a good objective. Designing an assessment, as well as crafting an objective, requires focus. So we will be looking at typical course objectives and comparing those to well-written, assessable objectives.

If you’ve written a good objective, the questions almost write themselves. Your objectives will also help you determine what item type would be most appropriate. I’ll be focusing on multiple choice items during this workshop, but we will touch on how to determine the right item types to use in different contexts.

How do you inject some fun into helping people learn to write good test questions?

First let me admit something: I am an item-writing nerd. Seriously, I have found myself on more than one occasion bordering on giddy when I come up with a novel way of approaching an objective and genuinely frustrated when I have to begin a question with “Which statement is true?” In that spirit, I show students some classically bad questions and we all have a good laugh over those. I like them to be able to say, “Now that I know some things about how to discern an ‘okay’ question from a ‘good’ question, it will be easy for me to avoid writing bad questions.”

I also help people with their own questions, showing them how to make them better. This can be embarrassing at times, but people gain an awareness that they can do better. They understand that it takes effort and it takes time, but it’s worth it to be able to assess what’s really going on.

How should people prepare for this workshop?

I would like them to bring sample questions with them. I would also like them to bring the objectives on which they are trying to base their assessments.

What would you like people to take away from this session?

An awareness of how to do this better. How to take what they already have and make it a more valid and reliable exam. I’d also like them to leave knowing what a good test question looks like. I want them to leave excited about the notion of writing better test questions. I like seeing those light bulbs go off above people’s heads – to see people change their attitude about multiple choice items and discover that, if they’re written well, they can really assess a lot!

In addition to the two half-day workshops, we are offering a full-day Questionmark Boot Camp for Beginners, taught by Questionmark Trainer Rick Ault.

Check out the conference program to see all the educational sessions taking place in San Antonio March 4 – 7.

Register for the conference by January 30th to save $100. 



SAP to resell Questionmark software to its customers

Posted by John Kleeman

Yesterday we were very happy to announce the signing of a global reseller agreement with SAP AG, under which SAP will resell our technologies to its customers. You can see the press release here.

Under the agreement, SAP will resell assessment technologies from Questionmark under the name SAP® Assessment Management by Questionmark.
SAP® Assessment Management by Questionmark will enable SAP customers to create, deliver and analyze surveys, quizzes, tests and exams. The software will complement SAP Learning Solution and SuccessFactors Learning, allowing use of assessments for purposes such as certification, regulatory compliance, and health and safety training for a company’s internal tracking purposes.

SAP will deliver the application on-premise and as a cloud offering to give customers a secure, collaborative environment for creating learning assessments in multiple languages and delivering to a wide variety of browsers and devices. We then anticipate that SAP® Assessment Management by Questionmark users will be able to evaluate results and inform stakeholders with timely, meaningful analyses and reports on items and tests.

This agreement builds on our established relationship as an SAP software solution and technology partner, as well as the longstanding compatibility of our assessment management software with both SAP Learning Solution and SuccessFactors Learning.

I am delighted that this agreement will help SAP customers get even more value from their investments in enterprise learning software. I’ve been working closely with SAP as they have reviewed our technology and I admire SAP’s commitment to quality in learning and technology and the strength and quality of their team.

I’m pleased to share a quote within the press release from Markus Schwarz, SVP and global head of SAP Education:

“Since the need to assess learning is at an all-time high, the addition of this offering to our portfolio of collaborative learning software from SAP Education is well-timed indeed. And because it works together with SuccessFactors Learning and SAP Learning Solution, the new SAP Assessment Management application by Questionmark will dovetail perfectly with our strategy to bring cloud-based learning to SAP customers and partners worldwide.”

Workshop on Test Development Fundamentals: Q&A with Melissa Fein

Posted by Joan Phaup

We will be packing three days of intensive learning and networking into the Questionmark 2014 Users Conference in San Antonio March 4 – 7.

From Bryan Chapman’s keynote on Transforming Open Data into Meaning and Action to case studies, best practice advice, discussions, demos and instruction in the use of Questionmark technologies, there will be plenty to learn!

Even before the conference starts, some delegates will be immersed in pre-conference workshops. This year we’re offering one full-day workshop and two half-day workshops.

Here’s the line-up:

Questionmark Boot Camp for Beginners (full day), taught by Questionmark Trainer Rick Ault
Test Development Fundamentals (morning), with Melissa Fein
The Art and Craft of Item Writing (afternoon), with Mary Lorenz

Today’s conversation is with Melissa Fein, an industrial-organizational psychology consultant and the author of Test Development:  Fundamentals for Certification and Evaluation.

Melissa’s workshop will help participants create effective criterion-referenced tests (CRT). It’s designed for people involved in everything from workplace testing and training program evaluation to certifications and academic testing.

What would you say is the most prevalent misconception about CRT?
…that a passing score should be 70 percent. The cutoff for passing might end up being 70 percent, but that needs to be determined through a standard-setting process. Often people decide on 70 percent simply because it’s traditional.

What is the most important thing to understand about CRT?
It’s crucial to understand how to produce and interpret scores in a way that is fair to all examinees and to those who interpret and use the scores in making decisions, such as hiring people, promoting people, and awarding grades. Scores are imperfect by nature; our goal is to produce quality scores given the limitations that we face.

How does CRT differ in the worlds of workplace testing, training, certification and academic assessment?
The process used to identify testing objectives differs for these different contexts. However, there are more similarities than differences in developing CRTs for workplace testing, training, certification and academic assessment. The principles underlying the construction of quality assessments — such as validity, reliability, and standard setting — don’t differ.

When is CRT the most appropriate choice, as opposed to norm-referenced testing?
Anytime test scores are being compared to a standard, you want to use criterion-referenced testing. With norm-referenced tests, you just want to compare one examinee’s scores with another’s. If you had police officers who have to pass fitness standards — maybe they have to run a mile in a certain amount of time — you would use CRT. But if the officers are running a benefit 5K race, that’s norm-referenced. You just want to find out who comes in first, second and third.

I understand you will be covering testing enigmas during the workshop. What do you have in mind?
Testing enigmas reflect best practices that seem to defy common sense until you look more closely. The biggest enigma occurs in standard setting. When most people think of setting standards for certifications, they like to think of a maximally proficient person. When I ask them to think of a minimally competent person, they think I’m pulling the rug out from under them! But in standard setting, you are trying to determine the difference between passing and failing, so you are looking to identify the minimally competent person: you want to define the line that distinguishes the minimally competent person from someone who is not competent.

What do you hope people will take away from their morning with you?
I hope people will walk away with at least one new idea that they can apply to their testing program. I also hope that they walk away knowing that something they are already doing is a good idea – that the workshop validates something they are doing in their test development work. Sometimes we don’t know why we do certain things, so it’s good to get some reassurance.

Click here to read a conversation with Rick Ault about Boot Camp. My next post will be a Q&A with item writing workshop instructor Mary Lorenz.

You will save $100 if you register for the conference by January 30th. You can add a workshop to your conference registration or choose your workshop later.

Item Analysis – Two Methods for Detecting DIF

Posted by Austin Fossey

My last post introduced the concept of differential item functioning. Today, I would like to introduce two common methods for detecting DIF in a classical test theory framework: the Mantel-Haenszel method and the logistic regression method.

I will not go into the details of these two methods, but if you would like to know more, there are many great online resources. I also recommend de Ayala’s book, The Theory and Practice of Item Response Theory, for a great, easy-to-read chapter discussing these two methods.

The Mantel-Haenszel Method

The Mantel-Haenszel method determines whether or not there is a relationship between group membership and item performance, after accounting for participants’ abilities (as represented by total scores). The magnitude of the DIF is represented with a common odds ratio estimate, known as αMH. In addition to the odds ratio, we can calculate the Cochran-Mantel-Haenszel (CMH) statistic, which follows a chi-squared distribution. CMH shows whether or not the observed DIF is significant, though there is no sense of magnitude as there is with αMH.
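As a sketch of the core computation, αMH can be estimated from a 2×2 table (group membership × correct/incorrect) at each total-score stratum. The data below are entirely hypothetical, just to show the arithmetic:

```python
def mantel_haenszel_odds_ratio(strata):
    """Estimate the MH common odds ratio (alpha_MH) across score strata.

    Each stratum is a 2x2 table (a, b, c, d):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n   # reference-correct x focal-incorrect
        den += b * c / n   # reference-incorrect x focal-correct
    return num / den

# Hypothetical item data: one 2x2 table per total-score stratum.
strata = [
    (20, 10, 15, 15),   # low scorers
    (30,  5, 22, 13),   # middle scorers
    (40,  2, 35,  7),   # high scorers
]
alpha_mh = mantel_haenszel_odds_ratio(strata)
# alpha_MH > 1 suggests the item favors the reference group
# after conditioning on ability; alpha_MH = 1 means no DIF.
```

Stratifying by total score before pooling is what separates true DIF from a plain difference in group ability.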

Logistic Regression

Unfortunately, the Mantel-Haenszel method is only consistent when investigating uniform DIF. If non-uniform DIF may be present, we can use logistic regression to investigate the presence of DIF. To do this, we run two logistic regression models where item performance is regressed on total scores (to account for the participants’ abilities) and group membership. One of the models will also include an interaction term between test score and group membership. We then can compare the fit of the two models. If the model with the interaction term fits better, then there is non-uniform DIF. If the model with no interaction term shows that group membership is a significant predictor of item performance, then there is uniform DIF. Otherwise, we can conclude that there is no DIF present.
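To make the model-comparison logic concrete, here is a self-contained sketch on simulated data. The dataset, the coefficient values and the tiny gradient-ascent fitter are all illustrative assumptions, not a production DIF tool; a real analysis would use a statistics package:

```python
import math
import random

def fit_logit(X, y, w_init=None, lr=0.5, iters=2000):
    """Fit logistic regression by batch gradient ascent on the
    log-likelihood. Returns (weights, log-likelihood)."""
    w = list(w_init) if w_init else [0.0] * len(X[0])
    n = len(y)
    for _ in range(iters):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j, xj in enumerate(xi):
                grad[j] += (yi - p) * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    ll = 0.0
    for xi, yi in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return w, ll

random.seed(0)
# Simulated uniform DIF: the focal group (group = 1) is disadvantaged
# on this item even after conditioning on total score.
rows = []
for _ in range(300):
    score = random.gauss(0, 1)      # standardized total score
    group = random.randint(0, 1)    # 0 = reference, 1 = focal
    p = 1.0 / (1.0 + math.exp(-(score - 0.8 * group)))
    rows.append((score, group, 1 if random.random() < p else 0))

y = [r for _, _, r in rows]
X_main = [[1.0, s, g] for s, g, _ in rows]          # score + group
X_full = [[1.0, s, g, s * g] for s, g, _ in rows]   # + interaction

w_main, ll_main = fit_logit(X_main, y)
w_full, ll_full = fit_logit(X_full, y, w_init=w_main + [0.0])

# Likelihood-ratio statistic ~ chi-squared(1); a value above 3.84
# (the 5% critical value) would point to non-uniform DIF.
lr_stat = 2 * (ll_full - ll_main)
```

If `lr_stat` is small but the group coefficient in the main-effects model is significant, the conclusion would be uniform DIF rather than non-uniform DIF.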

Just because we find a statistical presence of DIF does not necessarily mean that we need to panic. In Odds Ratio, Delta, ETS Classification, and Standardization Measures of DIF Magnitude for Binary Logistic Regression, Monahan, McHorney, Stump, & Perkins note that it is useful to flag items based on the effect size of the DIF.

Both the Mantel-Haenszel method and the logistic regression method can be used to generate standardized effect sizes. Monahan et al. provide three categories of effect sizes: A, B, and C. These category labels are often generated in DIF or item calibration software, and we interpret them as follows: Level A is negligible levels of DIF, level B is slight to moderate levels of DIF, and level C is moderate to large levels of DIF. Flagging rules vary by organization, but it is common for test developers to only review items that fall into levels B and C.
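For illustration, αMH is often mapped onto an interpretable scale via the ETS delta metric, Δ = -2.35 ln(αMH), and classified with the commonly cited cutoffs (|Δ| < 1 for level A, at least 1.5 for level C). The significance conditions ETS attaches to levels B and C are omitted in this sketch:

```python
import math

def ets_dif_category(alpha_mh):
    """Classify DIF magnitude on the ETS delta scale.

    Delta = -2.35 * ln(alpha_MH); thresholds are the commonly cited
    ETS cutoffs (statistical-significance checks omitted for brevity).
    """
    delta = -2.35 * math.log(alpha_mh)
    if abs(delta) < 1.0:
        return "A"   # negligible DIF
    elif abs(delta) < 1.5:
        return "B"   # slight to moderate DIF
    else:
        return "C"   # moderate to large DIF

# An odds ratio near 1 means little DIF:
assert ets_dif_category(1.0) == "A"
```

Note that the scale is symmetric: odds ratios above and below 1 (DIF against either group) are flagged alike.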

Transforming Open Data into Meaning and Action: Q&A with Bryan Chapman

Learning strategist Bryan Chapman’s keynote address on Transforming Open Data into Meaning and Action will be a highlight of the Questionmark 2014 Users Conference in San Antonio March 4 – 7.

With customers now using the Questionmark OData API to harvest meaning from their assessment results with greater freedom and flexibility, we are excited to hear about the broader implications of open data and the opportunities for learning organizations to make the most of it.

The Open Data Protocol, an industry standard for accessing data sources via the web, provides new opportunities for reporting on and analyzing assessment results. OData feeds can be consumed and analyzed by many leading business intelligence applications, providing new options for custom reports and dashboards.

I asked Bryan recently about what the advent of OData will mean, particularly with regard to learning and measurement:

The concept of open data is being talked about a lot these days. Why is it important?

Bryan Chapman

I’ve got to confess that I’m a total data junkie, so I get very excited about the endless possibilities of open data. Think of how much data is being collected on a daily basis all over the world from scientific discoveries, government data, opinion polls, and even learning.  Gartner recently said that companies are collecting 300% more data than they did 4 years ago.  It’s crazy. But consider what we can discover by selectively combining data in meaningful ways – something OData enables us to do.

Here’s a non-learning example of how powerful data can be: The CDC (Centers for Disease Control) collects information showing incidence of heart disease on a county-by-county basis. A group called Third World Congress on Positive Psychology developed a way to parse through Twitter feeds across two counties, analyzing the use of 40,000 words in over 80 million tweets.

By combining these separate data feeds, they discovered a pattern between having a positive attitude and having lower risk of heart attack. This is a rather simple example, but just think of what kinds of patterns we will find as open data takes hold.

What is the significance of OData for testing and assessment professionals?

If you want to discover patterns of behavior that make companies successful, what better variable to plug in than testing and assessment data?

Case in point: A while ago I worked with a large software company.  Independent, outside data suggested that customers felt that the company’s help desk support team often lacked knowledge in specific technical areas.  We went in, created an assessment (using Questionmark, by the way!) and created a gap analysis across 60 technical skills areas for all help desk support.

We turned the data into a gap analysis heat map with red, yellow and green indicators showing a range of levels from expertise down to lack of knowledge. When we presented their senior management with the map, several things became very clear: where training was needed, where the wrong person was on the wrong team, and a whole lot more.

This was great as far as it went, but it was just a single snapshot in time…it wasn’t ongoing. I think back on that project and wonder how much more impressive it would have been if the data was continually measured, linked to a dashboard and frequently compared to the independent audit of customer responses.

OData makes this possible.

How can the use of open data impact learning and performance?

First, with open data, it’s relatively easy to flow the results of learning, testing and assessment right into the performance review process.  I’ve been watching the major talent management vendors who have tools to conduct annual performance reviews, do staff planning, succession planning and pay for performance; many of them are gradually adding OData feeds (both in and out of their systems). So creating that level of interoperability is already starting to shape up. The bigger win is the ability to link learning with organizational performance: Kirkpatrick Level 4!

Most of us feel very comfortable collecting Level 1 and Level 2. Do they like the training? And are they learning, comparing pre- and post-test scores? Some get how to do Level 3 by sending out delayed questionnaires asking what skills are being used on the job, or through performance observation. But open data is really the enabler for linking learning and performance with key company metrics: income, productivity, retention and other bottom-line results, especially as other parts of the business make their data available through open channels.

How will your keynote address relate to the specific interests of Questionmark customers?

I’m not a technical guru when it comes to open data…not by a long shot. There will be others there who can tell you all about Questionmark’s OData capabilities. But I really think the hardest part of this is re-imagining how data can be creatively combined to paint the whole picture, or at least understanding what others might do with data that we make available through testing and assessment.

I’ll be sharing several examples of innovative approaches, but that’s just the tip of the iceberg.  If our organizations are really collecting 300% more data than 4 years ago, there are simply way too many data streams to combine.  So we need to start off by keeping things simple – to figure out which data streams can get us where we need to go.

If I do my job well, we’ll all start dreaming of ways we can marry data together and apply meaning. And before long, we can expect to see some very creative dashboards linking learning data with actual business performance.

Learn more about the conference program, which includes two new pre-conference workshops: Test Development Fundamentals and The Art and Craft of Item Writing. Register for the conference by January 30 to save $100. Another current learning opportunity: 3-day Questionmark training in Las Vegas February 4 – 6.

Six trends to watch in online assessment in 2014

John Kleeman HeadshotPosted by John Kleeman

As we gear up for 2014, here are six trends I suggest could be important in the coming year.

1. Privacy. The revelations in 2013 that government agencies intercept so much electronic data will reverberate in 2014. Expect a lot more questions from stakeholders about where their results are stored and how integrity, data protection and privacy are assured, including the location and ownership of suppliers and data centres. I suspect some organizations will look to build trust with stakeholders by adopting the ISO standard on assessments in the workplace ISO 10667.

2. Anticipation of problems. Many organizations already use assessments to look forward, not just backward. In regulatory compliance, smart organizations don’t just use assessments to check competence; they analyze results from assessments to identify trends that can indicate potential issues or weaknesses, and prompt corrective measures before it is too late. Universities and colleges increasingly use assessments to predict problems and help prevent students from dropping out (see for instance Use a survey with feedback to aid student retention). It’s exciting that assessments can be used to find issues in this way and address them before they become serious. Don’t just treat assessments as a rear-view mirror: use them to look forward.

3. Software as a service (SaaS). For all but the very large organizations, running online assessments via software as a service is much more cost-effective than running an on-premise system. Delegating to a service provider like Questionmark makes the hassle of upgrading, maintaining security patches and managing deployment go away. Increasingly, delivering assessments via a SaaS model will become the default.

4. Smaller and more connected world. The Internet is bringing us all together. The world is becoming connected, and in some sense smaller. We can no longer think of another continent or country as being a world away, because we can all connect together so easily. This means it is increasingly important to make your assessments translatable, multi-lingual and cross-cultural. Most medium and large organizations work across much of the world, and assessments need to reflect that.

5. Environment. I wonder if 2014 could be the year when the environmental benefits of online assessments could start to be seriously recognized. Clearly, using computers rather than paper to deliver assessments saves trees, but a bigger benefit is in reduced carbon emissions due to less traveling. For service organizations, business travel is a large proportion of carbon emissions (see for example here), and delivering training and assessments online can make a useful difference. With many countries requiring reporting of carbon emissions by listed companies, this could be important.

6. Security. Last but definitely not least, assessment security will continue to matter. As there is more awareness of the risks, everyone will expect high levels of technical and organizational security in their assessment delivery.  If you are a provider, expect a lot more questions on security from informed users; and if you are a customer or user, check that your supplier and your internal team is genuinely up to date on its security.

Read this list and look at the starting letters, and you get P – A – S – S – E – S! I wish you a happy new year and hope that each of your test-takers passes their assessments in 2014 when it is appropriate that they do so.