Podcast: An Innovative Approach to Delivering Questionmark Assessments

 

Posted by Sarah Elkins

The University of Bradford has recently developed an innovative e-assessment facility, using cutting-edge thin client technology to provide a 100-seat room dedicated primarily to summative assessment. The room provides enhanced security features for online assessment and was used for the first time in 2009, with considerable success. The room’s flexible design maximises its usage by allowing for formative testing, diagnostic testing and general teaching.

John Dermo is the e-Assessment Advisor at the University of Bradford. In this podcast he explains the technology behind this unique setup and talks about the benefits and challenges of using this room. He will also present a session at the 2009 European Users Conference, where he will go into more detail about the project.

Item Analysis Analytics Part 4: The Nitty-Gritty of Item Analysis

 


Posted by Greg Pope

In my previous blog post I highlighted some of the essential things to look for in a typical Item Analysis Report. Now I will dive into the nitty-gritty of item analysis, looking at example questions and explaining how to use the Questionmark Item Analysis Report in an applied context for a State Capitals Exam.

The Questionmark Item Analysis Report first produces an overview of question performance, both in terms of the difficulty of questions and in terms of their discrimination (upper minus lower groups). These overview charts give you a “bird’s eye view” of how the questions composing an assessment perform. In the example below we see a range of question difficulties (“Item Difficulty Level Histogram”): some harder questions (the bars on the left), mostly average-difficulty questions (the bars in the middle), and some easier questions (the bars on the right). In terms of discrimination (“Discrimination Indices Histogram”), we see many questions with high discrimination, as evidenced by the bars clustering to the right (more questions on the assessment have higher discrimination statistics).

[Figure: Item Difficulty Level Histogram and Discrimination Indices Histogram]

Overall, if I were building a typical criterion-referenced assessment with a pass score around 50% I would be quite happy with this picture. We have more questions functioning at the pass score point with a range of questions surrounding it and lots of highly discriminating questions. We do have one rogue question on the far left with a very low discrimination index, which we need to look at.
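For readers who want to reproduce this kind of overview outside the report, here is a minimal sketch of how the two histograms can be built once you have one difficulty (p-value) and one discrimination index per question. The statistic values and the ten-bin layout below are illustrative assumptions of mine, not figures taken from the report above.

```python
# A sketch with assumed data: bin per-item difficulty and discrimination
# statistics into overview histograms like the two charts described above.
import numpy as np

p_values = np.array([0.22, 0.35, 0.48, 0.52, 0.55, 0.61, 0.74, 0.88])          # proportion correct per item
discrimination = np.array([0.45, -0.12, 0.38, 0.52, 0.41, 0.48, 0.30, 0.25])   # upper minus lower group

# Difficulties range from 0 to 1; discrimination indices range from -1 to 1.
diff_counts, diff_edges = np.histogram(p_values, bins=10, range=(0.0, 1.0))
disc_counts, disc_edges = np.histogram(discrimination, bins=10, range=(-1.0, 1.0))

for label, counts, edges in (("Difficulty", diff_counts, diff_edges),
                             ("Discrimination", disc_counts, disc_edges)):
    for lo, hi, c in zip(edges[:-1], edges[1:], counts):
        print(f"{label} {lo:+.1f} to {hi:+.1f}: {'#' * int(c)}")
```

Reading the text output the same way you read the report: a rogue question shows up as a bar well to the left of the others in the discrimination chart.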

The next step is to drill down into each question to ensure that each question performs as it should. Let’s look at two questions from this assessment, one question that performs well and one question that does not perform so well.

The question below is an example of a question that performs nicely. Here are some reasons why:

  • Going from left to right, first we see that the “Number of Results” is 175, which is a nice sample of participants to evaluate the psychometric performance of this question.
  • Next we see that everyone answered the question (“Number not Answered” = 0), which means there probably wasn’t a problem with people not finishing or finding the question confusing and giving up.
  • The “P Value Proportion Correct” shows us that this question is just above the pass score where 61% of participants ‘got it right.’ Nothing wrong with that: the question is neither too easy nor too hard.
  • The “Item Discrimination” indicates good discrimination: the difference between the upper and lower groups in the proportion selecting the correct answer (‘Salem’) is 48%. This means that 88% of the participants with the highest overall exam scores selected the correct answer, versus only 40% of the participants with the lowest overall exam scores. This is a nice, expected pattern.
  • The “Item Total Correlation” backs up the Item Discrimination with a strong value of 0.40. This means that across all participants who answered the question (not just the upper and lower groups), high scorers got the question right more often than low scorers.
  • Finally we look at the Outcome information to see how the distracters perform. We find that each distracter pulled some participants, with ‘Portland’ pulling the most participants, especially from the “Lower Group.” This pattern makes sense because those with poor state capital knowledge may make the common mistake of selecting Portland as the capital of Oregon.

The psychometricians, SMEs, and test developers reviewing this question all have smiles on their faces when they see the item analysis for this item.

[Figure: Item analysis detail for the well-performing State Capitals question]
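For readers who like to see the arithmetic, here is a minimal sketch of how the three statistics discussed above are conventionally computed in classical test theory. The 0/1 response matrix layout and the 27% upper/lower group cut-off are my assumptions; Questionmark’s report may use different conventions.

```python
import numpy as np

def item_statistics(scores: np.ndarray, item: int, group_fraction: float = 0.27):
    """Classical item statistics for one question.

    scores: participants x items matrix of 0/1 item scores (assumed layout).
    """
    item_scores = scores[:, item]
    totals = scores.sum(axis=1)
    n = scores.shape[0]

    # "P Value Proportion Correct": share of participants answering correctly.
    p_value = item_scores.mean()

    # "Item Discrimination": proportion correct in the upper group minus the
    # lower group, with groups formed from total exam scores.
    k = max(1, int(round(n * group_fraction)))
    order = np.argsort(totals)
    lower, upper = order[:k], order[-k:]
    discrimination = item_scores[upper].mean() - item_scores[lower].mean()

    # "Item Total Correlation": correlation of the item score with the total of
    # the remaining items, so the item is not correlated with itself.
    rest_total = totals - item_scores
    item_total_r = float(np.corrcoef(item_scores, rest_total)[0, 1])

    return p_value, discrimination, item_total_r
```

On data behaving like the Oregon question above, such a function would come back with roughly the 0.61 / 0.48 / 0.40 pattern just described.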

Next we look at that rogue question that does not perform so well in terms of discrimination: the one we saw in the Discrimination Indices Histogram. When we look into the question we understand why it was flagged:

  • Going from left to right, first we see that the “Number of Results” is 175, which is again a nice sample size: nothing wrong here.
  • Next we see everyone answered the question, which is good.
  • The first red flag comes from the “P Value Proportion Correct”: this question is quite difficult (only 35% of participants selected the correct answer). That is not in and of itself a bad thing, so we keep it in mind as we move on.
  • The “Item Discrimination” indicates a major problem, a negative discrimination value. This means that participants with the lowest exam scores selected the correct answer more than participants with the highest exam scores. This is not the expected pattern we are looking for: Houston, this question has a problem!
  • The “Item Total Correlation” backs up the Item Discrimination with a high negative value.
  • To find out more about what is going on we delve into the Outcome information area to see how the distracters perform. We find that the keyed-correct answer of Nampa is not showing the expected pattern of upper minus lower proportions. We do, however, find that the distracter “Boise” is showing the expected pattern, with the Upper Group (86%) selecting this response option much more than the Lower Group (15%). Wait a second… I think I know what is wrong with this one: it has been mis-keyed! Someone accidentally assigned a score of 1 to Nampa rather than Boise.

[Figure: Item analysis detail for the mis-keyed question]

No problem: the administrator pulls the data into the Results Management System (RMS), changes the keyed correct answer to Boise, and presto, we now have defensible statistics that we can work with for this question.

[Figure: Item analysis for the question after re-keying the correct answer to Boise]

The psychometricians, SMEs, and test developers reviewing this question had frowns on their faces at first, but those frowns were turned upside down when they realized it was just a simple mis-keyed question.
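Incidentally, the diagnosis the team made by eye (negative discrimination on the keyed answer, strong positive discrimination on one distracter) can also be run as an automated check. The sketch below is an assumed screening routine of my own, not an RMS or report feature; the 0.30 threshold and the data layout are illustrative.

```python
import numpy as np

def possible_miskey(selections: np.ndarray, key: int, totals: np.ndarray,
                    group_fraction: float = 0.27, threshold: float = 0.30):
    """Flag a question that may be mis-keyed.

    selections: participants x choices matrix of 0/1 option selections (assumed layout).
    key: index of the currently keyed-correct choice.
    totals: total exam score for each participant.
    """
    n = selections.shape[0]
    k = max(1, int(round(n * group_fraction)))
    order = np.argsort(totals)
    lower, upper = order[:k], order[-k:]

    # Upper-minus-lower selection rate for every choice, not just the key.
    disc = selections[upper].mean(axis=0) - selections[lower].mean(axis=0)

    # Suspicious pattern from the example above: the key discriminates
    # negatively while one distracter discriminates strongly positively.
    candidate = int(np.argmax(disc))
    if disc[key] < 0 and candidate != key and disc[candidate] >= threshold:
        return candidate   # likely true key; review with SMEs before re-keying
    return None
```

A flag from a check like this is only a prompt for human review: the final re-keying decision still belongs with the SMEs and the administrator.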

In my next blog post I would like to share some observations on the relationship between Outcome Discrimination and Outcome Correlation.

Are you ready for some light relief after pondering all these statistics? Then have some fun with our own State Capitals Quiz.

Feedback in Questionmark Live

Posted by Jim Farrell

As I started thinking about what I wanted to blog about, I couldn’t get past the podcast done by our very own Joan Phaup and Dr. Will Thalheimer of Work-Learning Research on the use of feedback. One of the most powerful features in Questionmark Live is the ability to provide choice-based feedback. I will likely have many blog posts on this topic and on Dr. Thalheimer’s white paper, but let’s start at the beginning:

Retrieval is more important than feedback. The role that feedback plays is to support retrieval.

This statement by Will seems simple, but it helps us understand how to write good feedback. There are many things to think about when creating feedback for a question.

  • When is the retrieval opportunity presented?
  • What is the feedback for a correct answer?
  • What is the feedback for an incorrect answer?

How does Questionmark Live fit into this? Well, it is pretty easy to write feedback for late-in-learning retrieval, since you are only trying to get the learner back on track. It is the early-in-learning feedback that needs to be more extensive, so it can help the learner develop pathways to information that support later retrieval. Allowing a subject matter expert (SME) to create extensive feedback in Questionmark Live will ensure that your feedback is detailed and accurate. No one expects the SME to be an expert in question writing. You may need to tweak the question once you bring it into Perception, but your feedback will be far more powerful if you glean it from someone who knows the subject in depth.

 

[Screenshot: Multiple choice question with feedback showing]
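To make “choice-based feedback” a little more concrete, here is a rough sketch of the idea as a data structure. This is purely illustrative and is not Questionmark Live’s actual authoring format; the question and feedback text are made up for the example.

```python
# Illustrative only: each choice carries its own feedback, so an SME can explain
# why a particular wrong answer is wrong instead of returning a generic "incorrect."
question = {
    "stem": "What is the capital of Oregon?",
    "choices": [
        {"text": "Salem",    "correct": True,
         "feedback": "Correct. Salem is the capital of Oregon."},
        {"text": "Portland", "correct": False,
         "feedback": "Portland is Oregon's largest city, but the capital is Salem."},
        {"text": "Eugene",   "correct": False,
         "feedback": "Eugene is home to the University of Oregon; the capital is Salem."},
    ],
}

def feedback_for(chosen_index: int) -> str:
    """Return whatever feedback the author attached to the learner's choice."""
    return question["choices"][chosen_index]["feedback"]
```

The point of the structure is simply that feedback hangs off each choice, which is what lets early-in-learning feedback explain the misconception behind a particular distracter.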

 

I really encourage you to read Dr. Thalheimer’s white paper to help you use feedback to improve the learning process.

Do You Know How to Write Good Test Questions?


Posted by Howard Eisenberg

I had a typical education.  I took lots of tests.  Knowing what I know now about good testing practice, I wonder how many of those tests really provided an accurate measure of my knowledge.

Common testing practices often contradict what is considered best practice.  This piece will focus on four of the most common “myths” or “mistakes” that teachers, subject matter experts, trainers and educators in general make when writing test questions.

1) A multiple choice question must have at least four choices.  False.
Three to five choices is considered sufficient.  Of course, the fewer the choices, the greater the chance a test-taker can guess the correct answer.  The point, however, is that you don’t need four choices, and if you are faced with adding an implausible or nonsensical distracter just to make four, it won’t add any measurement value to the question anyway.  You might as well leave it at three choices.
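As a quick back-of-the-envelope check of that guessing trade-off, the snippet below compares blind guessing across different numbers of choices; the 50-item test length is just an illustration of mine.

```python
# Expected outcome of blind guessing for different numbers of choices.
TEST_LENGTH = 50  # illustrative assumption

for n_choices in (3, 4, 5):
    chance = 1 / n_choices
    print(f"{n_choices} choices: {chance:.0%} chance per item, "
          f"expected score from pure guessing: {TEST_LENGTH * chance:.1f}/{TEST_LENGTH}")
```

Going from four choices to three raises the per-item guessing chance from 25% to about 33%, which is the modest cost you accept in exchange for dropping a nonsensical distracter.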

2)  The use of “all of the above” as a choice in a multiple choice question is good practice.  False.
It may be widely used, but it is poor practice.  “All of the above” is almost always the correct answer.  Why else would it be there?  It gets tacked onto a multiple choice question so the question can have only one best answer; after all, writing plausible distracters is difficult.  It also gives the answer away: a test-taker who recognizes that at least two of the other choices answer the question can select “all of the above” without considering the remaining choices.

3) Starting a question with “Which of the following is not …” is considered best practice.  False.

First, the use of negatives in test questions should be avoided (unless you are trying to measure a person’s verbal reasoning ability).  Second, the use of the “which of the following …” form usually results in a question that only tests basic knowledge or recall of information presented in the text or in the lecture.  You might as well be saying:  “Which of the following sentences does not appear exactly as it did in the manual?”

A) Copy > paste (from manual) choice 1
B) Copy > paste choice 2
C) Copy > paste choice 3
D) Make something up

While that may have some measurement value, my experience tells me that most test writers prefer to measure how well a person can apply knowledge to solve novel problems.  This type of question just won’t reach that level of cognition.  If you really want to get to problem-solving, consider using a real-world scenario and then posing a question.

4) To a subject matter expert, the correct answer to a good test question should be apparent.  True.

A subject matter expert knows the content.  A person who really knows the content should be able to identify the best answer almost immediately.  Test writers often hold the misconception that a good test question is one that is tricky and confusing.  No, that’s not the point of a test.  The point is to attain an accurate measure of how well a person knows the subject matter or has mastered the domain.  The question should not be written to trick the test-taker, let alone the expert. That just decreases the value of the measurement.

There are many more “do’s” and “don’ts” when it comes to writing good test questions.  But you can start to improve your test questions now by considering these common misconceptions as you write your next test.

The Secret of Writing Multiple-Choice Test Items

Posted by Julie Chazyn

I read a very informative blog entry on the CareerTech Testing Center Blog that I thought was worth sharing. It’s about multiple-choice questions: how they are constructed and some tips and tricks for creating them.

I asked its author, Kerry Eades, an Assessment Specialist at the Oklahoma Department of Career and Technology Education (ODCTE), about his reasons for blogging on The Secret of Writing Multiple-Choice Test Items. According to Kerry, CareerTech Testing Center took this lesson out of a booklet they put together as a resource for subject matter experts who write multiple-choice questions for their item banks, as well as for instructors who needed better instruments to create strong in-class assessments for their own classrooms. Kerry points out that the popularity of multiple-choice questions “stems from the fact that they can be designed to measure a variety of learning outcomes.” He says it takes a great deal of time, skill, and adherence to a set of well-recognized rules for item construction to develop a good multiple-choice item.

The CareerTech Testing Center works closely with instructors, program administrators, industry representatives, and credentialing entities to ensure that skills standards and assessments meet Carl Perkins requirements and reflect national standards and local industry needs. Using Questionmark Perception, CareerTech conducts tests for more than 100 career majors, with an online competency assessment system that delivers approximately 75,000 assessments per year.

Check out The Secret of Writing Multiple-Choice Test Items.

For more authoring tips visit Questionmark’s Learning Café.

12 Tips for Writing Good Test Questions

Posted by Joan Phaup

Writing effective questions takes time and practice. Whether your goal is to measure knowledge and skills, survey opinions and attitudes or enhance a learning experience, poorly worded questions can adversely affect the quality of the results.

I’ve gleaned the following tips for writing and reviewing questions from Questionmark’s learning resources:

1. Keep stems and statements as short as possible and use clear, concise language.
2. Use questions whenever possible (What, Who, When, Where, Why and How).
3. Maintain grammatical consistency to avoid cueing.
4. List choices in a logical order.
5. Avoid negatives, especially double negatives.
6. Avoid unnecessary modifiers, especially absolutes (e.g. always, never, etc.).
7. Avoid “All of the above,” and use “None of the above” with caution.
8. Avoid vague pronouns (e.g. it, they).
9. Avoid conflicting alternatives.
10. Avoid syllogistic reasoning choices (e.g. “both a and b are correct”) unless absolutely necessary.
11. Avoid providing cues to the correct answer in the stem.
12. Avoid providing clues to the answer of one question in another question.

If you would like more information about writing questions and assessments, a good place to start is the Questionmark Learning Café.