Just as students are returning to school these days, we at Questionmark are starting our annual fall series of Breakfast Briefings and User Group meetings in the U.S.
Our Briefing guests will join us for a complimentary breakfast and a hands-on tutorial, during which they will be able to create questions and assessments using Questionmark Perception. The tutorials take place in computer training rooms, so this is a great way to try out the software and a good opportunity to ask questions. We will also be showing people how to use Questionmark Live, our new browser-based authoring tool for subject matter experts, and they will see new ways of managing multilingual assessments and delivering assessments on mobile devices.
Current Perception users are invited to join us for lunch, discussions about topics of their choice and in-depth looks at new product features and Questionmark services. Not only do these meetings give our customers the opportunity to talk with Questionmark managers; they also help our team understand customers’ challenges and needs with regard to assessments.
We will kick off in the Boston area tomorrow and work our way around to a total of seven cities. Here’s the schedule:
September 1: Boston (Burlington)
September 3: New York
September 24: Chicago
October 6: Dallas
October 8 and 9: Washington, DC (second session added by popular demand!)
October 20: Atlanta
October 22: Los Angeles
There’s still time to sign up for a briefing and/or a user group meeting: click here for information about the Briefings, and here for User Group details.
There is no charge for attending these events, so we hope you will take advantage of these great learning opportunities and will join us!
Earlier this month, the number of questions in Questionmark Live passed 14,000. That is an amazing statistic for just three months. We continue to add new features to this powerful, browser-based authoring tool, including the ability to add multimedia files to questions. Multimedia allows our users to create real-world scenarios that may include listening to a customer’s voice, watching a delicate surgical procedure, or fixing a complex piece of machinery via an interactive Flash video. Watch this video to see how easy it is to add audio or video to your questions.
The European Users Conference is less than one month away, and I’m pleased to announce that Tom King, Questionmark’s Interoperability Evangelist, and David Sloan from the University of Dundee will be leading the Tuesday morning General Session, focusing on E-Assessment and Interoperability, Standards and Accessibility.
Tom King is actively involved with many e-learning technology specification groups and is a regular contributor to this blog. Tom will provide an overview of the current status of major standards and the specification organisations behind them, and highlight some of the emerging needs and promising developments. David Sloan will give an overview of accessibility-related legislation, standards and best practice, and show how Questionmark can help support the creation of accessible assessments.
The conference is set to be an exciting two days for Perception users, with Best Practice sessions on the latest trends in assessment management, eight Case Study presentations, and some great Technical Training sessions. Make sure you check out the full conference agenda and, if you haven’t already done so, register for the conference!
In my last post, I showed a few more examples of item analyses where we drilled down into why some questions had problems. I thought it might be useful to show a few examples of some questions that have bad and downright terrible psychometric performance to show the ugly side of item analysis.
Below is an example of a question that is fairly terrible in terms of psychometric performance. Here are some reasons why:
Going from left to right, first we see that the “Number of Results” is 65, which is not so good: there are too few participants in the sample to be able to make sound judgments about the psychometric performance of the question.
Next we see that 25 participants didn’t answer the question (“Number not Answered” = 25), which suggests that participants either did not finish the assessment or found the question confusing and gave up.
The “P Value Proportion Correct” shows us that this question is hard, with only 20% of participants ‘getting it right.’
The “Item Discrimination” indicates very low discrimination, with the difference between the Upper and Lower group in terms of the proportion selecting the correct answer of ‘More than 40’ at only 5%. This means that of the participants with high overall exam scores, 27% selected the correct answer versus 22% of the participants with the lowest overall exam scores. This is a very small difference between the Upper and Lower groups. Participants who know the material should have got the question right more often.
The “Item Total Correlation” reflects the Item Discrimination with a negative value of -0.01. A value like this would definitely not meet most organizations’ internal criteria in terms of what is considered an acceptable item. Negative item-total correlations are a major red flag!
Finally, we look at the Outcome information to see how the distracters perform. We find that participants are all over the map, selecting distracters in an erratic way. When I look at the question wording, I realize how vague and arbitrary this question is: the number of questions that should be in an assessment depends on numerous factors and contexts. It is impossible to say that in any context a certain number of questions is required. It looks like the Upper Group is selecting the ‘21-40’ and ‘More than 40’ response options more than the other two options, which specify smaller numbers of questions. This makes sense from a guessing perspective, because in many assessment contexts having more questions rather than fewer is better for reliability.
The psychometricians, SMEs, and test developers reviewing this question would need to send the SME who wrote this question back to basic authoring training to ensure that they know how to write questions that are clear and concise. This question does not really have a correct answer and needs to be re-written to clarify the context and provide many more details to the participants. I would even be tempted to throw out questions along this content line, because how long an assessment should be has no one “right answer.” How long an assessment should be depends on so many things that there will always be room for ambiguity, so it would be quite challenging to write a question that performs well statistically on this topic.
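As a rough illustration, the three statistics walked through above (proportion correct, upper/lower group discrimination, and the point-biserial item-total correlation) can all be computed from a simple scored response matrix. The sketch below is my own, not Questionmark's actual implementation; the 27% upper/lower split is a common classical-test-theory convention.

```python
# A minimal sketch of classical item statistics, assuming 0/1 item scoring.
# Not Questionmark's implementation; for illustration of the concepts only.
from statistics import mean

def item_statistics(item_scores, total_scores, group_fraction=0.27):
    """item_scores: 1 if the participant answered this item correctly, else 0.
    total_scores: the same participants' overall assessment scores."""
    n = len(item_scores)
    p_value = mean(item_scores)  # "P Value Proportion Correct"

    # Upper/lower groups by overall score (27% tails are conventional).
    order = sorted(range(n), key=lambda i: total_scores[i])
    k = max(1, int(n * group_fraction))
    lower = [item_scores[i] for i in order[:k]]
    upper = [item_scores[i] for i in order[-k:]]
    discrimination = mean(upper) - mean(lower)  # "Item Discrimination"

    # Point-biserial item-total correlation ("Item Total Correlation").
    mi, mt = mean(item_scores), mean(total_scores)
    cov = sum((x - mi) * (y - mt)
              for x, y in zip(item_scores, total_scores)) / n
    sd_i = (sum((x - mi) ** 2 for x in item_scores) / n) ** 0.5
    sd_t = (sum((y - mt) ** 2 for y in total_scores) / n) ** 0.5
    r = cov / (sd_i * sd_t) if sd_i and sd_t else 0.0
    return p_value, discrimination, r
```

On a healthy item, participants with high total scores answer correctly far more often than those with low total scores, so both the discrimination and the correlation come out clearly positive; the questions discussed in this post fail that check.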
Below is an example of a question that is downright awful in terms of psychometric performance. Here are some reasons why:
Going from left to right, first we see that the “Number of Results” is 268, which is really good. That is a nice healthy sample. Nothing wrong here, let’s move on.
Next we see that 56 participants didn’t answer the question (“Number not Answered” = 56), which suggests that participants either did not finish the assessment or found the question confusing and gave up. It gets worse, much, much worse.
The “P Value Proportion Correct” shows us that this question is really hard, with only 16% of participants ‘getting it right.’
The “Item Discrimination” indicates a negative discrimination, with the difference between the Upper and Lower group in terms of the proportion selecting the correct answer of ‘44123’ at -23%. This means that of the participants with high overall exam scores, 12% selected the correct answer versus 35% of the participants with the lowest overall exam scores. What the heck is going on? This means that participants with the highest overall assessment scores are selecting the correct answer LESS OFTEN than participants with the lowest overall assessment scores. That is not good at all; let’s dig deeper.
The “Item Total Correlation” reflects the Item Discrimination with a large negative value of -0.26. This is a clear indication that there is something incredibly wrong with this question.
Finally we look at the Outcome information to see how the distracters perform. This is where the true psychometric horror of this question is manifested. There is neither rhyme nor reason here: participants, regardless of their performance on the overall assessment, are all over the place in terms of selecting response options. You might as well have blindfolded everyone taking this question and had them randomly select their answers. This must have been extremely frustrating for the participants who had to take this question and would have likely led to many participants thinking that the organization administering this question did not know what they were doing.
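The blindfolded-guessing picture is easy to check with a quick simulation: when responses are random, the item tells us nothing about ability, and the item-total correlation collapses toward zero. The sample size below matches the example; the score distribution is an assumption of mine, made up purely for illustration.

```python
# Sketch: purely random responding drives item-total correlation toward zero,
# the pattern described above. The score distribution here is invented.
import random

random.seed(1)
n = 268  # sample size from the example above
# Each participant guesses among four options; option 0 is "correct".
item_scores = [1 if random.randrange(4) == 0 else 0 for _ in range(n)]
# When everyone guesses, overall scores are unrelated to this item.
total_scores = [random.gauss(60, 12) for _ in range(n)]

mean_i = sum(item_scores) / n
mean_t = sum(total_scores) / n
cov = sum((x - mean_i) * (y - mean_t)
          for x, y in zip(item_scores, total_scores)) / n
sd_i = (sum((x - mean_i) ** 2 for x in item_scores) / n) ** 0.5
sd_t = (sum((y - mean_t) ** 2 for y in total_scores) / n) ** 0.5
r = cov / (sd_i * sd_t)
# r typically lands near zero here, nothing like a healthy positive value.
```

A strongly negative value like the -0.26 above is even worse than random guessing: it hints that something about the wording actively misleads the most knowledgeable participants.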
The psychometricians, SMEs, and test developers reviewing this question would need to provide a pink slip to the SME who wrote this question immediately. Clearly the SME failed basic question authoring training. This question makes no sense and was written in such a way as to suggest that the author was under the influence, or otherwise not in a right state of mind, when crafting it. What is this question testing? How can anyone possibly make sense of this and come up with a correct answer? Is there a correct answer? This question is not salvageable and should be stricken from the Perception repository without a second thought. A question like this should never have been put in front of a participant, let alone 268 participants. The panel reviewing questions should review their processes to ensure that in the future questions like this are weeded out before an assessment goes live.
I prepared a new segment on Understanding eLearning Standards. This segment addresses the “how” of e-learning standards, and specifically run-time communication using the common AICC HACP specification. [Don’t worry, SCORM fans, there will be another segment focusing on the SCORM runtime.]
Standards fans (and hockey fans) are likely to appreciate the analogies used to explain a run-time environment in general. The video also steps through the lifecycle of an activity running in an LMS environment. Then I drill down to the specifics of AICC, including both the common browser-to-LMS and the compelling server-to-server uses of AICC HACP.
Finally, the segment closes with a review of key resources from the AICC web site to help you make the most of AICC HACP.
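To give a flavor of what HACP run-time communication looks like on the wire, here is a minimal sketch of building a PutParam request body and parsing an LMS response. HACP messages are form-encoded HTTP POSTs with `command`, `version`, `session_id`, and (for PutParam) `aicc_data` fields; the session ID, data values, and helper names below are hypothetical, and the AICC guidelines remain the authoritative reference.

```python
# A minimal, illustrative sketch of AICC HACP message handling.
# Field values (session_id, lesson data) are hypothetical examples.
from urllib.parse import urlencode

def build_putparam(session_id, lesson_location, lesson_status, score):
    """Build the form-encoded body of an HACP PutParam request.
    The AU POSTs this to the LMS URL it was given at launch time."""
    aicc_data = (
        "[Core]\r\n"
        f"Lesson_Location={lesson_location}\r\n"
        f"Lesson_Status={lesson_status}\r\n"
        f"Score={score}\r\n"
    )
    return urlencode({
        "command": "PutParam",
        "version": "2.2",
        "session_id": session_id,
        "aicc_data": aicc_data,
    })

def parse_hacp_response(text):
    """Parse the name=value lines of an HACP response into a dict.
    (A real client would also handle the multi-line aicc_data payload.)"""
    fields = {}
    for line in text.splitlines():
        if "=" in line:
            name, _, value = line.partition("=")
            fields[name.strip().lower()] = value.strip()
    return fields

# Example: a successful LMS reply to the request above.
reply = parse_hacp_response("error=0\r\nerror_text=Successful")
```

The same request/response shape works whether the caller is a browser frame or another server, which is what makes the server-to-server uses of HACP mentioned above so straightforward.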
By the way, here is an extra resource for members of the Questionmark Software Support Plan Community. There is a great Knowledge Base article on customizing the Perception v4 PIP file for AICC. This article shows how you can use a custom PIP file to utilize additional demographic or custom variables from an AICC compatible LMS. Check it out.
Stay tuned to the Questionmark Blog for the next segment that will address SCORM Run-Time Communication.
I posted recently about an upcoming three-day learning event in Auckland, New Zealand, focusing on assessment best practices. Now I’d like to update you on the great turnout, the enthusiastic customers, and the full house that participated in the workshop!
The Online Assessments Symposium organized by Business Toolbox was packed with learning opportunities: the first two days were devoted to instruction on best practices in creating assessments, and the third brought together industry experts to share advice about moving assessments online.
It was motivating to see academic and corporate Questionmark users sharing their experiences in successfully implementing assessments and enjoying some impressive case study presentations. I took these photos with my BlackBerry to give you a sense of the group that gathered.
We will continue to hold workshops like this one around the world, so stay tuned for our next location.