Questionmark Breakfast Briefings & User Group Meetings Start Tomorrow

Posted by Joan Phaup

Just as students are returning to school, we at Questionmark are starting our annual fall series of Breakfast Briefings and User Group Meetings in the U.S.

Our Briefing guests will join us for a complimentary breakfast and a hands-on tutorial, during which they will be able to create questions and assessments using Questionmark Perception. The tutorials take place in computer training rooms, so this is a great way to try out the software and a good opportunity to ask questions. We will also show guests how to use Questionmark Live, our new browser-based authoring tool for subject matter experts, along with new ways of managing multilingual assessments and delivering assessments on mobile devices.

Current Perception users are invited to join us for lunch, discussions about topics of their choice and in-depth looks at new product features and Questionmark services.  Not only do these meetings give our customers the opportunity to talk with Questionmark managers; they also help our team understand customers’ challenges and needs with regard to assessments.

We will kick off in the Boston area tomorrow and work our way around to a total of seven cities. Here’s the schedule:

  • September 1: Boston (Burlington)
  • September 3: New York
  • September 24: Chicago
  • October 6: Dallas
  • October 8 and 9: Washington, DC (second session added by popular demand!)
  • October 20: Atlanta
  • October 22: Los Angeles

There’s still time to sign up for a briefing and/or a user group meeting: click here for information about the Briefings, and here for User Group details.

There is no charge for attending these events, so we hope you will take advantage of these great learning opportunities and will join us!

How it Works: Adding Multimedia to a Question in Questionmark Live

Posted by Jim Farrell

Earlier this month, the number of questions in Questionmark Live passed 14,000. That is an amazing statistic for just three months. We continue to add new features to this powerful, browser-based authoring tool, including the ability to add multimedia files to questions. Multimedia allows our users to create real-world scenarios that may include listening to a customer’s voice, watching a delicate surgical procedure, or fixing a complex piece of machinery via an interactive Flash video. Watch this video to see how easy it is to add audio or video to your questions.

General Session at European Users Conference: E-assessment and Interoperability, Standards and Accessibility

Posted By Sarah Elkins

The European Users Conference is less than one month away, and I’m pleased to announce that Tom King, Questionmark’s Interoperability Evangelist, and David Sloan from the University of Dundee will be leading the Tuesday morning General Session, focusing on E-Assessment and Interoperability, Standards and Accessibility.

Tom King is actively involved with many e-learning technology specification groups and is a regular contributor to this blog. Tom will provide an overview of the current status of major standards and the specification organisations behind them, and highlight some of the emerging needs and promising developments. David Sloan will give an overview of accessibility-related legislation, standards and best practice, and show how Questionmark can help support the creation of accessible assessments.

The conference is set to be an exciting two days for Perception users, with Best Practice sessions on the latest trends in assessment management, eight Case Study presentations, and some great Technical Training sessions. Make sure you check out the full conference agenda and, if you haven’t already done so, register for the conference!

Item Analysis Analytics Part 8: Some problematic questions

Posted by Greg Pope

In my last post, I showed a few more examples of item analyses where we drilled down into why some questions had problems. I thought it might be useful to show a few questions with bad and downright terrible psychometric performance, to reveal the ugly side of item analysis.

Below is an example of a question that is fairly terrible in terms of psychometric performance. Here are some reasons why:

  • Going from left to right, first we see that the “Number of Results” is 65, which is not so good: there are too few participants in the sample to be able to make sound judgements about the psychometric performance of the question.
  • Next we see that 25 participants didn’t answer the question (“Number not Answered” = 25), which means there was a problem with people not finishing or finding the question confusing and giving up.
  • The “P Value Proportion Correct” shows us that this question is hard, with only 20% of participants ‘getting it right.’
  • The “Item Discrimination” indicates very low discrimination, with the difference between the Upper and Lower group in terms of the proportion selecting the correct answer of ‘More than 40’ at only 5%. This means that of the participants with high overall exam scores, 27% selected the correct answer versus 22% of the participants with the lowest overall exam scores. This is a very small difference between the Upper and Lower groups. Participants who know the material should have got the question right more often.
  • The “Item Total Correlation” reflects the Item Discrimination with a negative value of -0.01. A value like this would definitely not meet most organizations’ internal criteria in terms of what is considered an acceptable item. Negative item-total correlations are a major red flag!
  • Finally we look at the Outcome information to see how the distracters perform. We find that participants are all over the map, selecting distracters in an erratic way. When I look at the question wording I realize how vague and arbitrary this question is: the number of questions that should be in an assessment depends on numerous factors and contexts. It is impossible to say that in any context a certain number of questions is required. It looks like the Upper Group is selecting the response options ‘21-40’ and ‘More than 40’ more often than the other two options, which have smaller numbers of questions. This makes sense from a participant guessing perspective, because in many assessment contexts having more questions rather than fewer is better for reliability.

The psychometricians, SMEs, and test developers reviewing this question would need to send the SME who wrote this question back to basic authoring training to ensure that they know how to write questions that are clear and concise. This question does not really have a correct answer and needs to be re-written to clarify the context and provide many more details to the participants. I would even be tempted to throw out questions along this content line, because how long an assessment should be has no one “right answer.” How long an assessment should be depends on so many things that there will always be room for ambiguity, so it would be quite challenging to write a question that performs well statistically on this topic.
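As a rough illustration of the two statistics discussed above, here is a minimal Python sketch that computes a P value (proportion correct) and an upper-minus-lower item discrimination from per-participant data. The function names, the sample data and the 27% group size are assumptions made for this example; the post does not say how the Upper and Lower groups are formed.

```python
def p_value(correct):
    """Proportion of participants who answered the item correctly."""
    return sum(correct) / len(correct)


def upper_lower_discrimination(total_scores, correct, fraction=0.27):
    """Proportion correct in the Upper group minus proportion correct in the Lower group.

    fraction=0.27 is an assumed group size, not a value taken from the post.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))          # participants per group
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:k]                        # lowest overall scores
    upper = order[-k:]                       # highest overall scores
    p_upper = sum(correct[i] for i in upper) / k
    p_lower = sum(correct[i] for i in lower) / k
    return p_upper - p_lower


# Made-up data: overall exam scores and whether each participant got this item right.
scores = [35, 42, 48, 55, 61, 66, 72, 80, 88, 95]
right = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

print("P value:", p_value(right))
print("Discrimination:", upper_lower_discrimination(scores, right))
```

With the figures quoted for the question above, the same subtraction gives 27% minus 22%, or a discrimination of 5%.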

Below is an example of a question that is downright awful in terms of psychometric performance. Here are some reasons why:

  • Going from left to right, first we see that the “Number of Results” is 268, which is really good. That is a nice healthy sample. Nothing wrong here, let’s move on.
  • Next we see that 56 participants didn’t answer the question (“Number not Answered” = 56), which means there was a problem with people not finishing or finding the question confusing and giving up. It gets worse, much, much worse.
  • The “P Value Proportion Correct” shows us that this question is really hard, with 16% of participants ‘getting it right.’
  • The “Item Discrimination” indicates a negative discrimination, with the difference between the Upper and Lower group in terms of the proportion selecting the correct answer of ‘44123’ at -23%. This means that of the participants with high overall exam scores, 12% selected the correct answer versus 35% of the participants with the lowest overall exam scores. What the heck is going on? This means that participants with the highest overall assessment scores are selecting the correct answer LESS OFTEN than participants with the lowest overall assessment scores. That is not good at all; let’s dig deeper.
  • The “Item Total Correlation” reflects the Item Discrimination with a large negative value of -0.26. This is a clear indication that there is something incredibly wrong with this question.
  • Finally we look at the Outcome information to see how the distracters perform. This is where the true psychometric horror of this question is manifested. There is neither rhyme nor reason here: participants, regardless of their performance on the overall assessment, are all over the place in terms of selecting response options. You might as well have blindfolded everyone taking this question and had them randomly select their answers. This must have been extremely frustrating for the participants who had to take this question and would have likely led to many participants thinking that the organization administering this question did not know what they were doing.

The psychometricians, SMEs, and test developers reviewing this question would need to provide a pink slip to the SME who wrote this question immediately. Clearly the SME failed basic question authoring training. This question makes no sense and was written in such a way to suggest that the author was under the influence, or otherwise not in a right state of mind, when crafting this question. What is this question testing? How can anyone possibly make sense of this and come up with a correct answer? Is there a correct answer? This question is not salvageable and should be stricken from the Perception repository without a second thought. A question like this should have never gotten in front of a participant to take, let alone 268 participants. The panel reviewing questions should review their processes to ensure that in the future questions like this are weeded out before an assessment goes out live for people to take.
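To see how a negative item-total correlation like the one above can arise, here is a small Python sketch that computes a plain Pearson correlation between a 0/1 item score and the overall score. It assumes the uncorrected form, where the item's own score is left in the total; whether the report uses that form or a corrected one is not stated in the post. The data are made up.

```python
import math


def item_total_correlation(item_scores, total_scores):
    """Pearson correlation between item scores (0 or 1) and overall scores."""
    n = len(item_scores)
    mean_item = sum(item_scores) / n
    mean_total = sum(total_scores) / n
    cov = sum((i - mean_item) * (t - mean_total)
              for i, t in zip(item_scores, total_scores))
    var_item = sum((i - mean_item) ** 2 for i in item_scores)
    var_total = sum((t - mean_total) ** 2 for t in total_scores)
    if var_item == 0 or var_total == 0:
        raise ValueError("correlation is undefined when either variable is constant")
    return cov / math.sqrt(var_item * var_total)


# Made-up data in which low scorers get the item right more often than high scorers.
totals = [35, 42, 48, 55, 61, 66, 72, 80, 88, 95]
item = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

r = item_total_correlation(item, totals)
print(f"Item-total correlation: {r:.2f}")  # negative for this data
```

A negative result means that getting this item right goes along with a lower overall score, which is the red flag described in the list above.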

Understanding eLearning Standards- AICC HACP

Posted by Tom King

I have prepared a new segment on Understanding eLearning Standards. This segment addresses the “how” of eLearning standards, and specifically run-time communication using the common AICC HACP specification. [Don’t worry, SCORM fans, there will be another segment focusing on the SCORM runtime.]

Standards fans (and hockey fans) are likely to appreciate the analogies used to explain a run-time environment in general. The video also steps through the lifecycle of an activity running in an LMS environment. Then I drill down to the specifics of AICC, including both the common browser-to-LMS and the compelling server-to-server uses of AICC HACP.

Finally, the segment closes with a review of key resources from the AICC web site to help you make the most of AICC HACP.
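For readers who want a feel for what run-time communication over HACP looks like, here is a rough Python sketch, not an implementation of the full specification. It assumes the common browser-to-LMS pattern in which the LMS launches content with AICC_SID and AICC_URL in the query string and the content sends form-encoded commands such as GetParam back to that URL. The example.com addresses, the session ID, the version string and the function names are all hypothetical, and nothing is actually sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def parse_launch_url(launch_url):
    """Return (session_id, lms_url) from the AICC_SID and AICC_URL launch parameters."""
    query = parse_qs(urlsplit(launch_url).query)
    # Treat parameter names case-insensitively, since launch URLs vary in casing.
    params = {name.lower(): values[0] for name, values in query.items()}
    return params["aicc_sid"], params["aicc_url"]


def build_hacp_request(command, session_id, aicc_data="", version="4.0"):
    """Build the form-encoded body of a HACP request (version value is an assumption)."""
    fields = {"command": command, "version": version, "session_id": session_id}
    if aicc_data:
        fields["AICC_Data"] = aicc_data  # only needed for commands that send data
    return urlencode(fields)


# Hypothetical launch URL of the kind an LMS might hand to the content.
launch = ("https://content.example.com/au/start.html"
          "?AICC_SID=abc123&AICC_URL=https%3A%2F%2Flms.example.com%2Fhacp")

session_id, lms_url = parse_launch_url(launch)
body = build_hacp_request("GetParam", session_id)
print(lms_url)
print(body)
```

The body would be sent as an HTTP POST to the AICC_URL. The LMS reply is plain text that the content then parses, which is left out of this sketch.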

By the way, here is an extra resource for members of the Questionmark Software Support Plan Community. There is a great Knowledge Base article on customizing the Perception v4 PIP file for AICC. This article shows how you can use a custom PIP file to utilize additional demographic or custom variables from an AICC compatible LMS. Check it out.

Stay tuned to the Questionmark Blog for the next segment that will address SCORM Run-Time Communication.

Full house at learning event in New Zealand

Posted by Rafael Lami Dozo

I posted recently about an upcoming three-day learning event in Auckland, New Zealand, focusing on assessment best practices. Now I’d like to update you on the great turnout, exciting customers, and the full house that participated in the workshop!

The Online Assessments Symposium organized by Business Toolbox was packed with learning opportunities: the first two days were devoted to instruction on best practices in creating assessments, and the third brought together industry experts to share advice about moving assessments online.

It was motivating to see academic and corporate Questionmark users sharing their experiences in successfully implementing assessments and enjoying some impressive case study presentations. I took these photos with my BlackBerry to give you a sense of the group that gathered.

We will continue to hold workshops like this one around the world, so stay tuned for our next location.

Helping test publishers profit from their quizzes and tests

Posted by Joan Phaup

Our release today of “Technologies for Selling Tests” gives professional associations, textbook companies, awarding bodies and other test publishers a way to streamline individual and bulk sales of their online quizzes and tests.

This new web-based service is available to users of our hosted and subscription solutions. If you are a test publisher and want to sell your tests to large institutions or make quizzes, diagnostic assessments and other study aids available on a charge-per-use basis, “Technologies for Selling Your Tests” could be the right solution for you. You can use it for consumer purchases via e-commerce or bulk purchase by large institutions.

This solution offers a way to improve the return on a test publisher’s investment in creating and maintaining valid and reliable assessments. It’s all based on Questionmark Perception’s out-of-the-box functionality, and it’s available for you to try out any time you like.

Item Analysis Analytics Part 7: The psychometric good, bad and ugly

Posted by Greg Pope

A few posts ago I showed an example item analysis report for a question that performed well statistically and a question that did not perform well statistically. The latter turned out to be a mis-keyed item. I thought it might be interesting to drill into a few more item analysis cases of questions that have interesting psychometric performance. I hope this will help all of you out there recognize the patterns of the psychometric good, bad and ugly in terms of question performance.

The question below is an example of a question that is borderline in terms of psychometric performance. Here are some reasons why:

  • Going from left to right, first we see that the “Number of Results” is 116, which is a decent sample of participants to evaluate the psychometric performance of this question.
  • Next we see everyone answered the question (“Number not Answered” = 0), which means there probably wasn’t a problem with people not finishing or finding the question confusing and giving up.
  • The “P Value Proportion Correct” shows us that this question is average to easy, with 65% of participants “getting it right.”
  • The “Item Discrimination” indicates mediocre discrimination at best, with the difference between the upper and lower group in terms of the proportion selecting the correct answer of ‘Leptokurtic’ at 20%. This means that of the participants with high overall exam scores, 75% selected the correct answer versus 55% of the participants with the lowest overall exam scores. I would have liked to see a larger difference between the Upper and Lower groups.
  • The “Item Total Correlation” backs the Item Discrimination up with a lacklustre value of 0.14. A value like this would likely not meet many organizations’ internal criteria in terms of what is considered a “good” item.
  • Finally, we look at the Outcome information to see how the distracters perform. We find that each distracter pulls some participants, with ‘Platykurtic’ pulling the most participants and quite a large number of the Upper group (22%) selecting this distracter. If I were to guess what is happening, I would say that because the correct option and the distracters are so similar, and because this topic is so obscure that you really need to know your material, participants get confused between the correct answer of ‘Leptokurtic’ and the distracter ‘Platykurtic’.

The psychometricians, SMEs, and test developers reviewing this question would need to talk with instructors to find out more about how this topic was taught and understand where the problem lies: Is it a problem with the question wording or a problem with instruction and retention/recall of material? If it is a question wording problem, revisions can be made and the question re-beta tested. If the problem is in how the material is being taught, then instructional coaching can occur and the question re-beta tested as is to see if improvements in the psychometric performance of the question occur.
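The kind of review described above can be sketched as a simple screening function in Python. The thresholds here (a minimum of 100 results, a discrimination of at least 0.30 and an item-total correlation of at least 0.20) are hypothetical stand-ins for an organization's internal criteria, not values from this post, and the function name is made up for the example.

```python
def review_flags(n_results, discrimination, item_total_r,
                 min_n=100, min_discrimination=0.30, min_item_total_r=0.20):
    """Return a list of reasons an item might need review (thresholds are hypothetical)."""
    flags = []
    if n_results < min_n:
        flags.append("small sample")
    if discrimination < 0 or item_total_r < 0:
        flags.append("negative discrimination")
    elif discrimination < min_discrimination or item_total_r < min_item_total_r:
        flags.append("weak discrimination")
    return flags


# The borderline question above: 116 results, discrimination 20%, item-total correlation 0.14.
print(review_flags(116, 0.20, 0.14))  # ['weak discrimination']
```

A flag is only a prompt for the reviewers to look closer; deciding whether the wording or the instruction is at fault still takes the conversation with instructors described above.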

The question below is an example of a question that has a classic problem. Here are some reasons why it is problematic:

  • Going from left to right, first we see that the “Number of Results” is 175. That is a fairly healthy sample, nothing wrong there.
  • Next we see everyone answered the question (“Number not Answered” = 0), which means there probably wasn’t a problem with people not finishing or finding the question confusing and giving up.
  • The “P Value Proportion Correct” shows us that this question is easy, with 83% of participants ‘getting it right’. There is nothing immediately wrong with an easy question, so let’s look further.
  • The “Item Discrimination” indicates reasonable discrimination, with the difference between the Upper and Lower group in terms of the proportion selecting the correct answer of ‘Cronbach’s Alpha’ at 38%. This means that of the participants with high overall exam scores, 98% selected the correct answer versus 60% of the participants with the lowest overall exam scores. That is a nice difference between the Upper and Lower groups, with almost 100% of the Upper group choosing the correct answer. Obviously, this question is easy for participants who know their stuff!
  • The “Item Total Correlation” backs the Item Discrimination up with a value of 0.39, which would meet most organizations’ internal criteria in terms of what is considered a “good” item.
  • Finally, we look at the Outcome information to see how the distracters perform. Well, two of the distracters don’t pull any participants! This is a waste of good question real estate: Participants have to read through four alternatives when there are only two they even consider as being the correct answer.

The psychometricians, SMEs, and test developers reviewing this question would likely ask the SME who developed the question to come up with better distracters that would draw more participants. Clearly, ‘Bob’s Alpha’ is a joke distracter that participants dismiss immediately, as is ‘KR-1,000,000’ (I mean, Kuder-Richardson formula one million?). Let’s get serious here!
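Spotting distracters that pull no participants is easy to automate. The Python sketch below counts how often each response option was chosen and lists the distracters nobody selected. The responses are made up, the function names are assumptions for this example, and ‘(other distracter)’ is a placeholder for the fourth option, which is not named in this post.

```python
from collections import Counter


def option_counts(responses, options):
    """Number of participants who chose each option, including options with zero picks."""
    counts = Counter(responses)
    return {option: counts.get(option, 0) for option in options}


def unused_distracters(responses, options, correct):
    """Distracters (incorrect options) that no participant selected."""
    counts = option_counts(responses, options)
    return [option for option in options
            if option != correct and counts[option] == 0]


# "(other distracter)" is a placeholder; the post does not name the fourth option.
options = ["Cronbach's Alpha", "Bob's Alpha", "KR-1,000,000", "(other distracter)"]
responses = ["Cronbach's Alpha"] * 5 + ["(other distracter)"]  # made-up responses

print(option_counts(responses, options))
print(unused_distracters(responses, options, correct="Cronbach's Alpha"))
```

Running the same counts separately for the Upper and Lower groups would show which distracters attract participants who otherwise score well.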

Podcast: An Innovative Approach to Delivering Questionmark Assessments

Posted By Sarah Elkins

The University of Bradford has recently developed an innovative e-assessment facility, using cutting-edge thin client technology to provide a 100-seat room dedicated primarily to summative assessment. The room provides enhanced security features for online assessment and was used for the first time in 2009 with considerable success. The room’s flexible design maximises its usage by allowing for formative testing, diagnostic testing and general teaching.

John Dermo is the e-Assessment Advisor at the University of Bradford. In this podcast he explains the technology behind this unique setup and talks about the benefits and challenges in using this room. John Dermo will also be presenting a session at the 2009 European Users Conference, where he will go into more detail about the project.