Your chance to comment on Standards for Educational and Psychological Testing

Posted by Greg Pope

The Standards for Educational and Psychological Testing are considered the gold standard for good practice in testing and assessment. The document is produced by three bodies working together: the American Educational Research Association (AERA), the American Psychological Association (APA) and the National Council on Measurement in Education (NCME). It is widely referenced by testing and assessment professionals in the USA and internationally who seek to follow best practice. I have a copy on my bookshelf and consult it frequently.

The Standards were last published in 1999 and are now going through a revision process to take account of technological, legal and assessment developments over the last decade. The committee producing the new Standards has prepared a draft and is consulting for review and comments. You can register to see the draft standards and make comments. There are chapters on:

  • Validity
  • Errors of Measurement and Reliability/Precision
  • Fairness in Testing
  • Test Design and Development
  • Scores, Scales, Norms, Cut Scores, and Scores Linking
  • Test Administration, Scoring and Reporting
  • Supporting Documentation for Tests
  • The Rights and Responsibilities of Test Takers
  • Test Users’ Rights and Responsibilities
  • Psychological Testing and Assessment
  • Testing in Employment and Credentialing
  • Educational Testing and Assessment
  • Uses of Testing for Policy Purposes

Documents like this benefit from the widest possible review by stakeholders. The revised Standards are likely to significantly influence the assessment world for the next decade, so if you are interested, now is a good time to review them. The deadline for comments is April 20th, 2011.

A New Workshop on Interpreting Item and Test Analyses

Posted by Joan Phaup

Item and test analyses bring the most value when interpreted in an organizational context. Questionmark Analytics and Psychometrics Manager Greg Pope's upcoming workshop on this subject will help participants make the most effective use of the valuable information they get from test and item analysis. The workshop will combine classical test theory with hands-on learning using Questionmark reporting tools to analyze exemplar assessments and test questions. Attendees are welcome to bring their own item and test analysis reports to discuss during the session.

I spent a few minutes with Greg the other day, asking him for more details about this workshop, which will take place the morning of Tuesday, March 15th —  one of two workshops preceding the Questionmark 2011 Users Conference.

Q: What value do organizations get from item analysis and test analysis reports?

A: Item and test analysis reports provide invaluable psychometric information about the performance of assessments and of their building blocks, the items. Creating assessments composed entirely of well-performing questions benefits both the organization funding the assessment program and the participant taking the assessment. The organization benefits by providing assessments that are valid and reliable (and therefore legally defensible), and it may be able to use fewer questions on its assessments to achieve the same measurement power. Organizations and participants can have confidence that the scores obtained from the assessments reflect, to a high degree, what participants know and can do. Item and test analyses allow organizations to know which questions are performing well, which questions are not performing well, and most importantly, WHY.

Q: What are the challenges in using these reports effectively?

A: I think the main challenges center around a psychological barrier to entry. Many people feel anxiety at the thought of having to read and interpret something they have likely had little to no exposure to before. Psychometrics is a specialized area, to be sure, but applying its basic foundations need not be akin to summiting Everest. I feel strongly that it is possible to give people the basic knowledge around item and test analysis in only a few hours and break down the psychological firewalls that often hinder effective use of these reports.

Q: How can individuals and organizations surmount these challenges?

A: I feel a gentle introduction to the subject area, with lots of practical examples in plain English, does the trick nicely. Sometimes psychometricians are accused of being pedantic, intentionally or unintentionally, making this information inaccessible to many people who could understand and apply it. I want to break down these barriers because I feel that the more people who understand and can use psychometrics to improve assessment, the better off we all will be. I have tried to increase people's understanding through my blog posts, and I am really looking forward to personalizing this approach further in the workshop at the users conference.

Q: How have you structured the workshop?

A: I have structured the workshop to provide some of the basic theory behind item and test analysis and then get hands-on with practical examples in different contexts. When I have run these workshops in the past, I have found that at first people can be sceptical of their own capacity to learn and apply knowledge in this area. However, by the end of the workshop I see people excited and energized by their newfound knowledge, getting really involved in picking apart questions based on the item analysis report information. It is really inspiring for me to see people walk away with newfound confidence and motivation to apply what they have learned when they get back to their jobs.

Q: What do you want people to take away with them from this session?

A: I want people to take away a newfound comfort level with the basics of psychometrics so that they can go back to their desks, run their item and test analysis reports, have confidence that they know how to identify good and bad items, and do something with that knowledge to improve the quality of their assessments.

You can sign up for this workshop at the same time you register for the conference (remembering that this Friday, January 21st, is the last day for early-bird savings). If you're already registered for the conference, email us to arrange participation in the workshop. Click here to see the conference schedule.

Tips for delivering effective course evaluation surveys on mobile devices

Posted by John Kleeman

What’s different when you present a course evaluation survey on a mobile phone rather than on a desktop computer or paper?

Since many delegates to a course or event will have Internet-enabled mobile phones, it can be great to deliver a course evaluation survey at the end of the event and get immediate feedback. Or you can hand out iPod Touch devices for people to use to answer a survey at the venue.

Here are some good practice suggestions for delivering such course evaluation or Level 1 surveys:

  • Use a short survey (e.g. 5 to 10 questions). People are on the move and won’t bother with a long survey.
  • Limit open-ended questions, as people don't type much on mobiles.
  • Keep question stimulus and any explanatory text brief so that each question fits on the page without scrolling.
  • Use simple item types like Likert Scale.
  • Avoid Flash – it doesn’t work on Apple mobile phones.
  • Keep bandwidth usage low by avoiding large graphics. Not everyone has an unlimited data plan, and it could be costing them to take your survey.
  • Devote the screen real estate to showing the questions; keep branding and frills to a minimum.
  • If possible, use an app like Questionmark's Apps for Apple or Android devices, which allows participants to easily access course evaluations along with other assessments scheduled for them.

For another article on course evaluation surveys, see Greg Pope’s blog article on how to get better response rates from course evaluation surveys.

Pre-Conference Workshop: Interpreting Item and Test Analyses

Posted by Joan Phaup

We at Questionmark wish you a happy and prosperous New Year. We are delighted to be starting 2011 with this announcement about a new workshop on Interpreting Item and Test Analyses to be held prior to the Questionmark 2011 Users Conference.

Do you wish you had a better understanding of item and test analyses — and how to interpret them within an organizational context? Ever wondered how you could make more effective use of information about a test’s mean, standard deviation, and Cronbach’s Alpha reliability? Interpreting this kind of psychometric information takes effort but is of tremendous value in determining whether an assessment is performing as it should.

Greg Pope, our analytics and psychometrics manager who is well known to readers of this blog, will conduct a workshop on this subject the morning of Tuesday, March 15th, in Los Angeles.

After learning the basics of classical test theory, workshop participants will get hands-on practice in analyzing exemplar assessments and test questions using Questionmark reporting tools. Those who wish may bring their own item and test analysis reports to discuss during the session.

If you would like to be able to interpret item and test analyses in a way that will help you provide better, more relevant assessments for your organization, this workshop is for you!

Interpreting Item and Test Analyses: Understanding the organizational context of test results is the first of two optional workshops we are offering prior to the conference, to be held at the Omni Los Angeles Hotel March 15th – 18th.

I’ll share details about the afternoon workshop, Using Web Services to Integrate with Questionmark Perception, in another post. You can sign up for both of these pre-conference workshops when you register for the conference (something I heartily recommend you do by January 21st, our last day for $100 savings!)

Some favourite resources on data visualization and report design


Posted by Greg Pope

In my last post I talked about using confidence intervals and how they can be used successfully in assessment reporting contexts. Reporting design and development has always been interesting to me. It started when I worked for the high-stakes provincial testing program in my home province of Alberta, Canada.

When I did my graduate degree with Dr. Bruno Zumbo, he introduced me to a new world of exciting data visualization approaches, including the pioneering functional data analysis work of Professor Jim Ramsay. Professor Ramsay developed a fantastic free program called TESTGRAF that performs non-parametric item response modeling and differential item functioning analysis. I have used TESTGRAF many times over my career to analyze assessment data.

The work of both these experts has guided me through all my work in report design. In working on exciting new reports to meet the needs of Questionmark customers, I’m mindful of what I have learned from them and from others who have influenced me over the years. In this season of giving, I’d like to share some ideas that might be helpful to you and your organization.

I greatly admire the work of Edward Tufte, whose books provide great food for thought on data analysis and visualization in numerous contexts. My favourite of these is The Visual Display of Quantitative Information, which offers creative ways to display many variables together succinctly. I have spent many a Canadian winter night curled up with that book, so I know it is a great gift idea for that someone special this holiday season!

The Standards for Educational and Psychological Testing contains a section highlighting our commitments as assessment professionals to appropriate, fair, and valid reporting of information to multiple levels of stakeholders, including the most important stakeholder of all: the test taker! In the section on “Test Administration, Scoring, and Reporting” you will find a number of important standards around reporting that are worth checking out.

A colleague of mine, Stefanie Moerbeek at EXIN Exams, introduced me to a number of great papers written by Dr. Gavin Brown and Dr. John Hattie around the validity of score reports. Dr. Hattie did a session at NCME in 2009 entitled Visibly Learning from Reports: The Validity of Score Reports, in which he listed some recommended principles of reporting to maximize the valid interpretations of reports:

1. Readers of reports need a guarantee of safe passage
2. Readers of reports need a guarantee of destination recovery
3. Maximize interpretations and minimize the use of numbers
4. The answer is never more than 7 plus or minus 2
5. Each report needs to have a major theme
6. Anchor the tool in the task domain
7. Reports should minimize scrolling, be uncluttered, and maximize the “seen” over the “read”
8. A report should be designed to address specific questions
9. A report should provide justification of the test for the specific applied purpose and for the utility of the test in the applied setting
10. A report should be timely to the decisions being made (formative, diagnostic, summative and ascriptive)
11. Those receiving reports need information about the meaning and constraints of any report
12. Reports need to be conceived as actions, not as screens to print.

You can read a paper Hattie wrote on this subject in the Online Educational Research Journal. Questionmark’s white paper on Assessments through the Learning Process offers helpful general information about reporting on assessment results.

Applications of confidence intervals in a psychometric context


Posted by Greg Pope

I have always been a fan of confidence intervals. Some people are fans of sports teams; for me, it’s confidence intervals! I find them really useful in assessment reporting contexts, all the way from item and test analysis psychometrics to participant reports.

Many of us get exposure to the practical use of confidence intervals via the media, when survey results are quoted. For example: “Of the 1,000 people surveyed, 55% said they will vote for John Doe. The margin of error for the survey was plus or minus 5%, 95 times out of 100.” This says that the “observed” percentage of people who say they will vote for Mr. Doe is 55%, and there is a 95% chance that the “true” percentage of people who will vote for John Doe is somewhere between 50% and 60%.

Sample size is a big factor in the margin of error: generally, the larger the sample, the smaller the margin of error, as we get closer to representing the population. (We can’t survey all 307,006,550 people in the US now, can we!) So if the sample were 10,000 instead of 1,000, we would expect the margin of error to be smaller than plus or minus 5%.
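That relationship between sample size and margin of error can be sketched in a few lines of Python using the textbook normal approximation for a 95% interval (the function name is mine, and this formula actually gives a slightly tighter figure than the quoted plus or minus 5% for 1,000 respondents):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for an observed proportion.

    p: observed proportion (e.g. 0.55 for 55%)
    n: sample size
    z: z-value for the confidence level (1.96 for ~95%)
    """
    return z * math.sqrt(p * (1 - p) / n)

# 55% of 1,000 respondents: roughly +/- 3.1 percentage points
print(round(margin_of_error(0.55, 1000) * 100, 1))   # 3.1

# Ten times the sample shrinks the margin to about +/- 1 point
print(round(margin_of_error(0.55, 10000) * 100, 1))  # 1.0
```

Notice that because the sample size sits under a square root, cutting the margin of error in half requires roughly quadrupling the sample.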

These concepts are relevant in an assessment context as well. You may remember my previous post on Classical Test Theory and reliability in which I explained that an observed test score (the score a participant achieves on an assessment) is composed of a true score and error. In other words, the observed score that a participant achieves is not 100% accurate; there is always error in the measurement. What this means practically is that if a participant achieves 50% on an exam their true score could actually be somewhere between say 44% and 56%.
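In classical test theory the width of that range comes from the standard error of measurement (SEM), which is computed from the assessment's standard deviation and reliability. A quick sketch with purely illustrative numbers (not figures from any real exam):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def score_interval(observed, sd, reliability, z=1.96):
    """Approximate 95% confidence interval around an observed score."""
    e = z * sem(sd, reliability)
    return observed - e, observed + e

# Illustrative values: SD of 10 percentage points, reliability of 0.90
low, high = score_interval(50, 10, 0.90)
print(round(low, 1), round(high, 1))  # 43.8 56.2
```

With these assumed values, a 50% observed score carries a 95% interval of roughly 44% to 56%, which is exactly the sort of range described above.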

This notion that observed scores are not absolute has implications for verifying what participants know and can do. For example, a participant who achieves 50% on a crane certification exam (on which the pass score is 50%) would pass the exam, be certified, and be able to hop into a crane, moving stuff up and down and around. However, a score right on the borderline means this person might not, in fact, know enough to pass the exam if he or she were to take it again. His or her supervisor might not feel very confident about letting this person operate that crane!

To deal with the inherent uncertainty around observed scores, some organizations factor this margin of error in when setting the cut score…but that is another fun topic, which I touched on in another post. I believe a best practice is to incorporate a confidence interval into the reporting of participant scores, recognizing that a score is not an “absolute truth” but an estimate of what a person knows and can do. A simple participant report I created to demonstrate this shows a diamond encapsulating the participant score; the vertical height of the diamond represents the confidence interval around that score.

In some of my previous posts I talked about how sample size affects the robustness of item-level statistics like p-values and item-total correlation coefficients, and provided graphics showing the confidence interval ranges for these statistics at various sample sizes. I believe confidence intervals are also very useful in the psychometric context of evaluating the performance of items and tests. For example, when we see a p-value of 0.600 for a question, we often incorrectly accept as “truth” that 60% of participants got the question right. In actual fact, this p-value of 0.600 is an observation, and the “true” p-value could be anywhere between 0.500 and 0.700, a big difference when we are carefully choosing questions to shape our assessment!
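Because a classical p-value is just the proportion of participants who answered an item correctly, the same proportion arithmetic applies. A sketch using the normal approximation (the function name is mine, and small samples would really call for an exact method); with only about 90 participants, an observed p-value of 0.600 does indeed carry an interval of roughly 0.500 to 0.700:

```python
import math

def p_value_interval(p, n, z=1.96):
    """Approximate 95% confidence interval around an observed item p-value,
    clipped to the valid [0, 1] range."""
    e = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - e), min(1.0, p + e)

# With 90 participants, an observed p-value of 0.600 is quite fuzzy
print(p_value_interval(0.600, 90))    # roughly (0.499, 0.701)

# With 1,000 participants the interval narrows considerably
print(p_value_interval(0.600, 1000))  # roughly (0.570, 0.630)
```

This is a good argument for checking sample sizes before trusting item statistics: the same observed p-value can be a solid estimate or little more than a rough guess.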

With the holiday season fast approaching, perhaps Santa has a confidence interval in his sack for you and your organization to apply to your assessment results reporting and analysis!