Practice versus perfection – 2014 Users Conference

Posted by Austin Fossey

The Questionmark team has just returned from the 2014 Users Conference, where we had a wonderful time showing off our latest work, discussing assessment strategies with our customers, and learning from each other in a great selection of seminars put on by both Questionmark staff and our clients.

At this year’s conference, I field tested two new presentations: Understanding Assessment Results and Principles of Psychometrics and Measurement Design. I got some great feedback from attendees so I can fine-tune them for the future, but these topics also started a lot of interesting conversations about what we as test developers would like to be doing and what we end up doing in practice.

A recurring theme of these conversations was that people felt there were occasionally aspects of their instruments that could be improved, especially in terms of capturing evidence for a measurement or supporting the validity of the results. In some cases they had an idea of what they wanted to improve, but they either did not know the test development methods they needed to apply, or they did not know how to convince their stakeholders and managers of the importance of specific initiatives. The concept of validity came up several times in these conversations—something we have touched on previously on this blog.

The ideals and realities of the assessment industry do not always align. For example, we may wish to do a construct validity study or an Angoff cut score meeting, but we may lack the resources, time, or stakeholder buy-in to engage in these activities.

I recognize how discouraging this can be for people who constantly want to improve the validity of their inferences, but I am excited to see so many people thinking critically about their assessment designs and searching for areas of improvement. Even if we cannot always implement every research study we are interested in, understanding the principles and best practices of good assessment design and interpretation can still guide our everyday work and help us to avoid invalid results. This blog is a good place to explore some of these principles, and so are Questionmark white papers and our Learning Café videos.

I look forward to continuing to work with (and learn from) our great client base throughout 2014 as we continue to advance our products. A special thanks to our attendees and presenters who joined us at the 2014 conference!

Understanding Assessment Validity: New Perspectives

Posted by Greg Pope

In my last post I discussed specific aspects of construct validity. I’m capping off this series with a discussion of modern views and thinking on validity.

Dr. Bruno D. Zumbo

Recently my former graduate supervisor, Dr. Bruno D. Zumbo at the University of British Columbia, wrote a fascinating chapter in the new book, The Concept of Validity: Revisions, New Directions and Applications, edited by Dr. Robert W. Lissitz. Bruno’s chapter, “Validity as Contextualized and Pragmatic Explanation, and its Implications for Validation Practice,” provides a great modern perspective on validity.

The chapter has two aims: to provide an overview of what Bruno considers to be the concept of validity, and to discuss the implications for the process of validation.

Something I really liked about the chapter was its focus on why we conduct psychometric analyses digging into how our assessments perform. As Bruno discusses, the real purpose of all the psychometric analysis we do is to support or provide evidence for the claims we make about the validity of the assessment measures we gather. For example, the reason we would do a differential item functioning (DIF) analysis, in which we check that test questions are not biased for or against a certain group, is not only to protect test developers against lawsuits but also to weed out invalidity and help establish the inferential limits of assessment results.
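To illustrate the kind of analysis Bruno has in mind, here is a minimal sketch of one widely used DIF method, the Mantel-Haenszel procedure. The data, group labels, and function are all hypothetical; this is not Questionmark's implementation:

```python
from collections import defaultdict

def mh_odds_ratio(responses):
    """Mantel-Haenszel common odds ratio for a single test question.

    responses: iterable of (group, total_score, correct) tuples, where
    group is 'ref' or 'focal' and correct is 1 (right) or 0 (wrong).
    Values near 1.0 suggest little DIF; values far from 1.0 flag the
    question for review.
    """
    # Build a 2x2 table per total-score stratum: [correct, incorrect].
    strata = defaultdict(lambda: {'ref': [0, 0], 'focal': [0, 0]})
    for group, score, correct in responses:
        strata[score][group][0 if correct else 1] += 1

    num = den = 0.0
    for table in strata.values():
        a, b = table['ref']    # reference group: correct, incorrect
        c, d = table['focal']  # focal group: correct, incorrect
        n = a + b + c + d
        if n:
            num += a * d / n
            den += b * c / n
    return num / den if den else float('nan')

# Invented example: both groups answer alike within the score stratum,
# so the odds ratio comes out at 1.0 (no DIF signal).
data = ([('ref', 5, 1)] * 8 + [('ref', 5, 0)] * 2 +
        [('focal', 5, 1)] * 8 + [('focal', 5, 0)] * 2)
print(mh_odds_ratio(data))
```

In practice one would also run a significance test and an effect-size classification, but the core idea (comparing ability-matched groups question by question) is what the sketch captures.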

Bruno drives home the point that examining validity is not a one-off exercise: one doesn't just do a validity study or two and call it done. Validation is an ongoing process in which multilevel construct validation occurs and procedures are tied into program evaluation and assessment quality processes.

I would highly recommend that people interested in diving more into the theoretical and practical details of validity check out this book, which includes chapters from many highly respected psychometrics and testing industry experts.

I hope that this series on validity has been useful and interesting! Stay tuned for more psychometric tidbits in upcoming posts.


Editor’s Note: Greg will be doing a presentation at the Questionmark Users Conference on Conducting Validity Studies within Your Organization. The conference will take place in Miami March 14 – 17. Learn more at

Understanding Assessment Validity: Construct Validity


Posted by Greg Pope

In my last post I discussed content validity. In this post I will talk about construct validity. Construct validity refers to whether/how well an assessment, or topics within an assessment, measure the educational/psychological constructs that the assessment was designed to measure. For example, if the construct to be measured is “sales knowledge and skills,” then the assessment designed to measure this construct should show evidence of actually measuring this “sales knowledge and skills” construct.

It will come as no surprise that measuring psychological constructs is a complicated thing to do. Human psychological constructs such as "depression," "extroversion" or "sales knowledge and skills" are not as straightforward to measure as more tangible physical "constructs" such as temperature, length, or distance. Luckily, however, there are approaches that allow us to determine how well our assessments measure these complex psychological constructs.

Construct validity comprises a few areas, with convergent and discriminant validity at the core:

[Image: areas of construct validity]

In my next post I will drill down more into some of these areas of construct validity.

Understanding Assessment Validity: Content Validity


Posted by Greg Pope

In my last post I discussed criterion validity and showed how an organization can go about doing a simple criterion-related validity study with little more than Excel and a smile. In this post I will talk about content validity, what it is and how one can undertake a content-related validity study.

Content validity deals with whether the assessment content and composition are appropriate, given what is being measured. For example, does the test content reflect the knowledge and skills required to do a job, or demonstrate that one grasps the course content sufficiently? In the sales course exam example from my last post, one would want to ensure that the questions on the exam cover the course content in appropriate ratios. If 40% of the four-day sales course deals with product demo techniques, then about 40% of the questions on the exam should measure knowledge and skills in that area.

I like to think of content validity in two slices. The first slice of the content validity pie is addressed when an assessment is first being developed: content validity should be one of the primary considerations in assembling the assessment. Developing a "test blueprint" that outlines the relative weightings of content covered in a course and how that maps onto the number of questions in an assessment is a great way to help ensure content validity from the start. Questions are, of course, classified into specific topics and subtopics as they are authored. Before an assessment is put into production to be administered to actual participants, an independent group of subject matter experts should review the assessment and compare the questions included on the assessment against the blueprint. An example of a test blueprint is provided below for the sales course exam, which has 20 questions in total.

[Image: example test blueprint for the 20-question sales course exam]
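Turning blueprint weightings into question counts is simple arithmetic. A small sketch, using invented topic names alongside the 40% demo-techniques weighting mentioned above:

```python
# Hypothetical blueprint weightings for a 20-question exam; the topic
# names and percentages (other than the 40% demo example) are invented.
blueprint = {
    'Product demo techniques': 0.40,
    'Prospecting and qualifying': 0.25,
    'Handling objections': 0.20,
    'Closing the sale': 0.15,
}
total_questions = 20

counts = {topic: round(weight * total_questions)
          for topic, weight in blueprint.items()}
# With these weights the rounded counts sum back to 20; for awkward
# weights, simple rounding can drift and need a manual adjustment.
print(counts)
```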

The second slice of content validity is addressed after an assessment has been created. There are a number of methods available in the academic literature outlining how to conduct a content validity study. One way, developed by Lawshe in the mid-1970s, is to get a panel of subject matter experts (SMEs) to rate each question on an assessment in terms of whether the knowledge or skills measured by each question are "essential," "useful, but not essential," or "not necessary" to the performance of what is being measured (i.e., the construct). The more SMEs who agree that questions are essential, the higher the content validity. Lawshe also developed a funky formula called the "content validity ratio" (CVR) that can be calculated for each question. The average of the CVR across all questions on the assessment can be taken as a measure of the overall content validity of the assessment.

[Image: Lawshe's content validity ratio (CVR) formula]
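Lawshe's ratio is CVR = (n_e − N/2) / (N/2), where n_e is the number of panelists rating a question "essential" and N is the panel size; it runs from −1 (no one says essential) to +1 (everyone does). A small sketch with invented panel ratings:

```python
def cvr(n_essential, n_panel):
    """Lawshe's content validity ratio for one question."""
    return (n_essential - n_panel / 2) / (n_panel / 2)

# Hypothetical "essential" counts from a 10-person SME panel rating a
# 5-question exam:
essential_counts = [10, 9, 7, 5, 3]
ratios = [cvr(n, 10) for n in essential_counts]

print([round(r, 2) for r in ratios])        # per-question CVR
print(round(sum(ratios) / len(ratios), 2))  # mean CVR: overall content validity
```

Lawshe also published minimum CVR values by panel size for deciding which questions to keep, which is worth consulting before cutting items.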

You can use Questionmark Perception to easily conduct a CVR study by taking an image of each question on an assessment (e.g., sales course exam) and creating a survey question for each assessment question to be reviewed by the SME panel, similar to the example below.

[Image: example CVR survey question]

You can then use the Questionmark Survey Report or other Questionmark reports to review and present the content validity results.

So how does "face validity" relate to content validity? Well, face validity is more about the subjective perception of what the assessment is trying to measure than about conducting validity studies. For example, if our sales people sat down after the four-day sales course to take the sales course exam and all the questions asked about things that didn't seem related to the information they had just learned on the course (e.g., what kind of car they would like to drive or how far they can hit a golf ball), the sales people would not feel that the exam was very "face valid," as it doesn't appear to measure what it is supposed to measure. Face validity, therefore, has to do with whether an assessment looks or feels valid to the participant. It is still somewhat important, however: if participants or instructors don't buy in to the assessment being administered, they may not take it seriously, they may complain about and appeal their results more often, and so on.

In my next post I will turn the dial up to 11 and discuss the ins and outs of construct validity.

Understanding Assessment Validity: An Introduction


Posted by Greg Pope

In previous posts I discussed some of the theory and applications of classical test theory and test score reliability. For my next series of posts, I’d like to explore the exciting realm of validity. I will discuss some of the traditional thinking in the area of validity as well as some new ideas, and I’ll share applied examples of how your organization could undertake validity studies.

According to the "standards bible" of educational and psychological testing, the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999), validity is defined as "The degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests."

The traditional thinking around validity, familiar to most people, is that there are three main types:

[Image: the three traditional types of validity: criterion, content, and construct]

The most recent thinking on validity takes a more unifying approach which I will go into in more detail in upcoming posts.

Now here is something you may have heard before: "In order for an assessment to be valid it must be reliable." What does this mean? Well, as we learned in previous Questionmark blog posts, test score reliability refers to how consistently an assessment measures the same thing. One criterion for making the statement, "Yes, this assessment is valid," is that the assessment must have acceptable test score reliability, such as high Cronbach's Alpha test reliability index values as found in the Questionmark Test Analysis Report and Results Management System (RMS). Other criteria for making that statement are showing evidence of criterion-related validity, content-related validity, and construct-related validity.
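For reference, Cronbach's Alpha itself is straightforward to compute from item-level scores: alpha = k/(k−1) × (1 − sum of item score variances / variance of total scores). A minimal sketch with invented 0/1 scores (the Questionmark reports compute this for you):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's Alpha from per-item score lists.

    item_scores: one inner list per item, aligned by participant,
    e.g. item_scores[i][p] is participant p's score on item i.
    """
    k = len(item_scores)
    totals = [sum(scores) for scores in zip(*item_scores)]
    item_var = sum(pvariance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical 0/1 scores for 3 items answered by 5 participants:
items = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
]
print(round(cronbach_alpha(items), 2))
```

Real assessments would of course have far more items and participants; tiny samples like this make Alpha very unstable.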

In my next posts on this topic I will provide some illustrative examples of how organizations may undertake investigating each of these traditionally defined types of validity for their assessment program.