Understanding Assessment Validity: Content Validity


Posted by Greg Pope

In my last post I discussed criterion validity and showed how an organization can go about doing a simple criterion-related validity study with little more than Excel and a smile. In this post I will talk about content validity, what it is and how one can undertake a content-related validity study.

Content validity deals with whether the assessment content and composition are appropriate, given what is being measured. For example, does the test content reflect the knowledge/skills required to do a job or demonstrate that one grasps the course content sufficiently? In the example I discussed in the last post regarding the sales course exam, one would want to ensure that the questions on the exam cover the course's content areas in appropriate ratios. For example, if 40% of the four-day sales course deals with product demo techniques, then we would want about 40% of the questions on the exam to measure knowledge/skills in the area of demo skills.

I like to think of content validity in two slices. The first slice of the content validity pie is addressed when an assessment is first being developed: content validity should be one of the primary considerations in assembling the assessment. Developing a “test blueprint” that outlines the relative weightings of content covered in a course and how that maps onto the number of questions in an assessment is a great way to help ensure content validity from the start. Questions are of course classified when they are being authored as fitting into the specific topics and subtopics. Before an assessment is put into production to be administered to actual participants, an independent group of subject matter experts should review the assessment and compare the questions included on the assessment against a blueprint. An example of a test blueprint is provided below for the sales course exam, which has 20 questions in total.

[Figure: test blueprint for the sales course exam (20 questions)]
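To make the blueprint idea concrete, here is a minimal sketch of turning topic weightings into question counts. Only the 40% weighting for product demo techniques and the 20-question total come from this post; the other topic names and weightings are made up for illustration, and the rounding approach (largest remainder) is just one reasonable choice.

```python
def allocate_questions(weights, total_questions):
    """Split total_questions across topics in proportion to integer weights.

    Uses the largest-remainder method so the counts always sum to
    total_questions, even when the proportions do not divide evenly.
    """
    total_weight = sum(weights.values())
    counts = {}
    remainders = {}
    for topic, weight in weights.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        counts[topic], remainders[topic] = divmod(weight * total_questions, total_weight)
    leftover = total_questions - sum(counts.values())
    # Hand any leftover questions to the topics with the largest remainders.
    ranked = sorted(remainders, key=remainders.get, reverse=True)
    for topic in ranked[:leftover]:
        counts[topic] += 1
    return counts


# Weights are percentages of course time.
blueprint_weights = {
    "Product demo techniques": 40,  # from the post
    "Prospecting": 25,              # hypothetical
    "Negotiation": 20,              # hypothetical
    "Closing": 15,                  # hypothetical
}

question_counts = allocate_questions(blueprint_weights, 20)
for topic, count in question_counts.items():
    print(f"{topic}: {count} questions")
```

A reviewer could then compare these target counts against how the questions on the assembled exam were actually classified.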

The second slice of content validity is addressed after an assessment has been created. There are a number of methods available in the academic literature outlining how to conduct a content validity study. One way, developed by Lawshe in the mid-1970s, is to get a panel of subject matter experts (SMEs) to rate each question on an assessment in terms of whether the knowledge or skills measured by each question are “essential,” “useful, but not essential,” or “not necessary” to the performance of what is being measured (i.e., the construct). The more SMEs who agree that items are essential, the higher the content validity. Lawshe also developed a funky formula called the “content validity ratio” (CVR) that can be calculated for each question. The average of the CVR across all questions on the assessment can be taken as a measure of the overall content validity of the assessment.

[Figure: content validity ratio (CVR) formula]
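For readers who want to try the calculation, here is a small sketch using the standard form of Lawshe's ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of SMEs rating a question “essential” and N is the panel size. The six-person panel and the ratings below are invented purely for illustration.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: ranges from -1 (no one says essential) to +1 (everyone does)."""
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical ratings from a six-person SME panel for four exam questions.
panel_ratings = {
    "Q1": ["essential"] * 6,
    "Q2": ["essential"] * 3 + ["useful"] * 3,
    "Q3": ["essential"] * 5 + ["not necessary"],
    "Q4": ["essential"] + ["useful"] * 3 + ["not necessary"] * 2,
}

cvr_by_question = {
    question: content_validity_ratio(ratings.count("essential"), len(ratings))
    for question, ratings in panel_ratings.items()
}

# The mean CVR across questions serves as an overall content validity measure.
mean_cvr = sum(cvr_by_question.values()) / len(cvr_by_question)

for question, cvr in cvr_by_question.items():
    print(f"{question}: CVR = {cvr:.2f}")
print(f"Mean CVR = {mean_cvr:.2f}")
```

A CVR of 0 means exactly half the panel rated the question essential; negative values flag questions that most of the panel did not consider essential.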

You can use Questionmark Perception to easily conduct a CVR study by taking an image of each question on an assessment (e.g., sales course exam) and creating a survey question for each assessment question to be reviewed by the SME panel, similar to the example below.

[Figure: example survey question for SME review of an exam question]

You can then use the Questionmark Survey Report or other Questionmark reports to review and present the content validity results.

So how does “face validity” relate to content validity? Well, face validity is more about the subjective perception of what the assessment is trying to measure than about conducting validity studies. For example, if our sales people sat down after the four-day sales course to take the sales course exam and all the questions on the exam were asking about things that didn’t seem related to the information they had just learned in the course (e.g., what kind of car they would like to drive or how far they can hit a golf ball), the sales people would not feel that the exam was very “face valid,” as it wouldn’t appear to measure what it is supposed to measure. Face validity, therefore, has to do with whether an assessment looks valid or feels valid to the participant. Even so, face validity is somewhat important: if participants or instructors don’t buy in to the assessment being administered, they may not take it seriously, they may complain about and appeal their results more often, and so on.

In my next post I will turn the dial up to 11 and discuss the ins and outs of construct validity.

Understanding Assessment Validity: Criterion Validity


Posted by Greg Pope

In my last post I discussed three of the traditionally defined types of validity: criterion-related, content-related, and construct-related. Now I will talk about how your organization could undertake a study to investigate and demonstrate criterion-related validity.

So just to recap, criterion-related validity deals with whether assessment scores obtained for participants are predictive of something related to the goal of the assessment. For example, if a training program conducts a four-day sales training course, at the end of which an exam is administered designed to measure trainees’ knowledge and skills in the area of product sales, one may wonder whether the exam results have any relationship with actual sales performance. If the sales course exam scores are found to be related to/predict “real world” sales performance to a high degree, then we can say that there is a high degree of criterion-related validity between the intermediate variable (sales course exam scores) and the final or ultimate variable (sales performance).

So how does one find out whether high scores on the sales course exam correspond to high sales performance (and whether low scores on the sales course exam correspond to low sales performance)? Well, within an organization there may be some “feeling” about this, for example instructors seeing star students in the course bring in big sales numbers, but how do we get some hard numbers to back this up? You will be glad to hear that you don’t need a supercomputer and a room full of PhDs to figure this out! All you need to get some data on this are some good assessment results and some corresponding sales numbers for people who have gone through the course.

The first step is to gather the sales course exam scores for the participants who took the exam. In Questionmark Perception you can use the Export to ASCII or Export to Excel reports to output the assessment scores for the participants who took the sales course exam in a nice user-friendly format. Next you will want to match the participants for whom you have exam scores with their sales numbers (e.g., how much each salesperson has sold in the last three months). You may want to wait a few months after these participants have taken the exam and have been out in the field selling for a while, or you could look at historical sales data if you have it. Now you put this data together into an Excel spreadsheet (or SPSS or another analysis tool if you are savvy with those tools) to analyze in a way similar to this:

[Figure: spreadsheet of participants' sales course exam scores and sales dollars]

Next you may want to produce a scatter plot, calculate the correlation, and fit a trend line between sales course exam scores and sales dollars for the last three months:

[Figure: scatter plot of sales course exam scores against sales dollars, with trend line]

We find the correlation is 0.901, which is a very high positive relationship (people with higher sales course exam scores bring in more sales dollars). This would suggest a high degree of criterion-related validity in that the sales course exam scores do indeed predict sales performance.
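If you would rather script the analysis than use Excel, the matching, correlation, and trend line steps above can be sketched in a few lines of Python. The participant IDs, scores, and sales figures here are invented for illustration (they are not the data behind the 0.901 result), and the function names are my own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares trend line; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Hypothetical exports: exam score as a proportion, sales in dollars (3 months).
exam_scores = {"p01": 0.55, "p02": 0.62, "p03": 0.70, "p04": 0.81, "p05": 0.90}
sales_dollars = {"p01": 8200, "p02": 9900, "p03": 11500, "p04": 13100, "p06": 15000}

# Keep only participants who appear in both data sets.
matched_ids = sorted(exam_scores.keys() & sales_dollars.keys())
scores = [exam_scores[pid] for pid in matched_ids]
sales = [sales_dollars[pid] for pid in matched_ids]

r = pearson_r(scores, sales)
slope, intercept = fit_line(scores, sales)
print(f"Matched participants: {len(matched_ids)}")
print(f"r = {r:.3f}, trend line: y = {slope:.1f}x + {intercept:.1f}")
```

Participants missing from either export (p05 and p06 here) simply drop out of the analysis, which is worth checking for before trusting the result.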

To go one step further, you can take the trend line equation that Excel displays on the scatter plot and use it to predict how much sales revenue new sales people taking the sales course exam might bring in: y = 21049x - 3366.2 (y = estimated sales performance in dollars, x = sales course exam score as a proportion). Suppose a new sales person (Rick Thomas) obtains a sales course exam score of 73%. Just plug this into the equation: y = 21049(0.73) - 3366.2 = $11,999.57. Voila! Based on his sales course exam score, Rick Thomas can be expected to bring in about $12,000 in revenue in the next three months. The more people analyzed (we only have 10 in this example), the greater the confidence one can have in the correlation coefficients obtained and the predictive equations garnered. In “real life” I would want as many points of data as possible: hundreds of salesperson data points or more.
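The same prediction can be wrapped in a tiny function. The slope and intercept are the ones from the trend line in this post; the function name is just for illustration, and the score is assumed to be entered as a proportion (0.73 rather than 73).

```python
SLOPE = 21049.0       # dollars per unit of exam score, from the trend line
INTERCEPT = -3366.2   # dollars, from the trend line


def predict_sales(exam_score):
    """Estimated three-month sales in dollars for an exam score given as a proportion."""
    return SLOPE * exam_score + INTERCEPT


rick_estimate = predict_sales(0.73)
print(f"Estimated sales: ${rick_estimate:,.2f}")
```

Keep in mind that predictions for scores well outside the range seen in the original data are less trustworthy than those inside it.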

I will focus on content validity in my next post, so stay tuned!