Item Development – Training Item Writers

Posted by Austin Fossey

Once we have defined the purpose of the assessment, completed our domain analysis, and finalized a test blueprint, we might be eager to jump right in to item writing, but there is one important step to take before we begin: training!

Unless you are writing the entire assessment yourself, you will need a group of item writers to develop the content. These item writers are likely experts in their fields, but they may have very little understanding of how to create assessment content. Even if these experts have experience writing items, it may be beneficial to provide refresher trainings, especially if anything has changed in your assessment design.

In their chapter in Educational Measurement (4th ed.), Cynthia Schmeiser and Catherine Welch note that it is important to consider the qualifications and representativeness of your item writers. It is common to ask item writers to fill out a brief survey to collect demographic information. You should keep these responses on file and possibly add a brief document explaining why you consider these item writers to be a qualified and representative sample.

Schmeiser and Welch also underscore the need for security. Item writers should be trained on your content security guidelines, and your organization may even ask them to sign an agreement stating that they will abide by those guidelines. Make sure everyone understands the security guidelines, and have a plan in place in case there are any violations.

Next, begin training your item writers on how to author items, which should include basic concepts about cognitive levels, drafting stems, picking distractors, and using specific item types appropriately. Schmeiser and Welch suggest that the test blueprint be used as the foundation of the training. Item writers should understand the content included in the specifications and the types of items they are expected to create for that content. Be sure to share examples of good and bad items.

If possible, ask your writers to create some practice items, then review their work and provide feedback. If they are using the item authoring software for the first time, be sure to acquaint them with the tools before they are given their item writing assignments.

Your item writers may also need training on your item data, delivery method, or scoring rules. For example, you may ask item writers to cite a reference for each item, or you might ask them to weight certain items differently. Your instructions need to be clear and precise, and you should spot check your item writers’ work. If possible, write a style guide that includes clear guidelines about item construction, such as fonts to use, acceptable abbreviations, scoring rules, acceptable item types, et cetera.

I know from my own experience (and Schmeiser and Welch agree) that investing more time in training will have a big payoff down the line. Better training leads to substantially better item retention rates when items are reviewed. If your item writers are not trained well, you may end up throwing out many of their items, which may not leave you enough for your assessment design. Considering the cost of item development and the time spent writing and reviewing items, putting in a few more hours of training can equal big savings for your program in the long run.

In my next post, I will discuss how to manage your item writers as they begin the important work of drafting the items.

How to stay within European law when sub-contracting assessment services

Posted by John Kleeman

Questionmark has recently published a white paper on assessment and European data protection. I’ve shared some material from the white paper in earlier posts on the Responsibilities of a Data Controller When Assessing Knowledge, Skills and Abilities and The 12 responsibilities of a data controller, part 1 and part 2.

Here are some points to follow if you as an assessment sponsor (Data Controller) are contracting with a Data Processor to conduct assessment services that involve the Data Processor handling personal data. As always, this blog cannot give legal advice – please check with your lawyer on contractual issues.

For processors inside and outside Europe

1. You should have a contract with the Data Processor, and if they use Sub-Processors (e.g. a data center), their contract with such Sub-Processors must follow data protection rules.

2. Processors should only process data under your direction.

3. You should define the nature and duration of the processing to be performed.

4. The Data Processor and its Sub-Processors must implement appropriate technical and organizational measures to protect personal data against accidental or unlawful destruction or accidental loss, alteration, unauthorized disclosure or access. See the white paper for more guidance on what measures are required.

5. You should have some capability to review or monitor the security of the processing, for instance by viewing reports or information from the processor.

6. If you need to delete data, you must be able to make this happen.

7. If there is a data leakage or other failure, you need to be kept informed.

8. In some European countries, e.g. Germany, data protection law also applies to encrypted personal data, even if the processor does not have access to the encryption key. If you are concerned about this, you need to ensure that any backup providers holding encrypted material are also signed up to data protection law.

9. When the contract is over, you need to ensure that data is returned or deleted.

10. Data protection law is likely to change in future (with some proposals in review at present), so your relationship with your Data Processors should allow the possibility of future updates.

For processors outside the European Economic Area

For any Data Processor or Sub-Processor who is outside the European Economic Area (and outside Canada and a few other countries), the safest procedure is to use the EU Model Clauses, a set of contractual clauses which cannot be modified and which commit the processor to following EU data protection legislation.

Another potential route if you use US processors is to rely on the US Government Safe Harbor list. However, there is concern about Safe Harbor, particularly in Germany, so you need to do additional checking. And many stakeholders will increasingly expect processors outside Europe to sign up to the EU Model Clauses. Microsoft has recently made its services compliant with these clauses, and we can expect other organizations to follow.

I hope this summary is interesting and helpful. If you want to learn more, please read our free-to-download white paper: Responsibilities of a Data Controller When Assessing Knowledge, Skills and Abilities [requires registration].

The 12 responsibilities of a data controller, part 2

Posted by John Kleeman

In my post last week, I shared some information on six of the responsibilities of assessment sponsors acting as Data Controllers when delivering assessments in Europe:

1. Inform participants
2. Obtain informed consent
3. Ensure that data held is accurate
4. Delete personal data when it is no longer needed
5. Protect against unauthorized destruction, loss, alteration and disclosure
6. Contract with Data Processors responsibly

Here is a summary of the remaining responsibilities:

7. Take care transferring data out of Europe

You need to be careful about transferring assessment results outside of the European Economic Area (though Canada, Israel, New Zealand and Switzerland are considered safe by the EU). If transferring to another country, you should usually enter into a contract with the recipient based on standard clauses called the “EU Model Clauses” and perform due diligence. You can also send data to the US if the US company follows the US government Safe Harbor rules, but German data protection authorities require further diligence beyond Safe Harbor.

8. If you collect “special” categories of data, get specialist advice


The data protection directive defines “special” categories of data, covering data that reveals racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade-union membership, as well as data concerning health or sex life. Many assessment sponsors will choose not to collect such information as part of assessments, but if you do collect this, for example to prove assessments are not biased, the rules need to be carefully followed. Note that some information may be obtained even if not specifically requested. For example, the names Singh and Cohen may be an indication of race or religious belief. This is one reason why getting informed consent from data subjects is important.

9. Deal with any subject access requests


Data protection law allows someone to request information you are holding on them as Data Controller, and if you receive such a request, you will need to review it and respond.

You will need to check specific country rules for how this works in detail. There are typically provisions to prevent people from gaining access to exam results in advance of their formal adjudication and publication.

 

10. If the assessment is high stakes, ensure there is review of any automated decision making

The EU Directive gives the right “to every person not to be subject to a decision which produces legal effects concerning him or significantly affects him and which is based solely on automated processing of data”. You need to be careful that important decisions are made by a person, not just by a computer.

For high-stakes assessments, you should either include a human review prior to making a decision or include a human appeal process. In general, an assessment score should be treated as one piece of data about a person’s knowledge, skills and/or attitudes and you should thoroughly review the materials, scores and reports produced by your assessment software to ensure that appropriate decisions are made.

11. Appoint a data protection officer and train your staff

This is not required everywhere, but it is a sensible thing to do. Most Data Controllers established in Germany need to appoint a data protection officer, and all organizations are likely to find it helpful to identify an individual or team who understands the issues, owns data protection in the organization and ensures that the correct procedures are followed. One of the key duties of the data protection officer is to train employees on data protection.

I recommend (and it’s something we do ourselves within Questionmark) that all employees are tested annually on data security to help ensure knowledge and understanding.

12. Work with supervisory authorities and respond to complaints

In many jurisdictions you need to register with supervisory authorities; you must also provide a route for making complaints and respond to any complaints you receive.

 

If you want to learn more, then please read our free-to-download white paper: Responsibilities of a Data Controller When Assessing Knowledge, Skills and Abilities [requires registration].

Responsibilities of a Data Controller When Assessing Knowledge, Skills and Abilities

Posted by John Kleeman

If you are a European or multinational company delivering assessments in Europe or an awarding body providing certification in Europe, then you likely have responsibilities as a Data Controller of assessment results and data under European law.

The European Data Protection Directive imposes an obligation on European countries to create national laws about collecting and controlling personal data. The Directive defines the role of “Data Controller” as the organization responsible for personal data and imposes strong responsibilities on that organization to process data according to the rules in the Directive. An assessment sponsor must follow the laws of the country in which it is established, and in some cases may also need to follow the laws of other countries.

To help assessment sponsors, we have written a white paper which explains your responsibilities as a Data Controller when assessing knowledge, skills and abilities. If you are testing around the world, this is material you need to pay attention to.

One concept the white paper explains is that if you sub-contract with other companies (“Data Processors”) to help deliver your assessments, then you as Data Controller are responsible for the actions of the Data Processors and their Sub-Processors under data protection law.

Diagram showing a Data Controller with two Data Processors. One Data Processor has two Sub-Processors

Regulators are increasingly active in enforcing data protection rules, so failing in one’s responsibilities can have significant financial and reputational consequences. For example, a UK company was fined UK£250,000 in 2013 after a leakage of data as a result of a failure by a Data Processor. Other companies have faced significant fines or other regulatory action as a result of losing data, failing to obtain informed consent or other data protection failures.

The white paper describes the twelve responsibilities of a Data Controller with regard to assessments, summarized as:

  1. Inform participants
  2. Obtain informed consent
  3. Ensure that data held is accurate
  4. Delete personal data when it is no longer needed
  5. Protect against unauthorized destruction, loss, alteration and disclosure
  6. Contract with Data Processors responsibly
  7. Take care transferring data out of Europe
  8. If you collect “special” categories of data, get specialist advice
  9. Deal with any subject access requests
  10. If the assessment is high stakes, ensure there is review of any automated decision making
  11. Appoint a data protection officer and train your staff
  12. Work with supervisory authorities and respond to complaints

If you use a third party to help deliver assessments, you need to ensure it will help you meet data protection rules.  The white paper describes how Questionmark OnDemand can help in this respect.


As well as ensuring you follow the law and reduce the risk of regulatory action, there are benefits to being proactive about your responsibilities as a Data Controller. You build confidence with your participants that the assessment is fair and that they can trust you as assessment sponsor, which increases take-up and encourages an honest approach to taking assessments. You also increase data quality and data security, and you gain protection against inappropriate data leakage.

Download the White Paper:

The white paper is free to download [requires registration].

Item Analysis Report – Item Difficulty Index

Posted by Austin Fossey

In classical test theory, a common item statistic is the item’s difficulty index, or “p value.” Given many psychometricians’ notoriously poor spelling, might this be due to thinking that “difficulty” starts with p?

Actually, the p stands for the proportion of participants who got the item correct. For example, if 100 participants answered the item, and 72 of them answered the item correctly, then the p value is 0.72. The p value can take on any value between 0.00 and 1.00. Higher values denote easier items (more people answered the item correctly), and lower values denote harder items (fewer people answered the item correctly).
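As a quick sketch (illustrative only, not Questionmark's implementation), the calculation is simply a proportion:

```python
def p_value(num_correct, num_respondents):
    """Classical item difficulty: the proportion of respondents
    who answered the item correctly (ranges from 0.00 to 1.00)."""
    if num_respondents <= 0:
        raise ValueError("need at least one respondent")
    return num_correct / num_respondents

# The example from the text: 72 of 100 participants answered correctly.
print(p_value(72, 100))  # 0.72
```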

Typically, test developers use this statistic as one indicator for detecting items that could be removed from delivery. They set thresholds for items that are too easy and too difficult, review them, and often remove them from the assessment.

Why throw out the easy and difficult items? Because they are not doing as much work for you. When calculating the item-total correlation (or “discrimination”) for unweighted items, Crocker and Algina (Introduction to Classical and Modern Test Theory) note that discrimination is maximized when p is near 0.50 (about half of the participants get it right).
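One way to see Crocker and Algina's point: the item-variance term p(1 − p), which feeds into the item-total correlation, is largest when p = 0.50 and shrinks toward zero as p approaches 0.00 or 1.00. A minimal illustration:

```python
# Variance of a dichotomously scored item is p * (1 - p).
# It peaks at p = 0.50, which is one reason discrimination tends to be
# highest for mid-difficulty items and low for very easy or hard ones.
variances = {p / 100: (p / 100) * (1 - p / 100) for p in range(5, 100, 5)}
best_p = max(variances, key=variances.get)
print(best_p, variances[best_p])  # 0.5 0.25
```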

Why is discrimination so low for easy and hard items? An easy item means that just about everyone gets it right, no matter how proficient they are in the domain, so the item does not discriminate well between high and low performers; the same logic applies to very hard items, which almost everyone gets wrong. (We will talk more about discrimination in subsequent posts.)

Sometimes you may still need to use a very easy or very difficult item on your test form. You may have a blueprint that requires a certain number of items from a given topic, and all of the available items might happen to be very easy or very hard. I also see this scenario in cases with non-compensatory scoring of a topic. For example, a simple driving test might ask, “Is it safe to drink and drive?” The question is very easy and will likely have a high p value, but the test developer may include it so that if a participant gets the item wrong, they automatically fail the entire assessment.

You may also want very easy or very hard items if you are using item response theory (IRT) to score an aptitude test, though it should be noted that item difficulty is modeled differently in an IRT framework. IRT yields standard errors of measurement that are conditional on the participant’s ability, so having hard and easy items can help produce better estimates of high- and low-performing participants’ abilities, respectively. This differs from classical test theory, where the standard error of measurement is the same for all observed scores on an assessment.

While simple to calculate, the p value requires cautious interpretation. As Crocker and Algina note, the p value is a function of the number of participants who know the answer to the item plus the number of participants who were able to correctly guess the answer. In an open response item, the latter group is likely very small (absent any cluing in the assessment form), but in a typical multiple choice item, a number of participants may answer correctly based on their best educated guess.

Recall also that p values are statistics—measures from a sample. Your interpretation of a p value should be informed by your knowledge of the sample. For example, if you have delivered an assessment, but only advanced students have been scheduled to take it, then the p value will be higher than it might be when delivered to a more representative sample.

Since the p value is a statistic, we can calculate the standard error of that statistic to get a sense of how stable the statistic is. The standard error will decrease with larger sample sizes. In the example below, 500 participants responded to this item, and 284 participants answered the item correctly, so the p value is 284/500 = 0.568. The standard error of the statistic is ± 0.022. If these 500 participants were to answer this item over and over again (and no additional learning took place), we would expect the p value for this item to fall in the range of 0.568 ± 0.022 about 68% of the time.
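The numbers in this example can be reproduced with the usual standard error of a proportion, √(p(1 − p)/n) — a sketch under that assumption, not necessarily the exact formula the report uses:

```python
import math

def p_value_with_se(num_correct, n):
    """Return the item p value and its standard error, sqrt(p * (1 - p) / n).
    The standard error shrinks as the sample size n grows."""
    p = num_correct / n
    return p, math.sqrt(p * (1 - p) / n)

# The example from the text: 284 of 500 participants answered correctly.
p, se = p_value_with_se(284, 500)
print(round(p, 3), round(se, 3))  # 0.568 0.022
```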


Item p value and standard error of the statistic from Questionmark’s Item Analysis Report

Writing Good Surveys, Part 2: Question Basics

Posted by Doug Peterson

In the first installment in this series, I mentioned the ASTD book, Survey Basics, by Phillips, Phillips and Aaron. The fourth chapter, “Survey Questions,” is especially good, and it’s the basis for this installment.

The first thing to consider when writing questions for your survey is whether or not the questions return the data you’re looking for. For example, let’s say one of the objectives for your survey is to “determine the amount of time per week spent reading email.”

Which of these questions would best meet that objective?

  1. How many emails do you receive per week, on average?
  2. On average, how many hours do you spend responding to emails every week?
  3. How long does it take to read the average email?
  4. On average, how many hours do you spend reading emails every week?

All four questions are related to dealing with email, but only one pertains directly to the objective. Numbers 1 and 3 could be combined to satisfy the objective if you’re willing to assume that every email received is read – a bit of a risky assumption, in my opinion (and experience). Number 2 is close, but there is a difference between reading an email and responding to it, and again, you may not respond to every email you read.

The next thing to consider is whether or not the question can be answered, and if so, ensuring that the question does not lead to a desired answer.

The authors give two examples in the book. The first describes a situation where the author was asked to respond to the question, “Were you satisfied with our service?” with a yes or no. He was not dissatisfied with the service he received, but he wasn’t satisfied with it, either. However, there was no middle ground, and he was unable to answer the question.

The second example involves one of the authors checking out of a hotel. When she tells the clerk that she enjoyed her stay, the clerk tells her that they rate customer satisfaction on a scale of one to ten, and asks if she would give them a ten. She felt pressured into giving the suggested response instead of feeling free to give a nine or an eight.

Another basic rule for writing survey questions is to make sure the respondent can understand the question. If they can’t understand it at all, they won’t answer or they will answer randomly (which is worse than not answering, as it is garbage data that skews your results). If they misunderstand the question, they’ll be answering a question that you didn’t ask. Remember, the question author is a subject matter expert (SME); he or she understands the big words and fancy jargon. Of course the question makes sense to the SME! But the person taking the survey is probably not an SME, which means the question needs to be written in plain language. You’re writing for the respondent, not the SME.

Even more basic than providing enough options for the respondent to use (see the “yes or no” example above) is making sure the respondent even has the knowledge to answer. This is typically a problem with “standard” surveys. For example, a standard end-of-course survey might ask if the room temperature was comfortable. While this question is appropriate for an instructor-led training class where the training department has some control over the environment, it really doesn’t apply to a self-paced, computer-based e-learning course.

Another example of a question for which the respondent would have no way of knowing the answer would be something like, “Does your manager provide monthly feedback to his/her direct reports?” How would you know? Unless you have access to your manager’s schedule and can verify that he or she met with each direct report and discussed their performance, the only question you could answer is, “Does your manager provide you with monthly feedback?” The same thing is true about asking questions that start off with, “Do your coworkers consider…” – the respondent has no idea what his/her coworkers’ thoughts and feelings are, so only ask questions about observable behaviors.

Finally, make sure to write questions in a way that respondents are willing to answer. Asking a question such as “I routinely refuse to cooperate with my coworkers” is probably not going to get a positive response from someone who is, in fact, uncooperative. Something like “Members of my workgroup routinely cooperate with each other” is not threatening and does not make the respondent look bad, yet they can still answer with “disagree” and provide you with insights as to the work atmosphere within the group.

Here’s an example of a course evaluation survey that gives the respondent plenty of choices.
