Determining Content – Test Design & Delivery Part 2

Posted By Doug Peterson

In Part 1 of this series, I covered how an assessment must be reliable and valid, with “valid” meaning “tests what it’s supposed to test.” This means that before you can start writing items for an assessment, you need to know what topics to cover and how many items are needed for each topic.

This process starts with a Job Task Analysis (JTA) exercise. You need to understand what tasks a person in the position in question performs or supervises, how often they do the task, how important the task is to their job,
and how hard it is to perform the task. The results of the JTA are typically used to develop a competency model.

From the JTA/competency model you then develop a Test Content Outline (TCO), which might also be called a test blueprint or a test specification. This document drives the test item development process. Test items developed this way can then be easily mapped back to individual tasks/competencies in the JTA or competency model, ensuring that your assessment is testing what it is supposed to test.

The TCO describes the content areas to be covered in the assessment. The next step is to determine how many items should be written for each content area. There are several factors that must be taken into account when performing this step:

  • Criticality of the content – Is this content “must know” or “nice to have”? Required knowledge necessitates more thorough testing, which means more items.
  • Size of the content area – A larger content area requires more test items than a smaller content area.
  • Homogeneity – Does everything in the content area require the same knowledge, skills or abilities? If so, fewer questions are needed.
  • Consequences – What happens if the learner doesn’t grasp the concepts in the content area? Do they have to take more training? Do they lose their job? As the stakes go higher, you need more items to ensure that the learner’s true knowledge is being assessed.
  • Available resources during testing – If the test is going to be open book and/or open notes, you will need more (and more difficult) items to truly assess the learner’s knowledge vs. their ability to quickly look things up.

Using the factors listed above, test content areas should be weighted to help determine the number of items to be written for each area. This is best done by a group of Subject Matter Experts (SMEs) in an exercise similar to the Angoff method of determining a cut score. Each SME should rate each content area of the TCO by

1. Criticality – 0 (unimportant) to 4 (extremely critical)

2. Difficulty – 0.5 (easy), 1.0 (moderate), or 1.5 (hard)

3. Size relative to the other content areas – 0 (too small to include) to 4 (very large)

The ratings from the SMEs for each factor (criticality, difficulty and size) should then be averaged to come up with a single rating for each factor for each content area. Then it’s time for a little math.
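The averaging step above can be sketched in a few lines of Python. The content areas, number of SMEs, and rating values below are hypothetical examples, chosen only to show the arithmetic:

```python
# Hypothetical SME ratings per content area: criticality (0-4),
# difficulty (0.5 / 1.0 / 1.5), and relative size (0-4), one value per SME.
ratings = {
    "Installation": {
        "criticality": [4, 3, 4],
        "difficulty": [1.0, 1.5, 1.0],
        "size": [2, 3, 2],
    },
    "Troubleshooting": {
        "criticality": [3, 4, 3],
        "difficulty": [1.5, 1.5, 1.0],
        "size": [3, 3, 4],
    },
}

def average_ratings(area_ratings):
    """Average each factor's ratings across all SMEs for one content area."""
    return {factor: sum(vals) / len(vals) for factor, vals in area_ratings.items()}

averaged = {area: average_ratings(r) for area, r in ratings.items()}
# e.g. averaged["Installation"]["criticality"] == (4 + 3 + 4) / 3
```

The result is a single criticality, difficulty, and size rating per content area, ready to feed into the item-count formula.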

# items on the test for a content area = criticality × size × difficulty

# items to be written for a content area = 3 × (# items on the test for that content area)

If a content area is determined to require fewer than 4–6 items on the test, it should be dropped or combined with another content area. Once you know how many items you need to write, you can assume an average of 10 minutes to write an item, and you should allow additional time for revisions (0.25 × total writing time).
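Putting the formulas together, here is a minimal sketch of the item-count and writing-time arithmetic. The averaged ratings plugged in at the bottom are hypothetical:

```python
def items_on_test(criticality, size, difficulty):
    """Items a content area gets on the test: criticality x size x difficulty."""
    return criticality * size * difficulty

def items_to_write(on_test):
    """Write three items for every one expected to survive onto the test."""
    return 3 * on_test

def writing_hours(num_items, minutes_per_item=10, revision_factor=0.25):
    """Drafting time at 10 minutes per item, plus 25% of that time for revisions."""
    drafting_minutes = num_items * minutes_per_item
    return drafting_minutes * (1 + revision_factor) / 60

# Hypothetical averaged SME ratings for one content area:
on_test = items_on_test(criticality=3.5, size=3.0, difficulty=1.5)  # 15.75
needed = round(on_test)            # 16 items on the test
if needed < 4:
    print("Too small: drop this area or combine it with another")
to_write = items_to_write(needed)  # 48 items to draft
hours = writing_hours(to_write)    # 480 min drafting + 120 min revising = 10.0 hours
```

For this example area, 16 items on the test means 48 items drafted, or about 10 hours of writing and revision time.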

7 Responses to “Determining Content – Test Design & Delivery Part 2”


  4. Silvester Draaijer says:

    Dear Doug,

    This is a nice post. But please, I think that stating that writing test items takes about 10 minutes, and then 25% extra in terms of revision time, strongly underestimates the effort.

    Of course, it is possible to write an item in about 10 minutes. But will it be a good item? Most often not right away. A good item needs multiple rounds of revision. Quite a number of items will not be good and will have to be discarded. Developing good distractors is the hardest part and takes the most time.

    Nobody really knows exactly how much time it takes, but I would say that in total, the time needed to develop high-quality items is between 30 and 60 minutes at least.

    For more information, see:

    Mayenga, C. (2009). Mapping item writing tasks on the item writing ability scale. Presented at the Canadian Society of Safety Engineering, CSSE – XXXVIIth Annual Conference, Carleton University (Ottawa, Canada). Retrieved from http://ocs.sfu.ca/fedcan/index.php/csse2009/csse2009/paper/viewFile/1966/625

    Case, S. M., Holtzman, K., & Ripkey, D. R. (2001). Developing an item pool for CBT: A practical comparison of three models of item writing. Academic Medicine, 76(10), S111–S113.

    Naeem, N., Vleuten, C., & Alfaris, E. A. (2011). Faculty development on item writing substantially improves item quality. Advances in Health Sciences Education. doi:10.1007/s10459-011-9315-2

  5. Doug,

    I’m reading the blog on a catch-as-catch-can basis as I put up the Christmas lights, etc. One point about this entry: Sharon and I don’t emphasize difficulty at all. In fact, it is not included in our recommendations about test development. We do use Criticality and Domain size. The difficulty level, to us, is pretty much irrelevant for criterion-referenced testing in light of the other two factors. Where we do emphasize the difficulty level is in the instructional design process for teaching the skill assessed by the item.

    We find the time estimate for item writing is about right for lower level cognitive items–but probably not generous enough for higher Bloom level items.

    Just FYI,
    Bill

