Eight ways to check if security is more than skin deep

Posted by John Kleeman

The assessment industry has always been extremely careful about exam security and ways to prevent cheating. As cloud and online assessment take over as the dominant delivery models, it's critical that we all deeply embed IT security in our culture so that computer vulnerabilities don't leak sensitive data or disrupt the integrity of the assessment process.

Many years ago, Questionmark realized that data protection and IT security were critical to our success. We reshaped our culture to make security a priority. We followed our own path and looked for opportunities to learn from others, such as Bill Gates and his famous Trustworthy Computing memo, part of which is quoted below:

… when we face a choice between adding features and resolving security issues, we need to choose security. Our products should emphasize security right out of the box, and we must constantly refine and improve that security as threats evolve. …  These principles should apply at every stage of the development cycle of every kind of software we create …

Questionmark understands that we're in an arms race. We stay vigilant and look for opportunities to improve our security. Here are eight key ways in which we have embedded security deep within our company. If you are an assessment provider, we'd encourage you to find your own way to follow suit. And if you are a customer, here are eight questions you can ask to identify whether an assessment provider is truly working to be as secure as it can be, instead of just claiming to be secure when in fact security is only skin deep.

1. Who does the security function report to?

At Questionmark, our security officer reports directly to me as Questionmark Chairman. If security reported directly into IT or product development, a security concern might be overruled by operational need. We've found this separation very helpful in ensuring that security concerns are heard throughout the organization.

2. Would a security flaw hold up a release?

In any sensible company, this has to be true. Feature improvements in software are important, but if there is a serious security issue, it needs to be fixed first. Developers need to know that they can’t make a release unless it is secure.

3. How do you check that your employees know about security?

Questionmark trains all our employees on data security, but how do we know they understand? We practice what we preach: everyone from senior management to sales to accounting to developers needs to take and pass a data security test every year to check understanding. I'd encourage everyone in the assessment industry to follow this approach.

4. How deep is your team’s knowledge of IT security?

SaaS security is complex. There are many layers to security, and any weakness can lead to a vulnerability. Equally, throwing resources at the wrong place won't really help. We are fortunate to have at least half a dozen experts within Questionmark who have deep knowledge of and passion for different aspects of security. This helps us get things right.

5. Is your ecosystem secure?

Every company operates in an ecosystem, and it's the whole ecosystem that needs to be secure. Questionmark works with our suppliers, subcontractors and partners to help them be secure, including offering training and advice. We even want our competitors to be secure, as any breach in the assessment industry would hurt everyone.

6. How transparent and open are you on your security?

Security by obscurity is not secure. Questionmark shares information on the security of our OnDemand service in white papers (Security of Questionmark's US OnDemand Service and Security of Questionmark's EU OnDemand Service) and has "red papers" describing our security and business continuity planning in detail, available under NDA to prospective customers. The review process, as customers ask questions about these documents, provides reassurance for customers and input that helps us improve our security.

7. What kinds of external review do you allow?

As we shared in Third-party audits verify our platform's security, we run regular penetration tests of Questionmark OnDemand by a third-party company, Veracode. We are also fortunate to have many customers who care deeply about security and undertake their own audits and reviews by experts. We welcome such review and learn from it to improve our own security.

8. Are you completely satisfied with your security?

Absolutely not. There is an arms race happening in the security world. Hackers and other bad actors are increasing their capabilities, and however good you are, if you rest on your laurels the arms race will overtake you. See, for example, the Verizon graph showing the increase in breaches over time.

Questionmark, like other good SaaS companies, has a policy of continual improvement – we want to be much better each year than the last.

This video provides an overview of how Questionmark builds security into its products from day one.

Writing JTA Task Statements

Posted by Austin Fossey

One of the first steps in an evidence-centered design (ECD) approach to assessment development is a domain analysis. If you work in credentialing, licensure, or workplace assessment, you might accomplish this step with a job task analysis (JTA) study.

A JTA study gathers examples of tasks that potentially relate to a specific job. These tasks are typically harvested from existing literature or observations, reviewed by subject matter experts (SMEs), and rated by practitioners or other stakeholder groups across relevant dimensions (e.g., applicability to the job, frequency of the task). The JTA results are often used later to determine the content areas, cognitive processes, and weights that will be on the test blueprint.
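To make that last step a bit more concrete, here is a minimal sketch of how mean task ratings might be rolled up into blueprint weights. The task list, rating scales, and the simple importance-times-frequency index are all hypothetical; real JTA analyses use whatever weighting scheme the program has justified.

```python
# Minimal, hypothetical sketch of turning JTA ratings into domain weights.
# Assumes each task has mean importance and frequency ratings (1-5 scales)
# and a domain label; a simple "criticality" index is importance x frequency.

from collections import defaultdict

tasks = [
    # (domain, mean_importance, mean_frequency) -- illustrative values only
    ("Patrol procedures", 4.6, 4.1),
    ("Patrol procedures", 3.9, 4.8),
    ("Traffic investigation", 4.2, 2.7),
    ("Report writing", 3.5, 4.9),
]

criticality_by_domain = defaultdict(float)
for domain, importance, frequency in tasks:
    criticality_by_domain[domain] += importance * frequency

total = sum(criticality_by_domain.values())
for domain, criticality in criticality_by_domain.items():
    print(f"{domain}: {criticality / total:.0%} of the blueprint")
```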

Questionmark has tools for authoring and delivering JTA items, as well as some limited analysis tools for basic response frequency distributions. But if we are conducting a JTA study, we need to start at the beginning: how do we write task statements?

One of my favorite sources on the subject is Mark Raymond and Sandra Neustel's chapter, "Determining the Content of Credentialing Examinations," in The Handbook of Test Development. The chapter provides information on how to organize a JTA study, how to write tasks, how to analyze the results, and how to use the results to build a test blueprint. It is well-written and easy to understand, providing enough detail to be useful without being too dense. If you are conducting a JTA study, I highly recommend checking out this chapter.

Raymond and Neustel explain that a task statement can refer to a physical or cognitive activity related to the job or practice. A task statement should always follow a subject/verb/object format, though it might be expanded to include qualifiers for how the task should be executed, the resources needed to do the task, or the context of its application. They also underscore that most task statements should have only one action and one object. There are some exceptions to this rule, but a statement with multiple actions and objects should typically be split into separate tasks. As a hint, they suggest critiquing any task statement that contains the word "and" or "or."
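That hint is easy to turn into a quick screening aid. The sketch below uses a hypothetical task list and simply flags statements containing "and" or "or" for human review; it does not decide anything on its own.

```python
import re

# Flag task statements that may contain more than one action or object.
# This is only a screening aid -- some statements legitimately use "and"/"or".
task_statements = [
    "Measure skid marks for calculation of approximate vehicle speed.",
    "Interview witnesses and prepare incident reports.",  # likely two tasks
]

pattern = re.compile(r"\b(and|or)\b", re.IGNORECASE)

for statement in task_statements:
    if pattern.search(statement):
        print(f"Review (possible multiple actions/objects): {statement}")
```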

Here is an example of a task statement from the Michigan Commission on Law Enforcement Standards’ Statewide Job Analysis of the Patrol Officer Position: Task 320: “[The patrol officer can] measure skid marks for calculation of approximate vehicle speed.”

I like this example because it is quite specific, certainly better than just saying "determine vehicle's speed." It also provides a qualifier for how good the measurement needs to be ("approximate"). The statement might be improved by adding more context (e.g., "using a tape measure"), but that may already be understood by the participant population.

Raymond and Neustel also caution researchers to avoid words that might have multiple meanings or vague meanings. For example, the verb “instruct” could mean many different things—the practitioner might be giving some on-the-fly guidance to an individual or teaching a multi-week lecture. Raymond and Neustel underscore the difficult balance of writing task statements at a level of granularity and specificity that is appropriate for accomplishing defined goals in the workplace, but at a high enough level that we do not overwhelm the JTA participants with minutiae. The authors also advise that we avoid writing task statements that describe best practice or that might otherwise yield a biased positive response.

Early in my career, I observed a JTA SME meeting for an entry-level credential in the construction industry. In an attempt to condense the task list, the psychometrician on the project combined a bunch of seemingly related tasks into a single statement—something along the lines of “practitioners have an understanding of the causes of global warming.” This is not a task statement; it is a knowledge statement, and it would be better suited for a blueprint. It is also not very specific. But most important, it yielded a biased response from the JTA survey sample. This vague statement had the words “global warming” in it, which many would agree is a pretty serious issue, so respondents ranked it as of very high importance. The impact was that this task statement heavily influenced the topic weighting of the blueprint, but when it came time to develop the content, there was not much that could be written. Item writers were stuck having to write dozens of items for a vague yet somehow very important topic. They ended up churning out loads of questions about one of the few topics that were relevant to the practice: refrigerants. The end result was a general knowledge assessment with tons of questions about refrigerants. This experience taught me how a lack of specificity and the phrasing of task statements can undermine the entire content validity argument for an assessment’s results.

If you are new to JTA studies, it is worth mentioning that a JTA can sometimes turn into a significant undertaking. I attended one of Mark Raymond's seminars earlier this year, and he observed anecdotally that his JTA studies have taken anywhere from three months to over a year. There are many psychometricians who specialize in JTA studies, and it may be helpful to work with them for some aspects of the project, especially when conducting a JTA for the first time. However, even if we use a psychometric consultant to conduct or analyze the JTA, learning about the process can make us better-informed consumers and allow us to handle some of the work internally, potentially saving time and money.


Example of task input screen for a JTA item in Questionmark Authoring.

For more information on JTA and other reporting tools that are available with Questionmark, check out this Reporting & Analytics page.

Proving compliance – not just attendance

This is a re-post of a popular blog entry previously published by John Kleeman.

Many regulators require you to train employees – in financial services, pharmaceuticals, utilities, and in health & safety across all industries. You need to train them, and when you are audited or if something goes wrong, you need to document that you did the training. To quote the US regulator OSHA: "Documentation can also supply an answer to one of the first questions an accident investigator will ask: 'Was the injured employee trained to do the job?'"

Is it good enough to get the participant to sign something saying that they’ve attended the training or read the safety manual? An excellent blog series on the SafetyXChange says no:

Some companies ask their workers to sign a form after training sessions acknowledging that they understood the lesson and will put it into practice. Don’t let these forms lull you into a false sense of security. “Most workers will just sign these things without even reading them, let alone making sure that they understood everything you told them,” says a health and safety attorney in New York City. This is especially true if the training and instructions are complicated.

In the safety field, a US Appeals Court case ruled in 2005 (my emphasis):

Merely having an individual sign a form acknowledging his responsibility to read the safety manual is insufficient to insure that the detailed instructions contained therein have actually been communicated.

There are two good ways to show that someone not only attended the training but also understood it:

  • Give employees a test or quiz at the end of the training to confirm that they understood it. This will also give them practice retrieving information to slow the forgetting curve (see Answering Questions directly helps you learn), and it will allow you to identify people who didn't grasp the learning, as well as weak points in the class.
  • For more practical skills, you might want to observe people to check that they understood the training and can put it into practice, or in the safety world, demonstrate that they can do the job safely. For example, the screenshot on the right shows how a supervisor can use an iPad to check and log someone's skill at using a ladder.

My view is that you want to give these kinds of tests for two reasons. First and most importantly, you want to prevent your employees from falling off ladders or making other mistakes. Second, if something does go awry, you want evidence that you’ve trained people well.

A busy discussion on LinkedIn's Compliance Exchange forum dove into this further. I have paraphrased some of the views there:

  • Yes, you should give a quiz, as it proves attendance; videoing the training is another option.
  • Yes, you should give a test, and regulators, in particular the US FDIC, are increasingly demanding this.
  • No. The danger of a test is that you need to take action if scores are bad, which may give you a lot of work. It's safer not to ask the questions in case you don't like the answers.
  • Yes, you should give a test, but it can be a very easy and simple one, to check basic understanding and prove attendance.
  • Yes, you should test; as well as confirming understanding, it will also highlight vulnerabilities in the training.

What do you think? Use the reply form below and contribute to the dialog.

For information on how to make trustable assessments, see John Kleeman and Questionmark CEO Eric Shepherd's newest white paper, "Assessment Results You Can Trust." This 26-page white paper will help corporate and government stakeholders create, deliver and report on assessments to produce trustable results that can effectively measure the competence of employees and their extended workforce.

Question Type Report: Use Cases

Posted by Austin Fossey

A client recently asked me if there is a way to count the number of each type of item in their item bank, so I pointed them toward the Question Type Report in Questionmark Analytics. While this type of frequency data can also be easily pulled using our Results API, it can be useful to have a quick overview of the number of items (split out by item type) in the item bank.

The Question Type Report does not need to be run frequently (and Analytics usage stats reflect that observation), but the data can help indicate the robustness of an item bank.

This report is most valuable in situations involving topics for a specific assessment or set of related assessments. While it might be nice to know that we have a total of 15,000 multiple choice (MC) items in the item bank, these counts are trivial unless we have a system-wide practical application—for example planning a full program translation or selling content to a partner.

This report can provide a quick profile of the population of the item bank or a topic when needed, though more detailed item tracking by status, topic, metatags, item type, and exposure is advisable for anyone managing a large-scale item development project. Below are some potential use cases for this simple report.
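If you do pull item metadata yourself (for example, through an export or the Results API), reproducing this frequency distribution is straightforward. The sketch below assumes a hypothetical list of item records with "topic" and "type" fields; the actual field names will depend on how you extract the data.

```python
from collections import Counter

# Hypothetical item records; real exports will use different field names.
items = [
    {"topic": "Algebra", "type": "Multiple Choice"},
    {"topic": "Algebra", "type": "Multiple Response"},
    {"topic": "Geometry", "type": "Multiple Choice"},
    {"topic": "Geometry", "type": "Essay"},
]

# Overall counts by item type (what the Question Type Report summarizes).
print(Counter(item["type"] for item in items))

# Counts by type within a single topic, e.g., for a specific assessment.
print(Counter(item["type"] for item in items if item["topic"] == "Algebra"))
```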

Test Development and Maintenance:
The Question Type Report's value is primarily its ability to count the number of each type of item within a topic. If we know we have 80 MC items in a topic for a new assessment, and they all need to be reviewed by a bias committee, then we can plan accordingly.

Form Building:
If we are equating multiple forms using a common-item design, the report can help us determine how many items go on each form and the degree to which the forms can overlap. Even if we only have one form, knowing the number of items can help a test developer check that enough items are available to match the blueprint.
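As a back-of-the-envelope illustration of that check, here is a small sketch with made-up numbers for a two-form, common-item (anchor) design.

```python
# Hypothetical planning arithmetic for a two-form common-item (anchor) design.
items_available = 120   # e.g., MC items in the topic, per the Question Type Report
form_length = 60        # items on each form
anchor_size = 20        # items shared between the two forms

unique_items_needed = 2 * form_length - anchor_size
print(f"Unique items needed for two forms: {unique_items_needed}")
print(f"Feasible with the current bank: {items_available >= unique_items_needed}")
```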

Item Development:
If the report indicates that there are plenty of MC items ready for future publications, but we only have a handful of essay items to cover our existing assessment form, then we might instruct item writers to focus on developing new essay questions for the next publication of the assessment.


Example of a Question Type Report showing the frequency distribution by item type.

 

High-stakes assessment: It’s not just about test takers

Posted by Lance

In my last post I spent some time defining how I think about the idea of high-stakes assessment. I also talked about how these assessments affect the people who take them, including how important they can be to a person's ability to get or do a job.

Now I want to talk a little bit about how these assessments affect the rest of us.

The rest of us

Guess what? The rest of us are affected by the outcomes of these assessments. Did you see that coming?

But seriously, the credentials or scores that result from these assessments affect large swathes of the public. Ultimately that’s the point of high-stakes assessment. The resulting certifications and licenses exist to protect the public. These assessments are acting as barriers preventing incompetent people from practicing professions where competency really matters.

It really matters

What are some examples of "really matters"? Well, when hiring, it really matters to employers that the network techs they hire know how to configure a network securely, not just that the techs say they do. It matters to the people crossing a bridge that the engineers who designed it knew their physics. It really matters to every one of us that our doctor, dentist, nurse, or surgeon knows what they are doing when they treat us. It really matters to society at large that we measure well the children and adults who take large-scale assessments like college entrance exams.

At the end of the day, high-stakes exams are high-stakes because in a very real way, almost all of us have a stake in their outcome.

Separating the wheat from the chaff

There are a couple of ways that high stakes assessments do what they do. Some assessments are simply designed to measure “minimal competence,” with test takers either ending above the line—often known as “passing”—or below the line. The dreaded “fail.”

Other assessments are designed to place test takers on a continuum of ability. This type of assessment assigns scores to test takers, and the score ranges often appear odd to laypeople. For example, the SAT uses a 200 to 800 scale.
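To take some of the mystery out of those numbers, here is a minimal sketch of one common approach, a linear transformation from a raw score onto a reporting scale. It is purely illustrative; it is not how the SAT or any particular program actually derives its scores.

```python
def linear_scaled_score(raw, raw_min, raw_max, scale_min=200, scale_max=800):
    """Map a raw score onto a reporting scale with a linear transformation."""
    proportion = (raw - raw_min) / (raw_max - raw_min)
    return round(scale_min + proportion * (scale_max - scale_min))

# A raw score of 45 out of 60 lands at 650 on a 200-800 reporting scale.
print(linear_scaled_score(45, raw_min=0, raw_max=60))
```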

Want to learn more? Hang on till next time!

When to Give Partial Credit for Multiple-Response Items

Posted by Austin Fossey

Three different customers recently asked me how to decide between scoring a multiple-response (MR) item dichotomously or polytomously; i.e., when should an MR item be scored right/wrong, and when should we give partial credit? I gave some garrulous, rambling answers, so the challenge today is for me to explain this in a single blog post that I can share the next time it comes up.

In their chapter on multiple-choice and matching exercises in Educational Assessment of Students (5th ed.), Anthony Nitko and Susan Brookhart explain that matching items (which we may extend to include MR item formats, drag-and-drop formats, survey-matrix formats, etc.) are often a collection of single-response multiple choice (MC) items. The advantage of the MR format is that it saves space and you can leverage dependencies in the questions (e.g., relationships between responses) that might be redundant if broken into separate MC items.

Given that an MR item is often a set of individually scored MC items, a polytomously scored format almost always makes sense. From an interpretation standpoint, there are a couple of advantages for you as a test developer or instructor. First, you can differentiate between participants who know some of the answers and those who know none of the answers, which can improve item discrimination. Second, you have more flexibility in how you choose to score and interpret the responses. In the drag-and-drop example below (a special form of an MR item), the participant has all of the dates wrong; however, the instructor may still be interested in knowing that the participant knows the correct order of events for the Stamp Act, the Townshend Act, and the Boston Massacre.


Example of a drag-and-drop item in Questionmark where the participant’s responses are wrong, but the order of responses is partially correct.

Are there exceptions? You know there are. This is why it is important to have a test blueprint document, which can help clarify which item formats to use and how they should be evaluated. Consider the following two variations of a learning objective on a hypothetical CPR test blueprint:

  • The participant can recall the actions that must be taken for an unresponsive victim requiring CPR.
  • The participant can recall all three actions that must be taken for an unresponsive victim requiring CPR.

The second example is likely the one that the test developer would use for the test blueprint. Why? Because knowing two of the three actions is not going to cut it. This is a rare all-or-nothing scenario where knowing some of the answers is essentially the same (from a qualifications standpoint) as knowing none of the answers. The language in this learning objective ("recall all three actions") signals to the test developer that if they use an MR item to assess it, they should score it dichotomously (no partial credit). The example below shows how one might design an item for this hypothetical learning objective with Questionmark's authoring tools:


Example of a Questionmark authoring screen for an MR item that is scored dichotomously (right/wrong).
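To make the two scoring rules concrete, here is a minimal sketch of the arithmetic. The answer key and response are hypothetical, and Questionmark's authoring tools configure this for you; the code only illustrates the difference between partial-credit and all-or-nothing scoring.

```python
# Hypothetical MR item: the key records which options are correct.
key = {"A": True, "B": False, "C": True, "D": True}
response = {"A": True, "B": False, "C": True, "D": False}  # one correct option missed

# Polytomous (partial credit): one point per option answered correctly.
partial_credit = sum(response[opt] == correct for opt, correct in key.items())

# Dichotomous (all-or-nothing): full credit only if every option matches the key.
all_or_nothing = int(all(response[opt] == correct for opt, correct in key.items()))

print(partial_credit)   # 3 -- three of the four options handled correctly
print(all_or_nothing)   # 0 -- the participant missed option D
```

Under partial credit the participant earns 3 of 4 points; under all-or-nothing scoring the same response earns nothing.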

To summarize, a test blueprint document is the best way to decide if an MR item (or variant) should be scored dichotomously or polytomously. If you do not have a test blueprint, think critically about what you are trying to measure and the interpretations you want reflected in the item score. Partial-credit scoring is desirable in most use cases, though there are occasional scenarios where an all-or-nothing scoring approach is needed—in which case the item can be scored strictly right/wrong. Finally, do not forget that you can score MR items differently within an assessment. Some MR items can be scored polytomously and others can be scored dichotomously on the same test, though it may be beneficial to notify participants when scoring rules differ for items that use the same format.

If you are interested in understanding and applying some basic principles of item development and enhancing the quality of your results, download the free white paper written by Austin: Managing Item Development for Large-Scale Assessment
