Discussing validity at NCME

Posted by Austin Fossey

To continue last week’s discussion about big themes at the recent NCME annual meeting, I wanted to give an update on conversations about validity.

Validity is a core concept in good assessment development, which we have frequently discussed on this blog. Even though this is such a fundamental concept, our industry is still passionately debating what constitutes validity and how the term should be used.

NCME hosted a wonderful coordinated session during which some of the big names in validity theory presented their thoughts on how the term validity should be used today.

Michael Kane (Educational Testing Service) began the discussion with his well-established views around argument-based validity. In this framework, the test developer must make a validity claim about an inference (or interpretation as Kane puts it) and then support that claim with evidence. Kane argues that validity is not a property of the test or scores, but it is instead a property of the inferences we make about those scores.

If you have read some of my previous posts on validity or attended my presentations about psychometric principles, you already know that I am a strong proponent of Kane’s view that validity refers to the interpretations and use cases—not to the instrument.

But not everyone agrees. Keith Markus (City University of New York) suggested that nitpicking about whether the test or the inference is the object of validity causes us to miss the point. The test and the inference work only as a combination, so validity (as a term and as a research goal) should be applied to these as a pair.

Pamela Moss (University of Michigan) argued that we need to shift the focus of validity study away from intended inferences and use cases and toward the actual use cases. Moss believes that the actual uses of assessment results can be quite varied and nuanced, and it is these real-world impacts that we are really interested in. She proposed that we work to validate what she called “conceptual uses.” For example, if we want to use education assessments to improve learning, then we need to research why students earn low scores.

Greg Cizek (University of North Carolina) disagreed with Kane’s approach, saying that the evidence we gather to support an inference says nothing about the use cases, and vice versa. Cizek argued that we make two inferential leaps: one from the score to the inference, and one from the inference to the use case. So we should gather evidence that supports both inferential leaps.

Though I see Cizek’s point, I feel that it would not drastically change how I would approach a validity study in practice. After all, you cannot have a use case without making an inference, so I would likely just tackle the inferences and their associated use cases jointly.

Steve Sireci (University of Massachusetts) felt similarly. Sireci is one of my favorite presenters on the topic of validity, plus he gets extra points for matching his red tie and shirt to the color theme on his slides. Sireci posed this question: can we have an inference without having a use case? If so, then we have a “useless” test, and while there may be useless tests out there, we usually only care about the tests that get used. As a result, Sireci suggested that we must validate the inference, but that this validation must also demonstrate that the inference is appropriate for the intended uses.

Integrations and Connectors – Blackboard

The Questionmark Blackboard Connector is a proprietary connector that provides unprecedented integration between the Blackboard LMS and Questionmark. Through the Blackboard Connector:

  • The first time an instructor interfaces with Questionmark, a Questionmark admin ID is created for them automatically.
  • When an instructor adds a Questionmark assessment to a Blackboard course, the course short name becomes a Questionmark group, and the instructor and any
    students launching the assessment are automatically added to the group.
  • The first time a student launches any Questionmark assessment, a participant ID is created in Questionmark for the student.

And all of this automatic synchronization is optional! You can just as easily set up the connector to require that instructors, students and/or groups be created by a Questionmark admin in Questionmark so that you can control exactly who, and what courses, can interface with Questionmark.
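As a rough sketch of the auto-provisioning behavior described above (the function and field names here are illustrative only, not the Connector’s actual API):

```javascript
// Hypothetical sketch of the Blackboard Connector's auto-provisioning flow.
// All names (onAssessmentLaunch, shortName, etc.) are illustrative assumptions.
function onAssessmentLaunch(user, course, state) {
  // The Blackboard course short name becomes a Questionmark group.
  if (!state.groups[course.shortName]) {
    state.groups[course.shortName] = { members: [] };
  }
  // First contact: instructors get an admin ID, students a participant ID.
  if (!state.users[user.id]) {
    state.users[user.id] = { role: user.isInstructor ? "admin" : "participant" };
  }
  // Whoever launches the assessment is added to the course's group.
  const group = state.groups[course.shortName];
  if (group.members.indexOf(user.id) === -1) {
    group.members.push(user.id);
  }
  return state;
}
```

The point of the sketch is simply that every step is driven by the launch event itself, which is why no manual synchronization is needed.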

Watch this video for a complete explanation of what the Blackboard Connector can do for you:


How to stay within European law when sub-contracting assessment services

Posted by John Kleeman

Questionmark has recently published a white paper on assessment and European data protection. I’ve shared some material from the white paper in earlier posts on the Responsibilities of a Data Controller When Assessing Knowledge, Skills and Abilities and The 12 responsibilities of a data controller, part 1 and part 2.

Here are some points to follow if you, as an assessment sponsor (Data Controller), are contracting with a Data Processor to conduct assessment services that involve the Data Processor handling personal data. As always, this blog cannot give legal advice – please check with your lawyer on contractual issues.

For processors inside and outside Europe

1. You should have a contract with the Data Processor, and if they use Sub-Processors (e.g. a data center), their contract with such Sub-Processors must follow data protection rules.

2. Processors should only process data under your direction.

3. You should define the nature and duration of the processing to be performed.

4. The Data Processor and its Sub-Processors must implement appropriate technical and organizational measures to protect personal data against accidental or unlawful destruction or accidental loss, alteration, unauthorized disclosure or access. See the white paper for more guidance on what measures are required.

5. You should have some capability to review or monitor the security of the processing, for instance by viewing reports or information from the processor.

6. If you need to delete data, you must be able to make this happen.

7. If there is a data leakage or other failure, you need to be kept informed.

8. In some European countries, e.g. Germany, data protection law also applies to encrypted personal data, even if the processor does not have access to the encryption key. If you are concerned about this, you need to ensure that any backup providers holding encrypted material are also bound by data protection law.

9. When the contract is over, you need to ensure that data is returned or deleted.

10. Data protection law is likely to change in future (with some proposals in review at present), so your relationship with your Data Processors should allow the possibility of future updates.

For processors outside the European Economic Area

For any Data Processor or Sub-Processor who is outside the European Economic Area (and outside Canada and a few other countries), the safest procedure is to use the EU Model Clauses: a set of contractual clauses which cannot be modified and which commit the processor to follow EU data protection legislation.

Another potential route if using US processors is to rely on the US Government Safe Harbor list. However, there is concern about Safe Harbor, particularly in Germany, so you need to do additional checking. And many stakeholders will increasingly expect processors outside Europe to sign up to the EU Model Clauses. Microsoft have recently made their services compliant with these clauses, and we can expect other organizations to do so as well.

I hope this summary is interesting and helpful. If you want to learn more, please read our free-to-download white paper: Responsibilities of a Data Controller When Assessing Knowledge, Skills and Abilities [requires registration].

Discussing the revised Standards for Educational and Psychological Testing

Posted by Austin Fossey

I just returned from the National Council on Measurement in Education (NCME) annual meeting in Philadelphia, which is held in conjunction with the American Educational Research Association (AERA) annual meeting.

There were many big themes around research and advances in assessment, but there were also a lot of interesting discussions about changes in practice. There seemed to be a great deal of excitement and perhaps some trepidation about the upcoming release of the next version of the Standards for Educational and Psychological Testing, which is the authority on requirements for good assessment design and implementation, and which has not been updated since 1999.

There were two big discussion sessions about the Standards during the conference. The first was a two-hour overview hosted by Wayne Camara (ACT) and Suzanne Lane (University of Pittsburgh). Presenters from several organizations summarized the changes to the various chapters in the Standards. In the second discussion, Joan Herman (UCLA/CRESST) hosted a panel that talked about how these changes might impact the practices that we use to develop and deliver assessments.

During the panel discussion, the chapter about Fairness came up several times. This appears to be an area where the Standards are taking a more detailed approach, especially with regard to the use of testing accommodations. From the discussion, it sounds like the next version will have better guidance about best practices for various accommodations and for documenting that those accommodations properly minimize construct-irrelevant variance without giving participants any unfair advantages over the general population.

During the discussion, Scott Marion (Center for Assessment) observed that the new Standards do not address Fairness in the context of some delivery mechanisms (as opposed to the delivery conditions) in assessment. For example, he noted that computer-adaptive tests (CATs) use item selection algorithms that are based on the general population, but there is no requirement to research whether the adaptation works comparably in subpopulations, such as students with cognitive disabilities who might be eligible for other accommodations like extra time.

The panelists also mentioned that some of the standards have been written so that the language mirrors the principles of evidence-centered design (ECD), though the Standards do not mention ECD outright. This seems like a logical step for the Standards, as nearly every presentation I attended about assessment development referenced ECD. Valerie Shute (Florida State University) observed that five years ago, only a fraction of participants would have known about ECD, but today it is widely used. Though ECD was around several years before the 1999 Standards, it did not have the following that it does today.

In general, it sounds like most of the standards we know and love will remain intact, and the revisions are primarily serving to provide more clarity or to accommodate the changing practices in assessment development. Nearly all of the presenters work on large-scale, high-stakes assessments that have been developed under the 1999 Standards, and many of them mentioned that they are already committing themselves to review their programs and documentation against the new Standards when they are published later this year.

Questionmark and Heartbleed

Posted by John Kleeman

You may have heard of “Heartbleed,” a bug in a program used by many sites on the Internet that could allow the theft of data normally protected by SSL/TLS encryption. This bug was disclosed to the public on April 7th – and here is Questionmark’s response to the bug.

Our internal Computer Emergency Response Team (CERT) immediately reviewed our servers and systems to identify any potential vulnerabilities. Fortunately, most Questionmark systems do not use OpenSSL (the encryption library that was subject to this vulnerability). The one affected system identified by our CERT team was promptly updated to address the issue and our customers were informed.

Here is some additional information for our customers and other users of Questionmark systems:

Questionmark’s cloud-based products and services:

Questionmark Live

  • Our collaborative authoring system, Questionmark Live, was not vulnerable to the bug.

Questionmark’s US OnDemand Service

  • Our US OnDemand Service was not vulnerable to the bug.

Questionmark’s European OnDemand Service

  • One component of our European OnDemand Service was identified as subject to this vulnerability: the “load balancer” that provides the entry point to the service. This system was promptly patched last week, its SSL certificate was replaced and the previous certificate revoked; it is no longer vulnerable.

None of the other systems that comprise the European Questionmark OnDemand service were affected: the application and database servers, where customer data is stored, were never subject to this vulnerability. We have no indication that any customer data or passwords were compromised. However, out of caution and in recognition of the theoretical risk, we are advising our customers to log into the system and change passwords and keys. We have reached out by email to all customers affected and will be following up by telephone.

Questionmark products for on-premises deployment:

Questionmark Perception

  • Our behind-the-firewall product, Questionmark Perception, does not include OpenSSL and so is not itself vulnerable to the bug. But Questionmark Perception can be installed under SSL/TLS, and if OpenSSL is used to provide that encryption, then an organization might be vulnerable due to its use of OpenSSL outside Questionmark software. If you use Questionmark Perception under SSL/TLS (you can tell because the URL will include https rather than http), you should check with your organization’s IT team.
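For readers making that check themselves: the affected OpenSSL releases are well documented (versions 1.0.1 through 1.0.1f were vulnerable; 1.0.1g and later are fixed). A first-pass triage of the version string a server reports could be sketched as follows. The function name is ours, and checking a version string is of course no substitute for patching.

```javascript
// Sketch: does a reported OpenSSL version fall in the Heartbleed-affected
// range? Vulnerable: 1.0.1 through 1.0.1f. Fixed: 1.0.1g and later.
function isHeartbleedVulnerable(version) {
  const m = /^1\.0\.1([a-z]?)/.exec(version.trim());
  if (!m) {
    return false; // the 0.9.8, 1.0.0 and 1.0.2+ branches never had the bug
  }
  const letter = m[1];
  // Plain "1.0.1" and letters a-f are vulnerable; "g" onward is patched.
  return letter === "" || letter <= "f";
}
```

For example, `isHeartbleedVulnerable("1.0.1e")` is true, while `"1.0.1g"` and `"1.0.2"` are not affected.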

If any Questionmark user or customer has questions, please raise them with your Questionmark account manager or with technical support. I hope that this rapid response and full transparency highlights our commitment to security. This also illustrates the value of an OnDemand service. Rather than having to rely on internal IT to catch up and patch vulnerable systems, you can delegate this to Questionmark as your service provider.

Questionmark takes security seriously. Our OnDemand customers benefit not only from our 24/7 monitoring of systems and platform uptime – but also from a team of experts ready to address potential security threats as they arise – and before they arise.


Watch this video for more about Questionmark’s commitment to security.

Using Questionmark to conduct a Performance Based Certification

Posted by John Kleeman

How do you measure people’s practical skills, for example their ability to conduct electrical or electronic work? Is it possible to have someone use test equipment such as a hardware device or simulator and have this controlled by Questionmark technology?

The answer is “yes.” I’d like to share an interesting story from Questionmark customer SpaceTEC, whose very inventive approach is making this happen.

SpaceTEC is the National Science Foundation’s National Resource Center for Aerospace Technical Education. I’m grateful to Dave Fricton and Carolyn Parise of SpaceTEC (pictured right) for presenting about this at the Questionmark Users Conference in San Antonio, Texas and for helping me write this blog article.

SpaceTEC, under its sister organization CertTEC, created an electronics certification that is offered to the electrical/electronics industry. Staff members there deliver knowledge-based assessments with Questionmark and also deliver a practical exam in which someone has to actually perform some electrical work, for instance finding a fault or making measurements. To do this, they use electrical test consoles and proprietary card sets like those shown below.

Model 130E test console from NIDA

Traditionally the practical exam has been delivered manually by an examiner working with the candidate on the test equipment. But this is costly and difficult to organize nationwide as exams take 3 to 4 hours each and examiners need specialized training.

The innovation by SpaceTEC is that they have inserted HTML code inside Questionmark questions to control the test equipment. They drive the test equipment from within Questionmark software, making it no longer necessary for a trained examiner to run the practical test. They still have a proctor in the room to check the integrity of the process, but this is much easier to organize.

Here is a simple example of the kind of question they deliver. The candidate clicks on the Insert Fault button and this sets up the console with the appropriate configuration. Then the candidate measures the resistance on the console and types in their answer to Questionmark, which records and scores the question.

What is the resistance measurement in Kohms between TP1 and TP2?

If you want to know what happens behind the scenes, it’s very simple. The console (from NIDA Corporation) has an ActiveX driver which can be called by HTML code. It is necessary to install the driver on the candidate PC and then plug the console into the PC, but these are simple tasks. The call to the ActiveX driver is then encapsulated inside the Questionmark question wording.
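As a rough illustration of what such embedded script might look like: note that the ProgID “NIDA.Console” and the InsertFault method name below are assumptions for illustration, not SpaceTEC’s actual code, and ActiveX scripting of this kind only works in Internet Explorer with the driver installed.

```javascript
// Hypothetical sketch of script embedded in a question's HTML, wired to an
// "Insert Fault" button. ProgID and method names are illustrative assumptions.
function insertFault(faultId) {
  if (typeof ActiveXObject === "undefined") {
    return false; // driver unavailable (non-IE browser, or driver not installed)
  }
  // Instantiate the console's ActiveX driver and configure the hardware
  // for this question before the candidate takes a measurement.
  const driver = new ActiveXObject("NIDA.Console"); // hypothetical ProgID
  driver.InsertFault(faultId);
  return true;
}
```

On the candidate’s PC, clicking the button calls this function, the console is configured, and the candidate then measures the resistance and types the answer into Questionmark as usual.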


This is an example of Questionmark’s Open Assessment Platform, where you can connect Questionmark to other systems to get the solution you need. To quote Dave Fricton in his conference session: “The beauty of using Questionmark is you can do it all yourself.”

Do you deliver any practical or performance exams using test equipment? If so, you might be able to follow the same route that SpaceTEC have gone, and link these up to Questionmark for easier administration. SpaceTEC are showing how performance and practical tests can be run in Questionmark, as well as knowledge assessments.