Discussing the revised Standards for Educational and Psychological Testing

Posted by Austin Fossey

I just returned from the National Council on Measurement in Education (NCME) annual meeting in Philadelphia, which is held in conjunction with the American Educational Research Association (AERA) annual meeting.

There were many big themes around research and advances in assessment, but there were also a lot of interesting discussions about changes in practice. There seemed to be a great deal of excitement, and perhaps some trepidation, about the upcoming release of the next version of the Standards for Educational and Psychological Testing, the authority on requirements for good assessment design and implementation, which has not been updated since 1999.

There were two big discussion sessions about the Standards during the conference. The first was a two-hour overview hosted by Wayne Camara (ACT) and Suzanne Lane (University of Pittsburgh). Presenters from several organizations summarized the changes to the various chapters in the Standards. In the second discussion, Joan Herman (UCLA/CRESST) hosted a panel that talked about how these changes might impact the practices that we use to develop and deliver assessments.

During the panel discussion, the chapter about Fairness came up several times. This appears to be an area where the Standards are taking a more detailed approach, especially with regard to the use of testing accommodations. From the discussion, it sounds like the next version will have better guidance about best practices for various accommodations and for documenting that those accommodations properly minimize construct-irrelevant variance without giving participants any unfair advantages over the general population.

During the discussion, Scott Marion (Center for Assessment) observed that the new Standards do not address Fairness in the context of some delivery mechanisms (as opposed to the delivery conditions) in assessment. For example, he noted that computer-adaptive tests (CATs) use item selection algorithms that are based on the general population, but there is no requirement to research whether the adaptation works comparably in subpopulations, such as students with cognitive disabilities who might be eligible for other accommodations like extra time.

The panelists also mentioned that some of the standards have been written so that the language mirrors the principles of evidence-centered design (ECD), though the Standards do not specifically mention ECD outright. This seems like a logical step for the Standards, as nearly every presentation I attended about assessment development referenced ECD. Valerie Shute (Florida State University) observed that five years ago, only a fraction of participants would have known about ECD, but today it is widely used. Though ECD was around several years before the 1999 Standards, it did not have the following that it does today.

In general, it sounds like most of the standards we know and love will remain intact, and the revisions are primarily serving to provide more clarity or to accommodate the changing practices in assessment development. Nearly all of the presenters work on large-scale, high-stakes assessments that have been developed under the 1999 Standards, and many of them mentioned that they are already committing themselves to review their programs and documentation against the new Standards when they are published later this year.

Questionmark and Heartbleed

Posted by John Kleeman

You may have heard of “Heartbleed,” a bug in a software library used by many sites on the Internet that could allow the theft of data normally protected by SSL/TLS encryption. This bug was disclosed to the public on April 7th, and here is Questionmark’s response.

Our internal Computer Emergency Response Team (CERT) immediately reviewed our servers and systems to identify any potential vulnerabilities. Fortunately, most Questionmark systems do not use OpenSSL (the encryption library that was subject to this vulnerability). The one affected system identified by our CERT team was promptly updated to address the issue, and our customers were informed.

Here is some additional information for our customers and other users of Questionmark systems:

Questionmark’s cloud-based products and services:

Questionmark Live

  • Our collaborative authoring system, Questionmark Live, was not vulnerable to the bug.

Questionmark’s US OnDemand Service

Questionmark’s European OnDemand Service

  • One component of our European OnDemand Service was identified as subject to this vulnerability: the “load balancer” that provides the entry point to the service. This system was promptly patched last week, its SSL certificate was replaced, and the previous certificate was revoked; it is no longer vulnerable.

None of the other systems that comprise the European Questionmark OnDemand service were affected: the application and database servers, where customer data is stored, were never subject to this vulnerability. We have no indication that any customer data or passwords were compromised. However, out of caution and in recognition of the theoretical risk, we are advising our customers to log into the system and change passwords and keys. We have reached out by email to all customers affected and will be following up by telephone.

Questionmark products for on-premises deployment:

Questionmark Perception

  • Our behind-the-firewall product, Questionmark Perception, does not include OpenSSL and so is not itself vulnerable to the bug. However, Questionmark Perception can be installed under SSL/TLS, and if the SSL/TLS implementation used is OpenSSL, an organization might be vulnerable due to its use of OpenSSL outside Questionmark software. If you use Questionmark Perception under SSL/TLS (you can tell because the URL will include https rather than http), you should check with your organization’s IT team.

If any Questionmark user or customer has questions, please raise them with your Questionmark account manager or with technical support. I hope that this rapid response and full transparency highlights our commitment to security. This also illustrates the value of an OnDemand service: rather than relying on internal IT to catch up and patch vulnerable systems, you can delegate this to Questionmark as your service provider.

Questionmark takes security seriously. Our OnDemand customers benefit not only from our 24/7 monitoring of systems and platform uptime, but also from a team of experts ready to address potential security threats as they arise, and before they arise.


Watch this video for more about Questionmark’s commitment to security.

Using Questionmark to conduct a Performance Based Certification

Posted by John Kleeman

How do you measure people’s practical skills, for example their ability to conduct electrical or electronic work? Is it possible to have someone use test equipment such as a hardware device or simulator, with the process controlled by Questionmark technology?

The answer to this question is “yes.” I’d like to share an interesting story from Questionmark customer SpaceTEC, whose very inventive approach is making this happen.

SpaceTEC is the National Science Foundation’s National Resource Center for Aerospace Technical Education. I’m grateful to Dave Fricton and Carolyn Parise of SpaceTEC (pictured right) for presenting about this at the Questionmark Users Conference in San Antonio, Texas, and for helping me write this blog article.

SpaceTEC, under its sister organization CertTEC, created an electronics certification that is offered to the electrical/electronics industry. Staff members there deliver knowledge-based assessments with Questionmark and also deliver a practical exam in which someone has to actually perform electrical work, for instance finding a fault or making measurements. To do this, they use electrical test consoles and proprietary card sets like those shown below.

Model 130E test console from NIDA (electronic console)

Traditionally, the practical exam has been delivered manually by an examiner working with the candidate on the test equipment. But this is costly and difficult to organize nationwide, as exams take 3 to 4 hours each and examiners need specialized training.

The innovation by SpaceTEC is that they have inserted HTML code inside Questionmark questions to control the test equipment. They drive the test equipment from within Questionmark software, making it no longer necessary for a trained examiner to run the practical test. They still have a proctor in the room to check the integrity of the process, but this is much easier to organize.

Here is a simple example of the kind of question they deliver. The candidate clicks on the Insert Fault button and this sets up the console with the appropriate configuration. Then the candidate measures the resistance on the console and types in their answer to Questionmark, which records and scores the question.

What is the resistance measurement in Kohms between TP1 and TP2?

If you want to know what happens behind the scenes, it’s very simple. The console (from NIDA Corporation) has an ActiveX driver which can be called from HTML code. It is necessary to install the driver on the candidate’s PC and then plug the console into the PC, but these are simple tasks. The call to the ActiveX driver is then encapsulated inside the Questionmark question wording.

Screenshot of HTML used

This is an example of Questionmark’s Open Assessment Platform, where you can connect Questionmark into other systems to get the solution you need. To quote Dave Fricton in his conference session: “The beauty of using Questionmark is you can do it all yourself.”

Do you deliver any practical or performance exams using test equipment? If so, you might be able to follow the same route that SpaceTEC has taken and link your equipment to Questionmark for easier administration. SpaceTEC is showing how performance and practical tests, as well as knowledge assessments, can be run in Questionmark.

Giving meaning to assessment scores

Posted by Austin Fossey

As discussed in previous posts, validity refers to the proper inferences for and uses of assessment results. Assessment results are often in the form of assessment scores, and the valid inferences may depend heavily on how we format, label, report, and distribute those scores.

At the core of most assessment results are raw scores. Raw scores are simply the number of points earned by participants based on their responses to items in an assessment. Raw scores are convenient because they are easy to calculate and easy to communicate to participants and stakeholders. However, their interpretation may be constrained.

In their chapter in Educational Measurement (4th ed.), Cohen and Wollack explain that “raw scores have little clear meaning beyond the particular set of questions and the specific test administration.” This is often fine when our inferences are intended to be limited to a specific assessment administration, but what about further inferences?

Peterson, Kolen, and Hoover stated in their chapter in Educational Measurement (3rd ed.) that “the main purpose of scaling is to aid users in interpreting test results.” So when other inferences need to be made about the participants’ results, it is common to transform participants’ scored responses into a more meaningful measure.

When raw scores do not support the desired inference, then we may need to create a scale score. In his chapter in Educational Measurement (4th ed.), Kolen explains that “scaling is the process of associating numbers or other ordered indicators with the performance of examinees.” Scaling examples include percentage scores to be used for topic comparisons within an assessment, equating scores so that scores from multiple forms can be used interchangeably, or scaling IRT theta values so that all reported scores are positive values. SAT scores are examples of the latter two cases. There are many scaling procedures, and a full discussion is not possible here. (If you’d like to know more about this, I’d suggest reading Kolen’s chapter, referenced above.)
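As a toy illustration of one simple scaling procedure, a linear transformation can map raw scores onto a reporting scale with a chosen mean and standard deviation. This sketch is not from Kolen's chapter; the target values of 500 and 100 are invented for illustration, and real programs set their scales through formal scaling studies.

```python
# A minimal sketch of linear score scaling: raw scores are standardized,
# then stretched and shifted so the reported scores have a chosen mean
# and standard deviation. The targets (500, 100) are illustrative only.

def scale_scores(raw_scores, target_mean=500.0, target_sd=100.0):
    """Linearly transform raw scores onto a reporting scale."""
    n = len(raw_scores)
    mean = sum(raw_scores) / n
    sd = (sum((x - mean) ** 2 for x in raw_scores) / n) ** 0.5
    # z-transform each raw score, then move it to the target metric
    return [target_mean + target_sd * (x - mean) / sd for x in raw_scores]

scaled = scale_scores([32, 45, 51, 58, 64])
```

Because the transformation is linear, it preserves the rank order of participants; only the metric on which scores are reported changes.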

Cohen and Wollack also describe two types of derived scores: developmental scores and within-group scores. These derived scores are designed to support specific types of inferences. Developmental scores show a student’s progress in relation to defined developmental milestones, such as grade equivalency scores used in education assessments. Within-group scores demonstrate a participant’s normative performance relative to a sample of participants. Within-group scores include standardized z scores, percentiles, and stanines.
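As a rough sketch of the within-group scores just described, the following computes a z score, a percentile rank, and a stanine against a norm group. The norm group here is hypothetical, and the formulas follow common textbook conventions; operational programs may use smoothed or normalized variants.

```python
# Illustrative within-group scores: z scores, percentile ranks, stanines.
from statistics import mean, pstdev

def z_score(x, scores):
    """Standardized distance of x from the norm group mean, in SD units."""
    return (x - mean(scores)) / pstdev(scores)

def percentile_rank(x, scores):
    # Percent of the norm group scoring strictly below x (one common definition).
    return 100 * sum(s < x for s in scores) / len(scores)

def stanine(z):
    # Stanines are nine bands (1-9), each half an SD wide, centered at 5.
    return max(1, min(9, int(round(2 * z + 5))))

norm_group = [40, 45, 50, 55, 60, 65, 70, 75, 80]  # hypothetical sample
print(z_score(60, norm_group))          # 0.0 (60 is the group mean)
print(percentile_rank(60, norm_group))  # ~44.4 (4 of 9 scores fall below 60)
print(stanine(0.0))                     # 5 (the middle stanine)
```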


Examples of within-group scores plotted against a normal distribution of participants’ observed scores.

Sometimes numerical scores cannot support the inference we want, and we give meaning to the assessment scores with a different ordered indicator. A common example is the use of performance level descriptors (PLDs, also known as achievement level descriptors or score band definitions). PLDs describe the average performance, abilities, or knowledge of participants who earn scores within a defined range. PLDs are often very detailed, though shortened versions may be used for reporting. In addition to the PLDs, performance levels (e.g., Pass/Fail, Does Not Meet/Meets/Exceeds) provide labels that tell users how to interpret the scores. In some assessment designs, performance levels and PLDs are reported without any scores. For example, an assessment may continue until a certain error threshold is met to determine which performance level should be assigned to the participant’s performance. If the participant performs very well consistently from the start, the assessment might end early and simply assign a “Pass” performance level rather than making the participant answer more items.
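At reporting time, assigning a performance level is essentially a lookup from score ranges to labels. As a minimal sketch, the cut scores and labels below are invented for illustration; real cut scores come from a formal standard-setting process (such as Angoff or Bookmark procedures), not from picking round numbers.

```python
# Hypothetical cut scores mapping a percentage score to a performance level.
CUT_SCORES = [(85, "Exceeds"), (60, "Meets"), (0, "Does Not Meet")]

def performance_level(score):
    """Return the label for the highest cut score the participant reached."""
    for cut, label in CUT_SCORES:  # ordered from highest cut downward
        if score >= cut:
            return label
    return CUT_SCORES[-1][1]  # scores below every cut get the lowest label

print(performance_level(59))  # Does Not Meet
print(performance_level(60))  # Meets
print(performance_level(92))  # Exceeds
```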

Integrating and Connectors – SharePoint

Posted by Doug Peterson

There isn’t just one way to integrate Questionmark with your SharePoint portal, and there aren’t just two. There are actually three ways to integrate a Questionmark assessment into a SharePoint page!

For Perception (on-premises) customers, it’s possible to use Windows Authentication to present a SharePoint user with a list of assessments for which they have been scheduled, without having to re-authenticate the user in Questionmark.

Questionmark has also developed a SharePoint Connector for our OnDemand customers. It’s a SharePoint web part that automatically logs the user into Questionmark and displays a list of assessments for which they have been scheduled.

The third way to integrate a Questionmark assessment with a SharePoint page is to embed it in the page. This is great for simple, anonymous quizzes and knowledge checks.

Check out this video for a quick overview of all three methods of integrating Questionmark and SharePoint.


The 12 responsibilities of a data controller, part 2

Posted by John Kleeman

In my post last week, I shared some information on six of the responsibilities of assessment sponsors acting as Data Controllers when delivering assessments in Europe:

1. Inform participants
2. Obtain informed consent
3. Ensure that data held is accurate
4. Delete personal data when it is no longer needed
5. Protect against unauthorized destruction, loss, alteration and disclosure
6. Contract with Data Processors responsibly

Here is a summary of the remaining responsibilities:

7. Take care transferring data out of Europe

You need to be careful about transferring assessment results outside of the European Economic Area (though Canada, Israel, New Zealand and Switzerland are considered safe by the EU). If transferring to another country, you should usually enter into a contract with the recipient based on standard clauses called the “EU Model Clauses” and perform due diligence. You can also send data to the US if the US company follows the US government Safe Harbor rules, but German data protection authorities require further diligence beyond Safe Harbor.

8. If you collect “special” categories of data, get specialist advice


The data protection directive defines “special” categories of data, covering data that reveals racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade-union membership, as well as data concerning health or sex life. Many assessment sponsors will choose not to collect such information as part of assessments, but if you do collect this, for example to prove assessments are not biased, the rules need to be carefully followed. Note that some information may be obtained even if not specifically requested. For example, the names Singh and Cohen may be an indication of race or religious belief. This is one reason why getting informed consent from data subjects is important.

9. Deal with any subject access requests


Data protection law allows someone to request information you are holding on them as Data Controller, and if you receive such a request, you will need to review it and respond.

You will need to check specific country rules for how this works in detail. There are typically provisions to prevent people from gaining access to exam results in advance of their formal adjudication and publication.


10. If the assessment is high stakes, ensure there is review of any automated decision making

The EU Directive gives the right “to every person not to be subject to a decision which produces legal effects concerning him or significantly affects him and which is based solely on automated processing of data”. You need to ensure that important decisions are made by a person, not just by a computer.

For high-stakes assessments, you should either include a human review prior to making a decision or include a human appeal process. In general, an assessment score should be treated as one piece of data about a person’s knowledge, skills and/or attitudes and you should thoroughly review the materials, scores and reports produced by your assessment software to ensure that appropriate decisions are made.

11. Appoint a data protection officer and train your staff

This is not required everywhere, but it is a sensible thing to do. Most Data Controllers established in Germany need to appoint a data protection officer, and all organizations are likely to find it helpful to identify an individual or team who understands the issues, owns data protection in the organization and ensures that the correct procedures are followed. One of the key duties of the data protection officer is to train employees on data protection.

I recommend (and it’s something we do ourselves within Questionmark) that all employees be tested annually on data security to help ensure knowledge and understanding.

12. Work with supervisory authorities and respond to complaints

In many jurisdictions you need to register with supervisory authorities. You must also provide a route for people to make complaints, and respond to any complaints you receive.


If you want to learn more, then please read our free-to-download white paper: Responsibilities of a Data Controller When Assessing Knowledge, Skills and Abilities [requires registration].
