Discussing the revised Standards for Educational and Psychological Testing
Posted by Austin Fossey
I just returned from the National Council on Measurement in Education (NCME) annual meeting in Philadelphia, which is held in conjunction with the American Educational Research Association (AERA) annual meeting.
There were many big themes around research and advances in assessment, but there were also a lot of interesting discussions about changes in practice. There seemed to be a great deal of excitement and perhaps some trepidation about the upcoming release of the next version of the Standards for Educational and Psychological Testing, which is the authority on requirements for good assessment design and implementation, and which has not been updated since 1999.
There were two big discussion sessions about the Standards during the conference. The first was a two-hour overview hosted by Wayne Camara (ACT) and Suzanne Lane (University of Pittsburgh). Presenters from several organizations summarized the changes to the various chapters in the Standards. In the second discussion, Joan Herman (UCLA/CRESST) hosted a panel that talked about how these changes might impact the practices that we use to develop and deliver assessments.
During the panel discussion, the chapter about Fairness came up several times. This appears to be an area where the Standards are taking a more detailed approach, especially with regard to the use of testing accommodations. From the discussion, it sounds like the next version will have better guidance about best practices for various accommodations and for documenting that those accommodations properly minimize construct-irrelevant variance without giving participants any unfair advantages over the general population.
During the discussion, Scott Marion (Center for Assessment) observed that the new Standards do not address Fairness in the context of some delivery mechanisms (as opposed to the delivery conditions) in assessment. For example, he noted that computer-adaptive tests (CATs) use item selection algorithms that are based on the general population, but there is no requirement to research whether the adaptation works comparably in subpopulations, such as students with cognitive disabilities who might be eligible for other accommodations like extra time.
The panelists also mentioned that some of the standards have been written so that the language mirrors the principles of evidence-centered design (ECD), though the Standards do not mention ECD outright. This seems like a logical step for the Standards, as nearly every presentation I attended about assessment development referenced ECD. Valerie Shute (Florida State University) observed that five years ago, only a fraction of participants would have known about ECD, but today it is widely used. Though ECD emerged several years before the 1999 Standards, it did not have the following then that it does today.
In general, it sounds like most of the standards we know and love will remain intact, and the revisions primarily serve to provide more clarity or to accommodate changing practices in assessment development. Nearly all of the presenters work on large-scale, high-stakes assessments developed under the 1999 Standards, and many mentioned that they have already committed to reviewing their programs and documentation against the new Standards when they are published later this year.