We will wrap up our discussion of themes from the National Council on Measurement in Education (NCME) annual meeting with an overview of one inescapable topic: working with complex (and often messy) data sets.

It was clear from many of the presentations and poster sessions that technology is driving the direction of assessment, for better or for worse (or as Damian Betebenner put it, “technology eats statistics”). Advances in technology have allowed researchers to examine new statistical models for scoring participants, identify aberrant responses, score performance tasks, identify sources of construct-irrelevant variance, diversify item formats, and improve reporting methods.

As the symbiotic knot between technology and assessment grows tighter, many researchers and test developers are in the unexpected position of having too much data. This is especially true in complex assessment environments that yield log files with staggering amounts of information about a participant’s actions within an assessment.

Log files can track many types of data in an assessment, such as responses, click streams, and system states. All of these data are time stamped, and if the logs capture the right data, they can illuminate some of the cognitive processes that manifest in the participant’s interaction with the assessment. Raw assessment data like Questionmark’s Results API OData Feeds can also be combined with institutional data, greatly expanding the range of research questions we can pursue within a single organization.
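To make the idea of time-stamped log data a bit more concrete, here is a minimal sketch of the kind of derived variables researchers often pull from such files: how long a participant spent on each item and how many clicks they made along the way. The log format, event names (`item_shown`, `click`, `response`), and function names below are all hypothetical illustrations. They are not Questionmark's Results API or any real platform's schema, and actual log files will need their own parsing.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log format: (ISO 8601 timestamp, item_id, event_type).
# Real assessment platforms use their own schemas; this is only an illustration.
LOG = [
    ("2024-01-01T10:00:00", "item1", "item_shown"),
    ("2024-01-01T10:00:12", "item1", "click"),
    ("2024-01-01T10:00:40", "item1", "response"),
    ("2024-01-01T10:00:41", "item2", "item_shown"),
    ("2024-01-01T10:01:05", "item2", "response"),
]


def time_on_item(log):
    """Seconds from each item's first 'item_shown' event to its last 'response' event."""
    shown = {}
    answered = {}
    for stamp, item, event in log:
        t = datetime.fromisoformat(stamp)
        if event == "item_shown":
            shown.setdefault(item, t)  # keep only the first time the item appeared
        elif event == "response":
            answered[item] = t  # later responses overwrite earlier ones
    # Only report items that were both shown and answered.
    return {
        item: (answered[item] - shown[item]).total_seconds()
        for item in shown
        if item in answered
    }


def clicks_per_item(log):
    """Count 'click' events per item (items with no clicks read as 0 from the Counter)."""
    return Counter(item for _, item, event in log if event == "click")


if __name__ == "__main__":
    print(time_on_item(LOG))  # {'item1': 40.0, 'item2': 24.0}
    print(clicks_per_item(LOG)["item2"])  # 0
```

Even a toy example like this shows why richer logs raise harder questions: a long time on an item could reflect careful reasoning, disengagement, or a distraction in the room, which is exactly why presenters paired log data with other evidence.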

NCME attendees learned about hardware and software that capture both response variables and behavioral variables from participants as they complete an online learning task.

Several presenters discussed issues and strategies for handling less-structured data, with many papers tackling log file data gathered as participants interact with an online assessment or other online task. Ryan Baker (International Educational Data Mining Society) gave a talk about combining data mining of log files with field observations to measure hard-to-capture domains, such as student engagement.

Baker focused on the positive aspects of having oceans of data, choosing to remain optimistic about what we can do rather than dwell on the difficulties of iterative model building in these types of research projects. He shared examples of intelligent tutoring systems designed to teach students while also gathering data about each student’s level of engagement with the lesson. These examples were peppered with entertaining videos of researchers in classrooms playing with their phones so that individual students would not realize the researcher was subtly observing them with sidelong glances.

Evidence-centered design (ECD) emerged as a consistent theme: there was a lot of conversation about how researchers are designing assessments so that they yield fruitful data for intended inferences. Nearly every presentation about assessment development referenced ECD. Valerie Shute (Florida State University) observed that five years ago only a fraction of attendees would have known about ECD, but today it is widely used by practitioners.

