Operational Research in Assessment Programs as a Window into Task and Item Design Principles: Examples from NAEP
Panel: Madeleine Keehner, Hilary Persky, and Luis Saldivia, Educational Testing Service
Discussant: Robin Hill, Kentucky Department of Education
2018 NCSA, San Diego, CA
Overview of key aspects of human cognition that are relevant to item and task design
Research findings and theory from cognitive science
Madeleine Keehner
Design Decisions in Assessment Development
• How we measure: task structure, item types, response modes, interactive capabilities, design devices, graphics, text, media, layouts…
• What we measure: constructs, target KSAs
These Design Decisions Impact Key Processes
Design decisions: task structure, item types, response modes, interactive capabilities, design devices, graphics, text, media, layouts…
• Cognitive: perception and attention; working memory (WM) load, executive functions; intrinsic/extraneous load; long-term memory (LTM) schema activation; metacognition
• Affective: engagement; motivation; enjoyment; frustration; boredom
• Behavioral: affordances for action; embodiment
• Social: collaborative; communicative
Zooming in on Cognition and Behavior
How do external item and task design features influence these internal cognitive processes?
• Perception and attention
• Working memory
• Long-term memory schema
• Action planning and control
How External Design Features Affect Internal Processes
• Perception and attention: perception can be overloaded by too much information; attention can be captured by salient features, and it can be directed through signaling.
How External Design Features Affect Internal Processes
• Working memory: total processing load may exceed WM capacity. With good design, extraneous load can be minimized and intrinsic load can be optimized.
How External Design Features Affect Internal Processes
• Long-term memory schema: familiar response modes, technology, or task types can activate learned schema and reduce WM load, but schema may also be inappropriately triggered by familiar-feeling formats.
How External Design Features Affect Internal Processes
• Action planning and control: the affordances of a display can make some behaviors more likely, and we may not know what behaviors we are ‘inviting’ with our design.
Conclusion: External Representations Affect Internal Processes
External item and task design features interact with internal cognitive processes: perception and attention, working memory, long-term memory schema, and action planning and control.
NAEP Reading Example: Insights from Pretesting an Innovative Interface Design
NAEP eReader design problem:
• How to present reading passages and items on a tablet
• How to allow students to interact fluently with them
• How to gather evidence of reading processes
• Full-screen presentation would allow for the widest variety of passages
• Items presented in a separate window or panel would allow for a wide variety of item types
• Navigational aids provided to facilitate navigation between items and passage
Comparison of Different Layouts
[Screenshots: sample passages "Dinosaur Skeleton" and "Fish Fossils"]
• 1- vs. 2-column passage; items swiped in from the right side
– WM load if items are not always visible?
– How do interactive behaviors differ with visual occlusion?
• Look-back buttons in items
– Schema for use?
– Sufficiently salient?
Interaction Behaviors: Swiping Items On and Off
• Swiping (L and R) happened more in layouts where items overlap text (two-column passages)
– Where there was no overlap (one-column layout, shown in blue), students still swiped L (on) but hardly ever swiped R (off)
• Item is visible all the time
– Is this too different from paper and pencil (P&P)?
– Does it change the way students read/search?
Two-column layouts: 4th and 8th graders differed
– 4th graders: swiped on and then off
– 8th graders: swiped on, did other actions, then swiped off
Some performance differences: G4 did a little better with 1-column; G8 wrote longer constructed responses (CRs) with 1-column
Overall Insights and Eventual Design Decisions
• Different behavioral affordances from 1- and 2-column layouts
– Students do not remove items if they are not occluding text
– Suggests it takes less cognitive effort to leave items on; they are removed only when in the way
– Performance similar but not identical (note: no P&P baseline)
– More process information when swiping on and off
– Always-visible items might change reading strategy/approach (different from P&P)
– Expert committee decision: two-column layout is an appropriate operational trade-off
– (Note: interface design still evolving)
• Use of look-back buttons in items hardly ever observed
– Interview questions indicated students had not noticed them
– Suggests no schema to look for them, and not salient enough to capture attention
– Design tweak: visual salience was enhanced and an instruction was added to the tutorial
Take-Home 1: Design Decisions Impact Basic Processes, and the Reverse Should Also Be True
• Design decisions (task structure, item types, response modes, interactive capabilities, design devices, graphics, text, media, layouts…) impact cognitive, affective, behavioral, and social processes
• Knowledge of these basic processes should also impact our design decisions
Take-Home 2: Interdisciplinary Collaboration Is Needed to Do Justice to Both What and How
• Assessment developers
– Subject-matter content expertise, item and task design experience
• Learning scientists
– Subject-relevant cognitive and learning expertise
• Cognitive scientists
– Expertise in general cognitive, metacognitive, behavioral, social, and affective processes; usability and cognitive research methods; human-computer interaction, etc.
• (And many others, of course…)
Take-Home 3: More and Better Research Needed
• Traditional items are supported by decades of psychometric research
– Empirical data: item response characteristics, validity studies, etc.
• Digital assessments allow many more options for:
– Varied stimuli and representations
– Different response modes and response behaviors
– Other kinds of behaviors and interactions
• A psychometric approach alone may not be enough
– Basic properties of cognition need to be examined and considered a priori
– This requires experimental cognitive research methods and analyses
• Meanwhile, let's look at some insights from operational pretesting studies…
A Pretesting Study: Effects of Avatars (and Leveling) in Scenario-Based Tasks (SBTs) on Students
Hilary Persky
Background
The affordances of digitally based assessment (DBA) allow assessments to better reflect authentic reading experiences, which are purpose-driven, at times collaborative, and involve various types and levels of support. Many believe the construct of reading comprehension has broadened with the advent of digital literacies. Purpose-driven tasks have been taken up by next-generation state assessments as well as by national and international assessments (PIRLS and PISA).
Why the Study?
Avatars are used in new NAEP reading tasks to:
• introduce and reaffirm overall task and specific activity purposes
• simulate conversation/collaboration
• assist in task transitions
• reset student understanding (leveling)
Some stakeholder concerns:
• Do avatars add cognitive load?
• Are avatars actually engaging?
• Does “leveling” negatively affect students?
Study Questions
Main focus: Does having student avatars affect
• test performance?
• test-taking behaviors?
• affective responses?
Do we see any effects of leveling?
Study Design
• Two assessment tasks: literary and informational
• Two versions of each task
– Avatar vs. non-avatar
• Leveling in both versions
• Student survey on
– Preferences and affective responses
– Background information (digital access; reading motivation)
Study Approach
Tryout (like a normal administration):
• 100 students recruited from the DC area
• Randomly assigned to the avatar or non-avatar condition (each student took only one task)
Cog labs (one-on-one; think-aloud, eye tracking, post-task interview):
• 12 students, recruited from Trenton, Ewing, and Princeton
• Randomly assigned to the avatar or non-avatar condition
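The random assignment in the tryout can be illustrated with a short sketch. This is not the procedure used in the study, which the slides do not describe; it assumes each student is placed in exactly one of four task-by-version cells (literary or informational task, avatar or non-avatar version), and the names `CELLS` and `assign_conditions` and the seed value are hypothetical.

```python
import random

# Hypothetical sketch: assumes each student receives exactly one task
# (literary or informational) in one version (avatar or non-avatar).
# The actual tryout assignment procedure is not described in the slides.
CELLS = [
    (task, version)
    for task in ("literary", "informational")
    for version in ("avatar", "non-avatar")
]


def assign_conditions(student_ids, seed=None):
    """Shuffle students, then deal them round-robin into CELLS so that
    cell sizes differ by at most one student."""
    rng = random.Random(seed)  # a seeded generator makes the assignment reproducible
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {sid: CELLS[i % len(CELLS)] for i, sid in enumerate(shuffled)}


if __name__ == "__main__":
    assignment = assign_conditions(range(1, 101), seed=2018)
    for cell in CELLS:
        count = sum(1 for c in assignment.values() if c == cell)
        print(cell, count)
```

Dealing a shuffled roster round-robin keeps the groups balanced (25 students per cell for 100 students), which independent coin-flip assignment per student would not guarantee.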
Tryout Performance Results
• No significant effects on total task scores or item scores
• The number of high- and low-performing students was similarly distributed in the avatar and non-avatar conditions
• No significant interactions with gender, race/ethnicity, SES, or digital access (based on survey items included in the tryout)
Tryout Process Data Results
• No significant effect of avatars on reading behaviors such as reading speed or the number of page turns
• No significant effect of avatars on question-answering behaviors such as the number of times answers are changed, back navigation, or specific item behaviors such as select-in-passage
• No significant effect of avatars on time use (that is, time spent reading or on items)