Approach to Comparability as Virginia Moved to Online Testing
National Conference on Student Assessment
June 2018
Introduction
Virginia first investigated mode comparability in fall 2001, using three End-of-Course (EOC) tests:
- Algebra I
- Earth Science
- English: Reading, Literature, and Research (RLR)
The philosophical approach was to examine whether the tests were of comparable difficulty in paper and online modes. If so, a common scale score table could be used for both modes.
The first study – Fall 2001
School divisions were invited to participate in the study after fall 2001 operational testing. Each division was certified to have adequate infrastructure to deliver the online tests. Approximately 2,200 students participated. Students were assigned by each school to take either a paper version or an online version of the same test form.
The first study – Fall 2001
Paper versions of the EOC tests were administered statewide during the fall. Approximately one week later, participating students were administered an alternate form either on paper or online. Students were able to keep the higher score from the study test or the operational test.
The first study – Fall 2001
Study design (randomly equivalent groups):
Live fall testing: Form 1, paper
Comparability study: Form 2, paper; Form 2, computer
The first study – Fall 2001
N of test takers by condition

EOC Test         Computer   Paper
English: RLR          301     268
Algebra I             398     365
Earth Science         465     409
Total                1164    1042
The first study – Fall 2001
Comparison of fall operational test scores in the two study groups

EOC Test        Condition   Mean    Median   Std. Dev.   Minimum   Maximum
English: RLR    Paper       421.2   426      55.5        252       562
English: RLR    Computer    445.1   445      54.3        285       592
Algebra I       Paper       411.4   408      37.0        328       547
Algebra I       Computer    411.9   408      40.3        321       600
Earth Science   Paper       402.8   394      45.3        276       600
Earth Science   Computer    409.3   403      48.8        314       554
The first study – Fall 2001
Comparison of study test raw scores in the two groups

EOC Test        Condition   Mean    Median   Std. Dev.   Minimum   Maximum
English: RLR    Paper       22.98   23       8.41        0         39
English: RLR    Computer    26.78   27       7.00        9         42
Algebra I       Paper       26.03   25       8.35        0         48
Algebra I       Computer    26.38   26       8.01        5         46
Earth Science   Paper       27.70   27       8.39        0         48
Earth Science   Computer    29.04   28       8.97        0         49
The first study – Fall 2001
The computer and paper groups were similar but not as comparable as was hoped. Mean scores on the common live paper test were different for the two groups. Survey responses indicated possible differential motivation in the two groups.
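One way to gauge how far apart the two study groups were on the common live paper test is a standardized mean difference. The sketch below computes Cohen's d with a pooled standard deviation from the reported operational means and standard deviations. It assumes the group sizes equal the study participation counts, which the slides do not state, and it is an illustration rather than an analysis the study reported.

```python
from math import sqrt

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Standardized mean difference (a minus b) using a pooled SD."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_var)

# (mean, SD, N) of fall operational scale scores per study group.
# ASSUMPTION: N is taken from the study participation counts.
groups = {
    "English: RLR": {"computer": (445.1, 54.3, 301), "paper": (421.2, 55.5, 268)},
    "Algebra I": {"computer": (411.9, 40.3, 398), "paper": (411.4, 37.0, 365)},
    "Earth Science": {"computer": (409.3, 48.8, 465), "paper": (402.8, 45.3, 409)},
}

# Positive d: the computer group scored higher on the live paper test.
effect_sizes = {
    test: cohens_d(*g["computer"], *g["paper"]) for test, g in groups.items()
}

for test, d in effect_sizes.items():
    print(f"{test}: d = {d:.2f}")
```

Because both groups took the live test on paper, any nonzero d here reflects a difference between the groups themselves, not a mode effect.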
The first study – Fall 2001
Do you consider this an extra opportunity to pass the EOC test?

Condition         No      Yes
Computer   N      154     754
           %      17.0    84.0
Paper      N      265     627
           %      29.7    70.3
The first study – Fall 2001
Compared to your last testing opportunity, how motivated were you to take this test?

Condition        Less Motivated   2       About the Same   4       More Motivated
Computer   N     91               91      372              225     128
           %     10.0             10.0    41.0             24.8    14.1
Paper      N     220              131     372              100     75
           %     24.5             14.6    41.4             11.1    8.4
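To illustrate how the survey counts could be checked for a group difference, the sketch below computes a Pearson chi-square statistic for the "extra opportunity" question. The choice of test is an assumption for illustration; the slides do not say how the survey responses were analyzed.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic for a table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if condition and response were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: computer, paper. Columns: No, Yes (counts from the survey).
extra_opportunity = [[154, 754], [265, 627]]
statistic = chi_square_2x2(extra_opportunity)

CRITICAL_VALUE_DF1_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05
print(f"chi-square = {statistic:.1f}")
print("exceeds critical value:", statistic > CRITICAL_VALUE_DF1_05)
```

A statistic above the critical value would be consistent with the two groups viewing the study test differently, which is the concern raised about differential motivation.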
The first study – Fall 2001
Performance differences between the study groups on the operational fall tests and the study tests, as well as apparent motivational differences, made it difficult to conclude that scores were comparable in the two modes. The focus shifted to adjusting scores for possible mode effects by equating the computer form to its previous administration on paper.
Fall 2001 – Revised Approach
Live testing: Spring 2000, Form 2, paper; Fall 2001, Form 1, paper
Comparability study: Fall 2001, Form 2, paper; Fall 2001, Form 2, computer
Links between administrations: common persons; common items
Fall 2001 – Revised Approach
Spring 2000, Form 2, paper linked to Fall 2001, Form 2, computer
Design: common items, non-equivalent groups
Fall 2001 – Revised Approach
Mode effects may or may not be present. If they are, the scores can be adjusted by equating the computer version to the paper version of the test. Mode effects will be reflected by the magnitude of differences in item parameter estimates between the two modes.
Fall 2001 – Revised Approach
The computer version is equated to the paper version through a set of linking items. Linking items with large displacements are dropped from the linking set. A difference in the average difficulty of non-linking items suggests a mode effect.
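The linking procedure can be sketched as follows. This is a minimal illustration, assuming item difficulties on a logit scale and a simple mean-shift linking constant; the slides do not specify the measurement model, the displacement threshold, or the software used, so the function names, the 0.3 threshold, and all item values below are hypothetical.

```python
from statistics import mean

def link_with_screening(paper_b, computer_b, anchors, max_displacement=0.3):
    """Mean-shift linking with iterative removal of displaced anchor items.

    Returns (shift, retained_anchors). Adding shift to a computer difficulty
    places it on the paper scale. The 0.3 threshold is a hypothetical choice.
    """
    retained = list(anchors)
    while True:
        shift = mean(paper_b[i] - computer_b[i] for i in retained)
        displacement = {i: abs(computer_b[i] + shift - paper_b[i]) for i in retained}
        worst = max(retained, key=displacement.get)
        # Stop when every anchor fits, or when too few anchors remain to drop more.
        if displacement[worst] <= max_displacement or len(retained) <= 2:
            return shift, retained
        retained.remove(worst)

def mean_mode_difference(paper_b, computer_b, shift, items):
    """Average (equated computer difficulty minus paper difficulty) over items."""
    return mean(computer_b[i] + shift - paper_b[i] for i in items)

# Hypothetical difficulty estimates in logits, invented for illustration only.
paper_b = {"i1": -1.0, "i2": -0.5, "i3": 0.0, "i4": 0.5, "i5": 1.0, "i6": 0.2}
computer_b = {"i1": -0.8, "i2": -0.3, "i3": 0.2, "i4": 1.5, "i5": 1.3, "i6": 0.5}

# i1-i4 are the linking items; i5 and i6 are non-linking items.
shift, retained = link_with_screening(paper_b, computer_b, ["i1", "i2", "i3", "i4"])
mode_effect = mean_mode_difference(paper_b, computer_b, shift, ["i5", "i6"])
print(retained, round(shift, 2), round(mode_effect, 2))
```

In this invented example, item i4 is dropped from the linking set because of its large displacement, and the remaining positive difference on the non-linking items indicates they were harder on computer after equating.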
The second study – Spring 2002
Two End-of-Course tests:
- Algebra II – 1,305 students
- Biology – 1,882 students
Divisions volunteered to participate. Students were not assigned to conditions.
The second study – Spring 2002
Students took a computer version of a different form than they had taken during the operational spring 2002 administration. The study form was administered operationally on paper in other divisions during spring 2002. Students were able to keep the higher score from the study test or the live test.
The second study – Spring 2002
The computer version of the test was equated to the operational paper version. Linking items with large differences in item parameters were omitted from the anchor set. In the resulting score tables, the same raw score converted to scale scores that differed between paper and computer by 0 to 1 points for Biology and by 1 to 2 points for Algebra II.
The model for moving tests online
Schools are given a choice of testing on paper or online.
The first online form of a test is equated to a paper form.
Subsequent new forms administered online are equated to a previous online form.
Dual mode system:
- New forms administered on paper are equated to a previous paper form.
- New forms administered online are equated to a previous online form.
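The dual mode model can be pictured as each form carrying a link to the form it was equated to. The sketch below walks such a chain and accumulates linking constants back to a base form. The form names, the additive-constant model, and the numeric constants are all hypothetical and serve only to show the structure of the chain.

```python
def cumulative_shift(form, links):
    """Sum additive linking constants from `form` back to the base form.

    `links` maps a form to (reference_form, constant). Returns (total, base_form).
    """
    total = 0.0
    seen = set()
    while form in links:
        if form in seen:
            raise ValueError(f"cycle in equating chain at {form}")
        seen.add(form)
        reference, constant = links[form]
        total += constant
        form = reference
    return total, form

# Hypothetical chain: the first online form links to paper, later online
# forms link to the previous online form, and new paper forms link to paper.
links = {
    "online_1": ("paper_base", -0.20),
    "online_2": ("online_1", 0.05),
    "online_3": ("online_2", -0.10),
    "paper_2": ("paper_base", 0.15),
}

total, base = cumulative_shift("online_3", links)
print(f"online_3 -> {base}: cumulative shift {total:.2f}")
```

Only the first online form is tied directly to paper in this picture; every later online form reaches the paper scale through that single cross-mode link.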
Timeline for bringing SOL tests online

Session        SOL Assessments Added
Fall 2001      Algebra I, Earth Science, English: Reading
Spring 2002    Algebra II, Biology
Fall 2002      VA & US History, World History I, World History II
Spring 2003    World Geography, Chemistry
Spring 2004    Geometry
Spring 2005    Begin middle school
Spring 2006    Begin elementary school
Spring 2013    Paper tests are accommodated forms
Usability and Comparability of Different Devices
Virginia has conducted or participated in various device studies over the past several years:
- Spring 2012 – Writing Test Cognitive Lab: external and on-screen keyboards with tablets
- Fall 2012 – Tablet Usability "Think-Aloud" Study: specific technology-enhanced item types on tablets with 10-inch and 7-inch screens
- Spring 2013 – Quantitative Written Composition Study: student writing on laptop, tablet, or tablet with external keyboard
- Spring 2014 – Quantitative study of Reading, Math, and Science: students were randomly assigned to take a test on computer or tablet