Data Forensics: Review of Findings and Use of Realistic Simulation • Mayuko Simon • Christie Plackner • David Chayer • Data Recognition Corporation • June 2014
• Introduction • What we have learned thus far • Experimentation with realistic simulation
• Supporting clients with multiple measures to allow for more information and perspective on the data • Focus is not on students • Aware of emerging guidelines and best practices – TILSA Test Security Guidebook (Olson & Fremer, 2013) – CCSSO Operational Best Practices, part 2 (September 2013) – Testing Integrity Symposium: Issues and Recommendations for Best Practice (U.S. Dept of Education, Institute of Education Sciences, National Center for Education Statistics, 2013) – Testing and Data Integrity in the Administration of Statewide Student Assessment Programs (NCME, October 2012) – Handbook of Test Security (Wollack & Fremer, 2013) – Conference on the Statistical Detection of Test Fraud – Test Fraud: Statistical Detection and Methodology (Kingston & Clark, 2014)
A Sample of Forensic Methods • Erasure • Scale Score • Pattern Analysis • Model Fit • Local Outlier Detection
WR Erasure Distribution • Wrong-to-right (WR) erasure rate higher than expected from random events • The baseline for the erasure analysis is the state average • Statistical test resulting in an Outlier Score
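The slide does not name the statistical test behind the Outlier Score. As a minimal sketch, assuming a one-sided z-test of a school's mean WR erasure count against the state baseline, the comparison could look like the following. The function names and the choice of a z-test are illustrative assumptions, not the presenters' method.

```python
import math

def wr_outlier_z(school_wr_counts, state_mean, state_sd):
    """z statistic for a school's mean WR count against the state baseline.

    Assumption: state_mean and state_sd describe per-student WR erasure
    counts across the whole state.
    """
    n = len(school_wr_counts)
    school_mean = sum(school_wr_counts) / n
    standard_error = state_sd / math.sqrt(n)
    return (school_mean - state_mean) / standard_error

def upper_tail_p(z):
    """One-sided p-value under a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical data: four students with 4 WR erasures each,
# against a state mean of 2 and a state SD of 2.
z = wr_outlier_z([4, 4, 4, 4], state_mean=2.0, state_sd=2.0)
print(z, upper_tail_p(z))
```

A small p-value says only that the WR rate is higher than expected from random events; it does not identify the cause.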
Erasure Map: Typical Behavior
Math Session 3
Secure ID  Total WR  Math WR  Read WR  MLevel  MLevel*  RLevel  RLevel*  50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
21         3         2        1        Bel     Bel      Pro     Pro      4* 3  A  1  A* 3  2  D  C  4  1  B  3  A  B  3  2  D  A  3  3  2  4  3
22         3         2        1        Bel     Bel      Bel     Bel      2  1  3  4* 2  2  A  3  2  4  1  3  D  2  4  1  D  3  A  C* C  D  2  C
23         3         3        0        Adv     Adv      Adv     Adv      C* 1  3  1  A  2* 2  D  C  C  D  B  D  A  B  B  A* D  B  C  B  C  2* A
24         3         1        2        Adv     Adv      Adv     Adv      C  B  A  B  A  2  A  D  C  C  D  B  2  A  B  B  A  2  B  C  B  C  D  A
25         3         1        2        Bel     Bel      Bas     Bas      1  B  A  4  2* 1  4  3  4  2  D  4  D  1  1  C  B  4  1  A  B  1  1  1
26         3         2        1        Pro     Pro      Bas     Bel      C  B  3  1  A  D  2  D  2  C  1  B  D  4  A  B  D  B* A  C  C  2  C  1
27         3         1        2        Bas     Bas      Bel     Bel      C  B  A  B  A  1  A  D  4  1  1  4  1  1  2  4  2  A  C  4  1  D  2  4
28         3         2        1        Bel     Bel      Bel     Bel      C* B  2  4  2  1  2  D  4  4  D* 3  1  A  4  4  2  4  1  2  C  3  B  A
29         3         2        1        Bel     Bel      Bel     Bel      2* 1  3  B  2  2  2  1  C  2  1  1  D  1  B* C  3  0  B  C  4  B  B  A
30         2         2        0        Bas     Bas      Bel     Bel      C* B  4  B  A  1  2* D  1  2  D  B  2  A  B  4  B  D  4  D  2  A  1  2
31         2         2        0        Adv     Pro      Bel     Bel      2  1  2  4  2  2  4  1  2  1  3  B  D  2  4  3  D  4  C  A  C  D  2  B
32         2         2        0        Bel     Bel      Bel     Bel      1  1  2  B  3  2  3  D  C  4  D  B  3  4* 2  4  2  2  D  C  1  4  3  2
33         2         0        2        Adv     Adv      Adv     Adv      C  B  A  B  A  2* A  D  C  C  D  B  D  A  B  B  A  D  A  A  D  D  B  A
34         2         1        1        Bas     Bel      Pro     Pro      C  B  2  B  4  2  3  1  2  C  D  1  2  2  4  C  A  3  C  C  D  1  B  4
35         2         2        0        Bel     Bel      Bel     Bel      C* B  3  B  2  D  3  2  4  1  3  B  3  4  B  4  3  1  B  3  2  A  1  3
36         2         2        0        Bas     Bel      Pro     Pro      2  4  4  3  4  2  3  D  2  1  2  B  D  A  1  4  B  D* 4  3  2  A  D  3
37         2         0        2        Bel     Bel      Bas     Bel      4  3  2  3  2  1  3  D  C  2  1  B  D  3  B  3  A  D  3  3  D  D  1  4
38         2         1        1        Bel     Bel      Bas     Bas      2  4  3  1  4  2  2  D  C  2  1  4  3  3  2  4  B  4  3  A  4  3  B  1
39         2         2        0        Bas     Bas      Bel     Bel      2  1  2  4  2  2  3  2  C  2  1  B  2  C  B  1  4  3  2  4  3  2  1  C
40         1         0        1        Bas     Bas      Pro     Pro      C  B  3  3  A  2  3  1  2  4  1  B  D  4  B  4  B  D  1  2  2  4  D  A
Erasure Map: Atypical Behavior
Math Session 3
Secure ID  Total WR  Math WR  Read WR  MLevel  MLevel*  RLevel  RLevel*  50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
1          21        16       5        Pro     Bas      Pro     Pro      2* B  A* B  A  3  A  D* 2  4  1  1  1  3  B  1  A  1  3  C  B  4  D  A
2          20        11       9        Adv     Adv      Adv     Pro      C* B  A  B  A* 2  2  D  C  C  D  B  D* 4  1  C  1  D  B  2* A* B  B  A
3          20        20       0        Adv     Bas      Bel     Bel      C  4  A* B* A  D  A  2  C* C* D  B  D  2  C  4  2  1  D* 2  C  B  B  4
4          19        18       1        Pro     Bas      Adv     Adv      2  4  3  B  A* 2* 4  D* C  C* D  3  D  2  B* B  A  3* B* C* 4  C* D  A
5          19        13       6        Adv     Pro      Adv     Pro      C  B  A  B  A* D  2  D  2  C* D  B  D* A  B  4  3  D  3  3  C  4  D  2
6          19        14       5        Adv     Pro      Pro     Pro      C* B* 2  1  A* D  A  D  C* 4  D  B  D  3  1  2  3  1  B  C  A  B  B  3
7          18        8        10       Adv     Pro      Adv     Pro      1  B  3  B  A  D  3* D  2  C  1  4  3  C  B  4  3  2  2  C  D  C  1  4
8          18        12       6        Adv     Pro      Adv     Pro      C  B* A  1  A  2  A  D* C  4  2  B  D  A  C  4  D* 4  3  C  C  B* B  4*
9          18        11       7        Adv     Pro      Adv     Pro      2  1  A  3  4  2  A  1  1  2  D  B  D  4  1  4  B  3  4  2  2  A  3  4
10         17        11       6        Adv     Pro      Adv     Pro      C  3  A  B  A  2  A  1  C* C* D  B  1  C  A  B  D  B  A  C  C  D  C  1
11         17        8        9        Adv     Adv      Adv     Adv      C  B  A  B* A  D  3  D  C  C  D  B  D  C  2  B  D  A  C  2  C  D  C  B
12         17        11       6        Adv     Pro      Adv     Pro      C  B  A* B  A* 2  A* D* 2  C  D  B  D* C  A  1  D  B  A  C  C  D  C  C
13         16        9        7        Adv     Pro      Adv     Adv      4  B  A  4  A  D  2  D  C  C  D  B  1  C  A  B  D  A  C  2  1  D  C  B
14         16        3        13       Adv     Adv      Adv     Pro      1  B  A  3  2  D  A  D  C  C  D  B  D  A  B  C  B  D* B  2  2  A  D  A
15         16        11       5        Adv     Adv      Adv     Adv      C* B  A  B  A* 2* A  D  C* C* D  B  D  4  B  C  D  3  B  C  A  B  B  A
16         16        4        12       Adv     Adv      Adv     Pro      C* B  A  3  A  D  A  D  C  C  D  B  D  C  A  B  D  B  A  C  C  D  C  C
17         16        10       6        Adv     Adv      Adv     Adv      C* B  A  1  A  D  2  D  2  C  D  B  D  A  B  B  A  D  B  C  B  C  D  A
18         16        8        8        Bel     Bel      Bas     Bel      1  3* A  3  A  3  A  3  4  C  2  4  1  3* B  C  1  3  B  C  2  4  B  4
19         16        13       3        Adv     Pro      Adv     Adv      C  B  A  B  A  1  A  D  C  C  D  B  D  3  C  2  B* A  D  A  3  D  B  2
20         15        14       1        Adv     Pro      Pro     Pro      C* B  A* B  A  D  A  D  C* C  D  B* D  3  2  A  3  2  2  C  4  3  B  3
Erasure by Test Mode • Erasure behavior may differ by test mode (Primoli & Liassou, 2013)
Scale Score Changes • Scale score changes statistically higher or lower than the previous year • Cohort and Non-cohort • Statistical test resulting in an Outlier Score
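The slide does not specify how the change is tested. One possible sketch, under the assumption that each school's year-to-year change in mean scale score is standardized against the changes of all schools, is shown below. For a non-cohort comparison the two inputs would be the same grade in consecutive years; for a cohort comparison they would follow the same students from one grade to the next. The function name and inputs are hypothetical.

```python
import statistics

def change_z_scores(previous_means, current_means):
    """Standardize each school's change in mean scale score.

    Assumption: both arguments map a school ID to its mean scale score,
    and large positive or negative z values mark candidate outliers.
    """
    changes = {s: current_means[s] - previous_means[s] for s in current_means}
    values = list(changes.values())
    centre = statistics.mean(values)
    spread = statistics.pstdev(values)  # assumes the changes are not all equal
    return {s: (c - centre) / spread for s, c in changes.items()}

# Hypothetical school means for two consecutive years.
previous = {"A": 500, "B": 510, "C": 520}
current = {"A": 500, "B": 510, "C": 550}
print(change_z_scores(previous, current))
```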
Pattern Analysis • Modified Jacob and Levitt • Combination of two indicators: – Index 1: unexpected test score fluctuations across years using a cohort of students, and – Index 2: unexpected patterns in student answers • Modified application of Jacob and Levitt (2003) – 2 years of data – Sample size
Measurement Model Misfit • Performed better or worse than expected • Rasch residuals summed across operational items and students
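As a rough illustration of the idea, the sketch below sums raw Rasch residuals (observed score minus model-expected probability) over students and items for dichotomous items. Whether the presenters used raw or standardized residuals is not stated, so the raw form is an assumption, and all names are hypothetical.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def summed_residual(responses, thetas, difficulties):
    """Sum of (observed - expected) over all students and items.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    A large positive sum means the group did better than the model expects;
    a large negative sum means it did worse.
    """
    total = 0.0
    for row, theta in zip(responses, thetas):
        for observed, difficulty in zip(row, difficulties):
            total += observed - rasch_probability(theta, difficulty)
    return total

# Hypothetical group: two average-ability students answer two
# average-difficulty items correctly, so each response beats
# the expected 0.5 by 0.5.
print(summed_residual([[1, 1], [1, 1]], [0.0, 0.0], [0.0, 0.0]))
```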
Regression Based Local Outlier Detection • We wish to find schools that are very similar to their peers in most respects (in terms of most independent variables) but differ significantly in the current year’s score (the dependent variable).
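The slides do not give the RegLOD algorithm itself. The sketch below is not RegLOD; it is a simplified nearest-peer comparison, offered only to illustrate the stated goal: for each school, find the k schools closest in the independent variables, then measure how far its dependent variable sits from the peer average. The function name, the Euclidean distance, and k are all assumptions.

```python
import math
import statistics

def peer_gap_scores(features, outcomes, k=3):
    """Standardized gap between each school's outcome and its peers' mean.

    features[i] is a tuple of independent variables for school i (assumed
    to be on comparable scales); outcomes[i] is its dependent variable.
    """
    gaps = []
    for i, x_i in enumerate(features):
        # k nearest other schools in independent-variable space.
        nearest = sorted(
            (math.dist(x_i, x_j), j)
            for j, x_j in enumerate(features) if j != i
        )[:k]
        peer_mean = statistics.mean(outcomes[j] for _, j in nearest)
        gaps.append(outcomes[i] - peer_mean)
    centre = statistics.mean(gaps)
    spread = statistics.pstdev(gaps)
    if spread == 0:
        return [0.0 for _ in gaps]
    return [(g - centre) / spread for g in gaps]

# Hypothetical data: five schools with similar prior scores;
# the last one has an unusually high current score.
prior = [(500,), (501,), (502,), (503,), (504,)]
now = [500, 501, 502, 503, 530]
print(peer_gap_scores(prior, now, k=2))
```

Note that a neighbor of an outlying school can receive a depressed score because the outlier inflates its peer mean; a regression-based approach is one way to reduce that effect.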
RegLOD Example: Grade 4 Reading • DV: 2011 Reading (G4) • IV: 2011 Math (G4), 2010 Reading (G4), 2010 Math (G4), 2010 Cohort Reading (G3), 2010 Cohort Math (G3) • R² = 0.99
RegLOD Findings • RegLOD has shown great promise • Its applicability is not limited to cheating detection in educational testing • Its robust, model-based design (the concept of dependent and independent variables in data mining) and its ability to adapt make it applicable to a wide range of outlier detection problems • We continue to study its capabilities and to extend and apply it to other contexts and tasks
Multiple Methods Comparison • Used PCA to determine whether the multiple methods can be reduced to a more efficient set • All methods seem to account for variation in detecting test-taking irregularities • Methods accounting for the most variation: – Cohort regression – Cohort scale score change – Cohort performance level change
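To make the PCA step concrete, here is a minimal standard-library sketch that estimates how much of the total variance in several methods' outlier scores is captured by the first principal component of their correlation matrix, using power iteration. The data layout (one list of school-level scores per method) and the function names are assumptions; the slides do not describe the actual analysis setup.

```python
import math
import statistics

def correlation_matrix(columns):
    """Correlation matrix of equal-length score lists (one list per method)."""
    def standardize(column):
        mean = statistics.mean(column)
        sd = statistics.pstdev(column)  # assumes each method's scores vary
        return [(v - mean) / sd for v in column]
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def first_component_share(columns, iterations=200):
    """Share of total variance explained by the first principal component."""
    matrix = correlation_matrix(columns)
    size = len(matrix)
    # Uneven start vector, so it is unlikely to be orthogonal
    # to the leading eigenvector.
    vector = [float(i + 1) for i in range(size)]
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(p * p for p in product))
        vector = [p / norm for p in product]
    eigenvalue = sum(
        v * sum(m * w for m, w in zip(row, vector))
        for v, row in zip(vector, matrix)
    )
    # The trace of a correlation matrix equals the number of methods.
    return eigenvalue / size

# Hypothetical scores: two methods that rank schools identically.
print(first_component_share([[1, 2, 3, 4], [2, 4, 6, 8]]))
```

A share near 1 would suggest the methods are largely redundant, while a share near 1 divided by the number of methods would suggest each contributes distinct information.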
What We Need to Try… • Using empirical data has a drawback: we don’t know how accurately we are detecting aberrant behavior • A typical simulation study uses simulated data, which is not real data • Solution: use real data and simulate aberrant behavior to examine the sensitivity of the methods.
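The proposed solution can be sketched as follows: start from real scored responses and overwrite a chosen number of each student's incorrect answers with correct ones, imitating wrong-to-right changes. This is a hedged illustration only; the selection rule (random items), the function name, and the parameters are assumptions, and the presenters' actual tampering rules are not given in the slides.

```python
import random

def inject_wr_changes(scored, n_items, seed=0):
    """Return a copy of scored responses with up to n_items wrong answers
    per student changed to correct.

    scored[s][i] is 1 for a correct response and 0 for an incorrect one.
    The original data are left untouched so that before-and-after results
    can be compared.
    """
    rng = random.Random(seed)
    altered = []
    for row in scored:
        wrong = [i for i, x in enumerate(row) if x == 0]
        chosen = set(rng.sample(wrong, min(n_items, len(wrong))))
        altered.append([1 if i in chosen else x for i, x in enumerate(row)])
    return altered

# Hypothetical real data for two students on four items.
real = [[0, 0, 1, 0], [1, 1, 1, 0]]
print(inject_wr_changes(real, n_items=2))
```

Because the tampered students are known, any detection method can then be scored on how many of them it flags.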
Realistic Simulation Design
Detection Techniques 1. Erasure analysis 2. Scale score (SS) analysis (non-cohort) 3. Cohort scale score (SSco) analysis 4. Measurement Model Misfit 5. Modified Jacob and Levitt: Index 2
Before and After
Sensitivity: Consistent Cheating
Median
    Erasure  SS  SSco  Rasch  MJL
6   100      7   0     1      24
12  100      13  4     7      40
18  100      29  30    10     44
Sensitivity: Copying Cheating
Median
    Erasure  SS  SSco  Rasch  MJL
6   72       0   0     1      18
12  97       3   0     1      32
18  99       8   1     3      41
Sensitivity: Random Cheating
Median
    Erasure  SS  SSco  Rasch  MJL
6   100      20  21    0      6
12  100      63  68    0      31
18  100      88  93    0      44
Sensitivity: Ability Cheating
Median
    Erasure  SS  SSco  Rasch  MJL
6   89       3   0     0      9
12  99       7   1     0      21
18  100      14  8     0      28
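For readers who want to reproduce figures of this kind, sensitivity can be computed as the percentage of known tampered units that a method flags. The sketch below assumes that a unit is flagged when its outlier score meets a cutoff and that "Median" refers to the median over repeated simulation runs; both are assumptions, as are the function names.

```python
import statistics

def sensitivity_percent(outlier_scores, cutoff):
    """Percentage of tampered units whose outlier score meets the cutoff."""
    flagged = sum(1 for score in outlier_scores if score >= cutoff)
    return 100.0 * flagged / len(outlier_scores)

def median_sensitivity(replications, cutoff):
    """Median sensitivity across simulation replications (assumed design)."""
    return statistics.median(
        sensitivity_percent(scores, cutoff) for scores in replications
    )

# Hypothetical outlier scores for four tampered classrooms.
print(sensitivity_percent([5.2, 1.0, 3.3, 4.1], cutoff=3.0))
```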