Automated Scoring and Rater Drift
National Conference on Student Assessment, Detroit, 2010
Wayne Camara, The College Board
Rater Drift
• When ratings are made over a period of time, there is a concern that they may become more lenient or more harsh.
• This occurs in all rating contexts: performance appraisals, scoring performance assessments, judging athletic events…
• The risk increases when:
  • Rubrics (criteria) are more subjective.
  • Scoring occurs over time (within year, between years).
  • There is pressure to score many tasks quickly.
Detecting and Correcting Rater Drift
• Tools may differ between assessments completed on paper and on computer.
• Multiple readers, with mixed assignments.
• Read-behinds.
• Seed papers from a previous administration; benchmark papers with an established mark (see the sketch after this list).
• Calibration of readers; retraining.
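To make the seed-paper approach concrete, here is a minimal sketch, not taken from the presentation, of how seed papers carrying an established mark can be used to flag readers whose scoring has drifted toward leniency or harshness. The reader IDs, scores, and tolerance are hypothetical.

```python
# Minimal sketch (hypothetical data): flag readers whose scores on seed papers
# deviate systematically from the established mark.
from statistics import mean

# reader -> list of (score_given, established_mark) pairs on seed papers
seed_results = {
    "reader_01": [(4, 4), (3, 4), (4, 4), (2, 3), (3, 3)],
    "reader_02": [(5, 4), (4, 3), (5, 4), (4, 3), (4, 4)],  # consistently above the mark
}

FLAG_THRESHOLD = 0.5  # mean deviation (score points) that triggers recalibration; illustrative value

for reader, pairs in seed_results.items():
    bias = mean(given - mark for given, mark in pairs)  # positive = lenient, negative = harsh
    status = "recalibrate/retrain" if abs(bias) >= FLAG_THRESHOLD else "within tolerance"
    print(f"{reader}: mean deviation {bias:+.2f} -> {status}")
```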
Automated Scoring
• Automated scoring systems handle essays, spoken responses, short content items, and numerical/graphical responses to math questions (with a verifiable, limited set of correct responses).
• Typically evaluated through comparison with human readers:
  • Correlations and weighted kappa, which is preferred over percent agreement because percent agreement is misleading and sensitive to the rating scale (4-point vs. 9-point); a weighted-kappa sketch follows this list.
  • Exact and adjacent agreement are likewise affected by the score scale (4-point vs. 9-point).
  • Score distributions should be similar to human readers' (variation in ratings, use of the extremes of the scale).
• Also validated against external criteria (other test sections, previous scores on the same test, scores on similar tests, grades).
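As an illustration of the weighted-kappa comparison mentioned above, the sketch below computes quadratic weighted kappa between a human reader and an engine on a 4-point scale. The scores are invented, and the presentation does not prescribe this particular implementation; it is only one common way to compute the statistic.

```python
# Minimal sketch (hypothetical scores): quadratic weighted kappa between a
# human reader and an automated engine on a 4-point scale.
from collections import Counter

def quadratic_weighted_kappa(a, b, min_score, max_score):
    """Agreement statistic that penalizes large disagreements more heavily."""
    cats = range(min_score, max_score + 1)
    n = len(a)
    obs = Counter(zip(a, b))          # observed joint counts
    marg_a, marg_b = Counter(a), Counter(b)
    span = (max_score - min_score) ** 2
    num = den = 0.0
    for i in cats:
        for j in cats:
            w = (i - j) ** 2 / span                  # quadratic disagreement weight
            num += w * obs[(i, j)]                   # weighted observed disagreement
            den += w * marg_a[i] * marg_b[j] / n     # weighted disagreement expected by chance
    return 1.0 - num / den

human  = [3, 2, 4, 3, 1, 2, 4, 3, 2, 3]
engine = [3, 2, 3, 3, 2, 2, 4, 4, 2, 3]
print(round(quadratic_weighted_kappa(human, engine, 1, 4), 3))
```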
Automated Scoring: Issues to Consider in Using Scores to Detect Drift
• The rubric: general vs. task specific; holistic vs. mechanistic; unidimensional.
• Using other sections of the test (such as the MC items) as a criterion is useful, but MC items also have weaknesses.
• The relationship between performance tasks and MC items is expected to differ (they presumably measure different parts of the construct), so consistency across tasks must be established before the MC-section correlation is used as a criterion.
• Works best when computed separately for each dimension (not the combined score) and for each rater (not the total score); a per-rater sketch follows this list.
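The per-rater check in the last bullet might look like the following sketch, which correlates each rater's essay scores with the same examinees' MC-section scores and compares the result to a baseline from a period of stable scoring. The raters, data, baseline, and cutoff are all hypothetical, and `statistics.correlation` requires Python 3.10 or later.

```python
# Minimal sketch (hypothetical data): per-rater correlation between essay
# scores and MC-section scores, compared against a baseline value.
from statistics import correlation  # available in Python 3.10+

# rater -> list of (essay_score, mc_score) pairs for the papers that rater scored
by_rater = {
    "rater_A": [(2, 18), (3, 25), (4, 31), (3, 27), (1, 12), (4, 29)],
    "rater_B": [(4, 15), (4, 30), (3, 22), (2, 28), (4, 19), (3, 21)],
}

BASELINE_R = 0.55   # illustrative correlation from earlier, stable scoring
MAX_DROP   = 0.20   # illustrative tolerance before flagging a rater for review

for rater, pairs in by_rater.items():
    essay, mc = zip(*pairs)
    r = correlation(essay, mc)
    status = "review" if (BASELINE_R - r) > MAX_DROP else "ok"
    print(f"{rater}: r(essay, MC) = {r:+.2f} -> {status}")
```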
Papers by Lottridge and Schulz: Best Practices
• The scoring engine must be trained; if drift exists, training on papers from a brief time period can introduce the same error into the system.
• Note that raters tend to avoid extreme scores, and some AS systems also avoid extreme scores.
• Selection of the training sample: tasks already calibrated, representativeness of the tasks.
• Compare reader agreement AND the distribution of scores across all readers (see the distribution sketch after this list).
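One way to compare score distributions across readers and the engine, as the last bullet suggests, is simply to tabulate the proportion of papers at each score point and look at the tails. The scores below are invented to show an engine that under-uses the extremes of a 4-point scale.

```python
# Minimal sketch (hypothetical scores): compare how often each scorer uses
# each point on the scale, especially the extremes.
from collections import Counter

def score_distribution(scores, scale=(1, 2, 3, 4)):
    counts = Counter(scores)
    n = len(scores)
    return {s: counts[s] / n for s in scale}

human_scores  = [1, 2, 2, 3, 3, 3, 4, 4, 2, 3, 1, 4]
engine_scores = [2, 2, 2, 3, 3, 3, 3, 3, 2, 3, 2, 3]   # avoids 1s and 4s

for label, scores in (("human", human_scores), ("engine", engine_scores)):
    dist = score_distribution(scores)
    print(label.ljust(7), ", ".join(f"{s}: {p:.0%}" for s, p in dist.items()))
```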
Papers by Lottridge and Schulz: Best Practices
• Year-to-year drift should be checked (e.g., rescore papers, N = 500 to 1,000; a rescoring sketch follows this list).
• Intrareader correlations and agreement increased over time.
• AS is treated as a single scorer in the comparison with each reader.
• The papers propose using AS as a second reader or solely to monitor reader quality.
• Its utility as a second reader rests on the knowledge that AS focuses on selected dimensions (grammar, mechanics, vocabulary, semantic content or relevance, organization).
• AS does not evaluate rhetorical skills, voice, the accuracy of the concepts described, or whether arguments are well founded.
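A minimal sketch of the year-to-year rescoring check in the first bullet: rescore a sample of last year's papers this year and compare the new scores with the originals. The scores, sample size, and tolerance are illustrative only, and `statistics.correlation` requires Python 3.10 or later.

```python
# Minimal sketch (hypothetical scores): year-to-year drift check on a sample
# of papers rescored from the previous administration.
from statistics import mean, correlation  # correlation: Python 3.10+

original_scores = [3, 2, 4, 3, 1, 2, 4, 3, 2, 3, 4, 2]   # last year's operational scores
rescored_scores = [3, 3, 4, 4, 2, 2, 4, 3, 3, 3, 4, 3]   # same papers, rescored this year

shift = mean(r - o for o, r in zip(original_scores, rescored_scores))
r_yy  = correlation(original_scores, rescored_scores)

print(f"mean score shift: {shift:+.2f} points")          # positive = this year more lenient
print(f"year-to-year correlation: {r_yy:.2f}")
if abs(shift) > 0.25:                                    # illustrative tolerance
    print("possible drift: review calibration and retrain readers or the engine")
```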
Automated Scoring
• Some cautions, but also promise.
• Common Core assessments: many leading proponents of the different assessment models have overestimated the efficacy of AS and underestimated the cost and time required.
• As noted earlier, AS is limited to certain types of tasks and subjects.
• As noted in the papers, AS does not make judgments; it scores on selected features. Readers also score on context and differential features, but they can make judgments and consider all aspects of a paper (if time permits).
• AS is moving beyond the big three (ETS, Vantage, PEM) to many new players, including Pacific Metrics, AIR…