Developing Scale Scores & Cut Scores for On-Demand Assessments - PowerPoint PPT Presentation

  1. Developing Scale Scores & Cut Scores for On-Demand Assessments of Individual Standards. Nathan Dadey¹, Shuqin Tao², and Leslie Keng¹. NCME - New York, NY, April 16th, 2018

  2. Context • Much work has been done on improving a single assessment, in terms of efficiency and information. – Although the definition of an “assessment” continues to blur. • This work takes a different tack, instead examining how scale scores and cut scores can be developed for a set of assessments, motivated by the concept of a system of assessments. 4/16/2018 On-Demand Assessments of Individual Standards 2

  3. Context, Continued (Grade 4 Math) Key to this set of assessments is the idea of modularity.

  4. Context, Continued (Grade 4 Math) Key to this set of assessments is the idea of modularity. Consider this hypothetical example: 1: Place Value. Say a student takes a quiz, or “mini-assessment,” on place value at the beginning of the year.

  5. Context, Continued (Grade 4 Math) Key to this set of assessments is the idea of modularity. Consider this hypothetical example: 1: Place Value. 2: Compare Whole Numbers. Then the student takes another mini-assessment, on comparing whole numbers.

  6. Context, Continued (Grade 4 Math) Key to this set of assessments is the idea of modularity. Consider this hypothetical example: 1: Place Value. 2: Compare Whole Numbers. 3: Add and Subtract Whole Numbers. And so on….

  7. Context, Continued (Grade 4 Math) Key to this set of assessments is the idea of modularity. Consider this hypothetical example: Let’s say the student also takes a “general” purpose assessment that surveys the full set of standards.

  8. Context, Continued (Grade 4 Math) Key to this set of assessments is the idea of modularity. Consider this hypothetical example: Then the full set of assessments for this hypothetical student might look like ↓


  10. Given data like this, how can we make sense of it? In particular, how can we develop scale scores and achievement-level classifications?

  11. Research Questions 1. In what ways can the mini-assessments be scaled? 2. How can provisional mastery classifications be created based on the mini-assessment results? This work is exploratory and presents a picture of our first efforts to tackle this unique type of assessment in the context of fourth grade mathematics.

  12. Measures • Assessments of Fourth Grade Mathematics based on the Common Core State Standards • Two types of on-demand, computer-administered assessments: – 31 “mini-assessments” aligned to individual standards – A “general assessment” of the standards broadly (adaptive and vertically scaled)

  13. Mini-Assessments (31): individual standards (e.g., 4.NBT.A.1); flexibly administered; open access to items; short & fixed forms (7 items); machine scored with instant reporting; non-overlapping (no common items). General Assessment: CCSS fourth grade mathematics broadly; secure; longer & adaptive (66 items max); adaptive from the same item pool; reports scale scores, CCSS domain subscores, & classifications on individual standards.

  14. Data • 2016-2017 academic year • 91,440 students took at least one mini-assessment & the general assessment • Mini-Assessments – Approximate number of administrations per mini-assessment: ranges from 3,000 to 47,000, with a mean of 12,000 and a median of 8,000 – Approximate number of forms per student: ranges from 1 to 80, with a median of 6 and a mean of 7.6 (including re-tests)

  15. RQ1: Scaling the mini-assessments

  16. One Set of Possible Approaches Conduct Rasch scaling, placing the mini-assessments onto: • the scale of the general assessment (via a fixed theta calibration approach). • a single scale across all mini-assessments. • CCSS domain specific scales (5 in all). • individual scales for each mini-assessment.
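The fixed theta calibration mentioned above can be sketched in a few lines. This is a hypothetical, minimal illustration (not the authors' code): it estimates a single Rasch item difficulty by Newton-Raphson while holding person abilities fixed, as if the thetas had come from the general assessment's scale. The function names and toy data are invented for illustration.

```python
import math

def rasch_p(theta, b):
    """Rasch model: probability of a correct response for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def calibrate_item(responses, thetas, iters=25):
    """Estimate one item's difficulty b with person abilities held fixed.

    Newton-Raphson on the Rasch log-likelihood; assumes the item is neither
    answered correctly by everyone nor missed by everyone.
    """
    b = 0.0
    for _ in range(iters):
        p = [rasch_p(t, b) for t in thetas]
        score = sum(pi - xi for pi, xi in zip(p, responses))  # dLogL/db
        info = sum(pi * (1.0 - pi) for pi in p)               # -d2LogL/db2
        b += score / info                                     # Newton step
    return b

# Toy data: abilities fixed from an external scale, half the responses correct.
thetas = [-1.0, -1.0, 1.0, 1.0]
responses = [0, 1, 0, 1]
print(round(calibrate_item(responses, thetas), 4))  # symmetric data -> b near 0
```

Because the thetas are frozen, each mini-assessment item lands directly on the external scale rather than defining its own.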

  18. Domain Scaling Approach • Create unidimensional scales for each CCSS Domain using the Rasch Model • Use a pooled item response matrix (item responses from different time points and different administration patterns) – Best case for detecting multidimensionality

  19. Domain Scaling Approach • Examine results in terms of: – Unidimensionality via Principal Components Analysis of Item Residuals – Model Fit (Unweighted and Weighted Mean Squared Fit Statistics)
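The residual PCA check can be sketched as follows. This is a minimal illustration under the simplifying assumption that model-expected probabilities are already in hand; the function name and toy data are invented, not the study's.

```python
import numpy as np

def residual_eigenvalues(X, P):
    """Eigenvalues (largest first) of the correlation matrix of standardized
    Rasch residuals. A dominant first eigenvalue suggests a secondary
    dimension beyond the one the Rasch model has already absorbed.

    X: observed 0/1 responses (persons x items); P: model-expected probabilities.
    """
    R = (X - P) / np.sqrt(P * (1.0 - P))   # standardized residuals
    C = np.corrcoef(R, rowvar=False)       # item-by-item residual correlations
    return np.linalg.eigvalsh(C)[::-1]

# Toy check: two items with identical residual patterns load on one component.
X = np.array([[1, 1], [0, 0], [1, 1], [0, 0]], dtype=float)
P = np.full_like(X, 0.5)                   # model-expected probabilities
print(residual_eigenvalues(X, P))          # first eigenvalue large, second ~0
```

In practice the residual correlations would be computed on the pooled item response matrix within each domain, with missing-by-design cells handled pairwise.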

  20. Results - PCA [figure]: across domains, the variance explained by the first component of item residuals does not exceed 2%.

  21. Results – Item Fit (Weighted MS)

      Domain                           | % < 0.75 | % > 1.33 | # Items
      Operations & Algebraic Thinking  | 0%       | 1%       | 72
      Numbers & Operations - Base Ten  | 0%       | 0%       | 72
      Numbers & Operations - Fractions | 0%       | 0%       | 108
      Measurement & Data               | 0%       | 2%       | 84
      Geometry                         | 3%       | 3%       | 36
      Max                              | 3%       | 3%       |

  22. Future Directions • Additional Dimensionality Investigations – EFA – DIMTEST & DETECT – Comparison Data • Modeling Approaches – Multigroup on time (e.g., month) – Selecting data that best matches recommended instructional sequences – Other models (e.g., treating the tests as attributes in a “system level DCM”; longitudinal Rasch model)

  23. RQ2: Creating Classifications

  24. One Set of Possible Approaches Create preliminary cut scores, and thus student classifications, based on: • Cluster analysis (e.g., what DCMs devolve into with one attribute) • Content expert judgments • The relationship between each mini-assessment and the matching standard classification from the general assessment
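The cluster-analysis option can be illustrated with a minimal one-dimensional two-means sketch; with a single attribute, a DCM reduces to essentially this kind of two-group split. The function name and raw scores below are hypothetical.

```python
def two_cluster_cut(scores, iters=25):
    """Boundary between two 1-D k-means clusters of raw scores.

    Assumes the scores actually separate into two groups (neither cluster
    may become empty). The midpoint of the two cluster means serves as a
    data-driven candidate cut score.
    """
    m1, m2 = float(min(scores)), float(max(scores))
    for _ in range(iters):
        g1 = [s for s in scores if abs(s - m1) <= abs(s - m2)]
        g2 = [s for s in scores if abs(s - m1) > abs(s - m2)]
        m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
    return (m1 + m2) / 2.0

# Hypothetical raw scores on a 7-item mini-assessment form:
scores = [0, 1, 1, 2, 5, 6, 6, 7]
print(two_cluster_cut(scores))   # cut falls between the two score clusters
```

A purely statistical cut like this carries no content meaning on its own, which is why the slide pairs it with expert judgment and the general-assessment relationship.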

  25. The Prediction Approach • Predict the probability of the “can do” classification from the general assessment using the raw scores from the mini-assessment. • To do so, conduct quantile regression where – the dependent variable is the probability of classification from the general assessment administration closest to the student’s mini-assessment administration – the independent variables are the mini-assessment raw score and the difference between administrations (in days) • Evaluate at multiple probabilities & quantiles
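Once predicted probabilities of the “can do” classification are in hand for each raw score, a cut can be read off by interpolation; fractional cut values arise exactly this way. The sketch below is hedged: the helper and the probability values are invented for illustration, not the study's estimates.

```python
def cut_score(probs, target):
    """Smallest (interpolated) raw score whose predicted probability of the
    "can do" classification reaches `target`.

    probs[s] is the predicted probability at raw score s, assumed monotone
    non-decreasing with probs[0] below target; linear interpolation between
    adjacent raw scores yields fractional cuts.
    """
    for s in range(1, len(probs)):
        if probs[s] >= target:
            lo, hi = probs[s - 1], probs[s]
            return (s - 1) + (target - lo) / (hi - lo)
    return None   # target never reached on this score range

# Invented predicted probabilities for raw scores 0..7 on a 7-item form:
p = [0.05, 0.10, 0.20, 0.35, 0.50, 0.60, 0.70, 0.85]
print(cut_score(p, 0.67))   # interpolates between raw scores 5 and 6
```

Evaluating the same lookup at several target probabilities (e.g., 0.50 and 0.67) and several fitted quantiles gives the range of candidate cuts the slides compare.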

  26. Mini-Assessment 1A - Place Value [Figure: quantile regressions of the probability of “can do” (indicator mastery) on mini-assessment total score, with reference lines at P = 0.67 and P = 0.50 and an annotated cut of 7.2.] This value seems reasonable, but the value for P = 0.67 is outside the range of most of the quantiles. To investigate further, we looked at the relationship using only data from the second half of the year.

  27. Mini-Assessment 1A - Place Value, After January 1st, 2017 [Figure: the same relationship restricted to second-half data, with reference lines at P = 0.67 and P = 0.50 and an annotated cut of 5.5.] But… the quantile regression controlled for time?

  28. What’s going on? It comes down to the use case for each type of assessment. In general, the general assessment classification rate increases over the year, while the mini-assessment total scores do not. [Figure: trends over the year, General Assessment vs. Mini-Assessment.]

  29. Future Directions Further examine the time issue. • Re-sample to have equal numbers of administrations by month? • Look at changes in scores on the mini-assessments?
