EMPIRICAL ANALYSIS OF THE NYS APPR SYSTEM
Brenda Myers, Superintendent of Schools, Valhalla UFSD
Education Analytics’ Work with the Lower Hudson Council
- Analyzed state growth model methods and policy
- Acquired data from many districts in the council
- Analyzed results for these members
- A unique cross-district data collaboration
- Allows for a better understanding of how state policy is affecting local decisions
- Individual district data are not enough for a broad picture
Goals for Today
- Present a high-level discussion of data findings
- Examine how APPR rating policy may affect measurement
- Look at where this all fits in with other states
Andrew Rice, Vice President of Research and Operations, Education Analytics
EA Mission
- Founded in 2012 by Dr. Robert Meyer, director of the Value-Added Research Center (VARC) at the University of Wisconsin-Madison
- “Conducting research and developing policy and management analytics to support reform and continuous improvement in American education”
- Developing and implementing analytic tools for systems of education based on research innovations developed in academia
What Are Our Biases?
- Support research- and data-based policy
- Scientific perspective on decision making
- Respect (not expertise) for the political process
- If the data say: the emperor has no hat, the emperor has no shoes, the emperor has no robe
- We would conclude: it may be the case that the emperor has no clothes
Who We Work With
- Districts
- States
- Foundations (Walton, Gates, Dell)
- Unions (NEA, AFT)
Understanding the data is useful to everyone
Measures vs. Ratings
A Measure
- Has technical validity
- Can be evaluated by scientific inquiry
- e.g., SGP, Charlotte Danielson rubric, survey result
A Rating
- Is a policy judgment
- Cannot be evaluated without policy judgment
- e.g., APPR categories: “Effective”, “Developing”, etc.
Measure to Rating Conversion
HEDI Scales: State Growth, Comparable, Locally Selected
[Chart: the 0-20 point State Growth Model scale divided into Ineffective, Developing, Effective, and Highly Effective bands]
[Chart: the 0-20 point Comparable Growth and Locally Selected Measures scale divided into the same four bands]
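To make the measure-to-rating conversion concrete, here is a minimal Python sketch that maps a 0-20 subcomponent score to a HEDI band. The cut scores used (0-2 Ineffective, 3-8 Developing, 9-17 Effective, 18-20 Highly Effective) are assumptions for illustration, not a restatement of the regulation; a district would substitute the bands actually prescribed for each subcomponent.

```python
# Hypothetical HEDI bands for a 0-20 subcomponent score (illustrative cut scores only).
SUBCOMPONENT_BANDS = [
    (0, 2, "Ineffective"),
    (3, 8, "Developing"),
    (9, 17, "Effective"),
    (18, 20, "Highly Effective"),
]

def subcomponent_rating(points: int) -> str:
    """Map a 0-20 subcomponent score to its HEDI label."""
    for low, high, label in SUBCOMPONENT_BANDS:
        if low <= points <= high:
            return label
    raise ValueError(f"score out of range: {points}")

print(subcomponent_rating(10))  # -> Effective
```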
If HEDI Scales Were Consistent
[Chart: four aligned scales, each divided into Ineffective, Developing, Effective, and Highly Effective bands: State Growth Model; Comparable Growth and Locally Selected Measures; Observation Rubrics and Practice Measures; Hypothetically Aligned Composite Rating]
The Actual Composite Rating
[Chart: Hypothetically Aligned Composite Rating on the 0-100 scale (in 5-point increments), with Ineffective, Developing, Effective, and Highly Effective bands spread across the scale]
[Chart: Actual Composite Rating on the 0-100 scale, with the Ineffective band covering most of the range and narrow Developing, Effective, and Highly Effective bands compressed near the top]
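A companion sketch for the 0-100 composite shows why the actual scale looks compressed: the Developing and Effective bands occupy only a small slice of the 0-100 range. The cut scores here (Ineffective 0-64, Developing 65-74, Effective 75-90, Highly Effective 91-100) are assumptions chosen to mirror the pattern in the chart, not an authoritative statement of the APPR bands.

```python
# Hypothetical composite HEDI bands on the 0-100 scale (illustrative cut scores only).
COMPOSITE_BANDS = [
    (0, 64, "Ineffective"),
    (65, 74, "Developing"),
    (75, 90, "Effective"),
    (91, 100, "Highly Effective"),
]

def composite_rating(points: int) -> str:
    """Map a 0-100 composite score to its HEDI label."""
    for low, high, label in COMPOSITE_BANDS:
        if low <= points <= high:
            return label
    raise ValueError(f"composite score out of range: {points}")

# How much of the 0-100 scale each band covers:
for low, high, label in COMPOSITE_BANDS:
    print(f"{label:>16}: {high - low + 1} of 101 possible scores")
```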
Impact on Observation and Practice Measures Rating Scale
[Chart: the State Growth Model and Comparable Growth and Locally Selected Measures scales, each divided into Ineffective, Developing, Effective, and Highly Effective bands, shown alongside the Observation Rubrics and Practice Measures scale and the compressed Actual Composite Rating]
Alignment of Actual Lower Hudson Scores to Compressed Scale
Observation Rubrics and Practice Measures Scores
- Districts are responding to a particular set of rules that requires them to abandon almost all of the rating scale; it would be optimal if they did not have to
- Does your district retain the “measures” for decision making and report the “ratings” as compressed? (see the sketch below)
- The effort is not wasted as long as the information retains value
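One way to act on the point above is to store both the underlying measure and the compressed rating, so the measure stays available for local decisions. A minimal sketch follows, with a purely hypothetical compression rule (each district’s approved plan defines the real conversion):

```python
from dataclasses import dataclass

@dataclass
class ObservationRecord:
    """Keep the underlying measure for local decision making; report the compressed rating."""
    educator_id: str
    rubric_average: float   # the measure, e.g. a 1.0-4.0 rubric average
    reported_points: int    # the compressed points actually reported for APPR

def compress(rubric_average: float) -> int:
    # Hypothetical conversion for illustration only.
    return min(60, max(0, round(rubric_average * 15)))

record = ObservationRecord("T-0142", rubric_average=3.1, reported_points=compress(3.1))
print(record)
```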
State Growth Model Study Findings
SGP Model
- The NY SGP model is rigorous and attempts to deal with many growth modeling issues
- In phase 1 (2011/2012) we were concerned about strong relationships between incoming test performance and SGP
- In phase 2 (2012/2013) we note that this has been addressed through the addition of classroom average characteristics, or “peer effect,” variables (illustrated below)
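The “peer effect” point can be illustrated with a toy growth regression: predict each student’s current score from the student’s own prior score plus the classroom mean of prior scores, then average the residuals within each classroom as a crude teacher-level growth estimate. This is a simplification for intuition, not a reproduction of the NY growth model; the simulated data and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 200 classrooms of 25 students, each with a prior and a current score.
n_class, n_stud = 200, 25
prior = rng.normal(0, 1, (n_class, n_stud))
teacher_effect = rng.normal(0, 0.3, n_class)
current = (0.7 * prior
           + 0.2 * prior.mean(axis=1, keepdims=True)      # simulated peer effect
           + teacher_effect[:, None]
           + rng.normal(0, 0.5, (n_class, n_stud)))

# Design matrix: intercept, student prior score, classroom mean prior score ("peer effect").
peer_mean = np.repeat(prior.mean(axis=1, keepdims=True), n_stud, axis=1)
X = np.column_stack([np.ones(prior.size), prior.ravel(), peer_mean.ravel()])
y = current.ravel()

# Ordinary least squares, then average residuals within each classroom.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = (y - X @ beta).reshape(n_class, n_stud)
growth_estimate = residuals.mean(axis=1)   # crude teacher-level growth estimate

print("coefficients (intercept, prior, peer mean prior):", np.round(beta, 2))
print("correlation with simulated teacher effect:",
      round(float(np.corrcoef(growth_estimate, teacher_effect)[0, 1]), 2))
```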
Distribution of State Growth Scores
State Growth Model
- Distributions are largely spread out over the scale
- High-growth teachers and low-growth teachers appear in almost every district
- Peer effects have evened out scores between high- and low-proficiency regions
- Class size has some impact, but it is mitigated in the translation from measurement to rating
Comparable Measures Study Findings
Distribution of Comparable Measures Scores
Comparable Measures
- There are substantial differences between districts in the way their policies measure effectiveness with Comparable Measures ratings, OR in their teachers’ ability to score highly on these metrics
- NYSED policy allows variance in the implementation of Comparable Measures, so rating comparability between districts is suspect
- It does not appear possible for a teacher to attain every score from 0 to 20, as required by regulation
Local Measures Study Findings
Distribution of Local Measures Scores
Local Measures
- There are substantial differences between districts in the way their policies measure effectiveness with Local Measures ratings, OR in their teachers’ ability to score highly on these metrics
- NYSED policy allows variance in the implementation of Local Measures, so rating comparability between districts is suspect
- It does not appear possible for a teacher to attain every score from 0 to 20, as required by regulation
Student Outcome Measures: Comparability Across Districts
Flexibility at the local level seems to have produced ratings that are not comparable across districts
[Chart: score distributions by component: State Growth or Comparable Growth, Local Measure, and Observation / Practice]
Overall System
What’s Driving Differentiation of Scores?
[Chart: two panels, “Each Measure Distributed” and “State Growth Drives Differences,” plotting Observation and Practice, Local Measure, and State Growth or Comparable Growth points against composite scores from 75 to 100]
In Lower Hudson, only 10% of variance is driven by Observations
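One way to read the 10% figure is as a decomposition of composite-score variance into the contribution of each weighted component. The sketch below uses simulated scores whose spreads are invented to echo the pattern (growth points vary a lot, observation points barely vary); the 20/20/60 point split and all numbers are assumptions here, and the real analysis would use actual educator-level data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000  # simulated educators

# Simulated component scores, already expressed in composite points (spreads are illustrative).
growth = rng.normal(14, 3.0, n).clip(0, 20)        # 20-point growth subcomponent
local = rng.normal(15, 2.0, n).clip(0, 20)         # 20-point local subcomponent
observation = rng.normal(56, 1.5, n).clip(0, 60)   # 60-point observation subcomponent

composite = growth + local + observation

# Share of composite variance attributable to each component:
# covariance with the composite divided by composite variance (shares sum to 1).
for name, comp in [("growth", growth), ("local", local), ("observation", observation)]:
    share = np.cov(comp, composite)[0, 1] / composite.var(ddof=1)
    print(f"{name:>12}: {share:.0%} of composite variance")
```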
Summary of Findings
- Strong variation between district implementations
- SLOs are 3 points higher than MGP
- Local measures are 4 points higher than MGP
- Observation ratings show almost no differentiation
- Likely driven by the rules set forth in the composite score HEDI bands
Nationwide Context
Total System Issues
- Two big rating systems
- Index: weighted points
- Matrix: categories based on multiple positions of measures
A visual (rows are Measure 1 levels, columns are Measure 2 levels; I = Ineffective, D = Developing, E = Effective, H = Highly Effective):

Measure 1 = 3:  E  H  H  H
Measure 1 = 2:  I  D  E  H
Measure 1 = 1:  I  I  D  H
Measure 2:      1  2  3  4
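The two rating-system types sketch easily side by side. The index weights below are placeholders and the matrix cells come from the visual above; both are illustrative, and an actual system would fix them by policy.

```python
# Index approach: a weighted sum of component points, with HEDI bands applied to the result.
def index_composite(points: dict[str, float], weights: dict[str, float]) -> float:
    return sum(weights[name] * pts for name, pts in points.items())

# Matrix approach: a direct lookup on two component categories.
# Rows are Measure 1 levels (1-3), columns are Measure 2 levels (1-4), as in the visual above.
MATRIX = {
    3: ["E", "H", "H", "H"],
    2: ["I", "D", "E", "H"],
    1: ["I", "I", "D", "H"],
}

def matrix_rating(measure1_level: int, measure2_level: int) -> str:
    return MATRIX[measure1_level][measure2_level - 1]

# Illustrative use (the weights are assumptions, not the NYS weights):
print(index_composite({"growth": 16, "local": 15, "observation": 55},
                      {"growth": 1.0, "local": 1.0, "observation": 1.0}))
print(matrix_rating(2, 3))  # -> "E"
```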
Pro/Con of Rating Systems
Index
- Pro: easy to calculate
- Pro: easy to communicate
- Con: a compensatory model may incent cross-component influence
Matrix
- Pro: more flexible
- Con: more difficult to explain
- Pro: may allow disagreeing measures to be dealt with in a different way than an index
What About the Other 49 States?
- Much experimentation
- High weights on growth
- High weights on observations
- Student surveys
- Assessment system redesigns
- Index and matrix approaches
Who gets it right?
Exemplars
- A developing field: no state has it right, though some components work
- Growth measurement on assessments in NY is good
- No state has gotten SLOs right (RI is getting there)
- Observations are coming under fire for poor implementation and possible bias (some great work on the measures – not yet on ratings)
- Total system scoring and policy is all over the map
Louis Wool, Superintendent of Schools, Harrison Central School District