Growth in Student Achievement: Issues of Measurement, Longitudinal Analyses & Accountability
Damian W. Betebenner, NCIEA
CCSSO NCSA, June 23, 2010
Overview
Discussions of student growth lie at the intersection of three topics:
- Measurement/Psychometrics
- Longitudinal Data Analysis/Applied Statistics
- Accountability/Education Policy/Data Use
Measurement/Psychometrics
Examining student growth requires multiple measurements of the same individual.
- Growth in what?
- How much growth? (How is scaling involved in answering this question?)
- Is it enough growth?
Longitudinal Data Analysis/Applied Statistics
There are many methods for analyzing longitudinal data.
- What are the relevant questions?
- Are the analytic techniques capable of answering those questions?
- Do the data possess properties sufficient for the analytic techniques employed (e.g., a vertical scale)?
- Does the analysis sustain the inferences made from the data?
Accountability/Education Policy/Data Use
Education policy and accountability have many goals and purposes.
- Why growth in accountability?
- What are the goals and purposes of accountability?
- What is the theory of action behind accountability?
- How can we judge the validity of the accountability system?
- What about the current policy context?
Technical Considerations: Measurement/Psychometric Issues
Measurement/Psychometric Issues
- Growth in what?
- How much growth?
  - Scales for measuring growth: ordinal (within-year, across-year); interval (within-year, across-year); vertical
  - Growth magnitude versus growth norm
- Is it enough growth?
  - Norm- versus criterion-referencing (the intersection of Accountability and Measurement)
Growth in what?
- Beneath any notion of change (i.e., growth) is a construct that is changing over time.
- Height and weight are common points of reference.
- Constructs in education are “slippery.”
- At a minimum, we need an underlying semantic referent (e.g., reading or math).
How much growth?
- Are growth magnitudes possible in education?
- If calculable, are they interpretable absent some norm?
- Approaches to growth magnitudes:
  - Performance standards
  - Vertical scale with interval properties
  - Learning progressions (qualitative growth)
How much growth? Performance Standards
Strengths:
- Anchors provide reference points for discussions about performance
- Growth is embedded in the accountability metric
Limitations:
- Few levels mask a substantial range within each level, thus masking student growth within a level
- Standards vary greatly in stringency from state to state, so that “proficient” performance lacks meaning
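A small sketch of the masking limitation, assuming hypothetical cut scores and level names (real cuts differ by state, grade, and subject). Both students move from Proficient to Proficient, so a level-based metric treats them identically even though their grade 4 scores sit far apart within the level.

```python
# Hypothetical cut scores per grade: (minimum score, level name), in ascending order.
CUTS = {
    3: [(400, "Partially Proficient"), (450, "Proficient"), (520, "Advanced")],
    4: [(420, "Partially Proficient"), (470, "Proficient"), (545, "Advanced")],
}

def performance_level(grade: int, score: int) -> str:
    """Return the highest level whose cut score the student meets in that grade."""
    level = "Unsatisfactory"
    for cut, name in CUTS[grade]:
        if score >= cut:
            level = name
    return level

# Hypothetical (grade 3, grade 4) scale scores for two students.
students = {"A": (455, 475), "B": (451, 540)}
for sid, (g3, g4) in students.items():
    print(sid, performance_level(3, g3), "->", performance_level(4, g4))
# A Proficient -> Proficient
# B Proficient -> Proficient
```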
How much growth? Scale Scores
Strengths:
- Semi-continuous scores (many score points)
- Can be used to create vertical scales across grade levels
- Give the appearance of the interval scales needed by some analytical models
Limitations:
- Difficult to interpret or explain to users
- Vertical scales are hard to defend
- Claims of interval measurement properties don’t hold up to close scrutiny
How much growth? Vertical Scale
Vertical and interval scales are required for some analytic techniques:
- Gain score calculation (magnitude of growth)
- Growth curve analysis (rate of growth) (e.g., Willett & Singer, 2003)
Vertical and interval scales are required for some questions:
- Matthew effects: Do higher achievers grow faster than lower achievers?
- Growth rates relative to student age: Do students grow more in later grades than in earlier grades?
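A minimal sketch of the two techniques listed above, assuming hypothetical scores that already sit on a vertical, interval scale. The gain score is a simple difference; the rate of growth is taken here as the ordinary least squares slope of one student's scores on grade, a simplified single-student version of what a growth curve model estimates across many students.

```python
from statistics import mean

def gain_score(prior: float, current: float) -> float:
    """Magnitude of growth: meaningful only if both scores share a vertical, interval scale."""
    return current - prior

def growth_rate(grades: list[float], scores: list[float]) -> float:
    """Rate of growth: OLS slope of one student's scores on grade (points per year)."""
    g_bar, s_bar = mean(grades), mean(scores)
    num = sum((g - g_bar) * (s - s_bar) for g, s in zip(grades, scores))
    den = sum((g - g_bar) ** 2 for g in grades)
    return num / den

# Hypothetical vertical-scale scores for one student in grades 3 to 5.
grades, scores = [3, 4, 5], [400, 425, 445]
print(gain_score(scores[0], scores[1]))  # 25
print(growth_rate(grades, scores))       # 22.5
```

Questions such as Matthew effects then amount to comparing these slopes between initially higher and lower achievers, which only makes sense if a point of growth means the same thing everywhere on the scale.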
How much growth? Vertical Scale
Vertical and/or interval scales are NOT required for some analytic techniques:
- Value-added analyses: most require an interval, but not a vertical, scale (see Ballou, 2008; Briggs & Betebenner, 2009)
- Auto-regressive analyses and growth norms
Vertical and/or interval scales are NOT required for some questions:
- Is a student’s progress (ab)normal?
- Is a student’s growth sufficient to put them on track to reach/maintain proficiency?
See Yen (2007) for an excellent list of questions.
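A sketch of an auto-regressive analysis, assuming hypothetical data and a simple linear model: the current score is regressed on the prior score, so the two years enter as separate variables and no common vertical scale is needed. The residual then addresses "is this progress (ab)normal?" relative to students with similar prior scores. Note that a linear fit still treats each year's scale as interval.

```python
from statistics import mean

def fit_autoregression(prior, current):
    """OLS fit of current = a + b * prior; prior and current may use different scales."""
    p_bar, c_bar = mean(prior), mean(current)
    b = (sum((p - p_bar) * (c - c_bar) for p, c in zip(prior, current))
         / sum((p - p_bar) ** 2 for p in prior))
    a = c_bar - b * p_bar
    return a, b

def residuals(prior, current):
    """Positive residual: current score higher than predicted from the prior score."""
    a, b = fit_autoregression(prior, current)
    return [c - (a + b * p) for p, c in zip(prior, current)]

# Hypothetical data: prior-year scores on one scale, current-year scores on an unrelated scale.
prior = [10, 20, 30]
current = [200, 200, 500]
print(fit_autoregression(prior, current))  # (0.0, 15.0)
print(residuals(prior, current))           # [50.0, -100.0, 50.0]
```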
How much growth? Magnitudes versus Norms
Two growth quantities:
- Magnitude of growth
  - Physical growth: a 9-year-old boy grew 5 inches in the past year
  - Achievement growth: a 4th grader grew 25 scale score points since 3rd grade
- Relative amount of growth
  - Physical growth: the average increase in height for boys between ages 8 and 9 is 4 inches
  - Achievement growth: the average 4th grade scale score is 21 points higher than the average 3rd grade score
How much growth? People expect an answer of magnitude, but people need magnitude embedded within a norm.
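The arithmetic of embedding a magnitude in a norm, using only the figures from the example above; the function name is illustrative. Even this simple comparison changes the message: the boy grew 1 inch more than the average, the 4th grader 4 points more than the average grade-to-grade difference.

```python
def growth_relative_to_norm(individual_gain: float, average_gain: float) -> float:
    """Express an individual's growth magnitude relative to a normative average."""
    return individual_gain - average_gain

print(growth_relative_to_norm(5, 4))    # 1 (inches above the average gain for boys aged 8 to 9)
print(growth_relative_to_norm(25, 21))  # 4 (points above the average grade 3 to grade 4 difference)
```

Two caveats apply. The 21-point figure is a difference between cross-sectional grade means rather than an average of individual gains, and without the spread of gains it remains unclear whether +4 points is unusual. Both points argue for a fuller norm than a single average.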
How much growth? Growth Norms
Although normative comparisons are spurned by advocates of criterion-referenced and standards-based measurement, norms can provide a useful interpretive framework, especially for interpreting student growth.
“Scratch a criterion and you find a norm.” (W. H. Angoff, 1974)
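One way to build a growth norm is to rank a student's current score among academic peers who shared the same prior score. The sketch below is deliberately simplified and uses hypothetical data: it conditions on an exact prior score match, which real data rarely allow. Operational growth-norm methods, such as student growth percentiles estimated with quantile regression, smooth across prior scores instead; this is not that methodology.

```python
from bisect import bisect_left

def conditional_percentile(student_prior, student_current, records):
    """Percent of academic peers (same prior score) whose current score is below the student's."""
    peers = sorted(current for prior, current in records if prior == student_prior)
    if not peers:
        raise ValueError("no peers share this prior score")
    return 100 * bisect_left(peers, student_current) / len(peers)

# Hypothetical (prior, current) score pairs.
records = [(450, 460), (450, 470), (450, 480), (450, 490), (450, 500), (500, 520)]
print(conditional_percentile(450, 485, records))  # 60.0
```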
Technical Considerations: Longitudinal Data Analysis Issues
Many Questions
- How much annual growth did this student (these students) make in reading?
- Is this student (Are these students) making sufficient growth to reach/maintain desired achievement targets? (Growth-to-standard & Growth Model Pilot Program)
- Are students in particular subgroups (e.g., minority students) making as much progress as other students?
- How much did this teacher/school contribute to students’ growth over the last year? (Value-Added)
Again, see Yen (2007) for an excellent list of questions.
Many Techniques
Numerous data analysis techniques are available for longitudinal data:
- Gain scores (a suitable scale is required)
- Cross-tabulation of prior and current achievement level attainment (e.g., value tables, transition matrices)
- Regression-based approaches: growth-curve analysis (HLM), fixed/mixed-effects models, growth norms
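A sketch of the cross-tabulation approach. The level names and the point values in the value table are hypothetical; actual value tables are set by policy. The transition matrix counts students by prior and current level, and the value table assigns points to each cell so that a group's average can be reported.

```python
from collections import Counter
from statistics import mean

LEVELS = ["Unsatisfactory", "Partially Proficient", "Proficient", "Advanced"]

# Hypothetical value table: 100 points for holding a level, +/-50 per level moved, floor of 0.
VALUE_TABLE = {
    (p, c): max(0, 100 + 50 * (j - i))
    for i, p in enumerate(LEVELS)
    for j, c in enumerate(LEVELS)
}

def transition_matrix(pairs):
    """Rows = prior level, columns = current level, cells = student counts."""
    counts = Counter(pairs)
    return [[counts[(p, c)] for c in LEVELS] for p in LEVELS]

def mean_value(pairs):
    """Average value-table points across students."""
    return mean(VALUE_TABLE[pair] for pair in pairs)

# Hypothetical (prior level, current level) pairs for three students.
pairs = [("Proficient", "Proficient"),
         ("Partially Proficient", "Proficient"),
         ("Proficient", "Partially Proficient")]
for level, row in zip(LEVELS, transition_matrix(pairs)):
    print(f"{level:>20}", row)
print(mean_value(pairs))  # 100
```

Because these methods use only achievement levels, they need no vertical scale, but they inherit the masking limitation of performance standards.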
Questions 1st, Analyses 2nd
- Different growth analysis techniques often address different questions.
- Different questions lead to different conversations, which lead to different uses and outcomes.
“It is better to have an approximate answer to the right question than a precise answer to the wrong question.” (J. W. Tukey)
Model Purpose
Three general uses associated with statistical models (Berk, 2004):
- Description: an account of the data. The model is true to the extent that it is useful; model quality is judged by craftsmanship (de Leeuw, 2004).
- Inference: from sample to population. The model is true to the extent that the assumed chance process reflects reality (super-population fallacy).
- Causality: A causes B to happen. The model is true to the extent that a plausible causal theory exists and design criteria are met.
Models are rarely used purely descriptively, despite description’s minimal requirements.
Inference and causality require information external to the data; they can’t be validated solely from the data.
Models are often causal in nature but rarely meet the rigorous criteria necessary for such inferences.
Value-Added Models: Causality
- Value-Added Models (e.g., EVAAS) are a frequently discussed type of growth model.
- Value-Added Models attempt to quantify the portion of student progress attributable to some entity, usually a teacher or school.
- “Value-added” refers to the inferences made, not to the particular model.
- Causal attributions make value-added models well suited for accountability discussions.
- In the absence of random assignment, causal attributions are always suspect and subject to challenge (see, for example, Raudenbush, 2004; Rubin, Stuart & Zanutto, 2004).
Value-Added Models: Causality
- Value-added models return norm-referenced effectiveness quantities.
- For schools, these quantities indicate whether a school is significantly more or less effective than the mean school effectiveness in the district or state.
- In a standards-based assessment environment, how much effectiveness is enough? This is especially important in light of universal proficiency policy mandates.
- Growth-to-standard models were created to provide criterion-referenced growth models.
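Why value-added quantities are norm-referenced can be seen in a toy residual model, sketched below with hypothetical data. It is not EVAAS or any operational model: there is no significance testing, shrinkage, or multi-year structure. Each school's effect is the mean residual of its students from one pooled regression of current on prior score. Since OLS residuals sum to zero, the student-weighted average of the school effects is zero, so every effect is "above or below the average school," and nothing in the number says whether that is enough to reach proficiency.

```python
from collections import defaultdict
from statistics import mean

def school_effects(records):
    """records: (school, prior, current) tuples -> mean residual per school.

    Residuals come from one pooled OLS regression of current on prior score,
    so the effects are centered on the average school (norm-referenced).
    """
    prior = [p for _, p, _ in records]
    current = [c for _, _, c in records]
    p_bar, c_bar = mean(prior), mean(current)
    b = (sum((p - p_bar) * (c - c_bar) for p, c in zip(prior, current))
         / sum((p - p_bar) ** 2 for p in prior))
    a = c_bar - b * p_bar
    by_school = defaultdict(list)
    for school, p, c in records:
        by_school[school].append(c - (a + b * p))
    return {school: mean(res) for school, res in by_school.items()}

# Hypothetical students in two schools.
records = [("X", 10, 22), ("X", 20, 32), ("Y", 10, 18), ("Y", 20, 28)]
print(school_effects(records))  # {'X': 2.0, 'Y': -2.0}
```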
Growth Model Pilot Program: Growth-to-Standard
- In response to requests to use growth models as part of AYP, the U.S. Department of Education (USED) allowed states to apply to use growth models.
- Fifteen states had models accepted.
- Models were required to adhere to the “bright line principle” of universal proficiency (growth-to-standard).
- Yen (2009) provides an excellent overview of the models.
- In general, growth-to-standard models returned results that closely aligned with AYP status results.
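A growth-to-standard rule compares growth against a fixed criterion instead of a norm. The sketch below uses one hypothetical trajectory-style rule and is not any particular state's approved model: it assumes a vertical, interval scale and linear growth, projects last year's gain forward, and asks whether the projection reaches the proficiency cut within the years remaining. It covers both "reach" and "maintain," because a student already above the cut but losing ground can project below it.

```python
def on_track(prior: float, current: float, cut: float, years_left: int) -> bool:
    """Hypothetical trajectory rule: does sustaining last year's gain reach the cut score in time?"""
    annual_gain = current - prior                # requires a vertical, interval scale
    projected = current + annual_gain * years_left
    return projected >= cut

print(on_track(prior=400, current=430, cut=500, years_left=3))  # True  (430 + 3 * 30 = 520)
print(on_track(prior=400, current=410, cut=500, years_left=3))  # False (410 + 3 * 10 = 440)
```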
Growth versus Value-Added Models: Description & Causality
- Growth measures are descriptive.
- Accountability has skewed discussions of growth away from description and toward responsibility (i.e., causality).
- All measures (even VAMs) are potentially descriptive; however, some measures are specifically crafted for causal inference/attribution.
- Good descriptive measures are interpretable, informative, and capable of multiple uses.
Recommend
More recommend