Can the Assessment Consortia Meet the Intended Comparability Goals? Or: What Types of Comparability Goals Can Be Met?
NCME Committee on Assessment Policy and Practice
Presentation at CCSSO's NCSA Conference, June 2011
NCME Committee on Assessment Policy
Formed as an ad hoc committee in May 2010; became a standing committee in April 2011
Purposes:
- To use the measurement expertise within NCME to influence and, we hope, improve educational policies based on assessment results
- To increase the visibility of NCME so that it might be seen as a "go to" organization for assessment-related policy issues
The Committee:
- Co-Chairs: Kristen Huff and Scott Marion
- Members: Judy Koenig, Joseph Martineau, Cornelia Orr, Christina Schneider, Zachary Warner
The Committee's Initial Approach
Recognizing that the committee is composed entirely of volunteers and that NCME has no dedicated staff for this work, the committee decided to start modestly and to focus on a single issue, using the following approaches:
- An invited symposium at the NCME annual meeting with leading researchers
- A symposium at CCSSO-NCSA
- A follow-up policy brief
- Other activities?
This Year's Focus
The challenges of designing for and producing comparable score inferences across states, consortia, and countries
NCME Symposium Participants:
- Authors: Mike Kolen, Suzanne Lane, Joseph Martineau
- Discussants: Bob Brennan and Deb Harris
A Simplified View of Comparability
The Problem:
- State assessments produced results that were not comparable across states
- Positive state results were clearly not trusted
The "Solution":
- Common content standards
- Common large-scale assessments
A Simplified Theory of Action
- Accurate cross-state comparisons can be used to benchmark within-state performance
- Benchmarking can motivate and guide reforms (and policies)
- Reforms, in turn, should lead to improved student achievement
- This is an attempt to put a positive spin on the call for comparability
What the policymakers wanted…
- Governors and state chiefs (many of them) wanted a single assessment consortium
- The reasons are not entirely clear, but some clearly had to do with efficiency of effort, lack of competition, and facilitating comparisons across states
What they got…
- Two consortia with some very different ideas about large-scale assessment
What is Meant by "Comparability"? (Brennan)
The public and policymakers:
- It doesn't matter which form is used
- The same score means the same thing for all students
- Math is math is math…
Psychometricians:
- All sorts of things (equating, vertical scaling, scaling to achieve comparability, projection, moderation, concordance, judgmental standard setting)
- Some degree/type of comparability (i.e., linking) is attainable in practically any context
- Comparability "scale": very strong to very weak
- Ideal: the "matter of indifference" criterion
What comparisons?
- The term "comparability" is getting thrown around a lot, and many appear focused on across-consortium and international comparability
- Brennan: What should be the comparability goals for the two consortia, and how should these goals be pursued?
What comparisons? (continued)
- Many appear to believe that because of a "common assessment," within-consortium comparability is a given. It's not!
- Following Brennan's reformulation, we argue that until within-consortium comparability can be assured, we should not distract ourselves with across-consortium comparability
- Further, most accountability designs, especially those that incorporate measures of student growth, must be based on strong within-consortium (actually, within-state) year-to-year comparability
Key Interpretive Challenges
- The central challenges that Mislevy (1992) outlined in Linking Educational Assessments are still with us in the current Race to the Top environment: "discerning the relationships among the evidence the assessments provide about conjectures of interest, and figuring out how to interpret this evidence correctly" (p. 21)
- In other words, just because we can statistically link doesn't mean we will be able to interpret these links
Conditions for comparability
- Mislevy and Holland focused considerable attention on the match between the two test blueprints, and for good reason: this is a crucial concern if we are to compare student-level scores from two different sets of tests
- There is a good reason that most end-of-year state tests are referred to as "standardized tests"
What do we know about Comparability?
- Score interchangeability is (approximately) achievable only under the strongest form of comparability (equating in the technical sense; sketched below)
- Weaker types of comparability often/usually do not lead to group invariance
- The degree of comparability desired/required should reflect the intended use of scores
- Score interchangeability and group invariance:
  - Crucial sometimes
  - Often not achievable
  - Not so important sometimes
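A minimal sketch of what "equating in the technical sense" demands, in our own notation (not drawn from the symposium papers). Equipercentile equating maps a score x on form X to the form-Y score with the same percentile rank:

    e_Y(x) = F_Y^{-1}(F_X(x)),

where F_X and F_Y are the cumulative score distributions on the two forms. A group-invariance check then asks whether the linking computed within a subgroup g agrees with the linking computed in the total group T:

    e_{Y|g}(x) ≈ e_{Y|T}(x)  for all score points x and all relevant subgroups g.

Under weaker linkings (e.g., concordance, projection, or moderation), this invariance typically breaks down, which is why the second bullet above matters.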
Minimum Set of Conditions
- Common test blueprint
- Specified test administration windows
- Common administration rules/procedures
- Standard accommodation policies
- Same inclusion rules
- Clearly specified and enforced security protocols
- Specific computer hardware and software (for CBT and CAT), or at least a narrow range of specifications
What's working for within-consortium comparability?
- Common, fairly well-articulated content standards that all consortium states are adopting
- All states within each consortium have promised to implement a common test (or tests)
- All states within each consortium have promised to adopt common performance-level descriptors and common achievement cut scores
Challenges for within-year, within-consortium comparability
- High-stakes uses of test scores
- Significantly different fidelity of implementation of the CCSS within states and across states
- Very short time frame
- Unrelenting demand for innovative item types
- Huge testing windows
- Large variability in school calendars
- Assessments designed to measure a wide range of knowledge and skills
- Differences in inclusion policies (potentially?)
- Differences in accommodations policies, and certainly in practices
- Differences in mode (computer vs. paper)
- Differences in hardware platforms
Challenges across years (within consortium)
All of the previous, plus:
- Mixed-format assessments, especially through-course assessments
- For constructed-response (CR) prompts/performance tasks, a large person-by-task (p × t) interaction is pervasive and a small number of prompts is common; these two facts virtually guarantee that scores for CR forms can be only weakly comparable (see the sketch after this slide)
- The quality of field testing (high-quality sampling) needed to produce stable item parameters for pre-equating designs (at least for SBAC)
- These are fairly significant challenges: we have seen serious problems with year-to-year equating within states, and doing this in 20+ states is a not insignificant extension
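To see why a large p × t interaction combined with few prompts caps comparability, here is a minimal generalizability-theory sketch for a person-by-task (p × t) design with n_t prompts; the variance components below are illustrative assumptions, not data from either consortium:

    relative error variance:       σ²(δ) = σ²_{pt} / n_t
    generalizability coefficient:  Eρ² = σ²_p / (σ²_p + σ²_{pt} / n_t)

If, for example, σ²_p = 1.0 and σ²_{pt} = 2.0, then two prompts give Eρ² = 1.0 / (1.0 + 1.0) = 0.50, and even four prompts give only about 0.67. Scores that generalize this weakly across tasks cannot be more than weakly comparable across forms built from different prompts.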
What does all of this suggest about how to proceed?
- We need to help policymakers prioritize comparability goals
- Trying to assure comparability on the back end will fail if we have not designed the assessments with the types of comparisons we'd like to make in mind
- In attempting to achieve comparability, no type or amount of statistical manipulation of data can make up for mediocre test development
- Don't promise more than can be delivered, but do deliver a high degree of comparability for scores that will be used for accountability (i.e., within consortium, across years)
How to Proceed (continued)
For accountability scores, data must be collected that:
- Permit obtaining score "equivalence" tables/relationships (a small sketch of one such table follows this slide)
- Facilitate examining the degree to which comparability goals have been attained
- Provide users with readily interpretable statements about comparability
View the attainment of comparability as a goal to be pursued, more than an end product to be attained at one point in time.
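As one concrete picture of what a score "equivalence" table can look like, here is a minimal sketch in Python that tabulates an equipercentile concordance between two observed score distributions. The form names, score ranges, and simulated data are hypothetical; an operational table would use smoothed distributions and the consortium's actual data.

import numpy as np

def equipercentile_table(scores_x, scores_y, points):
    # Map each form-X score point to the form-Y score with the same percentile rank.
    scores_x = np.sort(np.asarray(scores_x, dtype=float))
    scores_y = np.asarray(scores_y, dtype=float)
    table = {}
    for x in points:
        # Percentile rank of x within the form-X distribution
        pr = np.searchsorted(scores_x, x, side="right") / len(scores_x)
        # Form-Y score sitting at that same percentile rank
        table[x] = round(float(np.quantile(scores_y, min(pr, 1.0))), 1)
    return table

# Hypothetical raw scores on two forms of the same test (simulated for illustration)
rng = np.random.default_rng(0)
form_x = rng.normal(50, 10, size=5000).clip(0, 80)
form_y = rng.normal(53, 9, size=5000).clip(0, 80)

for x_score, y_score in equipercentile_table(form_x, form_y, range(30, 71, 10)).items():
    print(f"Form X {x_score:>3} -> Form Y {y_score}")

Checking whether such a table stays stable across years, and across student subgroups, is one straightforward way to examine the degree to which comparability goals have been attained.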
Across-Consortium Comparability
- Tukey: "It is better to have an approximate answer to the right question than an exact answer to the wrong question."
- In other words, we need to be humble about the types of comparisons that we can and should make in this arena