Can the Assessment Consortia Meet the Intended Comparability Goals? OR What Types of Comparability Goals Can Be Met? NCME Committee on Assessment Policy and Practice. Presentation at CCSSO’s NCSA Conference

  1. Can the Assessment Consortia Meet the Intended Comparability Goals? OR What Types of Comparability Goals Can Be Met?
     NCME Committee on Assessment Policy and Practice
     Presentation at CCSSO’s NCSA Conference, June 2011

  2. NCME Committee on Assessment Policy
     - Formed as an ad hoc committee in May 2010
     - Became a standing committee in April 2011
     - Purposes:
       - To use the measurement expertise within NCME to influence, and hopefully improve, educational policies based on assessment results
       - To increase the visibility of NCME so that it might be seen as a “go to” organization for assessment-related policy issues
     - The Committee:
       - Co-Chairs: Kristen Huff and Scott Marion
       - Members: Judy Koenig, Joseph Martineau, Cornelia Orr, Christina Schneider, Zachary Warner
     NCME Assessment Policy Committee Comparability Symposium

  3. The Committee’s Initial Approach
     - Recognizing that the committee is composed entirely of volunteers and that NCME doesn’t have staff for this work, the committee decided to start modestly.
     - Focus on a single issue, using the following approaches:
       - An invited symposium at the NCME annual meeting with leading researchers
       - Symposium at CCSSO-NCSA
       - Follow-up policy brief
       - Other?

  4. This Year’s Focus
     - The challenges of designing for and producing comparable score inferences across states, consortia, and countries
     - NCME Symposium Participants:
       - Authors: Mike Kolen, Suzanne Lane, Joseph Martineau
       - Discussants: Bob Brennan and Deb Harris

  5. A Simplified View of Comparability
     - The Problem:
       - State assessments produced results that were not comparable across states
       - Positive state results were clearly not trusted
     - The “Solution”:
       - Common content standards
       - Common large-scale assessments

  6. A Simplified Theory of Action
     - Accurate cross-state comparisons
     - Can be used to benchmark within-state performance
     - To motivate and guide reforms (and policies)
     - And lead to improved student achievement
     - Trying to put a positive spin on the call for comparability

  7. What the Policy Makers Wanted…
     - Governors and state chiefs (many of them) wanted a single assessment consortium
       - Not all of the reasons are clear, but some clearly had to do with efficiency of effort, lack of competition, and facilitating comparisons across states
     - What they got…
       - Two consortia with some very different ideas about large-scale assessment

  8. What Is Meant by “Comparability” (Brennan)?
     - The Public and Policy-makers:
       - Doesn’t matter which form is used
       - Same score means the same thing for all students
       - Math is math is math …
     - Psychometricians:
       - All sorts of things (equating, vertical scaling, scaling to achieve comparability, projection, moderation, concordance, judgmental standard setting)
       - Some degree/type of comparability (i.e., linking) is attainable in practically any context
       - Comparability “scale”: very strong to very weak
       - Ideal: “matter of indifference” criterion

  9. What Comparisons?
     - The term “comparability” is getting thrown around a lot, and many appear focused on across-consortium and international comparability
     - Brennan: What should be the comparability goals for the two consortia, and how should these goals be pursued?

  10. What Comparisons? (continued)
     - Many appear to believe that because of a “common assessment,” within-consortium comparability is a given
     - It’s not! Following Brennan’s reformulation, we argue that until within-consortium comparability can be assured, we should not distract ourselves with across-consortium comparability
     - Further, most accountability designs, especially those that incorporate measures of student growth, must be based on strong within-consortium (actually, within-state) year-to-year comparability

  11. Key Interpretative Challenges
     - The central challenges that Mislevy (1992) outlined in Linking Educational Assessments are still with us in the current Race to the Top environment:
       - “discerning the relationships among the evidence the assessments provide about conjectures of interest, and
       - figuring out how to interpret this evidence correctly” (p. 21).
     - In other words, just because we can statistically link doesn’t mean we will be able to interpret these links

  12. Conditions for Comparability
     - Mislevy and Holland focused considerable attention on the match between the two test blueprints, and for good reason: this is a crucial concern if we are to compare student-level scores from two different sets of tests
     - There is a good reason that most end-of-year state tests are referred to as “standardized tests”

  13. What Do We Know About Comparability?
     - Score interchangeability is (approximately) achievable only under the strongest form of comparability (equating in a technical sense)
     - Weaker types of comparability often/usually do not lead to group invariance
     - The degree of comparability desired/required should reflect the intended use of scores
     - Score interchangeability and group invariance:
       - Crucial sometimes
       - Often not achievable
       - Not so important sometimes
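The link between equating and group invariance can be illustrated with a minimal sketch. It assumes simple linear linking (matching means and standard deviations) and that both forms were taken by equivalent groups; the function names, group labels, and score samples are all hypothetical and are not drawn from the presentation. The idea is to estimate the linking function overall and within each subgroup, then see how far the subgroup conversions drift from the overall one.

```python
from statistics import mean, pstdev


def linear_link(x_scores, y_scores):
    """Return a function that maps Form X scores onto the Form Y scale
    by matching means and standard deviations (linear linking)."""
    mu_x, mu_y = mean(x_scores), mean(y_scores)
    slope = pstdev(y_scores) / pstdev(x_scores)
    return lambda x: mu_y + slope * (x - mu_x)


# Hypothetical score samples for two subgroups on two forms.
form_x = {"group_a": [10, 12, 14, 16, 18], "group_b": [20, 22, 24, 26, 28]}
form_y = {"group_a": [15, 17, 19, 21, 23], "group_b": [22, 25, 28, 31, 34]}

# Linking function estimated on everyone, and separately within each subgroup.
all_x = form_x["group_a"] + form_x["group_b"]
all_y = form_y["group_a"] + form_y["group_b"]
overall = linear_link(all_x, all_y)
by_group = {g: linear_link(form_x[g], form_y[g]) for g in form_x}


def max_invariance_gap(score_points):
    """Largest absolute difference between any subgroup conversion and the
    overall conversion across the given Form X score points."""
    return max(
        abs(by_group[g](x) - overall(x)) for g in by_group for x in score_points
    )


gap = max_invariance_gap(range(10, 29))
print(f"Largest subgroup-vs-overall difference: {gap:.2f} Form Y points")
```

A gap near zero would be consistent with group invariance; a large gap suggests the link behaves more like a weak form of comparability than like equating. How large is "too large" depends on the intended use of the scores.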

  14. Minimum Set of Conditions
     - Common test blueprint
     - Specified test administration windows
     - Common administration rules/procedures
     - Standard accommodation policies
     - Same inclusion rules
     - Clearly specified and enforced security protocols
     - Specific computer hardware and software (for CBT and CAT), or at least a narrow range of specifications

  15. What’s Working for Within-Consortium Comparability?
     - Common, fairly well-articulated content standards that all consortium states are adopting
     - All states within a consortium have promised to implement common test(s)
     - All states within a consortium have promised to adopt common performance-level descriptors and common achievement cut scores

  16. Challenges for Within-Year, Within-Consortium Comparability
     - High-stakes uses of test scores
     - Significantly different fidelity of implementation of CCSS within and across states
     - Time frame is very short
     - Demand for innovative item types is unrelenting
     - Huge testing windows
     - Large variability in school calendars
     - Assessments designed to measure a wide range of knowledge and skills
     - Differences in inclusion policy (potentially?)
     - Differences in accommodations policies and certainly practices
     - Differences in mode (computer vs. paper)
     - Differences in hardware platforms

  17. Challenges Across Years (Within Consortium)
     - All of the previous, plus:
       - Mixed-format assessments, especially through-course assessments
       - For constructed-response (CR) prompts/performance tasks, a large person × task (p × t) interaction is pervasive and a small number of prompts is common. These two facts virtually guarantee that scores for CR forms can be only weakly comparable
       - Quality of field testing (high-quality sampling) to produce stable item parameters for pre-equating designs (at least SBAC)
     - These are fairly significant challenges
       - We have seen serious problems with year-to-year equating within states
       - Doing this in 20+ states is a not insignificant extension
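The point about p × t interaction and few prompts can be made concrete with a standard generalizability-theory expression. This is an illustrative sketch that assumes a simple design with persons crossed with tasks; the symbols and the numeric values are not from the presentation.

```latex
% Generalizability coefficient for a persons-crossed-with-tasks design:
%   \sigma^2_p       : variance of person (universe) scores
%   \sigma^2_{pt,e}  : person-by-task interaction variance, confounded with residual error
%   n_t              : number of tasks (prompts) on a form
E\rho^2 \;=\; \frac{\sigma^2_p}{\sigma^2_p + \dfrac{\sigma^2_{pt,e}}{n_t}}

% Hypothetical values \sigma^2_p = 1 and \sigma^2_{pt,e} = 2:
%   n_t = 2:  E\rho^2 = 1 / (1 + 2/2) = 0.50
%   n_t = 8:  E\rho^2 = 1 / (1 + 2/8) = 0.80
```

Under these assumptions, a large interaction term divided by a small number of tasks dominates the denominator, so a student's standing depends heavily on which prompts happen to appear on a form. That is one way to read the claim that CR forms can be only weakly comparable.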

  18. What Does All of This Suggest About How to Proceed?
     - Need to help policy makers prioritize comparability goals
     - Trying to assure comparability on the back end will fail if we have not designed the assessments with the intended comparisons in mind
     - In attempting to achieve comparability, no type/amount of statistical manipulation of data can make up for mediocre test development
     - Don’t promise more than can be delivered, but do deliver a high degree of comparability for scores that will be used for accountability (i.e., within-consortium, across years)

  19. How to Proceed (continued)
     - For accountability scores, data must be collected that:
       - Permit obtaining score “equivalence” tables/relationships
       - Facilitate examining the degree to which comparability goals have been attained
       - Provide users with readily interpretable statements about comparability
     - View the attainment of comparability as a goal to be pursued, more than an end product to be attained at one point in time
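As a rough illustration of what a score “equivalence” table looks like, the sketch below builds an unsmoothed equipercentile table: each Form X score is mapped to the Form Y score with the same percentile rank. It assumes both forms were taken by equivalent groups; the function names and score samples are hypothetical, and an operational table would typically also involve smoothing and a deliberate data-collection design.

```python
def percentile_rank(scores, x):
    """Mid-percentile rank of x within scores, as a proportion between 0 and 1."""
    below = sum(s < x for s in scores)
    equal = sum(s == x for s in scores)
    return (below + 0.5 * equal) / len(scores)


def quantile(scores, p):
    """Linearly interpolated quantile of scores at proportion p."""
    ordered = sorted(scores)
    pos = p * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])


def equivalence_table(x_scores, y_scores):
    """Map each observed Form X score to the Form Y score that has the
    same percentile rank (unsmoothed equipercentile linking)."""
    return {
        x: round(quantile(y_scores, percentile_rank(x_scores, x)), 2)
        for x in sorted(set(x_scores))
    }


# Hypothetical score samples from two forms.
form_x = [1, 2, 3, 4, 5]
form_y = [11, 12, 13, 14, 15]

table = equivalence_table(form_x, form_y)
for x_score, y_equivalent in table.items():
    print(f"Form X {x_score} -> Form Y {y_equivalent}")
```

Recomputing such a table within subgroups or across years is one way to examine the degree to which comparability goals have been attained, rather than assuming they have been.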

  20. Across-Consortium Comparability
     - Tukey: “It is better to have an approximate answer to the right question than an exact answer to the wrong question.”
     - In other words, we need to be humble about the types of comparisons that we can and should make in this arena


