
Flexible Latent Trait Metrics: An Application of the Filtered Monotonic Polynomial Item Response Model



  1. Flexible Latent Trait Metrics: An Application of the Filtered Monotonic Polynomial Item Response Model
      Leah Feuerstahler, University of California, Berkeley

  2. Overview
      Premise: In many applications of item response theory (IRT), reported scores are nonlinear transformations of the IRT θ estimates.
      Goal: Develop an IRT framework such that θ is the continuous metric on which scores are reported.

  3. Overview
      [Figure: two panels plotting probability (0 to 1), one against θ (−6 to 6) and one against true score T (0 to 80).]

  4. Overview
      Premise: In many applications of item response theory (IRT), reported scores are nonlinear transformations of the IRT θ estimates.
      Goal: Develop an IRT framework such that θ is the continuous metric on which scores are reported.
      1 Why
        • Why is the IRT θ metric often transformed?
        • Why is an IRT for transformed metrics needed?
      2 How
        • Filtered monotonic polynomial (FMP) item response model
        • Item parameter linking
      3 Applications
        • Functional metric transformations
        • Estimated metric transformations
      4 Considerations, Limitations, Future Directions

  5. Outline: 1 Why; 2 How; 3 Applications; 4 Considerations, Limitations, Future Directions

  6. Scaling
      “[T]he process of associating numbers or other ordered indicators with the performance of examinees” (Kolen & Brennan, 2014, p. 371).
      Scaled scores are often transformations of number-correct scores or IRT $\hat{\theta}$.
      What are the criteria for selecting a scale? Examples:
      1 Facilitates appropriate interpretation by the public
      2 Anchored to external indicators
      3 Consistent with intuitions about how variables should behave

  7. Scaling
      1 Facilitates appropriate interpretation by the public
        • Normalized scores for a representative sample
        • z-scores (mean 0, sd 1)
        • T-scores (mean 50, sd 10)
        • Scores range from 0 to test length, or 0 to 100
        • Domain scores (Bock, Thissen, & Zimowski, 1997)
        • Optimal scores (Ramsay & Wiberg, 2017)
        • Equated number-correct (Stocking, 1996)
        • Constant measurement error
        • ACT scores (arcsine transformation of number-correct) (Kolen, 1988)
        • Constant IRT information (Samejima, 1979)
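
As a hedged illustration of the linear score scales named on this slide, the following Python sketch standardizes a handful of made-up raw scores to z-scores (mean 0, sd 1) and rescales them to T-scores (mean 50, sd 10) via T = 50 + 10z. The raw scores and function names are illustrative assumptions, not taken from the slides, and operational normalized scores may add a percentile-based normalizing step that is not shown here.

```python
import statistics

def z_scores(raw):
    """Standardize raw scores to mean 0 and sd 1 (population sd)."""
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)
    return [(x - mean) / sd for x in raw]

def t_scores(raw):
    """Linearly rescale z-scores to mean 50 and sd 10."""
    return [50.0 + 10.0 * z for z in z_scores(raw)]

raw = [12, 15, 19, 22, 27, 31]  # hypothetical number-correct scores
z = z_scores(raw)
t = t_scores(raw)
```

Both scales are linear in the raw score, so they preserve the ordering and the relative spacing of examinees.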

  8. Scaling
      2 Anchored to external indicators
        • Expected number-correct scores (Stocking, 1996)
        • Grade-equivalent scores (Schulz & Nicewander, 1997)
        • Equating with a different test form
        • Linear relationship with other variables (intended use) (Nunnally, 1967, p. 28)
        • Dollars are nonlinearly related to quality of life (Jones, 1971)
        • Typed words per minute is nonlinearly related to practice/effort (Angoff, 1971, pp. 509-510)

  9. Scaling
      3 Consistent with beliefs about how variables should behave
        • Normally distributed ability (Thurstone, 1925)
        • Uncorrelated difficulty and discrimination parameters (Lord, 1975)
        • Does variability of achievement increase or decrease with grade level?
        • With Thurstonian scales, variability usually increases with grade level
        • IRT scales often exhibit “scale shrinkage” (Camilli, 1988)
        • “Armchair” theorizing can lead to conflicting answers (Yen, 1986, p. 312)
        • Interval level measurement “in some sense” (Kolen & Brennan, 2014, p. 374)

  10. Interval vs. Ordinal
      Stevens (1946): Nominal, Ordinal, Interval, Ratio. Scale type is defined in terms of admissible operations.
        • Ordinal: any monotonic transformation; invariant ordering of observations; median, percentiles; example: hardness of minerals
        • Interval: only linear transformations; meaningful intervals; mean, standard deviation; example: temperature
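
A small Python sketch of why the median is meaningful for ordinal scales while the mean requires interval-level measurement: under a monotonic but nonlinear transformation, the median of the transformed scores equals the transformed median, whereas the mean does not carry over. The scores and the choice of exp as the transformation are illustrative assumptions.

```python
import math
import statistics

scores = [1.0, 2.0, 3.0, 4.0, 10.0]          # hypothetical scores (odd count)
transformed = [math.exp(x) for x in scores]  # monotonic, nonlinear transformation

# The median commutes with a monotonic transformation (odd number of scores).
median_commutes = (
    statistics.median(transformed) == math.exp(statistics.median(scores))
)

# The mean generally does not commute with a nonlinear transformation.
mean_commutes = math.isclose(
    statistics.fmean(transformed), math.exp(statistics.fmean(scores))
)
```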

  11. Interval vs. Ordinal
      Interval-level measurement is highly desirable for educational and psychological tests. What is actually MEANT by interval-level measurement?
        • Only linear transformations are admissible given the IRT model
        • The (Rasch) model fits
        • Declaring that scores are equal-interval “in some sense” (Kolen & Brennan, 2014, p. 374)
        • Scores are linearly related to the underlying construct (Yen, 1986)

  12. Where Does the IRT θ Come From?
      What do item response models assume? Simple case: Mokken’s (1971) monotone homogeneity model (MHM) assumes only
      1 Unidimensionality
      2 Local independence
      3 Monotonicity
      If the MHM assumptions hold, individuals can be ordered uniquely.

  13. Where Does the IRT θ Come From?
      Under the MHM assumptions, any monotonic function of the latent trait implies an equally admissible item response model (Lord, 1975).
      Suppose an IRT model with item response function (IRF) $P_i(\theta)$. For a continuous monotonic function $h$, where $\theta^\star = h^{-1}(\theta)$, another item response model exists such that
      $$P_i(\theta) = P_i^\star[h^{-1}(\theta)] = P_i^\star(\theta^\star).$$
      Any reason to prefer $\theta$ to $\theta^\star$?
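
A minimal Python sketch of this reparameterization argument. It assumes a 2PL IRF with hypothetical parameters (a = 1.2, b = 0.5) and an illustrative monotonic transformation θ = h(θ⋆) = sinh(θ⋆); neither choice comes from the slides. The transformed-metric IRF is defined as P⋆(θ⋆) = P(h(θ⋆)), so both models assign the same response probability to the same examinee.

```python
import math

def irf_2pl(theta, a=1.2, b=0.5):
    """2PL IRF on the original theta metric (a and b are illustrative)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def h(theta_star):
    """Illustrative monotonic transformation theta = h(theta_star)."""
    return math.sinh(theta_star)

def h_inv(theta):
    """Inverse transformation theta_star = h^{-1}(theta)."""
    return math.asinh(theta)

def irf_star(theta_star):
    """IRF on the transformed metric: P*(theta_star) = P(h(theta_star))."""
    return irf_2pl(h(theta_star))

thetas = [-2.0, -0.5, 0.0, 1.0, 2.5]
theta_stars = [h_inv(t) for t in thetas]
# Each pair holds P(theta) and P*(theta_star) for the same examinee.
pairs = [(irf_2pl(t), irf_star(ts)) for t, ts in zip(thetas, theta_stars)]
```

The probabilities agree pair by pair and the examinee ordering is unchanged, so the response data alone cannot distinguish the two metrics.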

  14. Where Does the IRT θ Come From?
      Under the MHM assumptions, an infinite number of IRT models can fit data equally well. Identification restrictions are needed in practice. Two main solutions:
      1 Parametric IRT (PIRT)
        • Specify the IRF shape
        • (Usually) determines scale up to linear transformations
        • Assumes that the chosen IRF shape(s) fits all scale items
      2 Nonparametric IRT (NIRT)
        • Specify the latent trait distribution (e.g., standard normal)
        • Often conditions on (a monotonic transformation of) sum scores
        • Nonparametrically estimates the IRF shape

  15. Nonlinear Transformations of the IRT Metric
      What does not change?
        • Ordering of examinees
        • Percentile rankings
        • Relative efficiency of item response curves
      What does change?
        • Item and test information
        • Standard errors
        • Confidence intervals
        • Reliability

  16. Item Information
      Metric transformations can have dramatic effects on information functions. Lord (1974, p. 353):
      $$I_i(\theta^\star) = I_i(\theta) \left[ \frac{\partial h(\theta^\star)}{\partial \theta^\star} \right]^2$$
      The trait level that maximizes $I_i(\theta)$ need not be the corresponding trait level that maximizes $I_i(\theta^\star)$.
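
A hedged numerical sketch of how item information changes under a metric transformation. It assumes a 2PL item with hypothetical parameters and the illustrative transformation θ = h(θ⋆) = sinh(θ⋆). Information on the θ⋆ metric is computed in two ways: as the θ-metric information multiplied by the squared derivative of h, and directly from the definition [dP⋆/dθ⋆]² / [P⋆(1 − P⋆)] using a central difference. A grid search then locates the maximizer on each metric.

```python
import math

A, B = 1.2, 0.5  # hypothetical 2PL discrimination and difficulty

def irf(theta):
    return 1.0 / (1.0 + math.exp(-A * (theta - B)))

def info(theta):
    """2PL item information on the theta metric: a^2 * P * (1 - P)."""
    p = irf(theta)
    return A * A * p * (1.0 - p)

def h(theta_star):
    """Illustrative transformation theta = h(theta_star)."""
    return math.sinh(theta_star)

def dh(theta_star):
    """Derivative of h with respect to theta_star."""
    return math.cosh(theta_star)

def info_star(theta_star):
    """Information on the theta_star metric: I(theta) * (dh/dtheta_star)^2."""
    return info(h(theta_star)) * dh(theta_star) ** 2

def info_star_numeric(theta_star, eps=1e-5):
    """Same quantity from the definition, using a central difference."""
    p = irf(h(theta_star))
    dp = (irf(h(theta_star + eps)) - irf(h(theta_star - eps))) / (2.0 * eps)
    return dp * dp / (p * (1.0 - p))

grid = [i / 100.0 for i in range(-300, 301)]
argmax_theta = max(grid, key=info)            # maximizer on the theta metric
argmax_theta_star = max(grid, key=info_star)  # maximizer on the theta_star metric
```

Under these assumptions the information peak on the θ metric sits at θ = b, while the peak on the θ⋆ metric maps back to a noticeably different θ value, which illustrates the slide's closing remark.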

  17. Metric Transformations
      [Figure: panel A plots probability (0 to 1) against θ (−3 to 3); panel B plots probability (0 to 1) against θ⋆ (−3 to 3).]

  18. Metric Transformations
      [Figure: panel C plots information (0 to 1) against θ (−3 to 3); panel D plots information (0 to 1) against θ⋆ (−3 to 3).]

  19. Relative Efficiency
      The relative efficiency of two information functions does not change with metric transformations (Lord, 1974; Lord, 1980, p. 89).
      $$\mathrm{RE} = \frac{I_1^\star(\theta_n^\star)}{I_2^\star(\theta_n^\star)} = \frac{I_1(\theta_n)}{I_2(\theta_n)}$$
      The relative information provided by each item is invariant to monotonic transformations of the latent trait. The maximally informative item for a trait level is invariant to metric transformations.
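
The invariance of relative efficiency can be checked numerically. The sketch below assumes two 2PL items with hypothetical parameters and the illustrative transformation θ = sinh(θ⋆). Information on the θ⋆ metric is computed from the definition [dP⋆/dθ⋆]² / [P⋆(1 − P⋆)] with a central difference, so the comparison does not presuppose the result.

```python
import math

def irf(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_theta(theta, a, b):
    """2PL item information on the theta metric."""
    p = irf(theta, a, b)
    return a * a * p * (1.0 - p)

def info_theta_star(theta_star, a, b, eps=1e-5):
    """Information on the theta_star metric, with theta = sinh(theta_star)."""
    p = irf(math.sinh(theta_star), a, b)
    upper = irf(math.sinh(theta_star + eps), a, b)
    lower = irf(math.sinh(theta_star - eps), a, b)
    dp = (upper - lower) / (2.0 * eps)
    return dp * dp / (p * (1.0 - p))

item1 = (1.5, -0.5)  # hypothetical (a, b) for item 1
item2 = (0.8, 1.0)   # hypothetical (a, b) for item 2

def re_theta(theta):
    """Relative efficiency of item 1 to item 2 on the theta metric."""
    return info_theta(theta, *item1) / info_theta(theta, *item2)

def re_theta_star(theta_star):
    """Relative efficiency of item 1 to item 2 on the theta_star metric."""
    return info_theta_star(theta_star, *item1) / info_theta_star(theta_star, *item2)
```

Evaluating `re_theta_star(ts)` and `re_theta(math.sinh(ts))` for any `ts` gives matching values up to numerical error, because the squared derivative of the transformation cancels in the ratio.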

  20. Relative Efficiency
      [Figure: panel A plots probability (0 to 1) against θ (−3 to 3); panel B plots probability (0 to 1) against θ⋆ (−3 to 3).]

  21. Relative Efficiency
      [Figure: panel C plots information (0 to 2) against θ (−3 to 3); panel D plots information (0 to 2) against θ⋆ (−3 to 3).]

  22. Why Specify IRT on a Transformed Metric?
        • Parsimony (avoid multi-step analyses)
        • Many scale transformations (e.g., quadratic) do not enforce monotonicity
        • Computerized adaptive testing (CAT)
        • Many item selection and termination rules are metric-dependent
        • CAT requires computationally efficient methods
        • No need to repeatedly solve for transformed quantities
        • Statistical properties (e.g., bias) of $\hat{\theta}$ can change with metric transformations (Yi et al., 2001)
        • Appropriately account for measurement error when evaluating the relationship between the latent variable and external variables

  23. Desiderata for a Flexible-Metric IRT
        • Continuous, invertible metric transformations
        • Flexible, ability to express any continuous monotonic transformation
        • Model parameters that are readily portable to new contexts
        • Closed-form derivatives for computing information, standard errors, trait estimates
        • Reduction to commonly used IRT models (Rasch, 2PL, 3PL, etc.)

  24. Outline: 1 Why; 2 How; 3 Applications; 4 Considerations, Limitations, Future Directions

  25. Filtered Monotonic Polynomial IRT
      Proposed as a new NIRT model by Liang & Browne (2015). Based on the work of Elphinstone (1983, 1985).
      $$P_i(\theta) = H[m_i(\theta)] = \{1 + \exp[-m_i(\theta)]\}^{-1}$$
      where
      $$m_i(\theta) = b_{0i} + b_{1i}\theta + b_{2i}\theta^2 + \cdots + b_{2k_i+1,i}\theta^{2k_i+1}$$
        • $\mathbf{b}_i = (b_{0i}, b_{1i}, \ldots, b_{2k_i+1,i})'$: item parameters/polynomial coefficients
        • $k_i$: item complexity parameter; higher $k_i$ gives greater flexibility
        • If $k_i = 0$, FMP reduces to the 2PL (slope-intercept parameterization)
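
A minimal sketch of evaluating an FMP item response function in Python, assuming the coefficient vector is supplied directly in the order (b0, b1, ..., b_{2k+1}). The coefficient values are hypothetical and were picked so that the polynomial is increasing. The sketch does not enforce monotonicity, which a full FMP implementation must guarantee through its parameterization.

```python
import math

def fmp_polynomial(theta, b):
    """Evaluate m(theta) = b[0] + b[1]*theta + ... using Horner's rule."""
    if len(b) % 2 != 0:
        raise ValueError("expected 2k+2 coefficients (polynomial degree 2k+1)")
    result = 0.0
    for coef in reversed(b):
        result = result * theta + coef
    return result

def fmp_irf(theta, b):
    """Logistic function applied to the polynomial: P(theta) = H[m(theta)]."""
    return 1.0 / (1.0 + math.exp(-fmp_polynomial(theta, b)))

b_k0 = [0.2, 1.0]            # k = 0: intercept and slope, i.e. a 2PL item
b_k1 = [0.2, 1.0, 0.0, 0.3]  # k = 1: cubic with derivative 1 + 0.9*theta^2 > 0

grid = [i / 10.0 for i in range(-30, 31)]
probs_k1 = [fmp_irf(t, b_k1) for t in grid]
```

With `b_k0`, `fmp_irf` returns the same values as a 2PL in slope-intercept form, matching the reduction noted on the slide.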

  26. Filtered Monotonic Polynomial IRT
      With high enough $k_i$, FMP can closely approximate any IRF that meets the MHM assumptions. Closeness of approximation can be characterized by the root integrated mean squared error (RIMSE; Ramsay, 1991):
      $$\mathrm{RIMSE}_i = \sqrt{\int [\hat{P}_i(\theta) - P_i(\theta)]^2 \, g(\theta) \, d\theta}$$
      where $g(\theta)$ is the standard normal density.
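
A hedged sketch of computing the RIMSE numerically in Python: the squared difference between two IRFs is weighted by the standard normal density and integrated with the trapezoidal rule. The integration limits, the grid size, and both example IRFs are illustrative assumptions. In particular, `approx_irf` is a hand-picked curve and not a fitted FMP model.

```python
import math

def normal_pdf(x):
    """Standard normal density g(theta)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def rimse(p_hat, p_true, lo=-8.0, hi=8.0, n=4000):
    """Square root of the integral of [p_hat - p_true]^2 * g (trapezoidal rule)."""
    step = (hi - lo) / n
    total = 0.0
    for j in range(n + 1):
        theta = lo + j * step
        weight = 0.5 if j in (0, n) else 1.0  # half weight at the endpoints
        diff = p_hat(theta) - p_true(theta)
        total += weight * diff * diff * normal_pdf(theta)
    return math.sqrt(total * step)

def true_irf(theta):
    """Hypothetical four-parameter IRF with asymptotes 0.1 and 0.9."""
    return 0.1 + 0.8 / (1.0 + math.exp(-1.5 * theta))

def approx_irf(theta):
    """Hypothetical 2PL-shaped approximation (not a fitted model)."""
    return 1.0 / (1.0 + math.exp(-theta))

value = rimse(approx_irf, true_irf)
```

As a sanity check on the weighting, a curve that differs from the true IRF by a constant 0.1 has an RIMSE of about 0.1, since the density integrates to 1.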

  27. Example FMP Approximations
      $\mathrm{RIMSE}_i = \{.034, .034, .004\}$ for $k_i = \{0, 1, 2\}$
      [Figure, panel A, “Four-Parameter Model”: probability (0 to 1) against θ (−3 to 3), showing the true IRF and the FMP approximations with $k_i = 0$, $k_i = 1$, and $k_i = 2$.]
