
Bias reduction in the estimation of Rasch models

David Firth (1), d.firth@warwick.ac.uk
Ioannis Kosmidis (2), i.kosmidis@ucl.ac.uk
Heather Turner (1), ht@heatherturner.net

(1) Department of Statistics, University of Warwick
(2) Department of Statistical Science, University College London

Psychoco, 2012

Outline
1. Rasch Models
2. Maximum likelihood estimation
3. Bias reduction
4. Parameterization
5. Application
6. Discussion
References


One-parameter logistic regression (1PL model)

The 1PL Rasch model (a special logistic regression model):

    log( π_is / (1 − π_is) ) = η_is = α_i + γ_s    (i = 1, ..., I; s = 1, ..., S),

where α_i and γ_s are unknown model parameters and η_is is the predictor for the 1PL model.

Independent Bernoulli responses in a subject-item arrangement: Y_is is the outcome of the s-th subject on the i-th item, and π_is = P(Y_is = 1) is the probability that the s-th subject succeeds on the i-th item (i = 1, ..., I; s = 1, ..., S).

Parameter vector: θ = (α_1, ..., α_I, γ_1, ..., γ_S)^T.

Parameter interpretation:
→ α_i (or −α_i): a measure of the "ease" (or "difficulty") of the i-th item.
→ γ_s: the "ability" of the s-th subject.
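As a minimal numerical sketch (not part of the slides), the 1PL success probabilities follow directly from the model formula. The item parameters below match the slides' 1PL example; the subject ability values are illustrative assumptions:

```python
import numpy as np

def irf_1pl(alpha, gamma):
    """1PL success probabilities: pi_is = logit^{-1}(alpha_i + gamma_s).

    alpha : (I,) array of item "ease" parameters
    gamma : (S,) array of subject "ability" parameters
    Returns the I x S matrix of probabilities P(Y_is = 1).
    """
    eta = alpha[:, None] + gamma[None, :]   # eta_is = alpha_i + gamma_s
    return 1.0 / (1.0 + np.exp(-eta))

# Item "ease" values from the slides' 1PL figure; abilities are illustrative.
alpha = np.array([2.0, 0.0, -2.0])
gamma = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
pi = irf_1pl(alpha, gamma)
```

Each row of `pi` traces one item's response probabilities across the five subjects, increasing in ability because the coefficient on γ_s is fixed at 1 in the 1PL model.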

Two-parameter logistic regression (2PL model)

The 2PL Rasch model:

    log( π_is / (1 − π_is) ) = η̃_is = α_i + β_i γ_s    (i = 1, ..., I; s = 1, ..., S),

where β_i is a "discrimination" parameter for the i-th item and η̃_is is the predictor for the 2PL model.

Parameter vector: θ̃ = (α_1, ..., α_I, β_1, ..., β_I, γ_1, ..., γ_S)^T.

The larger |β_i| is, the steeper the Item-Response Function (IRF), i.e. the map from γ_s to π_is.

[Figure: 2PL IRFs, 5 subjects and 3 items. Item 1: α_1 = 2, β_1 = 8; Item 2: α_2 = 0, β_2 = 2; Item 3: α_3 = −2, β_3 = −1.]

Maximum likelihood estimation: advantages

[Figure: 1PL IRFs, 5 subjects and 3 items. Item 1: α_1 = 2; Item 2: α_2 = 0; Item 3: α_3 = −2.]

→ ML estimation is straightforward using generic tools (e.g. gnm uses a quasi Newton-Raphson iteration).
→ Generic inferential procedures (LR tests, likelihood-based confidence intervals).
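The steepness claim can be checked numerically. In this sketch, the first item's parameters come from the slides' 2PL figure (Item 2: α = 0, β = 2), while the flatter comparison item is a hypothetical one added for contrast:

```python
import numpy as np

def irf_2pl(alpha_i, beta_i, gamma):
    """2PL item-response function P(Y_is = 1) over a grid of abilities gamma."""
    return 1.0 / (1.0 + np.exp(-(alpha_i + beta_i * gamma)))

gamma = np.linspace(-10.0, 10.0, 401)
steep = irf_2pl(0.0, 2.0, gamma)   # Item 2 of the figure: alpha = 0, beta = 2
flat = irf_2pl(0.0, 0.5, gamma)    # hypothetical item with smaller |beta|

# At the IRF midpoint (pi = 1/2) the slope equals beta/4, so larger |beta|
# means a steeper IRF.
max_slope_steep = np.max(np.gradient(steep, gamma))
max_slope_flat = np.max(np.gradient(flat, gamma))
```

The maximum slopes come out near β/4 (0.5 and 0.125 here), confirming the visual impression from the figure.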

Maximum likelihood estimation: issues

Under useful asymptotic frameworks (e.g. information grows with the number of subjects or the number of items):

→ Full maximum likelihood generally delivers inconsistent estimates (Andersen, 1980, Chapter 6).
→ Problems with asymptotic inference (e.g. Wald-based inferences).
→ (Partial) solutions: conditional likelihoods, integrated likelihoods, modified profile likelihoods; these can be hard to apply for the 2PL model due to nonlinearity.

As with many models for binomial responses, there is positive probability of boundary ML estimates:

→ Numerical issues in estimation.
→ Loss of performance (e.g. coverage) of tests and confidence intervals.
→ Add small constants to the responses in the spirit of Haldane (1955)?

Bias-reducing adjusted score functions

Firth (1993): an appropriate adjustment A(θ) to the score vector yields estimators with smaller asymptotic bias than ML:

    ∇_θ l(θ) + A(θ) = 0.

Applicable to models where the information on the parameters increases with the number of observations (dim θ is independent of the number of observations).

→ Not the case for Rasch models under useful asymptotic frameworks, but we still expect less-biased estimators than ML.
→ In binomial/multinomial response GLMs, the reduced-bias estimates have been found to be always finite (Heinze and Schemper 2002; Bull et al. 2002; Zorn 2005; Kosmidis 2009).
→ Easy implementation: iterative bias correction (Kosmidis and Firth 2010), or iterated ML fits on pseudo-data (Kosmidis and Firth 2011).
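The finiteness property can be illustrated with a small, self-contained sketch of Firth-type bias-reduced logistic regression: a plain Newton iteration on the adjusted scores, not the authors' implementation. The toy data are assumptions chosen to be completely separated, a case where ordinary ML estimates diverge to infinity:

```python
import numpy as np

def firth_logistic(X, y, n_iter=100, tol=1e-10):
    """Bias-reduced logistic regression (Firth 1993), a minimal sketch.

    Solves the adjusted score equations
        sum_j { y_j + h_j/2 - (1 + h_j) pi_j } x_j = 0
    by Newton iteration, where h_j are the hat values.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = pi * (1.0 - pi)                       # v_j = var(Y_j)
        F = X.T @ (w[:, None] * X)                # Fisher information
        h = np.diag(X @ np.linalg.solve(F, X.T) * w)   # hat values
        score = X.T @ (y + 0.5 * h - (1.0 + h) * pi)   # adjusted score
        step = np.linalg.solve(F, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Completely separated toy data: y jumps from 0 to 1 as the covariate
# crosses zero, so the ML estimate of the slope is +infinity.
X = np.column_stack([np.ones(4), [-1.0, -0.5, 0.5, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
beta = firth_logistic(X, y)
```

Despite the separation, the reduced-bias slope estimate is finite, in line with the finiteness results cited above.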

Adjusted score equations for 1PL (Firth 1993, logistic regressions)

    0 = ∑_{i=1}^{I} ∑_{s=1}^{S} { y_is + h_is/2 − (1 + h_is) π_is } z_ist    (t = 1, ..., I + S),

where

→ z_ist = ∂η_is/∂θ_t is the (s, t)-th element of the S × (I + S) matrix Z_i,
→ h_is is the s-th diagonal element of H_i = Z_i F^{−1} Z_i^T Σ_i (the "hat value" for the (i, s)-th observation),
→ F = ∑_{i=1}^{I} Z_i^T Σ_i Z_i (the Fisher information),
→ Σ_i = diag{v_i1, ..., v_iS}, with v_is = var(Y_is).

Adjusted score equations for 2PL (Kosmidis and Firth 2009, GNMs)

    0 = ∑_{i=1}^{I} ∑_{s=1}^{S} { y_is + h̃_is/2 − (1 + h̃_is) π_is + c_is v_is } z̃_ist    (t = 1, ..., 2I + S),

where

→ z̃_ist = ∂η̃_is/∂θ̃_t is the (s, t)-th element of the S × (2I + S) matrix Z̃_i,
→ h̃_is is the "hat value" for the (i, s)-th observation,
→ F̃ = ∑_{i=1}^{I} Z̃_i^T Σ_i Z̃_i,
→ Σ_i = diag{v_i1, ..., v_iS}, with v_is = var(Y_is) = π_is(1 − π_is),
→ c_is is the asymptotic covariance of the ML estimators of β_i and γ_s (from the components of F̃^{−1}).
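To make the quantities in the 1PL equations concrete, here is a small illustration (an assumption-laden sketch, not the authors' code) that builds the stacked 1PL design matrix and evaluates the hat values. A pseudoinverse is used because F is rank-deficient in this raw parameterization: adding a constant to every α_i and subtracting it from every γ_s leaves η_is unchanged:

```python
import numpy as np

def design_1pl(I, S):
    """Stacked 1PL design matrix: row (i, s) holds d eta_is / d theta_t for
    theta = (alpha_1, ..., alpha_I, gamma_1, ..., gamma_S).

    Stacking the per-item blocks Z_i gives this (I*S) x (I+S) matrix.
    """
    Z = np.zeros((I * S, I + S))
    for i in range(I):
        for s in range(S):
            Z[i * S + s, i] = 1.0        # d eta_is / d alpha_i
            Z[i * S + s, I + s] = 1.0    # d eta_is / d gamma_s
    return Z

I_items, S_subj = 3, 5                   # sizes from the slides' examples
Z = design_1pl(I_items, S_subj)
theta = np.zeros(I_items + S_subj)       # illustrative parameter value
pi = 1.0 / (1.0 + np.exp(-(Z @ theta)))
w = pi * (1.0 - pi)                      # v_is = var(Y_is) = pi(1 - pi)
F = Z.T @ (w[:, None] * Z)               # Fisher information, rank I + S - 1
h = np.diag(Z @ np.linalg.pinv(F) @ Z.T * w)   # hat values h_is
```

A useful check: the hat values sum to the rank of F (here I + S − 1 = 7), the generalized-hat-matrix analogue of the familiar trace identity for logistic regression.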
