
Screening Linking Items with Varied Approaches
C. Allen Lau, Ph.D., Pearson
Liru Zhang, Ph.D., Delaware Department of Education
6/17/2011



Equating
Equating is a statistical process used to adjust scores on test forms so that scores on these equated forms can be used interchangeably (Kolen & Brennan, 2004).
• Common-item nonequivalent groups design
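For concreteness, under the Rasch model the common-item design is often operationalized with mean/mean linking: the constant that places one calibration on the other's scale is the mean difference of the anchor items' difficulty estimates. The slides do not prescribe this particular linking method, so the following is only a minimal Python sketch with hypothetical item values:

    import numpy as np

    # Difficulty estimates (logits) for the same anchor items
    # from two separate Rasch calibrations (hypothetical values).
    b_old = np.array([-1.20, -0.35, 0.10, 0.80, 1.45])
    b_new = np.array([-1.05, -0.30, 0.22, 0.95, 1.58])

    # Mean/mean linking constant: the shift that places the new
    # calibration on the old scale.
    shift = np.mean(b_old) - np.mean(b_new)

    # Apply the shift to any new-form parameter or ability estimate.
    b_new_on_old_scale = b_new + shift
    print(f"linking constant = {shift:+.3f} logits")

Because the linking constant is a simple mean over the anchor items, a single unstable (DIF) anchor item can bias it, which is why the screening methods on the following slides matter.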

Differential item functioning (DIF)
DIF exists when examinees of the same ability from different groups have a different probability of giving a certain response on a test item.
• DIF analysis can provide an indication of unexpected item behavior.
• DIF analysis methods can also be employed to assess the stability of linking items in equating.

Net and global DIF
Penfield (2010) defined two types of differential item functioning: net DIF and global DIF.
• Net DIF concerns the item response across score points.
• Global DIF concerns the item response within each score point.

Observed-score based (OSB) method for detecting net and global DIF
• Conceptually, Mantel matches net DIF while the generalized Mantel-Haenszel (GMH) statistic matches global DIF in the IRT partial credit model.
• In practice, Mantel is suitable for identifying net DIF while GMH is suitable for identifying global DIF.
• An item is flagged if its p-value is less than a preset significance level, say 0.05 (i.e., p < 0.05).
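As a concrete illustration of the OSB flagging rule, below is a minimal Python sketch of the Mantel statistic for one polytomous item (the GMH statistic, which compares the full response distribution within each score point, is omitted for brevity). The function name and data layout are illustrative assumptions, not taken from the slides:

    import numpy as np
    from scipy.stats import chi2

    def mantel_dif(item, group, total):
        """Mantel chi-square for one polytomous item (net DIF).

        item  : item scores (e.g., 0..3) per examinee
        group : 0 = reference, 1 = focal
        total : matching variable (e.g., total or rest score)
        Returns (chi2_stat, p_value); df = 1.
        """
        F = E = V = 0.0
        for k in np.unique(total):
            s = total == k
            y_f = item[s & (group == 1)]
            y_r = item[s & (group == 0)]
            n_f, n_r = len(y_f), len(y_r)
            n = n_f + n_r
            if n_f == 0 or n_r == 0 or n < 2:
                continue  # stratum carries no DIF information
            y_all = item[s]
            F += y_f.sum()                    # observed focal-group sum
            E += n_f * y_all.sum() / n        # expected sum under no DIF
            V += (n_f * n_r * (n * (y_all**2).sum() - y_all.sum()**2)
                  / (n**2 * (n - 1)))         # hypergeometric variance
        if V == 0:
            return 0.0, 1.0                   # no usable strata
        stat = (F - E) ** 2 / V
        return stat, chi2.sf(stat, 1)

    # Flag the linking item if the returned p-value is below the
    # preset criterion from the slide, e.g., p < 0.05.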

IRT-based method for detecting net and global DIF
Item parameter value comparison (IPVC) approach
• Measure DIF by comparing the parameter values of the same item estimated from different groups (see the code sketch after the Methods slide below).
1. Average item step-parameter value comparison (AISVC) – detecting net DIF
2. Item step-parameter value comparison (ISVC) – detecting global DIF
An item is flagged if the difference is larger than a preset criterion, say 0.5 logit in absolute value (i.e., |D| > 0.5).

Study: methods for screening linking items
Investigate different screening methods to identify unstable anchor items in IRT equating, using the Rasch and partial credit models under different DIF conceptions.
Screening methods: IRT-based and observed-score based methods
Other independent variables:
• type of DIF (net or global)
• equating sample size
• DIF intensity
• flagging criterion

Methods (1)
• Monte Carlo simulation
• 64 combinations of conditions
Independent variable: detecting method
• Item parameter value comparison (IPVC) method: AISVC and ISVC
• Observed-score based (OSB) method: Mantel and GMH
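Here is a minimal sketch of the two IPVC flags for one partial-credit item, assuming the step parameters have already been estimated separately in the two groups and placed on a common scale. The values are hypothetical, and treating "any step exceeding the criterion" as the ISVC flagging rule is an assumption about the rule, not something stated on the slide:

    import numpy as np

    # Step-parameter estimates (logits) for one partial-credit item,
    # calibrated separately in two groups (hypothetical values).
    steps_g1 = np.array([-0.80, 0.10, 1.20])
    steps_g2 = np.array([-0.20, 0.70, 1.30])

    D_CRIT = 0.5  # flagging criterion in logits (|D| > 0.5)

    # AISVC (net DIF): compare the average step parameter.
    aisvc = abs(steps_g1.mean() - steps_g2.mean())
    net_flag = aisvc > D_CRIT

    # ISVC (global DIF): compare each step parameter separately;
    # flag if any single step differs by more than the criterion.
    isvc = np.abs(steps_g1 - steps_g2)
    global_flag = bool((isvc > D_CRIT).any())

    print(f"AISVC D = {aisvc:.2f}, net-DIF flag: {net_flag}")
    print(f"ISVC  D = {isvc.round(2)}, global-DIF flag: {global_flag}")

In this example two individual steps shift by 0.60 logit while the average shifts by only 0.43, so the item is flagged for global DIF but not for net DIF, which illustrates the kind of item the within-score-point comparison catches and the averaged comparison misses.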

Methods (2)
DIF conception
• Net DIF
• Global DIF
Equating sample size
• 8000
• 4000
• 2000
• 1000

Methods (3)
DIF intensity (in logits)
• 0.5
• 1.0
Flagging criterion
• For the IPVC approach: |D| > 0.5 logit or |D| > 0.3 logit
• For the OSB approach: p < 0.05 or p < 0.01

Methods (4)
Evaluation criterion (dependent variable)
• False positive: a non-DIF item is classified as a DIF item
• False negative: a DIF item is classified as a non-DIF item
• Total error: false positives + false negatives
• Accuracy: 1 − total error
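Once each linking item's flag is compared with its simulated DIF status, the evaluation criteria above reduce to simple counts. A minimal sketch with hypothetical arrays, interpreting total error and accuracy as rates over the set of linking items:

    import numpy as np

    # Per linking item: was DIF simulated, and was the item flagged?
    # (Hypothetical illustration; not data from the study.)
    true_dif = np.array([0, 0, 1, 1, 0, 1, 0, 0], dtype=bool)
    flagged  = np.array([0, 1, 1, 0, 0, 1, 0, 0], dtype=bool)

    false_pos = np.sum(~true_dif & flagged)   # non-DIF item flagged
    false_neg = np.sum(true_dif & ~flagged)   # DIF item missed
    total_err = false_pos + false_neg
    accuracy  = 1 - total_err / len(true_dif)

    print(f"FP={false_pos}, FN={false_neg}, accuracy={accuracy:.3f}")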

Results: detecting method
Accuracy rate
• IPVC (IRT-based): AISVC 0.972, ISVC 0.955
• Observed-score based: Mantel 0.896, GMH 0.906
On average, the IPVC method outperformed the OSB method by 6.3% in accuracy.

Results: DIF conception
Accuracy rate
• Net DIF: 0.937
• Global DIF: 0.931
Across conditions, the accuracy rates under the net DIF and global DIF conceptions were very close.

Results: equating sample size
Accuracy rate
• N = 8000: IPVC 0.972, OSB 0.806
• N = 4000: IPVC 0.979, OSB 0.875
• N = 2000: IPVC 0.972, OSB 0.965
• N = 1000: IPVC 0.931, OSB 0.958
The accuracy of the IPVC approach was largely independent of sample size, while the accuracy of the OSB approach was negatively correlated with sample size.

Results: DIF intensity
Accuracy rate
• DIF intensity 0.5: IPVC 0.948, OSB 0.906
• DIF intensity 1.0: IPVC 0.979, OSB 0.896
The IPVC approach was sensitive to DIF intensity; the OSB approach was not.

Results: flagging criterion
Accuracy rate
• IPVC: |D| > 0.5: 0.955; |D| > 0.3: 0.972
• OSB: p < 0.05: 0.851; p < 0.01: 0.951
Both methods performed better under stricter flagging criteria, especially the OSB approach.

Summary & Discussion (1)
• Both IRT-based and OSB methods can be applied to screen linking items; the IPVC approach looks especially promising.
• No significant difference was found between the two DIF conceptions.
• Like other χ² tests, Mantel and GMH are sensitive to sample size; both committed many more false-positive errors when the equating sample size was large.
• The flagging philosophy and mechanism differ between the IRT-based and OSB approaches:
– IPVC aims at detecting the strength of DIF
– OSB aims at controlling Type I error

Summary & Discussion (2)
• Compared with OSB, the flagging criterion in the IPVC approach is more natural to set, and it effectively captures DIF intensity.
• In this study, IPVC was found to be the better approach for screening linking items in terms of accuracy, convenience, and information:
– The average accuracy of IPVC was 6.3% higher than that of OSB.
– The OSB approach requires running an extra analysis.
– The IPVC result provides the DIF intensity value and its direction (the value can be positive, negative, or zero).
– The IPVC result is more stable across different sample sizes.

Thank you
