Selection of Linking Items
• Subset of items that maximally reflect the scale information function
  – The scale information function is the target
  – Linear programming solver (in R, lp_solve 5.5)
• min(y)
• Subject to
  – a constraint relating the summed information of the selected items to the scale information and y, for all θs, where θs ∈ {−4, −3.95, …, 3.95, 4}
  – a constraint on the sum of the item selection indicators
  – each selection indicator takes a value in {0, 1}, for all items
  – y ≥ 0
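The selection problem above can be illustrated on a toy scale. The sketch below is not the lp_solve model from the slide: it swaps the integer program for brute-force enumeration, assumes 2PL items, and assumes the target is the full-scale information rescaled to the size of the linking set. The item parameters and the names `item_information_2pl` and `select_linking_items` are hypothetical.

```python
from itertools import combinations
import math

def item_information_2pl(a, b, theta):
    """Fisher information of a 2PL item (assumed model) at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_linking_items(items, n_link, grid):
    """Return (subset, y): the n_link-item subset with the smallest worst-case gap y."""
    info = [[item_information_2pl(a, b, t) for t in grid] for a, b in items]
    # Assumed target: full-scale information rescaled to the linking-set size.
    target = [sum(col) * n_link / len(items) for col in zip(*info)]
    best_subset, best_y = None, float("inf")
    for subset in combinations(range(len(items)), n_link):
        # y is the largest absolute deviation from the target over the theta grid.
        y = max(abs(sum(info[j][s] for j in subset) - target[s])
                for s in range(len(grid)))
        if y < best_y:
            best_subset, best_y = subset, y
    return best_subset, best_y

# Grid from the slide: -4 to 4 in steps of 0.05 (161 points).
grid = [-4 + 0.05 * s for s in range(161)]
# Hypothetical (a, b) parameters for a six-item scale.
items = [(1.2, -1.5), (0.8, -0.5), (1.5, 0.0), (1.0, 0.7), (1.3, 1.6), (0.9, 2.2)]
subset, y = select_linking_items(items, 3, grid)
print(subset, round(y, 4))
```

Enumeration is only feasible for very small scales; with a realistic item bank, an integer programming solver such as the one named on the slide is the practical choice.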
An example: Subscale 2
• Sum of information functions for 6-, 7-, and 8-item linking sets
An example: Subscale 3
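Why is Fisher information useful?
• In multidimensional CAT
  – The volume of the confidence ellipsoid around $\hat\theta$ is proportional to $\det(I(\hat\theta))^{-1/2}$, where $I(\hat\theta)$ is the Fisher information matrix (Anderson, 1984)
  – Maximize the determinant of the Fisher information matrix (Segall, 1996; Wang & Chang, 2011): the D-optimal method
  – Select the next item so that the determinant of the updated information matrix is as large as possible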
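Fisher information vs. confidence ellipse
• $I(\hat\theta) = \begin{pmatrix} 15 & 0 \\ 0 & 10 \end{pmatrix}$, $\Sigma = I^{-1}(\hat\theta) = \begin{pmatrix} 0.067 & 0 \\ 0 & 0.1 \end{pmatrix}$ (Wang, et al., 2013)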
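Fisher information vs. confidence ellipse
• $I(\hat\theta) = \begin{pmatrix} 50 & 0 \\ 0 & 25 \end{pmatrix}$, $\Sigma = I^{-1}(\hat\theta) = \begin{pmatrix} 0.02 & 0 \\ 0 & 0.04 \end{pmatrix}$ (Wang, et al., 2013)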
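The two pairs of matrices on these slides can be checked numerically. The sketch below inverts the diagonal information matrices and compares the areas of the resulting 95% confidence ellipses, assuming the estimator is approximately bivariate normal; the function names are illustrative only.

```python
import math

def covariance_from_information(info_diag):
    """Invert a diagonal information matrix given by its diagonal entries."""
    return [1.0 / value for value in info_diag]

def ellipse_area(cov_diag, level=0.95):
    """Area of the confidence ellipse for a 2-D normal with diagonal covariance."""
    # The chi-square quantile with 2 degrees of freedom has a closed form.
    quantile = -2.0 * math.log(1.0 - level)
    return math.pi * quantile * math.sqrt(cov_diag[0] * cov_diag[1])

cov_a = covariance_from_information([15.0, 10.0])   # first slide
cov_b = covariance_from_information([50.0, 25.0])   # second slide
area_a = ellipse_area(cov_a)
area_b = ellipse_area(cov_b)
print([round(v, 3) for v in cov_a], [round(v, 3) for v in cov_b])
print(area_b < area_a)  # more information, smaller ellipse
```

The ratio of the two areas equals the square root of the ratio of the information determinants (150 and 1250), which is the relationship the D-optimal criterion exploits.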
Mini-max mechanism
• D-optimal criterion: maximize the determinant of the updated Fisher information matrix
  – Assuming there are three dimensions, the determinant can be expanded term by term
  [Equation: expansion of the determinant of the updated information matrix for three dimensions]
• This criterion tends to pick the items that reduce the variance of the estimator that is lagging behind the most
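A small numeric illustration of the mini-max behaviour, under stated assumptions: the current information matrix is diagonal, and each candidate item contributes a rank-one information matrix of the form weight × a aᵀ (as for a dichotomous multidimensional item). The matrices, loadings, and function names are hypothetical, not taken from the study.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def add_item(info, loadings, weight):
    """Return info + weight * a a^T for an item with loading vector a."""
    return [[info[r][c] + weight * loadings[r] * loadings[c] for c in range(3)]
            for r in range(3)]

def d_optimal_pick(info, candidates):
    """Index of the candidate maximizing the determinant of the updated matrix."""
    dets = [det3(add_item(info, a, w)) for a, w in candidates]
    return max(range(len(candidates)), key=dets.__getitem__)

# Dimension 2 has the least information so far (largest variance).
current = [[20.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 12.0]]
# Three equally informative items, each loading on a single dimension.
candidates = [([1.0, 0.0, 0.0], 0.25),
              ([0.0, 1.0, 0.0], 0.25),
              ([0.0, 0.0, 1.0], 0.25)]
picked = d_optimal_pick(current, candidates)
print(picked)  # index of the item on the lagging dimension
```

With equal item quality, the determinant gain is largest for the dimension with the least accumulated information, so the item on dimension 2 is chosen.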
Item bank information
Domain/Content balancing
• Constraint-weighted D-optimal (Wang et al., 2017)
  – Suppose that, for each domain k = 1, …, D, the maximum and minimum numbers of items are set in advance
  – Track the number of items from domain k administered so far; n is the current test length, and a maximum test length is also fixed
  – An indicator records whether item j belongs to domain k
  – The criterion for item j is weighted by a product over domains of constraint weights (Cheng, et al., 2009)
  [Equation: definitions of the two domain weights]
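As a rough illustration of constraint weighting, the sketch below multiplies each candidate's determinant by the remaining quota fraction of its domain, in the style of a priority index. This is a simplification that uses only the upper bounds; it is an assumption, not the two-weight formula from the slide, and all names and numbers are hypothetical.

```python
def content_weight(item_domains, counts, upper):
    """Product over the item's domains of the remaining quota, (upper - count) / upper."""
    weight = 1.0
    for k in item_domains:
        weight *= (upper[k] - counts[k]) / upper[k]
    return weight

def pick_item(criteria, domains, counts, upper):
    """Index of the item with the largest constraint-weighted criterion."""
    scores = [crit * content_weight(doms, counts, upper)
              for crit, doms in zip(criteria, domains)]
    return max(range(len(scores)), key=scores.__getitem__)

criteria = [1260.0, 1225.0, 1215.0]   # hypothetical determinants per candidate item
domains = [[0], [1], [2]]             # domain membership of each candidate
upper = [10, 10, 10]                  # hypothetical maximum items per domain
counts = [9, 2, 5]                    # items administered so far per domain
chosen = pick_item(criteria, domains, counts, upper)
print(chosen)
```

Here the unweighted criterion favours item 0, but its domain is nearly full, so the weighted criterion selects item 1 instead. A domain that has reached its maximum gets weight zero and is effectively closed.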
A simulation study
• Sample size N = 2,000
• θ generated from a multivariate normal with mean of 0's and covariance matrix Σ
• Maximum a posteriori (MAP) estimation is used, with a multivariate normal prior with mean of 0's
• Evaluation criterion: root mean squared error (RMSE), shown here for the first domain:
$$\mathrm{RMSE}(\hat\theta_1) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat\theta_{i1} - \theta_{i1}\right)^2}$$
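The evaluation criterion is straightforward to compute per domain. A minimal sketch, with made-up true and estimated values for four simulees on one domain:

```python
import math

def rmse(estimates, truths):
    """Root mean squared error between estimated and true trait values."""
    n = len(truths)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n)

# Hypothetical values for one domain; the study itself used N = 2,000.
theta_true = [0.0, 1.0, -1.0, 0.5]
theta_hat = [0.3, 0.7, -1.0, 0.5]
value = rmse(theta_hat, theta_true)
print(round(value, 4))
```

In a multidimensional setting the function would be applied once per domain, giving one RMSE per trait.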
Results: Domain-level recovery
[Figure: D-optimal (-) vs. random selection (---)]
[Figure: D-optimal (-) vs. constraint-weighted D-optimal (--)]
Reducing Test Length
[Figure: confidence interval of θ against test length, θ = (0, 0, 0)]
[Figure: confidence interval of θ against test length, θ = (2, 2, 2)]
Variable-length CAT: Stopping rule
[Diagram: test progression from Start toward 300+ items]
• Stop when the measurement precision criterion is satisfied (Dodd, Koch, & De Ayala, 1993; Boyd, Dodd, & Choi, 2010)
• Candidate criteria (Wang et al., 2013):
  – (a) Volume of the confidence ellipsoid (D-rule)
  – (b) Sum of S.E. per domain θ
  – (c) Maximum axis of the confidence ellipsoid
  – (d) Kullback-Leibler divergence between two consecutive posteriors
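Criteria (a) to (c) can all be read off the covariance matrix of the trait estimate. The sketch below computes them for a two-dimensional case, which is a simplification chosen so the eigenvalues have a closed form; the covariance values and key names are hypothetical, and criterion (d) is not covered.

```python
import math

def stopping_statistics(cov):
    """Summaries of a symmetric 2x2 covariance matrix [[s11, s12], [s12, s22]]."""
    s11, s12, s22 = cov[0][0], cov[0][1], cov[1][1]
    det = s11 * s22 - s12 * s12
    half_trace = (s11 + s22) / 2.0
    # Largest eigenvalue of a symmetric 2x2 matrix in closed form.
    largest_eig = half_trace + math.sqrt(max(half_trace ** 2 - det, 0.0))
    return {
        "d_rule": math.sqrt(det),                    # proportional to the ellipse area
        "se_sum": math.sqrt(s11) + math.sqrt(s22),   # sum of standard errors
        "max_axis": math.sqrt(largest_eig),          # proportional to the longest semi-axis
    }

stats = stopping_statistics([[0.04, 0.0], [0.0, 0.01]])
print({name: round(value, 4) for name, value in stats.items()})
```

Each statistic shrinks as information accumulates, so a test can stop once the chosen statistic falls below a preset threshold.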
Cumulated information growth
[Figure: determinant of the Fisher information matrix against test length]
Stopping rule
• When θ̂ does not change much: theta-convergence rule (T-rule)
  – Stop when the change in θ̂ between consecutive items is below 0.01
(Babcock & Weiss, 2012; Wang et al., 2017+)
Why is the T-rule secondary?
• 2PL
  – The update of θ̂ from one more item is confined to a bounded interval (Chang & Ying, 2008)
  – It does not monotonically decrease when test length increases!
    • Terminates the test prematurely
  – Undermines test efficiency
    • Usually, the criterion is SE(θ̂) < .2 (Dodd, et al., 1993), which corresponds to information above 25
    • If hypothetically a = 1, satisfying the .01 threshold on the change in θ̂ corresponds to a threshold of 50 instead
(Wang et al., 2017+)
MGRM
• Simple structure
  [Equations: three cases by response category, k = 0, k ∈ {1, …, K − 2}, and k = K − 1, involving exponential terms]
(Wang et al., 2017+)
MGRM
• Complex structure
  – If item j measures the p-th trait: [Equation; callouts mark the p-th element of a vector and the amount of information carried by item j]
  – If item j measures multiple traits: [Equation]
(Wang et al., 2017+)
Primary vs. Secondary stopping rules
[Flowchart: Start, minimum test length, maximum test length, 300+ items]
• Is the D-rule satisfied?
  – Yes: stop
  – No: is the T-rule satisfied?
    • Yes: stop
    • No: continue
• Chart annotations: 94.9% and 5.1% on the two branches; test-length labels 28.5 and 61.5
(Babcock & Weiss, 2012; Wang et al., 2017+)
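The flowchart can be sketched as a single decision function called after each administered item. The 0.01 tolerance for the T-rule comes from the slides; the minimum and maximum lengths used as defaults here are placeholder values, and the function and argument names are hypothetical.

```python
def should_stop(n_items, d_rule_met, theta_change,
                min_len=10, max_len=120, t_tol=0.01):
    """Return (stop, reason) for the current state of a variable-length test.

    min_len and max_len are placeholder values, not taken from the study.
    """
    if n_items >= max_len:
        return True, "maximum length"
    if n_items < min_len:
        return False, "below minimum length"
    if d_rule_met:                       # primary rule
        return True, "D-rule (primary)"
    if theta_change < t_tol:             # secondary rule
        return True, "T-rule (secondary)"
    return False, "continue"

print(should_stop(30, True, 0.5))
print(should_stop(30, False, 0.005))
print(should_stop(30, False, 0.5))
```

Checking the D-rule first means the T-rule only ends tests for which the precision target has not been reached, which is the sense in which it is secondary.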
Stopping rule results
[Figure: SE against θ for Applied Cognition, Daily Activity, and Mobility]
3D plot
Stopping rule (cont.)

| Test length: Mean | Test length: SD | Overall precision: Bias | Overall precision: RMSE | Overall precision: Determinant | Primary stop: Actual | Primary stop: Eventual |
|---|---|---|---|---|---|---|
| 28.5 | 13.3 | 0.005 | 0.303 | 514.7 | 0.949 | 0.965 |

Difference between eventual and actual primary stop: 1.6%

| Group | Test length at stop: Mean | Test length at stop: SD | Test length at end: Mean | Test length at end: SD | Bias at stop | Bias at end | RMSE at stop | RMSE at end |
|---|---|---|---|---|---|---|---|---|
| N=31 | 58.7 | 15.3 | 72.2 | 15.5 | 0.162 | 0.136 | 0.430 | 0.391 |
| N=71 | 64.5 | 13.0 | 120 | 0 | 0.207 | 0.204 | 0.592 | 0.525 |
Outline
• Brief introduction to computerized adaptive testing (CAT)
• Multidimensional CAT
• “Computerized Adaptive Testing to Direct Delivery of Hospital-Based Rehabilitation” (NIH R01HD079439, 2015-2020)
  – Item bank calibration
  – Item selection
  – Stopping rules
• Ongoing projects