
Selection of Linking Items


  1. Selection of Linking Items • Subset of items that maximally reflect the scale information function – Denote the scale information as […] – Linear programming solver (in R, lp_solve 5.5) • min(y) • Subject to – $\sum_i [\ldots]$, a bound involving $y$ on the summed item information at each θ, for all θs, where $\theta \in \{-4, -3.95, \ldots, 3.95, 4\}$ – $\sum_i x_i = n$ – $x_i \in \{0, 1\}$ for all $i$ – $y \ge 0$
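The selection problem on this slide can be illustrated on a toy bank. The following is a minimal sketch, not the lp_solve formulation itself: it replaces the solver with an exhaustive search, and it assumes 2PL item information, a target equal to the full-scale information scaled by the share of items selected, and an objective that minimizes the largest absolute gap `y` over the θ grid. The item parameters in `BANK` and all function names are hypothetical.

```python
import math
from itertools import combinations

# Theta grid from the slide: -4, -3.95, ..., 3.95, 4 (161 points).
GRID = [-4 + 0.05 * k for k in range(161)]

def info_2pl(theta, a, b):
    """2PL item information a^2 * P * (1 - P) (the 2PL model is an assumption)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def select_linking_items(items, n_link, grid=GRID):
    """Exhaustive stand-in for the LP: choose n_link items minimizing y.

    y is the largest absolute gap, over the grid, between the summed
    information of the chosen items and a scaled copy of the full-scale
    information (the scaling is an assumption of this sketch).
    """
    curves = [[info_2pl(t, a, b) for t in grid] for a, b in items]
    share = n_link / len(items)
    target = [share * sum(c[k] for c in curves) for k in range(len(grid))]
    best_set, best_y = None, math.inf
    for subset in combinations(range(len(items)), n_link):
        y = max(
            abs(sum(curves[i][k] for i in subset) - target[k])
            for k in range(len(grid))
        )
        if y < best_y:
            best_set, best_y = subset, y
    return best_set, best_y

# Hypothetical (a, b) parameters for a six-item scale.
BANK = [(1.2, -1.5), (0.8, -0.5), (1.5, 0.0), (1.0, 0.5), (0.9, 1.0), (1.3, 1.8)]
chosen, gap = select_linking_items(BANK, 3)
print(chosen, round(gap, 4))
```

Exhaustive search is only workable for small banks; for realistic bank sizes a 0-1 linear programming solver, as named on the slide, is the practical choice.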

  2. An example: Subscale 2 • Sum of Information Functions for 6-, 7-, and 8-Item Linking Sets

  3. An example: Subscale 3

  4. Why is Fisher information useful? • In multidimensional CAT – The volume of the confidence ellipsoid around $\hat{\boldsymbol{\theta}}$ is proportional to $|I^{-1}(\hat{\boldsymbol{\theta}})|^{1/2}$, the square root of the determinant of the inverse Fisher information matrix (Anderson, 1984) – Maximize the determinant of the Fisher information matrix (Segall, 1996; Wang & Chang, 2011): the D-optimal method
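To make the D-optimal idea concrete, here is a minimal sketch of one selection step. It assumes two dimensions and a multidimensional 2PL item whose information is $P(1-P)\,\mathbf{a}\mathbf{a}^\top$; the deck itself works with a graded response model, so the item model, the parameter values, and the function names are all illustrative assumptions.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def item_info(theta, a, d):
    """M2PL item information P * (1 - P) * a a^T (assumed item model)."""
    z = a[0] * theta[0] + a[1] * theta[1] + d
    p = 1.0 / (1.0 + math.exp(-z))
    w = p * (1.0 - p)
    return [[w * a[r] * a[c] for c in range(2)] for r in range(2)]

def d_optimal_pick(theta, current_info, candidates):
    """Return (index, determinant) of the candidate maximizing det(I + I_j)."""
    best_j, best_det = None, -math.inf
    for j, (a, d) in enumerate(candidates):
        inf_j = item_info(theta, a, d)
        updated = [
            [current_info[r][c] + inf_j[r][c] for c in range(2)]
            for r in range(2)
        ]
        det = det2(updated)
        if det > best_det:
            best_j, best_det = j, det
    return best_j, best_det

# Dimension 2 has less accumulated information than dimension 1.
current = [[4.0, 0.0], [0.0, 1.0]]
# Two hypothetical items, one loading on each dimension.
pool = [((1.5, 0.0), 0.0), ((0.0, 1.5), 0.0)]
pick, det = d_optimal_pick((0.0, 0.0), current, pool)
print(pick, det)  # 1 6.25
```

With equally discriminating candidates, the item loading on the less-measured dimension wins, because adding information there raises the determinant more.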

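  5. Fisher information vs. confidence ellipse • $I(\hat{\boldsymbol{\theta}}) = \begin{pmatrix} 15 & 0 \\ 0 & 10 \end{pmatrix}$, $\Sigma = I^{-1}(\hat{\boldsymbol{\theta}}) = \begin{pmatrix} 0.067 & 0 \\ 0 & 0.1 \end{pmatrix}$ (Wang et al., 2013)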

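  6. Fisher information vs. confidence ellipse • $I(\hat{\boldsymbol{\theta}}) = \begin{pmatrix} 50 & 0 \\ 0 & 25 \end{pmatrix}$, $\Sigma = I^{-1}(\hat{\boldsymbol{\theta}}) = \begin{pmatrix} 0.02 & 0 \\ 0 & 0.04 \end{pmatrix}$ (Wang et al., 2013)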
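The two examples (information diagonals 15, 10 and 50, 25) can be compared numerically. This is a small sketch assuming the estimator is approximately bivariate normal with covariance $I^{-1}$, so that the confidence ellipse at a given level has area $\pi c \sqrt{|\Sigma|}$, where $c$ is the chi-square quantile with 2 degrees of freedom, $c = -2\ln(1 - \text{level})$. The function name is hypothetical.

```python
import math

def ellipse_area(info, level=0.95):
    """Area of the 2-D confidence ellipse implied by a 2x2 information matrix.

    Uses Sigma = info^{-1}, so sqrt(det Sigma) = 1 / sqrt(det info).
    For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - level).
    """
    det_info = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    c = -2.0 * math.log(1.0 - level)
    return math.pi * c / math.sqrt(det_info)

area_a = ellipse_area([[15.0, 0.0], [0.0, 10.0]])  # first example
area_b = ellipse_area([[50.0, 0.0], [0.0, 25.0]])  # second example
print(round(area_a, 3), round(area_b, 3), round(area_a / area_b, 3))
```

The ratio of the two areas depends only on the determinants, $\sqrt{1250 / 150}$, which is why maximizing the determinant of the information matrix shrinks the ellipse.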

  7. Mini-max mechanism • Assuming there are three dimensions, the determinant of the information matrix is expanded as det(·) = […] • This criterion tends to pick the items that minimize the variance of the estimator that lags behind the most
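One way to see the mini-max behaviour is through the matrix determinant lemma. The derivation below is a sketch under an assumption that is not taken from the slide: the candidate item's information is rank one, $I_j = w_j \mathbf{a}_j \mathbf{a}_j^\top$ with $w_j > 0$, and $I^{(k-1)}$ denotes the information accumulated over the first $k-1$ items.

```latex
\begin{align*}
\det\!\left(I^{(k-1)} + w_j \mathbf{a}_j \mathbf{a}_j^\top\right)
  &= \det\!\left(I^{(k-1)}\right)
     \left(1 + w_j\, \mathbf{a}_j^\top \left(I^{(k-1)}\right)^{-1} \mathbf{a}_j\right). \\
\intertext{If item $j$ loads only on dimension $p$, so that $\mathbf{a}_j = a_{jp}\,\mathbf{e}_p$, this reduces to}
\det\!\left(I^{(k-1)} + I_j\right)
  &= \det\!\left(I^{(k-1)}\right)
     \left(1 + w_j\, a_{jp}^2 \left[\left(I^{(k-1)}\right)^{-1}\right]_{pp}\right).
\end{align*}
```

Since $\left[\left(I^{(k-1)}\right)^{-1}\right]_{pp}$ approximates the current variance of $\hat{\theta}_p$, the gain in the determinant is largest for items that measure the dimension with the largest variance, which matches the remark that the criterion favours the estimator lagging behind the most.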

  8. Item bank Information

  9. Domain/Content balancing • Constraint-weighted D-optimal (Wang et al., 2017) – Suppose that for each domain $k = 1, \ldots, D$ a maximum and a minimum number of items are set in advance – Track the # of items belonging to domain $k$ so far; $n$ is the current test length, and a maximum test length is fixed – An indicator records whether item $j$ belongs to domain $k$ – Each item's criterion is weighted by a product ($\prod$) over domains (Cheng et al., 2009) […]
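The weighting idea can be sketched as follows. This is a simplified illustration, not the formula on the slide: it assumes a single weight per domain equal to the fraction of the maximum quota still unused, and it ignores the minimum-count constraint and the maximum test length. All names and numbers are hypothetical.

```python
def priority_weight(item_domains, counts, max_items):
    """Product, over the domains an item belongs to, of the unused quota fraction.

    Assumed simplification: weight_k = (max_k - count_k) / max_k.
    """
    weight = 1.0
    for k in item_domains:
        weight *= (max_items[k] - counts[k]) / max_items[k]
    return weight

def constraint_weighted_pick(det_values, item_domains, counts, max_items):
    """Pick the item with the largest weighted determinant criterion."""
    scores = [
        det * priority_weight(domains, counts, max_items)
        for det, domains in zip(det_values, item_domains)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Three candidate items, one per domain, with hypothetical D-optimal values.
det_values = [6.0, 5.0, 4.0]
item_domains = [[0], [1], [2]]
counts = [9, 2, 10]        # items administered so far per domain
max_items = [10, 10, 10]   # maximum allowed per domain
best, scores = constraint_weighted_pick(det_values, item_domains, counts, max_items)
print(best, scores)
```

The item with the highest raw determinant (index 0) loses here because its domain is nearly full, and an item from a domain that has reached its maximum gets a score of zero.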

  10. A simulation study • Sample size N = 2,000 • Multivariate normal, with mean of 0's and covariance matrix Σ • Maximum a Posteriori (MAP) is used, and prior is multivariate normal with mean of 0's and […] • Evaluation criterion: root mean squared error (RMSE), $\mathrm{RMSE}(\hat{\theta}_1) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\hat{\theta}_{i1} - \theta_{i1})^2}$
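The evaluation criterion is computed per domain across simulees. A minimal sketch follows, with bias added because it is reported alongside RMSE later in the deck; the function names and the toy data are illustrative only.

```python
import math

def rmse(estimates, truths, domain):
    """Root mean squared error of one domain's estimates across simulees."""
    n = len(truths)
    total = sum((est[domain] - tru[domain]) ** 2 for est, tru in zip(estimates, truths))
    return math.sqrt(total / n)

def bias(estimates, truths, domain):
    """Mean signed error of one domain's estimates across simulees."""
    n = len(truths)
    return sum(est[domain] - tru[domain] for est, tru in zip(estimates, truths)) / n

# Four hypothetical simulees, two domains each: (theta_1, theta_2).
true_theta = [(0.0, 1.0), (0.0, -1.0), (1.0, 0.0), (1.0, 0.5)]
est_theta = [(0.5, 1.0), (-0.5, -1.0), (1.0, 0.0), (0.0, 0.5)]

print(rmse(est_theta, true_theta, 0), bias(est_theta, true_theta, 0))
```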

  11. Results: Domain-level recovery • D-optimal (-) vs. Random selection (---)

  12. Results: Domain-level recovery • D-optimal (-) vs. Constraint-weighted D-optimal (--)

  13. Results: Domain-level recovery • D-optimal (-) vs. Constraint-weighted D-optimal (--)

  14. Reducing Test Length

  15. (0, 0, 0) [Figure: θ confidence interval vs. test length]

  16. (2, 2, 2) [Figure: θ confidence interval vs. test length]

  17. Variable-length CAT: Stopping rule • Start • 300+ items

  18. Stopping rule • Start • 300+ items • When the measurement precision criterion is satisfied (Dodd, Koch, & De Ayala, 1993; Boyd, Dodd, & Choi, 2010)

  19. Stopping rule • Start • 300+ items • (a) Volume of the confidence ellipsoid (D-rule) • (b) Sum of S.E. per domain θ • (c) Maximum axis of the confidence ellipsoid • (d) Kullback-Leibler divergence between two consecutive posteriors (Wang et al., 2013)
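The first three quantities can all be read off the covariance matrix of the current estimate. The sketch below assumes three dimensions and a symmetric positive-definite covariance matrix, reports each quantity only up to a proportionality constant, and uses power iteration for the largest eigenvalue. Rule (d) is left out, and the function names and example matrix are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

def largest_eigenvalue(m, iters=200):
    """Largest eigenvalue of a symmetric positive-definite matrix (power iteration)."""
    v = [1.0] * len(m)
    lam = 0.0
    for _ in range(iters):
        w = [sum(row[c] * v[c] for c in range(len(v))) for row in m]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam

def stopping_statistics(cov):
    """Quantities behind rules (a)-(c), each up to a constant factor."""
    return {
        "volume": math.sqrt(det3(cov)),                      # (a) ellipsoid volume
        "se_sum": sum(math.sqrt(cov[p][p]) for p in range(3)),  # (b) sum of S.E.
        "max_axis": math.sqrt(largest_eigenvalue(cov)),      # (c) longest semi-axis
    }

# Hypothetical covariance with standard errors 0.2, 0.3, 0.4.
cov = [[0.04, 0.0, 0.0], [0.0, 0.09, 0.0], [0.0, 0.0, 0.16]]
stats = stopping_statistics(cov)
print(stats)
```

Each rule would compare its statistic with a preset cut-off; no cut-off values are assumed here.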

  20. Cumulated information growth [Figure: determinant of Fisher information matrix vs. test length]

  21. Stopping rule • Start • 300+ items

  22. Stopping rule • Start • 300+ items

  23. Stopping rule • Start • 300+ items • When $\hat{\boldsymbol{\theta}}$ does not change much: theta-convergence rule (T-rule), change in $\hat{\boldsymbol{\theta}}$ below 0.01 (Babcock & Weiss, 2012; Wang et al., 2017+)

  24. Why is the T-rule secondary? • 2PL: […] is in the interval of ([…], […]) (Chang & Ying, 2008)

  25. Why is the T-rule secondary? • 2PL: […] is in the interval of ([…], […]) – It does not monotonically decrease when test length increases! • Terminate test prematurely (Wang et al., 2017+)

  26. Why is the T-rule secondary? • 2PL: […] is in the interval of ([…], […]) – Undermine test efficiency • Usually, the SE($\hat{\theta}$) < .2 (Dodd et al., 1993), so […] 25 • If hypothetically […] = 1, satisfying […] < .01, then […] 50 (Wang et al., 2017+)

  27. MGRM • Simple structure – [Equation: three-case expression with cases "= 0", "∈ {1, …, · − 2}", and "= · − 1"; involves exp(·)] (Wang et al., 2017+)

  28. MGRM • Simple structure – [Equation: three-case expression with cases "= 0", "∈ {1, …, · − 2}", and "= · − 1"; involves exp(·)] (Wang et al., 2017+)

  29. MGRM • Complex structure – If item j measures the pth trait: […] (Wang et al., 2017+)

  30. MGRM • Complex structure – If item j measures the pth trait: […] – pth element of […]: the amount of information carried by item j (Wang et al., 2017+)

  31. MGRM • Complex structure – If item j measures the pth trait: […] (Wang et al., 2017+)

  32. MGRM • Complex structure – If item j measures the pth trait: […] – If item j measures multiple traits: […] (Wang et al., 2017+)

  33. Primary vs. Secondary stopping rules • Minimum test length • Start • 300+ items (Babcock & Weiss, 2012; Wang et al., 2017+)

  34. Primary vs. Secondary stopping rules • Minimum test length • Start • 300+ items • Is the D-rule satisfied? (Wang et al., 2017+)

  35. Primary vs. Secondary stopping rules • Minimum test length • Start • 300+ items • Is the D-rule satisfied? (No / Yes) • Is the T-rule satisfied? (Wang et al., 2017+)

  36. Primary vs. Secondary stopping rules • Minimum test length • Start • 300+ items • Is the D-rule satisfied? (No / Yes) • Is the T-rule satisfied? (No / Yes) • Continue (Wang et al., 2017+)

  37. Primary vs. Secondary stopping rules • Minimum test length • Maximum test length • Start • 300+ items • Is the D-rule satisfied? (No / Yes) • Is the T-rule satisfied? (No / Yes) • Continue • Annotations: 94.9%, 28.5, 5.1%, 61.5 (Wang et al., 2017+)
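The flowchart can be read as a decision function evaluated after each administered item. The sketch below is one reading of it, under stated assumptions: the D-rule is checked as "determinant of the information matrix at or above a cut-off", the T-rule as "change in the estimate below 0.01", and the maximum test length acts as a final cap. The cut-off values in the example calls are hypothetical, not taken from the study.

```python
def should_stop(n_items, det_info, theta_change, *, min_len, max_len,
                det_cutoff, change_cutoff=0.01):
    """Return (stop, reason) following the primary/secondary rule order.

    Assumed order: minimum length, then D-rule (primary), then T-rule
    (secondary), then the maximum test length.
    """
    if n_items < min_len:
        return False, "continue"
    if det_info >= det_cutoff:
        return True, "D-rule"
    if theta_change < change_cutoff:
        return True, "T-rule"
    if n_items >= max_len:
        return True, "max length"
    return False, "continue"

# Hypothetical settings for illustration only.
settings = dict(min_len=10, max_len=120, det_cutoff=500.0)

print(should_stop(5, 900.0, 0.001, **settings))    # below minimum length
print(should_stop(30, 520.0, 0.05, **settings))    # primary rule met
print(should_stop(60, 200.0, 0.004, **settings))   # secondary rule met
print(should_stop(60, 200.0, 0.05, **settings))    # neither rule met
print(should_stop(120, 200.0, 0.05, **settings))   # capped at maximum length
```

Checking the D-rule before the T-rule means the reported reason reflects the primary rule whenever both are satisfied at the same item.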

  38. Stopping rule results [Figure: SE of θ for Applied Cognition, Daily Activity, and Mobility]

  39. 3D plot

  40. Stopping rule (cont.) • Test length: Mean 28.5, SD 13.3 • Overall precision: Bias 0.005, RMSE 0.303, Determinant 514.7 • Primary stop: Actual 0.949, Eventual 0.965 • 1.6%

  41. Stopping rule (cont.) • Test length: Mean 28.5, SD 13.3 • Overall precision: Bias 0.005, RMSE 0.303, Determinant 514.7 • Primary stop: Actual 0.949, Eventual 0.965 • 1.6% • N=31: test length at Stop 58.7 (SD 15.3), at End 72.2 (SD 15.5); Bias 0.162 (Stop), 0.136 (End); RMSE 0.430 (Stop), 0.391 (End) • N=71: test length at Stop 64.5 (SD 13.0), at End 120 (SD 0); Bias 0.207 (Stop), 0.204 (End); RMSE 0.592 (Stop), 0.525 (End)

  42. Outline • Brief introduction to computerized adaptive testing (CAT) • Multidimensional CAT • “Computerized Adaptive Testing to Direct Delivery of Hospital-Based Rehabilitation” (NIH R01HD079439, 2015-2020) – Item bank calibration – Item selection – Stopping rules • Ongoing projects
