Efficiency of Scoring Innovative Items in Educational Assessment


  1. Efficiency of Scoring Innovative Items in Educational Assessment Shudong Wang NWEA Paper presented at the NCSA National Conference on Student Assessment June 24-26, 2019, Orlando, Florida

  2. I. Introduction
  ▪ Choosing Item Format/Type in Assessment
    ✓ Selected-response/Objective scoring
    ✓ Constructed-response/Objective scoring
    ✓ Constructed-response/Subjective scoring
  ▪ Computer Use in Education and Technology-Enhanced Item (TEI) Types (Zenisky & Sireci, 2002; Bennett, 1993)
    ✓ Selection/identification (drag-and-drop, hot-spot)
    ✓ Reordering/rearrangement (concept-mapping, create-a-tree)
    ✓ Completion (graphical modeling, mathematical expressions)
    ✓ Construction (generating examples, formulating hypotheses, essay/short answer, passage-editing)
    ✓ Presentation (problem-solving vignettes, role play)

  3. [Screenshots of example TEI formats: Choice Multiple, Graphic Gap Match]

  4. ▪ Advantages and Disadvantages of Multiple-Choice (MC) Items and TEIs
  MC:
    ✓ Advantages: efficient administration, automated scoring, broad content coverage, and high reliability
    ✓ Disadvantages: difficult to write MC items that evoke complex cognitive processes
  TEI:
    ✓ Advantages: improved construct representation; facilitates more authentic and direct measurement of knowledge, skills, and abilities (KSA) than the MC format allows (higher fidelity)
    ✓ Disadvantages: a source of construct-irrelevant variance, such as computer literacy
  ▪ Five Dimensions of TEI
    ✓ Item format
    ✓ Response action
    ✓ Media inclusion
    ✓ Level of interactivity
    ✓ Scoring method

  5. ▪ Relationship between Score (D vs. P) and Item Type (MC vs. TEI)
  MC items yield dichotomous (D) scores; TEIs can be scored either dichotomously (D_D, D_P) or polytomously (P_P).
  ▪ There are three commonly used scoring methods for TEIs (N is the number of components):
    1. N method
    2. N/2 method
    3. All-or-nothing method (AONM)
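The following minimal sketch illustrates the three scoring methods applied to a single multi-component TEI response; the function names and the floor convention used for the N/2 method are assumptions for illustration, not taken from the presentation.

```python
# Sketch of the three TEI component-scoring methods named above.
# The floor convention for the N/2 method is an assumption.

def score_n(components):
    """N method: score = number of correct components."""
    return sum(components)

def score_n_half(components):
    """N/2 method: collapse the component count into roughly half as many categories."""
    return sum(components) // 2

def score_aonm(components):
    """All-or-nothing method (AONM): full credit only if every component is correct."""
    return int(all(components))

# Example: a 4-component drag-and-drop response with 3 of 4 parts correct.
response = [1, 1, 1, 0]
print(score_n(response), score_n_half(response), score_aonm(response))  # 3 1 0
```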

  6. Table 1. Examples of Different Scoring Methods (an item with N = 4 components, D1-D4, and 5 total score categories)

     Score   Response   D1   D2   D3   D4   N   N/2
             0          0    0    0    0    0   0
     D_D     1          1    0    0    0    1   0
             2          1    1    0    0    2   1
             3          1    1    1    0    3   1
     D_P     4          1    1    1    1    4   2
     P_P     D: 0, 1    P: 0, 1, 2, 3, 4

     Collapse rules from P (0-4) to D (0, 1): 0 = (0), 1 = (1, 2, 3, 4); 0 = (0, 1), 1 = (2, 3, 4); 0 = (0, 1, 2), 1 = (3, 4); AONM: 0 = (0, 1, 2, 3), 1 = (4)

  7. ▪ Review of Research on Scoring Methods for TEI Types
  Table 2. Types of Research (all involve the item level and the test level)
    1. Relationship between Dichotomous (D) and Polytomous (P); response time not involved; result: P is better than D
    2. Relationship between Dichotomous (D) and Polytomous (P); response time involved; result: D is better than P
    3. Partial Credit Scoring Method; response time involved in some studies; result: the optimal method is better than both the N and N/2 methods
    4. Relationship between Dichotomous-D (D_D) and Dichotomous-P (D_P); response time not involved; result: ? (the current research)
  *: Efficiency is defined as the mean weighted item information divided by the average time spent on an item within an item type (Wan & Henly, 2012).
  References for 1 & 2: Ripkey & Case, 1996; Jiao et al., 2012; Bauer et al., 2011; Ben-Simon et al., 1997; Wan & Henly, 2012. For 3: Muckle, Becker, & Wu, 2011; Becker & Soni, 2013; Lorié, 2014; Clyne, 2015; Tao, 2018; Tao & Mix, 2017. For 4: the current research.
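As a worked restatement of the footnote's definition of efficiency (the symbols are ours, not the presentation's):

```latex
% Efficiency of an item type (Wan & Henly, 2012): mean weighted item
% information for the item type divided by the mean time spent per item.
\text{Efficiency} = \frac{\bar{I}_{\text{weighted}}}{\bar{t}_{\text{item}}}
```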

  8. Purpose of This Study: to investigate the efficiency of scoring methods for TEIs in educational assessments

  9. II. Method
  1. The Monte Carlo technique seems to be an appropriate choice; both descriptive methods and inferential procedures are used in this study.
  2. Independent variable: scoring method (MC, CR3, 1CR4, 2CR4, 1CR5, 2CR5, 3CR5), as defined in Table 3.

     Table 3. Scoring Methods (a code sketch of the collapse rules follows below)
       MC:   item type MC;  2 categories; ORS 0, 1;          NRS 0, 1; collapse rule: none
       CR3:  item type CR3; 3 categories; ORS 0, 1, 2;       NRS 0, 1; collapse rule: 0 = (0), 1 = (1, 2)
       1CR4: item type CR4; 4 categories; ORS 0, 1, 2, 3;    NRS 0, 1; collapse rule: 0 = (0), 1 = (1, 2, 3)
       2CR4: item type CR4; 4 categories; ORS 0, 1, 2, 3;    NRS 0, 1; collapse rule: 0 = (0, 1), 1 = (2, 3)
       1CR5: item type CR5; 5 categories; ORS 0, 1, 2, 3, 4; NRS 0, 1; collapse rule: 0 = (0), 1 = (1, 2, 3, 4)
       2CR5: item type CR5; 5 categories; ORS 0, 1, 2, 3, 4; NRS 0, 1; collapse rule: 0 = (0, 1), 1 = (2, 3, 4)
       3CR5: item type CR5; 5 categories; ORS 0, 1, 2, 3, 4; NRS 0, 1; collapse rule: 0 = (0, 1, 2), 1 = (3, 4)
     (ORS = original response string; NRS = new response string)

  3. Dependent variables: p-value, point-biserial, KR20 reliability, test information, and test efficiency (the ratio of test information between two tests).
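A small sketch of the Table 3 collapse rules applied to original response strings; the dictionary layout and function name are illustrative choices, not the study's code.

```python
# Collapse rules from Table 3: map an original polytomous score (ORS)
# to a dichotomous score (NRS). Keys follow the scoring-method labels.

COLLAPSE_RULES = {
    "CR3":  {0: 0, 1: 1, 2: 1},
    "1CR4": {0: 0, 1: 1, 2: 1, 3: 1},
    "2CR4": {0: 0, 1: 0, 2: 1, 3: 1},
    "1CR5": {0: 0, 1: 1, 2: 1, 3: 1, 4: 1},
    "2CR5": {0: 0, 1: 0, 2: 1, 3: 1, 4: 1},
    "3CR5": {0: 0, 1: 0, 2: 0, 3: 1, 4: 1},
}

def collapse(ors, method):
    """Collapse a list of original polytomous scores into 0/1 scores."""
    rule = COLLAPSE_RULES[method]
    return [rule[x] for x in ors]

# Example: a CR5 item's original scores collapsed under the 2CR5 rule.
print(collapse([0, 1, 2, 3, 4], "2CR5"))  # [0, 0, 1, 1, 1]
```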

  10. 4. Major Steps of the Simulation
  Step 1: Generate person parameters (2,000 persons) and item parameters (20 items for each scoring method) for each test: MC(20) + CR3(20), MC(20) + CR4(20), MC(20) + CR5(20)
  Step 2: Generate item responses with the Rasch and PCM models for each 40-item test (a generation sketch follows below)
  Step 3: Collapse the original CR response strings into dichotomous (D_P) response strings using the collapsing rules of the different scoring methods
  Step 4: Calibrate item parameters with person parameters fixed
  Step 5: Repeat Steps 2 to 4 100 times (100 simulated tests); person parameters differ across the 100 replications while item parameters are fixed
  Step 6: Calculate item and test statistics with the CTT and IRT methods (the five types of dependent variables) from the results of Steps 4 and 5
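A compact sketch of Steps 1 and 2. The normal distributions used for the person and item parameters are assumptions, since the slides do not state the generating distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (sketch): person and item parameters. N(0, 1) draws are an assumption.
n_persons, n_mc, n_cr = 2000, 20, 20
theta = rng.normal(0.0, 1.0, n_persons)        # person abilities
b_mc = rng.normal(0.0, 1.0, n_mc)              # Rasch difficulties for MC items
# PCM step parameters for CR items with 5 categories (0-4); sorting is a convenience.
steps = np.sort(rng.normal(0.0, 1.0, (n_cr, 4)), axis=1)

def rasch_responses(theta, b):
    """Step 2a: dichotomous responses under the Rasch model."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    return (rng.random(p.shape) < p).astype(int)

def pcm_responses(theta, steps):
    """Step 2b: polytomous responses under the partial credit model."""
    n_cat = steps.shape[1] + 1
    # Cumulative sums of (theta - step_k) give the category numerators.
    z = np.zeros((len(theta), steps.shape[0], n_cat))
    for k in range(1, n_cat):
        z[:, :, k] = z[:, :, k - 1] + (theta[:, None] - steps[None, :, k - 1])
    p = np.exp(z)
    p /= p.sum(axis=2, keepdims=True)
    cum = p.cumsum(axis=2)
    u = rng.random(cum.shape[:2])[:, :, None]
    return (u > cum).sum(axis=2)

mc_resp = rasch_responses(theta, b_mc)   # 2000 x 20 matrix of 0/1 responses
cr_resp = pcm_responses(theta, steps)    # 2000 x 20 matrix of 0-4 responses
```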

  11. III. Results
  1. Item/Test Analysis Results from CTT

     Table 4. Overall Means (20 Items) of p-Value, Point-biserial, and KR20 for Different Scoring Methods

     Scoring Method   p-Value   Point-biserial   KR20
     D_MC             0.52      0.44             0.78
     D_CR3            0.67      0.46             0.81
     D_1CR4           0.71      0.50             0.84
     D_2CR4           0.50      0.55             0.88
     D_1CR5           0.72      0.48             0.80
     D_2CR5           0.58      0.51             0.84
     D_3CR5           0.47      0.50             0.84
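A brief sketch of how the CTT statistics in Table 4 can be computed from a persons-by-items 0/1 matrix; the helper name is illustrative, and the point-biserial here uses the uncorrected total score.

```python
import numpy as np

def ctt_stats(X):
    """Classical test theory statistics for a persons-by-items 0/1 matrix X."""
    X = np.asarray(X, dtype=float)
    n_items = X.shape[1]
    total = X.sum(axis=1)
    p = X.mean(axis=0)                           # item p-values
    # Point-biserial: correlation of each item with the total score.
    pbis = np.array([np.corrcoef(X[:, j], total)[0, 1] for j in range(n_items)])
    # KR20 reliability.
    kr20 = (n_items / (n_items - 1)) * (1 - (p * (1 - p)).sum() / total.var(ddof=1))
    return p, pbis, kr20

# Example with a tiny 0/1 matrix (rows = persons, columns = items).
X = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [1, 1, 0]]
p, pbis, kr20 = ctt_stats(X)
```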

  12. 2. Item/Test Analysis Results from IRT
  Figure 1. Person Test Information from Both Dichotomous and Polytomous Responses Based on True Item Parameters for a Given Test 1 (Replication 1)
  [Plot of test information against theta (-4 to 4) for Inf_MC, Inf_CR3, Inf_CR4, and Inf_CR5]

  13. Figure 2. Person Test Information from Dichotomous Responses Based on Estimated Item Parameters by Different Scoring Methods for a Given Test 80 (Replication 80)
  [Plot of test information against theta (-4 to 4) for Inf_D_MC, Inf_D_CR3, Inf_D_1CR4, Inf_D_2CR4, Inf_D_1CR5, Inf_D_2CR5, and Inf_D_3CR5]

  14. Figure 3. Relative Efficiency of Person Tests with Non-MC Dichotomous Response Items Over MC Responses Based on Estimated Item Parameters by Different Scoring Methods for a Given Test 80 (Replication 80)
  [Plot of relative efficiency against theta (-4 to 4) for EF_D_CR3, EF_D_1CR4, EF_D_2CR4, EF_D_1CR5, EF_D_2CR5, and EF_D_3CR5]
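A short sketch of how relative efficiency curves like those in Figure 3 can be computed: Rasch test information for a dichotomously scored test, divided by the information of the MC test at each theta (using the ratio-of-test-information definition stated in the Method section). The difficulty values below are placeholders, not estimates from the study.

```python
import numpy as np

def rasch_test_information(theta_grid, b):
    """Test information I(theta) = sum over items of P_i(theta) * (1 - P_i(theta)) under the Rasch model."""
    p = 1.0 / (1.0 + np.exp(-(theta_grid[:, None] - b[None, :])))
    return (p * (1 - p)).sum(axis=1)

theta_grid = np.linspace(-4, 4, 81)
# Placeholder difficulty estimates for the 20 MC items and one collapsed TEI test.
b_mc = np.linspace(-2, 2, 20)
b_d_2cr4 = np.linspace(-1.5, 1.5, 20)

info_mc = rasch_test_information(theta_grid, b_mc)
info_d_2cr4 = rasch_test_information(theta_grid, b_d_2cr4)

# Relative efficiency of the collapsed TEI test over the MC test at each theta.
relative_efficiency = info_d_2cr4 / info_mc
```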

  15. Table 5. Overall Average of Test Information and Efficiency for Different Scoring Methods

     Dependent Variable   Type   Scoring Method   N     MIN     MAX     MEAN    STD    SEM
     Information          I      inf_MC           100   3.54    3.63    3.58    0.01   0.53
                                 inf_CR3          100   6.79    6.91    6.84    0.02   0.38
                                 inf_CR4          100   11.60   11.91   11.73   0.05   0.29
                                 inf_CR5          100   12.43   12.56   12.49   0.02   0.28
                          II     inf_D_MC         100   3.58    3.68    3.62    0.02   0.53
                                 inf_D_CR3        100   2.92    3.02    2.96    0.02   0.58
                                 inf_D_1CR4       100   2.69    2.83    2.76    0.03   0.60
                                 inf_D_2CR4       100   3.20    3.25    3.22    0.01   0.56
                                 inf_D_1CR5       100   2.23    2.70    2.47    0.07   0.64
                                 inf_D_2CR5       100   2.24    2.51    2.39    0.11   0.65
                                 inf_D_3CR5       100   2.18    2.25    2.21    0.01   0.67
     Efficiency           I      EF_CR3           100   1.92    1.94    1.93    0.00
                                 EF_CR4           100   3.25    3.27    3.26    0.00
                                 EF_CR5           100   3.53    3.60    3.57    0.01
                          II     EF_D_CR3         100   0.81    0.84    0.82    0.01
                                 EF_D_1CR4        100   0.75    0.79    0.77    0.01
                                 EF_D_2CR4        100   0.89    0.91    0.90    0.00
                                 EF_D_1CR5        100   0.63    0.76    0.69    0.02
                                 EF_D_2CR5        100   0.63    0.72    0.68    0.03
                                 EF_D_3CR5        100   0.61    0.65    0.64    0.01
                          III    EF_D_CR3M        100   0.43    0.45    0.44    0.00
                                 EF_D_1CR4M       100   0.24    0.25    0.24    0.00
                                 EF_D_2CR4M       100   0.28    0.28    0.28    0.00
                                 EF_D_1CR5M       100   0.18    0.22    0.20    0.01
                                 EF_D_2CR5M       100   0.18    0.20    0.19    0.01
                                 EF_D_3CR5M       100   0.17    0.18    0.18    0.00

  16. 3. Inferential Statistics Results
  Statistical hypotheses: the scoring method has no effect on any of the dependent variables under the different simulation conditions. All hypotheses were rejected; that is, the scoring method makes a difference for every dependent variable.
  Summary of Results
  ▪ The efficiency of person scores increases as the number of response categories increases
  ▪ On average, the information of D_P responses is less than that of D_D responses
  ▪ Under the simulation conditions, the optimal number of categories for the D_P scoring methods is 4, not 5

  17. IV. Conclusions
  1. Different scoring methods have an impact on the efficiency of scores
  2. Scoring a TEI as MC does not increase efficiency
  3. A large number of categories (or components) is not necessarily the best choice for the D_P scoring method

  18. Thank you! For any questions: Shudong.wang@NWEA.org
