
Changes in Test Scores with Multiple Sittings of CanTEST
Philip Nagy

Rationale

Research Questions
• Do test scores change on repeating the test?
• Is change related to length of time between sittings?

Test Development Questions
• Can data from repeaters be used in test calibration for new form development?

Context: Receptive Skills

Official Languages and Bilingualism Institute

The Data

Listening Tests: Six forms with 15 short-passage and 25 long-passage items
Reading Tests: Seven forms with 15 skim-and-scan, 20 reading passage, and 25 cloze items
The Sample: Mean first score of 3.6, compared to 4.3 for those who write only once

Assumptions
• Difficulty of forms is balanced across sittings (true)
• Samples writing each form are equivalent (untested)

Listening Results: Sitting 2 minus Sitting 1 (N=179)

| Change in Raw Score | Total Test (40) | Short Passages (15) | Long Passages (25) |
|---|---|---|---|
| Down >11 | 3 | | 1 |
| Down 6 to 10 | 18 | 2 | 11 |
| Down 3 to 5 | 18 | 24 | 22 |
| Same ± 2 | 43 | 91 | 72 |
| Up 3 to 5 | 42 | 42 | 46 |
| Up 6 to 10 | 36 | 20 | 24 |
| Up >11 | 19 | | 3 |

Listening Results, another look

| Change in Raw Score | Total Test (40) | Short Passages (15) | Long Passages (25) |
|---|---|---|---|
| Down some | 22% | 15% | 19% |
| About the same | 24% | 51% | 40% |
| Up some | 54% | 34% | 41% |
| Mean raw gain | 2.6 | 1.3 | 1.3 |
| Mean % gain | 6.5% of 40 items | 8.8% of 15 items | 5.2% of 25 items |
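The percentages on this slide can be reproduced from the counts on the previous slide by collapsing the seven change bands into "down", "same", and "up". The sketch below does that for the Total Test and Long Passages columns; the function and variable names are illustrative, not part of the original analysis.

```python
def collapse_changes(counts, n):
    """Collapse seven change bands (largest drop first) into
    (down, same, up) percentages, rounded to whole numbers."""
    down = sum(counts[:3])   # the three "Down" bands
    same = counts[3]         # the "Same +/- 2" band
    up = sum(counts[4:])     # the three "Up" bands
    assert down + same + up == n, "counts should add up to the sample size"
    return tuple(round(100 * part / n) for part in (down, same, up))


N_LISTENING = 179
# Counts as read from the Listening Results slide.
total_counts = [3, 18, 18, 43, 42, 36, 19]
long_counts = [1, 11, 22, 72, 46, 24, 3]

total_pct = collapse_changes(total_counts, N_LISTENING)
long_pct = collapse_changes(long_counts, N_LISTENING)
print(total_pct)  # (22, 24, 54)
print(long_pct)   # (19, 40, 41)
```

The same function applies to the Short Passages column, although whole-number rounding may differ from the slide by a percentage point.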

Listening Results Interpretation

How important is the improvement?
• On average, 3.6 points needed out of 40 to improve one band
• So, 2.6 points is about 75% of a band improvement
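The band comparison is a single division: mean raw gain over the raw points needed per band. A minimal sketch, with a hypothetical helper name, using the two figures quoted on the slide:

```python
def gain_in_bands(mean_raw_gain, points_per_band):
    """Express a raw-score gain as a fraction of one band."""
    return mean_raw_gain / points_per_band


# Listening: 2.6 raw points gained, 3.6 raw points per band (from the slide).
listening_fraction = gain_in_bands(2.6, 3.6)
print(f"{listening_fraction:.0%}")  # 72%
```

The exact ratio is roughly 72%, which the slide states as "about 75%".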

Listening Results Interpretation

Can the data be used for test calibration?
• The changes in average item difficulty are different for the subtests
  • .088 for short passages
  • .052 for long passages
• The difference of .036 (.088 - .052) is about the same as the standard error of the difficulty indices
• Listening data from repeaters should not be used for item calibration
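The slide does not say how the standard error of a difficulty index was obtained. As a rough plausibility check only, the sketch below assumes the index is a proportion correct and uses the binomial standard error at p = 0.5 with N = 179; both the formula choice and p = 0.5 are assumptions, not taken from the presentation.

```python
import math


def proportion_se(p, n):
    """Binomial standard error of a proportion p estimated from n examinees."""
    return math.sqrt(p * (1 - p) / n)


short_gain = 0.088  # change in average difficulty, short passages (slide)
long_gain = 0.052   # change in average difficulty, long passages (slide)
difference = short_gain - long_gain

# ASSUMPTION: mid-range item (p = 0.5) and the listening repeater sample size.
se_assumed = proportion_se(0.5, 179)
ratio = difference / se_assumed

print(f"difference = {difference:.3f}")
print(f"assumed SE = {se_assumed:.3f}")
print(f"ratio      = {ratio:.2f}")
```

Under these assumptions the difference comes out close to one standard error, which is in line with the slide's statement.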

Changes in Listening by Length of Time between Sittings

| Time Between Tests | Total Test | Short Passages | Long Passages |
|---|---|---|---|
| > 6 months (N=63) | +2.13 | +0.63¹ | +1.49 |
| < 6 months (N=116) | +2.87 | +1.18¹ | +1.69 |

¹ Difference significant, p=0.05

Those who repeat sooner do better than those who repeat later
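Two quick consistency checks can be run on these figures: the subtest gains in each row should add up to the total gain, and the sample-size-weighted mean of the two groups should match the overall mean gain of 2.6 reported earlier. The sketch assumes the cells read as +2.13 / +0.63 / +1.49 for the group retesting after more than six months and +2.87 / +1.18 / +1.69 for the group retesting sooner; the dictionary layout and helper names are illustrative.

```python
# Mean gains by time between sittings, as read from the slide.
groups = {
    "> 6 months": {"n": 63, "total": 2.13, "short": 0.63, "long": 1.49},
    "< 6 months": {"n": 116, "total": 2.87, "short": 1.18, "long": 1.69},
}


def subtests_match_total(row, tol=0.02):
    """True if short + long equals the total gain within rounding tolerance."""
    return abs(row["short"] + row["long"] - row["total"]) <= tol


def pooled_mean(groups, key):
    """Sample-size-weighted mean of one column across all groups."""
    n_all = sum(g["n"] for g in groups.values())
    return sum(g["n"] * g[key] for g in groups.values()) / n_all


checks = {name: subtests_match_total(row) for name, row in groups.items()}
overall_gain = pooled_mean(groups, "total")

print(checks)
print(round(overall_gain, 2))  # 2.61
```

Both rows pass the additivity check, and the pooled gain of about 2.61 agrees with the mean raw gain of 2.6 for all 179 repeaters.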

Reading Results: Sitting 2 minus Sitting 1 (N=284)

| Change in Raw Score | Total (80) | Skim-&-Scan (15) | Passage (20) | Cloze (25) |
|---|---|---|---|---|
| Down 21 or more | 17 | | | |
| Down 11 to 20 | 19 | 2 | | 12 |
| Down 6 to 10 | 21 | 12 | 18 | 32 |
| Down 3 to 5 | 28 | 32 | 30 | 34 |
| Same score ± 2 | 46 | 139 | 142 | 106 |
| Up 3 to 5 | 33 | 65 | 63 | 52 |
| Up 6 to 10 | 47 | 31 | 23 | 36 |
| Up 11 to 20 | 48 | 3 | 8 | 12 |
| Up 21 or more | 25 | | | |

Note: Reading Score is doubled to give a total out of 80 rather than 60.
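Because some cells in this table are blank, one way to check a reading of it is to confirm that every column adds up to the sample size of 284. The sketch below treats blank cells as zero, which is an assumption, and places the lone "2" in the "Down 11 to 20" row under Skim-&-Scan, since that is the only placement under which all columns sum to N.

```python
N_READING = 284

# Rows run from "Down 21 or more" to "Up 21 or more".
# ASSUMPTION: blank cells on the slide are counted as 0.
reading_counts = {
    "total": [17, 19, 21, 28, 46, 33, 47, 48, 25],
    "skim_and_scan": [0, 2, 12, 32, 139, 65, 31, 3, 0],
    "passage": [0, 0, 18, 30, 142, 63, 23, 8, 0],
    "cloze": [0, 12, 32, 34, 106, 52, 36, 12, 0],
}

column_sums = {name: sum(col) for name, col in reading_counts.items()}
for name, total in column_sums.items():
    status = "ok" if total == N_READING else "MISMATCH"
    print(f"{name:14s} {total:4d} {status}")
```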

Reading Results, another look

| Change in Raw Score | Total (80) | Skim-&-Scan (15) | Reading Passage (20) | Cloze Passage (25) |
|---|---|---|---|---|
| Down some | 30% | 16% | 17% | 27% |
| About the same | 16% | 49% | 50% | 37% |
| Up some | 54% | 35% | 33% | 35% |

Reading Results Interpretation

How important is the improvement?
• On average, 6.5 points needed (out of 80) to improve one band
• So, 3.45 points is about 55% of a band improvement

Reading Results Interpretation

Can the data be used for test calibration?
• The changes in average item difficulty are different for the subtests
  • +0.072 for skim-and-scan
  • +0.050 for reading passages
  • +0.002 for cloze
• The largest difference of .070 (.072 - .002) is two to three times larger than the standard error of the difficulty indices
• Reading data from repeaters should not be used for item calibration
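With three subtests there are three pairwise differences to compare. The sketch below lists them, picks out the largest, and works backwards from the slide's "two to three times" statement to the range of standard errors that statement implies. The dictionary keys are illustrative labels; the implied range is simple arithmetic on the slide's figures, not a value reported in the presentation.

```python
from itertools import combinations

# Changes in average item difficulty by subtest (from the slide).
gains = {"skim_and_scan": 0.072, "reading_passage": 0.050, "cloze": 0.002}

# Absolute difference for every pair of subtests.
pairwise = {
    (a, b): abs(gains[a] - gains[b]) for a, b in combinations(gains, 2)
}
largest_pair, largest_diff = max(pairwise.items(), key=lambda kv: kv[1])

# If the largest difference is 2 to 3 standard errors, the SE lies in this range.
implied_se_low = largest_diff / 3
implied_se_high = largest_diff / 2

for pair, diff in pairwise.items():
    print(f"{pair[0]} vs {pair[1]}: {diff:.3f}")
print(f"largest: {largest_pair} = {largest_diff:.3f}")
print(f"implied SE between {implied_se_low:.3f} and {implied_se_high:.3f}")
```

The largest gap is between skim-and-scan and cloze, and the implied standard error falls between roughly .023 and .035.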

Changes in Reading by Length of Time between Sittings

| Time Between Tests | Total (80) | Skim-&-Scan | Reading Passage | Cloze Passage |
|---|---|---|---|---|
| > 6 months (N=105) | -0.119 | -0.292¹ | -0.017 | -0.079 |
| < 6 months (N=179) | +0.070 | +0.171¹ | +0.010 | +0.046 |

¹ Difference significant, p=0.05

Those who repeat later actually do worse than those who repeat sooner

Conclusion

• Listening
  • 30% of sample do more poorly on 2nd sitting
  • Average gain is 75% of a band score
  • Differences in gains across item types vary by an item standard error
• Reading
  • 40% of sample do more poorly on 2nd sitting
  • Average gain is 55% of a band score
  • Differences in gains across item types vary by 2-3 times an item standard error
• Both
  • Those who rewrite within six months do better
  • Data from repeaters should not be used for item calibration
