The problem solving problem: Can comparative judgement help?
Ian Jones & Matthew Inglis
Mathematics Education Centre, Loughborough University
I.Jones@lboro.ac.uk
Problem solving in mathematics
How much can we trust opinion polls?
Plan
• Marking and Comparative Judgement;
• The study:
  • Designing the paper;
  • Evaluating the paper;
  • Assessing the paper;
  • Judge feedback.
Marking
• Assumes precise, predictable responses
• Validity grounded in detailed criteria
• Low inter-rater reliability for sustained problem solving
Murphy (1982); Newton (1996); Willmott & Nuttall (1975)
Comparative Judgement
• Assumes varied, unpredictable responses
• Validity grounded in collective expert opinion
• High inter-rater reliability for sustained problem solving?
Thurstone (1927); Bramley (2007); Pollitt (2012)
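Comparative judgement places scripts on a single scale by fitting a pairwise-comparison model to many "which script is better?" decisions. As a rough illustration of the idea, here is a minimal Bradley-Terry fit (a close relative of the Thurstone model cited above) by gradient ascent; the judgements are synthetic and every number is illustrative, not data from the study:

```python
import math
import random

def fit_bradley_terry(judgements, n_scripts, iters=2000, lr=0.005):
    """Estimate script quality parameters from pairwise judgements.

    judgements: list of (winner, loser) script-index pairs.
    Returns one estimate per script; higher means judged better.
    """
    theta = [0.0] * n_scripts
    for _ in range(iters):
        grad = [0.0] * n_scripts
        for w, l in judgements:
            # P(winner beats loser) under the Bradley-Terry model
            p = 1.0 / (1.0 + math.exp(theta[l] - theta[w]))
            grad[w] += 1.0 - p   # gradient of the log-likelihood
            grad[l] -= 1.0 - p
        theta = [t + lr * g for t, g in zip(theta, grad)]
        mean = sum(theta) / n_scripts
        theta = [t - mean for t in theta]   # centre: the scale has no origin
    return theta

# Synthetic data: three scripts of known quality, 600 noisy judgements.
random.seed(0)
true_quality = [-1.5, 0.0, 1.5]
judgements = []
for _ in range(600):
    a, b = random.sample(range(3), 2)
    p_a_wins = 1.0 / (1.0 + math.exp(true_quality[b] - true_quality[a]))
    judgements.append((a, b) if random.random() < p_a_wins else (b, a))

estimates = fit_bradley_terry(judgements, 3)
```

The study itself fitted a Rasch-type model to the judgement data; the simple gradient-ascent fit above is a stand-in to show the principle, not the method actually used.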
Pilot study
• 18 scripts, three awarding bodies
• Two tiers, grades A* to D
• Two groups of judges (N1 = 12, N2 = 12)
Results
• Inter-rater reliability: r = .873
• Validity: r = .900
[Scatter plots: parameter estimates from judge group 1 vs. judge group 2; parameter estimate vs. GCSE grade, D to A*]
Designing the paper Evaluating the paper Assessing the paper Judge feedback
Design brief
• Four GCSE exam writers, two awarding bodies
• Familiar with Comparative Judgement
• Constraints:
  • "GCSE like" exam paper;
  • no mark scheme, no marks;
  • suitable for both tiers;
  • to be administered early in Year 10;
  • candidates allowed 50 minutes.
Outcome
• 11 pages
• Included a "Resource sheet"
• Pupils write on question paper
• No marks!
• Questions have names not numbers
• Most questions contextualised
Designing the paper Evaluating the paper Assessing the paper Judge feedback
Teacher survey
1. How well do you think the paper assesses mathematical problem solving?
2. How well do you think the paper assesses mathematical content?
3. How well do you think the paper assesses the Key Stage 4 Process Skills in mathematics?
4. How well do you think your students would perform on this paper?
Response scale: "A lot less than a typical current GCSE paper" ↕ "A lot more than a typical current GCSE paper"
Teacher survey
[Bar chart: ratings relative to current GCSE papers (worse to better) for Problem Solving, Maths Content, Process Skills and Student Performance]
N = 94. All significantly different to GCSE at p < .001.
Open text feedback
Open text feedback
"Please do not continue with the project which appears to be watering down the course even more than the current version does"
"Where is the assessment of mathematical rigour? This obsession with functionality ignores the need for study of algebraic manipulation as training for further study"
Open text feedback
"I don't see much testing of algebra, it's better for practical mathematics but not as good for the academic"
"Love the paper and the focus on functional mathematics ... This style would 'force' the adoption of developing what is the most neglected element of the mathematics curriculum"
Open text feedback
"The literacy needs are quite high. There is a lot of questions that require a strong level of literacy. The literacy level is above the mathematical level"
"[Some questions] look difficult to assess - it might be difficult to compare alternative, valid solutions. Markers would need to exercise more professional judgement"
Designing the paper Evaluating the paper Assessing the paper Judge feedback
• Administered to 750 Y10 pupils of all abilities
• Retrospective mark scheme constructed
• 750 scripts marked, sample of 250 remarked
• 750 scripts judged, sample of 250 rejudged
• Predicted grades
Mark scheme
• Retrospective mark scheme (16 pages)
• One examiner commissioned
• Based on sample of student scripts (N ≈ 30)
• Trialled with two experienced teachers
Pool
[Image: the notice]
This notice was at one end of an indoor swimming pool. Explain why the notice is silly.
Pool mark scheme (Answer | Marks | Examples and Comments). Marks may be awarded for each point relevant to the response.

1st point: Accuracy
• Indicates that 1.000m is too accurate (1 mark). Examples: "There are too many zeros"; "You don't need the decimal places"
• or explains why 1.000m is too accurate a measurement (2 marks). Examples: "That would be to the nearest millimetre"; "Only 100 cm in one m"

2nd point: The social context (Note: both these marks may be awarded if appropriate.)
• Indicates that feet and inches are too unfamiliar to be useful (1 mark). Example: "People don't understand old measurements"
• and/or indicates that the extra zeros could be confusing (1 mark). Example: "People might think it meant 1000 metres"

3rd point: The physical context
• Indicates that 1000m is too deep for the shallow end (1 mark). Comment: this answer gets one mark because, although irrelevant, it is a true statement and indicates that the student has at least engaged with the context
• or explains why 1.000m is too accurate in this context (2 marks). Example: "The water will be choppy so the exact depth will vary"

4th point: Measurement
• Indicates that the two measurements are not exactly equal (1 mark). Example: "3ft 3½ inches is not exactly 1.000m"
• or shows working comparing the measurements (2 marks). Example: "3ft 3½ inches is a bit less than 1.000m" (with supporting working). Note: using the figures given, 3ft 3½ inches = 1.004m; 1.000m = 3ft 3.34 inches
• or observes that the figures given are accurate to only 3 significant figures (3 marks). Example: "You can't really change the 1.000m to inches because it says 'to 3 significant figures'"

Maximum marks available for Pool: 8
[Histogram: number of pupils (0 to 400) by "Pool" mark (0 to 8)]
MARKING (750 scripts)
• Two highly experienced and one experienced teacher
• Two hours familiarisation and preparation
• Paid per script, assuming 6 minutes per script
REMARKING (249 scripts)
• One highly experienced teacher
Marking outcome
Internal consistency = .720 (Cronbach's α)
[Histogram: number of pupils by total mark, 0 to 50]
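The internal consistency figure here is Cronbach's α, computed from per-question scores across scripts. A self-contained sketch of the calculation, using made-up scores rather than data from the study:

```python
def cronbach_alpha(score_matrix):
    """Cronbach's alpha for a scripts-by-items matrix of scores."""
    k = len(score_matrix[0])   # number of items (questions)

    def sample_variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_variances = [sample_variance([row[j] for row in score_matrix])
                      for j in range(k)]
    total_variance = sample_variance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Illustrative data: five scripts, three questions whose scores rise and
# fall together, so alpha comes out high.
scores = [
    [2, 3, 2],
    [4, 5, 4],
    [1, 2, 1],
    [5, 5, 4],
    [3, 3, 3],
]
alpha = cronbach_alpha(scores)   # approximately 0.98 for these scores
```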
Marking outcome
Inter-rater reliability (N = 249): r = .907
Validity (N = 750): r = .718
[Scatter plots: Mark 1 vs. Mark 2; mark vs. predicted GCSE grade, <G to A*]
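The inter-rater reliability figures reported throughout are Pearson correlations between two sets of scores for the same scripts. A minimal computation, with invented marks standing in for the real data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Illustrative: two markers scoring the same five scripts out of 50.
marker_1 = [12, 30, 25, 8, 41]
marker_2 = [14, 28, 27, 10, 39]
r = pearson_r(marker_1, marker_2)   # close agreement, r ≈ 0.99
```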
JUDGING (750 scripts)
• 15 teachers and researchers of varied experience
• One hour familiarisation
• 30 minute training session
• 250-300 judgements each, assuming 72 seconds per judgement
REJUDGING (250 scripts)
• 5 teachers of varied experience
Judging outcome
Internal consistency = .958 (Rasch Separation Reliability Coefficient)
[Plot: parameter estimates for all 750 scripts, ordered from 'worst' to 'best']
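The Rasch separation reliability coefficient can be read as the proportion of variance in the script parameter estimates that is not attributable to measurement error. A sketch of the standard formula, with invented estimates and standard errors (not values from the study):

```python
def separation_reliability(estimates, standard_errors):
    """Rasch separation reliability:
    (observed variance - mean error variance) / observed variance."""
    n = len(estimates)
    mean = sum(estimates) / n
    observed_var = sum((e - mean) ** 2 for e in estimates) / (n - 1)
    error_var = sum(se ** 2 for se in standard_errors) / n
    return (observed_var - error_var) / observed_var

# Illustrative: well-spread estimates with small standard errors
# give a coefficient close to 1.
estimates = [-2.0, -1.0, 0.0, 1.0, 2.0]
standard_errors = [0.3] * 5
reliability = separation_reliability(estimates, standard_errors)   # 0.964
```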
Judging outcome
Inter-rater reliability (N = 249): r = .861
Validity (N = 750): r = .708
[Scatter plots: parameter estimate 1 vs. parameter estimate 2; parameter estimate vs. predicted GCSE grade, <G to A*]
Judging and marking
750 scripts: r = .860
250 scripts: r = .891
[Scatter plots: parameter estimate vs. mark]
Assessment summary

                                  marking   judging
'internal consistency'             0.720     0.958
inter-rater reliability            0.907     0.861
validity (c.f. grade)              0.718     0.708
validity (judging vs. marking):        0.860
Designing the paper Evaluating the paper Assessing the paper Judge feedback
Please indicate the influence of the listed features when judging your allocated pairs of students' work.
1. Student displays originality and flair
2. Presence of errors
3. Use of formal notation
4. Untidy presentation
5. Structuredness of presentation
6. All questions attempted
7. Student displays good factual recall
8. Use of formal mathematical vocabulary
Response scale: "strong positive influence" ↕ "strong negative influence"