Improving the Quality of Assessments in Medical Education
Professor Reg Dennick, Professor of Medical Education, School of Medicine, University of Nottingham, United Kingdom
Overview
• Assessment of learning outcomes & competencies
• Curriculum alignment
• Quality assurance of assessment
• Standard setting
• Validity & reliability
• Assessment as measurement
• Psychometrics as a method of quality control
• Classical test theory: reducing ‘noise’
• Item analysis
• Generalisability theory
• Factor analysis
• Item response theory: Rasch modelling
• Summary
Bloom’s Taxonomy: ‘The Domains of Learning’
• Cognitive (knowledge)
• Psychomotor (practical skills)
• Affective (attitudes)
The cognitive (knowledge) domain (2001 revised taxonomy)
Outcome-based curricula: General Medical Council (1993, 2002, 2009)
General Medical Council 2009
• The doctor as scholar and scientist: “Doing the right thing”
• The doctor as practitioner: “Doing the thing right”
• The doctor as professional: “The right person doing it”
CanMEDS Curriculum (2005)
• Medical Expert Role
• Communicator Role
• Collaborator Role
• Manager Role
• Health Advocate Role
• Scholar Role
• Professional Role
Using outcomes and competencies leads to ‘Constructive Alignment’
Curriculum Alignment
[diagram: paired arrows linking the planned, taught, learned and assessed curricula, indicating that what is planned should be taught, what is taught should be learned, and what is learned should be assessed]
Quality Assessment
• 1980s management culture
• Quality of public services scrutinised
• Teaching Quality Audit in Higher Education
• ‘How can you evaluate the effectiveness of a course if you don’t know what its outcomes should be?’
• Accountability
• Patient safety
• Insurance/Indemnity
• Professional bodies (GMC)
The Professional challenge Can defining and listing outcomes and competencies equate to a definition of professional performance that can be objectively measured?
Assessing competence
• Essays
• MCQs
• OSCEs
• Work-based assessments (WBAs): DOPS, Mini-CEX, MSF, CBD
• Simulation
From Novice to Expert: The Dreyfus Model (cf. Miller’s Triangle)
• Level 1: Novice
 – Rigid adherence to taught rules or plans: ‘context-free elements’
 – Little situational perception
• Level 2: Advanced Beginner
 – Guidelines for action based on attributes or aspects (aspects are global characteristics of situations recognisable only after some prior experience)
 – Situational perception still limited
• Level 3: Competent
 – Coping with crowdedness (pressure)
 – Now sees actions at least partially in terms of longer-term goals
 – Conscious, deliberate planning and problem solving
• Level 4: Proficient
 – Sees situations holistically rather than in terms of aspects
 – Sees what is most important in a situation
 – Uses intuition and ‘know-how’
• Level 5: Expert
 – No longer relies predominantly on rules, guidelines or maxims
 – Intuitive grasp of situations based on deep tacit understanding
 – Analytic approaches used only in novel situations or when problems occur
Assessment challenges
• Objectivity
• Validity
• Reliability
• Assessor training/skills
• Psychometric evaluation
Improve the assessors
The Examination Cycle
[diagram: a cycle linking learning outcomes and teaching experiences to new question writing, piloting, the item bank, exam drafting, standard setting and the assessment itself, followed by post-examination analysis (item analysis, factor analysis, cluster analysis, Rasch modelling, G study) feeding feedback reports back into teaching]
Modified from: Tavakol M, Dennick R (2011). “Post-examination analysis of objective tests”. Medical Teacher 33(6): 447-458.
How do you define the pass mark for an exam? STANDARD SETTING
Definition of Standards A standard is a statement about whether an examination performance is good enough for a particular purpose: – A defined score that serves as the boundary between passing and failing
Standards
• Standards are based on judgments about examinees’ performances against social or educational constructs:
 – Student ready for next phase: progression
 – Student ready for graduation
 – Competent practitioner
The Standard Setting Problem
[diagram: a 2×2 grid crossing true competence (competent/incompetent) with exam outcome (pass/fail); the problem cells are competent candidates who fail and incompetent candidates who pass]
Setting the pass mark
The method has to be:
• Defensible
• Credible
• Supported by a body of evidence
• Feasible
• Acceptable to all stakeholders
Standard Setting Methods
• Relative methods: based on judgments about groups of test takers
• Absolute methods:
 – based on judgments about test questions (Angoff’s method, Ebel’s method)
 – based on judgments about the performance of individual examinees (borderline group method, contrasting group method)
Types of Standards
• Relative standards / norm-referenced methods:
 – Based on a comparison among the performances of examinees
 – A set proportion of candidates fails regardless of how well they perform
• Absolute standards / criterion-referenced methods:
 – Based on how much the examinees know
 – Candidates pass or fail depending on whether they meet specified criteria
Relative Method Norm referencing - assessing students in relation to each other, to the norm or group average. Marks are normally distributed and grade boundaries inserted afterwards according to defined standards. Students pass or fail and are graded depending on the norm.
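A minimal sketch of how a norm-referenced pass mark could be computed. The marks and the failure proportion (bottom 10% fail) are invented for illustration; the key point is that the boundary is set from the distribution itself, not from any criterion of competence.

```python
import numpy as np

# Illustrative marks for 200 candidates (mean 50, SD 10).
rng = np.random.default_rng(42)
marks = rng.normal(loc=50, scale=10, size=200)

# Norm-referenced standard: a fixed proportion of candidates fails
# regardless of absolute performance. Here the bottom 10% fail,
# so the pass mark is the 10th percentile of the observed marks.
fail_proportion = 0.10
pass_mark = np.percentile(marks, fail_proportion * 100)

print(f"Norm-referenced pass mark: {pass_mark:.1f}")
print(f"Candidates failing: {(marks < pass_mark).sum()} of {len(marks)}")
```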
Absolute Method Criterion referencing - each student assessed against specific criteria of competence. Students pass or fail depending on the achievement of a minimum number of specified competencies.
Absolute Methods Judgments are made about individual test items – Angoff’s method – Ebel’s method
Angoff’s method
• Select the judges
• Discuss:
 – the purpose of the test
 – the nature of the examinees
 – what constitutes adequate/inadequate knowledge
 – the borderline candidate
• Each judge then estimates, for every item, the probability that a borderline candidate would answer it correctly; the estimates are averaged across judges and summed across items to give the pass mark (a worked sketch follows the next slide)
The ‘borderline’ candidate • How do you define this concept? • It is based on past experience and accumulated knowledge of assessments. • It is a subjective judgement made more reliable by using multiple assessors. • It is based on the consensus of experts.
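A minimal sketch of the Angoff calculation described above: average each item’s probability estimates across judges, then sum the item averages to get the expected score of a borderline candidate. The judge ratings below are invented for illustration.

```python
import numpy as np

# Rows = judges, columns = items. Each value is a judge's estimate of
# the probability that a borderline candidate answers that item correctly.
# These ratings are purely illustrative.
ratings = np.array([
    [0.60, 0.75, 0.40, 0.85, 0.55],  # judge 1
    [0.65, 0.70, 0.45, 0.80, 0.50],  # judge 2
    [0.55, 0.80, 0.35, 0.90, 0.60],  # judge 3
])

item_means = ratings.mean(axis=0)   # consensus estimate per item
pass_score = item_means.sum()       # expected borderline raw score
pass_mark_pct = 100 * pass_score / ratings.shape[1]

print(f"Expected borderline score: {pass_score:.2f} / {ratings.shape[1]}")
print(f"Angoff pass mark: {pass_mark_pct:.1f}%")
```

Using multiple judges, as the slide above notes, makes the inherently subjective ‘borderline candidate’ estimate more reliable: the consensus per item is the mean over judges.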
Ebel’s Method
• Difficulty-relevance decisions:
 – Judges read each item and assign it to one of the categories in the classification table
 – The judges then estimate the proportion of items in each category that borderline test-takers would answer correctly
 – Calculate the passing score
Ebel’s method
Items are classified in a 3×3 grid crossing relevance (Essential / Important / Nice to know) with difficulty (Easy / Medium / Hard).
Ebel’s method
For each category, an estimate is made of the proportion of questions the ‘borderline’ candidate gets right:

               Easy    Medium   Hard
Essential      0.95    0.85     0.80
Important      0.90    0.80     0.75
Nice to know   0.80    0.60     0.50
Ebel’s Method
The number of questions in each category is multiplied by the appropriate proportion and the products are summed:

Category               Proportion right   # Questions   Score
Essential, Easy        0.95             x 3           = 2.85
Essential, Medium      0.85             x 2           = 1.70
Essential, Hard        0.80             x 2           = 1.60
Important, Easy        0.90             x 3           = 2.70
Important, Medium      0.80             x 4           = 3.20
Important, Hard        0.75             x 4           = 3.00
Nice to know, Easy     0.80             x 2           = 1.60
Nice to know, Medium   0.60             x 2           = 1.20
Nice to know, Hard     0.50             x 3           = 1.50
Total                                     25            19.35

Pass mark = 19.35/25 = 77.4%
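The same calculation as a short script, reproducing the table above (the proportions and question counts are the slide’s own figures):

```python
# Ebel's method: for each relevance/difficulty category, multiply the
# estimated proportion a borderline candidate gets right by the number
# of questions in that category; sum and divide by the total questions.
categories = [
    # (proportion right, number of questions)
    (0.95, 3), (0.85, 2), (0.80, 2),   # Essential: easy, medium, hard
    (0.90, 3), (0.80, 4), (0.75, 4),   # Important: easy, medium, hard
    (0.80, 2), (0.60, 2), (0.50, 3),   # Nice to know: easy, medium, hard
]

total_questions = sum(n for _, n in categories)
expected_score = sum(p * n for p, n in categories)
pass_mark = 100 * expected_score / total_questions

print(f"Expected borderline score: {expected_score:.2f} / {total_questions}")
print(f"Ebel pass mark: {pass_mark:.1f}%")   # 19.35 / 25 -> 77.4%
```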
Borderline Group Method
• Useful for OSCEs
• The OSCE checklist has a ‘global’ score box: e.g. pass, borderline, fail
• Examiner(s) complete the checklist but also judge a global score
• At the end of the exam, the checklist scores of candidates in the ‘borderline’ category are averaged to give the pass mark
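A minimal sketch of that calculation: average the checklist scores of every candidate whose global rating was ‘borderline’. The scores and ratings below are invented.

```python
# Checklist scores paired with examiners' global ratings (illustrative).
results = [
    (72, "pass"), (55, "borderline"), (48, "fail"), (60, "borderline"),
    (81, "pass"), (58, "borderline"), (42, "fail"), (63, "borderline"),
]

# The station pass mark is the mean checklist score of the borderline group.
borderline_scores = [score for score, rating in results if rating == "borderline"]
pass_mark = sum(borderline_scores) / len(borderline_scores)

print(f"Borderline group pass mark: {pass_mark:.1f}")  # mean of 55, 60, 58, 63
```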
Contrasting Groups Method
In an OSCE exam students are scored using a checklist but are also given a GLOBAL score of PASS or FAIL. After the exam the two score distributions are plotted and the pass mark is determined from the overlap between the two groups.
[Figure: overlapping mark distributions for the pass group and the fail group; the pass mark lies in the region where the two curves overlap]
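A sketch of one common way to place the cut score within that overlap: choose the checklist score that minimises total misclassification between the globally rated pass and fail groups (other conventions, such as taking the intersection point of the two fitted curves, are also used). The data below are invented.

```python
import numpy as np

# Checklist scores grouped by examiners' global pass/fail judgements
# (illustrative data; note the overlap between roughly 48 and 55).
fail_scores = np.array([38, 42, 45, 47, 50, 52, 55])
pass_scores = np.array([48, 53, 57, 60, 62, 66, 70, 74])

# Try every candidate cut score and count misclassifications:
# fail-group candidates at or above the cut, plus pass-group candidates below it.
candidates = np.arange(min(fail_scores.min(), pass_scores.min()),
                       max(fail_scores.max(), pass_scores.max()) + 1)
errors = [(fail_scores >= c).sum() + (pass_scores < c).sum() for c in candidates]
cut = candidates[int(np.argmin(errors))]

print(f"Contrasting groups cut score: {cut}")
```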
Standard setting: practical implications
Choice of standard setting method depends on:
• Credibility
• Resources available
• How high-stakes the exam is
Post-exam analysis: how can we improve the quality of assessments? PSYCHOMETRICS
Psychometrics
• Psychometrics is concerned with the quantitative characteristics of assessments as well as attitudes and psychological traits.
• Psychometrics is concerned with the construction and validation of measurement tools such as exams, survey questionnaires and personality assessments.
• The psychometric soundness of a test refers to how reliably and accurately a test measures what it purports to measure.
• Psychometricians are increasingly being required to monitor and improve the quality of exams.
How can psychometrics improve student assessment?
• Aberrant questions and stations can be detected and then restructured or discarded (a short item-analysis sketch follows this list).
• By improving the practical organisation of examinations.
• By improving the credibility of the competence-based pass mark.
• By improving the validity and reliability of checklists, rating and global rating scales.
• By recognising, isolating and estimating measurement errors associated with students’ scores.
• By identifying the constructs within tests.
• By relating student ability and item difficulty within tests.
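To make the first point concrete, a minimal item-analysis sketch computing the two statistics most often used to flag aberrant questions: each item’s difficulty (facility) index and its point-biserial correlation with the total score. The response matrix is invented; in practice the correlation is often computed against the total excluding the item itself.

```python
import numpy as np

# Rows = candidates, columns = items; 1 = correct, 0 = incorrect (invented).
responses = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1],
])

totals = responses.sum(axis=1)

for i in range(responses.shape[1]):
    item = responses[:, i]
    difficulty = item.mean()  # facility index: proportion answering correctly
    # Point-biserial: correlation between item score and total score;
    # low or negative values flag items that fail to discriminate.
    discrimination = np.corrcoef(item, totals)[0, 1]
    print(f"Item {i + 1}: difficulty = {difficulty:.2f}, "
          f"discrimination = {discrimination:.2f}")
```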
Measures of Variability
[Figure: two mark distributions, examinations A and B, both centred on 50]
• Both mark distributions have a mean of 50, but show different patterns.
• Examination A has a wide range of marks (some below 20 and some above 90); examination B shows few students at either extreme.
• Examination B is more homogeneous than examination A.
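A short sketch of the comparison with invented marks: two distributions sharing a mean of 50 but with different standard deviations, showing why the spread matters as much as the average.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two illustrative exams, both with mean 50 but different spread.
exam_a = rng.normal(loc=50, scale=15, size=300)  # wide range of marks
exam_b = rng.normal(loc=50, scale=5, size=300)   # marks cluster near the mean

for name, marks in [("A", exam_a), ("B", exam_b)]:
    print(f"Exam {name}: mean = {marks.mean():.1f}, SD = {marks.std(ddof=1):.1f}")
```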