Semantic Krippendorff’s α for measuring inter-rater agreement in SNOMED CT coding studies
Daniel Karlsson (a), Kirstine Rosenbeck Gøeg (b), Håkan Örman (a) and Anne Randorff Højen (b)
(a) Department of Biomedical Engineering, Linköping University, Sweden
(b) Department of Health Science and Technology, Aalborg University, Denmark
Coding Variation and Inter-rater Agreement
• Judgement variables
• Differences in the use of terms/codes of a terminology/coding system
• Consistency is important for reuse
• Inter-rater agreement (or reliability) measures quantify these differences
Inter-rater Agreement
• Percentage agreement ("simple agreement", the proportion of cases in agreement)
• Chance agreement
• Statistical significance
• Cohen’s κ & co.
  • Two coders, nominal scale
  • Weighted κ
  • Paradoxes
  • ...
Example codes:
399210005 |neurological investigation (procedure)|
268970009 |central nervous system examination (procedure)|
271888005 |on examination - neurological (finding)|
271924005 |neurological test finding (observable entity)|
18373002 |nervous system function (observable entity)|
...
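To make the difference between raw and chance-corrected agreement concrete, here is a minimal Python sketch of percentage agreement and Cohen’s κ for two coders on a nominal scale. The function names and the code assignments below are hypothetical illustrations, not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def percentage_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Proportion of units on which the two coders chose exactly the same code."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must code the same, non-empty set of units")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    p_o = percentage_agreement(a, b)
    n = len(a)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability that both coders pick the same code independently,
    # estimated from each coder's own marginal code frequencies.
    p_e = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical assignments of two SNOMED CT codes to four cases.
coder_a = ["399210005", "399210005", "268970009", "268970009"]
coder_b = ["399210005", "268970009", "268970009", "268970009"]
print(percentage_agreement(coder_a, coder_b))  # 0.75
print(cohens_kappa(coder_a, coder_b))          # 0.5
```

The same function also illustrates the prevalence paradox: if both coders use one code for 95 of 100 cases and agree on 90, percentage agreement is 0.90 while κ is slightly negative (about -0.05), because the skewed marginals push chance agreement up to 0.905.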
Agreement and Semantic Distance
• Some kinds of coding variation are worse than others
• E.g., different granularity vs. different entity types
• The impact of coding variation depends on the use case
Example codes:
75367002 |blood pressure (observable entity)|
251076008 |non-invasive arterial pressure (observable entity)|
392570002 |blood pressure finding (finding)|
6973005 |blood pressure taking (procedure)|
371911009 |measurement of blood pressure using cuff method (procedure)|
...
SNOMED CT and Inter-rater Agreement Studies
• Fung 2005 and Vikström 2007: Cohen’s κ
• Hwang 2006 and Chiang 2006: Percentage agreement
• Andrews 2007: Krippendorff’s α
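Semantic Krippendorff’s α
• Difference function based on the SNOMED CT hierarchy
• IsA levels up to the least common subsumer (LCS)
• Ordinal-scale Krippendorff’s α
Examples:
75367002 |blood pressure (observable entity)| vs 251076008 |non-invasive arterial pressure (observable entity)|: 2² = 4
75367002 |blood pressure (observable entity)| vs 6973005 |blood pressure taking (procedure)|: 5² = 25
Definition, for codes c and k:
$$\mathrm{lcsPath}_{ck} \stackrel{\mathrm{def}}{=} \Big(\max\big(\min \mathrm{dist}(c, \mathrm{LCS}(c,k)),\ \min \mathrm{dist}(k, \mathrm{LCS}(c,k))\big)\Big)^2$$
where dist counts IsA steps, and the minimum is taken over the alternative paths through the SNOMED CT polyhierarchy.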
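The point of a semantic α is that the difference function is pluggable. Below is a minimal Python sketch of two-coder Krippendorff’s α with no missing values, computed as α = 1 - (n - 1) Σ o_ck δ²_ck / Σ n_c n_k δ²_ck. It is not the authors’ R or MATLAB code; the function names are hypothetical, and the squared lcsPath for the third code pair is a placeholder value, while the other two follow the examples above.

```python
from itertools import product
from typing import Callable, Hashable, Sequence

# delta2(c, k) returns the squared difference between two codes; it must be 0 when c == k.
Delta2 = Callable[[Hashable, Hashable], float]


def krippendorff_alpha_two_coders(
    a: Sequence[Hashable], b: Sequence[Hashable], delta2: Delta2
) -> float:
    """Krippendorff's alpha for two coders and no missing values."""
    if len(a) != len(b) or not a:
        raise ValueError("both coders must code the same, non-empty set of units")
    pooled = list(a) + list(b)
    n = len(pooled)  # number of pairable values: 2 per unit
    # Observed: each unit adds the ordered pairs (x, y) and (y, x) to the coincidence matrix.
    observed = sum(delta2(x, y) + delta2(y, x) for x, y in zip(a, b))
    # Expected: all ordered pairs of pooled values, i.e. sum of n_c * n_k * delta2(c, k).
    expected = sum(delta2(x, y) for x, y in product(pooled, repeat=2))
    if expected == 0:
        raise ValueError("alpha is undefined when all values are identical")
    return 1 - (n - 1) * observed / expected


def nominal(c: Hashable, k: Hashable) -> float:
    return 0.0 if c == k else 1.0


# Squared lcsPath values: the first two pairs follow the slide, the third is hypothetical.
LCS_PATH_SQUARED = {
    frozenset({"75367002", "251076008"}): 4,
    frozenset({"75367002", "6973005"}): 25,
    frozenset({"251076008", "6973005"}): 25,
}


def semantic(c: Hashable, k: Hashable) -> float:
    return 0.0 if c == k else float(LCS_PATH_SQUARED[frozenset((c, k))])


coder_a = ["75367002", "75367002", "6973005", "6973005"]
coder_b = ["75367002", "251076008", "6973005", "6973005"]
print(krippendorff_alpha_two_coders(coder_a, coder_b, nominal))
print(krippendorff_alpha_two_coders(coder_a, coder_b, semantic))
```

Here the single disagreement is a near miss (2² = 4) compared with the larger distances between the codes in the data, so the semantic α comes out higher than the nominal one. Because α is a ratio of observed to expected difference, rescaling δ² by a constant leaves it unchanged.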
Distance calculation
Example: c = 75367002 |blood pressure (observable entity)|, k = 6973005 |blood pressure taking (procedure)|
• LCS(c, k) = 138875005 |SNOMED CT Concept|
• min(dist(c, LCS(c, k))) = 3
• min(dist(k, LCS(c, k))) = 5
• lcsPath_ck = max(3, 5)² = 5² = 25
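The calculation above can be sketched in Python as a breadth-first search up the IsA relations. This is a toy reconstruction, not the published implementation: only the SNOMED CT identifiers from the slides are real, the intermediate nodes (obs_1, proc_1, ...) are hypothetical placeholders chosen so that the path lengths match the slide, and the rule for picking among several common subsumers is an assumption.

```python
from collections import deque
from typing import Dict, List

ROOT = "138875005"  # SNOMED CT Concept
# Hypothetical toy IsA hierarchy (child -> parents); intermediate nodes are placeholders.
PARENTS: Dict[str, List[str]] = {
    "75367002": ["obs_2"], "obs_2": ["obs_1"], "obs_1": [ROOT],
    "6973005": ["proc_4"], "proc_4": ["proc_3"], "proc_3": ["proc_2"],
    "proc_2": ["proc_1"], "proc_1": [ROOT],
    "251076008": ["obs_3"], "obs_3": ["75367002"],
}


def ancestor_distances(concept: str, parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Shortest number of IsA steps from concept to itself and to every ancestor (BFS)."""
    distances = {concept: 0}
    queue = deque([concept])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in distances:  # first visit in BFS is the shortest path
                distances[parent] = distances[node] + 1
                queue.append(parent)
    return distances


def lcs_path(c: str, k: str, parents: Dict[str, List[str]]) -> int:
    """Squared IsA distance up to the least common subsumer (lcsPath_ck)."""
    dist_c = ancestor_distances(c, parents)
    dist_k = ancestor_distances(k, parents)
    common = dist_c.keys() & dist_k.keys()
    if not common:
        raise ValueError("concepts share no subsumer")
    # Assumption: with several common subsumers, use the one giving the smallest max distance.
    return min(max(dist_c[s], dist_k[s]) for s in common) ** 2


print(lcs_path("75367002", "6973005", PARENTS))    # 25
print(lcs_path("75367002", "251076008", PARENTS))  # 4
```

In a real setting the parent map would come from the SNOMED CT IsA relationships, and, as the Method slide notes, the values would be precomputed once into a matrix rather than recomputed for every pair of ratings.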
Material
• Two human coders, A and B
• 490 procedure codes cross-mapped from NCSP (NOMESCO Classification of Surgical Procedures) to SNOMED CT
• Percentage agreement: 72 %
• Datasets
  • Dataset AB: human coders
  • Dataset AC: coder A + novice coding errors introduced
  • Dataset AD: coder A + random codes
[Figure: coder pairs A-B (human-human), A-C (errors introduced), A-D (random codes)]
Material (continued): examples of introduced errors
• 399210005 |neurological investigation (procedure)| → 271888005 |on examination - neurological (finding)|
• 252641007 |gastrointestinal transit study (procedure)| → 83909001 |gastrointestinal transit, function (observable entity)|
• 275155009 |needle biopsy of kidney (procedure)| → 309269002 |kidney biopsy sample (specimen)|
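The slides do not spell out how the random dataset AD was generated. One way such a control coder could be built, assuming the exact matches between A and B are kept and only the disagreeing codes are replaced by random draws, is sketched below in Python; the function name, the code pool and the toy data are all hypothetical.

```python
import random
from typing import List, Sequence


def random_control_coder(
    a: Sequence[str], b: Sequence[str], code_pool: Sequence[str], seed: int = 0
) -> List[str]:
    """Hypothetical control coder: keep B's code where A and B agree,
    otherwise draw a random code (different from A's) from code_pool."""
    rng = random.Random(seed)
    control = []
    for code_a, code_b in zip(a, b):
        if code_a == code_b:
            control.append(code_b)  # exact matches are kept, so their number stays constant
        else:
            candidates = [code for code in code_pool if code != code_a]
            if not candidates:
                raise ValueError("code_pool needs at least one code other than coder A's")
            control.append(rng.choice(candidates))
    return control


# Toy data: two agreements followed by two disagreements.
coder_a = ["399210005", "252641007", "275155009", "75367002"]
coder_b = ["399210005", "252641007", "309269002", "251076008"]
pool = ["271888005", "83909001", "309269002", "6973005", "18373002"]
coder_d = random_control_coder(coder_a, coder_b, pool, seed=1)
print(coder_d)
```

Keeping the matches fixed means nominal measures see the same agreement pattern in every dataset, so any difference between datasets has to come from how far apart the disagreeing codes are.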
Method
• Implementations for R and MATLAB*
• Difference function matrix precomputed
• Bootstrapping, 10 000 iterations
* https://github.com/LiU-IMT/semantic_kripp_alpha
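A minimal Python sketch of the two method steps, precomputing the difference matrix once over the distinct codes and then bootstrapping α, is shown below. It is an independent illustration, not a port of the R/MATLAB code in the repository: resampling units with replacement and the simple percentile interval are assumptions, and the function names are hypothetical.

```python
import random
from collections import Counter


def precompute_delta2(codes, delta2):
    """Evaluate the difference function once for every pair of distinct codes."""
    return [[delta2(c, k) for k in codes] for c in codes]


def alpha_from_indices(a_idx, b_idx, matrix):
    """Two-coder Krippendorff's alpha on code indices, using the precomputed matrix."""
    n = 2 * len(a_idx)
    observed = sum(matrix[x][y] + matrix[y][x] for x, y in zip(a_idx, b_idx))
    counts = Counter(a_idx) + Counter(b_idx)  # n_c for every code index c
    expected = sum(
        n_c * n_k * matrix[c][k] for c, n_c in counts.items() for k, n_k in counts.items()
    )
    if expected == 0:
        return None  # undefined: the resample contains no variation
    return 1 - (n - 1) * observed / expected


def bootstrap_alpha(a, b, delta2, iterations=10_000, seed=0):
    """Mean and percentile 95 % interval of alpha over bootstrap resamples of units."""
    codes = sorted(set(a) | set(b))
    index = {code: i for i, code in enumerate(codes)}
    matrix = precompute_delta2(codes, delta2)
    a_idx = [index[x] for x in a]
    b_idx = [index[x] for x in b]
    rng = random.Random(seed)
    estimates = []
    for _ in range(iterations):
        units = rng.choices(range(len(a_idx)), k=len(a_idx))  # resample units with replacement
        value = alpha_from_indices([a_idx[u] for u in units], [b_idx[u] for u in units], matrix)
        if value is not None:
            estimates.append(value)
    if not estimates:
        raise ValueError("alpha was undefined in every resample")
    estimates.sort()
    mean = sum(estimates) / len(estimates)
    lower = estimates[int(0.025 * (len(estimates) - 1))]
    upper = estimates[int(0.975 * (len(estimates) - 1))]
    return mean, (lower, upper)


def nominal(c, k):
    return 0.0 if c == k else 1.0


mean, (lower, upper) = bootstrap_alpha(["x", "y", "z"] * 10, ["x", "y", "x"] * 10, nominal, iterations=500)
print(round(mean, 2), (round(lower, 2), round(upper, 2)))
```

Precomputing the matrix matters for the semantic variant, because every lcsPath value requires a hierarchy traversal, while each bootstrap iteration then only needs table lookups.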
Results
                             Dataset AB         Dataset AC         Dataset AD
                             (human-human)      (human-novice)     (human-random)
Nominal α, mean (95 % CI)    0.72 (0.68-0.76)   0.72 (0.68-0.76)   0.72 (0.68-0.76)
Semantic α, mean (95 % CI)   0.89 (0.86-0.92)   0.84 (0.80-0.88)   0.47 (0.41-0.53)
Discussion
• Paradoxes: prevalence and bias
• Only a few codes were used more than once
• Real-life datasets vs. tailored datasets
• Constant number of exact matches
• Value of the Semantic α
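A small calculation helps connect two of these points. When almost every code is used only once, the expected disagreement in nominal α is close to 1, so nominal α is close to plain percentage agreement and depends essentially on the number of exact matches. The Python sketch below shows this on a hypothetical dataset (100 units, 72 exact matches, every code otherwise unique); it is an illustration of the formula, not a reanalysis of the study data.

```python
from collections import Counter


def nominal_alpha_two_coders(a, b):
    """Nominal Krippendorff's alpha for two coders, no missing values:
    alpha = 1 - (n - 1) * D_obs / (n**2 - sum_c n_c**2)."""
    n = 2 * len(a)
    d_obs = 2 * sum(x != y for x, y in zip(a, b))  # each disagreement counted both ways
    counts = Counter(a) + Counter(b)
    d_exp = n * n - sum(count * count for count in counts.values())
    return 1 - (n - 1) * d_obs / d_exp


# Hypothetical dataset: 72 matching units and 28 disagreements, all codes unique.
coder_a = [f"m{i}" for i in range(72)] + [f"a{i}" for i in range(28)]
coder_b = [f"m{i}" for i in range(72)] + [f"b{i}" for i in range(28)]
agreement = sum(x == y for x, y in zip(coder_a, coder_b)) / len(coder_a)
alpha = nominal_alpha_two_coders(coder_a, coder_b)
print(agreement, round(alpha, 3))
```

In this setting, replacing the disagreeing codes with any other non-matching codes barely changes nominal α, which is one plausible reading of why the nominal values coincide across datasets with a constant number of exact matches.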
Discussion
• Difference function
  • The difference function should match the use case
  • Human interpretation of difference vs. SNOMED CT aggregation
Example, lcsPath vs. Lin:
• 250411006 |bone marrow finding (finding)| vs 106048009 |respiratory finding (finding)|: lcsPath 1, Lin 0.32
• 8840004 |decreased breath sounds (finding)| vs 65503000 |absent breath sounds (finding)|: lcsPath 1, Lin 0.89
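The contrast between the two measures can be sketched with Lin’s information-content similarity, sim(c, k) = 2 IC(LCS) / (IC(c) + IC(k)) with IC(c) = -log p(c), where p(c) is the probability of encountering c or any of its descendants. lcsPath only counts IsA steps, so two sibling pairs always get the same value, whereas Lin also reflects how specific their common parent is. The probabilities below are hypothetical, not the values behind 0.32 and 0.89.

```python
import math


def information_content(probability: float) -> float:
    """IC(c) = -log p(c), with p(c) the probability of c or any of its descendants."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(probability)


def lin_similarity(p_c: float, p_k: float, p_lcs: float) -> float:
    """Lin similarity: 2 * IC(LCS) / (IC(c) + IC(k)); 1.0 means identical information."""
    ic_sum = information_content(p_c) + information_content(p_k)
    if ic_sum == 0:
        return 1.0  # both concepts are the root
    return 2 * information_content(p_lcs) / ic_sum


# Two hypothetical sibling pairs (lcsPath = 1 for both): one under a broad parent,
# one under a narrow parent.
broad_pair = lin_similarity(0.01, 0.01, 0.5)
narrow_pair = lin_similarity(0.01, 0.01, 0.02)
print(round(broad_pair, 2), round(narrow_pair, 2))  # 0.15 0.85
```

Lin is a similarity, so using it inside α would require turning it into a difference, for example 1 - sim; that conversion is an assumption here, not something the slides prescribe.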
Conclusion
• Semantic Krippendorff’s α captures the intuition that distance matters
  • Apples and oranges vs. fruit
• The distance function needs careful consideration
https://github.com/LiU-IMT/semantic_kripp_alpha