Text Mining on Clinical Data by Robert McHardy

  1. Institut für Maschinelle Sprachverarbeitung
     Text Mining on Clinical Data
     Robert McHardy

  2. Outline
     • Motivation
     • Medical Entity Recognition
     • Anonymization of Medical Reports
     • Knowledge-based Biomedical Word Sense Disambiguation
     • Extraction of Potential Adverse Drug Events
     • Resources
     Universität Stuttgart 5.12.2017

  3. Motivation: Different Users

  4. Motivation: Why do we need Text Mining on Clinical Data?
     • Doctors need to know whether a drug is safe to use
     • As fast as possible
     • We don't want to suffer from unsafe drugs
     • Researchers want to use the data
     • It has to be anonymized

  9. Motivation: PubMed, again!

  10. Unified Medical Language System Metathesaurus (UMLS)

  11. Medical Entity Recognition: Overview
      • Abacha and Zweigenbaum: medical entity recognition consists of two parts
        • Detecting phrases that refer to medical entities
        • Assigning semantic categories to the detected entities

  12. Medical Entity Recognition: Overview
      Example terms: Type 1 diabetes, T1D, Diabetes type 1, IDDM, Juvenile diabetes

  14. Medical Entity Recognition: Noun Phrase Chunking
      Pharmacodynamic studies, including positron-emission tomography (PET) and computed tomography (CT) […]
      • Many tools for NP chunking are available
      • Maximum recall is desired
      • Open-domain tools like the IMS TreeTagger are suitable

  18. Medical Entity Recognition: MetaMap and the UMLS
      • MetaMap is a tool that maps noun phrases in raw text to UMLS concepts
      • The mapping is done according to a matching score

  19. Medical Entity Recognition: MetaMap and the UMLS
      • Three problems with MetaMap
        • Noun chunking performance is worse than with specialized NLP tools
        • Medical entity detection often finds verbs and general words that aren't medical entities (MEs)
        • Some ambiguity is left
          • The UMLS can provide several concepts for a term
          • and several semantic categories for a concept

  25. Medical Entity Recognition: MetaMap and the UMLS
      Pharmacodynamic studies, including positron-emission tomography (PET) and computed tomography (CT) […]
      Cold temperature Common cold Cold ( term) Cold storage ( term) Cold storage Chronic obstructive lung disease

  26. Medical Entity Recognition: MetaMap+
      • Use tools like TreeTagger for the NP chunking
      • Filter NPs with a stop-word list
      • Search in specialized lists for candidate terms
      • Annotate entities with MetaMap
      • Filter frequent errors and overly broad semantic types
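
The MetaMap+ filtering steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the stop-word list, the specialized term list, the set of "too broad" semantic types and the `annotate` function (a stand-in for a MetaMap lookup) are all hypothetical, and no real MetaMap API is called.

```python
# Hypothetical resources; the slides do not list the actual entries.
STOP_WORDS = {"the", "a", "this", "patient"}
TERM_LIST = {"computed tomography": "Test"}        # specialized candidate-term list
TOO_BROAD_TYPES = {"Natural Phenomenon or Process"}  # illustrative choice only


def annotate(np_text):
    """Stand-in for a MetaMap lookup: returns (concept, semantic type) pairs.

    The index below is made-up example data, not real MetaMap output.
    """
    fake_index = {
        "cold": [
            ("Common cold", "Disease or Syndrome"),
            ("Cold temperature", "Natural Phenomenon or Process"),
        ],
    }
    return fake_index.get(np_text.lower(), [])


def metamap_plus(noun_phrases):
    """Apply the stop-word filter, list lookup, annotation and type filter."""
    results = {}
    for np_text in noun_phrases:
        key = np_text.lower()
        if key in STOP_WORDS:              # filter NPs with a stop-word list
            continue
        if key in TERM_LIST:               # search in specialized lists
            results[np_text] = [(np_text, TERM_LIST[key])]
            continue
        candidates = [
            (concept, sem_type)
            for concept, sem_type in annotate(np_text)
            if sem_type not in TOO_BROAD_TYPES  # drop overly broad types
        ]
        if candidates:
            results[np_text] = candidates
    return results
```

Calling `metamap_plus(["the", "Computed tomography", "cold"])` drops the stop word, resolves the second phrase from the term list, and keeps only the non-broad candidate for the third.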

  31. Medical Entity Recognition: MetaMap+
      • Voting mechanism to disambiguate semantic categories
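
The slide does not say how the votes are cast or counted. One plausible reading, shown here purely as an assumption, is a simple majority vote over the semantic categories of all candidate concepts returned for a phrase; the function name `vote_category` is hypothetical.

```python
from collections import Counter


def vote_category(candidate_categories):
    """Return the most frequent category among the candidates.

    Ties go to the category that appears first in the input,
    which is how Counter.most_common orders equal counts.
    Returns None when there are no candidates.
    """
    if not candidate_categories:
        return None
    counts = Counter(candidate_categories)
    return counts.most_common(1)[0][0]
```

For example, a phrase whose candidates carry the categories `["Problem", "Test", "Problem"]` would be labelled `"Problem"` under this scheme.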

  32. Medical Entity Recognition: Support Vector Machines (SVMs)
      • Word level features:
        • words of the NP
        • number of words of the NP
        • window of words around the NP
      • Orthographic features:
        • first letter capitalized
        • all letters upper-/lowercase
        • contains abbreviation(s)
      • POS tags
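
A minimal sketch of how the features listed above could be collected for one noun phrase before being handed to a classifier. The function name, the feature keys, the window size and the abbreviation heuristic (a short all-uppercase token) are assumptions made for illustration; the slides do not define them.

```python
def np_features(tokens, start, end, pos_tags, window=2):
    """Build a feature dict for the noun phrase tokens[start:end] (start < end)."""
    np_tokens = tokens[start:end]
    feats = {
        # word level
        "np_len": len(np_tokens),
        # orthographic
        "first_cap": np_tokens[0][0].isupper(),
        "all_upper": all(t.isupper() for t in np_tokens),
        "all_lower": all(t.islower() for t in np_tokens),
        # assumption: a short all-caps token counts as an abbreviation
        "has_abbrev": any(t.isupper() and 2 <= len(t) <= 5 for t in np_tokens),
    }
    for t in np_tokens:                       # words of the NP
        feats["word=" + t.lower()] = True
    for tag in pos_tags[start:end]:           # POS tags of the NP
        feats["pos=" + tag] = True
    for t in tokens[max(0, start - window):start]:   # left context window
        feats["left=" + t.lower()] = True
    for t in tokens[end:end + window]:               # right context window
        feats["right=" + t.lower()] = True
    return feats
```

A sparse dict like this can be turned into a numeric vector by any standard feature vectorizer before training the SVM.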

  33. Medical Entity Recognition: BIO-CRFs
      Pharmacodynamic studies, including positron-emission tomography (PET) and computed tomography (CT) […]
      • Words are annotated with the tags B, I and O
        • B-x: Beginning of a phrase of class x
        • I-x: Intermediate part of a phrase of class x
        • O: Outside entities
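
The BIO scheme can be illustrated with a small conversion from entity spans to per-token tags. The tokenization and the choice to label "positron-emission tomography" as a `Test` entity are assumptions for this example, and `to_bio` is a hypothetical helper name.

```python
def to_bio(tokens, spans):
    """Convert entity spans to BIO tags.

    spans: list of (start, end, label) with end exclusive; spans must not overlap.
    """
    tags = ["O"] * len(tokens)               # everything is outside by default
    for start, end, label in spans:
        tags[start] = "B-" + label           # first token of the phrase
        for i in range(start + 1, end):
            tags[i] = "I-" + label           # remaining tokens of the phrase
    return tags


tokens = ["Pharmacodynamic", "studies", ",", "including",
          "positron-emission", "tomography"]
spans = [(4, 6, "Test")]                     # assumed annotation
tags = to_bio(tokens, spans)
# tags == ["O", "O", "O", "O", "B-Test", "I-Test"]
```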

  37. Medical Entity Recognition: BIO-CRFs
      • Word level features:
        • The word itself
        • Window of words
        • Lemmas
      • Orthographic features:
        • Upper/lowercase
        • Contains a digit
        • Prefixes and suffixes
      • POS tags
      • (Semantic category of the word, provided by MetaMap+)
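
As a rough sketch, the per-token features above could be gathered into one dict per token, which a CRF toolkit can then consume for each position in the sentence. Lemmas and POS tags are assumed to come from an external tagger; the function name, feature keys, window size and affix length are hypothetical. The optional `sem_cats` argument mirrors the parenthesized MetaMap+ feature.

```python
def token_features(tokens, lemmas, pos_tags, i,
                   sem_cats=None, window=1, affix_len=3):
    """Build a feature dict for the token at position i."""
    word = tokens[i]
    feats = {
        "word": word.lower(),
        "lemma": lemmas[i],
        "pos": pos_tags[i],
        "is_upper": word.isupper(),
        "is_lower": word.islower(),
        "has_digit": any(ch.isdigit() for ch in word),
        "prefix": word[:affix_len].lower(),
        "suffix": word[-affix_len:].lower(),
    }
    # neighbouring words inside the window, keyed by signed offset
    for off in range(-window, window + 1):
        j = i + off
        if off != 0 and 0 <= j < len(tokens):
            feats[f"word[{off:+d}]"] = tokens[j].lower()
    # optional semantic category, e.g. from a MetaMap+-style annotator
    if sem_cats is not None:
        feats["sem_cat"] = sem_cats[i]
    return feats
```

Leaving `sem_cats` unset corresponds to a plain BIO-CRF setup, while passing it adds the extra semantic-category feature.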

  38. Medical Entity Recognition: Evaluation
      • Corpus contains discharge summaries and progress notes
      • De-identified and annotated by hand
      • Entities: Problem, Treatment and Test
      • 76,665 sentences overall

  39. Medical Entity Recognition: Evaluation
      Setting         Precision  Recall  F-Score
      MetaMap             15.52   16.10    15.80
      MetaMap+            48.68   56.46    52.28
      SVM                 43.65   47.16    45.33
      BIO-CRF             70.15   83.31    76.17
      BIO-CRF-Hybrid      72.18   83.78    77.55
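
The F-Score column can be checked against the other two columns, assuming it is the balanced F1 measure (the harmonic mean of precision and recall), which the slide does not state explicitly. The sketch below recomputes it from the reported precision and recall; the recomputed values agree with the reported ones to within about 0.01, which is consistent with rounding of the inputs.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) copied from the evaluation slide
REPORTED = {
    "MetaMap": (15.52, 16.10, 15.80),
    "MetaMap+": (48.68, 56.46, 52.28),
    "SVM": (43.65, 47.16, 45.33),
    "BIO-CRF": (70.15, 83.31, 76.17),
    "BIO-CRF-Hybrid": (72.18, 83.78, 77.55),
}

# difference between recomputed and reported F-score per setting
DIFFS = {name: abs(f_score(p, r) - f)
         for name, (p, r, f) in REPORTED.items()}
```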

  40. Anonymization of Medical Reports

  41. Anonymization of Medical Reports: What is anonymization?
      • De-identification
      • Completely remove all personal health information
