Using UMLS CUIs for WSD in the Biomedical Domain (Bridget T. McInnes)

  1. Using UMLS CUIs for WSD in the Biomedical Domain. Bridget T. McInnes ¹, Ted Pedersen ² and John Carlis ¹. University of Minnesota Twin Cities ¹ and University of Minnesota Duluth ². 09/11/07

  2. What is WSD? Example sentence: "The culture count doubled." The sense inventory offers two senses for "culture": Anthropological Culture and Laboratory Culture.

  3. Sense Inventory: UMLS. The Unified Medical Language System (UMLS) contains a list of Concept Unique Identifiers (CUIs), which are the concepts (senses) associated with a word or term.
     - Anthropological Culture (C0010453)
     - Laboratory Culture (C0430400)

  4. UMLS: Semantic Network. A framework encoded with different semantic and syntactic structures.
     - Anthropological Culture (C0010453), semantic type: Idea or Concept
     - Laboratory Culture (C0430400), semantic type: Laboratory Procedure
     - The diagram also shows the semantic type Mental Process and the semantic relations assesses_effect_of and result_of.

  5. MetaMap. A concept mapping system that maps text to concepts in the UMLS. It provides a wealth of information for all words in a document:
     - phrasal information
     - part of speech (POS) of a word
     - CUI of a word
     - semantic types of a word

  6. Example: "The culture count doubled."
     - count: CUI: Count (C0750480); semantic type: Idea or Concept (idcn); pos: noun
     - doubled: CUI: Duplicate (C0205173); semantic type: Functional Concept (ftcn); pos: verb
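
The per-word information in this example can be held in a small record, as sketched below. This is only an illustration: the `TokenAnnotation` class and its field names are hypothetical and do not reflect MetaMap's actual output format; the values are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenAnnotation:
    """Hypothetical record for one word; not MetaMap's real output format."""
    word: str
    cui: str
    concept_name: str
    semantic_type: str
    semantic_type_abbrev: str
    pos: str


# Values taken from the slide's example sentence.
annotations = [
    TokenAnnotation("count", "C0750480", "Count", "Idea or Concept", "idcn", "noun"),
    TokenAnnotation("doubled", "C0205173", "Duplicate", "Functional Concept", "ftcn", "verb"),
]


def cuis(tokens):
    """CUI feature view of the annotated words."""
    return [t.cui for t in tokens]


def semantic_types(tokens):
    """Semantic-type feature view of the same words."""
    return [t.semantic_type_abbrev for t in tokens]


print(cuis(annotations))
print(semantic_types(annotations))
```

The two helper functions show the contrast the talk goes on to examine: the same annotated words can be turned into CUI features or into coarser semantic-type features.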

  7. Supervised Approaches
     - Leroy and Rindflesch 2005: semantic types, semantic relations, part-of-speech, and head information (from MetaMap)
     - Joshi, Pedersen and Maclin 2005: unigrams in the same sentence as the ambiguous word, and unigrams in the same abstract as the ambiguous word
     - Liu, Teller and Friedman 2004: unigrams, direction and orientation of unigrams, and collocations

  11. Questions
     - Would UMLS CUIs be an improvement over semantic types?
     - Would the biomedical-specific feature, CUIs, be an improvement over the more general feature, unigrams?
     - Would increasing the context window in which surrounding CUIs are found improve the results?

  12. Our supervised approach
     - Algorithm: Naïve Bayes from the WEKA data mining package, using 10-fold cross validation
     - Features: UMLS CUIs obtained from MetaMap
       - that occur in the same sentence as the ambiguous word more than one time (s-1-cui)
       - that occur in the same abstract as the ambiguous word more than one time (a-1-cui)

  13. Example: "... The culture count doubled. The cells multiplied by twice the expected rate ..."
     - Sentence: C0750480 Count (2), C0205173 Duplicate (1)
     - Abstract: C0750480 Count (2), C0205173 Duplicate (3), ..., C0007634 Cells (4), C1517001 Expected (1), C1521828 Rate (3), ...
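
A minimal sketch of this feature selection step follows. It assumes the "more than one time" cutoff is applied to a CUI's total count over all context windows in the data, which the slides do not spell out. The function names and the toy contexts are hypothetical; only the CUI strings come from the slide.

```python
from collections import Counter


def build_vocabulary(contexts, cutoff=1):
    """Keep CUIs whose total count over all contexts is greater than cutoff.

    Assumption: the count is taken across all instances, not per instance.
    """
    counts = Counter(cui for context in contexts for cui in context)
    return sorted(cui for cui, n in counts.items() if n > cutoff)


def to_binary_vector(context, vocabulary):
    """Mark which vocabulary CUIs are present in one context window."""
    present = set(context)
    return [1 if cui in present else 0 for cui in vocabulary]


# Toy windows for two instances; the groupings are invented for illustration.
sentence_contexts = [
    ["C0750480", "C0205173"],
    ["C0750480", "C0007634"],
]
abstract_contexts = [
    ["C0750480", "C0205173", "C0007634", "C1517001", "C1521828"],
    ["C0750480", "C0007634", "C1521828"],
]

s_vocab = build_vocabulary(sentence_contexts)   # sentence window (s-1-cui style)
a_vocab = build_vocabulary(abstract_contexts)   # abstract window (a-1-cui style)
print(s_vocab)
print(a_vocab)
print(to_binary_vector(["C0205173", "C1521828"], a_vocab))
```

With the wider abstract window more CUIs clear the cutoff, so each instance is described by a larger feature vector than with the sentence window.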

  14. Algorithm (flow diagram): Example Instances → Extract Relevant CUIs → Training Data and Test Data → Naïve Bayes Algorithm → Sense Tagged Test Data
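
The pipeline in this diagram can be sketched in a few lines of plain Python. The authors used the Naïve Bayes implementation in WEKA; the code below is a stand-in, not their implementation. It is a Bernoulli Naïve Bayes classifier with add-one smoothing and a simple interleaved k-fold split. The toy instances and the placeholder feature `CUI_X` are invented.

```python
import math
from collections import Counter


def train(instances):
    """instances: list of (set of CUIs, sense label) pairs."""
    sense_counts = Counter(sense for _, sense in instances)
    vocab = set().union(*(cuis for cuis, _ in instances))
    present = {sense: Counter() for sense in sense_counts}
    for cuis, sense in instances:
        present[sense].update(cuis)  # number of instances of this sense containing each CUI
    return sense_counts, vocab, present


def predict(model, cuis):
    """Return the sense with the highest log posterior under Bernoulli NB."""
    sense_counts, vocab, present = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sorted(sense_counts.items()):
        score = math.log(n / total)
        for cui in vocab:
            p = (present[sense][cui] + 1) / (n + 2)  # add-one smoothing
            score += math.log(p if cui in cuis else 1 - p)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


def cross_validate(instances, k=10):
    """Accuracy over k folds; fold f holds the instances at indices i with i % k == f."""
    correct = 0
    for fold in range(k):
        test = instances[fold::k]
        training = [x for i, x in enumerate(instances) if i % k != fold]
        model = train(training)
        correct += sum(predict(model, cuis) == sense for cuis, sense in test)
    return correct / len(instances)


# Toy data: 10 instances per sense of "culture"; the contexts are invented.
data = [
    ({"C0750480", "C0007634"}, "C0430400"),  # laboratory sense
    ({"CUI_X"}, "C0010453"),                 # anthropological sense, placeholder feature
] * 10
print(cross_validate(data, k=10))
```

On real data the folds would normally be shuffled or stratified; the interleaved split is used here only to keep the sketch short and deterministic.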

  15. Dataset: National Library of Medicine's Word Sense Disambiguation (NLM-WSD) Dataset
     - 50 words from the 1998 MEDLINE abstracts
     - 100 instances for each of the 50 words
     - Each instance has been tagged by MetaMap
     - The target word was manually assigned a UMLS concept or None
     - Average number of concepts per ambiguous word is 2.26 (not including None)

  16. Data subsets
     - Liu subset (Liu, Teller and Friedman 2004): 22 out of the 50 words in NLM-WSD
     - Leroy subset (Leroy and Rindflesch 2005): 15 out of the 50 words in NLM-WSD
     - Joshi subset (Joshi, Pedersen and Maclin 2005): 28 out of the 50 words in NLM-WSD (union of Leroy and Liu subsets)

  17. Results

  18. Results for Question 1: Would CUIs be an improvement over semantic types?

  19. Comparative results with Leroy and Rindflesch 2005 (bar chart: accuracy using the Leroy subset): s-1-cui 71%, a-1-cui 74.5%, s-0-Leroy 65.6%

  20. Significance of Differences (pairwise t-test)
     - s-1-cui (71%) and s-0-Leroy (65.6%): p <= 0.001
     - a-1-cui (74.5%) and s-0-Leroy (65.6%): p <= 0.00005
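
For readers unfamiliar with the test, the sketch below computes a paired t statistic from two lists of matched scores. The slides do not say what the paired observations were; per-word accuracies are assumed here, and the numbers are invented for illustration. The p-value would then be read from a t distribution with the returned degrees of freedom, which this sketch does not do.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for matched score lists."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Invented per-word accuracies (in percent) for two hypothetical systems.
system_a = [80, 70, 90, 60]
system_b = [70, 65, 80, 60]
t, df = paired_t_statistic(system_a, system_b)
print(round(t, 3), df)
```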

  21. Results for Question 2: Would the biomedical-specific feature, CUIs, be an improvement over the more general feature, unigrams?

  22. Comparative results with Joshi, Pedersen and Maclin 2005 (bar chart: accuracy using the Joshi subset): s-1-cui 77.7%, a-1-cui 80%, s-4-Joshi 79.3%, a-4-Joshi 82.5%

  23. Significance of Results (pairwise t-test)
     - s-1-cui (77.7%) and s-4-Joshi (79.3%): p < 0.135
     - a-1-cui (80.0%) and a-4-Joshi (82.5%): p < 0.003

  24. Results for Question 3: Would increasing the size of the context window in which surrounding CUIs are found improve the results, as seen by Joshi, Pedersen and Maclin using unigrams?

  25. Comparative results between sizes of context window (bar chart: accuracy using the NLM-WSD dataset): s-1-cui 83.3%, a-1-cui 85.6%

  26. Significance of Results (pairwise t-test)
     - s-1-cui (83.3%) and a-1-cui (85.6%): p < 0.0006

  27. Comparative results with Liu, Teller and Friedman 2004 (bar chart: accuracy using the Liu subset): a-1-cui 81.9%, s-0-Liu 85.5%

  28. Significance of Results (pairwise t-test)
     - a-1-cui (81.9%) and s-1-Liu (85.5%): p < 0.001

  29. Conclusions
     - CUIs result in more accurate disambiguation than semantic types and are comparable to unigrams
     - Incorporating more surrounding context improves the results
     - MetaMap generates useful information that can be used as features for supervised disambiguation

  30. Future Work
     - Combination approach
     - Exploring additional UMLS features
     - Unsupervised approach using information from the UMLS

  31. Software and Data
     - CuiTools version 0.05: http://cuitools.sourceforge.net
     - NLM-WSD Dataset: http://wsd.nlm.nih.gov
     - Pairwise t-test: http://www.quantitativeskills.com/sisa/statistics/
