
Tutorial T5: The Unified Medical Language System (UMLS) and the Semantic Web

Olivier Bodenreider, Lister Hill National Center for Biomedical Communications. NETTAB 2007 - A Semantic Web for Bioinformatics, University of Pisa, Italy, June 12, 2007.


  1. Tutorial T5: The Unified Medical Language System (UMLS) and the Semantic Web
     - NETTAB 2007 - A Semantic Web for Bioinformatics, University of Pisa, Italy, June 12, 2007
     - Olivier Bodenreider, Lister Hill National Center for Biomedical Communications, Bethesda, Maryland, USA

  2. Outline
     - Information integration in biomedicine
       - Some issues: naming, normalization, mapping
       - Semantic Web perspective
     - Terminology integration in biomedicine
       - Unified Medical Language System
       - Some differences between UMLS and SW

  3. Information integration in biomedicine. Some issues: naming, normalization, mapping

  4. Naming
     - Many biomedical entities have several names (synonymy)
       - Drug names
       - Gene names
       - Disease names
       - …
     - A given name may refer to several different entities (polysemy)
       - Nail (body part)
       - Nail (medical device)
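
The two naming problems on this slide can be made concrete with a small lookup structure. The following is only a sketch: the names come from the slides, but the concept identifiers (`C_DRUG_1`, `C_BODY_1`, `C_DEVICE_1`) are hypothetical placeholders and do not belong to any real terminology.

```python
from collections import defaultdict

# Hypothetical concept identifiers; only the names come from the slides.
NAMES_BY_CONCEPT = {
    "C_DRUG_1": ["paracetamol", "acetaminophen"],  # synonymy: one entity, two names
    "C_BODY_1": ["nail"],                          # polysemy: one name ...
    "C_DEVICE_1": ["nail"],                        # ... two different entities
}


def build_name_index(names_by_concept):
    """Invert concept -> names into lowercase name -> set of concepts."""
    index = defaultdict(set)
    for concept, names in names_by_concept.items():
        for name in names:
            index[name.lower()].add(concept)
    return index


def synonyms(concept, names_by_concept):
    """All names recorded for one concept, sorted alphabetically."""
    return sorted(names_by_concept[concept])


def is_polysemous(name, index):
    """True when a name points to more than one concept."""
    return len(index.get(name.lower(), ())) > 1


index = build_name_index(NAMES_BY_CONCEPT)
```

With this index, `synonyms("C_DRUG_1", NAMES_BY_CONCEPT)` lists both drug names, and `is_polysemous("nail", index)` is true because the same string reaches two concepts.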

  5. Brand names for paracetamol (acetaminophen)
     - http://en.wikipedia.org/wiki/List_of_paracetamol_brand_names

  6. Names for dystrophin
     - http://www.ncbi.nlm.nih.gov/sites/entrez

  7. Names for renal cell carcinoma
     - http://www.clininfo.co.uk/clue5/clue.htm

  8. Entity recognition
     - Identifying biomedical entities in text
       - Named entity recognition
       - Tagging "mentions"
       - Semantic annotation
     - Supported by terminology
       - Collects the names used in the domain
       - Often incompletely
     - Example: BioCreative
       - 1A: Gene name identification
       - 2GM: Gene mention tagging
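
Terminology-supported mention tagging can be sketched as a plain dictionary lookup over text. This is a minimal illustration, not the method used by BioCreative participants: the lexicon holds a few names that appear in this tutorial, and the function name `tag_mentions` is an assumption made for the example.

```python
import re

# Tiny illustrative lexicon; the entries are names mentioned in this tutorial.
LEXICON = ["renal cell carcinoma", "dystrophin", "paracetamol", "acetaminophen"]


def tag_mentions(text, lexicon):
    """Return (start, end, matched_text) for each lexicon name found in text."""
    # Longer names go first so a multi-word name wins over a shorter overlap.
    alternatives = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]


sentence = "Dystrophin levels were measured after paracetamol intake."
mentions = tag_mentions(sentence, LEXICON)
```

A name that is missing from the lexicon is simply not tagged, which is the practical consequence of a terminology collecting domain names "often incompletely".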

  9. Normalization
     - Biomedical entities are identified by unique identifiers in various terminology systems
     - Resolve names into identifiers (in a given namespace)
     - Supported (in part) by terminology resources
     - Example: BioCreative
       - 1B and 2GN: Gene Normalization

  10. Identifier for paracetamol (acetaminophen)

      | Source | Identifier | Name |
      |---|---|---|
      | Master Drug Data Base. Medi-Span | 5005 | Acetaminophen |
      | FDA National Drug Code Directory | 50612 | PARACETAMOL |
      | FDA Structured Product Labels | 362O9ITL9D | ACETAMINOPHEN |
      | First DataBank NDDF Plus | 001605 | Acetaminophen |
      | SNOMED Clinical Terms | 90332006 | Acetaminophen (product) |
      | SNOMED Clinical Terms | 387517004 | Acetaminophen (substance) |
      | VA National Drug File | 4017513 | ACETAMINOPHEN |

      Source: RxNorm database (5/3/2007)
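
Resolving a name into an identifier within a given namespace can be sketched with the rows listed on this slide. The data is taken from the slide; the function `normalize` and its exact-match, case-insensitive strategy are assumptions chosen for brevity, not a description of how any of these systems performs lookup.

```python
# (namespace, identifier, name) rows as listed on the slide
# (source given there: RxNorm database, 5/3/2007).
ROWS = [
    ("Master Drug Data Base. Medi-Span", "5005", "Acetaminophen"),
    ("FDA National Drug Code Directory", "50612", "PARACETAMOL"),
    ("FDA Structured Product Labels", "362O9ITL9D", "ACETAMINOPHEN"),
    ("First DataBank NDDF Plus", "001605", "Acetaminophen"),
    ("SNOMED Clinical Terms", "90332006", "Acetaminophen (product)"),
    ("SNOMED Clinical Terms", "387517004", "Acetaminophen (substance)"),
    ("VA National Drug File", "4017513", "ACETAMINOPHEN"),
]


def normalize(name, namespace, rows=ROWS):
    """Return identifiers in `namespace` whose name equals `name`, ignoring case."""
    key = name.strip().lower()
    return [
        ident
        for ns, ident, label in rows
        if ns == namespace and label.lower() == key
    ]
```

Under these assumptions, `normalize("acetaminophen", "VA National Drug File")` finds `4017513`, while `normalize("paracetamol", "VA National Drug File")` finds nothing, because that namespace records only the other synonym. Exact matching therefore supports normalization only in part.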

  11. Identifier for dystrophin
      - http://www.ncbi.nlm.nih.gov/sites/entrez

  12. Identifier for renal cell carcinoma
      - http://www.clininfo.co.uk/clue5/clue.htm

  13. Mapping / Integration
      - Identify equivalent entities across systems (across namespaces)
        - Shared identifiers
        - Existing mappings (e.g., SNOMED CT to ICD-9-CM)
        - Ontology alignment techniques (lexical + structural)
      - Align equivalent entities
        - Pairwise: mapping
        - More broadly: integration
      - Forms the basis for information integration in the Semantic Web (mashups)

  14. Identifier for paracetamol (acetaminophen)

      | Source | Identifier | Name |
      |---|---|---|
      | Master Drug Data Base. Medi-Span | 5005 | Acetaminophen |
      | FDA National Drug Code Directory | 50612 | PARACETAMOL |
      | FDA Structured Product Labels | 362O9ITL9D | ACETAMINOPHEN |
      | First DataBank NDDF Plus | 001605 | Acetaminophen |
      | SNOMED Clinical Terms | 90332006 | Acetaminophen (product) |
      | SNOMED Clinical Terms | 387517004 | Acetaminophen (substance) |
      | VA National Drug File | 4017513 | ACETAMINOPHEN |
      | RxNorm | 161 | Acetaminophen |
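
One way to read this slide is that the RxNorm identifier 161 acts as a shared hub for the source-specific identifiers, from which pairwise mappings can be derived. The sketch below assumes exactly that, and it treats all listed entries as equivalent, which glosses over distinctions such as SNOMED's product versus substance. The function names are hypothetical.

```python
from itertools import combinations

RXNORM_ID = "161"  # RxNorm identifier shown on the slide

# (namespace, identifier) pairs listed alongside it on the slide.
SOURCE_IDS = [
    ("Master Drug Data Base. Medi-Span", "5005"),
    ("FDA National Drug Code Directory", "50612"),
    ("FDA Structured Product Labels", "362O9ITL9D"),
    ("First DataBank NDDF Plus", "001605"),
    ("SNOMED Clinical Terms", "90332006"),
    ("SNOMED Clinical Terms", "387517004"),
    ("VA National Drug File", "4017513"),
]


def to_hub(source_ids, hub_id):
    """Integration step: attach every source identifier to one shared identifier."""
    return {pair: hub_id for pair in source_ids}


def pairwise_mappings(hub_index):
    """Mapping step: pair identifiers from different namespaces that share a hub."""
    by_hub = {}
    for pair, hub in hub_index.items():
        by_hub.setdefault(hub, []).append(pair)
    return [
        (a, b)
        for members in by_hub.values()
        for a, b in combinations(members, 2)
        if a[0] != b[0]  # skip pairs inside the same namespace
    ]


hub_index = to_hub(SOURCE_IDS, RXNORM_ID)
mappings = pairwise_mappings(hub_index)
```

The design point is that a shared identifier needs one link per source, whereas maintaining direct pairwise mappings between every two namespaces grows quadratically with the number of sources.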

  15. Identifier for dystrophin
      - http://www.ncbi.nlm.nih.gov/sites/entrez

  16. Identifier for renal cell carcinoma
      - 645875019, 379798014, 379801015, 379800019, 379797016, 379803017, 379802010
      - http://www.clininfo.co.uk/clue5/clue.htm

  17. Information integration in biomedicine. Semantic Web perspective

  18. HCLS mashup
      - Resources shown: PDSPki, NeuronDB, Reactome, Gene Ontology, BAMS, Allen Brain Atlas, BrainPharm, Antibodies, Entrez Gene, MeSH, NC Annotations, PubChem, Mammalian Phenotype, SWAN, AlzGene, Homologene, Publications
      - http://esw.w3.org/topic/HCLS/HCLSIG_DemoHomePage_HCLSIG_Demo

  19. Shared identifiers: Example
      - GO
