CONCEPT DRIFT IN ONTOLOGY MAPPING AND SEMANTIC ANNOTATION ADAPTATION Cédric PRUSKI, Drift-a-LOD@EKAW 2016, November 20th, Bologna, Italy 1
MOTIVATION • Outdated mappings and annotations may trigger undesirable results in biomedical systems • Keeping mappings and annotations valid is crucial • The large size and complexity of the resources prevent fully manual maintenance [Figure: a source KOS K_S and a target KOS K_T linked by the mapping "Malignant neoplasm" = "malignancy"; once the mapping is in doubt ("?"), data annotated with "malignancy" becomes inaccessible] 2
PROBLEMATIC • What is the impact of concept drift (or ontology evolution) on ontology mappings and semantic annotations? • Quantitative • Qualitative • How can we formally characterize concept drift? • Basic changes (Addition/Deletion of concepts) • Complex changes (Split, merge, move of concepts) • Can we reuse information that characterizes concept drift to adapt ontology mappings and semantic annotations? • Prevention of re-alignment / re-annotation of whole datasets 3
AGENDA ① Concept drift for mapping adaptation a. DynaMO research project b. Change patterns ② Concept drift for semantic annotation maintenance a. ELISA research project b. Background knowledge ③ Discussion a. Concept drift for LOD 4
THE CASE OF MAPPING ADAPTATION 5
ONTOLOGY MAPPING ADAPTATION Definition and Problematic • Definition: "Adaptation of existing mappings according to modifications affecting KOS elements at evolution time" • A mapping $M_{V1} = (s, t, r)$ is adapted into $M_{V2} = (s', t, r')$ • Hypothesis: there is a correlation between the way KOS elements evolve and the way mappings are adapted 6
UNDERSTANDING MAPPING EVOLUTION • Identify potential interdependencies between changes affecting KOS entities and the mapping evolution: how does concept drift impact mappings? • Empirically examine official and real-world mappings over time • Evolution of SNOMED CT and ICD9CM as a case study [Figure: four SNOMED CT releases (Jan/10, Jul/10, Jan/11, Jul/11) and two ICD9CM releases (2009, 2010), linked by four mapping sets M_ST1 to M_ST4 (Jan/10, Jul/10, Jan/11, Jul/11); ~400 000 mappings analyzed] 7
KEY FINDINGS • Mapping adaptation is based on the evolution of relevant concept attributes • How to identify these attributes? [Figure: ICD9CM and SNOMED CT before and after evolution. Before: ICD9CM concept 560.39 "Impaction of intestine" carries the attributes "Concretion of intestine", "Enterolith" and "Fecal impaction". After: concept 560.39 has changed and concept 560.32 was added. SNOMED CT concepts 197063004, 168000, 40515007, 29162007 and 44635007 are mapped to the ICD9CM concepts with ≤ and ≡ relations; labels shown include "Typhlolithiasis (disorder)", "Fecal impaction", "Fecal impaction of colon (disorder)", "Concretion of intestine" and "Enterolith". A callout reads "Observed similarity"] 8
CHARACTERIZATION OF CHANGES Lexical change patterns • Total Copy (TC) • Total Transfer (TT) • Partial Copy (PC) • Partial Transfer (PT) • CONTEXT = SUP ∪ SUB ∪ SIB [Figure: the attributes a_1, ..., a_n of a source concept c_s^0 at time j are compared at time j+1 with the attributes of c_s^1 and of its super concepts (a_sup1, ..., a_supn), sub concepts (a_sub1, ..., a_subn) and siblings (a_sib1, a_sib2); example labels: "specified behavioral problem", "unspecified mental behavioral problem", "inflammatory bowel diseases", "bronzed diabetes"] 9
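The four lexical change patterns can be illustrated with a minimal sketch. The slide only names the patterns, so the definitions used here are assumptions: "copy" means the attribute value also stays in the source concept, "transfer" means it disappears from it, "total" means the whole value reappears in a context concept, and "partial" means only some of its words do. The function name `classify_lexical_change` and the word-overlap test are hypothetical, not the authors' implementation.

```python
def classify_lexical_change(attr, source_after, context_after):
    """Classify how one attribute value evolved between time j and j+1.

    attr          -- attribute value of the source concept at time j
    source_after  -- attribute values of the same concept at time j+1
    context_after -- attribute values of one context concept
                     (super, sub or sibling) at time j+1
    Returns "TC", "TT", "PC", "PT", or None when no pattern applies.
    """
    # Assumption: "copy" = value still present in the source concept.
    kept = attr in source_after

    # Total patterns: the whole value reappears in the context concept.
    if attr in context_after:
        return "TC" if kept else "TT"

    # Partial patterns (assumed): at least one word of the value
    # reappears in some attribute of the context concept.
    tokens = set(attr.lower().split())
    for candidate in context_after:
        if tokens & set(candidate.lower().split()):
            return "PC" if kept else "PT"
    return None


print(classify_lexical_change(
    "specified behavioral problem",
    set(),
    {"unspecified mental behavioral problem"},
))
```

Running the function once per concept in CONTEXT = SUP ∪ SUB ∪ SIB would give the set of patterns observed for an attribute; a real system would likely need stemming or a string-similarity threshold instead of exact word overlap.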
CHARACTERIZATION OF CHANGES Semantic change patterns • Equivalent (EQV) • Partial Match (PTM) • More Specific (MSP) • Less Specific (LSP) • CONTEXT = SUP ∪ SUB ∪ SIB [Figure: the attributes a_1, ..., a_n of a source concept c_s^0 at time j are compared at time j+1 with the attributes of c_s^1 and of its super concepts (a_sup1, ..., a_supn), sub concepts (a_sub1, ..., a_subn) and siblings (a_sib1, a_sib2); example evolutions: "Focal atelectasis" → "Helical atelectasis", "familial hyper chylomicronemia" → "familial chylomicronemia", "Kappa chain disease" → "Kappa light chain disease", "Diabetes type I" → "Diabetes type 1"] 10
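A rough way to picture the semantic change patterns is to compare the word sets of an attribute value before and after evolution. This is only a sketch under a strong assumption, namely that adding words makes a label more specific and removing words makes it less specific; the slide does not say how the patterns are computed, and the function name `semantic_change` is hypothetical.

```python
def semantic_change(before, after):
    """Guess the semantic change pattern between two label versions.

    Assumed heuristic (not from the slides):
      same words        -> EQV (Equivalent)
      words were added  -> MSP (More Specific)
      words were removed-> LSP (Less Specific)
      some words shared -> PTM (Partial Match)
    Returns None when the labels share no word at all.
    """
    b = set(before.lower().split())
    a = set(after.lower().split())
    if a == b:
        return "EQV"
    if b < a:   # every old word kept, at least one new word
        return "MSP"
    if a < b:   # every new word already existed, some were dropped
        return "LSP"
    if a & b:
        return "PTM"
    return None


print(semantic_change("Kappa chain disease", "Kappa light chain disease"))
```

The heuristic has obvious limits: "Diabetes type I" and "Diabetes type 1" come out as a partial match even though a reader would call them equivalent, which is one reason purely lexical comparison is not enough.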
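LINKING CP AND MAINTENANCE ACTIONS Heuristics [Figure: a mapping m_st links the source concept c_s^0 of KOS K_S (affected by KOS changes) to the target concept c_t of KOS K_T (unchanged; semType and attributes a_1, ..., a_k). At time j, c_s^0 holds the relevant attributes a_s1, a_s2, a_s3, ..., a_sn; at time j+1 these are compared with the attributes of c_s^1 and of a candidate concept c_cand^1 taken from CONTEXT = SUP ∪ SUB ∪ SIB. When exactly one lexical CP (Total Transfer) exists (∃!), together with the semantic CP, the action MoveM(m_st, c_cand^1) is applied. Example labels: "Kappa chain disease", "Kappa light chain disease"] 11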
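The heuristic on this slide, moving a mapping to a candidate concept when a Total Transfer is detected, can be sketched as follows. The `Mapping` class, the function `move_mapping_if_total_transfer` and the concept identifiers are hypothetical names chosen for illustration; only the triple (s, t, r), the Total Transfer condition and the MoveM action come from the slides.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    """A mapping triple (s, t, r)."""
    source: str    # s: concept in the evolving KOS
    target: str    # t: concept in the unchanged KOS
    relation: str  # r: e.g. "≡" or "≤"


def move_mapping_if_total_transfer(
    mapping: Mapping,
    lexical_cp: Optional[str],
    candidate: Optional[str],
) -> Mapping:
    """Apply MoveM when the relevant attribute was totally transferred.

    lexical_cp -- pattern detected for the relevant attribute ("TT", ...)
    candidate  -- context concept that received the attribute, if any
    The mapping is returned unchanged when the condition does not hold.
    """
    if lexical_cp == "TT" and candidate is not None:
        # MoveM: re-attach the mapping to the candidate concept.
        return replace(mapping, source=candidate)
    return mapping


m_v1 = Mapping(source="c_s", target="c_t", relation="≡")
m_v2 = move_mapping_if_total_transfer(m_v1, "TT", "c_cand")
print(m_v2)
```

This sketch leaves the relation r untouched; deciding whether the semantic change pattern should also turn r into a different r' would need rules that the slide does not spell out.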
CONCEPT DRIFT FOR MAPPING ADAPTATION Lessons learned • Concept drift has a huge impact on ontology mappings, but some concept changes do not affect mappings • Drift of attribute values governs the mapping adaptation process • In most cases, concept drift results in local changes • Changes in super concepts, sub concepts and siblings • Considering ontology versions alone is not enough to characterize concept drift • Need for external background knowledge to better determine the semantic relationship between versions of a concept • Cf. semantic annotation adaptation 12
THE CASE OF SEMANTIC ANNOTATIONS ADAPTATION ELISA project: www.elisa-project.lu 13
SEMANTIC ANNOTATIONS ADAPTATION Problem 14
METHODOLOGY Impact of concept drift on semantic annotations 15
RESULTS 16
RESULTS 17
RESULTS 18
RESULTS 19
CONCEPT DRIFT FOR ANNOTATIONS Use of external knowledge source • A concept may have labels before and after evolution that are disjoint from the syntactic or lexical point of view • Ex: Cancer → Malignant neoplasm • Lexical and semantic change patterns cannot be applied • External knowledge sources are required to characterize the evolution of concepts in such situations • We propose a method exploiting BioPortal to overcome this limitation • Ontologies • Mappings • The method is able to find the semantic relationship between two versions of the same concept • Equivalent, less specific, more specific, unrelated, partially matched 20
USE OF EXTERNAL KNOWLEDGE SOURCE Example • Step 1, search in ontologies (direct method): "Pituitary dwarfism" (MeSH) is found in SNOMED CT, ICD9CM, MEDDRA, NCIT, DOID, RCD, HP, NDFRT, DERMLEX, NATPRO, CRISP, SOPHARM, BDO, SNMI; "Pituitary dwarfism II" (MeSH) is found in OMIM; no common ontologies • Step 2, use mappings (indirect method): 15 mappings available (OMIM ontology); "Pituitary dwarfism II" (OMIM) is mapped to "Laron-type isolated somatotropin defect" (SNOMED CT); SNOMED CT is the common ontology • Step 3: "Laron-type isolated somatotropin defect" and "Pituitary dwarfism" have the same super concept ("short stature disorder"), so they are siblings 21
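The three steps of this example can be replayed on a small in-memory data set. The sketch below does not call BioPortal; the dictionaries `ONTOLOGIES_WITH_LABEL`, `MAPPINGS` and `PARENTS` are hypothetical stand-ins filled with a few facts from the slide, and the functions `relate_in` and `relate` are illustrative names rather than the authors' code.

```python
# Ontologies in which each label was found (subset of the slide's list).
ONTOLOGIES_WITH_LABEL = {
    "Pituitary dwarfism": {"SNOMED CT", "ICD9CM", "MEDDRA"},
    "Pituitary dwarfism II": {"OMIM"},
}

# Cross-ontology mappings: (ontology, label) -> [(ontology, label), ...]
MAPPINGS = {
    ("OMIM", "Pituitary dwarfism II"): [
        ("SNOMED CT", "Laron-type isolated somatotropin defect"),
    ],
}

# Super concepts inside one ontology: (ontology, label) -> {parents}
PARENTS = {
    ("SNOMED CT", "Pituitary dwarfism"): {"Short stature disorder"},
    ("SNOMED CT", "Laron-type isolated somatotropin defect"): {
        "Short stature disorder"
    },
}


def relate_in(ontology, old, new):
    """Relationship of `new` with respect to `old` inside one ontology."""
    if old == new:
        return "equivalent"
    old_parents = PARENTS.get((ontology, old), set())
    new_parents = PARENTS.get((ontology, new), set())
    if old in new_parents:
        return "more specific"
    if new in old_parents:
        return "less specific"
    if old_parents & new_parents:
        return "siblings"
    return "unrelated"


def relate(old, new):
    """Direct method first, then the indirect method through mappings."""
    old_ontos = ONTOLOGIES_WITH_LABEL.get(old, set())
    new_ontos = ONTOLOGIES_WITH_LABEL.get(new, set())

    # Step 1 (direct): both labels live in a common ontology.
    common = old_ontos & new_ontos
    if common:
        return relate_in(min(common), old, new)

    # Step 2 (indirect): follow mappings of `new` into an ontology
    # that also contains `old`, then compare there (step 3).
    for ontology in sorted(new_ontos):
        for target_onto, target_label in MAPPINGS.get((ontology, new), []):
            if target_onto in old_ontos:
                return relate_in(target_onto, old, target_label)
    return "unrelated"


print(relate("Pituitary dwarfism", "Pituitary dwarfism II"))
```

With the toy data above, the direct method finds no common ontology, the indirect method reaches SNOMED CT through the OMIM mapping, and the shared super concept yields a sibling relationship, as on the slide.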
CONCEPT DRIFT IN ANNOTATION ADAPTATION Lessons learned (so far …) • Ontology regions do not evolve in the same way • Unstable regions → handle with care • Interesting for predicting concept drift • Concept drift has a different impact on annotation tools • GATE • NCBO annotator • Background knowledge gives promising results for characterizing concept drift • BioPortal ontologies • RDF datasets, Web data under investigation • Will machine learning help in understanding concept drift? • Identification of relevant features • What ML techniques to use? 22
DISCUSSION Concept drift for LOD • Linked Open Data requires vocabularies for semantic interoperability purposes • LOD for characterizing concept drift • Quality of LOD is problematic • Some datasets rely on outdated vocabularies • Concept drift impacting LOD: • FOAF, DC not as dynamic as domain ontologies • No control over the datasets using controlled vocabularies → How to propagate changes observed in the vocabulary to RDF datasets? 23
COLLABORATORS • Silvio Cardoso, • Dr. Marcos Da Silveira, • Dr. Duy Dinh, • Dr. Julio Dos Reis, • Dr. Anika Gross, • Pr. Erhard Rahm • Pr. Chantal Reynaud-Delaître, • And all the others … 24