MIE2014, Istanbul
Formalizing mappings to optimize automated schema alignment: application to rare diseases
Meriem Maaroufi, Rémy Choquet, Paul Landais, Marie-Christine Jaulent
Paris, France
meriem.maaroufi@bndmr.fr Monday, September 1, 2014
French Rare Disease Organization
● French rare disease registry
● French ministry of health
● Decision making and patient identification
● > 1000 clinical sites
● 131 rare diseases excellence centers
● Researchers
Towards heterogeneous and non-centralized data
Data integration process
Hypothesis
Characterizing mappings in a complete (C1, C2, S) formalization will improve the usability of alignment results.
1st duality: Data element – Value element
Data element = the container
● Characteristics: label, definition
● Data type
● Value domain, restrictions…
● Notation: Ei (i = 1..n; n = card(schema))
Value element = the content
● An integer, a string, a Boolean value or an entry of a list
● Depends on …
● Notation: eik (k = 1..p; p = card(Ei))
2nd duality: Source element – Target element
• A mapping is rarely a bijection.
• This is often due to generalization/specialization: a Multidisciplinary consultation is a Consultation, but a Consultation is not a Multidisciplinary consultation.
• A mapping has a direction: from source schema to target schema.
Example:
ESi: Cytogenetic confirmation → ETj: Confirmation mode
eSik: True → eTjl: Genetic
Conditional structure: rules
● The if… then… formalism is supported by most programming languages.
● It is well suited to bi-level mappings.
● It can define exact mappings and data transformations.
Flowchart: Condition (True / False) → Instructions
Result: Mapping formalization
Mapping = {ESi - ETj ; eSik - eTjl ; Rule(S->T)}
A rule defines the relation between the involved source and target data elements and value elements.
ESi – ETj | eSik – eTjl | Rule
Glycemia – Hypoglycemic state | integer – true | If glycemia < $threshold then hypoglycemic state = true
Act type – Participant profession | nurse intervention – nurse | If Act type = nurse intervention then Participant profession = nurse
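The mapping formalization above can be illustrated in code. What follows is only a minimal sketch, assuming a source record is a plain dictionary keyed by data element name. The class name `Mapping`, its fields, the helper `transform` and the threshold value are hypothetical and do not come from the slides, which only name `$threshold`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Mapping:
    """One mapping = {ESi - ETj ; eSik - eTjl ; Rule(S->T)} (hypothetical encoding)."""
    source_element: str               # ESi, the source data element
    target_element: str               # ETj, the target data element
    condition: Callable[[Any], bool]  # test on the source value element eSik
    target_value: Any                 # eTjl, the target value element

    def apply(self, source_record: Dict[str, Any]) -> Dict[str, Any]:
        """Return {ETj: eTjl} when the rule fires, otherwise an empty dict."""
        if self.source_element not in source_record:
            return {}
        if self.condition(source_record[self.source_element]):
            return {self.target_element: self.target_value}
        return {}


THRESHOLD = 70  # hypothetical value chosen for illustration only

hypoglycemia = Mapping(
    "Glycemia", "Hypoglycemic state", lambda v: v < THRESHOLD, True
)
nurse = Mapping(
    "Act type", "Participant profession",
    lambda v: v == "nurse intervention", "nurse",
)


def transform(record: Dict[str, Any], mappings: List[Mapping]) -> Dict[str, Any]:
    """Apply every mapping in order and merge the target values produced."""
    target: Dict[str, Any] = {}
    for mapping in mappings:
        target.update(mapping.apply(record))
    return target
```

Calling `transform({"Glycemia": 55, "Act type": "nurse intervention"}, [hypoglycemia, nurse])` fires both rules from the table. Keeping the rule as a condition plus a target value is one possible design choice; it keeps the direction source to target explicit.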
Application to BNDMR context
Rules generation methodology
A specific process:
- is a workflow
- that involves some chosen alignment approaches
- operating in a given order
- on selected data elements
- to detect specific mappings: defined rules.
Workflow: Elements selection → Mappings detection → Rules generation
Tools & experimentation
Reference test (extract):
● CEMARA schema: 73 ESi
● BNDMR schema: 68 ETj
Linguistic approach:
● Incharge doctor – Care provider: 0.82
● Activity context – Encounter type: 0.88
Experimentation:
● Source: 43 boolean ESi, 56 eSik (from lists)
● Target: 16 boolean ETj, 106 eTjl (from lists)
Process example
Source CEMARA: 43 boolean DE, 56 lists VE
Target BNDMR: 16 boolean DE, 106 lists VE
Rule | Type | Reference test | Experimentation
If ESi=true then ETj=true | bool-bool | 3 | 3
If ESi=eSik then ETj=eTjl | list-list | 6 (DE-DE) | 35
If ESi=true then ETj=eTjl | bool-list | 1 (DE-DE) | 22
If ESi=eSik then ETj=true | list-bool | 0 | 1
Examples:
● If PropLink=propositus [source] then Propositus=true [target]
● If ConfCyto=true [source] then ConfirmationMode=cytogenetic [target]
Workflow: Elements selection → Mappings detection (linguistic approach) → Rules generation
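The four rule types above (bool-bool, list-list, bool-list, list-bool) follow one pattern, so a rule can be produced mechanically once a mapping has been detected. The sketch below is illustrative only: the function names `generate_rule` and `rule_kind` and their parameters are hypothetical, and it assumes that a missing value element means the data element is boolean.

```python
from typing import Optional


def rule_kind(source_ve: Optional[str], target_ve: Optional[str]) -> str:
    """Classify a mapping as bool-bool, list-list, bool-list or list-bool.

    Assumption: no value element (None) means the data element is boolean.
    """
    source_type = "bool" if source_ve is None else "list"
    target_type = "bool" if target_ve is None else "list"
    return f"{source_type}-{target_type}"


def generate_rule(
    source_de: str,
    target_de: str,
    source_ve: Optional[str] = None,
    target_ve: Optional[str] = None,
) -> str:
    """Build the if-then rule text for a detected mapping.

    Boolean data elements are compared with or set to 'true';
    list data elements use the mapped list entry.
    """
    left = "true" if source_ve is None else source_ve
    right = "true" if target_ve is None else target_ve
    return f"If {source_de}={left} then {target_de}={right}"


# The two examples given on the slide, rebuilt from their parts.
list_to_bool = generate_rule("PropLink", "Propositus", source_ve="propositus")
bool_to_list = generate_rule("ConfCyto", "ConfirmationMode", target_ve="cytogenetic")
```

Here `list_to_bool` reproduces the PropLink example and `bool_to_list` the ConfCyto example, without the [source] and [target] annotations.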
Conclusion
• The proposed formalization "mapping = {ESi-ETj ; eSik-eTjl ; Rule}" is well suited to characterizing simple and complex mappings.
• Mappings characterized by the proposed formalization can be used directly in data integration processes (e.g. ETL).
• Depending on input data types, the processes for mappings detection will be different.
Perspectives
• More specific processes to cover more data types
• Automating rules generation
meriem.maaroufi@bndmr.fr
Thank you for your attention!
Special thanks to:
- BNDMR team
- INSERM UMR-1142 team LIMICS