Interoperability driven integration of biomedical data sources
Douglas TEODORO (a,1), Rémy CHOQUET (c), Daniel SCHOBER (d), Giovanni MELS (e), Emilie PASCHE (a), Patrick RUCH (b) and Christian LOVIS (a)
(a) SIMED, University Hospitals of Geneva and (b) HEG, University of Applied Sciences, Geneva, Switzerland; (c) INSERM, Université Pierre et Marie Curie, Paris, France; (d) Freiburg University Medical Center, Germany; (e) AGFA Healthcare, Ghent, Belgium
Oslo, 30 August 2011
The DebugIT project
• Funded by the European Community's Seventh Framework Program under grant agreement n° FP7-217139 (€7M)
• Project period: from 1 January 2008 to 31 December 2011 (?)
• 14 partners
Disclaimer: this presentation reflects solely the views of the DebugIT team. The European Commission, Directorate General Information Society and Media, Brussels, is not liable for any use that may be made of the information contained therein.
Douglas Teodoro - MIE 2011
Aim and objectives
• Design a data integration architecture that supports research on, and monitoring of, antimicrobial resistance using existing operational microbiology databases
• Integrate heterogeneous operational clinical information systems
– Design methods to interoperate with various storage systems
– Implement a data source mediator
• Provide common semantics for the data
– Formalize data source models and data types
• Provide ubiquitous access to the data
– Expose laboratory data from the data sources on the Internet
The virtual Clinical Data Repository (CDR)
• A data integration platform for existing clinical data
– Primarily focused on antimicrobial data but extensible to other domains
• Based on Semantic Web technologies
• Follows the hybrid ontology-driven integration approach
– Multiple semantically flat data definition ontologies are mapped to a common, semantically defined domain ontology
• Provides three levels of interoperability in the data integration process
– Technical
– Syntactic
– Semantic
Methods: Technical interoperability
• An intermediate storage layer was designed to provide a common storage system
• Based on an RDF store
– RDF model
– SPARQL protocol
• ETL (Extract-Transform-Load) jobs provide the interface to the different local storage systems (RDBMS, XML, text files)
• Data sources are connected via the HTTPS/SPARQL protocol
[Figure: Intranet, DMZ and Internet zones. Local RDBMS, XML and text-file sources are loaded by ETL jobs into RDF stores, which are connected over HTTPS; local security applies at each site]
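The ETL step described above can be pictured with a minimal Python sketch. It assumes a relational source table named `bacteriologie` and hypothetical `http://example.org/...` namespaces, none of which come from the DebugIT deployment. Rows are extracted with the standard `sqlite3` module, transformed into N-Triples statements, and loaded into a Python set that stands in for the RDF store.

```python
import sqlite3

# Hypothetical namespaces; a real deployment would use each source's own DDO IRIs.
DDO = "http://example.org/ddo#"
DATA = "http://example.org/data/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value) -> str:
    """Serialize a value as an N-Triples string literal, escaping backslashes and double quotes."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def extract_transform(conn: sqlite3.Connection) -> list:
    """Extract rows from the source table and transform each into N-Triples lines."""
    triples = []
    query = "SELECT id, organism, result_date FROM bacteriologie ORDER BY id"
    for row_id, organism, result_date in conn.execute(query):
        subject = iri(f"{DATA}bacteriologie/{row_id}")
        triples.append(f"{subject} {iri(RDF_TYPE)} {iri(DDO + 'Bacteriologie')} .")
        triples.append(f"{subject} {iri(DDO + 'hasOrganism')} {literal(organism)} .")
        triples.append(f"{subject} {iri(DDO + 'hasDate')} {literal(result_date)} .")
    return triples


# Usage: an in-memory table stands in for a laboratory RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bacteriologie (id INTEGER PRIMARY KEY, organism TEXT, result_date TEXT)")
conn.executemany(
    "INSERT INTO bacteriologie VALUES (?, ?, ?)",
    [(1, "Klebsiella pneumoniae", "2009-06-01"), (2, "Escherichia coli", "2009-06-02")],
)
store = set()  # stand-in for the RDF store (load step)
store.update(extract_transform(conn))
```

Serializing to N-Triples keeps the sketch free of third-party RDF libraries; a production ETL job would more likely load into a triple store through its own bulk-load interface.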
Methods: Syntactic interoperability
• Data cataloging
– Bottom-up process
• Local data types are aligned using biomedical terminologies
– WHO-ATC, SNOMED-CT, NEWT
• Multi-stage text-based instance classification is used for automatic normalization
– Ruch, Bioinformatics 2006; Daumke, GDMS 2010
• A domain ontology was designed to represent the field (DebugIT Core Ontology, DCO)
• Terminologies are mapped to the DCO using the SKOS ontology
[Figure: Data normalization. Local concepts (DDO instances) for drug, disease, bacteria and laboratory data are mapped to terminologies (WHO-ATC, SNOMED CT, ICD-10, NEWT, LOINC), which are mapped to global concepts in the DCO; panels: local concepts, local formal concepts, global concepts]
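The multi-stage idea can be illustrated with a small Python sketch that maps local drug labels to WHO-ATC codes. It is not the classifier cited on the slide (Ruch 2006; Daumke 2010): the three stages (exact match, whole-label fuzzy match, per-token fuzzy match) use the standard `difflib` module as a simple stand-in, `ATC_EXCERPT` is a three-entry excerpt, and the 0.85 cutoff is an arbitrary assumption. Exact hits are tagged `skos:exactMatch` and fuzzy hits `skos:closeMatch`.

```python
import difflib
import re
import unicodedata

# Tiny WHO-ATC excerpt; a real system would load the full terminology.
ATC_EXCERPT = {"amoxicillin": "J01CA04", "ciprofloxacin": "J01MA02", "ceftriaxone": "J01DD04"}


def normalize(label: str) -> str:
    """Strip accents, lower-case, and keep only letters separated by single spaces."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    letters = re.sub(r"[^a-z ]+", " ", no_accents.lower())
    return " ".join(letters.split())


def map_label(label: str, cutoff: float = 0.85):
    """Return (SKOS relation, ATC code) for a local label, or None if no stage matches."""
    text = normalize(label)
    names = list(ATC_EXCERPT)
    # Stage 1: exact match on the normalized label.
    if text in ATC_EXCERPT:
        return ("skos:exactMatch", ATC_EXCERPT[text])
    # Stage 2: approximate match on the whole label (e.g. local spellings).
    hits = difflib.get_close_matches(text, names, n=1, cutoff=cutoff)
    # Stage 3: approximate match on individual tokens (e.g. "Amoxicilline 500 mg").
    if not hits:
        for token in text.split():
            hits = difflib.get_close_matches(token, names, n=1, cutoff=cutoff)
            if hits:
                break
    if hits:
        return ("skos:closeMatch", ATC_EXCERPT[hits[0]])
    return None
```

For example, `map_label("Céftriaxone")` yields an exact match after accent stripping, while `map_label("Vancomycine")` returns `None` because the excerpt contains no close candidate and the label would be left for manual curation.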
Methods: Semantic interoperability
• Local data store models (EAV/ER) are formalized using a semantically flat data definition ontology (DDO)
• Local models are mapped to their respective DDO (classes and properties)
• DDOs are mapped to the DCO, closing the gap between local and domain semantics
[Figure: Data model mapping. Local models (EAV, ER, HL7-RIM, openEHR) are mapped to local formalized models (one DDO per CDR RDF data store), which are mapped to the shared domain model (DCO)]
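The DDO-to-DCO step can be sketched in Python as a term-rewriting pass over triples. The two correspondences reuse the class and property names from the query-model slide (`ddo:Bacteriologie` to `dco:Antibiogram`, `ddo:hasDate` to `dco:hasResultDate`); representing triples as CURIE tuples and dropping unmapped local terms are simplifying assumptions, not the documented DebugIT behavior.

```python
# Hypothetical DDO-to-DCO correspondences for one source.
CLASS_MAP = {"ddo:Bacteriologie": "dco:Antibiogram"}
PROPERTY_MAP = {"ddo:hasDate": "dco:hasResultDate"}


def lift(triples):
    """Rewrite local DDO triples into DCO vocabulary, dropping terms with no mapping."""
    lifted = []
    for s, p, o in triples:
        if p == "rdf:type":
            # Class membership: rewrite the object (the class).
            if o in CLASS_MAP:
                lifted.append((s, p, CLASS_MAP[o]))
        elif p in PROPERTY_MAP:
            # Property assertion: rewrite the predicate, keep subject and value.
            lifted.append((s, PROPERTY_MAP[p], o))
    return lifted


local = [
    ("hug:ab1", "rdf:type", "ddo:Bacteriologie"),
    ("hug:ab1", "ddo:hasDate", "2009-06-01"),
    ("hug:ab1", "ddo:internalCode", "X42"),  # purely local field, no DCO counterpart
]
lifted = lift(local)
```

Only the type and date statements survive in `lifted`; the local-only field stays at the DDO level, which mirrors the role of the semantically flat DDO as a buffer between local and domain semantics.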
Methods: Query model
• CDR query: ?ab a ddo:Bacteriologie; ddo:hasDate ?date
• Domain query: ?ab a dco:Antibiogram; dco:hasResultDate ?date
• Results are fetched and returned in RDF graph format using local terminologies
[Figure: query processing between the DCO and the source DDOs (DDO1, DDO2, DDO3), with mapping, validation, aggregation and reasoning steps]
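The rewriting from the domain query to the CDR query on this slide can be sketched in Python by inverting a per-source DDO-to-DCO mapping and substituting terms pattern by pattern. This is an illustrative assumption, not the DebugIT mediator: it presumes one-to-one correspondences, handles only variables, `a`, and mapped CURIEs (literals are rejected), and omits PREFIX declarations from the generated SPARQL.

```python
# Hypothetical per-source mapping (DDO term -> DCO term).
DDO_TO_DCO = {"ddo:Bacteriologie": "dco:Antibiogram", "ddo:hasDate": "dco:hasResultDate"}


def rewrite(domain_patterns, ddo_to_dco):
    """Rewrite DCO triple patterns into DDO patterns for one source.

    Returns None when a domain term has no local counterpart, because the
    source cannot answer that pattern.
    """
    dco_to_ddo = {dco: ddo for ddo, dco in ddo_to_dco.items()}
    local = []
    for pattern in domain_patterns:
        terms = []
        for term in pattern:
            if term.startswith("?") or term == "a":  # variables and rdf:type are kept
                terms.append(term)
            elif term in dco_to_ddo:
                terms.append(dco_to_ddo[term])
            else:
                return None
        local.append(tuple(terms))
    return local


def to_sparql(patterns):
    """Render triple patterns as a SELECT query body (prefixes omitted)."""
    body = " .\n  ".join(" ".join(p) for p in patterns)
    return "SELECT * WHERE {\n  " + body + " .\n}"


domain_query = [("?ab", "a", "dco:Antibiogram"), ("?ab", "dco:hasResultDate", "?date")]
cdr_query = rewrite(domain_query, DDO_TO_DCO)
```

Inverting the map only works because each DCO term here has exactly one DDO counterpart; when a domain concept corresponds to several local constructs, or to none, a simple substitution like this breaks down.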
Results: Pilot network
• Seven healthcare institutions are sharing antimicrobial resistance data using the framework: GAMA (Sofia-BG), HUG (Geneva-CH), INSERM (Paris-FR), IZIP (Prague-CZ), LiU (Linköping-SE), TEILAM (Lamia-GR) and UKLFR (Freiburg-DE)
Results: Ontology added value
• Use of the ontology for automatic clustering of antibiograms (e.g. by antibiotic class)
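Clustering by antibiotic class can be illustrated with a short Python sketch. The deck attributes this to the ontology; here the WHO-ATC hierarchy is swapped in as a stand-in, grouping each tested antibiotic by its ATC level-3 prefix (first four characters) and counting resistant (`"R"`) results per class. The input format `(ATC code, interpretation)` is an assumption.

```python
from collections import defaultdict

# ATC level 3 (first four characters) names the pharmacological subgroup.
ATC_LEVEL3 = {
    "J01C": "Beta-lactam antibacterials, penicillins",
    "J01D": "Other beta-lactam antibacterials",
    "J01M": "Quinolone antibacterials",
}


def cluster_by_class(results):
    """Group (ATC code, S/I/R interpretation) pairs by ATC level-3 class and count resistance."""
    clusters = defaultdict(lambda: {"tested": 0, "resistant": 0})
    for atc_code, interpretation in results:
        label = ATC_LEVEL3.get(atc_code[:4], "Unclassified")
        clusters[label]["tested"] += 1
        if interpretation == "R":
            clusters[label]["resistant"] += 1
    return dict(clusters)


results = [("J01CA04", "R"), ("J01CR02", "S"), ("J01MA02", "R"), ("J01DD04", "S")]
summary = cluster_by_class(results)
```

With an ontology in place of the prefix lookup, the same grouping could follow subclass links instead of code prefixes, which also covers drugs whose classification is not encoded in their identifier.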
Results: Performance evaluation
• In the preliminary tests, a set of long-period queries was performed to evaluate the CDR response time
– E.g.: "What is the evolution of resistance of Klebsiella pneumoniae from Jun 2005 to Jun 2009?"
• The network is responsible for 41% to 49% of the retrieval time for the sets containing more than 1000 tuples

                                  Retrieval time (s)
Source    #Tuples retrieved    SPARQL     Network    #Tuples/s
GAMA                      0      0.14        0.00            0
HUG                   74150      5.72        3.91         7704
INSERM               330360     20.38       14.22         9550
IZIP                      0         -           -            0
LIU                    9905      1.70        1.23         3371
TEILAM                   30      0.36        0.00           83
UKLFR                155315      6.34        6.19        12394
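The 41% to 49% figure can be checked against the table with a few lines of Python. The reading that "#Tuples/s" equals tuples divided by the sum of the SPARQL and network times is an inference: it reproduces the reported throughputs to within rounding, but the deck does not state it. IZIP is left out because no timings were reported for it.

```python
# (source, tuples retrieved, SPARQL time in s, network time in s), copied from the table.
ROWS = [
    ("GAMA", 0, 0.14, 0.00),
    ("HUG", 74150, 5.72, 3.91),
    ("INSERM", 330360, 20.38, 14.22),
    ("LIU", 9905, 1.70, 1.23),
    ("TEILAM", 30, 0.36, 0.00),
    ("UKLFR", 155315, 6.34, 6.19),
]


def summarize(rows):
    """Per source: throughput over total retrieval time and the network's share of that time."""
    summary = {}
    for source, tuples, sparql_s, network_s in rows:
        total_s = sparql_s + network_s
        summary[source] = {
            "tuples": tuples,
            "tuples_per_s": tuples / total_s,
            "network_share": network_s / total_s,
        }
    return summary


summary = summarize(ROWS)
# Network share (percent) for the result sets with more than 1000 tuples.
large = {s: round(100 * v["network_share"]) for s, v in summary.items() if v["tuples"] > 1000}
print(large)  # {'HUG': 41, 'INSERM': 41, 'LIU': 42, 'UKLFR': 49}
```

The computed throughputs differ slightly from the table (for example about 7700 versus 7704 tuples/s for HUG), which is consistent with the published times having been rounded to two decimals.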
Conclusions
• Developing a fully Semantic Web-compliant distributed CDR is feasible
• Seven healthcare institutions compose the demonstration network
• The CDR exposes standardized and formalized microbiology clinical databases
• The query mediation process is limited
– It is logically impossible to map a priori from the global to the local ontologies
– To be usable by end users (clinical researchers, physicians), the system needs to be encapsulated by query templates
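A query template of the kind mentioned above could be as simple as a parameterized SPARQL string. The Python sketch below uses `string.Template` and the example question from the performance slide. `dco:hasOrganism` and `dco:hasInterpretation` are hypothetical property names (only `dco:Antibiogram` and `dco:hasResultDate` appear in the deck), dates are compared as plain ISO strings, and PREFIX declarations are omitted.

```python
from string import Template

# Hypothetical template; dco:hasOrganism and dco:hasInterpretation are illustrative names.
RESISTANCE_TREND = Template("""SELECT ?date ?interpretation WHERE {
  ?ab a dco:Antibiogram ;
      dco:hasResultDate ?date ;
      dco:hasOrganism "$organism" ;
      dco:hasInterpretation ?interpretation .
  FILTER (?date >= "$start" && ?date < "$end")
}""")


def resistance_trend_query(organism: str, start: str, end: str) -> str:
    """Fill the template; reject characters that could break out of the string literals."""
    for value in (organism, start, end):
        if '"' in value or "\\" in value:
            raise ValueError("unsupported character in template parameter")
    return RESISTANCE_TREND.substitute(organism=organism, start=start, end=end)


query = resistance_trend_query("Klebsiella pneumoniae", "2005-06-01", "2009-06-01")
```

The end user only supplies an organism and a period; the template fixes the graph shape, which sidesteps the a priori global-to-local mapping problem for that particular question.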
Conclusions
• In the query plan, most of the data aggregation is done centrally
– Push reasoning down to the local sources to improve network response
• A production version of the CDR is expected to be available for surveillance systems and clinical research by the end of the year
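The benefit of moving work to the sources can be illustrated with a toy Python comparison of two query plans. It shows aggregation pushdown (sources return partial counts instead of raw rows), a simpler case of the same principle as pushing reasoning down. The two sources and their rows are invented for illustration and are not DebugIT data.

```python
# Toy (organism, S/I/R interpretation) rows per source; invented for illustration only.
SOURCES = {
    "source_a": [("K. pneumoniae", "R"), ("K. pneumoniae", "S"), ("K. pneumoniae", "R")],
    "source_b": [("K. pneumoniae", "S"), ("K. pneumoniae", "S")],
}


def central_plan(sources):
    """Ship every row to the mediator and aggregate there."""
    shipped = [row for rows in sources.values() for row in rows]
    resistant = sum(1 for _, interp in shipped if interp == "R")
    return resistant, len(shipped), len(shipped)  # (resistant, tested, rows transferred)


def pushdown_plan(sources):
    """Let each source compute partial counts; the mediator only sums them."""
    partials = [(sum(1 for _, interp in rows if interp == "R"), len(rows)) for rows in sources.values()]
    resistant = sum(r for r, _ in partials)
    tested = sum(t for _, t in partials)
    return resistant, tested, len(partials)  # one partial result per source
```

Both plans give the same resistance counts, but the pushdown plan transfers one small result per source instead of every row, which matters when the network accounts for a large share of retrieval time.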
The Partners
• Agfa HealthCare, Belgium
• empirica Gesellschaft für Kommunikations- und Technologieforschung mbH, Germany
• Gama Sofia Ltd., Bulgaria
• Institut National de la Santé et de la Recherche Médicale, France
• Internetový Pristup Ke Zdravotním Informacím Pacienta (IZIP), Czech Republic
• Linköpings Universitetet, Sweden
• Technologiko Expedeftiko Idrima Lamias, Greece
• University College London, United Kingdom
• Les Hôpitaux Universitaires de Genève, Switzerland
• Universitätsklinikum Freiburg, Germany
• Université de Genève, Switzerland
• Averbis, Freiburg, Germany
• MDA, Czech Republic
• HEG, Geneva, Switzerland
Methods: Query model
CONSTRUCT { ?graph a ddo:Concept . } WHERE { ?graph a ddo:Concept . }
[Figure: query pipeline: Retrieving → Mapping → Aggregation]
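The Retrieving, Mapping and Aggregation steps can be sketched in Python with a one-pattern matcher that behaves like a minimal SPARQL CONSTRUCT. Everything here is an illustrative assumption: triples are CURIE tuples, all sources share the placeholder class `ddo:Concept` from the slide, the mapping `{"ddo:Concept": "dco:Concept"}` is hypothetical, and aggregation is a deduplicating set union.

```python
def match(pattern, triples):
    """Return variable bindings for every triple that matches a single triple pattern."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break  # the same variable would be bound to two different values
            elif term != value:
                break  # constant term does not match
        else:
            results.append(binding)
    return results


def construct(template, pattern, triples):
    """Instantiate the template triple once per solution, like a one-pattern CONSTRUCT."""
    return {tuple(b.get(t, t) for t in template) for b in match(pattern, triples)}


def mediate(sources, mapping):
    """Retrieve matching triples from each source, map local terms, aggregate by union."""
    pattern = ("?graph", "rdf:type", "ddo:Concept")
    merged = set()
    for triples in sources.values():
        retrieved = construct(pattern, pattern, triples)                      # Retrieving
        mapped = {tuple(mapping.get(t, t) for t in tr) for tr in retrieved}  # Mapping
        merged |= mapped                                                       # Aggregation
    return merged


SOURCES = {
    "hug": [("hug:ab1", "rdf:type", "ddo:Concept"), ("hug:ab1", "ddo:hasDate", "2009-06-01")],
    "uklfr": [("uklfr:x7", "rdf:type", "ddo:Concept")],
}
merged = mediate(SOURCES, {"ddo:Concept": "dco:Concept"})
```

In the actual CDR each source would answer the CONSTRUCT itself over SPARQL and the mediator would only perform the mapping and aggregation; the in-process matcher here just makes the three stages visible.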