Douglas Teodoro, Emilie Pasche, Julien Gobeill, Patrick Ruch, Christian Lovis, Rémy Choquet, Christel Daniel
The project
The problem
• Technical and semantic heterogeneity of the data
  – Different languages: French, German, Greek, Swedish, etc.
  – Different types: relational databases (RDBMS), free text, XML files
  – Example of raw drug entries:
      Drug | Quantity / Frequency
      CICLOSPORINE (SANDIMMUN, NEORAL) | ;;;;;;;;;1;;;;;;;;;;;;;;;
      MYCOPHENOLATE T 60 MINUTES | ;;;;;;;;1;;;;;;;;;;;;;;;;
      PROTOCOLE:TEST STIMULATION SYNACTHENE 1H 3E TUBE /3 | ;;;;;;;;1;;;;;;;;;;;;;;;;
      co-trimoxazole | lundi - mercredi - vendredi (3x/sem)
      ciprofloxacine | 1x/sem (dimanche)
      vancomycine | 1x18h
    (French free text: "lundi - mercredi - vendredi" = Monday - Wednesday - Friday; "sem" = semaine, week; "dimanche" = Sunday.)
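The free-text frequencies in the example above are typical of what a normalisation step has to cope with. A minimal sketch of parsing two of these notations into administrations per week, assuming "Nx/sem" means N times per week and "1x18h" means once every 18 hours (both are our interpretations; `doses_per_week` and the regex patterns are hypothetical). The semicolon-encoded strings are deliberately left unparsed because their encoding is not documented here.

```python
import re
from typing import Optional

# Hypothetical patterns for two French free-text frequency notations:
#   "3x/sem" -> 3 times per week ("sem" = semaine)
#   "1x18h"  -> assumed to mean once every 18 hours
PER_WEEK = re.compile(r"(\d+)\s*x\s*/\s*sem", re.IGNORECASE)
EVERY_N_HOURS = re.compile(r"(\d+)\s*x\s*(\d+)\s*h\b", re.IGNORECASE)

HOURS_PER_WEEK = 7 * 24


def doses_per_week(text: str) -> Optional[float]:
    """Return administrations per week, or None if the text is not recognised."""
    match = PER_WEEK.search(text)
    if match:
        return float(match.group(1))
    match = EVERY_N_HOURS.search(text)
    if match:
        doses, hours = int(match.group(1)), int(match.group(2))
        return doses * HOURS_PER_WEEK / hours
    return None


for raw in ["lundi - mercredi - vendredi (3x/sem)", "1x/sem (dimanche)", "1x18h"]:
    print(raw, "->", doses_per_week(raw))
```

Returning None for unrecognised input, rather than guessing, keeps such entries visible for manual review.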
The problem
• Data privacy
  – Very high concern
  – Patient identity and other confidential items must not be revealed by any means to unauthorized people
• Political barriers
  – External connections to the ODBS
  – Security risks
Our solution
• Clinical Data Repository (CDR): a distributed storage system that provides transparent access to heterogeneous data sources through SQL and SPARQL query interfaces, returns result sets as SQL tuples or RDF, and assures patient privacy
• Based on two visions:
  1. Pragmatic: uses database federation, an established technology, to deliver data integration to the other project components quickly
  2. Innovative: uses semantic web technology, a newer approach that will be explored throughout the project
CDR::Architecture
• Database federation-based
[Diagram: a single query entry point over the HUG, AVERBIS, INSERM and LiU data sources]
CDR::Architecture
• Semantic web-based
[Diagram: a single query entry point over the INSERM, LiU, HUG and AVERBIS data sources]
CDR::Information Model
• Information model: based on HL7 RIM
  – Other candidates: openEHR, EAV/CR, a customized model
• The data stored in the CDR covers the following aspects:
  – Patient information
  – Pathogen-related information
  – Object-related information
  – Information on locations
  – Operational data
CDR::Information Model
[Diagram: information model entities: patient data, health care setting, adverse events, prescriptions, antibiograms, cultures, diseases, pathogens]
CDR::Business Model
• Agents, responsible for:
  – CDR–CIS interoperability
  – Data management within the CDR
  – Communication with other DebugIT components
• Agent order cycle: wait for an order → receive the order → try to execute it → mark success (on success) or mark failed (on failure)
• Orders:
  – DataExtraction
  – DataNormalisation
  – DataMigration
  – DataDepersonalisation
  – OntologyUpdate
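The agent cycle above can be sketched as a small order-processing loop. This is a hypothetical illustration, not the DebugIT agent implementation: `Order`, `Agent`, the handler registry and the in-memory `queue.Queue` standing in for the order channel are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from queue import Empty, Queue
from typing import Callable, Dict, List, Tuple


class Status(Enum):
    SUCCESS = "success"
    FAILED = "failed"


@dataclass
class Order:
    kind: str                                    # e.g. "DataExtraction", "OntologyUpdate"
    payload: dict = field(default_factory=dict)


@dataclass
class Agent:
    # Hypothetical registry: order kind -> function that executes it.
    handlers: Dict[str, Callable[[dict], None]]
    log: List[Tuple[str, Status]] = field(default_factory=list)

    def run(self, orders: Queue, wait_seconds: float = 0.1) -> None:
        while True:
            try:
                order = orders.get(timeout=wait_seconds)   # wait for / receive an order
            except Empty:
                return                                      # sketch only: stop when idle
            try:
                self.handlers[order.kind](order.payload)    # try to execute the order
                status = Status.SUCCESS
            except Exception:
                status = Status.FAILED                      # unknown kind or handler error
            self.log.append((order.kind, status))           # mark success / mark failed
```

Returning when the queue stays empty keeps the sketch terminating; a long-running agent would keep waiting instead, and would likely persist order statuses rather than hold them in memory.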
CDR::Business Model
• Federated engine
  – Based on the MySQL FEDERATED storage engine
  – Federates the distributed data sources
  – Receives SQL requests, creates query plans and executes them
• SPARQL engine
  – Based on D2R
  – Transforms the ER model into a semantic linked data model
  – Receives SPARQL requests, creates query plans and executes them
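The MySQL FEDERATED engine exposes tables on remote servers as if they were local, so one SQL statement can span several sites. As a self-contained stand-in (not the MySQL engine itself), the sketch below uses SQLite's `ATTACH DATABASE` to place one in-memory database per site behind a single view; the table layout, the site names and `federated_connection` are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def federated_connection(sites: Dict[str, List[Tuple[str, str]]]) -> sqlite3.Connection:
    """Build one connection that sees every site's culture results through a single view.

    `sites` maps a trusted site name (e.g. "hug") to (bacterium, result) rows.
    """
    # Autocommit mode: SQLite refuses ATTACH while a transaction is open.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    selects = []
    for name, rows in sites.items():
        # Each attached in-memory database stands in for one remote site.
        # Site names are interpolated into SQL, so they must be trusted identifiers.
        conn.execute(f"ATTACH DATABASE ':memory:' AS {name}")
        conn.execute(f"CREATE TABLE {name}.culture_results (bacterium TEXT, result TEXT)")
        conn.executemany(f"INSERT INTO {name}.culture_results VALUES (?, ?)", rows)
        selects.append(
            f"SELECT '{name}' AS data_source, bacterium, result FROM {name}.culture_results"
        )
    # A TEMP view may reference attached databases; clients query one logical table.
    conn.execute("CREATE TEMP VIEW all_culture_results AS " + " UNION ALL ".join(selects))
    return conn


conn = federated_connection({
    "hug": [("Escherichia coli", "resistant"), ("Escherichia coli", "susceptible")],
    "liu": [("Escherichia coli", "resistant")],
})
print(conn.execute(
    "SELECT data_source, result, COUNT(*) FROM all_culture_results "
    "GROUP BY 1, 2 ORDER BY 1, 2"
).fetchall())
```

The `data_source` column added per site mirrors the column of the same name in the federated query results, so the origin of each row stays visible after integration.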
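CDR::Data Privacy
• Security
  – Sensitive data encrypted
  – Mapping table: original term ↔ encrypted term
  – Original term kept only within the intranet
  – Encrypted term exposed on the internet

Mapping table:
+-------------+-------------------+----------------------------------+
| Artefact ID | Original artefact | Encrypted artefact               |
+-------------+-------------------+----------------------------------+
| 1           | 10                | 001b98ab4335f1d3da23946bce9e4279 |
| 2           | 59                | 0109cfbecd89a3aaeeb92fde6420f29b |
| 3           | 39                | 010c1482764323fd479510ef6a8f5f48 |
+-------------+-------------------+----------------------------------+

Patient table:
+----------------------------------+-------------+-------------+
| Patient ID                       | Patient age | Patient sex |
+----------------------------------+-------------+-------------+
| 001b98ab4335f1d3da23946bce9e4279 | 58          | F           |
| 0109cfbecd89a3aaeeb92fde6420f29b | 38          | F           |
| 010c1482764323fd479510ef6a8f5f48 | 19          | M           |
+----------------------------------+-------------+-------------+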
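The mapping-table scheme above can be sketched as follows. The 32-character hexadecimal values in the tables suggest a 128-bit digest, but the actual algorithm is not stated, so this sketch assumes a keyed hash (HMAC-SHA256 truncated to 32 hex characters) with a secret key held on the intranet. Strictly speaking this is pseudonymisation by hashing rather than encryption, and it will not reproduce the values shown. `make_token` and `depersonalise` are hypothetical names.

```python
import hashlib
import hmac
from typing import List, Tuple

Patient = Tuple[int, int, str]  # (original patient id, age, sex)


def make_token(original_id: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of an identifier, truncated to 32 hex chars (128 bits)."""
    digest = hmac.new(secret_key, original_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:32]


def depersonalise(patients: List[Patient], secret_key: bytes):
    """Split patient rows into an intranet-only mapping table and an exportable table."""
    mapping_table = []  # (artefact id, original id, token): stays on the intranet
    exported = []       # (token, age, sex): no original identifier leaves the intranet
    for artefact_id, (original_id, age, sex) in enumerate(patients, start=1):
        token = make_token(str(original_id), secret_key)
        mapping_table.append((artefact_id, original_id, token))
        exported.append((token, age, sex))
    return mapping_table, exported
```

Because the token depends on a secret key, someone who sees only the exported table cannot recompute tokens from guessed patient IDs, as they could with a plain unkeyed hash.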
Our results
• The DebugIT CDR already has its first pilot
• SQL endpoints ready at HUG and LiU
  – Data integration via database federation
  – Based on the MySQL FEDERATED engine
  – SQL requests and SQL tuple result sets
• SPARQL endpoints set up at 3 demonstration centers: HUG, INSERM and LiU
  – Data integration via ‘linked data’
  – Based on D2R
  – Transforms the ER model into a semantically linked data model
  – SPARQL requests and RDF result sets
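D2R publishes relational data as RDF by mapping tables to classes and columns to properties. The sketch below loosely mirrors that idea for a single table, emitting N-Triples; it is not D2R's mapping engine, and the base URI `http://example.org/cdr/`, the `vocab/<table>_<column>` naming and `rows_to_ntriples` are assumptions for illustration.

```python
from typing import Dict, Iterable, List

BASE = "http://example.org/cdr/"  # hypothetical namespace for generated URIs
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value: object) -> str:
    """Serialise a value as an N-Triples plain literal with the required escapes."""
    text = (str(value).replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{text}"'


def rows_to_ntriples(table: str, key: str, rows: Iterable[Dict[str, object]]) -> List[str]:
    """One resource per row (URI from the primary key), one property per column.

    Assumes key values are already URI-safe (no escaping is applied to them).
    """
    triples = []
    for row in rows:
        subject = f"<{BASE}{table}/{row[key]}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{BASE}vocab/{table}> .")
        for column, value in row.items():
            if column == key or value is None:   # NULLs produce no triple
                continue
            triples.append(f"{subject} <{BASE}vocab/{table}_{column}> {literal(value)} .")
    return triples


for line in rows_to_ntriples("patient", "patient_id",
                             [{"patient_id": "001b98ab4335f1d3da23946bce9e4279",
                               "age": 58, "sex": "F"}]):
    print(line)
```

Deriving subject URIs from the (already pseudonymised) primary key keeps the RDF view consistent with the SQL view of the same rows.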
Database federation
[Diagram: CDR federating LiU and HUG]
Query:
    select cr.data_source data_source,
           cr.antibiotic_tested_result sensibility,
           count(cr.antibiotic_tested_result) value,
           date_format(c.result_date, '%Y') result_date
    from culture_results cr
    join culture c on cr.culture_id = c.culture_id
    join bacteria b on b.bacterium_id = cr.identified_bacteria_name
    join drug d on d.drug_id = cr.antibiotic_tested
    where b.name = 'Escherichia coli'
      and d.name = 'sulfamethoxazole and trimethoprim'
    group by 1, 2, 4;
Result (query time: 1 min 14.68 sec):
+-------------+---------------+-------+-------------+
| data_source | sensibility   | value | result_date |
+-------------+---------------+-------+-------------+
| hug         | indeterminate |     2 | 2006        |
| hug         | resistant     |    72 | 2004        |
| hug         | resistant     |    71 | 2005        |
| hug         | resistant     |   112 | 2006        |
| hug         | resistant     |    94 | 2007        |
| hug         | resistant     |     8 | 2008        |
| hug         | susceptible   |   302 | 2004        |
| hug         | susceptible   |   318 | 2005        |
| hug         | susceptible   |   288 | 2006        |
| hug         | susceptible   |   269 | 2007        |
| hug         | susceptible   |     4 | 2008        |
| liu         | indeterminate |     1 | 2007        |
| liu         | resistant     |    10 | 2005        |
| liu         | resistant     |    21 | 2006        |
| liu         | resistant     |    30 | 2007        |
| liu         | resistant     |    46 | 2008        |
| liu         | susceptible   |   108 | 2005        |
| liu         | susceptible   |    90 | 2006        |
| liu         | susceptible   |   132 | 2007        |
| liu         | susceptible   |   100 | 2008        |
+-------------+---------------+-------+-------------+
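To read the result set as a trend, one option is the share of resistant isolates per site and year. A minimal sketch, assuming the rows are available as tuples in the column order of the query and that indeterminate results are excluded from the denominator (both choices are ours, not the source's); `resistance_share` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (data_source, sensibility, value, result_date), in the query's column order
Row = Tuple[str, str, int, str]


def resistance_share(rows: Iterable[Row]) -> Dict[Tuple[str, str], float]:
    """Resistant / (resistant + susceptible) per (site, year); indeterminate is ignored."""
    counts: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
        lambda: {"resistant": 0, "susceptible": 0})
    for source, sensibility, value, year in rows:
        if sensibility in ("resistant", "susceptible"):
            counts[(source, year)][sensibility] += value
    return {
        key: c["resistant"] / (c["resistant"] + c["susceptible"])
        for key, c in counts.items()
        if c["resistant"] + c["susceptible"] > 0
    }


rows = [
    ("hug", "resistant", 72, "2004"), ("hug", "susceptible", 302, "2004"),
    ("liu", "resistant", 10, "2005"), ("liu", "susceptible", 108, "2005"),
]
for (site, year), share in sorted(resistance_share(rows).items()):
    print(f"{site} {year}: {share:.1%}")
```

With the rows above, HUG in 2004 gives 72 / (72 + 302) ≈ 0.19. At HUG the 2008 counts (8 resistant, 4 susceptible) are far smaller than in earlier years, so that year's share rests on few isolates.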
Database federation
Demonstration of the CDR: a query distributed between LiU and HUG
[Diagram: CDR, LiU, HUG]
SPARQL endpoints
SPARQL data query service
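A SPARQL endpoint such as those at HUG, INSERM and LiU is typically queried over HTTP following the W3C SPARQL 1.1 Protocol. The sketch below only builds the request (no network access); the endpoint URL, the `vocab:` prefix and the `patient_age` property are hypothetical placeholders, since the actual endpoint addresses and vocabulary are not given here.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and vocabulary; the real endpoint addresses are not given here.
ENDPOINT = "http://example.org/cdr/sparql"
QUERY = """PREFIX vocab: <http://example.org/cdr/vocab/>
SELECT ?patient ?age WHERE {
  ?patient vocab:patient_age ?age .
}
LIMIT 10"""


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """SPARQL 1.1 Protocol 'query via GET': URL-encoded 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the W3C SPARQL Query Results JSON format.
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
# Against a live endpoint (requires `import json`):
#   with urllib.request.urlopen(request) as response:
#       results = json.load(response)
```

GET keeps queries bookmarkable and cacheable; very long queries are usually sent via POST instead, which the same protocol also allows.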
Next steps
• Improve overall database performance
• Scale to more sites
• Tighter integration with the DebugIT ontology
• Finalise semantic web integration
• Role-based security access
Data Normalisation
• CDR content is normalised automatically
• Terminologies used: SNOMED, NEWT, WHO-ATC, etc.
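Automatic normalisation to a terminology can start with a lookup of cleaned-up local labels. A minimal sketch for drug names and WHO-ATC codes, where the synonym table, `normalise_label` and `to_atc` are hypothetical and a real pipeline would need far richer matching; the ATC codes themselves are the standard level-5 codes for these substances (for vancomycin, J01XA01 is the systemic code).

```python
import unicodedata
from typing import Optional

# Hypothetical synonym table: normalised local label -> WHO-ATC level-5 code.
LOCAL_TO_ATC = {
    "ciclosporine": "L04AD01", "ciclosporin": "L04AD01",
    "sandimmun": "L04AD01", "neoral": "L04AD01",
    "mycophenolate": "L04AA06",
    "co-trimoxazole": "J01EE01", "sulfamethoxazole and trimethoprim": "J01EE01",
    "ciprofloxacine": "J01MA02", "ciprofloxacin": "J01MA02",
    "vancomycine": "J01XA01", "vancomycin": "J01XA01",  # systemic use
}


def normalise_label(text: str) -> str:
    """Strip accents, lower-case and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def to_atc(local_label: str) -> Optional[str]:
    """Return the ATC code for a local drug label, or None if it is unknown."""
    return LOCAL_TO_ATC.get(normalise_label(local_label))


for label in ["CICLOSPORINE", "co-trimoxazole", " Vancomycine ", "Céfépime"]:
    print(repr(label), "->", to_atc(label))
```

Normalising both the table keys and the incoming labels the same way lets French and English spellings, brand names and upper-case exports converge on one code.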
Database federation
E. coli resistance pattern over time (monthly)
[Chart: results of a CDR query federated across LiU and HUG]