From Nancy, France to Pisa, Italy
Ontology-guided Data Preparation for Discovering Genotype-Phenotype Relationships
Adrien Coulet, Malika Smaïl-Tabbone, Pascale Benlian, Amedeo Napoli and Marie-Dominique Devignes
Laboratoire Lorrain de Recherche en Informatique et ses Applications (CNRS, INRIA, University of Nancy), Nancy, France
The Problem: Limits to KDD in life sciences
[Figure: the Knowledge Discovery in Databases (KDD) process. Steps: Integration, Selection, Formatting, Data mining, Interpretation. Intermediate products: Data bases, Integrated data, Selected data, Formatted data, Pattern. Labels: "Biological results: e.g. large scale clinical study"; "COMPLEX DATA", "COMPLEX PROCESS", "COMPLEX RESULTS".]
Results of KDD in biology are complex
A. Coulet, Ontology-guided Data Preparation 3/5
Proposal: Use ontologies to guide the KDD process
1) Build bridges between data and knowledge
Mapping between variant assertions of the KB and attributes of the DB
[Figure: SNP-Ontology (detail) with classes coding_variant and non_coding_variant; SNP-KB (detail) with variants rs_003, rs_004, rs_005; example DB from a large scale clinical study, with rows patient_001, patient_002, patient_003, patient_004, … and attributes [LDL]b, xanthoma, …, rs_001, rs_002, rs_003, rs_004, rs_005, rs_006, rs_007, …]
2) Use knowledge to reduce the size of the data set
Thanks to subsumptions, object properties, class definitions, etc.
In order to simplify the interpretation step of the KDD process
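The reduction idea in point 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class hierarchy, the assignment of rs identifiers to coding_variant or non_coding_variant, the parent class "variant", the helper names and the patient values are all hypothetical and chosen only for illustration. The sketch replaces one column per SNP with one column per ontology class, using subsumption to decide which SNPs fall under a class.

```python
from typing import Dict, List, Optional

# Hypothetical subsumption hierarchy (child class -> parent class).
SUBCLASS_OF: Dict[str, str] = {
    "coding_variant": "variant",
    "non_coding_variant": "variant",
}

# Hypothetical mapping from DB attributes (SNP columns) to ontology classes.
# The slides do not say which variant belongs to which class.
VARIANT_CLASS: Dict[str, str] = {
    "rs_001": "coding_variant",
    "rs_002": "coding_variant",
    "rs_003": "non_coding_variant",
    "rs_004": "non_coding_variant",
}


def is_subsumed_by(cls: Optional[str], ancestor: str) -> bool:
    """Return True if cls equals ancestor or is one of its descendants."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def reduce_attributes(row: Dict[str, object], classes: List[str]) -> Dict[str, object]:
    """Collapse the SNP columns of one patient row into one column per class."""
    # Keep every attribute that is not a mapped SNP (e.g. phenotype columns).
    reduced = {k: v for k, v in row.items() if k not in VARIANT_CLASS}
    for cls in classes:
        # The class column is True if the patient carries any SNP of that class.
        reduced[cls] = any(
            bool(row.get(snp))
            for snp, snp_cls in VARIANT_CLASS.items()
            if is_subsumed_by(snp_cls, cls)
        )
    return reduced


# Toy patient rows (invented values, 1 = variant allele observed).
patients = {
    "patient_001": {"xanthoma": True, "rs_001": 1, "rs_002": 0, "rs_003": 0, "rs_004": 0},
    "patient_002": {"xanthoma": False, "rs_001": 0, "rs_002": 0, "rs_003": 1, "rs_004": 1},
}

reduced = {
    pid: reduce_attributes(row, ["coding_variant", "non_coding_variant"])
    for pid, row in patients.items()
}
```

In this toy case four SNP columns become two class columns, so the mining step sees fewer attributes and the extracted patterns are expressed at the level of ontology classes rather than individual variants.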
For more details, see you at poster n°7
Contact: adrien.coulet@loria.fr