Querying a bioinformatic data sources registry with concept lattices
Nizar Messai, Marie-Dominique Devignes, Amedeo Napoli and Malika Smail-Tabbone
Nizar.Messai@loria.fr
LORIA – UMR 7503 – BP 239, 54506 Vandoeuvre-lès-Nancy
ICCS 2005, Kassel – July 18-22, 2005
Outline
1. Motivation
2. BioRegistry: data source metadata repository
3. FCA for classifying and querying data sources
4. Ontology-based query refinement
5. Conclusion and future work
Outline
1. Motivation
   1.1 Bioinformatic data sources on the web
   1.2 Existing solutions
   1.3 Challenge
2. BioRegistry: data source metadata repository
3. FCA for classifying and querying data sources
4. Ontology-based query refinement
5. Conclusion and future work
1.1 Bioinformatic data sources on the web
Bioinformatic data sources available on the Web:
- 719 in 2005 (171 more than in 2004)
- Diversity of contents (e.g. particular/any organism(s))
- Different data types (e.g. nucleic/proteic sequences)
- Different data qualities (e.g. update, revision, annotation)
- Appearance of new data sources
1.2 Existing solutions
Thematic portals
- Access to a collection of selected data sources
- Correspond to given points of view
- Limited search capabilities
Structured catalogs
- Bioinformatic data source catalog: DBcat
- Small set of "free text" metadata
- No longer maintained (since 2001)
1.3 Challenge
Improve data source identification through:
- gathering metadata in a structured repository
- taking into account existing domain ontologies
- organising data sources for browsing and querying
Outline
1. Motivation
2. BioRegistry: data source metadata repository
   2.1 BioRegistry model
   2.2 A subpart of the BioRegistry
3. FCA for classifying and querying data sources
4. Ontology-based query refinement
5. Conclusion and future work
2.1 BioRegistry model
BioRegistry
- Associate metadata to the data sources (from ontologies)
Idea
- Extract properties on the data sources from these metadata
- A formal context: data sources × properties
2.2 A subpart of the BioRegistry
Data source properties extracted from the BioRegistry

| Data Source    | Sequence                   | Organism          | Manual Revision |
|----------------|----------------------------|-------------------|-----------------|
| Swissprot (S1) | Proteic (PS)               | Any Organism (AO) | Yes             |
| RefSeq (S2)    | Nucleic (NS), Proteic (PS) | Any Organism (AO) | Yes             |
| TIGR-HGI (S3)  | Nucleic (NS)               | Human (Hu)        | No              |
| GPCRDB (S4)    | Proteic (PS)               | Any Organism (AO) | Yes             |
| HUGE (S5)      | Nucleic (NS), Proteic (PS) | Human (Hu)        | No              |
| ENSEMBL (S6)   | Nucleic (NS)               | Animal (An)       | No              |
| MGDB (S7)      | Proteic (PS)               | Mouse (Mo)        | No              |
| VGB (S8)       | Nucleic (NS)               | Vertebrate (Ve)   | No              |
Ontologies to valuate the properties (from NCBI)
Corresponding formal context

| Sources \ Metadata | NS | PS | AO | An | Ve | Hu | Mo | MR |
|--------------------|----|----|----|----|----|----|----|----|
| S1                 | 0  | 1  | 1  | 0  | 0  | 0  | 0  | 1  |
| S2                 | 1  | 1  | 1  | 0  | 0  | 0  | 0  | 1  |
| S3                 | 1  | 0  | 0  | 0  | 0  | 1  | 0  | 0  |
| S4                 | 0  | 1  | 1  | 0  | 0  | 0  | 0  | 1  |
| S5                 | 1  | 1  | 0  | 0  | 0  | 1  | 0  | 0  |
| S6                 | 1  | 0  | 0  | 1  | 0  | 0  | 0  | 0  |
| S7                 | 0  | 1  | 0  | 0  | 0  | 0  | 1  | 0  |
| S8                 | 0  | 1  | 0  | 0  | 1  | 0  | 0  | 0  |
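As an illustration, the formal context can be held as a mapping from each source to its set of properties, and the two FCA derivation operators written directly on top of it. This is only a minimal sketch: the rows are transcribed from the formal context table, while the function names (`sources_having`, `common_properties`, `concept_of`) are hypothetical and not taken from the BioRegistry.

```python
# Rows of the formal context (sources x properties), one set per source.
CONTEXT = {
    "S1": {"PS", "AO", "MR"},
    "S2": {"NS", "PS", "AO", "MR"},
    "S3": {"NS", "Hu"},
    "S4": {"PS", "AO", "MR"},
    "S5": {"NS", "PS", "Hu"},
    "S6": {"NS", "An"},
    "S7": {"PS", "Mo"},
    "S8": {"PS", "Ve"},
}


def sources_having(props, context):
    """Derivation properties -> sources: sources that own every given property."""
    return {s for s, owned in context.items() if set(props) <= owned}


def common_properties(sources, context):
    """Derivation sources -> properties: properties shared by all given sources."""
    shared = set().union(*context.values())  # start from every known property
    for s in sources:
        shared &= context[s]
    return shared


def concept_of(props, context):
    """Formal concept (extension, intension) generated by a set of properties."""
    extension = sources_having(props, context)
    intension = common_properties(extension, context)
    return extension, intension


# Manually revised sources, and everything they have in common.
print(concept_of({"MR"}, CONTEXT))
```

Applying the two operators in sequence closes a property set: starting from {MR} alone, the intension grows to {PS, AO, MR}, because every manually revised source in this context also holds proteic sequences for any organism.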
Outline
1. Motivation
2. BioRegistry: data source metadata repository
3. FCA for classifying and querying data sources
   3.1 Methodology
   3.2 Data source classification
   3.3 Query
   3.4 Data source retrieval algorithm
   3.5 Problem
4. Ontology-based query refinement
5. Conclusion and future work
3.1 Methodology
3.2 Data source classification
Incremental construction of the concept lattice [Godin et al. 1995]
- Add new data sources (registry updating)
- Insert queries (registry querying)
3.3 Query
A query is a set of properties.
Example: "Data sources that are manually revised and contain nucleic sequences of the human organism"
- nucleic sequences (NS)
- human organism (Hu)
- manually revised (MR)
Transform the query into a concept:
- Extension: {Query}
- Intension: {nucleic sequences (NS), Human (Hu), Manual Revision (MR)}
- Query concept = ({Query}, {NS, Hu, MR})
3.4 Data source retrieval algorithm
- Insert the query concept into the concept lattice [Carpineto 2000]
- Search relevant data sources: a data source is relevant to a query if it shares at least one of its properties
Step 0: Locate the new query concept in the resulting lattice and begin the result construction:
Result = Ø
Step 1: Get the query concept subsumers and continue the result construction:
Result =
1) S3, S5 (Hu, NS), S2 (NS, MR)
Step 2:
Result =
1) S3, S5 (Hu, NS), S2 (NS, MR)
2) S1, S4 (MR), S6 (NS)
Step 3: A concept with an empty intension is reached: end of the algorithm, return the result.
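The steps above can be sketched in Python on the example context. This is a simplified illustration, not the incremental lattice insertion of [Carpineto 2000]: instead of building the whole lattice, it computes only the concepts above the query concept (their intensions are the intersections of source property sets with the query) and walks them level by level. All function names are hypothetical, and the context is the one from the formal context table.

```python
from itertools import combinations

CONTEXT = {
    "S1": {"PS", "AO", "MR"}, "S2": {"NS", "PS", "AO", "MR"},
    "S3": {"NS", "Hu"},       "S4": {"PS", "AO", "MR"},
    "S5": {"NS", "PS", "Hu"}, "S6": {"NS", "An"},
    "S7": {"PS", "Mo"},       "S8": {"PS", "Ve"},
}


def intensions_above(context, query):
    """Intensions of the query concept and of every concept that subsumes it."""
    found = {query} | {frozenset(props) & query for props in context.values()}
    changed = True
    while changed:  # close the family under intersection
        changed = False
        for a, b in combinations(list(found), 2):
            if a & b not in found:
                found.add(a & b)
                changed = True
    return found


def direct_subsumers(intension, family):
    """Largest intensions strictly included in `intension` (its upper covers)."""
    smaller = [i for i in family if i < intension]
    return [i for i in smaller if not any(i < j for j in smaller)]


def retrieve(context, query):
    """Rank sources by how many levels separate them from the query concept."""
    query = frozenset(query)
    family = intensions_above(context, query)
    ranked, seen = [], set()
    frontier, visited = [query], {query}
    while frontier:
        level = []
        for intension in frontier:
            if not intension:  # empty intension: nothing shared with the query
                continue
            for source, props in context.items():
                if intension <= props and source not in seen:
                    seen.add(source)
                    level.append((source, sorted(props & query)))
        if level:
            ranked.append(sorted(level))
        next_frontier = []
        for intension in frontier:
            for up in direct_subsumers(intension, family):
                if up not in visited:
                    visited.add(up)
                    next_frontier.append(up)
        frontier = next_frontier
    return ranked


for rank, level in enumerate(retrieve(CONTEXT, {"NS", "Hu", "MR"}), start=1):
    print(rank, level)
```

On the example query {NS, Hu, MR} this yields S2, S3, S5 at rank 1 and S1, S4, S6 at rank 2, matching the result built in Steps 1 and 2. A source that owned all query properties would come out first, since it belongs to the extension of the query concept itself.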
3.5 Problem
When query properties are not in the context
Examples:
1. Query concept = ({Query}, {Chicken (Ch)}): Result = Ø, although data sources dealing with vertebrates can be interesting
2. Query concept = ({Query}, {Eucaryote (Eu)}): Result = Ø, although data sources dealing with animals can be interesting
Idea: Ontology-based query refinement
Outline
1. Motivation
2. BioRegistry: data source metadata repository
3. FCA for classifying and querying data sources
4. Ontology-based query refinement
   4.1 Generalisation refinement
   4.2 Specialisation refinement
5. Conclusion and future work
4.1 Generalisation refinement
- Add to the query the ancestors of the considered property in the ontology
- Only those that are in the formal context
Refined query: Query concept = ({Query}, {Ve, An, AO})
New result:
Result =
1) S6 (An)
1) S8 (Ve)
1) S1, S2, S4 (AO)
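Generalisation refinement can be sketched as a walk up the ontology that keeps only ancestors present in the formal context. The ontology fragment below (`PARENT`) is an assumption made for illustration: the chain Chicken, Vertebrate, Animal, Eucaryote, Any Organism is inferred from the two examples of the problem statement, not copied from the NCBI ontology, and the function names are hypothetical.

```python
# Assumed child -> parent links (illustrative fragment, not the NCBI ontology).
PARENT = {"Ch": "Ve", "Ve": "An", "An": "Eu", "Eu": "AO"}

# Properties that actually appear as columns of the formal context.
CONTEXT_PROPERTIES = {"NS", "PS", "AO", "An", "Ve", "Hu", "Mo", "MR"}


def generalise(prop, parent, known):
    """Ancestors of `prop` in the ontology, nearest first, kept only if known."""
    ancestors = []
    node = parent.get(prop)
    while node is not None:
        if node in known:
            ancestors.append(node)
        node = parent.get(node)
    return ancestors


def refine_query(query, parent, known):
    """Replace each property missing from the context by its known ancestors."""
    refined = []
    for prop in query:
        candidates = [prop] if prop in known else generalise(prop, parent, known)
        for candidate in candidates:
            if candidate not in refined:
                refined.append(candidate)
    return refined


print(refine_query(["Ch"], PARENT, CONTEXT_PROPERTIES))  # ['Ve', 'An', 'AO']
```

Under these assumptions, Eucaryote is skipped because it is not a property of the context, which is why the refined query for Chicken is {Ve, An, AO}. Properties already in the context pass through unchanged.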
4.2 Specialisation refinement