
Semantic federation of distributed neurodata



  1. Semantic federation of distributed neurodata
     Alban Gaignard, Johan Montagnat, Catherine Faron Zucker, Olivier Corby
     alban.gaignard@i3s.unice.fr
     CNRS / UNS, lab. I3S, Sophia Antipolis, Modalis team
     INRIA Sophia Antipolis, Wimmics team
     A. Gaignard, J. Montagnat, C. Faron Zucker, O. Corby - DCICTIA-MICCAI Workshop 2012

  2. Neuroscience data repositories
     • Raw neuroimaging data
       • Several natures: modalities
       • Several structures: formats, multi-dimensional datasets
     • Associated metadata
       • Relational databases
     ➡ Constraints
     • Distribution
       • Sensitive data that is hard to relocate
       • Need for collaborations (multi-centric studies)
     • Autonomy
     Federation
     • Deal with legacy (relational) databases

  3. NeuroLOG platform: federated data integration
     [Figure: NeuroLOG services expose a federated metadata view over four sites (Sophia, Grenoble, Paris, Rennes). Each site runs a NeuroLOG server and a Data Federator on top of a local relational database; the databases shown are Shanoir, GIN-DMS, Shanoir and CAC.]
     Semantic federation?

  4. Objectives & Method
     Objectives:
     • Uniform semantic querying:
       • distribution
       • heterogeneity
     Method:
     • Feasibility of the approach (technology & tooling)
     • Performance issues

  5. Semantic Web standards
     • Ontologies (OWL / RDF Schema) to capture domain knowledge:
       • Model the nature of data (classes)
       • Model data relationships (properties)
     • Graph-based data representation (RDF)
       • RDF triples (edges): <subject> <property> <object>
       • RDF graphs
     • SPARQL querying as graph pattern matching
       • Sequences of edge requests, e.g. ?x rdf:type modality:MRI
     • High expressivity and reasoning
       • Deduce the nature of the data
       • Inference rules, etc.
     [Figure: a node ?x linked to modality:MRI by an rdf:type edge.]
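
The idea of an edge request as graph pattern matching can be illustrated with a small sketch. It is not a SPARQL engine: it matches a single triple pattern against an in-memory list, and the `ex:` dataset identifiers are hypothetical. Only the pattern `?x rdf:type modality:MRI` comes from the slide.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Toy RDF graph: each edge is (subject, property, object).
# The ex: identifiers are made up for illustration.
GRAPH: List[Triple] = [
    ("ex:dataset1", "rdf:type", "modality:MRI"),
    ("ex:dataset2", "rdf:type", "modality:PET"),
    ("ex:dataset3", "rdf:type", "modality:MRI"),
]


def is_variable(term: str) -> bool:
    """SPARQL-style variables start with a question mark."""
    return term.startswith("?")


def match_edge(pattern: Triple, graph: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield one variable binding per triple that matches the pattern."""
    for triple in graph:
        binding: Dict[str, str] = {}
        for p_term, t_term in zip(pattern, triple):
            if is_variable(p_term):
                # A variable used twice must take the same value both times.
                if binding.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            # No mismatch was found: the triple matches the pattern.
            yield binding


results = list(match_edge(("?x", "rdf:type", "modality:MRI"), GRAPH))
print(results)
```

A full SPARQL query chains several such edge requests and joins their bindings on shared variables.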

  6. KGRAM: a Knowledge GRaph Abstract Machine
     • Representing, querying and reasoning on Knowledge Graphs (INRIA - Wimmics team)
     • Generic engine
       • SPARQL 1.1 interpreter
       • several data sources
       • several models (RDF, XML, SQL)
     • KGRAM - Producers:
       • navigating abstract Graphs and enumerating Edges and Nodes;
       • Producer specific to a data structure ("graph mediator");
       • MetaProducers (gluing several producers together).
     ➡ mashup applications over distributed Linked Data
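
The producer / meta-producer split can be sketched as follows. This is an illustration of the pattern only, not KGRAM's actual API: the class names `ListProducer` and `MetaProducer`, the `edges` method and the sample triples are all assumptions made for this example.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
# None in a pattern position acts as a wildcard.
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class ListProducer:
    """Hypothetical producer that enumerates edges from an in-memory list."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.triples = list(triples)

    def edges(self, pattern: Pattern) -> Iterator[Triple]:
        for triple in self.triples:
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                yield triple


class MetaProducer:
    """Hypothetical meta-producer: forwards a request to every producer."""

    def __init__(self, producers: List[ListProducer]) -> None:
        self.producers = producers

    def edges(self, pattern: Pattern) -> Iterator[Triple]:
        for producer in self.producers:
            yield from producer.edges(pattern)


site_a = ListProducer([("ex:d1", "rdf:type", "modality:MRI")])
site_b = ListProducer([
    ("ex:d2", "rdf:type", "modality:MRI"),
    ("ex:d3", "rdf:type", "modality:PET"),
])
federation = MetaProducer([site_a, site_b])
mri = list(federation.edges((None, "rdf:type", "modality:MRI")))
print(mri)
```

Because the meta-producer exposes the same `edges` interface as a single producer, the query engine in this sketch does not need to know how many sources sit behind it.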

  7. Dealing with data source heterogeneity
     • Uniform querying over:
       • one SQL producer and
       • multiple RDF producers.
     • Ad-hoc "on-the-fly" SQL producer:
       • Predefined mappings (RDF predicate → SQL sub-query);
       • SQL embedded into a generated SPARQL query;
       • SQL tuples translated back as RDF triples.
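
A minimal sketch of the mapping idea, using Python's built-in `sqlite3` module: each RDF predicate is associated with a SQL sub-query, and the resulting tuples are turned back into triples. The table layout, predicate names and URI scheme are hypothetical, and the step that embeds the SQL into a generated SPARQL query is left out.

```python
import sqlite3
from typing import Dict, Iterator, Tuple

Triple = Tuple[str, str, str]

# Hypothetical mappings: RDF predicate -> SQL sub-query returning (subject, object).
MAPPINGS: Dict[str, str] = {
    "ex:hasModality": "SELECT id, modality FROM dataset ORDER BY id",
    "ex:hasSubject": "SELECT id, subject_id FROM dataset ORDER BY id",
}


def sql_edges(conn: sqlite3.Connection, predicate: str) -> Iterator[Triple]:
    """Run the sub-query mapped to the predicate and yield RDF-like triples."""
    for row_id, value in conn.execute(MAPPINGS[predicate]):
        # Translate each SQL tuple back into a (subject, predicate, object) edge.
        yield (f"ex:dataset/{row_id}", predicate, str(value))


# Toy relational database standing in for a legacy metadata store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset (id INTEGER PRIMARY KEY, modality TEXT, subject_id TEXT)"
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?)",
    [(1, "MRI", "s01"), (2, "PET", "s02")],
)

triples = list(sql_edges(conn, "ex:hasModality"))
print(triples)
```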

  8. KGRAM - Distributed Query Processor (DQP)
     • Distributed query processing performance:
       • Service parallelism
       • Static and dynamic optimizations
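
Service parallelism can be sketched as sending the same edge request to every source concurrently and merging the answers. The example below simulates remote services with in-memory lists; the site names reuse those of the NeuroLOG slide, while the data and function names are assumptions for illustration, not the DQP implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Simulated remote services: site name -> triples it holds (made-up data).
ENDPOINTS: Dict[str, List[Triple]] = {
    "grenoble": [("ex:g1", "rdf:type", "modality:MRI")],
    "rennes": [
        ("ex:r1", "rdf:type", "modality:PET"),
        ("ex:r2", "rdf:type", "modality:MRI"),
    ],
    "sophia": [("ex:s1", "rdf:type", "modality:MRI")],
}


def query_endpoint(name: str, predicate: str, obj: str) -> List[Triple]:
    """Stand-in for a network call to one remote service."""
    return [t for t in ENDPOINTS[name] if t[1] == predicate and t[2] == obj]


def federated_edge_request(predicate: str, obj: str) -> List[Triple]:
    """Send the same edge request to all services in parallel and merge."""
    names = sorted(ENDPOINTS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = [pool.submit(query_endpoint, n, predicate, obj) for n in names]
        merged: List[Triple] = []
        # Collect in submission order so the merged result is deterministic.
        for future in futures:
            merged.extend(future.result())
    return merged


mri = federated_edge_request("rdf:type", "modality:MRI")
print(mri)
```

With real services the total waiting time approaches that of the slowest source rather than the sum over all sources.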

  9. DQP Optimizations (1/2): pushing applicable FILTERs
     • Idea = filtering out irrelevant results as early as possible (to avoid unnecessary network communications);
     ➡ Attaching each applicable FILTER to the single edge request it applies to.
     [Figure: global SPARQL query, rewritten sub-query, optimized sub-query.]
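
The benefit of pushing a FILTER can be shown by counting how many triples cross the simulated network with and without it. This is a sketch under assumptions: the remote source is an in-memory list, the `ex:hasName` predicate and the data are made up, and the filter is a Python callable rather than a SPARQL FILTER expression.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Simulated remote source (made-up data).
REMOTE: List[Triple] = [
    ("ex:d1", "ex:hasName", "T1-weighted scan"),
    ("ex:d2", "ex:hasName", "T2-weighted scan"),
    ("ex:d3", "ex:hasName", "T2-weighted scan, repeat"),
]


def remote_edge_request(
    predicate: str,
    remote_filter: Optional[Callable[[Triple], bool]] = None,
) -> List[Triple]:
    """Return the triples that would be sent back over the network."""
    edges = [t for t in REMOTE if t[1] == predicate]
    if remote_filter is not None:
        # The filter is evaluated at the source, before any transfer.
        edges = [t for t in edges if remote_filter(t)]
    return edges


def is_t2(triple: Triple) -> bool:
    return "T2" in triple[2]


# Without pushing: transfer everything, then filter locally.
transferred_naive = remote_edge_request("ex:hasName")
naive = [t for t in transferred_naive if is_t2(t)]

# With pushing: the filter travels with the edge request.
transferred_pushed = remote_edge_request("ex:hasName", remote_filter=is_t2)

print(len(transferred_naive), len(transferred_pushed))
```

Both strategies yield the same final answer; only the amount of transferred data differs.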

  10. DQP Optimizations (2/2): pushing bindings
     • Idea = exploiting intermediate results to avoid re-evaluation (and transmission of already known values);
     ➡ Replacing variables by their known values in each single edge request.
     [Figure: global SPARQL query, rewritten sub-query, optimized sub-query, with the intermediate result ?x = http://dbpedia.org/resource/Bobby_Abel.]
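
Binding pushing can be sketched as a substitution step applied to a triple pattern before it is sent to a source. The intermediate result `?x = http://dbpedia.org/resource/Bobby_Abel` is taken from the slide; the `ex:name` predicate, the other triples and the helper names are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

BOBBY = "http://dbpedia.org/resource/Bobby_Abel"

# Simulated remote source (made-up data apart from the Bobby_Abel URI).
REMOTE: List[Triple] = [
    (BOBBY, "ex:name", "Bobby Abel"),
    ("ex:other1", "ex:name", "Someone Else"),
    ("ex:other2", "ex:name", "Another Person"),
]


def substitute(pattern: Triple, binding: Dict[str, str]) -> Triple:
    """Replace each bound variable in the pattern by its known value."""
    s, p, o = (binding.get(term, term) for term in pattern)
    return (s, p, o)


def remote_edge_request(pattern: Triple) -> List[Triple]:
    """Return the triples matching the pattern; variables match anything."""
    return [
        t for t in REMOTE
        if all(p.startswith("?") or p == v for p, v in zip(pattern, t))
    ]


pattern: Triple = ("?x", "ex:name", "?n")
binding = {"?x": BOBBY}  # intermediate result from an earlier edge request

unbound = remote_edge_request(pattern)
bound = remote_edge_request(substitute(pattern, binding))

print(len(unbound), len(bound))
```

In this sketch the unbound request returns every `ex:name` edge, while the request with the pushed binding returns only the edge for the already known subject.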

  11. Performance-oriented experiments
     • Three federation setups / 2 queries
     • DataFederator (SAP) as the reference high-performance relational engine
     • Q1: costly evaluation (336 remote invocations)
     • Q2: selective query (only 5 resulting T2-weighted datasets)
     ➡ DataFederator is slightly better for costly queries (Q1), but KGRAM performance remains similar;
     ➡ Comparable results for very selective queries.

  12. Conclusion & perspectives
     • KGRAM:
       • Optimized distributed semantic querying;
       • Heterogeneous data sources.
     • Still on-going:
       • Source selection through dynamic index creation,
       • allowing for coarse-grained parallelism (grouped sub-queries, query planning).
     • Benchmarking (FedBench) to compare KGRAM with state-of-the-art approaches: SPLENDID, DARQ, FedX.
     • Other approaches to address data source heterogeneity:
       • R2RML, D2RQ, etc.
