Federated Semantic Data Management
25-30 June 2017 - Dagstuhl - Germany
Hala Skaf-Molli, Pascal Molli
Université de Nantes
GDD: Distributed Data Management Group

Foundations of Distributed Systems
● Distributed algorithms
● Distributed Data Structures
● Consistency criteria & protocols
● Fog Computing

Distributed Data Management
● Federated Query Processing:
  ○ source selection, decomposition, optimization, operators
● Data Integration
● Replication, synchronization & consistency
● Queries in the Fog
Replication, synchronization, Consistency
How can we improve linked data together? I see a mistake: how do I fix it, especially if I cannot edit the source?
● Idea: Replicate and synchronize… git for RDF data...
● Live Linked Data: synchronising semantic stores with commutative replicated data types. JMSO13
● Towards Writable and Scalable Linked Open Data. ISWC14
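The "git for RDF data" idea needs replicas that converge whatever the order in which updates are delivered. Below is a minimal Python sketch of that property, assuming a generic observed-remove set of triples. The class `TripleORSet` and its methods are hypothetical names chosen for illustration; this is not the data type defined in the cited papers, only one common way to build a commutative replicated set.

```python
import uuid


class TripleORSet:
    """Illustrative observed-remove set of RDF triples (hypothetical API)."""

    def __init__(self):
        self.adds = {}        # triple -> set of unique insertion tags
        self.removed = set()  # tags that have been deleted (tombstones)

    def insert(self, triple):
        # Each insertion carries a fresh tag so that a later delete only
        # cancels the insertions it has actually observed.
        op = ("add", triple, uuid.uuid4().hex)
        self.apply(op)
        return op

    def delete(self, triple):
        observed = frozenset(self.adds.get(triple, set()) - self.removed)
        op = ("rem", triple, observed)
        self.apply(op)
        return op

    def apply(self, op):
        """Apply a local or remote operation; order of delivery is irrelevant."""
        kind, triple, payload = op
        if kind == "add":
            self.adds.setdefault(triple, set()).add(payload)
        else:
            self.removed |= payload

    def triples(self):
        return {t for t, tags in self.adds.items() if tags - self.removed}


old = ("Lct:intervention1", "rdfs:seeAlso", "wiwiss-berlin:DB00087")
new = ("Lct:intervention1", "rdfs:seeAlso", "wifo5-mannheim.de:DB00087")

a, b, c = TripleORSet(), TripleORSet(), TripleORSet()
op1 = a.insert(old)          # the original link is published at replica a
b.apply(op1)
op2 = b.delete(old)          # replica b fixes the link
op3 = b.insert(new)
a.apply(op3); a.apply(op2)   # a receives the fix in a different order
c.apply(op2); c.apply(op3); c.apply(op1)  # c receives the delete first
```

All three replicas end with the same single triple, even though each one saw the operations in a different order.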
Replication, synchronization, Consistency

LinkedCT:
  Lct:intervention1 [
    Lct:type Drug .
    Lct:condition Lct:T-Cell-Lymhoma
    rdfs:label ‘Alemtuzumab’ .
    rdfs:seeAlso wiwiss-berlin:DB00087 ]

Wiwiss-berlin.de:DB00087 rdf:type Drug
wifo5-mannheim.de:DB00087 rdf:type Drug

I’m ready to fix the problem. How can I update?
[Figure: replicated fragments kept in sync through update feeds]

Two versions of the LinkedCT description:
  Lct:intervention1 [
    Lct:type Drug .
    Lct:condition Lct:T-Cell-Lymphoma
    rdfs:label ‘Alemtuzumab’ .
    rdfs:seeAlso wiwiss-berlin:DB00087 ]

  Lct:intervention1 [
    Lct:type Drug .
    Lct:condition Lct:T-Cell-Lymphoma
    rdfs:label ‘Alemtuzumab’ .
    rdfs:seeAlso wifo5-mannheim.de:DB00087 ]

DrugBank data:
  wifo5-mannheim.de:DB00087 DB:Half-Life 288h

Fragments are defined by CONSTRUCT queries:
  CONSTRUCT WHERE { ?x rdf:type drugbank:drug }
  CONSTRUCT WHERE { ?x rdfs:seeAlso ?y }

MyOrg (My Endpoint), fed by the update feeds, holds:
  Lct:intervention1 rdfs:seeAlso wifo5-mannheim:DB00087
  wifo5-mannheim.de:DB00087 DB:Half-Life 288h
Data Integration
How to query the deep web and linked data with SPARQL?
● Idea: Local-as-view mediator with smart materialization of views
● SemLAV: Local-as-view mediation for SPARQL queries. TLDKS14
● SemLAV: Querying deep web and linked open data with SPARQL. ESWC14 (demo)
● Gun: An efficient execution strategy for querying the web of data. DEXA 2013
SELECT DISTINCT * WHERE {
  ?P foaf:member ?C .
  ?C rdfs:label “Semantic Web“ .
  ?P foaf:knows ?WKP .
  ?WKP foaf:name “Barack Obama“
}

[Figure: the Client sends the query to the SemLAV Query Executor, which answers it over the Global Schema using views that expose foaf:name, rdfs:label and foaf:member.]
Query:
  Q(P,C,WKP,N) :- member(P,C), label(C,”Semantic Web”), knows(P,WKP), name(WKP,”Barack Obama”)

LAV mappings:
  v1(P,A,I,C,L) :- made(P,A), affiliation(P,I), member(P,C), label(C,L)
  v2(A,T,P,N,C) :- title(A,T), made(P,A), name(P,N), member(P,C)
  v3(P,N,R,M)   :- name(P,N), name(R,M), knows(P,R)
  v4(P,N,G,R,C) :- name(P,N), gender(P,G), knows(P,R), member(P,C)
  v5(P,N,R,C,L) :- name(P,N), knows(P,R), member(P,C), label(C,L)

Buckets, one per query subgoal, with views ordered by the number of query subgoals they cover (v5: 4, v4: 3, v1, v2 and v3: 2):
  member(P,C):  v5(P,N,R,C,L), v4(P,N,G,R,C), v1(P,A,I,C,L), v2(A,T,P,N,C)
  label(C,L):   v5(P,N,R,C,L), v1(P,A,I,C,L)
  knows(P,WKP): v5(P,N,R,C,L), v4(P,N,G,R,C), v3(P,N,R,M)
  name(WKP,N):  v5(P,N,R,C,L), v4(P,N,G,R,C), v2(A,T,P,N,C), v3(P,N,R,M)
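The buckets and their ordering can be recomputed from the mappings. The Python sketch below is a simplification under stated assumptions: it matches subgoals by predicate name only and ignores argument and variable mappings, and the function name `build_buckets` is hypothetical. It shows the ranking idea, not the SemLAV implementation.

```python
# Predicates of the query subgoals and of each view body, taken from the slide.
query = ["member", "label", "knows", "name"]
views = {
    "v1": ["made", "affiliation", "member", "label"],
    "v2": ["title", "made", "name", "member"],
    "v3": ["name", "name", "knows"],
    "v4": ["name", "gender", "knows", "member"],
    "v5": ["name", "knows", "member", "label"],
}


def build_buckets(query, views):
    """Return (buckets, score): one bucket of candidate views per subgoal,
    ordered by how many distinct query subgoals each view covers."""
    score = {v: len(set(body) & set(query)) for v, body in views.items()}
    buckets = {}
    for subgoal in query:
        candidates = [v for v, body in views.items() if subgoal in body]
        # Highest coverage first; ties broken by view name for determinism.
        buckets[subgoal] = sorted(candidates, key=lambda v: (-score[v], v))
    return buckets, score


buckets, score = build_buckets(query, views)
for subgoal in query:
    print(subgoal, buckets[subgoal])
```

Ranking views this way lets a mediator materialize first the views most likely to contribute to complete answers.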
Federated Queries & Replication
● How to improve the availability of linked data?
  ○ Idea: Partial data replication to create new data locality and smart source selection
    ■ Federated SPARQL Queries Processing with Replicated Fragments. ISWC15
  ○ Idea: Partial data replication and query decomposition
    ■ Decomposing federated queries in presence of replicated fragments. JWS17
Data replication & query decomposition
● Consider a BGP with three triple patterns tp1, tp2 and tp3.
  ○ Endpoint C1 is relevant for tp1 and tp3,
  ○ Endpoint C2 is relevant for tp1 and tp2.
  ○ tp1@c1 = tp1@c2 (the fragment for tp1 is replicated at both endpoints)
● Existing source selection strategies prevent assigning tp1.tp3 to C1 and tp1.tp2 to C2, even if these sub-queries generate fewer intermediate results...
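The difference between the two decompositions can be made concrete with a small Python sketch. Both functions are hypothetical illustrations: `one_source_per_pattern` stands in for a strategy that sends each triple pattern to exactly one endpoint, and `decompose_with_replicas` groups, per endpoint, every pattern that endpoint can answer. Reusing tp1 in two sub-queries is only sound under the slide's assumption that tp1@c1 = tp1@c2.

```python
bgp = ["tp1", "tp2", "tp3"]
# Relevant endpoints per the slide; the tp1 fragment is replicated at C1 and C2.
relevant = {"C1": {"tp1", "tp3"}, "C2": {"tp1", "tp2"}}


def one_source_per_pattern(bgp, relevant):
    """Hypothetical baseline: each triple pattern goes to a single endpoint
    (here the first relevant one in name order)."""
    plan = {}
    for tp in bgp:
        endpoint = min(ep for ep, tps in relevant.items() if tp in tps)
        plan.setdefault(endpoint, []).append(tp)
    return plan


def decompose_with_replicas(bgp, relevant):
    """Replication-aware grouping: a replicated pattern may appear in the
    sub-query of every endpoint that holds its fragment."""
    return {ep: [tp for tp in bgp if tp in tps] for ep, tps in relevant.items()}


baseline = one_source_per_pattern(bgp, relevant)
aware = decompose_with_replicas(bgp, relevant)
print(baseline)  # tp2 is sent alone to C2
print(aware)     # tp1 is joined locally at both C1 and C2
```

In the baseline plan tp2 is evaluated on its own at C2, while the replication-aware plan lets C2 join tp1 with tp2 locally before shipping results.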
Federated queries & Replication
● How to improve query performance on linked data?
● Idea: Partial replication and intra-query parallelization
● PeNeLoop: Parallelizing Federated SPARQL Queries in Presence of Replicated Fragments. QUWEDA@ESWC17
PeNeLoop Query Processing

SELECT ?movie ?name WHERE {
  ?movie dbo:director ?director .   (tp1)
  ?movie lmdb:genre ?genre .        (tp2)
  ?genre lmdb:genre_name ?name .    (tp3)
}

[Figure: query plan from Start through the join (⋈) to the projection (π), with mappings M1 to M7 exchanged with endpoints E1, E2 and E3. Annotations: "Join ⋈ of { tp1. tp2. } performed locally at E2"; "Both E1 & E3 are used to process the join". Example result mapping: { ?movie = dbo:Seven_Samurai, ?name = “Samurai movie” }]
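The intra-query parallelism in the figure can be sketched as a nested-loop join whose probes are spread over the endpoints that replicate the same fragment. The Python code below is an illustration under assumptions, not the PeNeLoop operator itself: endpoints are in-memory dictionaries, the genre IRIs `lmdb:genre1` and `lmdb:genre2` and the movie `dbo:Other` are made up, and bindings are distributed round-robin.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical replicated fragment for tp3 (?genre lmdb:genre_name ?name),
# available at both E1 and E3.
GENRE_NAMES = {"lmdb:genre1": "Samurai movie"}
FRAGMENTS = {"E1": GENRE_NAMES, "E3": GENRE_NAMES}


def probe(endpoint, binding):
    """Evaluate tp3 at one endpoint for a single ?genre binding."""
    name = FRAGMENTS[endpoint].get(binding["genre"])
    return [{"name": name}] if name is not None else []


def parallel_bound_join(bindings, replicas, probe):
    """Send each left-hand mapping to one replica (round-robin) and run the
    probes concurrently; results keep the order of the input bindings."""
    def task(indexed):
        i, binding = indexed
        endpoint = replicas[i % len(replicas)]
        return [dict(binding, **m) for m in probe(endpoint, binding)]

    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        chunks = list(pool.map(task, enumerate(bindings)))
    return [m for chunk in chunks for m in chunk]


# Mappings produced by the local join of tp1 and tp2 (values are illustrative).
left = [
    {"movie": "dbo:Seven_Samurai", "genre": "lmdb:genre1"},
    {"movie": "dbo:Other", "genre": "lmdb:genre2"},
]
results = parallel_bound_join(left, ["E1", "E3"], probe)
print(results)
```

Because both replicas hold the same fragment, any of them can answer any probe, so the load of the join is shared instead of falling on a single endpoint.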
Queries in the Fog
How to get data availability *and* performance?
● Idea: P2P resource sharing, but on the client side… in the fog of browsers
  ○ CyCLaDEs: A Decentralized Cache for Linked Data Fragments. ESWC 2016
  ○ SPARQL Queries in the Fog of Browsers. Demo@ESWC 2017
[Figure: clients c1 to c9 querying the DBpedia and DrugBank LDF servers through HTTP caches]
SPARQL Queries in the Fog of Browsers
Fog of Browsers (FoB): P2P network of browsers with browser-to-browser connections (WebRTC)
WebRTC: https://webrtc.org/
FoB with Triple Pattern Fragments
● Servers run TPF servers (TPFs)
● Browsers run TPF clients (TPFc): C1, C2...
TPF: Verborgh, Ruben, et al. "Triple Pattern Fragments: A low-cost knowledge graph interface for the Web." Web Semantics: Science, Services and Agents on the World Wide Web 37 (2016): 184-206.
Clients receive SPARQL queries...
Any client can receive SPARQL queries at any time.
● C1 receives workload W1: Q1, Q2, Q3, Q4
● C2 receives workload W2: Q5, Q6
Clients receive SPARQL queries...
Any client can receive SPARQL queries at any time: do it yourself, or delegate some to neighbors.
● Client-side inter-query parallelism: Q4@C4, Q3@C3...
[Figure: C1 (W1: Q1, Q2, Q3, Q4) and C2 (W2: Q5, Q6) are connected to the TPF servers and to neighbors C3, C4 and C5; Q4 is delegated to C4.]
Clients receive SPARQL queries...
Can we reduce the global Execution Time (ET) of W1 and W2 by delegating queries to neighbours?
ET(W1@C1 // W2@C2) > ET({W1 ∪ W2}@{C1-C5}) ?
with W1: Q1, Q2, Q3, Q4 at C1 and W2: Q5, Q6 at C2.
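A back-of-the-envelope model makes the question concrete. The Python sketch below assumes that the global ET is the makespan (the time at which the last client finishes), that every query costs one time unit, and that delegation itself is free; all three are simplifying assumptions, and the function names are hypothetical. Real gains depend on network and coordination overhead, which this model ignores.

```python
import heapq

workloads = {"C1": ["Q1", "Q2", "Q3", "Q4"], "C2": ["Q5", "Q6"]}
clients = ["C1", "C2", "C3", "C4", "C5"]
# Assumed uniform cost: one time unit per query.
cost = {q: 1 for qs in workloads.values() for q in qs}


def et_without_delegation(workloads, cost):
    """Each client runs its own workload sequentially; clients run in parallel."""
    return max(sum(cost[q] for q in qs) for qs in workloads.values())


def et_with_delegation(queries, clients, cost):
    """Greedy scheduling: longest queries first, each one handed to the
    currently least-loaded client."""
    heap = [(0, c) for c in clients]
    heapq.heapify(heap)
    for q in sorted(queries, key=lambda q: -cost[q]):
        load, client = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost[q], client))
    return max(load for load, _ in heap)


all_queries = [q for qs in workloads.values() for q in qs]
alone = et_without_delegation(workloads, cost)
shared = et_with_delegation(all_queries, clients, cost)
print(alone, shared)
```

Under these assumptions the makespan drops from 4 units (C1 alone runs Q1 to Q4) to 2 units once the six queries are spread over the five clients.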
ladda-demo.herokuapp.com
GDD Research Group
Distributed Data Management
25-30 June 2017 - Dagstuhl - Germany
P. Molli - H. Skaf Mcf Univ Nantes