Scalable SPARQL Querying of Large RDF Graphs Jiewen Huang, Daniel - PowerPoint PPT Presentation

Scalable SPARQL Querying of Large RDF Graphs
Jiewen Huang, Daniel J. Abadi and Kun Ren, Yale Database Group

RDF Gaining Popularity ● Encouraged by major search engines (Google, Yahoo!) ● More data sets available in RDF, e.g., from governments and research communities

Linked Data Movement

Scalable Processing ● Single-node RDF management systems are abundant (e.g., Sesame, Jena, RDF-3X, 3store) ● Clustered (multi-node) RDF management has been explored much less, and it is the focus of this talk

RDF as Triples and a Graph

SPARQL ● The RDF query language ● Its core is a basic graph pattern: a set of triple patterns that may contain variables ● Answering a SPARQL query can therefore be seen as finding the subgraphs of the RDF data that match the graph pattern

Example for Star Pattern ● Find the names of the strikers that play for FC Barcelona.
SELECT ?name WHERE {
  ?player type footballer .
  ?player name ?name .
  ?player position striker .
  ?player playsFor FC_Barcelona .
}
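
The view of SPARQL answering as subgraph matching can be illustrated with a naive evaluator for basic graph patterns. This is a minimal Python sketch, not the system's query engine: it assumes variables start with "?", writes constants as plain strings rather than IRIs, and uses made-up toy data (p1, p2, "Player One", "Player Two"). The helper names `match_triple` and `evaluate_bgp` are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumption: variables carry a leading "?", as in the slide's query.
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.get(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
            extended[p_term] = t_term
        elif p_term != t_term:
            return None  # constant does not match
    return extended


def evaluate_bgp(patterns: List[Triple], triples: List[Triple],
                 binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every variable binding under which all patterns match the data."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = match_triple(first, triple, binding)
        if extended is not None:
            yield from evaluate_bgp(rest, triples, extended)


# Hypothetical toy data; constants are plain strings instead of full IRIs.
data = [
    ("p1", "type", "footballer"), ("p1", "name", "Player One"),
    ("p1", "position", "striker"), ("p1", "playsFor", "FC_Barcelona"),
    ("p2", "type", "footballer"), ("p2", "name", "Player Two"),
    ("p2", "position", "goalkeeper"), ("p2", "playsFor", "FC_Barcelona"),
]
star_query = [
    ("?player", "type", "footballer"),
    ("?player", "name", "?name"),
    ("?player", "position", "striker"),
    ("?player", "playsFor", "FC_Barcelona"),
]
names = [b["?name"] for b in evaluate_bgp(star_query, data)]
print(names)  # ['Player One']
```

Each triple pattern is matched by backtracking over every triple, which is fine for a toy example; RDF-stores such as RDF-3X rely on indexes and join ordering instead.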

Another Example ● Find football players playing for clubs in a populous region where they were born.

System Architecture

Data Partitioning ● Hash vs. graph partitioning ● Hash: efficient only for star patterns (e.g., hashing on subjects co-locates all triples of a subject-centered star) ● Graph: takes advantage of the graph model ● Edge vs. vertex partitioning ● Edge: natural, but inefficient for query execution ● Vertex: better suited to common graph patterns
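
To see why hashing suits star patterns, here is a small sketch of hash partitioning on subjects (the variant used as a competitor in the experiments). The machine count `k`, the use of CRC32 as the hash, the helper names, and the triples are illustrative assumptions, not details from the talk.

```python
import zlib
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def hash_partition(triples: List[Triple], k: int) -> List[List[Triple]]:
    """Send each triple to machine crc32(subject) mod k."""
    machines: List[List[Triple]] = [[] for _ in range(k)]
    for s, p, o in triples:
        # crc32 instead of hash(): Python salts str hashes per process,
        # which would make the placement differ between runs.
        machines[zlib.crc32(s.encode("utf-8")) % k].append((s, p, o))
    return machines


def machines_with_subject(machines: List[List[Triple]], subject: str) -> Set[int]:
    """Indices of machines that store at least one triple with this subject."""
    return {i for i, m in enumerate(machines) if any(t[0] == subject for t in m)}


# Hypothetical toy triples.
triples = [
    ("p1", "type", "footballer"), ("p1", "position", "striker"),
    ("p1", "playsFor", "club1"), ("club1", "region", "region1"),
]
machines = hash_partition(triples, k=4)
# A star centered on p1 needs only p1's triples, which all sit on one machine.
print(machines_with_subject(machines, "p1"))
```

Every triple with subject p1 lands on the same machine, so a star centered on p1 is answered locally. A path such as ?player playsFor ?club . ?club region ?r needs p1's triples together with club1's, and those may live on different machines, which is why hashing helps only for star patterns.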

Edge/Triple Placement ● Goal: minimize data shuffling/exchange between machines ● Allow data to overlap (be replicated) across machines ● N-hop guarantee: bounds the extent of the overlap ● If a vertex is assigned to a machine, every vertex within n hops of it is also stored on that machine
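
The n-hop guarantee can be sketched as a breadth-first expansion per machine. This sketch assumes an undirected guarantee, a precomputed vertex-to-machine assignment (standing in for the graph partitioner's output), and that each hop stores every triple incident to the current frontier; the function name and the toy chain a→b→c→d are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def n_hop_placement(triples: List[Triple], assignment: Dict[str, int],
                    n: int) -> Dict[int, Set[Triple]]:
    """Triples stored on each machine under an undirected n-hop guarantee.

    `assignment` maps every vertex to its machine (hypothetical partitioner
    output). Each machine expands its vertex set n times, storing every
    triple incident to the current frontier.
    """
    incident: Dict[str, List[Triple]] = defaultdict(list)
    for t in triples:
        incident[t[0]].append(t)   # a triple is incident to its subject...
        incident[t[2]].append(t)   # ...and to its object

    base: Dict[int, Set[str]] = defaultdict(set)
    for vertex, machine in assignment.items():
        base[machine].add(vertex)

    stored: Dict[int, Set[Triple]] = {m: set() for m in base}
    for machine, vertices in base.items():
        seen, frontier = set(vertices), set(vertices)
        for _ in range(n):                      # one iteration per hop
            next_frontier: Set[str] = set()
            for v in frontier:
                for t in incident[v]:
                    stored[machine].add(t)
                    for u in (t[0], t[2]):
                        if u not in seen:
                            seen.add(u)
                            next_frontier.add(u)
            frontier = next_frontier
    return stored


chain = [("a", "p", "b"), ("b", "p", "c"), ("c", "p", "d")]
assignment = {"a": 0, "b": 0, "c": 1, "d": 1}
print(sorted(n_hop_placement(chain, assignment, 1)[0]))
# [('a', 'p', 'b'), ('b', 'p', 'c')]
```

With n = 1 the triple (b, p, c) is stored on both machines; with n = 2 machine 0 holds the whole chain, which shows how the hop count trades storage overlap for locality.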

Example for N-Hop Guarantee

Query Processing ● Query execution is more efficient inside the RDF-stores than in Hadoop ● Therefore, push as much of the processing as possible into the RDF-stores ● Minimize the number of Hadoop jobs ● The larger the hop guarantee, the more work can be done inside the RDF-stores

To Communicate, or not to Communicate ● Given a query and an n-hop guarantee, is communication (a Hadoop job) between nodes needed? ● Choose the “center” of the query graph (the vertex from which the furthest edge is closest) ● Calculate the distance from the center to the furthest edge ● If this distance is greater than n, communication is needed; otherwise it is not
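
The decision rule can be sketched on the query graph, where each triple pattern is an edge between its subject and object terms. Assumptions: the graph is treated as undirected and connected, constants count as vertices, and an edge's distance is one hop beyond its nearer endpoint (matching an expansion that stores all triples incident to the current frontier). The function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (subject term, object term) of one triple pattern


def bfs_distances(edges: List[Edge], source: str) -> Dict[str, int]:
    """Undirected hop distances from `source` to every query vertex."""
    adjacency: Dict[str, List[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adjacency[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def farthest_edge_distance(edges: List[Edge], v: str) -> int:
    """Largest edge distance from v; an edge is one hop beyond its nearer endpoint."""
    dist = bfs_distances(edges, v)  # assumes the query graph is connected
    return max(min(dist[a], dist[b]) + 1 for a, b in edges)


def needs_communication(edges: List[Edge], n: int) -> Tuple[bool, str]:
    """Pick the center (smallest farthest-edge distance) and compare it with n."""
    vertices = sorted({x for edge in edges for x in edge})  # sorted: deterministic ties
    center = min(vertices, key=lambda v: farthest_edge_distance(edges, v))
    return farthest_edge_distance(edges, center) > n, center


star = [("?player", "footballer"), ("?player", "?name"),
        ("?player", "striker"), ("?player", "FC_Barcelona")]
path = [("a", "b"), ("b", "c"), ("c", "d")]
print(needs_communication(star, 1))  # (False, '?player')
print(needs_communication(path, 1))  # (True, 'b')
```

For the star query every edge touches ?player, so a 1-hop guarantee suffices. The three-edge path has no vertex incident to all of its edges, so it needs a 2-hop guarantee to avoid communication.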

Back to the Example ● Find football players playing for clubs in a populous region where they were born.

Experimental Setup ● 20-machine cluster ● Lehigh University Benchmark (LUBM): 270 million triples ● Competitors: ● Single-node RDF-3X ● SHARD: a triple-store system built on Hadoop ● Graph partitioning (the proposed system) ● Hash partitioning on subjects

Performance Comparison

Speedup ● Better than linear speedup

Summary ● We propose a new architecture for scalable RDF data management: RDF-stores + Hadoop ● We propose a new approach for data placement and corresponding query processing: Graph partitioning + N-hop guarantee ● The techniques in the talk can be generalized to subgraph pattern matching in other kinds of graphs ● The lesson we learned: inter-node communication is expensive, so avoid it.

Thank you!

Backup Slides: Optimization ● Problem: high-degree vertices make the graph highly connected and hard to partition ● Solution: exclude them from graph partitioning ● Problem: high-degree vertices cause data explosion under the n-hop guarantee ● Solution: a weakened n-hop guarantee
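
Both remedies can be sketched under assumptions of our own, since the slide gives no details: a fixed degree threshold `max_degree`, a partitioner input that simply drops triples touching a high-degree vertex, and a weakened expansion that does not continue through a high-degree vertex it reaches (a machine's own base vertices are still expanded). All names and the toy triples are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def degrees(triples: List[Triple]) -> Counter:
    """Number of triples each vertex takes part in (as subject or object)."""
    deg: Counter = Counter()
    for s, _, o in triples:
        deg[s] += 1
        deg[o] += 1
    return deg


def partitioning_input(triples: List[Triple], max_degree: int) -> List[Triple]:
    """Drop triples touching a high-degree vertex before graph partitioning."""
    deg = degrees(triples)
    return [t for t in triples if deg[t[0]] <= max_degree and deg[t[2]] <= max_degree]


def weakened_n_hop(triples: List[Triple], base: Set[str], n: int,
                   max_degree: int) -> Set[Triple]:
    """n-hop expansion of one machine's vertices that stops at reached hubs."""
    deg = degrees(triples)
    stored: Set[Triple] = set()
    seen, frontier = set(base), set(base)
    for _ in range(n):
        next_frontier: Set[str] = set()
        for v in frontier:
            if v not in base and deg[v] > max_degree:
                continue  # weakened guarantee: do not expand through this hub
            for t in triples:  # linear scan per vertex: fine for a sketch
                if v in (t[0], t[2]):
                    stored.add(t)
                    for u in (t[0], t[2]):
                        if u not in seen:
                            seen.add(u)
                            next_frontier.add(u)
        frontier = next_frontier
    return stored


sample = [("p1", "type", "footballer"), ("p2", "type", "footballer"),
          ("p3", "type", "footballer"), ("p1", "playsFor", "c1")]
print(partitioning_input(sample, 2))                      # [('p1', 'playsFor', 'c1')]
print(len(weakened_n_hop(sample, {"p1", "c1"}, 2, 2)))    # 2
print(len(weakened_n_hop(sample, {"p1", "c1"}, 2, 10)))   # 4
```

With a threshold of 2, the hub footballer (degree 3) is neither passed to the partitioner nor expanded, so the machine holding p1 stores 2 triples instead of all 4.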
