Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins
Thomas Neumann, Guido Moerkotte
Presented by: Pranjal Gupta

Recap.
● RDF is the underlying data model of the Semantic Web; SPARQL is its query language.
● Data is represented as a set of triples (subject, predicate, object).
● Conceptually, all triples live in a single table with 3 columns.
● A query graph is made up of a sequence of triple patterns:
  SELECT DISTINCT ?e WHERE { ?e <author> "Jane Austen" . ?e <title> ?b . ?e <year> ?y }
● Every additional triple pattern is a self-join on the triple table, hence the need for a query optimizer that produces efficient query plans with an optimal join ordering.

Star queries.
● Quite a common feature in queries.
● Characterized by a sequence of triple patterns that share a common subject.
  SELECT DISTINCT ?e WHERE { ?e <author> "Jane Austen" . ?e <title> ?b . ?e <year> ?y }
[Figure: query graph with ?e at the center and outgoing edges <author> to "Jane Austen", <title> to ?b, and <year> to ?y]

Objectives.
● Highly accurate cardinality estimation for star queries.
  ○ By using characteristic sets.
● Extending the use of characteristic sets to calculate the cardinality of general queries.
● Using the cardinality estimator with the query optimizer.

Challenges.
1. Lack of an explicit schema: the data cannot be partitioned by structure for estimation, since all data looks the same.
2. Predicates are correlated, and hence cardinality cannot be estimated using single-bucket histograms.
3. RDF predicates are usually string values, so histograms are deemed inappropriate for estimation.
4. RDF-3X's solution.

Characteristic set: IDEA
1. RDF data does not have a fixed schema.
2. The outgoing predicate edges of an entity give an idea about its "class", e.g. Artist, City, Country.
3. A "soft" schema hence emerges in the data, based on the predicates of each subject.

Characteristic set
The set of all predicates that occur in at least one triple with the given subject s:
S_C(s) = { p | ∃o : (s, p, o) ∈ R }
Example: S_C("Google") = { "product", "founder", "founded_in", "CEO", "website" }

Set of characteristic sets
The set of the characteristic sets of all subjects s for which at least one triple (s, p, o) exists:
S_C(R) = { S_C(s) | ∃p, o : (s, p, o) ∈ R }
Example:
● "The girl with a dragon tattoo", "Namesake", "Tell me your Dreams": { "Author", "Title", "Publisher", "ISBN", "Year", "Language" }
● "Amazon", "Google", "Tesla": { "Founder", "Founded In", "CEO", "CFO", "Product", "Revenue", "Profit" }
● "New York", "Mumbai", "Toronto": { "Country", "Province", "Population", "latitude", "longitude" }
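The two definitions above can be made concrete with a minimal sketch. The toy triples and the function name `characteristic_sets` are assumptions made for illustration; they are not taken from the paper or from RDF-3X.

```python
from collections import Counter, defaultdict


def characteristic_sets(triples):
    """Return a Counter mapping each characteristic set to its number of subjects."""
    predicates_of = defaultdict(set)
    for subject, predicate, _obj in triples:
        # S_C(s): collect every predicate that occurs with subject s.
        predicates_of[subject].add(predicate)
    # S_C(R): the distinct characteristic sets, each with its subject count.
    return Counter(frozenset(preds) for preds in predicates_of.values())


# Hypothetical toy data: two book-like subjects and one city-like subject.
triples = [
    ("book1", "author", "A"),
    ("book1", "title", "T1"),
    ("book2", "author", "B"),
    ("book2", "author", "C"),
    ("book2", "title", "T2"),
    ("city1", "country", "X"),
    ("city1", "population", "100"),
]

sets_with_counts = characteristic_sets(triples)
```

Both books end up in the same characteristic set even though book2 has two authors, which is why a per-set subject count alone cannot describe duplicates.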

Calculating simple cardinality
● Star-shaped edge structures are also present in queries.
● Each triple describes only one characteristic of the subject.
● Hence, queries have multiple triple patterns with one subject variable.
Q = SELECT DISTINCT ?e WHERE { ?e <author> ?a . ?e <title> ?b }
S_C(Q) = { "title", "author" }
[Figure: query graph with ?e at the center and outgoing edges <author> to ?a and <title> to ?b]
SOLUTION: sum the cardinalities of all characteristic sets in S_C(R) that are supersets of the query's characteristic set S_C(Q).
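A minimal sketch of the superset sum, assuming the characteristic sets are held in a dictionary from predicate set to number of distinct subjects. The counts and the function name `distinct_star_cardinality` are hypothetical.

```python
def distinct_star_cardinality(char_sets, query_predicates):
    """Sum the subject counts of all characteristic sets that contain S_C(Q)."""
    query_set = frozenset(query_predicates)
    return sum(count for preds, count in char_sets.items() if query_set <= preds)


# Hypothetical statistics: characteristic set -> number of distinct subjects.
char_sets = {
    frozenset({"author"}): 100,
    frozenset({"title"}): 200,
    frozenset({"author", "title", "year"}): 1000,
    frozenset({"author", "title"}): 20,
}

# Only the last two sets contain both "author" and "title".
estimate = distinct_star_cardinality(char_sets, {"author", "title"})
```

Here `estimate` is 1020: the sets that contain only "author" or only "title" are skipped because they are not supersets of the query's characteristic set.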

Occurrence annotations
● Limitation of the previous calculation: it only works if there is a DISTINCT in the selection clause.
Example: S_C(<ent #416>) = { "title", "author" }, count = 1
[Figure: <ent #416> with one <title> edge to "Let it Snow" and three <author> edges, among them John Green and Lauren Myracle]
Without DISTINCT, SELECT ?e WHERE { ?e <author> ?a . ?e <title> ?b } returns 3 rows for this entity (3 authors × 1 title), not 1.

Occurrence annotations
Predicate annotations!
● The number of occurrences of each predicate in the characteristic set is also stored, e.g. for S = { p1, p2, p3, … } one occurrence count per predicate, next to the number of distinct subjects.

Occurrence annotations
Q = SELECT ?e WHERE { ?e <author> ?a . ?e <title> ?b }
S_C(Q) = { "title", "author" }
S = { "title", "author", "year" } with 1000 distinct subjects, 2300 author occurrences and 1010 title occurrences
avg. author = 2300/1000 = 2.3
avg. title = 1010/1000 = 1.01
Estimate: 1000 × 2.3 × 1.01 = 2323, not 1000
● There can be a loss of precision, since only the average number of occurrences per subject is used.
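The arithmetic on this slide can be reproduced with a short sketch. The function name and the dictionary layout are assumptions; the numbers are the ones from the slide.

```python
def star_cardinality_with_duplicates(distinct, occurrences, query_predicates):
    """Scale the distinct-subject count by the average multiplicity of each predicate."""
    estimate = float(distinct)
    for predicate in query_predicates:
        # Average number of triples with this predicate per subject in the set.
        estimate *= occurrences[predicate] / distinct
    return estimate


# Numbers from the slide: 1000 subjects, 2300 author triples, 1010 title triples.
estimate = star_cardinality_with_duplicates(
    distinct=1000,
    occurrences={"author": 2300, "title": 1010},
    query_predicates=["author", "title"],
)
```

The result is approximately 2323 (up to floating-point rounding), matching 1000 × 2.3 × 1.01.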

Queries with bound objects
● We stored the count of each predicate for every characteristic set it appears in, which captures the correlation between subject and predicate.
● Adopt the same strategy for storing the correlation between subject, predicate and object? INEFFICIENT
OBSERVATION
● Subjects of a characteristic set follow similar behavior.
● In each characteristic set there is usually one predicate that is highly selective and behaves like the key of a relational table.
● Other predicates follow the "key" predicate.

Queries with bound objects
● Out of the multiple object-bound patterns, take only the most selective one into account.
● The other object-bound patterns are assumed to be soft functionally dependent on it.
● This can lead to overestimation.

Cardinality of Star Joins: Complete Algorithm
Steps of the algorithm:
1. Loop over all characteristic sets in S_C(R) that are supersets of the query's characteristic set.
2. Loop over all triple patterns that appear in the query.
3. If the object is bound, take the minimum of the selectivities among all object-bound triple patterns in the query.
4. Otherwise, update the cumulative multiplicity factor (m).
5. Calculate the cardinality for the current characteristic set and add it to the global cardinality.
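The steps annotated on these slides could be sketched as follows. This is a reconstruction under assumptions, not the presenter's listing: the dictionary layout, the names `star_join_cardinality` and `assumed_selectivity`, the fixed selectivity of 0.01 and the example statistics are all hypothetical.

```python
def star_join_cardinality(char_sets, query, selectivity):
    """Estimate the result size of a star query.

    char_sets: list of {"distinct": int, "count": {predicate: occurrences}}
    query: list of (predicate, bound_object_or_None) triple patterns
    selectivity: callable (predicate, value) -> fraction of matching triples
    """
    query_predicates = {predicate for predicate, _ in query}
    cardinality = 0.0
    for cs in char_sets:
        # Only characteristic sets that are supersets of S_C(Q) contribute.
        if not query_predicates.issubset(cs["count"]):
            continue
        m = 1.0  # cumulative multiplicity of the unbound patterns
        o = 1.0  # selectivity of the most selective bound object
        for predicate, bound_value in query:
            if bound_value is not None:
                o = min(o, selectivity(predicate, bound_value))
            else:
                m *= cs["count"][predicate] / cs["distinct"]
        cardinality += cs["distinct"] * m * o
    return cardinality


# Hypothetical statistics for four characteristic sets.
CHAR_SETS = [
    {"distinct": 100, "count": {"author": 120}},
    {"distinct": 200, "count": {"title": 230}},
    {"distinct": 1000, "count": {"author": 2300, "title": 1001, "year": 1000}},
    {"distinct": 20, "count": {"author": 30, "title": 20}},
]


def assumed_selectivity(predicate, value):
    # Placeholder: a real system would look this up in its own statistics.
    return 0.01


unbound = star_join_cardinality(
    CHAR_SETS, [("author", None), ("title", None)], assumed_selectivity
)
bound = star_join_cardinality(
    CHAR_SETS, [("author", "Jane Austen"), ("title", None)], assumed_selectivity
)
```

With these assumed numbers the unbound query is estimated at roughly 2332.3 rows and the query with the bound author at roughly 10.21 rows; a bound pattern contributes only through the minimum selectivity and not through its multiplicity.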

Handling diverse sets
● The number of characteristic sets in a data set can be very large.
● Keep only the 10,000 most frequent characteristic sets.
● Merge the others into the most frequent ones.
MERGING SOLUTIONS
Each set is written as {(predicate, occurrences), …, number of distinct subjects}:
S1 = {(author, 120), 100}
S2 = {(title, 230), 200}
S3 = {(author, 2300), (title, 1001), (year, 1000), 1000}
S4 = {(author, 30), (title, 20), 20}
Option 1: split S4 and merge it into its subsets S1 and S2.
S1 = {(author, 150), 120}
S2 = {(title, 250), 220}
● UNDERESTIMATION: queries that ask for both author and title no longer find the subjects of S4.
Option 2: merge S4 into its superset S3.
S3 = {(author, 2330), (title, 1021), (year, 1000), 1020}
● OVERESTIMATION: the subjects of S4 are now also counted for queries that involve year.
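The merged statistics can be checked with a small sketch that simply adds the counts. The helper `merge_into` and the dictionary layout are assumptions for illustration; the actual merging rules of the estimator may be more involved.

```python
def merge_into(target, source):
    """Return a copy of target with the statistics of source folded in.

    Only predicates that already exist in target are kept, so merging into
    a subset drops the predicates that the subset does not have.
    """
    merged = {
        "distinct": target["distinct"] + source["distinct"],
        "count": dict(target["count"]),
    }
    for predicate in merged["count"]:
        merged["count"][predicate] += source["count"].get(predicate, 0)
    return merged


S1 = {"distinct": 100, "count": {"author": 120}}
S2 = {"distinct": 200, "count": {"title": 230}}
S3 = {"distinct": 1000, "count": {"author": 2300, "title": 1001, "year": 1000}}
S4 = {"distinct": 20, "count": {"author": 30, "title": 20}}

# Option 1: fold S4 into each of its subsets.
merged_s1 = merge_into(S1, S4)
merged_s2 = merge_into(S2, S4)

# Option 2: fold S4 into its superset; the year count stays unchanged.
merged_s3 = merge_into(S3, S4)
```

Merging into a subset keeps the estimate conservative but loses the author and title combination, while merging into the superset keeps the combination but credits the 20 extra subjects to queries on year as well.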
