Optimization of Regular Path Queries in Large Graphs
Nikolay Yakovets, in collaboration with Parke Godfrey and Jarek Gryz
Optimization of RPQs
Scalable and efficient evaluation of regular path queries
Topics: RPQs, Semantics, Evaluation, Optimization, WAVEGUIDE, Plans, Costs, Implementation, Linked Data
Graph Query Languages
‣ Adjacency Query: list all neighbours, find the k-neighbourhood of a node
‣ Pattern Matching Query: find all sub-graphs in a database that are isomorphic to a given query pattern graph
‣ Summarization Query: summarize or operate on query results, e.g. aggregation with avg(), min(), max(), etc.
‣ Reachability/Path Query: navigational query that deals with paths in a graph; tests whether nodes are reachable in a graph; paths of fixed or arbitrary lengths
SPARQL - Query Language
Query types: adjacency, pattern matching, summarization
SPARQL Protocol and RDF Query Language (SPARQL)
‣ declarative, based on pattern matching
‣ graph patterns describe subgraphs of the queried RDF graphs
‣ those subgraphs that match a description yield a result
Graph: ny:nikolay foaf:name "Nikolay Yakovets" ; ny:nikolay foaf:based_near dbpedia:Oakville ; dbpedia:Oakville dp:population "182520"
Query: SELECT ?pop WHERE { :Oakville :population ?pop }
(?pop is a variable; the part in braces is the graph pattern)
SPARQL Property Paths
‣ Part of SPARQL 1.1 W3C recommendation
‣ Allow regular expressions to describe paths between nodes:
  p1 | p2   disjunction
  p1 / p2   concatenation
  p?        zero or one
  ^p        inverted
  !iri      negated
  p*        Kleene star (zero or more)
  p+        one or more
‣ Useful in many application domains: social networks, biological, encyclopedic
‣ Convenient declarative mechanism to answer queries without prior knowledge of underlying data paths
SPARQL Property Paths
‣ Example: DBpedia snippet, part of a LOD dataset
‣ Two datasets, English and Japanese, interlinked with OWL terms
G: nodes en:Gundam, en:Daiba, en:Tokyo, en:Japan, jp:ガンダム, jp:お台場, jp:東京, jp:関東地方, jp:本州, jp:日本, connected by :sameAs and :isLocatedIn edges
Q: select ?place { en:Gundam (:sameAs*/:isLocatedIn)+/:sameAs* ?place . }
‣ Query: Where is the Gundam statue located?
‣ Solution: Need to resolve equivalent data entities (:sameAs) and traverse the spatial hierarchy (:isLocatedIn) to fully utilize the richer spatial information in the Japanese dataset
Formal Evaluation
‣ Property Paths in SPARQL are essentially Regular Path Queries (RPQs)
‣ RPQs have been well-studied before the advent of RDF and SPARQL
‣ Formal def.: Q = (x, L(r), y), where x and y are free variables and L(r) is a regular language
‣ Semantics of Evaluation: [[Q]]_G, an evaluation of Q over graph database G, is a collection of pairs (s, t) such that ∃ a path p in G between s and t such that p conforms to regex r, i.e. the path-induced string λ(p) ∈ L(r)
  • the path is simple or arbitrary
  • the collection is a bag (allow duplicates), aka. solution counting (∀), or a set (discard duplicates), aka. existential semantics (∃)
Paths in SPARQL
Semantics by path type (regular or simple) and collection type (∀ counting or ∃ existential):
‣ regular ∀, simple ∀: Counting procedures are #P-complete on general graphs (Arenas et al., Losemann et al., 2013). Tractable on DAGs, or with restricted compatible regex.
‣ simple ∃: Evaluation of simple paths is NP-complete on general graphs (Mendelzon et al., 1987). Tractable on DAGs, or with restricted compatible regex.
‣ regular ∃: SPARQL (W3C proposal for RDF query language) supports RPQs through SPARQL 1.1 property paths.
RPQ Evaluation
[[Q]]_G: an evaluation of Q over graph database G, considering existential semantics on regular paths
‣ FA-based: use finite state machines in evaluation (Mendelzon et al., 1987)
‣ α-RA-based: use relational algebra extended with the alpha-operator (α), which computes transitive closure (Losemann et al., 2013)
FA-based Evaluation
Q: select ?place { en:Gundam (:sameAs*/:isLocatedIn)+/:sameAs* ?place . }
1. From a parse tree, construct a query ε-NFA.
2. Minimize the query automaton, if necessary.
3. Construct a product P of the query and graph automata.
4. Check P for reachable accepting states to produce an answer to the query.
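Steps 3 and 4 can be sketched in Python as a breadth-first search over (graph node, automaton state) pairs, which explores the product P on the fly. This is a minimal sketch under stated assumptions, not the implementation behind the slides: the edge list `EDGES` is hypothetical (the slide's exact edges are not reproduced), :sameAs is treated as a directed edge, and the two-state automaton `DELTA` is hand-built for the example query instead of being derived from a parse tree (steps 1 and 2).

```python
from collections import deque

# Hypothetical (subject, label, object) edges, loosely modelled on the example.
EDGES = [
    ("en:Gundam", "sameAs", "jp:Gundam"),
    ("jp:Gundam", "isLocatedIn", "jp:Odaiba"),
    ("jp:Odaiba", "sameAs", "en:Daiba"),
    ("jp:Odaiba", "isLocatedIn", "jp:Tokyo"),
    ("jp:Tokyo", "sameAs", "en:Tokyo"),
    ("en:Tokyo", "isLocatedIn", "en:Japan"),
]

# Hand-built automaton for (:sameAs*/:isLocatedIn)+/:sameAs*.
# The language is "any label sequence with at least one :isLocatedIn",
# so two states suffice: q0 (none seen yet) and q1 (accepting).
DELTA = {
    ("q0", "sameAs"): {"q0"},
    ("q0", "isLocatedIn"): {"q1"},
    ("q1", "sameAs"): {"q1"},
    ("q1", "isLocatedIn"): {"q1"},
}
START, ACCEPTING = "q0", {"q1"}


def eval_rpq(edges, delta, start, accepting, source):
    """Return all nodes t such that some path source -> t spells a word
    accepted by the automaton (existential semantics, arbitrary paths)."""
    adj = {}
    for s, label, o in edges:
        adj.setdefault(s, []).append((label, o))

    seen = {(source, start)}          # visited product states
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:        # step 4: reachable accepting state
            answers.add(node)
        for label, nxt in adj.get(node, ()):
            for q in delta.get((state, label), ()):
                if (nxt, q) not in seen:
                    seen.add((nxt, q))
                    queue.append((nxt, q))
    return answers


places = eval_rpq(EDGES, DELTA, START, ACCEPTING, "en:Gundam")
```

Because each product state is visited at most once, the search terminates on cyclic graphs as well.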
α-RA-based Evaluation
Q: select ?place { en:Gundam (:sameAs*/:isLocatedIn)+/:sameAs* ?place . }
‣ Have SPRJU-RA extended with α
‣ α computes the least fixpoint, i.e. the transitive closure of a given relation
1. From a parse tree, construct an RA tree: Q parse tree → Q RA tree → favourite RDBMS
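The α operator can be illustrated with a small Python sketch over binary relations stored as sets of (source, target) pairs. The function names `compose` and `alpha` are hypothetical, and the loop is the plain fixpoint iteration T = r ∪ (T ⋈ r), shown only to make the definition concrete; an RDBMS would evaluate it differently.

```python
def compose(r, s):
    """Join r and s on r.target = s.source and project to (source, target)."""
    by_source = {}
    for x, y in s:
        by_source.setdefault(x, set()).add(y)
    return {(x, z) for x, y in r for z in by_source.get(y, ())}


def alpha(r):
    """Transitive closure of r: iterate T = r ∪ (T ⋈ r) until nothing changes."""
    closure = set(r)
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:         # least fixpoint reached
            return closure
        closure = bigger


# RA tree for (a/b)+ : alpha applied to the join of a and b.
a = {(1, 2), (3, 4)}
b = {(2, 3), (4, 5)}
result = alpha(compose(a, b))
```

Here `compose(a, b)` plays the role of the join node in the RA tree and `alpha` sits on top of it.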
Comparing Approaches
Th.: FA and α-RA are incomparable plan spaces
Pf.: translation into Datalog; examine the induced sequence of joins
e.g. (?x, (a/b)+, ?y)
  P_FA = ((((a ⋈ b) ⋈ a) ⋈ b) ⋈ a) ...
  P_αRA = (a ⋈ b) ⋈ (a ⋈ b) ⋈ (a ⋈ b) ...
P_αRA ∉ FA and P_FA ∉ α-RA, hence α-RA ⊈ FA and FA ⊈ α-RA
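The two join sequences for (?x, (a/b)+, ?y) can be contrasted in a short Python sketch. It assumes relations are sets of (source, target) pairs; `fa_style` and `alpha_ra_style` are hypothetical names for illustration. Both return the same answer, but the first extends paths one edge label at a time while the second first materializes a ⋈ b and then joins that result with itself.

```python
def compose(r, s):
    """Join r and s on r.target = s.source and project to (source, target)."""
    by_source = {}
    for x, y in s:
        by_source.setdefault(x, set()).add(y)
    return {(x, z) for x, y in r for z in by_source.get(y, ())}


def fa_style(a, b):
    """Left-deep sequence ((((a ⋈ b) ⋈ a) ⋈ b) ...): one label per join."""
    answers, after_a = set(), set()
    frontier = set(a)                     # pairs matching (a/b)* a
    while frontier:
        after_a |= frontier
        new_answers = compose(frontier, b) - answers
        answers |= new_answers            # pairs matching (a/b)+
        frontier = compose(new_answers, a) - after_a
    return answers


def alpha_ra_style(a, b):
    """Sequence (a ⋈ b) ⋈ (a ⋈ b) ⋈ ...: join a/b once, then close over it."""
    ab = compose(a, b)
    answers, new = set(ab), set(ab)
    while new:
        new = compose(new, ab) - answers
        answers |= new
    return answers


# A six-node cycle with alternating labels: 1 -a-> 2 -b-> 3 -a-> 4 -b-> 5 -a-> 6 -b-> 1
a = {(1, 2), (3, 4), (5, 6)}
b = {(2, 3), (4, 5), (6, 1)}
```

The intermediate relations differ between the two functions, which is the sense in which the plans induce different join sequences.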
WAVEGUIDE
Goal: Need to consider both FA and α-RA plan spaces
Search is driven by a waveplan, which guides a number of wavefronts that iteratively explore the graph (guided, iterative graph search)
[Figure: waveplan P_ab+ with two wavefronts: W_ab, with transitions over a and b, and W_ab+, with transitions over W_ab]
wavefronts
a wavefront W_l
• an expanding search unit
• guided by a wavefront automaton W_l = (l, S, q0, Q, δ, E, L, F), where l is the label, S the seed, q0 the starting state, Q the set of states, δ the transition function, E the edge labels, L the wavefront labels, and F the accepting states
• labeled with the regex it evaluates
• seeded with S
a transition function δ
• appending and prepending transitions
• δ : Q × ((E ∪ L) × {append, prepend} ∪ {ε}) → 2^Q, where the direction is written with a · marker on the slides
• transitions over graphs (graph edges) and views (pipeline or wavefront labels)
a seed S
• edge incoming into the starting state q0 in W_l
• defined with an RPQ, a wavefront or by construction
• can be universal: any node in a graph
a waveplan P_Q
• produces an answer to a given query Q
• an ordered set of wavefront automata: P_Q consists of a set of wavefronts and an ordering <
• order defines which labels can be used in the seed and in transitions over a view
• higher wavefronts can use lower wavefronts as their labels and seeds, but not vice-versa
• query answered by the highest wavefront
e.g., query (?x, (a/b)+, ?y) with waveplan P_ab+
• W_ab produces an answer for the (a/b) regex
• W_ab+ uses W_ab as a view to compute (a/b)+
WAVEGUIDE - iterative search
Exploration procedure based on semi-naive evaluation
Intermediate search results are kept in the search cache; the cache keeps track of end-nodes and corresponding states in a plan
• seed specifies node pairs to start from
• loop while new tuples are discovered:
  • crank advances simultaneously in a graph and automaton
  • reduce prunes the delta, handles unbounded computation
  • cache materializes according to the specified strategy
• extract produces answers
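The phases above can be sketched in Python for a single wavefront. This is an illustrative sketch only, with several assumptions: the function name `run_wavefront` and the tuple layout (start node, end node, state) are hypothetical, only appending transitions over graph edges are modelled (no prepending, no views), the seed is universal, and the cache simply materializes every tuple.

```python
def run_wavefront(edges, trans, q0, accepting):
    """Evaluate one wavefront automaton from a universal seed.

    edges: iterable of (subject, label, object) triples
    trans: dict mapping (state, label) -> set of next states
    """
    adj, nodes = {}, set()
    for s, label, o in edges:
        adj.setdefault(s, []).append((label, o))
        nodes.update((s, o))

    # seed: every node starts a search in the starting state
    delta = {(n, n, q0) for n in nodes}
    cache = set(delta)

    while delta:                          # loop while new tuples are discovered
        # crank: advance one step in the graph and the automaton together
        cranked = {
            (start, nxt, q2)
            for (start, end, q) in delta
            for (label, nxt) in adj.get(end, ())
            for q2 in trans.get((q, label), ())
        }
        # reduce: keep only tuples not seen before (guarantees termination)
        delta = cranked - cache
        # cache: materialize the new tuples
        cache |= delta

    # extract: tuples that ended in an accepting state are answers
    return {(start, end) for (start, end, q) in cache if q in accepting}


# Automaton for (a/b)+ with appending transitions only.
TRANS = {("q0", "a"): {"q1"}, ("q1", "b"): {"q2"}, ("q2", "a"): {"q1"}}
EDGES = [(1, "a", 2), (2, "b", 3), (3, "a", 4), (4, "b", 5), (5, "a", 6), (6, "b", 1)]
answers = run_wavefront(EDGES, TRANS, "q0", {"q2"})
```

Only the tuples discovered in the previous iteration are cranked, which is the semi-naive part of the loop.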
challenges!
• efficient? optimal?
• vs. other techniques?
• plan space: size? enumerator?
• cost model: analysis?
• optimizations enabled by WAVEGUIDE?
WAVEGUIDE Plan Space
• WP subsumes both FA and α-RA
• adds exclusive new plans: α-RA ∪ FA ⊂ WP
• e.g., (?x, (a/b/c)+, ?y)
[Figure: FA-style waveplan P_(abc)+ for (?x, (a/b/c)+, ?y): a single wavefront W_(abc)+, seeded with U, with transitions over a, b and c]
[Figure: α-RA-style waveplan P_(abc)+ for (?x, (a/b/c)+, ?y): wavefront W_abc, seeded with U, with transitions over a, b and c, and wavefront W_(abc)+ with transitions over W_abc; shown next to the RA tree that applies α to joins (on o = s) of the selections σ_p=a(T), σ_p=b(T) and σ_p=c(T)]
[Figure: a further waveplan P_(abc)+ for (?x, (a/b/c)+, ?y): wavefront W_bc, seeded with U, with transitions over b and c, and wavefront W_(abc)+, seeded with U, with transitions over a and W_bc]