LFDS - 17/11/2015, UCL, London
Path Logics for Querying Graphs: combining expressiveness and efficiency
Diego Figueira, CNRS, LaBRI, France
Graph databases
Semantic web / RDF / social networks / ...
"Entities + Relations"
Modelled as: edge-labelled directed graphs
Notion of path of central importance
[Figure: example edge-labelled directed graph with labels a, b, c]
Graph databases: RPQ
π1 : (ab)*c
Evaluation: P (combined), NL (data)
[Figure: the example graph with a path π1 between two nodes]
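A minimal sketch of how an RPQ such as π1 : (ab)*c could be evaluated, assuming the expression has already been compiled to an NFA: a breadth-first search over the product of the graph and the automaton. The function name `eval_rpq`, the hand-built NFA and the small graph are illustrative assumptions, not taken from the talk.

```python
from collections import deque

def eval_rpq(edges, nfa_delta, nfa_init, nfa_final, source):
    """Return the nodes v such that some path source -> v spells a word accepted by the NFA.

    edges:     iterable of (u, label, v) triples (edge-labelled directed graph)
    nfa_delta: dict mapping (state, label) -> set of successor states
    """
    out = {}
    for u, label, v in edges:
        out.setdefault(u, []).append((label, v))

    start = (source, nfa_init)
    seen = {start}
    queue = deque([start])
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in nfa_final:
            answers.add(node)
        # Advance the graph and the automaton in lockstep.
        for label, nxt in out.get(node, []):
            for state2 in nfa_delta.get((state, label), ()):
                pair = (nxt, state2)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers

# Hypothetical NFA for (ab)*c: 0 --a--> 1 --b--> 0, and 0 --c--> 2 (final).
DELTA = {(0, "a"): {1}, (1, "b"): {0}, (0, "c"): {2}}

# Hypothetical example graph.
EDGES = [("x", "a", "y"), ("y", "b", "x"), ("x", "c", "z"), ("y", "c", "w")]

print(eval_rpq(EDGES, DELTA, 0, {2}, "x"))
```

The search visits each (node, state) pair at most once, which is in line with the polynomial combined complexity stated on the slide.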
Graph databases: CRPQ
π1 : (ab)*c
π2 : (ac)*
π3 : ac*
Evaluation: NP (combined), NL (data)
Acyclic: P
Unions, inverse
[Figure: the example graph with paths π1, π2, π3 joined at shared nodes]
Graph databases: CRPQ
π1 : (ab)*c
π2 : (ac)*
π3 : ac*
What about… "All the pairs (u, v) that can reach some node z in the same number of steps"
[Figure: the example graph with paths π1, π2, π3]
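To make the "same number of steps" query concrete, here is a small sketch that computes it directly by a backward search over pairs of nodes, moving both components one edge at a time. It only illustrates why the two paths must be traversed in a synchronised way; the function name `equal_length_pairs` and the example edges are assumptions, and paths of length 0 are counted.

```python
from collections import deque

def equal_length_pairs(edges):
    """All pairs (u, v) such that some node z is reachable from u and from v
    by paths of the same length (edge labels are ignored)."""
    nodes = set()
    preds = {}
    for u, _label, v in edges:
        nodes.update((u, v))
        preds.setdefault(v, set()).add(u)

    # Backward search in the graph of pairs, starting from the diagonal (z, z).
    seen = {(z, z) for z in nodes}
    queue = deque(seen)
    while queue:
        p, q = queue.popleft()
        for p2 in preds.get(p, ()):
            for q2 in preds.get(q, ()):
                if (p2, q2) not in seen:
                    seen.add((p2, q2))
                    queue.append((p2, q2))
    return seen

# Hypothetical graph: 1 -a-> 2 -b-> 3 and 4 -c-> 3.
pairs = equal_length_pairs([(1, "a", 2), (2, "b", 3), (4, "c", 3)])
print((2, 4) in pairs, (1, 4) in pairs)
```

Here nodes 2 and 4 both reach 3 in one step, while 1 and 4 never reach a common node in the same number of steps.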
Graph databases: CRPQ(S)
What about testing for relations on the paths?
CRPQ: π1 : (ab)*c, π2 : (ac)*, π3 : ac*
• |πi| = |πj|
• πi is a prefix of πj
• πi is a subsequence of πj
• πi is a factor of πj
• πi = πj projected onto A
R(π1, π2), R ∈ S
Motivations from: entity resolution, semantic associations, crime detection,…
Graph databases: CRPQ(S)
What about testing for relations on the paths?
CRPQ: π1 : (ab)*c, π2 : (ac)*, π3 : ac*
CRPQ(S) = CRPQ + tests R(πi1,…,πin), R ∈ S
E.g. R(π1, π2), R ∈ S
S: class of well-behaved word relations…
Word relations
RAT_k: rational
REG_k: regular
REC_k: recognizable
Binary relations R ⊆ 𝔹* × 𝔹*
REC_2: recognizable
REG_2: regular (prefix, equal, equal length, ...)
RAT_2: rational (suffix, infix, projection, subsequence, ...)
[Figure: two-tape illustrations of how each class reads a pair of words]
Graph databases
CRPQ(S) = CRPQ + tests R(πi1,…,πin), R ∈ S
CRPQ(REC): NP/NL complexity
Can this be extended?
CRPQ(REG): PSPACE/NL complexity
CRPQ(RAT): undecidable
Related to the Intersection Problem: given relations R1,…,Rn, is R1 ∩ ··· ∩ Rn ≠ ∅?
Intersection problem: R ⋂ S = ∅ ?
R, S: classes of binary relations
input: R ∈ R, S ∈ S
output: R ⋂ S = ∅ ?
It has been studied... REG ⋂ RAT = ∅ ? is already undecidable (PCP)
...but what about real-world relations?
suffix ...? subsequence ...? subword ...?
[Figure: a PCP instance as pairs (u1, v1), ..., (un, vn), and example word pairs u, v for each relation]
Can we extend CRPQ beyond REG relations?
Language                     Data complexity   Combined complexity
CRPQ(REG_k)                  NL                PSPACE
CRPQ(RAT_k)                  Undecidable       Undecidable
CRPQ(REG_k + suffix)         Undecidable       Undecidable
CRPQ(REG_k + factor)         Undecidable       Undecidable
CRPQ(REG_k + subsequence)    non-elementary    non-PR
CRPQ(suffix)                 NL                PSPACE
CRPQ(factor)                 PSPACE            PSPACE
CRPQ(subsequence)            PSPACE            NEXPTIME
∀ k>1
Can we extend CRPQ beyond REG relations?
Proposed alternative: approximate RAT through REG + counters
How?
1) take an NFA
2) add counters
3) use it to read k-tuples of words
2 tapes over 𝔹 ≈ 1 tape over 𝔹 × {1,2}
Word over 𝔹 × {1,2}: letters a b a b a a b a b b a, control word 1 2 2 1 2 1 1 2 2 1 2
[[ (a,1)(b,2)(a,2)(b,1)(a,2)(a,1)(b,1)(a,2)(b,2)(b,1)(a,2) ]] = (ababb, baaaba), with the word in (𝔹 × {1,2})* and the pair in 𝔹* × 𝔹*
This word is (1|2)*-controlled
[[ ((a,1)(a,2)|(b,1)(b,2))* ]] = equality, which is (12)*-controlled
For L ⊆ {1,2}*: Rel(L) = { [[S]] | S ∈ REG(𝔹 × {1,2}) is L-controlled }
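The encoding can be sketched in a few lines: a word over 𝔹 × {1,2} is split into the pair of words it denotes, and its control word is checked against a regular language over {1,2}. The helper names `decode` and `is_controlled` are assumptions made for this illustration; the example word is the one from the slide.

```python
import re

def decode(word):
    """Split a word over B x {1,2} into the pair of words it denotes.

    word: list of (letter, tape) pairs with tape in {1, 2}.
    Returns ((first, second), control), where control is the string of tape indices.
    """
    first = "".join(letter for letter, tape in word if tape == 1)
    second = "".join(letter for letter, tape in word if tape == 2)
    control = "".join(str(tape) for _letter, tape in word)
    return (first, second), control

def is_controlled(control, pattern):
    """Check that the control word belongs to the regular language `pattern`."""
    return re.fullmatch(pattern, control) is not None

# The example word: letters and control word listed position by position.
word = list(zip("ababaababba", (int(t) for t in "12212112212")))
pair, control = decode(word)
print(pair)                                  # ('ababb', 'baaaba')
print(is_controlled(control, r"(1|2)*"))     # True
print(is_controlled(control, r"(12)*"))      # False
```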
Eg:
Rel((1|2)*) = RAT_2
Rel((12)*(1*|2*)) = REG_2
Rel(1*2*) = REC_2
Rel((12)*) = length-preserving REG_2
Rel((1*|2*)(12)*) = REG_2^rev
Idea
Approximate with regular relations that can count patterns
R = { (u, v) | # of times (ab)*c appears in u = 2 · # of times c*b appears in v }
More than just counting letters
Idea
Instead of regular languages …
Rel(L) = { [[S]] | S ∈ REG(𝔹 × {1,2}) is L-controlled }
…use automata with counting
Evaluation of CRPQ with counting is feasible:
PSPACE in combined complexity
NL in data complexity
Parikh Automata ❉ [Klaedtke & Rueß]
NFA with n counters c1,…,cn and a semilinear set S ⊆ ℕ^n: (𝔹, Q, q0, δ, F, n, S), where n is the dimension
Transitions of δ: (q, a, (x1,…,xn), q') ∈ Q × 𝔹 × ℕ^n × Q (counters can only be incremented)
Run:
• Initial configuration: (q0, (0,…,0)) ∈ Q × ℕ^n
• (q, x) → (p, x + y) if (q, a, y, p) ∈ δ
• Acceptance: last configuration in F × S
❉ Many equivalent definitions (eg. reversal-bounded counter systems)
Parikh Automata
Eg: L_{ba=ca} = { w | number of a's after a b = number of a's after a c }
Example run on a b a a c a b a c a c a b a: c1++, c2++, c1++, c2++, c2++, c1++ (one increment per occurrence of "ba" or "ca", read left to right)
Parikh Automaton A = (𝔹, Q, q0, δ, F, 2, {(k,k) | k ∈ ℕ})
• dimension 2 (2 counters)
• increment c1 whenever we see "ba"
• increment c2 whenever we see "ca"
• F = Q
• The semilinear set ensures that the counters must be equal for a word to be accepted
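The example automaton is deterministic, so it can be simulated directly. The sketch below assumes the control state simply remembers the previous letter and that the alphabet is {a, b, c}; the function name `accepts_ba_eq_ca` is made up for this illustration.

```python
def accepts_ba_eq_ca(word):
    """Simulate the 2-counter Parikh automaton for L_{ba=ca}.

    The control state remembers the previous letter; counters only grow.
    Returns (accepted, (c1, c2)).
    """
    c1 = c2 = 0      # c1 counts occurrences of "ba", c2 counts occurrences of "ca"
    prev = None      # initial state: no letter read yet
    for letter in word:
        if letter == "a" and prev == "b":
            c1 += 1
        elif letter == "a" and prev == "c":
            c2 += 1
        prev = letter
    # F = Q, so acceptance depends only on the semilinear set {(k, k) | k in N}.
    return c1 == c2, (c1, c2)

print(accepts_ba_eq_ca("abaacabacacaba"))   # (True, (3, 3))
print(accepts_ba_eq_ca("ba"))               # (False, (1, 0))
```

The first call runs the word from the slide: three occurrences of "ba" and three of "ca", so the final counter vector (3, 3) lies in the semilinear set.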
Parikh Automata
Decidable: non-emptiness, membership
Closed under: intersection, union, (inverse) homomorphisms, concatenation (not complementation/iteration)
PA relations
Rel_PA(L) = { [[S]] | S ∈ PA(𝔹 × {1,2}) is L-controlled }
Eg:
REG_2^PA = Rel_PA((12)*(1*|2*))
REG_2^PA,rev = Rel_PA((1*|2*)(12)*)
RAT_2^PA = Rel_PA((1|2)*)
. . .
Word relations
RAT_k: rational
REG_k: regular
REC_k: recognizable
REG_k^PA: Parikh-regular
Theorem: Evaluation of CRPQ(REG^PA) is PSPACE in combined complexity, NL in data complexity
Proof ingredients:
• Intersection problem for Parikh Automata: given PA's A1,…,An, is L(A1) ∩ ··· ∩ L(An) ≠ ∅ ? This is PSPACE-complete
• Intersection closure for REG^PA: for all R, S ∈ REG^PA, R ∩ S ∈ REG^PA (it suffices to intersect the automata representing them)
• Closure under product of REG^PA
Theorem: Evaluation of CRPQ^PA (no relations) is NP in combined complexity, NL in data complexity
Approximating rational relations
u ~k v (u and v are k-similar) iff for all w with |w| ≤ k, they have the same number of appearances of w (as factor / as subsequence)
Given R ∈ RAT, R_k = { (u, v) | u ~k u', v ~k v', (u', v') ∈ R } ∈ REG^PA
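A small sketch of the factor variant of k-similarity, assuming k ≥ 1 and skipping the empty word (for k ≥ 1, equal letter counts already force equal lengths). The names `factor_counts` and `k_similar` are assumptions made for this illustration.

```python
from collections import Counter

def factor_counts(u, k):
    """Count occurrences of every non-empty factor of u of length at most k."""
    counts = Counter()
    for length in range(1, k + 1):
        for i in range(len(u) - length + 1):
            counts[u[i:i + length]] += 1
    return counts

def k_similar(u, v, k):
    """u ~k v in the factor sense: same number of occurrences of each w with |w| <= k."""
    return factor_counts(u, k) == factor_counts(v, k)

print(k_similar("aabab", "abaab", 2))   # True
print(k_similar("aabab", "abaab", 3))   # False
```

The two example words agree on all factors of length at most 2 (three a's, two b's, one "aa", two "ab", one "ba") but differ at length 3, which shows how larger k gives a finer approximation.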
Alternative: Syntactic restrictions
Gaifman multi-graph of path variables
E.g. π1 : (ab)*c, π2 : (ac)*, π3 : ac* with R(π1, π3), S(π3, π2), R(π3, π2): cyclic
E.g. π1 : (ab)*c, π2 : (ac)*, π3 : ac* with R(π1, π3), S(π3, π2): acyclic
Theorem: Evaluation of acyclic-CRPQ(RAT^PA) is PSPACE in combined complexity, NL in data complexity
If also fixed join size (maximum cardinality of a connected component): NP combined complexity
If also fixed PA dimension and unary representation: PTIME combined complexity
[Figure: Gaifman multi-graphs over path variables π1, …, π7]
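The acyclicity condition can be checked with a union-find pass over the relation atoms. This sketch assumes binary relation atoms only and reads the multi-graph as having one edge per atom, so two atoms over the same pair of path variables already form a cycle; the function name `is_acyclic` and the variable names p1, p2, p3 (standing for π1, π2, π3) are illustrative.

```python
def is_acyclic(atoms):
    """atoms: list of (relation_name, path_var_1, path_var_2) for binary relation atoms.

    Returns True iff the multi-graph with one edge per atom has no cycle.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for _name, p, q in atoms:
        rp, rq = find(p), find(q)
        if rp == rq:        # this edge closes a cycle (or is a self-loop)
            return False
        parent[rp] = rq
    return True

cyclic = [("R", "p1", "p3"), ("S", "p3", "p2"), ("R", "p3", "p2")]
acyclic = [("R", "p1", "p3"), ("S", "p3", "p2")]
print(is_acyclic(cyclic), is_acyclic(acyclic))   # False True
```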
Conclusion
Counting does not increase complexity
Avoid the curse of rational relations:
• approximating by regular relations with counting
• or staying away from cycles in path relations
Thank you