A Theory of Regular Queries Moshe Y. Vardi Rice University
Theory of Regular Languages, I

Regular Languages - Robust Definability:
• Regular expressions
• DFA
• NFA
• 2NFA
• AFA
• 2AFA
• Regular grammar
• MSO
• ...

But: Succinctness Gaps, e.g., NFA < RE, NFA < DFA, AFA < NFA, MSO < AFA, ...
NFA

$A = (\Sigma, S, S_0, \rho, F)$
• Alphabet: $\Sigma$
• States: $S$
• Initial states: $S_0 \subseteq S$
• Nondeterministic transition function: $\rho : S \times \Sigma \to 2^S$
• Accepting states: $F \subseteq S$

Input word: $a_0, a_1, \ldots, a_{n-1}$
Run: $s_0, s_1, \ldots, s_n$
• $s_0 \in S_0$
• $s_{i+1} \in \rho(s_i, a_i)$ for $i \ge 0$
Acceptance: $s_n \in F$
Recognition: $L(A)$, the set of words accepted by $A$.

Example: [figure: automaton with transitions labeled 0 and 1] ends with 1's
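The run and acceptance conditions above can be sketched in Python by tracking the set of states reachable after each input symbol. This is a minimal sketch: the function name `nfa_accepts`, the dictionary encoding of $\rho$, and the two-state automaton `DELTA` for "ends with 1" are illustrative assumptions, not a reconstruction of the slide's figure.

```python
def nfa_accepts(word, initial, delta, accepting):
    """Return True if some run of the NFA on `word` ends in an accepting state.

    `delta` maps (state, symbol) to a set of successor states; a missing
    key means there is no transition (assumed encoding of rho).
    """
    current = set(initial)
    for symbol in word:
        # All states s_{i+1} with s_{i+1} in rho(s_i, a_i) for some current s_i.
        current = {t for s in current for t in delta.get((s, symbol), ())}
    return bool(current & set(accepting))


# Hypothetical NFA for "the word ends with 1": q0 loops on every symbol
# and guesses that the current 1 is the last symbol by moving to q1.
DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
}

print(nfa_accepts("0101", {"q0"}, DELTA, {"q1"}))
print(nfa_accepts("0110", {"q0"}, DELTA, {"q1"}))
```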
Theory of Regular Languages, II

Regular Languages - Robust Closure:
• Union
• Intersection
• Complement
• Concatenation
• Kleene star
• Reverse
• Homomorphism
• Inverse homomorphism
• ...
NFA Intersection

Given:
• $A_1 = (\Sigma, S_1, S_1^0, \rho_1, F_1)$
• $A_2 = (\Sigma, S_2, S_2^0, \rho_2, F_2)$

Define: $A_1 \cap A_2 = (\Sigma, S_1 \times S_2, S_1^0 \times S_2^0, \rho, F_1 \times F_2)$, where:
• $\rho((s, t), a) = \{ (s', t') : s' \in \rho_1(s, a) \text{ and } t' \in \rho_2(t, a) \}$
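The product construction above can be sketched as follows. The tuple encoding `(states, initial, delta, accepting)`, the helper `accepts`, and the two example automata (`ENDS_WITH_1`, `EVEN_LENGTH`) are assumptions made for illustration only.

```python
from itertools import product


def nfa_intersection(nfa1, nfa2):
    """Product NFA; each NFA is (states, initial, delta, accepting)."""
    s1, i1, d1, f1 = nfa1
    s2, i2, d2, f2 = nfa2
    # Only symbols on which both automata can move matter for the product.
    symbols = {a for (_, a) in d1} & {a for (_, a) in d2}
    delta = {}
    for s, t in product(s1, s2):
        for a in symbols:
            succ = {(s_, t_)
                    for s_ in d1.get((s, a), ())
                    for t_ in d2.get((t, a), ())}
            if succ:
                delta[((s, t), a)] = succ
    return (set(product(s1, s2)), set(product(i1, i2)),
            delta, set(product(f1, f2)))


def accepts(nfa, word):
    """Simulate the NFA on `word` by tracking the reachable state set."""
    _, initial, delta, accepting = nfa
    current = set(initial)
    for a in word:
        current = {t for s in current for t in delta.get((s, a), ())}
    return bool(current & accepting)


# Hypothetical inputs: words ending with 1, and words of even length.
ENDS_WITH_1 = ({"q0", "q1"}, {"q0"},
               {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"}}, {"q1"})
EVEN_LENGTH = ({"e", "o"}, {"e"},
               {("e", "0"): {"o"}, ("e", "1"): {"o"},
                ("o", "0"): {"e"}, ("o", "1"): {"e"}}, {"e"})

BOTH = nfa_intersection(ENDS_WITH_1, EVEN_LENGTH)
print(accepts(BOTH, "01"), accepts(BOTH, "1"), accepts(BOTH, "10"))
```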
NFA Complementation

Run Forest of $A$ on $w$:
• Roots: elements of $S_0$.
• Children of $s$ at level $i$: elements of $\rho(s, a_i)$.
• Rejection: no leaf is accepting.

Key Observation: collapse the forest into a DAG with at most one copy of a state at each level; the width of the DAG is at most $|S|$.

Subset Construction (Rabin-Scott, 1959):
• $A^c = (\Sigma, 2^S, \{S_0\}, \rho^c, F^c)$
• $F^c = \{ T : T \cap F = \emptyset \}$
• $\rho^c(T, a) = \bigcup_{t \in T} \rho(t, a)$
• $L(A^c) = \Sigma^* - L(A)$
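A minimal sketch of the subset construction for complementation, building only the subsets reachable from $\{S_0\}$ rather than all of $2^S$. The function names and the example automaton `DELTA` are illustrative assumptions.

```python
def complement_nfa(alphabet, initial, delta, accepting):
    """Build the reachable part of A^c = (Sigma, 2^S, {S_0}, rho^c, F^c)."""
    start = frozenset(initial)
    states, worklist, delta_c = {start}, [start], {}
    while worklist:
        subset = worklist.pop()
        for a in alphabet:
            # rho^c(T, a) is the union of rho(t, a) over t in T.
            target = frozenset(t for s in subset
                               for t in delta.get((s, a), ()))
            delta_c[(subset, a)] = target
            if target not in states:
                states.add(target)
                worklist.append(target)
    # F^c: subsets that contain no accepting state of A.
    accepting_c = {T for T in states if not (T & accepting)}
    return states, start, delta_c, accepting_c


def run_deterministic(word, start, delta_c, accepting_c):
    """A^c is deterministic, so it has exactly one run per word."""
    subset = start
    for a in word:
        subset = delta_c[(subset, a)]
    return subset in accepting_c


# Hypothetical NFA for "ends with 1"; its complement accepts the empty
# word and every word ending with 0.
DELTA = {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"}}
STATES, START, DELTA_C, ACC_C = complement_nfa("01", {"q0"}, DELTA, {"q1"})
print(run_deterministic("10", START, DELTA_C, ACC_C))
```

Exploring only reachable subsets does not change the worst case, which remains exponential in $|S|$.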
Complementation Blow-Up

$A = (\Sigma, S, S_0, \rho, F)$, $|S| = n$
$A^c = (\Sigma, 2^S, \{S_0\}, \rho^c, F^c)$

Blow-Up: $2^n$ upper bound

Can we do better?

Lower Bound: $2^n$ (Sakoda-Sipser 1978, Birget 1993)

$L_n = (0 + 1)^* 1 (0 + 1)^{n-1} 0 (0 + 1)^*$
• $L_n$ is easy for NFA
• The complement of $L_n$ is hard for NFA
Theory of Regular Languages, III

Regular Languages - Robust Decidability:

Emptiness: $L(A) = \emptyset$
Nonemptiness Problem: Decide if a given $A$ is nonempty.

NFA Nonemptiness: Directed graph $G_A = (S, E)$ of NFA $A = (\Sigma, S, S_0, \rho, F)$:
• Nodes: $S$
• Edges: $E = \{ (s, t) : t \in \rho(s, a) \text{ for some } a \in \Sigma \}$

Lemma: $A$ is nonempty iff there is a path in $G_A$ from $S_0$ to $F$.
• Decidable in time linear in the size of $A$, using breadth-first search or depth-first search.
• Complexity: NLOGSPACE-complete.
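The lemma reduces nonemptiness to graph reachability, which can be sketched with a breadth-first search over $G_A$. The function name and the dictionary encoding of $\rho$ are assumptions for illustration.

```python
from collections import deque


def nfa_nonempty(initial, delta, accepting):
    """Return True iff some accepting state is reachable from an initial one."""
    # Build G_A: forget the symbols and keep only the edges (s, t).
    successors = {}
    for (s, _symbol), targets in delta.items():
        successors.setdefault(s, set()).update(targets)

    seen = set(initial)
    queue = deque(initial)
    while queue:
        s = queue.popleft()
        if s in accepting:
            return True
        for t in successors.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return False


# Hypothetical NFA for "ends with 1".
DELTA = {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"}}
print(nfa_nonempty({"q0"}, DELTA, {"q1"}))
print(nfa_nonempty({"q0"}, DELTA, {"q2"}))
```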
NFA Containment

Containment: $L(A_1) \subseteq L(A_2)$

Lemma: $L(A_1) \subseteq L(A_2)$ iff $A_1 \cap A_2^c$ is empty.
• Decidable in exponential time.
• Complexity: PSPACE-complete [Stockmeyer&Meyer, 1973]
• Result holds also for RE containment.
Database Query Languages

• Standard database query languages (e.g., SQL 2.0) are essentially 1st-order.
• Aho&Ullman, 1979: 1st-order languages are weak; add recursion
• Gallaire&Minker, 1978: add recursion via logic programs
• SQL 3.0, 1999: recursion added

Expressiveness/complexity trade-off:
• 1st-order queries: Data complexity in LOGSPACE
• Recursive queries: Data complexity in PTIME
Datalog

Datalog [Maier&Warren, 1988]:
• Function-free logic programs
• Existential, positive fixpoint logic
• Select-project-join-union-recurse queries

Example: Transitive Closure
Path(x, y) :- Edge(x, y)
Path(x, y) :- Path(x, z), Path(z, y)

Example: Impressionable Shopper
Buys(x, y) :- Trendy(x), Buys(z, y)
Buys(x, y) :- Likes(x, y)
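The transitive-closure program can be evaluated bottom-up by applying its two rules until no new facts appear. The sketch below assumes a naive fixpoint loop and a set-of-pairs encoding of the `Edge` relation; it illustrates the semantics and is not an optimized Datalog engine.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the two Path rules."""
    # Path(x, y) :- Edge(x, y)
    path = set(edges)
    while True:
        # Path(x, y) :- Path(x, z), Path(z, y)
        derived = {(x, y)
                   for (x, z) in path
                   for (z2, y) in path
                   if z == z2}
        new_facts = derived - path
        if not new_facts:
            return path  # least fixpoint reached
        path |= new_facts


EDGES = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(EDGES)))
```

Each iteration re-derives all facts, which is wasteful; semi-naive evaluation would join only with the facts added in the previous round.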
Query Containment, I

Query Optimization: Given $Q$, find $Q'$ such that:
• $Q \equiv Q'$
• $Q'$ is "easier" than $Q$

Query Containment: $Q_1 \sqsubseteq Q_2$ if $Q_1(B) \subseteq Q_2(B)$ for all databases $B$.

Fact: $Q \equiv Q'$ iff $Q \sqsubseteq Q'$ and $Q' \sqsubseteq Q$

Consequence: Query containment is a key database problem.
Query Containment, II

Other applications:
• query reuse
• query reformulation
• information integration
• cooperative query answering
• integrity checking
• ...

Consequence: Query containment is the fundamental database-reasoning problem.
Query Containment, III

Decidability of Query Containment:
• SQL: undecidable
  – Folk Theorem (unsolvability of FO)
  – Poor theory and practice of optimization
• SPJU Queries: decidable
  – Chandra&Merlin 1977, Sagiv&Yannakakis 1982
  – Rich theory and practice of optimization

Select-Project-Join-Union Queries:
• Existential positive FO: conjunction, disjunction, existential quantification
• Covers the vast majority of real-life database queries

Example: Triangle(x, y) :- Edge(x, y), Edge(y, z), Edge(z, x)
Query Containment, IV

Datalog Containment:
• Complexity: undecidable (Shmueli 1987, easy reduction from CFG containment)
• Difficult theory and practice of optimization

Unfortunately, most decision problems involving Datalog are undecidable; there are very few interesting, well-behaved fragments.

Reminder: Datalog = SPJU + Recursion

Question: Can we limit recursion to recover decidability?
1990s: Graph Databases

WWW:
• Nodes
• Edges
• Labels

Semistructured Data: WWW, SGML documents, library catalogs, XML documents, meta data, ...

Graph Databases: $(D, E, \lambda)$
• $D$: nodes
• $E \subseteq D^2$: edges
• $\lambda : E \to \Lambda$: labels (alt., also node labels)
Figure 1: Graph Database
Path Queries

Active Research Topic: What is the right query language for graph databases? ("No SQL")

Basic Element of all proposals: path queries
• $Q(x, y) :- x \, L \, y$
• $L$: formal language over labels
• $Q(a, b)$ holds if there is a path $a \xrightarrow{l_1 \cdots l_k} b$ with $l_1 \cdots l_k \in L$

Example: Regular Path Query
$Q(x, y) :- x \, (Wing \cdot Part^+ \cdot Nut) \, y$
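One common way to evaluate a regular path query from a fixed source node is to search the product of the graph database with an NFA for $L$. The sketch below assumes that approach; the edge list `EDGES`, the node names, and the NFA `DELTA` for $Wing \cdot Part^+ \cdot Nut$ are hypothetical.

```python
from collections import deque


def rpq_answers(edges, source, initial, delta, accepting):
    """Nodes y such that some path from `source` to y spells a word in L."""
    outgoing = {}
    for u, label, v in edges:
        outgoing.setdefault(u, []).append((label, v))

    # Search pairs (graph node, automaton state).
    start = {(source, q) for q in initial}
    seen, queue = set(start), deque(start)
    answers = set()
    while queue:
        node, q = queue.popleft()
        if q in accepting:
            answers.add(node)
        for label, nxt in outgoing.get(node, ()):
            for q2 in delta.get((q, label), ()):
                if (nxt, q2) not in seen:
                    seen.add((nxt, q2))
                    queue.append((nxt, q2))
    return answers


# Hypothetical graph database with labeled edges (u, label, v).
EDGES = [
    ("plane", "Wing", "wing1"),
    ("wing1", "Part", "flap"),
    ("flap", "Part", "hinge"),
    ("hinge", "Nut", "nut7"),
    ("wing1", "Nut", "nut9"),  # Wing.Nut is not in the language
]
# Hypothetical NFA for Wing . Part+ . Nut.
DELTA = {
    ("q0", "Wing"): {"q1"},
    ("q1", "Part"): {"q2"},
    ("q2", "Part"): {"q2"},
    ("q2", "Nut"): {"q3"},
}
print(rpq_answers(EDGES, "plane", {"q0"}, DELTA, {"q3"}))
```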
Regular Path Queries

Observations:
• A fragment of Transitive-Closure Logic (FO+TC)
• A fragment of binary Datalog
  – Concatenation:
    E(x, y) :- E1(x, z), E2(z, y)
  – Union:
    E(x, y) :- E1(x, y)
    E(x, y) :- E2(x, y)
  – Transitive Closure:
    P(x, y) :- E(x, y)
    P(x, y) :- P(x, z), E(z, y)
Path-Query Containment

$Q_1(x, y) :- x \, L_1 \, y$
$Q_2(x, y) :- x \, L_2 \, y$

Language-Theoretic Lemma 1: $Q_1 \sqsubseteq Q_2$ iff $L_1 \subseteq L_2$

Proof: Consider a database consisting of a single path $a \xrightarrow{l_1 \cdots l_k} b$ with $l_1 \cdots l_k \in L_1$.

Corollary: Path-Query Containment is
• undecidable for context-free path queries
• PSPACE-complete for regular path queries.

Containment: PSPACE-complete via RE containment
Two-Way RPQs

Extended Alphabet:
$\Lambda^- = \{ a^- : a \in \Lambda \}$
$\Lambda' = \Lambda \cup \Lambda^-$

Inverse Roles:
$Part(x, y)$: $y$ part of $x$
$Part^-(x, y)$: $x$ part of $y$

Example: $(1/2)^*$ Siblings
$Q(x, y) :- x \, [(father^- \cdot father) + (mother^- \cdot mother)]^+ \, y$

Containment: Use 2NFA?
• Hopcroft and Ullman, 1979: 2DFA
• Hopcroft, Motwani and Ullman, 2000: ???
2NFA

$A = (\Sigma, S, S_0, \rho, F)$
• $\Sigma$: finite alphabet
• $S$: finite state set
• $S_0 \subseteq S$: initial states
• $F \subseteq S$: final states
• $\rho : S \times \Sigma \to 2^{S \times \{-1, 0, +1\}}$: transition function

Theorem (Rabin&Scott, Shepherdson, 1959): 2NFA $\equiv$ 1NFA
2RPQ Containment

Difficulties:
• 2NFA → 1NFA: exponential blow-up
  – Consequence: Doubly exponential complementation
• Difference between query and language containment
  – $Q_1(x, y) :- x \, Parent \, y$
    $Q_2(x, y) :- x \, Parent \cdot Parent^- \cdot Parent \, y$
  – $Q_1 \sqsubseteq Q_2$ but $L(Parent) \not\subseteq L(Parent \cdot Parent^- \cdot Parent)$
Back to Basics: 2NFA → 1NFA

Theorem (Vardi, 1988): Let $A = (\Sigma, S, S_0, \rho, F)$ be a 2NFA. There is a 1NFA $A^c$ such that
• $L(A^c) = \Sigma^* - L(A)$
• $\|A^c\| \in 2^{O(\|A\|)}$

Proof: Guess a subset-sequence counterexample.
$a_0 \cdots a_{k-1} \notin L(A)$ iff there is a sequence $T_0, T_1, \ldots, T_k$ of subsets of $S$ such that
1. $S_0 \subseteq T_0$ and $T_k \cap F = \emptyset$.
2. If $s \in T_i$ and $(t, +1) \in \rho(s, a_i)$, then $t \in T_{i+1}$, for $0 \le i < k$.
3. If $s \in T_i$ and $(t, 0) \in \rho(s, a_i)$, then $t \in T_i$, for $0 \le i < k$.
4. If $s \in T_i$ and $(t, -1) \in \rho(s, a_i)$, then $t \in T_{i-1}$, for $0 < i \le k$.
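The four conditions can be explored on a single word by computing the least sequence $T_0, \ldots, T_k$ that contains $S_0$ and is closed under conditions 2 to 4; the word is rejected iff that least $T_k$ misses $F$, since every other valid sequence contains the least one position by position. The sketch below is illustrative only. It assumes that no symbol is read at position $k$ (there is no $a_k$), so the backward rule is applied only for $0 < i < k$, and the example 2NFA `DELTA` (words containing "ab") is hypothetical.

```python
def least_certificate(word, initial, delta):
    """Least T_0..T_k with S_0 in T_0 that is closed under conditions 2-4.

    `delta` maps (state, symbol) to a set of (state, move) pairs with
    move in {-1, 0, +1} (assumed encoding of rho).
    """
    k = len(word)
    T = [set() for _ in range(k + 1)]
    T[0] |= set(initial)
    changed = True
    while changed:
        changed = False
        for i in range(k):  # assumption: nothing is read at position k
            for s in list(T[i]):
                for t, move in delta.get((s, word[i]), ()):
                    j = i + move
                    # A -1 move at position 0 falls off the word; ignore it.
                    if 0 <= j <= k and t not in T[j]:
                        T[j].add(t)
                        changed = True
    return T


def rejected_by_2nfa(word, initial, delta, accepting):
    """Condition 1: the word is rejected iff the least T_k misses F."""
    return not (least_certificate(word, initial, delta)[-1] & set(accepting))


# Hypothetical 2NFA: after reading a 'b' it may step back one cell to
# check that the previous symbol is 'a', then runs to the right end.
DELTA = {
    ("p", "a"): {("p", 1)},
    ("p", "b"): {("p", 1), ("chk", -1)},
    ("chk", "a"): {("ok", 1)},
    ("ok", "a"): {("ok", 1)},
    ("ok", "b"): {("ok", 1)},
}
print(rejected_by_2nfa("ab", {"p"}, DELTA, {"ok"}))
print(rejected_by_2nfa("ba", {"p"}, DELTA, {"ok"}))
```

The 1NFA $A^c$ of the theorem guesses such a sequence one subset at a time while reading the word; this sketch only checks a given word and does not build $A^c$.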
Foldings

Definition: Let $u, v \in \Lambda'^*$. We say that $u$ folds onto $v$, denoted $u \leadsto v$, if $u$ can be "folded" onto $v$, e.g., $a b b^- b c \leadsto a b c$.

Pictorially, $\xrightarrow{a} \cdot \xrightarrow{b} \cdot \xleftarrow{b} \cdot \xrightarrow{b} \cdot \xrightarrow{c} \;\leadsto\; \xrightarrow{a} \cdot \xrightarrow{b} \cdot \xrightarrow{c}$

Definition: Let $E$ be an RE over $\Lambda'$. Then $fold(E) = \{ v : u \leadsto v, u \in L(E) \}$.

Language-Theoretic Lemma 2: Let
$Q_1(x, y) :- x \, E_1 \, y$
$Q_2(x, y) :- x \, E_2 \, y$
be 2RPQs. Then $Q_1 \sqsubseteq Q_2$ iff $L(E_1) \subseteq fold(E_2)$.
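The slide defines folding only informally. The sketch below assumes one standard formalization: $u$ folds onto $v$ if $u$ can be read as a walk over the positions $0, \ldots, |v|$ of $v$ that starts at 0, ends at $|v|$, moves right over letter $v_{i+1}$ by reading it, and moves left over letter $v_i$ by reading its inverse. Letters are encoded as strings, with a trailing `-` marking an inverse; both the encoding and the function names are assumptions.

```python
def inverse(letter):
    """Inverse of a letter of the extended alphabet: a <-> a-."""
    return letter[:-1] if letter.endswith("-") else letter + "-"


def folds_onto(u, v):
    """Return True if the word u folds onto the word v (lists of letters)."""
    positions = {0}  # positions of v reachable after reading a prefix of u
    for letter in u:
        reachable = set()
        for i in positions:
            # Step right over v[i] by reading v[i] itself.
            if i < len(v) and v[i] == letter:
                reachable.add(i + 1)
            # Step left over v[i-1] by reading its inverse.
            if i > 0 and inverse(v[i - 1]) == letter:
                reachable.add(i - 1)
        positions = reachable
    return len(v) in positions


print(folds_onto(["a", "b", "b-", "b", "c"], ["a", "b", "c"]))
print(folds_onto(["Parent", "Parent-", "Parent"], ["Parent"]))
```

The second call mirrors the earlier $Parent$ example: $Parent \cdot Parent^- \cdot Parent$ folds onto $Parent$, so $Parent \in fold(Parent \cdot Parent^- \cdot Parent)$ even though the two languages are incomparable.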
2RPQ containment

Theorem: Let $E$ be an RE over $\Lambda'$. There is a 2NFA $\tilde{A}_E$ such that
• $L(\tilde{A}_E) = fold(E)$
• $\|\tilde{A}_E\| \in O(\|E\|)$

Containment:
$Q_1(x, y) :- x \, E_1 \, y$
$Q_2(x, y) :- x \, E_2 \, y$

TFAE:
• $Q_1 \sqsubseteq Q_2$
• $L(E_1) \subseteq fold(E_2)$
• $L(E_1) \subseteq L(\tilde{A}_{E_2})$
• $L(E_1) \cap L(\tilde{A}^c_{E_2}) = \emptyset$
• $L(A_{E_1} \cap \tilde{A}^c_{E_2}) = \emptyset$

Bottom line: 2RPQ containment is PSPACE-complete.