query answering is the most fundamental problem in db
play

Query answering is the most fundamental problem in DB Query Q Result - PowerPoint PPT Presentation

Query answering is the most fundamental problem in DB Query Q Result Q ( D ) Database D SELECT R.A, S.C R A B FROM R, S Q(D) A C u x WHERE R.B = S.B u y v x u z S B C v y x y v z x z Three crucial problems for query


  1. Query answering is the most fundamental problem in DB Query Q Result Q ( D ) Database D SELECT R.A, S.C R A B FROM R, S Q(D) A C u x WHERE R.B = S.B u y v x u z S B C v y x y v z x z

  2. Three crucial problems for query answering Q(D) A C R A B SELECT R.A, S.C u x u y FROM R, S WHERE R.B = S.B v x u z S B C v y x y x z v z 1. Enumeration ( u , y ) , ( u , z ) , ( v , y ) , ( v , z ) 2. Uniform generation ( u , y ) ∶ 1 4 , ( u , z ) ∶ 1 4 , ( v , y ) ∶ 1 4 , ( v , z ) ∶ 1 4 3. Counting ∣ Q ( D )∣ = 4

  3. In this paper, we study log-space complexity classes We consider the class RelationNL and show that it has good algorithmic properties in terms of: Enumeration. Approximate counting. Approximate uniform generation. We consider the subclass RelationUL and show that it has better algorithmic properties in terms of: Constant delay enumeration (polynomial time preprocessing). Exact counting. Exact uniform generation. We show applications of these results in information extraction, graph databases, and among others.

  4. Efficient log-space classes for enumeration, counting, and uniform generation Marcelo Arenas Luis Alberto Croqueville Cristian Riveros Rajesh Jayaram PUC & IMFD Chile Carnegie Mellon University

  5. Outline The class RelationNL FPRAS for RelationNL Conclusions

  6. Outline The class RelationNL FPRAS for RelationNL Conclusions

  7. Relations as instances of problems Let Σ be a finite alphabet. Definitions A problem is a relation R ⊆ Σ ∗ × Σ ∗ . If ( x , y ) ∈ R , then x is an input and y is a solution . We restrict to p -relations R where for every ( x , y ) ∈ Σ ∗ × Σ ∗ : 1. if ( x , y ) ∈ R , then y is of polynomial size with respect to x . 2. ( x , y ) ∈ R can be verified in polynomial time.

  8. Three main problems associated to a p -relation Given an input x we denote by W R ( x ) the set of solutions or witnesses : W R ( x ) = { y ∈ Σ ∗ ∣ ( x , y ) ∈ R } Problem: Enum ( R ) A word x ∈ Σ ∗ Input: Output: Enumerate all y ∈ W R ( x ) without repetitions Problem: Count ( R ) Input: A word x ∈ Σ ∗ Output: The size ∣ W R ( x )∣ Problem: Gen ( R ) A word x ∈ Σ ∗ Input: Output: Generate uniformly at random a word in W R ( x ) .

  9. A log-space complexity class: RelationNL Non-deterministic NL transducer M q 3 ⋱ q 2 q n q 1 q 0 Read/Write ⊢ W O R K log-space Read only ⊢ I N P U T T A P E Write only ⊢ O U T P U T T A P E

  10. A log-space complexity class: RelationNL Given an NL-transducer M and an input x , we define its set of outputs : M ( x ) = { y ∈ Σ ∗ ∣ there exists a run of M on x that halts in an accepting state with y in the output } Definition of RelationNL A relation R is in RelationNL iff there exists an NL-transducer M s.t.: R = {( x , y ) ∈ Σ ∗ × Σ ∗ ∣ y ∈ M ( x )}

  11. Main results for RelationNL Theorem If R ∈ RelationNL then: 1. Enum ( R ) can be solved with polinomial delay . 2. Count ( R ) admits an FPRAS (fully polynomial-time randomized approximation scheme). 3. Gen ( R ) admits a polynomial time “Las Vegas” uniform generator . We introduce a subclass RelationUL that has good properties w.r.t. constant delay enumeration, exact counting, and uniform gen.

  12. Outline The class RelationNL FPRAS for RelationNL Conclusions

  13. A complete problem for RelationNL a b a b 000000000 ⋯ 00 a r s t �ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ�ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ� n b How many words of length n are accepted by a non-deterministic finite state automaton (NFA)?

  14. A complete problem for RelationNL a b a b 000000000 ⋯ 00 a r s t �ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ�ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ� n b Problem: # NFA A NFA A = ( Q , Σ , ∆ , q 0 , F ) and 0 n . Input: Output: ∣{ w ∣ w ∈ L(A) and ∣ w ∣ = n }∣ . Proposition For every R ∈ RelationNL , there exists a parsimonious reduction from Count ( R ) to # NFA . If we find an FPRAS for # NFA , we have an FPRAS for every R ∈ RelationNL .

  15. Main ideas of FPRAS: Unfold the NFA until level n n -levels a a a a a ⋯ a r r 0 r 1 r 2 r 3 r n a a a a a a b b b b b s 0 s 1 s 2 s 3 ⋯ s n s b b b b b b b b b b b b b ⋯ a t 0 t 1 t 2 t 3 t n t a a a a a

  16. Main ideas of FPRAS: Unfold the NFA until level n n -levels a a a a a ⋯ a r r 0 r 1 r 2 r 3 r n a a a a a a b b b b b s 0 s 1 s 2 s 3 ⋯ s n s b b b b b b b b b b b b b ⋯ a t 0 t 1 t 2 t 3 t n t a a a a a

  17. Main ideas of FPRAS: Unfold the NFA until level n n -levels a a a a ⋯ a r r 0 r 1 r 2 r 3 r n a a a a a b b b s 1 s 2 s 3 ⋯ s b b b b b b b b b ⋯ a t 2 t 3 t n t a a a The problem is reduced to approximate the number of label-paths from the initial state to the final states.

  18. Main ideas of FPRAS: languages at level k Level- k r k ⋯ . . . ⋯ s k ⋯ ⋯ ⋯ t k Let Q k be the set of states at level k . For each P ⊆ Q k : L( P ) = all words that reach any state in P from the initial state. We want to approximate the size ∣L( P )∣ for any P ⊆ Q k . . . . we want to approximate ∣L( F )∣ where F ⊆ Q n .

  19. Main ideas of FPRAS: a sketch for each level Level- k For every q ∈ Q k r k ⋯ . . . N ( q ) ∶ N ( q ) ∼ ∣ L ( q )∣ an ( 1 ± ǫ ) -approximation. ⋯ s k ⋯ S ( q ) ∶ S ( q ) ⊆ L ( q ) uniform sample of poly-size. ⋯ ⋯ t k For every P ⊆ Q k and for any total order < of P : ∣ L ( q )∣ ⋅ ∣ L ( q ) / L ({ p ∈ P ∣ p < q })∣ ∣ L ( P )∣ = ∑ ∣ L ( q )∣ q ∈ P N ( q ) ⋅ ∣ S ( q ) / L ({ p ∈ P ∣ p < q })∣ ∼ ∑ ∣ S ( q )∣ q ∈ P This approximation can be computed in poly-time from N ( q ) and S ( q )

  20. Main ideas of FPRAS: a sketch for each level Level- k For every q ∈ Q k ⋯ r k . . . N ( q ) ∶ N ( q ) ∼ ∣L( q )∣ an ( 1 ± ǫ ) -approximation. ⋯ ⋯ s k S ( q ) ∶ S ( q ) ⊆ L( q ) uniform sample of poly-size. ⋯ ⋯ t k For every P ⊆ Q k and for any total order < of P : N ( q ) ⋅ ∣ S ( q ) / L({ p ∈ P ∣ p < q })∣ ∣L( P )∣ ∼ N ( P ) = ∑ ∣ S ( q )∣ q ∈ P For every P ⊆ Q k and q ∈ Q k − P (by Hoeffding’s inequality): ∣ ∣ S ( q ) / L( P )∣ − ∣L( q ) / L( P )∣ ∣ ≤ ǫ ∣ S ( q )∣ ∣L( q )∣ with (exponentially) high prob.

  21. Main ideas of FPRAS: update the sketch to the next level Level- k + 1 Level- k For every q ∈ Q k ⋯ a r k r k + 1 . . . N ( q ) ∶ N ( q ) ∼ ∣L( q )∣ a an ( 1 ± ǫ ) -approximation. ⋯ ⋯ b s k s k + 1 S ( q ) ∶ S ( q ) ⊆ L( q ) b b uniform sample of poly-size. ⋯ ⋯ t k t k + 1 a For every q ∈ Q k + 1 let P c = { p ∈ Q k ∣ ( p , c , q ) ∈ ∆ } for c ∈ { a , b } : N ( q ) = N ( P a ) + N ( P b ) To generate S ( q ) we use a technique from Jerrum, Valiant, and Vazirani for generating a uniform sample by using the ( 1 ± ǫ ) -approximations: { N ( P )} P ⊆ Q k ′ for every k ′ ≤ k .

  22. Outline The class RelationNL FPRAS for RelationNL Conclusions

  23. Conclusions and future work 1. We provide complexity classes that has good properties in terms of enumeration , counting , and uniform generation . 2. RelationNL is the first complexity class with a simple definition based on TM and where each problem admits an FPRAS . Future work: 1. Find an FPRAS for #NFA that can be used in practice with better polynomial factors and constants. 2. Find an FPRAS for #CFG. Thanks!

Recommend


More recommend