Query answering is the most fundamental problem in DB Query Q Result - PowerPoint PPT Presentation

Query answering is the most fundamental problem in DB Query Q Result Q ( D ) Database D SELECT R.A, S.C R A B FROM R, S Q(D) A C u x WHERE R.B = S.B u y v x u z S B C v y x y v z x z

Three crucial problems for query answering Q(D) A C R A B SELECT R.A, S.C u x u y FROM R, S WHERE R.B = S.B v x u z S B C v y x y x z v z 1. Enumeration ( u , y ) , ( u , z ) , ( v , y ) , ( v , z ) 2. Uniform generation ( u , y ) ∶ 1 4 , ( u , z ) ∶ 1 4 , ( v , y ) ∶ 1 4 , ( v , z ) ∶ 1 4 3. Counting ∣ Q ( D )∣ = 4

In this paper, we study log-space complexity classes We consider the class RelationNL and show that it has good algorithmic properties in terms of: Enumeration. Approximate counting. Approximate uniform generation. We consider the subclass RelationUL and show that it has better algorithmic properties in terms of: Constant delay enumeration (polynomial time preprocessing). Exact counting. Exact uniform generation. We show applications of these results in information extraction, graph databases, and among others.

Efficient log-space classes for enumeration, counting, and uniform generation Marcelo Arenas Luis Alberto Croqueville Cristian Riveros Rajesh Jayaram PUC & IMFD Chile Carnegie Mellon University

Outline The class RelationNL FPRAS for RelationNL Conclusions

Relations as instances of problems Let Σ be a finite alphabet. Definitions A problem is a relation R ⊆ Σ ∗ × Σ ∗ . If ( x , y ) ∈ R , then x is an input and y is a solution . We restrict to p -relations R where for every ( x , y ) ∈ Σ ∗ × Σ ∗ : 1. if ( x , y ) ∈ R , then y is of polynomial size with respect to x . 2. ( x , y ) ∈ R can be verified in polynomial time.

Three main problems associated to a p -relation Given an input x we denote by W R ( x ) the set of solutions or witnesses : W R ( x ) = { y ∈ Σ ∗ ∣ ( x , y ) ∈ R } Problem: Enum ( R ) A word x ∈ Σ ∗ Input: Output: Enumerate all y ∈ W R ( x ) without repetitions Problem: Count ( R ) Input: A word x ∈ Σ ∗ Output: The size ∣ W R ( x )∣ Problem: Gen ( R ) A word x ∈ Σ ∗ Input: Output: Generate uniformly at random a word in W R ( x ) .

A log-space complexity class: RelationNL Non-deterministic NL transducer M q 3 ⋱ q 2 q n q 1 q 0 Read/Write ⊢ W O R K log-space Read only ⊢ I N P U T T A P E Write only ⊢ O U T P U T T A P E

A log-space complexity class: RelationNL Given an NL-transducer M and an input x , we define its set of outputs : M ( x ) = { y ∈ Σ ∗ ∣ there exists a run of M on x that halts in an accepting state with y in the output } Definition of RelationNL A relation R is in RelationNL iff there exists an NL-transducer M s.t.: R = {( x , y ) ∈ Σ ∗ × Σ ∗ ∣ y ∈ M ( x )}

Main results for RelationNL Theorem If R ∈ RelationNL then: 1. Enum ( R ) can be solved with polinomial delay . 2. Count ( R ) admits an FPRAS (fully polynomial-time randomized approximation scheme). 3. Gen ( R ) admits a polynomial time “Las Vegas” uniform generator . We introduce a subclass RelationUL that has good properties w.r.t. constant delay enumeration, exact counting, and uniform gen.

A complete problem for RelationNL a b a b 000000000 ⋯ 00 a r s t �ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ�ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ� n b How many words of length n are accepted by a non-deterministic finite state automaton (NFA)?

A complete problem for RelationNL a b a b 000000000 ⋯ 00 a r s t �ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ�ÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜÜ� n b Problem: # NFA A NFA A = ( Q , Σ , ∆ , q 0 , F ) and 0 n . Input: Output: ∣{ w ∣ w ∈ L(A) and ∣ w ∣ = n }∣ . Proposition For every R ∈ RelationNL , there exists a parsimonious reduction from Count ( R ) to # NFA . If we find an FPRAS for # NFA , we have an FPRAS for every R ∈ RelationNL .

Main ideas of FPRAS: Unfold the NFA until level n n -levels a a a a a ⋯ a r r 0 r 1 r 2 r 3 r n a a a a a a b b b b b s 0 s 1 s 2 s 3 ⋯ s n s b b b b b b b b b b b b b ⋯ a t 0 t 1 t 2 t 3 t n t a a a a a

Main ideas of FPRAS: Unfold the NFA until level n n -levels a a a a ⋯ a r r 0 r 1 r 2 r 3 r n a a a a a b b b s 1 s 2 s 3 ⋯ s b b b b b b b b b ⋯ a t 2 t 3 t n t a a a The problem is reduced to approximate the number of label-paths from the initial state to the final states.

Main ideas of FPRAS: languages at level k Level- k r k ⋯ . . . ⋯ s k ⋯ ⋯ ⋯ t k Let Q k be the set of states at level k . For each P ⊆ Q k : L( P ) = all words that reach any state in P from the initial state. We want to approximate the size ∣L( P )∣ for any P ⊆ Q k . . . . we want to approximate ∣L( F )∣ where F ⊆ Q n .

Main ideas of FPRAS: a sketch for each level Level- k For every q ∈ Q k r k ⋯ . . . N ( q ) ∶ N ( q ) ∼ ∣ L ( q )∣ an ( 1 ± ǫ ) -approximation. ⋯ s k ⋯ S ( q ) ∶ S ( q ) ⊆ L ( q ) uniform sample of poly-size. ⋯ ⋯ t k For every P ⊆ Q k and for any total order < of P : ∣ L ( q )∣ ⋅ ∣ L ( q ) / L ({ p ∈ P ∣ p < q })∣ ∣ L ( P )∣ = ∑ ∣ L ( q )∣ q ∈ P N ( q ) ⋅ ∣ S ( q ) / L ({ p ∈ P ∣ p < q })∣ ∼ ∑ ∣ S ( q )∣ q ∈ P This approximation can be computed in poly-time from N ( q ) and S ( q )

Main ideas of FPRAS: a sketch for each level Level- k For every q ∈ Q k ⋯ r k . . . N ( q ) ∶ N ( q ) ∼ ∣L( q )∣ an ( 1 ± ǫ ) -approximation. ⋯ ⋯ s k S ( q ) ∶ S ( q ) ⊆ L( q ) uniform sample of poly-size. ⋯ ⋯ t k For every P ⊆ Q k and for any total order < of P : N ( q ) ⋅ ∣ S ( q ) / L({ p ∈ P ∣ p < q })∣ ∣L( P )∣ ∼ N ( P ) = ∑ ∣ S ( q )∣ q ∈ P For every P ⊆ Q k and q ∈ Q k − P (by Hoeffding’s inequality): ∣ ∣ S ( q ) / L( P )∣ − ∣L( q ) / L( P )∣ ∣ ≤ ǫ ∣ S ( q )∣ ∣L( q )∣ with (exponentially) high prob.

Main ideas of FPRAS: update the sketch to the next level Level- k + 1 Level- k For every q ∈ Q k ⋯ a r k r k + 1 . . . N ( q ) ∶ N ( q ) ∼ ∣L( q )∣ a an ( 1 ± ǫ ) -approximation. ⋯ ⋯ b s k s k + 1 S ( q ) ∶ S ( q ) ⊆ L( q ) b b uniform sample of poly-size. ⋯ ⋯ t k t k + 1 a For every q ∈ Q k + 1 let P c = { p ∈ Q k ∣ ( p , c , q ) ∈ ∆ } for c ∈ { a , b } : N ( q ) = N ( P a ) + N ( P b ) To generate S ( q ) we use a technique from Jerrum, Valiant, and Vazirani for generating a uniform sample by using the ( 1 ± ǫ ) -approximations: { N ( P )} P ⊆ Q k ′ for every k ′ ≤ k .

Conclusions and future work 1. We provide complexity classes that has good properties in terms of enumeration , counting , and uniform generation . 2. RelationNL is the first complexity class with a simple definition based on TM and where each problem admits an FPRAS . Future work: 1. Find an FPRAS for #NFA that can be used in practice with better polynomial factors and constants. 2. Find an FPRAS for #CFG. Thanks!

Query answering is the most fundamental problem in DB Query Q Result - PowerPoint PPT Presentation

Query answering is the most fundamental problem in DB Query Q Result Q ( D ) Database D SELECT R.A, S.C R A B FROM R, S Q(D) A C u x WHERE R.B = S.B u y v x u z S B C v y x y v z x z Three crucial problems for query

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Answering Queries Using Answering Queries Using Materialized view: result set is stored

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

query answering with description logic ontologies Meghyn Bienvenu ( CNRS & Universit de

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Query Answering and Rewriting in Ontology-based Data Access Riccardo Rosati DIAG, Sapienza

The Combined Approach to Query Answering in Horn-ALCHOIQ David Carral, Irina Dragoste, Markus

ontology-mediated query answering Harnessing knowledge to get more from data Meghyn Bienvenu (

Question Answering What is Ques+on Answering? Dan Jurafsky Ques%on

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

Answering Queries from Statistics and Probabilistic Views Nilesh Dalvi and Dan Suciu, University

A Generic Mapping-based Query Translation A Generic Mapping-based Query Translation from SPARQL

Information Retrieval > Query Us User er Query Words Query Words Search Personalization

Tamari lattice, Intervals, and Enumeration Guillaume Chapuy, LIAFA (Paris) Mireille Bousquet-M

COUNTEREXAMPLE-GUIDED MODEL SYNTHESIS Mathias Preiner , Aina Niemetz and Armin

Exact inference (Ch. 14) Bayesian Network A Bayesian network (Bayes net) is: (1) a directed

ENUMERATION TYPES An enumeration type is a type whose values are defined by a list of named values

Deriving Types I COMP 1002/1402 Derived Types 1 The Type Definition typedef complex types

Structure Structures group multiple (heterogeneous) Structures, Unions, and variables

: Enumeration Type Enumeration Type Week Week 11 11: Monchai Sopitkamol, Ph.D. Monchai

: Enumeration Type Enumeration Type yp yp Week 11 Week 11: Monchai Sopitkamol, Ph.D.

Query answering is the most fundamental problem in DB Query Q Result - PowerPoint PPT Presentation

Query answering is the most fundamental problem in DB Query Q Result Q ( D ) Database D SELECT R.A, S.C R A B FROM R, S Q(D) A C u x WHERE R.B = S.B u y v x u z S B C v y x y v z x z Three crucial problems for query

Improve Query Performance with the Query Log Analyzer Kees Vegter Field Engineer Query Log

Answering Queries Using Answering Queries Using Materialized view: result set is stored

Query Execution 2 and Query Optimization Instructor: Matei Zaharia cs245.stanford.edu Query

query answering with description logic ontologies Meghyn Bienvenu ( CNRS &amp; Universit de

Query Processing Relevance feedback; query expansion; Web Search 1 Overview Indexes Query

Query Answering and Rewriting in Ontology-based Data Access Riccardo Rosati DIAG, Sapienza

The Combined Approach to Query Answering in Horn-ALCHOIQ David Carral, Irina Dragoste, Markus

ontology-mediated query answering Harnessing knowledge to get more from data Meghyn Bienvenu (

Question Answering What is Ques+on Answering? Dan Jurafsky Ques%on

Query Understanding: A Manifesto Daniel Tunkelang queryunderstanding.com Overview What is

Perfect Query FORMULA 5 critical sections in every successful query letter (c) 2019

Query Op)miza)on 1 Query op)miza)on Given an SQL query,

CS4224/CS5424 Lecture 9 Distributed Query Processing Query Processing Translates query into a

Answering Queries from Statistics and Probabilistic Views Nilesh Dalvi and Dan Suciu, University

A Generic Mapping-based Query Translation A Generic Mapping-based Query Translation from SPARQL

Information Retrieval &gt; Query Us User er Query Words Query Words Search Personalization

Tamari lattice, Intervals, and Enumeration Guillaume Chapuy, LIAFA (Paris) Mireille Bousquet-M

COUNTEREXAMPLE-GUIDED MODEL SYNTHESIS Mathias Preiner , Aina Niemetz and Armin

Exact inference (Ch. 14) Bayesian Network A Bayesian network (Bayes net) is: (1) a directed

ENUMERATION TYPES An enumeration type is a type whose values are defined by a list of named values

Deriving Types I COMP 1002/1402 Derived Types 1 The Type Definition typedef complex types

Structure Structures group multiple (heterogeneous) Structures, Unions, and variables

: Enumeration Type Enumeration Type Week Week 11 11: Monchai Sopitkamol, Ph.D. Monchai

: Enumeration Type Enumeration Type yp yp Week 11 Week 11: Monchai Sopitkamol, Ph.D.

query answering with description logic ontologies Meghyn Bienvenu ( CNRS & Universit de

Information Retrieval > Query Us User er Query Words Query Words Search Personalization