Probabilistic Query Evaluation on Bounded-Treewidth Instances
SIGMOD/PODS Ph.D. Symposium, June 26, 2016, San Francisco
Mikaël Monet, supervised by Pierre Senellart
Context
Boolean queries (yes/no) on relational instances
We want the answer to contain more information than just "yes/no":
  Add uncertainty
  Obtain provenance information
We need restrictions for all of this to be tractable
A probabilistic database
TID (tuple-independent database) model: each fact is present independently of the others, with the probability listed next to it.

R:
  a d    0.2
  f c    0.9
  a e    0.7
  c e    0.23

S:
  d e    0.005
  f e    0.7
  d a    0.13
  b e    0.81

Q:
  c e f  0.66
Probability of a possible world
A possible world I of the probabilistic database keeps the facts S(f, e), S(d, a), R(c, e) and Q(c, e, f), and drops all the others.
Probability Pr(I) of this possible world = 0.7 * 0.13 * 0.23 * 0.66 * (1 - 0.2) * (1 - 0.81) * (1 - 0.005) * (1 - 0.9) * (1 - 0.7)
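The product above can be checked with a few lines of Python. This is only a sketch: the dictionary `tid`, the function `world_probability` and the tuple encoding of facts are illustrative names, and the assignment of rows to R and S is an assumption read off the table layout (the numeric result does not depend on it).

```python
# Each fact of the example database maps to its probability (TID model).
# The split of rows between R and S is an assumption about the table layout.
tid = {
    ("R", "a", "d"): 0.2,
    ("R", "f", "c"): 0.9,
    ("R", "a", "e"): 0.7,
    ("R", "c", "e"): 0.23,
    ("S", "d", "e"): 0.005,
    ("S", "f", "e"): 0.7,
    ("S", "d", "a"): 0.13,
    ("S", "b", "e"): 0.81,
    ("Q", "c", "e", "f"): 0.66,
}


def world_probability(tid, world):
    """Multiply p for every kept fact and (1 - p) for every dropped fact."""
    result = 1.0
    for fact, p in tid.items():
        result *= p if fact in world else 1.0 - p
    return result


# The possible world I from the slide: four facts kept, five dropped.
world = {("S", "f", "e"), ("S", "d", "a"), ("R", "c", "e"), ("Q", "c", "e", "f")}
pr_world = world_probability(tid, world)
print(pr_world)  # roughly 6.27e-05
```

Independence of the facts is what allows the probability of the world to be written as a plain product.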
Probabilistic query evaluation (PQE)
Focus on Boolean queries (yes/no)
Probability of a query Q on probabilistic instance 𝖀: P(Q) = Σ_{J ⊆ 𝖀, J ⊨ Q} Pr(J)
Problem: in general #P-hard
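The definition of P(Q) can be turned directly into a brute-force procedure that enumerates all possible worlds. The sketch below does this for the example database; the Boolean query (∃x,y,z R(x,y) ∧ S(y,z)) and all identifiers are hypothetical choices made for illustration, and the split of rows between R and S is an assumption about the table layout. The loop visits 2^n worlds for n facts, which is exactly why this does not scale.

```python
from itertools import product

# Example TID instance; the split of rows between R and S is an assumption.
tid = {
    ("R", "a", "d"): 0.2, ("R", "f", "c"): 0.9,
    ("R", "a", "e"): 0.7, ("R", "c", "e"): 0.23,
    ("S", "d", "e"): 0.005, ("S", "f", "e"): 0.7,
    ("S", "d", "a"): 0.13, ("S", "b", "e"): 0.81,
    ("Q", "c", "e", "f"): 0.66,
}


def query(world):
    """Hypothetical Boolean query: exists x, y, z with R(x, y) and S(y, z)."""
    r_facts = [f for f in world if f[0] == "R"]
    s_facts = [f for f in world if f[0] == "S"]
    return any(r[2] == s[1] for r in r_facts for s in s_facts)


def pqe_bruteforce(tid, query):
    """Sum Pr(J) over every possible world J that satisfies the query."""
    facts = list(tid)
    total = 0.0
    for kept in product([False, True], repeat=len(facts)):
        world = {f for f, k in zip(facts, kept) if k}
        if query(world):
            pr = 1.0
            for f, k in zip(facts, kept):
                pr *= tid[f] if k else 1.0 - tid[f]
            total += pr
    return total


p_q = pqe_bruteforce(tid, query)
print(p_q)
```

For this particular query the result can be cross-checked by hand: the query holds exactly when R(a, d) is present together with S(d, e) or S(d, a), so P(Q) = 0.2 * (1 - 0.995 * 0.87).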
3 possible directions
  Approximate
  Restrict queries
  Restrict instances
1) Approximate probability computation
Monte-Carlo sampling
Drawback: the running time is quadratic in the desired precision (on the order of 1/ε² samples for an additive error ε)
⇒ Not adequate for low probabilities, where ε has to be smaller than the probability being estimated.
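A minimal sketch of Monte-Carlo sampling for PQE, under stated assumptions: the small instance, the query and every identifier below are hypothetical, and `samples_needed` uses the standard Hoeffding bound as one common way to pick the sample size (the slides do not say which bound is meant). It makes the quadratic cost visible: halving ε multiplies the number of samples by four.

```python
import math
import random

# Hypothetical three-fact TID instance taken from the example database.
tid = {("R", "a", "d"): 0.2, ("S", "d", "e"): 0.005, ("S", "d", "a"): 0.13}


def query(world):
    """Hypothetical Boolean query: exists x, y, z with R(x, y) and S(y, z)."""
    return any(
        r[0] == "R" and s[0] == "S" and r[2] == s[1]
        for r in world for s in world
    )


def samples_needed(eps, delta):
    """Hoeffding bound: samples for additive error eps with probability 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def monte_carlo_pqe(tid, query, n_samples, rng):
    """Estimate P(Q) as the fraction of sampled worlds that satisfy the query."""
    hits = 0
    for _ in range(n_samples):
        # Sample a world by keeping each fact independently with its probability.
        world = {fact for fact, p in tid.items() if rng.random() < p}
        hits += query(world)
    return hits / n_samples


rng = random.Random(0)  # fixed seed so that runs are repeatable
estimate = monte_carlo_pqe(tid, query, 50_000, rng)
print(estimate, samples_needed(0.01, 0.05), samples_needed(0.005, 0.05))
```

The exact value here is 0.2 * (1 - 0.995 * 0.87), about 0.027, so an additive error of 0.01 is already a large relative error; this is the sense in which sampling is a poor fit for low probabilities.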
2) Restricting the class of queries
[Dalvi and Suciu 2012] shows the following dichotomy for any union of conjunctive queries (UCQ) Q:
  Either PQE for Q is in PTIME on all instances
  Or PQE for Q is #P-hard on all instances
The simple conjunctive query ∃x,y R(x) ∧ S(x,y) ∧ T(y) is already #P-hard!
This criterion is too crisp
3) Restricting the shape of the instances
Bound the treewidth of instances by a constant
Treewidth: a measure of how far a graph is from being a tree
Treewidth
[Figure: the example instance with relations R, S and Q]
Divide and conquer!
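To make the notion concrete, the sketch below checks a candidate tree decomposition of the example instance against the usual three conditions (the bags form a tree, every fact fits in some bag, and the bags containing a given element are connected). The particular bags are an assumption chosen for illustration, not taken from the slides, and all function names are hypothetical. The width is the largest bag size minus one.

```python
def connected(nodes, edges):
    """True if `nodes` induce a connected subgraph of the graph given by `edges`."""
    nodes = set(nodes)
    if not nodes:
        return True
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        for u, v in edges:
            if u == n and v in nodes:
                stack.append(v)
            elif v == n and u in nodes:
                stack.append(u)
    return seen == nodes


def is_tree_decomposition(facts, bags, tree_edges):
    # The bags must be arranged in a tree: connected, one edge fewer than nodes.
    if len(tree_edges) != len(bags) - 1 or not connected(bags, tree_edges):
        return False
    # Every fact must fit entirely inside at least one bag.
    if not all(any(set(fact) <= bag for bag in bags.values()) for fact in facts):
        return False
    # For every element, the bags that contain it must form a connected subtree.
    elements = set().union(*bags.values())
    return all(
        connected([b for b, bag in bags.items() if x in bag], tree_edges)
        for x in elements
    )


# Facts of the example instance, with relation names dropped.
facts = [("a", "d"), ("f", "c"), ("a", "e"), ("c", "e"), ("d", "e"),
         ("f", "e"), ("d", "a"), ("b", "e"), ("c", "e", "f")]
# Assumed decomposition: three bags arranged on a path 0 - 1 - 2.
bags = {0: {"a", "d", "e"}, 1: {"c", "e", "f"}, 2: {"b", "e"}}
tree_edges = [(0, 1), (1, 2)]

valid = is_tree_decomposition(facts, bags, tree_edges)
width = max(len(bag) for bag in bags.values()) - 1
print(valid, width)
```

Each bag is small, so work can be done bag by bag and combined along the tree, which is the divide-and-conquer idea.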
Overview of the approach
  Instance I of treewidth k -> tree decomposition T: O(EXP(k).|I|)
  Query q (MSO), int k -> automaton A: O(f(q,k))
  Tree decomposition T, automaton A -> provenance circuit C: O(|A|.|T|)
  Provenance circuit C -> probability
Two steps of this pipeline are marked with a "?".
Provenance circuit C of query Q on instance I
Boolean circuit (AND, OR, NOT gates)
Inputs = the facts of I
For every ν: I → {true, false}, ν(I) ⊨ Q iff ν(C) = 1, where ν(I) denotes the set of facts of I that ν maps to true
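The defining property of a provenance circuit can be checked exhaustively on a tiny instance. In the sketch below, the three-fact instance, the query (∃x,y,z R(x,y) ∧ S(y,z)), the hand-built circuit and the nested-tuple encoding of gates are all assumptions made for illustration; none of them comes from the slides.

```python
from itertools import product

# Hypothetical instance I with three facts.
facts = [("R", "a", "d"), ("S", "d", "e"), ("S", "d", "a")]

# Hand-built circuit for the query below: R(a,d) AND (S(d,e) OR S(d,a)).
circuit = ("and", ("var", facts[0]),
           ("or", ("var", facts[1]), ("var", facts[2])))


def evaluate(gate, nu):
    """Evaluate a circuit of var/not/and/or gates under the valuation nu."""
    kind = gate[0]
    if kind == "var":
        return nu[gate[1]]
    if kind == "not":
        return not evaluate(gate[1], nu)
    values = [evaluate(g, nu) for g in gate[1:]]
    if kind == "and":
        return all(values)
    if kind == "or":
        return any(values)
    raise ValueError(f"unknown gate kind: {kind}")


def query(world):
    """Hypothetical Boolean query: exists x, y, z with R(x, y) and S(y, z)."""
    return any(
        r[0] == "R" and s[0] == "S" and r[2] == s[1]
        for r in world for s in world
    )


# Check nu(I) |= Q iff nu(C) = 1 for all 2^3 valuations nu.
agree = all(
    evaluate(circuit, dict(zip(facts, bits)))
    == query({f for f, b in zip(facts, bits) if b})
    for bits in product([False, True], repeat=len(facts))
)
print(agree)
```

Once such a circuit is available, the probability of the query is the probability that the circuit evaluates to true when each input is set independently with the probability of its fact.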
Tree automata
A bottom-up deterministic tree automaton on {a, b}-trees is a tuple A = (Q, F, 𝛋, 𝛆) where:
  Q: finite set of states
  F ⊆ Q: accepting states
  𝛋: {a, b} → Q, determining the state for the leaves
  𝛆: {a, b} × Q² → Q, determining the state for internal nodes
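The definition translates almost directly into code. The sketch below runs a bottom-up deterministic automaton on binary {a, b}-trees; the concrete automaton (it accepts trees that contain at least one node labeled a), the tuple encoding of trees and the names `INIT` and `DELTA` (standing for the leaf function and the transition function) are assumptions for illustration, not the automaton shown on the slides.

```python
from itertools import product

STATES = {"no_a", "has_a"}   # Q: finite set of states
ACCEPTING = {"has_a"}        # F: accepting states
# Leaf function: the state assigned to a leaf depends only on its label.
INIT = {"a": "has_a", "b": "no_a"}
# Transition function: (label, state of left child, state of right child) -> state.
DELTA = {
    (lab, q1, q2): "has_a" if lab == "a" or "has_a" in (q1, q2) else "no_a"
    for lab, q1, q2 in product("ab", STATES, STATES)
}


def run(tree):
    """Return the state reached at the root; a tree is (label,) or (label, left, right)."""
    if len(tree) == 1:
        return INIT[tree[0]]
    label, left, right = tree
    return DELTA[(label, run(left), run(right))]


def accepts(tree):
    """A tree is in the language of the automaton if its root state is accepting."""
    return run(tree) in ACCEPTING


tree_with_a = ("b", ("b",), ("b", ("a",), ("b",)))
tree_without_a = ("b", ("b",), ("b",))
print(accepts(tree_with_a), accepts(tree_without_a))  # True False
```

Because the automaton is deterministic, each node receives exactly one state, so the run takes time linear in the size of the tree.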
Run of an automaton on a tree
[Figure: an automaton with three states (Q), one accepting state (F), a leaf function 𝛋 defined on the labels a and b, and a transition table 𝛆 with columns lab, q1, q2, out, run step by step on a tree]
  Initialization of the leaves
  Internal nodes
  And so on…
  This tree is in the language of A
Major drawbacks
In general, computing the automaton has non-elementary complexity in the query
Exponential dependence on the instance treewidth