Probabilistic Query Evaluation on Bounded-Treewidth Instances


  1. Probabilistic Query Evaluation on Bounded-Treewidth Instances. SIGMOD/PODS Ph.D. Symposium, June 26, 2016, San Francisco. Mikaël Monet, supervised by Pierre Senellart

  4. Context
     - Boolean queries (yes/no) on relational instances
     - We want the answer to contain more information than just "yes/no":
         - add uncertainty
         - obtain provenance information
     - We need restrictions for all of this to be tractable

  7. A probabilistic database in the TID (tuple-independent database) model: every fact is annotated with a probability, and facts are present independently of each other.

       R              S               Q
       a  d  0.2      d  e  0.005     c  e  f  0.66
       f  c  0.9      f  e  0.7
       a  e  0.7      d  a  0.13
       c  e  0.23
       b  e  0.81

  10. Probability of a possible world
     A possible world I of the instance above keeps the facts S(f, e), S(d, a), R(c, e) and Q(c, e, f), and drops all the others. Its probability is the product of p for every kept fact and 1 - p for every dropped fact:
     Pr(I) = 0.7 × 0.13 × 0.23 × 0.66 × (1 - 0.2) × (1 - 0.81) × (1 - 0.005) × (1 - 0.9) × (1 - 0.7)
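
The possible-world probability above can be checked with a short sketch. The encoding is an assumption made for illustration and is not taken from the talk: each fact is a (relation, tuple) pair mapped to its probability. Under the TID model, the probability of a world is a plain product over all facts.

```python
from math import prod

# Example TID instance from the slides: fact -> probability.
# Facts are (relation, tuple-of-constants); this encoding is illustrative.
TID = {
    ("R", ("a", "d")): 0.2,
    ("R", ("f", "c")): 0.9,
    ("R", ("a", "e")): 0.7,
    ("R", ("c", "e")): 0.23,
    ("R", ("b", "e")): 0.81,
    ("S", ("d", "e")): 0.005,
    ("S", ("f", "e")): 0.7,
    ("S", ("d", "a")): 0.13,
    ("Q", ("c", "e", "f")): 0.66,
}


def world_probability(tid, world):
    """Pr(world) under the TID model: facts are independent, so multiply
    p for every fact kept in the world and 1 - p for every dropped fact."""
    if not world.issubset(tid):
        raise ValueError("world contains facts that are not in the instance")
    return prod(p if fact in world else 1 - p for fact, p in tid.items())


# The possible world I from the slide.
I = {("S", ("f", "e")), ("S", ("d", "a")), ("R", ("c", "e")), ("Q", ("c", "e", "f"))}
print(world_probability(TID, I))  # approximately 6.27e-05
```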

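  13. Probabilistic query evaluation (PQE)
     - Focus on Boolean queries (yes/no)
     - Probability of a query Q on a probabilistic instance $\mathcal{I}$, i.e., the total probability of the possible worlds that satisfy Q: $P(Q) = \sum_{J \subseteq \mathcal{I},\ J \models Q} \Pr(J)$
     - Problem: PQE is #P-hard in general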
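
The definition of P(Q) translates directly into an exponential-time algorithm: enumerate all 2^n subsets of the n facts, keep those that satisfy Q, and sum their probabilities. The sketch below does this on the example instance. It uses a hypothetical query Q_ex = ∃x, y, z R(x, y) ∧ S(y, z), chosen only for illustration (it is not a query from the talk). This exhaustive enumeration is the cost that the tractability results in the rest of the talk try to avoid.

```python
from itertools import product
from math import prod

# Example TID instance from the slides (fact -> probability); encoding is illustrative.
TID = {
    ("R", ("a", "d")): 0.2, ("R", ("f", "c")): 0.9, ("R", ("a", "e")): 0.7,
    ("R", ("c", "e")): 0.23, ("R", ("b", "e")): 0.81,
    ("S", ("d", "e")): 0.005, ("S", ("f", "e")): 0.7, ("S", ("d", "a")): 0.13,
    ("Q", ("c", "e", "f")): 0.66,
}


def q_ex(world):
    """Hypothetical Boolean query: exists x, y, z such that R(x, y) and S(y, z)."""
    r_facts = [args for rel, args in world if rel == "R"]
    s_facts = [args for rel, args in world if rel == "S"]
    return any(y == y2 for (_, y) in r_facts for (y2, _) in s_facts)


def pqe_brute_force(tid, query):
    """P(Q) = sum of Pr(J) over all subinstances J with J |= Q.
    Enumerates all 2^n possible worlds, so it only works on tiny instances."""
    facts = list(tid)
    total = 0.0
    for keep in product([False, True], repeat=len(facts)):
        world = {f for f, k in zip(facts, keep) if k}
        if query(world):
            total += prod(tid[f] if k else 1 - tid[f] for f, k in zip(facts, keep))
    return total


print(pqe_brute_force(TID, q_ex))  # approximately 0.02687
```

Here only R(a, d) can join with an S-fact, so the result can also be checked by hand: 0.2 × (1 - 0.995 × 0.87) = 0.02687.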

  14. Three possible directions
     - Approximate
     - Restrict the queries
     - Restrict the instances

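  18. 1) Approximate probability computation
     - Monte-Carlo sampling
     - Drawback: the running time is quadratic in the desired precision (the number of samples grows as 1/ε² for an additive error ε)
     ⇒ Not adequate for low probabilities, since the additive error must then be much smaller than the probability being estimated.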
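
To make the quadratic cost concrete, consider Hoeffding's inequality. Estimating P(Q) within an additive error ε, with confidence 1 - δ, takes n = ⌈ln(2/δ) / (2ε²)⌉ independent samples. Ten times more precision therefore costs about a hundred times more samples. The sketch below is a generic Monte-Carlo estimator and not necessarily the one meant on the slide. It reuses the example instance and the hypothetical query Q_ex = ∃x, y, z R(x, y) ∧ S(y, z), whose exact probability is 0.02687.

```python
import math
import random

# Example TID instance from the slides (fact -> probability); encoding is illustrative.
TID = {
    ("R", ("a", "d")): 0.2, ("R", ("f", "c")): 0.9, ("R", ("a", "e")): 0.7,
    ("R", ("c", "e")): 0.23, ("R", ("b", "e")): 0.81,
    ("S", ("d", "e")): 0.005, ("S", ("f", "e")): 0.7, ("S", ("d", "a")): 0.13,
    ("Q", ("c", "e", "f")): 0.66,
}


def q_ex(world):
    """Hypothetical Boolean query: exists x, y, z such that R(x, y) and S(y, z)."""
    r_facts = [args for rel, args in world if rel == "R"]
    s_facts = [args for rel, args in world if rel == "S"]
    return any(y == y2 for (_, y) in r_facts for (y2, _) in s_facts)


def samples_needed(eps, delta):
    """Hoeffding bound: number of samples so that the estimate is within
    additive error eps of the true value with probability >= 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def monte_carlo_pqe(tid, query, eps, delta, rng):
    """Estimate P(Q) as the fraction of sampled possible worlds satisfying Q."""
    n = samples_needed(eps, delta)
    hits = 0
    for _ in range(n):
        # Under the TID model, each fact is kept independently with its probability.
        world = {fact for fact, p in tid.items() if rng.random() < p}
        if query(world):
            hits += 1
    return hits / n


# Ten times more precision costs about a hundred times more samples.
print(samples_needed(0.01, 0.05), samples_needed(0.001, 0.05))  # 18445 1844440
print(monte_carlo_pqe(TID, q_ex, eps=0.01, delta=0.05, rng=random.Random(0)))
```

With ε = 0.01, the guarantee is already looser than a third of the true value 0.02687. This is the low-probability problem from the slide.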

  23. 2) Restricting the class of queries
     - [Dalvi and Suciu 2012] shows the following dichotomy for any UCQ Q:
         - either PQE for Q is PTIME on all instances,
         - or PQE for Q is #P-hard (over the class of all instances).
     - The simple conjunctive query ∃x, y R(x), S(x, y), T(y) is already #P-hard!
     - The criterion is too crisp
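
For Boolean conjunctive queries without self-joins, Dalvi and Suciu's dichotomy takes a simple syntactic form: PQE is PTIME if the query is hierarchical and #P-hard otherwise. A query is hierarchical if, for any two variables, the sets of atoms containing them are either disjoint or nested. The sketch below checks this condition; its atom encoding is an assumption made for illustration. It confirms that the query on the slide is not hierarchical, since x occurs in {R, S} and y occurs in {S, T}.

```python
from itertools import combinations


def is_hierarchical(atoms):
    """atoms: list of (relation_name, tuple_of_variables) for a Boolean CQ
    without self-joins. True iff, for every pair of variables, the sets of
    atoms containing them are disjoint or one contains the other."""
    atoms_of = {}
    for i, (_, variables) in enumerate(atoms):
        for v in variables:
            atoms_of.setdefault(v, set()).add(i)
    for x, y in combinations(atoms_of, 2):
        ax, ay = atoms_of[x], atoms_of[y]
        if ax & ay and not (ax <= ay or ay <= ax):
            return False
    return True


# The query from the slide: exists x, y R(x), S(x, y), T(y).
h0 = [("R", ("x",)), ("S", ("x", "y")), ("T", ("y",))]
print(is_hierarchical(h0))  # False: #P-hard

# Dropping T(y) makes it hierarchical: atoms(y) = {S} is inside atoms(x) = {R, S}.
q_safe = [("R", ("x",)), ("S", ("x", "y"))]
print(is_hierarchical(q_safe))  # True: PTIME
```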

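  24. 3) Restricting the shape of the instances
     - Bound the treewidth of the instances by a constant
     - Treewidth: a measure of how far a graph is from being a tree (forests have treewidth at most 1). For a relational instance, it is the treewidth of the graph that connects constants occurring together in a fact.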

  25-30. Treewidth (figure: the example instance over R, S and Q, shown over several builds)
     Divide and conquer!
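
A tree decomposition of an instance is a tree of "bags" of constants satisfying two conditions. First, the constants of every fact appear together in some bag. Second, for each constant, the bags containing it form a connected subtree. The width of a decomposition is its largest bag size minus one, and the treewidth is the smallest width over all decompositions. The sketch below checks these conditions for one decomposition of the example instance, picked by hand for illustration (not necessarily the one drawn on the slides): the bags {a, d, e}, {c, e, f} and {b, e}. It has width 2. Because a, d and e form a triangle, no decomposition of width 1 exists, so this instance has treewidth 2.

```python
from collections import deque

FACTS = [
    ("R", ("a", "d")), ("R", ("f", "c")), ("R", ("a", "e")),
    ("R", ("c", "e")), ("R", ("b", "e")),
    ("S", ("d", "e")), ("S", ("f", "e")), ("S", ("d", "a")),
    ("Q", ("c", "e", "f")),
]

# Hand-picked (hypothetical) decomposition: bag id -> constants, plus tree edges.
BAGS = {0: {"a", "d", "e"}, 1: {"c", "e", "f"}, 2: {"b", "e"}}
TREE_EDGES = [(0, 1), (1, 2)]


def connected(nodes, adj):
    """BFS restricted to `nodes`; True iff they induce a connected subgraph."""
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in adj[queue.popleft()] & nodes:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == nodes


def is_tree_decomposition(facts, bags, tree_edges):
    adj = {b: set() for b in bags}
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    # The bags must form a tree: n - 1 edges and connected.
    if len(tree_edges) != len(bags) - 1 or not connected(set(bags), adj):
        return False
    # Every fact must be covered by a single bag.
    if not all(any(set(args) <= bag for bag in bags.values()) for _, args in facts):
        return False
    # For each constant, the bags containing it must be connected.
    constants = {c for _, args in facts for c in args}
    return all(connected({b for b, bag in bags.items() if c in bag}, adj)
               for c in constants)


width = max(len(bag) for bag in BAGS.values()) - 1
print(is_tree_decomposition(FACTS, BAGS, TREE_EDGES), width)  # True 2
```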

  31-40. Overview of the approach (diagram, built up over several slides)
     - Instance I of treewidth k → tree decomposition T, in O(exp(k)·|I|)
     - Query q (MSO) and integer k → tree automaton A, in O(f(q, k))
     - Automaton A and tree decomposition T → provenance circuit C, in O(|A|·|T|)
     - Provenance circuit C → probability
     The last two builds of the diagram add question marks next to some of these steps.

  44. Provenance circuit C of a query Q on an instance I
     - A Boolean circuit (AND, OR, NOT gates)
     - Its inputs are the facts of I
     - For every valuation ν : I → {true, false}, ν(I) ⊨ Q iff ν(C) = 1, where ν(I) is the subinstance of the facts that ν maps to true
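
Consider the hypothetical query Q_ex = ∃x, y, z R(x, y) ∧ S(y, z) on the example instance (an assumed query, not one from the talk). A provenance circuit for it is R(a, d) ∧ (S(d, e) ∨ S(d, a)), because R(a, d) is the only R-fact whose second constant is the first constant of an S-fact. The sketch below builds this circuit from input, AND, OR and NOT gates, using a tuple-based gate encoding chosen for illustration. It then checks the defining property by brute force over all 2^9 valuations ν.

```python
from itertools import product

FACTS = [
    ("R", ("a", "d")), ("R", ("f", "c")), ("R", ("a", "e")),
    ("R", ("c", "e")), ("R", ("b", "e")),
    ("S", ("d", "e")), ("S", ("f", "e")), ("S", ("d", "a")),
    ("Q", ("c", "e", "f")),
]

# Hypothetical gate encoding: ("in", fact), ("and", [gates]), ("or", [gates]), ("not", gate).
C = ("and", [
    ("in", ("R", ("a", "d"))),
    ("or", [("in", ("S", ("d", "e"))), ("in", ("S", ("d", "a")))]),
])


def evaluate(gate, nu):
    """nu(C): value of the circuit under the valuation nu (fact -> bool)."""
    kind, arg = gate
    if kind == "in":
        return nu[arg]
    if kind == "not":
        return not evaluate(arg, nu)
    values = [evaluate(g, nu) for g in arg]
    return all(values) if kind == "and" else any(values)


def q_ex(world):
    """Hypothetical Boolean query: exists x, y, z such that R(x, y) and S(y, z)."""
    r_facts = [args for rel, args in world if rel == "R"]
    s_facts = [args for rel, args in world if rel == "S"]
    return any(y == y2 for (_, y) in r_facts for (y2, _) in s_facts)


def is_provenance_circuit(circuit, facts, query):
    """For every nu: nu(I) |= Q iff nu(C) = 1, with nu(I) = facts mapped to true."""
    for bits in product([False, True], repeat=len(facts)):
        nu = dict(zip(facts, bits))
        sub_instance = {f for f in facts if nu[f]}
        if query(sub_instance) != evaluate(circuit, nu):
            return False
    return True


print(is_provenance_circuit(C, FACTS, q_ex))  # True
```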

  45. Tree automata
     A bottom-up deterministic tree automaton on {a, b}-trees is a tuple A = (Q, F, κ, ε), where:
     - Q is a finite set of states;
     - F ⊆ Q is the set of accepting states;
     - κ : {a, b} → Q determines the state of the leaves from their label;
     - ε : {a, b} × Q² → Q determines the state of an internal node from its label and the states of its two children.

  46-56. Run of an automaton on a tree (animated figure)
     The example automaton has three states, one of them accepting. It is given by κ on the leaf labels a and b and by a transition table ε with columns label, q1, q2 and out. The run proceeds bottom-up:
     - initialization of the leaves with κ;
     - internal nodes get their state from ε, using their label and their children's states;
     - and so on, up to the root.
     This tree is in the language of A (the state reached at the root is in F).
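
The bottom-up run described above fits in a few lines. The slides' own states and transition table did not survive extraction, so the sketch below uses a hypothetical automaton instead. Its states are Q = {even, odd} and its accepting states are F = {odd}, so it recognizes the {a, b}-trees with an odd number of a-labeled nodes. As in the definition, κ gives the state of a leaf from its label, and ε is a table from (label, q1, q2) to the state of an internal node.

```python
# Binary {a, b}-trees: a leaf is a label; an internal node is (label, left, right).

STATES = {"even", "odd"}  # Q: parity of the number of a-labels seen so far
FINAL = {"odd"}  # F: accepting states
KAPPA = {"a": "odd", "b": "even"}  # kappa: state of a leaf, from its label

# epsilon: (label, q1, q2) -> out, spelled out as a transition table.
EPSILON = {
    (lab, q1, q2): "odd" if ((lab == "a") + (q1 == "odd") + (q2 == "odd")) % 2 else "even"
    for lab in "ab" for q1 in STATES for q2 in STATES
}


def run(tree):
    """Bottom-up run: leaves are initialized with kappa, internal nodes use epsilon."""
    if isinstance(tree, str):
        return KAPPA[tree]
    label, left, right = tree
    return EPSILON[(label, run(left), run(right))]


def accepts(tree):
    """The tree is in the language of A iff the state at the root is in F."""
    return run(tree) in FINAL


t = ("b", ("a", "a", "b"), "a")  # three a-labeled nodes
print(run(t), accepts(t))  # odd True
```

The transition function is spelled out as a table, as on the slides, rather than computed on the fly. This makes the running time of a run visibly linear in the size of the tree, with one table lookup per node.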

  58. Major drawbacks
     - In general, computing the automaton has non-elementary complexity in the query
     - Exponential dependence on the instance treewidth
