
  1. Challenges for Efficient Query Evaluation on Structured Probabilistic Data. SUM 2016, September 23, 2016, Nice. Antoine Amarilli, Silviu Maniu, Mikaël Monet

  2. A probabilistic database in the tuple-independent (TID) model: each fact is annotated with the probability that it is present, independently of the other facts.
     R = {(a, d): 0.2, (f, c): 0.9, (a, e): 0.7, (c, e): 0.23}
     S = {(d, e): 0.005, (f, e): 0.7, (d, a): 0.13, (b, e): 0.81}
     Q = {(c, e, f): 0.66}

  3. Probability of a possible world. A possible world keeps some facts and drops the others; the world I shown on the slide keeps (f, e), (d, a), (c, e) and Q(c, e, f), and drops the other five facts. Since facts are independent, Pr(I) multiplies p for each kept fact and (1 - p) for each dropped one:
     Pr(I) = 0.7*0.13*0.23*0.66*(1-0.2)*(1-0.81)*(1-0.005)*(1-0.9)*(1-0.7).
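A minimal sketch of this computation, assuming the flattened R and S tables of the slides are read row by row (the fact encoding as tuples and the names `FACTS`, `WORLD` and `world_probability` are illustrative, not from the talk):

```python
from math import prod

# Facts of the example instance with their probabilities.
# Assumption: the flattened R and S tables of the slides are read row by row.
FACTS = {
    ("R", "a", "d"): 0.2,
    ("R", "f", "c"): 0.9,
    ("R", "a", "e"): 0.7,
    ("R", "c", "e"): 0.23,
    ("S", "d", "e"): 0.005,
    ("S", "f", "e"): 0.7,
    ("S", "d", "a"): 0.13,
    ("S", "b", "e"): 0.81,
    ("Q", "c", "e", "f"): 0.66,
}


def world_probability(facts, world):
    """Pr(world) in the TID model: kept facts contribute p, dropped facts 1 - p."""
    return prod(p if fact in world else 1 - p for fact, p in facts.items())


# The possible world I of the slide: four facts kept, the other five dropped.
WORLD = {("S", "f", "e"), ("S", "d", "a"), ("R", "c", "e"), ("Q", "c", "e", "f")}
print(world_probability(FACTS, WORLD))
```

The result is about 6.27 × 10⁻⁵. Summing this quantity over all 2⁹ subsets of the nine facts gives 1, which is what makes the possible worlds a probability distribution.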

  4. Probabilistic query evaluation (PQE)
     - Focus on Boolean queries (yes/no).
     - Probability of a query Q on a probabilistic instance $\mathcal{I}$: $P(Q) = \sum_{J \subseteq \mathcal{I},\ J \models Q} \Pr(J)$, i.e., the total probability of the possible worlds that satisfy Q.
     - Problem: PQE is #P-hard in general.
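The definition of P(Q) can be turned directly into a brute-force sketch that enumerates all 2ⁿ possible worlds, which is exactly why this naive approach does not scale. The query below, ∃x, y, z R(x, y) ∧ S(y, z), is a hypothetical example chosen for illustration, and the R/S tables are again assumed to be read row by row:

```python
from itertools import product

# Facts of the example instance (R and S tables read row by row, an assumption).
FACTS = {
    ("R", "a", "d"): 0.2, ("R", "f", "c"): 0.9, ("R", "a", "e"): 0.7,
    ("R", "c", "e"): 0.23, ("S", "d", "e"): 0.005, ("S", "f", "e"): 0.7,
    ("S", "d", "a"): 0.13, ("S", "b", "e"): 0.81, ("Q", "c", "e", "f"): 0.66,
}


def query_holds(world):
    """Hypothetical Boolean query Q: exists x, y, z such that R(x, y) and S(y, z)."""
    r_targets = {fact[2] for fact in world if fact[0] == "R"}
    return any(fact[0] == "S" and fact[1] in r_targets for fact in world)


def pqe_brute_force(facts, query):
    """P(Q) = sum of Pr(J) over the possible worlds J that satisfy Q (2^n of them)."""
    items = list(facts.items())
    total = 0.0
    for keep in product((False, True), repeat=len(items)):
        world = {fact for (fact, _), kept in zip(items, keep) if kept}
        if query(world):
            pr = 1.0
            for (_, p), kept in zip(items, keep):
                pr *= p if kept else 1 - p
            total += pr
    return total


print(pqe_brute_force(FACTS, query_holds))
```

On this instance, Q holds exactly when R(a, d) is kept together with S(d, e) or S(d, a), so P(Q) = 0.2 · (1 - 0.995 · 0.87) ≈ 0.0269.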

  5. 1) Approximate probability computation
     - Monte-Carlo sampling.
     - Drawback: the running time is quadratic in the desired precision (on the order of 1/ε² samples for an additive error ε).
     ⇒ Not adequate for low probabilities: an additive error ε says little about a probability smaller than ε.
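A sketch of Monte-Carlo PQE, assuming the standard Hoeffding bound to pick the number of samples (the talk does not say which bound it uses). The query and the row-by-row reading of the R/S tables are the same hypothetical choices as in the brute-force sketch; the names `hoeffding_samples` and `monte_carlo_pqe` are illustrative:

```python
import math
import random

# Facts of the example instance (R and S tables read row by row, an assumption).
FACTS = {
    ("R", "a", "d"): 0.2, ("R", "f", "c"): 0.9, ("R", "a", "e"): 0.7,
    ("R", "c", "e"): 0.23, ("S", "d", "e"): 0.005, ("S", "f", "e"): 0.7,
    ("S", "d", "a"): 0.13, ("S", "b", "e"): 0.81, ("Q", "c", "e", "f"): 0.66,
}


def query_holds(world):
    """Hypothetical Boolean query Q: exists x, y, z such that R(x, y) and S(y, z)."""
    r_targets = {fact[2] for fact in world if fact[0] == "R"}
    return any(fact[0] == "S" and fact[1] in r_targets for fact in world)


def hoeffding_samples(eps, delta):
    """Samples so that |estimate - P(Q)| <= eps with probability >= 1 - delta."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))


def monte_carlo_pqe(facts, query, eps, delta, seed=0):
    """Estimate P(Q) by sampling possible worlds of the TID instance."""
    rng = random.Random(seed)
    n = hoeffding_samples(eps, delta)
    hits = 0
    for _ in range(n):
        # Keep each fact independently with its own probability.
        world = {fact for fact, p in facts.items() if rng.random() < p}
        hits += query(world)
    return hits / n


print(hoeffding_samples(0.01, 0.05), hoeffding_samples(0.005, 0.05))
print(monte_carlo_pqe(FACTS, query_holds, eps=0.01, delta=0.05))
```

Halving ε multiplies the number of samples by about four (18445 versus 73778 here), and a guarantee of ±0.01 is useless for a query whose probability is, say, 0.001.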

  6. 2) Restricting the class of queries
     - [Dalvi and Suciu 2012] show the following dichotomy for any UCQ Q: either PQE for Q is #P-hard, or it is in PTIME on all instances (data complexity).
     - The simple conjunctive query ∃x, y R(x), T(x, y), S(y) is already #P-hard!
     - The criterion is too crisp: tractability depends on the query alone, whatever the instances look like.
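The full UCQ dichotomy is involved, but for self-join-free conjunctive queries an earlier, simpler criterion by Dalvi and Suciu applies: PQE is in PTIME if the query is hierarchical and #P-hard otherwise. The sketch below checks only that special case (the query encoding as a list of atoms is an assumption), and shows why the query on the slide is hard:

```python
from itertools import combinations


def is_hierarchical(atoms):
    """atoms: list of (relation, variables) for a self-join-free Boolean CQ.

    The query is hierarchical if, for any two variables x and y, the sets
    at(x) and at(y) of atoms containing them are disjoint or nested.
    """
    at = {}
    for i, (_, variables) in enumerate(atoms):
        for v in variables:
            at.setdefault(v, set()).add(i)
    for x, y in combinations(at, 2):
        ax, ay = at[x], at[y]
        if ax & ay and not (ax <= ay or ay <= ax):
            return False
    return True


# The query of the slide: exists x, y . R(x), T(x, y), S(y)
print(is_hierarchical([("R", ("x",)), ("T", ("x", "y")), ("S", ("y",))]))  # False
# Dropping S(y) makes it hierarchical, hence tractable
print(is_hierarchical([("R", ("x",)), ("T", ("x", "y"))]))  # True
```

For R(x), T(x, y), S(y), at(x) = {R, T} and at(y) = {T, S} overlap without being nested, which is the source of hardness.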

  7. 3) Restricting the shape of the instances
     - Bound the treewidth of the instances by a constant.
     - Treewidth: a measure of how far a graph is from being a tree.
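The treewidth of an instance is that of its Gaifman graph (constants as vertices, linked when they co-occur in a fact). Computing treewidth exactly is NP-hard, so the sketch below uses the min-degree elimination heuristic, which only gives an upper bound; the heuristic choice and the function names are assumptions, and the facts are those of the example instance with R/S read row by row:

```python
FACTS = [
    ("R", "a", "d"), ("R", "f", "c"), ("R", "a", "e"), ("R", "c", "e"),
    ("S", "d", "e"), ("S", "f", "e"), ("S", "d", "a"), ("S", "b", "e"),
    ("Q", "c", "e", "f"),
]


def gaifman_graph(facts):
    """Vertices: constants; edges: pairs of constants co-occurring in some fact."""
    adj = {}
    for fact in facts:
        consts = fact[1:]
        for c in consts:
            adj.setdefault(c, set())
        for i, u in enumerate(consts):
            for v in consts[i + 1:]:
                if u != v:
                    adj[u].add(v)
                    adj[v].add(u)
    return adj


def treewidth_upper_bound(adj):
    """Min-degree elimination: width of the tree decomposition it induces."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    width = 0
    while adj:
        # Eliminate a vertex of smallest degree (ties broken by name).
        v = min(adj, key=lambda u: (len(adj[u]), u))
        neighbours = adj.pop(v)
        width = max(width, len(neighbours))  # bag = v plus its neighbours
        for u in neighbours:
            adj[u].discard(v)
            adj[u] |= neighbours - {u}  # fill-in: neighbours become a clique
    return width


print(treewidth_upper_bound(gaifman_graph(FACTS)))
```

Here the bound is 2, which is exact since the graph contains triangles (for instance a, d, e).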

  8. Pipeline for instances of bounded treewidth (diagram):
     - Instance I of treewidth k → tree decomposition T, in O(EXP(k)·|I|).
     - Query q (MSO) and integer k → automaton A, in O(f(q, k)).
     - Automaton A run on T → provenance circuit C (Bool[X] provenance), in O(|A|·|T|).
     - Provenance circuit C → probability.
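The last step, from circuit to probability, is easy when the circuit is deterministic and decomposable (a d-DNNF): AND gates over disjoint facts multiply, and OR gates over mutually exclusive children add. The sketch below assumes that property; the circuit, its encoding and the probabilities are hypothetical, and a brute-force evaluation serves as a cross-check:

```python
from itertools import product

# Hypothetical circuit over facts f1, f2, f3: (f1 AND f2) OR (NOT f1 AND f3).
# Gate -> (kind, payload): "var" -> fact name, otherwise -> list of child gates.
CIRCUIT = {
    "x1": ("var", "f1"),
    "x2": ("var", "f2"),
    "x3": ("var", "f3"),
    "n1": ("not", ["x1"]),
    "a1": ("and", ["x1", "x2"]),  # decomposable: no shared fact
    "a2": ("and", ["n1", "x3"]),  # decomposable: no shared fact
    "out": ("or", ["a1", "a2"]),  # deterministic: a1 needs f1, a2 needs NOT f1
}


def circuit_probability(circuit, gate, prob, memo=None):
    """One bottom-up pass; correct only for decomposable ANDs and deterministic ORs."""
    memo = {} if memo is None else memo
    if gate not in memo:
        kind, payload = circuit[gate]
        if kind == "var":
            value = prob[payload]
        elif kind == "not":
            value = 1 - circuit_probability(circuit, payload[0], prob, memo)
        elif kind == "and":  # independent children: multiply
            value = 1.0
            for child in payload:
                value *= circuit_probability(circuit, child, prob, memo)
        else:  # "or" with mutually exclusive children: add
            value = sum(circuit_probability(circuit, c, prob, memo) for c in payload)
        memo[gate] = value
    return memo[gate]


def evaluate(circuit, gate, valuation):
    """Boolean value of a gate when each fact is present (True) or absent (False)."""
    kind, payload = circuit[gate]
    if kind == "var":
        return valuation[payload]
    if kind == "not":
        return not evaluate(circuit, payload[0], valuation)
    values = [evaluate(circuit, child, valuation) for child in payload]
    return all(values) if kind == "and" else any(values)


def brute_force_probability(circuit, gate, prob):
    """Reference value: sum the probabilities of all satisfying valuations."""
    names = sorted(prob)
    total = 0.0
    for bits in product((False, True), repeat=len(names)):
        valuation = dict(zip(names, bits))
        if evaluate(circuit, gate, valuation):
            weight = 1.0
            for name, bit in zip(names, bits):
                weight *= prob[name] if bit else 1 - prob[name]
            total += weight
    return total


PROB = {"f1": 0.5, "f2": 0.4, "f3": 0.2}
print(circuit_probability(CIRCUIT, "out", PROB), brute_force_probability(CIRCUIT, "out", PROB))
```

Both give 0.5 · 0.4 + 0.5 · 0.2 = 0.3, but the first one runs in time linear in the circuit size instead of exponential in the number of facts.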

  9. Provenance circuits (figure): some fixed query q; some instance I with facts {f1, f2, f3, f4, f5, f6}.
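A provenance circuit of q on I has one input per fact and evaluates to true on a valuation exactly when the subinstance of kept facts satisfies q. The sketch below checks this property exhaustively; the six facts, the query and the candidate circuit are all hypothetical, since the slide's figure is not recoverable:

```python
from itertools import product

# Hypothetical instance I with six facts f1..f6.
FACTS = {
    "f1": ("R", "a", "b"),
    "f2": ("S", "b", "c"),
    "f3": ("R", "a", "d"),
    "f4": ("S", "d", "c"),
    "f5": ("R", "e", "b"),
    "f6": ("T", "c"),
}


def q(subinstance):
    """Hypothetical fixed query q: exists x, y, z . R(x, y) and S(y, z)."""
    mids = {t[2] for t in subinstance if t[0] == "R"}
    return any(t[0] == "S" and t[1] in mids for t in subinstance)


def circuit(v):
    """Candidate provenance circuit: ((f1 OR f5) AND f2) OR (f3 AND f4)."""
    return ((v["f1"] or v["f5"]) and v["f2"]) or (v["f3"] and v["f4"])


def is_provenance_circuit(facts, query, circ):
    """Check, for every subinstance J of I, that circ is true iff J satisfies query."""
    names = sorted(facts)
    for bits in product((False, True), repeat=len(names)):
        valuation = dict(zip(names, bits))
        sub = {facts[n] for n in names if valuation[n]}
        if circ(valuation) != query(sub):
            return False
    return True


print(is_provenance_circuit(FACTS, q, circuit))
```

Fact f6 plays no role in any match, so it does not appear in the circuit; a circuit that forgot the match f3, f4 would fail the check.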

  10. Problems with this pipeline (diagram: instance I of treewidth k → tree decomposition T in O(EXP(k)·|I|); MSO query q and integer k → automaton A in O(f(q, k)); A and T → provenance circuit C in O(|A|·|T|) → probability):
     - Query → automaton: non-elementary complexity in general.
     - Instance → tree decomposition: are real datasets treelike?

  11. Current work
     - From bottom-up tree automata to alternating two-way automata.
     - Introduce Intensionally-Clique-Guarded Datalog (ICG-Datalog), parameterized by body size.
     - Provenance as a cyclic circuit! (cycluit)

  12. Provenance cycluits (figure): some ICG-Datalog program; some instance I with facts {f1, f2, f3, f4}; + negations (stratified).
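A cycluit lets gates depend on each other cyclically, as recursive Datalog rules do. A natural way to evaluate a monotone cycluit, assumed in this sketch, is the least fixpoint: start with every gate false and re-evaluate until nothing changes. The program, the four facts and the gate names are hypothetical, and stratified negation (evaluating strata in order) is not covered:

```python
# Hypothetical program:  Reach(s).  Reach(y) :- Reach(x), E(x, y).  Goal :- Reach(t).
# Hypothetical facts: f1 = E(s, a), f2 = E(a, b), f3 = E(b, a), f4 = E(b, t).
# Gates Ra and Rb depend on each other: the circuit is cyclic.
GATES = {
    "Rs": lambda g, v: True,
    "Ra": lambda g, v: (g["Rs"] and v["f1"]) or (g["Rb"] and v["f3"]),
    "Rb": lambda g, v: g["Ra"] and v["f2"],
    "Rt": lambda g, v: g["Rb"] and v["f4"],
}


def evaluate_cycluit(gates, valuation):
    """Least-fixpoint semantics for monotone gates.

    Values only switch from False to True, so the loop terminates after at
    most one change per gate and reaches the least fixpoint.
    """
    values = {name: False for name in gates}
    changed = True
    while changed:
        changed = False
        for name, gate in gates.items():
            new = bool(gate(values, valuation))
            if new != values[name]:
                values[name] = new
                changed = True
    return values


all_present = {"f1": True, "f2": True, "f3": True, "f4": True}
print(evaluate_cycluit(GATES, all_present)["Rt"])
print(evaluate_cycluit(GATES, {**all_present, "f1": False})["Rt"])
```

Without f1, the cycle between Ra and Rb does not support itself under the least fixpoint, so the goal is false, matching the Datalog semantics; a greatest fixpoint would wrongly make it true.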

  13. Pipeline with ICG-Datalog (diagram):
     - Instance I of treewidth k → tree decomposition T, in O(EXP(k)·|I|).
     - ICG-Datalog program P of body size c, and integer k → 2-way automaton A, in O(f(c, k)·|P|).
     - A run on T → provenance CYCLUIT C, in O(|A|·|T|).
     - C → probability.
     - 2EXPTIME upper bound.

  14. Bad news…
     - We proved that PQE for path queries on tree instances (treewidth 1) is already #P-hard in combined complexity (reduction from #MONOTONE-2-SAT).
     - Still, we obtain a 2EXPTIME upper bound in combined complexity.
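The source problem of this reduction, #MONOTONE-2-SAT, is known to be #P-complete: count the satisfying assignments of a CNF whose clauses each contain two positive literals. The brute-force counter below only illustrates that problem (the clause encoding is an assumption); it does not reproduce the reduction to path queries:

```python
from itertools import product


def count_monotone_2sat(num_vars, clauses):
    """Count assignments of variables 0..num_vars-1 satisfying every clause (i OR j)."""
    count = 0
    for assignment in product((False, True), repeat=num_vars):
        if all(assignment[i] or assignment[j] for i, j in clauses):
            count += 1
    return count


# (x0 OR x1) AND (x1 OR x2): x1 true gives 4 assignments, x1 false forces x0 and x2.
print(count_monotone_2sat(3, [(0, 1), (1, 2)]))  # 5
```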

  15. Treelike datasets
     - Transportation networks
     - Partial decompositions
     - Query-specific decompositions

  16. Pipeline (diagram): instance I of treewidth k → tree decomposition T in O(EXP(k)·|I|); ICG-Datalog program P of body size c, and integer k → 2-way automaton A in O(f(c, k)·|P|); A and T → provenance CYCLUIT C in O(|A|·|T|).

