Detour: Tree Decompositions, Informally

$S() \leftarrow R(a,b,d) \wedge c < d \wedge T(c,b,d) \wedge U(b,e) \wedge V(c,e) \wedge b + e = f \wedge W(b,e,g) \wedge g/f = h \wedge X(i,j,h) \wedge e - b = k.$

Bags: {b, c, e}, {a, b, c, d}, {b, e, f, g, h}, {h, i, j}, {e, b, k}

◮ Every relation is covered by some bag
◮ Bags containing a given variable are connected
Detour: Tree Decompositions, Formally

◮ Hypergraph $\mathcal{H} = ([n], \mathcal{E})$
◮ A tree decomposition of $\mathcal{H}$ is a pair $(T, \chi)$ where
  ◮ $T = (V(T), E(T))$ is a tree
  ◮ $\chi : V(T) \to 2^{[n]}$ assigns a bag $\chi(v)$ to each tree node $v$
  ◮ Every hyperedge $F \in \mathcal{E}$ is covered by some bag ($F \subseteq \chi(v)$)
  ◮ For every $i \in [n]$, the bags containing $i$ form a subtree

See [Gottlob et al 2016], Gems of PODS.
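The two conditions can be checked mechanically. Below is a minimal Python sketch, not taken from the slides, that tests them on the bags of the informal example. The function names and the tree edges (bag {b, c, e} at the centre, {h, i, j} hanging off {b, e, f, g, h}) are assumptions made for illustration; the slides only list the bags.

```python
def _connected(nodes, edges):
    """True if `nodes` induce a connected subgraph of the graph `edges`."""
    nodes = set(nodes)
    if not nodes:
        return True
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].add(v)
            adj[v].add(u)
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u] - seen)
    return seen == nodes


def is_tree_decomposition(hyperedges, bags, tree_edges):
    """Check the tree, coverage, and connectedness conditions."""
    # T must be a tree: connected with |V(T)| - 1 edges.
    if len(tree_edges) != len(bags) - 1 or not _connected(bags, tree_edges):
        return False
    # Every hyperedge F must satisfy F <= chi(v) for some node v.
    covered = all(any(set(F) <= bag for bag in bags.values()) for F in hyperedges)
    # For every variable, the bags containing it must form a subtree.
    variables = set().union(*(set(F) for F in hyperedges))
    running = all(
        _connected({v for v, bag in bags.items() if x in bag}, tree_edges)
        for x in variables
    )
    return covered and running


# One hyperedge per atom of the example rule (variables as single letters).
hyperedges = ["abd", "cd", "cbd", "be", "ce", "bef", "beg", "gfh", "ijh", "ebk"]
bags = {0: set("bce"), 1: set("abcd"), 2: set("befgh"), 3: set("hij"), 4: set("ebk")}
assumed_tree = [(0, 1), (0, 2), (2, 3), (0, 4)]   # assumed shape, see lead-in
path_tree = [(1, 0), (0, 2), (2, 3), (3, 4)]      # same bags, different tree

print(is_tree_decomposition(hyperedges, bags, assumed_tree))
print(is_tree_decomposition(hyperedges, bags, path_tree))
```

With the same bags arranged along a path, the bags containing e no longer form a subtree, which is why the tree shape matters and not only the bags.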
Option 2: A Single Tree Decomposition

Fix $(T, \chi)$. Multiple Conjunctive Rules.

[Diagram: Database $D$ → $Q_i$ (iTime) → $Q_i(D)$ → $Q_o$ (oTime) → Output (Answer)]

$Q_i : \bigwedge_{v \in V(T)} T_{\chi(v)} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$

◮ $Q_i(D)$ is Olteanu's factorized database (FDB)
◮ oTime $= |Q_i(D)| + |\text{answer}|$
  ◮ Yannakakis for join, FDB/InsideOut for aggregates
◮ iTime $= \max_{D \models DC} |Q_i(D)| \le \max_{D \models DC} \max_{v \in V(T)} |P_v(D)|$
  ◮ $P_v : T_{\chi(v)} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$, for each $v \in V(T)$
◮ $\min_{(T,\chi)} \max_{D \models CC} \max_{v \in V(T)} |P_v(D)| \le N^{\mathrm{fhtw}(\mathcal{H})} \le N^{\mathrm{ghtw}(\mathcal{H})} \le N^{\mathrm{tw}(\mathcal{H})+1}$
Option 3: Multiple Tree Decompositions

[Diagram: Database $D$ → $Q_i$ (iTime) → $Q_i(D)$ → $Q_o$ (oTime) → Output (Answer)]

$Q_i : \bigvee_{(T,\chi)} \bigwedge_{v \in V(T)} T_{\chi(v)} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$ (How to evaluate this?)

◮ $\bigvee$ ranges over non-redundant TDs $(T, \chi)$
◮ oTime $= |Q_i(D)| + |\text{answer}|$
  ◮ Union of Yannakakis on all TDs
◮ iTime $= \max_{D \models DC} |Q_i(D)| \le\ ?$
Example: $Q : S() \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

[Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$, and its two tree decompositions, with bags $\{A_1A_2A_3, A_1A_3A_4\}$ and $\{A_1A_2A_4, A_2A_3A_4\}$]

$(T_{123} \wedge T_{134}) \vee (T_{124} \wedge T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

By distributivity, rewrite the head into an equivalent conjunction of disjunctions:

$(T_{123} \vee T_{124}) \wedge (T_{123} \vee T_{234}) \wedge (T_{134} \vee T_{124}) \wedge (T_{134} \vee T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

(Each "clause" has one bag per TD)
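The rewriting is a plain DNF-to-CNF distribution: every clause picks one bag from each tree decomposition. A minimal Python sketch of that enumeration follows; the function name and the string encoding of bags are illustrative assumptions, not notation from the slides.

```python
from itertools import product


def disjunctive_heads(tree_decompositions):
    """Return every clause obtained by picking one bag per tree decomposition.

    Each tree decomposition is given as a list of bags; a clause is the
    set of chosen bags, so identical choices are merged.
    """
    return {frozenset(choice) for choice in product(*tree_decompositions)}


# The two tree decompositions of the 4-cycle, bags written as index strings.
tds = [["123", "134"], ["124", "234"]]
heads = disjunctive_heads(tds)

for head in sorted(sorted(h) for h in heads):
    print(" OR ".join(f"T_{bag}" for bag in head))
```

With k tree decompositions of at most b bags each, this produces at most b^k clauses, which is the set of disjunctive rules used on the following slides.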
Option 3: Multiple Tree Decompositions

Multiple Disjunctive Datalog Rules!

[Diagram: Database $D$ → $Q_i$ (iTime) → $Q_i(D)$ → $Q_o$ (oTime) → Output (Answer)]

$Q_i : \bigwedge_{\mathcal{B}} \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$

◮ oTime $= |Q_i(D)| + |\text{answer}|$
  ◮ Union of Yannakakis on all TDs
◮ iTime $= \max_{D \models DC} |Q_i(D)| \le \max_{D \models DC} \max_{\mathcal{B}} |P_{\mathcal{B}}(D)|$
  ◮ $P_{\mathcal{B}} : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$ (disjunctive datalog rule)
◮ $\max_{D \models CC} \max_{\mathcal{B}} |P_{\mathcal{B}}(D)| \le N^{\mathrm{subw}(\mathcal{H})} \le N^{\mathrm{fhtw}(\mathcal{H})}$

subw = submodular width (Daniel Marx, JACM'2013)
Example: $Q : S() \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

Option 1: iTime $= \max_{D \models DC} |Q_i(D)| = N^2$
  $Q_i : T_{1234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

Option 2: iTime $= \max_{D \models DC} |Q_i(D)| = N^2$
  either $Q_i : T_{123} \wedge T_{134} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
  or $Q_i : T_{124} \wedge T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$

Option 3: iTime $= \max_{D \models DC} |Q_i(D)| = N^{3/2}$
  $Q_i : (T_{123} \wedge T_{134}) \vee (T_{124} \wedge T_{234}) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
  Equivalent to:
  $P_{123,124} : T_{123} \vee T_{124} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
  $P_{123,234} : T_{123} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
  $P_{134,124} : T_{134} \vee T_{124} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
  $P_{134,234} : T_{134} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}$
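To see why committing to a single tree decomposition costs $N^2$ in Option 2, the following Python sketch builds two small instances of the 4-cycle, each with $N = m$ tuples per relation, and measures the largest bag under each decomposition. The instances, helper names, and the choice to materialise each bag as a projection of the full join (the minimal model of the conjunctive rule) are assumptions made for illustration. The sketch does not demonstrate the $N^{3/2}$ figure of Option 3, which needs the finer analysis developed later.

```python
def four_cycle_join(R12, R23, R34, R41):
    """All (a1, a2, a3, a4) satisfying R12, R23, R34, and R41 (naive join)."""
    r41 = set(R41)
    return {
        (a1, a2, a3, a4)
        for (a1, a2) in R12
        for (b2, a3) in R23 if b2 == a2
        for (b3, a4) in R34 if b3 == a3 and (a4, a1) in r41
    }


def largest_bag(tuples, td):
    """Size of the largest bag when each bag is a projection of the join."""
    return max(len({tuple(t[p] for p in bag) for t in tuples}) for bag in td)


TD_A = [(0, 1, 2), (0, 2, 3)]   # bags A1A2A3 and A1A3A4
TD_B = [(0, 1, 3), (1, 2, 3)]   # bags A1A2A4 and A2A3A4

m = 20
vals = range(m)
# Instance 1: A2 and A4 are constant, A1 and A3 range freely.
J1 = four_cycle_join([(i, 0) for i in vals], [(0, j) for j in vals],
                     [(j, 0) for j in vals], [(0, i) for i in vals])
# Instance 2: A1 and A3 are constant, A2 and A4 range freely.
J2 = four_cycle_join([(0, i) for i in vals], [(i, 0) for i in vals],
                     [(0, j) for j in vals], [(j, 0) for j in vals])

for name, J in (("instance 1", J1), ("instance 2", J2)):
    print(name, "TD_A:", largest_bag(J, TD_A), "TD_B:", largest_bag(J, TD_B))
```

Each decomposition is cheap on one instance and quadratic on the other, so any fixed choice has a bad input, while choosing per instance (or, in Option 3, per part of the data) avoids the quadratic bag here.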
Roadmap

Given degree constraints $DC$, and a disjunctive datalog rule

$P : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$

Question (Worst-case Output Size Bound): Find a good upper bound for $\max_{D \models DC} |P(D)|$.

Question (Algorithm): Design an algorithm evaluating $P$ within the bound.

Question (Gathering fruits): Plug bound/algorithm into Meta Algorithm, what do we get?
Table of Contents

◮ Connecting the Dots
◮ Output Size Bounds and Information Theory
◮ Shannon-flow Inequalities and the PANDA Algorithm
◮ Wrapping it up
◮ Appendix
High-level View of the Bound

Given degree constraints $DC$, a disjunctive datalog rule

$P : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$

We shall prove bounds of the form

$\max_{D \models DC} \log |P(D)| \le$ some function of $h$

s.t. $h$ is (approximately) entropic and $h$ satisfies the degree constraints.
An Idea From Gottlob-Lee-Valiant-Valiant, JACM'12

[Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$]

$Q(A_1, A_2, A_3, A_4) \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}.$

Output $Q$, with the uniform distribution on its tuples:

  A1  A2  A3  A4   Pr
  a   1   d   4    1/4
  b   1   c   3    1/4
  b   1   d   4    1/4
  b   2   c   3    1/4

$H_{\mathrm{unif}}(A_1A_2A_3A_4) = \log |Q|$

Input relations, with the marginal probability of each tuple:

  R12 (A1, A2): (a, 1) 1/4, (b, 1) 2/4, (b, 2) 1/4
  R23 (A2, A3): (1, c) 1/4, (1, d) 2/4, (2, c) 1/4
  R34 (A3, A4): (c, 3) 2/4, (d, 4) 2/4, (d, 5) 0
  R41 (A4, A1): (3, b) 2/4, (4, a) 1/4, (4, b) 1/4

$H_{\mathrm{unif}}(A_1A_2) \le \log |R_{12}|$, $H_{\mathrm{unif}}(A_2A_3) \le \log |R_{23}|$, $H_{\mathrm{unif}}(A_3A_4) \le \log |R_{34}|$, ...

$H_{\mathrm{unif}}(A_2 \mid A_1 = \text{'a'}) \le \log |\sigma_{A_1 = \text{'a'}} R_{12}|$, $H_{\mathrm{unif}}(A_2 \mid A_1 = \text{'b'}) \le \log |\sigma_{A_1 = \text{'b'}} R_{12}|$, ...

$H_{\mathrm{unif}}(A_2 \mid A_1) \le \log \max_x |\sigma_{A_1 = x} R_{12}| = \log \deg_{R_{12}}(A_2 \mid A_1)$
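The numbers on this slide can be reproduced directly. The following Python sketch, with helper names chosen for illustration, joins the four relations shown above, puts the uniform distribution on the output, and checks the cardinality and degree inequalities with base-2 logarithms.

```python
from collections import Counter
from math import log2

# Input relations as they appear on the slide.
R12 = [("a", 1), ("b", 1), ("b", 2)]
R23 = [(1, "c"), (1, "d"), (2, "c")]
R34 = [("c", 3), ("d", 4), ("d", 5)]
R41 = [(3, "b"), (4, "a"), (4, "b")]

# Full join of the 4-cycle: tuples (a1, a2, a3, a4).
Q = [
    (a1, a2, a3, a4)
    for (a1, a2) in R12
    for (b2, a3) in R23 if b2 == a2
    for (b3, a4) in R34 if b3 == a3 and (a4, a1) in R41
]


def H(positions):
    """Entropy (in bits) of the marginal of uniform(Q) on the given positions."""
    counts = Counter(tuple(t[p] for p in positions) for t in Q)
    return -sum(c / len(Q) * log2(c / len(Q)) for c in counts.values())


def H_cond(y, x):
    """Conditional entropy H(Y | X) = H(XY) - H(X)."""
    return H(sorted(set(x) | set(y))) - H(x)


# Maximum degree of A1 in R12, i.e. deg_{R12}(A2 | A1).
deg_A2_given_A1 = max(Counter(a1 for a1, _ in R12).values())

print("H(A1A2A3A4) =", H((0, 1, 2, 3)), " log|Q| =", log2(len(Q)))
print("H(A1A2)     =", H((0, 1)), " log|R12| =", log2(len(R12)))
print("H(A2|A1)    =", H_cond((1,), (0,)), " log deg =", log2(deg_A2_given_A1))
```

The tuple (d, 5) of R34 never joins, so it receives probability 0, which is why the marginal entropy on $A_3A_4$ is strictly below $\log |R_{34}|$.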
Entropic Bound for Full Conjunctive Queries

◮ $Q : T_{[n]} \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$, and degree constraints $DC$
◮ $\max_{D \models DC} \log |Q(D)| \le \sup h([n])$
◮ subject to (whatever $H_{\mathrm{unif}}$ satisfies):
  ◮ $h$ is entropic
    ◮ There is some distribution on $A_{[n]}$ such that $h(X)$ is the marginal entropy on $A_X$, for all $X$
  ◮ $h$ satisfies $DC$
    ◮ $h(Y \mid X) \stackrel{\mathrm{def}}{=} h(Y) - h(X) \le \log N_{Y|X}$, for $X \subset Y \subseteq F \in \mathcal{E}$
◮ Good bound, but not computable!
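Hierarchy of Set Functions

All functions $h : 2^{[n]} \to \mathbb{R}_+$ are non-negative, monotone ($h(X) \le h(Y)$ if $X \subseteq Y$), with $h(\emptyset) = 0$. From the largest class to the smallest:

◮ $SA_n := \{h \mid h \text{ is sub-additive}\}$: $h(X \cup Y) \le h(X) + h(Y)$
◮ $\Gamma_n := \{h \mid h \text{ is submodular}\}$: $h(X \cup Y) + h(X \cap Y) \le h(X) + h(Y)$
◮ $\overline{\Gamma}^*_n$: topological closure of $\Gamma^*_n$
◮ $\Gamma^*_n = \{h : h \text{ is entropic}\}$
◮ $M_n$: modular, $h(X) = \sum_{x \in X} h(x)$

$M_n \subseteq \Gamma^*_n \subseteq \overline{\Gamma}^*_n \subseteq \Gamma_n \subseteq SA_n$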
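For small $n$ the syntactic classes (monotone, sub-additive, submodular, modular) can be checked by brute force over all pairs of subsets. The following Python sketch does so for three hand-picked functions on $n = 3$; the function names and examples are illustrative assumptions. Entropic membership is not tested, since it has no comparable finite check.

```python
from itertools import combinations


def subsets(n):
    """All subsets of {0, ..., n-1} as frozensets."""
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]


def classify(h, n):
    """Report which defining inequalities the set function h satisfies."""
    S = subsets(n)
    pairs = [(X, Y) for X in S for Y in S]
    return {
        "monotone": all(h(X) <= h(Y) for X, Y in pairs if X <= Y),
        "sub-additive": all(h(X | Y) <= h(X) + h(Y) for X, Y in pairs),
        "submodular": all(h(X | Y) + h(X & Y) <= h(X) + h(Y) for X, Y in pairs),
        "modular": all(h(X) == sum(h(frozenset([x])) for x in X) for X in S),
    }


def cardinality(X):
    """h(X) = |X|: modular."""
    return len(X)


def rank2(X):
    """h(X) = min(|X|, 2): submodular but not modular."""
    return min(len(X), 2)


def bumpy(X):
    """0 on the empty set, 2 on the full set, 1 elsewhere: sub-additive only."""
    return 0 if not X else (2 if len(X) == 3 else 1)


for f in (cardinality, rank2, bumpy):
    print(f.__name__, classify(f, 3))
```

The third function separates $SA_n$ from $\Gamma_n$: with $X = \{0,1\}$ and $Y = \{1,2\}$ it gives $h(X \cup Y) + h(X \cap Y) = 3 > 2 = h(X) + h(Y)$.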
Bounds for Full Conjunctive Query

◮ $\mathrm{HDC} \stackrel{\mathrm{def}}{=} \{h \mid h(Y \mid X) \le \log N_{Y|X},\ \forall (X, Y, N_{Y|X}) \in DC\}$
◮ Then,

$\max_{D \models DC} \log |Q(D)| \le \max_{h \in \Gamma^*_n \cap \mathrm{HDC}} h([n])$ (entropic bound)
$\le \max_{h \in \Gamma_n \cap \mathrm{HDC}} h([n])$ (polymatroid bound)
$\le \max_{h \in SA_n \cap \mathrm{HDC}} h([n])$ (sub-additive bound).
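As a quick sanity check on the weakest of these bounds, take the 4-cycle query from the earlier example under the cardinality constraints $|R_F| \le N$ (that is, $h(A_iA_j) \le \log N$ for each of the four edges). A short worked derivation, given here as an illustration rather than as part of the original slides:

```latex
\begin{align*}
h(A_1A_2A_3A_4)
  &\le h(A_1A_2) + h(A_3A_4) && \text{(sub-additivity)} \\
  &\le \log N + \log N = 2\log N && \text{(cardinality constraints on } R_{12}, R_{34}\text{)}
\end{align*}
```

So every $h \in SA_n \cap \mathrm{HDC}$ has $h([4]) \le 2 \log N$, which matches the $N^2$ figure quoted for Option 1.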
Size Bounds for Full Conjunctive Queries

◮ Definition
  ◮ Entropic bound: $\log |Q| \le \max_{h \in \Gamma^*_n \cap \mathrm{HDC}} h([n])$
  ◮ Polymatroid bound: $\log |Q| \le \max_{h \in \Gamma_n \cap \mathrm{HDC}} h([n])$
◮ CC only
  ◮ Entropic bound: AGM bound (Tight) [Atserias et al. FOCS'08]
  ◮ Polymatroid bound: AGM bound (Tight) [Atserias et al. FOCS'08]
◮ CC + FD only
  ◮ Entropic bound for FD [Gottlob et al. JACM'12] (Tight [Gogacz et al. ICDT'17])
  ◮ Polymatroid bound for FD [Gottlob et al. JACM'12] (Not tight [our work])
◮ DC
  ◮ Entropic bound for DC (Tight [our work])
  ◮ Polymatroid bound for DC (Not tight [our work])
Disjunctive Datalog: Size Bounds

$P : \bigvee_{B \in \mathcal{B}} T_B(A_B) \leftarrow \bigwedge_{F \in \mathcal{E}} R_F(A_F)$

$|P(D)| \stackrel{\mathrm{def}}{=} \min_{T : T \models P} \max_{B \in \mathcal{B}} |T_B|$

Theorem (our work)

$\max_{D \models DC} \log |P(D)| \le \max_{h \in \Gamma^*_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B)$ (entropic bound, tight)
$\le \max_{h \in \Gamma_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B)$ (polymatroid bound, not tight)

Implies all known bounds for (full) conjunctive queries!
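The definition of $|P(D)|$ is a min over models: every tuple satisfying the body must have its projection stored in at least one target $T_B$, and we want the largest target to be as small as possible. For tiny inputs this can be brute-forced. The Python sketch below does so for the rule $T_{123} \vee T_{234}$ on the small 4-cycle instance from the Gottlob-Lee-Valiant-Valiant slide; the helper name and the exhaustive search are illustrative assumptions and are not how PANDA evaluates such rules.

```python
from itertools import product

# The small 4-cycle instance used in the entropy example.
R12 = [("a", 1), ("b", 1), ("b", 2)]
R23 = [(1, "c"), (1, "d"), (2, "c")]
R34 = [("c", 3), ("d", 4), ("d", 5)]
R41 = [(3, "b"), (4, "a"), (4, "b")]

# Tuples (a1, a2, a3, a4) satisfying the body of the rule.
body = [
    (a1, a2, a3, a4)
    for (a1, a2) in R12
    for (b2, a3) in R23 if b2 == a2
    for (b3, a4) in R34 if b3 == a3 and (a4, a1) in R41
]


def min_model_size(tuples, bags):
    """min over models T of max_B |T_B|, by trying every assignment.

    Sending a body tuple to more than one target can only enlarge the
    targets, so it suffices to send each tuple to exactly one bag.
    """
    best = None
    for choice in product(range(len(bags)), repeat=len(tuples)):
        targets = [set() for _ in bags]
        for t, b in zip(tuples, choice):
            targets[b].add(tuple(t[p] for p in bags[b]))
        size = max(len(T) for T in targets)
        if best is None or size < best:
            best = size
    return best


BAGS = [(0, 1, 2), (1, 2, 3)]   # T_123 and T_234
print("|P(D)| =", min_model_size(body, BAGS))
print("single target T_123:", min_model_size(body, [BAGS[0]]))
```

The search is exponential in the number of body tuples, so it is only meant to make the min-max definition concrete on toy data.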
Earlier Example

[Figure: the 4-cycle on $A_1, A_2, A_3, A_4$ with edges $R_{12}, R_{23}, R_{34}, R_{41}$]

$P : \bigvee_{B \in \mathcal{B}} T_B(A_B) \leftarrow \bigwedge_{F \in \mathcal{E}} R_F(A_F)$

$|P(D)| \stackrel{\mathrm{def}}{=} \min_{T : T \models P} \max_{B \in \mathcal{B}} |T_B|$

$CC$: $|R_{12}| \le N$, $|R_{23}| \le N$, $|R_{34}| \le N$, $|R_{41}| \le N$.

$P_{123,234} : T_{123} \vee T_{234} \leftarrow R_{12} \wedge R_{23} \wedge R_{34} \wedge R_{41}.$

$\max_{D \models CC} \log |P_{123,234}(D)| \le \max_{h \in \Gamma_n \cap CC} \min\{h(A_1A_2A_3), h(A_2A_3A_4)\}$
$\le \max_{h \in \Gamma_n \cap CC} \frac{1}{2}[h(A_1A_2A_3) + h(A_2A_3A_4)]$
$\le \max_{h \in \Gamma_n \cap CC} \frac{1}{2}[h(A_1A_2) + h(A_2A_3) + h(A_3A_4)]$
$\le \frac{3}{2} \log N.$
Roadmap

Given degree constraints $DC$ and a disjunctive datalog rule

$P : \bigvee_{B \in \mathcal{B}} T_B \leftarrow \bigwedge_{F \in \mathcal{E}} R_F$

Answer (Worst-case Output Size Bound): $\max_{D \models DC} \log |P(D)| \le \max_{h \in \Gamma_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B) =$ polymatroid bound.

Question (Algorithm): Compute a model for $P$ within $\tilde{O}(2^{\text{polymatroid bound}})$.

Question (Gathering fruits): Plug bound/algorithm into Meta Algorithm, what do we get?
Connection to Shannon-flow Inequalities

Lemma (Linearize it). There exists non-negative $\lambda = (\lambda_B)_{B \in \mathcal{B}}$, with $\|\lambda\|_1 = 1$, s.t.

$\max_{h \in \Gamma_n \cap \mathrm{HDC}} \min_{B \in \mathcal{B}} h(B) = \max_{h \in \Gamma_n \cap \mathrm{HDC}} \sum_{B \in \mathcal{B}} \lambda_B h(B)$ (1)

Lemma (Shannon-flow inequality). There exists $\delta \ge 0$ s.t. $2^{\text{polymatroid bound}} = \prod_{(X,Y,N_{Y|X})} N_{Y|X}^{\delta_{Y|X}}$, and

$\sum_{B \in \mathcal{B}} \lambda_B \cdot h(B) \le \sum_{(X,Y,N_{Y|X})} \delta_{Y|X} \cdot h(Y \mid X), \quad \forall h \in \Gamma_n$ (2)

(2) is a (vast) generalization of Shearer's lemma.
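For the rule $P_{123,234}$ of the earlier example, one instance of such an inequality takes $\lambda_{123} = \lambda_{234} = \tfrac12$ and weight $\tfrac12$ on each of the cardinality constraints for $R_{12}$, $R_{23}$, $R_{34}$. This choice of weights is an illustration consistent with the $\tfrac32 \log N$ bound derived there, not a claim about the weights the lemma produces. The inequality follows from two applications of submodularity:

```latex
\begin{align*}
h(A_1A_2A_3) &= h(A_1A_2) + h(A_3 \mid A_1A_2)
              \le h(A_1A_2) + h(A_3 \mid A_2)
              = h(A_1A_2) + h(A_2A_3) - h(A_2), \\
h(A_2A_3A_4) &= h(A_3A_4) + h(A_2 \mid A_3A_4)
              \le h(A_3A_4) + h(A_2), \\
\tfrac12\, h(A_1A_2A_3) + \tfrac12\, h(A_2A_3A_4)
             &\le \tfrac12\, h(A_1A_2) + \tfrac12\, h(A_2A_3) + \tfrac12\, h(A_3A_4).
\end{align*}
```

With each right-hand term bounded by $\log N$, the product $\prod N_{Y|X}^{\delta_{Y|X}}$ evaluates to $N^{1/2} \cdot N^{1/2} \cdot N^{1/2} = N^{3/2}$.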
PANDA (Proof-Assisted eNtropic Degree-Aware)

◮ What?
  ◮ Compute a model for our disjunctive datalog rule
  ◮ Run within $\tilde{O}\left(2^{\text{polymatroid bound}}\right) = \tilde{O}\left(\prod_{(X,Y,N_{Y|X})} N_{Y|X}^{\delta_{Y|X}}\right)$
◮ How? Proof as symbolic instructions
  ◮ Construct a Proof Sequence for the corresponding Shannon-flow inequality
  ◮ Proof steps → relational operators.