joins aggregates optimization
play

Joins Aggregates Optimization https://fdbresearch.github.io Dan - PowerPoint PPT Presentation

Joins Aggregates Optimization https://fdbresearch.github.io Dan Olteanu PhD Open School University of Warsaw November 23, 2018 1 / 78 Acknowledgements Some work reported in this course has been done in the context of the FDB


  1. SumProduct ⊂ FAQ Problem (SumProduct) Given a commutative semiring ( D , ⊕ , ⊗ ) , compute the function � � � � · · · ϕ ( x 1 , . . . , x f ) = ψ S ( x S ) x f +1 x f +2 x n S ∈E For ⊕ = + and ⊗ = ∗ , ϕ can be expressed in SQL as: SELECT x 1 , . . . , x f , SUM( R 1 . val ∗ · · · ∗ R n . val) FROM R 1 NATURAL JOIN . . . R n GROUP BY x 1 , . . . , x f ; where each function ψ i over variables X S is encoded as a relation R i over X S and an additional variable val to account for the values of ψ i . This formulation is equivalent to: SumProduct [D99] Marginalize a Product Function [AM00] 10 / 78

  2. Many examples for SumProduct ( { true , false } , ∨ , ∧ ) ◮ Constraint satisfaction problems ◮ Boolean conjunctive query evaluation ◮ SAT ◮ k -colorability ◮ etc. ( U , ∪ , ∩ ) ◮ Conjunctive query evaluation ( R , + , × ) ◮ Permanent ◮ DFT ◮ Inference in probabilistic graphical models ◮ #CSP ◮ Matrix chain multiplication ◮ Aggregates in DB ( R + , max , × ) ◮ MAP queries in probabilistic graphical models 11 / 78

  3. SumProduct Example 1: Boolean Query Evaluation Boolean Conjunctive Queries: Boolean query Φ with set rels (Φ) of relation symbols Each relation symbol R ∈ rels (Φ) has variables vars ( R ) � Φ = ∃ X 1 . . . ∃ X n : R ( vars ( R )) R ∈ rels (Φ) FAQ encoding: � � φ = ψ S ( x S ) , where x S ∈E φ has the hypergraph ( V , E ) with V = � R ∈ rels (Φ) vars ( R ) and E = { vars ( R ) | R ∈ rels (Φ) } For each S ∈ E , there is a factor ψ S such that ψ S ( x S ) = ( x S ∈ R ) 12 / 78

  4. SumProduct Example 2: Matrix Chain Multiplication Compute the product A = A 1 · · · A n of n matrices Each matrix A i is over field F and has dimensions p i × p i +1 FAQ encoding: We use n + 1 variables X 1 , . . . , X n +1 with domains Dom( X i ) = [ p i ] Each matrix A i can be viewed as a function of two variables: ψ i , i +1 : Dom( X i ) × Dom( X i +1 ) → F , where ψ i , i +1 ( x i , x i +1 ) = ( A i ) x i x i +1 The problem is now to compute the FAQ expression � � � φ ( x 1 , x n +1 ) = · · · ψ i , i +1 ( x i , x i +1 ) . x 2 ∈ Dom( X 2 ) x n ∈ Dom( X n ) i ∈ [ n ] 13 / 78

  5. SumProduct Example 3: Queries in Graphical Models Discrete undirected graphical model represented by a hypergraph ( V , E ) V = { X 1 , . . . , X n } consists of n discrete random variables There is a factor ψ S : � i ∈ S Dom( X i ) → R + for each edge S ∈ E FAQ expression to compute the marginal Maximum A Posteriori estimates: � φ ( x 1 , . . . , x f ) = x f +1 ∈ Dom( X f +1 ) · · · max max ψ S ( x S ) x n ∈ Dom( X n ) S ∈E FAQ expression to compute the marginal distribution of variables X 1 , . . . , X f : � � � φ ( x 1 , . . . , x f ) = · · · ψ S ( x S ) x f +1 ∈ Dom( X f +1 ) x n ∈ Dom( X n ) S ∈E For conditional distributions prob ( X A | X B = x B ), we set X B to x B . 14 / 78

  6. Outline of Part 2: Aggregates 15 / 78

  7. How to compute a SumProduct FAQ ϕ Find a variable order for ϕ Compute ϕ by eliminating variables in the given order This is a dynamic programming algorithm. Two flavours: ◮ FDB: Top-down with memoization (caching) [BKOZ13] We exemplify two variants: 1. Compute the factorized join and the aggregates in one pass over the factorization 2. Translate the factorized computation into relational queries ◮ InsideOut: Bottom-up with indicator projections [ANR16] The complexity is given by the width of the variable order: Given a database of size N , an FAQ ϕ , a variable order for ϕ with width w , ϕ can be computed in time O ( N w + | OUT | ), where | OUT | is the output size. 16 / 78

  8. Finding a Variable Order for a SumProduct FAQ ϕ First attempt: Same variable order ∆ as for the join part of ϕ One wrinkle: What if not all variables are free? 17 / 78

  9. Finding a Variable Order for a SumProduct FAQ ϕ First attempt: Same variable order ∆ as for the join part of ϕ One wrinkle: What if not all variables are free? The free variables sit above the bound variables in ∆. [BKOZ13,OZ15] Equivalent constraint for hypertree decompositions: [ANR16] Take a hypertree decomposition for the join part of ϕ such that the bags with the free variables form a connected subtree. Implication on complexity: The width for ϕ is at least the width for its join part ⇒ ϕ may be more expensive than its join part if it has free variables This new width is called the FAQ-width in the literature [ANR16] 17 / 78

  10. Computing COUNT over Factorized Join using FDB ∪ ∅ dish � burger � � hotdog � × × ∪ ∪ ∪ ∪ { dish } { dish } � Monday � � Friday � � patty � � bun � � onion � � bun �� onion � � sausage � � Friday � day item × × × × × × × × × { dish , ∪ ∪ ∪ ∪ ∪ ∪ ∪ day } { item } price � Elise � � Elise � � 6 � � 2 � � 2 � � 4 � � Joe � � Steve � customer ϕ = � ... O ( customer , day , dish ) · D ( dish , item ) · I ( item , price ) In SQL: SELECT COUNT(*) FROM O NATURAL JOIN .. I; We change the semiring to ( N , + , ∗ ): ◮ values �→ 1 ∪ �→ + × �→ ∗ 18 / 78

  11. Computing COUNT over Factorized Join using FDB + 12 ∅ dish 1 1 6 6 ∗ ∗ 2 3 3 2 + + + + { dish } { dish } day item 1 1 1 1 1 1 1 1 1 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ 1 1 1 1 2 { dish , + + + + + + + 1 1 day } { item } customer price 1 1 1 1 1 1 1 1 ϕ = � ... O ( customer , day , dish ) · D ( dish , item ) · I ( item , price ) In SQL: SELECT COUNT(*) FROM O NATURAL JOIN .. I; We change the semiring to ( N , + , ∗ ): ◮ values �→ 1 ∪ �→ + × �→ ∗ 19 / 78

  12. Computing SUM over Factorized Join using FDB ∪ � burger � � hotdog � ∅ dish × × ∪ ∪ ∪ ∪ { dish } { dish } day item � Monday � � Friday � � patty � � bun � � onion � � bun �� onion � � sausage � � Friday � × × × × × × × × × { dish , ∪ ∪ ∪ ∪ ∪ ∪ ∪ day } { item } � Elise � � Elise � � 6 � � 2 � � 2 � � 4 � � Joe � � Steve � customer price ϕ = � ... f (dish) · price · O ( customer , day , dish ) · D ( dish , item ) · I ( item , price ) In SQL: SELECT SUM( f (dish) * price) FROM O NATURAL JOIN .. I; ◮ Assume there is a function f that turns dish into reals or indicator vectors. ◮ All values except for dish & price �→ 1, ∪ �→ +, × �→ ∗ . 20 / 78

  13. Computing SUM over Factorized Join using FDB 20 ∗ f ( � burger � )+16 ∗ f ( � hotdog � ) + f ( � burger � ) f ( � hotdog � ) ∅ dish 20 16 ∗ ∗ 2 10 8 2 + + + + { dish } { dish } day item 1 1 1 1 1 1 1 1 1 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ 1 1 6 4 2 { dish , + + + + + + + 2 2 day } { item } customer price 1 1 6 2 2 4 1 1 ϕ = � ... f (dish) · price · O ( customer , day , dish ) · D ( dish , item ) · I ( item , price ) In SQL: SELECT SUM( f (dish) * price) FROM O NATURAL JOIN .. I; ◮ Assume there is a function f that turns dish into reals or indicator vectors. ◮ All values except for dish & price �→ 1, ∪ �→ +, × �→ ∗ . 21 / 78

  14. Computing SUM over Factorized Join using FDB 20 ∗ f ( � burger � )+16 ∗ f ( � hotdog � ) + ∅ f ( � burger � ) f ( � hotdog � ) dish 20 16 ∗ ∗ 2 10 8 2 + + + + { dish } { dish } day item 1 1 1 1 1 1 1 1 1 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ 1 1 6 4 2 { dish , + + + + + + + 2 2 day } { item } price customer 1 1 6 2 2 4 1 1 If f turns dish into indicator vectors: ϕ ( dish ) = � ... price · O ( customer , day , dish ) · D ( dish , item ) · I ( item , price ) In SQL: SELECT dish, SUM(price) FROM O NATURAL JOIN..I GROUP BY dish; 22 / 78

  15. To Compute or Not To Compute the Factorized Join Aggregates can be computed without materializing the factorized join [OZ15,OS16,ANNOS18a+b] The factorized join becomes the trace of the aggregate computation This is called factorized aggregate computation 23 / 78

  16. Example of Factorized Computation via Query Rewriting The 4-path count query Q 4 on a graph with 4 copies of the edge relation E : � V 1 ( a ) · E 1 ( a , b ) · E 2 ( b , c ) · E 3 ( c , d ) · E 4 ( d , e ) · V 2 ( e ) Q 4 () = � �� � a , b , c , d , e J ( a , b , c , d , e ) E 2 E 3 C C key ( C ) = ∅ key ( B ) = key ( D ) = { C } B D B D key ( A ) = { B } E 1 E 4 V 1 A E V 2 A key ( E ) = { D } E 24 / 78

  17. Example of Factorized Computation via Query Rewriting The 4-path count query Q 4 on a graph with 4 copies of the edge relation E : � V 1 ( a ) · E 1 ( a , b ) · E 2 ( b , c ) · E 3 ( c , d ) · E 4 ( d , e ) · V 2 ( e ) Q 4 () = � �� � a , b , c , d , e J ( a , b , c , d , e ) E 2 E 3 C C key ( C ) = ∅ key ( B ) = key ( D ) = { C } B D B D key ( A ) = { B } E 1 E 4 V 1 A E V 2 A key ( E ) = { D } E Sizes for listing/factorized representations of the result of the join J of Q 4 ρ ∗ ( J ) = 3 ⇒ listing representation has size O ( | E | 3 ). fhtw ( J ) = 1 ⇒ factorization with caching has size O ( | E | ). 24 / 78

  18. Example of Factorized Computation via Query Rewriting We would like to compute Q 4 : in O ( | E | ) time (no free variables, so use best variable order) using optimized queries that are derived from the variable order of Q 4 without materializing the factorized join J 25 / 78

  19. Example of Factorized Computation via Query Rewriting E 2 E 3 C ⇒ B D E 1 E 4 V 1 V 2 A E 26 / 78

  20. Example of Factorized Computation via Query Rewriting E 2 E 3 E 2 E 3 C C ⇒ ⇒ B D U 1 B D E 1 E 4 E 4 V 1 V 2 V 2 A E E � U 1 ( b ) = V 1 ( a ) · E 1 ( b , a ) a 26 / 78

  21. Example of Factorized Computation via Query Rewriting E 2 E 3 E 2 E 3 U 2 C C C E 3 ⇒ ⇒ ⇒ B D U 1 B D D E 1 E 4 E 4 E 4 V 1 V 2 V 2 V 2 A E E E � � U 1 ( b ) = V 1 ( a ) · E 1 ( b , a ) U 2 ( c ) = E 2 ( c , b ) · U 1 ( b ) a b 26 / 78

  22. Example of Factorized Computation via Query Rewriting E 2 E 3 E 2 E 3 U 2 U 2 C C C C E 3 E 3 ⇒ ⇒ ⇒ ⇒ B D U 1 B D D U 3 D E 1 E 4 E 4 E 4 V 1 V 2 V 2 V 2 A E E E � � U 1 ( b ) = V 1 ( a ) · E 1 ( b , a ) U 2 ( c ) = E 2 ( c , b ) · U 1 ( b ) a b � U 3 ( d ) = V 2 ( e ) · E 4 ( d , e ) e 26 / 78

  23. Example of Factorized Computation via Query Rewriting E 2 E 3 E 2 E 3 U 2 U 2 C C C C E 3 E 3 U 2 ⇒ ⇒ ⇒ ⇒ ⇒ B D U 1 B D D U 3 D C U 4 E 1 E 4 E 4 E 4 V 1 V 2 V 2 V 2 A E E E � � U 1 ( b ) = V 1 ( a ) · E 1 ( b , a ) U 2 ( c ) = E 2 ( c , b ) · U 1 ( b ) a b � � U 3 ( d ) = V 2 ( e ) · E 4 ( d , e ) U 4 ( c ) = E 3 ( c , d ) · U 3 ( d ) e d 26 / 78

  24. Example of Factorized Computation via Query Rewriting E 2 E 3 E 2 E 3 U 2 U 2 C C C C E 3 E 3 U 2 U 5 ⇒ ⇒ ⇒ ⇒ ⇒ B D U 1 B D D U 3 D C U 4 E 1 E 4 E 4 E 4 V 1 V 2 V 2 V 2 A E E E � � U 1 ( b ) = V 1 ( a ) · E 1 ( b , a ) U 2 ( c ) = E 2 ( c , b ) · U 1 ( b ) a b � � U 3 ( d ) = V 2 ( e ) · E 4 ( d , e ) U 4 ( c ) = E 3 ( c , d ) · U 3 ( d ) e d � U 5 = U 2 ( c ) · U 4 ( c ) c 26 / 78

  25. Example of Factorized Computation via Query Rewriting E 2 E 3 E 2 E 3 U 2 U 2 C C C C E 3 E 3 U 2 U 5 ⇒ ⇒ ⇒ ⇒ ⇒ B D U 1 B D D U 3 D C U 4 E 1 E 4 E 4 E 4 V 1 V 2 V 2 V 2 A E E E This computation strategy corresponds to the following query rewriting: � V 1 ( a ) · E 1 ( b , a ) · E 2 ( c , b ) · E 3 ( c , d ) · E 4 ( d , e ) · V 2 ( e ) = a , b , c , d , e �� �� �� �� � �� �� E 2 ( c , b ) · V 1 ( a ) · E 1 ( b , a ) · E 3 ( c , d ) · E 4 ( d , e ) · V 2 ( e ) c a e b d 27 / 78

  26. Example of FAQ Computation using InsideOut � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 key ( X 2 ) = ∅ X 2 X 2 key ( X 1 ) = { X 2 } ψ 2 ψ 1 X 1 X 4 X 1 X 4 key ( X 3 ) = { X 1 , X 2 } key ( X 4 ) = { X 2 } X 3 X 5 X 3 X 5 key ( X 5 ) = { X 2 , X 4 } ψ 3 key ( X 6 ) = { X 4 , X 5 } key ( X 8 ) = { X 6 } ψ 5 X 7 X 6 X 7 X 6 ψ 4 key ( X 7 ) = { X 5 } X 8 X 8 28 / 78

  27. Example of FAQ Computation using InsideOut � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 key ( X 2 ) = ∅ X 2 X 2 key ( X 1 ) = { X 2 } ψ 2 ψ 1 X 1 X 4 X 1 X 4 key ( X 3 ) = { X 1 , X 2 } key ( X 4 ) = { X 2 } X 3 X 5 X 3 X 5 key ( X 5 ) = { X 2 , X 4 } ψ 3 key ( X 6 ) = { X 4 , X 5 } key ( X 8 ) = { X 6 } ψ 5 X 7 X 6 X 7 X 6 ψ 4 key ( X 7 ) = { X 5 } X 8 X 8 ρ ∗ ( ϕ ) = 4, s ( ϕ ) = 2, fhtw ( ϕ ) = 1. The above variable order ∆ has the free variables x 1 , x 2 , x 4 on top of the others and fhtw (∆) = 1. The query result has size: O ( N ) when factorized; O ( N 2 ) when listed 28 / 78

  28. Example of FAQ Computation using InsideOut X 2 X 1 X 4 X 3 X 5 X 7 X 6 X 8 � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 29 / 78

  29. Example of FAQ Computation using InsideOut X 2 X 1 X 4 X 3 X 5 X 7 X 6 X 8 � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 � � � � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 5 , x 6 , x 7 , x 8 x 3 � �� � ψ 6( x 1 , x 2) 30 / 78

  30. Example of FAQ Computation using InsideOut X 2 X 2 ⇒ X 1 X 4 X 1 X 4 X 3 X 5 X 5 X 7 X 6 X 7 X 6 X 8 X 8 � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 , x 8 31 / 78

  31. Example of FAQ Computation using InsideOut X 2 X 2 ⇒ X 1 X 4 X 1 X 4 X 3 X 5 X 5 X 7 X 6 X 7 X 6 X 8 X 8 � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 , x 8 � � � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 5 , x 6 , x 7 x 8 � �� � ψ 7( x 6) 32 / 78

  32. Example of FAQ Computation using InsideOut X 2 X 2 X 2 ⇒ ⇒ X 1 X 4 X 1 X 4 X 1 X 4 X 3 X 5 X 5 X 5 X 7 X 6 X 7 X 6 X 7 X 6 X 8 X 8 � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 33 / 78

  33. Example of FAQ Computation using InsideOut X 2 X 2 X 2 ⇒ ⇒ X 1 X 4 X 1 X 4 X 1 X 4 X 3 X 5 X 5 X 5 X 7 X 6 X 7 X 6 X 7 X 6 X 8 X 8 � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 � � � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 5 ( x 5 , x 7 ) x 5 , x 6 x 7 � �� � ψ 8( x 5) 34 / 78

  34. Example of FAQ Computation using InsideOut X 2 X 2 X 2 X 2 ⇒ ⇒ ⇒ X 1 X 4 X 1 X 4 X 1 X 4 X 1 X 4 X 3 X 5 X 5 X 5 X 5 X 7 X 6 X 7 X 6 X 7 X 6 X 6 X 8 X 8 � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 8 ( x 5 ) O ( N ) x 5 , x 6 35 / 78

  35. Example of FAQ Computation using InsideOut X 2 X 2 X 2 X 2 ⇒ ⇒ ⇒ X 1 X 4 X 1 X 4 X 1 X 4 X 1 X 4 X 3 X 5 X 5 X 5 X 5 X 7 X 6 X 7 X 6 X 7 X 6 X 6 X 8 X 8 � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 8 ( x 5 ) O ( N ) x 5 , x 6 � � � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 8 ( x 5 ) x 5 x 6 � �� � ψ 9( x 4 , x 5) 36 / 78

  36. Example of FAQ Computation using InsideOut X 2 X 2 X 2 X 2 X 2 ⇒ ⇒ ⇒ ⇒ X 1 X 4 X 1 X 4 X 1 X 4 X 1 X 4 X 1 X 4 X 3 X 5 X 5 X 5 X 5 X 5 X 7 X 6 X 7 X 6 X 7 X 6 X 6 X 8 X 8 � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 8 ( x 5 ) O ( N ) x 5 , x 6 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 9 ( x 4 , x 5 ) · ψ 8 ( x 5 ) O ( N ) x 5 37 / 78

  37. Example of FAQ Computation using InsideOut X 2 X 2 X 2 X 2 X 2 ⇒ ⇒ ⇒ ⇒ X 1 X 4 X 1 X 4 X 1 X 4 X 1 X 4 X 1 X 4 X 3 X 5 X 5 X 5 X 5 X 5 X 7 X 6 X 7 X 6 X 7 X 6 X 6 X 8 X 8 � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 8 ( x 5 ) O ( N ) x 5 , x 6 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 9 ( x 4 , x 5 ) · ψ 8 ( x 5 ) O ( N ) x 5 � � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 9 ( x 4 , x 5 ) · ψ 8 ( x 5 ) x 5 � �� � ψ 10( x 2 , x 4) 38 / 78

  38. Example of FAQ Computation using InsideOut X 2 X 2 X 2 X 2 X 2 X 2 ⇒ ⇒ ⇒ ⇒ ⇒ X 1 X 4 X 1 X 4 X 1 X 4 X 1 X 4 X 1 X 4 X 1 X 4 X 3 X 5 X 5 X 5 X 5 X 5 X 7 X 6 X 7 X 6 X 7 X 6 X 6 X 8 X 8 � ϕ ( x 1 , x 2 , x 4 ) = ψ 1 ( x 1 , x 2 , x 3 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) x 3 , x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 4 ( x 6 , x 8 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 , x 8 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 5 ( x 5 , x 7 ) O ( N ) x 5 , x 6 , x 7 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 3 ( x 4 , x 5 , x 6 ) · ψ 7 ( x 6 ) · ψ 8 ( x 5 ) O ( N ) x 5 , x 6 � � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 4 , x 5 ) · ψ 9 ( x 4 , x 5 ) · ψ 8 ( x 5 ) O ( N ) x 5 � ϕ ( x 1 , x 2 , x 4 ) = ψ 6 ( x 1 , x 2 ) · ψ 10 ( x 2 , x 4 ) O ( N ) 39 / 78

  39. Example of FAQ Computation with Indicator Projections � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 3 , x 4 , x 5 X 1 X 1 key ( X 1 ) = ∅ ψ 1 ψ 4 key ( X 2 ) = { X 1 } key ( X 3 ) = { X 1 , X 2 } X 2 X 4 X 2 X 4 key ( X 4 ) = { X 1 } ψ 2 ψ 3 ψ 6 ψ 5 key ( X 5 ) = { X 1 , X 4 } X 3 X 5 X 3 X 5 40 / 78

  40. Example of FAQ Computation with Indicator Projections � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 3 , x 4 , x 5 X 1 X 1 key ( X 1 ) = ∅ ψ 1 ψ 4 key ( X 2 ) = { X 1 } key ( X 3 ) = { X 1 , X 2 } X 2 X 4 X 2 X 4 key ( X 4 ) = { X 1 } ψ 2 ψ 3 ψ 6 ψ 5 key ( X 5 ) = { X 1 , X 4 } X 3 X 5 X 3 X 5 ρ ∗ ( ϕ ) = 2 . 5, s ( ϕ ) = 1 . 5, fhtw ( ϕ ) = 1 . 5. The above variable order ∆ has the free variable x 1 on top of the others and fhtw (∆) = 1 . 5. The (unary) query result has size O ( N ) when factorized or listed. 40 / 78

  41. Example of FAQ Computation with Indicator Projections X 1 X 2 X 4 X 3 X 5 � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 3 , x 4 , x 5 41 / 78

  42. Example of FAQ Computation with Indicator Projections X 1 X 2 X 4 X 3 X 5 � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 3 , x 4 , x 5 � � � � ψ ′ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ ′ 4 ( x 1 ) · ψ ′ ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · 6 ( x 1 ) · x 2 , x 4 , x 5 x 3 � �� � ψ 7( x 1 , x 2) ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) ψ ′ 1 is an indicator projection of ψ 1 (similarly, ψ ′ 4 and ψ ′ 6 ): It has the same support as ψ 1 , i.e., same tuples ( x 1 , x 2 ) ψ ′ 1 ( x 1 , x 2 ) = 1 even in case ψ 1 ( x 1 , x 2 ) � = 1 and ψ 1 ( x 1 , x 2 ) � = 0 42 / 78

  43. Example of FAQ Computation with Indicator Projections X 1 X 1 ⇒ X 2 X 4 X 2 X 4 X 3 X 5 X 5 � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 3 , x 4 , x 5 � O ( N 1 . 5 ) � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 7 ( x 1 , x 2 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 4 , x 5 43 / 78

  44. Example of FAQ Computation with Indicator Projections X 1 X 1 ⇒ X 2 X 4 X 2 X 4 X 3 X 5 X 5 � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 3 , x 4 , x 5 � O ( N 1 . 5 ) � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 7 ( x 1 , x 2 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 4 , x 5 � � � � ψ 1 ( x 1 , x 2 ) · ψ 7 ( x 1 , x 2 ) · ψ ′ 4 ( x 1 ) · ψ ′ ϕ ( x 1 ) = 6 ( x 1 ) · x 4 , x 5 x 2 � �� � ψ 8( x 1) ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) The indicator projections ψ ′ 4 and ψ ′ 6 are redundant here, as they were already used for computing φ 7 . 44 / 78

  45. Example of FAQ Computation with Indicator Projections X 1 X 1 X 1 ⇒ ⇒ X 2 X 4 X 2 X 4 X 4 X 3 X 5 X 5 X 5 � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 3 , x 4 , x 5 � O ( N 1 . 5 ) � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 7 ( x 1 , x 2 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 4 , x 5 � � ϕ ( x 1 ) = ψ 8 ( x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) O ( N ) x 4 , x 5 45 / 78

  46. Example of FAQ Computation with Indicator Projections X 1 X 1 X 1 ⇒ ⇒ X 2 X 4 X 2 X 4 X 4 X 3 X 5 X 5 X 5 � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 3 , x 4 , x 5 � O ( N 1 . 5 ) � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 7 ( x 1 , x 2 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 4 , x 5 � � ϕ ( x 1 ) = ψ 8 ( x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) O ( N ) x 4 , x 5 � � � � ψ ′ 8 ( x 1 ) · ψ ′ ϕ ( x 1 ) = ψ 8 ( x 1 ) · ψ 4 ( x 1 , x 4 ) · 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 4 x 5 � �� � ψ 9( x 1 , x 4) 46 / 78

  47. Example of FAQ Computation with Indicator Projections X 1 X 1 X 1 X 1 ⇒ ⇒ ⇒ X 2 X 4 X 2 X 4 X 4 X 4 X 3 X 5 X 5 X 5 � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 3 , x 4 , x 5 � O ( N 1 . 5 ) � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 7 ( x 1 , x 2 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 4 , x 5 � � ϕ ( x 1 ) = ψ 8 ( x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) O ( N ) x 4 , x 5 � O ( N 1 . 5 ) � ϕ ( x 1 ) = ψ 8 ( x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 9 ( x 1 , x 4 ) x 4 47 / 78

  48. Example of FAQ Computation with Indicator Projections X 1 X 1 X 1 X 1 ⇒ ⇒ ⇒ X 2 X 4 X 2 X 4 X 4 X 4 X 3 X 5 X 5 X 5 � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 3 , x 4 , x 5 � O ( N 1 . 5 ) � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 7 ( x 1 , x 2 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 4 , x 5 � � ϕ ( x 1 ) = ψ 8 ( x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) O ( N ) x 4 , x 5 � O ( N 1 . 5 ) � ϕ ( x 1 ) = ψ 8 ( x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 9 ( x 1 , x 4 ) x 4 � � � ψ ′ ϕ ( x 1 ) = ψ 8 ( x 1 ) · 8 ( x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 9 ( x 1 , x 4 ) x 4 � �� � ψ 10( x 1) 48 / 78

  49. Example of FAQ Computation with Indicator Projections X 1 X 1 X 1 X 1 X 1 ⇒ ⇒ ⇒ ⇒ X 2 X 4 X 2 X 4 X 4 X 4 X 3 X 5 X 5 X 5 � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 2 ( x 2 , x 3 ) · ψ 3 ( x 3 , x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 3 , x 4 , x 5 � O ( N 1 . 5 ) � ϕ ( x 1 ) = ψ 1 ( x 1 , x 2 ) · ψ 7 ( x 1 , x 2 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) x 2 , x 4 , x 5 � � ϕ ( x 1 ) = ψ 8 ( x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 5 ( x 4 , x 5 ) · ψ 6 ( x 5 , x 1 ) O ( N ) x 4 , x 5 � O ( N 1 . 5 ) � ϕ ( x 1 ) = ψ 8 ( x 1 ) · ψ 4 ( x 1 , x 4 ) · ψ 9 ( x 1 , x 4 ) x 4 � ϕ ( x 1 ) = ψ 8 ( x 1 ) · ψ 10 ( x 1 ) O ( N ) 49 / 78

  50. Is Factorized Aggregate Computation Practical? A glimpse at performance experiments [ANNOS18b] Retailer dataset (records) excerpt (17M) full (86M) PostgreSQL computing the join 50.63 sec 216.56 sec Aggregates for a linear regression model FDB computing join+aggregates 25.51 sec 380.31 sec Number of aggregates (scalar+group-by) 595+2,418 595+145k Aggregates for a polynomial regression model FDB computing join+aggregates 132.43 sec 1,819.80 sec Number of aggregates (scalar+group-by) 158k+742k 158k+37M In this experiment: FDB only used one core of a commodity machine For both PostgreSQL and FDB, the dataset was entirely in memory The aggregates represent gradients (or parts thereof) used for learning degree 1 and 2 polynomial regression models 50 / 78

  51. Outline of Part 2: Aggregates 51 / 78

  52. Problem Setting Maintain the triangle count Q under single-tuple updates to R , S , and T ! A R T B C S Q counts the number of tuples in the join of R, S, and T . Q = � a , b , c R ( a , b ) · S ( b , c ) · T ( c , a ) 52 / 78

  53. Updates to the Triangle Count R S T A B B C C A a 1 b 1 2 b 1 c 1 2 c 1 a 1 1 3 1 3 a 2 b 1 b 1 c 2 c 2 a 1 c 2 a 2 3 53 / 78

  54. Updates to the Triangle Count R · S · T R S T A B B C C A A B C 2 · 2 · 1 = 4 a 1 b 1 2 b 1 c 1 2 c 1 a 1 1 a 1 b 1 c 2 3 1 3 a 2 b 1 b 1 c 2 c 2 a 1 c 2 a 2 3 53 / 78

  55. Updates to the Triangle Count R · S · T R S T A B B C C A A B C 2 · 2 · 1 = 4 a 1 b 1 2 b 1 c 1 2 c 1 a 1 1 a 1 b 1 c 2 3 1 3 2 · 1 · 3 = 6 a 2 b 1 b 1 c 2 c 2 a 1 a 1 b 1 c 2 c 2 a 2 3 a 2 b 1 c 3 3 · 1 · 3 = 9 53 / 78

  56. Updates to the Triangle Count R · S · T R S T A B B C C A A B C 2 · 2 · 1 = 4 a 1 b 1 2 b 1 c 1 2 c 1 a 1 1 a 1 b 1 c 2 3 1 3 2 · 1 · 3 = 6 a 2 b 1 b 1 c 2 c 2 a 1 a 1 b 1 c 2 c 2 a 2 3 a 2 b 1 c 3 3 · 1 · 3 = 9 Q ( D ) ∅ ( ) 4 + 6 + 9 = 19 53 / 78

  57. Updates to the Triangle Count R · S · T R S T A B B C C A A B C 2 · 2 · 1 = 4 a 1 b 1 2 b 1 c 1 2 c 1 a 1 1 a 1 b 1 c 2 3 1 3 2 · 1 · 3 = 6 a 2 b 1 b 1 c 2 c 2 a 1 a 1 b 1 c 2 c 2 a 2 3 a 2 b 1 c 3 3 · 1 · 3 = 9 δ R = { ( a 2 , b 1 ) �→ − 2 } Q ( D ) A B ∅ a 2 b 1 − 2 ( ) 4 + 6 + 9 = 19 53 / 78

  58. Updates to the Triangle Count R · S · T R S T A B B C C A A B C 2 · 2 · 1 = 4 a 1 b 1 2 b 1 c 1 2 c 1 a 1 1 a 1 b 1 c 2 3 1 3 2 · 1 · 3 = 6 a 2 b 1 b 1 c 2 c 2 a 1 a 1 b 1 c 2 c 2 a 2 3 a 2 b 1 c 3 3 · 1 · 3 = 9 δ R = { ( a 2 , b 1 ) �→ − 2 } Q ( D ) A B ∅ a 2 b 1 − 2 ( ) 4 + 6 + 9 = 19 53 / 78

  59. Updates to the Triangle Count R · S · T R S T A B B C C A A B C 2 · 2 · 1 = 4 a 1 b 1 2 b 1 c 1 2 c 1 a 1 1 a 1 b 1 c 2 1 1 3 2 · 1 · 3 = 6 a 2 b 1 b 1 c 2 c 2 a 1 a 1 b 1 c 2 c 2 a 2 3 a 2 b 1 c 3 3 · 1 · 3 = 9 δ R = { ( a 2 , b 1 ) �→ − 2 } Q ( D ) A B ∅ a 2 b 1 − 2 ( ) 4 + 6 + 9 = 19 53 / 78

  60. Updates to the Triangle Count R · S · T R S T A B B C C A A B C 2 · 2 · 1 = 4 a 1 b 1 2 b 1 c 1 2 c 1 a 1 1 a 1 b 1 c 2 1 1 3 2 · 1 · 3 = 6 a 2 b 1 b 1 c 2 c 2 a 1 a 1 b 1 c 2 c 2 a 2 3 a 2 b 1 c 3 3 · 1 · 3 = 9 δ R = { ( a 2 , b 1 ) �→ − 2 } Q ( D ) A B ∅ a 2 b 1 − 2 ( ) 4 + 6 + 9 = 19 53 / 78

  61. Updates to the Triangle Count R · S · T R S T A B B C C A A B C 2 · 2 · 1 = 4 a 1 b 1 2 b 1 c 1 2 c 1 a 1 1 a 1 b 1 c 2 1 1 3 2 · 1 · 3 = 6 a 2 b 1 b 1 c 2 c 2 a 1 a 1 b 1 c 2 c 2 a 2 3 a 2 b 1 c 3 1 · 1 · 3 = 3 δ R = { ( a 2 , b 1 ) �→ − 2 } Q ( D ) A B ∅ a 2 b 1 − 2 ( ) 4 + 6 + 9 = 19 53 / 78

  62. Updates to the Triangle Count R · S · T R S T A B B C C A A B C 2 · 2 · 1 = 4 a 1 b 1 2 b 1 c 1 2 c 1 a 1 1 a 1 b 1 c 2 1 1 3 2 · 1 · 3 = 6 a 2 b 1 b 1 c 2 c 2 a 1 a 1 b 1 c 2 c 2 a 2 3 a 2 b 1 c 3 1 · 1 · 3 = 3 δ R = { ( a 2 , b 1 ) �→ − 2 } Q ( D ) A B ∅ a 2 b 1 − 2 ( ) 4 + 6 + 3 = 13 53 / 78

  63. Data Updates need the Additive Inverse Data updates can be inserts (tuples with positive multiplicity) and deletes (tuples with negative multiplicity): Semirings are enough if we only want inserts or no updates Recall that FAQs use commutative semirings ( D , ⊕ , ⊗ ): ( D , ⊕ ) is a commutative monoid with identity element 0 : ◮ ( a ⊕ b ) ⊕ c = a ⊕ ( b ⊕ c ) ◮ 0 ⊕ a = a ⊕ 0 = a ◮ a ⊕ b = b ⊕ a ( D , ⊗ ) is a commutative monoid with identity element 1 : ◮ ( a ⊗ b ) ⊗ c = a ⊗ ( b ⊗ c ) ◮ 1 ⊗ a = a ⊗ 1 = a ◮ a ⊕ b = b ⊕ a Multiplication distributes over addition: ◮ a ⊗ ( b ⊕ c ) = ( a ⊗ b ) ⊕ ( a ⊗ c ) Multiplication by 0 annihilates D : ◮ 0 ⊗ a = a ⊗ 0 = 0 54 / 78

  64. From Semirings to Rings We need a commutative ring ( D , ⊕ , ⊗ ) if we want to support deletes as well: ( D , ⊕ ) is an abelian group with identity element 0 : ◮ ( a ⊕ b ) ⊕ c = a ⊕ ( b ⊕ c ) ◮ 0 ⊕ a = a ⊕ 0 = a ◮ a ⊕ b = b ⊕ a ◮ ∃ − a ∈ D : a ⊕ ( − a ) = ( − a ) ⊕ a = 0 ( D , ⊗ ) is a commutative monoid with identity element 1 : ◮ ( a ⊗ b ) ⊗ c = a ⊗ ( b ⊗ c ) ◮ 1 ⊗ a = a ⊗ 1 = a ◮ a ⊕ b = b ⊕ a Multiplication distributes over addition: ◮ a ⊗ ( b ⊕ c ) = ( a ⊗ b ) ⊕ ( a ⊗ c ) Multiplication by 0 annihilates D : ◮ 0 ⊗ a = a ⊗ 0 = 0 Examples: Z , Q , R , C , R n , polynomial ring. We used the ring ( Z , + , ∗ ) in our previous example. 55 / 78

  65. The Maintenance Problem single-tuple single-tuple single-tuple update update update D 0 D 1 D 2 database auxiliary maintain maintain data A 0 A 1 A 2 structure maintain maintain triangle Q ( D 0 ) Q ( D 1 ) Q ( D 2 ) count Given a current database D and a single-tuple update, what are the time and space complexities for maintaining Q ( D )? 56 / 78

  66. Much Ado about Triangles The Triangle Query Served as Milestone in Many Fields Worst-case optimal join algorithms [Algorithmica 1997, SIGMOD R. 2013] Parallel query evaluation [Found. & Trends DB 2018] Randomized approximation in static settings [FOCS 2015] Randomized approximation in data streams [SODA 2002, COCOON 2005, PODS 2006, PODS 2016, Theor. Comput. Sci. 2017] Investigation of Answering Queries under Updates Theoretical developments [PODS 2017, ICDT 2018] Systems developments [F. & T. DB 2012, VLDB J. 2014, SIGMOD 2017, 2018] Lower bounds [STOC 2015, ICM 2018] 57 / 78

  67. Na¨ ıve Maintenance “ Compute from scratch! ” δ R = { ( a ′ , b ′ ) �→ m } � � � R ( a , b ) + δ R ( a , b ) · S ( b , c ) · T ( c , a ) a , b , c � �� � newR = � a , b , c newR ( a , b ) · S ( b , c ) · T ( c , a ) Maintenance Complexity Time: O ( | D | 1 . 5 ) using worst-case optimal join algorithms Space: O ( | D | ) to store input relations 58 / 78

  68. Classical Incremental View Maintenance [CY12] “ Compute the difference! ” δ R = { ( a ′ , b ′ ) �→ m } � � � R ( a , b ) + δ R ( a , b ) · S ( b , c ) · T ( c , a ) a , b , c = � a , b , c R ( a , b ) · S ( b , c ) · T ( c , a ) + δ R ( a ′ , b ′ ) · � c S ( b ′ , c ) · T ( c , a ′ ) Maintenance Complexity Time: O ( | D | ) to intersect C -values from S and T Space: O ( | D | ) to store input relations 59 / 78

  69. Factorized Incremental View Maintenance [NO18] “ Compute the difference by using pre-materialized views! ” δ R = { ( a ′ , b ′ ) �→ m } Pre-materialize V ST ( b , a ) = � c S ( b , c ) · T ( c , a )! � � � R ( a , b ) + δ R ( a , b ) · S ( b , c ) · T ( c , a ) a , b , c = � a , b , c R ( a , b ) · S ( b , c ) · T ( c , a ) + δ R ( a ′ , b ′ ) · V ST ( b ′ , a ′ ) Maintenance Complexity Time for updates to R : O (1) to look up in V ST Time for updates to S and T : O ( | D | ) to maintain V ST Space: O ( | D | 2 ) to store input relations and V ST 60 / 78

  70. Closing the Complexity Gap Complexity bounds for the maintenance of the triangle count Known Upper Bound Maintenance Time: O ( | D | ) O ( | D | ) Space: Lower Bound Amortized maintenance time: not O ( | D | 0 . 5 − γ ) for any γ > 0 (under reasonable complexity theoretic assumptions) 61 / 78

  71. Closing the Complexity Gap Complexity bounds for the maintenance of the triangle count Known Upper Bound Maintenance Time: O ( | D | ) O ( | D | ) Space: Can the triangle count be maintained in sublinear time? Lower Bound Amortized maintenance time: not O ( | D | 0 . 5 − γ ) for any γ > 0 (under reasonable complexity theoretic assumptions) 61 / 78

  72. Closing the Complexity Gap Complexity bounds for the maintenance of the triangle count Known Upper Bound Maintenance Time: O ( | D | ) O ( | D | ) Space: Yes! IVM ε [KNNOZ19] Can the triangle count Amortized maintenance time: be maintained in O ( | D | 0 . 5 ) sublinear time? This is worst-case optimal! Lower Bound Amortized maintenance time: not O ( | D | 0 . 5 − γ ) for any γ > 0 (under reasonable complexity theoretic assumptions) 61 / 78

  73. IVM ε Exhibits a Time-Space Tradeoff [KNNOZ19] Given ε ∈ [0 , 1] and a database D , IVM ε maintains the triangle count with O ( | D | max { ε, 1 − ε } ) amortized update time O ( | D | 1+min { ε, 1 − ε } ) space O ( | D | 3 / 2 ) preprocessing time O (1) answer time. complexity O ( | D | 1 . 5 ) Space Update Time O ( | D | ) worst-case optimality O ( | D | 0 . 5 ) ε = 0 . 5 ε 0 0 . 5 1 Known maintenance approaches are recovered by IVM ε . 62 / 78

  74. Main Ideas in IVM ε Compute the difference like in classical IVM! Materialize views like in Factorized IVM! New ingredient: Use adaptive processing based on data skew! = ⇒ Treat heavy values differently from light values! 63 / 78

  75. Quick Look inside IVM ε Partition R into a light part R L = { t ∈ R | | σ A = t . A | < | D | ε } , a heavy part R H = R \ R L ! R light part A B R L · · A B a b 1 . . . . . . n < | D | ε . . . . . . a b n · · heavy part · · R H a ′ b ′ 1 A B . . . . . . m ≥ | D | ε . . . . . . . . . . . . a ′ b ′ m · · 64 / 78

  76. Quick Look inside IVM ε Derived Bounds Partition R into for all A -values a : a light part | σ A = a R L | < | D | ε R L = { t ∈ R | | σ A = t . A | < | D | ε } , | π A R H | ≤ | D | 1 − ε a heavy part R H = R \ R L ! R light part A B R L · · A B a b 1 . . . . . . n < | D | ε . . . . . . a b n · · heavy part · · R H a ′ b ′ 1 A B . . . . . . m ≥ | D | ε . . . . . . . . . . . . a ′ b ′ m · · 64 / 78

  77. Quick Look inside IVM ε Derived Bounds Partition R into for all A -values a : a light part | σ A = a R L | < | D | ε R L = { t ∈ R | | σ A = t . A | < | D | ε } , | π A R H | ≤ | D | 1 − ε a heavy part R H = R \ R L ! R light part A B R L Likewise, partition · · A B S = S L ∪ S H based on B , and a b 1 . . . . . . n < | D | ε . . . . T = T L ∪ T H based on C ! . . a b n · · heavy part · · R H a ′ b ′ 1 A B . . . . . . m ≥ | D | ε . . . . . . . . . . . . a ′ b ′ m · · 64 / 78

  78. Quick Look inside IVM ε Derived Bounds Partition R into for all A -values a : a light part | σ A = a R L | < | D | ε R L = { t ∈ R | | σ A = t . A | < | D | ε } , | π A R H | ≤ | D | 1 − ε a heavy part R H = R \ R L ! R light part A B R L Likewise, partition · · A B S = S L ∪ S H based on B , and a b 1 . . . . . . n < | D | ε . . . . T = T L ∪ T H based on C ! . . a b n · · heavy part · · R H a ′ b ′ Q is the sum of skew-aware views 1 A B . . . . R U ( a , b ) · S V ( b , c ) · T W ( c , a ) . . m ≥ | D | ε . . . . . . . . with U , V , W ∈ { L , H } . . . . . a ′ b ′ m · · 64 / 78

  79. Adaptive Maintenance Strategy Given an update δ R ∗ = { ( a ′ , b ′ ) �→ m } , compute the difference for each skew-aware view using different strategies: Skew-aware View Evaluation from left to right Time � δ R ∗ ( a ′ , b ′ ) · � S L ( b ′ , c ) · T L ( c , a ′ ) O ( | D | ε ) R ∗ ( a , b ) · S L ( b , c ) · T L ( c , a ) c a , b , c 65 / 78

  80. Adaptive Maintenance Strategy Given an update δ R ∗ = { ( a ′ , b ′ ) �→ m } , compute the difference for each skew-aware view using different strategies: Skew-aware View Evaluation from left to right Time � δ R ∗ ( a ′ , b ′ ) · � S L ( b ′ , c ) · T L ( c , a ′ ) O ( | D | ε ) R ∗ ( a , b ) · S L ( b , c ) · T L ( c , a ) c a , b , c � δ R ∗ ( a ′ , b ′ ) · � T H ( c , a ′ ) · S H ( b ′ , c ) O ( | D | 1 − ε ) R ∗ ( a , b ) · S H ( b , c ) · T H ( c , a ) a , b , c c 65 / 78

  81. Adaptive Maintenance Strategy Given an update δ R ∗ = { ( a ′ , b ′ ) �→ m } , compute the difference for each skew-aware view using different strategies: Skew-aware View Evaluation from left to right Time � δ R ∗ ( a ′ , b ′ ) · � S L ( b ′ , c ) · T L ( c , a ′ ) O ( | D | ε ) R ∗ ( a , b ) · S L ( b , c ) · T L ( c , a ) c a , b , c � δ R ∗ ( a ′ , b ′ ) · � T H ( c , a ′ ) · S H ( b ′ , c ) O ( | D | 1 − ε ) R ∗ ( a , b ) · S H ( b , c ) · T H ( c , a ) a , b , c c δ R ∗ ( a ′ , b ′ ) · � S L ( b ′ , c ) · T H ( c , a ′ ) O ( | D | ε ) c � R ∗ ( a , b ) · S L ( b , c ) · T H ( c , a ) or a , b , c δ R ∗ ( a ′ , b ′ ) · � T H ( c , a ′ ) · S L ( b ′ , c ) O ( | D | 1 − ε ) c 65 / 78

Recommend


More recommend