Exploiting Incrementality with DBToaster Monitoring Programs

  1. Exploiting Incrementality with DBToaster

  2. Monitoring Programs

  3. Network Monitoring: Server Status, Task Allocations, Task Properties; Servers Per Task > Task QOS?; Move Task to New Servers

  4. Computational Advertising: Available Ads, Site Information, User Clicks, Good Ad Offers; Which Ad To Show?

  5. Monitoring Programs State Updates Actions Read Views On-Change Reactions Internal Actions State

  6. Monitoring Programs Agile View Spec

  7. • Existing Tools • DBToaster • Cumulus

  8. Stream Processors

  15. Stream Processors No Persistent State

  16. Stream Processors but also dynamic No Persistent State ^

  17. Incremental View Maintenance QUERY

  18. Incremental View Maintenance
      ON CHANGE: QUERY := R ⋈ S ⋈ T

  19. Incremental View Maintenance
      ON CHANGE: QUERY += Δ(R ⋈ S ⋈ T)
      Simpler, but still slow
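
The delta rule on this slide can be sketched in Python. This is a minimal illustration, not DBToaster output: it assumes hypothetical relations R(A,B), S(B,C), T(C,D) stored as lists of tuples, and a COUNT over the three-way join as the view. The names `full_count` and `delta_count_on_insert_R` are made up for this example.

```python
from itertools import product

def full_count(R, S, T):
    """Recompute COUNT(*) of R ⋈ S ⋈ T from scratch (joining on B and on C)."""
    return sum(1 for (a, b), (b2, c), (c2, d) in product(R, S, T)
               if b == b2 and c == c2)

def delta_count_on_insert_R(new_r, S, T):
    """Delta of the view for one inserted R tuple: join only that tuple with S and T."""
    return full_count([new_r], S, T)

# Hypothetical sample data.
R = [(1, 10)]
S = [(10, 100), (10, 200), (20, 100)]
T = [(100, 5), (200, 7)]

view = full_count(R, S, T)                     # initial materialization
new_r = (2, 20)
view += delta_count_on_insert_R(new_r, S, T)   # ON CHANGE: QUERY += Δ(R ⋈ S ⋈ T)
R.append(new_r)

# The incrementally maintained view agrees with a full recomputation.
assert view == full_count(R, S, T)
```

The delta avoids rescanning R, but it still joins against all of S and T on every update, which is one way to read "still slow".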

  20. • Existing Tools • DBToaster • Cumulus

  21. DBToaster GCC C++ Spec

  23. DBToaster
      • Exploit Incrementality
      • Pick the Right Data Model
      • ... the right representation
      • ... understand the platform
      • Borrow (Liberally) from Other Fields
      • ... use set-at-a-time optimizations (PL)
      • ... generate machine code (Compilers)
      • ... and others
      [Diagram labels: Deltas; Recursive View Compiler; Materialization; Materialization Optimizer; Delta Computation; Functional Optimizer; Code Generators; C++, Hadoop, Cumulus, Multicore; Runtimes]

  24. Recursive Delta Compilation
      ON ΔR: QUERY += Δ_ΔR(R ⋈ S ⋈ T)

  25. Recursive Delta Compilation
      ON ΔR: QUERY += Δ_ΔR(R ⋈ S ⋈ T)
      • Δ_ΔR(R ⋈ S ⋈ T) is (usually) simpler than R ⋈ S ⋈ T
      • Δ is closed
      • Δ_ΔR(R ⋈ S ⋈ T) (usually) has finite support for ΔR
      (Koch, PODS ’10)

  26. Recursive Delta Compilation
      QUERY := SUM(R.A * T.D) of R(A,B) ⋈_B S(B,C) ⋈_C T(C,D)

  27. Recursive Delta Compilation
      ON +R(α, β): QUERY += SUM(α * T.D) of S(β,C) ⋈_C T(C,D)
      i.e.         QUERY += α * SUM(T.D) of S(β,C) ⋈_C T(C,D)

  28. Recursive Delta Compilation
      ON +R(α, β): QUERY += α * m1[β]
      m1[β] := SUM(T.D) of S(β,C) ⋈_C T(C,D)

  29. Recursive Delta Compilation
      ON +R(α, β): QUERY += α * m1[β]
      ON +S(β', γ): m1[β'] += SUM(T.D) of T(γ,D)

  30. Recursive Delta Compilation
      ON +R(α, β): QUERY += α * m1[β]
      ON +S(β', γ): m1[β'] += m2[γ]
      m2[γ] := SUM(T.D) of T(γ,D)

  31. Recursive Delta Compilation
      ON +R(α, β): QUERY += α * m1[β]
      ON +S(β', γ): m1[β'] += m2[γ]
      ON +T(γ', δ): m2[γ'] += SUM(δ)

  32. Recursive Delta Compilation
      ON +R(α, β): QUERY += α * m1[β]
      ON +S(β', γ): m1[β'] += m2[γ]
      ON +T(γ', δ): m2[γ'] += δ
      [Diagram: q, m1, m2 in a chain, with edges labeled +R, +S, +T]

  33. View Hierarchy
      [Figure: tree of views rooted at q. Edges are labeled +R, +S, +T. First-level views: m1, m4, m7. Second-level views: m2, m3, m5, m6, m8, m9.]

  34. Maintenance Program
      ON +R[A, B]:
        QUERY[] += (A * QUERY_dR[B])
        QUERY_dT[C] += FORALL C: (A * QUERY_dR_dT[B, C])
        QUERY_dS[B] += A
      ON +S[B, C]:
        QUERY[] += (QUERY_dS[B] * QUERY_dR_dS[C])
        QUERY_dT[C] += QUERY_dS[B]
        QUERY_dR[B] += QUERY_dR_dS[C]
        QUERY_dR_dT[B, C] += 1
      ON +T[C, D]:
        QUERY[] += (QUERY_dT[C] * D)
        QUERY_dR[B] += FORALL B: (D * QUERY_dR_dT[B, C])
        QUERY_dR_dS[C] += D

  35. Maintenance Program (DBToaster; CIDR ’11) C++
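
To make the maintenance program concrete, here is a hand-written Python transcription of its three triggers, checked against a from-scratch evaluation of QUERY = SUM(R.A * T.D). It is only a sketch: DBToaster itself emits C++, and the class name `QueryMaintainer`, the helper `recompute`, and the sample event stream are assumptions made for this example. The comment on each map states the view it is assumed to hold, as inferred from the triggers.

```python
from collections import defaultdict
from itertools import product

class QueryMaintainer:
    """Maintains QUERY = SUM(A * D) over R(A,B) ⋈ S(B,C) ⋈ T(C,D) under inserts."""

    def __init__(self):
        self.query = 0
        self.q_dR = defaultdict(int)     # QUERY_dR[B]:       SUM(D) of S ⋈ T, per B
        self.q_dS = defaultdict(int)     # QUERY_dS[B]:       SUM(A) of R, per B
        self.q_dT = defaultdict(int)     # QUERY_dT[C]:       SUM(A) of R ⋈ S, per C
        self.q_dR_dS = defaultdict(int)  # QUERY_dR_dS[C]:    SUM(D) of T, per C
        self.q_dR_dT = defaultdict(int)  # QUERY_dR_dT[B, C]: COUNT of S(B,C)

    def insert_R(self, a, b):
        self.query += a * self.q_dR[b]
        for (b2, c), n in self.q_dR_dT.items():   # FORALL C paired with this B
            if b2 == b:
                self.q_dT[c] += a * n
        self.q_dS[b] += a

    def insert_S(self, b, c):
        self.query += self.q_dS[b] * self.q_dR_dS[c]
        self.q_dT[c] += self.q_dS[b]
        self.q_dR[b] += self.q_dR_dS[c]
        self.q_dR_dT[(b, c)] += 1

    def insert_T(self, c, d):
        self.query += self.q_dT[c] * d
        for (b, c2), n in self.q_dR_dT.items():   # FORALL B paired with this C
            if c2 == c:
                self.q_dR[b] += d * n
        self.q_dR_dS[c] += d

def recompute(R, S, T):
    """Reference answer: evaluate the query from scratch over bags of tuples."""
    return sum(a * d for (a, b), (b2, c), (c2, d) in product(R, S, T)
               if b == b2 and c == c2)

# Hypothetical insert stream, interleaving the three relations.
events = [("R", 1, 10), ("S", 10, 100), ("T", 100, 5), ("R", 2, 10),
          ("T", 100, 7), ("S", 10, 200), ("T", 200, 3), ("S", 10, 100)]

tables = {"R": [], "S": [], "T": []}
m = QueryMaintainer()
for name, x, y in events:
    getattr(m, "insert_" + name)(x, y)
    tables[name].append((x, y))
    # After every event the maintained result matches full re-evaluation.
    assert m.query == recompute(tables["R"], tables["S"], tables["T"])
```

No trigger evaluates a join: each one does map lookups and, for the FORALL statements, one loop over the keys of a single map.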

  36. But...
      • Δ_ΔR(R ⋈ S ⋈ T) is usually simpler than R ⋈ S ⋈ T. Problem case: Nested Subqueries
      • Δ_ΔR(R ⋈ S ⋈ T) usually has finite support for ΔR. Problem case: Non-Equi-Joins

  37. Nested Subqueries
      QUERY := COUNT() of R(A,B) where A = SUM(C) of ( S(C) )

  38. Nested Subqueries
      Step 1: m1[] := SUM(C) of S(C)
      Step 2: COUNT() of R(A,B) where A = [result of step 1]

  39. Nested Subqueries
      Step 2: COUNT() of R(A,B) where A = [result of step 1]
              becomes m2[ [result of step 1] ]
      m2[A] := COUNT() of R(A,B)

  40. Partial Materialization
      Materialize the query in parts: m2[ m1[] ]
      Perform computations at maintenance-time
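
A small Python sketch of this partial materialization, under the assumption that m1 and m2 are kept as an integer and a dictionary: both parts are maintained incrementally, and the final lookup m2[m1[]] is evaluated whenever an update arrives. The class `NestedCount` and its method names are hypothetical.

```python
from collections import defaultdict

class NestedCount:
    """QUERY := COUNT() of R(A,B) where A = SUM(C) of S(C), materialized in parts."""

    def __init__(self):
        self.m1 = 0                   # m1[]  := SUM(C) of S(C)
        self.m2 = defaultdict(int)    # m2[A] := COUNT() of R(A,B), grouped by A

    def insert_S(self, c):
        self.m1 += c
        return self.refresh()

    def insert_R(self, a, b):
        self.m2[a] += 1
        return self.refresh()

    def refresh(self):
        # The remaining computation, done at maintenance time: m2[ m1[] ].
        return self.m2.get(self.m1, 0)

v = NestedCount()
v.insert_R(3, "x")
v.insert_R(3, "y")
v.insert_R(5, "z")
first = v.insert_S(1)    # SUM(C) = 1, no R tuple has A = 1
second = v.insert_S(2)   # SUM(C) = 3, two R tuples have A = 3
third = v.insert_S(2)    # SUM(C) = 5, one R tuple has A = 5
```

An insert into S can change the result by an arbitrary amount, because it shifts which group of m2 is selected. That is why the final lookup is recomputed rather than maintained as a delta.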

  41. Non-Equality Predicates
      QUERY := COUNT() of R(A) S(B,C) where A < B

  42. Non-Equality Predicates
      ON +R(α): QUERY += COUNT() of S(B,C) where α < B
      Partial Materialization:
      ON +R(α): QUERY += SUM(m1[B]) where α < B
      m1[B] := COUNT() of S(B,C) group by B
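
The +R trigger on this slide can be sketched as follows. The slide only gives the trigger for inserts into R. The `insert_S` trigger and the `r_count` map below are assumptions added by symmetry so that the example stays correct under interleaved inserts, and the class name is hypothetical.

```python
from collections import defaultdict

class LessThanJoinCount:
    """QUERY := COUNT() of R(A), S(B,C) where A < B."""

    def __init__(self):
        self.query = 0
        self.m1 = defaultdict(int)       # m1[B] := COUNT() of S(B,C) group by B
        self.r_count = defaultdict(int)  # assumed: COUNT() of R(A) group by A

    def insert_R(self, alpha):
        # ON +R(α): QUERY += SUM(m1[B]) where α < B
        self.query += sum(n for b, n in self.m1.items() if alpha < b)
        self.r_count[alpha] += 1

    def insert_S(self, beta, gamma):
        # Assumed symmetric trigger: count the R tuples with A < β.
        self.query += sum(n for a, n in self.r_count.items() if a < beta)
        self.m1[beta] += 1

q = LessThanJoinCount()
q.insert_S(3, "a")
q.insert_R(1)
q.insert_S(7, "c")
q.insert_R(5)
q.insert_S(3, "b")
```

The delta still loops over every group of m1 with a larger key, so its cost grows with the number of distinct B values instead of being a single lookup. A sorted index over B would be one option for speeding up the range sum, though the slides do not say which data structure is used.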

  43. Materialization Optimizer Partial Materialization • Nested Subqueries • Non-equality predicates • Memory Constraints • High Maintenance Cost • Specialized Datastructures

  44. VWAP
      SELECT sum(b1.price * b1.volume)
      FROM bids b1
      WHERE 0.25 * (SELECT sum(b3.volume) FROM bids b3)
            > (SELECT sum(b2.volume) FROM bids b2 WHERE b2.price > b1.price);
      [Figure: three panels, Cumulative Time (Time (min)), Rate of View Refreshing (Refreshes (1000/s)), and Memory Usage (Memory (MB)), each plotted against Fraction of Stream Trace Processed. Series: Full Compilation, Depth 1 (IVM), Depth 0 (Repeated). Annotations: IVM & Naive; DBToaster.]

  45. TPC-H Q3
      SELECT ORDERS.orderkey, ORDERS.orderdate, ORDERS.shippriority,
             SUM(extendedprice * (1 - discount))
      FROM CUSTOMER, ORDERS, LINEITEM
      WHERE CUSTOMER.mktsegment = 'BUILDING'
        AND ORDERS.custkey = CUSTOMER.custkey
        AND LINEITEM.orderkey = ORDERS.orderkey
        AND ORDERS.orderdate < DATE('1995-03-15')
        AND LINEITEM.SHIPDATE > DATE('1995-03-15')
      GROUP BY ORDERS.orderkey, ORDERS.orderdate, ORDERS.shippriority;
      [Figure: three panels, Time (min), Refreshes (1000/s), and Memory (MB), each plotted against Fraction of Stream Trace Processed. Series: Full Compilation, Depth 1 (IVM), Depth 0 (Repeated). Annotation: Half the Memory Usage.]

  46. • Existing Tools • DBToaster • Cumulus

  47. Maintenance Programs
      Data-Parallel Computations
      ON +R[A, B]:
        QUERY[] += (A * QUERY_dR[B])
        QUERY_dT[C] += FORALL C: (A * QUERY_dR_dT[B, C])
        QUERY_dS[B] += A
      ON +S[B, C]:
        QUERY[] += (QUERY_dS[B] * QUERY_dR_dS[C])
        QUERY_dT[C] += QUERY_dS[B]
        QUERY_dR[B] += QUERY_dR_dS[C]
        QUERY_dR_dT[B, C] += 1
      ON +T[C, D]:
        QUERY[] += (QUERY_dT[C] * D)
        QUERY_dR[B] += FORALL B: (D * QUERY_dR_dT[B, C])
        QUERY_dR_dS[C] += D
      Key/Value Style Datastructures

  48. Execution Model
      ON Event(param1, param2, …)
        Statement 1
        Statement 2
        Statement 3
        …

  49. Statement Execution
      Read [Old Version] → Compute → Write [New Version]
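
One way to picture the read/compute/write steps is a multi-version key/value map in which a statement reads a fixed old version and writes its result as a new version. The sketch below is an assumption-heavy illustration, not Cumulus's actual design: `VersionedMap`, `run_statement`, and the use of plain integer versions are all hypothetical.

```python
class VersionedMap:
    """Keeps every written version of each key; reads see a fixed snapshot."""

    def __init__(self):
        self._history = {}   # key -> list of (version, value), ascending by version

    def read(self, key, version, default=0):
        """Return the latest value written at or before `version`."""
        result = default
        for v, value in self._history.get(key, []):
            if v > version:
                break
            result = value
        return result

    def write(self, key, version, value):
        entries = self._history.setdefault(key, [])
        entries.append((version, value))
        entries.sort(key=lambda entry: entry[0])   # stable: later writes win ties

def run_statement(store, old_version, target, sources, compute):
    inputs = [store.read(k, old_version) for k in sources]   # Read [Old Version]
    value = compute(*inputs)                                 # Compute
    store.write(target, old_version + 1, value)              # Write [New Version]
    return value

store = VersionedMap()
store.write("m1", 0, 10)
store.write("q", 0, 1)

# Statement "q += 2 * m1", reading version 0 and producing version 1.
run_statement(store, 0, "q", ["q", "m1"], lambda q, m1: q + 2 * m1)

old_q = store.read("q", 0)   # the old version is still readable
new_q = store.read("q", 1)
```

Because the old version stays readable, other statements that are still working from version 0 are not affected by the write.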

  50. S 0 1 2 Epoch: 0

  51. 0 S S:<0,0,0> 1 2 Epoch: 0

  52. 0 S:<0,0,0> 1 2 Epoch: 0

  53. 0 R 1 T 2 Epoch: 0

  54. 0 R:<0,0,1> R 1 T:<0,1,0> T 2 Epoch: 0

  55. 0 R:<0,0,1> 1 T:<0,1,0> 2 Epoch: 0

  56. M2[1,3] += 2 <0,2,4> R <0,1,3>

  57. R History M2[…] += E <0,1,3> M3[…]*M4[…] <0,1,3> Σ δ M3[…] M4[…] M3[…]*M4[…] M2[…] += … <0,1,3>

  58. M2[1,3] += 2 <0,2,4> 2 4 3 1 <1,3> → <0,1,2> <0,1,3> <0,2,5> <0,3,1>

  59. M2[1,3] += 2 <0,2,4> 2 4 2 3 1 <1,3> → <0,1,2> <0,1,3> <0,2,4> <0,2,5> <0,3,1> History δ E <0,2,5> <0,2,5>

  60. Open Challenges • Data Placement • Migration • Batch Processing • Live Program Management
