Exploiting Incrementality with DBToaster
Monitoring Programs
Network Monitoring: Server Status, Task Allocations, Task Properties → Servers Per Task > Task QOS? → Move Task to New Servers
Computational Advertising: Available Ads, Site Information, User Clicks → Good Ad Offers → Which Ad To Show?
Monitoring Programs (diagram): State, State Updates, Actions, Read Views, On-Change Reactions, Internal Actions
Monitoring Programs: Agile View Spec
• Existing Tools • DBToaster • Cumulus
Stream Processors
Stream Processors: No Persistent (but also dynamic) State
Incremental View Maintenance
ON CHANGE: QUERY := R ⋈ S ⋈ T
ON CHANGE: QUERY += Δ(R ⋈ S ⋈ T). Simpler, but still slow
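A minimal Python sketch of the two refresh strategies, assuming bag semantics and showing only insertions into R. The helper names (`join`, `naive_refresh`, `delta_refresh`) are hypothetical. The delta version joins just the new tuple with S and T, which is simpler than recomputing the whole join, but it still scans all of S and T on every update.

```python
from collections import Counter

def join(r, s, t):
    """Bag join R(A,B) ⋈ S(B,C) ⋈ T(C,D); returns a Counter of (A, B, C, D)."""
    out = Counter()
    for (a, b), nr in r.items():
        for (b2, c), ns in s.items():
            if b2 != b:
                continue
            for (c2, d), nt in t.items():
                if c2 == c:
                    out[(a, b, c, d)] += nr * ns * nt
    return out

def naive_refresh(r, s, t):
    # ON CHANGE: QUERY := R ⋈ S ⋈ T  (recompute from scratch)
    return join(r, s, t)

def delta_refresh(query, new_r_tuple, s, t):
    # ON CHANGE: QUERY += Δ(R ⋈ S ⋈ T); for one insert into R this is
    # {new tuple} ⋈ S ⋈ T, which still scans every tuple of S and T.
    query.update(join(Counter([new_r_tuple]), s, t))
    return query

# Usage: S and T are fixed; tuples arrive on R one at a time.
s = Counter({(10, 100): 1, (10, 200): 1})
t = Counter({(100, 5): 1, (200, 7): 1})
r, query = Counter(), Counter()
for tup in [(1, 10), (2, 10), (3, 99)]:
    r[tup] += 1
    query = delta_refresh(query, tup, s, t)
    assert query == naive_refresh(r, s, t)
```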
• Existing Tools • DBToaster • Cumulus
DBToaster: Spec → DBToaster → C++ → GCC
DBToaster
• Exploit Incrementality
• Pick the Right Data Model
  • ... the right representation
  • ... understand the platform
• Borrow (Liberally) from Other Fields
  • ... use set-at-a-time optimizations (PL)
  • ... generate machine code (Compilers)
  • ... and others
Components: Recursive View Compiler (Deltas) → Materialization Optimizer (Materialization, Delta Computation) → Functional Optimizer → Code Generators → C++, Hadoop, Cumulus, Multicore Runtimes
Recursive Delta Compilation
ON ΔR: QUERY += Δ_R(R ⋈ S ⋈ T)
• Usually Δ_R(R ⋈ S ⋈ T) is simpler than R ⋈ S ⋈ T
• Δ is closed
• Usually Δ_R(R ⋈ S ⋈ T) has finite support for ΔR
(Koch, PODS ’10)
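The standard delta rule for joins (bag semantics, where + is bag union and deletions carry negative multiplicities) makes these three points concrete. This is textbook IVM algebra written in the slide's notation, not a transcription of the paper:

```latex
\Delta(Q_1 \bowtie Q_2) = (\Delta Q_1 \bowtie Q_2) + (Q_1 \bowtie \Delta Q_2) + (\Delta Q_1 \bowtie \Delta Q_2)

% An update that touches only R has \Delta S = \Delta T = \emptyset, so
\Delta_R(R \bowtie S \bowtie T) = \Delta R \bowtie S \bowtie T

% For a single inserted tuple \Delta R = \{(\alpha, \beta)\}:
\Delta_R(R \bowtie S \bowtie T) = \{(\alpha, \beta)\} \bowtie S \bowtie T
```

R no longer appears in the delta (simpler), the delta is again a join query (closed), and every tuple it produces has A = α and B = β (finite support).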
Recursive Delta Compilation
QUERY := SUM(R.A * T.D) of R(A,B) ⋈_B S(B,C) ⋈_C T(C,D)
ON +R(α, β): QUERY += SUM(α * T.D) of S(β,C) ⋈_C T(C,D)
                     = α * (SUM(T.D) of S(β,C) ⋈_C T(C,D))
ON +R(α, β): QUERY += α * m1[β], where m1[β] := SUM(T.D) of S(β,C) ⋈_C T(C,D)
ON +S(β', ɣ): m1[β'] += SUM(T.D) of T(ɣ,D)
ON +S(β', ɣ): m1[β'] += m2[ɣ], where m2[ɣ] := SUM(T.D) of T(ɣ,D)
ON +T(ɣ', δ): m2[ɣ'] += δ
Result: +R updates QUERY using m1; +S updates m1 using m2; +T updates m2
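The three derived triggers, transcribed into a minimal Python sketch with hypothetical class and method names. Only the ON +R path is derived here, so the sketch is correct only if every T tuple arrives before any S tuple and every S tuple before any R tuple; a complete program also needs +S and +T triggers on QUERY and a +T trigger on m1, which the full maintenance program supplies.

```python
from collections import defaultdict

class DerivedTriggers:
    """QUERY = SUM(R.A * T.D) of R(A,B) ⋈_B S(B,C) ⋈_C T(C,D)
    m1[β] = SUM(T.D) of S(β,C) ⋈_C T(C,D)
    m2[ɣ] = SUM(T.D) of T(ɣ,D)
    Correct only for streams ordered as: all T, then all S, then all R.
    """

    def __init__(self):
        self.query = 0
        self.m1 = defaultdict(int)
        self.m2 = defaultdict(int)

    def on_insert_r(self, alpha, beta):
        self.query += alpha * self.m1[beta]   # ON +R(α, β): QUERY += α * m1[β]

    def on_insert_s(self, beta, gamma):
        self.m1[beta] += self.m2[gamma]      # ON +S(β', ɣ): m1[β'] += m2[ɣ]

    def on_insert_t(self, gamma, delta):
        self.m2[gamma] += delta              # ON +T(ɣ', δ): m2[ɣ'] += δ

v = DerivedTriggers()
for c, d in [(100, 5), (100, 2), (200, 7)]:
    v.on_insert_t(c, d)
for b, c in [(10, 100), (10, 200), (20, 100)]:
    v.on_insert_s(b, c)
for a, b in [(1, 10), (3, 20)]:
    v.on_insert_r(a, b)
# v.query == 1*(5 + 2 + 7) + 3*(5 + 2) == 35
```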
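View Hierarchy
[Figure: tree rooted at q, with edges labeled by insert events (+R, +S, +T). The first level holds m1, m4, m7, one per event; below each, the two remaining events lead to m2, m3, m5, m6, m8, m9; a last level of +R, +S, +T edges hangs below those.]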
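A small Python sketch of where the hierarchy's shape comes from, assuming each relation occurs once in the query (no self-joins), so a second delta with respect to the same relation is zero. The function name `view_hierarchy` is hypothetical. Deltas with respect to one or two relations still read a base relation and would be materialized (3 + 6 = 9 views, matching m1 through m9); a delta with respect to all three is a constant update, like m2[ɣ'] += δ in the derivation.

```python
from itertools import permutations

def view_hierarchy(relations):
    """Enumerate the delta views of a query that mentions each relation once.

    Each view is identified by the sequence of relations it was
    differentiated by. Sequences that leave some relation untouched still
    depend on the database and need a map; a sequence covering every
    relation yields a constant update and needs no map.
    """
    maps, constants = [], []
    for k in range(1, len(relations) + 1):
        for seq in permutations(relations, k):
            if k == len(relations):
                constants.append(seq)
            else:
                maps.append(seq)
    return maps, constants

maps, constants = view_hierarchy(["R", "S", "T"])
print(len(maps), len(constants))  # 9 6
```

The generated maintenance program can materialize fewer maps than this tree suggests, since views for the same relations taken in a different order can be shared (it keeps a single QUERY_dR_dS, for instance).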
Maintenance Program
ON +R[ A, B ]:
  QUERY[ ] += ( A * QUERY_dR[ B ] )
  QUERY_dT[ C ] += FORALL C:( A * QUERY_dR_dT[ B, C ] )
  QUERY_dS[ B ] += A
ON +S[ B, C ]:
  QUERY[ ] += ( QUERY_dS[ B ] * QUERY_dR_dS[ C ] )
  QUERY_dT[ C ] += QUERY_dS[ B ]
  QUERY_dR[ B ] += QUERY_dR_dS[ C ]
  QUERY_dR_dT[ B, C ] += 1.
ON +T[ C, D ]:
  QUERY[ ] += ( QUERY_dT[ C ] * D )
  QUERY_dR[ B ] += FORALL B:( D * QUERY_dR_dT[ B, C ] )
  QUERY_dR_dS[ C ] += D
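A hand translation of this program into Python, checked against recomputation from scratch after every event of a random insert-only stream. The class, method, and helper names are hypothetical, and each FORALL is implemented as a linear scan over the two-key map QUERY_dR_dT for simplicity.

```python
import random
from collections import defaultdict

class Query:
    """Maintains QUERY = SUM(R.A * T.D) of R(A,B) ⋈_B S(B,C) ⋈_C T(C,D)."""

    def __init__(self):
        self.QUERY = 0
        self.QUERY_dR = defaultdict(int)      # keyed by B
        self.QUERY_dS = defaultdict(int)      # keyed by B
        self.QUERY_dT = defaultdict(int)      # keyed by C
        self.QUERY_dR_dS = defaultdict(int)   # keyed by C
        self.QUERY_dR_dT = defaultdict(int)   # keyed by (B, C)

    def on_insert_r(self, a, b):
        self.QUERY += a * self.QUERY_dR[b]
        for (b2, c), n in self.QUERY_dR_dT.items():   # FORALL C
            if b2 == b:
                self.QUERY_dT[c] += a * n
        self.QUERY_dS[b] += a

    def on_insert_s(self, b, c):
        self.QUERY += self.QUERY_dS[b] * self.QUERY_dR_dS[c]
        self.QUERY_dT[c] += self.QUERY_dS[b]
        self.QUERY_dR[b] += self.QUERY_dR_dS[c]
        self.QUERY_dR_dT[(b, c)] += 1

    def on_insert_t(self, c, d):
        self.QUERY += self.QUERY_dT[c] * d
        for (b, c2), n in self.QUERY_dR_dT.items():   # FORALL B
            if c2 == c:
                self.QUERY_dR[b] += d * n
        self.QUERY_dR_dS[c] += d

def naive(r, s, t):
    # Recompute SUM(R.A * T.D) over the three-way equi-join from scratch.
    return sum(a * d
               for (a, b) in r
               for (b2, c) in s if b2 == b
               for (c2, d) in t if c2 == c)

rng = random.Random(7)
events = ([("R", (rng.randint(1, 5), rng.randint(0, 2))) for _ in range(15)]
          + [("S", (rng.randint(0, 2), rng.randint(0, 2))) for _ in range(15)]
          + [("T", (rng.randint(0, 2), rng.randint(1, 5))) for _ in range(15)])
rng.shuffle(events)

q = Query()
r, s, t = [], [], []
for rel, tup in events:
    {"R": q.on_insert_r, "S": q.on_insert_s, "T": q.on_insert_t}[rel](*tup)
    {"R": r, "S": s, "T": t}[rel].append(tup)
    assert q.QUERY == naive(r, s, t)
```

Within each trigger, no statement reads a map that an earlier statement of the same trigger has just updated, so the statement order shown on the slide is safe.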
Maintenance Program → C++ (DBToaster; CIDR ’11)
But...
• "Usually Δ_R(R ⋈ S ⋈ T) is simpler than R ⋈ S ⋈ T": not with Nested Subqueries
• "Usually Δ_R(R ⋈ S ⋈ T) has finite support for ΔR": not with Non-Equi-Joins
Nested Subqueries
QUERY := COUNT() of R(A,B) where A = (SUM(C) of S(C))
Step 1: m1[] := SUM(C) of S(C)
Step 2: COUNT() of R(A,B) where A = [result of step 1]
Step 2, materialized: m2[ [result of step 1] ], with m2[A] := COUNT() of R(A,B)
Partial Materialization
• Materialize the query in parts: m2[ m1[] ]
• Perform computations at maintenance-time
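A minimal Python sketch of partial materialization for this nested query, with hypothetical names. A single insert into S can change which R tuples satisfy A = SUM(C), so a delta with respect to S is not simpler than the query itself; keeping m2 grouped by A turns each refresh into the single lookup m2[m1[]], performed at maintenance time.

```python
from collections import Counter

class NestedCount:
    """QUERY := COUNT() of R(A,B) where A = SUM(C) of S(C),
    maintained as QUERY = m2[m1[]]."""

    def __init__(self):
        self.m1 = 0            # m1[] := SUM(C) of S(C)
        self.m2 = Counter()    # m2[A] := COUNT() of R(A,B), grouped by A
        self.query = 0

    def _refresh(self):
        # The remaining computation is one lookup (missing keys count as 0).
        self.query = self.m2[self.m1]

    def on_insert_s(self, c):
        self.m1 += c
        self._refresh()

    def on_insert_r(self, a, b):
        self.m2[a] += 1
        self._refresh()

def naive(r, s):
    total = sum(s)
    return sum(1 for (a, _b) in r if a == total)

v, r, s = NestedCount(), [], []
events = [("R", (3, "x")), ("S", 1), ("R", (3, "y")),
          ("S", 2), ("R", (4, "z")), ("S", 1)]
for rel, arg in events:
    if rel == "R":
        v.on_insert_r(*arg)
        r.append(arg)
    else:
        v.on_insert_s(arg)
        s.append(arg)
    assert v.query == naive(r, s)
```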
Non-Equality Predicates
QUERY := COUNT() of R(A) ⋈ S(B,C) where A < B
ON +R(α): QUERY += COUNT() of S(B,C) where α < B
Partial Materialization: ON +R(α): QUERY += SUM(m1[B]) where α < B, with m1[B] := COUNT() of S(B,C) group by B
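A Python sketch of the non-equality case. The ON +R trigger follows the slide; the ON +S trigger and the map m2 are assumptions added by symmetry so that the sketch handles inserts into both relations. The delta is a range aggregate over m1 rather than a single-key lookup, so the work per insert grows with the number of distinct B values.

```python
from collections import Counter

class LessThanCount:
    """QUERY := COUNT() of R(A) ⋈ S(B,C) where A < B."""

    def __init__(self):
        self.query = 0
        self.m1 = Counter()  # m1[B] := COUNT() of S(B,C) group by B
        self.m2 = Counter()  # assumption: m2[A] := COUNT() of R(A) group by A

    def on_insert_r(self, alpha):
        # ON +R(α): QUERY += SUM(m1[B]) where α < B  (scans all keys of m1)
        self.query += sum(n for b, n in self.m1.items() if alpha < b)
        self.m2[alpha] += 1

    def on_insert_s(self, beta, gamma):
        # Symmetric trigger (not on the slide): QUERY += SUM(m2[A]) where A < β.
        # gamma is column C, which the query does not use.
        self.query += sum(n for a, n in self.m2.items() if a < beta)
        self.m1[beta] += 1

def naive(r, s):
    return sum(1 for a in r for (b, _c) in s if a < b)

v, r, s = LessThanCount(), [], []
for rel, arg in [("R", 1), ("S", (2, "x")), ("S", (0, "y")),
                 ("R", 1), ("R", 5), ("S", (7, "z"))]:
    if rel == "R":
        v.on_insert_r(arg)
        r.append(arg)
    else:
        v.on_insert_s(*arg)
        s.append(arg)
    assert v.query == naive(r, s)
```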
Materialization Optimizer Partial Materialization • Nested Subqueries • Non-equality predicates • Memory Constraints • High Maintenance Cost • Specialized Datastructures
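As one illustration of a specialized datastructure (a standard technique, not necessarily what the materialization optimizer chooses), a Fenwick tree keyed on B answers the range sum SUM(m1[B]) where α < B in O(log n) time instead of scanning every key. The sketch assumes B values are integers in a known range 0..size-1; the class name is hypothetical.

```python
class Fenwick:
    """Binary indexed tree of counts over integer keys 0..size-1."""

    def __init__(self, size):
        self.size = size
        self.tree = [0] * (size + 1)   # 1-based internal array
        self.total = 0

    def add(self, key, delta):
        if not 0 <= key < self.size:
            raise ValueError("key out of range")
        self.total += delta
        i = key + 1
        while i <= self.size:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, key):
        """Sum of counts for keys <= key (0 if key < 0)."""
        i = min(key, self.size - 1) + 1
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def greater_than(self, key):
        """Sum of counts for keys > key, i.e. SUM(m1[B]) where key < B."""
        return self.total - self.prefix(key)

# m1[B] := COUNT() of S(B,C) group by B, stored in a Fenwick tree.
m1 = Fenwick(10)
for b in [2, 0, 7, 7]:        # four inserts into S with these B values
    m1.add(b, 1)
query = 0
for alpha in [1, 7]:          # ON +R(α): QUERY += SUM(m1[B]) where α < B
    query += m1.greater_than(alpha)
```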
VWAP: IVM & Naive DBToaster
  SELECT sum(b1.price * b1.volume)
  FROM bids b1
  WHERE 0.25 * (SELECT sum(b3.volume) FROM bids b3)
        > (SELECT sum(b2.volume) FROM bids b2 WHERE b2.price > b1.price);
[Figure: Cumulative Time (min), Rate of View Refreshing (Refreshes (1000/s)), and Memory Usage (MB) vs. Fraction of Stream Trace Processed, for Full Compilation, Depth 1 (IVM), and Depth 0 (Repeated)]
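A naive Python reference for what the VWAP query computes: it sums price × volume over the bids for which the total volume of strictly higher-priced bids is below a quarter of all volume. The function name is hypothetical, and the sketch treats a SUM over no rows as 0; under standard SQL that subquery returns NULL and the comparison would exclude the highest-priced bid. Evaluated this way, each refresh is quadratic in the number of bids.

```python
def vwap_query(bids):
    """Naive evaluation of the VWAP query; bids is a list of (price, volume)."""
    total_volume = sum(v for _p, v in bids)           # SELECT sum(b3.volume) FROM bids b3
    result = 0
    for p1, v1 in bids:                               # FROM bids b1
        # SELECT sum(b2.volume) FROM bids b2 WHERE b2.price > b1.price
        higher = sum(v2 for p2, v2 in bids if p2 > p1)
        if 0.25 * total_volume > higher:
            result += p1 * v1                         # sum(b1.price * b1.volume)
    return result

bids = [(10, 100), (11, 50), (12, 10), (9, 40)]
print(vwap_query(bids))  # 670
```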
TPC-H Q3
  SELECT ORDERS.orderkey, ORDERS.orderdate, ORDERS.shippriority,
         SUM(extendedprice * (1 - discount))
  FROM CUSTOMER, ORDERS, LINEITEM
  WHERE CUSTOMER.mktsegment = 'BUILDING'
    AND ORDERS.custkey = CUSTOMER.custkey
    AND LINEITEM.orderkey = ORDERS.orderkey
    AND ORDERS.orderdate < DATE('1995-03-15')
    AND LINEITEM.SHIPDATE > DATE('1995-03-15')
  GROUP BY ORDERS.orderkey, ORDERS.orderdate, ORDERS.shippriority;
[Figure: Time (min), Refreshes (1000/s), and Memory (MB) vs. Fraction of Stream Trace Processed, for Full Compilation, Depth 1 (IVM), and Depth 0 (Repeated); annotation: Half the Memory Usage]
• Existing Tools • DBToaster • Cumulus
Maintenance Programs: Data-Parallel Computations; Key/Value Style Datastructures
ON +R[ A, B ]:
  QUERY[ ] += ( A * QUERY_dR[ B ] )
  QUERY_dT[ C ] += FORALL C:( A * QUERY_dR_dT[ B, C ] )
  QUERY_dS[ B ] += A
ON +S[ B, C ]:
  QUERY[ ] += ( QUERY_dS[ B ] * QUERY_dR_dS[ C ] )
  QUERY_dT[ C ] += QUERY_dS[ B ]
  QUERY_dR[ B ] += QUERY_dR_dS[ C ]
  QUERY_dR_dT[ B, C ] += 1.
ON +T[ C, D ]:
  QUERY[ ] += ( QUERY_dT[ C ] * D )
  QUERY_dR[ B ] += FORALL B:( D * QUERY_dR_dT[ B, C ] )
  QUERY_dR_dS[ C ] += D
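A sketch of how a FORALL statement becomes data-parallel work over key/value maps. The placement function, the partition count, and all names are hypothetical; this is not a description of how Cumulus actually places state. It evaluates one statement from ON +R, QUERY_dT[ C ] += FORALL C:( A * QUERY_dR_dT[ B, C ] ), for a batch of R tuples: placing QUERY_dR_dT by B keeps each scan local to one partition, and the resulting deltas are then routed to whichever partition owns QUERY_dT[C].

```python
from collections import defaultdict

NUM_PARTITIONS = 4   # hypothetical number of workers

def part(key):
    """Hypothetical placement: hash partitioning on a single key value."""
    return hash(key) % NUM_PARTITIONS

class PartitionedMap:
    """Key/value map whose entries are spread over partitions by `placement`."""

    def __init__(self, placement):
        self.placement = placement
        self.parts = [defaultdict(int) for _ in range(NUM_PARTITIONS)]

    def add(self, key, delta):
        self.parts[self.placement(key)][key] += delta

    def get(self, key):
        return self.parts[self.placement(key)].get(key, 0)

# QUERY_dR_dT[B, C] is placed by B, so FORALL C for a fixed B stays local;
# QUERY_dT[C] is placed by C.
query_dR_dT = PartitionedMap(lambda bc: part(bc[0]))
query_dT = PartitionedMap(part)

def on_insert_r_batch(r_tuples):
    """QUERY_dT[C] += FORALL C:( A * QUERY_dR_dT[B, C] ) for a batch of +R[A, B]."""
    outgoing = []
    # Phase 1: each partition reads only its own entries, so these
    # iterations are independent and could run in parallel.
    for pid in range(NUM_PARTITIONS):
        for a, b in r_tuples:
            if part(b) != pid:
                continue
            for (b2, c), n in query_dR_dT.parts[pid].items():
                if b2 == b:
                    outgoing.append((c, a * n))
    # Phase 2: route each delta to the partition that owns QUERY_dT[C].
    for c, delta in outgoing:
        query_dT.add(c, delta)

query_dR_dT.add((1, 10), 1)
query_dR_dT.add((1, 11), 2)
query_dR_dT.add((2, 10), 1)
on_insert_r_batch([(5, 1), (3, 2)])
```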
Execution Model
ON Event(param1, param2, …)
  Statement 1
  Statement 2
  Statement 3
  …
Statement Execution: Read [Old Version] → Compute → Write [New Version]
[Figure, Epoch: 0: events arrive at nodes 0, 1, 2 and are timestamped; first S:<0,0,0>, then R:<0,0,1> and T:<0,1,0>]
[Figure: update M2[1,3] += 2 with version <0,2,4>; R <0,1,3>]
[Figure: statement M2[…] += M3[…]*M4[…] for event E at <0,1,3>: read M3[…] and M4[…] at <0,1,3> from their histories (Σ δ), compute M3[…]*M4[…], write M2[…] += … at <0,1,3>]
[Figure: history of M2 key <1,3>: deltas 2, 4, 3, 1 at versions <0,1,2>, <0,1,3>, <0,2,5>, <0,3,1>]
[Figure: after M2[1,3] += 2 at <0,2,4>, the history is 2, 4, 2, 3, 1 at <0,1,2>, <0,1,3>, <0,2,4>, <0,2,5>, <0,3,1>; the history sends δ to event E at <0,2,5>]
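One possible reading of these slides, sketched in Python under stated assumptions rather than as Cumulus's actual design: each map key keeps a history of (version, delta) pairs; a statement reads the old version (all deltas with a smaller version) and writes a delta tagged with its own version; a write that arrives late is inserted in version order, and any later read that missed it needs a correction δ. The class name, the strict `<` rule for reads, and lexicographic ordering of version tuples are assumptions; the numbers reproduce the M2[1,3] example.

```python
import bisect
from collections import defaultdict

class VersionedMap:
    """Key/value map that keeps, per key, a version-ordered history of deltas.

    Versions are tuples such as (0, 2, 4), ordered lexicographically.
    A read at version v sees the old version: all deltas with version < v.
    """

    def __init__(self):
        self.history = defaultdict(list)   # key -> sorted [(version, delta)]
        self.reads = defaultdict(list)     # key -> versions that read the key

    def read(self, key, version):
        self.reads[key].append(version)
        return sum(d for v, d in self.history[key] if v < version)

    def write(self, key, version, delta):
        """Insert a delta in version order (it may arrive late) and return
        the versions of earlier reads that missed it and need a correction."""
        bisect.insort(self.history[key], (version, delta))
        return sorted({v for v in self.reads[key] if v > version})

m2 = VersionedMap()
for version, delta in [((0, 1, 2), 2), ((0, 1, 3), 4), ((0, 2, 5), 3), ((0, 3, 1), 1)]:
    m2.write((1, 3), version, delta)
before = m2.read((1, 3), (0, 2, 5))      # 2 + 4
stale = m2.write((1, 3), (0, 2, 4), 2)   # late write: M2[1,3] += 2 at <0,2,4>
after = m2.read((1, 3), (0, 2, 5))       # 2 + 4 + 2
```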
Open Challenges • Data Placement • Migration • Batch Processing • Live Program Management