Information Systems Group Relax and Let the Database do the Partitioning Online Alekh Jindal, Jens Dittrich - presented by Stefan Schuh VLDB International Workshop on Real-Time Business Intelligence September 2, 2011 Thursday, September 1, 2011
Motivation: Offline Physical Database Design offline workload analysis BI Applications Advisory Tools offline transformation Database DBA 2 Thursday, September 1, 2011
Offline Design Cheats! • Workloads infrequently change over time • DBAs always available • Physical design once-in-a-while process • DBAs make perfect decisions 3 Thursday, September 1, 2011
Motivation: Offline Physical Database Design offline workload analysis BI Applications Advisory Tools offline transformation Database DBA 4 Thursday, September 1, 2011
Motivation: Online Physical Database Design Sub-Problem Proposed Solution Online Indexing BI Applications Indexing Database Cracking Adaptive Indexing Materialized Views Dynamic Materialized Views WE! Partitioning Database 5 Thursday, September 1, 2011
Challenges in Online Partitioning • Collecting online workload • Analyzing workload online • Querying with online workload analysis • Creating partitions online 6 Thursday, September 1, 2011
Challenges in Online Partitioning • Collecting online workload • Analyzing workload online • Querying with online workload analysis • Creating partitions online 6 Thursday, September 1, 2011
What is the Workload? • offline approach: take the last query log as workload (static) • online approach: collect incoming queries in a window and slide it when more queries come (dynamic) 7 Thursday, September 1, 2011
What is the Workload? • offline approach: take the last query log as workload (static) • online approach: collect incoming queries in a window and slide it when more queries come (dynamic) Window Size = 5 Q 1 Q 2 Q 3 Q 4 Q 5 7 Thursday, September 1, 2011
What is the Workload? • offline approach: take the last query log as workload (static) • online approach: collect incoming queries in a window and slide it when more queries come (dynamic) Window Size = 5 Q 1 Q 2 Q 3 Q 4 Q 5 Q 6 7 Thursday, September 1, 2011
What is the Workload? • offline approach: take the last query log as workload (static) • online approach: collect incoming queries in a window and slide it when more queries come (dynamic) Window Size = 5 Q 1 Q 2 Q 3 Q 4 Q 5 Q 6 Q 7 7 Thursday, September 1, 2011
What is the Workload? • offline approach: take the last query log as workload (static) • online approach: collect incoming queries in a window and slide it when more queries come (dynamic) Window Size = 5 Q 1 Q 2 Q 3 Q 4 Q 5 Q 6 Q 7 ..... 7 Thursday, September 1, 2011
How to Express the Partitioning Problem? • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u a 1 a 2 a 3 a 4 a 5 a 6 8 Thursday, September 1, 2011
How to Express the Partitioning Problem? a 1 a 2 a 3 a 4 a 5 a 6 • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u • ordering e.g. a 3 a 2 a 1 a 5 a 4 a 6 ordering ⇥ , set P u ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ( ( ( ( ( ( a 1 a 2 a 3 a 4 a 5 a 6 a 3 a 2 a 1 a 5 a 4 a 6 8 Thursday, September 1, 2011
How to Express the Partitioning Problem? a 1 a 2 a 3 a 4 a 5 a 6 • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u a 3 a 2 a 1 a 5 a 4 a 6 a 1 a 2 a 3 a 4 a 5 a 6 • ordering e.g. a 3 a 2 a 1 a 5 a 4 a 6 ordering ⇥ , set P u ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ( ( ( ( ( ( • Split line, Split vector e.g. [01001] ctor S [ ] 0 1 0 0 1 a 1 a 2 a 3 a 4 a 5 a 6 8 Thursday, September 1, 2011
How to Express the Partitioning Problem? a 1 a 2 a 3 a 4 a 5 a 6 • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u a 3 a 2 a 1 a 5 a 4 a 6 a 1 a 2 a 3 a 4 a 5 a 6 • ordering e.g. a 3 a 2 a 1 a 5 a 4 a 6 ordering ⇥ , set P u ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , [ 0 1 ] 0 0 1 ( ( ( ( ( ( • Split line, Split vector e.g. [01001] a 1 a 2 a 3 a 4 a 5 a 6 ctor S • Partition e.g. (a 1 , a 2 ) artition p m,r ( S, � ) ( a 1 , a 2 ) a 1 a 2 a 3 a 4 a 5 a 6 8 Thursday, September 1, 2011
How to Express the Partitioning Problem? a 1 a 2 a 3 a 4 a 5 a 6 • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u a 3 a 2 a 1 a 5 a 4 a 6 a 1 a 2 a 3 a 4 a 5 a 6 • ordering e.g. a 3 a 2 a 1 a 5 a 4 a 6 ordering ⇥ , set P u ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , [ 0 1 ] 0 0 1 ( ( ( ( ( ( • Split line, Split vector e.g. [01001] a 1 a 2 a 3 a 4 a 5 a 6 ctor S ( a 1 , a 2 ) • Partition e.g. (a 1 , a 2 ) a 1 a 2 a 3 a 4 a 5 a 6 artition p m,r ( S, � ) • Partitioning scheme e.g. (a 1 , a 2 ), (a 3 , a 4 , a 5 ), (a 6 ) scheme P ( S, � ) { (a 1 , a 2 ) (a 3 , a 4 , a 5 ) (a 6 ) } a 1 a 2 a 3 a 4 a 5 a 6 8 Thursday, September 1, 2011
What about Horizontal Partitioning? • Just rotate the table by 90 degrees • abstraction allows us to solve both problems set P u • can be attributes, row-ranges, or any other table slice set P u a 1 a 2 a 3 a 4 a 5 a 6 a 1 a 2 a 3 a 4 a 5 a 6 r 6 r 5 r 4 r 3 r 2 r 1 9 Thursday, September 1, 2011
Partitioning Problem: What to Analyze? • Partitioning unit e.g. a 1 , a 2 , a 3 , a 4 , a 5 , a 6 set P u • ordering e.g. a 3 a 2 a 1 a 5 a 4 a 6 ordering ⇥ , set P u ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ordering ⇥ , ( ( ( ( ( ( • Split line, Split vector e.g. [01001] ctor S • Partition e.g. (a 1 , a 2 ) artition p m,r ( S, � ) • Partitioning scheme e.g. (a 1 , a 2 ), (a 3 , a 4 , a 5 ), (a 6 ) scheme P ( S, � ) • Workload ad W t k • Problem statement Find , such that: S ⇥ S ⇥ = argmin � ⇥ ordering ⇥ , W t k , P ( S, � ) C est. S ( 10 Thursday, September 1, 2011
How to Analyze the Workload? Step 1: Finding Partitioning Unit Ordering • offline approach: create affinity matrix and cluster it once, as proposed by Navathe et. al. • online approach: leverage the affinity idea, but dynamically update and cluster the affinity matrix 11 Thursday, September 1, 2011
Offline Partitioning Unit Ordering • Create affinity matrix having attributes co-occurrences PartKey SuppKey Quantity PartKey 8 5 6 SuppKey 5 8 4 Quantity 6 4 9 • Cluster affinity matrix to maximize the affinity measure ⇥ ⇥ as M ( ⇥ ) = � x � x j =1 A ( a i , a j )[ A ( a i , a j − 1 )+ A ( a i , a j +1 )]. i =1 ) = A ( a , a ) = A ( a , a ) = A ( a , a )=0. For the = 404 = 440 e M ( ⇥ ) e M ( ⇥ ) matrix has matrix has PartKey SuppKey Quantity PartKey Quantity SuppKey PartKey 8 5 6 PartKey 8 6 5 SuppKey 5 8 4 Quantity 6 9 4 Quantity 6 4 9 SuppKey 5 4 8 12 Thursday, September 1, 2011
Online Partitioning Unit Ordering • Update only the referenced in affinity matrix set P u PartKey Quantity SuppKey PartKey Quantity SuppKey (PartKey, SuppKey) PartKey 8 6 5 PartKey 9 6 6 Quantity 6 9 4 Quantity 6 9 4 SuppKey 5 4 8 SuppKey 6 4 9 • Re-cluster only the referenced in affinity matrix set P u +48 0 PartKey Quantity SuppKey SuppKey PartKey Quantity PartKey 9 6 6 SuppKey 6 9 6 Quantity 6 9 4 PartKey 4 6 9 SuppKey 6 4 9 Quantity 9 6 4 13 Thursday, September 1, 2011
How to Analyze the Workload? Step 2: Enumerating Split Vectors • offline approach: consider all possible split vectors (brute force) a 1 a 2 a 3 .. ... a n is 2 n � 1 . Complexity: 14 Thursday, September 1, 2011
How to Analyze the Workload? Step 2: Enumerating Split Vectors • offline approach: consider all possible split vectors (brute force) • online approach: One-dimensional Online Partitioning (O 2 P) Algorithm Technique 1: prune non-referenced partitioning units Technique 2: consider split vectors greedily Technique 3: save previous best split vectors using dynamic programming 15 Thursday, September 1, 2011
Partitioning Unit Pruning Idea: Prune the unused (non-referenced) in set P u at most two separate partitions a 1 a 2 a 3 .. ... a n Complexity: For p leading and q trailing unused set P u is 2 n � p � q � 1 . 16 Thursday, September 1, 2011
Greedy Split Vector Enumeration Idea: Mark only one (best) split vector at a time a 1 a 2 a 3 .. ... a n a 1 a 2 a 3 .. ... a n a 1 a 2 a 3 .. ... a n Complexity: worst case n 2 17 Thursday, September 1, 2011
Dynamic Programming Idea: save best split vectors in un-split partitions a 1 a 2 a 3 .. ... a n Best Best a 1 a 2 a 3 .. ... a n a 1 a 2 a 3 .. ... a n 18 Thursday, September 1, 2011
How to Amortize Partitioning Analysis? • offline approach: querying after computing and creating partitions Analysis Cost Query Cost • online approach: option 1 : interleave queries with partitioning analysis option 2 : queries in a separate thread Analysis Cost Query Cost 19 Thursday, September 1, 2011
Goals of the Experiments • Does greedy partitioning hurt Quality? • How much is O 2 P faster? • Can such a system adapt to changing workload ? • Will our approach work on real systems? 20 Thursday, September 1, 2011
Recommend
More recommend