Herodotos Herodotou Nedyalko Borisov Shivnath Babu Duke University
Table Partitioning Split parent table into smaller child tables (partitions) Partitioning methods: hash, range, list Sales Sales_Jan_11 Benefits id date . . . id date . . . Improve query performance Faster data loading, archival, backup Sales_Feb_11 id date . . . Efficient statistics maintenance Better cardinality estimation Sales_Mar_11 Fine-grained control for tuning id date . . . . . . 06/14/2011 Duke University 2
Recent Trends Growing usage due to increased data sizes Growing user control due to new SQL extensions Partitioning conditions for derived tables DBA must satisfy multiple objectives and constraints regarding partitioning Implications DBA may have limited control over partitioning scheme Diverse mix of partitioning schemes 06/14/2011 Duke University 3
Partitioning Schemes Multidimensional (Hierarchical) partitioning Tables partitioned on same key but different ranges Range partitioning with non-equi ranges Ads A Payments P A.d P.id 0 P 1 Sales S A 1 A 2 A 3 A 4 A 5 10 P 2 S.d S.d Loans L 20 P 3 L.id S.id 0 0 30 L 1 S 11 S 11 S 12 S 12 S 13 S 13 P 4 20 20 40 P 5 L 2 S 21 S 21 S 22 S 22 S 23 S 23 50 40 40 S 1 S 2 S 3 P 6 60 L 3 S 31 S 31 S 32 S 32 S 33 S 33 P 7 60 60 70 P 8 L 4 S 41 S 42 S 43 80 80 80 06/14/2011 Duke University 4
Partition-aware Query Optimization SELECT * FROM Sales S, Payments P WHERE S.id = P.id AND S.d > Feb-15 AND P.id < 25 Payments P P.id 0 P 1 P 1 Sales S 10 P 2 P 2 S.d S.d 20 P 3 P 3 S.id 0 30 S 11 S 11 S 11 S 12 S 12 S 12 S 12 S 13 S 13 S 13 S 13 P 4 P 4 20 40 P 5 P 5 S 21 S 21 S 21 S 22 S 22 S 22 S 22 S 23 S 23 S 23 S 23 50 40 S 1 S 2 S 3 P 6 P 6 60 S 31 S 31 S 31 S 32 S 32 S 32 S 32 S 32 S 33 S 33 S 33 S 33 S 33 P 7 P 7 60 70 P 8 P 8 S 41 S 41 S 42 S 42 S 42 S 43 S 43 S 43 80 80 06/14/2011 Duke University 5
Partition-aware Query Optimization Sales S Payments P SELECT * S.d P.id 0 FROM Sales S, Payments P S.id P 1 0 10 S 12 S 13 WHERE S.id = P.id P 2 20 20 AND S.d > Feb-15 S 22 S 23 P 3 30 40 AND P.id < 25 P 1 : “join of unions” P 2 : “union of (partition - wise) joins” Union HJ Union Union MJ HJ TS(P 1 ) TS(S 12 ) Union Union Union IS(P 3 ) TS(P 2 ) TS(S 13 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 3 ) TS(S 22 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) TS(S 23 ) 06/14/2011 Duke University 6
Partition-aware Query Optimization More efficient partition-wise joins Legend S = Sales More appropriate join orders P = Payments More appropriate join operators L = Loans S ⋈ P ⋈ L S ⋈ P ⋈ L HJ Union Union Union MJ INLJ TS(L 1 ) MJ HJ MJ TS(L 1 ) HJ IS(P 3 ) TS(L 2 ) Union Union Union IS(P 3 ) Union Union Union TS(L 2 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) 06/14/2011 Duke University 7
Problem & Challenges Problem Definition Given a partitioning scheme and a query find the optimal query execution plan Challenges Dealing with plan space explosion Incorporating into state-of-the-art optimizers Partitions as physical or logical properties? Supporting a wide range of partitioning conditions 06/14/2011 Duke University 8
Overview 3. Path Selection S.id = P.id MJ S P P 1 Union Union S 12 TS(S 22 ) TS(P 1 ) S 13 P 2 1. Matching TS(S 23 ) TS(P 2 ) S 22 P 3 P 1 S 12 S 23 HJ S 13 P 2 Union TS(P 3 ) S 22 P 3 2. Clustering TS(S 22 ) S 23 S 12 P 1 TS(S 23 ) S 13 P 2 S 22 P 3 S 23 06/14/2011 Duke University 9
Matching Phase Goal Identify partition-wise join pairs that can generate output records New data structure: Partition Index Tree (PIT) Core idea: associate each partition with intervals Functionalities: index intervals, efficient lookups Implementation: Augmented red-black tree 06/14/2011 Duke University 10
Matching Algorithm Inputs : Table S, Table P, Join condition J Step 1: Convert partitioning conditions to intervals Step 2: Build PIT with intervals of S Step 3: Probe PIT with intervals of P Output: Output : Partition-wise join pairs (S i , P j ) Partition Join Pairs P 2 [10, 20) Probe S 22 [20, 40) 40 S 22 [20, 40) 40 (S 12 , P 1 ) ✓ (S 12 , P 2 ) S 13 [0, 20) 40 S 13 [0, 20) 40 S 23 [20, 40) 40 S 23 [20, 40) 40 (S 13 , P 1 ) (S 13 , P 2 ) ✓ S 12 [0, 20) 40 S 12 [0, 20) 40 (S 22 , P 3 ) (S 23 , P 3 ) Partition Interval Max Partition Interval Max Notation: Notation: 06/14/2011 Duke University 11
Matching Algorithm Benefits O(n log(n)), n = number of partitions of S Θ(n) memory needs Build & reuse PITs multiple times Additional support Numeric, dates, and string ranges or lists Complex partitioning conditions (AND, OR) Complex join conditions (AND, OR) Non-equi joins Details in the paper 06/14/2011 Duke University 12
Clustering Phase Goal Input : Partition join pairs (output from Matching Phase) Step 1: Build bipartite join partition graph Minimize number of partition-wise join pairs Step 2: Find connected components using Breadth-First-Search Output : Clustered join pairs Input: Intermediate: Output: Partition Join Partition Graph Clustered Join Pairs Join Pairs S 12 P 1 (S 12 , P 1 ) ({S 12 ,S 13 },{P 1 ,P 2 }) S 13 P 2 (S 12 , P 2 ) ({S 22 ,S 23 },{P 3 }) (S 13 , P 1 ) S 12 (S 13 , P 2 ) P 3 (S 22 , P 3 ) S 13 (S 23 , P 3 ) 06/14/2011 Duke University 13
Path Creation and Selection Goal Create and cost partition-wise join paths for child tables S ⋈ P ⋈ L Union MJ INLJ MJ TS(L 1 ) HJ IS(P 3 ) Union Union Union TS(L 2 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) 06/14/2011 Duke University 14
Bottom-up Query Optimization Find & retain best 3-way join LSP paths per interesting order Find & retain best 2-way join LP LS SP paths per interesting order Find & retain best access L S P paths 06/14/2011 Duke University 15
Bottom-up Query Optimization Logical Relation (Join) SP HJ MJ Union Union Union Union TS(S 12 ) TS(S 12 ) TS(P 1 ) IS(P 1 ) TS(P 2 ) TS(S 13 ) IS(P 2 ) TS(S 13 ) TS(P 3 ) IS(P 3 ) TS(S 22 ) TS(S 22 ) TS(S 23 ) TS(S 23 ) Best (physical) Best (physical) join path join path with interesting order 06/14/2011 Duke University 16
Extended Enumeration HJ P j Original enumeration Union Union Create and cost join of unions TS(S 12 ) TS(P 1 ) Extended enumeration TS(P 2 ) TS(S 13 ) TS(P 3 ) Create and cost union of joins TS(S 22 ) TS(S 23 ) Retain best path per interesting order Not enough! Union P u Not considering entire MJ HJ plan space (e.g., Union Union Union TS(P 3 ) if P j is best, no 3-way TS(P 1 ) TS(S 12 ) TS(S 22 ) partition-wise joins) TS(P 2 ) TS(S 13 ) TS(S 23 ) 06/14/2011 Duke University 17
Treating Partitions as Physical Properties Interesting partitions in joins Can make later joins less expensive Approach Retain best path for each interesting order Retain best path for each interesting partition Limitation Not considering entire plan space (e.g., cannot create union of joins with different join orders) 06/14/2011 Duke University 18
Treating Partitions as Logical Properties Logical parent join LSP L 1 S 12 S 13 P 1 P 2 Logical child joins L 2 S 22 S 23 P 3 Logical child joins LS LP SP L 1 S 12 S 13 L 1 P 1 P 2 S 12 S 13 P 1 P 2 Treated like L 2 P 3 L 2 S 22 S 23 S 22 S 23 P 3 logical parent joins Retain best join L S P S 12 paths per L 1 P 1 S 13 interesting L 2 P 2 S 22 P 3 order S 23 06/14/2011 Duke University 19
Treating Partitions as Logical Properties Property 1 Interesting orders independent across child joins Property 2 Child joins can have different join orders/operators Property 3 Entire extended plan space is enumerated Optimality guarantee Our bottom-up optimizer will find the optimal plan in the extended plan space 06/14/2011 Duke University 20
Experimental Evaluation Prototype using PostgreSQL 8.3.7 TPC-H benchmark (scale 30) Evaluation Methodology DBA has full/limited control over partitioning scheme State-of-the-art Vs. Our partition-aware optimizer Optimizer evaluation metrics Execution time Optimization time Memory utilization 06/14/2011 Duke University 21
Evaluation: Execution Time State-of-the-art Partition aware Basic Advanced 140 Execution Time (min) 120 100 80 60 40 20 0 2 3 4 5 7 8 9 10 12 14 Query 06/14/2011 Duke University 22
Evaluation: Optimization Time Basic/Intermediate Advanced Optimization Time (ms) 350 300 250 200 150 100 50 0 2 3 4 5 7 8 9 10 12 14 Query 06/14/2011 Duke University 23
Evaluation: Memory Utilization PostgreSQL Partition-aware Basic Advanced 40 Memory Usage (MB) 30 20 10 0 2 3 4 5 7 8 9 10 12 14 Query 06/14/2011 Duke University 24
Summary Extended plan space to include plans with multiway partition-wise joins Developed new partition-aware optimization techniques Easy incorporation into bottom-up query optimizers 06/14/2011 Duke University 25
Recommend
More recommend