herodotos herodotou nedyalko borisov shivnath babu
play

Herodotos Herodotou Nedyalko Borisov Shivnath Babu Duke University - PowerPoint PPT Presentation

Herodotos Herodotou Nedyalko Borisov Shivnath Babu Duke University Table Partitioning Split parent table into smaller child tables (partitions) Partitioning methods: hash, range, list Sales Sales_Jan_11 Benefits id date . .


  1. Herodotos Herodotou Nedyalko Borisov Shivnath Babu Duke University

  2. Table Partitioning  Split parent table into smaller child tables (partitions)  Partitioning methods: hash, range, list Sales Sales_Jan_11  Benefits id date . . . id date . . .  Improve query performance  Faster data loading, archival, backup Sales_Feb_11 id date . . .  Efficient statistics maintenance  Better cardinality estimation Sales_Mar_11  Fine-grained control for tuning id date . . .  . . . 06/14/2011 Duke University 2

  3. Recent Trends  Growing usage due to increased data sizes  Growing user control due to new SQL extensions  Partitioning conditions for derived tables  DBA must satisfy multiple objectives and constraints regarding partitioning  Implications  DBA may have limited control over partitioning scheme  Diverse mix of partitioning schemes 06/14/2011 Duke University 3

  4. Partitioning Schemes  Multidimensional (Hierarchical) partitioning  Tables partitioned on same key but different ranges  Range partitioning with non-equi ranges Ads A Payments P A.d P.id 0 P 1 Sales S A 1 A 2 A 3 A 4 A 5 10 P 2 S.d S.d Loans L 20 P 3 L.id S.id 0 0 30 L 1 S 11 S 11 S 12 S 12 S 13 S 13 P 4 20 20 40 P 5 L 2 S 21 S 21 S 22 S 22 S 23 S 23 50 40 40 S 1 S 2 S 3 P 6 60 L 3 S 31 S 31 S 32 S 32 S 33 S 33 P 7 60 60 70 P 8 L 4 S 41 S 42 S 43 80 80 80 06/14/2011 Duke University 4

  5. Partition-aware Query Optimization SELECT * FROM Sales S, Payments P WHERE S.id = P.id AND S.d > Feb-15 AND P.id < 25 Payments P P.id 0 P 1 P 1 Sales S 10 P 2 P 2 S.d S.d 20 P 3 P 3 S.id 0 30 S 11 S 11 S 11 S 12 S 12 S 12 S 12 S 13 S 13 S 13 S 13 P 4 P 4 20 40 P 5 P 5 S 21 S 21 S 21 S 22 S 22 S 22 S 22 S 23 S 23 S 23 S 23 50 40 S 1 S 2 S 3 P 6 P 6 60 S 31 S 31 S 31 S 32 S 32 S 32 S 32 S 32 S 33 S 33 S 33 S 33 S 33 P 7 P 7 60 70 P 8 P 8 S 41 S 41 S 42 S 42 S 42 S 43 S 43 S 43 80 80 06/14/2011 Duke University 5

  6. Partition-aware Query Optimization Sales S Payments P SELECT * S.d P.id 0 FROM Sales S, Payments P S.id P 1 0 10 S 12 S 13 WHERE S.id = P.id P 2 20 20 AND S.d > Feb-15 S 22 S 23 P 3 30 40 AND P.id < 25 P 1 : “join of unions” P 2 : “union of (partition - wise) joins” Union HJ Union Union MJ HJ TS(P 1 ) TS(S 12 ) Union Union Union IS(P 3 ) TS(P 2 ) TS(S 13 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 3 ) TS(S 22 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) TS(S 23 ) 06/14/2011 Duke University 6

  7. Partition-aware Query Optimization  More efficient partition-wise joins Legend S = Sales  More appropriate join orders P = Payments  More appropriate join operators L = Loans S ⋈ P ⋈ L S ⋈ P ⋈ L HJ Union Union Union MJ INLJ TS(L 1 ) MJ HJ MJ TS(L 1 ) HJ IS(P 3 ) TS(L 2 ) Union Union Union IS(P 3 ) Union Union Union TS(L 2 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) 06/14/2011 Duke University 7

  8. Problem & Challenges  Problem Definition  Given a partitioning scheme and a query find the optimal query execution plan  Challenges  Dealing with plan space explosion  Incorporating into state-of-the-art optimizers  Partitions as physical or logical properties?  Supporting a wide range of partitioning conditions 06/14/2011 Duke University 8

  9. Overview 3. Path Selection S.id = P.id MJ S P P 1 Union Union S 12 TS(S 22 ) TS(P 1 ) S 13 P 2 1. Matching TS(S 23 ) TS(P 2 ) S 22 P 3 P 1 S 12 S 23 HJ S 13 P 2 Union TS(P 3 ) S 22 P 3 2. Clustering TS(S 22 ) S 23 S 12 P 1 TS(S 23 ) S 13 P 2 S 22 P 3 S 23 06/14/2011 Duke University 9

  10. Matching Phase  Goal  Identify partition-wise join pairs that can generate output records  New data structure: Partition Index Tree (PIT)  Core idea: associate each partition with intervals  Functionalities: index intervals, efficient lookups  Implementation: Augmented red-black tree 06/14/2011 Duke University 10

  11. Matching Algorithm Inputs : Table S, Table P, Join condition J Step 1: Convert partitioning conditions to intervals Step 2: Build PIT with intervals of S Step 3: Probe PIT with intervals of P Output: Output : Partition-wise join pairs (S i , P j ) Partition Join Pairs P 2 [10, 20) Probe S 22 [20, 40) 40 S 22 [20, 40) 40 (S 12 , P 1 ) ✓ (S 12 , P 2 ) S 13 [0, 20) 40 S 13 [0, 20) 40 S 23 [20, 40) 40 S 23 [20, 40) 40 (S 13 , P 1 ) (S 13 , P 2 ) ✓ S 12 [0, 20) 40 S 12 [0, 20) 40 (S 22 , P 3 ) (S 23 , P 3 ) Partition Interval Max Partition Interval Max Notation: Notation: 06/14/2011 Duke University 11

  12. Matching Algorithm  Benefits  O(n log(n)), n = number of partitions of S  Θ(n) memory needs  Build & reuse PITs multiple times  Additional support  Numeric, dates, and string ranges or lists  Complex partitioning conditions (AND, OR)  Complex join conditions (AND, OR)  Non-equi joins  Details in the paper 06/14/2011 Duke University 12

  13. Clustering Phase  Goal Input : Partition join pairs (output from Matching Phase) Step 1: Build bipartite join partition graph  Minimize number of partition-wise join pairs Step 2: Find connected components using Breadth-First-Search Output : Clustered join pairs Input: Intermediate: Output: Partition Join Partition Graph Clustered Join Pairs Join Pairs S 12 P 1 (S 12 , P 1 ) ({S 12 ,S 13 },{P 1 ,P 2 }) S 13 P 2 (S 12 , P 2 ) ({S 22 ,S 23 },{P 3 }) (S 13 , P 1 ) S 12 (S 13 , P 2 ) P 3 (S 22 , P 3 ) S 13 (S 23 , P 3 ) 06/14/2011 Duke University 13

  14. Path Creation and Selection  Goal  Create and cost partition-wise join paths for child tables S ⋈ P ⋈ L Union MJ INLJ MJ TS(L 1 ) HJ IS(P 3 ) Union Union Union TS(L 2 ) TS(P 1 ) TS(S 12 ) TS(S 22 ) TS(P 2 ) TS(S 13 ) TS(S 23 ) 06/14/2011 Duke University 14

  15. Bottom-up Query Optimization Find & retain best 3-way join LSP paths per interesting order Find & retain best 2-way join LP LS SP paths per interesting order Find & retain best access L S P paths 06/14/2011 Duke University 15

  16. Bottom-up Query Optimization Logical Relation (Join) SP HJ MJ Union Union Union Union TS(S 12 ) TS(S 12 ) TS(P 1 ) IS(P 1 ) TS(P 2 ) TS(S 13 ) IS(P 2 ) TS(S 13 ) TS(P 3 ) IS(P 3 ) TS(S 22 ) TS(S 22 ) TS(S 23 ) TS(S 23 ) Best (physical) Best (physical) join path join path with interesting order 06/14/2011 Duke University 16

  17. Extended Enumeration HJ P j  Original enumeration Union Union  Create and cost join of unions TS(S 12 ) TS(P 1 )  Extended enumeration TS(P 2 ) TS(S 13 ) TS(P 3 )  Create and cost union of joins TS(S 22 ) TS(S 23 )  Retain best path per interesting order  Not enough! Union P u  Not considering entire MJ HJ plan space (e.g., Union Union Union TS(P 3 ) if P j is best, no 3-way TS(P 1 ) TS(S 12 ) TS(S 22 ) partition-wise joins) TS(P 2 ) TS(S 13 ) TS(S 23 ) 06/14/2011 Duke University 17

  18. Treating Partitions as Physical Properties  Interesting partitions in joins  Can make later joins less expensive  Approach  Retain best path for each interesting order  Retain best path for each interesting partition  Limitation  Not considering entire plan space (e.g., cannot create union of joins with different join orders) 06/14/2011 Duke University 18

  19. Treating Partitions as Logical Properties Logical parent join LSP L 1 S 12 S 13 P 1 P 2 Logical child joins L 2 S 22 S 23 P 3 Logical child joins LS LP SP L 1 S 12 S 13 L 1 P 1 P 2 S 12 S 13 P 1 P 2  Treated like L 2 P 3 L 2 S 22 S 23 S 22 S 23 P 3 logical parent joins  Retain best join L S P S 12 paths per L 1 P 1 S 13 interesting L 2 P 2 S 22 P 3 order S 23 06/14/2011 Duke University 19

  20. Treating Partitions as Logical Properties  Property 1  Interesting orders independent across child joins  Property 2  Child joins can have different join orders/operators  Property 3  Entire extended plan space is enumerated  Optimality guarantee  Our bottom-up optimizer will find the optimal plan in the extended plan space 06/14/2011 Duke University 20

  21. Experimental Evaluation  Prototype using PostgreSQL 8.3.7  TPC-H benchmark (scale 30)  Evaluation Methodology  DBA has full/limited control over partitioning scheme  State-of-the-art Vs. Our partition-aware optimizer  Optimizer evaluation metrics  Execution time  Optimization time  Memory utilization 06/14/2011 Duke University 21

  22. Evaluation: Execution Time State-of-the-art Partition aware Basic Advanced 140 Execution Time (min) 120 100 80 60 40 20 0 2 3 4 5 7 8 9 10 12 14 Query 06/14/2011 Duke University 22

  23. Evaluation: Optimization Time Basic/Intermediate Advanced Optimization Time (ms) 350 300 250 200 150 100 50 0 2 3 4 5 7 8 9 10 12 14 Query 06/14/2011 Duke University 23

  24. Evaluation: Memory Utilization PostgreSQL Partition-aware Basic Advanced 40 Memory Usage (MB) 30 20 10 0 2 3 4 5 7 8 9 10 12 14 Query 06/14/2011 Duke University 24

  25. Summary  Extended plan space to include plans with multiway partition-wise joins  Developed new partition-aware optimization techniques  Easy incorporation into bottom-up query optimizers 06/14/2011 Duke University 25

Recommend


More recommend