a comparison of knives for bread slicing
play

A Comparison of Knives for Bread Slicing Alekh Jindal*, Endre - PowerPoint PPT Presentation

A Comparison of Knives for Bread Slicing Alekh Jindal*, Endre Palatinus, Vladimir Pavlov, Jens Dittrich Information Systems Group Saarland University *MIT CSAIL Data Layout 2 Row-layout CUSTKEY NATIONKEY NAME MKTSEGMENT ADDRESS COMMENT


  1. A Comparison of Knives for Bread Slicing Alekh Jindal*, Endre Palatinus, Vladimir Pavlov, Jens Dittrich Information Systems Group Saarland University *MIT CSAIL

  2. Data Layout 2

  3. Row-layout CUSTKEY NATIONKEY NAME MKTSEGMENT ADDRESS COMMENT PHONE ACCTBAL 1234556 DE G.W. 43 E1.4 - 1234 € 987,513 2334444 GB I.N. 22 OX13 - 332 € 10,522 1123234 US M.S. 22 CA16 - 1233 € 6,452 2323454 DE J.D. 43 E1.3 CST_LOW 54443 € 399 2311555 GB A.M. 12 WA154 - 23442 € 46,523 1231235 NL T.V. 42 AM3321 - 1123 € 180,000 3

  4. Column-layout CUSTKEY NATIONKEY NAME MKTSEGMENT ADDRESS COMMENT PHONE ACCTBAL 1234556 DE G.W. 43 E1.4 - 1234 € 987,513 2334444 GB I.N. 22 OX13 - 332 € 10,522 1123234 US M.S. 22 CA16 - 1233 € 6,452 2323454 DE J.D. 43 E1.3 CST_LOW 54443 € 399 2311555 GB A.M. 12 WA154 - 23442 € 46,523 1231235 NL T.V. 42 AM3321 - 1123 € 180,000 4

  5. Column-grouping CUSTKEY NATIONKEY NAME MKTSEGMENT ADDRESS COMMENT PHONE ACCTBAL 1234556 DE G.W. 43 E1.4 - 1234 € 987,513 2334444 GB I.N. 22 OX13 - 332 € 10,522 1123234 US M.S. 22 CA16 - 1233 € 6,452 2323454 DE J.D. 43 E1.3 CST_LOW 54443 € 399 2311555 GB A.M. 12 WA154 - 23442 € 46,523 1231235 NL T.V. 42 AM3321 - 1123 € 180,000 5

  6. The Vertical Partitioning Problem • Given a workload and a cost function • Provide a complete and disjunct partitioning of the set of attributes of a table

  7. Workload 7

  8. CUSTKEY NAME ADDRESS NATIONKEY PHONE ACCTBAL MKTSEGMENT COMMENT Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22 8

  9. Selectivity? High selectivity Indexes Low selectivity Vertical partitioning 9

  10. Vertical Partitioning in Legacy Row-Stores 10

  11. TPC-H Customer CUSTKEY NATIONKEY NAME MKTSEGMENT ADDRESS COMMENT PHONE ACCTBAL P5 P6 P1 P2 P3 P4 ADDRESS COMMENT PHONE ACCTBAL CUSTKEY NATIONKEY NAME MKTSEGMENT 11

  12. SELECT Name, Address, Acctbal FROM Customer P5 P6 P1 P2 P3 P4 ADDRESS COMMENT PHONE ACCTBAL CUSTKEY NATIONKEY NAME MKTSEGMENT 12

  13. SELECT Name, Address, Acctbal FROM Customer P3 P5 P6 NAME ADDRESS COMMENT PHONE ACCTBAL Database Buffer NAME ADDRESS COMMENT PHONE ACCTBAL 13

  14. SELECT Name, Address, Acctbal FROM Customer P3 P5 P6 NAME ADDRESS COMMENT PHONE ACCTBAL Database Buffer NAME ADDRESS COMMENT PHONE ACCTBAL 14

  15. SELECT Name, Address, Acctbal FROM Customer P3 P5 P6 NAME ADDRESS COMMENT PHONE ACCTBAL Database Buffer NAME ADDRESS COMMENT PHONE ACCTBAL 15

  16. SELECT Name, Address, Acctbal FROM Customer P3 P5 P6 NAME ADDRESS COMMENT PHONE ACCTBAL Database Buffer NAME ADDRESS COMMENT PHONE ACCTBAL 16

  17. SELECT Name, Address, Acctbal FROM Customer P3 P5 P6 NAME ADDRESS COMMENT PHONE ACCTBAL Database Buffer NAME ADDRESS COMMENT PHONE ACCTBAL 17

  18. SELECT Name, Address, Acctbal FROM Customer P3 P5 P6 NAME ADDRESS COMMENT PHONE ACCTBAL Database Buffer NAME ADDRESS COMMENT PHONE ACCTBAL 18

  19. SELECT Name, Address, Acctbal FROM Customer P3 P5 P6 NAME ADDRESS COMMENT PHONE ACCTBAL Database Buffer NAME ADDRESS COMMENT PHONE ACCTBAL 19

  20. SELECT Name, Address, Acctbal FROM Customer P3 P5 P6 NAME ADDRESS COMMENT PHONE ACCTBAL Database Buffer NAME ADDRESS COMMENT PHONE ACCTBAL 20

  21. Classification of VP algorithms 21

  22. Whole workload Starting Point Starting Point Starting Point Attribute subset Query subset Brute force Search Strategy Search Strategy Search Strategy Top-down Bottom-up No pruning Candidate Pruning Candidate Pruning Threshold-based 22

  23. AutoPart HillClimb HYRISE Navathe O2P Trojan Brute Force Whole workload Starting Point Starting Point Starting Point Attribute subset Query subset Brute force Search Strategy Search Strategy Search Strategy Top-down Bottom-up No pruning Candidate Pruning Candidate Pruning Threshold-based 23

  24. HillClimb Example

  25. Iteration 1 CUSTKEY NAME ADDRESS NATIONKEY PHONE ACCTBAL MKTSEGMENT COMMENT Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22

  26. Iteration 2 CUSTKEY NAME ADDRESS NATIONKEY PHONE ACCTBAL MKTSEGMENT COMMENT Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22

  27. Iteration 3 CUSTKEY NAME ADDRESS NATIONKEY PHONE ACCTBAL MKTSEGMENT COMMENT Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Q21 Q22

  28. Settings for VP Algos FILE Granularity Granularity Granularity DATA PAGE DATABASE BLOCK HARD DISK Hardware Hardware MAIN MEMORY OFFLINE Workload Workload ONLINE NONE Replication Replication Replication FULL PARTIAL CUSTOM System System System COST MODEL OPEN SOURCE 28

  29. Settings for VP Algos FILE Granularity Granularity Granularity DATA PAGE DATABASE BLOCK HARD DISK Hardware Hardware MAIN MEMORY OFFLINE Workload Workload ONLINE NONE Replication Replication Replication FULL PARTIAL CUSTOM System System System COST MODEL OPEN SOURCE 29

  30. Settings for VP Algos AutoPart HillClimb HYRISE Navathe O2P Trojan FILE Granularity Granularity Granularity DATA PAGE DATABASE BLOCK HARD DISK Hardware Hardware MAIN MEMORY OFFLINE Workload Workload ONLINE NONE Replication Replication Replication FULL PARTIAL CUSTOM System System System COST MODEL OPEN SOURCE 30

  31. Experimental Results 31

  32. TPC-H Query Runtimes 2,058 600 506 481 Estd. workload runtime (s) 450 393 387 381 381 381 381 300 150 0 Row AutoPart HillClimb HYRISE Navathe O2P Trojan BruteForce Column 32

  33. Unnecessary Data Read 100% 83.81% Unnecessary data read (%) 75% 50% 25.37% 21.34% 25% 0.91% 0.80% 0.80% 0.80% 0.00% 0% 0% AutoPart HillClimb HYRISE Navathe O2P Trojan BruteForce Column Row 33

  34. Average #Tuple Reconstruction Joins 3 Avg. tuple reconstruction joins 2 1 0 AutoPart HillClimb HYRISE Navathe O2P Trojan BruteForce Column Row 34

  35. Try another Benchmark Layout TPC-H SSB AutoPart 3.71% 5.29% HillClimb 3.71% 5.29% HYRISE 1.58% 5.27% Navathe -21.47% 1.64% O2P -27.74% 1.64% Trojan 3.71% 0.05% BruteForce 3.71% 5.29% Improvement over Column-layout [O’Neil et al: Star Schema Benchmark] 35

  36. Try another Cost Model Layout HDD MM AutoPart 3.71% 0.00% HillClimb 3.71% 0.00% HYRISE 1.58% 0.00% Navathe -21.47% -15.07% O2P -27.74% -15.53% Trojan 3.71% 0.00% BruteForce 3.71% 0.00% Improvement over Column-layout [Grund et al: HYRISE, VLDB’10] 36

  37. Try it in DBMS-X Compression Row Column HillClimb Default (LZO or Delta) 1652 s 377 s 450 s Dictionary 1265 s 511 s 532 s Actual Workload Runtimes 37

  38. Buffer Size is Crucial 150% Normalized estd. costs (%) 125% 100% 75% 50% HillClimb Navathe 25% Materialized views Column 0% 0.01 0.1 1 10 100 1,000 10,000 Buffer Size (MB, log scale) 38

  39. Summary • Buffer size is crucial • Column layout is good enough • HillClimb is the best VP algorithm

Recommend


More recommend