
Ph.D. Matt Might, The Illustrated Guide to a Ph.D.



  1. Ph.D. Matt Might, The Illustrated Guide to a Ph.D.: http://matt.might.net/articles/phd-school-in-pictures

  2. ONE SIZE DOES NOT FIT ALL

  3. Streaming, OLAP, OLTP, Archiving, Log-processing, Web-search, Scan-oriented

  8. OLTP

  9. OLAP

  10. Archive

  11. Streaming

  12. Log-processing

  15. Indexes, Column, Row, Raw files, Row+Column

  16. Storage Views (example data; the slide gives no column names)

      1  abc  56  887.9
      2  fdg  89  445.35
      3  poe  67  234.67
      4  lkj  12  385.92
      5  yui  17  612.13
      6  omg  90  148.9

  22. Storage Views over the same data: Log, Row, Column, Column grouped, Index, PAX
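
The storage views named on these slides differ mainly in how the same records are laid out. A minimal Python sketch of that idea follows, using the six example records. The slides do not define the views precisely, so the function names, the PAX page size of three records, and the dictionary-based index are all assumptions made for illustration.

```python
# Example records from the slide. The attributes are unnamed there,
# so they are addressed by position only.
ROWS = [
    (1, "abc", 56, 887.9),
    (2, "fdg", 89, 445.35),
    (3, "poe", 67, 234.67),
    (4, "lkj", 12, 385.92),
    (5, "yui", 17, 612.13),
    (6, "omg", 90, 148.9),
]


def row_layout(rows):
    """Row view: the values of one record are stored together."""
    return [list(r) for r in rows]


def column_layout(rows):
    """Column view: the values of one attribute are stored together."""
    return [list(col) for col in zip(*rows)]


def pax_layout(rows, page_size=3):
    """PAX view: records are split into pages, column-wise inside each page.

    The page size is an assumption; real pages are sized in bytes.
    """
    return [
        column_layout(rows[i:i + page_size])
        for i in range(0, len(rows), page_size)
    ]


def index_on(rows, attr):
    """Index view: maps an attribute value to positions of matching records."""
    index = {}
    for pos, row in enumerate(rows):
        index.setdefault(row[attr], []).append(pos)
    return index
```

A scan-oriented query that touches one attribute would read a single list from `column_layout(ROWS)`, while a lookup by key would go through `index_on(ROWS, 0)` and then fetch whole records from `row_layout(ROWS)`.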

  23. Example: Flight Tickets (query-plan figure built up over slides 23 to 26; only its labels are recoverable)
      Storage views: Primary Log Store, Log SV, Row SV, Col SV, Cold Index SV, Frequent Fliers SV, Index (Adaptive Partial Index)
      Annotations: bag=tickets, bag=customers, bag,key, recent, price,rid, id,rid, count(*)>=5, time>=now-7days, time<now-7days
      Plan attributes: tickets.customer_id, customer.id, customer.*, a1=x1..an=xn, Result

  27. WTF! Where’s The Food!

  28. Rodent Store

  29. What to store? (figure: Data Files, copy 1, copy 2, copy 3)

  30. How to store? (figure: Data Files)

  31. Where to store? (figure: Data Files)

  32. Data Management System (architecture figure). Labels: Logical Data View, DSL, WWHow! Language, WWHow! Layer, Storage Interface, Physical Data View

  33. Example Use-cases
      • WWHow! File System
      • WWHow! RAID
      • WWHow! Relational DBMS
      • WWHow! Cloud

  34. Store my conference talks (PDFs 2x and PPTs 1x) using RSA encryption on the University server:
      STORE ‘/Users/Bob/Conferences/Talks/*.*’
      WHAT *.(pdf | ppt), *.pdf
      WHERE vise4
      HOW encryption(rsa) FOR *;

  35. I want my conference talks to be highly available:
      STORE ‘/Users/Bob/Conferences/Talks/*.*’
      WHAT *.(pdf | ppt), *.pdf
      HOW encryption(rsa) FOR *
      PREFERENCE Availability=‘high’;

  36. The statement on slide 35 is a job for the WWHow! data storage optimizer.
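
The two statements on slides 34 and 35 share a clause structure (STORE, WHAT, WHERE, HOW, FOR, PREFERENCE). The slides do not give a grammar for the WWHow! language, so the following Python sketch is only an illustration: it assumes the clause keywords are upper-case whole words, that a statement ends with a semicolon, and that no keyword occurs inside the quoted path. The function name `split_clauses` is hypothetical, and the example uses straight quotes instead of the typographic quotes on the slide.

```python
import re

# Clause keywords seen in the two example statements. The real WWHow!
# grammar is not shown in the slides, so this list is an assumption.
KEYWORDS = ("STORE", "WHAT", "WHERE", "HOW", "FOR", "PREFERENCE")


def split_clauses(statement):
    """Split a WWHow!-style statement into a {keyword: clause text} dict."""
    text = statement.strip().rstrip(";").strip()
    pattern = r"\b(" + "|".join(KEYWORDS) + r")\b"
    # With one capturing group, re.split returns
    # [prefix, keyword, body, keyword, body, ...].
    parts = re.split(pattern, text)
    clauses = {}
    for keyword, body in zip(parts[1::2], parts[2::2]):
        clauses[keyword] = body.strip()
    return clauses


EXAMPLE = (
    "STORE '/Users/Bob/Conferences/Talks/*.*' "
    "WHAT *.(pdf | ppt), *.pdf "
    "HOW encryption(rsa) FOR * "
    "PREFERENCE Availability='high';"
)

if __name__ == "__main__":
    for keyword, body in split_clauses(EXAMPLE).items():
        print(keyword, "->", body)
```

In this example the WHERE clause is absent from the result, which is the case an optimizer would have to decide on its own.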

  37. OctopusDB
      • Cool Vision
      • Tough to realize

  38. C-Store

  39. ?

  40. Trojan Columns (architecture figure). Labels: Application, User, Database, Query Processor, Relations, UDF, Storage Layer, Physical Representation, File 1, File 2, File 3, ..., File n

  41. Trojan Columns

      Relation Customer
      name   phone  market_segment
      smith  2134   automobile
      john   3425   household
      kim    6756   furniture
      joe    9878   building
      mark   4312   building
      steve  2435   automobile
      jim    5766   household
      ian    8789   household

      Physical Table Customer_trojan
      segment_ID  attribute_ID    blob_data
      1           name            smith, john, kim, joe
      1           phone           2134, 3425, 6756, 9878
      1           market_segment  automobile, household, furniture, building
      2           name            mark, steve, jim, ian
      2           phone           4312, 2435, 5766, 8789
      2           market_segment  building, automobile, household, household

  42. Trojan Columns: write-UDF (same Customer relation and Customer_trojan table as slide 41)
      Components: Tuple Iterator, Data Parser, Data Accessor
      (a) Convert row tuples into blobs
      (b) Store blob data
      (c) Get next row data

  43. Trojan Columns: read-UDF (same Customer relation and Customer_trojan table as slide 41)
      Components: Tuple Iterator, Data Parser, Data Accessor
      (d) Parse blob data
      (e) Reconstruct row tuples
      (f) Fetch blob data
      (g) End of table
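
The conversion between the Customer relation and the Customer_trojan table on slides 41 to 43 can be sketched in Python as follows. This is an illustration, not the authors' implementation: `write_udf` and `read_udf` are plain functions named after the slide labels, the segment size of four rows is read off the example, and blobs are modeled as comma-separated text exactly as printed on the slide (a real system would presumably use a binary encoding and keep type information).

```python
# Customer relation from the slide.
ATTRIBUTES = ("name", "phone", "market_segment")
CUSTOMER = [
    ("smith", 2134, "automobile"),
    ("john", 3425, "household"),
    ("kim", 6756, "furniture"),
    ("joe", 9878, "building"),
    ("mark", 4312, "building"),
    ("steve", 2435, "automobile"),
    ("jim", 5766, "household"),
    ("ian", 8789, "household"),
]


def write_udf(rows, attributes, segment_size=4):
    """Convert row tuples into (segment_ID, attribute_ID, blob_data) records."""
    physical = []
    for start in range(0, len(rows), segment_size):
        segment_id = start // segment_size + 1
        segment = rows[start:start + segment_size]
        for col, attribute in enumerate(attributes):
            # One blob per attribute per segment, values in row order.
            blob = ", ".join(str(row[col]) for row in segment)
            physical.append((segment_id, attribute, blob))
    return physical


def read_udf(physical, attributes):
    """Parse blobs and yield reconstructed row tuples, segment by segment.

    Values come back as strings because the text blobs carry no types.
    """
    segments = {}
    for segment_id, attribute, blob in physical:
        segments.setdefault(segment_id, {})[attribute] = blob.split(", ")
    for segment_id in sorted(segments):
        columns = [segments[segment_id][a] for a in attributes]
        yield from zip(*columns)
```

Calling `write_udf(CUSTOMER, ATTRIBUTES)` produces the six physical records shown for Customer_trojan, and `list(read_udf(...))` on that output returns the eight original rows with all values as strings.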

  44. Example: TPC-H Query 6 (query plan, top to bottom)
      Result
      γ agg(extendedprice * discount)
      σ shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24
      π quantity, discount, extendedprice, shipdate
      SCAN lineitem

  45. Example: TPC-H Query 6 (figure: the plan from slide 44 repeated, annotated with scanUDF)

  46. Example: TPC-H Query 6 (figure: the plan from slide 44 repeated three times, annotated with scanUDF and selectUDF)
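
The Query 6 plan on these slides (scan, projection, selection, aggregation) can be traced with a short Python sketch over a column-oriented `lineitem`. Several things here are assumptions: the aggregate is taken to be a sum, as in the TPC-H definition of Query 6; the date range is treated as inclusive on both ends, following the BETWEEN on the slide; and `lineitem` is modeled as a dictionary of equal-length lists. The function name `query6` and the sample data are made up for illustration.

```python
from datetime import date


def query6(lineitem):
    """Evaluate the slide's Query 6 plan over a column-oriented lineitem.

    `lineitem` maps a column name to a list of values; all lists have
    the same length.
    """
    lo, hi = date(1994, 1, 1), date(1995, 1, 1)
    total = 0.0
    # Scan + projection: read only the four columns the plan needs.
    columns = zip(
        lineitem["quantity"],
        lineitem["discount"],
        lineitem["extendedprice"],
        lineitem["shipdate"],
    )
    for quantity, discount, extendedprice, shipdate in columns:
        # Selection: the three predicates from the plan.
        if lo <= shipdate <= hi and 0.05 <= discount <= 0.07 and quantity < 24:
            # Aggregation: accumulate extendedprice * discount.
            total += extendedprice * discount
    return total


# Made-up sample: only the first row satisfies all three predicates.
SAMPLE = {
    "quantity": [10, 30, 5],
    "discount": [0.06, 0.06, 0.10],
    "extendedprice": [1000.0, 2000.0, 500.0],
    "shipdate": [date(1994, 6, 1), date(1994, 6, 1), date(1994, 6, 1)],
}
```

Because the loop reads four columns rather than whole rows, it mirrors what a scan over a columnar physical layout would touch. Discounts are floats here, so values exactly at 0.05 or 0.07 may need decimal arithmetic in a careful implementation.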

  47. Benchmark Results* (bar chart: Query Time (sec) for Standard Row vs. Trojan Columns on queries Q1 to Q7; y-axis from 0 to 30; value labels 71.74058 and 72.41696 shown on the chart)
      * Mike Stonebraker et al. C-Store: A Column-oriented DBMS. VLDB 2005
