Ph.D. Matt Might, The Illustrated Guide to a Ph.D.: http://matt.might.net/articles/phd-school-in-pictures
ONE SIZE DOES NOT FIT ALL
Streaming, OLAP, OLTP, Archiving, Log-processing, Web-search, Scan-oriented
OLTP
OLAP
Archive
Streaming
Log-processing
Indexes, Column, Row, Raw files, Row+Column
Storage Views

Sample table:
1  abc  56  887.9
2  fdg  89  445.35
3  poe  67  234.67
4  lkj  12  385.92
5  yui  17  612.13
6  omg  90  148.9

Storage views over the same data: Index, Log, PAX, Row, Column, Column grouped
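The storage views named on this slide can be illustrated with a minimal Python sketch over the slide's six sample rows. The function names, the attribute positions, and the page size are assumptions made for illustration; the slide does not say how a real system would implement each view.

```python
# Sample rows from the slide; the meaning of each attribute is not given there.
ROWS = [
    (1, "abc", 56, 887.9),
    (2, "fdg", 89, 445.35),
    (3, "poe", 67, 234.67),
    (4, "lkj", 12, 385.92),
    (5, "yui", 17, 612.13),
    (6, "omg", 90, 148.9),
]


def row_layout(rows):
    """Row view: the values of one tuple are stored next to each other."""
    return [value for row in rows for value in row]


def column_layout(rows):
    """Column view: the values of one attribute are stored next to each other."""
    return [list(column) for column in zip(*rows)]


def pax_layout(rows, page_size):
    """PAX view: split the rows into pages, then store each page column-wise."""
    return [column_layout(rows[i:i + page_size])
            for i in range(0, len(rows), page_size)]


def column_grouped_layout(rows, groups):
    """Column-grouped view: each group of attribute positions is stored row-wise."""
    return [[tuple(row[i] for i in group) for row in rows] for group in groups]


def index_on(rows, attr):
    """Index view: map an attribute value to the positions of the matching rows."""
    index = {}
    for position, row in enumerate(rows):
        index.setdefault(row[attr], []).append(position)
    return index
```

All five functions read the same `ROWS`, which is the point of the slide: one data set, several physical representations, each suited to a different access pattern.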
Example: Flight Tickets
[Figure: query plans over storage views (SV). Labels recoverable from the slide: Primary Log Store; Log SV; Row SV (recent; bag,key; bag=customers); Col SV (recent; bag,key; bag=tickets); Cold Col SV (time<now-7days; price,rid); Index SV (id,rid); Frequent Fliers (Adaptive Partial Index; count(*)>=5 per customer_id; time>=now-7days); join tickets.customer_id with customer.id; selection a1=x1 .. an=xn; projection customer.*; Result.]
WTF! Where’s The Food!
Rodent Store
What to store? (Data files: copy 1, copy 2, copy 3)
How to store? (Data files)
Where to store? (Data files)
[Figure: architecture diagram with the labels Data Management System, Logical Data View, DSL, WWHow! Language, WWHow! Layer, Physical Storage Interface, Physical Data View.]
Example Use-cases
• WWHow! File System
• WWHow! RAID
• WWHow! Relational DBMS
• WWHow! Cloud
Store my conference talks (PDFs 2x and PPTs 1x) using RSA encryption on the University server:
    STORE '/Users/Bob/Conferences/Talks/*.*'
    WHAT *.(pdf | ppt), *.pdf
    WHERE vise4
    HOW encryption(rsa) FOR *;
I want my conference talks to be highly available:
    STORE '/Users/Bob/Conferences/Talks/*.*'
    WHAT *.(pdf | ppt), *.pdf
    HOW encryption(rsa) FOR *
    PREFERENCE Availability='high';
(A job for the WWHow! data storage optimizer.)
OctopusDB
• Cool Vision
• Tough to realize
C-Store
Trojan Columns
[Figure: system stack. Application / User; Database: Query Processor, Relations, UDF; Storage Layer: Physical Representation, File 1, File 2, File 3, ..., File n.]
Trojan Columns

Relation Customer:
name    phone   market_segment
smith   2134    automobile
john    3425    household
kim     6756    furniture
joe     9878    building
mark    4312    building
steve   2435    automobile
jim     5766    household
ian     8789    household

Physical Table Customer_trojan:
segment_ID   attribute_ID     blob_data
1            name             smith, john, kim, joe
1            phone            2134, 3425, 6756, 9878
1            market_segment   automobile, household, furniture, building
2            name             mark, steve, jim, ian
2            phone            4312, 2435, 5766, 8789
2            market_segment   building, automobile, household, household

write-UDF (components: Tuple Iterator, Data Parser, Data Accessor):
(a) Convert row tuples into blobs
(b) Store blob data
(c) Get next row data

read-UDF (components: Tuple Iterator, Data Parser, Data Accessor):
(d) Parse blob data
(e) Reconstruct row tuples
(f) Fetch blob data
(g) End of table
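The write-UDF and read-UDF steps can be sketched in Python as a round trip between the Customer relation and the Customer_trojan layout. This is only an illustration under stated assumptions: the function names `write_udf` and `read_udf`, the segment size of four rows (taken from the slide's example), and the comma-joined text encoding of blobs are choices made here, not the actual UDF implementation.

```python
SEGMENT_SIZE = 4  # rows per segment, matching the slide's example (assumed fixed)

ATTRIBUTES = ("name", "phone", "market_segment")

# Relation Customer from the slide; phone numbers are kept as text for simplicity.
CUSTOMER = [
    ("smith", "2134", "automobile"),
    ("john", "3425", "household"),
    ("kim", "6756", "furniture"),
    ("joe", "9878", "building"),
    ("mark", "4312", "building"),
    ("steve", "2435", "automobile"),
    ("jim", "5766", "household"),
    ("ian", "8789", "household"),
]


def write_udf(rows, attributes, segment_size=SEGMENT_SIZE):
    """Convert row tuples into one blob per (segment, attribute)."""
    table = []
    for start in range(0, len(rows), segment_size):
        segment = rows[start:start + segment_size]
        segment_id = start // segment_size + 1
        for position, attribute in enumerate(attributes):
            # Join this attribute's values across the segment into one blob.
            blob = ", ".join(row[position] for row in segment)
            table.append((segment_id, attribute, blob))
    return table


def read_udf(table, attributes):
    """Fetch and parse the blobs, then reconstruct the row tuples in order."""
    segments = {}
    for segment_id, attribute, blob in table:
        segments.setdefault(segment_id, {})[attribute] = blob.split(", ")
    for segment_id in sorted(segments):
        columns = [segments[segment_id][a] for a in attributes]
        yield from zip(*columns)
```

Calling `list(read_udf(write_udf(CUSTOMER, ATTRIBUTES), ATTRIBUTES))` returns the original rows. The comma-joined encoding breaks if a value itself contains ", "; a real blob format would need length prefixes or escaping.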
Example: TPC-H Query 6

Query plan (top to bottom):
Result
γ agg(extendedprice * discount)
σ shipdate BETWEEN '1994-01-01' AND '1995-01-01' AND discount BETWEEN 0.05 AND 0.07 AND quantity < 24
π quantity, discount, extendedprice, shipdate
SCAN lineitem

[Figure: the slide builds show three copies of this plan side by side; the added labels are scanUDF at the SCAN operator and selectUDF at the σ operator.]
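As a rough sketch of how a scanUDF and a selectUDF might cooperate on this query, the following Python evaluates the plan over column-wise segments. Everything here is an assumption for illustration: the segment representation, the function names `scan_udf`, `select_udf`, and `query6`, and the lineitem values, which are invented. The aggregate is taken to be a sum, as in TPC-H Query 6, and the date range follows the inclusive BETWEEN written on the slide.

```python
from datetime import date

# Hypothetical lineitem segments stored column-wise (values invented for illustration).
SEGMENTS = [
    {
        "quantity": [10, 30, 5],
        "discount": [0.06, 0.06, 0.10],
        "extendedprice": [1000.0, 2000.0, 3000.0],
        "shipdate": [date(1994, 3, 1), date(1994, 5, 1), date(1994, 7, 1)],
    },
    {
        "quantity": [20, 8],
        "discount": [0.05, 0.07],
        "extendedprice": [500.0, 400.0],
        "shipdate": [date(1994, 12, 31), date(1996, 1, 1)],
    },
]

COLUMNS = ("quantity", "discount", "extendedprice", "shipdate")


def scan_udf(segments, columns):
    """Read only the requested columns and stitch them back into tuples."""
    for segment in segments:
        yield from zip(*(segment[c] for c in columns))


def select_udf(tuples):
    """Apply the Query 6 predicate and keep the two attributes the aggregate needs."""
    for quantity, discount, extendedprice, shipdate in tuples:
        if (date(1994, 1, 1) <= shipdate <= date(1995, 1, 1)
                and 0.05 <= discount <= 0.07
                and quantity < 24):
            yield extendedprice, discount


def query6(segments):
    """Aggregate extendedprice * discount over the selected tuples."""
    selected = select_udf(scan_udf(segments, COLUMNS))
    return sum(price * discount for price, discount in selected)
```

Because `scan_udf` touches only the four columns the query references, any other lineitem attributes stored in a segment are never read, which is the benefit a column-oriented layout is meant to deliver.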
Benchmark Results *
[Figure: bar chart of Query Time (sec) for Standard Row vs. Trojan Columns on queries Q1 to Q7; y-axis ticks at 0, 10, 20, 30; two bars carry the value labels 71.74058 and 72.41696.]
* Mike Stonebraker et al. C-Store: A Column Oriented DBMS. VLDB 2005