How does it work? • Exploit UDFs provided by existing systems Query Processor • Inject pieces of code Relations • Hack layouts into the UDFs UDF Storage Layer • UDF as mapping between logical and physical view of data Physical Representation File 1 File 2 File 3 .... File n 27
How does it work? • Exploit UDFs provided by existing systems Query Processor • Inject pieces of code Relations • Hack layouts into the UDFs UDF Storage Layer • UDF as mapping between logical and physical view of data Physical Representation • Novel use of UDFs File 1 File 2 File 3 .... File n 27
Use Case 1 : OLAP in Row-stores 28
OLTP OLAP 29
OLTP OLAP 29
OLTP OLAP? • Can we push the limits of row stores? 29
Trojan Columns Relation Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household 30
Trojan Columns Relation Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Physical Table Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household 30
Trojan Columns Relation Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Physical Table Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household 30
Trojan Columns Relation Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Physical Table Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household 30
Trojan Columns Relation Customer name phone market_segment smith 2134 automobile john 3425 household kim 6756 furniture joe 9878 building mark 4312 building steve 2435 automobile jim 5766 household ian 8789 household Physical Table Customer_trojan segment_ID attribute_ID blob_data 1 name smith, john, kim, joe 1 phone 2134, 3425, 6756, 9878 1 market_segment automobile, household, furniture, building 2 name mark, steve, jim, ian 2 phone 4312, 2435, 5766, 8789 2 market_segment building, automobile, household, household 30
Example: TPC-H Query 6 Result Result Result γ γ γ agg (extendedprice * discount) agg (extendedprice * discount) agg (extendedprice * discount) selectUDF σ σ σ shipdate BETWEEN shipdate BETWEEN shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ ‘1994-01-01’ AND ‘1995-01-01’ ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN AND discount BETWEEN AND discount BETWEEN 0.05 AND 0.07 0.05 AND 0.07 0.05 AND 0.07 AND quantity < 24 AND quantity < 24 AND quantity < 24 π π π quantity, discount quantity, discount quantity, discount extendedprice, shipdate extendedprice, shipdate extendedprice, shipdate scanUDF S CAN S CAN S CAN lineitem lineitem lineitem shipd scanUDF selectUDF σ σ 31 σ ate BETWEEN
Example: TPC-H Query 6 Result Result Result γ γ γ agg (extendedprice * discount) agg (extendedprice * discount) agg (extendedprice * discount) selectUDF σ σ σ shipdate BETWEEN shipdate BETWEEN shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ ‘1994-01-01’ AND ‘1995-01-01’ ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN AND discount BETWEEN AND discount BETWEEN 0.05 AND 0.07 0.05 AND 0.07 0.05 AND 0.07 AND quantity < 24 AND quantity < 24 AND quantity < 24 π π π quantity, discount quantity, discount quantity, discount extendedprice, shipdate extendedprice, shipdate extendedprice, shipdate scanUDF S CAN S CAN S CAN lineitem lineitem lineitem shipd scanUDF selectUDF σ σ 31 σ ate BETWEEN
Example: TPC-H Query 6 Result Result Result γ γ γ agg (extendedprice * discount) agg (extendedprice * discount) agg (extendedprice * discount) selectUDF σ σ σ shipdate BETWEEN shipdate BETWEEN shipdate BETWEEN ‘1994-01-01’ AND ‘1995-01-01’ ‘1994-01-01’ AND ‘1995-01-01’ ‘1994-01-01’ AND ‘1995-01-01’ AND discount BETWEEN AND discount BETWEEN AND discount BETWEEN 0.05 AND 0.07 0.05 AND 0.07 0.05 AND 0.07 AND quantity < 24 AND quantity < 24 AND quantity < 24 π π π quantity, discount quantity, discount quantity, discount extendedprice, shipdate extendedprice, shipdate extendedprice, shipdate scanUDF S CAN S CAN S CAN lineitem lineitem lineitem shipd scanUDF selectUDF σ σ 31 σ ate BETWEEN
C-Store Benchmark * 30 Standard Row Trojan Columns Query Time (sec) 20 10 0 Q1 Q2 Q3 Q4 Q5 Q6 Q7 71.74058 72.41696 32 * Mike Stonebraker et. al. C-Store: A Column Oriented DBMS. VLDB 2005
TPC-H Benchmark * 100 Standard Row Trojan Columns 75 Query Time (sec) 50 25 0 Q1 Q6 Q12 Q14 33 * tpch.org/tpch
TPC-H Benchmark * 900 Standard Row Trojan Columns 675 Query Time (sec) 450 225 0 Q3 Q5 Q10 Q19 34 * tpch.org/tpch
Micro-Benchmark 16 2.13 2.13 2.11 2.06 1.55 0.47 0.06 # referenced attributes (r) 15 4.64 4.62 4.55 4.27 2.57 0.55 0.06 13 5.00 5.00 4.94 4.61 2.70 0.57 0.06 11 5.79 5.82 5.75 5.24 2.87 0.56 0.06 9 6.39 6.38 6.25 5.79 3.11 0.54 0.06 7 7.00 6.96 6.80 6.23 3.17 0.56 0.06 5 10.96 10.94 10.55 9.27 3.75 0.57 0.06 3 12.86 13.57 13.22 11.03 4.16 0.56 0.06 1 17.43 17.61 16.61 13.57 4.39 0.57 0.06 1E-06 1E-05 1E-04 1E-03 1E-02 1E-01 1E+00 selectivity (fraction of tuples accessed) 35
versus Column Stores 100 Standard Row Trojan Columns DBMS-Y 75 Query Time (sec) 50 25 0 Q1 Q6 Q12 Q14 36
Trojan Columns Advantages • Column + Row storage • Much better performance • Closed source system 37
Use Case 2: Big Data Analytics 38
Web-log Processing • Scan Tasks User visits from different countries 39
Web-log Processing • Scan Tasks • Selection Tasks User visits with duration greater than 10s 39
Web-log Processing • Scan Tasks • Selection Tasks • Projection Tasks URL and duration of each user visit 39
Web-log Processing • Scan Tasks • Selection Tasks • Projection Tasks • Join Tasks Average PageRank visited by each user IP 39
Web-log Processing T Data Load Phase T 1 L PPartblock T 6 Replicate Replicate Replicate Replicate Replicate Replicate H1 Fetch Fetch Fetch H2 H3 Fetch Fetch Fetch H4 Store Store Store . . . Store Store Store . . . Scan Scan Scan PPartsplit PPartsplit PPartsplit T 1 T 5 T 2 T 4 T 3 T 6 M1 Union M2 M3 M4 RecReaditemize RecReaditemize MMapmap MMapmap PPartmem Map Phase LPartsh LPartsh LPartsh LPartsh Sortcmp Sortcmp Sortcmp Sortcmp SortGrpgrp SortGrpgrp SortGrpgrp SortGrpgrp MMapcombine MMapcombine MMapcombine . . . MMapcombine . . . Store Store Store Mergecmp SortGrpgrp • Scan Tasks MMapcombine Store Store • Selection Tasks PPartsh PPartsh Shu ffl e Phase Fetch Fetch Fetch Fetch • Projection Tasks Buffer Buffer Buffer Buffer Store Store Merge • Join Tasks Store Mergecmp . . . Reduce Phase SortGrpgrp MMapreduce 39 R1 Store R2
Trojan Index DataSet ... ... SData T H F Trojan Index Indexed Split i • Each HDFS block sorted • Each block contains an index 40
Trojan Index HDFS Blocks DataSet ... ... SData T Trojan Index H F Indexed Split i DataSet ... ... SData T Trojan Index H F Indexed Split i DataSet ... ... SData T Trojan Index H F Indexed Split i DataSet ... ... SData T Trojan Index H F Indexed Split i • Each HDFS block sorted • Each block contains an index • Index access in UDF 40
Trojan Index Creation T Data Load Phase T 1 L PPartblock T 6 Replicate Replicate Replicate Replicate Replicate Replicate H1 Fetch Fetch Fetch H2 H3 Fetch Fetch Fetch H4 Store Store Store . . . Store Store Store . . . Scan Scan Scan PPartsplit PPartsplit PPartsplit T 1 T 5 T 2 T 4 T 3 T 6 map (key k , value v ) �� M1 Union M2 M3 M4 RecReaditemize RecReaditemize [( getSplitID() ⌅ prj ai ( k ⌅ v ) , k ⌅ v )] MMapmap MMapmap PPartmem (key ik vset ivs ) �� Map Phase LPartsh LPartsh LPartsh LPartsh cmp (key k 1 , key k 2) �� compare ( k 1 . a , k 2 . a ) Sortcmp Sortcmp Sortcmp Sortcmp SortGrpgrp SortGrpgrp SortGrpgrp SortGrpgrp MMapcombine MMapcombine MMapcombine . . . grp (key k 1 , key k 2) �� compare ( k 1 . splitID , k 2 . splitID ) MMapcombine . . . Store Store Store Mergecmp SortGrpgrp MMapcombine sh (key k , value v , int numPartitions ) �� k . splitID % numPartitions Store Store PPartsh PPartsh Shu ffl e Phase Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store ⌅ ⌅ Mergecmp . . . Reduce Phase reduce (key ik , vset ivs ) �� SortGrpgrp MMapreduce [( ivs ⌅ indexBuilder ai ( ivs ))] R1 Store R2 41 T � T � 1 2
Trojan Index Access T Data Load Phase T 1 L PPartblock T 6 Replicate Replicate Replicate Replicate Replicate Replicate H1 Fetch Fetch Fetch H2 H3 Fetch Fetch Fetch H4 Store Store Store . . . Store Store Store . . . Scan Scan Scan PPartsplit PPartsplit PPartsplit T 1 T 5 T 2 T 4 T 3 T 6 Algorithm 1 : Trojan Index / Trojan Join split UDF M1 Union M2 M3 M4 Input : JobConf job, Int numSplits Output : logical data splits RecReaditemize RecReaditemize 1 FileSplit [] splits; MMapmap MMapmap 2 File [] files = GetFiles( job ) ; PPartmem 3 foreach file in files do Map Phase Path path = file. getPath() ; 4 LPartsh LPartsh LPartsh LPartsh 5 InputStream in = GetInputStream( path ) ; Sortcmp Sortcmp Sortcmp Sortcmp 6 Long o ff set = file. getLength() ; 7 while o ff set > 0 do SortGrpgrp SortGrpgrp SortGrpgrp SortGrpgrp in. seek( o ff set- FOOTER SIZE ) ; 8 MMapcombine MMapcombine MMapcombine . . . MMapcombine . . . 9 Footer footer = ReadFooter( in ) ; 10 Long splitSize = footer. getSplitSize() ; Store Store Store 11 o ff set - = (splitSize + FOOTER SIZE ); Mergecmp BlockLocations blocks = GetBlockLocations( path,o ff set ) ; 12 13 FileSplit newSplit = CreateSplit( path,o ff set,splitSize,blocks ) ; SortGrpgrp 14 splits. add( newSplit ) ; MMapcombine 15 end Store Store 16 end 17 return splits; PPartsh PPartsh Shu ffl e Phase Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store Mergecmp . . . Reduce Phase SortGrpgrp MMapreduce R1 Store R2 42 T � T � 1 2
Trojan Index Access T Data Load Phase T 1 L PPartblock T 6 Replicate Replicate Replicate Replicate Replicate Replicate H1 Fetch Fetch Fetch H2 H3 Fetch Fetch Fetch H4 Store Store Store . . . Store Store Store . . . Scan Scan Scan PPartsplit PPartsplit PPartsplit T 1 T 5 T 2 T 4 T 3 T 6 Algorithm 2 : Trojan Index itemize .initialize UDF M1 Union M2 M3 M4 Input : FileSplit split, JobConf job RecReaditemize RecReaditemize 1 Global FileSplit split = split; 2 Key lowKey = job. getLowKey() ; MMapmap MMapmap 3 Global Key highKey = job. getHighKey() ; PPartmem 4 Int splitStart = split. getStart() ; Map Phase 5 Global Int splitEnd = split. getEnd() ; LPartsh LPartsh LPartsh LPartsh 6 Header h = ReadHeader( split ) ; Sortcmp Sortcmp Sortcmp Sortcmp 7 Overlap type = h. getOverlapType( lowKey,highKey ) ; Global Int o ff set; 8 SortGrpgrp SortGrpgrp SortGrpgrp SortGrpgrp 9 if type == LEFT CONTAINED or type == FULL CONTAINED or type == MMapcombine MMapcombine MMapcombine . . . MMapcombine . . . POINT CONTAINED then 10 Index i = ReadIndex( split ) ; Store Store Store 11 o ff set = splitStart + i. lookup( lowKey ) ; Mergecmp 12 else if type == RIGHT CONTAINED or type == SPAN then SortGrpgrp 13 o ff set = splitStart; 14 else MMapcombine 15 // NOT CONTAINED, skip the split; Store Store 16 o ff set = splitEnd; 17 end PPartsh PPartsh 18 Seek( o ff set ) ; Shu ffl e Phase Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store Mergecmp . . . Reduce Phase SortGrpgrp MMapreduce R1 Store R2 42 T � T � 1 2
Trojan Index Access T Data Load Phase T 1 L PPartblock T 6 Replicate Replicate Replicate Replicate Replicate Replicate H1 Fetch Fetch Fetch H2 H3 Fetch Fetch Fetch H4 Store Store Store . . . Store Store Store . . . Scan Scan Scan PPartsplit PPartsplit PPartsplit T 1 T 5 T 2 T 4 T 3 T 6 Algorithm 3 : Trojan Index itemize .next UDF M1 Union M2 M3 M4 Input : KeyType key, ValueType value Output : has more records RecReaditemize RecReaditemize 1 if o ff set < splitEnd then MMapmap MMapmap 2 Record nextRecord = ReadNextRecord( split ) ; PPartmem 3 o ff set += nextRecord.size(); Map Phase 4 if nextRecord.key < highKey then LPartsh LPartsh LPartsh LPartsh 5 SetKeyValue( key, value, nextRecord ) ; Sortcmp Sortcmp Sortcmp Sortcmp 6 return true; SortGrpgrp SortGrpgrp SortGrpgrp SortGrpgrp 7 end 8 end MMapcombine MMapcombine MMapcombine . . . MMapcombine . . . 9 return false; Store Store Store Mergecmp SortGrpgrp MMapcombine Store Store PPartsh PPartsh Shu ffl e Phase Fetch Fetch Fetch Fetch Buffer Buffer Buffer Buffer Store Store Merge Store Mergecmp . . . Reduce Phase SortGrpgrp MMapreduce R1 Store R2 42 T � T � 1 2 size
Selection Analytical Task * Hadoop Hadoop++(256MB) 140 HadoopDB Hadoop++(1GB) HadoopDB Chunks 120 runtime [seconds] 100 80 60 40 20 0 10 nodes 50 nodes 100 nodes 43 * Pavlo et. al. A Comparison of Approaches to large-Scale Data Analysis. SIGMOD 2009
Trojan Index Advantages • Scan + Index data accesses • Parallel index lookups • Non-invasive system changes • Much better performance • Each HDFS block sorted • Each block contains an index • Index access in UDF 44
Trojan Join DataSet ... ... Ht SData T k Hs SData S k Ht SData T k+1 Hs SData S k+1 F co-group j co-group j+1 Co-Partitioned Split i • Each HDFS block co- partitioning over two relations • Join relations are co-located 45
Trojan Join HDFS Blocks DataSet ... ... Ht SData T k Hs SData S k Ht SData T k+1 Hs SData S k+1 F co-group j co-group j+1 Co-Partitioned Split i DataSet ... ... Ht SData T k Hs SData S k Ht SData T k+1 Hs SData S k+1 F co-group j co-group j+1 Co-Partitioned Split i DataSet ... ... Ht SData T k Hs SData S k Ht SData T k+1 Hs SData S k+1 F co-group j co-group j+1 Co-Partitioned Split i DataSet ... ... Ht SData T k Hs SData S k Ht SData T k+1 Hs SData S k+1 F co-group j co-group j+1 Co-Partitioned Split i • Each HDFS block co- partitioning over two relations • Join relations are co-located • Co-partitioned join in UDF physical 45 to co-partitioned co-partition
Join Analytical Task * 2500 Hadoop Hadoop++(256MB) HadoopDB Hadoop++(1GB) 2000 runtime [seconds] 1500 1000 500 0 10 nodes 50 nodes 100 nodes 46 * Pavlo et. al. A Comparison of Approaches to large-Scale Data Analysis. SIGMOD 2009
Trojan Join Advantages • Re- + Co- partitioned join • Parallel join processing • Non-invasive system changes • Much better performance • Each HDFS block co- partitioning over two relations • Join relations are co-located • Co-partitioned join in UDF 47
Trojan Layouts Replica 1 Replica 2 Replica 3 • Each HDFS block in row or column • Each block replica in different layout • Pick right layout in UDF 48
Recommend
More recommend