hardware conscious data processing systems
play

HARDWARE-CONSCIOUS DATA PROCESSING SYSTEMS Holger Pirk - PowerPoint PPT Presentation

HARDWARE-CONSCIOUS DATA PROCESSING SYSTEMS Holger Pirk http://doc.ic.ac.uk/~hlgr Data Processing Performance - A Case Study Data Processing Performance - A Case Study 100 75 50 25 0 April May June July select sum(sales),


  1. …and looks almost the same as multicore-parallelism 1 input := Load ("input") // Single Column: val 2 ids := Range (input) 3 partitionSize := Constant (1024) 4 partitionIDs := Divide (ids, partitionSize) 5 positions := Partition (partitionIDs) 6 inputWPart := Zip (input, partition) 7 partInput := Scatter (inputWPart, positions) 8 pSum := FoldSum (partInput.val, partInput.partition) backend performs 9 totalSum := FoldSum (pSum) simple static analysis 1 input := Load ("input") // Single Column: val 2 ids := Range (input) 3 laneCount := Constant (2) 4 partitionIDs := Modulo (ids, laneCount) 5 positions := Partition (partitionIDs) 6 inputWPart := Zip (input, partition) 7 partInput := Scatter (inputWPart, positions) 8 pSum := FoldSum (partInput.val, partInput.partition) 9 totalSum := FoldSum (pSum) 10

  2. A portable high-performance database kernel

  3. The Voodoo query processing system SQL "Liberated" from MonetDB Similar to code generation example Logical Plan Optimizer Unified Algebra Our Contributions Intermediate Language OpenCL OS/Hardware/…

  4. Let’s extend our example SELECT SUM(l_quantity) FROM lineitem WHERE l_shipdate> 5

  5. MonetDB generates a logical plan… sum SELECT SUM(l_quantity) select FROM lineitem WHERE l_shipdate> 5 lineitem

  6. …we compile the logical plan to Voodoo… tmp1 = Load(.lineitem.l_quantity) sum tmp2 = Load(.lineitem.l_shipdate) tmp3.val = Range(728659,tmp2,0) tmp4.val = Greater(tmp2,.l_shipdate,tmp3,.val) select tmp5.val = Range(0,tmp4,1) tmp6 = Zip(.fold,tmp5,.val,.value,tmp4,.val) tmp7.val = FoldSelect(tmp6,.fold,.value) lineitem tmp8 = Gather(tmp1,tmp7,.val) tmp14.val = Range(0,tmp8,0) tmp15 = Zip(.fold,tmp14,.val,.value,tmp8,.val) tmp16.L1 = FoldSum(tmp15,.fold,.value) Return(tmp16)

  7. …i.e., a dataflow program Const .lineitem.l_shipdate Greater sum Range .lineitem.l_quantity FoldSelect select Gather lineitem Constant FoldSum

  8. Generating "undergraduate" C from Voodoo C Const Const .lineitem.l_shipdate .lineitem.l_shipdate pdate Greater Greater for(size_t i = 0; i < sizeof(l_shipdate); i++) l_shipdate[i] > 5 if( ) Range Range l_quantity[i] result += .lineitem.l_quantity .lineitem.l_quantity tity F FoldSelect FoldSelect Gather Gather Constant Constant FoldSum FoldSum

  9. The “graduate" version .lineitem.l_shipdate Const Greater extern size_t inputSize; Range void fragment1(int* tmp, int* s_date, int* quantity) { for(size_t i = 0; i < grainsize; i++) .lineitem.l_quantity FoldSelect if(s_date[pId * grainsize + i] > 5) out[pId] += quantity[pId * grainsize + i] Gather } void fragment2(int* out, int* tmp) { Range Constant for(size_t i = 0; i < inputSize / grainsize; i++) output[0] += tmp[i] Divide } FoldSum Multicore-partitioned fold control-incompatible folds Constant → hierarchical aggregation FoldSum

  10. Voodoo outperforms MonetDB (TPC-H subset, SF10) ′ 2000 Monet Voodoo 1500 Time in ms 1000 500 0 1 4 5 6 7 8 9 10 11 12 14 15 19 20 TPC - H Query an old acquaintance

  11. Voodoo is competitive with Hyper on CPU (selected TPC-H queries, SF10) 700 HyPeR Voodoo 591 600 500 Time in ms 420 400 279 300 222 191 200 162 151 158 155 154 120 103 96 76 85 76 73 74 100 63 64 56 47 38 45 30 42 18 25 0 1 4 5 6 7 8 9 10 11 12 14 15 19 20 TPC-H Query

  12. Voodoo is competitive with Hyper on CPU (selected TPC-H queries, SF10) Generally Competitive 700 HyPeR Voodoo 591 600 500 Time in ms 420 400 279 300 222 191 200 162 151 158 155 154 120 103 96 76 85 76 73 74 100 63 64 56 47 38 45 30 42 18 25 0 1 4 5 6 7 8 9 10 11 12 14 15 19 20 TPC-H Query

  13. Voodoo is competitive with Hyper on CPU (selected TPC-H queries, SF10) Often faster 700 HyPeR Voodoo 591 600 500 Time in ms 420 400 279 300 222 191 200 162 151 158 155 154 120 103 96 76 85 76 73 74 100 63 64 56 47 38 45 30 42 18 25 0 1 4 5 6 7 8 9 10 11 12 14 15 19 20 TPC-H Query

  14. Voodoo is competitive with Hyper on CPU (selected TPC-H queries, SF10) Sometimes slower 700 HyPeR Voodoo 591 600 500 Time in ms 420 400 279 300 222 191 200 162 151 158 155 154 120 103 96 76 85 76 73 74 100 63 64 56 47 38 45 30 42 18 25 0 1 4 5 6 7 8 9 10 11 12 14 15 19 20 TPC-H Query

  15. Voodoo is competitive with Hyper on CPU (selected TPC-H queries, SF10) Q9 is an interesting case where tuning is important 700 HyPeR Voodoo 591 600 500 Time in ms 420 400 279 300 222 191 200 162 151 158 155 154 120 103 96 76 85 76 73 74 100 63 64 56 47 38 45 30 42 18 25 0 1 4 5 6 7 8 9 10 11 12 14 15 19 20 TPC-H Query

  16. VOODOO AS A TUNING TOOL

  17. Selective Foreign-Key Joins join select sum(part.price) from lineitem, part where discount > $x and lineitem.partkey_fk = part.partkey_pk select items that were sold at a discount greater than x look up their price from the parts table and sum it

  18. Selective Foreign-Key Joins lookup/Foreign-Key Join join join select sum(part.price) from lineitem, part where discount > $x select part and lineitem.partkey_fk = part.partkey_pk lineitem primary key, i.e., unique

  19. Selective Foreign-Key Joins lineitem part price partkey discount 0 .80 .1 2 .93 .01 0 .10 .06 3 .75 .07 2 .5 2 .25 Foreign-Key Index 1 .0 3 .15 1 .03 select sum(price) from lineitem, part where discount > $1 and lineitem.partkey = part.partkey

  20. Selective Foreign-Key Joins read key/pointer resolve pointer/ select read value price >.05 partkey discount 0 .80 .1 ✔ 2 .93 .01 ✘ 0 .10 .06 ✔ 3 .75 .07 ✔ 2 .5 ✔ 2 .25 ✔ 1 .0 ✘ 3 .15 ✔ 1 .03 ✘ select sum(price) from lineitem, part where discount > $1 and lineitem.partkey = part.partkey

  21. Selective Foreign-Key Joins price discount partkey 0 .80 ✔ 2 .93 ✘ 0 .10 ✔ 3 .75 ✔ 2 ✔ 2 ✔ 1 ✘ 3 ✔ 1 ✘ select sum(price) from lineitem, part where discount > $1 and lineitem.partkey = part.partkey

  22. Selective Foreign-Key Joins price discount partkey 0 .80 ✔ 2 .93 ✘ 0 .10 ✔ 3 .75 ✔ 2 ✔ 2 ✔ 1 ✘ 3 ✔ 1 ✘ select sum(price) from lineitem, part where discount > $1 and lineitem.partkey = part.partkey

  23. Branching Foreign-Key Joins .lineitem.l_discount Range select sum(price) Selection Key-Lookup from lineitem, part Greater where discount > $1 and partkey = partkey for(size_t i = 0; i < lineitemsize; i++) Range if(discount[i] > $1) result += price[partkey[i]]; Zip .lineitem.lineitem_part FoldSelect .part.p_retailprice Gather 0.25 0.20 Time in seconds Gather 0.15 Range 0.10 0.05 Zip 0.00 20 40 60 80 100 FoldSum Selectivity return

  24. Predicated Foreign-Key Joins for(size_t i = 0; i < lineitemsize; i++) if(discount[i] > $1) result += price[partkey[i]];

  25. Predicated Foreign-Key Joins for(size_t i = 0; i < lineitemsize; i++) result += (discount[i] > $1) * price[partkey[i]];

  26. Selective Foreign-Key Joins price discount partkey 0 .80 ✔ 2 .93 ✘ 0 .10 ✔ 3 .75 ✔ 2 ✔ 2 ✔ 1 ✘ 3 ✔ 1 ✘ select sum(price) from lineitem, part where discount > $1 and lineitem.partkey = part.partkey

  27. Predicated Foreign-Key Joins price discount partkey 0 .80 ✔ 2 .93 ✘ 0 .10 ✔ 3 .75 ✔ 2 ✔ 2 ✔ 1 ✘ 3 ✔ 1 ✘ select sum(price) from lineitem, part where discount > $1 and lineitem.partkey = part.partkey

  28. Predicated Foreign-Key Joins .lineitem.l_discount Range .lineitem.l_discount Greater .part.p_retailprice .lineitem.lineitem_part Range Range Zip Gather Greater FoldSum( .lineitem.lineitem_part FoldSelect Gather( Range Multiply Gather( .part.p_retailprice Gather Zip FoldSelect(x),y),z)) Gather =>FoldSum( FoldSum Multiply(x, Range Multiply return Gather(y,z))) Zip FoldSum return

  29. Selective Foreign-Key Joins select sum(price) from lineitem, part where discount > $1 and partkey = partkey .lineitem.l_discount for(size_t i = 0; i < lineitemsize; i++) .part.p_retailprice .lineitem.lineitem_part Range result += (price[i] > $1) * p_discount[partkey[i]]; Gather Greater Range Multiply Zip FoldSum return

  30. Selective Foreign-Key Joins select sum(price) from lineitem, part where discount > $1 and partkey = partkey .lineitem.l_discount for(size_t i = 0; i < lineitemsize; i++) .part.p_retailprice .lineitem.lineitem_part Range result += (price[i] > $1) * p_discount[partkey[i]]; Gather Greater 0.25 Range Multiply 0.20 Time in seconds 0.15 Zip 0.10 Branching FoldSum Predicated Aggregation 0.05 return 0.00 20 40 60 80 100 Selectivity

  31. Double-predicated Foreign-Key Joins .lineitem.l_discount .lineitem.l_discount Range .part.p_retailprice .lineitem.lineitem_part Range .lineitem.lineitem_part Greater .part.p_retailprice Multiply Gather Greater FoldSum( Multiply( Range Multiply Gather Gather( Zip x,y),z)) Range Multiply =>FoldSum( FoldSum Zip Multiply( return Gather( FoldSum Multiply(x,z),y),z)) return

  32. Predicated Foreign-Key Joins price discount partkey 0 .80 ✔ 2 .93 ✘ 0 .10 ✔ 3 .75 ✔ 2 ✔ 2 ✔ 1 ✘ 3 ✔ 1 ✘ select sum(price) from lineitem, part where discount > $1 and lineitem.partkey = part.partkey

  33. Predicated Foreign-Key Joins price discount partkey 0 .80 ✔ 2 .93 ✘ 0 .10 ✔ 3 .75 ✔ 2 ✔ 2 ✔ 1 ✘ 3 ✔ 1 ✘ select sum(price) from lineitem, part where discount > $1 and lineitem.partkey = part.partkey

  34. Double-predicated Foreign-Key Joins price discount partkey 0 .80 ✔ 2 .93 ✘ 0 .10 ✔ 3 .75 ✔ 2 ✔ 2 ✔ 1 ✘ 3 ✔ 1 ✘ select sum(price) from lineitem, part where discount > $1 and lineitem.partkey = part.partkey

  35. Double-predicated Foreign-Key Joins select sum(price) .lineitem.l_discount from lineitem, part where discount > $1 Range and partkey = partkey for(size_t i = 0; i < lineitemsize; i++) .lineitem.lineitem_part Greater result += (discount[i] > $1) * price[partkey[i]]; .part.p_retailprice Multiply Gather Range Multiply Zip FoldSum return

  36. Double-predicated Foreign-Key Joins select sum(price) .lineitem.l_discount from lineitem, part where discount > $1 Range and partkey = partkey for(size_t i = 0; i < lineitemsize; i++) .lineitem.lineitem_part Greater result += (discount[i] > $1) * price[partkey[i] * (discount[i] > $1)]; .part.p_retailprice Multiply Gather 0.25 0.20 Range Multiply Time in seconds 0.15 Zip 0.10 Branching Predicated Aggregation FoldSum 0.05 Predicated Lookups 0.00 return 20 40 60 80 100 Selectivity

  37. Double-predicated Foreign-Key Joins 0.25 0.20 Time in seconds 0.15 0.10 Branching Predicated Aggregation 0.05 Predicated Lookups 0.00 20 40 60 80 100 Selectivity

  38. Voodoo Wrap SQL • Fast and expressive like C Logical Plan • Optimizable like relational algebra Voodoo Optimizer • Portable to different devices Executable • Enables serendipitous discovery 
 of new optimizations OS/Hardware/…

  39. Quo Vadis

  40. FUTURE WORK SQL Relational Algebra Optimizer Voodoo Executable OS/Hardware/…

  41. FUTURE WORK Domain Streams Specific Tensorflow Arrays Graphs SQL Weld Relational Algebra Optimizer Voodoo Executable OS/Hardware/… [VLDB 2018]

  42. QUO VADIS SQL Relational Algebra Optimizer Voodoo Executable OS/Hardware/…

  43. Plugging holes in the Design Space Remember these? 700 HyPeR Voodoo 591 600 500 Time in ms 420 400 279 300 222 191 200 162 151 158 155 154 120 103 96 76 85 76 73 74 100 63 64 56 47 38 45 30 42 18 25 0 1 4 5 6 7 8 9 10 11 12 14 15 19 20 TPC-H Query

  44. Plugging holes in the Design Space • Massively Parallel Top-K [SIGMOD 2018] • Incremental Window Computation [ongoing] • Dynamic Load Balancing on GPUs [future]

  45. Auto-Generating Databases 0.25 0.20 Time in seconds 0.15 0.10 Branching Predicated Aggregation 0.05 Predicated Lookups 0.00 20 40 60 80 100 Selectivity

  46. Auto-Generating Databases 0.25 0.20 Time in seconds 0.15 0.10 Branching Predicated Aggregation 0.05 Predicated Lookups 0.00 0 20 40 60 80 100 Selectivity in %

  47. Auto-Generating Databases • Hardware-conscious modelling & tuning [ongoing]

  48. Auto-Generating Databases DBMS DBMS Hardware Model Indexing Processing Indexing Processing … Voodoo Program Hardware Hardware Self-driving Synthesis Self-tuning CPU CPU DBMS DBMS Core 1 Core 1 Core 2 Core 2 Registers Registers Registers Registers L1 Cache L1 Cache TLB TLB L1 Cache L1 Cache TLB TLB Indexing Indexing L2 Cache L2 Cache L2 Cache L2 Cache Last Level (L3) Cache Last Level (L3) Cache Processing Processing on die on die off die off die Memory Memory Hardware Hardware Self-generating

  49. Auto-Generating Databases • Hardware-conscious modelling & tuning [ongoing] • Auto-Generating Databases [future]

  50. Wrapping up

  51. Thanks to Collaborators Sam Madden Matei Zaharia Mike Stonebraker Shoumik Palkar Anil Shanbhag Malte Schwarzkopf

  52. A new DB research hub — Visit us! London Eye Westminster Hyde Park Buckingham Palace Imperial College

  53. Recruiting • Interns • PhDs • Postdocs

Recommend


More recommend