
NAVIGATING BIG DATA with High-Throughput, Energy-Efficient Data Partitioning
Lisa Wu, R.J. Barker, Martha Kim, and Ken Ross, Columbia University

BIG DATA is here (Sources: IDC Worldwide Big Data Technology and ...)


  1–3. Range Partition: each input record's key is compared against an ordered set of splitters (16 and 31 in the example), and the record is routed to the partition whose key range contains it; with two splitters there are three output partitions (keys <= 16, keys in (16, 31], keys > 31). [Animated diagram: an input table with columns X, Y, Z, the two splitter values, and the resulting output partitions.]
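
A minimal software range partitioner, for reference against the hardware version that follows; the record layout (8-byte key, 8-byte payload), function names, and buffer handling are illustrative assumptions, not code from the paper.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative record layout: 8-byte key plus 8-byte payload.
 * This width is an assumption for the sketch, not taken from the paper. */
typedef struct {
    uint64_t key;
    uint64_t payload;
} record_t;

/* Partition i holds keys <= splitters[i]; keys above every splitter
 * fall into the last partition (index n_splitters). */
static size_t find_partition(uint64_t key,
                             const uint64_t *splitters, size_t n_splitters)
{
    size_t p = 0;
    while (p < n_splitters && key > splitters[p])
        p++;
    return p;
}

/* Scatter n input records into n_splitters + 1 output arrays.
 * Each out[p] must be able to hold all n records in the worst case. */
static void range_partition(const record_t *in, size_t n,
                            const uint64_t *splitters, size_t n_splitters,
                            record_t **out, size_t *out_count)
{
    for (size_t p = 0; p <= n_splitters; p++)
        out_count[p] = 0;
    for (size_t i = 0; i < n; i++) {
        size_t p = find_partition(in[i].key, splitters, n_splitters);
        out[p][out_count[p]++] = in[i];
    }
}
```

With the slide's splitters {16, 31}, keys such as 8, 10, and 15 fall in partition 0, keys 20, 27, and 29 in partition 1, and key 52 in partition 2. Each record costs a data-dependent loop and a scattered store here, which is the work HARP moves into dedicated hardware.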

  4–5. HARP Microarchitecture: records arrive from SBin, pass through a Serializer (1), a Conveyor of comparator stages (2), and a Merge stage (3), and leave through SBout. The HARP ISA provides set_splitter, partition_start, and partition_stop. [Diagram: the three-stage pipeline, with per-partition write enables (WE) on the conveyor.]

  6–7. Step 1: HARP Configuration: the splitter values (10, 20, 30 in the example) are loaded into the conveyor's comparator stages via set_splitter.

  8–9. Step 2: Signal HARP to Start Processing: with the splitters loaded, the core issues partition_start and HARP begins consuming input.
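
From the software side, Steps 1 and 2 reduce to a short configuration sequence. The harp_* functions below are hypothetical stand-ins for the set_splitter, partition_start, and partition_stop instructions named on the slides, stubbed out so the sketch compiles on its own; they are not a published driver API.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the HARP ISA instructions on the slides.
 * On real hardware each would expand to the corresponding instruction;
 * here they are printf stubs so the sketch is self-contained. */
static void harp_set_splitter(unsigned idx, uint64_t value)
{
    printf("set_splitter[%u] = %llu\n", idx, (unsigned long long)value);
}
static void harp_partition_start(void) { puts("partition_start"); }
static void harp_partition_stop(void)  { puts("partition_stop"); }

int main(void)
{
    /* Step 1: HARP configuration - load one splitter per comparator stage
     * (the slide example uses splitters 10, 20, 30). */
    const uint64_t splitters[] = { 10, 20, 30 };
    for (unsigned i = 0; i < sizeof splitters / sizeof splitters[0]; i++)
        harp_set_splitter(i, splitters[i]);

    /* Step 2: signal HARP to start pulling records from SBin. */
    harp_partition_start();

    /* ... stream the table through HARP (Steps 3-5) ... */

    /* Step 6: drain in-flight records and stop. */
    harp_partition_stop();
    return 0;
}
```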

  10–11. Step 3: Serialize SBin Cachelines into Records: the Serializer splits each cacheline arriving from SBin into individual records and feeds them to the conveyor.
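
The serializer's behavior can be modeled in a few lines: carve each cacheline delivered by SBin into fixed-width records. The 64-byte line and 16-byte record sizes below are assumptions made for the sketch.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define CACHELINE_BYTES 64          /* assumed line size for the sketch        */

typedef struct {                    /* assumed 16-byte record: key + payload   */
    uint64_t key;
    uint64_t payload;
} record_t;

#define RECORDS_PER_LINE (CACHELINE_BYTES / sizeof(record_t))

/* Step 3, modeled in software: split one cacheline from SBin into
 * individual records and hand each to the conveyor. */
static void serialize_cacheline(const uint8_t line[CACHELINE_BYTES],
                                void (*to_conveyor)(record_t r))
{
    for (size_t i = 0; i < RECORDS_PER_LINE; i++) {
        record_t r;
        memcpy(&r, line + i * sizeof(record_t), sizeof(record_t));
        to_conveyor(r);
    }
}
```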

  12–13. Step 4: Comparator Conveyor: each record travels down the conveyor past one comparator per splitter; the first stage whose splitter bounds the record's key asserts its write enable and captures the record for that partition. In the example, key 15 passes the splitter-10 stage and is captured at the splitter-20 stage, i.e. partition 2.
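
Reading the diagram as a spatial pipeline (one comparator per splitter, records advancing one stage per cycle), a rough cycle-level software model looks like the following; the stage count, slot format, and emit callback are illustrative assumptions.

```c
#include <stdint.h>

#define N_STAGES 3                      /* three splitters, e.g. 10, 20, 30 */

typedef struct {
    uint64_t key;
    uint64_t payload;
    int      valid;
} slot_t;

typedef struct {
    uint64_t splitter[N_STAGES];        /* loaded by set_splitter (Step 1)   */
    slot_t   stage[N_STAGES];           /* records currently in the conveyor */
} conveyor_t;

/* One modeled cycle: the record at the end of the conveyor (which exceeded
 * every splitter) exits to the last partition, everything else shifts one
 * stage forward, and each stage's comparator fires its write enable when
 * its splitter bounds the record's key. */
static void conveyor_cycle(conveyor_t *c, slot_t in,
                           void (*emit)(int partition, slot_t r))
{
    if (c->stage[N_STAGES - 1].valid)
        emit(N_STAGES, c->stage[N_STAGES - 1]);   /* overflow partition */

    for (int i = N_STAGES - 1; i > 0; i--)        /* advance the pipeline */
        c->stage[i] = c->stage[i - 1];
    c->stage[0] = in;

    for (int i = 0; i < N_STAGES; i++) {
        if (c->stage[i].valid && c->stage[i].key <= c->splitter[i]) {
            emit(i, c->stage[i]);       /* key 15 fires at stage index 1,
                                           the slide's "partition 2" in
                                           1-based numbering              */
            c->stage[i].valid = 0;
        }
    }
}
```

To use the model, zero-initialize a conveyor_t, fill in the splitters, then call conveyor_cycle once per input record, followed by a few calls with an invalid input slot to drain the pipeline, which is what Step 6 does in hardware.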

  14–15. Step 5: Merge Output Records to SBout: the Merge stage gathers the records captured for each partition and writes them out to SBout.

  16–17. Step 6: Drain In-Flight Records and Signal HARP to Stop Processing: when the input stream ends, any records still inside the serializer, conveyor, or merge stage are drained, and the core issues partition_stop to halt HARP.
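
Steps 5 and 6 together amount to per-partition output coalescing: records bound for the same partition are gathered into a full cacheline before being pushed to SBout, and a final drain flushes whatever is left when partition_stop arrives. The sketch below assumes that interpretation, with illustrative sizes; it is not the paper's exact merge logic.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_PARTITIONS   512
#define RECORDS_PER_LINE 4              /* assumed 16-byte records, 64-byte lines */

typedef struct { uint64_t key; uint64_t payload; } record_t;

typedef struct {
    record_t line[RECORDS_PER_LINE];    /* cacheline being assembled */
    size_t   fill;                      /* records currently in it   */
} out_buffer_t;

static out_buffer_t out_buf[MAX_PARTITIONS];

/* Stand-in for pushing one (possibly partial) cacheline to SBout. */
static void sbout_write(int partition, const record_t *line, size_t n)
{
    (void)partition; (void)line; (void)n;   /* hardware would stream this out */
}

/* Step 5: collect a record into its partition's line; push when full. */
static void merge_record(int partition, record_t r)
{
    out_buffer_t *b = &out_buf[partition];
    b->line[b->fill++] = r;
    if (b->fill == RECORDS_PER_LINE) {
        sbout_write(partition, b->line, b->fill);
        b->fill = 0;
    }
}

/* Step 6: drain - flush partially filled lines once the input stream ends. */
static void merge_drain(int n_partitions)
{
    for (int p = 0; p < n_partitions; p++) {
        if (out_buf[p].fill > 0) {
            sbout_write(p, out_buf[p].line, out_buf[p].fill);
            out_buf[p].fill = 0;
        }
    }
}
```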

  18–19. Streaming Framework Architecture: each core is paired with a HARP unit and two stream buffers, SBin and SBout, sitting alongside the L1/L2 hierarchy; data is streamed into and out of the buffers under software control. Inspired by Jouppi's work: Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers, ISCA 1990. [Diagram: core, L1, L2, memory controller, with SBin/SBout feeding HARP.]

  20–21. Step 1: Issue sbload from Core: the core issues an sbload instruction, which enqueues a fill request for SBin in the request (Req) buffer. The stream-buffer ISA consists of sbload, sbstore, sbsave, and sbrestore. [Diagram: core with store buffer and Req buffer, L1/L2/LLC, memory, and the HARP unit with SBin and SBout.]

  22–24. Step 2: Send Req Buffer to Memory: the request is forwarded from the Req buffer to memory rather than through the L1, L2, and LLC (marked ✗ in the diagram; legend: C = cache, S = stream buffer).

  25–27. Step 3: Data Return from Memory to SBin: the requested cachelines return from memory directly into SBin.

  28–29. Step 4: HARP Pulls Data from SBin and Pushes Data to SBout: HARP consumes records from SBin, partitions them, and writes the results to SBout.

  30–31. Step 5: Issue sbstore from Core: the core issues an sbstore to move partitioned output from SBout back toward memory.

  32–33. Step 6: Data Copied from Head of SBout to Store Buffer.

  34–35. Step 7: Data Written Back to Memory via the Existing Store Datapath.
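
Seen from the core, Steps 1 through 7 look roughly like the loop below. The sbload/sbstore declarations are hypothetical wrappers for the stream-buffer instructions named on the slides, and the one-cacheline-per-instruction granularity is an assumption of the sketch.

```c
#include <stdint.h>
#include <stddef.h>

#define CACHELINE_BYTES 64              /* assumed transfer granularity */

/* Hypothetical wrappers for the stream-buffer ISA on the slides.
 * sbload: ask the memory controller to fetch one line into SBin.
 * sbstore: copy the line at the head of SBout into the store buffer,
 *          from where it is written back over the normal store datapath. */
void sbload(const void *src_line);
void sbstore(void *dst_line);

/* Stream an input table through HARP: the core only issues addresses;
 * the data itself flows memory -> SBin -> HARP -> SBout -> store buffer. */
static void stream_table(const uint8_t *in, size_t in_bytes,
                         uint8_t *out, size_t out_bytes)
{
    for (size_t off = 0; off < in_bytes; off += CACHELINE_BYTES)
        sbload(in + off);               /* Steps 1-3: request and fill SBin  */

    /* Step 4 happens inside HARP: records are pulled from SBin,
     * partitioned, and pushed to SBout. */

    for (size_t off = 0; off < out_bytes; off += CACHELINE_BYTES)
        sbstore(out + off);             /* Steps 5-7: drain SBout to memory  */
}
```

A real driver would interleave the sbload and sbstore streams so that neither stream buffer overflows or runs dry; the two back-to-back loops here are only to keep the sketch short.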

  36–39. Interrupts and Context Switches: the stream buffers are architectural state, so sbsave and sbrestore let the OS spill and refill their contents around interrupts and context switches.
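
Because the stream buffers are architectural state, the OS must treat them like registers across interrupts and context switches. The sketch below shows one plausible save/restore path; only the instruction names sbsave and sbrestore come from the slide, while the wrappers, the save-area size, and the kernel hook are assumptions.

```c
#include <stdint.h>

#define SB_SAVE_BYTES 4096              /* assumed upper bound on SB contents */

/* Hypothetical wrappers for the sbsave/sbrestore instructions: spill the
 * stream buffers to memory, or refill them from a prior spill. */
void sbsave(void *save_area);
void sbrestore(const void *save_area);

/* Per-task save area the kernel would keep alongside the register file. */
struct task_sb_state {
    uint8_t save_area[SB_SAVE_BYTES];
};

/* Called on a context switch: park the outgoing task's streaming state
 * and bring back the incoming task's, just like general-purpose registers. */
static void switch_sb_state(struct task_sb_state *prev,
                            struct task_sb_state *next)
{
    sbsave(prev->save_area);
    sbrestore(next->save_area);
}
```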

  40. Accelerator Integration Choice
  • Tightly coupled and software controlled:
    • area/power savings
    • coherence
    • utilize hardware prefetchers
    • software-managed data layout
    • address-free domain for accelerators

  41. Remainder of the Talk
  • Brief System Overview
  • HARP UArch
  • Streaming Framework UArch
  • HARP and Streaming Framework Evaluation
  • Discussion and DSE
  [Diagram: system overview of cores, HARP units, SBin/SBout, L1/L2, and the memory controller.]

  42–44. Evaluation Methodology
  • HARP: Bluespec SystemVerilog implementation; cycle-accurate simulation in Bluesim; synthesis and place-and-route with Synopsys tools (32nm standard cells)
  • Streaming framework: three versions of a 1GB table memcpy (C library, scalar assembly, vector assembly); conservative area/power estimates with CACTI
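
For context, the C-library variant of the 1GB table memcpy baseline can be as small as the sketch below (the scalar- and vector-assembly variants would replace the memcpy call); the timing scaffolding is ours, not the paper's.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define TABLE_BYTES (1ULL << 30)        /* 1 GB table, as on the slide */

int main(void)
{
    uint8_t *src = malloc(TABLE_BYTES);
    uint8_t *dst = malloc(TABLE_BYTES);
    if (!src || !dst) return 1;
    memset(src, 0xAB, TABLE_BYTES);     /* touch pages so the copy is measured,
                                           not first-touch page faults */
    memset(dst, 0x00, TABLE_BYTES);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, TABLE_BYTES);      /* the "c-lib" copy variant */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("memcpy bandwidth: %.2f GB/s\n", (TABLE_BYTES / 1e9) / secs);

    free(src);
    free(dst);
    return 0;
}
```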

  45–46. Area and Power Overheads: [Charts: area (% of a Xeon core, broken into HARP and stream buffers) and power (% of a Xeon core) versus the number of partitions, 15 to 511.]

  47. SW Partitioning Performance: [Chart: partitioning throughput (GB/s) versus number of partitions, for 1 software thread and 16 software threads.]

  48–50. Performance Evaluation: [Chart: partitioning throughput (GB/s) versus number of partitions for 1 thread, 16 threads, and 1 thread + HARP; callouts mark 7.8x and 8.8x improvements for the HARP configuration.]

  51. Streaming Framework Provides Sufficient BW to Feed HARP? [Chart: partitioning throughput (GB/s, up to 7) versus number of partitions for 1 thread + HARP.]
