
Server Server Server Server Server Datacenter Network - PowerPoint PPT Presentation

Chanwoo Chung, Jinhyung Koo, Junsu Im, Arvind, and Sungjin Lee. DGIST and MIT. NVRAMOS '19, 2019.10.24. Data-Intensive Computing Systems Laboratory.


  1. Chanwoo Chung ǂ, Jinhyung Koo, Junsu Im, Arvind ǂ, and Sungjin Lee
     DGIST and MIT ǂ
     NVRAMOS '19, 2019.10.24
     Data-Intensive Computing Systems Laboratory

  2. Diagram: a computation tier (applications running on servers) connects through a datacenter network (e.g., Ethernet, InfiniBand, …) to a storage tier of Storage Node 0, Storage Node 1, …, Storage Node N, each with Xeon CPUs, GBs of DRAM, and a disk array w/ RAID.
     It is not mere storage; it is another high-end server!
     What a storage node contains:
     ▪ High-end Xeon CPUs
     ▪ Several GBs of DRAM
     ▪ An array of SSDs
     ▪ Large form factor
     What that costs:
     ▪ Power hungry (e.g., 1700 W)
     ▪ Expensive (e.g., $2~40,000 w/o SSDs)
     ▪ Large volume (e.g., 2-4 U)
     ▪ High TCO (e.g., cooling)

  3. ▪ HDD is slow: it requires a large DRAM and an array of disks
       ▪ 10 ms latency and 100~300 MB/s throughput
     ▪ HDD is dumb: the host system makes it smarter
       ▪ Xeon CPUs with advanced algorithms
     Diagram, top to bottom:
     ▪ Network: 4 × 40GbE, aggregate network throughput = 20 GB/s
     ▪ Storage host (Xeon CPUs, GBs of DRAM): host protocol translation (e.g., NFS, CIFS, …), caching/buffering, parity management, prefetching, dedup/compression, local file system (e.g., EXT4, WAFL, …)
     ▪ Disk array w/ RAID: HDDs (300 MB/s)

  4. ▪ HDD is slow: it requires a large DRAM and an array of disks
       ▪ 10 ms latency and 100~300 MB/s throughput
       → SSDs are not a bottleneck; the network and CPU are the new bottlenecks
     ▪ HDD is dumb: the host system makes it smarter
       ▪ Xeon CPUs with advanced algorithms
     Diagram, top to bottom:
     ▪ Network (bottleneck): 4 × 40GbE, aggregate network throughput = 20 GB/s
     ▪ Storage host (Xeon CPUs, GBs of DRAM): host protocol translation (e.g., NFS, CIFS, …), caching/buffering, parity management, prefetching, dedup/compression, local file system (e.g., EXT4, WAFL, …)
     ▪ SSD array w/ RAID: SSDs (1~10 GB/s), aggregate SSD throughput = 10~100 GB/s (with 10 SSDs)
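
The aggregate figures on this slide can be reproduced with a quick back-of-the-envelope calculation. The sketch below is only an illustration: the function names are hypothetical, and it assumes raw line rates (8 bits per byte, no protocol overhead), which is how the slide's 20 GB/s figure appears to be derived.

```python
# Back-of-the-envelope check of the slide's throughput figures.
# Assumption: raw line rate, 8 bits per byte, no protocol overhead.
BITS_PER_BYTE = 8


def aggregate_network_gbs(ports: int, gbit_per_port: float) -> float:
    """Aggregate network throughput in GB/s for identical ports."""
    return ports * gbit_per_port / BITS_PER_BYTE


def aggregate_ssd_gbs(ssds: int, gbs_per_ssd: float) -> float:
    """Aggregate SSD throughput in GB/s for identical SSDs."""
    return ssds * gbs_per_ssd


network = aggregate_network_gbs(ports=4, gbit_per_port=40)  # 4 x 40GbE
ssd_low = aggregate_ssd_gbs(ssds=10, gbs_per_ssd=1)         # 10 SSDs at 1 GB/s
ssd_high = aggregate_ssd_gbs(ssds=10, gbs_per_ssd=10)       # 10 SSDs at 10 GB/s

print(f"network: {network} GB/s, SSD array: {ssd_low}~{ssd_high} GB/s")
```

Under these assumptions the network becomes the limit as soon as the SSD array delivers more than 20 GB/s, which ten SSDs at the upper end of the 1~10 GB/s range exceed several times over.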

  5.                                EMC XtremIO   NetApp SolidFire   HPE 3PAR     Hynix AFA
     Capacity                       36~144 TB     46 TB              750 TB       522 TB
     # of SSDs                      18~72         12                 120          576
     SSD array aggr. throughput*    18~72 GB/s    12 GB/s            120 GB/s     576 GB/s
     Gap (as shown on slide)        4~8x          2x                 4~12x        3x
     Network ports                  10Gb iSCSI    25Gb iSCSI         16Gb FC      Gen3 PCIe
     Network aggr. throughput       5~10 GB/s     6.25 GB/s          8~24 GB/s    48 GB/s
     * Aggregate SSD throughput was estimated assuming each SSD offers 1 GB/s of throughput.
     ▪ Supported by the latest works
       ▪ K. Kourtis et al., “Reaping the performance of fast NVM storage with uDepot,” USENIX FAST '19
       ▪ J. Kim et al., “Alleviating Garbage Collection Interference through Spatial Separation in All Flash Arrays,” USENIX ATC '19


  7. ▪ HDD is slow: it requires a large DRAM and an array of disks
       ▪ 10 ms latency and 100~300 MB/s throughput
       → SSDs are not a bottleneck; the network and CPU are the new bottlenecks
     ▪ HDD is dumb: the host system makes it smarter
       ▪ Xeon CPUs with advanced algorithms
       → SSDs are smart enough and support many features; duplicate storage management hurts performance
     Diagram, top to bottom:
     ▪ Network (bottleneck): 4 × 40GbE, aggregate network throughput = 20 GB/s
     ▪ Storage host (Xeon CPUs, GBs of DRAM): host protocol translation (e.g., NFS, CIFS, …), caching/buffering, parity management, prefetching, dedup/compression, local file system (e.g., EXT4, WAFL, …)
     ▪ SSD array w/ RAID: SSDs (1~10 GB/s), each with its own controller (Ctrl)

  8. ▪ An SSD has 4 embedded ARM CPUs running at 700 MHz to 1.4 GHz and 1~16 GB of DRAM, the resources a desktop PC had 10 years ago
     ▪ Those resources are required for running the firmware (i.e., the FTL)
     Diagram, top to bottom:
     ▪ PCIe interface (1~10 GB/s)
     ▪ Host-to-PCIe controller
     ▪ 4 ARM CPUs (max 1.4 GHz each) and DRAM (>4 GB)
     ▪ Firmware functions: block I/O-to-flash I/O interfacing, remapping, wear-leveling, cleaning, parity management, deduplication, compression
     ▪ RAID over an array of NAND chips

  9. Diagram: a computation tier (applications running on servers) connects through a datacenter network (e.g., Ethernet, InfiniBand, …) to a storage tier of Storage Node 0, Storage Node 1, …, Storage Node N, each with Xeon CPUs, GBs of DRAM, and a disk array w/ RAID.
     Let's assume that this storage node has 72 SSDs of 8 TB each (EMC XtremIO)
     ▪ # of ARM cores: 4 cores × 72 = 288 ARM cores
     ▪ Aggregate DRAM: 8 GB × 72 = 576 GB DRAM
     All of this is just for managing NAND flash.
     Q: Is this a storage node or a low-power microserver?
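
The totals on this slide follow from multiplying the per-SSD resources by the number of SSDs. A minimal sketch of that arithmetic, using a hypothetical helper name and the slide's assumed per-SSD figures (4 ARM cores and 8 GB of DRAM per SSD):

```python
# Sums the embedded resources hidden inside the SSDs of one storage node.
# The per-SSD figures are the slide's assumptions, not measured values.
def embedded_resources(num_ssds: int, cores_per_ssd: int, dram_gb_per_ssd: int):
    """Return (total ARM cores, total DRAM in GB) across all SSDs."""
    return num_ssds * cores_per_ssd, num_ssds * dram_gb_per_ssd


total_cores, total_dram_gb = embedded_resources(
    num_ssds=72, cores_per_ssd=4, dram_gb_per_ssd=8
)
print(f"{total_cores} ARM cores, {total_dram_gb} GB DRAM")
```

Changing `num_ssds` to another array size shows how quickly the embedded compute grows with the number of drives.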

  10. ▪ Use a simple SSD?
        ▪ Software Defined Flash (ASPLOS '14)
        ▪ Application-managed Flash (USENIX FAST '16)
        ▪ LightNVM (USENIX FAST '17)
        → The network and CPU are still bottlenecks
      ▪ Use a better SSD organization?
        ▪ SWAN (HotStorage '16; USENIX ATC '19)
        → Still relies on a power-hungry and expensive host
      ▪ Any other solution?

  11. ▪ Motivation
      ▪ Basic Idea
      ▪ LightStore Software
      ▪ LightStore Controller
      ▪ LightStore Adapters
      ▪ Experimental Results
      ▪ Conclusion

  12. ▪ Get rid of the space-consuming, expensive, power-hungry host server
      ▪ Put and run everything in the SSDs
      ▪ Attach the SSDs to the datacenter network
      ▪ Let application servers talk directly to the SSDs
      Diagram labels: application servers; datacenter network; host protocol translation (e.g., NFS, CIFS, …), parity management, prefetching, caching/buffering, local file system (e.g., EXT4, WAFL, …); SSDs, each with its own controller (Ctrl).




  16. ▪ Get rid of the space-consuming, expensive, power-hungry host server
      ▪ Put and run everything in the SSDs
      ▪ Attach the SSDs to the datacenter network
      ▪ Let application servers talk directly to the SSDs
      Diagram labels: application servers; datacenter network; inside each SSD: host-to-PCIe controller, Ethernet controller, host protocol translation, DRAM (2~4 GB), high-level flash management, low-level flash management, RAID, NAND chips.
      Deliver flash's low latency and high throughput to the network ports!

  17. ▪ Get rid of the space-consuming, expensive, power-hungry host server
      ▪ Put and run everything in the SSDs
      ▪ Attach the SSDs to the datacenter network
      ▪ Let application servers talk directly to the SSDs
      An x86 storage server with N SSDs is replaced with N SSDs:
      ▪ Low power (e.g., 100 W / 10 SSDs)
      ▪ Cheap (e.g., zero server cost)
      ▪ Small volume (e.g., less than 1U)
      ▪ Low TCO (e.g., less cooling)
      ▪ Scalability (no network bottleneck)

  18. ▪ Can we run complicated server software on wimpy ARM cores?
      ▪ How can we provide the same interface to application servers?
      ▪ How can we manage unreliable NAND without more ARM cores?
