FlashNet: Flash/Network Stack Co-Design

  1. FlashNet: Flash/Network Stack Co-Design
     Animesh Trivedi, Nikolas Ioannou, Bernard Metzler, Patrick Stuedi, Jonas Pfefferle, Ioannis Koltsidas, Kornilios Kourtis, and Thomas R. Gross
     IBM Research and ETH Zurich, Switzerland

  2. Modern Distributed Systems
     ● data intensive
     ● run on 100s to 1000s of servers
     ● performance depends upon both network and storage
     SYSTOR 2017, Haifa

  5. Modern Distributed Systems
     Performance depends upon both network and storage.
     Network:
     ● StackMap: Low-Latency Networking with the OS Stack and Dedicated NICs, USENIX'16
     ● Network Stack Specialization for Performance, SIGCOMM'14
     ● mTCP: A Highly Scalable User-level TCP Stack for Multicore Systems, NSDI'14
     ● MegaPipe: A New Programming Interface for Scalable Network I/O, OSDI'12
     ● …
     Storage:
     ● NVMeDirect: A User-space I/O Framework for Application-specific Optimization on NVMe SSDs, HotStorage'16
     ● OS I/O Path Optimizations for Flash Solid-state Drives, USENIX'14
     ● Linux Block IO: Introducing Multi-queue SSD Access on Multi-core Systems, SYSTOR'13
     ● When Poll is Better Than Interrupt, FAST'12
     ● …

  8. The Cost of the Gap
     [Chart: throughput in K IOPS (y-axis, 0 to 1400) for spec., block IO, netperf, iSCSI, KV, NFS, HDFS]

  14. The Reason for the Gap
      [Diagram: client and server with flash storage. 1: request, 2: request processing, 3: response]
      performance = network IO + server time + storage IO
      ● application involvement
      ● scheduling
      ● fs lookups and overheads
      ● ...
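
The decomposition on this slide can be read as a simple additive model. The sketch below is illustrative only: the function names and the microsecond figures are hypothetical placeholders, not measurements from the talk.

```python
def request_latency_us(network_io_us, server_time_us, storage_io_us):
    """Additive model from the slide: network IO + server time + storage IO."""
    return network_io_us + server_time_us + storage_io_us


def server_share(network_io_us, server_time_us, storage_io_us):
    """Fraction of the total request time spent in the server software."""
    total = request_latency_us(network_io_us, server_time_us, storage_io_us)
    return server_time_us / total


# Hypothetical numbers, chosen only to show how the terms combine.
example = dict(network_io_us=10.0, server_time_us=30.0, storage_io_us=90.0)
print("total (us):", request_latency_us(**example))
print("server share:", round(server_share(**example), 3))
```

As the network and flash terms shrink, the server-side term takes a larger share of the total, which is the term the slide's bullets enumerate.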

  19. A Detailed Look: send
      [Diagram: request path across the userspace/kernel boundary]
      1. TCP/IP processing (kernel)
      2. receive processing (kernel)
      3. receive request (userspace)
      4. fs translation (kernel)
      5. block I/O (kernel)
      6. flash I/O completion (kernel)
      7. response transmission (userspace)
      8. send (kernel)
      9. TX done (kernel)
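
A minimal sketch of a server loop that follows this read-then-send pattern, assuming a plain file and a connected stream socket. The function names are illustrative, and the demo uses a local socket pair rather than a real TCP connection.

```python
import os
import socket
import tempfile


def serve_with_read_send(sock, path, chunk_size=64 * 1024):
    """Send a file over a socket by copying it through a userspace buffer."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # file system and block I/O, then copy to userspace
            if not chunk:
                break
            sock.sendall(chunk)         # copy back into the kernel for transmission
            sent += len(chunk)
    return sent


def demo(payload=b"x" * 4096):
    """Round-trip a small temporary file over a local socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    server, client = socket.socketpair()
    try:
        serve_with_read_send(server, tmp.name, chunk_size=1024)
        server.close()  # signals end-of-stream to the client
        received = bytearray()
        while chunk := client.recv(65536):
            received += chunk
        return bytes(received)
    finally:
        server.close()
        client.close()
        os.unlink(tmp.name)
```

Every chunk crosses the userspace/kernel boundary twice, once on the read and once on the send.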

  20. A Detailed Look: sendfile
      [Diagram: request path across the userspace/kernel boundary]
      1. TCP/IP processing (kernel)
      2. receive processing (kernel)
      3. receive request (userspace)
      4. fs translation (kernel)
      5. block I/O (kernel)
      6. flash I/O completion (kernel)
      7. response transmission (userspace)
      8. send (kernel)
      9. TX done (kernel)
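
For comparison, a minimal sketch of the sendfile variant using Python's `socket.sendfile()`, which uses `os.sendfile()` where the platform supports it and falls back to plain `send()` otherwise. The function names are illustrative, and the demo uses a local socket pair rather than a real TCP connection.

```python
import os
import socket
import tempfile


def serve_with_sendfile(sock, path):
    """Send a whole file over a socket without a userspace data buffer."""
    with open(path, "rb") as f:
        # The application still issues the call, but where os.sendfile() is
        # available the file data itself is moved inside the kernel.
        return sock.sendfile(f)


def demo(payload=b"x" * 4096):
    """Round-trip a small temporary file over a local socket pair."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    server, client = socket.socketpair()
    try:
        sent = serve_with_sendfile(server, tmp.name)
        server.close()  # signals end-of-stream to the client
        received = bytearray()
        while chunk := client.recv(65536):
            received += chunk
        return sent, bytes(received)
    finally:
        server.close()
        client.close()
        os.unlink(tmp.name)
```

The application is still involved in receiving the request and issuing the call; only the data copy through userspace goes away.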

  25. The FlashNet Approach
      ● eliminate application involvement
      ● reduce file system overheads
      ● enable direct network and storage interaction
      [Diagram: 1. TCP/IP processing, 2. receive processing, 3. receive request (userspace), 4. fs translation, 5. block I/O]

  28. The FlashNet Approach
      ● eliminate application involvement: RDMA
      ● reduce file system overheads: simple fs layout
      ● enable direct network and storage interaction: use VM
      [Diagram: 1. TCP/IP processing, 2. RDMA processing, 5. block I/O]

  29. The FlashNet Approach
      [Diagram: userspace/kernel boundary with the following steps, numbered as on the slide]
      1. TCP/IP processing
      2. RDMA processing, va to LBA
      3. block I/O
      6. flash I/O completion
      6. response transmission
      7. TX done

  30. FlashNet: A Co-Designed Network and Storage Stack
      flash controller (64-bit LBA space)
      ● flash virtualization
      ● I/O management
      ● ...

  31. FlashNet: A Co-Designed Network and Storage Stack
      ContigFS (on top of the flash controller)
      ● contiguous file allocation
      ● supporting mmap & local file I/O
      ● …
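
With contiguous allocation, translating a file offset to a logical block address reduces to one addition. The sketch below assumes each file occupies a single extent and a 4096-byte block. The class, its fields, and the block size are hypothetical and do not describe the actual ContigFS layout.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4096  # assumed block size in bytes


@dataclass(frozen=True)
class ContiguousFile:
    start_lba: int   # first logical block of the file's single extent
    size_bytes: int  # file length in bytes

    def offset_to_lba(self, offset):
        """Map a byte offset in the file to the logical block that holds it."""
        if not 0 <= offset < self.size_bytes:
            raise ValueError("offset outside file")
        # No extent tree or indirect blocks to walk: base plus block index.
        return self.start_lba + offset // BLOCK_SIZE


f = ContiguousFile(start_lba=1000, size_bytes=3 * BLOCK_SIZE)
print(f.offset_to_lba(0), f.offset_to_lba(BLOCK_SIZE), f.offset_to_lba(3 * BLOCK_SIZE - 1))
```

The lookup is constant time and needs no per-file metadata beyond the start address and length.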

  32. FlashNet: A Co-Designed Network and Storage Stack
      RDMA controller (alongside ContigFS and the flash controller)
      ● lazy RDMA pinning
      ● resolving flash & file addresses
      ● …

  33. FlashNet: A Co-Designed Network and Storage Stack
      FlashNet I/O stack: ContigFS, RDMA controller, flash controller

  35. FlashNet: A Co-Designed Network and Storage Stack
      [Diagram: file server application, ContigFS, RDMA controller, flash controller. Address labels: STag, virtual address, LBA, PBA]
      ● network control setup expanding to storage
      ● data path from a flash device to a client buffer

  36. FlashNet: A Co-Designed Network and Storage Stack
      [Diagram: client and server. Labels: application, nfs, iSCSI, ContigFS, RDMA controller, flash controller, RNIC, SKB]
      ● network control setup expanding to storage
      ● data path from a flash device to a client buffer

  37. Performance Evaluation
      ● How efficient is FlashNet's IO path?
      ● Does it help with applications?
      ● ...more in the paper
      9-machine cluster testbed:
      ● CPU: dual socket E5-2690, 2.9 GHz, 16 cores
      ● DRAM: 256 GB, DDR3 1600 MHz
      ● NIC: 40 Gbit/s Ethernet
      ● Flash: 3x NVMe, 6.6 GB/sec (read), 2.7 GB/sec (write), peak 4 kB read IOPS: 1.3 M
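
A quick unit conversion puts the testbed figures side by side. The sketch assumes that 4 kB means 4096 bytes and that the flash and NIC figures describe the same server. The helper names are illustrative.

```python
def iops_to_gbit_per_s(iops, io_size_bytes):
    """Bandwidth needed to carry `iops` operations of `io_size_bytes` each."""
    return iops * io_size_bytes * 8 / 1e9


def gbyte_to_gbit_per_s(gbytes_per_s):
    """Convert GB/sec to Gbit/s (decimal units)."""
    return gbytes_per_s * 8


NIC_GBIT_PER_S = 40.0

peak_read = iops_to_gbit_per_s(1.3e6, 4096)  # peak 4 kB read IOPS from the slide
seq_read = gbyte_to_gbit_per_s(6.6)          # read bandwidth from the slide

print(f"peak 4 kB reads: {peak_read:.1f} Gbit/s")
print(f"read bandwidth:  {seq_read:.1f} Gbit/s")
print(f"NIC line rate:   {NIC_GBIT_PER_S:.1f} Gbit/s")
```

Under these assumptions the peak 4 kB read rate corresponds to about 42.6 Gbit/s and the 6.6 GB/sec read bandwidth to 52.8 Gbit/s, both above the 40 Gbit/s line rate of the NIC.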
