network stack specialization for performance


  1. Network stack specialization for performance (goo.gl/1la2u6). Ilias Marinos§, Robert N.M. Watson§, Mark Handley*. §University of Cambridge, *University College London

  2. Motivation. Providers are scaling out rapidly. Key aspects:
      • From 1 machine : N functions to N machines : 1 function
      • Performance is critical
      • Scalability on multicore systems
      • Cost and energy concerns

  3. Motivation. Are general-purpose stacks the right solution for that kind of role?

  4. The Problem • Conventional stacks are great for bulk transfers, but what about short ones?

  5. The Problem [Figure: network throughput (Gbps, axis 0 to 10) versus HTTP object size (KB; 8, 16, 24, 32, 64, 128)]

  6. The Problem [Figure build: CPU utilization (%, axis 0 to 200) added on a second axis]

  7. The Problem [Figure build: annotation "NIC saturation, low CPU usage"]

  8. The Problem [Figure build: annotation "Throughput/CPU ratio is low"]

  9. The Problem [Figure build: conclusion] Short-lived HTTP flows are a problem!

  10. Why is this important?

  11. Why is this important? Distribution based on traces from Yahoo! CDN [Al-Fares et al. 2011].

  12. Why is this important? 95% of requested HTTP object sizes are ≤ 50 KB; 90% are ≤ 25 KB.

  13. Design Goals. Design a network stack that:
      • Allows transparent flow of memory from the NIC to the application and vice versa
      • Reduces system costs (e.g., batching, cache locality, lock-free and sharing-free design, CPU affinity)
      • Exploits application-specific knowledge to reduce repetitive processing costs (e.g., TCP segmentation of web objects, checksums)
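      The checksum example in the last goal can be illustrated with a small sketch. It assumes (the slides do not say this) that the one's-complement sum of each static payload is cached when the content is loaded, so that only the header bytes need summing at send time. The names `csum_partial` and `csum_fold` are hypothetical, and the header length is assumed to be even so the payload stays 16-bit aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add the 16-bit big-endian words of data[0..len) to a running
 * one's-complement accumulator. The result is not folded or inverted,
 * so a sum cached for a payload can seed the sum over its headers.
 * The 32-bit accumulator is ample for packet-sized inputs. */
uint32_t csum_partial(const uint8_t *data, size_t len, uint32_t sum)
{
    size_t i;

    assert(data != NULL || len == 0);
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)data[i] << 8 | data[i + 1];
    if (len & 1)                      /* pad a trailing odd byte with zero */
        sum += (uint32_t)data[len - 1] << 8;
    return sum;
}

/* Fold the carries back into 16 bits and invert to get the checksum. */
uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

      With this split, a server could compute `csum_partial(payload, len, 0)` once per pre-built segment and later call `csum_fold(csum_partial(header, hlen, cached))` for each transmission, touching only the header bytes.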

  14. Sandstorm: A specialized webserver stack. Prototyped on top of FreeBSD's netmap framework:
      • libnmio: abstracts netmap-related I/O
      • libeth: lightweight Ethernet layer
      • libtcpip: optimized TCP/IP layer
      • application: simple HTTP server that serves static content
      [Diagram, user space: webserver (web_write(), web_recv()); libtcpip.so (tcpip_write(), tcpip_recv(), tcpip_fsm(), tcpip_output(), tcpip_input()); libeth.so (eth_output(), eth_input()); libnmio.so (netmap_output(), netmap_input()). Kernel-space labels: netmap ioctls, syscall, zero copy, DMA memory mapped to userspace, RX and TX buffer rings, device driver]

  15. Sandstorm: A specialized webserver stack. Key decisions (some of them):
      • Application and stack are merged into the same process address space
      • Static content is pre-segmented into network packets and loaded into DRAM a priori
      • Received packet frames are processed in place on the RX rings, without memory copying or buffering
      • RX/TX packet batching greatly amortizes the system call overhead
      • Bufferless, synchronous model (no socket layer)
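      The pre-segmentation decision can be sketched as follows. This is a minimal illustration, not Sandstorm's code: `struct segment` and `presegment` are hypothetical names, and the caller chooses the payload size per packet (for example 1448 bytes, a common TCP payload size on Ethernet when the timestamp option is in use).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor for one pre-built segment of a static object. */
struct segment {
    size_t off;   /* byte offset of this segment within the object */
    size_t len;   /* payload bytes carried by this segment */
};

/* Split an object of content_len bytes into segments of at most mss
 * payload bytes. Returns a malloc'ed array (the caller frees it) and
 * stores the segment count in *nseg. Returns NULL if allocation fails. */
struct segment *presegment(size_t content_len, size_t mss, size_t *nseg)
{
    size_t n, i;
    struct segment *segs;

    assert(mss > 0 && nseg != NULL);
    n = (content_len + mss - 1) / mss;          /* ceiling division */
    segs = malloc((n ? n : 1) * sizeof(*segs)); /* never request 0 bytes */
    if (segs == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        size_t remaining = content_len - i * mss;

        segs[i].off = i * mss;
        segs[i].len = remaining < mss ? remaining : mss;
    }
    *nseg = n;
    return segs;
}
```

      Because the offsets are fixed when the content is loaded, serving a request reduces to walking this array and filling in per-connection header fields; no segmentation work is repeated per request.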

  16. Sandstorm Architecture (10,000 ft view) [Diagram: user space holds the app, tcpip, eth and nmio layers and the content; the ix0:RX and ix0:TX rings carry entries labelled A, B, ...; kernel space holds the NIC driver]

  17. Sandstorm Architecture (10,000 ft view) [Diagram build: netmap_input() added at the nmio layer]

  19. Sandstorm Architecture (10,000 ft view) [Diagram build: POLLIN added]

  20. Sandstorm Architecture (10,000 ft view) [Diagram build: ether_input() added at the eth layer]

  21. Sandstorm Architecture (10,000 ft view) [Diagram build: tcpip_input() added at the tcpip layer]

  22. Sandstorm Architecture (10,000 ft view) [Diagram build: TCP FSM added at the tcpip layer]

  23. Sandstorm Architecture (10,000 ft view) [Diagram build: websrv_accept() and websrv_receive() added at the app layer, tcpip_output() at the tcpip layer]

  25. Sandstorm Architecture (10,000 ft view) [Diagram build: ether_output() added at the eth layer]

  27. Sandstorm Architecture (10,000 ft view) [Diagram build: netmap_output() added at the nmio layer]

  30. Sandstorm Architecture (10,000 ft view) [Diagram build: POLLOUT added]
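      The receive side of this walkthrough (wait for POLLIN, then push every pending frame up through the layers in place) can be sketched with a mock ring. Everything here is an assumption made for illustration: `struct mock_ring`, `rx_batch` and `count_bytes` are hypothetical and are not the netmap API, and the handler stands in for the netmap_input() to tcpip_input() chain shown on the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8   /* hypothetical ring size */

/* Mock RX ring: slots in [head, tail) hold frames that have arrived. */
struct mock_ring {
    uint32_t head;              /* next slot to process */
    uint32_t tail;              /* first slot still owned by the NIC */
    uint16_t len[RING_SLOTS];   /* frame length stored in each slot */
};

/* Stand-in for the input path of the upper layers. */
typedef void (*frame_handler)(uint32_t slot, uint16_t len, void *ctx);

/* Process every pending frame where it sits in the ring, with no copy
 * into an intermediate buffer, and release the slots. One wakeup thus
 * covers a whole batch of frames. Returns the number handled. */
size_t rx_batch(struct mock_ring *ring, frame_handler input, void *ctx)
{
    size_t n = 0;

    assert(ring != NULL && input != NULL);
    while (ring->head != ring->tail) {
        input(ring->head, ring->len[ring->head], ctx);
        ring->head = (ring->head + 1) % RING_SLOTS;   /* wrap around */
        n++;
    }
    return n;
}

/* Example handler: adds each frame's length to a running byte count. */
void count_bytes(uint32_t slot, uint16_t len, void *ctx)
{
    (void)slot;
    *(size_t *)ctx += len;
}
```

      In a real netmap application the loop would run after poll() reports POLLIN on the netmap file descriptor, which is where the batching pays off: the cost of one system call is spread over all frames that arrived in the meantime.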

  31. Evaluation [Figure: throughput with 6 NICs (Gbps, axis 0 to 60) versus HTTP object size (KB; 4, 8, 16, 24, 32, 64, 128, 256, 512, 756, 1024) for nginx+FreeBSD, nginx+Linux and Sandstorm]

  32. Evaluation [Figure build: annotations "~9.8x", "~3.6x" and "~1.8x"]

  33. Evaluation [Figure build: annotation "Start converging for sizes ≥ 256K"]

  34. To copy or not to copy?

      TX "zerocopy":

          /* Get src and destination slots */
          struct netmap_slot *bf = &ppool->slot[slotindex];
          struct netmap_slot *tx = &txring->slot[cur];

          /* zero-copy packet */
          tx->buf_idx = bf->buf_idx;
          tx->len = bf->len;
          tx->flags = NS_BUF_CHANGED;

      OR

      TX "memcpy":

          /* Get source and destination bufs */
          char *srcp = NETMAP_BUF(ppool, bf->buf_idx);
          char *dstp = NETMAP_BUF(txring, tx->buf_idx);

          /* memcpy packet */
          memcpy(dstp, srcp, bf->len);
          tx->len = bf->len;
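      The difference between the two transmit paths can be tried out without netmap using mock types. The sketch below is an illustration only: `struct mock_slot`, `MOCK_BUF_CHANGED`, `slot_buf`, `tx_zerocopy` and `tx_memcpy` are hypothetical stand-ins for `struct netmap_slot`, `NS_BUF_CHANGED` and `NETMAP_BUF`, and the buffer sizes are arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 2048          /* arbitrary size of one packet buffer */
#define NUM_BUFS 8             /* arbitrary number of buffers */
#define MOCK_BUF_CHANGED 0x1   /* stand-in for NS_BUF_CHANGED */

/* Mock slot: names which buffer holds the packet and how long it is. */
struct mock_slot {
    uint32_t buf_idx;
    uint16_t len;
    uint16_t flags;
};

/* Mock shared buffer region, indexed by buf_idx. */
static char mock_bufs[NUM_BUFS][BUF_SIZE];

/* Stand-in for NETMAP_BUF(): the buffer a slot points at. */
char *slot_buf(const struct mock_slot *s)
{
    assert(s->buf_idx < NUM_BUFS);
    return mock_bufs[s->buf_idx];
}

/* Zero-copy: retarget the TX slot at the source buffer and flag the
 * change; no payload bytes are touched. */
void tx_zerocopy(struct mock_slot *tx, const struct mock_slot *src)
{
    tx->buf_idx = src->buf_idx;
    tx->len = src->len;
    tx->flags |= MOCK_BUF_CHANGED;
}

/* Copy: keep the TX slot's own buffer and copy the payload into it. */
void tx_memcpy(struct mock_slot *tx, const struct mock_slot *src)
{
    assert(src->len <= BUF_SIZE);
    memcpy(slot_buf(tx), slot_buf(src), src->len);
    tx->len = src->len;
}
```

      Both paths leave the TX slot describing the same bytes. The zero-copy path costs three stores regardless of packet size, while the copy path reads and writes every payload byte; that per-byte cost is what the comparison on the following slides measures.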

  35. To copy or not to copy? [Figure: throughput (Gbps, axis 0 to 10) of Sandstorm "zerocopy" versus Sandstorm "memcpy" on an Intel Core 2 (2006), serving a 24KB HTTP object]

  36. To copy or not to copy? [Figure build: annotation "-33%"]

  37. To copy or not to copy? [Figure: the same comparison on an Intel Sandybridge (2013), serving a 24KB HTTP object; annotations "=" and "?"]

  38. CPU microarchitecture ~2006 [Diagram labels: four cores (C), two L2 caches, FSB, Memory Controller Hub, DMA engine, two PCIe links]

  40. CPU microarchitecture ~2006 [Diagram build: "Raise interrupt" added]
