NetKernel: Making Network Stack Part of the Virtualized Infrastructure
Zhixiong Niu, Hong Xu, Peng Cheng, Qiang Su, Yongqiang Xiong, Tao Wang, Dongsu Han, Keith Winstein
Current architecture in the cloud
[Figure: each VM runs its applications and its network stack inside the guest OS; the hypervisor and the DCN below it form the infrastructure]
What are the fundamental limitations?
Motivation: Tenants
"I have to deal with the network stack all by myself."
[Figure: the choices facing a tenant — congestion control (CUBIC, CTCP, BBR, DCTCP, PCC, MPTCP), kernel and userspace stacks (Kernel, mTCP, StackMap, MegaPipe, FlexSC, FastSocket), and TCP/buffer parameters (net.ipv4.tcp_rmem, net.ipv4.tcp_wmem, net.core.rmem_max, net.core.wmem_max, initcwnd, initialRTO (ms), minRTO (ms), DelayedAckTimeout (ms))]
Tenants are primarily concerned with performance and functionality, not implementation details.
Motivation: Operator
"I know everything here. I can really help my tenants (and make some money!)"
[Figure: the operator's resources — NICs, RDMA, FPGA, DPDK, kernel stacks]
Motivation: Operator
The provider has zero visibility into, or control over, the tenant's network stack:
• Can't deploy new stacks (e.g., DCTCP)
• Difficult to even define a performance SLA
• Difficult to troubleshoot
• Difficult to perform management tasks
Is there a better way?
Making the Network Stack Part of the Virtualized Infrastructure
• Interface unchanged (BSD sockets, etc.)
• Unlike the current architecture, packets are handled in the NSM (Network Stack Module)
Benefits
• Better management efficiency for the operator
  • Orchestrate resource provisioning strategies more flexibly
  • Implement management functions as part of the user's network stack
• Deployment and performance gains for users, without user effort
  • Enforce various kernel stack optimizations
  • Enforce high-performance userspace stacks
  • Use advanced hardware
Design Challenges
• How to transparently redirect socket API calls without changing applications?
• How to transmit the socket semantics between the VM and the NSM?
• How to ensure high performance with semantics transmission (e.g., 100 Gbps)?
Transparent socket API redirection
• A new socket type, SOCK_NETKERNEL
• GuestLib: a complete implementation of the BSD socket APIs
[Figure: inside the tenant VM, applications issue unchanged BSD socket calls (socket(), send(), …); GuestLib translates them into NetKernel calls (nk_socket(), nk_sendmsg(), …)]
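To make the redirection concrete, here is a minimal sketch of how GuestLib could back SOCK_NETKERNEL sockets inside the guest kernel. The proto_ops wiring follows standard Linux conventions; nk_post_nqe() and NK_OP_SENDMSG are hypothetical names for illustration, not the paper's actual definitions.

```c
/* Sketch only: GuestLib as a guest-kernel module backing
 * SOCK_NETKERNEL sockets. nk_post_nqe() and NK_OP_SENDMSG are
 * assumed names; the real dispatch details are in the paper. */
#include <linux/module.h>
#include <linux/net.h>
#include <linux/socket.h>

#define NK_OP_SENDMSG 1 /* assumed opcode value */

/* Hypothetical helper: builds an NQE for this call and posts it to
 * the NetKernel device queues (see the next slide). */
extern int nk_post_nqe(struct socket *sock, int op,
                       struct msghdr *msg, size_t len);

static int nk_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
{
	/* Instead of running a TCP stack here, translate the call
	 * into an NQE and hand it to the NetKernel device. */
	return nk_post_nqe(sock, NK_OP_SENDMSG, msg, len);
}

static const struct proto_ops nk_stream_ops = {
	.family  = PF_INET,
	.owner   = THIS_MODULE,
	.sendmsg = nk_sendmsg,
	/* connect/bind/recvmsg/... are redirected the same way, so
	 * applications keep calling the unmodified BSD socket API. */
};
```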
A lightweight semantics channel
• NQE: NetKernel queue elements carry the socket semantics
  [Figure: 32B NQE layout — op type (1B), queue set ID (1B), VM ID (1B), socket ID (4B), op_data (8B), data pointer (8B), size (4B), reserved (5B)]
• NQE queues for semantics transmission and hugepages for data transmission in the NetKernel device
  [Figure: (1) the app creates a NetKernel socket; (2) GuestLib translates each call (nk_bind(), nk_sendmsg(), …) into an NQE posted to the NQE queues of the NetKernel device, with payload in hugepages; (3) a response NQE comes back; (4) GuestLib returns to the app]
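As a concrete reading of the layout above, here is a minimal C sketch of the NQE. The field order and label-to-width mapping are reconstructed from the slide, so treat them as assumptions rather than the authoritative definition.

```c
/* Minimal sketch of the 32-byte NQE; field order and widths are
 * reconstructed from the slide and are assumptions. */
#include <stdint.h>

struct nqe {
    uint8_t  op_type;      /* 1B: which socket operation (nk_sendmsg, ...) */
    uint8_t  queue_set_id; /* 1B: which per-core queue set to use */
    uint8_t  vm_id;        /* 1B: originating VM (NSM ID on the reply path) */
    uint32_t socket_id;    /* 4B: connection identifier */
    uint64_t op_data;      /* 8B: operation-specific argument */
    uint64_t data_ptr;     /* 8B: payload location in the shared hugepages */
    uint32_t size;         /* 4B: payload length */
    uint8_t  rsvd[5];      /* 5B: reserved, pads the element to 32B */
} __attribute__((packed));

_Static_assert(sizeof(struct nqe) == 32, "NQE must stay 32 bytes");
```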
Scalable lockless queues
• Per-core queue sets; queues are lockless
• NQE switching between VM and NSM queue sets is done by the CoreEngine
[Figure: GuestLib in VM1 and ServiceLib in NSM1 each attach to NK device queue sets; the CoreEngine maintains a connection table mapping <VM ID, queue set ID, socket ID> to <NSM ID, queue set ID, socket ID>, with entries like <01, 01, 2A3E97C3>]
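A toy C sketch of the lookup the CoreEngine might perform when switching an NQE; the table structure and semantics are inferred from the slide, and all names are hypothetical.

```c
/* Toy sketch of CoreEngine's connection table (semantics assumed
 * from the slide): map the VM-side tuple of an incoming NQE to the
 * NSM-side tuple, so the NQE lands in the right NSM queue set. */
#include <stddef.h>
#include <stdint.h>

struct nk_tuple {
    uint8_t  id;           /* VM ID or NSM ID */
    uint8_t  queue_set_id; /* per-core queue set */
    uint32_t socket_id;    /* e.g. 0x2A3E97C3 */
};

struct nk_conn {
    struct nk_tuple vm;    /* <VM ID, queue set ID, socket ID>  */
    struct nk_tuple nsm;   /* <NSM ID, queue set ID, socket ID> */
};

/* Linear scan for clarity; a production table would be a hash map. */
static const struct nk_tuple *
lookup_nsm(const struct nk_conn *tbl, size_t n, const struct nk_tuple *vm)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].vm.id == vm->id &&
            tbl[i].vm.queue_set_id == vm->queue_set_id &&
            tbl[i].vm.socket_id == vm->socket_id)
            return &tbl[i].nsm;
    }
    return NULL; /* no such connection */
}
```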
VM-based NSM
• Supports existing kernel and userspace stacks from various OSes
• Provides good isolation to guarantee performance
• Runs stacks independently of the hypervisor
NetKernel architecture
[Figure: in the tenant VM, APP1 and APP2 use the BSD socket interface; GuestLib (the NetKernel socket implementation) talks to the NetKernel device through mmap'd queues and hugepages. The NSM runs the network stack and ServiceLib, and reaches the pNICs through a vNIC via the virtual switch or the embedded switch (SR-IOV). The CoreEngine switches NQEs between VM and NSM queues. Striped areas indicate shared memory regions.]
Implementation
• QEMU/KVM 2.5.0, Linux kernel 4.9
• 2x Intel Xeon 16-core CPU @ 2.30 GHz
• 256 GB DDR4 2133 MHz
• Mellanox ConnectX-4 100G single-port NIC
Use Case #1: Multiplexing
• Application Gateway (AG): L7 proxy and load balancing services
• Baseline: AG1, AG2, and AG3 each run on 4 cores
[Figure: normalized RPS performance of AG1, AG2, and AG3 over a 60-minute trace from a large cloud]
Use Case #1: Multiplexing
• With NetKernel, the three AGs (1 core each) share one 5-core NSM plus a 1-core CoreEngine: 9 cores total vs. 12 cores for the baseline
[Figure: normalized RPS per core over the same 60-minute trace, baseline vs. NetKernel]
Benefit: NetKernel helps the operator perform network management more efficiently
Use Case #2: Deploying mTCP without API Changes
• mTCP doesn't support Nginx yet
• mTCP ported as an NSM (fixing a bug in the DPDK mlx5_core driver along the way)
• Unmodified Nginx runs on mTCP without any tenant effort
[Figure: RPS (Krps) with 1, 2, and 4 vCPUs, kernel stack NSM vs. mTCP NSM — the mTCP NSM brings a ~1.8x performance gain]
Use Case #3: Shared Memory Networking
• With NetKernel, the operator can easily detect on-host traffic
• For on-host traffic, a shared memory NSM avoids the TCP and bridge overheads
[Figure: throughput (Gbps) vs. message size (64B to 8192B), baseline vs. NetKernel with shared memory — the shared memory NSM achieves >2x throughput for on-host traffic]
Benefit: NetKernel gives users deployment and performance gains
Microbenchmarks: Throughput
• Baseline (a VM) and NetKernel (a VM with a Linux kernel NSM) using the same settings
• 8 TCP connections, 8KB messages
[Figure: send and receive throughput (Gbps) vs. number of vCPUs (1 to 8), baseline vs. NetKernel]
NetKernel achieves 100 Gbps with 3 cores (send) and 8 cores (receive)
Microbenchmarks: RPS
• Simple epoll server, short TCP connections
• 64B requests/responses
[Figure: requests/sec (x10^3) vs. number of vCPUs (1 to 8) for baseline, NetKernel, and NetKernel with the mTCP NSM — the mTCP NSM brings a 2x performance gain]
Discussion and future directions
• How can a tenant use Netfilter?
  • Hard to support with a multi-tenant NSM
• What about troubleshooting performance issues?
  • The operator can easily monitor its NSMs by deploying additional mechanisms in them
• Does NetKernel increase the attack surface?
  • The NK device has its own address spaces
  • The channel between NSM and VM is isolated
• Future directions: performance isolation, charging policies, FPGA/SoC NSMs
Recap
• Designed and implemented NetKernel
  • Decouples the network stack from the guest
  • Makes it part of the virtualized infrastructure in the cloud
• Enabled several new use cases
  • Multiplexing, mTCP NSM, shared memory NSM
• Conducted a comprehensive testbed evaluation with commodity 100G NICs
• Website: https://netkernel.net