The eXpress Data Path: Fast Programmable Packet Processing in the Operating System Kernel


  1. The eXpress Data Path: Fast Programmable Packet Processing in the Operating System Kernel
     Toke Høiland-Jørgensen (Karlstad University), Jesper Dangaard Brouer (Red Hat), Daniel Borkmann (Cilium.io), John Fastabend (Cilium.io), Tom Herbert (Quantonium Inc), David Ahern (Cumulus Networks), David Miller (Red Hat)
     CoNEXT '18, Heraklion, Greece, December 2018

  2. Outline
     - Challenges with high-speed packet processing
     - XDP design
     - Performance evaluation
     - Example applications
     - Conclusion

  3. High-Speed Packet Processing is Hard
     - Millions of packets per second (see the arithmetic sketch below):
       - 10 Gbps: 14.8 Mpps, i.e. 67.5 ns per packet
       - 100 Gbps: 148 Mpps, i.e. 6.75 ns per packet
     - Operating system stacks are too slow to keep up
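     These per-packet time budgets follow directly from the line rate and the size of a minimum frame on the wire. A small stand-alone C sketch of the arithmetic (assuming 64-byte frames plus 20 bytes of preamble and inter-frame gap per packet; the output matches the slide's figures up to rounding):

     /* Stand-alone illustration of the per-packet time budget at line rate,
      * assuming minimum-size 64-byte Ethernet frames plus 20 bytes of
      * preamble and inter-frame gap (84 bytes on the wire per packet). */
     #include <stdio.h>

     int main(void)
     {
         const double wire_bytes = 64 + 20;            /* bytes per packet on the wire */
         const double rates_gbps[] = { 10.0, 100.0 };

         for (int i = 0; i < 2; i++) {
             double pps = rates_gbps[i] * 1e9 / (wire_bytes * 8); /* packets per second */
             double ns_per_pkt = 1e9 / pps;                       /* time budget per packet */

             printf("%5.1f Gbps: %5.1f Mpps, %5.2f ns per packet\n",
                    rates_gbps[i], pps / 1e6, ns_per_pkt);
         }
         return 0;
     }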

  4. Previous solutions
     - Kernel bypass: move hardware access to userspace
       - High performance, but hard to integrate with the rest of the system
     - Fast-path frame-to-userspace solutions (Netmap etc.)
       - Kernel stays in control, but lower performance
     - Custom in-kernel modules (e.g., Open vSwitch)
       - Avoids context switching, but is a maintenance burden
     - XDP: move programmable processing into the kernel instead

  5. XDP: Benefits
     - Integrated with the kernel; the driver retains control of the hardware
     - Can selectively use kernel stack features
     - Stable API
     - No packet re-injection needed
     - Transparent to the host
     - Dynamically re-programmable
     - Doesn't need a full CPU core

  6. XDP: Overall design
     [Architecture diagram: userspace (applications, VMs and containers, control plane) sits above the
      Linux kernel (network stack with the IP layer, TCP/UDP, queueing and forwarding, TC BPF, virtual
      devices, and the AF_INET / AF_RAW / AF_XDP socket interfaces; the eBPF verifier; BPF maps), with
      the XDP hook in the device driver just above the network hardware. At the hook, the eBPF virtual
      machine runs the XDP program, which can drop the packet or let the driver build an sk_buff and
      hand it to the stack. BPF maps connect XDP programs, the kernel, and userspace. Arrows distinguish
      packet data flow from control data flow.]

  7. XDP program flow
     - Program execution moves through phases: parse packet -> read/write metadata -> rewrite packet -> packet verdict
     - The verdict is given by the return code: drop, pass to stack, transmit out, or redirect (to another interface, another CPU, or userspace)
     - Context object: RX metadata (queue number, ...), pointer to packet data, space for custom metadata defined by the program
     - Kernel helpers: call into kernel functions, e.g. checksumming and routing table lookups
     - Maps: key/value stores (hash, array, trie, etc.) shared with other BPF programs, userspace programs, and the kernel networking stack
     - Capabilities: direct memory access to packet data, writing any packet header or payload, growing/shrinking packet headroom, and tail calls to split processing (a small sketch of the headroom adjustment follows below)
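     As an illustration of the headroom capability mentioned above, here is a minimal, hedged sketch (not from the paper) that uses the bpf_xdp_adjust_head() helper to reserve space in front of the packet; the data pointers must be re-read and bounds-checked after the call, since the helper invalidates them:

     /* Hedged sketch: grow packet headroom with bpf_xdp_adjust_head().
      * Illustrates the headroom-adjust capability only. */
     #include <linux/bpf.h>
     #include <bpf/bpf_helpers.h>

     SEC("xdp")
     int xdp_grow_head(struct xdp_md *ctx)
     {
         /* A negative delta moves the start of the packet earlier, creating
          * 16 bytes of room for e.g. a custom encapsulation header. */
         if (bpf_xdp_adjust_head(ctx, -16))
             return XDP_ABORTED;

         /* The helper invalidates packet pointers; re-read and re-check them. */
         void *data = (void *)(long)ctx->data;
         void *data_end = (void *)(long)ctx->data_end;

         if (data + 16 > data_end)
             return XDP_ABORTED;

         /* ... fill in the new header here ... */
         return XDP_PASS;
     }

     char _license[] SEC("license") = "GPL";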

  8. Example XDP program

     /* map used to count packets; key is IP protocol, value is pkt count */
     struct bpf_map_def SEC("maps") rxcnt = {
         .type = BPF_MAP_TYPE_PERCPU_ARRAY,
         .key_size = sizeof(u32),
         .value_size = sizeof(long),
         .max_entries = 256,
     };

     /* swaps MAC addresses using direct packet data access */
     static void swap_src_dst_mac(void *data)
     {
         unsigned short *p = data;
         unsigned short dst[3];

         dst[0] = p[0];
         dst[1] = p[1];
         dst[2] = p[2];
         p[0] = p[3];
         p[1] = p[4];
         p[2] = p[5];
         p[3] = dst[0];
         p[4] = dst[1];
         p[5] = dst[2];
     }

     static int parse_ipv4(void *data, u64 nh_off, void *data_end)
     {
         struct iphdr *iph = data + nh_off;

         if (iph + 1 > data_end)
             return 0;
         return iph->protocol;
     }

     SEC("xdp1") /* marks main eBPF program entry point */
     int xdp_prog1(struct xdp_md *ctx)
     {
         void *data_end = (void *)(long)ctx->data_end;
         void *data = (void *)(long)ctx->data;
         struct ethhdr *eth = data;
         int rc = XDP_DROP;
         long *value;
         u16 h_proto;
         u64 nh_off;
         u32 ipproto;

         nh_off = sizeof(*eth);
         if (data + nh_off > data_end)
             return rc;

         h_proto = eth->h_proto;

         /* check VLAN tag; could be repeated to support double-tagged VLAN */
         if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
             struct vlan_hdr *vhdr;

             vhdr = data + nh_off;
             nh_off += sizeof(struct vlan_hdr);
             if (data + nh_off > data_end)
                 return rc;
             h_proto = vhdr->h_vlan_encapsulated_proto;
         }

         if (h_proto == htons(ETH_P_IP))
             ipproto = parse_ipv4(data, nh_off, data_end);
         else if (h_proto == htons(ETH_P_IPV6))
             ipproto = parse_ipv6(data, nh_off, data_end);
         else
             ipproto = 0;

         /* lookup map element for ip protocol, used for packet counter */
         value = bpf_map_lookup_elem(&rxcnt, &ipproto);
         if (value)
             *value += 1;

         /* swap MAC addrs for UDP packets, transmit out this interface */
         if (ipproto == IPPROTO_UDP) {
             swap_src_dst_mac(data);
             rc = XDP_TX;
         }

         return rc;
     }
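     How a program like the one above gets loaded and attached is not shown on the slide; the following is a hedged userspace sketch using modern libbpf (the API has changed across libbpf versions, and xdp_prog1.o and eth0 are placeholder names), which also reads back the per-CPU rxcnt counter:

     /* Hedged sketch only: load the object, attach it at the driver XDP hook,
      * and sum the per-CPU rxcnt counter for UDP. Placeholder file/interface names. */
     #include <stdio.h>
     #include <net/if.h>
     #include <netinet/in.h>
     #include <linux/types.h>
     #include <linux/if_link.h>
     #include <bpf/libbpf.h>
     #include <bpf/bpf.h>

     int main(void)
     {
         struct bpf_object *obj = bpf_object__open_file("xdp_prog1.o", NULL);

         if (!obj || bpf_object__load(obj))
             return 1;

         struct bpf_program *prog = bpf_object__find_program_by_name(obj, "xdp_prog1");
         int ifindex = if_nametoindex("eth0");   /* placeholder interface */

         if (!prog || !ifindex)
             return 1;

         /* Attach at the native (driver) XDP hook. */
         if (bpf_xdp_attach(ifindex, bpf_program__fd(prog), XDP_FLAGS_DRV_MODE, NULL))
             return 1;

         /* Per-CPU array lookups return one value per possible CPU; sum them. */
         int map_fd = bpf_object__find_map_fd_by_name(obj, "rxcnt");
         int ncpus = libbpf_num_possible_cpus();

         if (map_fd < 0 || ncpus <= 0)
             return 1;

         long values[ncpus];
         __u32 key = IPPROTO_UDP;
         long total = 0;

         if (bpf_map_lookup_elem(map_fd, &key, values) == 0)
             for (int i = 0; i < ncpus; i++)
                 total += values[i];

         printf("UDP packets counted so far: %ld\n", total);
         return 0;
     }

     In practice the same steps can be performed with existing tooling, e.g. something like iproute2's "ip link set dev eth0 xdp obj xdp_prog1.o sec xdp1".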

  9. Performance benchmarks
     - Benchmark against DPDK: establishes baseline performance
     - Simple tests:
       - Packet drop performance
       - CPU usage
       - Packet forwarding performance
     - All tests use 64-byte packets and measure packets per second (PPS)

  10. Packet drop performance
      [Figure: drop rate in Mpps vs. number of cores (1-6) for DPDK, XDP, Linux (raw), and Linux (conntrack).]

  11. CPU usage in drop test
      [Figure: CPU usage (%) vs. offered load (Mpps, 0-25) for DPDK, XDP, and Linux.]

  12. Packet forwarding throughput
      [Figure: forwarding rate in Mpps vs. number of cores (1-6) for DPDK (different NIC), XDP (same NIC), and XDP (different NIC).]

  13. Packet forwarding latency

                    Average             Maximum             < 10 μs
                100 pps   1 Mpps    100 pps   1 Mpps    100 pps   1 Mpps
      XDP        82 μs     7 μs     272 μs    202 μs      0%      98.1%
      DPDK        2 μs     3 μs     161 μs    189 μs     99.5%    99.0%

  14. Application proof-of-concept
      Shows the feasibility of three applications (a simplified sketch of the DDoS-filtering approach follows below):
      - Software router
      - DDoS protection system
      - Layer-4 load balancer
      Not a benchmark against state-of-the-art implementations.
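      The DDoS protection case gives a feel for how such applications are structured. Below is a simplified, hedged sketch of the general approach (not the program evaluated in the paper): userspace installs attacker source addresses into a BPF hash map, and the XDP program drops matching packets before the kernel stack ever sees them. The map definition mirrors the legacy struct bpf_map_def style used on slide 8.

      /* Hedged sketch, not the paper's implementation: drop IPv4 packets whose
       * source address is present in a userspace-managed blacklist map. */
      #include <linux/bpf.h>
      #include <linux/if_ether.h>
      #include <linux/ip.h>
      #include <bpf/bpf_helpers.h>
      #include <bpf/bpf_endian.h>

      struct bpf_map_def SEC("maps") blacklist = {
          .type = BPF_MAP_TYPE_HASH,
          .key_size = sizeof(__u32),     /* IPv4 source address */
          .value_size = sizeof(__u64),   /* per-address drop counter */
          .max_entries = 100000,
      };

      SEC("xdp")
      int xdp_ddos_filter(struct xdp_md *ctx)
      {
          void *data_end = (void *)(long)ctx->data_end;
          void *data = (void *)(long)ctx->data;
          struct ethhdr *eth = data;
          struct iphdr *iph;
          __u64 *drops;
          __u32 saddr;

          if (data + sizeof(*eth) > data_end)
              return XDP_PASS;
          if (eth->h_proto != bpf_htons(ETH_P_IP))
              return XDP_PASS;

          iph = data + sizeof(*eth);
          if ((void *)(iph + 1) > data_end)
              return XDP_PASS;

          saddr = iph->saddr;   /* copy to the stack for the map lookup */
          drops = bpf_map_lookup_elem(&blacklist, &saddr);
          if (drops) {
              __sync_fetch_and_add(drops, 1);
              return XDP_DROP;  /* blacklisted source: drop at the driver */
          }

          return XDP_PASS;
      }

      char _license[] SEC("license") = "GPL";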

  15. Software routing performance
      [Figure: Mpps on a single core (0-5) for XDP (single route), XDP (full table), Linux (single route), and Linux (full table).]

  16. DDoS protection
      [Figure: TCP transaction rate (Ktrans/s) vs. offered DoS traffic (Mpps, 0-25), with and without XDP filtering.]

  17. Load balancer performance
      Throughput (Mpps) per number of CPU cores:

      CPU cores       1      2      3      4      5      6
      XDP (Katran)   5.2   10.1   14.6   19.5   23.4   29.3
      Linux (IPVS)   1.2    2.4    3.7    4.8    6.0    7.3

      Based on the Katran load balancer (open sourced by Facebook).

  18. Summary
      XDP:
      - Integrates programmable packet processing into the kernel
      - Combines speed with flexibility
      - Is supported by the Linux kernel community
      - Is already used in high-profile production use cases
      See https://github.com/tohojo/xdp-paper for details.

  19. Acknowledgements
      XDP has been developed over a number of years by the Linux networking community. Thanks to everyone involved; in particular, to:
      - Alexei Starovoitov for his work on the eBPF VM and verifier
      - Björn Töpel and Magnus Karlsson for their work on AF_XDP
      Also thanks to our anonymous reviewers, and to our shepherd Srinivas Narayana, for their helpful comments on the paper.
