CS244 Advanced Topics in Networking, Spring 2020
Lecture 7: Programmable Forwarding
Nick McKeown
“Forwarding Metamorphosis: Fast Programmable Match-Action Processing in Hardware for SDN” [Pat Bosshart et al., 2013]
Context
Pat Bosshart: at the time at TI (Texas Instruments); architect of the first LISP CPU and a 1 GHz DSP. + Others from TI.
George Varghese: at the time at MSR; today Professor at UCLA. + Others from Stanford.
At the time the paper was written (2012)…
▪ Fastest switch ASICs were fixed function, around 1 Tb/s
▪ Lots of interest in “disaggregated” switches for large data-centers
Switch with fixed function pipeline
[Diagram: Fixed Parser, then a Fixed Header Processing Pipeline: L2 Table (L2 Hdr Actions), IPv4 Table (IP Hdr Actions), IPv6 Table (v6 Hdr Actions), ACL Table (ACL Actions)]
You said
Amalee Wilson: There’s a key phrase in the abstract, “contrary to concerns within the community,” and I’m curious about what those concerns are.
“Programmable switches run 10x slower, consume more power and cost more.” (Conventional wisdom in 2010)
Packet Forwarding Speeds
[Chart: Gb/s per chip, log scale, 1990 to 2020. Switch chips reach 6.4 Tb/s; the switch-chip curve sits roughly 80x above the CPU curve.]
Domain Specific Processors
Computers: Java >> Compiler >> CPU
Graphics: OpenCL >> Compiler >> GPU
Signal Processing: Matlab >> Compiler >> DSP
Machine Learning: TensorFlow >> Compiler >> TPU
Networking: ? >> Compiler >> ? … filled in as P4 >> Compiler >> PISA (aka “RMT”)
Network systems tend to be designed “bottom-up”: the fixed-function switch tells the driver and switch OS, “This is how I process packets…”
What if they could be programmed “top-down”? The switch OS, via the driver, tells the programmable switch, “This is precisely how you must process packets.”
You said
Wantong Jiang: At the end of the paper, the authors mention FPGAs and claim that they are too expensive. This paper was published in 2013 and I wonder if it's still the case nowadays.
Firas Abuzaid: The paper mentions that FPGAs are too expensive to be considered. Now that FPGAs have become more widely available, could they be used instead of RMTs?
The RMT design [2013]
[RMT chip layout: Programmable parsers, Match+Action Pipeline, Packet Buffers, Match+Action Pipeline, Programmable de-parsers]
You said
Will Brand: [W]hat goes into designing the vocabulary of a RISC instruction set? Since I can't just try to prove the instructions are Turing-complete, and the instruction set doesn't have the kind of specification I might expect from a general-purpose language, I find it difficult to "trust" that Table 1 encapsulates a reasonable portion of the actions we might want to make possible…
PISA: Protocol Independent Switch Architecture
Programmable Parser: the programmer declares which headers are recognized.
Match+Action stages (Memory + ALU): the programmer declares what tables are needed and how packets are processed.
All stages are identical. A “compiler target”.
PISA: Protocol Independent Switch Architecture
[Example mapping: the programmable parser recognizes Ethernet, MPLS tags, IPv4, IPv6 and VXLAN; the match+action stages hold the Ethernet MAC address table, the MPLS tag table, the IPv4 and IPv6 address tables, and the ACL rules.]
P4 program example: Parsing Headers
[Parse graph: Ethernet → MyEncap / IPv4 / IPv6 → … → TCP]

header_type ethernet_t {
    fields {
        dstAddr : 48;
        srcAddr : 48;
        etherType : 16;
    }
}

header_type my_encap_t {
    fields {
        foo : 12;
        bar : 8;
        baz : 4;
        qux : 4;
        next_protocol : 4;
    }
}

parser parse_ethernet {
    extract(ethernet);
    // Choose the next parser state based on EtherType.
    return select(latest.etherType) {
        0x8100 : parse_my_encap;
        0x800  : parse_ipv4;
        0x86DD : parse_ipv6;
    }
}
P4 program example

table ipv4_lpm {
    reads {
        ipv4.dstAddr : lpm;    // longest-prefix match on the IPv4 destination address
    }
    actions {
        set_next_hop;
        drop;
    }
}

action set_next_hop(nhop_ipv4_addr, port) {
    modify_field(metadata.nhop_ipv4_addr, nhop_ipv4_addr);
    modify_field(standard_metadata.egress_port, port);
    add_to_field(ipv4.ttl, -1);    // decrement TTL
}

control ingress {
    apply(l2);
    apply(my_encap);
    if (valid(ipv4)) {
        apply(ipv4_lpm);
    } else {
        apply(ipv6_lpm);
    }
    apply(acl);
}
How programmability is used
1. Reducing complexity
Reducing complexity
[switch.p4 → Compiler → Switch OS / Driver → Programmable Switch]

IPv4 and IPv6 routing
- IPv4 and IPv6 Routing & Switching
- Unicast Routing
- Routed Ports & SVI
- VRF
- Unicast RPF: Strict and Loose

Multicast
- PIM-SM/DM & PIM-Bidir

Tunneling
- IP-in-IP (6in4, 4in4)
- VXLAN, NVGRE, GENEVE & GRE
- Segment Routing, ILA

MPLS
- LER and LSR
- IPv4/v6 routing (L3VPN)
- L2 switching (EoMPLS, VPLS)
- MPLS over UDP/GRE

Ethernet switching
- VLAN Flooding
- MAC Learning & Aging
- STP state
- VLAN Translation

ACL
- MAC ACL, IPv4/v6 ACL, RACL
- QoS ACL, System ACL, PBR
- Port Range lookups in ACLs

Security Features
- Storm Control, IP Source Guard

Monitoring & Telemetry
- Ingress Mirroring and Egress Mirroring
- Negative Mirroring
- Sflow
- INT

Counters
- Route Table Entry Counters
- VLAN/Bridge Domain Counters
- Port/Interface Counters

Load balancing
- LAG
- ECMP & WCMP
- Resilient Hashing
- Flowlet Switching

QoS
- QoS Classification & marking
- Drop profiles/WRED
- CoPP (Control plane policing)

Protocol Offload
- BFD, OAM
- RoCE v2 & FCoE

Multi-chip Fabric Support
- Forwarding, QoS

Fast Failover
- LAG & ECMP

NAT and L4 Load Balancing
Reducing complexity
[My switch.p4 → Compiler → Switch OS / Driver → Programmable Switch]
How programmability is used
2. Adding new features
Protocol complexity 20 years ago
[Parse graph: Ethernet, branching on ethtype to IPv4 or IPX]
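As a rough illustration (mine, not from the slides), that entire 20-year-old parse graph fits in one parser state in the P4_14 style used in this deck; the EtherType constants are the standard ones for IPv4 and IPX, and the state names are made up.

parser parse_ethernet {
    extract(ethernet);
    // Two decades ago a switch only had to branch a couple of ways on EtherType.
    return select(latest.etherType) {
        0x0800 : parse_ipv4;    // IPv4
        0x8137 : parse_ipx;     // IPX (hypothetical state, not defined here)
        default : ingress;      // anything else goes straight to match+action
    }
}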
Datacenter switch today: switch.p4 [full parse graph]
Example new features
1. New encapsulations and tunnels
2. New ways to tag packets for special treatment
3. New approaches to routing: e.g. source routing in DCs
4. New approaches to congestion control
5. New ways to process packets: e.g. ticker-symbols
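As one hedged example of item 2 (my sketch, not from the paper or slides): tagging packets for special treatment amounts to declaring a new header plus a table that matches on the tag. Every name, width, and the intrinsic_metadata field below are invented for illustration, in the same P4_14 style as the earlier code.

header_type prio_tag_t {
    fields {
        flags    : 8;
        priority : 8;    // hypothetical priority value carried by the new tag
    }
}
header prio_tag_t prio_tag;

action set_queue(qid) {
    // Assumed metadata field: real targets expose queue selection under target-specific names.
    modify_field(intrinsic_metadata.qid, qid);
}

table prio_tag_table {
    reads {
        prio_tag.priority : exact;
    }
    actions {
        set_queue;
    }
}

Once the parser extracts prio_tag, applying prio_tag_table in the ingress control block steers tagged packets to the desired queue.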
Example new features
1. Layer-4 Load Balancer [1]
   ▪ Replace 100 servers or 10 dedicated boxes with one programmable switch
   ▪ Track and maintain mapping for 5-10 million HTTP flows
2. Fast stateful firewall
   ▪ Add/delete and track 100s of thousands of new connections per second
3. Cache for Key-value store [2]
   ▪ Memcache in-network cache for 100 servers
   ▪ 1-2 billion operations per second

[1] “SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs.” Rui Miao et al. SIGCOMM 2017.
[2] “NetCache: Balancing Key-Value Stores with Fast In-Network Caching.” Xin Jin et al. SOSP 2017.
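A very rough sketch of the core idea behind item 1 (mine, not SilkRoad's actual design): keep a per-connection exact-match table whose action rewrites the virtual IP to the chosen server. Table name, size, and fields are illustrative, again in the P4_14 style; the real system also handles misses, consistency, and memory limits.

table lb_conn_table {
    reads {
        ipv4.srcAddr  : exact;
        ipv4.dstAddr  : exact;
        ipv4.protocol : exact;
        tcp.srcPort   : exact;
        tcp.dstPort   : exact;
    }
    actions {
        set_server;
    }
    size : 1048576;    // millions of concurrent connections can live in on-chip SRAM
}

action set_server(server_ip) {
    // Rewrite the load-balanced (virtual) destination to the selected server.
    modify_field(ipv4.dstAddr, server_ip);
}

A table miss (a new connection) would pick a server, e.g. by hashing over the active pool, and install the mapping so later packets of the flow hit the table.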
How programmability is used
3. Network telemetry
1 “Which path did my packet take?” “I visited Switch 1 @780ns, Switch 9 @1.3µs, Switch 12 @2.4µs”
2 “Which rules did my packet follow?” “In Switch 1, I followed rules 75 and 250. In Switch 9, I followed rules 3 and 80.” [rule-table excerpt: e.g. rule 75, 192.168.0/24, …]
3 “How long did my packet queue at each switch?” “Delay: 100ns, 200ns, 19740ns”
4 “Who did my packet share the queue with?” [Queue occupancy over time reveals the aggressor flow!]
These seem like pretty important questions
1 “Which path did my packet take?”
2 “Which rules did my packet follow?”
3 “How long did it queue at each switch?”
4 “Who did it share the queues with?”
A programmable device can potentially answer all four questions. At line rate.
INT: In-band Network Telemetry
Each switch adds to the original packet: SwitchID, Arrival Time, Queue Delay, Matched Rules, …
Then: Log, Analyze, Visualize, Replay.
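To make the mechanism concrete, here is a minimal hypothetical sketch (not the INT spec and not the paper's code) of a per-hop telemetry insert in the P4_14 style used earlier. The switch-ID parameter and the queueing_metadata.deq_timedelta field are assumptions; real targets expose queue delay under target-specific names.

header_type int_hop_t {
    fields {
        switch_id   : 32;
        queue_delay : 32;    // time the packet spent queued, e.g. in nanoseconds
    }
}
header int_hop_t int_hop;

action add_int_hop(switch_id) {
    add_header(int_hop);                             // push one telemetry header per hop
    modify_field(int_hop.switch_id, switch_id);
    modify_field(int_hop.queue_delay,
                 queueing_metadata.deq_timedelta);   // assumed target-supplied queue delay
}

table int_insert {
    actions { add_int_hop; }
}

Applying int_insert in the egress control block stamps every packet (or only packets carrying an INT instruction) so the last hop or the receiver can log, analyze, visualize, and replay.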
Example using INT [figure; values in nanoseconds]
End.