Packet Transactions: High-Level Programming for Line-Rate Switches
Anirudh Sivaraman, Alvin Cheung, Mihai Budiu, Changhoon Kim, Mohammad Alizadeh, Hari Balakrishnan, George Varghese, Nick McKeown, Steve Licking
Programmability at line rate
• Programmable: Can we express new data-plane algorithms?
  • Active queue management
  • Congestion control
  • Measurement
  • Load balancing
• Line rate: Highest capacity supported by dedicated hardware
Programmable switching chips
Same performance as fixed-function chips, some programmability
E.g., FlexPipe, Xpliant, Tofino
[Figure: switch pipeline. In → Parser (Eth, VLAN, IPv4, IPv6, TCP, New) → Ingress pipeline (match/action stages, Stage 1 to Stage 16) → Queues/Scheduler → Egress pipeline (match/action stages, Stage 1 to Stage 16) → Deparser → Out]
Where do programmable switches fall short?
• Hard to program data-plane algorithms today
  • Hardware good for stateless tasks (forwarding), not stateful ones (AQM)
  • Low-level languages (P4, POF)
• Challenges
  • Can we program data-plane algorithms in a high-level language?
  • Can we design a stateful instruction set supporting these algorithms?
Contributions
• Packet transaction: High-level abstraction for data-plane algorithms
  • Examples of several algorithms as packet transactions
• Atoms: A representation for switch instruction sets
  • Seven concrete stateful instructions
• Compiler from packet transactions to atoms
  • Allows us to iteratively design switch instruction sets
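Packet transactions
• Packet transaction: block of imperative code
• Transaction runs to completion, one packet at a time, serially

    if (count == 9):
        pkt.sample = pkt.src
        count = 0
    else:
        pkt.sample = 0
        count++

Here count is persistent state; pkt.sample and pkt.src are packet fields.
[Figure: packets p1, p2, ..., p10 pass through the transaction in order while count steps through 0, 1, 2, ..., 9 and then resets to 0. p1.sample = 0, p2.sample = 0, ..., p10.sample = 1.2.3.4]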
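To make the serial, run-to-completion semantics concrete, here is a minimal Python sketch of the sampling transaction. The names `SamplerState` and `sample_transaction`, and the use of a dict for a packet, are illustrative assumptions and not part of Domino.

```python
from dataclasses import dataclass


@dataclass
class SamplerState:
    """Persistent switch state (hypothetical container for `count`)."""
    count: int = 0


def sample_transaction(state: SamplerState, pkt: dict) -> dict:
    """Run the sampling transaction to completion on a single packet."""
    if state.count == 9:
        pkt["sample"] = pkt["src"]  # every 10th packet carries its source
        state.count = 0
    else:
        pkt["sample"] = 0
        state.count += 1
    return pkt


state = SamplerState()
packets = [{"src": "1.2.3.4"} for _ in range(20)]
# Packets are processed strictly one at a time, in arrival order.
samples = [sample_transaction(state, p)["sample"] for p in packets]
```

In this sketch the 10th and 20th packets are the only ones whose `sample` field is set to the source address; every other packet gets 0.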
Under the hood …
[Figure: pipeline of match/action stages, Stage 1, Stage 2, ..., Stage 16]
A machine model for line-rate switches
[Figure: a packet header flows through a pipeline of stages, Stage 1, Stage 2, ..., Stage 16; each stage has an action unit and state. Inset: an action unit built from the state X, a constant, an Add unit, a Mul unit, and a 2-to-1 mux with a choice input]
• Typical requirement: 1 pkt / nanosecond
• Atom: smallest unit of atomic packet/state update
• A switch's atoms constitute its instruction set
Stateless vs. stateful operations
Stateless operation: pkt.f4 = pkt.f1 + pkt.f2 - pkt.f3
Split across two pipeline stages:
    pkt.tmp = pkt.f1 + pkt.f2
    pkt.f4 = pkt.tmp - pkt.f3
[Figure: successive packets carrying f1, f2, f3, f4, and tmp move through the two stages; tmp = f1 + f2 is set in the first stage and f4 = tmp - f3 in the second]
Can pipeline stateless operations
Stateless vs. stateful operations
Stateful operation: x = x + 1
Split across pipeline stages:
    pkt.tmp = x
    pkt.tmp++
    x = pkt.tmp
[Figure: two back-to-back packets both read X = 0, both carry tmp = 0 and then tmp = 1, and both write X = 1]
X should be 2, not 1!
Stateless vs. stateful operations
Stateful operation: x = x + 1
[Figure: a single stage holds X and performs X++ on each packet]
Cannot pipeline; need an atomic operation in hardware
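The difference between the pipelined and atomic versions of x = x + 1 can be reproduced with a small cycle-by-cycle simulation. This is only a sketch under stated assumptions: one packet enters per clock cycle, the read, increment, and write each take one stage, and within a cycle later stages are evaluated first. The function names are hypothetical.

```python
def run_pipelined(num_packets: int) -> int:
    """x = x + 1 split over three stages, one packet entering per cycle."""
    x = 0
    tmp = {}  # packet id -> value of pkt.tmp carried between stages
    for cycle in range(num_packets + 2):
        # Visit later stages first so a write lands before this cycle's read.
        for stage in (2, 1, 0):
            pkt = cycle - stage
            if not 0 <= pkt < num_packets:
                continue
            if stage == 0:
                tmp[pkt] = x      # first stage: pkt.tmp = x
            elif stage == 1:
                tmp[pkt] += 1     # second stage: pkt.tmp++
            else:
                x = tmp[pkt]      # third stage: x = pkt.tmp
    return x


def run_atomic(num_packets: int) -> int:
    """x = x + 1 performed as one indivisible update per packet."""
    x = 0
    for _ in range(num_packets):
        x += 1
    return x
```

With two back-to-back packets, `run_pipelined(2)` ends with x = 1 because the second packet reads x before the first has written it back, while `run_atomic(2)` ends with x = 2.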
Stateful atoms can be fairly involved
[Figure: circuit for a stateful atom built from 2-to-1 and 3-to-1 muxes, adders, subtractors, relational operators (RELOP), and constants, operating on the state x and the packet fields pkt_1 and pkt_2]
• Update state in one of four ways based on four predicates.
• Each predicate can itself depend on the state.
Compiling packet transactions
Packet Sampling Algorithm:

    if (count == 9):
        pkt.sample = pkt.src
        count = 0
    else:
        pkt.sample = 0
        count++

Compiler output (Packet Sampling Pipeline):

    Stage 1:
        pkt.old = count;
        pkt.tmp = pkt.old == 9;
        pkt.new = pkt.tmp ? 0 : (pkt.old + 1);
        count = pkt.new;
    Stage 2:
        pkt.sample = pkt.tmp ? pkt.src : 0

[Figure: the compiler maps the two code stages onto Stage 1 and Stage 2 of a switch pipeline with Stage 1, Stage 2, ..., Stage 16]
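One way to gain confidence that the two-stage pipeline preserves the transaction's behavior is to run both on the same packet stream and compare outputs. The sketch below does this in Python; the function names and the dict-based packet are assumptions for illustration, and stage 1 is treated as a single atomic step, as the stateful update requires.

```python
def transaction(count: int, pkt: dict) -> int:
    """Original sampling transaction; returns the updated count."""
    if count == 9:
        pkt["sample"] = pkt["src"]
        return 0
    pkt["sample"] = 0
    return count + 1


def stage1(count: int, pkt: dict) -> int:
    """Stage 1: read, test, and update count atomically."""
    pkt["old"] = count
    pkt["tmp"] = pkt["old"] == 9
    pkt["new"] = 0 if pkt["tmp"] else pkt["old"] + 1
    return pkt["new"]  # count = pkt.new


def stage2(pkt: dict) -> None:
    """Stage 2: stateless, uses only packet fields set in stage 1."""
    pkt["sample"] = pkt["src"] if pkt["tmp"] else 0


def run(num_packets: int):
    """Feed identical packets through both versions and collect samples."""
    ref_count, pipe_count = 0, 0
    ref_samples, pipe_samples = [], []
    for i in range(num_packets):
        ref_pkt = {"src": f"10.0.0.{i}"}
        pipe_pkt = {"src": f"10.0.0.{i}"}
        ref_count = transaction(ref_count, ref_pkt)
        pipe_count = stage1(pipe_count, pipe_pkt)
        stage2(pipe_pkt)
        ref_samples.append(ref_pkt["sample"])
        pipe_samples.append(pipe_pkt["sample"])
    return ref_samples, pipe_samples


ref_samples, pipe_samples = run(30)
```

Agreement on a finite stream is evidence of equivalence for this example, not a proof.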
Designing programmable switches
[Figure: design loop. The algorithm, the pipeline geometry, and the atom are inputs to the compiler]
• Algorithm doesn't compile? Modify pipeline geometry or atom.
• Algorithm compiles? Move on to another algorithm.
Focus on stateful atoms; stateless operations are easily pipelined
Demo
Stateful atoms for programmable switches (listed from least to most expressive)

Atom       Description
R/W        Read or write state
RAW        Read, add, and write back
PRAW       Predicated version of RAW
IfElseRAW  2 RAWs, one each when a predicate is true or false
Sub        IfElseRAW with a stateful subtraction capability
Nested     4-way predication (nests 2 IfElseRAWs)
Pairs      Update a pair of state variables
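The first few atoms in the table can be read as increasingly general state-update functions. The Python sketch below is one possible interpretation, not the hardware definition: it assumes a RAW either adds an operand to the state or overwrites the state with it, and all function and parameter names are hypothetical.

```python
def raw(state: int, operand: int, overwrite: bool = False) -> int:
    """RAW (assumed form): add operand to state, or overwrite state with it."""
    return operand if overwrite else state + operand


def praw(state: int, predicate: bool, operand: int,
         overwrite: bool = False) -> int:
    """PRAW: apply the RAW only when the predicate holds."""
    return raw(state, operand, overwrite) if predicate else state


def if_else_raw(state: int, predicate: bool, then_op: tuple,
                else_op: tuple) -> int:
    """IfElseRAW: one RAW for a true predicate, another for a false one.

    then_op and else_op are (operand, overwrite) pairs.
    """
    operand, overwrite = then_op if predicate else else_op
    return raw(state, operand, overwrite)


# The sampling counter: reset to 0 when count == 9, otherwise add 1.
count = 0
history = []
for _ in range(12):
    count = if_else_raw(count, count == 9, (0, True), (1, False))
    history.append(count)
```

Under these assumptions the counter in `history` climbs from 1 to 9, resets to 0, and starts climbing again, which is the update the sampling transaction needs.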
Expressiveness of packet transactions

Algorithm               LOC
Bloom filter            29
Heavy hitter detection  35
Rate-Control Protocol   23
Flowlet switching       37
Sampled NetFlow         18
HULL                    26
Adaptive Virtual Queue  36
CONGA                   32
CoDel                   57
Compilation results

Algorithm               LOC  Most expressive stateful atom required  Pipeline depth  Pipeline width
Bloom filter            29   R/W                                     4               3
Heavy hitter detection  35   RAW                                     10              9
Rate-Control Protocol   23   PRAW                                    6               2
Flowlet switching       37   PRAW                                    3               3
Sampled NetFlow         18   IfElseRAW                               4               2
HULL                    26   Sub                                     7               1
Adaptive Virtual Queue  36   Nested                                  7               3
CONGA                   32   Pairs                                   4               2
CoDel                   57   Doesn't map                             15              3

~100 atom instances are sufficient
Modest cost for programmability
• All atoms meet timing at 1 GHz in a 32-nm library.
• They occupy modest additional area relative to a switching chip.

Atom       Description                                         Atom area (µm^2)  Area for 100 atoms relative to 200 mm^2 chip
R/W        Read or write state                                 250               0.0125%
RAW        Read, add, and write back                           431               0.022%
PRAW       Predicated version of RAW                           791               0.039%
IfElseRAW  2 RAWs, one each when a predicate is true or false  985               0.049%
Sub        IfElseRAW with a stateful subtraction capability    1522              0.076%
Nested     4-way predication (nests 2 IfElseRAWs)              3597              0.179%
Pairs      Update a pair of state variables                    5997              0.30%

<1% additional area for 100 atom instances
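The relative-area column follows from simple arithmetic: 100 atom instances times the per-atom area, converted from µm^2 to mm^2, divided by the 200 mm^2 chip area. The sketch below recomputes it from the per-atom areas in the table; the variable and function names are illustrative.

```python
# Per-atom areas in square micrometres, copied from the table.
ATOM_AREA_UM2 = {
    "R/W": 250,
    "RAW": 431,
    "PRAW": 791,
    "IfElseRAW": 985,
    "Sub": 1522,
    "Nested": 3597,
    "Pairs": 5997,
}
CHIP_AREA_MM2 = 200
NUM_ATOMS = 100
UM2_PER_MM2 = 1_000_000  # 1 mm^2 = 10^6 um^2


def relative_area_percent(atom_area_um2: float,
                          num_atoms: int = NUM_ATOMS,
                          chip_area_mm2: float = CHIP_AREA_MM2) -> float:
    """Area of num_atoms atom instances as a percentage of the chip area."""
    total_mm2 = atom_area_um2 * num_atoms / UM2_PER_MM2
    return 100.0 * total_mm2 / chip_area_mm2


overheads = {name: relative_area_percent(area)
             for name, area in ATOM_AREA_UM2.items()}
```

For example, the R/W atom gives 250 × 100 / 10^6 = 0.025 mm^2, which is 0.0125% of 200 mm^2; the largest atom, Pairs, comes to roughly 0.30%.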
Conclusion
• Packet transactions: an abstraction for data-plane algorithms
• Atoms: a representation for switch instruction sets
• A blueprint for designing switch instruction sets
• Source code: http://web.mit.edu/domino
Backup slides
Sequential to pipelined code

    pkt.old = count
    pkt.tmp = pkt.old == 9
    pkt.new = pkt.tmp ? 0 : (pkt.old + 1)
    pkt.sample = pkt.tmp ? pkt.src : 0
    count = pkt.new

Steps:
• Create one node for each instruction
• Packet field dependencies
• State dependencies
• Strongly connected components
• Condensed DAG: the four instructions pkt.old = count; pkt.tmp = pkt.old == 9; pkt.new = pkt.tmp ? 0 : (pkt.old + 1); count = pkt.new form one node, followed by pkt.sample = pkt.tmp ? pkt.src : 0
• Code pipelining:

    Stage 1:
        pkt.old = count;
        pkt.tmp = pkt.old == 9;
        pkt.new = pkt.tmp ? 0 : (pkt.old + 1);
        count = pkt.new;
    Stage 2:
        pkt.sample = pkt.tmp ? pkt.src : 0
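The steps from dependency graph to pipeline stages can be sketched in a few lines of Python. This is an illustrative reconstruction, not the Domino compiler: the edge list is my reading of the five instructions (a packet-field edge from each write to each later read, and a pair of edges between the read and the write of count), and components are found by mutual reachability, which is simple but only suitable for small graphs.

```python
INSTRUCTIONS = [
    "pkt.old = count",                         # 0
    "pkt.tmp = pkt.old == 9",                  # 1
    "pkt.new = pkt.tmp ? 0 : (pkt.old + 1)",   # 2
    "pkt.sample = pkt.tmp ? pkt.src : 0",      # 3
    "count = pkt.new",                         # 4
]
# Packet field dependencies, then state dependencies on count (both ways).
EDGES = {(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (0, 4), (4, 0)}


def reachable(start: int, edges: set) -> set:
    """All nodes reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def pipeline_stages(num_nodes: int, edges: set) -> dict:
    """Map stage number -> list of components (sorted lists of node ids)."""
    reach = [reachable(n, edges) for n in range(num_nodes)]
    # Strongly connected component of n: nodes mutually reachable with n.
    comp = [frozenset(m for m in reach[n] if n in reach[m])
            for n in range(num_nodes)]
    # Condensed DAG: edges between distinct components.
    dag = {(comp[a], comp[b]) for a, b in edges if comp[a] != comp[b]}

    def depth(c: frozenset) -> int:
        preds = [a for a, b in dag if b == c]
        return 1 + max((depth(p) for p in preds), default=0)

    stages = {}
    for c in set(comp):
        stages.setdefault(depth(c), []).append(sorted(c))
    return {s: sorted(groups) for s, groups in sorted(stages.items())}


stages = pipeline_stages(len(INSTRUCTIONS), EDGES)
```

For the sampling example this places instructions 0, 1, 2, and 4 together in stage 1 and instruction 3 alone in stage 2, matching the pipelined code on the slide.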
Hardware constraints

    Stage 1:
        pkt.old = count;
        pkt.tmp = pkt.old == 9;
        pkt.new = pkt.tmp ? 0 : (pkt.old + 1);
        count = pkt.new;
    Stage 2:
        pkt.sample = pkt.tmp ? pkt.src : 0

[Figure: the two code stages are mapped onto a switch pipeline with Stage 1, Stage 2, ..., Stage 16]
Hardware constraints: example
[Figure: atom with the state X, the constant 1, an Add unit, a Mul unit, and a 2-to-1 mux whose choice input selects Add]
• x = x + 1 maps to this atom
• x = x * x doesn't map
• Determines if algorithm can/cannot run at line rate
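Whether an update maps to an atom can be framed as a search over the atom's configuration. The sketch below assumes the atom computes either X + constant or X * constant, selected by the mux; it then brute-forces the choice and a small range of constants, checking agreement on a set of test inputs. The function names, the constant range, and the test inputs are all illustrative, and agreement on finite inputs is evidence rather than proof.

```python
from typing import Callable, Optional, Tuple


def atom(x: int, choice: str, constant: int) -> int:
    """Assumed atom template: the mux selects X + constant or X * constant."""
    return x + constant if choice == "add" else x * constant


def find_config(update: Callable[[int], int],
                constants=range(-4, 5),
                test_inputs=range(-8, 9)) -> Optional[Tuple[str, int]]:
    """Return a (choice, constant) that matches update on all test inputs."""
    for choice in ("add", "mul"):
        for constant in constants:
            if all(atom(x, choice, constant) == update(x)
                   for x in test_inputs):
                return choice, constant
    return None  # no configuration of this atom implements the update


incr_config = find_config(lambda x: x + 1)
square_config = find_config(lambda x: x * x)
```

Here x = x + 1 is matched by the configuration ("add", 1), while x = x * x finds no configuration because the multiplier's second operand is a constant rather than the state itself.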
Our work
Packet transaction in Domino:

    For each packet
        Calculate average queue size
        if min < avg < max
            calculate probability p
            mark packet with probability p
        else if avg > max
            mark packet

[Figure: the compiler maps the transaction onto a pipeline of match/action stages, Stage 1, Stage 2, ..., Stage 16]
Program in imperative DSL, compile to run at line-rate
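The pseudocode above leaves the averaging and the probability unspecified. A minimal Python sketch, assuming an exponentially weighted moving average and a marking probability that grows linearly between the two thresholds (as in classic RED), might look like this. The class name, parameter names, and default values are assumptions for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class RedState:
    """Persistent state and parameters (names and defaults are illustrative)."""
    min_th: float = 5.0     # "min" threshold from the pseudocode
    max_th: float = 15.0    # "max" threshold from the pseudocode
    max_p: float = 0.1      # assumed ceiling on the marking probability
    weight: float = 0.002   # assumed EWMA weight
    avg: float = 0.0        # running average queue size


def red_mark(state: RedState, queue_len: float, rng: random.Random) -> bool:
    """Process one packet; return True if the packet should be marked."""
    # Calculate average queue size (EWMA is an assumption).
    state.avg = (1 - state.weight) * state.avg + state.weight * queue_len
    if state.min_th < state.avg < state.max_th:
        # Probability rises linearly from 0 to max_p across the band.
        p = state.max_p * (state.avg - state.min_th) / (state.max_th - state.min_th)
        return rng.random() < p
    if state.avg > state.max_th:
        return True
    return False
```

A caller would keep one `RedState` per queue and invoke `red_mark` once per arriving packet, passing a seeded `random.Random` when reproducible behavior is wanted.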