Generic External Memory for Switch Data Planes



  1. Generic External Memory for Switch Data Planes
     Daehyeok Kim, Yibo Zhu, Changhoon Kim, Jeongkeun Lee, Srinivasan Seshan

  2. Enabling Virtual Switching on ToR Switch
     • Customers' bare-metal servers: cannot install virtual switches on the servers
     • Move virtual switch to ToR switch
     • Multi-million entries ≫ SRAM size!
     [Figure: virtual switch table mapping (Tenant, VM IP) -> Host IP]
       (1, 20.0.0.1) -> 10.0.0.1
       (1, 20.0.0.2) -> 10.0.1.1
       (1, 20.0.0.3) -> 10.0.2.1
       …
     Limited SRAM space is a bottleneck for memory-intensive applications!

  3. Current Trend: Moving Functionality to Switches
     All these applications can benefit from large memory space!

  4. Programmable Switch Chips Need More Memory
     • Programmable data plane technology
     • E.g., Protocol-Independent Switch Architecture (PISA) + P4
     • Flexible but only with on-chip SRAM cache
     Programmable switch chip + DRAM w/ SRAM cache = Lots of innovative applications!

  5. Status quo
     • Fixed-function switch chips built with fixed-function external memory
     • These aren't very useful
       • Inflexible: Usage fixed at design time
       • Fixed and small scale: Memory size and bandwidth fixed at design time
       • Expensive: Chips get larger and more complex
     Is programmable switch chip + general-purpose memory possible?

  6. GEM: Generic External Memory for Programmable Data Planes
     • Programmable switch chip + general-purpose DRAM pool
     • Re-use commodity hardware
     • Flexible memory access BW and size

  7. Key Components
     • C1: Remote memory access channel
     • C2: Packet management during remote memory access
     • C3: Remote data structures and APIs
     [Figure: a packet with Dst: 20.0.0.2:80 misses in the on-chip Match/Action table, which holds 20.0.0.1:80 -> 10.0.0.1:20, …; the remote table holds 20.0.0.1:80 -> 10.0.0.1:20, 20.0.0.2:80 -> 10.0.1.1:20, 20.0.0.3:80 -> 10.0.2.1:20, …; the packet leaves with Dst: 10.0.1.1:20]

  8. C1: Remote Memory Access Channel
     • Goal: Enable programmable switch chip to directly access memory
       • Access DRAM only: No impact on the server's existing compute and networking workloads
       • Minimal latency between the chip and memory
     Leverage RDMA!
     • Challenge: How to generate RDMA requests from the data plane?
       • Programmable switch chip cannot generate arbitrary new packets
     *RDMA: Remote Direct Memory Access

  9. Accessing Remote Memory from Data Plane via RDMA
     • Switch keeps a context per DRAM server, e.g., Server #1: QP#, SEQ#, ACK#, …
     • Generating RDMA request (ETH Header + RDMA Header):
       1. Clone and truncate a packet
       2. Add RDMA headers (READ)
     • DRAM server replies with READ Resp. (entry)
     • Implementable in P4
     [Figure: a packet with Dst: 20.0.0.2:80 misses in the on-chip Match/Action table; the switch sends READ to the DRAM server and receives READ Resp. carrying the entry 20.0.0.2:80 -> 10.0.1.1:20]
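
The two steps on slide 9 can be sketched in Python as a rough illustration. The slide only says "ETH Header" plus "RDMA Header", so several things here are assumptions: the RDMA header is taken to be an InfiniBand Base Transport Header (BTH) followed by an RDMA Extended Transport Header (RETH) for a reliable-connection READ request, the per-server context is a plain dict with hypothetical keys `qp`, `psn`, and `rkey`, and any encapsulation between the Ethernet header and the BTH as well as the trailing checksum are left out. The real data plane would do this in P4; Python is used here only to show the byte layout.

```python
import struct

# Opcode for "RDMA READ Request" on the reliable-connection (RC) transport.
OPCODE_RC_RDMA_READ_REQUEST = 0x0C
ETH_HEADER_LEN = 14  # destination MAC + source MAC + EtherType


def clone_and_truncate(packet: bytes, keep: int = ETH_HEADER_LEN) -> bytes:
    """Step 1: copy the packet and keep only its first `keep` bytes."""
    return bytes(packet[:keep])


def build_read_request(qp_num, psn, remote_addr, rkey, length) -> bytes:
    """Step 2: build BTH (12 bytes) + RETH (16 bytes) for an RDMA READ."""
    bth = struct.pack(
        "!BBHII",
        OPCODE_RC_RDMA_READ_REQUEST,
        0x00,                    # SE, M, PadCnt, TVer all zero
        0xFFFF,                  # default partition key
        qp_num & 0x00FFFFFF,     # reserved byte + 24-bit destination QP
        psn & 0x00FFFFFF,        # AckReq/reserved byte + 24-bit sequence number
    )
    reth = struct.pack("!QII", remote_addr, rkey, length)
    return bth + reth


def generate_read(packet: bytes, ctx: dict, remote_addr: int, length: int) -> bytes:
    """Turn a data packet that missed into a READ request (sketch only)."""
    eth = clone_and_truncate(packet)
    rdma = build_read_request(ctx["qp"], ctx["psn"], remote_addr, ctx["rkey"], length)
    # Assumes the READ response fits in one packet, so the PSN advances by 1.
    ctx["psn"] = (ctx["psn"] + 1) & 0x00FFFFFF
    return eth + rdma
```

For example, `generate_read(pkt, {"qp": 0x12, "psn": 7, "rkey": 0x1234}, 0xB, 16)` would yield a 42-byte request that reads 16 bytes at remote address 0xB, matching the "READ (entry) @ 0xB" label used later in the deck.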

  10. Key Components (repeat of slide 7)
     • C1: Remote memory access channel
     • C2: Packet management during remote memory access
     • C3: Remote data structures and APIs

  11. C2: Packet Management during Remote Memory Access
     • Packet buffer in on-chip SRAM while READ (entry) / READ Resp. is in flight
     • Consuming too much SRAM space! ☹
     [Figure: a packet with Dst: 20.0.0.2:80 misses in the on-chip Match/Action table and waits in the on-chip packet buffer until READ Resp. returns the remote entry]

  12. Depositing Packets on Remote Buffer
     • On a miss, the switch sends WRITE (pkt) and READ (entry) to remote memory
     • READ Resp. comes back from remote memory
     [Figure: a packet with Dst: 20.0.0.2:80 misses in the on-chip Match/Action table; labels on the path to and from the remote table: WRITE (pkt), READ (entry), Packet, READ Resp.]

  13. Key Components (repeat of slide 7)
     • C1: Remote memory access channel
     • C2: Packet management during remote memory access
     • C3: Remote data structures and APIs

  14. C3: Remote Data Structures and APIs
     • How to locate remote entry?
     • On-chip index table (Match -> Mem addr): Consuming too much SRAM space! ☹
       20.0.0.1:80 -> 0xA
       20.0.0.2:80 -> 0xB
       20.0.0.3:80 -> 0xC
       …
     • Remote table (Mem addr: Match -> Action):
       0xA: 20.0.0.1:80 -> 10.0.0.2:20
       0xB: 20.0.0.2:80 -> 10.0.1.1:20
       0xC: 20.0.0.3:80 -> 10.0.2.1:20
       …
     [Figure: a packet with Dst: 20.0.0.2:80 misses; the switch sends WRITE (pkt) @ 0xB and READ (entry) @ 0xB; Packet and READ Resp. come back]

  15. General Data Structures and APIs?
     • Ongoing work: designing general data structures for remote memory
     • Proof-of-concept use cases for specific applications
       • Lookup table extension for extending virtual switch table
       • Packet buffer extension for mitigating packet drops due to incast
       • State store extension for network telemetry

  16. Use Case: Extending Lookup Table
     • Hot entries on SRAM: 20.0.0.1:80 -> 10.0.0.1:20, …
     • Entire table entries on remote table: 20.0.0.1:80 -> 10.0.0.1:20, 20.0.0.2:80 -> 10.0.1.1:20, 20.0.0.3:80 -> 10.0.2.1:20, …
     • Fetch entries from remote tables ☺
     [Figure: ToR switch serving customers' bare-metal servers, with the remote table behind it]
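
The split between hot entries on SRAM and the entire table in remote memory behaves like a cache in front of a backing store. The sketch below is a toy model of that idea, not the deck's design: the class name `CachedLookupTable` is hypothetical, a Python dict stands in for the DRAM-resident table, and least-recently-used eviction is an assumed policy (the summary slide lists efficient SRAM caching as an open question).

```python
from collections import OrderedDict


class CachedLookupTable:
    """Small on-chip table backed by a full remote table (toy model)."""

    def __init__(self, remote_table: dict, sram_capacity: int):
        self.remote = remote_table      # stands in for the table in remote DRAM
        self.capacity = sram_capacity   # number of entries that fit on chip
        self.sram = OrderedDict()       # hot entries, least recently used first
        self.remote_reads = 0           # counts simulated remote READs

    def lookup(self, key):
        if key in self.sram:
            # Hit: refresh recency and answer from on-chip memory.
            self.sram.move_to_end(key)
            return self.sram[key]
        # Miss: fetch the entry from the remote table.
        self.remote_reads += 1
        action = self.remote.get(key)
        if action is None:
            return None
        self.sram[key] = action
        if len(self.sram) > self.capacity:
            # Evict the least recently used entry (assumed policy).
            self.sram.popitem(last=False)
        return action
```

With the three entries from the slide as the remote table and `sram_capacity=2`, a second lookup of `"20.0.0.1:80"` is served from SRAM without another remote read, while looking up all three keys in turn evicts the oldest one.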

  17. Other Use Cases
     • Packet buffer extension for mitigating packet drops
       • Problem: Queue is full… Dropping packets ☹
       • With remote buffer: Reduce packet drops ☺
     • State store extension for network telemetry
       • Problem: Can't maintain many stateful objects ☹
       • With remote state stores: Update the remote stores ☺
     [Figure: switch connected to servers hosting the remote buffer and the remote state stores]
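
The packet buffer extension can be pictured as a bounded on-chip queue that spills to a remote buffer instead of dropping. The following is a minimal sketch under stated assumptions: `ExtendedQueue` is a hypothetical name, a second in-memory deque stands in for the remote buffer that the switch would reach with remote WRITE and READ operations, and the rule that arrivals keep going to the remote buffer while it is non-empty is a choice made here to preserve FIFO order, not something the slide specifies.

```python
from collections import deque


class ExtendedQueue:
    """Bounded local queue that overflows into a remote buffer (toy model)."""

    def __init__(self, local_capacity: int):
        if local_capacity < 1:
            raise ValueError("local_capacity must be at least 1")
        self.local_capacity = local_capacity
        self.local = deque()    # stands in for the on-chip queue
        self.remote = deque()   # stands in for the remote DRAM buffer

    def enqueue(self, pkt) -> None:
        # While older packets sit in the remote buffer, newer ones must follow
        # them there; otherwise they would overtake and break FIFO order.
        if self.remote or len(self.local) >= self.local_capacity:
            self.remote.append(pkt)     # would be a remote store of the packet
        else:
            self.local.append(pkt)

    def dequeue(self):
        if not self.local:
            return None
        pkt = self.local.popleft()
        if self.remote:
            # Refill the freed slot from the remote buffer (a remote load).
            self.local.append(self.remote.popleft())
        return pkt
```

With `local_capacity=2`, enqueueing packets 1 to 4 keeps 1 and 2 on chip and stores 3 and 4 remotely; dequeueing then returns 1, 2, 3, 4 in order with no drops.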

  18. Experiment Setup
     • Two servers run NPTcp through the switch
     • Packet in: ETH / IP (DSCP=0x00) / TCP / Payload
     • Action: DSCP -> 0x28
     • Packet out: ETH / IP (DSCP=0x28) / TCP / Payload
     *Baseline: Simple L2 switch

  19. Results
     • End-to-end latency: 1 - 2 μs additional latency
     • Packet store / load throughput: close to the line rate (≈ 37.5 Gbps)

  20. Summary
     Vision: Generic External Memory for Programmable Data Plane
     • Q1: Efficient caching on SRAM
     • Q2: Dynamically scaling DRAM pool
     • Q3: Handling server failures
     GEM will be a key enabler for innovations in networking and computational networking!
