Key Properties of Programmable Data Plane Targets
Dominik Scholz, Henning Stubbe, Sebastian Gallenmüller, Georg Carle
Chair of Network Architectures and Services
Department of Informatics
Technical University of Munich
Motivation
Move to the Data Plane

From SDN with OpenFlow to (fully) programmable data planes (e.g. P4, POF, eBPF)

Lots of new P4 applications that run in the data plane
• in-band network telemetry
• in-network computation
• protocol acceleration (e.g. congestion control)
• middleboxes (DDoS mitigation)
• ...
P4 is of high interest to industry, e.g. avionics
• rapid prototyping
• program verification
• ...
• e.g. used for 20+ years with the same hardware

Image from https://bit.ly/2LHVmDZ
Lots of new target platforms
• CPU
• Network Processing Unit (NPU)
• FPGA
• ASIC

Lots of key performance indicators
• throughput & packet rate
• latency & jitter
• resources
• price
• ...

→ Need to understand properties of devices and P4 programs
→ Focus on certain aspects for modeling
Outline
• P4 Programmable Network Devices
• Methodology
• CPU Performance Model
• ASIC Resource Model
• Conclusion
P4 Programmable Network Devices
What is P4?

Programmable data planes
• custom network device behavior
• blocks: parser, match-action, deparser
• (ideally) target independent

Centerpiece: match-action tables
• match a key to an action
• key: packet data or metadata
• match types: exact, ternary, LPM (longest prefix match)

Image from https://bit.ly/3mDpaE9
P4 Programmable Network Devices
Comparison of Available Targets

              CPU            NPU                 FPGA           ASIC
Throughput    +              ++                  +++            ++++
Latency       > 10 µs        5 µs to 10 µs       < 2 µs         < 2 µs
Jitter        −−−−           −−−                 −−             −
Resources     ++++           +++                 ++             +
Flexibility   ++++           +++                 ++             +
Example       t4p4s (DPDK)   NFP-4000 SmartNIC   NetFPGA SUME   Intel Tofino

Table 1: Categorizations are estimates for available products, based on own measurements and related work.

In this work we focus on the extremes: CPU and ASIC
Methodology
Performance Analysis of P4 Programs

Dang et al. [1] divide a P4 program into components
• parser
• processing
• packet modification
• actions
• ...

Idea: evaluate components (e.g. match-action tables) in isolation [1]

[1] Dang et al. "Whippersnapper: A P4 language benchmark suite." Proceedings of the Symposium on SDN Research. 2017.
Methodology
Model P4 Programs

Model P4 components individually
→ Combine component models to model the complete system

Match-action table properties
• match type (exact, ternary, LPM)
• entry size (key, action, action data)
• number of entries
• number of (independent) tables
CPU Performance Model
t4p4s: a DPDK-Based Software P4 Target

[Figure: software stacks of the two measured configurations, "No P4" and "Baseline"; labels: P4 pipeline, t4p4s core/runtime, DPDK, NIC]

t4p4s
• P4 compiler
• generates hardware-independent C code
• hardware-dependent library for e.g. DPDK

Device-under-Test hardware
• Intel Xeon CPU E5-2640 v2 (2.0 GHz)
• Intel X540-AT2 NIC (dual port, 10 Gbit/s)
• Turbo Boost and Hyper-Threading disabled (jitter)
CPU Performance Model
Baseline: Maximum Packet Rate with 64 B Packets

[Figure: packet rate [Mpps] over core frequency [GHz] (1.2 to 3.0 GHz) for "No P4" and "Baseline", each measured and extrapolated, plus the baseline model $\hat{P}$; annotations: "Bound by 10 GbE line-rate", "Multi-core scaling"]

• 6 Mpps reduction for the baseline P4 program
• bottleneck: CPU

Derive model for packet rate $\hat{P}$
• using linear regression for the baseline

Derive model for CPU cycle usage $\hat{C}$
$$\hat{C} = \frac{C}{\hat{P}}, \qquad C = \text{CPU frequency}$$
$$\hat{C}_{base} = 146$$
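The relation between CPU frequency, cycle usage, and packet rate on this slide can be turned into a small calculation. The following is a minimal sketch, assuming that the value 146 denotes CPU cycles per packet and that the packet rate is additionally capped by the 10 GbE line rate for 64 B frames (14.88 Mpps, including 20 B of preamble and inter-frame gap). The function name `packet_rate_mpps` is hypothetical and not part of t4p4s.

```python
# 10 GbE line rate for 64 B frames: each frame occupies 64 B + 20 B
# (preamble and inter-frame gap) on the wire.
LINE_RATE_MPPS = 10e9 / ((64 + 20) * 8) / 1e6  # about 14.88 Mpps


def packet_rate_mpps(freq_ghz: float, cycles_per_packet: float = 146.0) -> float:
    """Estimate the packet rate in Mpps for one core.

    Assumption: packet rate = CPU frequency / cycles per packet,
    bounded from above by the line rate.
    """
    cpu_bound_mpps = freq_ghz * 1e3 / cycles_per_packet
    return min(cpu_bound_mpps, LINE_RATE_MPPS)


if __name__ == "__main__":
    for freq in (1.2, 2.0, 3.0):
        print(f"{freq:.1f} GHz -> {packet_rate_mpps(freq):.2f} Mpps")
```

Under these assumptions a 2.0 GHz core is CPU-bound at roughly 13.7 Mpps, while a 3.0 GHz core would already hit the line-rate limit.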
CPU Performance Model
Number of Table Entries: Exact Match

[Figure: packet rate [Mpps] and L3 misses over table entries (log scale, 10^0 to 10^7) for 1 core and 2 cores, with the estimated L3 cache limit marked]

Observations
• doubling the cores doubles the performance
• 2 different "phases"
• bottleneck: L3 cache

Model resources $\hat{R}_{exact}$ based on L3 cache size
$$\hat{R}_{exact}(n, k, a) = \underbrace{2 \cdot 64\,\mathrm{B}}_{\text{Hash table}} + \underbrace{(k \cdot n) + (8\,\mathrm{B} \cdot n)}_{\text{Entries}} + \underbrace{(a \cdot n)}_{\text{Actions}} = 128\,\mathrm{B} + n \cdot \underbrace{(k + a + 8\,\mathrm{B})}_{\text{Table entry size}}$$

n: number of entries
k: key size (4x4 B)
a: action size (64 B)
$R_{L3}$: 20 MB L3 cache

Set $\hat{R}_{exact} = R_{L3}$, solve for n
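Solving the resource model for n can be sketched in a few lines. This is a minimal sketch using the values from the slide (key size 16 B, action size 64 B, 8 B per entry, 128 B for the hash table); it assumes that "20 MB" means 20 · 2^20 bytes, and the function names are hypothetical.

```python
def exact_table_bytes(n: int, key_bytes: int = 16, action_bytes: int = 64) -> int:
    """Memory estimate for an exact-match table: 128 B + n * (k + a + 8 B)."""
    return 2 * 64 + n * (key_bytes + action_bytes + 8)


def max_entries_in_cache(cache_bytes: int, key_bytes: int = 16,
                         action_bytes: int = 64) -> int:
    """Largest n for which the table estimate still fits into cache_bytes."""
    return (cache_bytes - 2 * 64) // (key_bytes + action_bytes + 8)


if __name__ == "__main__":
    l3_bytes = 20 * 2**20  # assumption: 20 MB interpreted as 20 MiB
    print(max_entries_in_cache(l3_bytes))  # 238311
```

With these numbers the estimated L3 limit lies at about 2.4 · 10^5 entries; interpreting 20 MB as 20 · 10^6 bytes shifts it only slightly.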
CPU Performance Model
Number of Table Entries: Exact Match Model

[Figure: packet rate [Mpps] and cycles per packet over table entries (log scale, 10^0 to 10^7) for 1 core and 2 cores, with the models $\hat{P}_{exact}(n, 1)$, $\hat{P}_{exact}(n, 2)$, $\hat{C}_{exact}(n, 1)$, $\hat{C}_{exact}(n, 2)$]

Derive model for packet rate $\hat{P}_{exact}$
• linear regression for 1 core
• scale for multiple cores

Derive model for CPU cycles $\hat{C}_{exact}$
$$\hat{C}_{exact}(n, c) = \frac{1}{c} \cdot \begin{cases} p \cdot \ln(q \cdot n) + r, & R(n) < R_{L3} \\ \frac{s}{t \cdot n + u} + v, & \text{otherwise} \end{cases}$$
with parameters {p, q, r, s, t, u, v}
CPU Performance Model
Number of Table Entries: Ternary & LPM Match Model

[Figure: two panels, "LPM Match" and "Ternary Match", each showing packet rate [Mpps] and cycles per packet over table entries (log scale, 10^0 to 10^5)]

LPM match
• CPU cycles: logarithmic increase (log scale!)
• DIR-24-8 data structure for IPv4
• theoretical search complexity: O(1)
• bottleneck: shared L3 cache size
• part of the data structure already requires 64 MB

Ternary match
• CPU cycles: exponential increase
• ternary match is difficult to implement in software
• currently: loop over all elements
• in hardware: ternary content-addressable memory (TCAM)
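The "loop over all elements" approach to ternary matching in software can be illustrated as follows. This is a minimal sketch and not the t4p4s implementation: the entry layout (value, mask, action), the first-match-wins priority, and all names are assumptions made for illustration. It shows why the lookup cost grows with the number of entries, in contrast to a TCAM that compares all entries in parallel.

```python
from typing import List, NamedTuple, Optional


class TernaryEntry(NamedTuple):
    value: int   # bit pattern to compare against
    mask: int    # 1-bits are compared, 0-bits are "don't care"
    action: str  # hypothetical action name


def ternary_lookup(table: List[TernaryEntry], key: int) -> Optional[str]:
    """Linear scan over all entries; the first matching entry wins."""
    for entry in table:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.action
    return None  # no entry matched


if __name__ == "__main__":
    table = [
        TernaryEntry(0x0A000000, 0xFF000000, "forward"),  # matches 10.0.0.0/8
        TernaryEntry(0x00000000, 0x00000000, "drop"),     # wildcard, matches all
    ]
    print(ternary_lookup(table, 0x0A010203))  # forward
    print(ternary_lookup(table, 0xC0A80001))  # drop
```

In the worst case every lookup touches all n entries, so the cost per packet is O(n).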
ASIC Resource Model
Intel Tofino ASIC

P4 programmable switch ASIC
• 64 × 100 Gbit/s ports
  → guarantees switching 6.4 Tbit/s for any program
• latency well below 1 µs
• stable latency: no jitter or long tail

Focus on resource consumption
• SRAM & TCAM resources are limited
• need to fit the program on the chip
• model to indicate whether a program will fit
ASIC Resource Model
Table Resources

[Figure: SRAM usage [%] over table entries (0 to 1 · 10^5) for exact match key widths of 9 b, 25 b, 41 b, 73 b, 105 b, 153 b, and 201 b]

Resources R for an individual table (e.g. exact match)
$$R(n, k, a) = n \cdot (R_{width}(k) + a)$$
n: number of entries
k: key size
a: action data

Resources $R_{width}$ for key width
$$R_{width}(k) = p \cdot k + q$$
with parameters p, q

Determine p, q: interpolate gradients

[Figure: gradient (· 10^-3) over key width [bit] (0 to 200) with interpolations for exact match and for ternary/LPM match]
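The two steps of this model, fitting p and q from measured gradients and then evaluating R(n, k, a), can be sketched as follows. This is a minimal sketch under stated assumptions: the slide does not give values or units for p, q, and a, so the example data is synthetic and purely illustrative, and an ordinary least-squares line stands in for the "interpolation" mentioned on the slide. All function names are hypothetical.

```python
from typing import List, Tuple


def fit_line(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of y = p * x + q through (x, y) points."""
    count = len(points)
    mean_x = sum(x for x, _ in points) / count
    mean_y = sum(y for _, y in points) / count
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    p = sxy / sxx
    q = mean_y - p * mean_x
    return p, q


def table_resources(n: int, k: float, a: float, p: float, q: float) -> float:
    """R(n, k, a) = n * (R_width(k) + a) with R_width(k) = p * k + q."""
    return n * ((p * k + q) + a)


if __name__ == "__main__":
    # Synthetic (key width, gradient) pairs, NOT measurements from the slides.
    gradients = [(9, 19.0), (25, 51.0), (41, 83.0)]
    p, q = fit_line(gradients)
    print(p, q)
    print(table_resources(n=1000, k=73, a=0.0, p=p, q=q))
```

Whether a whole program fits would additionally require combining the estimates of all its tables and comparing the result against the available SRAM or TCAM, which this sketch leaves out.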
Conclusion

Increase of P4 programmable data planes
• more applications
• more platforms
• more metrics
→ need for models
→ focus on certain aspects
→ Model isolated P4 components
→ Model for P4 centerpiece: match-action tables

CPU: performance model
• high-performance DPDK-based switch
• linear scaling with CPU cores
• typical DPDK latency histogram
• platform-dependent influences

ASIC: resource model
• line rate guaranteed
• no long-tail latency
• number of table entries limits program complexity
• simplified model

Future work: compare with other modeling approaches, e.g. network calculus