  1. Chair of Network Architectures and Services, Department of Informatics, Technical University of Munich. Performance Evaluation of Software Dataplanes. Final talk for the Master’s Thesis by Maximilian Endraß, advised by Dominik Scholz, Henning Stubbe and Sebastian Gallenmüller. Wednesday, 27th November 2019.

  2. Table of contents
     • Background
     • T4P4S contributions
     • Measurements and models
     • Conclusion
     • Bibliography
     M. Endraß, Performance Evaluation of Software Dataplanes (T4P4S)

  3. T4P4S architecture. [Figure: code.p4 is translated by compiler.py into C code (Core), which is built with gcc. The core calls into the switch runtime (NetHAL), which is linked against DPDK; together they form the T4P4S switch.]

  4. T4P4S pipeline. [Figure: handle_packet() passes each packet through NIC → Parser → Verify checksum → Ingress/Egress → Compute checksum → Deparser → NIC. The pipeline stages belong to the core, NIC access to the DPDK runtime; both make up the T4P4S switch.]
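A minimal C sketch of the stage order shown in the pipeline figure, assuming a simplified packet struct. The names `struct packet`, `run_stage` and `handle_packet_sketch` are hypothetical; the real T4P4S `handle_packet()` works on DPDK packet buffers and compiler-generated stage code, so this only illustrates the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet descriptor; T4P4S itself works on DPDK packet buffers. */
struct packet {
    uint8_t *data;
    size_t   len;
    bool     drop;      /* a stage may set this to discard the packet */
    int      trace[6];  /* records the order of executed stages */
    int      n_trace;
};

enum stage_id { PARSE, VERIFY_CSUM, INGRESS, EGRESS, COMPUTE_CSUM, DEPARSE };

/* Placeholder stage: a real stage would parse headers, apply tables, etc.
 * Here it only records that it ran. */
static void run_stage(struct packet *p, enum stage_id s)
{
    assert(p->n_trace < 6);
    p->trace[p->n_trace++] = (int)s;
}

/* Runs the stages in the order of the pipeline figure.
 * Returns true if the packet should be handed to the NIC for transmission. */
bool handle_packet_sketch(struct packet *p)
{
    static const enum stage_id order[] = {
        PARSE, VERIFY_CSUM, INGRESS, EGRESS, COMPUTE_CSUM, DEPARSE
    };
    for (size_t i = 0; i < sizeof order / sizeof order[0]; i++) {
        run_stage(p, order[i]);
        if (p->drop)
            return false;  /* dropped packets skip all remaining stages */
    }
    return true;
}
```

Ingress and egress are modelled as two consecutive stages here; in the figure they share one box.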

  5. T4P4S contributions: exact table. [Figure: packets received (Mpps, 0 to 16) with 1 to 8 T4P4S threads, comparing "Shared pointer" and "No shared pointer".]

  6. Exact tables: mean throughput. [Figure: mean throughput for an exact table with 4 × 4 Byte fields; packets received (Mpps, 0 to 16) over table entries (10^0 to 10^7, log scale) for 1 to 8 T4P4S threads.]

  7. Exact tables: caching effects. [Figure: L3 cache miss count (×10^8, 0 to 1.75) and single-lcore throughput (Mpps, 0 to 16) for exact matches on 4 × 4 Byte keys over table entries (10^0 to 10^7, log scale); a marker shows the approximate table size that fills the L3 cache.]
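To relate the caching plot to table size, a back-of-the-envelope estimate of the L3-filling entry count is the L3 size divided by the memory footprint per entry. The L3 size and the per-entry footprint used below are assumptions, not values from the slides, and `l3_filling_entries` is a hypothetical helper. In practice packet buffers and other data compete for the cache, so misses can set in earlier than this estimate.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rough estimate of how many exact-match entries fit into the L3 cache.
 * All inputs are assumptions: the slides state neither the L3 size nor the
 * per-entry footprint of the hash table used for exact matching.
 */
size_t l3_filling_entries(size_t l3_bytes, size_t key_bytes,
                          size_t value_bytes, size_t overhead_bytes)
{
    size_t entry = key_bytes + value_bytes + overhead_bytes;
    assert(entry > 0);
    return l3_bytes / entry;  /* integer division: whole entries only */
}
```

For example, an assumed 16 MiB L3 with 32 bytes per entry (16 Byte key, 8 bytes of action data, 8 bytes of bucket overhead) gives `l3_filling_entries(16u << 20, 16, 8, 8) == 524288`, i.e. roughly 5 × 10^5 entries.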

  8. Exact tables: model

     $$
     \widehat{T}_{\mathrm{exact}}(n, c) =
     \begin{cases}
       c \cdot \left( \dfrac{1932.81057}{1.44044 \cdot n + 1608.13472} + 4.97731 \right), & \text{for } n \lesssim \text{L3 cache filling}, \\
       c \cdot \left( \dfrac{19575541.84198}{55.85253 \cdot n + 2.75052} + 2.67665 \right), & \text{otherwise},
     \end{cases}
     \tag{1}
     $$

     where n is the number of table entries and c the number of T4P4S cores (lcores); the result is the modelled throughput in Mpps.
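Model (1) can be evaluated directly. The C sketch below reads each case as c times a per-core term. The parameter `n_l3` (the entry count at which the table fills the L3 cache) is an assumed input because the slides give no number for it, and the function name `t_exact` is hypothetical. Note that the model itself is not capped at line rate.

```c
#include <assert.h>

/*
 * Throughput model (1) for exact tables, in Mpps.
 * n:    number of table entries
 * c:    number of cores (lcores)
 * n_l3: assumed entry count at which the table roughly fills the L3 cache
 */
double t_exact(double n, int c, double n_l3)
{
    assert(n >= 0.0 && c >= 1);
    double per_core;
    if (n < n_l3)
        per_core = 1932.81057 / (1.44044 * n + 1608.13472) + 4.97731;
    else
        per_core = 19575541.84198 / (55.85253 * n + 2.75052) + 2.67665;
    return c * per_core;  /* the model scales linearly with the core count */
}
```

With an assumed `n_l3 = 5e5`, `t_exact(1, 1, 5e5)` evaluates to about 6.18 Mpps and `t_exact(1e7, 1, 5e5)` to about 2.71 Mpps.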

  9. Exact tables: throughput and model. [Figure: mean throughput for an exact table with 4 × 4 Byte fields; packets received (Mpps, 0 to 16) over table entries (10^0 to 10^7, log scale) for 1 to 8 T4P4S threads, overlaid with the throughput model (1) for c = 1, 2 and 3.]

  10. Exact tables: latencies. [Figure: latency boxplot for an exact table with 4 × 4 Byte entries; latency (µs, 0 to 50) per table size, grouped by 1 to 8 lcores, with the median latency and the baseline linear latency model L̃(c).]
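The slide does not say how the baseline L̃(c) was obtained. One plausible approach is an ordinary least-squares fit of latency against the number of lcores; the sketch below assumes exactly that, and the names `fit_linear` and `struct linear_model` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Coefficients of a linear model L(c) = a + b * c. */
struct linear_model { double a, b; };

/*
 * Ordinary least-squares fit of latency samples lat[i] measured with c[i]
 * lcores. Only illustrates how a linear baseline could be fitted.
 */
struct linear_model fit_linear(const double *c, const double *lat, size_t n)
{
    assert(n >= 2);
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < n; i++) {
        sx  += c[i];
        sy  += lat[i];
        sxx += c[i] * c[i];
        sxy += c[i] * lat[i];
    }
    double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  /* needs at least two distinct core counts */
    struct linear_model m;
    m.b = (n * sxy - sx * sy) / denom;  /* slope */
    m.a = (sy - m.b * sx) / n;          /* intercept */
    return m;
}
```

Fed the synthetic points (1, 5), (2, 7), (3, 9), (4, 11), which are not measured data, it returns a = 3 and b = 2.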

  11. Exact tables: key segmentation. [Figure: mean throughput (packets received, Mpps, 0 to 16) for an exact table with a 16 Byte key split as 4 × 4 Bytes, 2 × 8 Bytes or 1 × 16 Bytes, over table entries (10^0 to 10^7, log scale).]

  12. Exact tables: load-store forwarding. [Figure: key segmentation with exact matching; percentage of loads blocked by failed store forwarding (0 to 100%) for 4 × 4, 2 × 8 and 1 × 16 Byte keys, over table sizes from 1 to 15,000,000 entries.]

  13. Load-store forwarding

      Successful load-store forwarding:
          mov  dword ptr [esi], eax     ; Write 4 bytes
          mov  edx, dword ptr [esi]     ; Read 4 bytes

      Failed load-store forwarding:
          mov  dword ptr [esi], eax     ; Write lower 4 bytes
          mov  dword ptr [esi+4], edx   ; Write upper 4 bytes
          movq xmm0, qword ptr [esi]    ; Read 8 bytes. Stall

      Excerpt from [2].
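The same pattern can arise in C when a 16 Byte key is assembled from four 4-byte fields and then hashed 8 bytes at a time. The sketch below assumes, like the CRC hash referenced in [1], that the key is consumed in 8-byte steps; its mixing function is an arbitrary stand-in rather than CRC32, and all names are hypothetical. Whether a stall actually occurs depends on the machine code the compiler emits, since optimizers may merge adjacent stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 16 Byte exact-match key, as in the 4 x 4 Byte experiments. */
struct key16 { uint8_t bytes[16]; };

/* Stand-in hash that reads the key in 8-byte loads. The mixing is arbitrary;
 * only the 8-byte load pattern matters for store forwarding. */
static uint64_t hash_8byte_steps(const struct key16 *k)
{
    uint64_t h = 0x9E3779B97F4A7C15ULL, w;
    for (int i = 0; i < 16; i += 8) {
        memcpy(&w, k->bytes + i, 8);        /* one 8-byte load */
        h = (h ^ w) * 0x100000001B3ULL;
    }
    return h;
}

/* 4 x 4 Bytes: four separate 4-byte stores. Each later 8-byte load spans two
 * stores, which store forwarding cannot serve (the failing case above). */
uint64_t hash_segmented(const uint32_t f[4])
{
    struct key16 k;
    for (int i = 0; i < 4; i++)
        memcpy(k.bytes + 4 * i, &f[i], 4);  /* 4-byte stores */
    return hash_8byte_steps(&k);
}

/* 2 x 8 Bytes: combine two fields in a register, then store 8 bytes at once,
 * so each 8-byte load matches exactly one earlier store. The shift layout
 * equals the memcpy layout above on little-endian targets such as x86. */
uint64_t hash_merged(const uint32_t f[4])
{
    struct key16 k;
    for (int i = 0; i < 2; i++) {
        uint64_t w = (uint64_t)f[2 * i] | ((uint64_t)f[2 * i + 1] << 32);
        memcpy(k.bytes + 8 * i, &w, 8);     /* one 8-byte store */
    }
    return hash_8byte_steps(&k);
}
```

On little-endian targets both functions build byte-identical keys and return the same hash; only the widths of the stores that precede the 8-byte loads differ.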

  14. Ternary tables: mean throughput. [Figure: mean throughput for ternary table entries (4 × 4 Byte keys), 64 Byte packets; packets received (Mpps, 0 to 16) over table entries (10^0 to 10^6, log scale) for 1 to 8 T4P4S threads, with the throughput model T_t(n, c) for c = 1, 2 and 3 and the line rate.]
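The line-rate curve marks the highest packet rate the link can carry. A 64 Byte frame occupies 84 bytes on the wire once the 7 Byte preamble, 1 Byte start-of-frame delimiter and 12 Byte minimum inter-frame gap are added. The slides do not state the link speed; the example below assumes 10 GbE, and the function names are hypothetical.

```c
#include <assert.h>

/* Maximum Ethernet packet rate in Mpps for a link speed in Gbit/s and a frame
 * size in bytes (including FCS). Each frame needs 20 extra bytes on the wire:
 * preamble (7), start-of-frame delimiter (1) and inter-frame gap (12). */
double line_rate_mpps(double link_gbps, unsigned frame_bytes)
{
    assert(frame_bytes >= 64);
    double wire_bits = (frame_bytes + 20.0) * 8.0;
    return link_gbps * 1e3 / wire_bits;  /* Mbit/s divided by bits per packet */
}

/* Smallest core count c with c * per_core_mpps >= line_mpps, assuming the
 * perfectly linear scaling that the throughput models use. */
int cores_for_line_rate(double per_core_mpps, double line_mpps)
{
    assert(per_core_mpps > 0.0);
    int c = 1;
    while (c * per_core_mpps < line_mpps)
        c++;
    return c;
}
```

Under the 10 GbE assumption, `line_rate_mpps(10.0, 64)` gives about 14.88 Mpps, and a hypothetical per-core rate of 6.18 Mpps needs `cores_for_line_rate(6.18, 14.88) == 3` cores.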

  15. Ternary tables: latencies. [Figure: latency HDR histogram for ternary matching on 4 × 4 Byte keys (1 lcore); latency (µs, log scale, 1 to 20,000) over percentile (0% to 99.9999%) for tables with 1, 10, 50, 100, 150, 200, 400, 600, 800, 1000 and 2000 entries, with the baseline latency L̃(1).]
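Latency that grows with the number of entries is what one would expect if each lookup has to inspect many entries. As an illustration only, the simplest ternary matcher is a linear scan over value/mask pairs that keeps the highest-priority hit. The slides do not say which data structure T4P4S uses for ternary tables, and the struct and function names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_WORDS 4  /* 4 x 4 Byte key fields */

struct ternary_entry {
    uint32_t value[KEY_WORDS];
    uint32_t mask[KEY_WORDS];   /* 1-bits must match, 0-bits are wildcards */
    int      priority;          /* higher priority wins */
    int      action;            /* hypothetical action id */
};

/* Naive ternary lookup: compare the key against every entry under its mask
 * and keep the highest-priority match. Returns its action, or -1 on a miss. */
int ternary_lookup(const struct ternary_entry *t, size_t n,
                   const uint32_t key[KEY_WORDS])
{
    assert(t != NULL || n == 0);
    const struct ternary_entry *best = NULL;
    for (size_t i = 0; i < n; i++) {
        int hit = 1;
        for (int w = 0; w < KEY_WORDS && hit; w++)
            hit = ((key[w] ^ t[i].value[w]) & t[i].mask[w]) == 0;
        if (hit && (best == NULL || t[i].priority > best->priority))
            best = &t[i];
    }
    return best ? best->action : -1;
}
```

Each lookup costs O(n) masked comparisons, so doubling the number of entries roughly doubles the work per packet in this design.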

  16. LPM tables: mean throughput. [Figure: mean throughput for LPM table entries (4 Byte key, 24 bit prefix length); packets received (Mpps, 0 to 16) over table entries (10^0 to 10^5, log scale) for 1 to 8 T4P4S threads, with the model T_lpm(n, c) for c = 1 to 4 and the line rate.]
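Longest-prefix matching selects, among all entries whose prefix covers the key, the one with the longest prefix. A reference implementation over a plain array pins down those semantics. It is not the data structure behind the measured tables (DPDK's `rte_lpm` library, for instance, uses DIR-24-8 style lookup tables), and the names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct lpm_entry {
    uint32_t prefix;    /* IPv4 prefix in host byte order */
    uint8_t  length;    /* prefix length in bits, 0..32 */
    int      next_hop;  /* hypothetical action result */
};

/* Mask with the top `len` bits set. len == 0 is special-cased because
 * shifting a 32-bit value by 32 is undefined behaviour in C. */
static uint32_t prefix_mask(uint8_t len)
{
    return len == 0 ? 0 : UINT32_MAX << (32 - len);
}

/* Returns the next hop of the longest matching prefix, or -1 if none match. */
int lpm_lookup(const struct lpm_entry *t, size_t n, uint32_t addr)
{
    int best_len = -1, hop = -1;
    for (size_t i = 0; i < n; i++) {
        assert(t[i].length <= 32);
        uint32_t m = prefix_mask(t[i].length);
        if ((addr & m) == (t[i].prefix & m) && t[i].length > best_len) {
            best_len = t[i].length;
            hop = t[i].next_hop;
        }
    }
    return hop;
}
```

With 10.0.0.0/8, 10.1.2.0/24 and a default route, the address 10.1.2.5 resolves to the /24 entry, 10.99.0.1 to the /8 entry and 192.168.0.1 to the default route.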

  17. Conclusion
     • Reproducible P4 measurements for the DPDK switch runtime and the P4 pipeline:
        • Parser
        • Tables (ingress/egress match-action pipelines)
        • Deparser
     • Models for throughput and latency
     • Performance scales across multiple cores
     • Many platform-specific (x86) behaviors that do not occur in hardware switches
     • Unique opportunity to modify the switch itself

  18. Conclusion: Questions?

  19. Bibliography
     [1] DPDK Project. rte_hash_crc.h File Reference, 2019. https://doc.dpdk.org/api-19.02/rte__hash__crc_8h_source.html; last accessed on 2019/11/18.
     [2] A. Fog. The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers. Copenhagen University College of Engineering, page 134, 2012.
     [3] P4@ELTE. T4P4S: Retargetable compiler for the P4 language, 2019. https://github.com/P4ELTE/t4p4s; last accessed on 2019/11/24.
