Revisiting Benchmarking Methodology for Interconnect Devices


1. Revisiting Benchmarking Methodology for Interconnect Devices
Chair of Network Architectures and Services, Department of Informatics, Technical University of Munich
Daniel Raumer, Sebastian Gallenmüller, Florian Wohlfart, Paul Emmerich, Patrick Werneck, and Georg Carle
July 16, 2016

2. Contents
• Case study: benchmarking software routers
• Flaws of benchmarks
• Latency metrics
• Latency under load
• Traffic pattern
• Omitted tests
• Reproducibility
• Conclusion

3. Why revisit the benchmarking state of the art?
• Numerous standards, recommendations, and best practices
• Well-known benchmarking definition: RFC 2544
• Various extensions
• Divergence of benchmarks
• New class of devices:
  • High-speed network I/O frameworks
  • Virtual switching
  • Many-core CPU architectures (CPU ↔ NIC)

4. Common metrics
• Throughput: highest rate that the device under test (DuT) can serve without loss (see the search sketch below).
• Back-to-back frame burst size: longest burst (in frames) forwarded without loss.
• Frame loss rate: percentage of dropped frames under a given load.
• Latency: average time a packet spends within the DuT.
• ... extended metrics, e.g., FIB-dependent performance
• ... additional SHOULDs, rarely measured
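The RFC 2544 throughput metric is typically determined by a binary search over offered rates. Below is a minimal sketch of such a search, assuming a hypothetical send_at_rate(rate, duration) helper that offers traffic at the given rate and returns the observed loss fraction; it is an illustration, not the authors' test suite.

```python
# Minimal sketch of an RFC 2544-style throughput search (illustration only).
# `send_at_rate` is a hypothetical helper: it offers traffic at `rate` Mpps for
# `duration` seconds and returns the fraction of frames lost.

def throughput_binary_search(send_at_rate, line_rate_mpps, duration=60,
                             resolution=0.01):
    """Return the highest offered rate (Mpps) observed without frame loss."""
    low, high = 0.0, line_rate_mpps
    best = 0.0
    while high - low > resolution:
        rate = (low + high) / 2
        loss = send_at_rate(rate, duration)
        if loss == 0.0:
            best = rate      # no loss: try a higher rate
            low = rate
        else:
            high = rate      # loss observed: back off
    return best
```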

5. Case study: RFC 2544 benchmarks
[Setup diagram: RFC 2544 test suite connected bidirectionally to the DuT]
Three different DuTs:
• Linux router
• FreeBSD router
• MikroTik router

6. Flaws of benchmarks: selected examples

7. Meaningful latency measurements: case study
[Histogram: probability [%] over latency [µs], 0–30 µs]
• FreeBSD, 64-byte packets
• The average does not reflect the long-tail distribution (see the sketch below)
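To see why an average hides a long tail, consider a synthetic example; the numbers below are made up for illustration and are not measurements from the talk.

```python
# Illustrative only: a synthetic long-tailed latency sample showing why the
# average is a poor summary. Values are invented, not taken from the case study.
import numpy as np

rng = np.random.default_rng(0)
# 99% of packets around 3 µs, 1% delayed to ~25 µs (e.g., batching effects)
fast = rng.normal(loc=3.0, scale=0.3, size=99_000)
slow = rng.normal(loc=25.0, scale=2.0, size=1_000)
latency_us = np.concatenate([fast, slow])

print(f"mean   = {latency_us.mean():.2f} µs")          # pulled up by the tail
print(f"median = {np.percentile(latency_us, 50):.2f} µs")
print(f"99.9th = {np.percentile(latency_us, 99.9):.2f} µs")
```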

8. Meaningful latency measurements: 2nd example
[Histogram: probability [%] over latency [µs], 1–4 µs]
• Pica8 switch tested in [IFIP NETWORKING 16]
• Different processing paths through a device
• Bimodal distribution
• Average latency is misleading
→ Extensive reports: histograms for visualization
→ Short reports: percentiles (25th, 50th, 75th, 95th, 99th, and 99.9th; see the sketch below)
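A short report of the recommended percentiles can be computed directly from a latency trace; the following is a minimal NumPy sketch, not part of the presented tooling.

```python
# Minimal sketch: the percentile summary recommended above, computed with NumPy.
import numpy as np

RECOMMENDED_PERCENTILES = (25, 50, 75, 95, 99, 99.9)

def latency_report(latency_us: np.ndarray) -> dict:
    """Return the recommended percentiles of a latency trace (values in µs)."""
    return {p: float(np.percentile(latency_us, p)) for p in RECOMMENDED_PERCENTILES}

def latency_histogram(latency_us: np.ndarray, bins: int = 100):
    """For extensive reports: probability [%] per bin plus the bin edges."""
    counts, edges = np.histogram(latency_us, bins=bins)
    return counts / counts.sum() * 100, edges
```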

9. Latency under load
[Plot: latency [µs] over offered load [Mpps]; CBR median and 25th/75th percentiles]
• Open vSwitch (Linux NAPI & ixgbe) [IMC15]
• Latency at maximum throughput is not the worst case
→ Measure at different loads (10, 20, ..., 100% of maximum throughput; see the sketch below)
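The recommended load sweep can be scripted as a simple loop; measure_latency(rate, duration) is a hypothetical helper standing in for the actual packet generator and latency measurement.

```python
# Minimal sketch of the recommended load sweep. `measure_latency(rate, duration)`
# is a hypothetical helper that offers traffic at `rate` Mpps and returns a
# NumPy array of per-packet latencies in µs.
import numpy as np

def latency_under_load(measure_latency, max_throughput_mpps, duration=30):
    """Measure latency at 10%, 20%, ..., 100% of the maximum throughput."""
    results = {}
    for load in range(10, 101, 10):
        rate = max_throughput_mpps * load / 100
        samples = measure_latency(rate, duration)
        results[load] = {
            "median": float(np.percentile(samples, 50)),
            "p25": float(np.percentile(samples, 25)),
            "p75": float(np.percentile(samples, 75)),
        }
    return results
```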

10. Traffic pattern & latency
[Plot: latency [µs] over offered load [Mpps]; CBR and Poisson, median and 25th/75th percentiles]
• Open vSwitch (NAPI + ixgbe) [IMC15]
• Different behavior for different traffic patterns
→ Test with different traffic patterns
→ Use a Poisson process to approximate real-world traffic (see the sketch below)
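For a Poisson traffic pattern, inter-packet gaps are drawn from an exponential distribution with mean 1/rate. The sketch below only computes send timestamps; how precisely a generator can enforce them is a separate concern.

```python
# Minimal sketch of Poisson traffic timing: for a Poisson arrival process, the
# inter-packet gaps are exponentially distributed with mean 1/rate.
import numpy as np

def poisson_send_times(rate_mpps: float, n_packets: int, seed: int = 0) -> np.ndarray:
    """Send timestamps in µs for a Poisson process with the given rate (Mpps)."""
    rng = np.random.default_rng(seed)
    gaps_us = rng.exponential(scale=1.0 / rate_mpps, size=n_packets)
    return np.cumsum(gaps_us)

def cbr_send_times(rate_mpps: float, n_packets: int) -> np.ndarray:
    """Send timestamps in µs for constant bit rate (CBR) traffic, for comparison."""
    return np.arange(1, n_packets + 1) / rate_mpps
```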

11. Omitted tests
[Plot: throughput [Mpps] and L1/L2/L3 cache misses per packet over the number of IP addresses (log scale, 10^0 to 10^6)]
• CPU caches affect the performance
→ Additional tests for certain device classes
→ Functionality-dependent tests (e.g., scaling the number of destination IP addresses as in the plot; see the sketch below)
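A functionality-dependent test like the IP-address sweep above can be scripted as a loop over address-set sizes; measure_throughput(dst_addresses) is a hypothetical helper for the underlying traffic generator.

```python
# Minimal sketch of a cache/FIB-stress test: sweep the number of distinct
# destination IP addresses and measure throughput for each setting.
# `measure_throughput(dst_addresses)` is a hypothetical helper that generates
# traffic round-robin over the given destinations and returns the rate in Mpps.
import ipaddress

def ip_scaling_sweep(measure_throughput, base_net="10.0.0.0/8", max_exponent=6):
    """Throughput with 10^0, 10^1, ..., 10^max_exponent destination addresses."""
    hosts = ipaddress.ip_network(base_net).hosts()
    addresses = []
    results = {}
    for exponent in range(max_exponent + 1):
        target = 10 ** exponent
        while len(addresses) < target:
            addresses.append(str(next(hosts)))
        results[target] = measure_throughput(addresses)
    return results
```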

12. Reproducibility of configurations
• Manual device configuration is error-prone
• Device configuration is hard to reproduce
→ Configure the DuT reproducibly via scripts
→ Have the benchmarking tool execute the configuration scripts (see the sketch below)
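One way a benchmarking tool can apply DuT configuration scripts is over SSH; the host name and paths below are placeholders, and the snippet is only a sketch of the idea, not the authors' implementation.

```python
# Minimal sketch of script-driven DuT configuration, run by the benchmark tool
# itself so every test starts from a documented, reproducible state.
# Host name and script paths are placeholders, not taken from the talk.
import subprocess

def configure_dut(host: str, script_path: str) -> None:
    """Copy a configuration script to the DuT and execute it over SSH."""
    subprocess.run(["scp", script_path, f"{host}:/tmp/dut-setup.sh"], check=True)
    subprocess.run(["ssh", host, "sh", "/tmp/dut-setup.sh"], check=True)

# Example: apply a Linux-router setup before starting an RFC 2544 run.
# configure_dut("dut.example.org", "configs/linux-router.sh")
```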

13. Conclusion
• A novel class of devices requires additional tests
• There are arguments for reconsidering best practice:
  • Average latency may be misleading → histograms / percentiles
  • Latency is load-dependent → measure at 10, 20, ..., 100% of maximum throughput
  • CBR traffic is an unrealistic test pattern → Poisson process
  • Device-specific functionality → perform device-specific benchmarks
  • Manual configuration is error-prone → automatic configuration by the benchmark tool

14. Novelty: RFC 2544 test suite on commodity hardware
• MoonGen [IMC15] is a fast software packet generator
  • Hardware-assisted latency measurements (misusing the NICs' PTP support)
  • Precise software rate control and traffic patterns
• http://net.in.tum.de/pub/router-benchmarking/
  • RFC 2544 benchmark reports for Linux, FreeBSD, and MikroTik
  • Early version of the MoonGen RFC 2544 module
