

  1. TailBench: A Benchmark Suite and Evaluation Methodology for Latency-Critical Applications. Harshad Kasture, Daniel Sanchez. IISWC 2016. tailbench.csail.mit.edu

  2. Executive Summary
  - Latency-critical applications have stringent performance requirements, forcing low datacenter utilization
    - Wastes billions of dollars in energy and equipment annually
  - Research in this area is hampered by the lack of a comprehensive benchmark suite
    - Few latency-critical applications, so limited coverage
    - Complicated setup and configuration
    - Methodological issues cause inaccurate latency measurements
  - TailBench makes latency-critical applications easy to analyze
    - Varied application domains and latency characteristics
    - Standardized, statistically sound methodology
    - Supports simplified load-testing configurations

  3. Outline
  - Background and Motivation
  - TailBench Applications
  - TailBench Harness
  - Simplified Configurations

  4–6. Understanding Latency-Critical Applications
  [Figure, built up over three slides: a datacenter serving user requests. Client requests arrive at a root node, which fans them out to several leaf nodes, each backed by multiple back-end servers.]

  7. Understanding Latency-Critical Applications
  [Figure: the same datacenter diagram, annotated with per-hop latencies of about 1 ms.]
  - The few slowest responses determine user-perceived latency
  - Tail latency (e.g., the 95th/99th percentile), not mean latency, determines performance
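To make the distinction concrete, here is a minimal sketch (with made-up latency numbers, not TailBench data) of how the mean can hide the slow requests that the 95th/99th percentiles expose:

```python
import numpy as np

# Hypothetical latency samples: mostly ~2 ms, with 2% slow outliers (~40 ms).
rng = np.random.default_rng(0)
latencies = np.concatenate([
    rng.normal(2.0, 0.2, 980),   # typical fast requests
    rng.normal(40.0, 5.0, 20),   # rare slow requests
])

print(f"mean : {latencies.mean():6.2f} ms")
print(f"p95  : {np.percentile(latencies, 95):6.2f} ms")
print(f"p99  : {np.percentile(latencies, 99):6.2f} ms")
# The mean stays near 2.8 ms and p95 near 2.4 ms, but p99 is ~40 ms:
# with a large fan-out, these slow responses are what users actually see.
```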

  8. Latency Requirements Cause Low Utilization
  - End-to-end latency increases rapidly with load
  - Must keep utilization low to keep latency within reasonable bounds
  - Traditional resource management techniques (e.g., colocation) often cannot be used, since they degrade latency
  - Low resource utilization wastes billions of dollars in energy and equipment
  - This has sparked research in latency-critical systems
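A back-of-the-envelope way to see why latency rises so steeply with load: in the textbook M/M/1 queueing model (an idealization, not part of TailBench's methodology), mean response time is T = S / (1 − ρ) for mean service time S and utilization ρ, so latency explodes as utilization approaches 1:

```python
# M/M/1 mean response time: T = S / (1 - rho), with S the mean service
# time and rho the utilization. Illustrative only; real services queue
# in more complicated ways, and tails blow up even faster than the mean.
S = 1.0  # mean service time, ms
for rho in (0.10, 0.50, 0.70, 0.90, 0.95):
    print(f"utilization {rho:4.0%} -> mean response time {S / (1 - rho):5.1f} ms")
```

At 95% utilization the mean response time is already 20x the bare service time, which is why latency-critical services are run well below saturation.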

  9. Benchmark Suite Design Goals
  - Applications from a diverse set of domains
  - Applications with diverse tail latency characteristics
  [Figure: a timescale from 100 μs to 1 s, marking events such as DVFS, LLC warmup, and live VM migration]
  - Easy to set up and run
  - Support for different measurement scenarios
  - Robust latency measurement methodology

  10. Outline
  - Background and Motivation
  - TailBench Applications
  - TailBench Harness
  - Simplified Configurations

  11. TailBench Applications
  - xapian: online search
  - masstree: key-value store
  - moses: statistical machine translation
  - sphinx: speech recognition
  - shore: on-disk database
  - silo: in-memory database
  - specjbb: Java middleware
  - img-dnn: image recognition

  12. Wide Range of End-to-End Latencies
  [Figure: benchmarks placed on a latency scale from 100 μs to 1 s, ordered silo, specjbb, masstree, shore, xapian, img-dnn, moses, sphinx from shortest to longest.]

  13. Varied Service Time Characteristics
  - masstree service times are tightly distributed
  - xapian service times are more loosely distributed

  14. End-to-End Latency vs. Load

  15. Tail ≠ Mean
  - Tail latency increases more rapidly with load than mean latency
  - The relationship between mean and tail latencies is hard to predict
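The divergence is easy to reproduce with a toy discrete-event simulation (ours, not part of TailBench): a FIFO server whose requests are occasionally slow. As load rises, queueing behind the slow requests inflates the 99th percentile far faster than the median:

```python
import random

def simulate(load, n=200_000, seed=1):
    """Single FIFO server; service is 1 ms except 1% of requests take 20 ms."""
    random.seed(seed)
    mean_service = 0.99 * 1.0 + 0.01 * 20.0      # = 1.19 ms
    rate = load / mean_service                    # arrivals per ms
    arrival = depart = 0.0
    lat = []
    for _ in range(n):
        arrival += random.expovariate(rate)               # Poisson arrivals
        service = 20.0 if random.random() < 0.01 else 1.0
        depart = max(depart, arrival) + service           # FIFO queueing
        lat.append(depart - arrival)
    lat.sort()
    return lat[n // 2], lat[int(n * 0.99)]                # median, p99

for load in (0.3, 0.6, 0.9):
    p50, p99 = simulate(load)
    print(f"load {load:.1f}: p50 = {p50:6.2f} ms   p99 = {p99:6.2f} ms")
```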

  16. Impact of Parallelism

  17. Parallelism Helps Some Applications

  18. …But Hurts Others

  19. Outline
  - Background and Motivation
  - TailBench Applications
  - TailBench Harness
  - Simplified Configurations

  20. TailBench Harness
  - Measuring tail latency accurately is complicated: load generation, statistics aggregation, warmup periods…
  - The harness encapsulates most of this complexity
  - The harness makes TailBench easily extensible: new benchmarks reuse existing harness functionality
  - Simplified harness configurations enable different measurement scenarios, trading off some accuracy for reduced setup complexity

  21. Example: Open- vs. Closed-Loop Clients
  [Figure: clients connected to the application through the network.]
  - Many popular load testers use closed-loop clients: each client waits for a response before submitting its next request, so an increase in application load throttles the client request rate
  - Latency-critical applications typically service a large number of independent clients, whose request rate is independent of application load; they are better modeled by open-loop clients
  - Closed-loop clients can underestimate latency by orders of magnitude [Tene LLS 2013, Zhang ISCA 2016]
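A toy model (our sketch, not the TailBench harness) shows why the distinction matters. Both "clients" below drive the same server, which stalls on every 100th request; the closed-loop client never observes queueing delay because it backs off whenever the server is busy, while the open-loop client keeps issuing requests on schedule and sees the queueing:

```python
import random

def measure_p99(mode, n=100_000, qps=0.5, seed=2):
    """Same server under closed- vs open-loop load.
    Service takes 1 ms, except every 100th request stalls for 50 ms."""
    random.seed(seed)
    arrival = server_free = 0.0
    lat = []
    for i in range(n):
        if mode == "closed":
            arrival = server_free                # wait for previous response
        else:
            arrival += random.expovariate(qps)   # open loop: Poisson arrivals
        service = 50.0 if i % 100 == 0 else 1.0
        server_free = max(server_free, arrival) + service
        lat.append(server_free - arrival)
    lat.sort()
    return lat[int(n * 0.99)]

for mode in ("closed", "open"):
    print(f"{mode:>6}-loop p99: {measure_p99(mode):7.2f} ms")
```

The gap grows with load; in this toy setup it is a modest factor, but in real systems the underestimate can reach orders of magnitude, as the slide's citations report.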

  22–26. Networked Harness Configuration
  [Figure, built up over several slides: clients on separate machines send requests over TCP/IP through the network to the application; each client contains a Traffic Shaper and a Statistics Collector, and the application is fronted by a Request Queue.]
  - Application and clients run on separate machines
  - The Traffic Shaper inserts inter-request delays to model load
  - The Request Queue enqueues incoming requests and measures service times and queuing delays
  - The Statistics Collector aggregates latency data

  27. Networked Harness Configuration
  - ✓ Faithfully captures all sources of overhead
  - ✗ Difficult to configure and deploy
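The roles the slides assign to the Traffic Shaper and Request Queue can be sketched as follows. This is a hypothetical Python rendering of those roles for illustration; the actual TailBench harness is implemented differently:

```python
import queue
import random
import time

class TrafficShaper:
    """Spaces out client sends to hit a target request rate (open loop).
    Poisson inter-request delays model many independent clients."""
    def __init__(self, qps):
        self.qps = qps
        self.next_send = time.monotonic()

    def next_arrival(self):
        self.next_send += random.expovariate(self.qps)
        delay = self.next_send - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        return self.next_send  # intended send time: the latency origin

class RequestQueue:
    """Timestamps requests on arrival so queuing delay and service time
    can be measured separately and handed to a statistics collector."""
    def __init__(self):
        self._q = queue.Queue()

    def enqueue(self, req, arrival):
        self._q.put((req, arrival))

    def serve_one(self, handler):
        req, arrival = self._q.get()
        start = time.monotonic()
        handler(req)
        end = time.monotonic()
        return start - arrival, end - start  # (queuing delay, service time)
```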

  28. Outline
  - Background and Motivation
  - TailBench Applications
  - TailBench Harness
  - Simplified Configurations

  29. Loopback Harness Configuration
  [Figure: application and clients on the same machine, communicating through the TCP/IP loopback interface.]
  - Application and clients reside on the same machine
  - ✓ Reduced setup complexity
  - ✓ Highly accurate in many cases
  - ✗ Difficult to simulate

  30. Load-Latency for Networked Configuration

  31. Loopback Configuration Highly Accurate
  - Loopback and networked configurations have near-identical performance
  - Networking delays are minimal in our setup

  32. Loopback Harness Configuration
  [Figure: application and clients on the same machine, communicating through the TCP/IP loopback interface.]
  - Application and clients reside on the same machine
  - ✓ Reduced setup complexity
  - ✓ Highly accurate in many cases
  - ✗ Still difficult to simulate

  33. Integrated Harness Configuration
  [Figure: application and client integrated into a single process.]
  - Application and client integrated into a single process
  - ✓ Easy to set up
  - ✗ Some loss of accuracy
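Reusing the hypothetical TrafficShaper and RequestQueue from the sketch above, an integrated-style measurement (one process, client and server as threads, no TCP/IP stack in the path) might look like this; again a sketch of the idea, not TailBench's code:

```python
import threading
import time

rq, shaper, lat = RequestQueue(), TrafficShaper(qps=500), []
N = 1000

def client():
    for i in range(N):
        rq.enqueue(i, shaper.next_arrival())  # paced, timestamped sends

def server():
    for _ in range(N):
        qdelay, svc = rq.serve_one(lambda req: time.sleep(0.001))  # ~1 ms "work"
        lat.append(qdelay + svc)

threads = [threading.Thread(target=f) for f in (client, server)]
for t in threads: t.start()
for t in threads: t.join()
lat.sort()
print(f"p99 end-to-end latency: {lat[int(N * 0.99)] * 1000:.2f} ms")
```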

  34. Integrated Configuration Validation
  [Figure: validation results; the chart annotates differences of 39% and 23%.]
  - Networked/loopback configurations saturate earlier for applications with short requests (silo, specjbb)
  - For these applications, TCP/IP processing overhead is a significant fraction of request processing time

  35. Integrated Harness Configuration
  [Figure: application and client integrated into a single process.]
  - Application and client integrated into a single process
  - ✓ Easy to set up
  - ✗ Some loss of accuracy
  - ✓ Enables user-level simulations

  36. Simulation vs. Real System
  [Figure: real vs. simulated performance; annotated per-application errors of 16%, 32%, 20%, 16%, and 31%.]
  - Performance differences between real and simulated systems are well within usual simulation error bounds
  - Average absolute error in saturation QPS: 14%
  - For comparison, zsim IPC error for SPEC CPU2006 applications is 8.5–21%

  37. Conclusions
  - TailBench includes a diverse set of latency-critical applications with varied latency characteristics
  - The TailBench harness implements a statistically sound experimental methodology to achieve accurate results
  - Various harness configurations allow trading off configuration complexity for some accuracy
  - Our results show that the integrated configuration is highly accurate for six of our eight benchmarks

  38. Thanks for your attention! Questions? tailbench.csail.mit.edu
