
C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection - PowerPoint PPT Presentation

  1. C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection Lalith Suresh (TU Berlin) with Marco Canini (UCL), Stefan Schmid, Anja Feldmann (TU Berlin)

  2. Tail-latency matters: a single user request fans out to tens to thousands of data accesses 2

  3. Tail-latency matters: a single user request fans out to tens to thousands of data accesses. With 100 leaf servers, the 99th percentile latency will be reflected in 63% of user requests! 3
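
A quick check of the 63% figure: if each of 100 leaf servers independently exceeds its 99th percentile latency 1% of the time, the probability that a fanned-out request is delayed by at least one straggler is

```latex
P(\text{at least one straggler}) = 1 - 0.99^{100} \approx 0.634
```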

  4. Server performance fluctuations are the norm: resource contention, queueing delays, background activities, skewed access patterns 4

  5. Effectiveness of replica selection in reducing tail latency? 5

  6. Replica Selection Challenges 6

  7. Replica Selection Challenges • Service-time variations: replicas of the same request may take 4 ms, 5 ms, or 30 ms to serve it 7

  8. Replica Selection Challenges • Herd behavior and load oscillations: multiple clients independently pick the same server at once 8

  9. Impact of Replica Selection in Practice? Dynamic Snitching: uses history of read latencies and I/O load for replica selection 9

  10. Experimental Setup • Cassandra cluster on Amazon EC2 • 15 nodes, m1.xlarge instances • Read-heavy workload with YCSB (120 threads) • 500M 1KB records (larger than memory) • Zipfian key access pattern 10

  11. Cassandra Load Profile 11

  12. Cassandra Load Profile Also observed that the 99.9th percentile latency is ~10x the median latency 12

  13. Load Conditioning in our Approach 13

  14. C3 Adaptive replica selection mechanism that is robust to service-time heterogeneity 14

  15. C3 • Replica Ranking • Distributed Rate Control 15

  16. C3 • Replica Ranking • Distributed Rate Control 16

  17. [Diagram: three clients choosing between a server with µ⁻¹ = 2 ms and a server with µ⁻¹ = 6 ms] 17

  18. [Same diagram: servers with µ⁻¹ = 2 ms and µ⁻¹ = 6 ms] Balance the product of queue-size and service time: q · µ⁻¹ 18

  19. Server-side Feedback Servers piggyback {q_s} and {µ̄_s⁻¹} in every response 19

  20. Server-side Feedback Servers piggyback {q_s} and {µ̄_s⁻¹} in every response • Concurrency compensation 20

  21. Server-side Feedback Servers piggyback {q_s} and {µ̄_s⁻¹} in every response • Concurrency compensation: q̂_s = 1 + os_s · n + q_s (os_s: outstanding requests; q_s: feedback) 21

  22. Select server with min q̂_s · µ̄_s⁻¹ 22

  23. Select server with min q̂_s · µ̄_s⁻¹ • Potentially long queue sizes: a server with µ⁻¹ = 4 ms ends up with 100 requests, while a server with µ⁻¹ = 20 ms gets 20 requests • What if a GC pause happens? 23

  24. Penalizing Long Queues Select server with min (q̂_s)^b · µ̄_s⁻¹ With b = 3, the µ⁻¹ = 4 ms server holds 35 requests and the µ⁻¹ = 20 ms server holds 20 requests 24
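
To make the ranking rule concrete, here is a minimal Python sketch of the scoring described on slides 18–24. It is an illustration under assumptions, not the Cassandra implementation: the feedback values are taken as already EWMA-smoothed, and the names (`rank_replicas`, `outstanding`, the concurrency factor `n`) are made up for the example.

```python
B = 3  # exponent penalizing long queues (b = 3 on the slides)

def estimated_queue(q_s, outstanding_s, n):
    """Concurrency compensation: q_hat_s = 1 + os_s * n + q_s."""
    return 1 + outstanding_s * n + q_s

def score(q_s, mu_inv_s, outstanding_s, n):
    """Replica score = (q_hat_s)^b * mu_inv_s; lower is better."""
    return (estimated_queue(q_s, outstanding_s, n) ** B) * mu_inv_s

def rank_replicas(feedback, outstanding, n=1):
    """feedback: {server: (q_s, mu_inv_s)}; returns servers, best first."""
    return sorted(
        feedback,
        key=lambda s: score(*feedback[s], outstanding.get(s, 0), n),
    )

# Example from slide 23: a fast server hiding behind a long queue loses
# to a slower but less loaded one once the queue penalty kicks in.
feedback = {"A": (100, 0.004), "B": (20, 0.020)}  # (queue size, service time in s)
print(rank_replicas(feedback, outstanding={}))    # -> ['B', 'A']
```

With b = 1 the same example would prefer server A (100 · 4 ms ≈ 20 · 20 ms), which is exactly the long-queue problem slide 23 points out; the cubic penalty flips the choice.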

  25. C3 • Replica Ranking • Distributed Rate Control 25

  26. Need for rate control Replica ranking alone is insufficient • How to avoid saturating individual servers? • How to handle performance fluctuations from sources outside the system? 26

  27. Cubic Rate Control • Clients adjust sending rates according to a cubic function • If the receive rate isn't increasing further, multiplicatively decrease 27
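
A rough Python sketch of the cubic rate adaptation idea on this slide, shaped after CUBIC congestion control but applied to per-replica-group sending rates. The constants (GROWTH, BETA) and the interval logic are assumptions for illustration, not the parameters used in C3:

```python
GROWTH = 4e-6   # cubic scaling factor (illustrative)
BETA = 0.2      # multiplicative-decrease factor (illustrative)

class CubicRateLimiter:
    """Per-replica-group sending-rate controller (sketch)."""

    def __init__(self, initial_rate=100.0):
        self.rate = initial_rate      # current sending rate (req/s)
        self.r_max = initial_rate     # rate at the time of the last decrease
        self.elapsed = 0.0            # seconds since the last decrease

    def on_interval(self, dt, receive_rate_improved):
        """Update the sending rate once per control interval of dt seconds."""
        if receive_rate_improved:
            # Grow along a cubic curve that flattens near r_max, then
            # probes beyond it.
            self.elapsed += dt
            k = (self.r_max * BETA / GROWTH) ** (1.0 / 3.0)
            self.rate = GROWTH * (self.elapsed - k) ** 3 + self.r_max
        else:
            # Receive rate is no longer increasing: back off multiplicatively.
            self.r_max = self.rate
            self.rate *= (1.0 - BETA)
            self.elapsed = 0.0
        self.rate = max(self.rate, 1.0)
        return self.rate
```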

  28. Putting everything together [Diagram: the C3 client's scheduler sorts the replica group by score and sends through per-server rate limiters (e.g., 1000 req/s, 2000 req/s); servers return { feedback } with every response] 28
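
Tying the two mechanisms together, the sketch below shows how a client-side scheduler might combine ranking with per-replica rate limiting. It reuses `rank_replicas` from the earlier sketch; the `tokens` bookkeeping (refilled each interval from the limiter rates) is a hypothetical stand-in for C3's actual scheduler in Cassandra:

```python
def select_replica(feedback, outstanding, tokens, n=1):
    """Pick the best-scored replica that still has rate-limiter budget.

    tokens: {server: sends remaining in the current control interval},
    assumed to be refilled elsewhere from each CubicRateLimiter.rate.
    """
    for server in rank_replicas(feedback, outstanding, n):
        if tokens.get(server, 0) > 0:
            tokens[server] -= 1
            return server
    return None  # all limiters exhausted: back-pressure this request
```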

  29. Implementation in Cassandra Details in the paper! 29

  30. Evaluation • Amazon EC2 • Controlled testbed • Simulations 30

  31. Evaluation: Amazon EC2 • 15-node Cassandra cluster (m1.xlarge) • Workloads generated using YCSB (120 threads) • Read-heavy, update-heavy, read-only • 500M 1KB records dataset (larger than memory) • Compared against Cassandra's Dynamic Snitching (DS) 31

  32. Lower is better 32

  33. 2x – 3x improved 99.9th percentile latencies Also improves median and mean latencies 33

  34. 2x – 3x improved 99.9th percentile latencies 26% – 43% improved throughput 34

  35. Takeaway: C3 does not trade off throughput for latency 35

  36. How does C3 react to dynamic workload changes? • Begin with 80 read-heavy workload generators • 40 update-heavy generators join the system after 640s • Observe latency profile with and without C3 36

  37. Latency profile degrades gracefully with C3 Takeaway: C3 reacts effectively to dynamic workloads 37

  38. Summary of other results With higher system load, skewed record sizes, and SSDs instead of HDDs: > 3x better 99.9th percentile latency and 50% higher throughput than with DS 38

  39. Ongoing work • Tests at SoundCloud and Spotify • Stability analysis of C3 • Alternative rate adaptation algorithms • Token-aware Cassandra clients 39

  40. Summary C3 = Replica Ranking + Distributed Rate Control 40
