

  1. IN-MEMORY CACHING: CURB TAIL LATENCY WITH PELIKAN

  2. ABOUT ME • 6 years at Twitter, on cache • maintainer of Twemcache & Twitter’s Redis fork • operations of thousands of machines • hundreds of (internal) customers • Now working on Pelikan, a next-gen cache framework to replace the above @twitter • Twitter: @thinkingfish

  3. THE PROBLEM: CACHE PERFORMANCE

  4. CACHE RULES EVERYTHING AROUND ME [diagram: SERVICE → CACHE → DB]

  5. 😤 CACHE RUINS EVERYTHING AROUND ME 😤 [diagram: SERVICE → CACHE → DB]

  6. LATENCY & FANOUT • req: all tweets for #qcon ⇒ tid 1, tid 2, …, tid n (assume n is large) • what determines the overall 99%-ile of req? [diagram: SERVICE fanning out to many CACHE shards]
      fanout      per-shard percentile that sets the overall p99
      1           p99
      10          p99.9
      100         p99.99
      1000        p99.999
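A back-of-envelope for the table above (my own derivation, assuming the n lookups are independent and identically distributed): if each lookup beats a deadline t with probability p, all n of them do with probability p^n. Requiring p^n = 0.99 gives p = 0.99^(1/n) ≈ 1 − 0.01/n, so a fanout of 100 needs each cache shard to hit that deadline at roughly its p99.99, and a fanout of 1000 at roughly its p99.999.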

  7. LATENCY & DEPENDENCY • what determines the overall 99%-ile? • adding all latencies together • N steps ⇒ N x exposure to tail latency [diagram: SERVICE A → get timeline → SERVICE B → get tweets, get users for each tweet → SERVICE C]

  8. CACHE IS UBIQUITOUS • Exposure of cache tail latency increases with both scale and dependency! [diagram: SERVICE A, B, C each backed by multiple CACHE A, B, C nodes]

  9. GOOD CACHE PERFORMANCE = PREDICTABLE LATENCY

  10. GOOD CACHE PERFORMANCE = PREDICTABLE TAIL LATENCY

  11. KING OF PERFORMANCE “MILLIONS OF QPS PER MACHINE” “SUB-MILLISECOND LATENCIES” “NEAR LINE-RATE THROUGHPUT” …

  12. GHOSTS OF PERFORMANCE “ USUALLY PRETTY FAST” “HICCUPS EVERY ONCE IN A WHILE ” “TIMEOUT SPIKES AT THE TOP OF THE HOUR ” “SLOW ONLY WHEN MEMORY IS LOW” …

  13. I SPENT THE FIRST 3 MONTHS AT TWITTER LEARNING CACHE BASICS… …AND THE NEXT 5 YEARS CHASING GHOSTS

  14. CHAINING DOWN GHOSTS = MINIMIZE NONDETERMINISTIC BEHAVIOR

  15. HOW? IDENTIFY AVOID MITIGATE

  16. A PRIMER: CACHING IN DATACENTER

  17. DATACENTER • geographically centralized • highly homogeneous network • relatively reliable infrastructure

  18. CACHING MAINLY: REQUEST → RESPONSE INITIALLY: CONNECT ALSO (BECAUSE WE ARE GROWN-UPS): STATS, LOGGING, HEALTH CHECK…

  19. CACHE SERVER: BIRD'S-EYE VIEW [diagram: data plane (protocol, storage) on top of an event-driven server, running on the host OS, on top of the network infrastructure]

  20. HOW DID WE UNCOVER THE UNCERTAINTIES ?

  21. “ BANDWIDTH UTILIZATION WENT WAY UP, EVEN THOUGH REQUEST RATE WAS WAY LOWER. ”

  22. SYSCALLS

  23. CONNECTING IS SYSCALL-HEAVY: accept → config → register event → read … 4+ syscalls

  24. REQUEST IS SYSCALL-LIGHT: IO event (read) → read → parse → process → compose → write → IO event (write) … 3 syscalls* *: the event loop returns multiple read events at once, and I/O syscalls can be further amortized by batching/pipelining
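To make the syscall counts concrete, here is a minimal event-loop sketch of the kind the last two slides describe (my own illustration, not Twemcache or Pelikan code; port and buffer sizes are arbitrary): accepting a connection costs accept plus configuration plus epoll registration, while serving a request on an established connection is just the event wait plus one read and one write.

```c
/* minimal epoll server sketch: contrast the connect path vs the request path */
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int lfd = socket(AF_INET, SOCK_STREAM, 0);            /* listening socket */
    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(12321);                          /* arbitrary port for the sketch */
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 1024);

    int ep = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };
    epoll_ctl(ep, EPOLL_CTL_ADD, lfd, &ev);

    char buf[4096];
    for (;;) {
        struct epoll_event evs[64];
        int n = epoll_wait(ep, evs, 64, -1);               /* one syscall can return many events */
        for (int i = 0; i < n; i++) {
            int fd = evs[i].data.fd;
            if (fd == lfd) {
                /* connect path: accept + fcntl (config) + epoll_ctl (register)
                 * -> several syscalls per new connection */
                int cfd = accept(lfd, NULL, NULL);
                fcntl(cfd, F_SETFL, O_NONBLOCK);
                struct epoll_event cev = { .events = EPOLLIN, .data.fd = cfd };
                epoll_ctl(ep, EPOLL_CTL_ADD, cfd, &cev);
            } else {
                /* request path: read + write, plus an amortized share of epoll_wait
                 * -> roughly 3 syscalls per request; parse/process/compose stay in user space */
                ssize_t r = read(fd, buf, sizeof(buf));
                if (r <= 0) { close(fd); continue; }
                write(fd, buf, (size_t)r);                 /* echo back as a stand-in for a response */
            }
        }
    }
}
```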

  25. TWEMCACHE IS MOSTLY SYSCALLS • 1-2 µs overhead per call • syscalls dominate CPU time in a simple cache • What if we have 100k conns / sec?
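A rough back-of-envelope using the slide's numbers (my own arithmetic, not from the talk): at ~1.5 µs per syscall and 4+ syscalls per new connection, 100k connects/sec costs on the order of 100,000 × 4 × 1.5 µs ≈ 0.6 s of CPU per second, so a connection storm alone can consume most of a core before a single request is served.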

  26. culprit: CONNECTION STORM

  27. “ …TWEMCACHE RANDOM HICCUPS, ALWAYS AT THE TOP OF THE HOUR. ”

  28. [diagram: cache worker thread (t_worker) ⏱ blocked writing log entries to DISK I/O while a cron job "x" hits the same disk]

  29. culprit: BLOCKING I/O

  30. “ WE ARE SEEING SEVERAL “BLIPS” AFTER EACH CACHE REBOOT… ”

  31. A TIMELINE: MEMCACHE RESTART … → CONNECTION STORM → lock! → MANY REQUESTS TIMED OUT → lock! → SOME MORE REQUESTS TIMED OUT → (REPEAT A FEW TIMES)

  32. culprit: LOCKING

  33. LOCKING FACTS • ~25 ns per operation • more expensive on NUMA • much more costly when contended

  34. “ HOSTS WITH LONG RUNNING TWEMCACHE/REDIS TRIGGER OOM DURING LOAD SPIKES. ”

  35. “ REDIS INSTANCES THAT STARTED EVICTING SUDDENLY GOT SLOWER. ”

  36. culprit: MEMORY LAYOUT / OPS

  37. SUMMARY CONNECTION STORM BLOCKING I/O LOCKING MEMORY

  38. HOW TO MITIGATE?

  39. HIDE EXPENSIVE OPS PUT OPERATIONS OF DIFFERENT NATURE / PURPOSE ON SEPARATE THREADS

  40. DATA PLANE, CONTROL PLANE

  41. SLOW: CONTROL PLANE STATS AGGREGATION STATS EXPORTING LOG DUMP LOG ROTATION …

  42. FAST: DATA PLANE / REQUEST — t_worker: IO event (read) → read → parse → process → compose → write → IO event (write)

  43. FAST: DATA PLANE / CONNECT — t_server: IO event → accept → config → dispatch; t_worker: IO event → register → read

  44. LATENCY-ORIENTED THREADING [diagram: t_server (CONNECTS) hands new connections to t_worker (REQUESTS); t_worker and t_server push logging and stats updates to t_admin (OTHER)]
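A minimal sketch of this split, assuming a pthread-based server (my own illustration of the pattern, not Pelikan's source): the latency-critical request path and connection acceptance each get a dedicated thread, and everything slow is pushed to an admin thread.

```c
/* latency-oriented threading sketch: one thread per concern */
#include <pthread.h>
#include <stdio.h>

static void *worker_loop(void *arg) {   /* data plane: serve requests only */
    (void)arg;
    printf("t_worker: read/parse/process/compose/write\n");
    return NULL;
}

static void *server_loop(void *arg) {   /* data plane: accept and hand off connections */
    (void)arg;
    printf("t_server: accept connections, hand them to t_worker\n");
    return NULL;
}

static void *admin_loop(void *arg) {    /* control plane: slow, non-critical work */
    (void)arg;
    printf("t_admin: aggregate stats, flush logs, health checks\n");
    return NULL;
}

int main(void) {
    pthread_t t_worker, t_server, t_admin;
    pthread_create(&t_worker, NULL, worker_loop, NULL);
    pthread_create(&t_server, NULL, server_loop, NULL);
    pthread_create(&t_admin, NULL, admin_loop, NULL);
    pthread_join(t_worker, NULL);
    pthread_join(t_server, NULL);
    pthread_join(t_admin, NULL);
    return 0;
}
```

The point is structural: nothing on t_worker's loop ever waits on logging, stats export, or disk.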

  45. WHAT TO AVOID?

  46. LOCKING

  47. WHAT WE KNOW • inter-thread communication in cache: stats, logging, connection hand-off • locking propagates blocking/delay between threads [diagram: t_server hands new connections to t_worker; t_worker and t_server send logging and stats updates to t_admin]

  48. LOCKLESS OPERATIONS MAKE STATS UPDATE LOCKLESS w/ atomic instructions
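A minimal sketch of a lock-free metric, assuming C11 atomics (my own illustration, not Pelikan's metrics code): each worker bumps the counter with an atomic add on the hot path, and the admin thread reads it without taking any lock.

```c
/* lockless stats update sketch using C11 atomics */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static atomic_uint_fast64_t request_count = 0;   /* one counter per metric */

/* called on the worker thread's hot path: no lock, just an atomic increment */
static inline void metric_incr(atomic_uint_fast64_t *m) {
    atomic_fetch_add_explicit(m, 1, memory_order_relaxed);
}

/* called on the admin thread when exporting stats: a plain atomic load */
static inline uint64_t metric_read(atomic_uint_fast64_t *m) {
    return (uint64_t)atomic_load_explicit(m, memory_order_relaxed);
}

int main(void) {
    metric_incr(&request_count);
    metric_incr(&request_count);
    printf("request_count = %llu\n", (unsigned long long)metric_read(&request_count));
    return 0;
}
```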

  49. LOCKLESS OPERATIONS MAKE LOGGING LOCKLESS — RING/CYCLIC BUFFER [diagram: the writer advances the write position, the reader advances the read position]

  50. LOCKLESS OPERATIONS MAKE CONNECTION HAND-OFF LOCKLESS — RING ARRAY [diagram: the writer advances the write position, the reader advances the read position]
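The last two slides rely on the same single-producer/single-consumer idea. Here is a minimal sketch of such a ring (my own illustration, assuming exactly one writer thread and one reader thread, and C11 atomics): the writer only ever advances the write position, the reader only ever advances the read position, so no lock is needed.

```c
/* single-producer/single-consumer ring buffer sketch (power-of-two capacity) */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define RING_CAP 1024                       /* must be a power of two */

struct spsc_ring {
    void *slot[RING_CAP];
    atomic_size_t wpos;                     /* only the writer advances this */
    atomic_size_t rpos;                     /* only the reader advances this */
};

/* writer side: returns false if the ring is full (caller decides what to drop) */
static bool ring_push(struct spsc_ring *r, void *item) {
    size_t w  = atomic_load_explicit(&r->wpos, memory_order_relaxed);
    size_t rd = atomic_load_explicit(&r->rpos, memory_order_acquire);
    if (w - rd == RING_CAP) return false;   /* full */
    r->slot[w & (RING_CAP - 1)] = item;
    atomic_store_explicit(&r->wpos, w + 1, memory_order_release);
    return true;
}

/* reader side: returns false if the ring is empty */
static bool ring_pop(struct spsc_ring *r, void **item) {
    size_t rd = atomic_load_explicit(&r->rpos, memory_order_relaxed);
    size_t w  = atomic_load_explicit(&r->wpos, memory_order_acquire);
    if (rd == w) return false;              /* empty */
    *item = r->slot[rd & (RING_CAP - 1)];
    atomic_store_explicit(&r->rpos, rd + 1, memory_order_release);
    return true;
}

int main(void) {
    static struct spsc_ring ring;           /* zero-initialized: wpos = rpos = 0 */
    int conn = 42;                          /* stand-in for a handed-off connection */
    ring_push(&ring, &conn);
    void *got = NULL;
    if (ring_pop(&ring, &got))
        printf("popped %d\n", *(int *)got);
    return 0;
}
```

For logging the items would be buffers of log bytes; for connection hand-off they would be accepted file descriptors passed from t_server to t_worker.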

  51. MEMORY

  52. WHAT WE KNOW • alloc/free causes fragmentation • internal vs external fragmentation • OOM/swapping is deadly • memory alloc/copy is relatively expensive

  53. PREDICTABLE FOOTPRINT: AVOID EXTERNAL FRAGMENTATION, CAP ALL MEMORY RESOURCES

  54. PREDICTABLE RUNTIME: REUSE BUFFERS, PREALLOCATE
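A minimal sketch of what "preallocate, cap, reuse" can look like in C (my own illustration of the idea, not Pelikan's actual allocator; names and sizes are made up): all buffers are carved out up front under a hard cap, and returned buffers go back on a free list instead of back to malloc, so there is no steady-state alloc/free churn to fragment the heap.

```c
/* preallocated, capped buffer pool sketch: no malloc/free on the hot path */
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE  4096
#define POOL_CAP  1024                       /* hard cap on total buffers */

struct buf {
    struct buf *next;                        /* free-list link */
    char data[BUF_SIZE];
};

static struct buf *pool_mem;                 /* one big allocation at startup */
static struct buf *free_list;

static void pool_init(void) {
    pool_mem = malloc(sizeof(struct buf) * POOL_CAP);
    if (pool_mem == NULL) exit(1);
    for (size_t i = 0; i < POOL_CAP; i++) {  /* thread every buffer onto the free list */
        pool_mem[i].next = free_list;
        free_list = &pool_mem[i];
    }
}

static struct buf *buf_borrow(void) {        /* O(1), never calls malloc */
    struct buf *b = free_list;
    if (b != NULL) free_list = b->next;
    return b;                                /* NULL means the cap is hit, not OOM */
}

static void buf_return(struct buf *b) {      /* O(1), never calls free */
    b->next = free_list;
    free_list = b;
}

int main(void) {
    pool_init();
    struct buf *b = buf_borrow();
    snprintf(b->data, BUF_SIZE, "hello");
    printf("%s\n", b->data);
    buf_return(b);
    return 0;
}
```

This sketch is single-threaded for brevity; a real server would keep a pool per thread or hand buffers between threads over a lockless ring like the one above.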

  55. IMPLEMENTATION PELIKAN CACHE

  56. WHAT IS PELIKAN CACHE? • (Datacenter-) caching framework • A summary of Twitter's cache ops • Perf goal: deterministically fast • Clean, modular design • Open-source: pelikan.io [architecture diagram — process layer: server, orchestration, data model, data store, parse/compose/trace, request/response; core: streams, events, pooling, channels, buffers, timer/alarm; common: waitless logging, lockless metrics, composed config, threading]

  57. PERFORMANCE DESIGN DECISIONS: A COMPARISON
                  latency-oriented   memory/         memory/          memory/               locking
                  threading          fragmentation   buffer caching   pre-allocation, cap
      Memcached   partial            internal        partial          partial               yes
      Redis       no -> partial      external        no               partial               no -> yes
      Pelikan     yes                internal        yes              yes                   no

  58. TO BE FAIR… MEMCACHED: • multiple threads can boost throughput • binary protocol + SASL
      REDIS: • rich set of data structures • RDB • master-slave replication • redis-cluster • modules • tools

  59. SCALABLE CACHE IS… ALWAYS FAST

  60. “ CAREFUL ABOUT MOVING TO MULTIPLE WORKER THREADS ”

  61. QUESTIONS?
