

  1. IN-MEMORY CACHING: CURB TAIL LATENCY WITH PELIKAN

  2. ABOUT ME • 6 years at Twitter, on cache • maintainer of Twemcache (OSS) and Twitter’s Redis fork • operated thousands of machines • served hundreds of (internal) customers • Now working on Pelikan, a next-gen cache framework to replace the above @twitter • Twitter: @thinkingfish

  3. THE PROBLEM: CACHE PERFORMANCE

  4. CACHE RULES EVERYTHING AROUND ME: SERVICE → CACHE → DB

  5. 😤 CACHE RUINS EVERYTHING AROUND ME 😤: SERVICE (SENSITIVE!) → CACHE → DB

  6. GOOD CACHE PERFORMANCE = PREDICTABLE LATENCY

  7. GOOD CACHE PERFORMANCE = PREDICTABLE TAIL LATENCY

  8. KING OF PERFORMANCE “MILLIONS OF QPS PER MACHINE” “SUB-MILLISECOND LATENCIES” “NEAR LINE-RATE THROUGHPUT” …

  9. GHOSTS OF PERFORMANCE “USUALLY PRETTY FAST” “HICCUPS EVERY ONCE IN A WHILE” “TIMEOUT SPIKES AT THE TOP OF THE HOUR” “SLOW ONLY WHEN MEMORY IS LOW” …

  10. I SPENT FIRST 3 MONTHS AT TWITTER LEARNING CACHE BASICS… …AND THE NEXT 5 YEARS CHASING GHOSTS

  11. CONTAIN GHOSTS = MINIMIZE NONDETERMINISTIC BEHAVIOR

  12. HOW? IDENTIFY AVOID MITIGATE

  13. A PRIMER: CACHING IN THE DATACENTER

  14. CONTEXT • geographically centralized • highly homogeneous network • reliable, predictable infrastructure • long-lived connections • high data rate • simple data/operations

  15. CACHE IN PRODUCTION MAINLY: REQUEST → RESPONSE INITIALLY: CONNECT ALSO (BECAUSE WE ARE ADULTS): STATS, LOGGING, HEALTH CHECK…

  16. CACHE: A BIRD’S-EYE VIEW (layers: protocol, data storage, event-driven server, on top of the host OS and the network infrastructure)

  17. HOW DID WE UNCOVER THE UNCERTAINTIES?

  18. “BANDWIDTH UTILIZATION WENT WAY UP, BUT REQUEST RATE WAY DOWN.”

  19. SYSCALLS

  20. CONNECTING IS SYSCALL-HEAVY: accept → configure → register event → read (4+ syscalls)
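The per-connection syscall sequence on this slide can be sketched for a Linux epoll-based event loop. This is an illustration of the cost, not Pelikan’s actual code; `setup_conn` is a made-up name.

```c
/* Sketch: why each new connection costs 4+ syscalls in an
 * epoll-based server. accept() itself is syscall 1; configuring
 * the socket and registering it for read events adds three more,
 * before the first read() is even issued. */
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>

int setup_conn(int epfd, int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);                  /* syscall 2 */
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)     /* syscall 3 */
        return -1;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);     /* syscall 4 */
}
```

At 1-2 µs of overhead per syscall (slide 22), a storm of 100k new connections per second spends a large share of a core on this sequence alone.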

  21. REQUEST IS SYSCALL-LIGHT: read event posted → read → parse → process → compose → write event posted → write (3 syscalls*) *: the event loop returns multiple read events at once, and I/O syscalls can be further amortized by batching/pipelining

  22. TWEMCACHE IS MOSTLY SYSCALLS • 1-2 µs overhead per call • syscalls dominate CPU time in a simple cache • What if we have 100k conns/sec?

  23. culprit: CONNECTION STORM

  24. “…TWEMCACHE RANDOM HICCUPS, ALWAYS AT THE TOP OF THE HOUR.”

  25. [diagram: cache worker thread ⏱ blocked on logging DISK I/O while an hourly cron job “x” runs]

  26. culprit: BLOCKING I/O

  27. “WE ARE SEEING SEVERAL ‘BLIPS’ AFTER EACH CACHE REBOOT…”

  28. LOCKING FACTS • ~25 ns per operation • more expensive on NUMA • much more costly when contended

  29. A TIMELINE: MEMCACHE RESTART → EVERYTHING IS FINE → (lock!) REQUESTS SUDDENLY GET SLOW / TIME OUT → CONNECTION STORM → (lock!) CLIENTS TOPPLE → SLOWLY RECOVER → (REPEAT A FEW TIMES) → … → STABILIZE

  30. culprit: LOCKING

  31. “HOSTS WITH LONG-RUNNING CACHES TRIGGER OOM WHEN LOAD SPIKES.”

  32. “REDIS INSTANCES WERE KILLED BY THE SCHEDULER.”

  33. culprit: MEMORY

  34. SUMMARY CONNECTION STORM BLOCKING I/O LOCKING MEMORY

  35. HOW TO MITIGATE?

  36. DATA PLANE, CONTROL PLANE

  37. HIDE EXPENSIVE OPS: PUT OPERATIONS OF DIFFERENT NATURE/PURPOSE ON SEPARATE THREADS

  38. SLOW: CONTROL PLANE LISTENING (ADMIN CONNECTIONS) STATS AGGREGATION STATS EXPORTING LOG DUMP

  39. FAST: DATA PLANE / REQUEST (worker thread): read event posted → read → parse → process → compose → write event posted → write

  40. FAST: DATA PLANE / CONNECT: server thread: accept → configure → dispatch; worker thread: register event → read

  41. LATENCY-ORIENTED THREADING: worker thread handles REQUESTS; server thread handles CONNECTS and hands new connections to workers; admin thread handles OTHER work (logging, stats update)
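The three-way thread split can be sketched with POSIX threads. The loop bodies are stubbed out and all names are illustrative; the point is only that slow control-plane work lives on a thread that can never stall the data plane.

```c
/* Sketch: latency-oriented threading. One thread per concern, so
 * blocking work (logging, stats export) on the admin thread cannot
 * delay request processing on the worker thread. */
#include <pthread.h>
#include <stddef.h>

static void *worker_loop(void *arg) { (void)arg; /* serve requests */ return NULL; }
static void *server_loop(void *arg) { (void)arg; /* accept + dispatch connections */ return NULL; }
static void *admin_loop(void *arg)  { (void)arg; /* logging, stats aggregation/export */ return NULL; }

/* Spawn the worker, server, and admin threads; returns 0 on success. */
int start_threads(pthread_t t[3])
{
    if (pthread_create(&t[0], NULL, worker_loop, NULL) != 0) return -1;
    if (pthread_create(&t[1], NULL, server_loop, NULL) != 0) return -1;
    if (pthread_create(&t[2], NULL, admin_loop, NULL) != 0) return -1;
    return 0;
}
```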

  42. WHAT TO AVOID?

  43. LOCKING

  44. WHAT WE KNOW • inter-thread communication in cache: stats update, logging, connection hand-off • locking propagates blocking/delay between threads

  45. LOCKLESS OPERATIONS MAKE STATS UPDATE LOCKLESS w/ atomic instructions
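The atomic-instruction approach on this slide can be sketched with C11 atomics. The struct and metric names are made up for illustration; they are not Pelikan’s actual stats module.

```c
/* Sketch: lockless stats updates. Each metric is an atomic counter;
 * any thread can bump it without taking a lock, and relaxed ordering
 * is enough because counters carry no cross-thread dependencies. */
#include <stdatomic.h>
#include <stdint.h>

struct stats {
    atomic_uint_fast64_t request_count;
    atomic_uint_fast64_t hit_count;
};

static inline void stats_incr(atomic_uint_fast64_t *metric)
{
    atomic_fetch_add_explicit(metric, 1, memory_order_relaxed);
}
```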

  46. LOCKLESS OPERATIONS MAKE LOGGING WAITLESS with a RING/CYCLIC BUFFER: the writer advances the write position, the reader advances the read position
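A minimal sketch of such a single-producer/single-consumer ring buffer, assuming C11 atomics. The capacity and names are illustrative, not Pelikan’s implementation; the same write-position/read-position structure also serves the connection hand-off on the next slide. Note the waitless property: when the ring is full the writer drops rather than blocks.

```c
/* Sketch: waitless SPSC ring buffer. The producer writes only wpos,
 * the consumer writes only rpos, so neither ever waits on the other. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAP 8u  /* power of two in practice */

struct ring {
    uint8_t buf[RING_CAP];
    _Atomic uint32_t wpos;  /* advanced only by the producer */
    _Atomic uint32_t rpos;  /* advanced only by the consumer */
};

static bool ring_put(struct ring *r, uint8_t b)
{
    uint32_t w  = atomic_load_explicit(&r->wpos, memory_order_relaxed);
    uint32_t rd = atomic_load_explicit(&r->rpos, memory_order_acquire);
    if (w - rd == RING_CAP)
        return false;                       /* full: drop, never block */
    r->buf[w % RING_CAP] = b;
    atomic_store_explicit(&r->wpos, w + 1, memory_order_release);
    return true;
}

static bool ring_get(struct ring *r, uint8_t *b)
{
    uint32_t rd = atomic_load_explicit(&r->rpos, memory_order_relaxed);
    uint32_t w  = atomic_load_explicit(&r->wpos, memory_order_acquire);
    if (w == rd)
        return false;                       /* empty */
    *b = r->buf[rd % RING_CAP];
    atomic_store_explicit(&r->rpos, rd + 1, memory_order_release);
    return true;
}
```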

  47. LOCKLESS OPERATIONS MAKE CONNECTION HAND-OFF LOCKLESS with a RING ARRAY: the writer advances the write position, the reader advances the read position

  48. MEMORY

  49. WHAT WE KNOW • alloc/free cycles cause fragmentation • internal vs external fragmentation • OOM/swapping is deadly • memory alloc/copy is relatively expensive

  50. PREDICTABLE FOOTPRINT AVOID EXTERNAL FRAGMENTATION CAP ALL MEMORY RESOURCES

  51. PREDICTABLE RUNTIME REUSE BUFFER PREALLOCATE
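The reuse/preallocate idea can be sketched as a capped buffer pool with a free list. This is a hypothetical illustration with made-up names and sizes, not Pelikan’s buffer module: one allocation at startup, a hard cap on total footprint, and no malloc/free on the hot path.

```c
/* Sketch: preallocated, capped buffer pool. Buffers are carved out
 * of a single upfront allocation and recycled via a free list, so
 * runtime cost and memory footprint are both predictable. */
#include <stddef.h>
#include <stdlib.h>

#define BUF_SIZE  1024u
#define POOL_SIZE 64u   /* hard cap on total buffer memory */

struct buf {
    struct buf *next;          /* free-list link */
    char data[BUF_SIZE];
};

static struct buf *free_list;

int pool_init(void)            /* one allocation, at startup */
{
    struct buf *slab = calloc(POOL_SIZE, sizeof(struct buf));
    if (slab == NULL)
        return -1;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        slab[i].next = free_list;
        free_list = &slab[i];
    }
    return 0;
}

struct buf *buf_borrow(void)   /* NULL when the pool is exhausted */
{
    struct buf *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

void buf_return(struct buf *b) /* recycle instead of free() */
{
    b->next = free_list;
    free_list = b;
}
```

Capping the pool turns what would be an OOM kill under load spikes (slides 31-32) into an explicit, handleable exhaustion error.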

  52. IMPLEMENTATION PELIKAN CACHE

  53. WHAT IS PELIKAN CACHE? • (Datacenter-) caching framework • A summary of Twitter’s cache ops • Perf goal: deterministically fast • Clean, modular design • Open-source: pelikan.io [architecture diagram: server process with orchestration, parse/compose/trace, cache data model, data store, request/response; core with streams, events, channels, pooling, buffers, timer/alarm; common with waitless logging, lockless metrics, composed config, threading]

  54. PERFORMANCE DESIGN DECISIONS: A COMPARISON
            | latency-oriented threading | memory/fragmentation | memory/buffer caching | memory/pre-allocation, cap | locking
  Memcached | partial                    | internal             | partial               | partial                    | yes
  Redis     | no->partial                | external             | no                    | partial                    | no->yes
  Pelikan   | yes                        | internal             | yes                   | yes                        | no

  55. TO BE FAIR… MEMCACHED: • multiple worker threads • binary protocol + SASL; REDIS: • rich set of data structures • master-slave replication • redis-cluster • modules • tools

  56. THE BEST CACHE IS… ALWAYS FAST

  57. QUESTIONS?
