AdaptSize: Orchestrating the Hot Object Memory Cache in a CDN
Daniel S. Berger, Mor Harchol-Balter, Ramesh K. Sitaraman
USENIX NSDI. Boston, March 28, 2017.
CDN Caching Architecture
[Figure: content providers, the CDN (DC and HOC), and users. The arrows next to the users are labeled 100%; the arrows next to the content providers are labeled 1%.]
Optimizing CDN Caches
Two caching levels:
❏ Disk Cache (DC)
❏ Hot Object Cache (HOC)
HOC performance metric: object hit ratio (OHR) = (# reqs served by HOC) / (total # reqs)
Goal: maximize OHR
[Figure: DC and HOC, with labels 1%, 40%, and 100%.]
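As a small illustration of the metric, the sketch below computes the OHR from a request log. The `(object_id, served_by_hoc)` record layout and the function name are assumptions made for this example, not something the slides specify.

```python
from typing import Iterable, Tuple


def object_hit_ratio(requests: Iterable[Tuple[str, bool]]) -> float:
    """OHR = (# requests served by HOC) / (total # requests)."""
    total = 0
    hits = 0
    for _object_id, served_by_hoc in requests:
        total += 1
        hits += int(served_by_hoc)
    return hits / total if total else 0.0


# Hypothetical log: three of the five requests were served by the HOC.
log = [("a", True), ("b", False), ("a", True), ("c", False), ("a", True)]
print(object_hit_ratio(log))  # 0.6
```

Note that every request counts equally here, regardless of object size.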
Prior Approaches to Cache Management
[Figure: DC and HOC. HOC labels: a few GBs capacity, 500 GB per hour. Frequent decisions required.]
                                                   What to admit            What to evict
Today in practice (e.g., Nginx, Varnish)           historically everything  LRU
2000s in academia (e.g., Modha, Zhang, Kumar)      everything               mixtures of LRU/LFU
2010s in academia (e.g., Kaminsky, Lim, Andersen)  everything               concurrent LRU
We Are Missing a Key Issue
Not all objects are the same: object sizes span 9 orders of magnitude
❏ Should we admit every object? (no, we should favor small objects)
❏ A few key companies know this (but don’t know how to do it well)
❏ Academia has not been helpful (almost all theoretical work assumes equal-sized objects)
What’s Hard About Size-Aware Admission
Fixed size threshold: admit if size < threshold c
How to pick c: pick c to maximize OHR
[Figure: curves over threshold c for traffic at 8am, 2pm, and 9pm, with the best c at 8am marked.]
The best threshold changes with traffic mix
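A minimal sketch of fixed-threshold admission: a byte-capacity LRU cache that admits an object only if its size is below c, plus a sweep that picks the c with the highest OHR. The function names and both toy traces are invented for illustration; they are not taken from the talk.

```python
from collections import OrderedDict


def lru_ohr(trace, capacity, threshold):
    """Replay (object_id, size) requests through a byte-capacity LRU cache
    that admits an object only if size < threshold; return the OHR."""
    cache = OrderedDict()  # object_id -> size, least recently used first
    used = 0
    hits = 0
    for obj, size in trace:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)  # refresh recency on a hit
            continue
        if size >= threshold or size > capacity:
            continue  # miss, and the object is not admitted
        while used + size > capacity:
            _, evicted_size = cache.popitem(last=False)  # evict LRU object
            used -= evicted_size
        cache[obj] = size
        used += size
    return hits / len(trace) if trace else 0.0


def best_threshold(trace, capacity, candidates):
    """Return the candidate threshold with the highest replayed OHR."""
    return max(candidates, key=lambda c: lru_ohr(trace, capacity, c))


# Toy trace A: two small popular objects interleaved with large one-off objects.
trace_a = [("s1", 2), ("s2", 2), ("big1", 10), ("s1", 2),
           ("s2", 2), ("big2", 10), ("s1", 2), ("s2", 2)]
# Toy trace B: a single medium-sized popular object.
trace_b = [("m", 6), ("m", 6), ("m", 6)]

print(best_threshold(trace_a, 10, [1, 5, 100]))  # 5
print(best_threshold(trace_b, 10, [1, 5, 100]))  # 100
```

On trace A, admitting everything lets each large object flush the small ones, so the tight threshold wins; on trace B the same threshold rejects the only object worth caching.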
Can We Avoid Picking a Threshold c?
Probabilistic admission: high admission probability for small objects, low admission probability for large objects
Unfortunately, there are many curves (example: exp(c) family)
Which curve we use makes a big difference
We need to adapt c
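A short sketch of probabilistic admission, assuming the "exp(c) family" means admitting an object with probability exp(-size / c), which is the form used in the AdaptSize paper. The function names and the example sizes are illustrative.

```python
import math
import random


def admission_probability(size: float, c: float) -> float:
    """Assumed exp(c) family: P(admit) = exp(-size / c)."""
    return math.exp(-size / c)


def admit(size: float, c: float, rng: random.Random) -> bool:
    """Flip a biased coin: small objects are admitted almost always,
    large objects rarely."""
    return rng.random() < admission_probability(size, c)


# How the choice of c changes the curve for a few object sizes (in bytes).
for c in (10_000, 100_000, 1_000_000):
    row = [round(admission_probability(s, c), 3) for s in (1_000, 100_000, 10_000_000)]
    print(c, row)
```

A larger c flattens the curve and admits more large objects; a smaller c makes admission behave more like a tight size threshold.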
The AdaptSize Caching System
AdaptSize: what to admit: adaptive size-aware; what to evict: concurrent LRU
First system that continuously adapts the parameter of size-aware admission (adapt with time, adapt with traffic)
Control loop: take traffic measurements, calculate the best c, enforce admission control
Incorporated into high-throughput production caching system (Varnish)
How to Find Best c Within Each Δ Interval
[Timeline: Δ interval, Δ interval, Δ interval, … over time]
Traditional approach: hill climbing. Local optima on OHR-vs-c curve.
AdaptSize approach: Markov model. Enables speedy global optimization.
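To illustrate why local optima matter, the toy below runs a neighbor-by-neighbor hill climb and a full scan over the same curve. The OHR values are invented, with two peaks on purpose, and this simple climber is only a stand-in for local search in general.

```python
def hill_climb(values, start):
    """Move to the better neighbor until no neighbor improves; return the index."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
        if not neighbors:
            return i
        best = max(neighbors, key=lambda j: values[j])
        if values[best] <= values[i]:
            return i  # stuck: no neighbor is better
        i = best


def global_search(values):
    """Evaluate every candidate and return the index of the maximum."""
    return max(range(len(values)), key=lambda j: values[j])


# Invented OHR-vs-c samples with two peaks (illustration only, not measured data).
candidate_c = [1, 2, 4, 8, 16, 32, 64]
toy_ohr = [0.10, 0.30, 0.20, 0.15, 0.25, 0.50, 0.40]

local = hill_climb(toy_ohr, start=0)
best = global_search(toy_ohr)
print(candidate_c[local], candidate_c[best])  # 2 32
```

Scanning all candidates is only practical when evaluating OHR(c) is cheap, which is what a model of the curve provides.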
How AdaptSize Gets the OHR-vs-c Curve
Markov chain:
➢ track IN/OUT for each object
[Figure: two-state chain with states IN and OUT; transitions labeled hit, miss, and request.]
Algorithm: for every Δ interval and for every value of c
❏ use Markov chain to solve for OHR(c)
❏ find c to maximize OHR
Why hasn’t this been done? Too slow: exponential state space
New technique: approximation with linear state space
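The sketch below is a simplified per-object IN/OUT model in the spirit of this slide, not the authors' algorithm. It assumes Poisson requests per object, admission probability exp(-size / c) on a miss, and an LRU cache approximated by evicting any object that goes `t` time units without a request; `t` is chosen so the expected cached bytes match the capacity. All names and the workload are hypothetical.

```python
import math


def in_cache_probability(rate, size, c, t):
    """Long-run probability that one object is IN the cache (two-state chain)."""
    p_admit = math.exp(-size / c)       # OUT -> IN on a request
    q = -math.expm1(-rate * t)          # P[next request arrives within t]: stays IN
    no_request = math.exp(-rate * t)    # P[no request for t]: IN -> OUT
    denom = no_request + p_admit * q
    return p_admit * q / denom if denom > 0.0 else 0.0


def expected_ohr(objects, capacity, c, iters=60):
    """objects: list of (request_rate, size). Returns the modeled OHR for c."""
    def bytes_in_cache(t):
        return sum(s * in_cache_probability(r, s, c, t) for r, s in objects)

    lo, hi = 0.0, 1.0
    while bytes_in_cache(hi) < capacity and hi < 1e12:
        hi *= 2.0                       # grow until the cache would be full
    for _ in range(iters):              # bisection on the eviction age t
        mid = (lo + hi) / 2.0
        if bytes_in_cache(mid) < capacity:
            lo = mid
        else:
            hi = mid
    total_rate = sum(r for r, _ in objects)
    hit_rate = sum(r * in_cache_probability(r, s, c, hi) for r, s in objects)
    return hit_rate / total_rate


def best_c(objects, capacity, candidates):
    """Global search: evaluate the model for every candidate c."""
    return max(candidates, key=lambda c: expected_ohr(objects, capacity, c))


# Hypothetical workload: ten small and ten large objects, equally popular.
objects = [(1.0, 1.0)] * 10 + [(1.0, 100.0)] * 10
for c in (1.0, 10.0, 1e9):
    print(c, round(expected_ohr(objects, capacity=10.0, c=c), 3))
print("best c:", best_c(objects, 10.0, [1.0, 10.0, 1e9]))
```

The state kept per object is constant, so the cost of one OHR(c) evaluation grows linearly with the number of objects, and every candidate c can be scored without replaying the trace.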
Implementing AdaptSize
Incorporated into Varnish: highly concurrent HOC system, 40+ Gbit/s
[Figure: DC, HOC, and the AdaptSize loop (take traffic measurements, calculate the best c, enforce admission control). Labels: 40% requests, 1% objects.]
Goal: low overhead on request path
Challenges:
1) Concurrent write conflicts
2) Locks too slow [NSDI’13 & 14]
AdaptSize: producer/consumer + ring buffer, lock-free implementation
AdaptSize: admission is really simple given c and the object size
❏ admit with P(c, size)
❏ Enables lock-free & low overhead implementation
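The producer/consumer split can be sketched as follows: request threads append measurements to a fixed-size ring buffer, and a separate tuning step drains it. Python's `collections.deque` with `maxlen` stands in for the ring buffer here (its appends and pops are thread-safe in CPython); this is not the lock-free C code inside Varnish, and the class and method names are hypothetical.

```python
import threading
from collections import deque


class RequestLog:
    """Fixed-size ring buffer between request threads and the tuner."""

    def __init__(self, capacity=4096):
        # When full, the oldest record is overwritten instead of blocking.
        self._ring = deque(maxlen=capacity)

    def record(self, object_id, size):
        """Producer side: called on the request path, never waits."""
        self._ring.append((object_id, size))

    def drain(self):
        """Consumer side: called by the tuner once per interval."""
        batch = []
        while True:
            try:
                batch.append(self._ring.popleft())
            except IndexError:  # buffer is empty
                return batch


log = RequestLog(capacity=1000)


def worker(worker_id):
    for i in range(100):
        log.record(f"obj-{worker_id}-{i}", size=1024)


threads = [threading.Thread(target=worker, args=(w,)) for w in range(4)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(len(log.drain()))  # 400 records, none dropped since 400 < capacity
```

Dropping the oldest measurements under overload keeps the request path cheap; the tuner only needs a sample of recent traffic.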
AdaptSize Evaluation Testbed
Origin: emulates 100s of web servers; 55 million / 8.9 TB unique objects
Link between origin and DC: 40 Gbit/s, 100ms RTT
DC: unmodified Varnish; 4x 1TB / 7200 RPM
HOC systems (1.2 GB, 16 threads):
❏ unmodified Varnish
❏ NGINX cache
❏ AdaptSize
Link between HOC and clients: 40 Gbit/s, 30ms RTT
Clients: replay Akamai requests trace; 440 million / 152 TB total requests
Comparison to Production Systems
           What to admit        What to evict
Varnish    everything           concurrent LRU
Nginx      frequency filter     LRU
AdaptSize  adaptive size-aware  concurrent LRU
[Figure annotations: +92%, +48%]
Comparison to Research-Based Systems
[Figure: research-based systems built on recency and frequency combinations, each labeled "manually tuned parameters"; annotation: +67%.]
Robustness of AdaptSize
Size-Aware OPT: offline parameter tuning
AdaptSize: our Markovian tuning model
HillClimb: local-search using shadow queues
Conclusion
Goal: maximize OHR of the Hot Object Cache, where OHR = (# reqs served by HOC) / (total # reqs)
Approach: size-based admission control
Key insight: need to adapt parameter c
AdaptSize: adapts c via a Markov chain
Result: 48-92% higher OHRs
In our paper:
❏ Throughput
❏ Disk utilization
❏ Byte hit ratio
❏ Request latency
/dasebe/AdaptSize