Practical Bounds on Optimal Caching with Variable Object Sizes
Daniel S. Berger, Nathan Beckmann, Mor Harchol-Balter
Carnegie Mellon University
ACM SIGMETRICS, Irvine, June 19, 2018.
Caches are Everywhere

Web browser → web cache → server in Europe: ~10x faster
Web app → DB cache → slow database: ~10-100x faster

Goal: minimize the cache miss ratio
miss ratio = (# requests not served from cache) / (total # requests)
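To make the definition concrete, here is a minimal sketch (not from the talk) that replays a request trace through a simple size-aware LRU cache and reports the miss ratio as defined above; the trace, object sizes, and the choice of LRU are illustrative assumptions.

from collections import OrderedDict

def lru_miss_ratio(trace, sizes, cache_size):
    """Replay a trace through a size-aware LRU cache (illustrative policy, not OPT)
    and return miss ratio = (# requests not served from cache) / (total # requests)."""
    cache = OrderedDict()   # object id -> size, in recency order (oldest first)
    used = 0                # bytes currently in the cache
    misses = 0
    for obj in trace:
        if obj in cache:
            cache.move_to_end(obj)          # hit: refresh recency
            continue
        misses += 1
        if sizes[obj] > cache_size:
            continue                        # object larger than the whole cache: never admit
        while used + sizes[obj] > cache_size:
            _, evicted_size = cache.popitem(last=False)   # evict least recently used
            used -= evicted_size
        cache[obj] = sizes[obj]
        used += sizes[obj]
    return misses / len(trace)

# Toy trace and made-up object sizes (hypothetical numbers).
trace = list("abcbdacdabcd")
sizes = {"a": 1, "b": 2, "c": 1, "d": 3}
print(lru_miss_ratio(trace, sizes, cache_size=4))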
Everyone Has Worked in Caching

Results on a 2016 CDN trace. Cache size: 4 GB.
[Plot: miss ratios of existing caching policies, lower is better. Where does OPT fall?]
Are we there yet?
Key question: how much further can miss ratios be improved?
Def: OPT = lowest miss ratio possible on a given trace
Defining OPT

Def: OPT = lowest miss ratio possible on a given trace
         = offline optimal miss ratio on a given trace

Example trace: a b c b d a c d a b c d

Constraints:
1. Limited cache size
2. Gets to see the full request trace
3. No prefetching ⇒ admit an object only when it is requested
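For concreteness, OPT under these constraints can be written as an integer program over caching intervals. The following is a sketch in my own notation (N requests, cache size C, object sizes s_k, and x_{k,i} = 1 if object k stays cached from its i-th request until its next request), not necessarily the exact formulation used later in the talk.

% Sketch of OPT as an interval integer program (my notation; N = # requests,
% C = cache size, s_k = size of object k, x_{k,i} = 1 iff object k stays cached
% from its i-th request to its next request).
\begin{align*}
\text{minimize}\quad & \frac{1}{N}\sum_{k}\sum_{i}\bigl(1 - x_{k,i}\bigr)
  && \text{(misses on re-requests; first requests always miss)}\\
\text{subject to}\quad & \sum_{(k,i)\;:\;t \,\in\, \text{interval}(k,i)} s_k\, x_{k,i} \;\le\; C
  && \text{for every time } t \quad \text{(limited cache size)}\\
& x_{k,i} \in \{0,1\}
  && \text{(no prefetching: cache only between consecutive requests)}
\end{align*}

Relaxing x_{k,i} ∈ {0,1} to 0 ≤ x_{k,i} ≤ 1 gives the LP relaxation that the FOO construction, described later in the talk, builds on.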
Finding OPT

What is OPT? Belady's algorithm? No! Belady assumes equal object sizes, but real object sizes vary by 9 orders of magnitude.
So, can we find OPT exactly? Unfortunately, it is NP-hard; in fact, strongly NP-complete.
Can we approximate OPT?
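For reference, here is a minimal sketch of Belady's rule (on a miss with a full cache, evict the cached object whose next request is furthest in the future). The unit-size assumption is baked in: occupancy is counted in slots, not bytes, which is exactly why this is not OPT when sizes vary. Names and the toy trace are mine.

def belady_misses(trace, cache_slots):
    """Belady / furthest-in-future for UNIT-SIZE objects."""
    # Precompute, for each position, the next occurrence of the requested object.
    next_use = [None] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = set()
    next_of = {}          # object -> index of its next request
    misses = 0
    for i, obj in enumerate(trace):
        next_of[obj] = next_use[i]
        if obj in cache:
            continue
        misses += 1
        if len(cache) >= cache_slots:
            victim = max(cache, key=lambda o: next_of[o])   # furthest in the future
            cache.remove(victim)
        cache.add(obj)
    return misses

print(belady_misses(list("abcbdacdabcd"), cache_slots=3))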
OPT Approximation Algorithms

Technique               Time Complexity   Approximation Guarantee
OFMA [STOC'97]          O(N^2)            O(log cache size)
LP rounding [SODA'99]   O(N^5)            O(log(max size / min size))
LocalRatio [JACM'01]    O(N^3)            4

The state-of-the-art 4-approximation is not practical.
OPT Approximation Algorithms

Technique                  Time Complexity   Approximation Guarantee
Best prior approximation   O(N^3)            4

Traces are not adversarial in practice ⇒ probabilistic assumptions:
- Independent Reference Model (IRM)
- Large systems: # objects and cache size grow large
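As a reminder of what the IRM assumes, here is a tiny sketch that generates a synthetic IRM trace: every request is an independent draw from one fixed object-popularity distribution. The Zipf popularity and the parameters are my illustrative choices, not something the slide specifies.

import random

def irm_trace(num_objects, num_requests, zipf_alpha=0.8, seed=0):
    """Independent Reference Model: each request is an i.i.d. draw from a
    fixed popularity distribution over objects (Zipf chosen here for illustration)."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** zipf_alpha) for rank in range(1, num_objects + 1)]
    objects = list(range(num_objects))
    return rng.choices(objects, weights=weights, k=num_requests)

trace = irm_trace(num_objects=10_000, num_requests=100_000)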
Our Main Result

Technique                    Time Complexity   Approximation Guarantee
Best prior approximation     O(N^3)            4
Flow-offline optimum (FOO)   O(N^2 log^2 N)    1

Assumptions (traces are not adversarial in practice): Independent Reference Model (IRM); large systems (# objects and cache size grow large).
On a trace with strong correlations: error < 0.14%.
How does FOO attain OPT for large systems?

How to get OPT fast:
OPT → detailed ILP (NP-hard) → interval ILP (NP-hard) → interval LP relaxation (Ω(N^3.5)) → FOO: min-cost flow graph (O(N^2 log^2 N))

Example trace: a b c b a
Decision variables (DVs) for object a: x_(a,1), x_(a,2), x_(a,3), x_(a,4), x_(a,5)
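To illustrate the last step of this pipeline, here is a sketch of turning a trace into a min-cost flow instance in the spirit of FOO; the exact graph is in the paper, and this is my reading of the construction. One node per request; "inner" edges between consecutive requests with capacity equal to the cache size and cost 0, representing bytes held in cache; and, for each reuse interval of object k, an "outer" edge of capacity s_k and cost about 1/s_k per unit, so routing the interval's s_k units over it costs about one miss. The code uses networkx with integer-scaled costs and assumes the trace never repeats the same object back-to-back.

import networkx as nx

def foo_style_flow_graph(trace, sizes, cache_size, cost_scale=10**6):
    """Build a min-cost flow instance for offline caching, in the spirit of FOO.
    Assumes no immediate repeats of the same object in the trace."""
    G = nx.DiGraph()
    n = len(trace)
    G.add_nodes_from(range(n), demand=0)
    # Inner edges: flow on (i, i+1) = bytes held in cache between requests i and i+1.
    for i in range(n - 1):
        G.add_edge(i, i + 1, capacity=cache_size, weight=0)
    # Outer edges: one per reuse interval (request i of object k -> its next request j).
    last = {}
    for j, obj in enumerate(trace):
        if obj in last:
            i, s = last[obj], sizes[obj]
            # Cost ~ cost_scale / s per unit, so routing all s units here costs ~ cost_scale (one miss).
            G.add_edge(i, j, capacity=s, weight=max(1, cost_scale // s))
            G.nodes[i]["demand"] -= s   # negative demand: this node supplies s units
            G.nodes[j]["demand"] += s   # positive demand: this node must receive s units
        last[obj] = j
    return G

trace = list("abcba")              # toy trace from the slide
sizes = {"a": 3, "b": 1, "c": 4}   # made-up sizes
G = foo_style_flow_graph(trace, sizes, cache_size=5)
flow_cost, flow = nx.network_simplex(G)
print(flow_cost / 10**6)           # ~ minimum number of misses among reuse intervals

Because flow can split between inner and outer edges, this solves the LP relaxation rather than the ILP, i.e., it gives a lower bound on misses; the next slide's argument is about why such fractional splits almost never matter in large IRM systems.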
How does FOO attain OPT for large systems?

How to prove FOO's correctness:
Non-integer decision vars (DVs) can occur → a precedence relation forces integer DVs → coupon collector problem → DVs are integer almost surely under large IRM systems
Our Main Result

Technique                    Time Complexity   Approximation Guarantee
Best prior approximation     O(N^3)            4
Flow-offline optimum (FOO)   O(N^2 log^2 N)    1

Computable on traces with up to 10^7 requests.
Empirical Results

Key question: how much further can miss ratios be improved? Are we there yet?
[Plot: miss ratios of existing policies vs. the FOO bound, lower is better. The best existing policy could be optimal… or not: FOO reveals a 30% gap to OPT.]
Results for Other Configurations

Key question: how much further can miss ratios be improved?

Gap between the best existing policy and OPT:
              CDN       WebApp    Storage
Small cache   30% gap   61% gap   15% gap
Large cache   45% gap   51% gap   41% gap
Conclusion

Technique                    Time Complexity   Approximation Guarantee
Best prior approximation     O(N^3)            4
Flow-offline optimum (FOO)   O(N^2 log^2 N)    1

We can actually do even better: O(N log N).
Implication: large potential for new caching policies ⇒ e.g., 60% improvement possible for WebApps.
Practical Bounds on Optimal Caching with Variable Object Sizes
Daniel S. Berger, Nathan Beckmann, Mor Harchol-Balter
Carnegie Mellon University
Source code and data: available at /dasebe/optimalwebcaching
ACM SIGMETRICS, Irvine, June 19, 2018.