efficient miss ratio curve computation for heterogeneous
play

Efficient Miss Ratio Curve Computation for Heterogeneous Content - PowerPoint PPT Presentation

Efficient Miss Ratio Curve Computation for Heterogeneous Content Popularity Damiano Carra Giovanni Neglia Computer Science Dept. Universit Cte dAzur University of Verona Inria Verona, Italy Sophia Antipolis, France Context q Caches


  1. Efficient Miss Ratio Curve Computation for Heterogeneous Content Popularity Damiano Carra Giovanni Neglia Computer Science Dept. Université Côte d’Azur University of Verona Inria Verona, Italy Sophia Antipolis, France

  2. Context q Caches are fundamental components of computing architectures – Fast access to the most used items q Shared resource – Used by different processes or applications, with different requirements and access patterns q Main issue – How to divide and assign dynamically cache portions to applications? 2

  3. Miss Ratio Curves – MRCs q Profiling – Hit ratio vs amount of cache space for each application q Use – Compute the MRCs for each application for a given interval – Assign cache space that maximize some objective function q Main concern: Computational complexity 3

  4. Current approaches q MRC are computed from the trace q Exact computation requires: – O(M) memory – O(logM) computational complexity q Approximate computation – Trade precision with space and speed • O(1) memory and O(1) operation per request – Most of the solutions are based on sampling 4

  5. Spatial sampling q Approach: – Observe a randomly selected fraction R of the items – Build the MRC and scale the X-axis by 1/R q R can be adaptive – This allows to obtain O(1) memory and O(1) computational complexity 5

  6. Sampling bias q Spatial sampling biased if requests rates are highly heterogenous across objects – The sample may or may not include some objects that are crucial for the MRC q Solutions to such bias focus on the MRC tail – A correction factor accounts for the difference between the expected and actual average observed requests 6

  7. Experiments on synthetic traces q IRM traces, with Zipf-distributed object popularity – Various ⍺ (Zipf parameter) α = 0.8 α = 1.2 1.2 1.2 MRC 1 1 sample1 sample2 0.8 0.8 Miss ratio Miss ratio sample3 sample4 0.6 0.6 MRC sample1 0.4 0.4 sample2 sample3 0.2 0.2 sample4 0 0 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 Cache size (num. of items) Cache size (num. of items) 7

  8. [Detour] How to measure the accuracy? q Mean Absolute Error (MAE) – Average of the absolute difference between the exact and approximate MRC q Main issue à All sizes have the same weight ms-ex ms-ex 1 1 0.8 0.8 Miss ratio Miss ratio 0.6 0.6 0.4 0.4 0.2 0.2 approximate MRC approximate MRC exact MRC exact MRC 0 0 10 0 10 1 10 2 10 3 10 4 10 5 10 6 0 0.5 1 1.5 2 2.5 Cache size (num. of items x 10 6 ) Cache size (num. of items) 8

  9. [Detour] MAE per Quantile à MAEQ ms-ex q Divide the Y-axis into intervals 1 0.8 – And determine the corresponding intervals Miss ratio on the X-axis 0.6 q Compute the MAE i within each interval i 0.4 0.2 approximate MRC q MAEQ = average MAE over all intervals exact MRC 0 10 0 10 1 10 2 10 3 10 4 10 5 10 6 q In this example: Cache size (num. of items) – MAE = 0.025 – MAEQ = 0.09 9

  10. Accuracy with synthetic traces q As ⍺ increases (high variability in the popularities) à the error increases q Does the error depend on the tail of the popularity distribution? 10 0 R = 0.1 Average error (MAEQ) R = 0.01 10 -1 R = 0.001 10 -2 10 -3 10 -4 0.6 0.8 1.0 1.2 0.6p Parameter α of the Zipf 10

  11. The role of popular objects q Case with ⍺ = 0.6 à We add 20 popular objects IRM α = 0.6, with popular items 10 0 1.4 Zipf, 1.2 Zipf, 0.6 10 -1 1.2 0.6p 10 -2 1 request freq. Miss ratio 10 -3 0.8 10 -4 0.6 10 -5 0.4 MRC sample1 10 -6 sample2 0.2 sample3 sample4 10 -7 0 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 10 0 10 1 10 2 10 3 10 4 10 5 Item ID Cache size (num. of items) q The results are confirmed by a model (see the paper) 11

  12. Our solution: Key idea A B C B D q Small cache sizes depend Requests mostly on the highly popular Reuse Distance Histogram items Exact From samples hits hits q Large cache sizes can be built from sample 1 size size B R Exact MRC Approximate MRC 1 1 q Our approach à Combine 1 R exact and approximate MRCs B size B size Full MRC – Can adopt a “constant 1 sampling rate” or “constant complexity” approach B size 12

  13. Experimental methodology q Comparison with the state-of-the-art solutions – We set the parameters aiming at fair comparison • Use the same amount of memory q CPU overheads – With our scheme, the CPU usage is on average 10% higher than state-of-the-art solutions 13

  14. Results: IRM α = 0.6, with popular items α = 1.2 10 0 1 1 R = 0.1 MRC Average error (MAEQ) R = 0.01 sample1 0.8 0.8 10 -1 R = 0.001 sample2 Miss ratio Miss ratio sample3 0.6 0.6 sample4 10 -2 MRC 0.4 0.4 sample1 10 -3 sample2 0.2 0.2 sample3 sample4 10 -4 0 0 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 7 0.6 0.8 1 1.2 0.6p Parameter α of the Zipf Cache size (num. of items) Cache size (num. of items) q Accurate for any size – MAEQ always below 1% 14

  15. Results: IRM, sensitivity to B α = 0.6, with popular items 1 10 0 B = 0 (SHARDS adj ) Average error (MAEQ) 0.95 B = 32 10 -1 B = 64 0.9 Miss ratio B = 125 B = 250 MRC 0.85 10 -2 B = 500 B = 32 B = 64 0.8 B = 125 10 -3 B = 250 0.75 B = 500 0.7 10 -4 10 0 10 1 10 2 10 3 10 4 10 5 1.2 0.6p Cache size (num. of items) Parameter α of the Zipf q Error mainly where the curves join 15

  16. Real-world traces q Traces from different sources, with different characteristics systor CDN 10 -1 10 -1 10 -2 10 -2 request freq. request freq. 10 -3 10 -3 α = 0.7 α = 1.3 10 -4 10 -4 α = 1.1 10 -5 10 -5 10 -6 10 -6 10 -7 10 -7 10 0 10 1 10 2 10 3 10 4 10 5 10 0 10 1 10 2 10 3 10 4 10 5 Item ID Item ID 16

  17. Real-world traces: results systor systor 10 0 1 1 SHARDS adj Average error (MAEQ) our mixed approach 0.8 0.8 10 -1 Miss ratio Miss ratio 0.6 0.6 MRC MRC 10 -2 0.4 0.4 sample1 sample1 sample2 sample2 0.2 0.2 sample3 sample3 10 -3 sample4 sample4 0 0 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 -4 Cache size (num. of items) Cache size (num. of items) ms-exms-dev systor CDN fiu CDN CDN 1 1 Trace ID 0.8 0.8 Miss ratio Miss ratio 0.6 0.6 MRC MRC 0.4 0.4 sample1 sample1 sample2 sample2 0.2 0.2 sample3 sample3 sample4 sample4 0 0 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 0 10 1 10 2 10 3 10 4 10 5 10 6 Cache size (num. of items) Cache size (num. of items) 17

  18. Extension (1/2) q “Non-stack” algorithms – Eviction policies that do not satisfy the inclusion property – Need to compute the MRC by points – Our approach: • use high R with small cache sizes, decrease R as the cache size increases 18

  19. Extension (2/2) q Heterogenous item size – Web caches deal with items with heterogenous size – Can we build the MRC in such a case? à Order statistics tree – Does sampling work in this case? CDN, het. sizes CDN, het. sizes 1 1 0.8 0.8 Miss ratio Miss ratio 0.6 0.6 MRC MRC 0.4 0.4 sample1 sample1 sample2 sample2 0.2 0.2 sample3 sample3 sample4 sample4 0 0 10 0 10 1 10 2 10 3 10 4 10 5 10 6 10 0 10 1 10 2 10 3 10 4 10 5 10 6 19 Cache size (MB) Cache size (MB)

  20. Conclusions and perspectives q Build a MRC from samples requires a careful design – Highly popular items have a significant impact – Instead of the tail of the distribution, the head is important q In our approach we combine an exact MRC with an approximate one – Improving the accuracy of the final result q Future works – Online adaptation of the scheme parameters 20

  21. Thanks! For any question, you can reach me at

Recommend


More recommend