Demystifying Cache Policies for Photo Stores at Scale: A Tencent Case Study
Ke Zhou, Si Sun, Hua Wang, Ping Huang, Xubin He, Rui Lan, Wenyan Li, Wenjie Liu, Tianming Yang
Huazhong University of Science and Technology; Key Laboratory of Information Storage System; Intelligent Cloud Storage Joint Research Center of HUST and Tencent; Tencent Inc.; Temple University; Huanghuai University

Outline
◼ Background
◼ The failure of cache policies
◼ Motivation
◼ Prefetching
◼ Performance
◼ Conclusion

Background
◼ More than 250 million photos are uploaded to QQphoto every day
◼ Total photo views per day approach 50 billion
◼ QQphoto faces critical challenges in dealing with such huge amounts of photos
  • user experience (needs lower latency)
  • backend storage burden (needs lower traffic)

The photo cache architecture

Upload and download

Upload channel
◼ Directly write to backend storage
  • original photo: the photo a user uploads
  • physical photo: the original photo and the photos resized from it, which differ in format or specification
  • logical photo: a photo set containing several physical photos that share the same content
[Figure: Users/Apps, upload channel, Backend Storage System with a resize mechanism]

Two-tier cache
[Figure: two-tier cache serving Users/Apps, with the tier this study delves into marked]

Outside cache
  • 9 days of logs
  • >5.8 billion requests
  • >801 million logical photos
  • >1.5 billion physical photos
  • total data size >46 TB
  • total network traffic >186 TB
◼ Sampling based on logical photos
  • Extract all logical photos in the logs
  • Randomly sample the logical photos at a ratio of 1:100
  • Extract the log entries containing the sampled logical photos

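The three sampling steps can be sketched as follows. This is a minimal illustration, not the authors' tooling: the record layout `(timestamp, logical_id, physical_id)`, the function name and the fixed seed are all assumptions made for the example.

```python
import random

def sample_by_logical_photo(records, ratio=100, seed=0):
    """Keep every record whose logical photo falls in a 1:ratio random sample.

    records: list of (timestamp, logical_id, physical_id) tuples (assumed layout).
    """
    # Step 1: extract all distinct logical photos that appear in the logs.
    logical_ids = sorted({rec[1] for rec in records})
    if not logical_ids:
        return []
    # Step 2: randomly sample the logical photos at 1:ratio (at least one).
    k = max(1, len(logical_ids) // ratio)
    chosen = set(random.Random(seed).sample(logical_ids, k))
    # Step 3: extract the log records that mention a sampled logical photo.
    return [rec for rec in records if rec[1] in chosen]
```

Sampling on logical photos rather than on individual requests keeps the full request history of every sampled photo, so per-photo statistics such as frequency and reuse distance are preserved.
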
Advanced algorithms fail
◼ ARC, MQ and S3LRU are almost identical and show negligible improvements over LRU
[Figure notes: X is the cache capacity in production. Belady is the theoretically optimal algorithm.]

Advanced algorithms fail
◼ Phenomenon:
  • Higher-frequency photos contribute more to HR (hit ratio)
  • Conversely, lower-frequency photos are more difficult to hit
[Figure: CDFs of photo reuse distance, grouped by photo frequency]

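A small sketch of how reuse distances could be grouped by photo frequency. The slides do not define reuse distance precisely, so this assumes the simplest variant: the number of requests issued between two consecutive accesses to the same photo. The function name is hypothetical.

```python
from collections import Counter, defaultdict

def reuse_distances_by_frequency(trace):
    """Map each photo frequency to the reuse distances observed at that frequency.

    trace: list of photo ids in request order.
    """
    freq = Counter(trace)          # total accesses per photo
    last_seen = {}                 # photo id -> position of its previous access
    grouped = defaultdict(list)
    for pos, photo in enumerate(trace):
        if photo in last_seen:
            # Count the requests strictly between the two accesses (assumed definition).
            grouped[freq[photo]].append(pos - last_seen[photo] - 1)
        last_seen[photo] = pos
    return dict(grouped)
```

Sorting each list and plotting its cumulative fraction gives one CDF per frequency group.
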
Advanced algorithms fail
[Figure: stacked percentage bars for PoP, PoR and CtoHR, broken down by photo frequency f into the groups f = 1, f = 2, 2 < f ≤ 5, 5 < f ≤ 10, 10 < f ≤ 100, 100 < f ≤ 1000, 1000 < f ≤ 10000 and f > 10000]
PoP: percentage of photos
PoR: percentage of requests
CtoHR: contribution to hit ratio
$\mathrm{CtoHR} = \dfrac{\text{access times in group} - \text{num of photos in group}}{\text{access times in trace}}$
(subtracting the number of photos in the group removes the compulsory misses)

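The three metrics can be computed per frequency group as sketched below. The group boundaries follow the figure legend; the function names and the return layout are assumptions made for illustration.

```python
from collections import Counter

# Upper bounds of the frequency groups in the legend; the last group is f > 10000.
BOUNDS = [1, 2, 5, 10, 100, 1000, 10000]

def group_of(f):
    """Return the index of the frequency group that frequency f belongs to."""
    for i, bound in enumerate(BOUNDS):
        if f <= bound:
            return i
    return len(BOUNDS)

def breakdown(trace):
    """Return a list of (PoP, PoR, CtoHR) tuples, one per frequency group."""
    if not trace:
        raise ValueError("trace must not be empty")
    freq = Counter(trace)
    n_groups = len(BOUNDS) + 1
    photos = [0] * n_groups      # number of photos in each group
    accesses = [0] * n_groups    # number of requests in each group
    for f in freq.values():
        g = group_of(f)
        photos[g] += 1
        accesses[g] += f
    total_photos, total_requests = len(freq), len(trace)
    return [
        (
            photos[g] / total_photos,                      # PoP
            accesses[g] / total_requests,                  # PoR
            (accesses[g] - photos[g]) / total_requests,    # CtoHR
        )
        for g in range(n_groups)
    ]
```

Summing CtoHR over all groups gives the hit ratio of an infinitely large cache, since only the first access to each photo misses.
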
Hit ratio contribution breakdown
[Figure: CDF of CtoHR; y-axis: hit ratio; series: LRU (reaching 67.90%) and Belady (reaching 76.82%)]
At a cache capacity of X, the hit ratio of LRU is 67.9%
At infinite cache capacity, the hit ratio of Belady is 76.82%

Hit ratio contribution breakdown
[Figure: CDF of CtoHR; y-axis: hit ratio; series: LRU (reaching 67.90%) and Belady (reaching 76.82%)]
◼ To improve the hit ratio, low-frequency photos must be hit
◼ Advanced algorithms do not optimize for low-frequency data, so they fail to improve the hit ratio

Cache size is too large
$\text{average uploaded photos per day} = \dfrac{\text{total data size}}{\text{num of days}} = \dfrac{46\,\text{TB}}{9} \approx 5.1\,\text{TB}$
◼ The cache capacity in production is 5 TB! The cache is large enough to hold roughly all the data uploaded in a day!

Motivation
[Figure: stacked percentage bars for PoP, PoR and CtoHR by photo frequency group, from f = 1 to f > 10000]
◼ “Cold” photos (freq ≤ 5) account for the vast majority. Can we leverage those cold photos? Yes: turn their compulsory misses into hits!

Immediacy
◼ Hint: the “immediacy” of social networks
  • Recently uploaded photos are more likely to be requested by users
[Figure: CDF of the interval between a photo's upload time and its first request time]

Immediacy
◼ More than 90% of photos are requested at least once within 1 day of being uploaded
  • If uploaded photos are placed into the cache in time, their compulsory misses are eliminated → Prefetching
  • The prefetching is very efficient.

Prefetching
[Figure annotations: ~73% and >99% of compulsory misses eliminated by prefetching]
◼ Prefetching
  • If uploaded photos are prefetched every 1 second, nearly all of their compulsory misses are eliminated
  • If they are prefetched every 10 min, about 73% of their compulsory misses are eliminated
  • ……

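The relationship between prefetch interval and eliminated compulsory misses can be estimated from upload and first-request times. The sketch below is a simplified model, not the authors' measurement: it assumes the prefetcher fires at exact multiples of the interval and that a prefetch completes instantly. The function name and input layout are hypothetical.

```python
import math

def eliminated_fraction(photos, interval):
    """Fraction of photos whose first request arrives after they were prefetched.

    photos: list of (upload_time, first_request_time) pairs, in seconds.
    interval: prefetch interval in seconds.
    """
    if not photos:
        return 0.0
    eliminated = 0
    for upload, first_request in photos:
        # First trigger at or after the upload (triggers at 0, interval, 2*interval, ...).
        next_trigger = math.ceil(upload / interval) * interval
        if first_request >= next_trigger:
            eliminated += 1   # the photo is already cached when first requested
    return eliminated / len(photos)
```

Under this model a shorter interval can only raise the fraction, which matches the trend reported on the slide.
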
Prefetch architecture
[Figure: the Prefetcher is an isolated module beside the OC and is triggered periodically; figure labels: “Request”, “Request, pass to DC”]

Which resolution to prefetch
◼ What to prefetch
  • Original photos are resized into physical photos of various resolutions (Rez1, Rez2, Rez3, …)
  • Prefetch the needed resolutions
We know what content (logical photos) the users need, but we do not know which resolutions (physical photos) they need!
A logical photo contains several physical photos

Which resolution to prefetch
◼ Problem:
  • Which resolution will be requested is unknown
◼ Intuition:
  • Prefetch the more popular resolutions
  • The more resolutions are prefetched, the higher the chance of eliminating compulsory misses

Which resolution to prefetch
◼ If the frequency of Rez1 > Rez2, Rez1 has higher priority to be prefetched
◼ NPR (number of prefetching resolutions)
  • Controls how many resolutions are prefetched
  • E.g. NPR = 2 indicates prefetching both Rez1 and Rez2

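The popularity rule with NPR (number of prefetching resolutions) amounts to a top-k selection. A minimal sketch, assuming resolution popularity is estimated from a list of recently requested resolution labels; the function name and the tie-breaking by label are assumptions.

```python
from collections import Counter

def resolutions_to_prefetch(requested_resolutions, npr):
    """Return the npr most frequently requested resolutions, most popular first."""
    counts = Counter(requested_resolutions)
    # Higher frequency first; ties are broken by label so the result is deterministic.
    ranked = sorted(counts, key=lambda res: (-counts[res], res))
    return ranked[:npr]
```

For example, with a history in which Rez1 is requested more often than Rez2, `npr=2` selects both Rez1 and Rez2, in that order.
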
When to prefetch
◼ When to prefetch
  • How long after uploading should a prefetch be performed?
[Figure: timeline starting at the uploading time, with candidate prefetching points marked “here? or here? or here?”]

Prefetching Scheduling
◼ QQphoto is a 24x7 online service
◼ Prefetching should also run online
  • Triggered periodically, at a fixed prefetching interval
  • On each prefetch, all photos uploaded during the last period are prefetched
[Figure: timeline on which each prefetching trigger fetches the photos uploaded during the preceding period]

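Periodic triggering can be pictured as bucketing uploads by the trigger that will pick them up. This is only a sketch: it assumes integer timestamps in seconds and triggers at exact multiples of the interval, and the function name is hypothetical.

```python
def prefetch_batches(uploads, interval):
    """Group uploaded photos by the trigger time at which they are prefetched.

    uploads: list of (upload_time, photo_id) with integer upload times in seconds.
    Returns {trigger_time: [photo_id, ...]}.
    """
    batches = {}
    for upload_time, photo_id in sorted(uploads):
        # A photo uploaded in ((k-1)*interval, k*interval] is fetched at k*interval.
        trigger = -(-upload_time // interval) * interval  # integer ceiling division
        batches.setdefault(trigger, []).append(photo_id)
    return batches
```

Each batch holds exactly the photos uploaded since the previous trigger, so no upload is prefetched twice or skipped.
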
Inserting to cache queue
◼ Insert prefetched photos into the cache queue in the same way as ordinary newly inserted photos
[Figure: LRU queue running from the MRU end to the LRU end; labels: “Insert new photos”, “Insert prefetched photos”]

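A minimal LRU sketch in which prefetched photos take the same insertion path as photos inserted on a miss. It assumes capacity is counted in photos rather than bytes and that new photos enter at the MRU end, as in a standard LRU queue; the class and method names are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = OrderedDict()  # first item = LRU end, last item = MRU end

    def _insert_mru(self, key):
        # Place the key at the MRU end, then evict from the LRU end if over capacity.
        self.queue[key] = True
        self.queue.move_to_end(key)
        while len(self.queue) > self.capacity:
            self.queue.popitem(last=False)

    def request(self, key):
        """Serve a user request; return True on a hit, False on a miss."""
        hit = key in self.queue
        self._insert_mru(key)
        return hit

    def prefetch(self, key):
        """Insert a prefetched photo exactly like a photo inserted on a miss."""
        if key not in self.queue:
            self._insert_mru(key)
```

A photo that was prefetched before its first request is then served as a hit instead of a compulsory miss.
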
Evaluation
◼ Setup:
  • a simulator
  • replaying the trace
  • warming up: first 5 days
  • collecting statistics: last 4 days
  • evaluating FIFO, LRU, S3LRU, Belady (offline optimal)
  • prefetching:
    • NPR: 1-8
    • prefetching interval: 1 sec, 10 min, 1 hour

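The warm-up and measurement split can be sketched as a trace replay that updates the cache on every request but only counts requests after the warm-up days. This sketch covers plain LRU without prefetching and counts capacity in photos; the trace layout `(day_index, photo_id)` with 0-based days and the function name are assumptions, not the authors' simulator.

```python
from collections import OrderedDict

def replay_lru(trace, capacity, warmup_days=5):
    """Replay a time-ordered trace through LRU; return the post-warm-up hit ratio.

    trace: iterable of (day_index, photo_id), with day_index starting at 0.
    """
    cache = OrderedDict()   # first item = LRU end, last item = MRU end
    hits = total = 0
    for day, photo in trace:
        hit = photo in cache
        if hit:
            cache.move_to_end(photo)           # refresh recency on a hit
        else:
            cache[photo] = True                # insert at the MRU end on a miss
            if len(cache) > capacity:
                cache.popitem(last=False)      # evict from the LRU end
        if day >= warmup_days:                 # statistics only after warm-up
            total += 1
            hits += hit
    return hits / total if total else 0.0
```

With a 9-day trace and `warmup_days=5`, days 0 to 4 only populate the cache and days 5 to 8 contribute to the reported hit ratio.
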
Hit ratio: NPR impact
Algorithms = LRU, NPR = 1, …, 8, interval = 10 min
A larger NPR yields a higher hit ratio

Hit ratio: prefetch interval impact
Algorithms = LRU, NPR = 3, interval = 1 s, 10 min, 1 h
[Figure annotation: exceeds Belady]
A shorter prefetch interval yields a higher hit ratio

Latency
NPR = 1, …, 8, interval = 10 min
Latency falls as the hit ratio rises
Pro: higher NPR → higher HR → lower latency

Network Traffic
NPR = 1, …, 8, interval = 10 min
Con: increasing NPR results in a huge growth in network traffic.

Latency and Network Traffic Trade-off
◼ Best NPR?
  • Pro: lower latency
  • Con: more network traffic
[Figure: network traffic and latency trade-offs at cache capacity of X; x-axis: NPR (1 to 8); y-axis: percentage; series: network traffic and latency; the best choices are marked]
The best choices reduce latency by 6.9% while consuming 4.14% extra network resources.

Resolution Popularity Evolution
The distribution of resolution popularity remains stationary.

Optimal Prefetch Interval
◼ A short interval is good
  • Pro: helps raise the hit ratio
  • Con: implies frequent prefetching
    • which affects the online caching service
◼ No single interval is consistently optimal on a time-varying workload
  • the maximum hit ratio loss should not exceed 1%
  • bias between actual interval and real time
  • interval = 10 min turns out to be an appropriate choice, with a hit ratio loss of 0.95%

Conclusion
◼ The large cache capacity is why advanced cache policies fail to improve the hit ratio
◼ Social networks exhibit “immediacy”
◼ Prefetching leverages this “immediacy” to improve the hit ratio
  • Latency is cut by an average of 6.9% at the cost of only 4.14% additional network cost.

Thank you & Questions