Serving Photos at Scaaale: Caching and Storage


  1. Serving Photos at Scaaale: Caching and Storage. Papers: An Analysis of Facebook Photo Caching (Huang et al.); Finding a Needle in a Haystack (Beaver et al.). Vlad Niculae for CS6410. Most slides from Qi Huang (SOSP 2013) and Peter Vajgel (OSDI 2010).

  2. Dynamic content (hard to cache; TAO). Static content (photos, normally easy to cache).

  4. Dynamic content (hard to cache; TAO). Static content (photos, normally easy to cache): CDN.

  5. An Analysis of Facebook Photo Caching. Qi Huang, Ken Birman, Robbert van Renesse (Cornell), Wyatt Lloyd (Princeton, Facebook), Sanjeev Kumar, Harry C. Li (Facebook)

  6. Dynamic content (hard to cache; TAO). Static content (photos, normally easy to cache): CDN. “Normal” site CDN hit rate: ~99%.

  7. Dynamic content (hard to cache; TAO). Static content (photos, normally easy to cache): CDN. “Normal” site CDN hit rate: ~99%. For Facebook, CDN hit rate: ~80%.

  8. Cache Layers; Storage Backend

  9. Cache layers: Client (Browser Cache) → Edge Cache → Facebook Datacenter (Origin Cache → Backend). Akamai Cache is also shown.

  10. Cache layers: Client (Browser Cache) → Edge Cache → Facebook Datacenter (Origin Cache → Backend). Akamai: no access.

  11. Cache layers: Edge Cache (points of presence). Independent FIFO caches. Main goal: reduce bandwidth.

  12. Cache layers: Origin Cache. Coordinated FIFO. Main goal: traffic sheltering.

  13. Cache layers: Origin Cache. Coordinated FIFO, with requests routed by hash. Main goal: traffic sheltering.
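
The "hash" label on the Origin layer suggests that every Edge maps a given photo to the same Origin server, so the Origin servers behave like one coordinated cache. A minimal sketch of that idea follows; the server names, the `origin_for` function, and the choice of SHA-256 are illustrative assumptions, not details from the slides.

```python
import hashlib

# Hypothetical Origin Cache hosts; the real deployment is not described here.
ORIGIN_SERVERS = ["origin-0", "origin-1", "origin-2", "origin-3"]


def origin_for(photo_id: str, servers=ORIGIN_SERVERS) -> str:
    """Map a photo id to one Origin server using a stable hash.

    A stable hash (unlike Python's built-in hash() for strings, which is
    randomized per process) gives the same answer on every machine.
    """
    digest = hashlib.sha256(photo_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    for photo in ["photo-1", "photo-2", "photo-3"]:
        print(photo, "->", origin_for(photo))
```

Because the mapping depends only on the photo id, each photo is cached on at most one Origin server, which is what lets the layer shelter the Backend. Plain modulo hashing remaps most keys when the server list changes; a production system would likely need something like consistent hashing.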

  14. Analyze traffic in production! Instrument client JS. Log successful requests. Correlate across layers.

  15. Sampling on a power-law distribution (figure axis: object rank) • Object-based sampling: fair coverage of unpopular content • Sample: 1.4M photos, 2.6M photo objects
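
Object-based sampling can be sketched as a deterministic test on the photo id: a photo is either always in the sample or never in it, regardless of how often it is requested. The function name `is_sampled`, the per-10,000 rate parameter, and the use of SHA-256 are assumptions made for illustration.

```python
import hashlib


def is_sampled(photo_id: str, per_10k: int) -> bool:
    """Return True if this photo belongs to the sample.

    per_10k is the sampling rate in units of 1/10,000 (1_000 means 10%).
    The decision depends only on the photo id, so every layer that logs
    requests picks the same photos and traces can be correlated.
    """
    digest = hashlib.sha256(photo_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < per_10k


if __name__ == "__main__":
    # Hypothetical request log: one hot photo and many cold ones.
    log = ["hot"] * 1000 + [f"cold-{i}" for i in range(1000)]
    kept = [photo for photo in log if is_sampled(photo, 1_000)]
    print(len(set(kept)), "distinct photos kept,", len(kept), "requests kept")
```

Sampling requests instead of objects would almost always capture the hot photo and rarely see any individual cold one, which is why the slide stresses fair coverage of unpopular content.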

  16. Data analysis

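  17. Requests, hit ratio, and traffic share by layer • Browser Cache (client): 77.2M requests, 65.5% hit ratio, 65.5% traffic share • Edge Cache (PoP): 26.6M requests, 58.0% hit ratio, 20.0% traffic share • Origin Cache (data center): 11.2M requests, 31.8% hit ratio, 4.6% traffic share • Backend (Haystack): 7.6M requests, 9.9% traffic share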
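
The traffic shares on this slide follow from the per-layer hit ratios: each layer only sees the requests that missed every layer in front of it. A short worked calculation, using the slide's numbers (the function name `traffic_shares` is just an illustrative choice):

```python
def traffic_shares(total, layers):
    """Compute the fraction of all requests served by each layer.

    layers is an ordered list of (name, hit_ratio) pairs; whatever misses
    every cache is served by the Backend.
    """
    remaining = total
    shares = {}
    for name, hit_ratio in layers:
        served = remaining * hit_ratio   # hits at this layer
        shares[name] = served / total
        remaining -= served              # misses flow to the next layer
    shares["Backend"] = remaining / total
    return shares


if __name__ == "__main__":
    layers = [("Browser", 0.655), ("Edge", 0.580), ("Origin", 0.318)]
    for name, share in traffic_shares(77.2, layers).items():
        print(f"{name}: {share:.1%}")
```

With the slide's hit ratios this gives roughly 65.5%, 20.0%, 4.6%, and 9.9%, matching the reported traffic shares.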

  18. (Figure; axis: object rank)

  19. Popularity Distribution • Backend resembles a stretched exponential distribution.

  20. Popularity Impact on Caches • Backend (Haystack) serves the tail (figure annotation: 70%)

  21. Hit rates for each level (fig 4c). Figure: y-axis 0 to 100; x-axis groups A to G; series: Browser, Edge, Origin.

  22. What if?

  23. Edge Cache with Different Sizes • Picked San Jose edge (high traffic, median hit ratio) • Matching the “Infinite” cache needs 45x the current capacity (figure annotations: Infinite Cache, 68%, 65%, 59%)

  24. Edge Cache with Different Algos • Both LRU and LFU outperform FIFO slightly (figure reference line: Infinite Cache)
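
To make the FIFO versus LRU comparison concrete, here is a minimal trace-driven sketch. It assumes unit-sized objects and counts object hits only; the function names are illustrative, and the real study replays production traces with real photo sizes.

```python
from collections import OrderedDict, deque


def fifo_hit_ratio(trace, capacity):
    """FIFO: evict the object that entered the cache first; hits do not reorder."""
    cache, order, hits = set(), deque(), 0
    for key in trace:
        if key in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.discard(order.popleft())
            cache.add(key)
            order.append(key)
    return hits / len(trace)


def lru_hit_ratio(trace, capacity):
    """LRU: evict the object that was used least recently; hits refresh recency."""
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # drop the least recently used
            cache[key] = True
    return hits / len(trace)


if __name__ == "__main__":
    trace = ["a", "b", "a", "c", "a", "d", "a"]
    print("FIFO:", fifo_hit_ratio(trace, 2))
    print("LRU: ", lru_hit_ratio(trace, 2))
```

On this toy trace with capacity 2, FIFO gets 2 hits out of 7 and LRU gets 3 out of 7, because LRU keeps the repeatedly requested object "a" while FIFO evicts it by insertion age.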

  25. S4LRU • Cache space divided into levels L3, L2, L1, L0 (diagram label: More Recent)
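
S4LRU (quadruply-segmented LRU) as described by Huang et al. splits the cache into four LRU queues: a miss inserts at the head of level 0, a hit promotes the item to the head of the next level up, and overflow from a level falls to the head of the level below (overflow from level 0 leaves the cache). The sketch below assumes unit-sized objects and equal-sized levels; the class and attribute names are illustrative.

```python
from collections import OrderedDict


class S4LRU:
    """Segmented LRU with four levels, each holding capacity // 4 items."""

    def __init__(self, capacity: int, levels: int = 4):
        if capacity < levels:
            raise ValueError("capacity must be at least the number of levels")
        # Any remainder of capacity // levels is left unused in this sketch.
        self.per_level = capacity // levels
        # In each OrderedDict the last entry is the head (most recent).
        self.queues = [OrderedDict() for _ in range(levels)]
        self.level_of = {}  # key -> level it currently lives in

    def access(self, key) -> bool:
        """Return True on a hit, False on a miss; update the cache either way."""
        hit = key in self.level_of
        if hit:
            old = self.level_of[key]
            del self.queues[old][key]
            target = min(old + 1, len(self.queues) - 1)  # promote one level
        else:
            target = 0                                   # new items start at L0
        self._insert(key, target)
        return hit

    def _insert(self, key, level):
        self.queues[level][key] = True
        self.level_of[key] = level
        # An overfull level pushes its tail down one level, possibly cascading.
        while level >= 0 and len(self.queues[level]) > self.per_level:
            victim, _ = self.queues[level].popitem(last=False)
            if level == 0:
                del self.level_of[victim]                # evicted from the cache
            else:
                self.queues[level - 1][victim] = True
                self.level_of[victim] = level - 1
            level -= 1


if __name__ == "__main__":
    cache = S4LRU(capacity=8)
    trace = ["a", "b", "a", "c", "d", "a", "e", "b"]
    print([cache.access(k) for k in trace])
```

The design intent is that an object must be hit several times to reach the upper levels, so one-time requests cycle through level 0 without displacing popular content.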

  26. Edge Cache with Different Algos • S4LRU improves the most (figure annotations: Infinite Cache, 68%, 59%, 1/3x)

  27. Edge Cache with Different Algos • Clairvoyant => room for algorithmic improvement (figure reference line: Infinite Cache)
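
A clairvoyant policy knows the future trace and evicts the object whose next request is farthest away (Belady's rule). The sketch below assumes unit-sized objects, which is a simplification: with variable photo sizes this rule is a strong baseline rather than a guaranteed optimum. The function name is illustrative.

```python
import heapq


def clairvoyant_hit_ratio(trace, capacity):
    """Replay a trace, evicting the cached key that is needed farthest in the future."""
    inf = float("inf")
    # next_use[i] = index of the next request for trace[i], or infinity.
    next_use = [inf] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], inf)
        last_seen[trace[i]] = i

    cached = {}  # key -> index of its next request
    heap = []    # (-next request, key); may contain stale entries
    hits = 0
    for i, key in enumerate(trace):
        if key in cached:
            hits += 1
        elif len(cached) >= capacity:
            # Skip stale heap entries until one matches the cache contents.
            while True:
                neg_next, victim = heapq.heappop(heap)
                if cached.get(victim) == -neg_next:
                    del cached[victim]
                    break
        cached[key] = next_use[i]
        heapq.heappush(heap, (-next_use[i], key))
    return hits / len(trace)


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "c"]
    print(clairvoyant_hit_ratio(trace, 2))
```

On the cyclic trace above with capacity 2, LRU and FIFO would miss every request, while the clairvoyant policy gets 2 hits out of 6. The gap between such an offline bound and the online algorithms is what the slide calls room for algorithmic improvement.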

  28. Origin Cache • S4LRU improves Origin more than Edge (figure annotations: Infinite Cache, 14%)

  29. Geographic Coverage of Edge • Small working set

  30. Geographic Coverage of Edge • Atlanta has 80% of requests served by remote Edges. Not uncommon! • Atlanta requests by serving Edge: 35% D.C., 20% Miami, 20% local, 10% Chicago, 5% NYC, 5% California, 5% Dallas

  31. Geographic Coverage of Edge • Amplified working set

  32. Collaborative Edge

  33. Collaborative Edge • “Collaborative” Edge increases hit ratio by 18%
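
The effect of a collaborative Edge can be illustrated with a toy comparison: independent Edges each cache their own copy of a photo, while a collaborative Edge behaves like one cache with the combined capacity. The sketch assumes FIFO eviction, unit-sized objects, and capacities of at least 1; the class and function names are illustrative.

```python
from collections import deque


class FifoCache:
    """Fixed-capacity FIFO cache of unit-sized objects (capacity >= 1)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = set()
        self.order = deque()

    def access(self, key) -> bool:
        if key in self.keys:
            return True
        if len(self.keys) >= self.capacity:
            self.keys.discard(self.order.popleft())
        self.keys.add(key)
        self.order.append(key)
        return False


def independent_hit_ratio(requests, n_edges, capacity_per_edge):
    """requests is a list of (edge_index, photo_id); each Edge caches alone."""
    edges = [FifoCache(capacity_per_edge) for _ in range(n_edges)]
    hits = sum(edges[edge].access(photo) for edge, photo in requests)
    return hits / len(requests)


def collaborative_hit_ratio(requests, n_edges, capacity_per_edge):
    """All Edges share one logical cache with the combined capacity."""
    shared = FifoCache(n_edges * capacity_per_edge)
    hits = sum(shared.access(photo) for _, photo in requests)
    return hits / len(requests)


if __name__ == "__main__":
    # The same photo requested through two different Edges in turn.
    requests = [(0, "a"), (1, "a"), (0, "a"), (1, "a")]
    print(independent_hit_ratio(requests, 2, 1))
    print(collaborative_hit_ratio(requests, 2, 1))
```

In this toy case the independent Edges each take a cold miss for the same photo (2 hits out of 4), while the shared cache misses only once (3 hits out of 4). This ignores the cost of cross-Edge traffic, which a real deployment would have to weigh.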

  34. What Facebook Could Do: • Improve cache algorithm (+invest in cache algo research) • Coordinate Edge caches • Let some phones resize their own photos • Use more machine learning at this layer!

  35. Backend storage for blobs • Some requests are bound to miss the caches. • Reads >> writes >> deletes. • Writes often come in batches (photo albums). • In this regime, Facebook found that default solutions did not work.
