Semantics of Caching with SPOCA: A Stateless, Proportional, Optimally-Consistent Addressing Algorithm. Piotr Skowron, November 6, 2011. Outline: Context of the problem; Requirements for routing; Hashing algorithm; Geographic distribution; Experiments.


  1. Semantics of Caching with SPOCA: A Stateless, Proportional, Optimally-Consistent Addressing Algorithm. Piotr Skowron, November 6, 2011.

  2. Abstract. This presentation covers: how Yahoo! efficiently serves millions of videos from its video library; what architecture it uses to ensure efficient caching (and thus a significant improvement in quality of service); and how its new algorithm reduced disk cache misses from 5% to less than 1% and increased memory cache hits from 45% to 80% (thus improving overall cache hits from 95% to 99.6%).

  3. The way it works. Clients access videos using web browsers. Clients connect to front-end servers, which serve the video content. The front-end servers cache content but are not the permanent repository. Videos are stored in a storage farm that is accessible through the front-end servers.

  4. The way it works (diagram).

  5. Local or remote serving. The videos are spread around the world, so when a user accesses a video one of the following happens: the content is retrieved from the storage farm; if it is cached on the disk of a front-end server, it can be served more efficiently; in the best case, the content is cached in the memory of a front-end server. For videos, the difference between caching in memory and caching on disk is small.

  7. Local or remote serving. Retrieving content from the storage farm not only makes delivery significantly slower but also puts more load on the back-end infrastructure. This leads to: a higher cost of upgrading networking components; more servers required in the storage farm to handle the extra load.

  8. Why is caching difficult for video? Videos are large: a typical front-end server can hold 500 unique videos in memory and 100,000 on disk. Demand is high: users make over 30,000,000 requests per day for over 800,000 unique videos, and there are over 20,000,000 unique videos in the library. The ratio of total to unique requests is low.
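
The figures on this slide can be turned into a rough back-of-the-envelope calculation. The sketch below only restates the slide's numbers as lower bounds ("over" is dropped); the variable names are illustrative and do not come from the presentation.

```python
# Figures quoted on the slide, treated as exact for the sake of the estimate.
requests_per_day = 30_000_000
unique_videos_requested = 800_000
memory_capacity = 500        # unique videos one front-end server holds in memory
disk_capacity = 100_000      # unique videos one front-end server holds on disk

# Average number of requests per distinct video per day.
avg_requests_per_video = requests_per_day / unique_videos_requested

# Fraction of the daily working set that fits on one server.
disk_fraction = disk_capacity / unique_videos_requested
memory_fraction = memory_capacity / unique_videos_requested

print(avg_requests_per_video)  # 37.5
print(disk_fraction)           # 0.125
print(memory_fraction)
```

Under these assumptions a single server's disk covers only one eighth of the videos requested in a day, so caching works well only if requests for the same video keep arriving at the same server.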

  9. The traditional VIP (Virtual IP) (diagram).

  10. The traditional VIP (Virtual IP). With a VIP, each piece of popular content ends up on multiple servers. This redundant caching is highly inefficient compared to caching where each piece of content is kept on a single server. On the other hand, remembering which front-end server hosts which content is expensive. The question of the day is: how to increase caching efficiency through intelligent routing without remembering content locations in a database?

  11. Stateless. There should be no need to keep a data catalog associating each content file with a particular front-end server.

  12. Consistent. Requests for a particular content file should all be directed to the same server. For stateless addressing, the inputs are a filename and a list of currently available servers; the output is a server from the list. Consistency means that the same input always produces the same output.

  13. Proportional. Requests should be partitioned among the front-end servers in proportion to the servers' weights. Example: a newer server might have twice the capacity of an older one and should therefore serve twice as large a portion of the content library. Remark: the proportionality requirement rules out distance-based consistent hashing algorithms, although such algorithms are consistent and stateless.
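
As a small illustration of the proportionality requirement, the sketch below computes the share of the content library each server should receive from its weight. The function name `proportional_shares` and the server names are hypothetical; only the "twice the capacity" example comes from the slide.

```python
from fractions import Fraction

def proportional_shares(weights):
    """Return each server's target share of the library as an exact fraction."""
    total = sum(weights.values())
    return {server: Fraction(w, total) for server, w in weights.items()}

# The slide's example: a newer server with twice the capacity of an older one.
shares = proportional_shares({"older": 1, "newer": 2})
print(shares["older"], shares["newer"])  # 1/3 2/3
```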

  14. Proportional. Proportionality must also hold when a server is added or removed. The requests must not be redistributed only among the nearest servers; therefore distance-based hashing algorithms fail this requirement.

  15. Optimally-consistent. When a front-end server enters or leaves the pool, as few files as possible are redistributed to the other servers. Example: suppose that a pool has three front-end servers of weight 100 and two servers of weight 200. If a new server of weight 200 is added to the pool, it must be assigned 2/9 of the files in the content library. More specifically, from each of the other servers it must take over 2/9 of the files that server was handling.
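
The 2/9 figure in this example can be checked with exact arithmetic. The sketch below uses the weights from the slide; the variable names are illustrative only.

```python
from fractions import Fraction

old_weights = [100, 100, 100, 200, 200]   # pool before the change
new_weight = 200                          # server being added

old_total = sum(old_weights)              # 700
new_total = old_total + new_weight        # 900

# Share of the whole library the new server must receive.
new_server_share = Fraction(new_weight, new_total)

# Fraction of its own files that each existing server gives up:
# its share shrinks from w/700 to w/900, so it keeps 7/9 and hands over the rest.
given_up = [1 - Fraction(w, new_total) / Fraction(w, old_total) for w in old_weights]

print(new_server_share)  # 2/9
print(given_up[0])       # 2/9
```

Every existing server gives up the same fraction regardless of its weight, which is why the new server's 2/9 of the library is drawn evenly from the whole pool.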

  16. Optimally-consistent (diagram).

  17. Proportionality once again. Note that a proportional distribution of the files does not necessarily mean a proportional distribution of the requests.

  18. Hashing algorithm. The routing function uses a hash function to map file names to points in a hash space. Each front-end server is assigned a portion of the hash space proportional to its capacity. Not every point in the hash space maps to a front-end server, so when the hash of the name of a requested video lands in unassigned space, the result of the hash function is hashed again until the result lands in an assigned portion of the hash space.
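
The routing function described on this slide might look roughly like the sketch below. It is not the implementation from the presentation: the use of MD5, the 32-bit hash space, the end-to-end segment layout, the `max_rehash` guard, and all names (`assign_segments`, `route`, `fe1` and so on) are assumptions made for illustration.

```python
import hashlib

HASH_SPACE = 2**32  # assumed size of the hash space

def assign_segments(weights, unit):
    """Lay servers out end to end; each gets weight * unit points of the space."""
    segments, start = [], 0
    for server, weight in weights.items():
        end = start + weight * unit
        segments.append((start, end, server))
        start = end
    if start > HASH_SPACE:
        raise ValueError("assigned segments exceed the hash space")
    return segments

def route(filename, segments, max_rehash=10_000):
    """Hash the file name; rehash the result until it lands in an assigned segment."""
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    for _ in range(max_rehash):
        point = int.from_bytes(digest[:4], "big")  # point in [0, HASH_SPACE)
        for lo, hi, server in segments:
            if lo <= point < hi:
                return server
        digest = hashlib.md5(digest).digest()      # unassigned space: hash again
    raise RuntimeError("no assigned segment reached")

UNIT = HASH_SPACE // 2000                          # weight 100 -> about 5% of the space
weights = {"fe1": 100, "fe2": 100, "fe3": 200}
segments = assign_segments(weights, UNIT)

# A new server takes a segment in previously unassigned space.
grown = assign_segments({**weights, "fe4": 200}, UNIT)

print(route("video123.flv", segments))
print(route("video123.flv", grown))
```

Because existing segments do not move when `fe4` is added, the sequence of points visited for a given file name is unchanged; a file either keeps its old server or is captured by the new one, and it never moves between two old servers. Leaving most of the space unassigned is what makes room for new servers without touching the old layout.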

  19. Unassigned space hit (diagram).

  20. Leaving server (diagram).

  21. Entering server (diagram).
