Optimizing Flash Allocation to Workloads in Google's Colossus File System

  1. Optimizing Flash Allocation to Workloads in Google's Colossus File System
     Christoph Albrecht, Arif Merchant, Murray Stokely, Muhammad Waliji, Francois Labelle, Nate Coehlo, Xudong Shi, C. Eric Schrock
     Google Storage Analytics Team
     Model-driven algorithms and architectures for self-aware computing systems, Dagstuhl Seminar 15041, January 2015

  2. Motivation: Trend of HDD and SSD
     ● Disk drives (HDDs) are slow, and they are getting larger but not faster.
     ● Flash (SSD) offers a much higher I/O rate but is expensive.
     [Figure: IOPS and capacity of SSD and HDD of equal cost; axes: IOPS and capacity]

  3. Workloads
     ● Thousands of users and applications (indexing, ad serving, email, video processing, ...).
     ● Each application has many component jobs, which often have separate data.

  4. Janus System (Flash tiering): Insertion on Write, Approximate FIFO
     [Diagram: new files → Q1: flash or disk? Files assigned to flash enter a FIFO in flash (Q2a: how much flash? Q2b: how long?) and then move to disk; the rest go to disk.]
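
The insertion-on-write, FIFO policy can be illustrated with a small simulator. This is a minimal sketch under stated assumptions, not the Janus implementation: the class name `FifoFlashTier`, the per-workload `quota_bytes` and `ttl_seconds` parameters (standing in for the answers to Q2a and Q2b), and the assumption that every file name is written only once are all hypothetical. Eviction here is exact FIFO, whereas the slide describes an approximate FIFO.

```python
from collections import OrderedDict


class FifoFlashTier:
    """Hypothetical model of one workload's flash tier.

    New files are inserted into flash on write and leave flash in FIFO order,
    either when they are older than the TTL (Q2b) or when the flash quota (Q2a)
    is full. Evicted files are recorded as living on disk.
    Assumes each file name is written only once.
    """

    def __init__(self, quota_bytes, ttl_seconds):
        self.quota_bytes = quota_bytes
        self.ttl_seconds = ttl_seconds
        self.files = OrderedDict()  # name -> (write_time, size), oldest first
        self.used_bytes = 0
        self.on_disk = set()

    def _evict_oldest(self):
        name, (_, size) = self.files.popitem(last=False)
        self.used_bytes -= size
        self.on_disk.add(name)  # the file now lives on disk only

    def expire(self, now):
        # Evict files whose age exceeds the TTL, oldest first.
        while self.files:
            write_time, _ = next(iter(self.files.values()))
            if now - write_time <= self.ttl_seconds:
                break
            self._evict_oldest()

    def write(self, name, size, now):
        self.expire(now)
        if size > self.quota_bytes:
            self.on_disk.add(name)  # can never fit in this quota
            return
        # Make room by evicting the oldest files first (FIFO).
        while self.used_bytes + size > self.quota_bytes:
            self._evict_oldest()
        self.files[name] = (now, size)
        self.used_bytes += size

    def read_hits_flash(self, name, now):
        self.expire(now)
        return name in self.files
```

An `OrderedDict` keeps files in write order, so both TTL expiry and capacity eviction simply pop from the front.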

  5. Janus System: Offline Analysis and Optimization
     ● Input data collection: a scan of metadata gives the age of bytes stored, and sampled RPC analysis gives the age of data accessed (both statistics by workload).
     ● Characterization of each workload: cacheability functions (hit rate curves).
     ● Global optimization: flash or disk (Q1), amount of flash (Q2a), time in flash, TTL (Q2b).
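
The data-collection step amounts to grouping two kinds of records by workload and by data age. A minimal sketch, assuming hypothetical record formats (one `(workload, age_seconds, size_bytes)` tuple per file from the metadata scan, one `(workload, age_seconds)` tuple per sampled read RPC), uniform sampling at a known `sampling_rate`, and fixed-width age buckets; none of these names come from the slides.

```python
from collections import defaultdict


def workload_statistics(file_records, sampled_reads, sampling_rate, bucket_seconds):
    """Aggregate the two per-workload statistics (hypothetical formats).

    file_records:  iterable of (workload, age_seconds, size_bytes), one per file
                   found by the metadata scan ("age of bytes stored").
    sampled_reads: iterable of (workload, age_seconds), the age of the data
                   touched by each sampled read ("age of data accessed").
    sampling_rate: fraction of read RPCs that were sampled, e.g. 0.01.

    Returns two dicts keyed by workload, each mapping an age bucket index
    (age // bucket_seconds) to bytes stored or to the estimated read count.
    """
    bytes_by_age = defaultdict(lambda: defaultdict(float))
    reads_by_age = defaultdict(lambda: defaultdict(float))
    for workload, age, size in file_records:
        bytes_by_age[workload][int(age // bucket_seconds)] += size
    for workload, age in sampled_reads:
        # Each sampled read stands for 1 / sampling_rate actual reads.
        reads_by_age[workload][int(age // bucket_seconds)] += 1.0 / sampling_rate
    return (
        {w: dict(buckets) for w, buckets in bytes_by_age.items()},
        {w: dict(buckets) for w, buckets in reads_by_age.items()},
    )
```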

  6. Constructing the Cacheability Function
     For a given amount of flash, how many read operations can be absorbed by flash if we store the youngest data in flash?
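
One way to answer this question for a single workload is to sweep an age threshold from young to old: keeping all data younger than the threshold in flash costs the cumulative bytes of those age buckets and absorbs the cumulative reads to them. The sketch below assumes the statistics are already bucketed by age (bucket 0 = youngest) in plain dicts; the function name and input format are hypothetical.

```python
def cacheability_curve(bytes_by_bucket, reads_by_bucket):
    """Return (flash_bytes, reads_absorbed) points for one workload.

    Both arguments map an age bucket index (0 = youngest) to the bytes stored
    or reads observed in that bucket. Point k+1 describes keeping every file
    from buckets 0..k in flash: the flash it needs and the reads it absorbs.
    """
    last_bucket = max([*bytes_by_bucket, *reads_by_bucket], default=-1)
    points = [(0.0, 0.0)]
    flash_bytes = reads_absorbed = 0.0
    for k in range(last_bucket + 1):
        # Admit the next-oldest age bucket into flash.
        flash_bytes += bytes_by_bucket.get(k, 0.0)
        reads_absorbed += reads_by_bucket.get(k, 0.0)
        points.append((flash_bytes, reads_absorbed))
    return points
```

Because the sweep admits the youngest data first, each point corresponds to a TTL, which links the amount of flash (Q2a) to the time in flash (Q2b).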

  7. Cacheability Function
     Most read operations go to very young data, which makes up only a small fraction of the total data size.
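
To quantify this for a given workload, one can invert its cacheability curve: find the smallest amount of flash that absorbs a target fraction of reads, interpolating linearly between points. A minimal sketch; the function name is hypothetical, and the toy curve in the usage lines is illustrative only, not data from the slides.

```python
def flash_for_read_fraction(points, target_fraction):
    """Smallest flash amount that absorbs target_fraction of all reads.

    points: (flash_bytes, reads_absorbed) pairs starting at (0, 0), with both
    coordinates non-decreasing; the last point covers all data and all reads.
    Interpolates linearly between consecutive points.
    """
    target = target_fraction * points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 >= target:
            if y1 == y0:
                return x0  # target already reached at x0
            return x0 + (x1 - x0) * (target - y0) / (y1 - y0)
    return points[-1][0]


# Toy curve (illustrative numbers only): 80% of reads go to the youngest 1% of data.
toy = [(0, 0), (10, 80), (100, 95), (1000, 100)]
print(flash_for_read_fraction(toy, 0.8) / toy[-1][0])  # fraction of all data needed
```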

  8. Optimizing the Flash Allocation for Workloads
     Instance:
     ● Workloads with cacheability functions
     ● Total flash capacity
     ● Bound on write rate
     Task:
     ● Allocate flash to workloads to maximize the weighted flash read rate.
     Solution method:
     ● Lagrangian relaxation + Linear programming (assuming concave and piecewise linear cacheability function)
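
The linear-programming core becomes simple when only the flash-capacity constraint is kept: with concave, piecewise-linear cacheability functions, the LP is a fractional knapsack, so filling segments in order of decreasing weighted read rate per byte is optimal. This sketch deliberately omits the write-rate bound and the Lagrangian relaxation mentioned on the slide; the `Workload` type, the `allocate_flash` name, and the breakpoint format are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical input: one workload and its cacheability function."""
    name: str
    weight: float
    # Breakpoints (flash_bytes, reads_per_sec) of a concave, piecewise-linear
    # cacheability function: starts at (0, 0), flash_bytes strictly increasing.
    points: list


def allocate_flash(workloads, capacity):
    """Maximize total weighted flash read rate subject to total flash <= capacity.

    Capacity constraint only (no write-rate bound). For concave piecewise-linear
    functions this LP is a fractional knapsack: take segments in order of
    decreasing weighted read rate per byte.
    """
    segments = []
    for w in workloads:
        for (x0, y0), (x1, y1) in zip(w.points, w.points[1:]):
            length = x1 - x0
            gain_per_byte = w.weight * (y1 - y0) / length
            segments.append((gain_per_byte, length, w.name))
    # Stable sort; concavity means each workload's segments are taken in order.
    segments.sort(key=lambda s: s[0], reverse=True)

    allocation = {w.name: 0.0 for w in workloads}
    remaining, value = capacity, 0.0
    for gain_per_byte, length, name in segments:
        if remaining <= 0 or gain_per_byte <= 0:
            break
        take = min(length, remaining)
        allocation[name] += take
        value += gain_per_byte * take
        remaining -= take
    return allocation, value
```

For example, with capacity 35, workload `a` with points `[(0, 0), (10, 100), (40, 130)]` and workload `b` with points `[(0, 0), (20, 60), (50, 75)]` (both weight 1) receive 15 and 20 units of flash, for a weighted read rate of 165.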

  9. Does it work? Comparing alternative methods
     Flash hit rate by allocation method:
     Allocation method            Cell A (low workload variance)   Cell B (high workload variance)
     Optimized                    28%                              74%
     Proportional to read rate    26%                              64%
     Single FIFO                  19%                              42%
     Proportional to data size    14%                              15%
     The slide additionally marks 47% (Cell A) and 76% (Cell B).
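
The additional 47% and 76% figures on this slide appear to be the relative improvement of Optimized over Single FIFO in each cell. A quick arithmetic check, reading the slide's hit rates as Optimized 28% / 74% and Single FIFO 19% / 42% (Cell A / Cell B); the `HIT_RATES` table and `relative_gain` helper are illustrative names, not from the source.

```python
# Flash hit rates as read from slide 9: (Cell A, Cell B).
HIT_RATES = {
    "Optimized": (0.28, 0.74),
    "Proportional to read rate": (0.26, 0.64),
    "Single FIFO": (0.19, 0.42),
    "Proportional to data size": (0.14, 0.15),
}


def relative_gain(method, baseline, cell):
    """Relative hit-rate improvement of method over baseline in one cell (0 = A, 1 = B)."""
    return HIT_RATES[method][cell] / HIT_RATES[baseline][cell] - 1.0


for cell, label in enumerate(["Cell A", "Cell B"]):
    gain = relative_gain("Optimized", "Single FIFO", cell)
    print(f"{label}: Optimized vs. Single FIFO: +{gain:.0%}")
```

This prints +47% for Cell A and +76% for Cell B, matching the two extra figures.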

  10. Takeaway messages
      ● Specific: Flash is cost-effective for cloud-scale storage if used selectively.
      ● Broader: It is feasible to use large-scale historical trace data for automated online configuration.
