  1. Storage Fabric CS6453

  2. Summary  Last week: NVRAM is going to change the way we think about storage.  Today: Challenges for storage layers (SSDs, HDs) built to handle massive amounts of data.  Slowdowns in HDs and SSDs.  Enforcing policies for IO operations in Cloud architectures.

  3. Background: Storage for Big Data  One disk is not enough to handle massive amounts of data.  Last time: Efficient datacenter networks built from a large number of cheap commodity switches.  Solution: Efficient IO performance using a large number of commodity storage devices.

  4. Background: RAID 0  Achieves Nx performance, where N is the number of Disks.  Is this for free?  When N becomes large, the probability of a Disk failure becomes large as well.  RAID 0 does not tolerate failures.
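
A hedged back-of-the-envelope on why this is not free, assuming independent Disks that each fail with probability p over some period (both p and N below are illustrative):

Pr[RAID 0 array loses data] = 1 - (1 - p)^N

For example, with p = 2% per year and N = 20 Disks, this gives roughly 1 - 0.98^20 ≈ 33% per year, compared to 2% for a single Disk.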

  5. Background: RAID 1  Achieves (K-1)-fault tolerance with Kx the Disks.  Is this for free?  It needs Kx more Disks (e.g., to tolerate 1 failure you need 2x more Disks than RAID 0).  RAID 1 does not utilize resources in an efficient way.

  6. Background: Erasure Codes  Achieve K-fault tolerance with N+K Disks.  Efficient utilization of Disks (not as great as RAID 0).  Fault tolerance (not as great as RAID 1).  Is this for free?  Reconstruction cost: the # of Disks that must be read to recover data in case of failure(s).  RAID 6 has a reconstruction cost of 3.

  7. Modern Erasure Code Techniques  Erasure Coding in Windows Azure Storage [Huang, 2012]  Exploit point: Pr[1 failure] ≫ Pr[2 or more failures].  Solution: Construct an erasure code with low reconstruction cost for 1 failure.  1.33x storage overhead (relatively low).  Tolerates up to 3 failures in 16 storage devices.  Reconstruction cost of 6 for 1 failure and 12 for 2+ failures.
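
A minimal sketch of the local-reconstruction idea behind these numbers, assuming an LRC(12, 2, 2)-style layout (12 data fragments in two local groups of 6, one local parity per group, plus 2 global parities = 16 fragments, i.e. 16/12 ≈ 1.33x overhead); this illustrates where the reconstruction costs come from, not the exact Azure construction:

```python
DATA_PER_GROUP = 6   # data fragments per local group
NUM_GROUPS = 2       # two local groups -> 12 data fragments in total

def reconstruction_cost(failed_fragment: str) -> int:
    """Number of surviving fragments that must be read to rebuild one fragment."""
    if failed_fragment.startswith(("data", "local")):
        # Common case (single failure of a data or local-parity fragment):
        # repair inside the local group by reading its other 6 fragments.
        return DATA_PER_GROUP
    # Rare case (global parity lost, or multiple failures): fall back to a
    # full decode over all 12 data fragments.
    return DATA_PER_GROUP * NUM_GROUPS

print(reconstruction_cost("data_3"))    # 6  -> cheap, frequent case
print(reconstruction_cost("global_0"))  # 12 -> expensive, rare case
```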

  8. The Tail at Store: Problem  We have seen how failures are handled with reconstruction. What about slowdowns in HDs (or SSDs)?  A slowdown of a disk (with no failures) might have a significant impact on overall performance.  Questions:  Do HDs or SSDs exhibit transient slowdowns?  Are disk slowdowns frequent enough to affect overall performance?  What causes slowdowns?  How do we deal with slowdowns?

  9. The Tail at Store: Study  Each RAID group consists of data drives (D ... D) plus parity drives (P, Q).

                              Disk           SSD
    #RAID groups              38,029         572
    #Data drives per group    3-26           3-22
    #Data drives              458,482        4,069
    Total drive hours         857,183,442    7,481,055
    Total RAID hours          72,046,373     1,072,690

  10. The Tail at Store: Slowdowns?  Metrics are computed from the hourly average I/O latency of each drive.  Slowdown: S = L / L_median, where L is a drive's hourly average latency and L_median is the median latency across the drives of its RAID group.  Tail: T = S_max, the largest slowdown within the RAID group.  Slow drive: S ≥ 2.  For Disks: S ≥ 2 at the 99.8th percentile and S ≥ 1.5 at the 99.3rd percentile; T ≥ 2 at the 97.8th percentile and T ≥ 1.5 at the 95.2nd percentile.  SSDs exhibit even more slowdowns.  [Figure: CDF of slowdown per drive (Disk)]
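
A minimal sketch of these two metrics, assuming we already have the hourly average latency of every drive in one RAID group (the numbers are made up):

```python
from statistics import median

def slowdowns(hourly_latencies: list[float]) -> list[float]:
    """Per-drive slowdown S = L / L_median within one RAID group for one hour."""
    l_median = median(hourly_latencies)
    return [latency / l_median for latency in hourly_latencies]

def tail(hourly_latencies: list[float]) -> float:
    """Tail T = S_max, the largest slowdown within the RAID group."""
    return max(slowdowns(hourly_latencies))

# Example: one drive is ~3x slower than its peers during this hour.
latencies_ms = [4.1, 3.9, 4.0, 12.3, 4.2]
print(slowdowns(latencies_ms))  # the 4th drive has S = 3.0
print(tail(latencies_ms))       # T = 3.0 -> this group contains a slow (S >= 2) drive
```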

  11. The Tail at Store: Duration?  Slowdowns are transient.  40% of HD slowdowns last ≥ 2 hours.  12% of HD slowdowns last ≥ 10 hours.  Many slowdowns happen in consecutive hours (they last longer).  [Figure: CDF of slowdown interval in hours, Disk vs. SSD]

  12. The Tail at Store: Correlation between slowdowns in the same drive?  90% of Disk slowdowns occur within 24 hours of another slowdown of the same Disk.  More than 80% of SSD slowdowns occur within 24 hours of another slowdown of the same SSD.  Slowdowns of the same drive happen relatively close to each other.  [Figure: CDF of inter-arrival period between slowdowns in hours, Disk vs. SSD]

  13. The Tail at Store: Causes?  Rate imbalance: RI = (drive I/O rate) / (median I/O rate across the RAID group).  Rate imbalance does not seem to be the main cause of slowdowns for slow Disks.  [Figure: CDF of RI within hours where S ≥ 2, Disk vs. SSD]

  14. The Tail at Store: Causes?  Size imbalance: ZI = (drive I/O size) / (median I/O size across the RAID group).  Size imbalance does not seem to be the main cause of slowdowns for slow Disks.  [Figure: CDF of ZI within hours where S ≥ 2, Disk vs. SSD]
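
A minimal sketch of the imbalance metrics, computed the same way as slowdown (the numbers are made up):

```python
from statistics import median

def imbalance(per_drive_values: list[float]) -> list[float]:
    """Each drive's value divided by the group median; used both for rate
    imbalance (RI, per-drive I/O rate) and size imbalance (ZI, per-drive I/O size)."""
    group_median = median(per_drive_values)
    return [value / group_median for value in per_drive_values]

# If a slow (S >= 2) drive also had RI or ZI far above 1x, uneven load would
# explain its slowdown; the study finds that this is usually not the case.
io_rates_iops = [510, 495, 502, 508, 499]
print(imbalance(io_rates_iops))  # all close to 1x -> the load is balanced
```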

  15. The Tail at Store: Causes?  Drive age seems to have some correlation with slowdowns, but the correlation is not strong.  [Figure: CDF of slowdown vs. drive age (Disk)]

  16. The Tail at Store: Causes?  No correlation of slowdowns with the time of day (0:00-24:00).  No explicit drive events around slow hours.  Unplugging disks and plugging them back in does not particularly help.  There are significant differences between SSD vendors.

  17. The Tail at Store: Solutions  Create Tail-Tolerant RAIDs: treat slow disks as failed disks.  Reactive:  Detect slow Disks as those that take much longer to answer (>2x the other Disks).  If a Disk is slow, reconstruct its answer from the other disks using the RAID redundancy.  In the best case, latency is around 3x that of a read from an average Disk.  Proactive:  Always issue the additional reads enabled by the RAID redundancy and take the fastest answer.  Uses much more I/O bandwidth.  Adaptive:  A combination of both approaches that takes the findings into account (a sketch follows below).  Use the reactive approach until a slowdown is detected.  After that, use the proactive approach, since slowdowns are repetitive and last many hours.
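
A minimal sketch of the adaptive strategy, modeled on latencies only (no real I/O); the class, its helper names, and the numbers are illustrative assumptions:

```python
SLOW_FACTOR = 2.0  # a read taking >2x the typical latency marks the drive as slow

class AdaptiveTailTolerantRead:
    """Reactive until a slowdown is detected on a drive, proactive afterwards."""

    def __init__(self, typical_latency_ms: float):
        self.typical = typical_latency_ms
        self.suspected_slow: set[str] = set()  # drives with a recently detected slowdown

    def read_latency(self, drive: str, drive_latency_ms: float) -> float:
        """Effective latency of one read served by `drive`, given the time the
        drive itself would need (`drive_latency_ms`)."""
        reconstruct = self.typical  # parallel reads of the other drives in the stripe
        if drive in self.suspected_slow:
            # Proactive: issue the redundant reads up front and take the fastest answer.
            return min(drive_latency_ms, reconstruct)
        if drive_latency_ms > SLOW_FACTOR * self.typical:
            # Reactive: the slowdown is only noticed after waiting ~2x, then the
            # answer is reconstructed from the other drives -> ~3x in the best case.
            self.suspected_slow.add(drive)  # slowdowns repeat, so remember the drive
            return SLOW_FACTOR * self.typical + reconstruct
        return drive_latency_ms

raid = AdaptiveTailTolerantRead(typical_latency_ms=4.0)
print(raid.read_latency("d3", 40.0))  # 12.0 -> first slow read is reactive (~3x)
print(raid.read_latency("d3", 40.0))  # 4.0  -> later reads are proactive
```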

  18. The Tail at Store: Conclusions  More research on the possible causes of Disk and SSD slowdowns is required.  Tail-Tolerant RAIDs are needed to reduce the overhead caused by slowdowns.  Since reconstructing data is the way to deal with slowdowns, and Pr[1 slowdown] ≫ Pr[2 or more slowdowns], the Azure paper [Huang, 2012] becomes even more relevant.

  19. Background: Cloud Storage  General-purpose applications.  Separate VM-VM connections from VM-Storage connections.  Storage is virtualized.  There are many layers from the application down to the actual storage.  Resources are shared across multiple tenants.

  20. IOFlow: Problem  End-to-end policies (e.g., a minimum IO bandwidth from application to storage) cannot be supported.  Applications have no way of expressing their storage policies.  In shared infrastructure, aggressive applications tend to get more IO bandwidth.

  21. IOFlow: Challenges  No existing enforcement mechanism for controlling IO rates.  Aggregate performance policies.  Non-performance policies.  Admission control.  Dynamic enforcement.  Support for unmodified applications and VMs.

  22. IOFlow: Do it like SDNs

  23. IOFlow: Supported policies  <VM, Destination> -> Bandwidth (static, compute side)  <VM, Destination> -> Min Bandwidth (dynamic, compute side)  <VM, Destination> -> Sanitize (static, compute or storage side)  <VM, Destination> -> Priority Level (static, compute and storage side)  <Set of VMs, Set of Destinations> -> Bandwidth (dynamic, compute side)

  24. Example 1: Interface  Policies:  <VM1, Server X> -> B1  <VM2, Server X> -> B2  The Controller sends the following to the SMBc stage of the physical server hosting VM1 and VM2:  createQueueRule(<VM1, Server X>, Q1)  createQueueRule(<VM2, Server X>, Q2)  createQueueRule(<*, *>, Q0)  configureQueueService(Q1, <B1, low, S>), where S is the size of the queue  configureQueueService(Q2, <B2, low, S>)  configureQueueService(Q0, <C-B1-B2, low, S>), where C is the capacity of Server X.
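
A minimal sketch of how the controller could issue this sequence; the `ControlClient` wrapper and the concrete numbers are assumptions for illustration, while the call names and arguments follow the slide:

```python
class ControlClient:
    """Hypothetical thin wrapper around the control channel to one stage (SMBc here)."""

    def __init__(self, stage: str):
        self.stage = stage

    def createQueueRule(self, flow: tuple, queue: str) -> None:
        print(f"{self.stage}: createQueueRule({flow} -> {queue})")

    def configureQueueService(self, queue: str, props: tuple) -> None:
        print(f"{self.stage}: configureQueueService({queue}, {props})")

B1, B2 = 400, 300   # per-VM bandwidth guarantees in Mbps (illustrative)
C, S = 6400, 64     # capacity of Server X in Mbps and queue size (illustrative)

smbc = ControlClient("SMBc@hypervisor-1")
# Route each VM's traffic to Server X into its own queue; everything else goes to Q0.
smbc.createQueueRule(("VM1", "Server X"), "Q1")
smbc.createQueueRule(("VM2", "Server X"), "Q2")
smbc.createQueueRule(("*", "*"), "Q0")
# Give Q1 and Q2 their guaranteed bandwidths; Q0 gets the remaining capacity.
smbc.configureQueueService("Q1", (B1, "low", S))
smbc.configureQueueService("Q2", (B2, "low", S))
smbc.configureQueueService("Q0", (C - B1 - B2, "low", S))
```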

  25. Example 2: Max-Min Fairness  Policies:  <VM1-VM3,Server X> -> 900 Mbps  Demand:  VM1 -> 600 Mbps  VM2 -> 400 Mbps  VM3 -> 200 Mbps  Result:  VM1 -> 350 Mbps  VM2 -> 350 Mbps  VM3 -> 200 Mbps
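
A minimal sketch of the water-filling computation behind this max-min fair result (function and variable names are illustrative):

```python
def max_min_fair(demands: dict[str, float], capacity: float) -> dict[str, float]:
    """Repeatedly satisfy every demand at or below the equal share of the
    remaining capacity, then split what is left among the unsatisfied flows."""
    alloc: dict[str, float] = {}
    remaining = dict(demands)
    left = capacity
    while remaining:
        share = left / len(remaining)
        satisfied = {vm: d for vm, d in remaining.items() if d <= share}
        if not satisfied:
            # Every remaining flow wants more than the equal share: split evenly.
            for vm in remaining:
                alloc[vm] = share
            return alloc
        for vm, demand in satisfied.items():
            alloc[vm] = demand
            left -= demand
            del remaining[vm]
    return alloc

print(max_min_fair({"VM1": 600, "VM2": 400, "VM3": 200}, capacity=900))
# {'VM3': 200, 'VM1': 350.0, 'VM2': 350.0} -> matches the slide's result
```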

  26. IOFlow: Evaluation of Policy Enforcement  Windows-based IO stack  10 hypervisors with 12 VMs each (120 VMs total)  4 tenants using 30 VMs each (3 VMs per hypervisor for each tenant)  1 Storage Server  6.4 Gbps IO Bandwidth  1 Controller  1s interval between dynamic enforcements of policies

  27. IOFlow: Evaluation of Policy Enforcement

    Tenant     Policy
    Index      {VM 1-30, X}   -> Min 800 Mbps
    Data       {VM 31-60, X}  -> Min 800 Mbps
    Message    {VM 61-90, X}  -> Min 2500 Mbps
    Log        {VM 91-120, X} -> Min 1500 Mbps

  28. IOFlow: Evaluation of Policy Enforcement

  29. IOFlow: Evaluation of Overhead

  30. IOFlow: Conclusions  Contributions:  First Software-Defined Storage approach.  Fine-grained control over IO operations in the Cloud.  Limitations:  The network or other resources might be the bottleneck.  VMs need to be placed close to their data (spatial locality); Flat Datacenter Storage [Nightingale, 2012] provides solutions for this problem.  Guaranteed latencies cannot be expressed by the current policies; a best-effort approach is possible by setting priorities.

  31. Specialized Storage Architectures  HDFS [Shvachko, 2009] and GFS [Ghemawat, 2003] work well for Hadoop MapReduce applications.  Facebook's photo storage [Beaver, 2010] exploits workload characteristics to design and implement a better storage system.
