Storage Fabric CS6453
Summary Last week: NVRAM is going to change the way we think about storage. Today: challenges of storage systems (SSDs, HDs) built from many devices to handle massive amounts of data; slowdowns in HDs and SSDs; enforcing policies for IO operations in Cloud architectures.
Background: Storage for Big Data One disk is not enough to handle massive amounts of data. Last time: efficient datacenter networks using a large number of cheap commodity switches. Solution: efficient IO performance using a large number of commodity storage devices.
Background: RAID 0 Achieves Nx performance, where N is the number of Disks. Is this for free? When N becomes large, the probability of a Disk failure becomes large as well, and RAID 0 does not tolerate failures (see the sketch below).
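To make the failure-probability point concrete, here is a minimal sketch (my own illustration, assuming independent disk failures and a hypothetical 2% per-disk failure probability): RAID 0 loses data if any single disk fails, so the array-level failure probability grows quickly with N.

```python
# Probability that a RAID 0 array of N independent disks loses data,
# assuming each disk fails within the period with probability p.
def raid0_failure_probability(n_disks: int, p_disk: float) -> float:
    # RAID 0 fails if at least one disk fails.
    return 1.0 - (1.0 - p_disk) ** n_disks

# Example (hypothetical 2% per-disk failure rate): one disk fails with
# probability 0.02, but a 50-disk stripe loses data with probability ~0.64.
print(raid0_failure_probability(1, 0.02))   # 0.02
print(raid0_failure_probability(50, 0.02))  # ~0.636
```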
Background: RAID 1 Achieves (K-1)-fault tolerance with Kx Disks. Is this for free? It requires Kx more disks (e.g., to tolerate 1 failure you need 2x as many Disks as RAID 0). RAID 1 does not utilize resources efficiently.
Background: Erasure Codes Achieve K-fault tolerance with N+K Disks: efficient utilization of Disks (not as great as RAID 0) and fault tolerance (not as great as RAID 1). Is this for free? Reconstruction Cost: the number of Disks that must be read in case of failure(s). RAID 6 has a Reconstruction Cost of 3.
Modern Erasure Code Techniques Erasure Coding in Windows Azure Storage [Huang, 2012] Exploit point: Prob[1 failure] ≫ Prob[2 or more failures]. Solution: construct an erasure code with low reconstruction cost for 1 failure. 1.33x storage overhead (relatively low). Tolerates up to 3 failures in 16 storage devices. Reconstruction cost of 6 for 1 failure and 12 for 2+ failures.
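A minimal sketch (my own illustration, assuming the slide's numbers come from a Local Reconstruction Code with 12 data fragments, 2 local parities, and 2 global parities) of how the 1.33x overhead and the reconstruction costs of 6 and 12 fit together:

```python
# Illustrative numbers for a Local Reconstruction Code LRC(k, l, r):
# k data fragments, l local parities (one per group of k/l data fragments),
# and r global parities. The slide's figures correspond to k=12, l=2, r=2.
def lrc_stats(k: int, l: int, r: int):
    total_fragments = k + l + r
    storage_overhead = total_fragments / k
    # A single failed data fragment is rebuilt from the other members
    # of its local group plus the local parity.
    single_failure_reconstruction = k // l
    # Multiple failures may require decoding over all k data fragments.
    multi_failure_reconstruction = k
    return storage_overhead, single_failure_reconstruction, multi_failure_reconstruction

print(lrc_stats(12, 2, 2))  # (1.33..., 6, 12): matches the slide's numbers
```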
The Tail at Store: Problem We have seen how we treat failures with reconstruction. What about slowdowns in HDs (or SSDs)? A slowdown of a disk (with no failure) might have a significant impact on overall performance. Questions: Do HDs or SSDs exhibit transient slowdowns? Are disk slowdowns frequent enough to affect overall performance? What causes slowdowns? How do we deal with slowdowns?
The Tail at Store: Study [Figure: a RAID group of data drives D ... D plus parity drives P and Q]
                         Disk          SSD
#RAID groups             38,029        572
#Data drives per group   3-26          3-22
#Data drives             458,482       4,069
Total drive hours        857,183,442   7,481,055
Total RAID hours         72,046,373    1,072,690
The Tail at Store: Slowdowns? Li = hourly average I/O latency of drive i; Lmedian = median hourly latency across the drives of the same RAID group. Slowdown: Si = Li / Lmedian. Tail: T = Smax. Slow Disk: Si ≥ 2. Si ≥ 2 at the 99.8th percentile and Si ≥ 1.5 at the 99.3rd percentile; T ≥ 2 at the 97.8th percentile and T ≥ 1.5 at the 95.2nd percentile. SSDs exhibit even more slowdowns. [Figure: per-drive CDF of slowdown (Disk), curves for Si and T, slowdown from 1x to 8x.]
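A minimal sketch (my own illustration, following the slide's definitions; the input layout of per-drive hourly latencies is assumed) of computing the slowdown Si and tail T for one RAID group and one hour:

```python
import statistics

def slowdowns(hourly_latency_by_drive: dict[str, float]) -> dict[str, float]:
    """Slowdown of each drive in a RAID group for one hour: Si = Li / Lmedian."""
    l_median = statistics.median(hourly_latency_by_drive.values())
    return {d: l / l_median for d, l in hourly_latency_by_drive.items()}

def tail(hourly_latency_by_drive: dict[str, float]) -> float:
    """Tail of the RAID group for one hour: T = max Si."""
    return max(slowdowns(hourly_latency_by_drive).values())

# Example: one drive is ~3x slower than the group median this hour.
group = {"d0": 5.0, "d1": 5.2, "d2": 15.0, "d3": 4.9}
print(slowdowns(group))  # d2 has Si ~= 2.9 -> a "slow disk" (Si >= 2)
print(tail(group))       # ~2.9
```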
The Tail at Store: Duration? Slowdowns are transient. 40% of HD slowdowns last ≥ 2 hours and 12% last ≥ 10 hours. Many slowdowns happen in consecutive hours (i.e., they persist for longer). [Figure: CDF of slowdown interval, 1 to 256 hours, Disk and SSD curves.]
The Tail at Store: Correlation between slowdowns in the same device? 90% of Disk slowdowns occur within 24 hours of another slowdown of the same Disk; more than 80% of SSD slowdowns occur within 24 hours of another slowdown of the same SSD. Slowdowns of the same device cluster close together in time. [Figure: CDF of inter-arrival period between slowdowns, 0 to 35 hours, Disk and SSD curves.]
The Tail at Store: Causes? Rate imbalance: RIi = I/ORatei / I/ORatemedian. Rate imbalance does not seem to be the main cause of slowdowns for slow Disks. [Figure: CDF of RI in hours with Si ≥ 2, rate imbalance from 0.5x to 4x, Disk and SSD curves.]
The Tail at Store: Causes? Size imbalance: SIi = I/OSizei / I/OSizemedian. Size imbalance does not seem to be the main cause of slowdowns for slow Disks. [Figure: CDF of SI in hours with Si ≥ 2, size imbalance from 0.5x to 4x, Disk and SSD curves.]
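A minimal sketch (my own illustration, following the definitions on the two preceding slides; the input layout is assumed) of the rate-imbalance and size-imbalance metrics, i.e., each drive's value relative to its RAID group's median:

```python
import statistics

def imbalance(metric_by_drive: dict[str, float]) -> dict[str, float]:
    """Generic imbalance: each drive's value relative to the group median.

    Used for both rate imbalance (RI, metric = I/O rate) and
    size imbalance (SI, metric = average I/O size).
    """
    med = statistics.median(metric_by_drive.values())
    return {d: v / med for d, v in metric_by_drive.items()}

# Example: a drive that is slow (Si >= 2) but whose I/O rate and size are
# close to the group median -> its slowdown is not explained by imbalance.
io_rate = {"d0": 120.0, "d1": 118.0, "d2": 121.0, "d3": 119.0}   # IOPS
io_size = {"d0": 64.0, "d1": 64.0, "d2": 60.0, "d3": 64.0}       # KB
print(imbalance(io_rate))  # RI close to 1x for all drives
print(imbalance(io_size))  # SI close to 1x for all drives
```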
The Tail at Store: Causes? Drive age shows some correlation with slowdowns, but the correlation is not strong. [Figure: CDF of slowdown vs. drive age (Disk), one curve per age in years (1-10), slowdown from 1x to 5x.]
The Tail at Store: Causes? No correlation of slowdowns to time of day (hours 0-24). No explicit drive events around slow hours. Unplugging disks and plugging them back in does not particularly help. Slowdown behavior differs significantly across SSD vendors.
The Tail at Store: Solutions Create Tail-Tolerant RAIDs: treat slow disks like failed disks. Reactive: detect slow Disks (those taking >2x as long as the other Disks to answer) and, when a Disk is slow, reconstruct the answer from the other disks using RAID redundancy; in the best case latency is around 3x that of a read from an average Disk (wait out the ~2x detection threshold, then perform the reconstruction read). Proactive: always issue the additional redundancy read and take the fastest answer; uses much more I/O bandwidth. Adaptive: a combination of both approaches that takes the findings into account: use the reactive approach until a slowdown is detected, then switch to the proactive approach, since slowdowns are repetitive and last many hours (see the sketch below).
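A minimal sketch (my own illustration, not the paper's implementation) of the adaptive policy described above; the read_drive and reconstruct_from_redundancy helpers, the timeout factor, and the slow_drives set are hypothetical placeholders:

```python
import concurrent.futures
import time

SLOW_TIMEOUT_FACTOR = 2.0  # a drive counts as "slow" if it takes >2x the expected latency

# Placeholder I/O functions, for illustration only.
def read_drive(drive, block):
    time.sleep(0.01)            # pretend to read the block from one drive
    return f"data({drive},{block})"

def reconstruct_from_redundancy(drive, block):
    time.sleep(0.02)            # pretend to read the peers + parity and rebuild
    return f"data({drive},{block})"

def adaptive_read(drive, block, expected_latency_s, executor, slow_drives):
    """Adaptive tail-tolerant read (sketch).

    Reactive: read the target drive; if it exceeds the slow threshold,
    reconstruct the block from the other drives via RAID redundancy and mark
    the drive slow. Proactive (once the drive is marked slow): issue the
    normal read and the reconstruction read in parallel and take the fastest,
    since slowdowns tend to repeat and last many hours.
    """
    if drive in slow_drives:    # proactive mode
        futures = {executor.submit(read_drive, drive, block),
                   executor.submit(reconstruct_from_redundancy, drive, block)}
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return next(iter(done)).result()

    future = executor.submit(read_drive, drive, block)   # reactive mode
    try:
        return future.result(timeout=SLOW_TIMEOUT_FACTOR * expected_latency_s)
    except concurrent.futures.TimeoutError:
        slow_drives.add(drive)                           # switch to proactive
        return reconstruct_from_redundancy(drive, block)

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as ex:
    print(adaptive_read("d2", 17, expected_latency_s=0.05, executor=ex, slow_drives=set()))
```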
The Tail at Store: Conclusions More research on possible causes for Disk and SSD slowdowns is required. Need Tail-Tolerant RAIDs to reduce the overhead from slowdowns. Since reconstruction of data is the way to deal with slowdowns, and Prob[1 slowdown] ≫ Prob[2 or more slowdowns], the Azure paper [Huang, 2012] becomes even more relevant.
Background: Cloud Storage General-purpose applications. Separate VM-VM connections from VM-Storage connections. Storage is virtualized. Many layers sit between the application and the actual storage. Resources are shared across multiple tenants.
IOFlow: Problem Today's IO stacks cannot support end-to-end policies (e.g., minimum IO bandwidth from application to storage). Applications have no way of expressing their storage policies. In shared infrastructure, aggressive applications tend to grab more IO bandwidth.
IOFlow: Challenges No existing enforcement mechanism for controlling IO rates. Aggregate performance policies. Non-performance policies. Admission control. Dynamic enforcement. Support for unmodified applications and VMs.
IOFlow: Do it like SDNs. As in Software-Defined Networking, a logically centralized controller discovers the stages along the IO path (e.g., in the hypervisor and the storage server) and programs queuing rules at each stage to enforce end-to-end policies.
IOFlow: Supported policies
<VM, Destination> -> Bandwidth (static, compute side)
<VM, Destination> -> Min Bandwidth (dynamic, compute side)
<VM, Destination> -> Sanitize (static, compute or storage side)
<VM, Destination> -> Priority Level (static, compute and storage side)
<Set of VMs, Set of Destinations> -> Bandwidth (dynamic, compute side)
Example 1: Interface
Policies: <VM1, Server X> -> B1; <VM2, Server X> -> B2
Controller to SMBc of the physical server containing VM1 and VM2:
createQueueRule(<VM1, Server X>, Q1)
createQueueRule(<VM2, Server X>, Q2)
createQueueRule(<*, *>, Q0)
configureQueueService(Q1, <B1, low, S>), where S is the size of the queue
configureQueueService(Q2, <B2, low, S>)
configureQueueService(Q0, <C - B1 - B2, low, S>), where C is the capacity of Server X
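A minimal sketch (my own illustration) of how a controller could translate such per-VM bandwidth policies into the calls shown above; the call names createQueueRule and configureQueueService come from the slide, while the StageClient class, the policy layout, and the capacity and queue-size values are assumptions:

```python
# Translate per-VM bandwidth policies into queue rules at one stage (sketch).
QUEUE_SIZE = 64          # S: queue size (hypothetical value)
SERVER_CAPACITY = 6400   # C: IO capacity of Server X in Mbps (hypothetical)

class StageClient:
    """Stand-in for the RPC interface to a stage (e.g., SMBc in a hypervisor)."""
    def createQueueRule(self, match, queue):
        print(f"createQueueRule({match}, {queue})")
    def configureQueueService(self, queue, props):
        print(f"configureQueueService({queue}, {props})")

def enforce_bandwidth_policies(stage: StageClient, policies: dict[str, int]):
    reserved = 0
    for i, (vm, bandwidth) in enumerate(policies.items(), start=1):
        queue = f"Q{i}"
        stage.createQueueRule((vm, "Server X"), queue)           # route VM's IO to its queue
        stage.configureQueueService(queue, (bandwidth, "low", QUEUE_SIZE))
        reserved += bandwidth
    # Default queue Q0 for all remaining traffic gets the leftover capacity.
    stage.createQueueRule(("*", "*"), "Q0")
    stage.configureQueueService("Q0", (SERVER_CAPACITY - reserved, "low", QUEUE_SIZE))

enforce_bandwidth_policies(StageClient(), {"VM1": 800, "VM2": 800})
```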
Example 2: Max-Min Fairness
Policy: <VM1-VM3, Server X> -> 900 Mbps
Demand: VM1 -> 600 Mbps, VM2 -> 400 Mbps, VM3 -> 200 Mbps
Result: VM1 -> 350 Mbps, VM2 -> 350 Mbps, VM3 -> 200 Mbps
(VM3's demand is below the 300 Mbps equal share, so it is fully satisfied; its leftover 100 Mbps is split equally between VM1 and VM2.)
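A minimal sketch (my own illustration) of the standard max-min fair (water-filling) allocation that yields the result above: repeatedly give every unsatisfied VM an equal share of the remaining capacity and redistribute what it does not need.

```python
def max_min_fair(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Max-min fair allocation of `capacity` among flows with given demands."""
    allocation = {flow: 0.0 for flow in demands}
    unsatisfied = set(demands)
    remaining = capacity
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)        # equal share of what is left
        for flow in list(unsatisfied):
            grant = min(share, demands[flow] - allocation[flow])
            allocation[flow] += grant
            remaining -= grant
            if allocation[flow] >= demands[flow] - 1e-9:
                unsatisfied.remove(flow)            # demand met, stop giving it more
    return allocation

# Reproduces the slide's example: 900 Mbps shared by VM1-VM3.
print(max_min_fair(900, {"VM1": 600, "VM2": 400, "VM3": 200}))
# {'VM1': 350.0, 'VM2': 350.0, 'VM3': 200.0}
```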
IOFlow: Evaluation of Policy Enforcement Windows-based IO stack. 10 hypervisors with 12 VMs each (120 VMs total). 4 tenants with 30 VMs each (3 VMs per hypervisor per tenant). 1 storage server with 6.4 Gbps IO bandwidth. 1 controller; 1 s interval between dynamic enforcements of policies.
IOFlow: Evaluation of Policy Enforcement
Tenant     Policy
Index      {VM 1-30, X}   -> Min 800 Mbps
Data       {VM 31-60, X}  -> Min 800 Mbps
Message    {VM 61-90, X}  -> Min 2500 Mbps
Log        {VM 91-120, X} -> Min 1500 Mbps
IOFlow: Evaluation of Policy Enforcement
IOFlow: Evaluation of Overhead
IOFlow: Conclusions Contributions: the first Software-Defined Storage approach; fine-grained control over IO operations in the Cloud. Limitations: the network or other resources might become the bottleneck; VMs need to be placed close to their data (spatial locality), and Flat Datacenter Storage [Nightingale, 2012] provides solutions for this problem; guaranteed latencies cannot be expressed by current policies, only a best-effort approach by setting priorities.
Specialized Storage Architectures HDFS [Shvachko, 2009] and GFS [Ghemawat, 2003] work well for Hadoop/MapReduce applications. Facebook's photo storage [Beaver, 2010] exploits workload characteristics to design and implement a better storage system.