
Osama Khan and Randal Burns, Johns Hopkins University
James Plank, University of Tennessee
Cheng Huang, Microsoft Research

• How do we ensure data reliability?
  ◦ Replication (easy but inefficient)
  ◦ Erasure coding (complex but efficient)
• Storage space was a relatively expensive resource
• MDS (maximum distance separable) codes are used to achieve optimal storage efficiency for a given fault tolerance

• Emergence of workloads/scenarios where recovery dictates overall I/O performance
  ◦ System updates
  ◦ Deep archival stores
• A traditional k-of-n MDS code requires k I/Os to recover from a single failure
• Can we do better than k I/Os?

• Existing approaches use matrix inversion
  ◦ This yields one possible solution, not necessarily the one with the lowest I/O cost
• We have come up with a new way to recover lost data that minimizes the number of I/Os needed for recovery
  ◦ It is computationally intensive, though all common failure scenarios can be computed a priori
  ◦ Applicable to any matrix-based erasure code

• Decoding equation: a collection of bits in the codeword whose corresponding rows in the generator matrix sum to zero
  ◦ We can decode any one bit as long as the remaining bits in that equation are not lost
• {D_0, D_2, C_0} is a decoding equation
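The definition above can be sketched in a few lines of Python. The generator matrix here is a hypothetical toy code with four data bits and four coding bits, chosen only so that {D_0, D_2, C_0} comes out as a decoding equation; it is not the matrix from the talk. The helper names `is_decoding_equation` and `decode` are likewise assumptions made for illustration.

```python
from functools import reduce

# Hypothetical toy code (assumption, not the matrix used in the talk).
# Each generator row is a bitmask over the data bits: bit j set means
# the codeword bit depends on D_j.
NAMES = ["D0", "D1", "D2", "D3", "C0", "C1", "C2", "C3"]
GENERATOR = [
    0b0001,  # D0
    0b0010,  # D1
    0b0100,  # D2
    0b1000,  # D3
    0b0101,  # C0 = D0 ^ D2
    0b1010,  # C1 = D1 ^ D3
    0b0011,  # C2 = D0 ^ D1
    0b1100,  # C3 = D2 ^ D3
]


def is_decoding_equation(elements):
    """True if the generator rows of the named codeword bits XOR to zero."""
    rows = (GENERATOR[NAMES.index(name)] for name in elements)
    return reduce(lambda a, b: a ^ b, rows, 0) == 0


def decode(equation, lost, values):
    """Recover the lost bit by XOR-ing the surviving bits of the equation."""
    if lost not in equation or not is_decoding_equation(equation):
        raise ValueError("not a usable decoding equation for this bit")
    result = 0
    for name in equation:
        if name != lost:
            result ^= values[name]
    return result
```

With this toy matrix, `is_decoding_equation({"D0", "D2", "C0"})` holds because C_0 is defined as D_0 XOR D_2, so a lost D_0 can be rebuilt from D_2 and C_0 alone.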

• Finds a decoding equation for each failed bit while minimizing the total number of elements accessed
• Enumerate all decoding equations and, for each f_i ∈ F, determine the set E_i
  ◦ F is the set of failed bits
  ◦ E_i is the set of decoding equations that include f_i
• Goal: select one equation e_i from each E_i such that $\left|\bigcup_{i=1}^{|F|} e_i\right|$ is minimized
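As a rough illustration of the selection problem, the sketch below enumerates every decoding equation of a small code and tries all combinations by brute force. It reuses a hypothetical toy generator matrix (not the one from the talk), and it assumes an equation is usable for f_i only if f_i is its sole failed element. Exhaustive search is used here purely for clarity; it is not the graph-based method described in the talk.

```python
from itertools import combinations, product

# Hypothetical toy code (assumption): rows are bitmasks over D0..D3.
NAMES = ["D0", "D1", "D2", "D3", "C0", "C1", "C2", "C3"]
GENERATOR = [0b0001, 0b0010, 0b0100, 0b1000,   # D0..D3
             0b0101, 0b1010, 0b0011, 0b1100]   # C0..C3


def all_decoding_equations():
    """Every non-empty set of codeword bits whose generator rows XOR to zero."""
    found = []
    for size in range(1, len(NAMES) + 1):
        for combo in combinations(range(len(NAMES)), size):
            acc = 0
            for idx in combo:
                acc ^= GENERATOR[idx]
            if acc == 0:
                found.append(frozenset(NAMES[i] for i in combo))
    return found


def best_recovery(failed):
    """Pick one equation per failed bit so that the union of all picks is smallest.

    Returns (chosen equations, their union), or None if some bit is unrecoverable.
    """
    failed = set(failed)
    equations = all_decoding_equations()
    # E_i: equations whose only failed element is f_i (assumption of this sketch).
    candidates = [[e for e in equations if e & failed == {f}]
                  for f in sorted(failed)]
    best = None
    for choice in product(*candidates):
        union = frozenset().union(*choice)
        if best is None or len(union) < len(best[1]):
            best = (choice, union)
    return best
```

For F = {D_0, D_1} in this toy code, `best_recovery({"D0", "D1"})` returns a union of five elements, of which three are surviving bits that actually have to be read.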

• Finding all such e_i is NP-hard, but we can convert the equations into a directed weighted graph and find the shortest path
  ◦ Pruning makes it feasible to solve for practical values of |F| and |E_i|
[Figure: search graph. Each node at level i holds the cumulative record of equations applied so far (example label: 00110001), with an edge for each equation in E_i. The bitstring representation of decoding equation {D_0, D_2, C_0} sets D_0 = 1, D_1 = 0, D_2 = 1, D_3 = 0, C_0 = 1, C_1 = 0, C_2 = 0, C_3 = 0.]
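One way to picture the graph search is the minimal sketch below. It assumes that a node is a pair (level, cumulative bitmask), that each equation in E_i gives an edge from level i to level i+1, and that an edge's weight is the number of elements the equation adds to the cumulative record. Those modelling choices, the function names, and the use of Dijkstra's algorithm with a visited set are assumptions for illustration; the pruning used in the actual work is not specified in these slides.

```python
import heapq

ORDER = ["D0", "D1", "D2", "D3", "C0", "C1", "C2", "C3"]


def to_bits(equation):
    """Encode an equation as an int whose 8-bit string reads D0..C3 left to right."""
    return int("".join("1" if name in equation else "0" for name in ORDER), 2)


def shortest_recovery(levels):
    """levels[i] is E_i as a list of bitmasks.

    Returns (elements accessed, cumulative bitmask) for a cheapest path,
    or None if some level has no usable equation.
    """
    heap = [(0, 0, 0)]  # (cost so far, level, cumulative bitmask)
    seen = set()
    while heap:
        cost, level, mask = heapq.heappop(heap)
        if level == len(levels):
            return cost, mask
        if (level, mask) in seen:  # skip states that were already expanded
            continue
        seen.add((level, mask))
        for eq in levels[level]:
            merged = mask | eq
            # Edge weight: elements this equation adds to the record.
            weight = bin(merged & ~mask).count("1")
            heapq.heappush(heap, (cost + weight, level + 1, merged))
    return None


# Example input: a few hypothetical equations for f_0 = D0 and f_1 = D1.
E0 = [to_bits(e) for e in ({"D0", "D2", "C0"}, {"D0", "D3", "C1", "C2"})]
E1 = [to_bits(e) for e in ({"D1", "D3", "C1"}, {"D1", "D2", "C0", "C2"})]
cost, mask = shortest_recovery([E0, E1])
print(cost, format(mask, "08b"))
```

Because the path cost always equals the number of set bits in the cumulative record, the cheapest path to the last level corresponds to the smallest union of selected equations.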

F = {D_0, D_1}, so f_0 = D_0 and f_1 = D_1
[Figure: recovery options for f_0 (equations from E_0) and recovery options for f_1 (equations from E_1).]

[Figure: evaluation results. * Results similar to existing work.]

• So we have found a way to make the recovery I/O of matrix-based MDS codes optimal
  ◦ How about non-MDS codes?
• Can we achieve better recovery I/O performance at the cost of lower storage efficiency?
• Replication and MDS codes seem to be the two extrema in this tradeoff

• GRID codes allow two (or more) erasure codes to be applied to the same data, each in its own dimension
• To achieve low recovery I/O coupled with high fault tolerance, we use
  ◦ Weaver codes: recovery I/O independent of stripe size
  ◦ STAR codes: build up redundancy
• All single failures can be recovered in the Weaver dimension

[Figure: GRID code layout combining a STAR dimension and a Weaver dimension, with sizes n_v and n_h. Legend: disk with parity; disk with data and parity.]

Code             I/Os for recovery   # disks accessed   Storage efficiency   Fault tolerance
GRID(S,W(2,2))   4                   3                  31.25%               11
GRID(S,W(3,3))   6                   3                  31.25%               15
GRID(S,W(2,4))   7                   4                  20.8%                19
RS(20,31)        20                  20                 60.6%                11
RS(30,45)        30                  30                 66.6%                15
RS(30,49)        30                  30                 61.2%                19

• We conjecture that there is a fundamental tradeoff between storage efficiency and recovery I/O
  ◦ Formal relationship?
• Programmatic search of generator matrices with optimal recovery I/O schedules
  ◦ Large search space, but reasonably sized systems (100 disks?) may be a feasible option

Thank you!
