
Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems

Jun Li, Baochun Li (University of Toronto), HotStorage ’15


  1. Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems. Jun Li, Baochun Li, University of Toronto. HotStorage ’15

  2. Distributed Storage
     ‣ Store a massive amount of data over a large number of commodity servers, as in systems such as HDFS
     ‣ Servers are subject to frequent failures


  4. Distributed Storage
     ‣ Store redundant data to ensure data durability and availability regardless of failures
     ‣ Replication: store multiple copies on different servers
     [Figure: 3-way replication of blocks D1, D2, D3 across servers; storage overhead = 3x]


  6. Erasure Coding
     ‣ Use less storage space to tolerate the same number of failures
     ‣ (k,r) Reed-Solomon (RS) code
       ‣ compute r parity blocks from k data blocks
     [Figure: (k=3,r=2) RS code with data blocks D1, D2, D3 and parity blocks P1, P2; storage overhead = 1.67x]
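The two overhead figures quoted so far (3x for 3-way replication, 1.67x for a (k=3,r=2) RS code) follow from counting stored blocks per original block. A minimal sketch; the function names are illustrative and do not come from the slides:

```python
def replication_overhead(copies: int) -> float:
    """Storage overhead of n-way replication: every block is stored `copies` times."""
    return float(copies)


def rs_overhead(k: int, r: int) -> float:
    """Storage overhead of a (k, r) RS code: k + r blocks stored per k data blocks."""
    return (k + r) / k


print(replication_overhead(3))       # 3.0
print(round(rs_overhead(3, 2), 2))   # 1.67
```

Both schemes in the slides' examples tolerate two lost blocks, which is why the comparison is made at these parameters.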


  8. Reed-Solomon Code
     ‣ Achieves the optimal storage overhead for a given number of tolerated failures
     ‣ Typically high cost of reconstruction
       ‣ need to obtain k blocks to reconstruct one
     [Figure: (k=3,r=2) RS code; reconstructing one block costs 3x disk read and network transfer]


  11. Network Transfer
     ‣ Minimum-storage regenerating (MSR) codes [Dimakis et al, Trans. IT, 2011]
       ‣ the optimal storage overhead, like RS codes
       ‣ minimize the network transfer during reconstruction
       ‣ download a small fraction of data from each of d helpers
     [Figure: (k=3,r=2) RS: 128 MB from each of 3 blocks, total transfer = 384 MB. (k=3,r=2,d=4) MSR: 64 MB from each of 4 helpers, total transfer = 256 MB]
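The transfer totals in the RS and MSR comparison can be reproduced with a few lines of arithmetic. This sketch assumes 128 MB blocks (as in the figure) and the standard MSR repair bandwidth, in which each of the d helpers sends a 1/(d-k+1) fraction of a block; the function names are hypothetical:

```python
def rs_repair_transfer(k: int, block_mb: float) -> float:
    """RS repair: download k whole blocks to rebuild one."""
    return k * block_mb


def msr_repair_transfer(k: int, d: int, block_mb: float) -> float:
    """MSR repair: each of d helpers sends block_mb / (d - k + 1)."""
    per_helper = block_mb / (d - k + 1)
    return d * per_helper


print(rs_repair_transfer(3, 128))       # 384
print(msr_repair_transfer(3, 4, 128))   # 256.0
```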

  12. Disk I/O
     ‣ MSR codes incur even more disk I/O than RS codes, since each helper needs to read all of its data to compute the small fraction it sends out.
     [Figure: (k=3,r=2,d=4) MSR; each helper reads 128 MB, computes, and transfers 64 MB]
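The disk I/O penalty can be made concrete by comparing bytes read with bytes sent for a single reconstruction. A rough sketch, again assuming 128 MB blocks and that every MSR helper reads its whole block, as the slide states; names are illustrative only:

```python
def rs_repair_io(k: int, block_mb: float) -> tuple[float, float]:
    """(disk read, network transfer) for RS: k blocks are read and sent in full."""
    return k * block_mb, k * block_mb


def msr_repair_io(k: int, d: int, block_mb: float) -> tuple[float, float]:
    """(disk read, network transfer) for MSR: d helpers each read a full block
    but send only block_mb / (d - k + 1)."""
    return d * block_mb, d * block_mb / (d - k + 1)


print(rs_repair_io(3, 128))      # (384, 384)
print(msr_repair_io(3, 4, 128))  # (512, 256.0)
```

Under these assumptions MSR cuts transfer from 384 MB to 256 MB but raises disk read from 384 MB to 512 MB.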

  13. Can we have erasure codes that save both network transfer and disk I/O during reconstruction?

  14. Multiple Failures
     ‣ Opportunities to fix multiple failures at once exist:
       ‣ correlated failures (disk, switch, power)
       ‣ periodic checks for failures
       ‣ reconstruction deferred until a certain number of failures has occurred
     ‣ Typically, erasure codes such as RS and MSR codes fix failures separately.
     ‣ Coalescing reconstructions immediately saves disk I/O.
     [Figure: (k=3,r=3,d=4) MSR with 128 MB blocks, two failed blocks fixed separately: 64 MB × 4 per reconstruction, total transfer = 512 MB, disk read = 1024 MB, storage overhead = 2x]


  17. Multiple Failures
     [Figure: fixing two failed 128 MB blocks under three schemes]
     ‣ (k=3,r=3,d=4) MSR, failures fixed separately: 64 MB × 4 per reconstruction; total transfer = 512 MB; disk read = 1024 MB; storage overhead = 2x
     ‣ [Shum et al, Trans. IT, 2013], coalesced reconstruction: 42.7 MB × 4 to each newcomer plus 42.7 MB × 2 between newcomers; total transfer = 427 MB (optimal network transfer); disk read = 512 MB; storage overhead = 2x; code construction exists only for limited values of parameters
     ‣ Beehive, coalesced reconstruction: 42.7 MB × 4 to each newcomer plus 42.7 MB × 2 between newcomers; total transfer = 427 MB; disk read = 512 MB; storage overhead = 2.25x
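The 512 MB versus 427 MB and 1024 MB versus 512 MB figures can be checked numerically. The sketch below assumes, consistently with the numbers on the slide, that in a coalesced reconstruction of t blocks every transfer (helper to newcomer, and newcomer to newcomer) carries block size / (d - k + t), and that each helper reads its block once. Function names are hypothetical:

```python
def separate_msr(k: int, d: int, t: int, block_mb: float) -> tuple[float, float]:
    """(total transfer, disk read) when t failures are repaired one at a time."""
    transfer = t * d * block_mb / (d - k + 1)
    disk_read = t * d * block_mb  # helpers re-read their block for every repair
    return transfer, disk_read


def coalesced(k: int, d: int, t: int, block_mb: float) -> tuple[float, float]:
    """(total transfer, disk read) when t failures are repaired together."""
    segment = block_mb / (d - k + t)
    helper_traffic = t * d * segment          # d helpers send to each of t newcomers
    newcomer_traffic = t * (t - 1) * segment  # newcomers exchange data pairwise
    disk_read = d * block_mb                  # each helper reads its block once
    return helper_traffic + newcomer_traffic, disk_read


print(separate_msr(3, 4, 2, 128))  # (512.0, 1024)
transfer, read = coalesced(3, 4, 2, 128)
print(round(transfer, 1), read)    # 426.7 512
```

The slide rounds 426.7 MB up to 427 MB.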

  18. Contributions
     ‣ Beehive, a new family of erasure codes that achieves the optimal network transfer for coalesced reconstructions
       ‣ for a wide range of system parameters
       ‣ with marginal additional storage overhead
     ‣ C++ implementation to demonstrate the performance

  19. System Parameters
     ‣ k: the minimum number of blocks needed to decode the original data
     ‣ r: the maximum number of missing blocks tolerated without hurting data durability/availability
     ‣ t: the number of failed blocks to reconstruct
     ‣ d: the number of existing blocks to contact during reconstruction (d ≥ 2k-1)

  20. Code Construction


  26. Code Construction
     ‣ Beehive codes are constructed by combining MSR codes and RS codes.
     ‣ Beehive codes can be decoded as long as k blocks survive.
     ‣ With k+r blocks in total, Beehive codes store t-1 fewer segments than RS codes and MSR codes.
     ‣ storage overhead = $\frac{k+r}{k - \frac{t-1}{d-k+t}} \in \left(\frac{k+r}{k}, \frac{k+r}{k-1}\right)$
     [Figure: each of blocks 1 to k+r holds d-k+1 segments from a (k,r,d) MSR code (k data blocks, r parity blocks) and t-1 segments from a (k-1,r+1) RS code (k-1 data blocks, r+1 parity blocks). RS parity block j, for j = k, …, k+r, is the combination a_{j,1}·(block 1) + a_{j,2}·(block 2) + … + a_{j,k-1}·(block k-1) of the RS data blocks.]
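As a sanity check on the storage overhead of the construction, the sketch below computes it in two ways: from the closed form, read here as (k+r) / (k - (t-1)/(d-k+t)), and by counting segments as laid out in the figure (d-k+1 MSR segments plus t-1 RS segments per block, with k data blocks in the MSR part and k-1 in the RS part). The reading of the formula and the function names are assumptions:

```python
from fractions import Fraction


def beehive_overhead(k: int, r: int, d: int, t: int) -> Fraction:
    """Closed form: (k + r) / (k - (t - 1) / (d - k + t))."""
    return Fraction(k + r) / (k - Fraction(t - 1, d - k + t))


def beehive_overhead_by_counting(k: int, r: int, d: int, t: int) -> Fraction:
    """Stored segments divided by segments of original data."""
    stored = (k + r) * (d - k + t)                    # every block has d-k+t segments
    original = k * (d - k + 1) + (k - 1) * (t - 1)    # MSR data + RS data
    return Fraction(stored, original)


print(float(beehive_overhead(3, 3, 4, 2)))              # 2.25
print(float(beehive_overhead_by_counting(3, 3, 4, 2)))  # 2.25
```

For (k=3,r=3,d=4) and t=2 both give 2.25x, the Beehive figure quoted in the comparison with MSR codes; with t=1 the expression reduces to (k+r)/k.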


  34. Reconstruction
     [Figure: t newcomers (blocks i and j shown) are reconstructed from segments sent by d helpers, numbered 1 to d]
