Coding for loss-tolerant systems
Workshop APRETAF, 22 January 2009
Mathieu Cunche, Vincent Roca
INRIA Rhône-Alpes, Planète team
Outline
o The erasure channel
o Erasure codes
o Reed-Solomon codes
o LDPC codes
o Application to distributed storage
The erasure channel
o definition: a symbol either arrives at the destination without any error, or is erased and never received
[Figure: binary erasure channel. An input 0 is received as 0 or erased; an input 1 is received as 1 or erased.]
o ≠ BSC (binary symmetric channel) and AWGN (additive white Gaussian noise) channels
o the integrity assumption is a strong hypothesis
  o a received symbol is 100% guaranteed error free
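The definition above can be illustrated with a minimal simulation sketch. The function name `erasure_channel`, the `loss_prob` parameter, and the use of `None` to mark an erased symbol are illustrative assumptions, not part of the original slides.

```python
import random

def erasure_channel(symbols, loss_prob, rng):
    """Deliver each symbol unchanged, or erase it (None) with probability loss_prob.

    Assumption for this sketch: erasures are independent from one symbol to the next.
    """
    return [s if rng.random() >= loss_prob else None for s in symbols]

sent = list(range(10))
received = erasure_channel(sent, 0.3, random.Random(42))
# Integrity assumption: every symbol that does arrive is identical to the one sent.
assert all(r is None or r == s for r, s in zip(received, sent))
```

The receiver always knows which positions are missing, which is what separates this channel from BSC or AWGN channels, where errors are silent.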
The erasure channel: where do we find erasure channels?
o On the Internet
  o because of routing errors or congestion
  o because of a bad CRC/checksum (the corrupted packet is dropped)
o On wireless and satellite networks
  o intermittent connection due to obstacles
o Distributed storage
  o disk failure in RAID systems
  o node failure in a data center
o Distributed computation
  o fail-stop failures
Erasure codes
o k source symbols, encoded into n encoding symbols
o Code rate: $\text{code rate} = \frac{k}{n} = \frac{\text{number of symbols before encoding}}{\text{number of symbols after encoding}}$
  o close to 1 => little redundancy
  o close to 0 => high amount of redundancy
[Figure: the source object is split into k source symbols; encoding adds (n-k) repair symbols; some symbols are erased during transmission; decoding rebuilds the decoded object.]
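As a small worked example of the code rate, the sketch below computes the rate for given k and n, and the number of repair symbols needed to reach a target rate. The helper names `code_rate` and `repair_symbols` are hypothetical; the values k = 1000 and rate 2/3 are the ones used later in the deck.

```python
from fractions import Fraction

def code_rate(k, n):
    """Code rate k/n, kept as an exact fraction."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return Fraction(k, n)

def repair_symbols(k, rate):
    """Number of repair symbols (n - k) for k source symbols at the given rate.

    Assumes k / rate is an integer, as in the example below.
    """
    n = Fraction(k) / Fraction(rate)
    if n.denominator != 1:
        raise ValueError("k / rate is not an integer")
    return int(n) - k

print(code_rate(1000, 1500))                  # 2/3
print(repair_symbols(1000, Fraction(2, 3)))   # 500
```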
Erasure codes
Often used as AL-FEC codes
o "Application-Level Forward Error Correction" codes
AL-FEC codes differ from physical-layer (PHY) FEC codes
o PHY codes:
  o correct bit errors and, if that is not possible, detect the errors
  o symbol = bit
o AL-FEC codes:
  o recover from symbol erasures
  o symbol = byte, IP datagram, file chunk
Erasure codes: how can we define good erasure codes?
Performance metrics for erasure codes
o erasure recovery capabilities
  o main metric, measured as the overhead ratio:
    $\text{decoding overhead} = \frac{\text{number of symbols required for decoding}}{k} - 1$
  o decoding needs (1+overhead)*k symbols to succeed, whereas ideal (MDS) codes need only k symbols
o encoding and decoding speed
  o a practical measure of the complexity
o required memory during encoding and decoding
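The overhead ratio can be checked numerically with a short sketch. The function names are hypothetical, and the sample figure of 1022 received symbols is an invented illustration, not a measurement from the slides.

```python
import math

def decoding_overhead(symbols_required, k):
    """Overhead ratio: symbols required for decoding, divided by k, minus 1."""
    return symbols_required / k - 1

def symbols_required(overhead, k):
    """Inverse view: decoding needs (1 + overhead) * k symbols, rounded up."""
    return math.ceil((1 + overhead) * k)

# An ideal (MDS) code decodes from exactly k symbols, so its overhead is 0.
assert decoding_overhead(1000, 1000) == 0.0
# Illustrative non-MDS case: 1022 symbols needed for k = 1000, about 2.2 % overhead.
print(round(decoding_overhead(1022, 1000), 4))   # 0.022
```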
Reed-Solomon codes: in short
o Discovered by Reed & Solomon in 1959
o Linear codes over GF(2^m)
  o sum: simple binary XOR
  o multiplication and division: use logarithm tables
o Based on polynomial interpolation
o Practical implementation with a Vandermonde matrix
  o any k columns of a k×n Vandermonde matrix (built on distinct field elements) form an invertible k×k matrix
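The table-based arithmetic can be sketched as follows for GF(2^8). The primitive polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) and the generator element 2 are assumptions of this sketch (a common choice); the slides do not say which polynomial was used. The names `EXP`, `LOG`, `gf_add`, `gf_mul` and `gf_div` are illustrative.

```python
PRIM_POLY = 0x11D  # assumed primitive polynomial: x^8 + x^4 + x^3 + x^2 + 1

EXP = [0] * 510    # EXP[i] = 2^i in the field, doubled so lookups need no modulo
LOG = [0] * 256    # LOG[x] = i such that 2^i = x (LOG[0] is unused)
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1                # multiply by the generator 2
    if x & 0x100:          # degree-8 overflow: reduce modulo the polynomial
        x ^= PRIM_POLY
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gf_add(a, b):
    """Addition (and subtraction) is a plain XOR."""
    return a ^ b

def gf_mul(a, b):
    """Multiplication: add the logarithms."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    """Division: subtract the logarithms."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[LOG[a] - LOG[b] + 255]

print(gf_mul(2, 128))   # 29, i.e. 0x100 reduced by 0x11D
```

Both tables hold a few hundred one-byte entries, which is why they fit easily in cache for GF(2^8).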
Reed-Solomon codes: encoding
o Matrix-vector multiplication: X × G = Y
  o X: source vector (k source symbols)
  o G: generator matrix (k × n Vandermonde matrix)
  o Y: encoded vector (n encoded symbols)
o Complexity: O(k^2) operations
Reed-Solomon codes: decoding
o Solve a linear system: X × G' = Y'
  o X: source vector (k source symbols), the unknown
  o G': k×k submatrix of G (invertible)
  o Y': received vector (k received symbols)
o Good Vandermonde property: any k×k submatrix is invertible
  o k encoding symbols are enough to decode
  o decoding overhead = 0; in other words, RS codes are MDS
o Complexity: O(k^3)
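Encoding and decoding can be sketched end to end for one-byte symbols. This is a minimal illustration under several assumptions: GF(2^8) with the primitive polynomial 0x11D, column j of G built on the field element 2^j (so n ≤ 255), and a plain Gauss-Jordan elimination for decoding. The function names are hypothetical, and production codecs are organised quite differently for speed.

```python
PRIM_POLY = 0x11D                  # assumed primitive polynomial for GF(2^8)
EXP, LOG = [0] * 510, [0] * 256
x = 1
for i in range(255):
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= PRIM_POLY
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def inv(a):
    return EXP[255 - LOG[a]]       # multiplicative inverse, a must be non-zero

def column(j, k):
    """Column j of the k x n Vandermonde matrix: a^0, a^1, ..., a^(k-1), with a = 2^j."""
    col, p = [], 1
    for _ in range(k):
        col.append(p)
        p = mul(p, EXP[j])
    return col

def encode(source, n):
    """Y = X x G: each encoded symbol is the dot product of X with one column of G."""
    k = len(source)
    encoded = []
    for j in range(n):
        y = 0
        for s, g in zip(source, column(j, k)):
            y ^= mul(s, g)
        encoded.append(y)
    return encoded

def decode(received, k):
    """Solve X x G' = Y' from k (index, value) pairs by Gauss-Jordan elimination."""
    rows = [column(j, k) + [y] for j, y in received[:k]]
    for c in range(k):
        p = next(r for r in range(c, k) if rows[r][c] != 0)   # a pivot always exists
        rows[c], rows[p] = rows[p], rows[c]
        f = inv(rows[c][c])
        rows[c] = [mul(f, v) for v in rows[c]]
        for r in range(k):
            if r != c and rows[r][c] != 0:
                f = rows[r][c]
                rows[r] = [v ^ mul(f, w) for v, w in zip(rows[r], rows[c])]
    return [rows[i][k] for i in range(k)]

source = [10, 200, 33, 7]                    # k = 4 source symbols
encoded = encode(source, 10)                 # n = 10 encoded symbols
survivors = [(j, encoded[j]) for j in (9, 2, 5, 7)]   # any 4 of the 10 are enough
assert decode(survivors, 4) == source
```

Note that with a plain Vandermonde generator matrix the code is not systematic: the source symbols do not appear as such among the encoded symbols.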
Reed-Solomon codes: summary
Ideal (MDS) codes
o decoding overhead = 0
o decoding possible as soon as k symbols are received
… but limited scalability
o n ≤ 255: GF(2^8) is sufficient
  o fast operations over GF(2^8) (small logarithm tables)
  o decoding speed = a few tens of Mbps
o n > 255: use GF(2^16) or more
  o logarithm tables too large, cannot fit in cache
  o decoding speed falls to a few Mbps
LDPC codes: in short
o "Low Density Parity Check" (LDPC)
  o linear block codes
  o sparse parity check matrix
o discovered by Gallager in the 1960s, rediscovered in the mid-1990s
o In general, encoding requires solving a linear system: O(k^3)
  o but high-performance, lightweight variants exist
o in the remainder we focus on a binary LDPC code
  o based on XOR operations
LDPC codes
LDPC-staircase codes (RFC 5170)
o a simple (trivial) parity check matrix structure

      Source symbols    Parity symbols
      S1 S2 S3 S4 S5    P1 P2 P3 P4 P5
       0  0  1  1  0     1  0  0  0  0
       1  0  0  1  1     1  1  0  0  0
       1  1  1  0  0     0  1  1  0  0
       0  1  0  1  1     0  0  1  1  0
       1  1  1  0  1     0  0  0  1  1

o each row is a constraint; for example, row 2 gives S1 ⊕ S4 ⊕ S5 ⊕ P1 ⊕ P2 = 0
o a.k.a. double diagonal or Repeat Accumulate codes
o high encoding speed (encoding is trivial)
o recovery capabilities can be made close to those of ideal codes
LDPC codes
Encoding

      S1 S2 S3 S4 S5    P1 P2 P3 P4 P5
       0  0  1  1  0     1  0  0  0  0    S3 ⊕ S4 ⊕ P1 = 0
       1  0  0  1  1     1  1  0  0  0    S1 ⊕ S4 ⊕ S5 ⊕ P1 ⊕ P2 = 0
       1  1  1  0  0     0  1  1  0  0    S1 ⊕ S2 ⊕ S3 ⊕ P2 ⊕ P3 = 0
       0  1  0  1  1     0  0  1  1  0    S2 ⊕ S4 ⊕ S5 ⊕ P3 ⊕ P4 = 0
       1  1  1  0  1     0  0  0  1  1    S1 ⊕ S2 ⊕ S3 ⊕ S5 ⊕ P4 ⊕ P5 = 0

o each row yields one parity symbol from the source symbols and the previous parity symbol
o Linear complexity: O(k)
Decoding
o solve a system of linear equations
o several techniques are feasible…
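The staircase encoding can be sketched directly from the example matrix of this slide. Symbols are modelled here as Python integers combined with XOR, which is an assumption of the sketch (real symbols are byte buffers); `H1` and `staircase_encode` are illustrative names.

```python
# Source-symbol part of the example parity check matrix, one row per constraint.
H1 = [
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 1],
]

def staircase_encode(source, h1):
    """Compute the parity symbols row by row.

    Thanks to the staircase (double diagonal), row i only involves P(i) and P(i-1),
    so P(i) is the XOR of P(i-1) and the source symbols selected by row i.
    """
    parity = []
    previous = 0
    for row in h1:
        p = previous
        for bit, s in zip(row, source):
            if bit:
                p ^= s
        parity.append(p)
        previous = p
    return parity

print(staircase_encode([1, 2, 4, 8, 16], H1))   # [12, 21, 18, 8, 31]
```

The work is proportional to the number of ones in the matrix, which stays linear in k because the matrix is sparse.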
LDPC codes
Sol.1: Iterative Decoding (ID)
o If an equation has only one unknown variable, that variable is equal to the sum (XOR) of the others. Repeat…
o Efficient thanks to the sparseness of the parity check matrix
o Pros: low complexity (linear, O(k))
  o low CPU load and high sustainable bandwidth
o Cons: suboptimal in terms of correction capabilities
  o some full-rank systems cannot be solved

  Overhead with ID (k=1000, N1=3):
  code rate       average overhead    overhead for a failure probability ≤ 10^-4
  2/3 (≈ 0.67)     9.99 %             13.93 %
  2/5 (= 0.4)     17.13 %             22.91 %
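The iterative rule can be sketched on the 5 × 10 example matrix of the LDPC-staircase slides. Erased symbols are marked with `None`, symbols are modelled as integers, and the loop simply rescans all equations until nothing changes; a linear-time decoder would track the number of unknowns per equation instead. All names are illustrative.

```python
# Example parity check matrix: columns S1..S5 then P1..P5.
H = [
    [0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
]

def iterative_decode(symbols, h):
    """Repeatedly solve every equation that has exactly one unknown symbol."""
    symbols = list(symbols)
    progress = True
    while progress:
        progress = False
        for row in h:
            cols = [j for j, bit in enumerate(row) if bit]
            unknown = [j for j in cols if symbols[j] is None]
            if len(unknown) == 1:
                value = 0
                for j in cols:
                    if symbols[j] is not None:
                        value ^= symbols[j]
                symbols[unknown[0]] = value   # XOR of the other symbols of the row
                progress = True
    return symbols

codeword = [1, 2, 4, 8, 16, 12, 21, 18, 8, 31]

# Two erasures (S3 and P2): ID recovers both.
received = [1, 2, None, 8, 16, 12, None, 18, 8, 31]
assert iterative_decode(received, H) == codeword

# Five erasures (S3, P1, P2, P3, P4): every equation has two or more unknowns,
# so ID makes no progress although the linear system is full rank.
stuck = [1, 2, None, 8, 16, None, None, None, None, 31]
assert iterative_decode(stuck, H) == stuck
```

The second case is an instance of the drawback listed above: a solvable system that iterative decoding cannot solve.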
LDPC codes
Sol.2: Maximum Likelihood (ML) decoding
o Solve a linear system (Gaussian elimination, LU decomposition…): xA = b
  o x: missing symbols
  o A: submatrix of the generator matrix
  o b: information of the received symbols
o Excellent erasure correction capabilities

  Overhead with ML (k=1000, N1=5):
  code rate       average overhead    overhead for a failure probability ≤ 10^-4
  2/3 (≈ 0.67)    0.63 %              2.21 %
  2/5 (= 0.4)     2.04 %              4.41 %

o High complexity: O(k^3)
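ML decoding over erasures amounts to Gaussian elimination over GF(2). The sketch below works on the parity check equations of the example LDPC-staircase matrix rather than on the generator-matrix form xA = b shown on the slide; that substitution is a choice made for this illustration. Symbols are modelled as integers, `None` marks an erasure, and the decoder is all-or-nothing: it returns `None` when the system is rank deficient.

```python
# Example parity check matrix: columns S1..S5 then P1..P5.
H = [
    [0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
]

def ml_decode(symbols, h):
    """Recover all erased symbols by Gauss-Jordan elimination over GF(2)."""
    symbols = list(symbols)
    unknown = [j for j, s in enumerate(symbols) if s is None]
    # One equation per row: coefficients of the unknowns, and the XOR of the known symbols.
    system = []
    for row in h:
        rhs = 0
        for j, bit in enumerate(row):
            if bit and symbols[j] is not None:
                rhs ^= symbols[j]
        system.append(([row[j] for j in unknown], rhs))
    pivot_row = 0
    for c in range(len(unknown)):
        p = next((r for r in range(pivot_row, len(system)) if system[r][0][c]), None)
        if p is None:
            return None            # rank deficient: decoding fails
        system[pivot_row], system[p] = system[p], system[pivot_row]
        pivot_coeffs, pivot_rhs = system[pivot_row]
        for r in range(len(system)):
            if r != pivot_row and system[r][0][c]:
                coeffs, rhs = system[r]
                system[r] = ([a ^ b for a, b in zip(coeffs, pivot_coeffs)], rhs ^ pivot_rhs)
        pivot_row += 1
    for i, j in enumerate(unknown):
        symbols[j] = system[i][1]  # row i now reads: unknown i = right-hand side
    return symbols

codeword = [1, 2, 4, 8, 16, 12, 21, 18, 8, 31]
# S3, P1, P2, P3 and P4 erased: each equation has at least two unknowns,
# yet the system is full rank and elimination recovers everything.
received = [1, 2, None, 8, 16, None, None, None, None, 31]
assert ml_decode(received, H) == codeword
```

This dense elimination ignores sparsity; it is meant to show the principle, not the cost profile of an optimised decoder.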
LDPC codes
Sol.3: Hybrid ID/ML scheme
o Hybrid decoder
  o start decoding with ID (fast)
  o finish with ML if necessary (optimal)
o excellent erasure correction capabilities…
o … while remaining very fast
LDPC codes
Decoding speed of the hybrid decoder
o LDPC-staircase (N1=5), code rate 2/3, k=1,000
o Reed-Solomon over GF(2^8)
[Figure: sustainable decoding speed (Mbps) versus loss probability (%). Annotations: while ID is sufficient, 1.7 Gbps, 32.4 times faster than RS; when ML is needed more and more often, still 500 Mbps, 10.2 times faster; with RS: 54 Mbps.]
Application to distributed storage
Using replication:
• A file partitioned into 8 blocks
• Each block is replicated 4 times
[Figure: the 32 block copies (blocks 1 to 8, four copies each) spread over the storage nodes between Client_1 and Client_2.]
Can tolerate up to 3 failures
Application to distributed storage
Using erasure codes:
• A file encoded into 32 blocks: 8 source blocks, 24 repair blocks
[Figure: the 32 blocks (source blocks 1 to 8, repair blocks A to X) spread over the storage nodes, 4 blocks per node, between Client_1 and Client_2.]
Can tolerate up to 6 failures, since 8 blocks are enough to decode
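The two tolerance figures can be reproduced with a small calculation. The sketch assumes 8 storage nodes holding 4 blocks each (inferred from the 32 stored blocks in both layouts), that the copies of a given block sit on distinct nodes, and that the erasure code is MDS so that any 8 blocks suffice. The function names are hypothetical.

```python
import math

def tolerated_failures_replication(replicas):
    """With each block copied on `replicas` distinct nodes, one copy must survive."""
    return replicas - 1

def tolerated_failures_mds(nodes, blocks_per_node, k):
    """With an MDS code, any k blocks decode the file, so we only need
    enough surviving nodes to hold k blocks."""
    nodes_needed = math.ceil(k / blocks_per_node)
    return nodes - nodes_needed

print(tolerated_failures_replication(4))    # 3
print(tolerated_failures_mds(8, 4, 8))      # 6
```

Both layouts store 32 blocks, so the erasure-coded layout doubles the number of tolerated node failures at the same storage cost.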
Conclusion
Erasure codes
o add redundancy to combat symbol erasures
Reed-Solomon codes
o ideal codes (MDS), but inefficient for large objects
LDPC codes
o can encode large objects
o correction capabilities close to MDS
o high encoding and decoding speed
Questions?