An Explicit, Coupled-Layer Construction of a High-Rate MSR Code
Birenjith Sasidharan, Myna Vajha and P. Vijay Kumar
Indian Institute of Science, Bangalore
Dagstuhl Seminar: Coding Theory in the Time of Big Data, August 7-12, 2016
Regenerating Codes - Formal Definition
Parameters: ((n, k, d), (α, β), B, F_q)
[Figure: n storage nodes, each of capacity α. A data collector connects to any k nodes; a failed node is replaced by downloading β symbols from each of d helper nodes.]
◮ Data to be recovered by connecting to any k of the n nodes
◮ Nodes to be repaired by connecting to any d nodes, downloading β symbols from each node (dβ << file size B)
◮ Focus here is on exact repair
The Storage-Repair Bandwidth Tradeoff
The upper bound on file size:
\[ B \le \sum_{i=1}^{k} \min\{\alpha, (d - i + 1)\beta\} \]
(multiple (α, β) pairs can achieve the bound)
◮ Tradeoff curve drawn for fixed (k, d), B.
◮ Extreme points: MSR & MBR
◮ MSR = Minimum Storage Regenerating: α = (d − k + 1)β
◮ Focus here is on the MSR point
[Figure: tradeoff curve for (k, d) = (120, 129), B = 725360]
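As a small numerical check of the bound above, the sketch below evaluates its right-hand side and the MSR choice of α. The function names are illustrative and not from the talk; the parameters used are those of the (n = 6, k = 4, d = 5) example that appears later in the slides.

```python
def file_size_bound(k, d, alpha, beta):
    """Right-hand side of B <= sum_{i=1}^{k} min(alpha, (d - i + 1) * beta)."""
    return sum(min(alpha, (d - i + 1) * beta) for i in range(1, k + 1))


def msr_alpha(k, d, beta):
    """Node size at the MSR point: alpha = (d - k + 1) * beta."""
    return (d - k + 1) * beta


k, d, beta = 4, 5, 4
alpha = msr_alpha(k, d, beta)
bound = file_size_bound(k, d, alpha, beta)
# At the MSR point every term of the sum equals alpha, so the bound is k * alpha.
print(alpha, bound, bound == k * alpha)  # 8 32 True
```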
MSR Codes
MSR Codes with All-Symbol Node Repair.
  Code               n    k        d        α             β            Explicit
  Product Matrix     n    k        ≥ 2k−2   d−k+1         1            Y
  Hadamard           n    n−2      n−1      2^(k+1)       2^k          Y
  Mod Zig-Zag        n    k        n−1      (n−k)^(k+1)   (n−k)^k      N
  Cadambe et al.     n    k        d        → ∞           → ∞          N
  Sasidharan et al.  qt   q(t−1)   n−1      q^t           q^(t−1)      N
  Rawat et al.       qt   q(t−1)   d        q^t           q^(t−1)      N
MSR Codes with Systematic-Node Repair.
  Code               n    k                     d        α             β             Explicit
  MISER              n    k                     ≥ 2k−1   d−k+1         1             Y
  Zig-Zag            n    k                     n−1      (n−k)^k       (n−k)^(k−1)   N
  Poly MDS           n    (2/3)n ≤ k ≤ n (?)    n−1      O(k^2) (?)    α/(n−k)       Y
  Goparaju et al.    n    k                     d        (n−k)^k (?)   α/(n−k)       N
1. The Hadamard-design-based MSR code paper by Dimakis et al. also gives a non-explicit MSR code, valid with high probability, for (n, k, d, α, β) = (n, k, n−1, (n−k)^(k+1), (n−k)^k).
2. Zig-Zag codes have an explicit coefficient assignment for n−k = 2 and n−k = 3.
Recent Constructions
  Code            n    k          d     α             β             Sys/All   Explicit
  Raviv et al.    n    n−2, n−3   n−1   (n−k)^(k/r)   α/(n−k)       Sys       Y
  Ye & Barg       n    k          n−1   (n−k)^n       (n−k)^(n−1)   All       Y
  Ye & Barg       qt   q(t−1)     n−1   q^t           q^(t−1)       All       Y
  Coupled Layer   qt   q(t−1)     n−1   q^t           q^(t−1)       All       Y
Scalar MDS Code with efficient repair bandwidth.
  Code               n   k   d     α   β                         Sys/All   Explicit
  Guruswami et al.   n   k   n−1   1   log_2((n−1)/(n−k)) bits   All       Y
Vector MDS Codes with efficient repair bandwidth.
  Code                       n    k        d     α                     β                   Sys/All   Explicit
  Piggybacked RS (m ≥ 1)     n    k        d     2m, 4m, (2(n−k)−3)m   β (non-unif)        All       Y
  Guruswami et al. (p ≥ 1)   qt   q(t−1)   n−1   (n−k)^p               (1 + 1/p) α/(n−k)   All       Y
Parameters of Construction: (n, k, d, (α, β))
We adopt the same parameters first introduced in [1]. For t ≥ 2:
  n = tq
  k = (t − 1)q
  d = n − 1
  r := n − k = q
  α = q^t
  β = q^(t−1)
  Rate = (t − 1)/t ≥ 1/2
(field size was large to accommodate data collection)
[1] B. Sasidharan, G. K. Agarwal, P. Vijay Kumar, “A High-Rate MSR Code With Polynomial Sub-Packetization Level,” ISIT 2015
(Present) Coupled-Layer Construction Code Parameters
  Code Parameter     Value
  Block length n     qt, q ≥ 2, t ≥ 2
  r := n − k         q
  k                  qt − q
  d                  n − 1
  α                  q^t
  β                  q^(t−1)
  Field size Q       Q ≥ n (caution: Q, not q, is the field size!)
Sub-packetization relative to the lower bound: α / (lower bound) = r^(n/r) / r^(k/r) = r
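The parameter table above can be turned into a small helper that derives every code parameter from (q, t). This is only a sketch: the function name and the dictionary layout are illustrative choices, not part of the construction.

```python
from fractions import Fraction


def coupled_layer_params(q, t):
    """Derive the coupled-layer code parameters from q >= 2 and t >= 2."""
    if q < 2 or t < 2:
        raise ValueError("the construction requires q >= 2 and t >= 2")
    n = q * t
    k = q * (t - 1)
    return {
        "n": n,
        "k": k,
        "d": n - 1,
        "r": n - k,            # number of parity nodes, equal to q
        "alpha": q ** t,       # symbols stored per node
        "beta": q ** (t - 1),  # symbols downloaded from each helper node
        "rate": Fraction(k, n),
        "min_field_size": n,   # the slides require Q >= n
    }


print(coupled_layer_params(2, 3))
```

For q = 2, t = 3 this gives n = 6, k = 4, d = 5, α = 8, β = 4 and rate 2/3, the example used in the rest of the talk.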
The Parity-Check Equations of [1]: Difficulty in Data Recovery
Row-Sum Parity-Checks: for z ∈ Z_q^t,
\[ \sum_{y \in [t]} \sum_{x \in \mathbb{Z}_q} A(x, y; z) = 0. \]
Jump Parity-Checks (for each 1 ≤ λ ≤ q − 1 and z ∈ Z_q^t):
\[ \sum_{y \in [t]} \sum_{x \in \mathbb{Z}_q} \theta^{\lambda}_{(x,y)} A(x, y; z) + \sum_{y \in [t]} c \, A(z_y, y; \underbrace{z - \lambda e_y}_{\text{jump in $y$th position by $\lambda$}}) = 0. \]
◮ The coupling with other planes depended on λ.
◮ The construction was non-explicit; the coefficient c was shown to exist in a large enough field.
The Parity-Check Equations of [1]: Resolving the Issue
◮ The question was to find an explicit assignment for c.
◮ We adopted a sequential decoding approach.
◮ This led to the need to prove the invertibility of data-recovery matrices D of the following form (but with a larger number of sub-blocks): copies of the Vandermonde block
\[ V = \begin{bmatrix} 1 & 1 & 1 \\ \theta_1 & \theta_2 & \theta_3 \\ \theta_1^2 & \theta_2^2 & \theta_3^2 \end{bmatrix} \]
on the diagonal, with the coefficient c appearing as isolated entries in the off-diagonal blocks.
This proved difficult, but it was resolved by altering the amount of coupling, leading instead to a matrix D′ with the same diagonal blocks V, whose off-diagonal blocks carry the scaled Vandermonde columns (u, uθ_2, uθ_2^2)^T and (u, uθ_1, uθ_1^2)^T.
Coupled-Layer MSR Code: Example
Parameters Chosen for Illustration
  q   t   n   k   d   α         β         Rate   Field size Q
  2   3   6   4   5   2^3 = 8   2^2 = 4   2/3    Q ≥ 6
Shortening for Other Parameters
The code can be shortened to achieve other parameters:
(n, k, d), (α, β) ⇒ (n − δ, k − δ, d − δ), (α, β)
The Data Cube (q = 2, t = 3)
◮ The data cube is a 3-D array A(x, y; z) of code symbols.
◮ (x, y) ∈ (Z_q × [t]) is used to identify a node.
◮ z = (z_1, z_2, ..., z_t) is used to index a plane.
[Figure: data cube for q = 2, t = 3, with axes x ∈ {0, 1}, y ∈ {1, 2, 3} and z. The plane z = (1, 0, 0) is identified by the placement of red dots.]
The data cube has 6 nodes, each with 2^3 = 8 symbols.
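The indexing of the data cube can be made concrete with a few lines of Python. This is a sketch under one assumption that the slide only shows pictorially: the dots of plane z are taken to sit at the nodes (z_y, y), one per column y. The names `nodes`, `planes` and `dots` are illustrative.

```python
from itertools import product

q, t = 2, 3

# Nodes are pairs (x, y) with x in Z_q and y in [t] = {1, ..., t}.
nodes = [(x, y) for y in range(1, t + 1) for x in range(q)]

# Planes are vectors z = (z_1, ..., z_t) in Z_q^t; each node holds one symbol per plane.
planes = list(product(range(q), repeat=t))


def dots(z):
    """Assumed dot placement: one dot per column y, at row x = z_y."""
    return [(z[y - 1], y) for y in range(1, len(z) + 1)]


print(len(nodes), len(planes))  # 6 8
print(dots((1, 0, 0)))          # [(1, 1), (0, 2), (0, 3)]
```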
Parity-Check Equations and the Pairwise Coupling
For z ∈ Z_q^t and 0 ≤ λ ≤ q − 1, we have that
\[ \sum_{y \in [t]} \sum_{x \in \mathbb{Z}_q} \theta^{\lambda}_{(x,y)} B(x, y; z) = 0, \]
and the code symbols A(x, y; z) are given from the B(x, y; z) by:
\[ \begin{bmatrix} A(x, y; z) \\ A(z_y, y; x, z_{\sim y}) \end{bmatrix} = \begin{bmatrix} 1 & u \\ u & 1 \end{bmatrix}^{-1} \begin{bmatrix} B(x, y; z) \\ B(z_y, y; x, z_{\sim y}) \end{bmatrix}. \]
Here (x, z_{∼y}) denotes the vector obtained by replacing the yth symbol of z by x. Equivalently,
\[ \begin{bmatrix} B(x, y; z) \\ B(z_y, y; x, z_{\sim y}) \end{bmatrix} = \begin{bmatrix} 1 & u \\ u & 1 \end{bmatrix} \begin{bmatrix} A(x, y; z) \\ A(z_y, y; x, z_{\sim y}) \end{bmatrix}. \]
An Example: q = 2, t = 3 (n = 6, k = 4, d = 5, α = 8, β = 4)
[Figure: data cube with axes x, y and z]
◮ For every plane z, there are linear parity-check equations binding all symbols on z, and symbols from certain other planes.
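Coupling and Decoupling of Symbols Across Planes
[Figure: data cube (axes x, y, z) with a coupled pair of symbols A_1, A_2 marked]
Coupling:
\[ \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} = \begin{bmatrix} 1 & u \\ u & 1 \end{bmatrix} \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} \]
Decoupling:
\[ \begin{bmatrix} A_1 \\ A_2 \end{bmatrix} = \begin{bmatrix} 1 & u \\ u & 1 \end{bmatrix}^{-1} \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} \]
Coupling of symbols: (A_1, A_2) are a coupled pair.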
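The coupling and decoupling transforms above can be checked numerically. The sketch below works over the prime field GF(7) with u = 2; both values are assumptions made for illustration (the slides only require a field of size Q ≥ 6, and the 2 × 2 matrix is invertible whenever u² ≠ 1).

```python
P = 7  # assumed prime field size, chosen only so that Q >= 6
U = 2  # assumed coupling coefficient; needs U * U != 1 (mod P)


def inv(a):
    """Multiplicative inverse in GF(P) via Fermat's little theorem."""
    return pow(a % P, P - 2, P)


def couple(a1, a2):
    """[B1, B2]^T = [[1, u], [u, 1]] [A1, A2]^T."""
    return (a1 + U * a2) % P, (U * a1 + a2) % P


def decouple(b1, b2):
    """Inverse transform: [[1, u], [u, 1]]^{-1} = (1 - u^2)^{-1} [[1, -u], [-u, 1]]."""
    s = inv(1 - U * U)
    return s * (b1 - U * b2) % P, s * (b2 - U * b1) % P


b1, b2 = couple(3, 5)
print((b1, b2), decouple(b1, b2))  # (6, 4) (3, 5)
```

Each direction costs one multiplication and one addition per symbol, up to the common scaling by (1 − u²)⁻¹ in the decoupling direction.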
Encoding the Coupled-Layer MSR Code (q = 2, t = 3)
[Figure: encoder block diagram. data 0 ... data 7 → RS_ENCODER blocks (n = 6, k = 4; α = 8 instances) → b-code → coupling engine → code 0 ... code 7. The numbers 1 to 12 label the coupled pairs in the cube.]
◮ Encoding involves α = 8 parallel calls to a [6, 4] Reed-Solomon encoder.
◮ Number of pairs of symbols that are coupled = t(q − 1)α/2 = 12.
◮ Coupling involves 1 multiplication and 1 addition per code symbol.
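The encoding pipeline above can be sketched end to end for q = 2, t = 3. Several choices below are assumptions made only for illustration and are not taken from the slides: the field GF(7), the evaluation points θ = 1, ..., 6, the coefficient u = 2, treating the first four nodes as data nodes, and leaving the dotted symbols (x = z_y) uncoupled. The sketch does not check the MDS property of the resulting code.

```python
from itertools import product

P, U = 7, 2   # assumed field GF(7) and coupling coefficient (U * U != 1 mod P)
Q, T = 2, 3   # q and t of the running example
NODES = [(x, y) for y in range(1, T + 1) for x in range(Q)]
PLANES = list(product(range(Q), repeat=T))
THETA = {node: i + 1 for i, node in enumerate(NODES)}  # assumed distinct points 1..6
DATA_NODES, PARITY_NODES = NODES[:4], NODES[4:]


def inv(a):
    return pow(a % P, P - 2, P)


def rs_encode_plane(data):
    """Extend 4 data symbols to 6 B symbols with sum(theta^l * B) = 0 for l = 0, 1."""
    b = dict(zip(DATA_NODES, data))
    s0 = sum(b.values()) % P
    s1 = sum(THETA[v] * b[v] for v in DATA_NODES) % P
    p1, p2 = PARITY_NODES
    b[p1] = (THETA[p2] * s0 - s1) * inv(THETA[p1] - THETA[p2]) % P
    b[p2] = (-s0 - b[p1]) % P
    return b


def partner(x, y, z):
    """Companion of symbol (x, y; z): node (z_y, y) in the plane (x, z~y)."""
    return (z[y - 1], y), z[:y - 1] + (x,) + z[y:]


def encode(data_planes):
    """Map {plane: 4 data symbols} to dictionaries A and B keyed by (node, plane)."""
    B = {}
    for z in PLANES:
        for node, sym in rs_encode_plane(data_planes[z]).items():
            B[node, z] = sym
    s = inv(1 - U * U)
    A = {}
    for (x, y), z in B:
        if x == z[y - 1]:
            A[(x, y), z] = B[(x, y), z]  # dotted symbol, assumed uncoupled
        else:
            pn, pz = partner(x, y, z)
            A[(x, y), z] = s * (B[(x, y), z] - U * B[pn, pz]) % P
    return A, B


data = {z: [(i + j) % P for j in range(4)] for i, z in enumerate(PLANES)}
A, B = encode(data)
pairs = sum(1 for (x, y), z in A if x != z[y - 1]) // 2
print(pairs)  # 12
```

The pair count matches t(q − 1)α/2 = 12: each of the 8 planes has 3 undotted symbols, and every coupled pair is counted twice.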
Repairing the Coupled-Layer MSR Code
[Figure: data cube (axes x, y, z) with the failed node marked and the repair planes shaded pink]
◮ The node (1, 1) on the extreme left has failed.
◮ Data from the pink planes are transmitted during repair.
◮ Repair can be done in q^(t−1) = 4 parallel operations, each involving a (q × q) = (2 × 2) matrix inversion.
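The set of planes used in repair can be enumerated directly. The sketch below assumes that the pink planes are exactly those planes z having a dot at the failed node, that is, z_{y0} = x0 for the failed node (x0, y0); the slide shows this only pictorially, and the function name is illustrative.

```python
from itertools import product


def repair_planes(q, t, failed):
    """Planes z in Z_q^t whose y0-th coordinate equals x0 (assumed repair planes)."""
    x0, y0 = failed
    return [z for z in product(range(q), repeat=t) if z[y0 - 1] == x0]


planes = repair_planes(2, 3, (1, 1))
print(planes)       # [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
print(len(planes))  # 4
```

There are q^(t−1) = 4 such planes, which agrees with β = 4 symbols downloaded from each helper node.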
Repairing the Coupled-Layer MSR Code
[Figure: repair block diagram with a bank of RS_DECODER blocks; nodes are labeled 1 to 6 and symbol indices 0 to 7]
1. Repair of node 1: node 1 is the node to be repaired.
2. The repair operation can be performed as β = 4 instances of RS decoding in parallel.
Data Collection and Erasures
1. The task of data collection is to recover the data from k nodes.
2. Equivalently, one must recover the data following n − k = q erasures.
3. q = 2 in our example.
We assume a given erasure pattern E of q nodes.
A Sequential Approach to Data Collection
Erasures are indicated by an unfilled circle. The intersection score σ of a plane for a given erasure pattern E is the number of dots in the plane that correspond to erased nodes.
[Figure: three example planes (x ∈ {0, 1}, y ∈ {1, 2, 3}) with σ = 0, 1, 2 respectively]
1. Decode erased symbols in a plane-by-plane manner.
2. The planes are selected in order of increasing intersection score σ.
3. Each plane is decoded using a scalar MDS (RS) code decoder.
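The intersection score defined above is easy to compute once the dot positions are fixed. The sketch assumes, as in the data-cube picture, that plane z has its dots at the nodes (z_y, y); the erasure pattern E used here is hypothetical and is not the one drawn on the slide.

```python
def intersection_score(z, erased):
    """Number of dots (z_y, y) of plane z that fall on erased nodes."""
    return sum(1 for y, zy in enumerate(z, start=1) if (zy, y) in erased)


E = {(0, 1), (0, 2)}  # hypothetical erasure pattern of q = 2 nodes
for z in [(1, 1, 0), (0, 1, 0), (0, 0, 0)]:
    print(z, intersection_score(z, E))
# (1, 1, 0) 0
# (0, 1, 0) 1
# (0, 0, 0) 2
```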
A Sequential Decoding Algorithm
1. Input the set of erased nodes E.
2. Compute the maximum intersection score σ_max.
3. Label planes with their intersection scores.
4. Set s = 0.
5. Decode symbols (a mixture of A's and B's) from each plane Z such that σ(Z, E) = s by invoking SC-MDS-DEC and using previously decoded symbols.
6. Set s = s + 1.
7. If s ≤ σ_max (YES), return to step 5; otherwise (NO), transform B symbols to A symbols and EXIT.
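The control flow of the algorithm above amounts to grouping planes by intersection score and visiting the groups in increasing order. The sketch below builds only that schedule; the per-plane MDS decoding (SC-MDS-DEC) and the B-to-A transformation are not implemented. The dot placement (z_y, y), the function name and the erasure pattern are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def decoding_schedule(q, t, erased):
    """Return a list whose s-th entry holds the planes with intersection score s."""
    by_score = defaultdict(list)
    for z in product(range(q), repeat=t):
        score = sum(1 for y, zy in enumerate(z, start=1) if (zy, y) in erased)
        by_score[score].append(z)
    sigma_max = max(by_score)
    # Planes are visited for s = 0, 1, ..., sigma_max, as in the loop of the algorithm.
    return [by_score[s] for s in range(sigma_max + 1)]


schedule = decoding_schedule(2, 3, {(0, 1), (0, 2)})  # hypothetical erasure pattern
print([len(group) for group in schedule])  # [2, 4, 2]
```

Within one group the planes do not depend on each other in this schedule, so a decoder could process them in any order before moving on to the next score.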