Hierarchical Codes: How to Make Erasure Codes Attractive for Peer-to-Peer Storage Systems
Alessandro Duminuco and Ernst Biersack, EURECOM, Sophia Antipolis, France (Best Paper Award, P2P'08)
Presented by: Amir H. Payberah (amir@sics.se)
Hierarchical Code, 11 Nov. 2008
What's the Problem?
What Is the Problem? Does file backup fit the P2P model?
Churn and Redundancy • The challenge in the P2P model is to provide reliable storage under churn. • The key solution is to add redundancy to the data.
The Basic Solution: Replication • With 4 replicas, even if 3 peers are offline we still have the file. • Every file consumes 4 times its size in storage!!
A Better Solution: Coding • Any k fragments are sufficient to reconstruct the file: we can sustain any h losses. • Every file consumes (k+h)/k times its size in storage: if k = 6 and h = 3, (k+h)/k = 1.5 ... instead of 4 for replication, with the same tolerance of 3 losses!! [Figure: coding turns the file into k + h fragments, which are then disseminated to peers]
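The storage arithmetic above can be checked with a tiny sketch. It assumes, as on the slides, that each peer holds one replica or one fragment, so surviving h offline peers takes h + 1 replicas but only h extra coded fragments; the function names are hypothetical.

```python
from fractions import Fraction

def replication_factor(losses: int) -> int:
    """Replicas needed to survive `losses` offline peers: one copy must remain."""
    return losses + 1

def coding_factor(k: int, h: int) -> Fraction:
    """Storage used by a code that splits the file into k fragments and adds h more."""
    return Fraction(k + h, k)

if __name__ == "__main__":
    # Both configurations tolerate 3 losses.
    print(replication_factor(3), coding_factor(6, 3))  # 4 3/2
```

Using `Fraction` keeps the ratio exact, so 9/6 is reported as 3/2 rather than a rounded float.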
Repair Communication Cost • Replication: a repair copies a single existing replica. • Coding: a repair means combining k fragments into a new one. • To create a single fragment we must transfer k fragments, i.e., the size-equivalent of the whole file!!
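A minimal sketch of the repair traffic, assuming equal-size fragments of |O|/k and a repair that downloads exactly k surviving fragments. The function names and the 600 MB file size are illustrative only.

```python
from typing import Tuple

def coded_repair(file_size: float, k: int) -> Tuple[float, float]:
    """Return (bytes created, bytes downloaded) when one coded fragment is rebuilt."""
    fragment_size = file_size / k
    # The new fragment is a combination of k surviving fragments, so all k must be fetched.
    downloaded = k * fragment_size
    return fragment_size, downloaded

def replica_repair(file_size: float) -> Tuple[float, float]:
    """Return (bytes created, bytes downloaded) when one replica is re-copied."""
    # A new replica is a plain copy of one surviving replica.
    return file_size, file_size

if __name__ == "__main__":
    created, downloaded = coded_repair(600.0, 6)
    print(created, downloaded)  # 100.0 600.0
```

The ratio of downloaded to created data is what the later slides call the repair degree d: 1 for replication and k for coding.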
Storage vs Repair Cost • If we want to sustain 10 losses, the repair cost makes coding unattractive.
Motivation • Can we mitigate the repair cost of coding while retaining storage efficiency?
Efficiency Metrics • Redundancy factor: β = |S| / |O|, where |S| is the size of the stored data and |O| is the size of the original data. • Repair degree d: the ratio of the amount of data read to the amount of new redundant data created.
Efficiency Analysis • Replication (R replicas): β = R, d = 1 • Block replication: β = R, d = 1 • Erasure codes: β = (k + h) / k, d = k
Linear Codes • A specific implementation of erasure codes. • $f_i$: i-th original fragment • $b_i$: i-th stored fragment • $c_{i,j}$: coefficients
$$b_i = \begin{cases} f_i, & i \le k \\ \sum_{j=1}^{k} c_{i,j}\, f_j, & k < i \le k + h \end{cases}$$
• [Figure: a (k,h)-code with k = 4 original fragments f1 to f4 and k + h = 7 stored fragments b1 to b7] • Any 4 of these 7 fragments can reconstruct the original file if the coefficient vectors of every 4 of them are linearly independent (we will come back to this later). • Repair degree d = k
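The encoding rule above can be sketched as follows. For readability this sketch assumes arithmetic in the prime field GF(257) rather than the GF(2^q) fields used in practice; `P`, `encode` and the random coefficient choice are hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import List, Tuple

P = 257  # hypothetical prime field GF(257); the slides use GF(2^q)

def encode(fragments: List[List[int]], h: int,
           rng: random.Random) -> Tuple[List[List[int]], List[List[int]]]:
    """Systematic linear encoding: b_i = f_i for i <= k, and each of the
    h redundant fragments is sum_j c_{i,j} * f_j (mod P), symbol by symbol."""
    length = len(fragments[0])
    # One row of k random nonzero coefficients per redundant fragment.
    coeffs = [[rng.randrange(1, P) for _ in fragments] for _ in range(h)]
    stored = [list(f) for f in fragments]  # the k original fragments are kept as-is
    for row in coeffs:
        stored.append([sum(c * f[s] for c, f in zip(row, fragments)) % P
                       for s in range(length)])
    return stored, coeffs

if __name__ == "__main__":
    rng = random.Random(1)
    f = [[rng.randrange(256) for _ in range(8)] for _ in range(4)]  # k = 4 fragments
    b, c = encode(f, 3, rng)                                        # h = 3
    print(len(b))  # 7
```

Every redundant fragment depends on all k originals, so regenerating one requires reading k fragments: this is the repair degree d = k.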
Hierarchical Code • Additional fragments can be linear combinations of a subset of the original ones. [Figure: k = 4 original fragments f1 to f4 and k + h = 7 stored fragments b1 to b7] • Not all subsets of 4 fragments are sufficient to reconstruct the file. • The repair cost varies according to which fragments are available (we can have d < k).
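To see why not every set of 4 fragments works, here is a sketch that enumerates all 4-fragment subsets. It assumes the supports suggested by the (4,3) hierarchical construction (b5 combines f1 and f2, b6 combines f3 and f4, b7 combines all four) and coefficients drawn from a large field; under that assumption a subset decodes, with high probability, exactly when its rows admit a perfect matching onto f1 to f4. `SUPPORTS` and the helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Assumed supports: original fragments f1..f4 are indexed 0..3.
SUPPORTS: Dict[str, FrozenSet[int]] = {
    "b1": frozenset({0}), "b2": frozenset({1}),
    "b3": frozenset({2}), "b4": frozenset({3}),
    "b5": frozenset({0, 1}),        # local redundancy for f1, f2
    "b6": frozenset({2, 3}),        # local redundancy for f3, f4
    "b7": frozenset({0, 1, 2, 3}),  # global redundancy over all originals
}

def has_perfect_matching(rows: List[FrozenSet[int]], k: int) -> bool:
    """Kuhn's augmenting-path algorithm: can each row be matched to a distinct
    original fragment? With generic coefficients this is equivalent to the
    k x k coefficient sub-matrix being invertible."""
    owner = [-1] * k  # owner[j] = row currently matched to original fragment j

    def augment(r: int, seen: List[bool]) -> bool:
        for j in rows[r]:
            if not seen[j]:
                seen[j] = True
                if owner[j] == -1 or augment(owner[j], seen):
                    owner[j] = r
                    return True
        return False

    return all(augment(r, [False] * k) for r in range(len(rows)))

def decodable_subsets(k: int = 4) -> Tuple[List[Tuple[str, ...]], int]:
    names = sorted(SUPPORTS)
    subsets = list(combinations(names, k))
    good = [s for s in subsets if has_perfect_matching([SUPPORTS[n] for n in s], k)]
    return good, len(subsets)

if __name__ == "__main__":
    good, total = decodable_subsets()
    print(len(good), "of", total)  # 27 of 35
```

Under these assumptions the 8 failing subsets are exactly those containing {b1, b2, b5} or {b3, b4, b6}: three fragments that only carry information about two originals.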
Comparison [Figure: the traditional linear code and the hierarchical code side by side, each with k = 4 original fragments f1 to f4 and k + h = 7 stored fragments b1 to b7]
Generalizing the Concept • If we take a 64+64 traditional linear code and apply the same idea hierarchically... • If we set up the hierarchy differently, we obtain a different trade-off.
Experiments
Synthetic Data • An event-driven simulator. • They compared a 64+64 Reed-Solomon code (a linear code) with one instance of a 64+64 Hierarchical code. • They generated synthetic peer behavior with exponentially distributed uptimes, downtimes and lifetimes. • As a general rule, the smaller the up-ratio, the higher the number of repairs.
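The churn model described above can be sketched as a per-peer generator. This is only an illustration under stated assumptions: each peer starts online, alternates exponentially distributed up and down periods, and leaves for good at the end of an exponentially distributed lifetime. The function and parameter names are hypothetical; the authors' simulator is not described on the slides.

```python
import random
from typing import List, Tuple

def peer_sessions(mean_up: float, mean_down: float, lifetime: float,
                  rng: random.Random) -> List[Tuple[float, float]]:
    """Online intervals of one synthetic peer that starts online and alternates
    exponentially distributed up and down periods until its lifetime ends."""
    t, sessions = 0.0, []
    while t < lifetime:
        up = rng.expovariate(1.0 / mean_up)
        sessions.append((t, min(t + up, lifetime)))  # clip the last session
        t += up + rng.expovariate(1.0 / mean_down)
    return sessions

def sample_lifetime(mean_life: float, rng: random.Random) -> float:
    """Exponentially distributed lifetime, after which the peer leaves for good."""
    return rng.expovariate(1.0 / mean_life)

def up_ratio(sessions: List[Tuple[float, float]], lifetime: float) -> float:
    """Fraction of the lifetime during which the peer was online."""
    return sum(end - start for start, end in sessions) / lifetime

if __name__ == "__main__":
    rng = random.Random(42)
    life = sample_lifetime(1000.0, rng)
    s = peer_sessions(mean_up=1.0, mean_down=3.0, lifetime=life, rng=rng)
    print(round(up_ratio(s, life), 2))
```

Over a long lifetime the up-ratio approaches mean_up / (mean_up + mean_down), 0.25 with these parameters; lowering it means peers are offline more often, which is what drives the number of repairs up.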
Synthetic Data Results
Real Data • The PlanetLab traces consist of 669 nodes monitored for 500 days. • The KAD traces record the availability of about 6,500 peers in the KAD network for about 5 months.
Real Data Results • PlanetLab • KAD
Conclusion
Conclusion • They proposed a new class of erasure codes called Hierarchical Codes. • They aim to couple the communication efficiency of replication with the storage efficiency of coding. • Experiments showed that Hierarchical Codes require more repairs than the Reed-Solomon code, but those repairs are so cheap that the resulting total communication cost is smaller.
More Detail About Coding
Linear Codes • $f_i$: i-th original fragment • $b_i$: i-th stored fragment • $c_{i,j}$: coefficients
$$b_i = \begin{cases} f_i, & i \le k \\ \sum_{j=1}^{k} c_{i,j}\, f_j, & k < i \le k + h \end{cases}$$
• With k = 4 and h = 3, any 4 of the 7 fragments b1 to b7 can reconstruct the original file if the coefficient vectors of every 4 of them are linearly independent.
Linear Codes
$$B = C' F, \qquad F = S^{-1} B_S$$
• Here $F = (f_1, \dots, f_k)^T$, $B = (b_1, \dots, b_{k+h})^T$ and $C'$ is the $(k+h) \times k$ coefficient matrix. • If every sub-matrix S built from k rows of $C'$ is invertible, then the original fragments can always be reconstructed by $F = S^{-1} B_S$. • $B_S$: the k-long subvector of B corresponding to the rows chosen in S. • If this property is satisfied, the code obtained is a (k,h)-code.
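Reconstruction by $F = S^{-1} B_S$ amounts to solving a k × k linear system. The sketch below does it by Gauss-Jordan elimination, assuming the prime field GF(257) for readability (practical codes usually use GF(2^q)); `P`, `encode` and `decode` are hypothetical names. The usage takes $C'$ from the Reed-Solomon construction with $a_i = i$, k = 3 and h = 3.

```python
from typing import List

P = 257  # hypothetical prime field; practical codes usually work in GF(2^q)

def encode(c_prime: List[List[int]], F: List[List[int]]) -> List[List[int]]:
    """B = C' F (mod P); each row of F is one original fragment (a list of symbols)."""
    length = len(F[0])
    return [[sum(c * F[j][s] for j, c in enumerate(row)) % P for s in range(length)]
            for row in c_prime]

def decode(S: List[List[int]], B_S: List[List[int]]) -> List[List[int]]:
    """Solve S F = B_S (mod P) by Gauss-Jordan elimination, i.e. F = S^{-1} B_S."""
    k = len(S)
    A = [[x % P for x in S[i] + B_S[i]] for i in range(k)]  # augmented [S | B_S]
    for col in range(k):
        pivot = next((r for r in range(col, k) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("S is singular: these k fragments cannot decode")
        A[col], A[pivot] = A[pivot], A[col]
        inv = pow(A[col][col], P - 2, P)  # inverse via Fermat's little theorem
        A[col] = [x * inv % P for x in A[col]]
        for r in range(k):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [(x - factor * y) % P for x, y in zip(A[r], A[col])]
    return [row[k:] for row in A]

if __name__ == "__main__":
    c_prime = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [1, 2, 4], [1, 3, 9]]
    F = [[10, 20], [30, 40], [50, 60]]
    B = encode(c_prime, F)
    rows = [0, 4, 5]  # keep only b1, b5 and b6
    print(decode([c_prime[i] for i in rows], [B[i] for i in rows]) == F)  # True
```

A singular S raises an error instead of returning garbage, which is exactly the case the (k,h)-code property rules out.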
Coefficient Matrix • Reed-Solomon Codes • Random Linear Codes
Reed-Solomon • $I_{k,k}$: k × k identity matrix. • $C_{h,k}$: h × k coefficient matrix.
$$B = \begin{bmatrix} I \\ C \end{bmatrix} F = C' F$$
• If k = 2 and h = 3:
$$B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ c_{1,1} & c_{1,2} \\ c_{2,1} & c_{2,2} \\ c_{3,1} & c_{3,2} \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ c_{1,1} f_1 + c_{1,2} f_2 \\ c_{2,1} f_1 + c_{2,2} f_2 \\ c_{3,1} f_1 + c_{3,2} f_2 \end{bmatrix}$$
Reed-Solomon Codes • Define the matrix C as an h × k Vandermonde matrix: $c_{i,j} = a_i^{\,j-1}$, with distinct $a_i$.
Reed-Solomon Codes • k = 2 • h = 3 • $c_{i,j} = i^{\,j-1}$ (i.e., $a_i = i$)
$$B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \\ f_1 + f_2 \\ f_1 + 2 f_2 \\ f_1 + 3 f_2 \end{bmatrix}$$
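The Vandermonde construction can be sketched over exact rationals, which lets us check the (k,h)-code property by brute force on small cases. This assumes $a_i = i$ as in the example and works over the rationals: with distinct positive $a_i$ every square sub-matrix of the Vandermonde matrix is invertible, but over a finite field GF(2^q) a systematic $[I; C]$ Vandermonde matrix does not guarantee this in general, so the check below says nothing about that setting. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import List

def vandermonde(h: int, k: int) -> List[List[int]]:
    """h x k Vandermonde matrix with c_{i,j} = a_i^(j-1), taking a_i = i."""
    return [[i ** j for j in range(k)] for i in range(1, h + 1)]

def det(M: List[List[int]]) -> Fraction:
    """Exact determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n, result = len(A), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            result = -result  # a row swap flips the sign
        result *= A[col][col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return result

def every_k_subset_decodes(k: int, h: int) -> bool:
    """Check that every k-row sub-matrix S of C' = [I; C] is invertible."""
    identity = [[int(i == j) for j in range(k)] for i in range(k)]
    c_prime = identity + vandermonde(h, k)
    return all(det([c_prime[r] for r in rows]) != 0
               for rows in combinations(range(k + h), k))

if __name__ == "__main__":
    print(vandermonde(3, 2))             # [[1, 1], [1, 2], [1, 3]]
    print(every_k_subset_decodes(2, 3))  # True
```

`vandermonde(3, 2)` reproduces the C block of the k = 2, h = 3 example row for row.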
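Reed-Solomon Codes
$$B = \begin{bmatrix} I \\ C \end{bmatrix} F = \begin{bmatrix} f_1 \\ f_2 \\ f_1 + f_2 \\ f_1 + 2 f_2 \\ f_1 + 3 f_2 \end{bmatrix}$$
• Keeping only $b_1$ and $b_5$:
$$S = \begin{bmatrix} 1 & 0 \\ 1 & 3 \end{bmatrix}, \qquad S^{-1} B_S = \begin{bmatrix} 1 & 0 \\ -1/3 & 1/3 \end{bmatrix} \begin{bmatrix} f_1 \\ f_1 + 3 f_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2 \end{bmatrix} = F$$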
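The worked example above can be verified with exact rational arithmetic. The values f1 = 7 and f2 = 11 are arbitrary symbols chosen only for the demo, and `inverse_2x2` is a hypothetical helper using the closed-form 2 × 2 inverse.

```python
from fractions import Fraction

def inverse_2x2(S):
    """Closed-form inverse of a 2 x 2 matrix with exact rational entries."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("S is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

S = [[1, 0], [1, 3]]    # rows of C' for b1 = f1 and b5 = f1 + 3 f2
S_inv = inverse_2x2(S)  # [[1, 0], [-1/3, 1/3]]

f1, f2 = 7, 11          # arbitrary symbol values for the demo
B_S = [f1, f1 + 3 * f2]
F = [sum(S_inv[i][j] * B_S[j] for j in range(2)) for i in range(2)]
print(F == [f1, f2])  # True
```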
Random Linear Code • It has been shown that a random k × k matrix S over GF(2^q) is invertible with a probability that is bounded below by a value depending only on the field size, and that grows as the field size increases. GF(2^q): Galois field whose elements can be expressed as q-bit words. • If q ≥ 16, the probability can be considered practically 1. • This means that any given k × k sub-matrix of $C'$ is invertible with probability practically 1, so in practice the code behaves as a (k,h)-code.
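The probability claim can be made concrete with the standard count of invertible matrices over a finite field: a uniformly random k × k matrix over GF(Q) is invertible with probability $\prod_{i=1}^{k}(1 - Q^{-i})$. This decreases in k toward $\prod_{i \ge 1}(1 - Q^{-i})$, a bound that depends only on Q. The sketch below evaluates it for Q = 2^q; the function name is hypothetical, and it describes one uniformly random sub-matrix, not all of them at once.

```python
def prob_invertible(q: int, k: int) -> float:
    """Probability that a uniformly random k x k matrix over GF(2^q) is
    invertible: prod_{i=1..k} (1 - Q^(-i)) with Q = 2^q."""
    Q = float(2 ** q)
    p, term = 1.0, 1.0
    for _ in range(k):
        term /= Q          # term = Q^(-i) at the i-th step
        p *= 1.0 - term
    return p

if __name__ == "__main__":
    for q in (1, 8, 16):
        print(q, prob_invertible(q, 64))
```

For q = 1 the value is about 0.289, while for q = 16 it already exceeds 0.99998, which is why q ≥ 16 is treated as practically 1.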
Information Flow Graph (Code Graph) • Represents the evolution of the stored data through time. [Figure: source nodes f1 and f2 (F, time 0) feed the stored fragments b1, b2, b3 at each time step B_1, ..., B_{t-1}, B_t]
Information Flow Graph (Code Graph) • Proposition 1: At any time t, every possible selection of k nodes from $B_t$ is sufficient to reconstruct the original fragments only if the disjoint-paths condition holds at time step t = 1 and the repair degree is d ≥ k. • A random linear code satisfies this condition by design: every node in $B_1$ is connected to all the source nodes in F.
Block Replication vs Linear Codes • k = 8, h = 16 and R = 3 • Block replication: d = 1 • Linear codes: d = k
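With this slide's parameters, the formulas from the efficiency analysis give both schemes the same redundancy factor, so they differ only in repair degree. A minimal sketch, assuming block replication splits the file into k blocks and stores R copies of each; the function names are hypothetical.

```python
from fractions import Fraction

def block_replication(k: int, R: int):
    """k blocks, R copies each: beta = R, d = 1 (a lost block is re-copied)."""
    return Fraction(R), 1

def linear_code(k: int, h: int):
    """(k, h) linear code: beta = (k + h) / k, d = k."""
    return Fraction(k + h, k), k

if __name__ == "__main__":
    print(block_replication(8, 3))  # (Fraction(3, 1), 1)
    print(linear_code(8, 16))       # (Fraction(3, 1), 8)
```

Both use 3 times the file size in storage, but each repair of the linear code reads 8 times as much data as the new fragment it produces.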
Question • Is there a design space between these two limits that can be explored to find a better trade-off between storage efficiency and repair degree?
Hierarchical Codes
Hierarchical Code Graph – Step 1 • Choose $k_0$ and $h_0$ and build a $(k_0, h_0)$-code:
$$b_i = \begin{cases} f_i, & i \le k_0 \\ \sum_{j=1}^{k_0} c_{i,j}\, f_j, & k_0 < i \le k_0 + h_0 \end{cases}$$
• Example: $k_0 = 2$, $h_0 = 1$ gives the hierarchical (2,1)-code, where $b_1 = f_1$, $b_2 = f_2$ and $b_3$ combines $f_1$ and $f_2$ (group $G_{2,1}$). • The generated group is denoted $G_{d_0,1}$, where $d_0 = k_0$.
Hierarchical Code Graph – Step 2 • Choose $g_1$ and $h_1$. • Replicate the construction of $G_{d_0,1}$ $g_1$ times, each copy over its own $k_0$ original fragments; the $g_1$ groups are denoted $G_{d_0,1}, \dots, G_{d_0,g_1}$. • Then add $h_1$ further redundant blocks, each combining all $g_1 k_0$ existing original fragments F. • The new group is denoted $G_{d_1,1}$: a hierarchical $(d_1, H_1)$-code with $H_1 = g_1 h_0 + h_1$ and $d_1 = g_1 k_0 = g_1 d_0$. • Example: $g_1 = 2$ and $h_1 = 1$ give a hierarchical (4,3)-code. [Figure: $G_{4,1}$ contains $G_{2,2}$ (f1, f2, b1, b2, b5) and $G_{2,1}$ (f3, f4, b3, b4, b6), plus b7 combining f1 to f4]
Hierarchical Code Graph – Step 3 • Repeat Step 2 several times: $H_s = g_s H_{s-1} + h_s$ and $d_s = g_s d_{s-1}$.
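The three steps can be sketched as a recursive builder that tracks only which original fragments each redundant block combines (its support), plus the parameters $d_s$ and $H_s$. Coefficients are omitted, and the way originals are indexed (copy c covering indices c·d to c·d + d − 1) and the ordering of blocks are assumptions of this sketch; `build_hierarchical` is a hypothetical name.

```python
from typing import FrozenSet, List, Sequence, Tuple

def build_hierarchical(k0: int, h0: int, levels: Sequence[Tuple[int, int]]
                       ) -> Tuple[int, int, List[FrozenSet[int]]]:
    """Return (d, H, supports) of a hierarchical (d, H)-code.

    supports[i] is the set of original-fragment indices that redundant block i
    combines; levels is [(g_1, h_1), (g_2, h_2), ...]. Coefficients are omitted.
    """
    d, H = k0, h0
    # Step 1: a (k0, h0)-code whose h0 redundant blocks combine all k0 originals.
    supports = [frozenset(range(k0))] * h0
    for g, h in levels:
        # Step 2: g copies of the current group, copy c covering originals c*d .. c*d + d - 1.
        copies = [frozenset(i + c * d for i in s) for c in range(g) for s in supports]
        # ... plus h new blocks combining all g*d originals of the new group.
        supports = copies + [frozenset(range(g * d))] * h
        d, H = g * d, g * H + h  # d_s = g_s d_{s-1},  H_s = g_s H_{s-1} + h_s
    return d, H, supports

if __name__ == "__main__":
    d, H, supports = build_hierarchical(2, 1, [(2, 1)])
    print(d, H)                           # 4 3
    print([sorted(s) for s in supports])  # [[0, 1], [2, 3], [0, 1, 2, 3]]
```

For $k_0 = 2$, $h_0 = 1$, $g_1 = 2$, $h_1 = 1$ this yields the (4,3)-code of Step 2: one block over f1 and f2, one over f3 and f4, and one over all four. In this sketch's terms, a lost block whose support has size 2 can be regenerated from 2 fragments of its own group, which is how d < k arises.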