Locally decodable codes: from computational complexity to cloud computing
Sergey Yekhanin, Microsoft Research
Error-correcting codes: paradigm
• Encoder: a message x ∈ F_2^k is mapped to a codeword E(x) ∈ F_2^n.
• Channel: corrupts up to e coordinates of E(x).
• Decoder: recovers x from the corrupted codeword.
• Example: 0110001 → 011000100101 → 01*00*10010* → 0110001.
• The paradigm dates back to the 1940s (Shannon / Hamming).
Local decoding: paradigm
• Encoder: x ∈ F_2^k is mapped to E(x) ∈ F_2^n; the channel corrupts up to e coordinates.
• Local decoder: to recover a single message symbol x_i, it reads up to r coordinates of the corrupted codeword. The local decoder runs in time much smaller than the message length!
• First account: Reed's decoder for Muller's codes (1954)
• Implicit use: 1950s-1990s
• Formal definition and systematic study: late 1990s [Levin '95, STV '98, KT '00]
• Original applications in computational complexity theory
• Cryptography
• Most recently used in practice to provide reliability in distributed storage
Local decoding: example
• E(X) = (X_1, X_2, X_3, X_1⊕X_2, X_1⊕X_3, X_2⊕X_3, X_1⊕X_2⊕X_3)
• Message length: k = 3
• Codeword length: n = 7
• Corrupted locations: e = 3
• Locality: r = 2
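To make the example concrete, here is a minimal sketch (my own illustration, not part of the talk; all function names are made up) that encodes a 3-bit message into the seven symbols above and recovers X_1 with at most 2 reads even after 3 erasures, using the disjoint recovery sets for X_1.

```python
# Minimal sketch of the k=3, n=7 example above (illustrative only).
# Codeword: (X1, X2, X3, X1^X2, X1^X3, X2^X3, X1^X2^X3).

def encode(x):
    x1, x2, x3 = x
    return [x1, x2, x3, x1 ^ x2, x1 ^ x3, x2 ^ x3, x1 ^ x2 ^ x3]

# Disjoint recovery sets for X1: reading the positions in any one set
# (none of which are erased) and XOR-ing the values yields X1.
RECOVERY_SETS_X1 = [(0,), (1, 3), (2, 4), (5, 6)]

def locally_decode_x1(word):
    """word[i] is a bit, or None if position i was erased."""
    for positions in RECOVERY_SETS_X1:
        if all(word[p] is not None for p in positions):
            acc = 0
            for p in positions:
                acc ^= word[p]
            return acc  # at most 2 positions were read
    raise ValueError("too many erasures")

if __name__ == "__main__":
    c = encode([1, 0, 1])
    c[0] = c[3] = c[4] = None         # erase 3 coordinates, including X1 itself
    assert locally_decode_x1(c) == 1  # recovered from {X2^X3, X1^X2^X3}
```

Since the four recovery sets are disjoint, 3 erasures can hit at most 3 of them, so one always survives.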
Locally decodable codes
Definition: A code E: F_q^k → F_q^n is r-locally decodable if, for every message x, each x_i can be recovered by reading some r symbols of E(x), even after up to e coordinates of E(x) are corrupted.
• (Erasures.) The decoder is aware of the erased locations. Its output is always correct.
• (Errors.) The decoder is randomized. Its output is correct with probability 99%.
[Figure: a k-symbol message is encoded into an n-symbol codeword; after noise, the decoder reads only r symbols.]
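The errors case can be illustrated on the same k = 3 example (again my own sketch, with made-up names): the randomized decoder picks one of the four disjoint recovery sets for X_1 at random, so a single corrupted coordinate spoils at most one set and the answer is correct with probability at least 3/4, which can then be amplified by repetition.

```python
import random

def encode(x):
    x1, x2, x3 = x
    return [x1, x2, x3, x1 ^ x2, x1 ^ x3, x2 ^ x3, x1 ^ x2 ^ x3]

RECOVERY_SETS_X1 = [(0,), (1, 3), (2, 4), (5, 6)]

def randomized_decode_x1(word):
    # Pick a random recovery set; one corrupted position hits at most one set.
    positions = random.choice(RECOVERY_SETS_X1)
    acc = 0
    for p in positions:
        acc ^= word[p]
    return acc

if __name__ == "__main__":
    c = encode([1, 1, 0])
    c[6] ^= 1  # one corrupted coordinate (an error, not an erasure)
    hits = sum(randomized_decode_x1(c) == 1 for _ in range(10_000))
    print(hits / 10_000)  # roughly 0.75 here
```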
Locally decodable codes
Goal: Understand the true shape of the tradeoff between the redundancy n − k and the locality r, for different settings of r (e.g., r = n^ε, (log n)^c, O(1)).
[Figure: taxonomy of known families of LDCs by locality regime, including matching vector codes (r = O(1)), Reed Muller codes, and, at locality r = n^ε, multiplicity codes, local reconstruction codes, and projective geometry codes.]
Plan
• Part I: (Computational complexity)
  - Average case hardness
  - An average-case hard language in EXP (unless EXP ⊆ BPP)
  - Construction of LDCs
  - Open questions
• Part II: (Distributed data storage)
  - Erasure coding for data storage
  - LDCs for data storage
  - Constructions and limitations
  - Open questions
Part I: Computational complexity
Average case complexity
• A problem is hard-on-average if every efficient algorithm errs on at least 10% of the inputs.
• Establishing hardness-on-average for a problem in NP is a major open problem.
• Below we establish hardness-on-average for a problem in EXP, assuming EXP ⊄ BPP.
• Construction [STV]: let L be an EXP-complete language. Level t of L is its truth table on inputs of length t, i.e., a string x of length 2^t. Encode each level with an LDC E: F_2^k → F_2^n with n = poly(k), r = (log k)^c, e = n/10; the resulting encodings E(x) form the truth tables of a new language L'.
• L' is in EXP; L is EXP-complete.
Theorem: If there is an efficient algorithm that errs on < 10% of L', then EXP ⊆ BPP.
Average case complexity
Theorem: If there is an efficient algorithm that errs on < 10% of L', then EXP ⊆ BPP.
Proof: We obtain a BPP algorithm for L.
• Let A be the algorithm that errs on < 10% of L'.
• A gives us access to the corrupted encoding E(x): at most 10% of its coordinates are wrong.
• To decide whether z ∈ L, invoke the local decoder for E(x), answering each of its queries by running A.
• Time complexity: (log 2^t)^c · poly(t) = poly(t), where t = |z| and 2^t is the length of the truth table x.
• The output is correct with probability 99%.
(Parameters, as before: E: F_2^k → F_2^n, n = poly(k), r = (log k)^c, e = n/10; L' is in EXP, L is EXP-complete.)
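A schematic rendering of this reduction is sketched below. It is my own pseudocode-style illustration, assuming hypothetical placeholders A, local_decoder, and index_to_input that are not defined in the talk.

```python
# Sketch of the reduction "average-case algorithm for L'  =>  BPP algorithm for L".
# All names below (A, local_decoder, index_to_input) are illustrative placeholders.

def decide_L(z, A, local_decoder, index_to_input):
    """Decide whether the bit string z (of length t) is in L, given:
       A              -- efficient algorithm correct on >90% of L' (a corrupted E(x)),
       local_decoder  -- r-query local decoder of the LDC E, tolerating n/10 errors,
       index_to_input -- maps a coordinate of E(x) to the corresponding input of L'."""
    i = int(z, 2)  # position of z in the truth table x of L at level t = len(z)

    def oracle(j):
        # The j-th coordinate of the (corrupted) codeword E(x) is A's answer on
        # the corresponding input of L'; at most 10% of coordinates are wrong.
        return A(index_to_input(j))

    # The decoder makes only (log k)^c queries and is correct w.p. 99%.
    return local_decoder(i, oracle)
```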
Reed Muller codes
• Parameters: q, m, d = (1 − 4σ)q, for a constant error rate σ.
• Codewords: evaluations of degree-d polynomials in m variables over F_q.
• A polynomial f ∈ F_q[z_1, ..., z_m] with deg f ≤ d yields the codeword (f(x))_{x ∈ F_q^m}.
• Parameters: n = q^m, k = C(m + d, m), r = q − 1, e = σn.
Reed Muller codes: local decoding
• Key observation: the restriction of a codeword to an affine line yields an evaluation of a univariate polynomial of degree at most d.
• To recover the value at x:
  - Pick a random affine line through x.
  - Read the q − 1 other points on the line and do noisy polynomial interpolation.
[Figure: the space F_q^m with a random line through the point x.]
• Locally decodable code: the decoder reads q − 1 random locations.
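Here is a small self-contained sketch of this decoder (mine, not from the talk), over F_7 with m = 2 and d = 2. It handles the erasure case with plain Lagrange interpolation along a random line; the errors case would replace this with noisy interpolation (e.g., Berlekamp-Welch), which the sketch omits.

```python
# Illustrative Reed-Muller local decoding from ERASURES along a random line.
import itertools, random

q, m, d = 7, 2, 2

def evaluate(f, point):
    """f: dict mapping exponent tuples (e1,...,em) with total degree <= d to coefficients."""
    total = 0
    for exps, coeff in f.items():
        term = coeff
        for x_i, e_i in zip(point, exps):
            term = term * pow(x_i, e_i, q) % q
        total = (total + term) % q
    return total

def encode(f):
    """The Reed-Muller codeword: evaluations of f at all points of F_q^m."""
    return {pt: evaluate(f, pt) for pt in itertools.product(range(q), repeat=m)}

def local_decode(word, x0):
    """Recover f(x0) from a codeword with erased positions (value None)."""
    while True:
        v = tuple(random.randrange(q) for _ in range(m))
        if any(v):  # random nonzero direction
            break
    # Read the q-1 points x0 + t*v, t = 1..q-1, skipping erased ones.
    samples = []
    for t in range(1, q):
        pt = tuple((a + t * b) % q for a, b in zip(x0, v))
        if word[pt] is not None:
            samples.append((t, word[pt]))
    if len(samples) < d + 1:
        raise ValueError("too many erasures on this line")
    # Lagrange interpolation of the degree-<=d restriction, evaluated at t = 0.
    samples = samples[:d + 1]
    result = 0
    for j, (tj, yj) in enumerate(samples):
        coeff = yj
        for l, (tl, _) in enumerate(samples):
            if l != j:
                coeff = coeff * (-tl) % q * pow(tj - tl, q - 2, q) % q
        result = (result + coeff) % q
    return result

if __name__ == "__main__":
    f = {(0, 0): 3, (1, 0): 2, (0, 1): 5, (1, 1): 1, (2, 0): 4, (0, 2): 6}  # deg <= 2
    word = encode(f)
    for pt in random.sample(list(word), 3):  # erase a few coordinates
        word[pt] = None
    x0 = (2, 3)
    print(local_decode(word, x0), evaluate(f, x0))  # the two values should match
```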
Reed Muller codes: parameters
k = C(m + d, m), n = q^m, d = (1 − 4σ)q, r = q − 1, e = σn.
Setting parameters:
• q = O(1), m → ∞: r = O(1), n = exp(k^(1/(r−1))).
• q = (log k)^2: r = (log k)^2, n = poly(k).
• q → ∞, m = O(1): r = k^ε, n = O(k).
Better codes are known. Reducing the codeword length is a major open question.
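To get a feel for these tradeoffs, one can plug concrete values into the formulas above; the helper below is only an illustration of the parameter formulas on this slide, with arbitrarily chosen inputs.

```python
# Illustrative calculator for the Reed-Muller LDC parameters listed above:
# n = q^m, k = C(m+d, m), r = q - 1, e = sigma * n, with d = (1 - 4*sigma) * q.
from math import comb

def rm_parameters(q, m, sigma):
    d = int((1 - 4 * sigma) * q)   # degree bound
    n = q ** m                     # codeword length
    k = comb(m + d, m)             # message length (monomials of degree <= d)
    r = q - 1                      # locality: points read on a random line
    e = int(sigma * n)             # number of tolerated corruptions
    return dict(q=q, m=m, d=d, n=n, k=k, r=r, e=e)

if __name__ == "__main__":
    # Fixed q, larger m: small locality, but n grows much faster than k.
    print(rm_parameters(q=5, m=4, sigma=0.1))
    # Fixed m, larger q: locality about k^(1/m), while n = O(k).
    print(rm_parameters(q=101, m=2, sigma=0.1))
```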
Part II: Distributed storage
Data storage
• Store data reliably
• Keep it readily available for users
Data storage: Replication
• Store data reliably
• Keep it readily available for users
• Very large overhead
• Moderate reliability
• Local recovery: lose one machine, access one
Data storage: Erasure coding
• Store data reliably
• Keep it readily available for users
• Low overhead
• High reliability
• No local recovery: lose one machine, access k
[Figure: k data chunks plus n − k parity chunks.]
Need: erasure codes with local decoding
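As a toy illustration (mine) of the missing local recovery: with a single XOR parity over k data chunks, rebuilding one lost chunk already requires reading all k remaining chunks.

```python
# Toy illustration: k data chunks protected by one XOR parity chunk.
# Rebuilding a single lost data chunk requires reading all k remaining chunks.
from functools import reduce

def xor_bytes(chunks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

k = 6
data = [bytes([i]) * 4 for i in range(k)]    # six 4-byte data chunks
parity = xor_bytes(data)                     # one parity chunk

lost = 2
survivors = [c for i, c in enumerate(data) if i != lost] + [parity]
rebuilt = xor_bytes(survivors)               # k = 6 chunks read to rebuild one
assert rebuilt == data[lost]
```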
Codes for data storage
[Figure: data symbols X_1, X_2, ..., X_k and parity symbols P_1, ..., P_{n−k}.]
• Goals:
  - (Cost) minimize the number of parities.
  - (Reliability) tolerate any pattern of h + 1 simultaneous failures.
  - (Availability) recover any data symbol by accessing at most r other symbols.
  - (Computational efficiency) use a small finite field to define the parities.
Local reconstruction codes
• Def: An (r,h) Local Reconstruction Code (LRC) encodes k symbols into n symbols, and
  - corrects any pattern of h + 1 simultaneous failures;
  - recovers any single erased data symbol by accessing at most r other symbols.
• Theorem [GHSY]: In any (r,h)-LRC, the redundancy satisfies n − k ≥ k/r + h.
• Theorem [GHSY]: If r | k and h < r + 1, then any (r,h)-LRC with optimal redundancy has the following topology: the k data symbols X_1, ..., X_k are partitioned into local groups of size r, each with its own light parity L_1, ..., L_g, and there are h heavy parities H_1, ..., H_h over all data symbols.
• Fact: There exist (r,h)-LRCs with optimal redundancy over a field of size k + h.
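A minimal encoder sketch of this topology follows. It is my own illustration: the field, the Vandermonde-style heavy-parity coefficients, and all names are placeholders, and a careless coefficient choice need not achieve the failure tolerance promised by the Fact above.

```python
# Minimal sketch of the (r,h)-LRC topology above: k data symbols in k/r local
# groups, one additive light parity per group, plus h heavy parities over all
# data symbols. Arithmetic is over a prime field; coefficients are illustrative.
P = 2**31 - 1   # a prime field large enough for the example

def lrc_encode(data, r, h, alphas):
    """data: k field elements with r | k; alphas: k distinct nonzero field elements
    used to define the heavy parities H_j = sum_i alphas[i]**j * data[i]."""
    k = len(data)
    assert k % r == 0 and len(alphas) == k
    light = [sum(data[g:g + r]) % P for g in range(0, k, r)]          # L_1..L_{k/r}
    heavy = [sum(a ** j * x for a, x in zip(alphas, data)) % P        # H_1..H_h
             for j in range(1, h + 1)]
    return data + light + heavy   # n = k + k/r + h symbols

def repair_data_symbol(codeword, i, r, k):
    """Local recovery of data symbol i: read the other r - 1 data symbols of its
    group plus the group's light parity, i.e., exactly r reads."""
    g = (i // r) * r
    group = list(range(g, g + r)) + [k + i // r]    # group members + light parity
    reads = [codeword[j] for j in group if j != i]  # exactly r reads
    return (reads[-1] - sum(reads[:-1])) % P        # light parity minus the others

if __name__ == "__main__":
    k, r, h = 8, 4, 3
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    code = lrc_encode(data, r, h, alphas=list(range(1, k + 1)))
    assert repair_data_symbol(code, 5, r, k) == data[5]
```

Note that the redundancy here is k/r + h, matching the lower bound on the slide.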
Reliability
Set k = 8, r = 4, and h = 3.
[Figure: local groups X_1 ... X_4 with light parity L_1 and X_5 ... X_8 with light parity L_2, plus heavy parities H_1, H_2, H_3.]
• All 4-failure patterns are correctable.
• Some 5-failure patterns are not correctable.
• Other 5-failure patterns might be correctable.
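These claims can be checked mechanically for a concrete instance: an erasure pattern is correctable exactly when the corresponding columns of the parity-check matrix are linearly independent over the field. The sketch below is mine; it uses randomly chosen heavy-parity coefficients over a large prime field, which typically (though not automatically) yields an optimal code, and enumerates all 4- and 5-failure patterns of the k = 8, r = 4, h = 3 example.

```python
# Check which failure (erasure) patterns are correctable for the k=8, r=4, h=3
# example: a pattern is correctable iff the corresponding columns of the
# parity-check matrix are linearly independent over the field.
import itertools, random

p = 2**31 - 1          # prime field; heavy-parity coefficients are random
k, r, h = 8, 4, 3
n = k + k // r + h     # 13 symbols: X1..X8, L1, L2, H1..H3

def parity_check_matrix():
    rows = []
    # Light parities: L_g equals the sum of its group.
    for g in range(k // r):
        row = [0] * n
        for i in range(g * r, (g + 1) * r):
            row[i] = 1
        row[k + g] = p - 1
        rows.append(row)
    # Heavy parities: random multiples of all data symbols.
    for j in range(h):
        row = [random.randrange(1, p) for _ in range(k)] + [0] * (k // r + h)
        row[k + k // r + j] = p - 1
        rows.append(row)
    return rows

def rank_mod_p(rows):
    rows = [row[:] for row in rows]
    rank = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][c] % p), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][c], p - 2, p)
        rows[rank] = [v * inv % p for v in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][c] % p:
                f = rows[i][c]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

H = parity_check_matrix()

def correctable(pattern):
    cols = [[H[row][c] for c in pattern] for row in range(n - k)]
    return rank_mod_p(cols) == len(pattern)

for size in (4, 5):
    patterns = list(itertools.combinations(range(n), size))
    good = sum(correctable(s) for s in patterns)
    print(f"{size}-failure patterns: {good}/{len(patterns)} correctable")
```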
Combinatorics of correctable failure patterns
Def: A regular failure pattern for an (r,h)-LRC is a pattern that can be obtained by failing one symbol in each local group and h extra symbols.
[Figure: two examples of regular failure patterns for the k = 8, r = 4, h = 3 code.]
• Theorem: Every failure pattern that is not dominated by a regular failure pattern is not correctable by any LRC.
• Theorem: There exist LRCs that correct all regular failure patterns.
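A small enumeration makes the definition concrete. This sketch is mine, and it reads "local group" as the r data symbols together with their light parity and "extra symbols" as any h of the remaining symbols, which is my interpretation of the figure.

```python
# Enumerate regular failure patterns for the k=8, r=4, h=3 example and test
# whether a given failure pattern is dominated by (contained in) one of them.
# Symbol indices: 0..7 data, 8..9 light parities, 10..12 heavy parities.
import itertools

groups = [{0, 1, 2, 3, 8}, {4, 5, 6, 7, 9}]   # local group = data symbols + light parity
all_symbols = set(range(13))
h = 3

def regular_patterns():
    for picks in itertools.product(*groups):             # one failure per local group
        base = set(picks)
        for extra in itertools.combinations(all_symbols - base, h):
            yield base | set(extra)

def dominated(pattern):
    pattern = set(pattern)
    return any(pattern <= reg for reg in regular_patterns())

if __name__ == "__main__":
    print(dominated({0, 1, 2, 3, 8}))    # a whole local group failed: False (not correctable)
    print(dominated({0, 1, 4, 10, 12}))  # a spread-out 5-failure pattern: True
```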