  1. Locally decodable codes: from computational complexity to cloud computing
Sergey Yekhanin, Microsoft Research

  2. Error-correcting codes: paradigm
X ∈ F_2^k → Encoder → E(X) ∈ F_2^n → Channel → Decoder → X
The channel corrupts up to σ coordinates.
Example: 0110001 → 011000100101 → 01*00*10010* → 0110001
The paradigm dates back to the 1940s (Shannon / Hamming).

  3. Local decoding: paradigm
X ∈ F_2^k → Local Encoder → E(X) ∈ F_2^n → Channel → Local Decoder → X_i
The channel corrupts up to σ coordinates; the local decoder reads up to r coordinates.
Example: 0110001 → 011000100101 → 01*00*10010* → 1
The local decoder runs in time much smaller than the message length!
• First account: Reed's decoder for Muller's codes (1954)
• Implicit use: 1950s-1990s
• Formal definition and systematic study: late 1990s [Levin '95, STV '98, KT '00]
Applications: originally in computational complexity theory; cryptography; most recently used in practice to provide reliability in distributed storage.

  4. Local decoding: example
E(X) = (X_1, X_2, X_3, X_1⊕X_2, X_1⊕X_3, X_2⊕X_3, X_1⊕X_2⊕X_3)
Message length: k = 3. Codeword length: n = 7. Corrupted locations: σ = 3. Locality: r = 2.

  5. Local decoding: example
E(X) = (X_1, X_2, X_3, X_1⊕X_2, X_1⊕X_3, X_2⊕X_3, X_1⊕X_2⊕X_3)
Message length: k = 3. Codeword length: n = 7. Corrupted locations: σ = 3. Locality: r = 2.
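The [7,3] example above can be sketched in code. Each message bit has four pairwise-disjoint recovery sets, so any σ = 3 erasures leave at least one set intact, and each set requires reading at most r = 2 symbols. This is a minimal erasure-decoding sketch; the function names are illustrative, not from the talk.

```python
def encode(x):
    """Encode 3 message bits into the 7-bit codeword of the example."""
    x1, x2, x3 = x
    return [x1, x2, x3, x1 ^ x2, x1 ^ x3, x2 ^ x3, x1 ^ x2 ^ x3]

# Disjoint recovery sets (0-based codeword positions) whose XOR equals X_i.
RECOVERY = {
    0: [(0,), (1, 3), (2, 4), (5, 6)],
    1: [(1,), (0, 3), (2, 5), (4, 6)],
    2: [(2,), (0, 4), (1, 5), (3, 6)],
}

def decode_bit(word, i, erased):
    """Recover X_i from the codeword, given a set of known erased positions."""
    for s in RECOVERY[i]:
        if not any(p in erased for p in s):
            v = 0
            for p in s:
                v ^= word[p]
            return v
    raise ValueError("more than 3 erasures: no intact recovery set")
```

Because the four recovery sets for each bit are disjoint, 3 erasures can intersect at most 3 of them; the decoder reads the surviving one.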

  6. Locally decodable codes
Definition: A code E: F_2^k → F_2^n is r-locally decodable if, for every message X, each X_i can be recovered by reading some r symbols of E(X), even after up to σ coordinates of E(X) are corrupted.
• (Erasures.) The decoder is aware of the erased locations. Output is always correct.
• (Errors.) The decoder is randomized. Output is correct with probability 99%.
[Diagram: a k-symbol message is encoded into an n-symbol codeword; after noise, the decoder reads only r symbols.]

  7. Locally decodable codes
Goal: Understand the true shape of the tradeoff between redundancy n − k and locality r, for different settings of σ (e.g., σ = δn, n^ε, O(1)).
[Figure: taxonomy of known families of LDCs, plotted by locality r (O(1), (log k)^c, k^ε) against corruption σ (O(1), n^ε, δn): matching vector codes, Reed-Muller codes, multiplicity codes, local reconstruction codes, projective geometry codes.]

  8. Plan
• Part I: (Computational complexity)
  • Average case hardness
  • An avg. case hard language in EXP (unless EXP ⊆ BPP)
  • Construction of LDCs
  • Open questions
• Part II: (Distributed data storage)
  • Erasure coding for data storage
  • LDCs for data storage
  • Constructions and limitations
  • Open questions

  9. Part I: Computational complexity

  10. Average case complexity
A problem is hard-on-average if every efficient algorithm errs on 10% of the inputs.
• Establishing hardness-on-average for a problem in NP is a major open problem.
• Below we establish hardness-on-average for a problem in EXP, assuming EXP ⊈ BPP.
• Construction [STV]: Let L be EXP-complete. Level k of L is a string X of length 2^k (the truth table of L on inputs of length k). Encode it with an LDC E: F_2^{2^k} → F_2^n, where n = poly(2^k), r = (log 2^k)^c, σ = n/10. Let L' be the language whose level-k truth table is E(X); L' is in EXP.
Theorem: If there is an efficient algorithm that errs on < 10% of L', then EXP ⊆ BPP.

  11. Average case complexity
Theorem: If there is an efficient algorithm that errs on < 10% of L', then EXP ⊆ BPP.
Proof: We obtain a BPP algorithm for L. Let A be the algorithm that errs on < 10% of L'.
• A gives us access to the corrupted encoding E(X). To decide X_i, invoke the local decoder for E(X), answering its queries using A.
• Time complexity is (log 2^k)^c * poly(k) = poly(k).
• Output is correct with probability 99%.
(Here E: F_2^{2^k} → F_2^n, n = poly(2^k), r = (log 2^k)^c, σ = n/10; L' is in EXP, L is EXP-complete.)

  12. Reed-Muller codes
• Parameters: q, m, d = (1 − 4δ)q.
• Codewords: evaluations of degree-d polynomials in m variables over F_q.
• A polynomial f ∈ F_q[z_1, …, z_m] with deg f ≤ d yields the codeword (f(x))_{x ∈ F_q^m}.
• Parameters: n = q^m, k = C(m + d, m), r = q − 1, σ = δn.

  13. Reed-Muller codes: local decoding
• Key observation: the restriction of a codeword to an affine line yields an evaluation of a univariate polynomial f|_L of degree at most d.
• To recover the value at x:
  - Pick a random affine line through x.
  - Do noisy polynomial interpolation.
[Figure: a random line through the point x in F_q^m.]
• Locally decodable code: the decoder reads q − 1 random locations.
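The line-restriction idea can be sketched in a noiseless setting (here over F_13 with m = 2; the noisy-interpolation step, e.g. Berlekamp-Welch, is omitted): read the q − 1 points on a random affine line through x, interpolate the univariate restriction by Lagrange interpolation, and evaluate it at the line parameter t = 0, which corresponds to x. All names and parameters below are illustrative.

```python
import random

Q = 13  # field size q (prime); m = 2 variables; deg f <= d with d < q - 1

def eval_poly(coeffs, x):
    # evaluate a bivariate polynomial over F_Q; coeffs maps (e1, e2) -> coefficient
    return sum(c * pow(x[0], e1, Q) * pow(x[1], e2, Q)
               for (e1, e2), c in coeffs.items()) % Q

def interpolate_at_zero(ts, vs):
    # Lagrange interpolation over F_Q of the points (t_j, v_j), evaluated at t = 0
    total = 0
    for j, tj in enumerate(ts):
        num, den = 1, 1
        for i, ti in enumerate(ts):
            if i != j:
                num = num * (-ti) % Q
                den = den * (tj - ti) % Q
        total = (total + vs[j] * num * pow(den, Q - 2, Q)) % Q
    return total

def decode_at(codeword_at, x):
    # pick a random affine line t -> x + t*v and read its q - 1 points with t != 0;
    # the restriction is univariate of degree <= d, so interpolation recovers f(x)
    v = (random.randrange(1, Q), random.randrange(Q))
    ts = list(range(1, Q))
    vs = [codeword_at(((x[0] + t * v[0]) % Q, (x[1] + t * v[1]) % Q)) for t in ts]
    return interpolate_at_zero(ts, vs)
```

Since q − 1 points determine a unique polynomial of degree at most q − 2, and the restriction has degree at most d ≤ q − 2, the interpolant equals the restriction and its value at t = 0 is f(x).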

  14. Reed-Muller codes: parameters
n = q^m, k = C(m + d, m), d = (1 − 4δ)q, r = q − 1, σ = δn.
Setting parameters:
• q = O(1), m → ∞: r = O(1), n = exp(k^{1/(r−1)}).
• q = m^2: r = (log k)^2, n = poly(k).
• q → ∞, m = O(1): r = k^ε, n = O(k).
Better codes are known. Reducing codeword length is a major open question.
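The parameter relations on this slide can be checked numerically with a small helper (the choice δ = 0.05 is arbitrary, and the function name is illustrative):

```python
from math import comb

def rm_params(q, m, delta=0.05):
    # Reed-Muller code parameters from the slide:
    # degree d = (1 - 4*delta)*q, length n = q^m,
    # dimension k = C(m + d, m), locality r = q - 1
    d = int((1 - 4 * delta) * q)
    return {"d": d, "n": q ** m, "k": comb(m + d, m), "r": q - 1}
```

For example, rm_params(13, 2) gives n = 169, k = 66, r = 12, a toy instance of the q → ∞, m = O(1) regime where n = O(k) and r ≈ k^{1/m}.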

  15. Part II: Distributed storage

  16. Data storage
• Store data reliably
• Keep it readily available for users

  17. Data storage: Replication
• Store data reliably
• Keep it readily available for users
• Very large overhead
• Moderate reliability
• Local recovery: lose one machine, access one

  18. Data storage: Erasure coding
• Store data reliably
• Keep it readily available for users
• Low overhead
• High reliability
• No local recovery: lose one machine, access k
[k data chunks, n − k parity chunks]
Need: erasure codes with local decoding.

  19. Codes for data storage
[Data symbols X_1, X_2, …, X_k; parities P_1, …, P_{n−k}]
• Goals:
  • (Cost) Minimize the number of parities.
  • (Reliability) Tolerate any pattern of h + 1 simultaneous failures.
  • (Availability) Recover any data symbol by accessing at most r other symbols.
  • (Computational efficiency) Use a small finite field to define parities.

  20. Local reconstruction codes
• Def: An (r,h)-Local Reconstruction Code (LRC) encodes k symbols to n symbols, and
  • Corrects any pattern of h + 1 simultaneous failures;
  • Recovers any single erased data symbol by accessing at most r other symbols.

  21. Local reconstruction codes
• Def: An (r,h)-Local Reconstruction Code (LRC) encodes k symbols to n symbols, and
  • Corrects any pattern of h + 1 simultaneous failures;
  • Recovers any single erased data symbol by accessing at most r other symbols.
• Theorem [GHSY]: In any (r,h)-LRC, the redundancy satisfies n − k ≥ k/r + h.

  22. Local reconstruction codes
• Def: An (r,h)-Local Reconstruction Code (LRC) encodes k symbols to n symbols, and
  • Corrects any pattern of h + 1 simultaneous failures;
  • Recovers any single erased data symbol by accessing at most r other symbols.
• Theorem [GHSY]: In any (r,h)-LRC, the redundancy satisfies n − k ≥ k/r + h.
• Theorem [GHSY]: If r | k and h < r + 1, then any (r,h)-LRC has the following topology: the data symbols X_1, …, X_k are split into k/r local groups of r symbols, each group covered by a light parity (L_1, …, L_g), together with h heavy parities H_1, …, H_h.

  23. Local reconstruction codes
• Def: An (r,h)-Local Reconstruction Code (LRC) encodes k symbols to n symbols, and
  • Corrects any pattern of h + 1 simultaneous failures;
  • Recovers any single erased data symbol by accessing at most r other symbols.
• Theorem [GHSY]: In any (r,h)-LRC, the redundancy satisfies n − k ≥ k/r + h.
• Theorem [GHSY]: If r | k and h < r + 1, then any (r,h)-LRC has the following topology: the data symbols X_1, …, X_k are split into k/r local groups of r symbols, each group covered by a light parity (L_1, …, L_g), together with h heavy parities H_1, …, H_h.
• Fact: There exist (r,h)-LRCs with optimal redundancy over a field of size k + h.
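The local-recovery guarantee can be sketched with just the light parities: one XOR parity per group of r data symbols, so a single erased data symbol is rebuilt by reading the r other symbols of its group. The heavy parities, which provide the h-failure tolerance, are omitted here, and all names are illustrative.

```python
# Light-parity sketch of an LRC with k = 8, r = 4 (heavy parities omitted).
K, R = 8, 4

def encode_light(data):
    """Append one XOR light parity per group of R data symbols."""
    assert len(data) == K and K % R == 0
    parities = []
    for g in range(K // R):
        p = 0
        for x in data[g * R:(g + 1) * R]:
            p ^= x
        parities.append(p)
    return data + parities

def recover_symbol(word, i):
    """Recover erased data symbol i by reading the R other symbols of its group."""
    g = i // R
    v = word[K + g]  # the group's light parity
    for j in range(g * R, (g + 1) * R):
        if j != i:
            v ^= word[j]
    return v
```

Each recovery touches exactly r = 4 other symbols (r − 1 data symbols plus the light parity), matching the availability goal of the definition.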

  24. Reliability
Set k = 8, r = 4, and h = 3.
[Diagram: local group X_1 … X_4 with light parity L_1; local group X_5 … X_8 with light parity L_2; heavy parities H_1, H_2, H_3.]

  25. Reliability
Set k = 8, r = 4, and h = 3.
• All 4-failure patterns are correctable.

  26. Reliability
Set k = 8, r = 4, and h = 3.
• All 4-failure patterns are correctable.
• Some 5-failure patterns are not correctable.

  27. Reliability
Set k = 8, r = 4, and h = 3.
• All 4-failure patterns are correctable.
• Some 5-failure patterns are not correctable.
• Other 5-failure patterns might be correctable.

  28. Reliability
Set k = 8, r = 4, and h = 3.
• All 4-failure patterns are correctable.
• Some 5-failure patterns are not correctable.
• Other 5-failure patterns might be correctable.

  29. Combinatorics of correctable failure patterns
Def: A regular failure pattern for an (r,h)-LRC is a pattern that can be obtained by failing one symbol in each local group and h extra symbols.
[Diagram: two regular failure patterns for k = 8, r = 4, h = 3.]
• Theorem: Every failure pattern that is not dominated by a regular failure pattern is not correctable by any LRC.
• Theorem: There exist LRCs that correct all regular failure patterns.
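A small enumeration for k = 8, r = 4, h = 3 (13 symbols in total) illustrates the definition: a regular pattern fails one symbol per local group plus h extras, and a failure pattern confined to a single 5-symbol group fits inside no regular pattern, matching the earlier slides' non-correctable 5-failure example. The indexing convention below (data 0-7, light parities 8-9, heavy parities 10-12) is illustrative.

```python
from itertools import combinations, product

K, R, H = 8, 4, 3
G = K // R                      # number of local groups
N = K + G + H                   # 13 symbols: data 0-7, light 8-9, heavy 10-12
# each local group = its R data symbols plus its light parity
groups = [list(range(g * R, (g + 1) * R)) + [K + g] for g in range(G)]

def regular_patterns():
    """All regular failure patterns: one failure per group, plus H extras."""
    pats = set()
    for base in product(*groups):
        rest = [s for s in range(N) if s not in base]
        for extra in combinations(rest, H):
            pats.add(frozenset(base) | frozenset(extra))
    return pats
```

Every regular pattern has size G + H = 5; failing all five symbols of group 0 ({0, 1, 2, 3, 8}) is dominated by no regular pattern, since a regular pattern contains at most 1 + H = 4 symbols from any one group.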
