Algebraic Structure in Network Information Theory
  1. Algebraic Structure in Network Information Theory. Michael Gastpar, EPFL / Berkeley. European Information Theory School, Antalya, Turkey, April 2012. Slides prepared jointly with Bobak Nazer (Boston Univ.). Download slides from linx.epfl.ch under “Teaching”.

  2. Motivation. [Figure: a point-to-point channel p(y | x).]

  3. Motivation. [Figure: a point-to-point channel p(y | x).]

  4. Motivation. [Figure: a two-input channel p(y | x_1, x_2).]

  5. Motivation. [Figure: a three-input, three-output network channel p(y_1, y_2, y_3 | x_1, x_2, x_3).]

  6. Outline: I. Discrete Alphabets, II. AWGN Channels, III. Network Applications.

  7. Point-to-Point Channels
[Figure: w → E → x → p(y|x) → y → D → ŵ.]
The Usual Suspects:
• Message w ∈ {0,1}^k and estimate ŵ ∈ {0,1}^k
• Encoder E: {0,1}^k → X^n and decoder D: Y^n → {0,1}^k
• Input x ∈ X^n and output y ∈ Y^n
• Memoryless channel: p(y | x) = ∏_{i=1}^n p(y_i | x_i)
• Rate R = k/n
• (Average) probability of error: P{ŵ ≠ w} → 0 as n → ∞, where w is assumed uniform over {0,1}^k
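
To make the channel model concrete, here is a minimal simulation sketch (not from the slides): each output symbol y_i is drawn independently from p(y | x_i). The BSC used as the example channel and all parameter values are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def memoryless_channel(x, p_y_given_x):
    """Pass x through a memoryless channel: each y_i is drawn
    independently from p(y | x_i), i.e. row x_i of the channel matrix."""
    return np.array([rng.choice(p_y_given_x.shape[1], p=p_y_given_x[a]) for a in x])

# Hypothetical example channel: binary symmetric channel with crossover 0.1
p = 0.1
bsc = np.array([[1 - p, p],
                [p, 1 - p]])
x = rng.integers(0, 2, size=20)   # some channel input sequence
y = memoryless_channel(x, bsc)
print(np.mean(x != y))            # empirical crossover fraction, around 0.1
```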

  8. i.i.d. Random Codes
• Generate 2^{nR} codewords x = [X_1 X_2 ⋯ X_n] independently and elementwise i.i.d. according to some distribution p_X: p(x) = ∏_{i=1}^n p_X(x_i)
• Bound the average error probability for a random codebook.
• If the average performance over codebooks is good, there must exist at least one good fixed codebook.
[Figure: the codewords as randomly scattered points in the grid {0, 1, ..., q−1} × {0, 1, ..., q−1}.]
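
A small sketch of the codebook generation step, with illustrative parameters (blocklength, rate, and input distribution are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def iid_random_codebook(n, R, p_X):
    """Draw 2^{nR} length-n codewords, each entry i.i.d. according to p_X."""
    num_codewords = int(2 ** (n * R))
    return rng.choice(len(p_X), size=(num_codewords, n), p=p_X)

# Hypothetical parameters: blocklength 10, rate 1/2, uniform binary input
codebook = iid_random_codebook(n=10, R=0.5, p_X=np.array([0.5, 0.5]))
print(codebook.shape)   # (32, 10): 2^{nR} codewords of length n
```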

  9. (Weak) Joint Typicality
• Two sequences x and y are (weakly) jointly typical if
  |−(1/n) log p(x) − H(X)| < ε,
  |−(1/n) log p(y) − H(Y)| < ε,
  |−(1/n) log p(x, y) − H(X, Y)| < ε.
• For our considerations, weak typicality is convenient as it can also be stated in terms of differential entropies.
• If (x, y) is drawn i.i.d. from the joint distribution, the probability that the two sequences are jointly typical goes to 1 as n goes to infinity.
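
A sketch of the three conditions in code, assuming a strictly positive joint pmf p_XY (so all logs are finite); the function name and the example pmf are illustrative, not from the slides.

```python
import numpy as np

def weakly_jointly_typical(x, y, p_XY, eps):
    """Check the three weak-typicality conditions for integer-valued
    sequences x, y under a strictly positive joint pmf p_XY[a, b]."""
    p_X, p_Y = p_XY.sum(axis=1), p_XY.sum(axis=0)
    H_X = -np.sum(p_X * np.log2(p_X))
    H_Y = -np.sum(p_Y * np.log2(p_Y))
    H_XY = -np.sum(p_XY * np.log2(p_XY))
    emp_x = -np.mean(np.log2(p_X[x]))        # -(1/n) log p(x)
    emp_y = -np.mean(np.log2(p_Y[y]))        # -(1/n) log p(y)
    emp_xy = -np.mean(np.log2(p_XY[x, y]))   # -(1/n) log p(x, y)
    return (abs(emp_x - H_X) < eps and
            abs(emp_y - H_Y) < eps and
            abs(emp_xy - H_XY) < eps)

# Example: draw (x, y) i.i.d. from a joint pmf and check typicality
rng = np.random.default_rng(0)
p_XY = np.array([[0.4, 0.1], [0.1, 0.4]])
flat = rng.choice(4, size=1000, p=p_XY.ravel())
x, y = flat // 2, flat % 2
print(weakly_jointly_typical(x, y, p_XY, eps=0.1))   # True with high probability
```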

  10. Joint Typicality Decoding
The decoder looks for a codeword that is jointly typical with the received sequence y.
Error Events:
1. The transmitted codeword x is not jointly typical with y. ⟹ Low probability by the Weak Law of Large Numbers.
2. Another codeword x̃ is jointly typical with y.
Cuckoo’s Egg Lemma: Let x̃ be an i.i.d. sequence that is independent of the received sequence y. Then
  P{(x̃, y) is jointly typical} ≤ 2^{−n(I(X;Y) − 3ε)}.
See Cover and Thomas.
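
A minimal decoder sketch (hypothetical interface), reusing weakly_jointly_typical from the previous sketch: declare a message only if exactly one codeword is jointly typical with y.

```python
def typicality_decoder(y, codebook, p_XY, eps):
    """Return the index of the unique codeword jointly typical with y,
    or None (error) if there are zero or several such codewords."""
    hits = [w for w, x in enumerate(codebook)
            if weakly_jointly_typical(x, y, p_XY, eps)]
    return hits[0] if len(hits) == 1 else None
```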

  11. Point-to-Point Capacity
• We can upper bound the probability of error via the union bound:
  P{ŵ ≠ w} ≤ ∑_{w̃ ≠ w} P{(x(w̃), y) is jointly typical}
           ≤ 2^{−n(I(X;Y) − R − 3ε)}   ← Cuckoo’s Egg Lemma
• If R < I(X;Y), then the probability of error can be driven to zero as the blocklength increases.
Theorem (Shannon ’48): The capacity of a point-to-point channel is C = max_{p_X} I(X;Y).
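
As a numerical check of the theorem on a concrete channel, the sketch below (the helper function is illustrative) computes I(X;Y) for a BSC with the uniform input, which equals 1 − h_B(p):

```python
import numpy as np

def mutual_information(p_X, p_Y_given_X):
    """I(X;Y) in bits for input pmf p_X and channel matrix p_Y_given_X[x, y]."""
    p_XY = p_X[:, None] * p_Y_given_X
    p_Y = p_XY.sum(axis=0)
    mask = p_XY > 0
    return np.sum(p_XY[mask] *
                  np.log2(p_XY[mask] / (p_X[:, None] * p_Y[None, :])[mask]))

# BSC(0.1): the uniform input is capacity-achieving, C = 1 - h_B(0.1) ≈ 0.531
p = 0.1
bsc = np.array([[1 - p, p], [p, 1 - p]])
print(mutual_information(np.array([0.5, 0.5]), bsc))
```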

  12. Linear Codes
• Linear codebook: a linear map between messages and codewords (instead of a lookup table).
q-ary Linear Codes:
• Represent the message w as a length-k vector over F_q.
• Codewords x are length-n vectors over F_q.
• The encoding process is just a matrix multiplication, x = Gw:
  ⎡ x_1 ⎤   ⎡ g_11 g_12 ⋯ g_1k ⎤ ⎡ w_1 ⎤
  ⎢ x_2 ⎥ = ⎢ g_21 g_22 ⋯ g_2k ⎥ ⎢ w_2 ⎥
  ⎢  ⋮  ⎥   ⎢  ⋮    ⋮   ⋱  ⋮  ⎥ ⎢  ⋮  ⎥
  ⎣ x_n ⎦   ⎣ g_n1 g_n2 ⋯ g_nk ⎦ ⎣ w_k ⎦
• Recall that, for prime q, operations over F_q are just mod-q operations over the reals.
• Rate R = (k/n) log q
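
Since for prime q the field operations are just mod-q arithmetic, the encoder is a one-liner; the sketch below uses hypothetical sizes (n = 6, k = 3, q = 5).

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_encode(G, w, q):
    """Encode a length-k message w over F_q as x = Gw: ordinary matrix
    multiplication followed by reduction mod q (q prime)."""
    return (G @ w) % q

# Hypothetical small example over F_5, giving rate R = (3/6) log 5
q, n, k = 5, 6, 3
G = rng.integers(0, q, size=(n, k))
w = rng.integers(0, q, size=k)
print(linear_encode(G, w, q))   # length-6 codeword over F_5
```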

  13. Random Linear Codes
• A linear code looks like a regular subsampling of the elements of F_q^n.
• Random linear code: generate each element g_ij of the generator matrix G elementwise i.i.d. according to a uniform distribution over {0, 1, 2, ..., q−1}.
• How are the codewords distributed?
[Figure: the codewords as a regular sub-grid of F_q × F_q.]

  14. Random Linear Codes (repeat of the previous slide)

  15. Codeword Distribution
It is convenient to instead analyze the shifted ensemble x̄ = Gw ⊕ v, where v is an i.i.d. uniform sequence. (See Gallager.)
Shifted Codeword Properties:
1. Marginally uniform over F_q^n. For a given message w, the codeword x̄ looks like an i.i.d. uniform sequence:
   P{x̄ = x} = 1/q^n for all x ∈ F_q^n
2. Pairwise independent. For w_1 ≠ w_2, the codewords x̄_1, x̄_2 are independent:
   P{x̄_1 = x_1, x̄_2 = x_2} = 1/q^{2n} = P{x̄_1 = x_1} P{x̄_2 = x_2}
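
A small Monte Carlo sketch (all parameters illustrative) that checks both properties empirically for the shifted ensemble x̄ = Gw ⊕ v over F_2 with n = 3, k = 2:

```python
import numpy as np

rng = np.random.default_rng(0)
q, n, k, trials = 2, 3, 2, 50_000

w1 = np.array([1, 0])                      # two fixed, distinct messages
w2 = np.array([0, 1])
counts = np.zeros((q ** n, q ** n))

for _ in range(trials):
    G = rng.integers(0, q, size=(n, k))    # fresh random generator matrix
    v = rng.integers(0, q, size=n)         # fresh i.i.d. uniform shift
    x1 = (G @ w1 + v) % q                  # shifted codeword for w1
    x2 = (G @ w2 + v) % q                  # shifted codeword for w2
    idx1 = int("".join(map(str, x1)), q)   # codeword written as a base-q integer
    idx2 = int("".join(map(str, x2)), q)
    counts[idx1, idx2] += 1

joint = counts / trials
print(joint.sum(axis=1))                   # marginals: each close to 1/q^n = 1/8
print(joint.max(), 1 / q ** (2 * n))       # joint entries close to 1/q^{2n} = 1/64
```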

  16. Achievable Rates
• The Cuckoo’s Egg Lemma only requires independence between the true codeword x(w) and the other codeword x(w̃). From the union bound:
  P{ŵ ≠ w} ≤ ∑_{w̃ ≠ w} P{(x(w̃), y) is jointly typical}
           ≤ 2^{−n(I(X;Y) − R − 3ε)}
• This is exactly what we get from pairwise independence.
• Thus, there exists a good fixed generator matrix G and shift v for any rate R < I(X;Y) where X is uniform.

  17. Removing the Shift
[Figure: w → E → x̄ → (⊕ z) → ȳ → D → ŵ.]
• For a binary symmetric channel (BSC), the output can be written as the modulo-2 sum of the input and i.i.d. Bernoulli(p) noise:
  ȳ = x̄ ⊕ z = Gw ⊕ v ⊕ z
• Due to this symmetry, the probability of error depends only on the realization of the noise vector z. ⟹ For a BSC, x = Gw is a good code as well.
• We can now assume the existence of good generator matrices for channel coding.
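
A quick sketch of the symmetry argument over F_2 (hypothetical sizes): once the decoder subtracts the known shift v, what remains is Gw ⊕ z, so the error event is governed by the noise realization z alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, p = 7, 4, 0.1

G = rng.integers(0, 2, size=(n, k))        # fixed generator matrix
v = rng.integers(0, 2, size=n)             # fixed shift (dither)
w = rng.integers(0, 2, size=k)             # message

x_bar = (G @ w + v) % 2                    # transmitted codeword x̄ = Gw ⊕ v
z = (rng.random(n) < p).astype(int)        # i.i.d. Bernoulli(p) BSC noise
y_bar = (x_bar + z) % 2                    # BSC output ȳ = Gw ⊕ v ⊕ z

# Subtracting the known shift leaves Gw ⊕ z: the same statistics as using x = Gw.
print((y_bar - v) % 2)
print((G @ w + z) % 2)                     # identical to the line above
```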

  18. Random I.I.D. vs. Random Linear
• What have we gotten from linearity (so far)? Simplified encoding. (The decoder is still quite complex.)
• What have we lost? We can only achieve R = I(X;Y) for uniform X instead of max_{p_X} I(X;Y).
• In fact, this is a fundamental limitation of group codes (Ahlswede ’71).
• Workarounds: symbol remapping (Gallager ’68), nested linear codes.
• Are random linear codes strictly worse than random i.i.d. codes?

  19. Slepian-Wolf Problem
[Figure: s_1 → E_1 →(R_1)→ D and s_2 → E_2 →(R_2)→ D; the decoder outputs (ŝ_1, ŝ_2).]
• Jointly i.i.d. sources: p(s_1, s_2) = ∏_{i=1}^m p_{S_1 S_2}(s_{1i}, s_{2i})
• Rate region: the set of rates (R_1, R_2) such that the encoders can send s_1 and s_2 to the decoder with vanishing probability of error,
  P{(ŝ_1, ŝ_2) ≠ (s_1, s_2)} → 0 as m → ∞

  20. Random Binning
• Codebook 1: independently and uniformly assign each source sequence s_1 a label in {1, 2, ..., 2^{mR_1}}.
• Codebook 2: independently and uniformly assign each source sequence s_2 a label in {1, 2, ..., 2^{mR_2}}.
• Decoder: look for a jointly typical pair (ŝ_1, ŝ_2) within the received bin (ℓ_1, ℓ_2). Union bound:
  P{some jointly typical (s̃_1, s̃_2) ≠ (s_1, s_2) in bin (ℓ_1, ℓ_2)} ≤ |{jointly typical (s̃_1, s̃_2)}| · 2^{−m(R_1 + R_2)} ≤ 2^{m(H(S_1, S_2) + ε)} 2^{−m(R_1 + R_2)}
• Need R_1 + R_2 > H(S_1, S_2).
• Similarly, R_1 > H(S_1 | S_2) and R_2 > H(S_2 | S_1).
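
A toy sketch of i.i.d. random binning (only feasible for tiny m, since the bin assignment is an explicit lookup table over all q^m sequences); function names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_bins(m, R, q=2):
    """Assign every length-m q-ary sequence an i.i.d. uniform bin label
    in {0, ..., 2^{mR} - 1}; returns the lookup table of size q^m."""
    num_bins = int(2 ** (m * R))
    return rng.integers(0, num_bins, size=q ** m)

m, R1 = 10, 0.6
bins1 = random_bins(m, R1)
s1 = rng.integers(0, 2, size=m)            # a source sequence
index = int("".join(map(str, s1)), 2)      # enumerate sequences as integers
print(bins1[index])                        # the label encoder 1 sends
```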

  21. Slepian-Wolf Problem: Binning Illustration
[Figure: a grid of bins, indexed 1, 2, 3, 4, ..., 2^{nR_1} along one axis and 1, 2, 3, 4, ..., 2^{nR_2} along the other.]

  22. Slepian-Wolf Problem: Binning Illustration (repeat of the previous slide)

  23. Random Linear Binning
• Assume the source symbols take values in F_q.
• Codebook 1: generate a matrix G_1 with i.i.d. uniform entries drawn from F_q. Each sequence s_1 is binned via matrix multiplication, w_1 = G_1 s_1.
• Codebook 2: generate a matrix G_2 with i.i.d. uniform entries drawn from F_q. Each sequence s_2 is binned via matrix multiplication, w_2 = G_2 s_2.
• Bin assignments are uniform and pairwise independent (except for s_ℓ = 0).
• We can apply the same union bound analysis as for random binning.
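
Here binning needs no lookup table at all: the bin index is just a matrix-vector product over F_q. A minimal sketch with hypothetical sizes (m = 12 binary symbols hashed to 8):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_bin(G, s, q):
    """Bin the source sequence s via matrix multiplication over F_q: w = Gs."""
    return (G @ s) % q

q, m, mR1 = 2, 12, 8
G1 = rng.integers(0, q, size=(mR1, m))
s1 = rng.integers(0, q, size=m)
print(linear_bin(G1, s1, q))   # the bin index w_1 = G_1 s_1, a length-8 F_2 vector
```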

  24. Slepian-Wolf Rate Region
Slepian-Wolf Theorem: Reliable compression is possible if and only if
  R_1 ≥ H(S_1 | S_2), R_2 ≥ H(S_2 | S_1), R_1 + R_2 ≥ H(S_1, S_2).
Random linear binning is as good as random i.i.d. binning!
Example: Doubly Symmetric Binary Source. S_1 ∼ Bern(1/2), U ∼ Bern(p), S_2 = S_1 ⊕ U. Here H(S_1 | S_2) = H(S_2 | S_1) = h_B(p) and H(S_1, S_2) = 1 + h_B(p).
[Figure: the S-W rate region in the (R_1, R_2) plane, bounded by R_1 ≥ h_B(p), R_2 ≥ h_B(p), and the sum-rate line R_1 + R_2 = 1 + h_B(p).]
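
For the doubly symmetric binary source the region is determined by h_B(p); a quick numerical sketch (p = 0.1 is an arbitrary illustrative value):

```python
import numpy as np

def h_B(p):
    """Binary entropy function in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

p = 0.1
print("H(S1|S2) = H(S2|S1) = h_B(p) =", h_B(p))     # ≈ 0.469
print("H(S1,S2) = 1 + h_B(p)        =", 1 + h_B(p)) # ≈ 1.469 (sum-rate constraint)
```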

  25. Körner-Marton Problem
[Figure: s_1 → E_1 →(R_1)→ D and s_2 → E_2 →(R_2)→ D; the decoder outputs û.]
• Binary sources: s_1 is i.i.d. Bernoulli(1/2), and s_2 is s_1 corrupted by Bernoulli(p) noise.
• The decoder wants the modulo-2 sum u = s_1 ⊕ s_2.
• Rate region: the set of rates (R_1, R_2) such that there exist encoders and a decoder with vanishing probability of error, P{û ≠ u} → 0 as m → ∞.
Are any rate savings possible over sending s_1 and s_2 in their entirety?
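
To quantify the potential savings being asked about: the desired sum u = s_1 ⊕ s_2 is itself i.i.d. Bernoulli(p), so its entropy is only h_B(p), whereas recovering the whole pair via Slepian-Wolf costs a sum rate of 1 + h_B(p). A one-line comparison, reusing h_B from the sketch above (p = 0.1 again illustrative):

```python
p = 0.1
print("H(U) = h_B(p) =", h_B(p), " vs  H(S1, S2) = 1 + h_B(p) =", 1 + h_B(p))
```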

  26. Random Binning
• Sending s_1 and s_2 with random binning requires R_1 + R_2 > 1 + h_B(p).
• What happens if we use rates such that R_1 + R_2 < 1 + h_B(p)?
• There will be exponentially many pairs (s_1, s_2) in each bin!
• This would be fine if all pairs in a bin had the same sum s_1 ⊕ s_2, but the probability of that event goes to zero exponentially fast!

  27. Körner-Marton Problem: Random Binning Illustration
[Figure: a grid of bins, indexed 1, 2, 3, 4, ..., 2^{nR_1} along one axis and 1, 2, 3, 4, ..., 2^{nR_2} along the other.]

  28. Körner-Marton Problem: Random Binning Illustration (repeat of the previous slide)
