
INFORMATION THEORY FUNDAMENTALS AND MULTIPLE USER APPLICATIONS
Max H. M. Costa, Unicamp, July 2018, LAWCI Unicamp
Summary
o Introduction
o Entropy, K-L Divergence, Mutual Information
o Asymptotic Equipartition Property (AEP)
o Data Transmission over Unreliable Channels


  1. Data Transmission over Unreliable Channels
o The Channel Coding Problem: W → Encoder → X → Channel p(y|x) → Y → Decoder → Ŵ
o W ∈ {1, 2, …, 2^{nR}} = message set of rate R
o X = (x_1 x_2 … x_n) = codeword input to the channel
o Y = (y_1 y_2 … y_n) = codeword output from the channel
o Ŵ = decoded message; P(error) = P{W ≠ Ŵ}

  2. Shannon's Channel Coding Theorem
o Using the channel n times: (figure: input sequences in X^n mapped to output sequences in Y^n)

  3. Simple examples
o Noiseless typewriter: each input symbol 1, 2, 3, 4 is received as itself.
o Number of noise-free symbols = 4. Can transmit R = log2(4) = 2 bits/transmission.

  4. Simple examples
o Noisy typewriter (type 1): inputs 1, 2, 3, 4; each input is received as one of two outputs, each with probability 0.5 (see diagram).
o Number of noise-free symbols = 2. Can transmit R = log2(2) = 1 bit/transmission.

  5. Simple examples
o Noisy typewriter (type 2): inputs 1, 2, 3, 4 with 0.5 transition probabilities (see diagram).
o Number of noise-free symbols = 2 (apparently; surprise later). Can transmit R ≥ log2(2) = 1 bit/transmission.

  6. Simple examples
o Noisy typewriter (type 3): inputs 1, 2, 3, 4 with 0.5 transition probabilities (see diagram).
o Number of noise-free symbols = 2. Use X = 1 and X = 3. Can transmit R = log2(2) = 1 bit/transmission.

  7. Simple examples
o A tricky typewriter: inputs 1, 2, 3, 4, 5; each input is received as one of two outputs, each with probability 0.5 (see diagram).
o How many noise-free symbols? Clearly at least 2, hopefully more.

  8. Simple examples
o Consider the n = 2 extension of the channel: a 5 × 5 grid of input pairs (X_1, X_2). Which squares (codewords) should we pick?

  9. Simple examples
o Consider the n = 2 extension of the channel: let {(X_1, X_2)} be {(1,1), (2,3), (3,5), (4,2), (5,4)}.

  10. Reminder of the channel
o A tricky typewriter: inputs 1, 2, 3, 4, 5; each input is received as one of two outputs, each with probability 0.5 (see diagram).
o How many noise-free symbols? Clearly at least 2, hopefully more.

  11. Simple examples
o Looking at the outputs: in the 5 × 5 grid of output pairs (Y_1, Y_2), the codewords {(1,1), (2,3), (3,5), (4,2), (5,4)} produce non-overlapping output regions.

  12. Simple examples - observations
o Note that we get 5 noise-free symbols in n = 2 transmissions.
o Thus we achieve rate log2(5)/2 ≈ 1.16 bits/transmission with P(error) = 0.
o For arbitrarily small P(error) we can use long codes (n → ∞) to achieve log2(5/2) ≈ 1.32 bits/transmission, the channel capacity.
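A quick sanity check of this zero-error code (a sketch in Python; the pentagon structure of the channel, with input i received as i or its cyclic successor, is assumed from the diagram):

```python
from itertools import product
from math import log2

# Assumed channel (from the diagram): input i in {1,...,5} is received as
# i or its cyclic successor, each with probability 1/2.
def fan(symbol):
    return {symbol, symbol % 5 + 1}

codebook = [(1, 1), (2, 3), (3, 5), (4, 2), (5, 4)]

# The possible outputs of a length-2 codeword form the product of the fans.
output_sets = [set(product(fan(a), fan(b))) for a, b in codebook]

# Zero-error decoding works iff these output sets are pairwise disjoint.
disjoint = all(not (output_sets[i] & output_sets[j])
               for i in range(5) for j in range(i + 1, 5))

print(disjoint)                          # True
print(log2(len(codebook)) / 2)           # ~1.16 bits/transmission
```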

  13. The Binary Symmetric Channel (BSC)
o How many noise-free symbols? Crossover probability ε: 0 → 0 and 1 → 1 with probability 1 − ε, 0 → 1 and 1 → 0 with probability ε.
o A.: Clearly for n = 1 there are none. How about using n large?

  14. Shannon's Second Theorem
o Using the channel n times: (figure: input sequences in X^n mapped to output sequences in Y^n)

  15. Shannon's Second Theorem
o The Information Channel Capacity of a discrete memoryless channel is C = max_{p(x)} I(X;Y).
o Note: I(X;Y) is a function of p(x,y) = p(x) p(y|x).
o But p(y|x) is fixed by the channel.
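This maximization can be carried out numerically. Below is a minimal Blahut-Arimoto sketch (not part of the slides; the function name and the BSC test matrix are illustrative) that estimates C from the channel transition matrix.

```python
import numpy as np

def dmc_capacity(Q, iters=500):
    """Blahut-Arimoto estimate of C = max_{p(x)} I(X;Y) in bits.

    Q[x, y] = p(y|x) is fixed by the channel; only p(x) is optimized.
    """
    nx = Q.shape[0]
    p = np.full(nx, 1.0 / nx)                    # start from the uniform input
    for _ in range(iters):
        q = p @ Q                                # induced output distribution
        ratio = np.divide(Q, q, out=np.ones_like(Q), where=Q > 0)
        d = np.sum(Q * np.log2(ratio), axis=1)   # D(p(y|x) || q(y)) per input
        p = p * np.exp2(d)                       # multiplicative update
        p /= p.sum()
    q = p @ Q
    ratio = np.divide(Q, q, out=np.ones_like(Q), where=Q > 0)
    return float(p @ np.sum(Q * np.log2(ratio), axis=1)), p

# Binary symmetric channel with crossover 0.1: C = 1 - h(0.1) ~ 0.531 bits
print(dmc_capacity(np.array([[0.9, 0.1], [0.1, 0.9]])))
```

The update p(x) ∝ p(x) 2^{D(p(y|x) ‖ q(y))} is the standard alternating-maximization step; for a symmetric channel it leaves the uniform input unchanged.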

  16. Shannon's Second Theorem
o Direct Part: If R < C, there exists a code with P(error) → 0.
o Converse Part: If R > C, communication with P(error) → 0 is not possible.

  17. Shannon's Second Theorem
o Theorem: For a discrete memoryless channel, all rates R below the information channel capacity C are achievable with maximum probability of error arbitrarily small. Conversely, if the rate is above C, the probability of error is bounded away from zero.
o Proof (Achievability): Use random coding to generate codes with a particular p(x) distribution on the codewords. Then show that the average P(error) tends to zero as n → ∞ if R < C. Then expurgate bad codewords to get a code with small maximum P(error).

  18. Shannon's Second Theorem
o Proof of Converse (sketch using AEP): (figure: about 2^{nH(Y)} typical sequences in Y^n, grouped into typical "balls" of about 2^{nH(Y|X)} sequences around each codeword in X^n)

  19. Shannon's Second Theorem
o Proof of Converse (sketch using AEP):
o Recall the sphere packing problem. The maximum number of non-overlapping balls is bounded: 2^{nR} ≤ 2^{nH(Y)} / 2^{nH(Y|X)} = 2^{nI(X;Y)} ≤ 2^{nC}.
o Thus 2^{nR} ≤ 2^{nC} and R ≤ C.
o A formal proof uses Fano's inequality.

  20. Example: The Binary Symmetric Channel
o Crossover probability ε (0 → 0 and 1 → 1 with probability 1 − ε).
o C = max (H(Y) − H(Y|X)) = 1 − h(ε) bits/transmission.
o Note: C = 0 for ε = ½. (Figure: plot of C(ε) for ε from 0 to 1.)
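A small numeric sketch of this formula (the helper name is illustrative):

```python
import numpy as np

def bsc_capacity(eps):
    """C(eps) = 1 - h(eps) in bits/transmission for the BSC."""
    e = np.clip(eps, 1e-12, 1 - 1e-12)      # avoid log(0) at the endpoints
    h = -e * np.log2(e) - (1 - e) * np.log2(1 - e)
    return 1 - h

for eps in (0.0, 0.11, 0.5, 1.0):
    print(eps, round(float(bsc_capacity(eps)), 3))   # 1.0, ~0.5, 0.0, 1.0
```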

  21. Example: The Binary Erasure Channel
o Inputs 0 and 1 are received correctly with probability 1 − α and erased (output E) with probability α.
o C = max (H(Y) − H(Y|X)) = 1 − α bits/transmission.
o Note: C = 0 for α = 1. Capacity is achieved with p(X=0) = p(X=1) = ½. (Figure: plot of C(α).)

  22. Example: The Z Channel
o X = 0 is received as Y = 0; X = 1 is received as Y = 1 or Y = 0, each with probability ½.
o C = max_{p(x)} (H(Y) − H(Y|X)) = log2(5) − 2 ≈ 0.322 bits.
o Note: the maximizing p(X=1) = 2/5.
o Homework: Obtain this capacity.
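One way to do the homework numerically (a sketch; the channel is taken as X = 0 passed noiselessly and X = 1 flipped to 0 with probability ½, as in the diagram above): sweep over p = P(X=1) and maximize I(X;Y) = H(Y) − H(Y|X).

```python
import numpy as np

def h2(q):
    """Binary entropy in bits."""
    q = np.clip(q, 1e-15, 1 - 1e-15)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

p = np.linspace(0.0, 1.0, 100001)      # candidate values of P(X=1)
# I(X;Y) = H(Y) - H(Y|X): P(Y=1) = p/2, H(Y|X=0) = 0, H(Y|X=1) = h2(1/2) = 1
mutual_info = h2(p / 2) - p
best = int(np.argmax(mutual_info))

print(mutual_info[best], p[best])      # ~0.3219 bits ( = log2(5) - 2 ) at p = 0.4
```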

  23. Changing the Z channel into a BEC
o Show that the code {01, 10} can transform a Z channel into a BEC.
o What is the resulting lower bound on the capacity of the Z channel?

  24. Typewriter type 2
o The type 2 typewriter is a sum channel: 2^C = 2^{C_1} + 2^{C_2}, where C_1 = C_2 = 0.322.
o C = 1.322 bits/channel use. How many noise-free symbols?
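Plugging in the numbers (a one-line check of the sum-channel formula; the value 0.322 is taken to be log2(5) − 2 from slide 22):

```python
from math import log2

c1 = c2 = log2(5) - 2                  # ~0.322 bits, the Z-channel capacity above
c_sum = log2(2 ** c1 + 2 ** c2)        # sum channel: 2^C = 2^C1 + 2^C2
print(c_sum)                           # ~1.322 bits/channel use
```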

  25. Example: Noisy typewriter
o Inputs A through Z; each input is received as itself or as the next letter, each with probability ½ (see diagram).
o C = max (H(Y) − H(Y|X)) = log2(26) − log2(2) = log2(13) bits/transmission.
o Achieved with a uniform distribution on the inputs.

  26. Remark
o For this example, we can also achieve C = log2(13) bits/transmission with P(error) = 0 and n = 1 by transmitting alternating input symbols, i.e., X ∈ {A, C, E, …, Y}.

  27. Differential Entropy
o Let X be a continuous random variable with density f(x) and support S. The differential entropy of X is h(X) = −∫_S f(x) log f(x) dx (if it exists).
o Note: Also written as h(f).

  28. Examples: Uniform distribution
o Let X be uniform on the interval [0, a]. Then f(x) = 1/a on the interval and f(x) = 0 outside.
o h(X) = −∫_0^a (1/a) log(1/a) dx = log a.
o Note that h(X) can be negative for a < 1.
o However, 2^{h(f)} = 2^{log a} = a is the size of the support set, which is non-negative.

  29. Example: Gaussian distribution
o Let X ~ φ(x) = (1/√(2πσ²)) exp(−x²/(2σ²)).
o Then h(X) = h(φ) = −∫ φ(x) [ −x²/(2σ²) − ln √(2πσ²) ] dx = E[X²]/(2σ²) + ½ ln(2πσ²) = ½ ln(2πeσ²) nats.
o Changing the base we have h(X) = ½ log2(2πeσ²) bits.
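A Monte Carlo check of this closed form (illustrative sketch; the value of σ and the sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0

# Closed form from the slide: h(X) = 0.5*log2(2*pi*e*sigma^2) bits
closed_form = 0.5 * np.log2(2 * np.pi * np.e * sigma ** 2)

# Monte Carlo estimate: h(X) = E[-log2 f(X)] with f the Gaussian density
x = rng.normal(0.0, sigma, size=1_000_000)
log_f = -0.5 * np.log2(2 * np.pi * sigma ** 2) - (x ** 2) / (2 * sigma ** 2) * np.log2(np.e)
estimate = -log_f.mean()

print(closed_form, estimate)   # both ~3.05 bits
```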

  30. Relation of Differential and Discrete Entropies
o Consider a quantization of X with bin width Δ, denoted by X^Δ.
o Let X^Δ = x_i inside the i-th interval, so that p_i ≈ f(x_i) Δ. Then
o H(X^Δ) = −Σ_i p_i log p_i ≈ −Σ_i Δ f(x_i) log f(x_i) − log Δ ≈ h(f) − log Δ.
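A quick numerical illustration of H(X^Δ) ≈ h(f) − log Δ for a standard Gaussian (sketch; the sample size and bin widths are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=2_000_000)                      # X ~ N(0, 1)
h_x = 0.5 * np.log2(2 * np.pi * np.e)               # ~2.047 bits

for delta in (0.5, 0.1, 0.02):
    bins = np.floor(x / delta).astype(int)          # quantize with width delta
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()
    H = -(p * np.log2(p)).sum()                     # discrete entropy of X^delta
    print(delta, round(H, 3), round(h_x - np.log2(delta), 3))
```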

  31. Differential Entropy
o So the two entropies differ by the log of the quantization level Δ.
o We can define joint differential entropy, conditional differential entropy, K-L divergence and mutual information, with some care to avoid infinite differential entropies.

  32. K-L divergence and Mutual Information
o D(f ‖ g) = ∫ f log (f/g)
o I(X;Y) = ∫∫ f(x,y) log [ f(x,y) / (f(x) f(y)) ] dx dy
o Thus I(X;Y) = h(X) + h(Y) − h(X,Y). Note: h(X) can be negative, but I(X;Y) is always ≥ 0.

  33. Differential entropy of a Gaussian vector
o Theorem: Let X be a Gaussian n-dimensional vector with mean μ and covariance matrix K. Then h(X) = ½ log((2πe)^n |K|), where |K| denotes the determinant of K.
o Proof: Algebraic manipulation.

  34. The Gaussian Channel
o W → Encoder → X → + (noise Z ~ N(0, N I)) → Y → Decoder → Ŵ
o Power constraint: E[X²] ≤ P

  35. The Gaussian Channel
o W → Encoder → X → + (noise Z ~ N(0, N I)) → Y → Decoder → Ŵ; power constraint E[X²] ≤ P
o W ∈ {1, 2, …, 2^{nR}} = message set of rate R
o X = (x_1 x_2 … x_n) = codeword input to the channel
o Y = (y_1 y_2 … y_n) = codeword output from the channel
o Ŵ = decoded message; P(error) = P{W ≠ Ŵ}

  36. The Gaussian Channel
o Using the channel n times: (figure: input sequences in X^n mapped to output sequences in Y^n)

  37. The Gaussian Channel
o Capacity: C = max_{f(x): E[X²] ≤ P} I(X;Y)
o I(X;Y) = h(Y) − h(Y|X) = h(Y) − h(X + Z | X) = h(Y) − h(Z) ≤ ½ log 2πe(P + N) − ½ log 2πe N
o = ½ log(1 + P/N) bits/transmission

  38. The Gaussian Channel
o The capacity of the discrete-time additive Gaussian channel is C = ½ log(1 + P/N) bits/transmission,
o achieved with X ~ N(0, P).
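As a concrete sketch of the formula (the SNR values are arbitrary examples):

```python
from math import log2

def awgn_capacity(P, N):
    """C = 0.5*log2(1 + P/N) bits per channel use."""
    return 0.5 * log2(1 + P / N)

for snr in (1, 10, 100):                            # P/N
    print(snr, round(awgn_capacity(snr, 1.0), 3))   # 0.5, ~1.73, ~3.33
```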

  39. Bandlimited Gaussian Channel
o Consider the channel with continuous waveform inputs x(t), with power constraint (1/T) ∫_0^T x²(t) dt ≤ P and bandwidth limited to W. The channel has white Gaussian noise with power spectral density N_0/2 watt/Hz.
o In the interval (0, T) we can specify the code waveform by 2WT samples (Nyquist criterion). We can transmit these samples over discrete-time Gaussian channels with noise variance N_0/2. This gives
o C = W log(1 + P/(N_0 W)) bits/second.

  40. Bandlimited Gaussian Channel
o C = W log(1 + P/(N_0 W)) bits/second
o Note: If W → ∞ we have C = (P/N_0) log2(e) bits/second.
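A numeric check of the wideband limit (P and N_0 are arbitrary example values):

```python
import numpy as np

P, N0 = 1.0, 1e-3                                  # watts, watts/Hz (example values)

def bandlimited_capacity(W):
    """C = W*log2(1 + P/(N0*W)) bits/second."""
    return W * np.log2(1 + P / (N0 * W))

for W in (1e2, 1e4, 1e6, 1e8):                     # bandwidth in Hz
    print(W, bandlimited_capacity(W))

print("W -> infinity:", (P / N0) * np.log2(np.e))  # ~1443 bits/second
```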

  41. Bandlimited Gaussian Channel
o Let ν = R/W be the spectral efficiency in bits per second per Hertz. Also let P = E_b R, where E_b is the available energy per information bit.
o We get ν = R/W ≤ C/W = log(1 + E_b R/(N_0 W)) bits/second per Hertz.
o Thus E_b/N_0 ≥ (2^ν − 1)/ν. This relation defines the so-called Shannon Bound.

  42. The Shannon Bound
o E_b/N_0 ≥ (2^ν − 1)/ν
o (Figure: plot of the Shannon Bound, E_b/N_0 in dB versus ν.)

    ν      E_b/N_0    E_b/N_0 (dB)
    → 0    0.69       -1.59
    0.1    0.718      -1.44
    0.25   0.757      -1.21
    0.5    0.828      -0.82
    1      1          0
    2      1.5        1.76
    4      3.75       5.74
    8      31.87      15.03
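The table can be reproduced directly from the bound (sketch; a tiny ν stands in for the ν → 0 limit):

```python
import numpy as np

# Shannon Bound: Eb/N0 >= (2^nu - 1)/nu for spectral efficiency nu (b/s/Hz)
for nu in (1e-9, 0.1, 0.25, 0.5, 1, 2, 4, 8):
    ebn0 = (2.0 ** nu - 1) / nu
    print(f"nu={nu:<6g}  Eb/N0={ebn0:7.3f}  ({10 * np.log10(ebn0):6.2f} dB)")
```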

  43. Shannon's Water Filling Solution

  44. Parallel Gaussian Channels
o (Figure: water-filling diagram with levels 3, 2.5, 2, 1.)

  45. Example of Water Filling
o Channels with noise levels 2, 1 and 3. Available power = 2.
o Capacity = ½ log(1 + 0.5/2) + ½ log(1 + 1.5/1) + ½ log(1 + 0/3).
o Level of noise + signal power (water level) = 2.5.
o No power is allocated to the third channel.
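A minimal water-filling sketch that reproduces this example (the bisection on the water level and the function name are mine):

```python
import numpy as np

def water_filling(noise, total_power, tol=1e-9):
    """Water-filling power allocation over parallel Gaussian channels.

    Finds the water level mu with sum(max(mu - N_i, 0)) = total_power, then
    returns the water level, per-channel powers and total capacity in bits.
    """
    noise = np.asarray(noise, dtype=float)
    lo, hi = noise.min(), noise.max() + total_power
    while hi - lo > tol:                     # bisection on the water level
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - noise, 0).sum() > total_power:
            hi = mu
        else:
            lo = mu
    power = np.maximum(mu - noise, 0)
    capacity = 0.5 * np.log2(1 + power / noise).sum()
    return mu, power, capacity

print(water_filling([2, 1, 3], 2.0))
# water level ~2.5, powers ~[0.5, 1.5, 0], capacity ~0.82 bits
```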

  46. Parallel Gaussian Channels
o (Figure: water-filling diagram with levels 3, 2.5, 2, 1.)

  47. Differential capacity
o Discrete memoryless channel as a band-limited channel

  48. Multiplex strategies (TDMA, FDMA)
o (Figure: per-user power P_j and orthogonal share λ_j.)

  49. Multiplex strategies (non-orthogonal CDMA)
o Per-user rate: ½ log(1 + P/(N + (k−1)P)), k = 1, …, M.
o Aggregate capacity: C = Σ_{k=1}^{M} ½ log(1 + P/(N + (k−1)P)) = ½ log(1 + MP/N).
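A check that the per-user rates telescope to the aggregate capacity (sketch; P, N, M are arbitrary example values and a successive-decoding order is assumed):

```python
from math import log2

# Non-orthogonal CDMA: user k sees (k-1) still-undecoded users as noise.
P, N, M = 1.0, 1.0, 8

rates = [0.5 * log2(1 + P / (N + (k - 1) * P)) for k in range(1, M + 1)]
aggregate = 0.5 * log2(1 + M * P / N)

print(sum(rates), aggregate)   # equal: the sum telescopes to 0.5*log2(1 + MP/N)
```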

  50. TDMA or FDMA versus CDMA
o Orthogonal schemes: bandwidth limitation (2WT dimensions).
o Non-orthogonal CDMA (the log has no cap).
o (Figure: aggregate capacity versus number of users / aggregate power.)

  51. Multiple User Information Theory
o Building Blocks:
o Multiple Access Channels (MACs)
o Broadcast Channels (BCs)
o Interference Channels (IFCs)
o Relay Channels (RCs)
o Note: These channels have their discrete memoryless and Gaussian versions. For simplicity we will look at the Gaussian models.

  52. Multiple Access Channel (MAC)

  53. Gaussian Broadcast Channel

  54. Superposition coding
o (Figure: power split αP and (1 − α)P; noise levels N_1, N_2.)

  55. Superposition coding
o (Figure: power split αP and (1 − α)P; noise levels N_1, N_2.)

  56. Standard Gaussian Interference Channel
o (Figure: two transmitter-receiver pairs with powers P_1 and P_2, cross gains a and b; messages W_1, W_2 and decoded messages Ŵ_1, Ŵ_2.)

  57. Symmetric Gaussian Interference Channel
o (Figure: both transmitters have power P.)

  58. Z-Gaussian Interference Channel

  59. Interference Channel: Strategies
Things that we can do with interference:
1. Ignore it (treat interference as noise (IAN))
2. Avoid it (divide the signal space (TDM/FDM))
3. Partially decode both interfering signals
4. Partially decode one, fully decode the other
5. Fully decode both (only good for strong interference, a ≥ 1)

  60. Interference Channel: Brief history
o Carleial (1975): Very strong interference does not reduce capacity (a² ≥ 1 + P)
o Sato (1981), Han and Kobayashi (1981): Strong interference (a² ≥ 1): the IFC behaves like 2 MACs
o Motahari, Khandani (2007), Shang, Kramer and Chen (2007), Annapureddy, Veeravalli (2007): Very weak interference (2a(1 + a²P) ≤ 1): treat interference as noise (IAN)

  61. Interference Channel: History (continued)
o Sason (2004): Symmetrical superposition to beat TDM
o Etkin, Tse, Wang (2008): Capacity to within 1 bit
o Costa (2011): Noisebergs to get Gaussian H+K for Z IFCs
o Costa, Nair (2012, 2013, 2016, 2017): Some progress on the achievable region of symmetric Gaussian IFCs
o Polyanskiy and Wu (2016): Corner points established
