  1. 2. Information Theory - Channel Capacity, Ying Cui, Department of Electronic Engineering, Shanghai Jiao Tong University, China, 2018 Autumn 1

  2. Outline • Communication system • Examples of channel capacity • Symmetric channels • Properties of channel capacity • Definitions • Channel coding theorem • Source-channel coding theorem 2

  3. Reference • Elements of Information Theory, T. M. Cover and J. A. Thomas, Wiley 3

  4. CHANNEL CAPACITY 4

  5. Communication System • Map source symbols from a finite alphabet into some sequence of channel symbols, i.e., the input sequence of the channel • Output sequence of the channel is random but has a distribution depending on the input sequence of the channel – two different input sequences may give rise to the same output sequence, i.e., inputs are confusable – choose a "nonconfusable" subset of input sequences so that with high probability there is only one highly likely input that could have caused the particular output • Attempt to recover the transmitted message from the output sequence of the channel – reconstruct input sequences with a negligible probability of error 5

  6. Channel Capacity • Discrete channel: the output distribution is conditioned on the input at that time • Mutual information I(X;Y) is the relative entropy between p(x,y) and p(x)p(y) – I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) • Channel capacity: C = max_{p(x)} I(X;Y) 6
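A minimal numerical sketch (not from the slides) of the quantity above, treating I(X;Y) as the relative entropy between p(x,y) and p(x)p(y); the joint distribution `p_xy` below is a hypothetical example, not one used in the lecture.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits, computed as the relative entropy D(p(x,y) || p(x)p(y))."""
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal p(x), column vector
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal p(y), row vector
    mask = p_xy > 0                          # skip zero-probability terms
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask])))

# Hypothetical joint distribution: binary channel with crossover 0.1 and uniform input
p_xy = np.array([[0.45, 0.05],
                 [0.05, 0.45]])
print(mutual_information(p_xy))   # ~0.531 bits
```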

  7. EXAMPLES OF CHANNEL CAPACITY 7

  8. Noiseless Binary Channel • Binary input is reproduced exactly at the output, so I(X;Y) = H(X) - H(X|Y) = H(X) • C = max I(X;Y) = 1 bit, achieved using p(x) = (1/2, 1/2) – one error-free bit can be transmitted per channel use 8

  9. Noisy Channel with Nonoverlapping Outputs • Two possible outputs correspond to each of the two inputs – appears to be noisy, but really is not, since I(X;Y) = H(X) - H(X|Y) = H(X) • C = max I(X;Y) = 1 bit, achieved using p(x) = (1/2, 1/2) – the input can be determined from the output – every transmitted bit can be recovered without error 9

  10. Noisy Typewriter • Channel input is either unchanged with probability 1/2 or transformed into the next letter with probability 1/2 • If the input has 26 symbols and we use every alternate input symbol, we can transmit one of 13 symbols without error with each transmission • C = max I(X;Y) = max (H(Y) - H(Y|X)) = max H(Y) - 1 = log 26 - 1 = log 13, where max H(Y) = log 26 is attained when Y has the uniform dist. p(y) = (1/26, ..., 1/26); achieved using p(x) = (1/26, ..., 1/26) 10
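As a quick check of the log 13 figure, a sketch that builds the noisy-typewriter transition matrix and evaluates I(X;Y) for a uniform input, reusing the hypothetical `mutual_information` helper from the earlier sketch.

```python
import numpy as np

# Noisy typewriter: each input stays the same or becomes the next letter (mod 26),
# each with probability 1/2.
P = np.zeros((26, 26))
for x in range(26):
    P[x, x] = 0.5
    P[x, (x + 1) % 26] = 0.5

p_x = np.full(26, 1 / 26)          # uniform input distribution
p_xy = p_x[:, None] * P            # joint distribution p(x, y) = p(x) p(y|x)
print(mutual_information(p_xy))    # log2(13) ~ 3.70 bits
```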

  11. Binary Symmetric Channel • Input symbols are complemented with probability p • I(X;Y) = H(Y) - H(Y|X) = H(Y) - H(p) ≤ 1 - H(p), so C = 1 - H(p) bits – equality H(Y) = 1 achieved when Y follows the uniform dist. p(y) = (1/2, 1/2) – equality achieved using p(x) = (1/2, 1/2) 11
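A minimal sketch of the resulting capacity formula C = 1 - H(p) for the binary symmetric channel; the function names below are illustrative, not the lecture's notation.

```python
import numpy as np

def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return float(-p * np.log2(p) - (1 - p) * np.log2(1 - p))

def bsc_capacity(p):
    """Capacity of the binary symmetric channel with crossover probability p."""
    return 1.0 - binary_entropy(p)

print(bsc_capacity(0.1))   # ~0.531 bits per channel use
```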

  12. Binary Erasure Channel • Two inputs and three outputs; a fraction α of the bits are erased • C = max I(X;Y) = 1 - α, achieved when p(x) = (1/2, 1/2) • Recover at most a fraction 1 - α of the bits, as a fraction α of the bits are lost 12
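To see that C = 1 - α is attained by p(x) = (1/2, 1/2), a sketch that evaluates I(X;Y) for the erasure channel's 2x3 transition matrix (outputs 0, e, 1), again reusing the hypothetical `mutual_information` helper; α = 0.25 is an arbitrary example value.

```python
import numpy as np

alpha = 0.25                              # erasure probability (example value)
P = np.array([[1 - alpha, alpha, 0.0],    # rows: inputs 0, 1; columns: outputs 0, e, 1
              [0.0, alpha, 1 - alpha]])
p_x = np.array([0.5, 0.5])                # capacity-achieving input distribution
p_xy = p_x[:, None] * P                   # joint distribution p(x, y)
print(mutual_information(p_xy), 1 - alpha)   # both ~0.75 bits
```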

  13. SYMMETRIC CHANNELS 13

  14. Symmetric Channels • Channel transition matrix with (x,y)-th element p(y|x) • A channel is symmetric if the rows of its transition matrix are permutations of each other and the columns are permutations of each other • Example of a symmetric channel: the binary symmetric channel 14
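The capacity formula that the proof slide below establishes is, per the cited Cover & Thomas text, C = log |Y| - H(row of the transition matrix), achieved by a uniform input. A small sketch under that assumption:

```python
import numpy as np

def symmetric_channel_capacity(P):
    """C = log|Y| - H(row) in bits for a symmetric channel with transition matrix P
    (rows are permutations of each other, columns are permutations of each other)."""
    row = P[0][P[0] > 0]                       # any row works; drop zero entries
    h_row = float(-np.sum(row * np.log2(row)))
    return float(np.log2(P.shape[1]) - h_row)

# Binary symmetric channel with crossover 0.1, viewed as a symmetric channel
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(symmetric_channel_capacity(P))   # ~0.531 = 1 - H(0.1)
```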

  15. Proof 15

  16. PROPERTIES OF CHANNEL CAPACITY 16

  17. Properties of Channel Capacity • C ≥ 0 since I(X;Y) ≥ 0 • C ≤ log |X| since C = max I(X;Y) ≤ max H(X) = log |X| • C ≤ log |Y| since C = max I(X;Y) ≤ max H(Y) = log |Y| • I(X;Y) is a continuous function of p(x) • I(X;Y) is a concave function of p(x) • Problem of computing channel capacity is a convex problem – maximization of a bounded concave function over a closed convex set – maximum can then be found by standard nonlinear optimization techniques such as gradient search 17
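The slide points to gradient-type methods; a standard alternative for this particular convex problem is the Blahut-Arimoto alternating-maximization algorithm (not mentioned on the slide). A sketch, with P[x, y] = p(y|x):

```python
import numpy as np

def blahut_arimoto(P, iters=500):
    """Approximate C = max_{p(x)} I(X;Y) in bits for a discrete memoryless
    channel with transition matrix P, where P[x, y] = p(y|x)."""
    p = np.full(P.shape[0], 1.0 / P.shape[0])     # start from a uniform input
    for _ in range(iters):
        p_y = p @ P                               # output distribution p(y)
        with np.errstate(divide="ignore", invalid="ignore"):
            # d(x) = D( p(y|x) || p(y) ) in nats; zero-probability terms contribute 0
            d = np.where(P > 0, P * np.log(P / p_y), 0.0).sum(axis=1)
        p = p * np.exp(d)                         # multiplicative update of p(x)
        p /= p.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        i_x = np.where(P > 0, P * np.log2(P / (p @ P)), 0.0).sum(axis=1)
    return float(p @ i_x)                         # I(X;Y) in bits under the final p(x)

# Binary symmetric channel, crossover 0.1: converges to ~1 - H(0.1) ~ 0.531 bits
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])
print(blahut_arimoto(P))
```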

  18. DEFINITIONS 18

  19. Discrete Memoryless Channel (DMC) 19

  20. Code 20

  21. Probability of Error 21

  22. Rate and Capacity – write (2^{nR}, n) codes to mean (⌈2^{nR}⌉, n) codes to simplify the notation 22

  23. CHANNEL CODING THEOREM (SHANNON’S SECOND THEOREM) 23

  24. Basic Idea • For large block lengths, every channel has a subset of inputs producing disjoint sequences at the output • Ensure that no two input X sequences produce the same output Y sequence, to determine which X sequence was sent 24

  25. Basic Idea • Total number of possible (typical) output Y sequences is ≈ 2^{nH(Y)} • Divide them into sets of size 2^{nH(Y|X)} corresponding to the different input X sequences • Total number of disjoint sets is less than or equal to 2^{n(H(Y) - H(Y|X))} = 2^{nI(X;Y)} • Send at most ≈ 2^{nI(X;Y)} distinguishable sequences of length n 25

  26. Channel Coding Theorem 26

  27. New Ideas in Shannon’s Proof • Allowing an arbitrarily small but nonzero probability of error • Using the channel many times in succession, so that the law of large numbers comes into effect • Calculating the average of the probability of error over a random choice of codebooks – this symmetrizes the probability and can then be used to show the existence of at least one good code • Shannon’s proof outline was based on the idea of typical sequences, but was not made rigorous until much later 27

  28. Current Proof • Use the same essential ideas – random code selection, calculation of the average probability of error for a random choice of codewords, and so on • Main difference is in the decoding rule: decode by joint typicality – look for a codeword that is jointly typical with the received sequence – if we find a unique codeword satisfying this property, declare that word to be the transmitted codeword – properties of joint typicality • with high probability the transmitted codeword and the received sequence are jointly typical, since they are probabilistically related • probability that any other codeword looks jointly typical with the received sequence is ≈ 2^{-nI(X;Y)} • thus, if we have fewer than 2^{nI(X;Y)} codewords, then with high probability there will be no other codewords that can be confused with the transmitted codeword, and the probability of error is small 28
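A rough sketch of the joint-typicality decoding rule described above; the typicality test, thresholds, and function names are illustrative choices, not the lecture's notation. Sequences are arrays of symbol indices and `p_xy` is the joint input-output pmf.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a pmf given as an array (zero entries skipped)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def is_jointly_typical(x, y, p_xy, eps=0.1):
    """Check that the empirical log-likelihood rates of x, y, and (x, y) are all
    within eps of H(X), H(Y), and H(X,Y) respectively."""
    n = len(x)
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)
    rate_x = -np.log2(p_x[x]).sum() / n
    rate_y = -np.log2(p_y[y]).sum() / n
    rate_xy = -np.log2(p_xy[x, y]).sum() / n
    return (abs(rate_x - entropy(p_x)) < eps
            and abs(rate_y - entropy(p_y)) < eps
            and abs(rate_xy - entropy(p_xy.ravel())) < eps)

def jt_decode(y, codebook, p_xy, eps=0.1):
    """Declare the unique codeword jointly typical with the received y, else an error."""
    hits = [w for w, x in enumerate(codebook) if is_jointly_typical(x, y, p_xy, eps)]
    return hits[0] if len(hits) == 1 else None   # None signals a decoding error
```

With a codebook of about 2^{nR} codewords drawn i.i.d. from the capacity-achieving p(x), the argument on the slide says this decoder succeeds with high probability whenever R < C.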

  29. SOURCE – CHANNEL SEPARATION THEOREM (SHANNON’S THIRD THEOREM) 29

  30. Two Main Basic Theorems • Data compression: R>H • Data transmission: R<C • Is condition H < C necessary and sufficient for sending a source over a channel? 30

  31. Example • Consider two methods for sending digitized speech over a discrete memoryless channel – one-stage method: design a code to map the sequence of speech samples directly into the input of the channel – two-stage method: compress the speech into its most efficient representation, then use the appropriate channel code to send it over the channel • Do we lose something by using the two-stage method? – data compression does not depend on the channel – channel coding does not depend on the source distribution 31

  32. Joint vs. Separate Channel Coding • Joint source and channel coding • Separate source and channel coding 32

  33. Source – Channel Coding Theorem – consider the design of a communication system as a combination of two parts • source coding: design source codes for the most efficient representation of the data • channel coding: design channel codes appropriate for the channel (to combat the noise and errors introduced by the channel) – the separate encoders can achieve the same rates as the joint encoder • holds for the situation where one transmitter communicates with one receiver 33

  34. Summary 34

  35. Summary 35

  36. cuiying@sjtu.edu.cn iwct.sjtu.edu.cn/Personal/yingcui 36
