Information Theory Lecture 4
Mikael Skoglund

• Discrete channels, codes and capacity: CT7
• Channels: CT7.1–2
• Capacity and the coding theorem: CT7.3–7 and CT7.9
• Combining source and channel coding: CT7.13


Discrete Channels

  [Figure: X → channel p(y|x) → Y]

• Let X and Y be finite sets.
• A discrete channel is a random mapping p(y|x): X → Y.
• The nth extension of the discrete channel is a random mapping p(y_1^n | x_1^n): X^n → Y^n, defined for all n ≥ 1, x_1^n ∈ X^n and y_1^n ∈ Y^n.
• A pmf p(x_1^n) induces a pmf p(y_1^n) via the channel,

    p(y_1^n) = Σ_{x_1^n} p(y_1^n | x_1^n) p(x_1^n)
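The induced output pmf above is just a sum over the inputs. Below is a minimal sketch, not from the lecture, of that computation for a single channel use; representing the channel law as a dict {(y, x): p(y|x)} is an assumption made for illustration.

    # A minimal sketch (not from the lecture) of the induced output pmf for one
    # channel use: p(y) = sum_x p(y|x) p(x). The dict encoding of the channel
    # law as {(y, x): p(y|x)} is an illustrative assumption, not lecture notation.

    def output_pmf(px, pyx):
        """px: {x: p(x)}; pyx: {(y, x): p(y|x)}. Returns {y: p(y)}."""
        py = {}
        for (y, x), p in pyx.items():
            py[y] = py.get(y, 0.0) + p * px[x]
        return py

    # Example: the BSC with crossover probability 0.1 (introduced on the next
    # slide) and input pmf p(x) = (0.8, 0.2).
    px = {0: 0.8, 1: 0.2}
    pyx = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.9}
    print(output_pmf(px, pyx))   # {0: 0.74, 1: 0.26}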
• The channel is stationary if for any n

    p(y_1^n | x_1^n) = p(y_{1+k}^{n+k} | x_{1+k}^{n+k}),   k = 1, 2, ...

• A stationary channel is memoryless if

    p(y_m | x_1^m, y_1^{m−1}) = p(y_m | x_m),   m = 2, 3, ...

  That is, the channel output at time m does not depend on past inputs or outputs.
• Furthermore, if the channel is used without feedback,

    p(y_1^n | x_1^n) = ∏_{m=1}^n p(y_m | x_m),   n = 2, 3, ...

  That is, each time the channel is used its effect on the output is independent of previous and future uses.


• A discrete memoryless channel (DMC) is completely described by the triple (X, p(y|x), Y).
• The binary symmetric channel (BSC) with crossover probability ε:
  • a DMC with X = Y = {0, 1} and p(1|0) = p(0|1) = ε
  [Figure: BSC transition diagram, 0→0 and 1→1 with probability 1−ε, 0→1 and 1→0 with probability ε]
• The binary erasure channel (BEC) with erasure probability ε:
  • a DMC with X = {0, 1}, Y = {0, 1, e} and p(e|0) = p(e|1) = ε
  [Figure: BEC transition diagram, 0→0 and 1→1 with probability 1−ε, 0→e and 1→e with probability ε]
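To make the two example DMCs concrete, here is a minimal simulation sketch (not from the lecture); the function names bsc and bec and the list-of-bits interface are illustrative assumptions.

    # A minimal sketch (not from the lecture) of simulating the two example DMCs
    # one symbol at a time; 'e' marks a BEC erasure. Function names and the
    # list-of-bits interface are illustrative assumptions.
    import random

    def bsc(bits, eps):
        """Binary symmetric channel: flip each input bit with probability eps."""
        return [b ^ 1 if random.random() < eps else b for b in bits]

    def bec(bits, eps):
        """Binary erasure channel: erase each input bit (output 'e') with probability eps."""
        return ['e' if random.random() < eps else b for b in bits]

    x = [0, 1, 1, 0, 1, 0, 0, 1]
    print("BSC(0.1):", bsc(x, 0.1))
    print("BEC(0.1):", bec(x, 0.1))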
A Block Channel Code

  [Figure: ω → encoder α(·) → x_1^n(ω) → channel p(y|x) → Y_1^n → decoder β(·) → ω̂]

• An (M, n) block channel code for a DMC (X, p(y|x), Y) is defined by:
  1. An index set I_M ≜ {1, ..., M}.
  2. An encoder mapping α: I_M → X^n. The set C_n ≜ {x_1^n : x_1^n = α(i), ∀ i ∈ I_M} of codewords is called the codebook.
  3. A decoder mapping β: Y^n → I_M.
• The rate of the code is

    R ≜ (log M)/n   [bits per channel use]


Why?

• M different codewords {x_1^n(1), ..., x_1^n(M)} can convey log M bits of information per codeword, or R bits per channel use.
• Consider M = 2^k, |X| = 2, and assume that k < n. Then k "information bits" are mapped into n > k "coded bits." This introduces redundancy, which the decoder can exploit to correct channel errors.
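As a toy instance of these definitions (not an example from the lecture), the sketch below implements the (M, n) = (2, 3) binary repetition code: α maps an index to a codeword, β maps a channel output back to an index by minimum Hamming distance, and the rate is R = (log 2)/3 ≈ 0.33 bits per channel use.

    # A toy (M, n) = (2, 3) block code (binary repetition code), sketching the
    # maps alpha and beta defined above. Not a construction from the lecture.
    from math import log2

    M, n = 2, 3
    codebook = {1: (0, 0, 0), 2: (1, 1, 1)}     # alpha: I_M -> X^n

    def alpha(i):
        return codebook[i]

    def beta(y):
        # decode to the index whose codeword is closest in Hamming distance
        return min(codebook, key=lambda i: sum(a != b for a, b in zip(codebook[i], y)))

    R = log2(M) / n                              # rate in bits per channel use
    print(f"R = {R:.3f} bits/channel use")
    print("beta((0, 1, 0)) =", beta((0, 1, 0)))  # one flipped bit still decodes to index 1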
Error Probabilities

• Information symbol ω ∈ I_M, with p(i) = Pr(ω = i). Then, for a given DMC and a given code,

    ω → x_1^n = α(ω) → Y_1^n → ω̂ = β(Y_1^n)

• Define:
  1. The conditional error probability: λ_i = Pr(ω̂ ≠ i | ω = i)
  2. The maximal error probability: λ^(n) = max{λ_1, ..., λ_M}
  3. The average error probability:

      P_e^(n) = Pr(ω̂ ≠ ω) = Σ_{i=1}^M λ_i p(i)


Jointly Typical Sequences

The set A_ε^(n) of jointly typical sequences with respect to a pmf p(x, y) is the set {(x_1^n, y_1^n)} of sequences for which

    | −(1/n) log p(x_1^n) − H(X) | < ε
    | −(1/n) log p(y_1^n) − H(Y) | < ε
    | −(1/n) log p(x_1^n, y_1^n) − H(X, Y) | < ε

where

    p(x_1^n, y_1^n) = ∏_{m=1}^n p(x_m, y_m)
    p(x_1^n) = Σ_{y_1^n} p(x_1^n, y_1^n),   p(y_1^n) = Σ_{x_1^n} p(x_1^n, y_1^n)

and where the entropies are computed based on p(x, y).
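The three typicality conditions translate directly into a membership test. The sketch below (not from the lecture) checks them for given sequences and a joint pmf; the dict representation of p(x, y) is an illustrative assumption.

    # A sketch (not from the lecture) of testing membership in A_eps^(n),
    # following the three conditions above. pxy is a dict {(x, y): p(x, y)}.
    from math import log2

    def is_jointly_typical(xs, ys, pxy, eps):
        n = len(xs)
        px, py = {}, {}
        for (x, y), p in pxy.items():            # marginals of p(x, y)
            px[x] = px.get(x, 0.0) + p
            py[y] = py.get(y, 0.0) + p
        H = lambda d: -sum(p * log2(p) for p in d.values() if p > 0)
        HX, HY, HXY = H(px), H(py), H(pxy)
        lx  = -sum(log2(px[x]) for x in xs) / n                    # -(1/n) log p(x_1^n)
        ly  = -sum(log2(py[y]) for y in ys) / n                    # -(1/n) log p(y_1^n)
        lxy = -sum(log2(pxy[(x, y)]) for x, y in zip(xs, ys)) / n  # -(1/n) log p(x_1^n, y_1^n)
        return abs(lx - HX) < eps and abs(ly - HY) < eps and abs(lxy - HXY) < eps

    # Example: X ~ Bernoulli(1/2) through a BSC(0.1), so p(x, y) = p(x) p(y|x).
    pxy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
    xs = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
    print(is_jointly_typical(xs, [0, 1, 1, 0, 1, 0, 0, 1, 1, 1], pxy, 0.1))  # one flip: True
    print(is_jointly_typical(xs, [1, 0, 1, 0, 0, 1, 0, 1, 0, 1], pxy, 0.1))  # six flips: False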
The Joint AEP

Let (X_1^n, Y_1^n) be drawn according to p(x_1^n, y_1^n) = ∏_{m=1}^n p(x_m, y_m). Then:

• Pr((X_1^n, Y_1^n) ∈ A_ε^(n)) > 1 − ε for sufficiently large n.
• |A_ε^(n)| ≤ 2^{n(H(X,Y)+ε)}.
• |A_ε^(n)| ≥ (1 − ε) 2^{n(H(X,Y)−ε)} for sufficiently large n.
• If X̃_1^n and Ỹ_1^n are drawn independently according to the marginals p(x_1^n) = Σ_{y_1^n} p(x_1^n, y_1^n) and p(y_1^n) = Σ_{x_1^n} p(x_1^n, y_1^n), then

    Pr((X̃_1^n, Ỹ_1^n) ∈ A_ε^(n)) ≤ 2^{−n(I(X;Y) − 3ε)}

  and for sufficiently large n

    Pr((X̃_1^n, Ỹ_1^n) ∈ A_ε^(n)) ≥ (1 − ε) 2^{−n(I(X;Y) + 3ε)},

  with I(X;Y) computed for the pmf p(x, y).


Channel Capacity

• For a fixed n, a code can convey more information for larger M ⇒ we would like to maximize the rate R = (1/n) log M without sacrificing performance.
• What is the largest R that allows for a (very) low P_e^(n)?
• For a given channel we say that the rate R is achievable if there exists a sequence of (M, n) codes, with M = ⌈2^{nR}⌉, such that the maximal probability of error λ^(n) → 0 as n → ∞.

The capacity C of a channel is the supremum of all rates that are achievable over the channel.
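As a numeric preview (not on the slides), the two example channels have the well-known closed-form capacities C_BSC = 1 − H_b(ε) and C_BEC = 1 − ε; these follow from the formula C = max_{p(x)} I(X;Y) established on the coming slides. A small sketch:

    # A numeric preview (not on the slides): the standard closed-form capacities
    # of the two example channels, C_BSC = 1 - H_b(eps) and C_BEC = 1 - eps.
    from math import log2

    def h_b(p):
        """Binary entropy function in bits."""
        return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

    for eps in (0.0, 0.05, 0.1, 0.5):
        print(f"eps = {eps:4.2f}:  C_BSC = {1 - h_b(eps):.3f}  C_BEC = {1 - eps:.3f}  bits/channel use")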
Random Code Design

• Choose a joint pmf p(x_1^n) on X^n.
• Random code design: Draw M codewords x_1^n(i), i = 1, ..., M, i.i.d. according to p(x_1^n) and let these define a codebook C_n = {x_1^n(1), ..., x_1^n(M)}.
• Note: The interpretation here is that the codebook is "designed" in a random fashion. When the resulting code is then used, the codebook must, of course, be fixed and known.


A Lower Bound for C of a DMC

• A DMC (X, p(y|x), Y).
• Fix a pmf p(x) for x ∈ X. Generate C_n = {x_1^n(1), ..., x_1^n(M)} using p(x_1^n) = ∏ p(x_m).
• A data symbol ω is generated according to a uniform distribution on I_M, and x_1^n(ω) is transmitted.
• The channel produces a corresponding output sequence Y_1^n.
• Let A_ε^(n) be the typical set w.r.t. p(x, y) = p(y|x) p(x).

At the receiver, the decoder then uses the following decision rule. Decide that index ω̂ was sent if:
• (x_1^n(ω̂), Y_1^n) ∈ A_ε^(n) for some small ε;
• no other ω corresponds to a jointly typical (x_1^n(ω), Y_1^n).
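The sketch below (not from the lecture) puts these ingredients together for a BSC(0.1) with uniform p(x): draw a random codebook, transmit a random codeword, and decode by joint typicality. It reuses is_jointly_typical() from the earlier sketch; the block length, rate and ε are small illustrative choices, so the decoder succeeds with high but not overwhelming probability, whereas the coding theorem concerns n → ∞.

    # A sketch of random code design plus joint-typicality decoding over a BSC(0.1).
    # Reuses is_jointly_typical() from the earlier sketch; all parameters are
    # illustrative and chosen small enough to run quickly (not from the lecture).
    import random

    def random_codebook(M, n, px):
        """Draw M codewords of length n i.i.d. from the single-letter pmf px."""
        vals, probs = zip(*px.items())
        return {i: tuple(random.choices(vals, probs, k=n)) for i in range(1, M + 1)}

    def typicality_decoder(y, codebook, pxy, eps):
        """Return i if x_1^n(i) is the unique codeword jointly typical with y, else 0 (error)."""
        hits = [i for i, cw in codebook.items() if is_jointly_typical(cw, y, pxy, eps)]
        return hits[0] if len(hits) == 1 else 0

    px  = {0: 0.5, 1: 0.5}
    pxy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}  # p(x, y) = p(y|x) p(x)
    n, R = 50, 0.25                       # R below I(X;Y) = 0.531 bits for this p(x)
    M = 2 ** int(n * R)                   # 4096 codewords
    C_n = random_codebook(M, n, px)

    omega = random.randint(1, M)
    y = [b ^ 1 if random.random() < 0.1 else b for b in C_n[omega]]  # BSC(0.1) output
    print("sent", omega, "decoded", typicality_decoder(y, C_n, pxy, eps=0.3))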
Now study π_n = Pr(ω̂ ≠ ω), where "Pr" is over the random codebook selection, the data variable ω and the channel.

• Symmetry ⇒ π_n = Pr(ω̂ ≠ 1 | ω = 1)
• Let E_i = {(x_1^n(i), Y_1^n) ∈ A_ε^(n)}; then for a sufficiently large n,

    π_n = P(E_1^c ∪ E_2 ∪ ⋯ ∪ E_M) ≤ P(E_1^c) + Σ_{i=2}^M P(E_i)
        ≤ ε + (M − 1) 2^{−n(I(X;Y) − 3ε)}
        ≤ ε + 2^{−n(I(X;Y) − R − 3ε)}

  because of the union bound and the joint AEP.


• Note that

    I(X;Y) = Σ_{x,y} p(y|x) p(x) log [p(y|x)/p(y)]

  with p(y) = Σ_x p(y|x) p(x), where p(x) generated the random codebook and p(y|x) is given by the channel.
• Let C_tot be the set of all possible codebooks that can be generated by p(x_1^n) = ∏ p(x_m); then at least one C_n ∈ C_tot must give

    P_e^(n) ≤ π_n ≤ ε + 2^{−n(I(X;Y) − R − 3ε)}

  ⇒ as long as R < I(X;Y) − 3ε, there exists at least one C_n ∈ C_tot, say C_n^*, that can give P_e^(n) → 0 as n → ∞.
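The mutual information in the bound is computed from p(x) and the channel law p(y|x) exactly as in the formula above. A small sketch, not from the lecture, using the same dict conventions as the earlier examples:

    # A direct sketch of the formula above: I(X;Y) from an input pmf p(x) and a
    # channel law p(y|x), represented as {x: p(x)} and {(y, x): p(y|x)} (an
    # illustrative convention, not lecture notation).
    from math import log2

    def mutual_information(px, pyx):
        """Return I(X;Y) in bits for input pmf px and channel law pyx."""
        py = {}
        for (y, x), p in pyx.items():
            py[y] = py.get(y, 0.0) + p * px[x]           # p(y) = sum_x p(y|x) p(x)
        return sum(p * px[x] * log2(p / py[y])
                   for (y, x), p in pyx.items() if p > 0 and px[x] > 0)

    # BSC(0.1) with uniform input: I(X;Y) = 1 - H_b(0.1) ≈ 0.531 bits
    px  = {0: 0.5, 1: 0.5}
    pyx = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.9}
    print(round(mutual_information(px, pyx), 3))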
• Order the codewords in C_n^* according to the corresponding λ_i's and throw away the worst half ⇒
  • new rate R' = R − n^{−1}
  • for the remaining codewords, λ^(n) ≤ 2(ε + 2^{−n(I(X;Y) − R − 3ε)})

⇒ for any p(x), all rates R < I(X;Y) − 3ε are achievable
⇒ all rates R < max_{p(x)} I(X;Y) − 3ε are achievable
⇒ C ≥ max_{p(x)} I(X;Y)


An Upper Bound for C of a DMC

• Let C_n = {x_1^n(1), ..., x_1^n(M)} be any sequence of codes that can achieve λ^(n) → 0 at a fixed rate R = (1/n) log M.
• Note that λ^(n) → 0 ⇒ P_e^(n) → 0 for any p(ω). We can therefore assume that C_n encodes equally probable ω ∈ I_M.
• Fano's inequality ⇒

    R ≤ 1/n + P_e^(n) R + (1/n) I(x_1^n(ω); Y_1^n) ≤ 1/n + P_e^(n) R + max_{p(x)} I(X;Y)

  That is, for any fixed achievable R,

    λ^(n) → 0 ⇒ R ≤ max_{p(x)} I(X;Y) ⇒ C ≤ max_{p(x)} I(X;Y)
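Together the two bounds give C = max_{p(x)} I(X;Y). For a binary-input DMC this maximization can be sketched by brute force over p(x); the lecture does not prescribe any particular numerical method, so the grid search below is only an illustration, reusing mutual_information() from the previous sketch.

    # Combining the bounds: C = max_{p(x)} I(X;Y). A brute-force sketch for a
    # binary-input DMC, grid-searching over p(x); reuses mutual_information()
    # from the previous sketch. Grid resolution and the example BEC are illustrative.
    def capacity_binary_input(pyx, steps=1000):
        best = 0.0
        for k in range(steps + 1):
            p0 = k / steps                       # candidate input pmf (p0, 1 - p0)
            best = max(best, mutual_information({0: p0, 1: 1.0 - p0}, pyx))
        return best

    # BEC(0.1): the search returns approximately 1 - eps = 0.9 bits/channel use
    pyx_bec = {(0, 0): 0.9, ('e', 0): 0.1, (1, 1): 0.9, ('e', 1): 0.1}
    print(round(capacity_binary_input(pyx_bec), 3))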