
Information Theory Lecture 9: Error Exponents


  1. Information Theory Lecture 9
     • Error Exponents
     • The part on discrete channels of
       R. Gallager, "A Simple Derivation of the Coding Theorem and Some Applications," IEEE Trans. on Inform. Theory, Jan. 1965
     • In addition, some concepts found in
       R. Gallager, Information Theory and Reliable Communication, Wiley, 1968
     (Mikael Skoglund)

     Discrete Channels (recap)
     • Let $\mathcal{X}$ and $\mathcal{Y}$ be finite sets. A discrete channel is a random mapping from $\mathcal{X}^n$ to $\mathcal{Y}^n$ described by the conditional pmfs $p_n(y_1^n \mid x_1^n)$ for all $n \geq 1$, $x_1^n \in \mathcal{X}^n$ and $y_1^n \in \mathcal{Y}^n$.
     • The channel is (stationary and) memoryless if
       $$p_n(y_1^n \mid x_1^n) = \prod_{m=1}^{n} p(y_m \mid x_m), \quad n = 2, 3, \ldots$$
     • A discrete memoryless channel (DMC) is completely described by the triple $(\mathcal{X}, p(y|x), \mathcal{Y})$.
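A DMC is conveniently represented by a row-stochastic matrix indexed by $(x, y)$, and memorylessness means the channel acts on each symbol independently. A minimal Python/NumPy sketch of this (the BSC example and all parameter values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# A DMC (X, p(y|x), Y) as a |X| x |Y| row-stochastic matrix.
# Illustrative example: a binary symmetric channel with crossover eps = 0.1.
eps = 0.1
P = np.array([[1 - eps, eps],
              [eps, 1 - eps]])

def dmc_transmit(x, P, rng):
    """Memoryless channel use: each y_i is drawn from p(. | x_i) independently,
    so p_n(y_1^n | x_1^n) = prod_i p(y_i | x_i)."""
    return np.array([rng.choice(P.shape[1], p=P[xi]) for xi in x])

x = np.array([0, 1, 1, 0, 1])
print(dmc_transmit(x, P, rng))
```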

  2. Block Channel Codes (recap)
     • Define an (M, n) block channel code for a DMC $(\mathcal{X}, p(y|x), \mathcal{Y})$ by
       1. An index set $I_M \triangleq \{1, \ldots, M\}$
       2. An encoder mapping $\alpha: I_M \to \mathcal{X}^n$. The set $\mathcal{C} \triangleq \{x_1^n : x_1^n = \alpha(i), \ \forall i \in I_M\}$ of codewords is called the codebook.
       3. A decoder mapping $\beta: \mathcal{Y}^n \to I_M$, characterized by the decoding subsets $\mathcal{Y}^n(i) = \{y_1^n \in \mathcal{Y}^n : \beta(y_1^n) = i\}$, $i = 1, \ldots, M$.
     • The rate of the code is $R \triangleq \frac{\log M}{n}$ [bits per channel use].
     • A code is often represented by its codebook only; the decoder can often be derived from the codebook using a specific rule (joint typicality, maximum a posteriori, maximum likelihood, ...).
     • Assume, in the following, that $\omega \in I_M$ is drawn according to $p(m) = \Pr(\omega = m)$.
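For concreteness, a toy instance of the definitions above: a hypothetical $(M, n) = (2, 3)$ code (the binary repetition code) with its encoder $\alpha$, decoder $\beta$ and rate $R = (\log M)/n$. The code choice is purely illustrative and not from the slides:

```python
import numpy as np

# A toy (M, n) = (2, 3) block code: the binary repetition code.
# alpha maps I_M = {1, 2} to codewords in X^n; beta maps Y^n back to I_M
# by majority vote, which is the ML rule for a BSC with eps < 1/2.
M, n = 2, 3
alpha = {1: np.array([0, 0, 0]), 2: np.array([1, 1, 1])}   # encoder / codebook

def beta(y):                                               # decoder
    return 1 if np.sum(y) < n / 2 else 2

R = np.log2(M) / n                                         # rate, bits per channel use
print(f"R = {R:.3f}", beta(np.array([0, 1, 0])))           # 0.333, decodes to index 1
```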

  3. Error Probabilities (recap)
     • For a given code:
       • Conditional: $P_{e,m} = \sum_{y_1^n \in (\mathcal{Y}^n(m))^c} p_n(y_1^n \mid x_1^n(m))$ ($= \lambda_m$ in CT)
       • Maximal: $P_{e,\max} = P^{(n)}_{e,\max} = \max_m P_{e,m}$ ($= \lambda^{(n)}$ in CT)
       • Overall/average/total: $P_e = P^{(n)}_e = \sum_{m=1}^{M} p(m) P_{e,m}$

     "Random Coding" (recap)
     • Assume that the M codewords $x_1^n(m)$, $m = 1, \ldots, M$, of a codebook $\mathcal{C}$ are drawn independently according to $q_n(x_1^n)$, $x_1^n \in \mathcal{X}^n$, so that
       $$P(\mathcal{C}) = q_n(x_1^n(1)) \cdots q_n(x_1^n(M)).$$
     • Error probabilities over an ensemble of codes:
       • Conditional: $\bar{P}_{e,m} = \sum_{\mathcal{C}} P(\mathcal{C}) P_{e,m}(\mathcal{C})$
       • Overall/average/total: $\bar{P}_e = \sum_{\mathcal{C}} P(\mathcal{C}) P_e(\mathcal{C})$
     • Note: in addition to $\mathcal{C}$, a decoder needs to be specified.
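For a small code the three error probabilities can be computed exactly by enumerating all output sequences; a sketch, assuming a BSC(0.1), an ML decoder and a toy codebook (all illustrative choices, not from the slides):

```python
import itertools
import numpy as np

# Exact P_{e,m}, P_{e,max} and P_e for a small code on a BSC(eps),
# by enumerating all |Y|^n output sequences under ML decoding.
eps, n = 0.1, 3
codebook = [np.array([0, 0, 0]), np.array([1, 1, 1])]
M = len(codebook)

def pn(y, x):                                   # p_n(y^n | x^n) for the BSC
    d = int(np.sum(y != x))
    return eps**d * (1 - eps)**(n - d)

P_em = np.zeros(M)
for y in itertools.product([0, 1], repeat=n):
    y = np.array(y)
    m_hat = int(np.argmax([pn(y, c) for c in codebook]))   # beta(y), ML rule
    for m in range(M):
        if m != m_hat:                                      # y lies in Y^n(m)^c
            P_em[m] += pn(y, codebook[m])

print("P_e,m   =", P_em)
print("P_e,max =", P_em.max())
print("P_e     =", P_em.mean())                 # average, with p(m) = 1/M
```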

  4. The Channel Coding Theorem (recap)
     • A rate R is achievable if there exists a sequence of (M, n) codes, with $M = \lceil 2^{nR} \rceil$, such that $P^{(n)}_{e,\max} \to 0$ as $n \to \infty$. The capacity C is the supremum of all achievable rates.
     • For a discrete memoryless channel, $C = \max_{p(x)} I(X; Y)$.
     • The previous proof (in CT) is based on typical sequences, which gives limited insight, e.g., into how fast $P^{(n)}_{e,\max} \to 0$ as $n \to \infty$ for $R < C$ ...
     • In fact, for any $n > 0$, $P^{(n)}_{e,\max} < 4 \cdot 2^{-n E_r(R)}$, where $E_r(R)$ is the random coding exponent.

     Exponential Bounds
     • Consider a code $\mathcal{C}(n, R)$ of length n and rate R.
     • Assume $p(m) = M^{-1}$ and a DMC, and consider the average error probability $P^{(n)}_e = \frac{1}{M} \sum_{m=1}^{M} P_{e,m}$.
     • The bounds are easily extended to $P^{(n)}_{e,\max}$; a non-zero lower bound may not exist for arbitrary $p(m)$.
     • Upper bounds (there exists a code): $P^{(n)}_e \leq 2^{-n E_{\max}(R)}$, for any $n > 0$.
     • Lower bounds (for all codes): $P^{(n)}_e \geq 2^{-n E_{\min}(R)}$, as $n \to \infty$.
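As a small worked example of $C = \max_{p(x)} I(X; Y)$: for a BSC($\epsilon$) the maximum is attained by the uniform input distribution, giving $C = 1 - h_2(\epsilon)$. A one-line numerical check ($\epsilon = 0.1$ is an illustrative value):

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# BSC capacity: C = max_{p(x)} I(X;Y) = 1 - h2(eps), attained by uniform input.
eps = 0.1
print("C =", 1 - h2(eps), "bits per channel use")   # about 0.531
```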

  5. Reliability Function, Error Exponents
     • The reliability function of a channel,
       $$E(R) = \lim_{n \to \infty} \frac{-\log P^*_e(n, R)}{n},$$
       where $P^*_e(n, R)$ is the minimum of $P^{(n)}_e$ over all codes $\mathcal{C}(n, R)$.
     • Lower bounds to E(R) yield upper bounds to $P^{(n)}_e$ (as $n \to \infty$):
       • the "random coding" $E_r(R)$ and "expurgated" $E_{ex}(R)$ exponents.
     • Upper bounds to E(R) yield lower bounds to $P^{(n)}_e$ (as $n \to \infty$):
       • the "sphere-packing" $E_{sp}(R)$ and "straight-line" $E_{sl}(R)$ exponents.
     • With $E_{\max} = \max(E_r, E_{ex})$ and $E_{\min} = \min(E_{sp}, E_{sl})$,
       $$E_{\max}(R) \leq E(R) \leq E_{\min}(R).$$
     • The critical rate $R_{cr}$ is the smallest R in $[0, C]$ such that $E_{\max}(R) = E_{\min}(R) = E(R)$ for $R_{cr} \leq R \leq C$.
     • For $R \in [R_{cr}, C)$, the exponent $E(R) > 0$ in $P^{(n)}_e \approx 2^{-n E(R)}$ (as $n \to \infty$, for the best possible code) is known!
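The random coding exponent $E_r(R)$ is only named in this excerpt; assuming Gallager's (1965) form $E_r(R) = \max_{0 \leq \rho \leq 1} [E_0(\rho, q) - \rho R]$ with $E_0(\rho, q) = -\log_2 \sum_y \big(\sum_x q(x) p(y|x)^{1/(1+\rho)}\big)^{1+\rho}$, it is easy to evaluate numerically. A sketch for a BSC(0.1) with uniform q (illustrative parameters):

```python
import numpy as np

# Random-coding exponent E_r(R) = max_{0<=rho<=1} [E_0(rho, q) - rho*R],
# with Gallager's E_0(rho, q) = -log2 sum_y ( sum_x q(x) p(y|x)^{1/(1+rho)} )^{1+rho}.
eps = 0.1
P = np.array([[1 - eps, eps], [eps, 1 - eps]])   # p(y|x), rows indexed by x
q = np.array([0.5, 0.5])                         # uniform input assignment

def E0(rho):
    inner = (q[:, None] * P**(1.0 / (1.0 + rho))).sum(axis=0)  # sum over x
    return -np.log2(np.sum(inner**(1.0 + rho)))                # sum over y

def Er(R, grid=np.linspace(0.0, 1.0, 1001)):
    return max(E0(rho) - rho * R for rho in grid)

for R in (0.1, 0.3, 0.5):
    print(f"E_r({R}) = {Er(R):.4f}")   # positive for R below C ~ 0.531
```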

  6. Decoding Rules
     • Joint typicality ($A^{(n)}_\epsilon$ the jointly typical set):
       $$\mathcal{Y}^n(m) = \{y_1^n \in \mathcal{Y}^n : (x_1^n(m'), y_1^n) \in A^{(n)}_\epsilon \iff m' = m\}$$
     • Maximum a posteriori (minimum error probability):
       $$\mathcal{Y}^n(m) = \{y_1^n \in \mathcal{Y}^n : m = \arg\max_{m'} \Pr(m' \mid y_1^n)\}$$
     • Maximum likelihood (priors unknown / not meaningful / uniform):
       $$\mathcal{Y}^n(m) = \{y_1^n \in \mathcal{Y}^n : m = \arg\max_{m'} p_n(y_1^n \mid x_1^n(m'))\}$$
     • To derive existence results it suffices to consider a specific rule.

     Two Codewords
     • Two codewords, $\mathcal{C} = \{x_1^n(1), x_1^n(2)\}$, and any channel $p_n(y_1^n \mid x_1^n)$.
     • Assume maximum likelihood decoding,
       $$\mathcal{Y}^n(1) = \{y_1^n \in \mathcal{Y}^n : p_n(y_1^n \mid x_1^n(1)) > p_n(y_1^n \mid x_1^n(2))\}.$$
       Hence, for any $s \in (0, 1)$ it holds that
       $$P_{e,1} = \sum_{y_1^n \in \mathcal{Y}^n(1)^c} p_n(y_1^n \mid x_1^n(1))
              \leq \sum_{y_1^n \in \mathcal{Y}^n(1)^c} p_n(y_1^n \mid x_1^n(1))^{1-s}\, p_n(y_1^n \mid x_1^n(2))^{s}
              \leq \sum_{y_1^n \in \mathcal{Y}^n} p_n(y_1^n \mid x_1^n(1))^{1-s}\, p_n(y_1^n \mid x_1^n(2))^{s}$$
     • An equivalent bound applies to $P_{e,2}$.
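The final bound holds for every $s \in (0, 1)$, so it can be optimized numerically. A sketch comparing the optimized bound with the exact $P_{e,1}$ for an illustrative BSC(0.1), $n = 4$ and an arbitrary pair of codewords (none of these choices are from the slides):

```python
import itertools
import numpy as np

# Two-codeword Chernoff-type bound under ML decoding:
#   P_{e,1} <= sum_{y^n} p_n(y^n|x^n(1))^{1-s} p_n(y^n|x^n(2))^s,  s in (0,1).
eps, n = 0.1, 4
x1, x2 = np.zeros(n, dtype=int), np.array([1, 1, 0, 1])

def pn(y, x):
    d = int(np.sum(y != x))
    return eps**d * (1 - eps)**(n - d)

ys = [np.array(y) for y in itertools.product([0, 1], repeat=n)]
exact = sum(pn(y, x1) for y in ys if pn(y, x1) <= pn(y, x2))       # y in Y^n(1)^c
bound = min(sum(pn(y, x1)**(1 - s) * pn(y, x2)**s for y in ys)     # optimize over s
            for s in np.linspace(0.01, 0.99, 99))
print(f"exact P_e,1 = {exact:.4f}   optimized bound = {bound:.4f}")
```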

  7. • For a memoryless channel we get (with $\bar{m} = (m \bmod 2) + 1$)
       $$P_{e,m} \leq \prod_{i=1}^{n} \sum_{y_i \in \mathcal{Y}} p(y_i \mid x_i(m))^{1-s}\, p(y_i \mid x_i(\bar{m}))^{s} = \prod_{i=1}^{n} g_i(s), \quad m = 1, 2$$
     • For a BSC($\epsilon$) with two codewords at Hamming distance d,
       $$P_{e,m} \leq \min_{s \in (0,1)} \prod_{i=1}^{n} g_i(s) = \left(2\sqrt{\epsilon(1-\epsilon)}\right)^{d}, \quad m = 1, 2$$
       ⇒ For a "best" pair of codewords ($d = n$): $P_{e,m} \leq \left(2\sqrt{\epsilon(1-\epsilon)}\right)^{n}$, $m = 1, 2$
       ⇒ For a "typical" pair of codewords ($d = n/2$): $P_{e,m} \leq \left(2\sqrt{\epsilon(1-\epsilon)}\right)^{n/2}$, $m = 1, 2$

     Ensemble Average – Two Codewords
     • Pick a probability assignment $q_n$ on $\mathcal{X}^n$, and choose M codewords in $\mathcal{C} = \{x_1^n(1), \ldots, x_1^n(M)\}$ independently;
       $$P(\mathcal{C}) = \prod_{m=1}^{M} q_n(x_1^n(m))$$
     • For memoryless channels, we take $q_n$ of the form
       $$q_n(x_1^n) = \prod_{i=1}^{n} q_1(x_i)$$
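A quick numerical look at how the pairwise bound $\left(2\sqrt{\epsilon(1-\epsilon)}\right)^d$ decays for the "best" and "typical" distances ($\epsilon = 0.1$ is an illustrative value):

```python
import numpy as np

# Pairwise bound for a BSC(eps): P_{e,m} <= (2*sqrt(eps*(1-eps)))**d.
eps = 0.1
base = 2 * np.sqrt(eps * (1 - eps))             # = 0.6 for eps = 0.1
for n in (10, 50, 100):
    print(f"n={n:3d}  best (d=n): {base**n:.2e}   typical (d=n/2): {base**(n/2):.2e}")
```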

  8. • Thus, for $m = 1, 2$,
       $$\bar{P}_{e,m} = \sum_{x_1^n(1) \in \mathcal{X}^n} \sum_{x_1^n(2) \in \mathcal{X}^n} q_n(x_1^n(1))\, q_n(x_1^n(2))\, P_{e,m}
              \leq \sum_{y_1^n \in \mathcal{Y}^n} \left[\sum_{x_1^n(1) \in \mathcal{X}^n} q_n(x_1^n(1))\, p_n(y_1^n \mid x_1^n(1))^{1-s}\right] \times \left[\sum_{x_1^n(2) \in \mathcal{X}^n} q_n(x_1^n(2))\, p_n(y_1^n \mid x_1^n(2))^{s}\right]$$
       The minimum over $s \in (0, 1)$ is attained at $s = 1/2$ ⇒
       $$\bar{P}_{e,m} \leq \sum_{y_1^n \in \mathcal{Y}^n} \left[\sum_{x_1^n \in \mathcal{X}^n} q_n(x_1^n) \sqrt{p_n(y_1^n \mid x_1^n)}\right]^{2}$$
     • For a memoryless channel,
       $$\bar{P}_{e,m} \leq \left\{\sum_{y \in \mathcal{Y}} \left[\sum_{x \in \mathcal{X}} q_1(x) \sqrt{p_1(y \mid x)}\right]^{2}\right\}^{n}, \quad m = 1, 2$$
     • In particular, for a BSC($\epsilon$) with $q_1(x) = 1/2$,
       $$\bar{P}_{e,m} \leq \left[\tfrac{1}{2}\left(\sqrt{\epsilon} + \sqrt{1-\epsilon}\right)^{2}\right]^{n}, \quad m = 1, 2$$
     • [Figure: the three per-channel-use bound factors versus $\epsilon \in (0, 0.5)$; solid: $\tfrac{1}{2}(\sqrt{\epsilon} + \sqrt{1-\epsilon})^{2}$ (random), dashed: $\left(2\sqrt{\epsilon(1-\epsilon)}\right)^{1/2}$ (typical), dotted: $2\sqrt{\epsilon(1-\epsilon)}$ (best).]
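The figure can be reproduced numerically by evaluating the three per-channel-use factors; a smaller factor means faster exponential decay of the bound (the sample points below are illustrative):

```python
import numpy as np

# Per-channel-use bound factors from the figure:
#   random ensemble : 0.5*(sqrt(eps) + sqrt(1 - eps))**2
#   typical pair    : (2*sqrt(eps*(1 - eps)))**0.5      (d = n/2)
#   best pair       : 2*sqrt(eps*(1 - eps))             (d = n)
for eps in (0.05, 0.1, 0.2, 0.4):
    rnd = 0.5 * (np.sqrt(eps) + np.sqrt(1 - eps))**2
    typ = (2 * np.sqrt(eps * (1 - eps)))**0.5
    best = 2 * np.sqrt(eps * (1 - eps))
    print(f"eps={eps:.2f}  random={rnd:.3f}  typical={typ:.3f}  best={best:.3f}")
```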

  9. Alternative Derivation – Still Two Codewords
     • Examine the ensemble average directly:
       $$\bar{P}_{e,1} = \sum_{y_1^n \in \mathcal{Y}^n} \sum_{x_1^n(1) \in \mathcal{X}^n} q_n(x_1^n(1))\, p_n(y_1^n \mid x_1^n(1)) \Pr(y_1^n \in \mathcal{Y}^n(1)^c)$$
     • Since the codewords are randomly chosen,
       $$\Pr(y_1^n \in \mathcal{Y}^n(1)^c) = \sum_{x_1^n(2)\,:\, p_n(y_1^n \mid x_1^n(1)) \leq p_n(y_1^n \mid x_1^n(2))} q_n(x_1^n(2))
              \leq \sum_{x_1^n(2) \in \mathcal{X}^n} q_n(x_1^n(2)) \left(\frac{p_n(y_1^n \mid x_1^n(2))}{p_n(y_1^n \mid x_1^n(1))}\right)^{s}$$
       for any $s > 0$.
     • Substituting this into the first equation yields the result.
     • This method generalizes more easily!

     Bound on $\bar{P}_{e,m}$ – Many Codewords
     • As before,
       $$\bar{P}_{e,m} = \sum_{y_1^n \in \mathcal{Y}^n} \sum_{x_1^n(m) \in \mathcal{X}^n} q_n(x_1^n(m))\, p_n(y_1^n \mid x_1^n(m)) \Pr(y_1^n \in \mathcal{Y}^n(m)^c)$$
     • For $M \geq 2$ codewords, any $\rho \in [0, 1]$ and $s > 0$,
       $$\Pr(y_1^n \in \mathcal{Y}^n(m)^c) \leq \Pr\Big(\bigcup_{m' \neq m} \{y_1^n \in \mathcal{Y}^n(m')\}\Big)
              \leq \left[\sum_{m' \neq m} \Pr(y_1^n \in \mathcal{Y}^n(m'))\right]^{\rho}
              \leq \left[(M-1) \sum_{x_1^n \in \mathcal{X}^n} q_n(x_1^n) \frac{p_n(y_1^n \mid x_1^n)^{s}}{p_n(y_1^n \mid x_1^n(m))^{s}}\right]^{\rho}$$
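Closing the argument requires a specific choice of s; with Gallager's choice $s = 1/(1+\rho)$ (an assumption here, since this excerpt stops at the generic $\rho, s$ form) the memoryless case gives the bound $\bar{P}_{e,m} \leq (M-1)^{\rho} \big\{\sum_y \big[\sum_x q_1(x) p(y|x)^{1/(1+\rho)}\big]^{1+\rho}\big\}^{n}$. A Monte-Carlo sketch checking the ensemble-average error against this bound for an illustrative BSC setting:

```python
import numpy as np

# Monte-Carlo estimate of the ensemble-average error of a random code on a
# BSC(0.1) with n = 20, M = 8 (R = 0.15), uniform q_1, ML (minimum-distance)
# decoding, ties counted as errors; compared with the rho-parameterized bound.
rng = np.random.default_rng(1)
eps, n, M, rho, trials = 0.1, 20, 8, 1.0, 20000

errors = 0
for _ in range(trials):
    cb = rng.integers(0, 2, size=(M, n))              # random codebook; send codeword 1
    y = cb[0] ^ (rng.random(n) < eps)                 # BSC noise
    d = np.sum(cb != y, axis=1)                       # Hamming distances to y
    if np.argmin(d) != 0 or np.sum(d == d[0]) > 1:    # wrong codeword closer, or a tie
        errors += 1
print("Monte-Carlo estimate of bar P_e,1 ~", errors / trials)

# Bound: (M-1)^rho * ( sum_y [ sum_x q_1(x) p(y|x)^{1/(1+rho)} ]^{1+rho} )^n
P = np.array([[1 - eps, eps], [eps, 1 - eps]])
inner = (0.5 * P**(1 / (1 + rho))).sum(axis=0)
print("bound:", (M - 1)**rho * np.sum(inner**(1 + rho))**n)
```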
