  1. Information Theory Lecture 2
     • Sources and entropy rate: CT4
     • Typical sequences: CT3
     • Introduction to lossless source coding: CT5.1-5

     Information Sources
     [Block diagram: source → X_n]
     • Source data: a speech signal, an image, a fax, a computer file, ...
     • In practice source data is time-varying and unpredictable.
     • Bandlimited continuous-time signals (e.g. speech) can be sampled into discrete time and reproduced without loss.
     • A source S is defined by a discrete-time stochastic process {X_n} (see the sketch below).
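To make "a source is a discrete-time stochastic process" concrete, here is a minimal Python sketch of a discrete memoryless source; the alphabet and probabilities are illustrative choices of mine, not values from the lecture.

    import numpy as np

    # A toy discrete memoryless (iid) source; alphabet and pmf are
    # illustrative assumptions, not values from the lecture.
    rng = np.random.default_rng(0)
    alphabet = np.array([0, 1, 2, 3])          # source alphabet X
    pmf = np.array([0.5, 0.25, 0.125, 0.125])  # Pr(X_n = x)

    def emit(n):
        """Draw n iid source symbols X_1, ..., X_n."""
        return rng.choice(alphabet, size=n, p=pmf)

    print(emit(20))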

  2. • If X_n ∈ X for all n, the set X is the source alphabet.
     • The source is
       • stationary if {X_n} is stationary,
       • ergodic if {X_n} is ergodic,
       • memoryless if X_n and X_m are independent for n ≠ m,
       • iid if {X_n} is iid (independent and identically distributed),
       • (stationary and memoryless ⟹ iid)
       • continuous if X is a continuous set (e.g. the real numbers),
       • discrete if X is a discrete set (e.g. the integers {0, 1, 2, ..., 9}),
       • binary if X = {0, 1}.

     • Consider a source S, described by {X_n}. Define X_1^N ≜ (X_1, X_2, ..., X_N).
     • The entropy rate of S is defined as
         H(S) ≜ lim_{N→∞} (1/N) H(X_1^N)
       (when the limit exists).
     • H(X) is the entropy of a single random variable X, while the entropy rate defines the "entropy per unit time" of the stochastic process S = {X_n} (see the sketch below).
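To illustrate the definition of entropy rate, the sketch below computes (1/N) H(X_1^N) by brute force for an iid source and confirms that the per-symbol entropy stays at H(X_1) for every N (the iid case stated on the next slide). The pmf is again my own illustrative choice.

    import numpy as np

    # Brute-force check that (1/N) H(X_1^N) = H(X_1) for an iid source.
    # The pmf is an illustrative assumption.
    pmf = np.array([0.5, 0.25, 0.125, 0.125])

    def entropy(p):
        """Entropy in bits of a pmf p."""
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    H1 = entropy(pmf)
    for N in (1, 2, 5, 10):
        joint = pmf
        for _ in range(N - 1):                 # joint pmf of N iid symbols
            joint = np.outer(joint, pmf).ravel()
        print(N, entropy(joint) / N, H1)       # per-symbol entropy equals H(X_1)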

  3. • A stationary source S always has a well-defined entropy rate, and it furthermore holds that
         H(S) = lim_{N→∞} (1/N) H(X_1^N) = lim_{N→∞} H(X_N | X_{N-1}, X_{N-2}, ..., X_1).
       That is, H(S) is a measure of the information gained when observing a source symbol, given knowledge of the infinite past.
     • We note that for iid sources
         H(S) = lim_{N→∞} (1/N) H(X_1^N) = lim_{N→∞} (1/N) Σ_{m=1}^N H(X_m) = H(X_1).
     • Examples (from CT4): Markov chain, Markov process, random walk on a weighted graph, hidden Markov models, ...

     Typical Sequences
     • A binary iid source {b_n} with p = Pr(b_n = 1).
     • Let R be the number of 1s in a sequence b_1, ..., b_N of length N ⟹ p(b_1^N) = p^R (1 − p)^(N−R).
     • P(r) ≜ Pr(R/N ≤ r) for N = 10, 50, 100, 500, with p = 0.3 (see the sketch below):
       [Plot: P(r) versus r for N = 10, 50, 100, 500.]
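The curves P(r) = Pr(R/N ≤ r) on the slide can be approximated by simulation. A minimal Monte Carlo sketch; the sample size and the grid of r-values are my own choices.

    import numpy as np

    # Monte Carlo estimate of P(r) = Pr(R/N <= r) for a binary iid source with
    # p = Pr(b_n = 1) = 0.3 and the block lengths N used on the slide.
    rng = np.random.default_rng(0)
    p, trials = 0.3, 20000
    grid = np.array([0.2, 0.25, 0.3, 0.35, 0.4])   # a few values of r

    for N in (10, 50, 100, 500):
        R = rng.binomial(N, p, size=trials)        # number of 1s in each block
        cdf = [(R / N <= r).mean() for r in grid]
        print(N, np.round(cdf, 3))                 # the CDF sharpens around r = p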

  4. • As N grows, the probability that a sequence will satisfy R ≈ p·N is high ⟹ given a b_1^N that the source produced, it is likely that
         p(b_1^N) ≈ p^(pN) (1 − p)^((1−p)N).
       In the sense that the above holds with high probability, the "source will only produce" sequences for which
         (1/N) log p(b_1^N) ≈ p log p + (1 − p) log(1 − p) = −H.
       That is, for large N it holds with high probability that
         p(b_1^N) ≈ 2^(−N·H)
       where H is the entropy (entropy rate) of the source.

     • A general discrete source that produces iid symbols X_n, with X_n ∈ X and Pr(X_n = x) = p(x). For all x_1^N ∈ X^N we have
         log p(x_1^N) = log p(x_1, ..., x_N) = Σ_{m=1}^N log p(x_m).
       For an arbitrary random sequence X_1^N we hence get
         lim_{N→∞} (1/N) log p(X_1^N) = lim_{N→∞} (1/N) Σ_{m=1}^N log p(X_m) = E[log p(X_1)]   a.s.
       by the (strong) law of large numbers. That is, for large N
         p(X_1^N) ≈ 2^(−N·H(X_1))
       holds with high probability (see the sketch below).
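The same concentration can be seen numerically: for a single long realization of an iid source, −(1/N) log p(X_1^N) approaches H(X_1). A sketch, again with an illustrative pmf of my own.

    import numpy as np

    # AEP demo: -(1/N) log2 p(X_1^N) concentrates around H(X_1) as N grows.
    # The pmf is an illustrative assumption.
    rng = np.random.default_rng(1)
    pmf = np.array([0.5, 0.25, 0.125, 0.125])
    H1 = -np.sum(pmf * np.log2(pmf))

    for N in (10, 100, 1000, 10000):
        x = rng.choice(len(pmf), size=N, p=pmf)    # one realization X_1^N
        per_symbol = -np.log2(pmf[x]).sum() / N    # -(1/N) log p(X_1^N)
        print(N, round(per_symbol, 4), "vs H(X_1) =", round(H1, 4))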

  5. • The result (the Shannon–McMillan–Breiman theorem) can be extended to (discrete) stationary and ergodic sources (CT16.8). For a stationary and ergodic source S, it holds that
         −lim_{N→∞} (1/N) log p(X_1^N) = H(S)   a.s.
       where H(S) is the entropy rate of the source.
     • We note that p(X_1^N) is a random variable. However, the right-hand side of p(X_1^N) ≈ 2^(−N·H(S)) is a constant ⟹ a constraint on the sequences the source "typically" produces!

     The Typical Set
     • For a given stationary and ergodic source S, the typical set A_ε^(N) is the set of sequences x_1^N ∈ X^N for which
         2^(−N(H(S)+ε)) ≤ p(x_1^N) ≤ 2^(−N(H(S)−ε)).
       1. x_1^N ∈ A_ε^(N) ⟹ −(1/N) log p(x_1^N) ∈ [H(S) − ε, H(S) + ε]
       2. Pr(X_1^N ∈ A_ε^(N)) > 1 − ε, for N sufficiently large
       3. |A_ε^(N)| ≤ 2^(N(H(S)+ε))
       4. |A_ε^(N)| ≥ (1 − ε) 2^(N(H(S)−ε)), for N sufficiently large
     • That is, a large N and a small ε give (see the sketch below)
         Pr(X_1^N ∈ A_ε^(N)) ≈ 1,   |A_ε^(N)| ≈ 2^(N·H(S)),
         p(x_1^N) ≈ |A_ε^(N)|^(−1) ≈ 2^(−N·H(S))   for x_1^N ∈ A_ε^(N).
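For a small binary iid source the typical set can be enumerated exhaustively and the properties above checked directly. A sketch; N, ε and p are chosen only for illustration, and at such a small N the probability bound of property 2 is not yet reached.

    import numpy as np
    from itertools import product

    # Enumerate A_eps^(N) for a binary iid source with p = 0.3 and check its total
    # probability and size against the bounds. N and eps are illustrative choices.
    p, N, eps = 0.3, 12, 0.1
    H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))   # entropy rate of the source

    prob_A, size_A = 0.0, 0
    for seq in product((0, 1), repeat=N):
        R = sum(seq)                                   # number of 1s in x_1^N
        prob = p**R * (1 - p)**(N - R)                 # p(x_1^N)
        if 2**(-N * (H + eps)) <= prob <= 2**(-N * (H - eps)):
            prob_A += prob
            size_A += 1

    print("Pr(A) =", round(prob_A, 3))                 # tends to 1 as N grows
    print("|A| =", size_A, "<=", round(2**(N * (H + eps))))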

  6. The Typical Set and Source Coding
     1. Fix ε (small) and N (large). Partition X^N into two subsets: A = A_ε^(N) and B = X^N \ A.
     2. Observed sequences will "typically" belong to the set A. There are M = |A| ≤ 2^(N(H(S)+ε)) elements in A.
     3. Let the different i ∈ {0, ..., M − 1} enumerate the elements of A. An index i can be stored or transmitted spending no more than ⌈N·(H(S)+ε)⌉ bits.
     4. Encoding. For each observed sequence x_1^N:
        1. if x_1^N ∈ A, produce the corresponding index i;
        2. if x_1^N ∈ B, let i = 0.
     5. Decoding. Map each index i back into A ⊂ X^N.
     (A small worked version of this scheme is sketched below.)

     • An error appears with probability Pr(X_1^N ∈ B) ≤ ε for large N ⟹ the probability of error can be made to vanish as N → ∞.
     • An "almost noiseless" source code that maps x_1^N into an index i, where i can be represented using at most ⌈N·(H(S)+ε)⌉ bits. However, since also M ≥ (1 − ε) 2^(N(H(S)−ε)) for a large enough N, we need at least ⌊log(1 − ε) + N(H(S) − ε)⌋ bits.
     • Thus, for large N it is possible to design a source code with rate
         H(S) − ε + (1/N)(log(1 − ε) − 1) < R ≤ H(S) + ε + 1/N
       bits per source symbol.
       ⟹ "Operational" meaning of entropy rate: the smallest rate at which a source can be coded with arbitrarily low error probability.
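A small worked version of this almost-noiseless code for the same toy binary source; p, N and ε are again illustrative, and at such a small block length the error probability is still noticeable (it vanishes as N grows).

    import numpy as np
    from itertools import product
    from math import ceil, log2

    # Typical-set code: enumerate A = A_eps^(N), encode a typical block by its
    # index, map atypical blocks to index 0 (this is where errors occur).
    p, N, eps = 0.3, 12, 0.1
    H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    def prob(seq):
        R = sum(seq)
        return p**R * (1 - p)**(N - R)

    A = [s for s in product((0, 1), repeat=N)
         if 2**(-N * (H + eps)) <= prob(s) <= 2**(-N * (H - eps))]
    index_of = {s: i for i, s in enumerate(A)}         # enumeration of A
    bits_per_block = ceil(log2(len(A)))                # <= ceil(N * (H + eps))

    def encode(block):
        return index_of.get(block, 0)                  # atypical blocks -> index 0

    def decode(i):
        return A[i]                                    # map the index back into A

    rng = np.random.default_rng(2)
    block = tuple(int(b) for b in (rng.random(N) < p)) # one observed source block
    print("bits/block:", bits_per_block,
          "| decoded correctly:", decode(encode(block)) == block)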

  7. Data Compression
     • For large N it is possible to design a source code with rate
         H(S) − ε + (1/N)(log(1 − ε) − 1) < R ≤ H(S) + ε + 1/N
       bits per symbol, having a vanishing probability of error.
     • The above is an existence result; it doesn't tell us how to design codes.
     • For a fixed finite N, the typical-sequence codes discussed are "almost noiseless" fixed-length to fixed-length codes.
     • We will now start looking at concrete "zero-error" codes, their performance and how to design them.
     • Price to pay to get zero errors: fixed-length to variable-length.

     Various Classifications
     • Source alphabet
       • Discrete sources
       • Continuous sources
     • Recovery requirement
       • Lossless source coding
       • Lossy source coding
     • Coding method
       • Fixed-length to fixed-length
       • Fixed-length to variable-length
       • Variable-length to fixed-length
       • Variable-length to variable-length

  8. Zero-Error Source Coding
     • Source coding theorem for symbol codes (today)
       • Symbol codes, code extensions
       • Uniquely decodable and instantaneous (prefix) codes
       • Kraft(-McMillan) inequality
       • Bounds on the optimal codelength
       • Source coding theorem for zero-error prefix codes
     • Specific code constructions (next time)
       • Symbol codes: Huffman codes, Shannon-Fano codes
       • Stream codes: arithmetic codes, Lempel-Ziv codes

     What Is a Symbol Code?
     • D-ary symbol code C for a random variable X:
         C : X → {0, 1, ..., D − 1}*
     • A* = set of finite-length strings of symbols from a finite set A.
     • C(x): codeword for x ∈ X.
     • l(x): length of C(x) (i.e. number of D-ary symbols).
     • Data compression ⟹ minimize the expected length
         L(C, X) = Σ_{x∈X} p(x) l(x).
     • The extension of C is C* : X* → {0, 1, ..., D − 1}*,
         C*(x_1^n) = C(x_1) C(x_2) ··· C(x_n),   n = 1, 2, ...
       (see the sketch below).
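A minimal sketch of these definitions: a binary (D = 2) symbol code for a four-symbol alphabet, its expected length L(C, X), and its extension C*. Both the code and the pmf are my own illustrative choices, not the lecture's example.

    # A binary (D = 2) symbol code for a 4-symbol alphabet; code and pmf are
    # illustrative assumptions.
    pmf = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    C = {"a": "0", "b": "10", "c": "110", "d": "111"}

    # Expected codeword length L(C, X) = sum_x p(x) l(x), in bits per source symbol.
    L = sum(pmf[x] * len(C[x]) for x in pmf)
    print("L(C, X) =", L)                              # 1.75 bits/symbol

    def extension(xs):
        """C*(x_1^n) = C(x_1) C(x_2) ... C(x_n): concatenate the codewords."""
        return "".join(C[x] for x in xs)

    print(extension("abad"))                           # '0100111'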

  9. Example: Encoding Coin Flips
     [Table: a random variable X and candidate symbol codes C_0, C_u and C_i, with the problem exhibited by each code.]

     Uniquely Decodable Codes
     • C is uniquely decodable if, for all x, y ∈ X*,
         x ≠ y ⟹ C*(x) ≠ C*(y).
     • Any uniquely decodable code must satisfy the Kraft inequality (checked in the sketch below)
         Σ_{x∈X} D^(−l(x)) ≤ 1
       (McMillan's result, Karush's proof in C&T).
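A short check of the Kraft inequality. The two example codes are my own: the first is a prefix code and meets the inequality with equality, the second violates it and therefore cannot be uniquely decodable.

    # Kraft(-McMillan) check: any uniquely decodable D-ary code must satisfy
    # sum_x D^(-l(x)) <= 1. The example codes are illustrative.
    def kraft_sum(code, D=2):
        return sum(D ** -len(word) for word in code.values())

    prefix_code = {"a": "0", "b": "10", "c": "110", "d": "111"}
    bad_code    = {"a": "0", "b": "1", "c": "00", "d": "01"}

    print(kraft_sum(prefix_code))   # 1.0 -> consistent with unique decodability
    print(kraft_sum(bad_code))      # 1.5 -> cannot be uniquely decodable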
