Lecture 2: Lossless Source Coding


  1. Lecture 2: Lossless Source Coding. I-Hsiang Wang, Department of Electrical Engineering, National Taiwan University. ihwang@ntu.edu.tw. October 2, 2016.

  2. The engineering problem motivating the study of this lecture: for a (random) source sequence of length N, design an encoding scheme (mapping) to describe it using K bits, so that the decoder can reconstruct the source sequence at the destination from these K bits. How the encoding scheme works (the mapping) is known by the decoder a priori. Fundamental questions: What is the minimum possible ratio K/N (the compression ratio, or rate)? How can that fundamental limit be achieved? In this lecture, we will demonstrate that, for most random sources, the fundamental limit for lossless reconstruction is the entropy rate of the random process of the source.
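
To make this limit concrete, here is a minimal Python sketch, not taken from the slides, that computes the entropy of a binary memoryless source; the Bernoulli parameters are assumed purely for illustration. The entropy is the best achievable rate K/N, in bits per source symbol, for lossless compression.

    import math

    def binary_entropy(p):
        """Entropy H(p) of a Bernoulli(p) source, in bits per symbol."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    # A heavily biased bit source is very compressible: its entropy rate,
    # and hence the optimal K/N, is well below 1 bit per source bit.
    print(binary_entropy(0.1))  # ~0.469
    print(binary_entropy(0.5))  # 1.0: fair coin flips cannot be compressed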

  3. The Source Coding Problem (Shannon's Abstraction). [Block diagram: source sequence s[1:N] → source encoder → codeword b[1:K] → source decoder → reconstruction ŝ[1:N] at the destination.] Meta description: (1) Encoder: represent the source sequence s[1:N] by a binary source codeword w ≜ b[1:K] ∈ {0, 1, ..., 2^K − 1}, with K as small as possible. (2) Decoder: from the source codeword w, reconstruct the source sequence either losslessly or within a certain distortion. (3) Efficiency: determined by the code rate R ≜ K/N bits per symbol time.
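
As an illustrative sketch of this fixed-length abstraction (the alphabet and all names below are assumptions, not from the slides), the encoder maps each length-N sequence to an integer codeword w ∈ {0, ..., |S|^N − 1}, which can be written with K = ⌈N log2 |S|⌉ bits, and the decoder inverts the map.

    import math

    ALPHABET = ['a', 'b', 'c']  # assumed example alphabet S

    def encode(seq):
        """Map a source sequence s[1:N] to an integer index (the codeword w)."""
        w = 0
        for symbol in seq:
            w = w * len(ALPHABET) + ALPHABET.index(symbol)
        return w

    def decode(w, n):
        """Recover s[1:N] from the integer codeword w."""
        out = []
        for _ in range(n):
            out.append(ALPHABET[w % len(ALPHABET)])
            w //= len(ALPHABET)
        return list(reversed(out))

    s = ['b', 'a', 'c', 'c']
    K = math.ceil(len(s) * math.log2(len(ALPHABET)))  # bits for the codeword: 7
    assert decode(encode(s), len(s)) == s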

  4. Decoding Criteria. Naturally, one would think of two different decoding criteria for the source coding problem. (1) Exact: the reconstructed sequence ŝ[1:N] = s[1:N]. (2) Lossy: the reconstructed sequence ŝ[1:N] ≠ s[1:N], but it is within a prescribed distortion.

  5. Let us begin with some simple back-of-the-envelope analysis of the system under the exact recovery criterion to get some intuition. For fixed N, if the decoder would like to reconstruct s[1:N] exactly for all possible s[1:N] ∈ S^N, then it is simple to see that the smallest K must satisfy 2^(K−1) < |S|^N ≤ 2^K, which gives K = ⌈N log |S|⌉. Why? Because every possible sequence has to be uniquely represented by K bits! It seems impossible to have data compression if we require exact reconstruction. What is going wrong?
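
A quick numeric check of this counting argument; the alphabet size and block length are assumed for illustration only.

    import math

    # Assumed example: |S| = 26 (say, letters) and N = 100. Exact recovery of
    # every possible sequence needs K = ceil(N * log2 |S|) bits, regardless of
    # how skewed the letter statistics are.
    K = math.ceil(100 * math.log2(26))
    print(K)        # 471 bits
    print(K / 100)  # 4.71 bits per symbol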

  6. Random Source. Recall: data compression is possible because there is redundancy in the source sequence. One of the simplest ways to capture redundancy is to model the source as a random process. (Another reason to use a random source model is an engineering one, as mentioned in Lecture 1.) Redundancy comes from the fact that different symbols in S are drawn with different probabilities. With a random source model, there are immediately two approaches one can take to achieve data compression: (1) allow variable codeword lengths for symbols with different probabilities, rather than fixing the length to K; (2) allow (almost) lossless reconstruction rather than exact recovery.

  7. Block-to-Variable Source Coding. The key difference here is that we allow K to depend on the realization of the source, s[1:N]. Using variable codeword lengths is intuitive: for symbols with higher probability, we tend to use shorter codewords. The definition of the code rate is modified to R ≜ E[K]/N. An optimal block-to-variable source code, the Huffman code, achieves the minimum compression rate for a given distribution of the random source (see Chapter 5 of Cover & Thomas); a sketch of the construction follows below. Note: the decoding criterion here is exact reconstruction (zero error).
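
The slides defer the construction to Cover & Thomas; the following is only a compact sketch of the standard Huffman procedure (repeatedly merge the two least likely groups), using an assumed dyadic p.m.f. so that the average codeword length equals the entropy exactly.

    import heapq

    def huffman_code(pmf):
        """Build a binary prefix code for a p.m.f. given as {symbol: probability}."""
        # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
        heap = [(p, i, {sym: ''}) for i, (sym, p) in enumerate(pmf.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p0, _, code0 = heapq.heappop(heap)   # two least probable groups
            p1, _, code1 = heapq.heappop(heap)
            merged = {s: '0' + c for s, c in code0.items()}
            merged.update({s: '1' + c for s, c in code1.items()})
            heapq.heappush(heap, (p0 + p1, count, merged))
            count += 1
        return heap[0][2]

    pmf = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}   # assumed example
    code = huffman_code(pmf)
    avg_len = sum(pmf[s] * len(code[s]) for s in pmf)
    print(code)     # e.g. {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
    print(avg_len)  # 1.75 bits/symbol, which equals H(S) for this dyadic p.m.f.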

  8. (Almost) Lossless Decoding Criterion. Another way to let the randomness kick in: allow non-exact recovery. To be precise, we turn our focus to finding the smallest possible R = K/N such that the error probability P_e^(N) ≜ P{ Ŝ[1:N] ≠ S[1:N] } → 0 as N → ∞. Key features of this approach: we focus on the asymptotic regime where N → ∞, and instead of error-free reconstruction, the criterion is relaxed to vanishing error probability. Compared with the previous approach, where the analysis is mainly combinatorial, the analysis here is mainly probabilistic.

  9. Outline. In this lecture, we shall: (1) first, focusing on memoryless sources, introduce a powerful tool called typical sequences, and use typical sequences to prove a lossless source coding theorem; (2) second, extend the typical-sequence framework to sources with memory, and prove a similar lossless source coding theorem there. We will show that the minimum compression rate is equal to the entropy of the random source. Let us begin with the simplest case, where the source { S[t] | t = 1, 2, ... } consists of i.i.d. random variables S[t] ∼ P_S; this is called a discrete memoryless source (DMS). A small sampling sketch follows below.
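
A minimal sketch of simulating a DMS, with an assumed example p.m.f.: each symbol S[t] is drawn i.i.d. from P_S, independently of all the others.

    import random

    def dms(pmf, n, rng):
        """Draw n i.i.d. symbols from a p.m.f. P_S given as {symbol: probability}."""
        symbols, weights = zip(*pmf.items())
        return rng.choices(symbols, weights=weights, k=n)

    rng = random.Random(0)
    print(dms({'a': 0.7, 'b': 0.2, 'c': 0.1}, 20, rng))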

  10. Typical Sequences and a Lossless Source Coding Theorem. Outline: (1) Typicality and AEP; Lossless Source Coding Theorem. (2) Weakly Typical Sequences and Sources with Memory; Entropy Rate of Random Processes; Typicality for Sources with Memory.

  11. Typical Sequences and a Lossless Source Coding Theorem: Typicality and AEP. (Section outline as above, with the subsection "Typicality and AEP" now in focus.)

  12. Overview of Typicality Methods. Goal: understand and exploit the probabilistic asymptotic properties of an i.i.d. randomly generated sequence S[1:N] for coding. Key observation: when N → ∞, one often observes that a comparatively small set of sequences becomes "typical" and contributes almost the whole probability, while the others become "atypical" (cf. Lecture 2, "Operational Meaning of Entropy"). For lossless reconstruction with vanishing error probability, we can use shorter codewords to label the "typical" sequences and ignore the "atypical" ones. Note: there are several notions of typicality and various definitions in the literature. In this lecture, we give two definitions: (robust) typicality and weak typicality. Notation: for notational convenience, we shall use x[t] and x_t interchangeably, and likewise x[1:N] and x^N.

  13. Typical Sequence. A (robust) typical sequence is a sequence whose empirical distribution is close to the true distribution. For a sequence x^n, its empirical p.m.f. is given by the frequency of occurrence of each symbol in x^n: π(a | x^n) ≜ (1/n) ∑_{i=1}^{n} 1{x_i = a}. Due to the law of large numbers, π(a | X^n) → P_X(a) for all a ∈ X as n → ∞, if the X_i are i.i.d. ∼ P_X. That is, with high probability, the empirical p.m.f. does not deviate too much from the actual p.m.f. Definition 1 (Typical Sequence): for ε ∈ (0, 1), a sequence x^n is called ε-typical with respect to a random variable X ∼ P_X if |π(a | x^n) − P_X(a)| ≤ ε P_X(a) for all a ∈ X. The ε-typical set is T_ε^(n)(X) ≜ { x^n ∈ X^n : x^n is ε-typical with respect to X }. A small sketch of this test appears below.
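
A small sketch of Definition 1 (the function names and example sequences are assumptions, not from the slides): compute the empirical p.m.f. of a sequence and test robust ε-typicality.

    from collections import Counter

    def empirical_pmf(xs):
        """pi(a | x^n): the relative frequency of each symbol a in x^n."""
        n = len(xs)
        counts = Counter(xs)
        return {a: counts[a] / n for a in counts}

    def is_typical(xs, pmf, eps):
        """True iff |pi(a|x^n) - P_X(a)| <= eps * P_X(a) for every a in X."""
        pi = empirical_pmf(xs)
        return all(abs(pi.get(a, 0.0) - p) <= eps * p for a, p in pmf.items())

    pmf = {'0': 0.5, '1': 0.5}
    print(is_typical(list("0110100101"), pmf, 0.2))  # True: five 0s and five 1s
    print(is_typical(list("0000000011"), pmf, 0.2))  # False: eight 0s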

  14. Note: in the following, if the context is clear, we will write T_ε^(n) instead of T_ε^(n)(X). Example 1: consider a random bit sequence generated i.i.d. based on Ber(1/2). Let us set ε = 0.2 and n = 10. What is T_ε^(n)? How large is the typical set? Solution: based on the definition, an n-sequence x^n is ε-typical iff π(0 | x^n) ∈ [0.4, 0.6] and π(1 | x^n) ∈ [0.4, 0.6]. In other words, the number of "0"s in the sequence should be 4, 5, or 6. Hence, T_ε^(n) consists of all length-10 sequences with 4, 5, or 6 "0"s. The size of T_ε^(n) is (10 choose 4) + (10 choose 5) + (10 choose 6) = 672, out of 2^10 = 1024 sequences in total. (A quick check follows below.)
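
A one-line verification of the count in Example 1, using Python's math.comb:

    from math import comb

    # All length-10 binary sequences with 4, 5, or 6 zeros are eps-typical here.
    print(comb(10, 4) + comb(10, 5) + comb(10, 6))  # 210 + 252 + 210 = 672
    print(2 ** 10)                                  # 1024 sequences in total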

  15. Properties of Typical Sequences. Let P(x^n) ≜ P{X^n = x^n} = ∏_{i=1}^{n} P_X(x_i), that is, the probability that the DMS generates the sequence x^n. Similarly, P(A) ≜ P{X^n ∈ A} denotes the probability of a set A. Proposition 1 (Properties of Typical Sequences and the Typical Set): (1) For all x^n ∈ T_ε^(n)(X), 2^(−n(H(X)+δ(ε))) ≤ P(x^n) ≤ 2^(−n(H(X)−δ(ε))), where δ(ε) = εH(X) (by the definition of typical sequences and entropy). (2) lim_{n→∞} P(T_ε^(n)(X)) = 1; in particular, P(T_ε^(n)(X)) ≥ 1 − ε for n large enough (by the law of large numbers (LLN)). (3) |T_ε^(n)(X)| ≤ 2^(n(H(X)+δ(ε))) (by summing the lower bound in property 1 over the typical set). (4) |T_ε^(n)(X)| ≥ (1 − ε) 2^(n(H(X)−δ(ε))) for n large enough (by the upper bound in property 1 and property 2). A numerical illustration follows below.
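
A Monte Carlo sketch of properties 2 and 3 (the source parameters below are assumed for illustration): the probability of the typical set climbs toward 1 as n grows, even though the typical set is an exponentially small fraction of all sequences.

    import math
    import random
    from collections import Counter

    def is_typical(xs, pmf, eps):
        """Robust eps-typicality test of Definition 1."""
        n = len(xs)
        counts = Counter(xs)
        return all(abs(counts[a] / n - p) <= eps * p for a, p in pmf.items())

    # Assumed example source: Ber(0.3) bits, eps = 0.1.
    pmf = {0: 0.7, 1: 0.3}
    eps = 0.1
    rng = random.Random(0)
    for n in (10, 100, 1000, 10000):
        trials = 500
        hits = sum(
            is_typical(rng.choices((0, 1), weights=(0.7, 0.3), k=n), pmf, eps)
            for _ in range(trials)
        )
        print(n, hits / trials)  # empirical P(T_eps^(n)); increases toward 1

    # Property 3: |T_eps^(n)| <= 2^(n(1+eps)H(X)). With H(X) = H(0.3) ~ 0.881 bits
    # and n = 1000, the typical set is at most about a 2^(-30) fraction of the
    # 2^1000 binary sequences, yet by property 2 it carries almost all probability.
    H = -(0.3 * math.log2(0.3) + 0.7 * math.log2(0.7))
    print(2 ** (1000 * ((1 + eps) * H - 1)))  # roughly 6e-10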
