Formal Modeling in Cognitive Science
Lecture 30: Codes; Kraft Inequality; Source Coding Theorem

Frank Keller
School of Informatics, University of Edinburgh
keller@inf.ed.ac.uk
March 16, 2005

Overview:
1 Codes: Source Codes; Properties of Codes
2 Coding Theorems: Kraft Inequality; Shannon Information; Source Coding Theorem

Source Codes

Definition: Source Code
A source code C for a random variable X is a mapping from x ∈ X to {0, 1}*. Let C(x) denote the code word for x and l(x) denote the length of C(x). Here, {0, 1}* is the set of all finite binary strings (we will only consider binary codes).

Definition: Expected Length
The expected length L(C) of a source code C(x) for a random variable with the probability distribution f(x) is:

    L(C) = Σ_{x ∈ X} f(x) l(x)

Example
Let X be a random variable with the following distribution and code word assignment:

    x      a     b     c     d
    f(x)   1/2   1/4   1/8   1/8
    C(x)   0     10    110   111

The expected code length of X is:

    L(C) = Σ_{x ∈ X} f(x) l(x) = 1/2 · 1 + 1/4 · 2 + 1/8 · 3 + 1/8 · 3 = 1.75
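To make the expected-length computation concrete, here is a minimal Python sketch for the example above; the dictionary names f and C are illustrative, mirroring the notation on the slide.

```python
# Minimal sketch: expected code length L(C) for the example code.
f = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}       # distribution f(x)
C = {"a": "0", "b": "10", "c": "110", "d": "111"}  # code words C(x)

# L(C) = sum over x of f(x) * l(x), with l(x) = len(C(x))
L = sum(f[x] * len(C[x]) for x in f)
print(L)  # 1.75
```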
Properties of Codes

Definition: Non-singular Code
A code is called non-singular if every x ∈ X maps into a different string in {0, 1}*.

If a code is non-singular, then we can transmit a value of X unambiguously. However, what happens if we want to transmit several values of X in a row? We could use a special symbol to separate the code words, but this is not an efficient use of the special symbol; instead we use self-punctuating codes (prefix codes).

Definition: Extension
The extension C* of a code C is:

    C*(x_1 x_2 ... x_n) = C(x_1) C(x_2) ... C(x_n)

where C(x_1) C(x_2) ... C(x_n) indicates the concatenation of the corresponding code words.

Definition: Uniquely Decodable
A code is called uniquely decodable if its extension is non-singular.

If the code is uniquely decodable, then for each string there is only one source string that produced it; however, we have to look at the whole string to do the decoding.

Definition: Prefix Code
A code is called a prefix code (instantaneous code) if no code word is a prefix of another code word.

We don't have to wait for the whole string to be able to decode it; the end of a code word can be recognized instantaneously.

Example
The code in the previous example is a prefix code. Take the following sequence: 01011111010. The first symbol, 0, tells us we have an a; the next two symbols, 10, have to correspond to b; the next three symbols, 111, have to correspond to a d, etc. The decoded sequence is: abdcb.

Example
The following table illustrates the different classes of codes:

    x    Singular   Non-singular,         Uniq. decodable,   Instant.
                    not uniq. decodable   not instant.
    a    0          0                     10                 0
    b    0          010                   00                 10
    c    0          01                    11                 110
    d    0          10                    110                111
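The step-by-step decoding in the example above can be written as a short loop: because no code word is a prefix of another, the decoder can emit a symbol as soon as its buffer matches a code word. This sketch is not part of the lecture, and the function name decode is hypothetical.

```python
# Sketch of instantaneous decoding for a prefix code. This works
# because no code word is a prefix of another, so a buffer that
# matches a code word can be emitted immediately.
C = {"a": "0", "b": "10", "c": "110", "d": "111"}
table = {code: sym for sym, code in C.items()}  # invertible: code is non-singular

def decode(bits: str) -> str:
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in table:      # end of a code word recognized instantaneously
            out.append(table[buf])
            buf = ""
    assert buf == "", "input ended in the middle of a code word"
    return "".join(out)

print(decode("01011111010"))  # abdcb
```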
Kraft Inequality

Problem: construct an instantaneous code of minimum expected length for a given random variable. The following inequality holds:

Theorem: Kraft Inequality
For an instantaneous code C for a random variable X, the code word lengths l(x) must satisfy the inequality:

    Σ_{x ∈ X} 2^(−l(x)) ≤ 1

Conversely, if the code word lengths satisfy this inequality, then there exists an instantaneous code with these word lengths.

We can illustrate the Kraft inequality using a coding tree. Start with a tree that contains all three-bit codes:

[Figure: a complete binary tree of depth three, with nodes 0 and 1 at the first level, 00, 01, 10, 11 at the second level, and 000 through 111 at the leaves]

For each code word, prune all the branches below it (as they violate the prefix condition). For example, if we decide to use the code word 0, we prune all the branches below 0:

[Figure: the coding tree with the subtree below 0 pruned]

Now if we decide to use the code word 10, we also prune the branches below 10:

[Figure: the coding tree with the subtrees below 0 and 10 pruned]

The remaining leaves constitute a prefix code. Kraft inequality:

    Σ_{x ∈ X} 2^(−l(x)) = 2^(−1) + 2^(−2) + 2^(−3) + 2^(−3) = 1/2 + 1/4 + 1/8 + 1/8 = 1
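The tree-pruning argument suggests a direct way to check the inequality and, for the converse direction, to build a prefix code from a feasible set of lengths. The sketch below uses one standard greedy construction, not taken from the lecture; the function names are hypothetical.

```python
# Sketch: checking the Kraft inequality and, for the converse,
# greedily assigning code words (function names are hypothetical).
def kraft_sum(lengths):
    return sum(2 ** -l for l in lengths)

def prefix_code(lengths):
    """Assign code words for lengths satisfying the Kraft inequality,
    mirroring the tree pruning: take a leaf, discard everything below it."""
    assert kraft_sum(lengths) <= 1, "no instantaneous code with these lengths"
    words, nxt, prev = [], 0, 0
    for l in sorted(lengths):
        nxt <<= (l - prev)                  # descend to depth l in the tree
        words.append(format(nxt, f"0{l}b"))
        nxt += 1                            # move past the pruned subtree
        prev = l
    return words

print(kraft_sum([1, 2, 3, 3]))    # 1.0, so a prefix code exists
print(prefix_code([1, 2, 3, 3]))  # ['0', '10', '110', '111']
```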
Shannon Information

The Kraft inequality tells us that an instantaneous code exists. But we are interested in finding the optimal code, i.e., one that minimizes the expected code length L(C).

Theorem: Shannon Information
The expected length L(C) of a code C for the random variable X with distribution f(x) is minimal if the code word lengths l(x) are given by:

    l(x) = −log f(x)

This quantity is called the Shannon information. Shannon information is pointwise entropy, in the same way that pointwise mutual information is the pointwise counterpart of mutual information.

Example
Consider the following random variable with the optimal code lengths given by the Shannon information:

    x      a     b     c     d
    f(x)   1/2   1/4   1/8   1/8
    l(x)   1     2     3     3

The expected code length L(C) for the optimal code is:

    L(C) = Σ_{x ∈ X} f(x) l(x) = −Σ_{x ∈ X} f(x) log f(x) = 1.75

Note that this is the same as the entropy of X, H(X).

Lower Bound on Expected Length

This observation about the relation between the entropy and the expected length of the optimal code can be generalized:

Theorem: Lower Bound on Expected Length
Let C be an instantaneous code for the random variable X. Then the expected code length L(C) is bounded by:

    L(C) ≥ H(X)

Upper Bound on Expected Length

Of course we are also interested in an upper bound, i.e., an expected length that a code with optimal code lengths is guaranteed not to exceed:

Theorem: Source Coding Theorem
Let C be a code with optimal code lengths, i.e., l(x) = −log f(x), for the random variable X with distribution f(x). Then the expected length L(C) is bounded by:

    H(X) ≤ L(C) < H(X) + 1

Why is the upper bound H(X) + 1 and not H(X)? Because sometimes the Shannon information gives us fractional lengths; we have to round up.
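As a quick check of the two bounds, one can compute the entropy, round the Shannon information up to integer lengths, and compare the resulting expected length. This is a sketch for the example distribution, assuming nothing beyond the Python standard library.

```python
# Sketch: entropy, rounded-up Shannon information lengths, and the
# Source Coding Theorem bounds H(X) <= L(C) < H(X) + 1.
from math import ceil, log2

f = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}

H = -sum(p * log2(p) for p in f.values())      # entropy H(X)
l = {x: ceil(-log2(p)) for x, p in f.items()}  # round fractional lengths up
L = sum(f[x] * l[x] for x in f)                # expected code length

print(H, L)            # 1.75 1.75 (dyadic distribution: no rounding needed)
print(H <= L < H + 1)  # True
```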