Universal Lossless Coding Performance Bounds

Applications of Random Coding and Algebraic Coding Theories to Universal Lossless Source Coding Performance Bounds

Gil I. Shamir
Department of Electrical & Computer Engineering
University of Utah
Salt Lake City, UT 84112, U.S.A.

DIMACS 2003 Workshop on Algebraic Coding Theory and Information Theory
DIMACS Center, Rutgers University, Piscataway, NJ, December 15-18, 2003
Overview

Research Problem
• Average-case universal lossless compression
• Performance lower bounds on redundancy (the best possible performance of any scheme for a specific model)

Research Approach
• Use redundancy-capacity theorems to obtain bounds
• Lower bound the relevant capacity for a given source model

Models Discussed
• Parametric sources with a finite number of parameters
• I.i.d. sources with large alphabets
• Patterns induced by i.i.d. sources
• Piecewise stationary sources
• Piecewise stationary sources with slowly varying statistics
• Switching sources
Universal Coding and Redundancy

Problem Layout
• A sequence x^n of length n, governed by P_θ,
• θ unknown, in a known class Λ,
• a uniquely decipherable code L(·) may depend on Λ but must be independent of θ,
• the unknown parameters cost redundancy.

Average Redundancy of code L(·) for n-sequences drawn by source θ:

    R_n(L, θ) ≜ (1/n) [ E_θ L(X^n) − H_θ(X^n) ]

• E_θ: expectation w.r.t. θ,
• H_θ(X^n): entropy of X^n under θ (so R_n is measured per symbol).

Average Universality Measures of a Class Λ
• Maximin R_n^−(Λ) and minimax R_n^+(Λ) average redundancies: the best code for the worst average (over x^n) case [Davisson, 1973].
• Average redundancy for most sources [Rissanen, 1984] (the strongest sense).
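As a minimal, hedged illustration of the redundancy definition (my own example, not from the talk): for a Bernoulli(θ) source coded with idealized lengths −log2 Q(x^n), where Q is an i.i.d. product measure with a fixed mismatched parameter θ̂, the average redundancy per symbol is exactly the divergence D(θ‖θ̂). Function and variable names are illustrative.

```python
import math

def avg_redundancy_bernoulli(theta, theta_hat):
    """Average redundancy, in bits per symbol, of the idealized code with
    lengths -log2 Q(x^n), where Q is i.i.d. Bernoulli(theta_hat), applied
    to X^n drawn i.i.d. Bernoulli(theta).  Equals D(theta || theta_hat)."""
    # (1/n) E_theta[-log2 Q(X^n)]: per-symbol cross-entropy (independent of n here)
    cross = -(theta * math.log2(theta_hat) + (1 - theta) * math.log2(1 - theta_hat))
    # (1/n) H_theta(X^n): per-symbol entropy
    entropy = -(theta * math.log2(theta) + (1 - theta) * math.log2(1 - theta))
    return cross - entropy
```

This also shows why redundancy vanishes only when the code matches the source: the value is zero iff θ̂ = θ.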
Redundancy-Capacity Theorem

Weak Version [implied from Davisson, 1973; Gallager, 1976]
Let n → ∞. Let φ be a set of M points θ in the class Λ_k that are distinguishable by x^n. Then the minimax and maximin redundancies satisfy

    R_n^+(Λ_k) = R_n^−(Λ_k) ≥ (1 − ε) (log M)/n.

Strong Random Coding Version [Merhav & Feder, 1995, 1996]
Let n → ∞. Define a distribution over Λ_k, and partition most of the class, Λ_ε, into disjoint countable sets φ, where the marginal of each θ ∈ φ is equal, and there are M_φ ≥ M sources in φ, distinguishable by x^n. Then

    R_n(L, θ) ≥ (1 − ε) (log M)/n

for every code L(·) and almost every θ ∈ Λ_k.

Distinguishability
θ and θ′ are distinguishable if x^n generated by θ appears to be generated by θ′ with probability that goes to 0, and vice versa.
Use of the Redundancy-Capacity Theorem

Weak Version for Λ_k
1. Demonstrate how to find φ.
2. Lower bound M.
3. Prove that all θ ∈ φ are distinguishable by x^n.

Strong Version for Λ_k
1. Demonstrate how to define most of the class, Λ_ε.
2. Show that Λ_ε is most of the class.
3. Show how to partition Λ_ε such that every source in Λ_ε is in exactly one φ, and the sources in each φ are uniformly distributed under the uniform prior on Λ_k. Lower bound M.
4. Prove that for every valid φ, all θ ∈ φ are distinguishable by x^n.

Compound Classes
If Λ = ∪_k Λ_k, the redundancy for θ ∈ Λ_k consists of the intra-class redundancy within Λ_k and the inter-class redundancy of distinguishing Λ_k within Λ.
Redundancy-Capacity: Demo

[Figure: the class Λ_k, with Λ_ε covering most of it, partitioned into three sets: φ_1 with M_φ = 13 points θ ∈ φ_1, φ_2 with M_φ = 10, and φ_3 with M_φ = 12; here M = 10.]

• The volume of Λ_k outside Λ_ε is assumed negligible.
• Any θ is contained in a unique φ and has probability equal to that of every other θ′ ∈ φ.
• In every φ, all points are distinguishable by x^n.

By the theorem, for every code and almost every θ ∈ Λ_k,

    R_n(L, θ) ≥ (1 − ε) (log 10)/n.
Finite k-Dimensional Parametric Sources

• φ is determined by the initial shift u of a grid (one φ is sufficient for maximin).
• The θ ∈ φ are distinguishable if φ is a grid with spacing n^{−0.5(1−ε)}.

    R_n(L, θ) ≥ (1 − ε) (k log n)/(2n)

for every code L(·) and almost every θ ∈ Λ [Rissanen, 1984].

[Figure: three two-dimensional grids with spacing n^{−0.5(1−ε)}, containing points θ ∈ φ_1, φ_2, φ_3; the initial shifts u_1, u_2, u_3 define each set φ.]
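To see the k·(log n)/(2n) behavior numerically, here is a hedged sketch (my own illustration: binary alphabet, so k = 1 free parameter, and the Krichevsky-Trofimov mixture code, which is known to achieve this order) computing the exact average redundancy of the KT code for a Bernoulli source:

```python
import math

def kt_seq_prob(m, z):
    """KT mixture probability of any binary sequence with m ones and z zeros
    (it depends on the counts only), built sequentially via
    P(next symbol | counts) = (count + 1/2) / (total + 1)."""
    p, ones, zeros = 1.0, 0, 0
    for _ in range(m):
        p *= (ones + 0.5) / (ones + zeros + 1)
        ones += 1
    for _ in range(z):
        p *= (zeros + 0.5) / (ones + zeros + 1)
        zeros += 1
    return p

def kt_redundancy(theta, n):
    """(1/n)[E_theta L(X^n) - H_theta(X^n)] in bits per symbol for the KT code,
    with idealized lengths L(x^n) = -log2 P_KT(x^n)."""
    exp_len = 0.0
    for m in range(n + 1):  # group sequences by their count of ones
        p_seq = theta ** m * (1 - theta) ** (n - m)
        exp_len += math.comb(n, m) * p_seq * -math.log2(kt_seq_prob(m, n - m))
    entropy = -n * (theta * math.log2(theta) + (1 - theta) * math.log2(1 - theta))
    return (exp_len - entropy) / n
```

For example, kt_redundancy(0.3, 512) is on the order of (log2 n)/(2n) per symbol, matching the lower bound with k = 1.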
Distinguishability

Setting and Proof, in the Most-Sources Sense
• Choose a random grid φ (as in random coding).
• Generate x^n by a given θ ∈ φ.
• Let θ̂ be the maximum likelihood (ML) estimator of θ from x^n.
• Let θ̂_g be the grid point whose components are nearest to θ̂.
• Prove that P_e = Pr{θ̂_g ≠ θ | θ} → 0 as n → ∞. Use the union bound on the components of θ:

    P_e ≤ Σ_{i=1}^{k} Pr{θ̂_gi ≠ θ_i}
        ≤ Σ_{i=1}^{k} n · 2^{−n · min_{x^n ∈ A_i} D(P_θ̂i || P_θi)}
        ≤ 2^{(log k) + (log n) − c·n^ε} → 0.

• A_i: the event that θ̂_gi ≠ θ_i.
• D(P_θ̂i || P_θi) ≥ c/n^{1−ε} for θ̂ ∈ A_i, where c is a constant.
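A hedged numerical sanity check of this argument (a one-dimensional Bernoulli illustration of my own; the parameter choices are not from the talk): draw θ from a grid with spacing n^{−0.5(1−ε)}, estimate it by ML from x^n, snap the estimate to the nearest grid point, and observe that the error probability vanishes as n grows.

```python
import random

def snap_error_rate(n, eps=0.4, trials=2000, seed=7):
    """Monte Carlo estimate of Pr(nearest grid point to the ML estimate
    differs from the true theta), for a Bernoulli parameter drawn from a
    1-D grid with spacing n^{-0.5(1-eps)}."""
    rng = random.Random(seed)
    delta = n ** (-0.5 * (1 - eps))
    grid = [j * delta for j in range(int(1 / delta) + 1)]
    errors = 0
    for _ in range(trials):
        theta = rng.choice(grid[1:-1])          # a true parameter on the grid
        ones = sum(rng.random() < theta for _ in range(n))
        ml = ones / n                           # ML estimate for Bernoulli
        snapped = min(grid, key=lambda g: abs(g - ml))
        errors += snapped != theta
    return errors / trials
```

The point of the spacing choice is visible here: the ML deviation is O(n^{−1/2}) while the grid spacing shrinks strictly more slowly, so snapping recovers θ with probability approaching 1.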
I.I.D. Sources, Large Alphabet k: Minimax [Shamir, 2003]

Problems with Large k
• The volume of Λ_k is 1/(k − 1)! (rapidly decreasing in k), because Σ_{i=1}^{k−1} θ_i ≤ 1.
• Too large a grid spacing, n^{−0.5(1−ε)}, results in a loose bound.
• Too small a spacing, (nk)^{−0.5(1−ε)}, results in lack of distinguishability within the grids.

Solution
• Build non-uniform grids.
• Spacing near a/n proportional to √a / n^{1−ε/2}.
• Number of grid points preceding a/n proportional to √a / n^{ε/2}.

Drawback
• This structure violates the requirements of the strong version, and thus is only good for minimax/maximin redundancies.
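A hedged sketch of such a non-uniform grid (my own illustration, working in the scaled coordinate a = nθ, where the stated spacing √a/n^{1−ε/2} in θ becomes √a·n^{ε/2} in a): the number of points up to a then grows like √a/n^{ε/2}, consistent with the bullet above.

```python
import math

def nonuniform_grid(n, eps):
    """Place grid points a_0 < a_1 < ... on [1, n] (the coordinate a = n*theta)
    with local spacing sqrt(a) * n^{eps/2}, i.e. spacing sqrt(a)/n^{1-eps/2}
    in the original theta coordinate."""
    pts = [1.0]
    while pts[-1] < n:
        a = pts[-1]
        pts.append(a + math.sqrt(a) * n ** (eps / 2))
    return pts
```

The total point count, about 2(√n − 1)/n^{ε/2}, is what the minimax bound on the next slide counts per coordinate; the spacing widens as a grows, so the grid is dense near 0 and sparse near 1.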
Minimax/Maximin Redundancy: I.I.D., Large k

• φ is the grid below,
• θ ∈ φ are distinguishable by the above definition (proved as in the finite parametric case),
• bounding the number of points in the grid results in

    R_n^+(Λ_k) = R_n^−(Λ_k) ≥ (1 − ε) (k − 1) log(n/k) / (2n).

[Figure: a non-uniform one-dimensional grid, with local spacing √a · n^{−(1−ε/2)} near the point a/n.]
Most Sources: I.I.D., Large k

Key Realizations
• The non-uniform grid above is not useful here.
• All sources outside a (k − 1)-dimensional sphere of radius r = n^{−0.5(1−ε)} around θ are distinguishable from θ by x^n.

Method
• Pack as many spheres as possible of radius r and volume V_{k−1}(r) into the (k − 1)-dimensional space Λ_k of volume 1/(k − 1)!.
• Place the θ ∈ φ at the centers of the spheres (the whole grid is shifted for random selection).
• Factor in the packing density 2^{−(k−1)} to reduce the number of points:

    M ≥ 1 / [ (k − 1)! · V_{k−1}(r) · 2^{k−1} ].

Result

    R_n(L, θ) ≥ (1 − ε) (k − 1) log(n/k) / (2n)

for every code L(·) and almost every θ ∈ Λ_k [Shamir, 2003].

Note: the second-order term is lower than that of the minimax/maximin bound.
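The packing count can be evaluated directly. A hedged sketch (assuming the standard volume formula V_d(r) = π^{d/2} r^d / Γ(d/2 + 1) for a d-dimensional ball; the parameter choices below are illustrative, not from the talk):

```python
import math

def log2_sphere_packing_count(n, k, eps):
    """log2 of the packing lower bound M >= 1/((k-1)! * V_{k-1}(r) * 2^{k-1}),
    with r = n^{-0.5(1-eps)} and V_d(r) the volume of a d-ball of radius r."""
    d = k - 1
    r = n ** (-0.5 * (1 - eps))
    # log2 V_d(r) = (d/2) log2(pi) + d log2(r) - log2 Gamma(d/2 + 1)
    log2_vol = (d / 2) * math.log2(math.pi) + d * math.log2(r) \
               - math.log2(math.gamma(d / 2 + 1))
    return -(math.log2(math.factorial(d)) + log2_vol + d)
```

The dominant term is −(k − 1) log2 r = (1 − ε)(k − 1)(log2 n)/2, which after dividing by n gives the stated order of the redundancy bound; the factorial and volume constants supply the lower-order corrections.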
Patterns Induced by I.I.D. Sources

Motivation
• Classical compression considers known, small alphabets.
• Sometimes the alphabet is unknown and possibly large.
• The coding cost of an unknown alphabet is inevitable.

Approach
• Use the inevitable cost to improve compression.
• Code the sequence patterns in a second stage.

Patterns
• Indices are assigned to the original sequence letters in order of first occurrence.
• Example: the strings x^n = 'lossless', 'sellsoll', '12331433', '76887288' all have the same pattern Ψ(x^n) = '12331433'.
• Individual sequence redundancy studied in [Aberg et al., 1997; Orlitsky et al., 2002-].
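The pattern map Ψ is straightforward to implement; a minimal sketch (the function name is mine):

```python
def pattern(s):
    """Map a sequence to its pattern: each symbol is replaced by the index
    (starting at 1) of the order of its first occurrence."""
    idx, out = {}, []
    for c in s:
        if c not in idx:
            idx[c] = len(idx) + 1   # next unseen symbol gets the next index
        out.append(str(idx[c]))
    return ''.join(out)

print(pattern('lossless'))  # → 12331433
```

All four example strings from the slide map to the same pattern, which is exactly why the alphabet identity can be coded separately from the pattern.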
I.I.D. Induced Patterns: Derivation

• Any θ′ which is a permutation of θ appears to be the same source.
  Example (typical sequences, same pattern):

    θ = {0.1, 0.2, 0.7}          θ′ = {0.7, 0.2, 0.1}
    x^n = 1223333333             x^n = 3221111111
    Ψ(x^n) = 1223333333          Ψ(x^n) = 1223333333

• There are at most k! such permutations.

[Figure: for k = 2, the original space [0, 1] splits at 1/2 into a 'same types' region and the remaining space; a similar split is shown for k = 3.]

Note: for k = 3 this holds for any combination of 2 out of the 3 letters.
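The permutation-invariance claim can be verified exhaustively for a tiny case. A hedged sketch (my own check: alphabet size k = 3, n = 5, probabilities as in the example above) computing the exact pattern distribution and confirming it is unchanged under any permutation of θ:

```python
from itertools import product, permutations
from collections import Counter

def pattern(s):
    """Replace each symbol by the index of the order of its first occurrence."""
    idx, out = {}, []
    for c in s:
        idx.setdefault(c, len(idx) + 1)
        out.append(str(idx[c]))
    return ''.join(out)

def pattern_dist(theta, n):
    """Exact distribution of the pattern of X^n for an i.i.d. source with
    symbol probabilities theta over the alphabet {0, ..., k-1}."""
    dist = Counter()
    for xs in product(range(len(theta)), repeat=n):
        p = 1.0
        for x in xs:
            p *= theta[x]
        dist[pattern(xs)] += p
    return dist
```

Since relabeling symbols permutes θ without changing any pattern, every permutation of θ induces exactly the same pattern distribution; this is the source of the (at most) log k! savings.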