6.975 Week 5: Universal Compression via Grammar Based Codes
Presenter: Emin Martinian
Grammar Based Compression
• Initial data may contain complex relationships.
• Transform the data to a "basis" with independent components.
• Use simple, memoryless compression on these components.
Example: Suppose we want to compress x = c c c c a b a b c c c a b a b c c c a b.
• Let A_1 → a b:  x = c c c c A_1 A_1 c c c A_1 A_1 c c c A_1
• Let A_2 → c c c:  x = A_2 c A_1 A_1 A_2 A_1 A_1 A_2 A_1
• Let A_3 → A_1 A_1 A_2:  x = A_2 c A_3 A_3 A_1
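The substitution steps above can be reproduced mechanically. Below is a minimal Python sketch; `replace_all` is a hypothetical helper for illustration, not part of Kieffer and Yang's construction.

```python
def replace_all(seq, pattern, symbol):
    """Replace every non-overlapping occurrence of `pattern` in `seq` with `symbol`."""
    out, i = [], 0
    while i < len(seq):
        if seq[i:i + len(pattern)] == pattern:
            out.append(symbol)
            i += len(pattern)
        else:
            out.append(seq[i])
            i += 1
    return out

x = list("ccccababcccababcccab")
rules = {}

rules["A1"] = list("ab")
x = replace_all(x, rules["A1"], "A1")   # c c c c A1 A1 c c c A1 A1 c c c A1
rules["A2"] = list("ccc")
x = replace_all(x, rules["A2"], "A2")   # A2 c A1 A1 A2 A1 A1 A2 A1
rules["A3"] = ["A1", "A1", "A2"]
x = replace_all(x, rules["A3"], "A3")   # A2 c A3 A3 A1
rules["A0"] = x                          # start rule: A0 -> A2 c A3 A3 A1

print(rules)
```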
G_x vs. Lempel-Ziv
Grammar-transform parsing: x = ccc, c, ababccc, ababccc, ab
LZ78 parsing: x = c, cc, ca, b, a, bc, cca, ba, bcc, cab
Grammar transform G_x: A_0 → A_2 c A_3 A_3 A_1, A_1 → a b, A_2 → c c c, A_3 → A_1 A_1 A_2
LZ78-induced grammar: A_1 → c, A_2 → A_1 c, A_3 → A_1 a, A_4 → b, A_5 → a, A_6 → A_4 c, A_7 → A_2 a, A_8 → A_4 a, A_9 → A_6 c, A_10 → A_3 b
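For comparison, the LZ78 parsing shown above can be generated with a short greedy routine. This is a standard LZ78 phrase parser sketched here for illustration, not code from the lecture.

```python
def lz78_parse(x):
    """Greedy LZ78 parsing: each new phrase extends a previously seen phrase by one symbol."""
    phrases, seen, current = [], set(), ""
    for symbol in x:
        current += symbol
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:          # a final phrase that exactly matches an earlier one
        phrases.append(current)
    return phrases

print(lz78_parse("ccccababcccababcccab"))
# ['c', 'cc', 'ca', 'b', 'a', 'bc', 'cca', 'ba', 'bcc', 'cab']
```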
Context-Free Grammar: G = (V, T, P, S)
V = {A_0, A_1, A_2, A_3}
T = {a, b, c}
P = {A_0 → A_2 c A_3 A_3 A_1, A_1 → ab, A_2 → ccc, A_3 → A_1 A_1 A_2}
S = A_0
L(G) = all strings derivable from G.
Grammar transform: x → G_x where L(G_x) = {x}.
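A quick way to check that L(G_x) = {x} for the example grammar is to expand the start symbol. The sketch below assumes the grammar is stored as a dict from variables to right members; the hypothetical `expand` helper plays the role of f_G^∞.

```python
def expand(symbol, rules):
    """Fully expand a symbol using the production rules (the map f_G^inf)."""
    if symbol not in rules:                      # terminal symbol
        return symbol
    return "".join(expand(s, rules) for s in rules[symbol])

rules = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}
assert expand("A0", rules) == "ccccababcccababcccab"
```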
Advantages of Grammar Based Codes
• Better matching of source correlations
• Optimization for complexity, causality, side information, error resilience, etc.
• Universal lossless compression
Asymptotically Compact Grammars
A grammar transform is asymptotically compact if it satisfies
• ∀ x, G_x ∈ G*(A)
• lim_{n→∞} max_{x ∈ A^n} |G_x| / |x| = 0
Theorem 7: Asymptotically compact grammar transforms yield universal compression.
Requirements for the set G*(A)
1. ∀ A ∈ V(G), one rule in P(G) has left member A.
2. The empty string is not the right member of any rule.
3. L(G) is non-empty.
4. G has no useless symbols.
5. Canonical variable naming.
6. f_G^∞(A) ≠ f_G^∞(B) for A ≠ B.
Irreducible Grammar Transforms
A grammar G is called irreducible if
1. G ∈ G*(A),
2. ∀ A ∈ V(G) \ {A_0}, A appears at least twice in the right members of P(G), and
3. no pair of symbols Y_1, Y_2 ∈ V(G) ∪ T(G) exists where Y_1 Y_2 appears more than once as a substring of the right members of P(G).
Kieffer and Yang present reduction rules that turn any grammar into one satisfying these conditions.
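The conditions above can be checked mechanically. The sketch below tests conditions 2 and 3 for a grammar stored as a dict (condition 1, membership in G*(A), is assumed). Kieffer and Yang's formal statement is more careful, e.g. about overlapping occurrences, so treat this as a rough illustration.

```python
from collections import Counter

def count_nonoverlapping(rhs, pair):
    """Count non-overlapping occurrences of the 2-symbol pattern `pair` in `rhs`."""
    i = count = 0
    while i + 1 < len(rhs):
        if (rhs[i], rhs[i + 1]) == pair:
            count += 1
            i += 2
        else:
            i += 1
    return count

def looks_irreducible(rules, start="A0"):
    right = [sym for rhs in rules.values() for sym in rhs]
    # Condition 2: every variable except the start symbol appears >= 2 times on the right.
    usage = Counter(s for s in right if s in rules)
    if any(usage[v] < 2 for v in rules if v != start):
        return False
    # Condition 3: no pair Y1 Y2 appears more than once among the right members.
    pairs = {(y1, y2) for rhs in rules.values() for y1, y2 in zip(rhs, rhs[1:])}
    for pair in pairs:
        if sum(count_nonoverlapping(rhs, pair) for rhs in rules.values()) > 1:
            return False
    return True

rules = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}
assert looks_irreducible(rules)
```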
Encoding G = (V, T, P, S)
• The canonical V(G) is described by |V(G)|, requiring |V(G)| bits in unary encoding.
• T ⊆ A is described by |A| bits (one indicator bit per symbol of A).
• S = A_0 in the canonical encoding and requires 0 bits.
Total = |V(G)| + |A| + 0 bits
Encoding G = (V, T, P, S)
To encode P we must describe f_G(A_0), f_G(A_1), ..., f_G(A_{|V(G)|-1}); equivalently, we must describe
• |A_0|, |A_1|, ..., |A_{|V(G)|-1}|, using |G| bits in unary encoding, and
• ρ_G ≜ f_G(A_0) f_G(A_1) ... f_G(A_{|V(G)|-1}).
Encoding G = (V, T, P, S)
Instead of encoding ρ_G directly, define ω_G ≜ ρ_G with the first occurrence of each variable removed. Encode ρ_G by
• indicating the removed entries (|G| bits),
• sending the frequencies of the symbols of V(G) ∪ T(G) occurring in ω_G using unary encoding (|G| bits),
• using these frequencies to entropy code ω_G (⌈H*(ω_G)⌉ bits).
Total ≤ |A| + 4|G| + ⌈H*(ω_G)⌉ bits
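Putting the last three slides together, the sketch below tallies the bit counts for the example grammar. It assumes H*(·) denotes the unnormalized empirical entropy (Σ_i −log2 of the empirical frequency of the i-th symbol); the exact bookkeeping in the lecture may differ slightly.

```python
import math
from collections import Counter

rules = {                     # the example grammar G_x
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}
alphabet = {"a", "b", "c"}

# rho_G: concatenation of the right members; |G| is its length.
rho = [s for v in sorted(rules) for s in rules[v]]
G_size = len(rho)

# omega_G: rho_G with the first occurrence of each variable removed.
seen, omega = set(), []
for s in rho:
    if s in rules and s not in seen:
        seen.add(s)
    else:
        omega.append(s)

def unnormalized_entropy(seq):
    """H*(seq) = sum_i -log2( n_{seq_i} / len(seq) )."""
    counts, n = Counter(seq), len(seq)
    return sum(-math.log2(counts[s] / n) for s in seq)

header_bits = len(rules) + len(alphabet)   # |V(G)| + |A|
rule_length_bits = G_size                  # unary lengths |A_0|, ..., |A_{|V(G)|-1}|
removed_entry_bits = G_size                # which entries of rho_G were removed
frequency_bits = G_size                    # unary frequencies for omega_G
entropy_bits = math.ceil(unnormalized_entropy(omega))

total = header_bits + rule_length_bits + removed_entry_bits + frequency_bits + entropy_bits
print(total, "<=", len(alphabet) + 4 * G_size + entropy_bits)   # matches Total <= |A| + 4|G| + ceil(H*)
```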
Bounding ⌈H*(ω_G)⌉ for G ∈ G*(A)
• There exists a σ = σ_1 σ_2 ... σ_t ~ ω_G (a rearrangement of ω_G) with f_G^∞(σ) = x.
• Let π be the resulting parsing of x: π = (f_G^∞(σ_1), f_G^∞(σ_2), ..., f_G^∞(σ_t)).
• If ∀ (A → α) ∈ P, |α| > 1, then f_G^∞(·) is a one-to-one map between the σ_i and the π_i, so H*(ω_G) = H*(σ) = H*(π). In any case, H*(ω_G) ≤ H*(π) + |G|.
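For the example grammar, σ and π can be exhibited concretely. The sketch below assumes σ ~ ω_G means a rearrangement of ω_G (as in the bullet above); `expand` is a small hypothetical stand-in for f_G^∞.

```python
def expand(symbol, rules):
    """f_G^inf: expand a symbol down to terminals."""
    return symbol if symbol not in rules else "".join(expand(s, rules) for s in rules[symbol])

rules = {
    "A0": ["A2", "c", "A3", "A3", "A1"],
    "A1": ["a", "b"],
    "A2": ["c", "c", "c"],
    "A3": ["A1", "A1", "A2"],
}
x = "ccccababcccababcccab"

# omega_G for this grammar (rho_G with first variable occurrences removed):
omega = ["c", "A3", "a", "b", "c", "c", "c", "A1", "A1", "A2"]

# One sigma ~ omega_G (same symbols, rearranged) whose expansion is x:
sigma = ["c", "c", "c", "c", "a", "b", "A1", "A2", "A3", "A1"]
assert sorted(sigma) == sorted(omega)

# The induced parsing pi = (f_G^inf(sigma_1), ..., f_G^inf(sigma_t)):
pi = [expand(s, rules) for s in sigma]
assert "".join(pi) == x
print(pi)   # ['c', 'c', 'c', 'c', 'a', 'b', 'ab', 'ccc', 'ababccc', 'ab']
```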
Bounding ⌈H*(π)⌉
Consider a kth-order finite-state source, µ, and define
  τ(y) ≜ max_{s_0, s_1, ..., s_m} ∏_{i=1}^{m} p(s_i, y_i | s_{i-1}),  m = |y|.
We design τ(y) to overestimate the probability of y. To get a valid distribution, we normalize by Q_k k^{-1} |y|^{-2} to obtain
  p*(y) = Q_k k^{-1} |y|^{-2} τ(y),  Q_k ≥ 1/2.
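As a sanity check on the definition, τ(y) can be computed by brute force for a toy finite-state source. The representation below (a dict mapping (s_prev, s, symbol) to p(s, symbol | s_prev)) is a hypothetical illustration, not notation from the lecture.

```python
import itertools

def tau(y, p, states):
    """tau(y) = max over state sequences s_0,...,s_m of prod_{i=1}^m p(s_i, y_i | s_{i-1})."""
    m, best = len(y), 0.0
    for s in itertools.product(states, repeat=m + 1):
        prob = 1.0
        for i in range(1, m + 1):
            prob *= p.get((s[i - 1], s[i], y[i - 1]), 0.0)
        best = max(best, prob)
    return best

# Toy 2-state source: in state 0 it tends to emit 'a', in state 1 it tends to emit 'b'.
p = {
    (0, 0, "a"): 0.6, (0, 1, "b"): 0.4,
    (1, 1, "b"): 0.7, (1, 0, "a"): 0.3,
}
print(tau("aab", p, states=(0, 1)))   # = 0.6 * 0.6 * 0.4 ≈ 0.144
```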
Bounding ⌈H*(π)⌉ (continued)
Combining
  H*(π) = min_q Σ_{i=1}^{t} -log q(π_i) ≤ Σ_{i=1}^{t} -log p*(π_i)
with
  µ(x) ≤ τ(x) ≤ ∏_{i=1}^{t} τ(π_i) ≤ ∏_{i=1}^{t} p*(π_i) · 2k|π_i|²
(taking -log of the second chain and substituting it into the first) yields
  H*(π) ≤ -log µ(x) + t(1 + log k) + 2 Σ_{i=1}^{t} log |π_i|.
Summary for Encoding G_x
• Code length ≤ -log µ(x) + |A| + 5|G_x| + O(|G_x| log(|x|/|G_x|)).
• Many parsings have H*(π) near -log µ(x) + O(ν(|G_x|/|x|)).
• Obtaining universal codes requires choosing a parsing/grammar that makes |G_x|/|x| small.
Bounding |G_x|/|x| for G_x ∈ G*(A)
• Consider the "worst case" G_x ∈ G*(A) which maximizes |G_x|.
• But for G_x ∈ G*(A), rule expansions must be unique.
• So there are at most |A|^l rules expanding to a string of length l.
• Worst case: create all rules of length l before any of length l + 1.
Bounding |G_x|/|x| for G_x ∈ G*(A)
Exhausting all rules of expansion length ≤ l requires
  |x| ≥ Σ_{j=1}^{l} j|A|^j ≥ l|A|^{l+1} / (|A|-1)².
For G_x ∈ G*(A), rules are like A_i → A_{i'} α (i.e., |A_i| = 2), so
  |G_x| ≤ Σ_{j=1}^{l} 2|A|^j ≤ 2(|A|^{l+1} - 1) / (|A|-1).
Therefore
  |G_x|/|x| ≤ [2(|A|^{l+1} - 1)/(|A|-1)] · [(|A|-1)²/(l|A|^{l+1})] ≤ 2(|A|-1)/l → 0.
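The final ratio can be evaluated numerically. This is a minimal sketch that plugs the two estimates above into the quotient and compares against 2(|A|-1)/l, assuming the bounds are used exactly as reconstructed on this slide.

```python
def worst_case_ratio_bound(a, l):
    """Upper bound on |G_x|/|x| from the slide's estimates (a = |A|, l = max expansion length)."""
    g_upper = 2 * (a ** (l + 1) - 1) / (a - 1)   # bound on |G_x|
    x_lower = l * a ** (l + 1) / (a - 1) ** 2    # bound on |x|
    return g_upper / x_lower

for l in (2, 10, 100):
    print(l, round(worst_case_ratio_bound(3, l), 6), 2 * (3 - 1) / l)
# The ratio stays below 2(|A|-1)/l and tends to 0 as l (and hence |x|) grows.
```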
Encoding Summary
• Grammar encoding takes ≤ |A| + 4|G| + ⌈H*(ω_G)⌉ bits.
• H*(ω_G) ≈ H*(π) ≈ -log µ(x) + ν(|G_x|/|x|).
• For G ∈ G*(A), |G_x|/|x| → 0.
Conclusions
• Grammar based codes provide a framework to build universal codes.
• Many different parsings, π = (π_1, π_2, ..., π_t), yield H*(π) = O(-log µ(x) + t).
• Irreducible grammars yield π with t ≤ |G_x| and |G_x|/|x| → 0, and also allow efficient encoding of π.
Further Thoughts...
Can grammar ideas be used in
• universal lossy compression?
• universal prediction/estimation?
• error correction coding?