
How to Compress Hidden Markov Sources - PowerPoint PPT Presentation



  1. How to Compress Hidden Markov Sources Preetum Nakkiran Harvard University Joint works with: Venkatesan Guruswami, Madhu Sudan + Jarosław Błasiok, Atri Rudra

  2. Compression • Problem: Given n symbols from a probabilistic source, compress down to < n symbols (ideally to the “entropy” of the source), s.t. decompression succeeds with high probability • Sources: Usually iid. This talk: Hidden-Markov Model [Figure: an iid source B(p) B(p) … B(p) and a two-state hidden-Markov source with emission distributions B(0.1) and B(0.5) and transition probabilities 0.9 / 0.1, each emitting a bit string that is compressed to ≈ entropy-many symbols] (Symbol alphabet can be arbitrary)

  3. Organization 1. Goal: Compressing Symbols • What/why 2. Polarization & Polar Codes (for iid sources) 3. Polar codes for Markov Sources

  4. Compression: Main Questions For a source distribution on (X_1, X_2, …, X_n): 1. How much can we compress? • [Shannon ‘48]: Down to the entropy H(X_1, X_2, …, X_n) [non-explicit]. E.g. for iid Bernoulli(p): entropy = nH(p). 2. Efficiency? • Efficiency of algorithms: compression/decompression run in poly(n) time • Efficiency of code: Quickly approach the entropy rate: n symbols ↦ nH(p) + n^0.99 symbols vs. n symbols ↦ nH(p) + o(n). Achieves within ε of the entropy rate (n symbols ↦ n[H(p) + ε]) at blocklength n ≥ poly(1/ε) 3. Linearity? • Useful for channel coding (as we will see)
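As a concrete reference for these quantities, here is a small illustrative Python sketch (not part of the slides) computing the binary entropy H(p) and the resulting Shannon compression target for n iid Bernoulli(p) bits; the values of n and p are arbitrary examples.

```python
import math

def binary_entropy(p: float) -> float:
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), the entropy of a Bernoulli(p) bit."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, p = 1_000_000, 0.1
print(f"entropy rate H({p}) = {binary_entropy(p):.4f} bits/symbol")
print(f"Shannon limit for n = {n}: about {n * binary_entropy(p):,.0f} bits")
```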

  5. Our Scheme: Compressing HMM sources Compression/decompression algorithms which, given the HMM source, achieve: 1. Poly-time compression/decompression 2. Linear 3. Rapidly approach the entropy rate: For X^n ≔ X_1, X_2, … X_n from the source: n symbols ↦ H(X^n) + τ^{O(1)} · n^{1−δ} symbols (for an HMM with mixing time τ) • Previously unknown how to achieve all 3 above. Non-explicit: n ↦ H(X^n) + o(n). [Lempel-Ziv]: n ↦ H(X^n) + o(n). Nonlinear. But works for an unknown HMM. • Our result: Enabled by Polar Codes

  6. Detour: Compression ⇒ Error-Correction • Given a source Z, the corresponding Additive Channel: Alice sends x ∈ F_2^n, Bob receives y = x + e for e = e_1, e_2, … e_n ∼ Z • Linear compression scheme for e ∼ Z ⇒ Linear error-correcting code for the Z-channel: • Let P : F_2^n → F_2^{(1−δ)n} be the compression matrix, i.e. Pe can be decoded to e whp when e ∼ Z • Alice encodes into nullspace(P): x ∈ null(P) • Bob receives y = x + e • Bob computes Py = Px + Pe = Pe, and recovers the error e Efficiency: compression which rapidly approaches the entropy rate ⇒ code which rapidly approaches capacity
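A minimal sketch of this reduction, with a toy random matrix standing in for the compression matrix P and made-up error noise; it only checks the algebraic identity Py = Pe that the reduction rests on, and leaves the actual syndrome decoder (the compression scheme's decompressor) abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3                                  # toy sizes; real codes use large n
P = rng.integers(0, 2, size=(m, n))          # stand-in for a compression matrix over F_2

# Alice: pick a codeword x in nullspace(P); rejection sampling is fine at this toy size.
while True:
    x = rng.integers(0, 2, size=n)
    if not ((P @ x) % 2).any():
        break

e = rng.integers(0, 2, size=n)               # additive error e ~ Z (toy: uniform noise)
y = (x + e) % 2                              # Bob receives y = x + e

# Bob computes P y = P x + P e = P e: the compressed error.  A decompressor that
# recovers e from P e (whp over e ~ Z) therefore lets Bob recover e, and then x = y - e.
assert np.array_equal((P @ y) % 2, (P @ e) % 2)
print("syndrome P*y == P*e:", (P @ y) % 2)
```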

  7. Application: Correcting Markovian Errors • Our result yields efficient error-correcting codes for Markovian errors. • E.g.: Channel has two states, “noisy” and “nice”, and transitions between them. [Figure: “noisy” channel BSC(0.5) and “nice” channel BSC(0.1), staying in each state with probability 0.9 and switching with probability 0.1]

  8. Remainder of this talk • Focus on compressing Hidden-Markov Sources • For simplicity, alphabet = {0, 1} The plan: 1. Polar codes for compressing iid Bernoulli(p) bits. 2. Reduce HMM to the iid case

  9. Polar Codes • Linear compression / error-correcting codes • Introduced by [Arikan ‘08], efficiency first analyzed in [Guruswami-Xia ’13], extended in [BGNRS ’18] • Efficiency: First error-correcting codes to ``achieve capacity at polynomial blocklengths’’: within ε of capacity at blocklengths n ≥ poly(1/ε) • Simple, elegant, purely information-theoretic construction

  10. Compression via Polarization • Goal: Compress n iid Bernoulli(p) bits • Polarization ⇒ Compression: Suppose we have an invertible transform P such that, on input X_1 … X_n, the first block of outputs (set S) has ≈ full entropy, H(U_S) ≈ |S|, while the remaining block (set T) has H(U_T | U_S) ≈ 0 • Compression: Output U_S. • Decompression: Since H(U_T | U_S) ≈ 0, can guess U_T whp, then invert P to decompress.

  11. Polar Transform • The following 2x2 transform over F_2 “polarizes” entropies: (X, Y) ↦ (X + Y, Y) • Consider X, Y iid B(p), for p ∈ (0, 1) • The transform is invertible ⟹ H(X, Y) = H(X + Y, Y) = H(X + Y) + H(Y | X + Y) • H(X + Y) > H(X) • Thus, H(Y | X + Y) < H(Y) • Now recurse! [Figure: entropy levels from t = 0 to t = 1; the common value H(X) = H(Y) splits into H(X + Y) above it and H(Y | X + Y) below it]
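A quick numerical check of the two inequalities on this slide (illustrative, not from the talk): for X, Y iid Bernoulli(p), X + Y is Bernoulli(2p(1−p)), and the chain rule gives H(Y | X + Y) = 2H(p) − H(X + Y).

```python
import math

def h(p: float) -> float:
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.1
H_X = h(p)                       # H(X) = H(Y) = h(p)
H_sum = h(2 * p * (1 - p))       # X + Y (mod 2) is Bernoulli(2p(1-p))
H_cond = 2 * h(p) - H_sum        # chain rule: H(X, Y) = H(X+Y) + H(Y | X+Y)

print(f"H(X)       = {H_X:.4f}")
print(f"H(X+Y)     = {H_sum:.4f}   (pushed up,   > H(X))")
print(f"H(Y | X+Y) = {H_cond:.4f}   (pushed down, < H(Y))")
```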

  12. Polar Transform Consider X_1, …, X_n iid B(p), for p ∈ (0, 1). [Figure: the 2x2 transform applied to pairs of inputs X_1, …, X_4, producing wires W_1, …, W_4 at t = 1; the plot tracks H(W_1), H(W_2 | W_1), … against H(X_i)]

  13. Polar Transform Consider X_1, …, X_n iid B(p), for p ∈ (0, 1). [Figure: the recursion continued to t = 2, producing wires Z_1, Z_2]

  14. Polar Transform Consider X_1, …, X_n iid B(p), for p ∈ (0, 1). [Figure: the recursion applied at t = 2 on all wires, producing Z_1, …, Z_4]

  15. Polar Transform Consider X_1, …, X_n iid B(p), for p ∈ (0, 1). Consider the conditional entropies H(Z_i | Z_{<i}): [Figure: the entropies of the wires Z_1, …, Z_4 at t = 2] Hope: most of these entropies are eventually close to 0 or 1

  16. Polar Transform • In general, the recursion is equivalent to taking tensor powers of the 2x2 kernel: P_{2^t} ≝ P_2^{⊗ t} (blocklength n = 2^t)
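A minimal sketch of this recursion as an explicit matrix, built as a Kronecker power of the 2x2 kernel; Arikan's transform also includes a bit-reversal reordering of the wires, omitted here because it does not change which entropies polarize.

```python
import numpy as np

P2 = np.array([[1, 1],
               [0, 1]], dtype=np.uint8)       # 2x2 kernel: (X, Y) -> (X + Y, Y)

def polar_transform(t: int) -> np.ndarray:
    """t-fold Kronecker power of the kernel over F_2; blocklength n = 2^t."""
    P = np.array([[1]], dtype=np.uint8)
    for _ in range(t):
        P = np.kron(P, P2) % 2
    return P

t = 3
P = polar_transform(t)
x = np.random.default_rng(1).integers(0, 2, size=2 ** t)
u = (P @ x) % 2        # transformed bits; a subset of them carries almost all the entropy
print(P)
print(x, "->", u)
```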

  17. Analysis: Arikan Martingale • Let Y_t be the entropy of a uniformly random wire at level t, conditioned on the wires above it: Y_t = H(W^t_i | W^t_{<i}) for a random index i • (Y_t) forms a martingale: E[Y_{t+1} | Y_t] = Y_t, because entropy is conserved by the invertible transform [Figure: the wire diagram at t = 0, 1, 2 with the entropy of each wire tracked across levels]
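An illustrative simulation of the martingale, assuming the erasure-style update where the one-step evolution has a closed form (z ↦ z² or 2z − z², each taken by half the wires); for Bernoulli(p) sources the exact update is different, but the martingale property and the clustering of entropies at 0 and 1 look the same.

```python
import random

def step(z: float) -> float:
    """One polarization step in the erasure analogue: the conditional entropy z of a
    wire splits into z*z on one branch and 2*z - z*z on the other, each taken by half
    the wires.  E[next | z] = (z*z + 2*z - z*z) / 2 = z, so the process is a martingale."""
    return z * z if random.random() < 0.5 else 2 * z - z * z

random.seed(0)
t_max, trials, eps = 20, 10_000, 1e-3
polarized = 0
for _ in range(trials):
    z = 0.5                       # start at the per-symbol entropy
    for _ in range(t_max):
        z = step(z)
    if z < eps or z > 1 - eps:
        polarized += 1
print(f"fraction of wires with entropy within {eps} of 0 or 1 "
      f"after t = {t_max} steps: {polarized / trials:.3f}")
```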

  18. Analysis: Arikan Martingale We want fast convergence: To achieve ε-close to the entropy rate efficiently, i.e. with blocklength n = 2^t = poly(1/ε), we need all but an ε-fraction of the conditional entropies to be within 1/n^{Ω(1)} of 0 or 1 after t = O(log 1/ε) steps [Figure: the wire diagram at t = 0, 1, 2]

  19. Martingale Convergence • NOT every [0, 1] martingale converges to 0 or 1: • Y_{t+1} = Y_t ± 2^{−t} • lim_{t→∞} Y_t converges to Uniform[0, 1] • Will introduce sufficient local conditions for fast convergence: ``Local Polarization”
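The counterexample is easy to simulate (illustrative sketch; the step size is shifted slightly so the walk stays inside [0, 1]): the ± increments form a random binary expansion, so the limit is uniform on [0, 1] rather than concentrated at the endpoints.

```python
import random

random.seed(1)

def lazy_martingale(t_max: int) -> float:
    """Y_0 = 1/2 and Y_{t+1} = Y_t +/- 2^-(t+2): a perfectly valid [0,1]-bounded
    martingale whose limit is Uniform[0,1] (a random dyadic expansion), so it never
    polarizes to {0, 1}."""
    y = 0.5
    for t in range(t_max):
        y += random.choice([-1.0, 1.0]) * 2.0 ** -(t + 2)
    return y

print([round(lazy_martingale(40), 4) for _ in range(8)])
# samples spread over (0, 1); compare with the polar martingale, which clusters at 0 and 1
```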

  20. Local Polarization Properties of the Martingale: 1. Variance in the Middle: whenever Y_t is bounded away from 0 and 1, the next step Y_{t+1} has variance bounded away from 0. 2. Suction at the Ends: whenever Y_t is close to 0, it drops by a large multiplicative factor with constant probability; and symmetrically for the upper end. Recall, we want to show fast convergence of Y_t to {0, 1}. (It is easy to show these properties for the polar martingale.)
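Both properties can be checked numerically for the same erasure-style update used in the martingale sketch above; this is only an illustration, the paper's definitions are quantitative versions of these statements.

```python
def branches(z: float):
    """The two equally likely next values of the erasure-analogue martingale."""
    return z * z, 2 * z - z * z

# Variance in the middle: for z away from 0 and 1, the conditional variance is bounded below.
for z in (0.2, 0.5, 0.8):
    lo, hi = branches(z)
    var = ((lo - z) ** 2 + (hi - z) ** 2) / 2      # both deviations have magnitude z*(1-z)
    print(f"z = {z}: Var[next | z] = {var:.4f}")

# Suction at the ends: when z is already small, one branch (prob 1/2) shrinks it by a
# large multiplicative factor (z -> z^2); symmetrically near 1 for the other branch.
for z in (0.1, 0.01, 0.001):
    lo, _ = branches(z)
    print(f"z = {z}: low branch = {lo}   (shrinks by factor {z})")
```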

  21. Results of Polarization • So far: After t = O(log 1/ε) steps of polarization, the resulting polar code of blocklength n = 2^t = poly(1/ε) has a set T of indices s.t.: • ∀i ∈ T: H(U_i | U_{<i}) ≈ 0 • |T|/n ≥ 1 − H(p) − ε, so the complementary high-entropy set S has |S|/n ≤ H(p) + ε • Set S: H(U_S) ≈ full entropy; Set T: H(U_T | U_S) ≈ 0 • Compression: Output U_S • Decompression: Guess U_T given U_S (ML decoding)
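For tiny blocklengths the polarized index sets can be computed exactly by brute force; an illustrative sketch (parameters chosen arbitrarily) that enumerates the joint distribution of U = P_n(X) for n = 8 iid Bernoulli(0.1) bits and prints the conditional entropies H(U_i | U_{<i}).

```python
import itertools
import math
import numpy as np

def entropy(dist: dict) -> float:
    """Shannon entropy (bits) of a dict {outcome: probability}."""
    return -sum(q * math.log2(q) for q in dist.values() if q > 0)

def polar_matrix(t: int) -> np.ndarray:
    P2 = np.array([[1, 1], [0, 1]], dtype=np.uint8)
    P = np.array([[1]], dtype=np.uint8)
    for _ in range(t):
        P = np.kron(P, P2) % 2
    return P

t, p = 3, 0.1
n = 2 ** t
P = polar_matrix(t)

# Exact joint distribution of U = P x for x ~ Bernoulli(p)^n (2^8 terms, so feasible).
joint = {}
for x in itertools.product([0, 1], repeat=n):
    u = tuple((P @ np.array(x, dtype=np.uint8)) % 2)
    prob = math.prod(p if b else 1 - p for b in x)
    joint[u] = joint.get(u, 0.0) + prob

def prefix_entropy(k: int) -> float:
    """H(U_1, ..., U_k)."""
    d = {}
    for u, prob in joint.items():
        d[u[:k]] = d.get(u[:k], 0.0) + prob
    return entropy(d)

cond = [prefix_entropy(i + 1) - prefix_entropy(i) for i in range(n)]   # H(U_i | U_<i)
print([round(c, 3) for c in cond])
# Even at n = 8 the values drift toward 0 and 1; the low ones form the set T
# (predictable given the prefix), the high ones form the set S that gets output.
```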

  22. Polar Codes Theorem: For every distribution D over (X, Y), where X ∈ F_2: Let X = X_1, X_2, … X_n and Y = Y_1, Y_2, … Y_n where (X_i, Y_i) ∼ D iid. Then the entropies of U ≔ P_n(X) are polarized: ∀ε: if n ≥ poly(1/ε), then all but an ε-fraction of indices i ∈ [n] have entropies H(U_i | U_{<i}, Y) ∉ (n^{−c}, 1 − n^{−c}) [Figure: the transform P applied to inputs X_1, …, X_n with auxiliary info Y_1, …, Y_n; all but an ε-fraction of the conditional entropies H(U_i | U_{<i}, Y) are ≈ 0 or ≈ 1]

  23. Compressing Hidden Markov Sources • X_1, X_2, … X_n are outputs of a Hidden-Markov Model • Not independent: Lots of dependencies between neighboring symbols • Goal: Want to compress to within H(X^n) + εn • First glance: everything breaks! • Polar code analysis (Martingale) relied on the input being independent, identical • But, a simple construction works…

  24. Compression Construction • X_1, X_2, … X_n: outputs of a stationary HMM • Mixing time ≪ block length • Break the input into blocks • Polarize the 1st symbols of each block, and output their high-entropy set • These are approx. independent! • Then polarize the 2nd symbols, conditioned on the 1st • The joint distribution of all {(1st, 2nd)} pairs is approx. independent across blocks • … • Output the last ε-fraction of each block in the clear
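A sketch of the re-indexing step only, with made-up sizes: split the stream into blocks and group the j-th symbol of every block, so that each group is approximately iid once the block length far exceeds the mixing time (the choice of block length and the handling of the tail fraction follow the slide, not this snippet).

```python
import numpy as np

def interleave_blocks(stream, block_len):
    """Split the HMM output stream into consecutive blocks of length block_len and
    return groups[j] = the j-th symbol of every block.  Symbols inside one group are
    a full block apart in time, hence approximately independent when block_len is
    much larger than the mixing time; group j is then polarized/compressed
    conditioned on groups 0..j-1."""
    stream = np.asarray(stream)
    n_blocks = len(stream) // block_len
    blocks = stream[: n_blocks * block_len].reshape(n_blocks, block_len)
    return [blocks[:, j] for j in range(block_len)]

stream = np.arange(24)                  # stand-in for HMM outputs X_1 ... X_24
groups = interleave_blocks(stream, block_len=4)
print(groups[0])                        # [ 0  4  8 12 16 20 ]: one symbol per block
```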

  25. Example • HMM: two hidden states with emission distributions B(0.1) and B(0.9), each kept with probability 0.9 and switched with probability 0.1; marginally, each X_i is a uniform bit • P1 (1st symbols of each block): inputs have full entropy → the entire set P1 is output • P2 (2nd symbols): conditioned on P1, the inputs have lower entropy (distributions such as B(0.1), B(0.5), B(0.9) depending on the decoded P1 symbols) → a smaller set is output • P3: …
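The example HMM is easy to simulate; an illustrative sketch with the parameters read off the slide figure (treat them as assumptions), confirming that each output bit is marginally uniform even though consecutive bits are strongly correlated.

```python
import random

random.seed(0)

def sample_hmm(n: int):
    """Two hidden states with symmetric transitions: stay with prob 0.9, switch with
    prob 0.1.  State A emits Bernoulli(0.1), state B emits Bernoulli(0.9).  The
    stationary distribution over states is uniform, so each bit is marginally B(0.5)."""
    state = random.choice("AB")
    bits = []
    for _ in range(n):
        p = 0.1 if state == "A" else 0.9
        bits.append(1 if random.random() < p else 0)
        if random.random() < 0.1:
            state = "B" if state == "A" else "A"
    return bits

xs = sample_hmm(200_000)
print(f"empirical mean: {sum(xs) / len(xs):.3f}   (close to 0.5)")
print(f"empirical P[X_t == X_t+1]: "
      f"{sum(a == b for a, b in zip(xs, xs[1:])) / (len(xs) - 1):.3f}   (far from 0.5)")
```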

  26. Decompression Polar-decoder Black Box: • Input: a product distribution on the inputs, and the setting of the high-entropy polarization outputs • Output: an estimate of the input Markov decoding: 1. Decompress the P1 outputs 2. Compute the distribution of the P2 inputs, conditioned on P1 3. Decompress the P2 outputs 4. …

  27. Decompression: Extras Note: Could have done this with any black-box compression scheme for independent, non-identically distributed symbols. But: non-linear (and messy) • A linear compression black-box for every fixed distribution on symbols ⇏ overall linear compression • Polar codes are particularly suited for this
