Statistical Physics of Information Measures

Neri Merhav
Department of Electrical Engineering
Technion – Israel Institute of Technology
Technion City, Haifa, Israel

Partly joint work with D. Guo (Northwestern U.) and S. Shamai (Technion).

Physics of Algorithms '09, Santa Fe, NM, USA, Aug. 31 – Sep. 4, 2009
Outline

Relations between Information Theory (IT) and statistical physics:
- Conceptual aspects – relations between principles in the two areas.
- Technical aspects – identifying similar mathematical formalisms and borrowing techniques.

In this talk we:
- Briefly review basic background in IT.
- Discuss some physics of the Shannon limits.
- Briefly review basic background in estimation theory.
- Touch upon statistical physics of signal estimation via the mutual information.
First Part: Physics of the Shannon Limits
The Shannon Limits

- Lossless data compression: compression ratio ≥ H = entropy.
- Lossy compression: compression ratio ≥ R(D) = rate–distortion function.
- Channel coding: coding rate ≤ C = channel capacity.
- Joint source–channel coding: distortion ≥ $R^{-1}(C)$ = distortion–rate function at rate C.
- etc.
The Information Inequality

Each of the above-mentioned fundamental limits of IT, as well as many others, is based on the information inequality in some form: for any two distributions, P and Q, over an alphabet $\mathcal{X}$,

$$D(P\|Q) \triangleq \sum_{x\in\mathcal{X}} P(x)\log\frac{P(x)}{Q(x)} \ge 0.$$

In physics, it is known as the Gibbs inequality.
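As a quick sanity check, here is a minimal numerical sketch of the inequality (the alphabet size and distributions are arbitrary illustrations, not from the talk):

import numpy as np

def kl_divergence(p, q):
    """D(P||Q) = sum_x P(x) log(P(x)/Q(x)), in nats; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                     # terms with P(x) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
for _ in range(5):                   # random distribution pairs: D >= 0 always
    p = rng.random(4); p /= p.sum()
    q = rng.random(4); q /= q.sum()
    assert kl_divergence(p, q) >= 0.0
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))   # > 0
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))   # = 0, equality iff P = Q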
The Gibbs Inequality

Let $\mathcal{E}_0(x)$ and $\mathcal{E}_1(x)$ be two Hamiltonians of a system. For a given β, let

$$P_i(x) = \frac{e^{-\beta \mathcal{E}_i(x)}}{Z_i}, \qquad Z_i = \sum_x e^{-\beta \mathcal{E}_i(x)}, \qquad i = 0, 1.$$

Then

$$0 \le D(P_0\|P_1) = \left\langle \ln\frac{e^{-\beta \mathcal{E}_0(X)}/Z_0}{e^{-\beta \mathcal{E}_1(X)}/Z_1} \right\rangle_0 = \ln Z_1 - \ln Z_0 + \beta\,\langle \mathcal{E}_1(X) - \mathcal{E}_0(X)\rangle_0,$$

or

$$\langle \mathcal{E}_1(X) - \mathcal{E}_0(X)\rangle_0 \ge kT\ln Z_0 - kT\ln Z_1 = F_1 - F_0.$$
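A small numerical sketch of this identity on a finite state space (the Hamiltonians and β are made up for illustration); it also exhibits the exact gap $\langle W\rangle_0 - \Delta F = kT\,D(P_0\|P_1)$ used on the next slide:

import numpy as np

rng = np.random.default_rng(1)
beta = 2.0                              # inverse temperature, kT = 1/beta
E0 = rng.random(6)                      # Hamiltonian E0(x) on 6 states
E1 = rng.random(6)                      # Hamiltonian E1(x)

Z0, Z1 = np.exp(-beta * E0).sum(), np.exp(-beta * E1).sum()
P0, P1 = np.exp(-beta * E0) / Z0, np.exp(-beta * E1) / Z1

work = np.sum(P0 * (E1 - E0))           # <E1(X) - E0(X)>_0
dF = (np.log(Z0) - np.log(Z1)) / beta   # F1 - F0, with F = -kT ln Z
kl = np.sum(P0 * np.log(P0 / P1))       # D(P0||P1)

print(work - dF, kl / beta)             # the two numbers agree; both >= 0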
Interpretation of $\langle \mathcal{E}_1(X) - \mathcal{E}_0(X)\rangle_0 \ge \Delta F$

- A system with Hamiltonian $\mathcal{E}_0(x)$ is in equilibrium for all t < 0. Free energy = $-kT\ln Z_0$.
- At t = 0, the Hamiltonian jumps from $\mathcal{E}_0(x)$ to $\mathcal{E}_1(x)$, by abruptly applying a force; the change is $W = \mathcal{E}_1(x) - \mathcal{E}_0(x)$.
- Energy injected: $\langle W\rangle_0 = \langle \mathcal{E}_1(X) - \mathcal{E}_0(X)\rangle_0$.
- The new system, with Hamiltonian $\mathcal{E}_1$, equilibrates. Free energy = $-kT\ln Z_1$.
- Gibbs inequality: $\langle W\rangle_0 \ge \Delta F$.
- $\langle W\rangle_0 - \Delta F = kT\cdot D(P_0\|P_1)$ is the dissipated energy = entropy production (system + environment) due to the irreversibility of the abruptly applied force.
Example – Data Compression and the Ising Model

Let $X \in \{-1,+1\}^n$ be a Markov chain, $P_0(x) = \prod_i P_0(x_i\,|\,x_{i-1})$, with

$$P_0(x\,|\,x') = \frac{\exp(Jxx')}{Z_0}, \qquad x, x' \in \{-1,+1\}.$$

The code designer thinks that X is Markov with parameters

$$P_1(x\,|\,x') = \frac{\exp(Jxx' + Kx)}{Z_1(x')}.$$

$D(P_0\|P_1)$ = loss in compression due to mismatch. It is easy to see that

$$\mathcal{E}_0(x) = -J\sum_i x_i x_{i-1}; \qquad \mathcal{E}_1(x) = -J\sum_i x_i x_{i-1} - B\sum_i x_i,$$

where

$$B = K + \frac{1}{2}\ln\frac{\cosh(J-K)}{\cosh(J+K)}.$$

Thus, $W = -B\sum_i x_i$ means an abrupt application of the magnetic field B.
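The field B arises from absorbing the configuration-dependent normalizer $Z_1(x') = 2\cosh(Jx'+K)$ into the Hamiltonian: $\ln 2\cosh(Jx'+K)$ is affine in $x' \in \{-1,+1\}$, and its linear part shifts the field from K to B, up to boundary terms. A short sketch checking this algebra, with illustrative J, K:

import numpy as np

J, K = 0.7, 0.3
# write ln 2cosh(J x' + K) = a + b x' (exact on the two points x' = ±1)
a = 0.5 * (np.log(2 * np.cosh(J + K)) + np.log(2 * np.cosh(J - K)))
b = 0.5 * np.log(np.cosh(J + K) / np.cosh(J - K))

for xp in (-1, +1):
    assert np.isclose(np.log(2 * np.cosh(J * xp + K)), a + b * xp)

# the linear part b x_{i-1} shifts the field: B = K - b
B = K - b
assert np.isclose(B, K + 0.5 * np.log(np.cosh(J - K) / np.cosh(J + K)))
print("B =", B)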
Physics of the Data Processing Theorem (DPT)

Mutual information: let $(U, V) \sim P(u, v)$:

$$I(U;V) \equiv \left\langle \log\frac{P(U,V)}{P(U)P(V)} \right\rangle.$$

DPT: $X \to U \to V$ a Markov chain $\Rightarrow I(X;U) \ge I(X;V)$.

Proof:

$$I(X;U) - I(X;V) = \left\langle D\big(P_{X|U,V}(\cdot\,|\,U,V)\,\big\|\,P_{X|V}(\cdot\,|\,V)\big) \right\rangle \ge 0. \qquad \square$$

Supports most, if not all, Shannon limits.
Physics of the DPT (Cont'd)

Let β = 1. Given (u, v), let

$$\mathcal{E}_0(x) = -\ln P(x\,|\,u,v) = -\ln P(x\,|\,u); \qquad \mathcal{E}_1(x) = -\ln P(x\,|\,v).$$

Then

$$Z_0 = \sum_x e^{-1\cdot[-\ln P(x|u,v)]} = \sum_x P(x\,|\,u,v) = 1,$$

and similarly, $Z_1 = 1$. Thus, $F_0 = F_1 = 0$, and so $\Delta F = 0$. After averaging over $P_{UV}$:

$$\langle W(X)\rangle_0 = \langle -\ln P(X|V) + \ln P(X|U)\rangle = H(X|V) - H(X|U) = I(X;U) - I(X;V).$$

$$\Rightarrow\quad \langle W\rangle_0 = I(X;U) - I(X;V) \ge 0 = \Delta F.$$
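A toy numerical sketch of this identity (the chain X → U → V below is randomly generated for illustration): it checks that $\langle W\rangle_0 = H(X|V) - H(X|U)$ coincides with $I(X;U) - I(X;V)$ and is nonnegative.

import numpy as np
rng = np.random.default_rng(2)

nx, nu, nv = 3, 3, 3
Px = rng.random(nx); Px /= Px.sum()                                # P(x)
Pu_x = rng.random((nx, nu)); Pu_x /= Pu_x.sum(1, keepdims=True)    # P(u|x)
Pv_u = rng.random((nu, nv)); Pv_u /= Pv_u.sum(1, keepdims=True)    # P(v|u)

Pxu = Px[:, None] * Pu_x                 # P(x,u)
Pxv = Pxu @ Pv_u                         # P(x,v), using the Markov structure

def mutual_info(Pab):
    Pa, Pb = Pab.sum(1), Pab.sum(0)
    mask = Pab > 0
    return float(np.sum(Pab[mask] * np.log(Pab[mask] / np.outer(Pa, Pb)[mask])))

def cond_entropy(Pab):                   # H(A|B) for a joint table P(a,b)
    Pb = Pab.sum(0)
    mask = Pab > 0
    return float(-np.sum(Pab[mask] * np.log((Pab / Pb[None, :])[mask])))

work = cond_entropy(Pxv) - cond_entropy(Pxu)        # <W>_0 = H(X|V) - H(X|U)
print(work, mutual_info(Pxu) - mutual_info(Pxv))    # equal, and >= 0 = ΔF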
Discussion

The relation

$$\langle W\rangle_0 - \Delta F = kT\cdot D(P_0\|P_1) \ge 0$$

is known (Jarzynski '97, Crooks '99, ..., Kawai et al. '07), but with different physical interpretations, which require some limitations. The present interpretation holds generally; here it is applied in particular to the DPT.

In our case:
- Maximum irreversibility: $\langle W\rangle_0$ is fully dissipated: $\Delta F = 0$.
- All dissipation is in the system, none in the environment: $\langle W\rangle_0 = T\Delta S = 1\cdot[H(X|V) - H(X|U)]$.
- Rate loss due to the gap between mutual informations: irreversible process ⇔ irreversible information: $I(X;U) > I(X;V)$ → U cannot be retrieved from V.
Relation to Jarzynski's Equality

Let $\mathcal{E}_\lambda(x) = \mathcal{E}_0(x) + \lambda[\mathcal{E}_1(x) - \mathcal{E}_0(x)]$ interpolate between $\mathcal{E}_0$ and $\mathcal{E}_1$; λ is a generalized force.

Jarzynski's equality (1997): for every protocol $\{\lambda_t\}$ with $\lambda_t = 0\ \forall t \le 0$ and $\lambda_t = 1\ \forall t \ge \tau$ ($\tau \ge 0$), the injected energy

$$W = \int_0^\tau d\lambda_t\,[\mathcal{E}_1(x_t) - \mathcal{E}_0(x_t)]$$

satisfies

$$\left\langle e^{-\beta W}\right\rangle = e^{-\beta\Delta F}.$$

Jensen: $\langle e^{-\beta W}\rangle \ge \exp\{-\beta\langle W\rangle\}$, so $\langle W\rangle \ge \Delta F$ more generally. Equality holds for a reversible process, where W is deterministic.
Informational Jarzynski Equality

Taking $\mathcal{E}_0(x) = -\ln P_0(x)$, $\mathcal{E}_1(x) = -\ln P_1(x)$, β = 1, and defining a "protocol" $0 \equiv \lambda_0 \to \lambda_1 \to \cdots \to \lambda_n \equiv 1$ and

$$W = \sum_{i=0}^{n-1}(\lambda_{i+1} - \lambda_i)\ln\frac{P_0(X_i)}{P_1(X_i)}, \qquad X_i \sim P_{\lambda_i} \propto P_0^{1-\lambda_i} P_1^{\lambda_i},$$

one can show:

$$\left\langle e^{-W}\right\rangle = 1 = e^{-\Delta F}.$$

Jensen: a generalized information inequality:

$$\int_0^1 d\lambda_t \left\langle \ln\frac{P_0(X)}{P_1(X)}\right\rangle_{\lambda_t} \ge 0.$$
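A sketch verifying $\langle e^{-W}\rangle = 1$ exactly on a discrete alphabet, under the simplifying assumption that the system fully re-equilibrates between protocol steps, so each $X_i$ is an independent draw from $P_{\lambda_i}$; the expectation then factors over steps and telescopes ($P_0$, $P_1$, and the protocol are arbitrary illustrations):

import numpy as np

rng = np.random.default_rng(3)
P0 = rng.random(5); P0 /= P0.sum()
P1 = rng.random(5); P1 /= P1.sum()
lam = np.linspace(0.0, 1.0, 11)          # protocol 0 = λ_0 -> ... -> λ_n = 1

# With X_i independent, <e^{-W}> factors over steps:
# E_{λ_i}[ (P1/P0)^{Δλ_i} ] = Z_{λ_{i+1}} / Z_{λ_i}, telescoping to Z_1/Z_0 = 1.
expect = 1.0
for i in range(len(lam) - 1):
    w = P0**(1 - lam[i]) * P1**lam[i]    # unnormalized P_{λ_i}
    P = w / w.sum()
    dlam = lam[i + 1] - lam[i]
    expect *= np.sum(P * (P1 / P0)**dlam)   # E[ e^{-Δλ ln(P0/P1)} ]
print(expect)   # -> 1.0 (= e^{-ΔF}, since Z_0 = Z_1 = 1)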
Summary of First Part

- Suboptimal communication system ⇔ irreversible process.
- Information rate loss ⇔ dissipated energy → entropy increase.
- Fundamental limits of IT ⇔ second law.
- Possible implications of Jarzynski's equality in IT.
Second Part: Statistical Physics of Signal Estimation via the Mutual Information
Signal Estimation – Preliminaries

Let

$$Y = X + Z \qquad (\text{all vectors in } \mathbb{R}^n),$$

where X is the desired signal and Z is noise, independent of X. An estimator is any function $\hat{X} = f(Y)$. We want $\hat{X}$ as 'close' as possible to X:

$$\text{mean square error} = \left\langle\|X - \hat{X}\|^2\right\rangle = \left\langle\|X - f(Y)\|^2\right\rangle.$$

A fundamental result: the minimum mean square error (MMSE) estimator is the conditional mean:

$$X^* = f^*(y) = \langle X\rangle_{Y=y} \equiv \int dx\cdot x\,P(x\,|\,y).$$

Normally, it is difficult to apply $X^*$ and to assess its performance. $X^*$ and the MMSE may exhibit irregularities – threshold effects ↔ phase transitions in analogous physical systems. This motivates a statistical–mechanical perspective.
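A scalar toy sketch of the conditional mean (not from the talk): for X uniform on $\{-1,+1\}$ and $Z \sim \mathcal{N}(0, 1/\beta)$, the conditional mean has the closed form $f^*(y) = \tanh(\beta y)$, and Monte Carlo shows it beating the best linear estimator:

import numpy as np

rng = np.random.default_rng(4)
beta, m = 2.0, 200_000                        # beta is an illustration
X = rng.choice([-1.0, 1.0], size=m)
Y = X + rng.normal(scale=1.0 / np.sqrt(beta), size=m)

mse_cond_mean = np.mean((X - np.tanh(beta * Y))**2)   # f*(y) = tanh(beta y)
# best linear estimator f(y) = c y, with c = E[XY]/E[Y^2] = 1/(1 + 1/beta)
c = 1.0 / (1.0 + 1.0 / beta)
mse_linear = np.mean((X - c * Y)**2)
print(mse_cond_mean, mse_linear)              # conditional mean wins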
The I–MMSE Relation

[Guo–Shamai–Verdú 2005]: for $Y = X + Z$, $Z \sim \mathcal{N}(0, I\cdot 1/\beta)$, regardless of P(X):

$$\text{mmse}(X\,|\,Y) = 2\cdot\frac{dI(X;Y)}{d\beta},$$

where $\text{mmse}(X|Y) \equiv \langle\|X - f^*(Y)\|^2\rangle$.

Simple example: if $X \sim \mathcal{N}(0, \sigma^2 I)$,

$$\frac{I(X;Y)}{n} = \frac{1}{2}\log(1 + \beta\sigma^2) \quad\Rightarrow\quad \frac{\text{mmse}(X|Y)}{n} = \frac{\sigma^2}{1+\beta\sigma^2}.$$

⇒ The MMSE can be calculated using statistical mechanics, via the mutual information and the I–MMSE relation.
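A quick sketch of the relation on the Gaussian example, differentiating I(β) numerically (σ² and the β values are illustrative):

import numpy as np

sigma2, n = 1.5, 1
I = lambda b: 0.5 * n * np.log(1.0 + b * sigma2)     # I(X;Y) for Gaussian X

for beta in (0.5, 1.0, 4.0):
    h = 1e-6
    mmse_from_I = 2.0 * (I(beta + h) - I(beta - h)) / (2 * h)   # 2 dI/dβ
    mmse_direct = n * sigma2 / (1.0 + beta * sigma2)
    print(beta, mmse_from_I, mmse_direct)            # agree to ~1e-9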
Statistical Physics of the MMSE

$$I(X;Y) = \left\langle\log\frac{P(X|Y)}{P(X)}\right\rangle_\beta = \left\langle\log\frac{\exp\{-\beta\|Y-X\|^2/2\}}{\sum_{x} P(x)\exp\{-\beta\|Y-x\|^2/2\}}\right\rangle_\beta = -\frac{n}{2} - \langle\log Z(\beta|Y)\rangle_\beta,$$

where

$$Z(\beta|Y) = \sum_{x} P(x)\exp\{-\beta\|Y-x\|^2/2\},$$

and so,

$$\text{mmse}(X|Y) = 2\cdot\frac{dI(X;Y)}{d\beta} = -2\,\frac{d}{d\beta}\langle\log Z(\beta|Y)\rangle_\beta.$$

Similar to the internal energy, but here $\langle\cdot\rangle_\beta$ also depends on β.
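A Monte Carlo sketch of this identity for the binary prior from the earlier toy example (X uniform on $\{-1,+1\}$, n = 1, illustrative β): it differentiates $\langle\log Z(\beta|Y)\rangle_\beta$ with common random numbers and compares against the direct MMSE of $f^*(y) = \tanh(\beta y)$:

import numpy as np

rng = np.random.default_rng(5)
m = 400_000
X = rng.choice([-1.0, 1.0], size=m)
eps = rng.normal(size=m)                       # Z = eps / sqrt(beta)

def avg_log_Z(beta):
    Y = X + eps / np.sqrt(beta)
    # Z(beta|y) = (1/2)[exp(-beta (y-1)^2/2) + exp(-beta (y+1)^2/2)]
    return np.mean(np.log(0.5 * (np.exp(-beta * (Y - 1)**2 / 2)
                                 + np.exp(-beta * (Y + 1)**2 / 2))))

beta, h = 2.0, 1e-4
mmse_from_Z = -2.0 * (avg_log_Z(beta + h) - avg_log_Z(beta - h)) / (2 * h)

Y = X + eps / np.sqrt(beta)
mmse_direct = np.mean((X - np.tanh(beta * Y))**2)   # f*(y) = tanh(beta y)
print(mmse_from_Z, mmse_direct)                     # agree up to MC error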
Statistical Physics of the MMSE (Cont'd)

A more detailed derivation yields:

$$\text{mmse}(X|Y) = \frac{n}{\beta} + \text{Cov}\left\{\|Y-X\|^2,\ \log Z(\beta|Y)\right\}.$$

The term n/β is analogous to the energy equipartition theorem. The covariance term reflects the dependence of $\langle\cdot\rangle_\beta$ on β.
Statistical Physics of the MMSE (Cont'd)

In statistical mechanics, the entropy obeys

$$\Sigma(\beta) = \log Z(\beta) + \beta\langle\mathcal{E}(X)\rangle = \log Z(\beta) - \beta\,\frac{d\log Z(\beta)}{d\beta} \quad\Longleftarrow\quad \text{a differential equation}$$

$$\Rightarrow\quad \log Z(\beta) = -\beta E_0 + \beta\int_\beta^\infty \frac{d\hat\beta\,\Sigma(\hat\beta)}{\hat\beta^2}; \qquad E_0 = \text{ground-state energy}$$

$$\Rightarrow\quad \bar{E} = -\frac{d\log Z(\beta)}{d\beta} = E_0 - \left[\int_\beta^\infty\frac{d\hat\beta\,\Sigma(\hat\beta)}{\hat\beta^2} - \frac{\Sigma(\beta)}{\beta}\right].$$

The same applies to $\langle\log Z(\beta|Y)\rangle_\beta$, except that

$$\Sigma(\beta) \Longleftarrow \frac{\beta}{2}\,\text{Cov}\left\{\|Y-X\|^2,\ \log Z(\beta|Y)\right\} - I(X;Y), \qquad E_0 \Longleftarrow \frac{1}{2}\left\langle\min_{x}\|Y-x\|^2\right\rangle.$$
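A sketch checking the two statistical-mechanics identities above on a toy four-level system (the energy levels are made up, and the integral is truncated at a large β where Σ has decayed to ~0):

import numpy as np

E = np.array([0.2, 0.5, 0.9, 1.4])        # energy levels; E_0 = 0.2
logZ = lambda b: np.log(np.exp(-b * E).sum())

def Sigma(b):                              # Σ(β) = log Z(β) + β <E>
    p = np.exp(-b * E); p /= p.sum()
    return logZ(b) + b * (p * E).sum()

beta, h = 1.5, 1e-5
# Σ(β) = log Z(β) - β d log Z / dβ  (central differences)
print(Sigma(beta), logZ(beta) - beta * (logZ(beta + h) - logZ(beta - h)) / (2 * h))

# log Z(β) = -β E_0 + β ∫_β^∞ dβ' Σ(β')/β'^2, truncated at β' = 200
bs = np.linspace(beta, 200.0, 200_001)
W = np.exp(-np.outer(bs, E))               # Boltzmann weights, one row per β'
Sig = np.log(W.sum(1)) + bs * (W @ E) / W.sum(1)
f = Sig / bs**2
integral = ((f[1:] + f[:-1]) / 2 * np.diff(bs)).sum()   # trapezoid rule
print(logZ(beta), -beta * E.min() + beta * integral)    # agree closely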
Examples

Example 1 – Random Codebook on a Sphere Surface

$$Y = X + Z; \qquad X \sim \text{Unif}\{x_1,\ldots,x_M\}, \quad M = e^{nR}.$$

Codewords: drawn randomly, independently and uniformly on $\text{Surf}(\sqrt{n\sigma^2})$.

$$\lim_{n\to\infty}\frac{\langle I(X;Y)\rangle}{n} = \begin{cases} \frac{1}{2}\log(1+\beta\sigma^2) & \beta < \beta_R \\ R & \beta \ge \beta_R\end{cases}$$

where $\beta_R$ is the solution to the equation $R = \frac{1}{2}\log(1+\beta\sigma^2)$. Thus,

$$\lim_{n\to\infty}\frac{\text{mmse}(X|Y)}{n} = \begin{cases}\frac{\sigma^2}{1+\beta\sigma^2} & \beta < \beta_R \\ 0 & \beta \ge \beta_R\end{cases}$$

A first-order phase transition in the MMSE: at high temperature it behaves as if X were Gaussian, and at $\beta = \beta_R$ it jumps to zero!
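A tiny sketch of the limiting curve and its jump (R and σ² are arbitrary illustrations):

import numpy as np

R, sigma2 = 0.5, 1.0
beta_R = (np.exp(2 * R) - 1) / sigma2      # solves R = (1/2) log(1 + β σ²)

def mmse_per_dim(beta):                    # limiting mmse(X|Y)/n
    return sigma2 / (1 + beta * sigma2) if beta < beta_R else 0.0

for beta in np.linspace(0.5 * beta_R, 1.5 * beta_R, 7):
    print(f"beta = {beta:.3f}   mmse/n -> {mmse_per_dim(beta):.4f}")
# just below β_R the limit is σ²/(1+β_R σ²) = σ² e^{-2R} > 0; above it is 0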