Statistical Physics of Information Measures

Neri Merhav
Department of Electrical Engineering
Technion – Israel Institute of Technology
Technion City, Haifa, Israel

Partly joint work with D. Guo (Northwestern U.) and S. Shamai (Technion).

Physics of Algorithms '09, Santa Fe, NM, USA, Aug. 31 – Sep. 4, 2009
Outline

Relations between Information Theory (IT) and statistical physics:
- Conceptual aspects – relations between principles in the two areas.
- Technical aspects – identifying similar mathematical formalisms and borrowing techniques.

In this talk we:
- Briefly review basic background in IT.
- Discuss some physics of the Shannon limits.
- Briefly review basic background in estimation theory.
- Touch upon statistical physics of signal estimation via the mutual information.
First Part: Physics of the Shannon Limits
The Shannon Limits

- Lossless data compression: compression ratio ≥ H = entropy.
- Lossy compression: compression ratio ≥ R(D) = rate–distortion function.
- Channel coding: coding rate ≤ C = channel capacity.
- Joint source–channel coding: distortion ≥ $R^{-1}(C)$ = distortion–rate function at rate C.
- etc.
The Information Inequality

Each of the above-mentioned fundamental limits of IT, as well as many others, is based on the information inequality in some form: for any two distributions, P and Q, over an alphabet $\mathcal{X}$,

$$D(P\|Q) \triangleq \sum_{x\in\mathcal{X}} P(x)\log\frac{P(x)}{Q(x)} \ge 0.$$

In physics, it is known as the Gibbs inequality.
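As a quick sanity check, here is a minimal numerical sketch of the inequality (the alphabet size and distributions are arbitrary illustrations, not from the talk):

import numpy as np

def kl_divergence(p, q):
    """D(P||Q) = sum_x P(x) log(P(x)/Q(x)), in nats; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                     # terms with P(x) = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
for _ in range(5):                   # random distribution pairs: D >= 0 always
    p = rng.random(4); p /= p.sum()
    q = rng.random(4); q /= q.sum()
    assert kl_divergence(p, q) >= 0.0
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))   # > 0
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))   # = 0, equality iff P = Q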
The Gibbs Inequality

Let $\mathcal{E}_0(x)$ and $\mathcal{E}_1(x)$ be two Hamiltonians of a system. For a given β, let

$$P_i(x) = \frac{e^{-\beta \mathcal{E}_i(x)}}{Z_i}, \qquad Z_i = \sum_x e^{-\beta \mathcal{E}_i(x)}, \qquad i = 0, 1.$$

Then

$$0 \le D(P_0\|P_1) = \left\langle \ln\frac{e^{-\beta \mathcal{E}_0(X)}/Z_0}{e^{-\beta \mathcal{E}_1(X)}/Z_1} \right\rangle_0 = \ln Z_1 - \ln Z_0 + \beta\,\langle \mathcal{E}_1(X) - \mathcal{E}_0(X)\rangle_0,$$

or

$$\langle \mathcal{E}_1(X) - \mathcal{E}_0(X)\rangle_0 \ge kT\ln Z_0 - kT\ln Z_1 = F_1 - F_0.$$
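A small numerical sketch of this identity on a finite state space (the Hamiltonians and β are made up for illustration); it also exhibits the exact gap $\langle W\rangle_0 - \Delta F = kT\,D(P_0\|P_1)$ used on the next slide:

import numpy as np

rng = np.random.default_rng(1)
beta = 2.0                              # inverse temperature, kT = 1/beta
E0 = rng.random(6)                      # Hamiltonian E0(x) on 6 states
E1 = rng.random(6)                      # Hamiltonian E1(x)

Z0, Z1 = np.exp(-beta * E0).sum(), np.exp(-beta * E1).sum()
P0, P1 = np.exp(-beta * E0) / Z0, np.exp(-beta * E1) / Z1

work = np.sum(P0 * (E1 - E0))           # <E1(X) - E0(X)>_0
dF = (np.log(Z0) - np.log(Z1)) / beta   # F1 - F0, with F = -kT ln Z
kl = np.sum(P0 * np.log(P0 / P1))       # D(P0||P1)

print(work - dF, kl / beta)             # the two numbers agree; both >= 0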
Interpretation of $\langle \mathcal{E}_1(X) - \mathcal{E}_0(X)\rangle_0 \ge \Delta F$

- A system with Hamiltonian $\mathcal{E}_0(x)$ is in equilibrium for all t < 0. Free energy = $-kT\ln Z_0$.
- At t = 0, the Hamiltonian jumps from $\mathcal{E}_0(x)$ to $\mathcal{E}_1(x)$, by abruptly applying a force; the change is $W = \mathcal{E}_1(x) - \mathcal{E}_0(x)$.
- Energy injected: $\langle W\rangle_0 = \langle \mathcal{E}_1(X) - \mathcal{E}_0(X)\rangle_0$.
- The new system, with Hamiltonian $\mathcal{E}_1$, equilibrates. Free energy = $-kT\ln Z_1$.
- Gibbs inequality: $\langle W\rangle_0 \ge \Delta F$.
- $\langle W\rangle_0 - \Delta F = kT\cdot D(P_0\|P_1)$ is the dissipated energy = entropy production (system + environment) due to the irreversibility of the abruptly applied force.
Example – Data Compression and the Ising Model

Let $X \in \{-1,+1\}^n$ be a Markov chain, $P_0(x) = \prod_i P_0(x_i\,|\,x_{i-1})$, with

$$P_0(x\,|\,x') = \frac{\exp(Jxx')}{Z_0}, \qquad x, x' \in \{-1,+1\}.$$

The code designer thinks that X is Markov with parameters

$$P_1(x\,|\,x') = \frac{\exp(Jxx' + Kx)}{Z_1(x')}.$$

$D(P_0\|P_1)$ = loss in compression due to mismatch. It is easy to see that

$$\mathcal{E}_0(x) = -J\sum_i x_i x_{i-1}; \qquad \mathcal{E}_1(x) = -J\sum_i x_i x_{i-1} - B\sum_i x_i,$$

where

$$B = K + \frac{1}{2}\ln\frac{\cosh(J-K)}{\cosh(J+K)}.$$

Thus, $W = -B\sum_i x_i$ means an abrupt application of the magnetic field B.
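The field B arises from absorbing the configuration-dependent normalizer $Z_1(x') = 2\cosh(Jx'+K)$ into the Hamiltonian: $\ln 2\cosh(Jx'+K)$ is affine in $x' \in \{-1,+1\}$, and its linear part shifts the field from K to B, up to boundary terms. A short sketch checking this algebra, with illustrative J, K:

import numpy as np

J, K = 0.7, 0.3
# write ln 2cosh(J x' + K) = a + b x' (exact on the two points x' = ±1)
a = 0.5 * (np.log(2 * np.cosh(J + K)) + np.log(2 * np.cosh(J - K)))
b = 0.5 * np.log(np.cosh(J + K) / np.cosh(J - K))

for xp in (-1, +1):
    assert np.isclose(np.log(2 * np.cosh(J * xp + K)), a + b * xp)

# the linear part b x_{i-1} shifts the field: B = K - b
B = K - b
assert np.isclose(B, K + 0.5 * np.log(np.cosh(J - K) / np.cosh(J + K)))
print("B =", B)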
Physics of the Data Processing Theorem (DPT)

Mutual information: let $(U, V) \sim P(u, v)$:

$$I(U;V) \equiv \left\langle \log\frac{P(U,V)}{P(U)P(V)} \right\rangle.$$

DPT: $X \to U \to V$ a Markov chain $\Rightarrow I(X;U) \ge I(X;V)$.

Proof:

$$I(X;U) - I(X;V) = \left\langle D\big(P_{X|U,V}(\cdot\,|\,U,V)\,\big\|\,P_{X|V}(\cdot\,|\,V)\big) \right\rangle \ge 0. \qquad \square$$

Supports most, if not all, Shannon limits.
Physics of the DPT (Cont'd)

Let β = 1. Given (u, v), let

$$\mathcal{E}_0(x) = -\ln P(x\,|\,u,v) = -\ln P(x\,|\,u); \qquad \mathcal{E}_1(x) = -\ln P(x\,|\,v).$$

Then

$$Z_0 = \sum_x e^{-1\cdot[-\ln P(x|u,v)]} = \sum_x P(x\,|\,u,v) = 1,$$

and similarly, $Z_1 = 1$. Thus, $F_0 = F_1 = 0$, and so $\Delta F = 0$. After averaging over $P_{UV}$:

$$\langle W(X)\rangle_0 = \langle -\ln P(X|V) + \ln P(X|U)\rangle = H(X|V) - H(X|U) = I(X;U) - I(X;V).$$

$$\Rightarrow\quad \langle W\rangle_0 = I(X;U) - I(X;V) \ge 0 = \Delta F.$$
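A toy numerical sketch of this identity (the chain X → U → V below is randomly generated for illustration): it checks that $\langle W\rangle_0 = H(X|V) - H(X|U)$ coincides with $I(X;U) - I(X;V)$ and is nonnegative.

import numpy as np
rng = np.random.default_rng(2)

nx, nu, nv = 3, 3, 3
Px = rng.random(nx); Px /= Px.sum()                                # P(x)
Pu_x = rng.random((nx, nu)); Pu_x /= Pu_x.sum(1, keepdims=True)    # P(u|x)
Pv_u = rng.random((nu, nv)); Pv_u /= Pv_u.sum(1, keepdims=True)    # P(v|u)

Pxu = Px[:, None] * Pu_x                 # P(x,u)
Pxv = Pxu @ Pv_u                         # P(x,v), using the Markov structure

def mutual_info(Pab):
    Pa, Pb = Pab.sum(1), Pab.sum(0)
    mask = Pab > 0
    return float(np.sum(Pab[mask] * np.log(Pab[mask] / np.outer(Pa, Pb)[mask])))

def cond_entropy(Pab):                   # H(A|B) for a joint table P(a,b)
    Pb = Pab.sum(0)
    mask = Pab > 0
    return float(-np.sum(Pab[mask] * np.log((Pab / Pb[None, :])[mask])))

work = cond_entropy(Pxv) - cond_entropy(Pxu)        # <W>_0 = H(X|V) - H(X|U)
print(work, mutual_info(Pxu) - mutual_info(Pxv))    # equal, and >= 0 = ΔF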
Discussion

The relation

$$\langle W\rangle_0 - \Delta F = kT\cdot D(P_0\|P_1) \ge 0$$

is known (Jarzynski '97, Crooks '99, ..., Kawai et al. '07), but with different physical interpretations, which require some limitations. The present interpretation holds generally; here it is applied in particular to the DPT.

In our case:
- Maximum irreversibility: $\langle W\rangle_0$ is fully dissipated: $\Delta F = 0$.
- All dissipation is in the system, none in the environment: $\langle W\rangle_0 = T\Delta S = 1\cdot[H(X|V) - H(X|U)]$.
- Rate loss due to the gap between mutual informations: irreversible process ⇔ irreversible information: $I(X;U) > I(X;V)$ → U cannot be retrieved from V.
Relation to Jarzynski's Equality

Let $\mathcal{E}_\lambda(x) = \mathcal{E}_0(x) + \lambda[\mathcal{E}_1(x) - \mathcal{E}_0(x)]$ interpolate between $\mathcal{E}_0$ and $\mathcal{E}_1$; λ is a generalized force.

Jarzynski's equality (1997): for every protocol $\{\lambda_t\}$ with $\lambda_t = 0\ \forall t \le 0$ and $\lambda_t = 1\ \forall t \ge \tau$ ($\tau \ge 0$), the injected energy

$$W = \int_0^\tau d\lambda_t\,[\mathcal{E}_1(x_t) - \mathcal{E}_0(x_t)]$$

satisfies

$$\left\langle e^{-\beta W}\right\rangle = e^{-\beta\Delta F}.$$

Jensen: $\langle e^{-\beta W}\rangle \ge \exp\{-\beta\langle W\rangle\}$, so $\langle W\rangle \ge \Delta F$ more generally. Equality holds for a reversible process, where W is deterministic.
Informational Jarzynski Equality

Taking $\mathcal{E}_0(x) = -\ln P_0(x)$, $\mathcal{E}_1(x) = -\ln P_1(x)$, β = 1, and defining a "protocol" $0 \equiv \lambda_0 \to \lambda_1 \to \cdots \to \lambda_n \equiv 1$ and

$$W = \sum_{i=0}^{n-1}(\lambda_{i+1} - \lambda_i)\ln\frac{P_0(X_i)}{P_1(X_i)}, \qquad X_i \sim P_{\lambda_i} \propto P_0^{1-\lambda_i} P_1^{\lambda_i},$$

one can show:

$$\left\langle e^{-W}\right\rangle = 1 = e^{-\Delta F}.$$

Jensen: a generalized information inequality:

$$\int_0^1 d\lambda_t \left\langle \ln\frac{P_0(X)}{P_1(X)}\right\rangle_{\lambda_t} \ge 0.$$
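A sketch verifying $\langle e^{-W}\rangle = 1$ exactly on a discrete alphabet, under the simplifying assumption that the system fully re-equilibrates between protocol steps, so each $X_i$ is an independent draw from $P_{\lambda_i}$; the expectation then factors over steps and telescopes ($P_0$, $P_1$, and the protocol are arbitrary illustrations):

import numpy as np

rng = np.random.default_rng(3)
P0 = rng.random(5); P0 /= P0.sum()
P1 = rng.random(5); P1 /= P1.sum()
lam = np.linspace(0.0, 1.0, 11)          # protocol 0 = λ_0 -> ... -> λ_n = 1

# With X_i independent, <e^{-W}> factors over steps:
# E_{λ_i}[ (P1/P0)^{Δλ_i} ] = Z_{λ_{i+1}} / Z_{λ_i}, telescoping to Z_1/Z_0 = 1.
expect = 1.0
for i in range(len(lam) - 1):
    w = P0**(1 - lam[i]) * P1**lam[i]    # unnormalized P_{λ_i}
    P = w / w.sum()
    dlam = lam[i + 1] - lam[i]
    expect *= np.sum(P * (P1 / P0)**dlam)   # E[ e^{-Δλ ln(P0/P1)} ]
print(expect)   # -> 1.0 (= e^{-ΔF}, since Z_0 = Z_1 = 1)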
Summary of First Part

- Suboptimal communication system ⇔ irreversible process.
- Information rate loss ⇔ dissipated energy → entropy increase.
- Fundamental limits of IT ⇔ second law.
- Possible implications of Jarzynski's equality in IT.
Second Part: Statistical Physics of Signal Estimation via the Mutual Information
Signal Estimation – Preliminaries

Let

$$Y = X + Z \qquad (\text{all vectors in } \mathbb{R}^n),$$

where X is the desired signal and Z is noise, independent of X. An estimator is any function $\hat{X} = f(Y)$. We want $\hat{X}$ as 'close' as possible to X:

$$\text{mean square error} = \left\langle\|X - \hat{X}\|^2\right\rangle = \left\langle\|X - f(Y)\|^2\right\rangle.$$

A fundamental result: the minimum mean square error (MMSE) estimator is the conditional mean:

$$X^* = f^*(y) = \langle X\rangle_{Y=y} \equiv \int dx\cdot x\,P(x\,|\,y).$$

Normally, it is difficult to apply $X^*$ and to assess its performance. $X^*$ and the MMSE may exhibit irregularities – threshold effects ↔ phase transitions in analogous physical systems. This motivates a statistical–mechanical perspective.
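A scalar toy sketch of the conditional mean (not from the talk): for X uniform on $\{-1,+1\}$ and $Z \sim \mathcal{N}(0, 1/\beta)$, the conditional mean has the closed form $f^*(y) = \tanh(\beta y)$, and Monte Carlo shows it beating the best linear estimator:

import numpy as np

rng = np.random.default_rng(4)
beta, m = 2.0, 200_000                        # beta is an illustration
X = rng.choice([-1.0, 1.0], size=m)
Y = X + rng.normal(scale=1.0 / np.sqrt(beta), size=m)

mse_cond_mean = np.mean((X - np.tanh(beta * Y))**2)   # f*(y) = tanh(beta y)
# best linear estimator f(y) = c y, with c = E[XY]/E[Y^2] = 1/(1 + 1/beta)
c = 1.0 / (1.0 + 1.0 / beta)
mse_linear = np.mean((X - c * Y)**2)
print(mse_cond_mean, mse_linear)              # conditional mean wins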
The I–MMSE Relation

[Guo–Shamai–Verdú 2005]: for $Y = X + Z$, $Z \sim \mathcal{N}(0, I\cdot 1/\beta)$, regardless of P(X):

$$\text{mmse}(X\,|\,Y) = 2\cdot\frac{dI(X;Y)}{d\beta},$$

where $\text{mmse}(X|Y) \equiv \langle\|X - f^*(Y)\|^2\rangle$.

Simple example: if $X \sim \mathcal{N}(0, \sigma^2 I)$,

$$\frac{I(X;Y)}{n} = \frac{1}{2}\log(1 + \beta\sigma^2) \quad\Rightarrow\quad \frac{\text{mmse}(X|Y)}{n} = \frac{\sigma^2}{1+\beta\sigma^2}.$$

⇒ The MMSE can be calculated using statistical mechanics, via the mutual information and the I–MMSE relation.
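A quick sketch of the relation on the Gaussian example, differentiating I(β) numerically (σ² and the β values are illustrative):

import numpy as np

sigma2, n = 1.5, 1
I = lambda b: 0.5 * n * np.log(1.0 + b * sigma2)     # I(X;Y) for Gaussian X

for beta in (0.5, 1.0, 4.0):
    h = 1e-6
    mmse_from_I = 2.0 * (I(beta + h) - I(beta - h)) / (2 * h)   # 2 dI/dβ
    mmse_direct = n * sigma2 / (1.0 + beta * sigma2)
    print(beta, mmse_from_I, mmse_direct)            # agree to ~1e-9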
Statistical Physics of the MMSE

$$I(X;Y) = \left\langle\log\frac{P(X|Y)}{P(X)}\right\rangle_\beta = \left\langle\log\frac{\exp\{-\beta\|Y-X\|^2/2\}}{\sum_{x} P(x)\exp\{-\beta\|Y-x\|^2/2\}}\right\rangle_\beta = -\frac{n}{2} - \langle\log Z(\beta|Y)\rangle_\beta,$$

where

$$Z(\beta|Y) = \sum_{x} P(x)\exp\{-\beta\|Y-x\|^2/2\},$$

and so,

$$\text{mmse}(X|Y) = 2\cdot\frac{dI(X;Y)}{d\beta} = -2\,\frac{d}{d\beta}\langle\log Z(\beta|Y)\rangle_\beta.$$

Similar to the internal energy, but here $\langle\cdot\rangle_\beta$ also depends on β.
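A Monte Carlo sketch of this identity for the binary prior from the earlier toy example (X uniform on $\{-1,+1\}$, n = 1, illustrative β): it differentiates $\langle\log Z(\beta|Y)\rangle_\beta$ with common random numbers and compares against the direct MMSE of $f^*(y) = \tanh(\beta y)$:

import numpy as np

rng = np.random.default_rng(5)
m = 400_000
X = rng.choice([-1.0, 1.0], size=m)
eps = rng.normal(size=m)                       # Z = eps / sqrt(beta)

def avg_log_Z(beta):
    Y = X + eps / np.sqrt(beta)
    # Z(beta|y) = (1/2)[exp(-beta (y-1)^2/2) + exp(-beta (y+1)^2/2)]
    return np.mean(np.log(0.5 * (np.exp(-beta * (Y - 1)**2 / 2)
                                 + np.exp(-beta * (Y + 1)**2 / 2))))

beta, h = 2.0, 1e-4
mmse_from_Z = -2.0 * (avg_log_Z(beta + h) - avg_log_Z(beta - h)) / (2 * h)

Y = X + eps / np.sqrt(beta)
mmse_direct = np.mean((X - np.tanh(beta * Y))**2)   # f*(y) = tanh(beta y)
print(mmse_from_Z, mmse_direct)                     # agree up to MC error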
Statistical Physics of the MMSE (Cont'd)

A more detailed derivation yields:

$$\text{mmse}(X|Y) = \frac{n}{\beta} + \text{Cov}\left\{\|Y-X\|^2,\ \log Z(\beta|Y)\right\}.$$

The term n/β is analogous to the energy equipartition theorem. The covariance term reflects the dependence of $\langle\cdot\rangle_\beta$ on β.
Statistical Physics of the MMSE (Cont'd)

In statistical mechanics, the entropy obeys

$$\Sigma(\beta) = \log Z(\beta) + \beta\langle\mathcal{E}(X)\rangle = \log Z(\beta) - \beta\,\frac{d\log Z(\beta)}{d\beta} \quad\Longleftarrow\quad \text{a differential equation}$$

$$\Rightarrow\quad \log Z(\beta) = -\beta E_0 + \beta\int_\beta^\infty \frac{d\hat\beta\,\Sigma(\hat\beta)}{\hat\beta^2}; \qquad E_0 = \text{ground-state energy}$$

$$\Rightarrow\quad \bar{E} = -\frac{d\log Z(\beta)}{d\beta} = E_0 - \left[\int_\beta^\infty\frac{d\hat\beta\,\Sigma(\hat\beta)}{\hat\beta^2} - \frac{\Sigma(\beta)}{\beta}\right].$$

The same applies to $\langle\log Z(\beta|Y)\rangle_\beta$, except that

$$\Sigma(\beta) \Longleftarrow \frac{\beta}{2}\,\text{Cov}\left\{\|Y-X\|^2,\ \log Z(\beta|Y)\right\} - I(X;Y), \qquad E_0 \Longleftarrow \frac{1}{2}\left\langle\min_{x}\|Y-x\|^2\right\rangle.$$
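A sketch checking the two statistical-mechanics identities above on a toy four-level system (the energy levels are made up, and the integral is truncated at a large β where Σ has decayed to ~0):

import numpy as np

E = np.array([0.2, 0.5, 0.9, 1.4])        # energy levels; E_0 = 0.2
logZ = lambda b: np.log(np.exp(-b * E).sum())

def Sigma(b):                              # Σ(β) = log Z(β) + β <E>
    p = np.exp(-b * E); p /= p.sum()
    return logZ(b) + b * (p * E).sum()

beta, h = 1.5, 1e-5
# Σ(β) = log Z(β) - β d log Z / dβ  (central differences)
print(Sigma(beta), logZ(beta) - beta * (logZ(beta + h) - logZ(beta - h)) / (2 * h))

# log Z(β) = -β E_0 + β ∫_β^∞ dβ' Σ(β')/β'^2, truncated at β' = 200
bs = np.linspace(beta, 200.0, 200_001)
W = np.exp(-np.outer(bs, E))               # Boltzmann weights, one row per β'
Sig = np.log(W.sum(1)) + bs * (W @ E) / W.sum(1)
f = Sig / bs**2
integral = ((f[1:] + f[:-1]) / 2 * np.diff(bs)).sum()   # trapezoid rule
print(logZ(beta), -beta * E.min() + beta * integral)    # agree closely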
Examples

Example 1 – Random Codebook on a Sphere Surface

$$Y = X + Z; \qquad X \sim \text{Unif}\{x_1,\ldots,x_M\}, \quad M = e^{nR}.$$

Codewords: drawn randomly, independently and uniformly on $\text{Surf}(\sqrt{n\sigma^2})$.

$$\lim_{n\to\infty}\frac{\langle I(X;Y)\rangle}{n} = \begin{cases} \frac{1}{2}\log(1+\beta\sigma^2) & \beta < \beta_R \\ R & \beta \ge \beta_R\end{cases}$$

where $\beta_R$ is the solution to the equation $R = \frac{1}{2}\log(1+\beta\sigma^2)$. Thus,

$$\lim_{n\to\infty}\frac{\text{mmse}(X|Y)}{n} = \begin{cases}\frac{\sigma^2}{1+\beta\sigma^2} & \beta < \beta_R \\ 0 & \beta \ge \beta_R\end{cases}$$

A first-order phase transition in the MMSE: at high temperature it behaves as if X were Gaussian, and at $\beta = \beta_R$ it jumps to zero!
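A tiny sketch of the limiting curve and its jump (R and σ² are arbitrary illustrations):

import numpy as np

R, sigma2 = 0.5, 1.0
beta_R = (np.exp(2 * R) - 1) / sigma2      # solves R = (1/2) log(1 + β σ²)

def mmse_per_dim(beta):                    # limiting mmse(X|Y)/n
    return sigma2 / (1 + beta * sigma2) if beta < beta_R else 0.0

for beta in np.linspace(0.5 * beta_R, 1.5 * beta_R, 7):
    print(f"beta = {beta:.3f}   mmse/n -> {mmse_per_dim(beta):.4f}")
# just below β_R the limit is σ²/(1+β_R σ²) = σ² e^{-2R} > 0; above it is 0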