Information-theoretic thresholds
Amin Coja-Oghlan (Goethe University Frankfurt)
Based on joint work with Florent Krzakala (ENS Paris), Will Perkins (Birmingham), and Lenka Zdeborová (CEA Saclay)
Inference from samples
- the task: infer an unknown probability distribution from samples
- the distribution itself is random, determined by parameters $\sigma^*$
Example: error-correcting codes
- $A \in \mathbb{F}_2^{m \times n}$ is the generator matrix
- $A\sigma^*$ is subjected to noise
Example: the stochastic block model
- random coloring $\sigma^* : V \to \{1,\dots,q\}$
- for each $e = \{v,w\}$ independently,
\[
  \mathbb{P}\big[e \in G^* \mid \sigma^*\big] = \frac{dq}{n} \cdot
  \begin{cases}
    \dfrac{e^{-\beta}}{q-1+e^{-\beta}} & \text{if } \sigma^*(v) = \sigma^*(w), \\[6pt]
    \dfrac{1}{q-1+e^{-\beta}} & \text{if } \sigma^*(v) \neq \sigma^*(w).
  \end{cases}
\]
- $d$ = signal strength; $e^{-\beta}$ = noise
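As a concrete sketch, the edge probabilities above can be turned into a sampler for $(\sigma^*, G^*)$. The function name and interface below are illustrative, not from the talk:

```python
import math
import random

def sample_sbm(n, q, d, beta, seed=0):
    """Sample a planted colouring sigma* and a graph G* with the edge
    probabilities above: an edge {v, w} appears independently with
    probability (d*q/n) * e^(-beta)/(q-1+e^(-beta)) if the colours
    agree and (d*q/n) * 1/(q-1+e^(-beta)) otherwise."""
    rng = random.Random(seed)
    sigma = [rng.randrange(q) for _ in range(n)]
    z = q - 1 + math.exp(-beta)
    p_same = (d * q / n) * math.exp(-beta) / z
    p_diff = (d * q / n) / z
    edges = [(v, w) for v in range(n) for w in range(v + 1, n)
             if rng.random() < (p_same if sigma[v] == sigma[w] else p_diff)]
    return sigma, edges
```

The normalisation $dq/n$ is what makes the expected degree equal to $d$: averaging the two cases over a uniform colour gives $\frac{dq}{n}\cdot\frac{e^{-\beta}/q + (q-1)/q}{q-1+e^{-\beta}} = \frac{d}{n}$ per potential neighbour.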
Example: the stochastic block model
- the agreement of $\sigma, \tau : V \to \{1,\dots,q\}$ is
\[
  \alpha(\sigma,\tau) = \frac{q}{q-1}\left(\max_{\kappa \in S_q} \frac{1}{n} \sum_{v \in V} \mathbf{1}\{\sigma(v) = \kappa \circ \tau(v)\} - \frac{1}{q}\right).
\]
- for what $d, \beta$ is it possible to recover $\tau_{G^*}$ such that $\mathbb{E}[\alpha(\sigma^*, \tau_{G^*})] \geq \Omega(1)$?
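A direct implementation of this overlap, brute-forcing the maximum over all $q!$ relabelings (so only sensible for small $q$); illustrative, not from the talk:

```python
import itertools

def agreement(sigma, tau, q):
    """alpha(sigma, tau): best fraction of vertices on which sigma and a
    relabeling kappa of tau agree, shifted and scaled so that independent
    uniform colourings score about 0 and identical ones score 1."""
    n = len(sigma)
    best = max(sum(sigma[v] == kappa[tau[v]] for v in range(n)) / n
               for kappa in itertools.permutations(range(q)))
    return q / (q - 1) * (best - 1 / q)
```

The maximum over $\kappa \in S_q$ is what makes the quantity meaningful: the colour classes of $\sigma^*$ can only ever be recovered up to a permutation of the colour names.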
Example: the stochastic block model
Easy–hard–impossible
- for large $d$, efficient algorithms should detect $\sigma^*$
- for very small $d$, there is nothing to detect
- in between, the problem may be well-posed but hard:
\[ 0 < d_{\mathrm{inf}}(\beta) < d_{\mathrm{alg}}(\beta) \]
Example: the stochastic block model
The algorithmic threshold
- combinatorial algorithms for large $d$ [1980s]
- spectral algorithms for moderate $d$ [1990s, 2000s]
- the Kesten–Stigum threshold [AS15]:
\[
  d_{\mathrm{alg}}(\beta) \stackrel{?}{=} \left(\frac{q-1+e^{-\beta}}{1-e^{-\beta}}\right)^2
\]
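For reference, a one-liner evaluating the conjectured Kesten–Stigum expression (illustrative helper, not from the talk):

```python
import math

def d_KS(q, beta):
    """Kesten-Stigum threshold ((q-1+e^(-beta)) / (1-e^(-beta)))^2,
    the conjectured algorithmic threshold d_alg(beta)."""
    return ((q - 1 + math.exp(-beta)) / (1 - math.exp(-beta))) ** 2
```

Two sanity checks on the formula: as $\beta \to \infty$ (no noise) it tends to $(q-1)^2$, the Kesten–Stigum threshold for the planted colouring model, and as $\beta \to 0$ it diverges, since the noise washes out the signal.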
Example: the stochastic block model
The information-theoretic threshold
- statistical physics prediction [DKMZ11]
- the case $q = 2$ [MNS13, MNS14, M14]
- bounds on $d_{\mathrm{inf}}(q,\beta)$ [BMNN16]
The information-theoretic threshold

Theorem [COKPZ16]. For $\beta > 0$, $d > 0$ let
\[
  B^*_{q,\beta}(d) = \sup\left\{ B_{q,\beta,d}(\pi) : \pi = T_{q,\beta,d}(\pi),\ \int \mu(i)\,\mathrm{d}\pi(\mu) = \frac{1}{q} \right\},
\]
where
\[
  T_{q,\beta,d} : \pi \mapsto \sum_{\gamma=0}^{\infty} \frac{d^{\gamma} \exp(-d)}{\gamma!} \int \frac{\frac{1}{q}\sum_{h=1}^{q} \prod_{j=1}^{\gamma} \big(1 - (1-e^{-\beta})\mu_j(h)\big)}{\big(1 - (1-e^{-\beta})/q\big)^{\gamma}}\, \delta_{\mathrm{BP}_{\mu_1,\dots,\mu_\gamma}}\, \mathrm{d}\pi^{\otimes\gamma}(\mu_1,\dots,\mu_\gamma),
\]
\[
  \mathrm{BP}_{\mu_1,\dots,\mu_\gamma}(i) = \frac{\prod_{j=1}^{\gamma}\big(1 - (1-e^{-\beta})\mu_j(i)\big)}{\sum_{h=1}^{q}\prod_{j=1}^{\gamma}\big(1 - (1-e^{-\beta})\mu_j(h)\big)},
\]
\[
  B_{q,\beta,d}(\pi) = \mathbb{E}\left[ \frac{\Lambda\Big(\sum_{\sigma=1}^{q}\prod_{i=1}^{\gamma}\big(1 - (1-e^{-\beta})\mu_i^{(\pi)}(\sigma)\big)\Big)}{q\big(1-(1-e^{-\beta})/q\big)^{\gamma}} - \frac{d}{2} \cdot \frac{\Lambda\Big(1 - (1-e^{-\beta})\sum_{\sigma=1}^{q}\mu_1^{(\pi)}(\sigma)\,\mu_2^{(\pi)}(\sigma)\Big)}{1-(1-e^{-\beta})/q} \right].
\]
Then
\[
  d_{\mathrm{inf}}(q,\beta) = \inf\left\{ d > 0 : B^*_{q,\beta}(d) > \ln q + \frac{d}{2}\ln\big(1-(1-e^{-\beta})/q\big) \right\}.
\]
The posterior distribution
- define
\[
  \psi_{G}(\sigma) = \prod_{\{v,w\} \in E(G)} \exp\big(-\beta\,\mathbf{1}\{\sigma(v) = \sigma(w)\}\big), \qquad
  Z(G) = \sum_{\sigma \in \Omega^V} \psi_{G}(\sigma).
\]
- then $\mathbb{P}\big[\sigma^* = \sigma \mid G^*\big] \asymp \mu_{G^*}(\sigma) = \psi_{G^*}(\sigma)/Z(G^*)$
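For tiny graphs, $Z(G)$ and the posterior $\mu_G$ can be brute-forced directly from these definitions by enumerating all $q^n$ colourings (an illustrative helper, not from the talk):

```python
import itertools
import math

def gibbs(n, edges, q, beta):
    """Return Z(G) and the posterior mu_G(sigma) = psi_G(sigma)/Z(G),
    where psi_G(sigma) is the product over edges {v, w} of
    exp(-beta * 1{sigma(v) == sigma(w)})."""
    def psi(s):
        return math.exp(-beta * sum(s[v] == s[w] for v, w in edges))
    weights = {s: psi(s) for s in itertools.product(range(q), repeat=n)}
    Z = sum(weights.values())
    return Z, {s: w / Z for s, w in weights.items()}
```

For example, on a triangle with $q = 2$ the two monochromatic colourings contribute $e^{-3\beta}$ each and the remaining six colourings $e^{-\beta}$ each, so $Z = 2e^{-3\beta} + 6e^{-\beta}$.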
The posterior distribution
- reconstruction is impossible iff
\[
  \lim_{n\to\infty} \frac{1}{n^2} \sum_{v,w} \mathbb{E}\,\big\| \mu_{G^*,v,w} - \mu_{G^*,v} \otimes \mu_{G^*,w} \big\|_{\mathrm{TV}} = 0.
\]
The posterior distribution
\[
  \lim_{n\to\infty} \frac{1}{n^2} \sum_{v,w} \mathbb{E}\,\big\| \mu_{G^*,v,w} - \mu_{G^*,v} \otimes \mu_{G^*,w} \big\|_{\mathrm{TV}} = 0
  \quad\Longleftrightarrow\quad
  \lim_{n\to\infty} \frac{1}{n}\,\mathbb{E}[\log Z(G^*)] = \log q + \frac{d}{2}\log\big(1-(1-e^{-\beta})/q\big).
\]
The Aizenman–Sims–Starr scheme
\[
  \lim_{n\to\infty} \frac{1}{n}\,\mathbb{E}[\log Z(G^*_n)] = \lim_{n\to\infty} \mathbb{E}\left[\log \frac{Z(G^*_{n+1})}{Z(G^*_n)}\right]
\]
The Aizenman–Sims–Starr scheme
\[
  \frac{Z(\tilde G + vw)}{Z(\tilde G)} = \sum_{\sigma,\tau \in [q]} e^{-\beta\,\mathbf{1}\{\sigma = \tau\}}\, \mu_{\tilde G, v, w}(\sigma, \tau)
\]
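This edge-addition identity can be checked numerically by brute force on a small graph (illustrative sketch, only feasible for $q^n$ small):

```python
import itertools
import math

def check_edge_addition(n, edges, v, w, q, beta):
    """Verify Z(G+vw)/Z(G) = sum_{s,t} e^(-beta*1{s=t}) * mu_{G,v,w}(s,t)
    by exhaustive enumeration; returns (lhs, rhs)."""
    def Z(es):
        return sum(math.exp(-beta * sum(s[a] == s[b] for a, b in es))
                   for s in itertools.product(range(q), repeat=n))
    Zg = Z(edges)
    marg = {}  # joint marginal of (sigma(v), sigma(w)) under mu_G
    for s in itertools.product(range(q), repeat=n):
        p = math.exp(-beta * sum(s[a] == s[b] for a, b in edges)) / Zg
        marg[(s[v], s[w])] = marg.get((s[v], s[w]), 0.0) + p
    lhs = Z(edges + [(v, w)]) / Zg
    rhs = sum(math.exp(-beta * (s == t)) * p for (s, t), p in marg.items())
    return lhs, rhs
```

The identity holds because adding the edge $vw$ multiplies each colouring's weight by $e^{-\beta\mathbf{1}\{\sigma(v)=\sigma(w)\}}$, which depends only on the pair $(\sigma(v), \sigma(w))$.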
Correlations
- $X$ = fixed finite set
- $\mu \in \mathcal{P}(X^n)$ for some large integer $n$
Correlations

Definition. A probability measure $\mu \in \mathcal{P}(X^n)$ is $\varepsilon$-symmetric if
\[
  \frac{1}{n^2} \sum_{i,j=1}^{n} \big\| \mu_{i,j} - \mu_i \otimes \mu_j \big\|_{\mathrm{TV}} < \varepsilon.
\]
Correlations

The magic lemma [COKPZ16]. For any $\varepsilon > 0$ there is a bounded random variable $T$ such that for all $\mu \in \mathcal{P}(X^n)$ the following is true:
- choose $U \subset \{1,\dots,n\}$ of size $T$ randomly
- sample $\hat\sigma$ from $\mu$
- let $\hat\mu(\tau) = \mu[\tau \mid \forall i \in U : \tau(i) = \hat\sigma(i)]$; then
\[
  \mathbb{P}\big[\hat\mu \text{ is } \varepsilon\text{-symmetric}\big] > 1 - \varepsilon.
\]
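The pinning operation itself is easy to sketch for an explicitly stored measure (illustrative; the substance of the lemma is the existence of a good size $T$, which is simply taken as an input here):

```python
import random

def pin(mu, t, rng):
    """One round of pinning: choose a set U of t coordinates uniformly,
    sample sigma-hat from mu, and condition mu on agreeing with sigma-hat
    on U. mu is a dict mapping tuples to probabilities."""
    n = len(next(iter(mu)))
    U = rng.sample(range(n), t)
    s_hat = rng.choices(list(mu), weights=list(mu.values()))[0]
    cond = {s: p for s, p in mu.items()
            if all(s[i] == s_hat[i] for i in U)}
    z = sum(cond.values())
    return {s: p / z for s, p in cond.items()}
```

Pinning with $t = 0$ leaves $\mu$ unchanged, while pinning all $n$ coordinates collapses it to a point mass; the lemma says a bounded number of pinned coordinates already kills most pairwise correlations, with high probability.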
Correlations

Lemma [BCO15]. For any $\varepsilon > 0$, $k \geq 3$ there is $\delta > 0$ such that for $n > 1/\delta$ and $\delta$-symmetric $\mu$,
\[
  \frac{1}{n^k} \sum_{i_1,\dots,i_k=1}^{n} \big\| \mu_{i_1,\dots,i_k} - \mu_{i_1} \otimes \cdots \otimes \mu_{i_k} \big\|_{\mathrm{TV}} < \varepsilon.
\]
Low-density generator matrix codes
- $A \in \mathbb{F}_2^{m \times n}$ with $k \geq 3$ ones per row
- signal $d = km/n$, noise $\beta$
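A minimal sketch of this setup, sampling $A$ and the noisy observation of $A\sigma^*$; function names and the channel model (independent bit flips with probability $\beta$) are illustrative assumptions:

```python
import random

def sample_ldgm(n, m, k, rng):
    """Random m x n generator matrix over F_2 with exactly k ones per row,
    stored as the list of row supports; signal strength is d = k*m/n."""
    return [sorted(rng.sample(range(n), k)) for _ in range(m)]

def transmit(rows, sigma, beta, rng):
    """Compute A*sigma over F_2 and flip each output bit independently
    with probability beta (binary symmetric channel)."""
    return [(sum(sigma[j] for j in row) + (rng.random() < beta)) % 2
            for row in rows]
```

With $\beta = 0$ the receiver sees $A\sigma^*$ exactly; the inference question is for which $(d, \beta)$ the noisy observation still carries $\Omega(n)$ bits of information about $\sigma^*$.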
Low-density generator matrix codes
\[
  I(\sigma^*, \tau \mid A) = \sum_{s,t} \mathbb{P}[\sigma^* = s, \tau = t]\, \log \frac{\mathbb{P}[\sigma^* = s, \tau = t]}{\mathbb{P}[\sigma^* = s]\,\mathbb{P}[\tau = t]}
\]
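Computed directly from a joint probability table (illustrative helper; natural logarithm, so the result is in nats):

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum_{x,y} p(x,y) * log(p(x,y) / (p(x) * p(y))) for a joint
    distribution given as a dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)
```

Two sanity checks: an independent joint gives $0$, and a perfectly correlated uniform pair on two values gives $\log 2$.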
Low-density generator matrix codes
- non-rigorous statistical physics analysis [KS99]
- upper bound on the mutual information, even $k$ [M05]
- existence of $\lim_{n\to\infty} \frac{1}{n} I(\sigma^*, \tau \mid A)$, even $k$ [AM15]
Low-density generator matrix codes

Theorem [CKPZ16]. For $k \geq 2$, $\beta > 0$, $d > 0$ and $\pi \in \mathcal{P}_0([-1,1])$ let
\[
  B_{d,\beta}(\pi) = \mathbb{E}\left[ \frac{1}{2}\Lambda\Big(\sum_{\sigma=\pm 1} \prod_{i=1}^{\gamma} \Big(1 + (1-2\beta)\,\sigma J_i \prod_{j=1}^{k-1} \theta^{\pi}_{i,j}\Big)\Big) - \frac{d(k-1)}{k}\,\Lambda\Big(1 + (1-2\beta)\, J \prod_{j=1}^{k} \theta^{\pi}_{j}\Big) \right].
\]
Then
\[
  \lim_{n\to\infty} \frac{1}{n} I(\sigma^*, \tau \mid A) = (1 + d/k)\log 2 + \beta\log\beta + (1-\beta)\log(1-\beta) - \sup_{\pi \in \mathcal{P}_0([-1,1])} B_{d,\beta}(\pi).
\]
The information-theoretic threshold is equal to
\[
  d_{\mathrm{inf}}(\beta) = \inf\Big\{ d > 0 : \sup_{\pi \in \mathcal{P}_0([-1,1])} B_{d,\beta}(\pi) > \log 2 \Big\}.
\]
Conclusions
- generalisation: the “teacher-student scheme”
- justification of the ‘replica symmetric cavity method’
- other applications:
  - random graph colouring
  - Goldreich’s one-way function
  - the diluted $p$-spin model