On the Thermodynamic Equivalence between Hopfield Networks and Hybrid Boltzmann Machines
Enrica Santucci
Based on: A. Barra, A. Bernacchia, E. Santucci, P. Contucci, "On the equivalence of Hopfield networks and Boltzmann machines", Neural Networks 34 (2012) 1-9
University of Cagliari, PRA Lab - Department of Electrical and Electronic Engineering
4 November 2016
Part I: Description of the models
Spin glass: Sherrington-Kirkpatrick (SK) model - 1975
A spin system whose low-temperature state appears disordered, rather than showing the uniform or periodic structure that one usually finds in conventional Ising magnets.
K. H. Fischer, J. A. Hertz (1991); M. Mezard, G. Parisi, M. Virasoro (1987)
Figure 1: Schematic representation of a spin-glass structure versus a ferromagnetic one
SK Hamiltonian
• N particles (with N very large)
• σ_i ∈ {−1, +1} Ising spin of the i-th particle (i = 1, ..., N)
• J_ij ~ N(0, 1) interaction matrix between the lattice particles
• T temperature of the system (β = 1/T)

Hamiltonian
    H_sg(σ, J) = −(β/√N) Σ_{1≤i,j≤N} J_ij σ_i σ_j    (1)

• frustration: we cannot simultaneously minimize all the terms of the Hamiltonian, because the interactions J_ij are random variables of either sign
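As a minimal numerical sketch of Eq. (1) (toy sizes; the function name, seed, and the convention of counting each pair (i, j) once are this sketch's choices), the energy of one spin configuration for one sample of the quenched disorder J can be evaluated directly:

```python
import numpy as np

def sk_energy(sigma, J, beta=1.0):
    # H_sg(sigma, J) = -(beta / sqrt(N)) * sum_{i<j} J_ij sigma_i sigma_j,
    # counting each pair (i, j) once via the upper triangle of J
    N = len(sigma)
    return -beta / np.sqrt(N) * (sigma @ np.triu(J, k=1) @ sigma)

rng = np.random.default_rng(0)
N = 100
J = rng.standard_normal((N, N))       # J_ij ~ N(0, 1), quenched disorder
sigma = rng.choice([-1, 1], size=N)   # a random Ising configuration
E = sk_energy(sigma, J)
```

Because the J_ij take both signs, flipping any single spin lowers some terms while raising others: this is the frustration mentioned above.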
SK Phase Diagram
• Partition function: Z_N(β) = Σ_σ exp(−H_sg(σ, J))
• Average over the interactions (quenched average): E[F(J)] = ∫ dμ(J) F(J)
• Free energy: f_N(β) = −(1/(βN)) E ln Z_N(β)
• Order parameters: m = (1/N) Σ_{i=1}^N σ_i,   q_ab = (1/N) Σ_{i=1}^N σ_i^(a) σ_i^(b)
• Free energy for N → ∞
• Minimization of the free energy with respect to the order parameters
• Self-consistency equations
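For a system small enough to enumerate all 2^N spin configurations, the partition function and free-energy density defined above can be computed by brute force; a sketch for a single disorder sample (the quenched average E[...] over J is omitted, and all sizes are toy values):

```python
import numpy as np
from itertools import product

def sk_free_energy(beta, J):
    # f_N(beta) = -(1 / (beta * N)) * ln Z_N(beta) for ONE disorder sample J;
    # the quenched average would repeat this over many samples of J
    N = J.shape[0]
    Z = 0.0
    for spins in product([-1, 1], repeat=N):   # all 2^N configurations
        s = np.array(spins)
        H = -beta / np.sqrt(N) * (s @ np.triu(J, k=1) @ s)
        Z += np.exp(-H)
    return -np.log(Z) / (beta * N)

rng = np.random.default_rng(1)
J = rng.standard_normal((8, 8))
f = sk_free_energy(1.0, J)
```

As a sanity check, for β → 0 the entropy dominates, Z_N → 2^N, and β f_N(β) → −ln 2.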
Gaussian spin glass (A. Barra, G. Genovese, F. Guerra - 2012)
• z_i ~ N(0, 1), i = 1, ..., N (Gaussian spins)
• J_ij ~ N(0, 1), i, j = 1, ..., N

Hamiltonian
    H_N(z, J) = −(β/√N) Σ_{1≤i<j≤N} J_ij z_i z_j    (2)

• Order parameter: q_ab = (1/N) Σ_{i=1}^N z_i^(a) z_i^(b)
• Free energy for N → ∞
• Minimization of the free energy with respect to the order parameters
• Self-consistency equation
Hopfield Model (HM) - 1982
[Figure: fully connected network of neurons σ_1, ..., σ_5]
• Stored patterns: ξ^μ = (ξ^μ_1, ..., ξ^μ_N), ξ^μ_i ∈ {−1, +1}, μ = 1, ..., P
• Digital units (activation levels): σ = (σ_1, ..., σ_N), σ_i ∈ {−1, +1}
• Activation function: sign function
• Two-way information flow
• Symmetric synapses (J_ij = J_ji)
HM Hamiltonian

Hamiltonian
    H_hop(σ, J) = −(β/N) Σ_{1≤i,j≤N} J_ij σ_i σ_j    (3)

Hebbian learning rule
    J_ij = Σ_{μ=1}^P ξ^μ_i ξ^μ_j,   ∀ i, j = 1, ..., N

• Order parameters: m_μ = (1/N) Σ_{i=1}^N ξ^μ_i σ_i,   q_ab = (1/N) Σ_{i=1}^N σ_i^(a) σ_i^(b)
• Free energy for N → ∞
• Minimization of the free energy with respect to the order parameters
• Self-consistency equations
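A minimal sketch of Hebbian storage and the deterministic sign-function update (toy sizes; variable names are illustrative): for P ≪ N, a stored pattern is, with high probability, a fixed point of the dynamics, i.e. its Mattis overlap m_μ stays near 1.

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 200, 5
patterns = rng.choice([-1, 1], size=(P, N))   # xi^mu, mu = 1, ..., P

J = patterns.T @ patterns                     # Hebbian rule: J_ij = sum_mu xi^mu_i xi^mu_j
np.fill_diagonal(J, 0)                        # no self-coupling

def update(sigma):
    # synchronous update: sigma_i -> sign(sum_j J_ij sigma_j)
    return np.sign(J @ sigma).astype(int)

# starting on pattern 0, the Mattis overlap m_0 stays close to 1
m0 = patterns[0] @ update(patterns[0]) / N
```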
Analogy between the Sherrington-Kirkpatrick and Hopfield models
• N particles ←→ neurons
• σ_i Ising spin ←→ neuronal activation level
• J_ij spin interactions ←→ synapses
• T temperature ←→ noise level
For P → ∞:
    Sherrington-Kirkpatrick ⇐⇒ Hopfield
HM Phase Diagram
• α = lim_{N→∞} P/N control parameter (high-storage regime)
• T temperature
• Retrieval phase F (0 < α ≤ 0.05)
• Mixed phase M (0.05 < α ≤ 0.14)
• Spin-glass phase SG (α > 0.14)
• Paramagnetic phase P (high temperature)
Boltzmann Machine (G. E. Hinton, T. J. Sejnowski - 1983)
[Figure: visible units σ_1, ..., σ_5 connected to hidden units z_1, z_2, z_3 and τ_1, τ_2]
• Digital visible layer: σ_i ∈ {−1, +1} (i = 1, ..., N)
• Two analog hidden layers: z_μ, τ_ν ~ N(0, 1), μ = 1, ..., P, ν = 1, ..., K
• Activation function: sigmoid function
• Two-way information flow
• Symmetric synaptic weights ξ^μ_i, η^ν_i
Restricted and Hybrid version of the Boltzmann Machine (RHBM)

Assumptions
• hybrid: one digital layer of visible units and two analog layers of hidden units
• restricted: no connections between the hidden layers

Hamiltonian
    H_rhbm(β, σ, z, τ; ξ, η) = (β/2) Σ_{μ=1}^P z_μ² + (β/2) Σ_{ν=1}^K τ_ν² − (β/√N) ( Σ_{i,μ=1}^{N,P} σ_i ξ^μ_i z_μ + Σ_{i,ν=1}^{N,K} σ_i η^ν_i τ_ν )    (4)
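A direct evaluation of a Hamiltonian of this form (the β/√N coupling normalization and all sizes are this sketch's assumptions, not prescriptions from the slides):

```python
import numpy as np

def h_rhbm(beta, sigma, z, tau, xi, eta):
    # quadratic terms of the two analog hidden layers
    quadratic = 0.5 * beta * (z @ z + tau @ tau)
    # visible-hidden couplings through the weights xi^mu_i and eta^nu_i
    coupling = beta / np.sqrt(len(sigma)) * (sigma @ xi.T @ z + sigma @ eta.T @ tau)
    return quadratic - coupling

rng = np.random.default_rng(3)
N, P, K = 20, 3, 2
xi = rng.choice([-1, 1], size=(P, N))
eta = rng.choice([-1, 1], size=(K, N))
sigma = rng.choice([-1, 1], size=N)
z, tau = rng.standard_normal(P), rng.standard_normal(K)
E = h_rhbm(2.0, sigma, z, tau, xi, eta)
```

Note the "restricted" assumption: there is no z-τ term, so the two hidden layers interact only through the visible layer.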
Part II: Results
Dynamics of the hidden layers

Ornstein-Uhlenbeck diffusion process
    D dz_μ/dt = −z_μ(t) + (1/√N) Σ_{i=1}^N ξ^μ_i σ_i + √(2D/β) ζ_μ(t)
    D* dτ_ν/dt = −τ_ν(t) + (1/√N) Σ_{i=1}^N η^ν_i σ_i + √(2D*/β) ρ_ν(t)

• ζ, ρ white Gaussian noises
• D, D* quantifiers of the timescale of the dynamics
• β controls the strength of the fluctuations (noise amplitude ∝ 1/√β)

Probability distribution of the hidden variables (stationary law)
    Pr(z_μ | σ) = √(β/2π) exp[ −(β/2) ( z_μ − (1/√N) Σ_{i=1}^N ξ^μ_i σ_i )² ]
    Pr(τ_ν | σ) = √(β/2π) exp[ −(β/2) ( τ_ν − (1/√N) Σ_{i=1}^N η^ν_i σ_i )² ]
for μ = 1, ..., P and ν = 1, ..., K
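An Euler-Maruyama discretization of the z-equation (toy parameters; the step size, chain count, and 1/√N normalization are this sketch's choices) confirms that the process equilibrates to the Gaussian above, with mean (1/√N) Σ_i ξ^μ_i σ_i and variance 1/β:

```python
import numpy as np

rng = np.random.default_rng(4)
beta, D, N, dt, chains, steps = 4.0, 1.0, 50, 1e-2, 2000, 4000

xi = rng.choice([-1, 1], size=N)
sigma = rng.choice([-1, 1], size=N)
m = xi @ sigma / np.sqrt(N)            # fixed point of the drift

z = np.full(chains, m)                 # independent copies of one hidden unit
for _ in range(steps):
    drift = -(z - m) / D * dt          # relaxation toward m on timescale D
    noise = np.sqrt(2 * dt / (D * beta)) * rng.standard_normal(chains)
    z += drift + noise
# at stationarity: z.mean() ~ m and z.var() ~ 1/beta
```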
Dynamics of the visible layer
    σ_i(t+1) = sign( Σ_{μ=1}^P ξ^μ_i z_μ(t) + Σ_{ν=1}^K η^ν_i τ_ν(t) − T_i ),   i = 1, ..., N

• t discrete time unit
• T_i threshold potential

Probability distribution of the visible units (Glauber dynamics)
    Pr(σ_i | z) = exp[ (β/√N) σ_i Σ_{μ=1}^P ξ^μ_i z_μ ] / ( exp[ (β/√N) Σ_{μ=1}^P ξ^μ_i z_μ ] + exp[ −(β/√N) Σ_{μ=1}^P ξ^μ_i z_μ ] )
    Pr(σ_i | τ) = exp[ (β/√N) σ_i Σ_{ν=1}^K η^ν_i τ_ν ] / ( exp[ (β/√N) Σ_{ν=1}^K η^ν_i τ_ν ] + exp[ −(β/√N) Σ_{ν=1}^K η^ν_i τ_ν ] )

Factorization over units:
    Pr(z | σ) = Π_{μ=1}^P Pr(z_μ | σ),   Pr(τ | σ) = Π_{ν=1}^K Pr(τ_ν | σ)
    Pr(σ | z) = Π_{i=1}^N Pr(σ_i | z),   Pr(σ | τ) = Π_{i=1}^N Pr(σ_i | τ)
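One alternating ("block-Gibbs") sweep of these two conditional laws, for a single hidden layer (K = 0) and toy sizes; the β/√N normalization follows the convention assumed in this sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
beta, N, P = 2.0, 100, 3
xi = rng.choice([-1, 1], size=(P, N))

def gibbs_sweep(sigma):
    # hidden | visible: z_mu ~ Normal((1/sqrt(N)) sum_i xi^mu_i sigma_i, 1/beta)
    z = xi @ sigma / np.sqrt(N) + rng.standard_normal(P) / np.sqrt(beta)
    # visible | hidden: Pr(sigma_i = +1 | z) = 1 / (1 + exp(-2 beta h_i)),
    # with local field h_i = (1/sqrt(N)) sum_mu xi^mu_i z_mu
    h = xi.T @ z / np.sqrt(N)
    p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
    return np.where(rng.random(N) < p_up, 1, -1)

sigma = gibbs_sweep(rng.choice([-1, 1], size=N))
```

Each half-step samples an entire layer at once, which is possible precisely because the conditional distributions factorize over units.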
Statistical equivalence between Hopfield network and Boltzmann machine

    Pr(σ, z, τ) ∝ exp[ −H_rhbm(σ, z, τ) ]
⇓ (integrating out the hidden variables z and τ)
    Pr(σ) ∝ exp[ (β/2N) Σ_{i,j=1}^N ( Σ_{μ=1}^P ξ^μ_i ξ^μ_j + Σ_{ν=1}^K η^ν_i η^ν_j ) σ_i σ_j ] = exp[ −H_hop(σ) ]

• The thermodynamics of the visible units of an RHBM is equivalent to that of a Hopfield network
• The dynamics of a Hopfield network, which requires updating N neurons and storing N² synapses, can be simulated by an RHBM, which requires updating N + P neurons but storing only NP synapses
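The Gaussian integration behind this marginalization can be checked numerically for a single hidden unit (toy numbers; the β/√N coupling convention is this sketch's assumption): integrating exp(−βz²/2 + (β/√N) m z) over z reproduces the Hopfield factor exp(β m²/2N), up to a σ-independent constant.

```python
import numpy as np

beta, N = 1.7, 5
m = 2.0                                   # m = sum_i xi_i sigma_i for some fixed sigma
z = np.linspace(-30.0, 30.0, 200001)
dz = z[1] - z[0]

# left side: marginal weight obtained by integrating the hidden unit out
integrand = np.exp(-0.5 * beta * z**2 + beta * m / np.sqrt(N) * z)
lhs = integrand.sum() * dz

# right side: Gaussian normalization times the effective Hopfield factor
rhs = np.sqrt(2 * np.pi / beta) * np.exp(beta * m**2 / (2 * N))
# lhs and rhs agree to numerical-integration accuracy
```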
Counterpart of the HM Phase Diagram in an RHBM
• N number of neurons ←→ number of visible units
• P, K number of stored patterns ←→ number of hidden units
• ξ, η stored patterns ←→ synaptic weights
    Hopfield model ⇐⇒ Boltzmann machine
• Retrieval phase ←→ few hidden units
• Spin-glass phase ←→ too many hidden units
Numerical simulations of the RHBM with a single hidden layer, for different values of the parameters β (= 1/T) and P
• β = 0.5 (high T): no retrieval is possible, regardless of the number of hidden units P
• β = 2 (intermediate T): retrieval is possible, provided the number of hidden units is not too large
• β = 10 (low T): retrieval is maintained up to large values of P
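These regimes can be probed with a small simulation of the σ ↔ z block-Gibbs dynamics (single hidden layer; the sizes, sweep count, and β/√N convention are this sketch's choices, not the paper's exact protocol): start from a stored pattern and measure the Mattis overlap after a number of sweeps.

```python
import numpy as np

rng = np.random.default_rng(6)

def retrieval_overlap(beta, N, P, sweeps=100):
    xi = rng.choice([-1, 1], size=(P, N))        # stored patterns / synaptic weights
    sigma = xi[0].copy()                         # initialize on pattern 0
    for _ in range(sweeps):
        # hidden | visible: Gaussian, mean (1/sqrt(N)) sum_i xi^mu_i sigma_i, var 1/beta
        z = xi @ sigma / np.sqrt(N) + rng.standard_normal(P) / np.sqrt(beta)
        # visible | hidden: Glauber rule with local field h_i
        h = xi.T @ z / np.sqrt(N)
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))
        sigma = np.where(rng.random(N) < p_up, 1, -1)
    return abs(xi[0] @ sigma) / N                # Mattis overlap with pattern 0

# high T (beta = 0.5): the overlap decays toward 0 (no retrieval);
# low T (beta = 10) with few hidden units: the overlap stays near 1
```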
Noise Source (I): Connection between the hidden layers

    H̃_rhbm(σ, z, τ; ξ, η) = (β/2) Σ_{μ=1}^P z_μ² + (β/2) Σ_{ν=1}^K τ_ν² − (β/√N) ( Σ_{i,μ=1}^{N,P} ξ^μ_i σ_i z_μ + Σ_{i,ν=1}^{N,K} η^ν_i σ_i τ_ν ) + ε Σ_{μ,ν=1}^{P,K} ζ^ν_μ z_μ τ_ν

⇓ Integration over z_μ and τ_ν ⇓

    H̃_hop(σ; ξ, η) = −(β/2N) Σ_{i,j=1}^N [ (1/(1 − εβγ/4)) Σ_{μ=1}^{αN} ξ^μ_i ξ^μ_j + (1/(1 − εβα/4)) Σ_{ν=1}^{γN} η^ν_i η^ν_j ] σ_i σ_j

(with α = P/N and γ = K/N)
Noise Source (II): System subjected to an external field

    z_μ, τ_ν ~ N(0, 1)  →  z_μ ~ N(z_0, 1), τ_ν ~ N(τ_0, 1)

    H̃_rhbm(σ, z, τ; ξ, η) = (β/2) Σ_{μ=1}^P (z_μ − z_0)² + (β/2) Σ_{ν=1}^K (τ_ν − τ_0)² − (β/√N) ( Σ_{i,μ=1}^{N,P} ξ^μ_i σ_i z_μ + Σ_{i,ν=1}^{N,K} η^ν_i σ_i τ_ν )

⇓

    H_hop(σ) → H̃_hop(σ) = H_hop(σ) + β z_0 Σ_{i=1}^N χ_i σ_i + β τ_0 Σ_{i=1}^N ψ_i σ_i

• χ_i = (1/√P) Σ_{μ=1}^P ξ^μ_i,   ψ_i = (1/√K) Σ_{ν=1}^K η^ν_i   external random fields