Deep Learning and Physics 2019 @ YITP
From Complex to Simple: Hierarchical Free-energy Landscape Renormalized in Deep Neural Networks
Hajime Yoshino, Cybermedia Center & Department of Physics, Osaka Univ.
H. Yoshino, arXiv:1910.09918, submitted to SciPost Phys.; referee round deadline Nov. 26th… comments are welcome!
[Q] Why do deep neural networks work?
Common sense: learning is difficult
1. poor generalization: overfitting when # of parameters greatly exceeds data size (e.g. $10^6 \ll 10^8$)
2. many local minima (glassy dynamics)
Empirical observations on deep networks
1. generalization is not bad. Why???
2. learning is not too difficult. Why???
G. Carleo et al., arXiv:1903.10563v1
History: Spin glasses, Hopfield model, error correcting codes, ....
• Edwards-Anderson model (1975)
  $H = -\sum_{\langle i,j \rangle} J_{ij} s_i s_j$, $s_i = \pm 1$ $(i = 1, 2, \ldots, N)$
  Quenched random spin-spin interactions ($J_{ij} > 0$ or $J_{ij} < 0$): $\overline{J_{ij}} = 0$, $\overline{J_{ij}^2} = J^2$
  Ground state: disordered. A lot of energetic degeneracies due to frustration.
• Hopfield model (1982): associative memory
  Firing of neurons: $s_i = \pm 1$; synaptic weight: $J_{ij}$
  Hebb rule: $J_{ij} = \frac{1}{\sqrt{M}} \sum_{\mu=1}^{M} \xi_i^\mu \xi_j^\mu$, with embedded patterns $\xi_i^\mu = \pm 1$
• Error correcting code: a statistical inference problem, Sourlas (1989)
  Send $J_{ij}$; the receiver tries to reconstruct $s_i$.
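The Hebb rule on this slide can be tried out in a few lines. The following is a minimal sketch, not the speaker's code: it assumes zero self-couplings ($J_{ii} = 0$), zero-temperature asynchronous updates, and three hand-picked mutually orthogonal patterns; the function names `hebb_couplings`, `energy`, and `recall` are hypothetical.

```python
import math

def hebb_couplings(patterns):
    """Hebb rule J_ij = (1/sqrt(M)) * sum_mu xi_i^mu xi_j^mu (J_ii = 0 is an assumption)."""
    M = len(patterns)
    N = len(patterns[0])
    J = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            if i != j:
                J[i][j] = sum(p[i] * p[j] for p in patterns) / math.sqrt(M)
    return J

def energy(J, s):
    """H = -sum_{i<j} J_ij s_i s_j, each pair counted once."""
    N = len(s)
    return -sum(J[i][j] * s[i] * s[j] for i in range(N) for j in range(i + 1, N))

def recall(J, s, sweeps=5):
    """Zero-temperature asynchronous dynamics: align each spin with its local field."""
    s = list(s)
    for _ in range(sweeps):
        for i in range(len(s)):
            h = sum(J[i][j] * s[j] for j in range(len(s)))
            if h != 0:
                s[i] = 1 if h > 0 else -1
    return s

# Three mutually orthogonal patterns on N = 16 neurons (illustrative choice).
patterns = [[1] * 16, [1, -1] * 8, [1, 1, -1, -1] * 4]
J = hebb_couplings(patterns)

# Corrupt the first pattern at two sites, then let the network restore it.
noisy = list(patterns[0])
noisy[0], noisy[5] = -noisy[0], -noisy[5]
restored = recall(J, noisy)
overlap = sum(a * b for a, b in zip(restored, patterns[0])) / len(restored)
print(overlap, energy(J, noisy), energy(J, restored))
```

With so few patterns the corrupted input relaxes back to the stored one, and the energy decreases along the way; this is the associative-memory behavior the slide refers to.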
From spin glass to structural glass
• p-spin to RFOT: Kirkpatrick-Thirumalai-Wolynes (1989)
• more on replicas: Franz-Parisi (1995), Monasson (1995), Mezard-Parisi (1999)
• $d \to \infty$: Parisi-Zamponi (2010), Charbonneau-Kurchan-Parisi-Urbani-Zamponi (2014)
• Yoshino-Mezard (2010), Yoshino-Zamponi (2014), Rainone-Urbani-Yoshino-Zamponi (2015)
• Jin-Yoshino (2017), Jin-Urbani-Zamponi-Yoshino (2018)
… and back to p-spins… but with spin components $M \to \infty$ and without quenched disorder
H. Yoshino, SciPost Phys. 4 (6), 040 (2018)
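Glass order parameter with replicas
Spontaneous breaking of ergodicity (liquid, glass, crystal), probed with replicas $a = 1, 2, \ldots, n$ coupled by a symmetry breaking field $\epsilon_{ab}$:
$H[\hat{\epsilon}] = \sum_{a=1}^{n} H(S^a) - \sum_{a,b} \epsilon_{ab} \sum_{i=1}^{N} S_i^a S_i^b$
$-\beta F[\hat{\epsilon}] = \ln \mathrm{Tr}_S \, e^{-\beta H[\hat{\epsilon}]}$
Overlap: $Q_{ab} \equiv \frac{1}{N} \sum_{i=1}^{N} \langle S_i^a S_i^b \rangle = -\frac{1}{N} \frac{\partial F[\hat{\epsilon}]}{\partial \epsilon_{ab}}$
We are interested in $\lim_{\epsilon_{ab} \to 0^+} \lim_{N \to \infty} Q_{ab}$: $Q_{ab} = 0$ (liquid), $Q_{ab} > 0$ (glass).
Legendre transform: $G[\hat{Q}] = F[\hat{\epsilon}] + N \sum_{ab} \epsilon_{ab} Q_{ab}$, with saddle point $\epsilon_{ab} = \frac{1}{N} \left. \frac{\partial G[\hat{Q}]}{\partial Q_{ab}} \right|_{\hat{Q} = \hat{Q}_{\rm SP}} = 0$
Explicit RSB: Parisi-Virasoro (1989)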
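As a concrete reading of the overlap $Q_{ab} = \frac{1}{N} \sum_i S_i^a S_i^b$, here is a minimal sketch that evaluates it for single spin configurations. It is an illustration only: the thermal average $\langle \cdots \rangle$ in the definition is omitted, and the three replica configurations are made-up inputs.

```python
def overlap(sa, sb):
    """Q_ab = (1/N) * sum_i S_i^a S_i^b for two single configurations (no thermal average)."""
    return sum(x * y for x, y in zip(sa, sb)) / len(sa)

def overlap_matrix(replicas):
    """n x n overlap matrix for a list of n replica configurations."""
    n = len(replicas)
    return [[overlap(replicas[a], replicas[b]) for b in range(n)] for a in range(n)]

# Hypothetical Ising configurations of n = 3 replicas with N = 8 spins.
replicas = [
    [1, 1, 1, 1, -1, -1, 1, -1],
    [1, 1, 1, 1, -1, -1, -1, 1],
    [-1, 1, -1, 1, 1, -1, 1, 1],
]
Q = overlap_matrix(replicas)
for row in Q:
    print(row)
```

Replicas 1 and 2 differ on two sites and have overlap 0.5, while replica 3 is uncorrelated with both and has overlap 0. The diagonal is 1 for Ising spins.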
Replica symmetry breaking and ultra-metricity
G. Parisi (1979): first found in the SK model for spin glass
breaking of ergodicity & permutation symmetry
overlap matrix $Q_{ab}$
Edwards-Anderson (EA) order parameter (self-overlap): $q_{\rm EA} = \lim_{b \to a} Q_{ab}$
ultra-metricity: $Q(a,b) = \min(Q(a,c), Q(b,c))$, Rammal-Toulouse-Virasoro (1986)
Toulouse-Dehaene-Changeux (1986)
(Figure labels: a, b, c; overlap; distance)
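The ultra-metricity relation can be checked mechanically on an overlap matrix. The sketch below tests the inequality form $Q(a,b) \ge \min(Q(a,c), Q(b,c))$ over all triples, which is equivalent to requiring that the two smallest overlaps in every triple are equal; the slide's equality is the case where $(a,b)$ is the least similar pair. Both example matrices are made up for illustration.

```python
from itertools import permutations

def is_ultrametric(Q, tol=1e-12):
    """True if Q[a][b] >= min(Q[a][c], Q[b][c]) for every triple of distinct replicas."""
    n = len(Q)
    for a, b, c in permutations(range(n), 3):
        if Q[a][b] < min(Q[a][c], Q[b][c]) - tol:
            return False
    return True

# Two clusters of two replicas each: overlap 0.6 inside a cluster, 0.2 between clusters.
Q_tree = [
    [1.0, 0.6, 0.2, 0.2],
    [0.6, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.6],
    [0.2, 0.2, 0.6, 1.0],
]

# Three distinct overlaps in one triple: not ultrametric.
Q_bad = [
    [1.0, 0.6, 0.2],
    [0.6, 1.0, 0.5],
    [0.2, 0.5, 1.0],
]

print(is_ultrametric(Q_tree), is_ultrametric(Q_bad))
```

The block-structured matrix corresponds to a tree with two branches, so every triple passes; the second matrix fails because replicas a and c are less similar to each other than either is to b.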
“Disorder-free” vector p-spin models on tree
H. Yoshino, SciPost Phys. 4 (6), 040 (2018)
Spins: $S_i = (S_i^1, S_i^2, \ldots, S_i^M)$, $|S_i|^2 = M$, $i = 1, 2, \ldots, N$; continuous or Ising $S_i^\mu = \pm 1$
Hamiltonian: $H = -\sum_{\blacksquare} V(r_\blacksquare)$
“gap”: $r_\blacksquare = \delta - \frac{1}{\sqrt{M}} \sum_{\mu=1}^{M} S_{1(\blacksquare)}^\mu S_{2(\blacksquare)}^\mu \cdots S_{p(\blacksquare)}^\mu$
Locally tree-like lattice with connectivity $c = \alpha M$
# of factor nodes: $N_\blacksquare = Nc/p = NM\alpha/p < N^p/p!$
“Intermediate sparseness”: high connectivity but not “global coupling”
(Figure: example with $p = 3$, $c = 4$; spins labeled $S_0$, $S_1$, $S_2$)
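To make the “gap” and the factor-node count concrete, here is a minimal sketch that evaluates $r = \delta - \frac{1}{\sqrt{M}} \sum_\mu S_1^\mu \cdots S_p^\mu$ for one factor node and compares $N c / p$ with the fully connected count $N^p/p!$. The spin values and the parameters ($N = 12$, $M = 4$, $\alpha = 1$, $\delta = 0.5$) are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math

def gap(spins, delta):
    """Gap of one factor node: delta - (1/sqrt(M)) * sum_mu prod_k S_k^mu over its p spins."""
    M = len(spins[0])
    total = sum(math.prod(s[mu] for s in spins) for mu in range(M))
    return delta - total / math.sqrt(M)

def num_factor_nodes(N, M, alpha, p):
    """N c / p with connectivity c = alpha * M."""
    c = alpha * M
    return N * c / p

# p = 3 Ising spins with M = 4 components each.
orthogonal = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1]]
aligned = [[1, 1, 1, 1]] * 3

r_orthogonal = gap(orthogonal, delta=0.5)  # component products sum to zero
r_aligned = gap(aligned, delta=0.5)        # component products sum to M

n_sparse = num_factor_nodes(N=12, M=4, alpha=1.0, p=3)
n_dense = 12 ** 3 / math.factorial(3)
print(r_orthogonal, r_aligned, n_sparse, n_dense)
```

For these values the sparse lattice has 16 factor nodes against 288 for global coupling, which illustrates the “intermediate sparseness” regime named on the slide.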
“Vectorial” constraint satisfaction problems
• standard discrete coloring: antiferromagnetic Potts model, $H = \sum_{i,j} \delta_{q_i, q_j}$
• “continuous” version: “repulsive” vectorial spin model, $H = \sum_{i,j} V\left( \delta - \frac{S_i \cdot S_j}{\sqrt{M}} \right)$, with $V(r) = \lim_{\epsilon \to \infty} \epsilon r^2 \theta(-r)$
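The two cost functions on this slide can be compared side by side. The sketch below is an illustration under stated assumptions: the potential is evaluated at a finite $\epsilon$ rather than in the hard-core limit $\epsilon \to \infty$, each edge is counted once, and the triangle graph, colors, and spins are made-up inputs.

```python
import math

def potts_energy(colors, edges):
    """Discrete coloring cost: number of edges whose endpoints share a color."""
    return sum(1 for i, j in edges if colors[i] == colors[j])

def soft_core(r, eps):
    """V(r) = eps * r^2 * theta(-r) at finite eps (the slide takes eps -> infinity)."""
    return eps * r * r if r < 0 else 0.0

def vector_energy(spins, edges, delta, eps):
    """Continuous version: sum over edges of V(delta - S_i . S_j / sqrt(M))."""
    M = len(spins[0])
    total = 0.0
    for i, j in edges:
        dot = sum(x * y for x, y in zip(spins[i], spins[j]))
        total += soft_core(delta - dot / math.sqrt(M), eps)
    return total

edges = [(0, 1), (1, 2), (0, 2)]  # a triangle

proper = potts_energy([0, 1, 2], edges)    # all neighbors differ
improper = potts_energy([0, 0, 1], edges)  # one monochromatic edge

# M = 4 component spins: mutually orthogonal vs. two parallel neighbors.
spread = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1]]
clash = [[1, 1, 1, 1], [1, 1, 1, 1], [-1, -1, -1, -1]]
e_spread = vector_energy(spread, edges, delta=0.0, eps=10.0)
e_clash = vector_energy(clash, edges, delta=0.0, eps=10.0)
print(proper, improper, e_spread, e_clash)
```

In both models a constraint is violated when neighbors are too similar: identical colors in the Potts case, a negative gap (large $S_i \cdot S_j$) in the vectorial case.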
$M \to \infty$: clustering = glass transition
H. Yoshino, SciPost Phys. 4 (6), 040 (2018)
(Figure: phase diagram, connectivity $\alpha = c/M$ vs. $\delta$. Liquid: $q = 0$; glass: $q \neq 0$; continuous RSB: hierarchical clustering of solutions; jamming (SAT-UNSAT).)
(Figure, panel b: $1 - q(x)$ vs. $x$ on log-log axes, curves labeled 6.7, 6.71, 6.72, 6.722, 6.724, 6.726; power law $1 - q(x) = x^{-\kappa}$, $\kappa = 1.415726\ldots$)
Same jamming criticality as hard spheres and perceptron (p=1): Franz-Parisi (2016), Franz-Parisi-Sevelev-Urbani-Zamponi (2017)