Deep Learning and Physics -- 2019
Deep Random Neural Field
Shun-ichi Amari, RIKEN Center for Brain Science; Araya
Brief History of AI and NN
First Boom (from 1956~):
- AI: Dartmouth Conf.; symbols, universal computation, logic
- neural networks: perceptron, learning machine
Dark period (late 1960s ~ 1970s)
- stochastic gradient descent learning (1967) for MLP
Perceptron (F. Rosenblatt, Principles of Neurodynamics, 1961)
- McCulloch-Pitts neurons (0, 1 binary), learning
- multilayer, lateral & feedback connections
[figure: perceptron mapping input x to output z]
Rosenblatt: multilayer perceptron -> Deep Neural Networks
$z = f(x, W)$
$L(W, x) = \{y - f(x, W)\}^2$
$w \to w + \Delta w, \quad \Delta w = -c\,\frac{\partial L(W, x)}{\partial w}$
learning of hidden neurons
differentiable: analog neurons, stochastic gradient learning
Amari, Tsypkin, 1966~67; error back-prop, 1976
First stochastic descent learning of MLP (1967; 1968)
Shun-ichi Amari (University of Tokyo), Information Theory II -- Geometrical Theory of Information. Kyoritu Press, Tokyo, 1968.
$f(x, \theta) = v_1 \max\{w_1 \cdot x,\; w_2 \cdot x\} + v_2 \min\{w_3 \cdot x,\; w_4 \cdot x\}$
[figure: two-layer network with max and min hidden units]
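As a concrete illustration (not the original 1967 code), the sketch below trains this max/min network by stochastic gradient descent in a teacher-student setting; the input dimension, learning rate, and teacher network are assumptions made for the example. The max and min units are piecewise linear, so each update follows the subgradient of the currently active branch.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3  # input dimension (illustrative assumption)

def f(x, v, W):
    # f(x) = v1*max(w1.x, w2.x) + v2*min(w3.x, w4.x); v: shape (2,), W: shape (4, d)
    a = W @ x
    return v[0] * max(a[0], a[1]) + v[1] * min(a[2], a[3])

# hypothetical teacher generates targets; the student learns its parameters
v_true, W_true = rng.normal(size=2), rng.normal(size=(4, d))
v, W = rng.normal(size=2), rng.normal(size=(4, d))

eta = 0.01  # learning constant c
for step in range(50000):
    x = rng.normal(size=d)
    y = f(x, v_true, W_true)
    a = W @ x
    i_max = 0 if a[0] >= a[1] else 1      # active branch of the max unit
    i_min = 2 if a[2] <= a[3] else 3      # active branch of the min unit
    err = f(x, v, W) - y                  # dL/df for L = (1/2)(f - y)^2
    g_v = err * np.array([a[i_max], a[i_min]])
    g_max, g_min = err * v[0] * x, err * v[1] * x
    v -= eta * g_v
    W[i_max] -= eta * g_max
    W[i_min] -= eta * g_min

xs = rng.normal(size=(2000, d))
mse = np.mean([(f(x, v, W) - f(x, v_true, W_true)) ** 2 for x in xs])
print("test mean squared error:", mse)
```

Whether the run reaches the global solution depends on the initialization, which is exactly the local-minimum issue discussed later.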
Second Boom
- AI (1970~): expert systems (MYCIN), stochastic inference (Bayes), chess (1997)
- neural networks (1980~): MLP (backprop), associative memory
Third Boom (2010~): Deep learning
- stochastic inference (graphical models; Bayesian; WATSON)
- deep learning: pattern recognition (vision, auditory), sentence analysis, machine translation, AlphaGo
- language processing; sequence and dynamics (word2vec, deep learning with recurrent nets)
- integration of (symbol, logic) vs (pattern, dynamics)
Deep Learning
Self-organization + supervised learning
- RBM: Restricted Boltzmann Machine; contrastive divergence
- auto-encoder, recurrent net
- dropout, convolution, ResNet, ReLU
- adversarial net
Victory of Deep Neural Networks
Hinton 2005, 2006 ~ 2012, many others
- visual patterns, auditory patterns
- Go game
- sentence analysis, machine translation
- adversarial networks, pattern generation
Mathematical Neuroscience
- searches for the principles
- mathematical studies using simple, idealized models (not realistic)
Computational neuroscience
AI: technological realization
Mathematical Neuroscience and the Brain
The brain has found and implemented the principles through evolution (random search)
- historical restrictions
- material restrictions
- very complex (not smartly designed)
Theoretical Problems of Learning: 1
Local solutions and the global solution of $L(\Theta)$
- simulated annealing
- quantum annealing
Theoretical Problems of Learning: 2
$y = f(x, \theta) + \varepsilon$
training loss: $L_{\mathrm{emp}} = \frac{1}{N} \sum_i |y_i - f(x_i, \theta)|^2$
generalization loss: $L_{\mathrm{gen}} = E_P\big[|y - f(x, \theta)|^2\big]$
$L_{\mathrm{gen}} \approx L_{\mathrm{emp}} + \frac{P}{N}$ : overtraining
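A minimal sketch of the overtraining effect, using polynomial regression as an assumed stand-in for $f(x, \theta)$: as the number of parameters $P$ grows, the training loss keeps falling while the generalization loss does not, so the gap between them widens, in line with the $P/N$ penalty above. All sizes and the noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 30, 0.3                      # training set size and noise level
f_true = lambda x: np.sin(3 * x)        # assumed "teacher" function
x_tr = rng.uniform(-1, 1, N)
y_tr = f_true(x_tr) + sigma * rng.normal(size=N)
x_te = rng.uniform(-1, 1, 5000)
y_te = f_true(x_te) + sigma * rng.normal(size=5000)

for P in (2, 4, 8, 15):                 # number of parameters (polynomial degree + 1)
    coef = np.polyfit(x_tr, y_tr, deg=P - 1)
    L_emp = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)   # training loss
    L_gen = np.mean((np.polyval(coef, x_te) - y_te) ** 2)   # generalization loss
    print(f"P={P:2d}  L_emp={L_emp:.3f}  L_gen={L_gen:.3f}  gap={L_gen - L_emp:.3f}")
```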
Extremely wide network: $P \to \infty$, $P \gg N$
Local minimum = global minimum (Kawaguchi, 2019)
Learning curve for $P \gg N$: double descent
(Belkin et al., 2019; Hastie et al., 2019)
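A hedged numerical illustration of double descent: minimum-norm least squares on random ReLU features, where the test error typically peaks near the interpolation threshold $P \approx N$ and descends again for $P \gg N$. All sizes and the linear teacher are assumptions made for this sketch, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, n_test = 50, 10, 5000
beta = rng.normal(size=d) / np.sqrt(d)        # assumed linear teacher
X = rng.normal(size=(N, d))
Xt = rng.normal(size=(n_test, d))
y = X @ beta + 0.1 * rng.normal(size=N)
yt = Xt @ beta

for P in (5, 20, 40, 48, 50, 55, 100, 500, 2000):
    W = rng.normal(size=(d, P)) / np.sqrt(d)  # fixed random first layer (random features)
    F, Ft = np.maximum(X @ W, 0), np.maximum(Xt @ W, 0)
    a = np.linalg.pinv(F) @ y                 # minimum-norm least-squares fit
    print(f"P={P:5d}  test MSE = {np.mean((Ft @ a - yt) ** 2):.3f}")
```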
Random Neural Networks
Random is excellent!! Random is magic!!
- statistical dynamics
- random codes
Random Deep Networks
Poole et al., 2016; Schoenholz et al., 2017; ...
- signal propagation
- error back-propagation
Jacot et al.: Neural tangent kernel
$y = f(x, \theta), \quad l(x, \theta) = \frac{1}{2}\{y - f(x, \theta)\}^2, \quad e(x, \theta) = f(x, \theta) - f(x, \theta^*)$
$\dot{\theta}_t = -\eta\, \partial_\theta l = -\eta\, \partial_\theta f(x')\,\big(f(x', \theta) - f(x', \theta^*)\big)$
$\dot{f}(x, \theta_t) = \partial_\theta f(x)\, \dot{\theta}_t = -\eta\, \partial_\theta f(x) \cdot \partial_\theta f(x')\, e(x', \theta_t)$
$K$ : Gaussian kernel
$K(x, x'; \theta) = \partial_\theta f(x) \cdot \partial_\theta f(x')$
$\dot{f}(x, \theta_t) = -\eta\, \big\langle K(x, x'; \theta)\, e(x', \theta) \big\rangle$
$K(x, x'; \theta_{\mathrm{initial}}) \approx K(x, x')$ : Gaussian kernel
$\theta_t \approx \theta_{\mathrm{ini}}$
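A minimal numerical sketch of the tangent kernel for a width-$n$ two-layer tanh network, $f(x) = \frac{1}{\sqrt{n}} \sum_i v_i \tanh(w_i \cdot x)$; the architecture and parameter scaling are illustrative assumptions. At large width, $K(x, x'; \theta)$ computed from independent random draws of $\theta$ should nearly coincide, which is the sense in which the kernel is fixed near initialization.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 20000, 4                             # very wide hidden layer (assumption)

def grad_f(x, W, v):
    # gradient of f(x) = (1/sqrt(n)) sum_i v_i tanh(w_i.x) w.r.t. all parameters (W, v)
    u = np.tanh(W @ x)
    g_v = u / np.sqrt(n)
    g_W = ((v * (1 - u ** 2))[:, None] * x[None, :]) / np.sqrt(n)
    return np.concatenate([g_W.ravel(), g_v])

def ntk(x, xp, W, v):
    # K(x, x'; theta) = grad_theta f(x) . grad_theta f(x')
    return grad_f(x, W, v) @ grad_f(xp, W, v)

x, xp = rng.normal(size=d), rng.normal(size=d)
for trial in range(3):                      # the kernel barely depends on the draw
    W, v = rng.normal(size=(n, d)), rng.normal(size=n)
    print(f"K(x, x') for random draw {trial}: {ntk(x, xp, W, v):.4f}")
```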
Theorem ($P \gg N$): the optimal solution lies near a random network (Bailey et al., 2019)
$w_{ij} = O\!\Big(\frac{1}{\sqrt{n}}\Big), \quad \Delta w_{ij} = O\!\Big(\frac{1}{n}\Big)$
Random Neural Field
$u^l(z') = \int w(z', z)\, x^{l-1}(z)\, dz + b(z')$
$x^l(z') = \varphi\big(u^l(z')\big)$
$w(z', z)$ : random (zero-mean Gaussian; correlated)
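A minimal sketch of one layer of the random neural field on a 1-D grid. The Gaussian-smoothing construction of the correlated kernel $w(z', z)$, the correlation length, and the ad hoc rescaling of $u$ are assumptions made only to keep the example simple and non-degenerate.

```python
import numpy as np

rng = np.random.default_rng(4)
M = 200                                   # grid points discretizing z in [0, 1]
z = np.linspace(0.0, 1.0, M)
dz = z[1] - z[0]
corr_len = 0.05                           # assumed correlation length of w(z', z)
G = np.exp(-(z[:, None] - z[None, :]) ** 2 / (2 * corr_len ** 2))

def random_layer(x):
    # u(z') = \int w(z', z) x(z) dz + b(z');  w: zero-mean, spatially correlated Gaussian
    w = G @ rng.normal(size=(M, M)) @ G.T  # smoothed white noise => correlated kernel
    b = 0.1 * rng.normal(size=M)
    u = w @ x * dz + b                     # discretized integral over z
    u = np.sqrt(2.0) * u / u.std()         # ad hoc rescaling to keep u of order one
    return np.tanh(u)

x = np.sin(2 * np.pi * z)                  # an arbitrary smooth input field
for l in range(5):
    x = random_layer(x)
    print(f"layer {l + 1}  activity A = {np.mean(x ** 2):.3f}")
```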
Statistical Neurodynamics
microdynamics: $x(t+1) = \mathrm{sgn}\big(W x(t)\big) = T_W\, x(t)$
macrodynamics: $X_{t+1} = F(X_t)$ : macrostate
$X_2 = X(x_2) = X(T_W x_1)$
$X_3 = X(x_3) = X(T_W T_W x_1)$?
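A minimal sketch comparing micro- and macrodynamics for $x(t+1) = \mathrm{sgn}(W x(t))$. The macrostate tracked here is the overlap $C_t$ of two trajectories; the annealed one-step map $C_{t+1} = \frac{2}{\pi}\arcsin C_t$ for $w_{ij} \sim N(0, 1/n)$ is used as the assumed macroscopic law, and the simulation shows how closely the quenched network follows it over the first few steps.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4000
W = rng.normal(size=(n, n)) / np.sqrt(n)       # w_ij ~ N(0, 1/n)

x = np.sign(rng.normal(size=n))
xp = x.copy()
flip = rng.choice(n, size=n // 10, replace=False)
xp[flip] = -xp[flip]                           # second state, initial overlap 0.8

C_micro = [x @ xp / n]
C_macro = [x @ xp / n]
for t in range(6):
    x, xp = np.sign(W @ x), np.sign(W @ xp)    # microdynamics x(t+1) = sgn(W x(t))
    C_micro.append(x @ xp / n)
    C_macro.append(2 / np.pi * np.arcsin(C_macro[-1]))  # annealed macroscopic map

print("simulated overlap C_t:", np.round(C_micro, 3))
print("macroscopic map      :", np.round(C_macro, 3))
```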
Statistical Neurodynamics ($w_{ij} \sim N(0, 1)$)
Rozonoer (1969), Amari (1969, 1971, 1973), Sompolinsky, Amari et al. (2013), Toyoizumi et al. (2015), Poole, ..., Ganguli (2016), Schoenholz et al. (2017), Yang & Schoenholz (2017), Karakida et al. (2019), Jacot et al. (2019), ...
Macroscopic behaviors common to almost all (typical) networks
Random Deep Networks
$x_i^{l+1} = \varphi\Big(\sum_j w_{ij}^l\, x_j^l + w_{0i}^l\Big)$
$w_{ij}^l \sim N(0, \sigma^2/n), \quad b_i = w_{0i}^l \sim N(0, \sigma_0^2)$
$A^l = \frac{1}{n}\sum_i (x_i^l)^2, \quad A^{l+1} = F(A^l)$
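A minimal simulation of this random deep network (the width, depth, and scales $\sigma_w, \sigma_0$, and $\varphi = \tanh$ are illustrative assumptions), tracking the empirical activity $A^l$ layer by layer.

```python
import numpy as np

rng = np.random.default_rng(6)
n, L = 2000, 10                         # width and depth (illustrative)
sigma_w, sigma_0 = 2.0, 0.1             # weight and bias scales (assumptions)

x = rng.normal(size=n)
print(f"layer  0  A = {np.mean(x ** 2):.4f}")
for l in range(L):
    W = rng.normal(size=(n, n)) * sigma_w / np.sqrt(n)   # w_ij ~ N(0, sigma_w^2 / n)
    b = rng.normal(size=n) * sigma_0                     # w_0i ~ N(0, sigma_0^2)
    x = np.tanh(W @ x + b)                               # x^{l+1} = phi(W x^l + b)
    print(f"layer {l + 1:2d}  A = {np.mean(x ** 2):.4f}")
```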
Macroscopic variables
activity: $A = \frac{1}{n}\sum x_i^2$
distance: $D = D[x : x']$
metric, curvature & Fisher information
$A^{l+1} = F(A^l), \quad D^{l+1} = K(D^l)$
Dynamics of Activity: law of large numbers
$x_i = \varphi\Big(\sum_k w_{ik} x_k + b_i\Big) = \varphi(u_i), \quad x = \phi(Wx + b)$
$u_i \sim N(0, A)$
$A^{l+1} = \frac{1}{n}\sum_i x_i^2 = E\big[\varphi(u_i)^2\big] = \chi_0(A^l)$
$\chi_0(A) = \int \varphi^2(\sqrt{A}\, v)\, Dv, \quad v \sim N(0, 1)$
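The same activity dynamics can be evaluated theoretically by computing $\chi_0$ with Gauss-Hermite quadrature. This sketch iterates the one-dimensional map $A^{l+1} = \chi_0(A^l)$ with $\varphi = \tanh$ and the same assumed scales as the simulation above, so the two layer-by-layer sequences can be compared; the $\sigma$ factors in the pre-activation variance are an assumption carried over from that sketch.

```python
import numpy as np

# Gauss-Hermite nodes/weights rescaled to the standard Gaussian measure Dv
t, w = np.polynomial.hermite.hermgauss(60)
v, w = np.sqrt(2.0) * t, w / np.sqrt(np.pi)

sigma_w, sigma_0 = 2.0, 0.1             # same assumed scales as the simulation above

def next_A(A, phi=np.tanh):
    var_u = sigma_w ** 2 * A + sigma_0 ** 2          # u ~ N(0, sigma_w^2 A + sigma_0^2)
    return np.sum(w * phi(np.sqrt(var_u) * v) ** 2)  # chi_0 evaluated by quadrature

A = 1.0
for l in range(10):
    A = next_A(A)
    print(f"layer {l + 1:2d}  A = {A:.4f}")          # compare with the simulation above
```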
$\chi_0'(0) > 1$
fixed point: $\bar{A} = \chi_0(\bar{A})$
$\frac{1}{n}\sum x_i^2$ converges
Pullback Metric & Curvature
$x^l = \phi(W x^{l-1})$
$ds^2 = \sum g_{ij}\, dx^i dx^j = \frac{1}{n}\, dx^l \cdot dx^l$
Basis vectors
$x^l = \phi(W x^{l-1})$
$dx_i^l = \sum_{i'} \varphi'(u_i^l)\, W_{ii'}^l\, dx_{i'}^{l-1} = \sum_{i'} B_{ii'}^l\, dx_{i'}^{l-1}$
$dx^l = B^l\, dx^{l-1} = B^l B^{l-1} \cdots B^{m+1}\, dx^m$
$B_{ii'}^l = \varphi'(u_i^l)\, W_{ii'}^l$ : Jacobian
$e_a^l = B^l B^{l-1} \cdots B^{m+1}\, e_a^m$
$g_{ab}^l = \frac{1}{n}\, e_a^l \cdot e_b^l$
Dynamics of Metric
$dx^a = \sum_k B_k^a\, dx^k, \quad e_a' = B\, e_a$
$B_k^a = \varphi'(u_a)\, w_k^a$
$g_{ab}' = \sum_{k,j} B_k^a B_j^b\, g_{kj}$
$E\big[\varphi'(u_a)^2\, w_k^a w_j^a\big] = E\big[\varphi'(u_a)^2\big]\, E\big[w_k^a w_j^a\big]$ : mean-field approximation
$\chi_1(A) = \int \varphi'^2(\sqrt{A}\, v)\, Dv$
Metric: law of large numbers
$g_{ab}^l = e_a^l \cdot e_b^l, \quad ds^2 = \sum g_{ab}\, dx^a dx^b$
$(B^l B^{l\top})_{ii'} = \sum_j \varphi'(u_i^l)\,\varphi'(u_{i'}^l)\, w_{ij}^l w_{i'j}^l \approx \sigma^2\, E\big[\varphi'(u^l)^2\big]\, \delta_{ii'}$
$\chi_1 = \sigma^2\, E\big[\varphi'(u^l)^2\big]$
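A minimal numerical check of this mean-field claim $B B^{\top} \approx \chi_1(A)\, I$ for a single random layer; the width, $\varphi = \tanh$, and the weight scale are assumptions. The mean diagonal of $B B^{\top}$ should be close to the quadrature value of $\chi_1(A)$, while off-diagonal entries stay small.

```python
import numpy as np

rng = np.random.default_rng(7)
n, sigma_w = 2000, 2.0                       # width and weight scale (assumptions)

t, wt = np.polynomial.hermite.hermgauss(60)  # standard-Gaussian quadrature
v, wt = np.sqrt(2.0) * t, wt / np.sqrt(np.pi)

x = rng.normal(size=n)
x /= np.sqrt(np.mean(x ** 2))                # normalize the input so that A = 1
A = np.mean(x ** 2)

W = rng.normal(size=(n, n)) * sigma_w / np.sqrt(n)
u = W @ x                                    # pre-activations, Var(u) = sigma_w^2 * A
B = (1.0 - np.tanh(u) ** 2)[:, None] * W     # Jacobian B_ij = phi'(u_i) W_ij
BBt = B @ B.T

var_u = sigma_w ** 2 * A
chi1 = sigma_w ** 2 * np.sum(wt * (1.0 - np.tanh(np.sqrt(var_u) * v) ** 2) ** 2)
off = BBt[~np.eye(n, dtype=bool)]
print("mean diagonal of B B^T:", np.mean(np.diag(BBt)))
print("theoretical chi_1(A)  :", chi1)
print("rms off-diagonal      :", np.sqrt(np.mean(off ** 2)))
```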
$g_{ij}^l(x) = \Big(\prod_{l'} \chi_1\big(A^{l'}(x)\big)\Big)\, g_{ij}(x)$
conformal geometry
$g_{ij}(x) \mapsto \chi_1(A)\, g_{ij}(x)$ : rotation and expansion, a conformal transformation!
$g_{ij}^1 = \chi_1(A)\, \delta_{ij} \;\Rightarrow\; g_{ij}^l = \chi_1^l\, \delta_{ij}$
Domino Theorem
$\dfrac{\partial x^l}{\partial x^{l-1}} = B, \quad \dfrac{\partial x^l}{\partial x^{l-2}} = BB, \quad \ldots, \quad \dfrac{\partial x^l}{\partial x^{m}} = B B \cdots B$
$\dfrac{\partial x^l}{\partial W^{l-1}} = B, \quad \dfrac{\partial x^l}{\partial W^{l-2}} = BB, \quad \ldots, \quad \dfrac{\partial x^l}{\partial W^{m}} = B B \cdots B$
$\sum_i B_{ii'}^L B_{ii''}^L \approx \chi_1\, \delta_{i'i''}$
$\sum_i \big(B^L B^{L-1} \cdots B^1\big)_{ii'} \big(B^L B^{L-1} \cdots B^1\big)_{ii''} \approx \chi_1 \chi_1 \cdots \chi_1\, \delta_{i'i''}$
Dynamics of Curvature
$H_{ab}^i = \nabla_a e_b^i = \partial_a \partial_b x^i$
$\quad = \varphi''(u^i)\,(w \cdot e_a)(w \cdot e_b) + \varphi'(u^i)\, w \cdot \partial_a e_b$
$H_{ab} = H_{ab}^{\perp} + H_{ab}^{\parallel}$
$H^2 = |H_{ab}|^2$
$\chi_2(A) = \int \varphi''^2(\sqrt{A}\, v)\, Dv$
$\big(H_{ab}^{l+1}\big)^2 = \frac{\chi_2(A)}{n}\,(2\delta_{ab} + 1) + \chi_1(A)\,\big(H_{ab}^{l}\big)^2$
$\chi_1 > 1$ : exponential expansion! creation (the $\chi_2$ term) is small!
Poole et al (2016) Deep neural networks
Distance
$D[x, y] = \frac{1}{n}\sum_i (x_i - y_i)^2$
Dynamics of Distance (Amari, 1974)
$D(x, x') = \frac{1}{n}\sum_i (x_i - x_i')^2$
$C(x, x') = \frac{1}{n}\sum_i x_i x_i' = x \cdot x'$
$D = A + A' - 2C$
$u_i = \sum_k w_k^i y_k, \quad u_i' = \sum_k w_k^i y_k', \quad (u_i, u_i') \sim N(0, V), \quad V = \begin{pmatrix} A & C \\ C & A' \end{pmatrix}$
$C' = E\big[\varphi(\sqrt{A - C}\,\varepsilon + \sqrt{C}\,\nu)\; \varphi(\sqrt{A' - C}\,\varepsilon' + \sqrt{C}\,\nu)\big]$
$D^{l+1} = K(D^l)$
$\dfrac{dD^{l+1}}{dD^l} = \chi_1 > 1$
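A minimal Monte Carlo sketch of this distance/correlation dynamics: the activity is first iterated to its fixed point, $\chi_1$ is estimated there, and the correlation map $C^{l+1} = E[\varphi(\cdot)\varphi(\cdot)]$ is then iterated for two nearby inputs. The scales $\sigma_w, \sigma_0$ and $\varphi = \tanh$ are assumptions; with these values the network should sit in the $\chi_1 > 1$ regime, so the printed distance $D = 2(A - C)$ is expected to expand with depth, as stated above.

```python
import numpy as np

rng = np.random.default_rng(8)
eps, eps_p, nu = rng.normal(size=(3, 200_000))   # shared Monte Carlo samples
sigma_w, sigma_0 = 2.0, 0.1                      # assumed scales

def var_cov(A, C):
    # pre-activation variance and covariance for two inputs with activity A, overlap C
    return sigma_w ** 2 * A + sigma_0 ** 2, sigma_w ** 2 * C + sigma_0 ** 2

def next_A(A):
    var, _ = var_cov(A, A)
    return np.mean(np.tanh(np.sqrt(var) * eps) ** 2)

def next_C(A, C):
    var, cov = var_cov(A, C)
    u  = np.sqrt(var - cov) * eps   + np.sqrt(cov) * nu
    up = np.sqrt(var - cov) * eps_p + np.sqrt(cov) * nu
    return np.mean(np.tanh(u) * np.tanh(up))

A = 1.0
for _ in range(100):                             # settle to the activity fixed point
    A = next_A(A)

chi1 = sigma_w ** 2 * np.mean((1 - np.tanh(np.sqrt(var_cov(A, A)[0]) * eps) ** 2) ** 2)
print(f"fixed-point activity A = {A:.3f},  chi_1 = {chi1:.3f}")

C = 0.95 * A                                     # two nearby inputs
for l in range(8):
    print(f"layer {l}  C = {C:.4f}  D = {2 * (A - C):.4f}")
    C = next_C(A, C)
```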