Deep Random Neural Field

  1. Deep Learning and Physics -- 2019. Deep Random Neural Field. Shun-ichi Amari, RIKEN Center for Brain Science; Araya.

  2. Brief History of AI and NN. First Boom: starts 1956~. AI (Dartmouth Conf.): symbols, universal computation, logic. Neural networks (perceptron): learning machines. Dark period (late 1960s~1970s); stochastic gradient descent learning for MLP (1967).

  3. Perceptron. F. Rosenblatt, Principles of Neurodynamics, 1961. Input $x$ → output $z$; McCulloch-Pitts neurons ({0,1} binary); learning; multilayer, with lateral & feedback connections.

  4. Rosenblatt: multilayer perceptron → Deep Neural Networks. Output $z = f(x, W)$; loss $L(W, x) = (y - f(W, x))^2$; learning of hidden neurons by $w \to w + \Delta w$, $\Delta w = -c\,\partial L(W, x)/\partial W$. Differentiable analog neurons make stochastic gradient learning possible. Amari, Tsypkin, 1966~67; error back-prop, 1976.
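
The update rule above is plain stochastic gradient descent. A minimal NumPy sketch, assuming a one-hidden-layer network with differentiable tanh ("analog") neurons and squared loss; the sizes, seed, and learning rate are illustrative, not from the slides:

```python
# Minimal sketch (not Amari's original code): one SGD step for a one-hidden-layer
# network z = f(x, W) with analog (tanh) neurons and squared loss L = (y - z)^2.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 8
W1 = rng.normal(0, 1 / np.sqrt(n_in), (n_hid, n_in))   # hidden weights
w2 = rng.normal(0, 1 / np.sqrt(n_hid), n_hid)          # output weights
c = 0.1                                                # learning rate

def sgd_step(x, y):
    global W1, w2
    h = np.tanh(W1 @ x)                 # differentiable analog neurons
    z = w2 @ h                          # network output f(x, W)
    err = z - y
    grad_w2 = 2 * err * h                               # ∂L/∂w2
    grad_W1 = 2 * err * np.outer(w2 * (1 - h**2), x)    # backpropagated ∂L/∂W1
    w2 -= c * grad_w2                   # Δw = -c ∂L/∂w
    W1 -= c * grad_W1

x, y = rng.normal(size=n_in), 1.0
sgd_step(x, y)
```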

  5. First stochastic descent learning of MLP (1967; 1968). Shun-ichi Amari, Information Theory II -- Geometrical Theory of Information, University of Tokyo; Kyoritu Press, Tokyo, 1968.

  6. $f(x, \theta) = v_1 \max\{w_1 \cdot x,\ w_2 \cdot x\} + v_2 \min\{w_3 \cdot x,\ w_4 \cdot x\}$. [Figure: network with input $x$, max/min hidden gates weighted by $w_1, \dots, w_4$, and output $y$ combined through $v_1, v_2$.]
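
This min/max unit translates directly into code. A small sketch; the concrete weight values below are arbitrary:

```python
# Direct transcription (a sketch) of the min/max unit above; weights are arbitrary.
import numpy as np

def minmax_unit(x, w1, w2, w3, w4, v1, v2):
    """f(x, θ) = v1·max{w1·x, w2·x} + v2·min{w3·x, w4·x}."""
    return v1 * max(w1 @ x, w2 @ x) + v2 * min(w3 @ x, w4 @ x)

x = np.array([1.0, -0.5])
ws = [np.random.default_rng(i).normal(size=2) for i in range(4)]
print(minmax_unit(x, *ws, v1=1.0, v2=-1.0))
```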

  7. Second Boom: AI from 1970~, neural networks from 1980~. AI: expert systems (MYCIN), stochastic inference (Bayes), chess (1997). NN: MLP (backprop), associative memory.

  8. Third Boom 2010~: Deep learning. Stochastic inference (graphical models; Bayesian; WATSON). Deep learning for pattern recognition: vision, audition, sentence analysis, machine translation; AlphaGo. Language processing, sequences and dynamics (word2vec, deep learning with recurrent nets). Integration of (symbol, logic) vs (pattern, dynamics).

  9. Deep Learning = Self-Organization + Supervised Learning. RBM: Restricted Boltzmann Machine; Auto-Encoder; Recurrent Net; Dropout; Contrastive divergence; Convolution; ResNet; ReLU; Adversarial net.

  10. Victory of Deep Neural Networks. Hinton 2005, 2006 ~ 2012, and many others: visual patterns, auditory patterns, the game of Go, sentence analysis, machine translation, adversarial networks, pattern generation.

  11. Mathematical Neuroscience searches for the principles: mathematical studies using simple, idealized models (not realistic). Compare: computational neuroscience; AI as technological realization.

  12. Mathematical Neuroscience and the Brain. The brain has found and implemented the principles through evolution (random search), under historical and material restrictions. Very complex (not smartly designed).

  13. Theoretical Problems of Learning: 1. Local vs global solutions of the loss $L(\Theta)$ over parameters $\Theta$; simulated annealing, quantum annealing.

  14. Theoretical Problems of Learning: 2. Model $y = f(x, \theta) + \varepsilon$. Training loss $L_{\mathrm{emp}} = \frac{1}{N}\sum_i |y_i - f(x_i, \theta)|^2$ vs generalization loss $L_{\mathrm{gen}} = E_P[|y - f(x, \theta)|^2]$: overtraining. $L_{\mathrm{gen}} \approx L_{\mathrm{emp}} + O(1/N)$.
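
A toy illustration of the $L_{\mathrm{emp}}$ vs $L_{\mathrm{gen}}$ gap, using linear regression on synthetic Gaussian data (the model, noise level, and sample sizes are our choices, not from the slides):

```python
# A sketch illustrating L_emp vs L_gen for linear regression on y = f(x, θ*) + ε.
import numpy as np

rng = np.random.default_rng(0)
P, N, N_test = 10, 50, 10_000
theta_star = rng.normal(size=P)

X = rng.normal(size=(N, P))
y = X @ theta_star + 0.3 * rng.normal(size=N)          # y = f(x, θ*) + ε
theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]       # minimize L_emp

L_emp = np.mean((y - X @ theta_hat) ** 2)              # training loss
Xt = rng.normal(size=(N_test, P))
yt = Xt @ theta_star + 0.3 * rng.normal(size=N_test)
L_gen = np.mean((yt - Xt @ theta_hat) ** 2)            # generalization loss
print(L_emp, L_gen)   # L_gen exceeds L_emp by an O(1/N) gap
```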

  15. Extremely wide networks, $P \to \infty$, $P \gg N$: local minimum = global minimum. Kawaguchi, 2019.

  16. Learning curve for $P \gg N$: double descent. Belkin et al., 2019; Hastie et al., 2019.

  17. Random Neural Networks. Random is excellent!! Random is magic!! Statistical dynamics; random codes.

  18. Random Deep Networks. Poole et al., 2016; Schoenholz et al., 2017; ... Signal propagation; error back-propagation.

  19. Jacot et al.: neural tangent kernel. $y = f(x, \theta)$; loss $l(x, \theta) = \frac{1}{2}(y - f(x, \theta))^2 = \frac{1}{2}\{f(x, \theta) - f(x, \theta^*)\}^2$; error $e(x, \theta) = f(x, \theta) - f(x, \theta^*)$. Gradient descent gives $\dot{\theta}_t = -\eta\,\partial_\theta l = -\eta\,\partial_\theta f(x')\,(f(x', \theta) - f(x', \theta^*))$, hence $\dot{f}(x, \theta_t) = \partial_\theta f(x) \cdot \dot{\theta}_t = -\eta\,\partial_\theta f(x) \cdot \partial_\theta f(x')\, e(x', \theta)$. $K$: Gaussian kernel.

  20. $K(x, x'; \theta) = \partial_\theta f(x) \cdot \partial_\theta f(x')$; $\dot{f}(x, \theta_t) = -\eta\, \langle K(x, x'; \theta)\, e(x', \theta) \rangle$. At initialization $K(x, x'; \theta_{\mathrm{init}}) \approx K(x, x')$, a Gaussian kernel, and $\theta_t \approx \theta_{\mathrm{init}}$.
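
A sketch of the empirical neural tangent kernel for a toy two-layer scalar network, with $\partial_\theta f$ approximated by finite differences; the architecture and sizes are illustrative assumptions:

```python
# Sketch: empirical NTK K(x, x'; θ) = ∂θ f(x)·∂θ f(x') for a tiny two-layer network,
# with gradients taken by central finite differences (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
n = 100
theta = rng.normal(0, 1 / np.sqrt(n), size=2 * n)       # [w_in; w_out] flattened

def f(x, th):
    w_in, w_out = th[:n], th[n:]
    return w_out @ np.tanh(w_in * x)                    # scalar input, scalar output

def grad_f(x, th, eps=1e-5):
    g = np.zeros_like(th)
    for i in range(th.size):                            # finite-difference ∂θ f(x)
        d = np.zeros_like(th); d[i] = eps
        g[i] = (f(x, th + d) - f(x, th - d)) / (2 * eps)
    return g

K = lambda x, xp: grad_f(x, theta) @ grad_f(xp, theta)  # one NTK entry
print(K(0.3, 0.7), K(0.3, 0.3))
```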

  21. Theorem ($P \gg N$): the optimal solution lies near a random network. Bailey et al., 2019. $w_{ij} = O(1/\sqrt{n})$, $\Delta w_{ij} = O(1/n)$.

  22. Random Neural Field. $u^l(z') = \int w(z, z')\, x^{l-1}(z)\, dz + b(z')$, $x^l(z') = \varphi(u^l(z'))$; $w(z, z')$: random (0 mean Gaussian; correlated).
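
A discretized sketch of one layer of such a field, assuming a Gaussian correlation kernel over $z$ and a grid approximation of the integral (all concrete choices are ours):

```python
# Sketch of one layer of a random neural field: u(z') = ∫ w(z, z') x(z) dz + b(z'),
# with w a zero-mean Gaussian field correlated over z (discretized grid).
import numpy as np

rng = np.random.default_rng(3)
m = 200                                          # grid points on [0, 1]
z = np.linspace(0, 1, m)
dz = z[1] - z[0]

# Correlated Gaussian weights: color white noise with a Gaussian kernel in z.
corr = np.exp(-((z[:, None] - z[None, :]) ** 2) / (2 * 0.05 ** 2))
L = np.linalg.cholesky(corr + 1e-8 * np.eye(m))
W = L @ rng.normal(size=(m, m))                  # w(z, z'), correlated along z

x = np.sin(2 * np.pi * z)                        # input field x(z)
b = rng.normal(0, 0.1, m)
u = W.T @ x * dz + b                             # u(z') = ∫ w(z, z') x(z) dz + b(z')
x_next = np.tanh(u)                              # x(z') = φ(u(z'))
```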

  23. Statistical Neurodynamics. Microdynamics: $x(t+1) = \mathrm{sgn}(W x(t)) = T_W x(t)$. Macrodynamics: $X_{t+1} = F(X_t)$, $X$: macrostate. $X_2 = X(x_2) = X(T_W x_1)$; $X_3 = X(x_3) = X(T_W T_W x_1)$?
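
A numerical sketch of the micro-to-macro passage: iterate a threshold microdynamics and record one macrovariable, here the fraction of active neurons; the 0/1 neuron model, threshold, and weight scale are illustrative choices:

```python
# Sketch: microdynamics x_{t+1} = 1[W x_t > h] for a random W, tracking the
# macrostate X_t = fraction of active neurons, which follows X_{t+1} ≈ F(X_t).
import numpy as np

rng = np.random.default_rng(0)
n, h = 2000, 0.2
W = rng.normal(0, 1 / np.sqrt(n), (n, n))           # random connections
x = (rng.random(n) < 0.5).astype(float)             # random initial 0/1 state

for t in range(8):
    x = (W @ x > h).astype(float)                   # microdynamics on n neurons
    print(t, x.mean())                              # macrostate X_t settles to a fixed point
```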

  24. Statistical Neurodynamics: Rozonoer (1969); Amari (1969, 1971, 1973); Sompolinsky; Amari et al. (2013); Toyoizumi et al. (2015); Poole, ..., Ganguli (2016); Schoenholz et al. (2017); Yang & Schoenholz (2017); Karakida et al. (2019); Jacot et al. (2019); ... With $w_{ij} \sim N(0, 1)$: macroscopic behaviors common to almost all (typical) networks.

  25. Random Deep Networks. $x_i^{l+1} = \varphi\big(\sum_j w_{ij}^l x_j^l + w_{0i}^l\big)$, with $w_{ij}^l \sim N(0, \sigma_w^2/n)$ and bias $w_{0i}^l = b_i \sim N(0, \sigma_b^2)$. Activity $A^l = \frac{1}{n}\sum_i (x_i^l)^2$ obeys $A^{l+1} = F(A^l)$.
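
A Monte Carlo sketch of this layer map and the activity recursion $A^{l+1} = F(A^l)$, taking $\varphi = \tanh$ and illustrative values for $n$, $\sigma_w$, $\sigma_b$:

```python
# Sketch: propagate a signal through a random deep net and record the activity
# A^l = (1/n) Σ (x_i^l)²; σ_w, σ_b follow the slide's setup, values are ours.
import numpy as np

rng = np.random.default_rng(0)
n, depth, sigma_w, sigma_b = 1000, 10, 1.5, 0.1
x = rng.normal(size=n)

for l in range(depth):
    W = rng.normal(0, sigma_w / np.sqrt(n), (n, n))     # w_ij ~ N(0, σ_w²/n)
    b = rng.normal(0, sigma_b, n)                       # bias ~ N(0, σ_b²)
    x = np.tanh(W @ x + b)                              # x^{l+1} = φ(W x^l + b)
    print(l, np.mean(x ** 2))                           # A^{l+1} = F(A^l)
```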

  26. Macroscopic variables. Activity: $A = \frac{1}{n}\sum x_i^2$; distance: $D = D[x : x']$; metric, curvature & Fisher information. $A^{l+1} = F(A^l)$, $D^{l+1} = K(D^l)$.

  27. Dynamics of Activity: law of large numbers. $\tilde{x}_i = \varphi\big(\sum_k w_{ik} x_k + b_i\big) = \varphi(u_i)$, i.e. $\tilde{x} = \varphi(Wx + b)$, with $u_i \sim N(0, A)$. Then $A^{l+1} = \frac{1}{n}\sum \tilde{x}_i^2 = E[\varphi(u)^2] = \chi_0(A)$, where $\chi_0(A) = \int \varphi^2(\sqrt{A}\, v)\, Dv$, $v \sim N(0, 1)$.

  28. When $\chi_0'(0) > 1$ there is a stable fixed point $\bar{A} = \chi_0(\bar{A})$, so the activity $\frac{1}{n}\sum x_i^2$ converges.
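
A sketch of the fixed-point iteration $A \leftarrow \chi_0(A)$ by Gauss-Hermite quadrature, with $\varphi = \tanh$ and a gain $\sigma_w = 2$ inserted so that $\chi_0'(0) = \sigma_w^2 > 1$ (both choices are ours):

```python
# Sketch: iterate A ← χ0(A) = ∫ φ(σ_w √A v)² Dv (φ = tanh, Dv the N(0,1) measure)
# to find the fixed point Ā = χ0(Ā). Quadrature and σ_w are illustrative choices.
import numpy as np

v, w = np.polynomial.hermite_e.hermegauss(101)          # nodes/weights for N(0,1)
w = w / np.sqrt(2 * np.pi)                              # weights now sum to 1

def chi0(A, sigma_w=2.0):
    u = sigma_w * np.sqrt(A) * v                        # u ~ N(0, σ_w² A)
    return np.sum(w * np.tanh(u) ** 2)                  # E[φ(u)²]

A = 0.5
for _ in range(100):
    A = chi0(A)                                         # A^{l+1} = χ0(A^l)
print(A)                                                # fixed point Ā; χ0'(0) = σ_w² > 1
```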

  29. Pullback Metric & Curvature. $\tilde{x} = \varphi(Wx)$; $ds_l^2 = \sum g_{ij}\, dx^i dx^j = \frac{1}{n}\, dx^l \cdot dx^l$.

  30. Basis vectors. $x^l = \varphi(W x^{l-1})$; $dx_i^l = \varphi'(u_i^l) \sum_{i'} W_{ii'} dx_{i'}^{l-1} = \sum_{i'} B_{ii'}^l\, dx_{i'}^{l-1}$, with Jacobian $B_{ii'}^l = \varphi'(u_i^l)\, W_{ii'}^l$. Hence $dx^l = B^l dx^{l-1} = (B^l B^{l-1} \cdots)\, dx^m$ and $e_a^l = B^l e_a^{l-1} = B^l B^{l-1} \cdots e_a^m$.

  31. $g_{ab} = \frac{1}{n}\, e_a^l \cdot e_b^l$.

  32. Dynamics of Metric. $d\tilde{x}^a = \sum_k B_k^a\, dx^k$, $\tilde{e}_a = B e_a$, $B_k^a = \varphi'(u^a)\, w_k^a$; $\tilde{g}_{ab} = \sum_{k,j} B_k^a B_j^b\, g_{kj}$. Mean field approximation: $E[\varphi'(u^a)^2\, w_k^a w_j^a] = E[\varphi'(u^a)^2]\, E[w_k^a w_j^a]$, with $\chi_1(A) = \int \varphi'^2(\sqrt{A}\, v)\, Dv$.

  33. Metric: law of large numbers. $g_{ab}^l = e_a^l \cdot e_b^l = (B\, g^{l-1} B^{\top})_{ab}$, $ds_l^2 = \sum g_{ab}^l\, dx^a dx^b$. $(B^{\top} B)_{i'i''} = \sum_i \varphi'(u_i^l)^2\, w_{ii'} w_{ii''} \approx \sigma^2\, E\big[\varphi'(u_i^l)^2\big]\, \delta_{i'i''}$, so $\chi_1 = \sigma^2\, E\big[\varphi'(u_i^l)^2\big]$.
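
A Monte Carlo check of this law: propagate a small tangent vector through random tanh layers and compare the squared-length growth per layer with $\chi_1 = \sigma_w^2\, E[\varphi'(u)^2]$ (all concrete values are illustrative):

```python
# Sketch: compare the layer-wise growth ds²_{l+1}/ds²_l of a small perturbation
# against χ1(A) = σ_w² E[φ'(u)²], for φ = tanh and illustrative sizes.
import numpy as np

rng = np.random.default_rng(1)
n, sigma_w = 2000, 2.0
x = rng.normal(size=n)
dx = 1e-4 * rng.normal(size=n)                   # tangent vector

for l in range(5):
    W = rng.normal(0, sigma_w / np.sqrt(n), (n, n))
    u = W @ x
    phi_prime = 1 - np.tanh(u) ** 2              # φ'(u) for φ = tanh
    dx_new = phi_prime * (W @ dx)                # dx^{l+1} = B dx^l, B = φ'(u) W
    chi1 = sigma_w ** 2 * np.mean(phi_prime ** 2)
    print(l, (dx_new @ dx_new) / (dx @ dx), chi1)  # empirical growth ≈ χ1
    x, dx = np.tanh(u), dx_new
```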

  34. $g_{ij}^l(x) = \big(\prod \chi_1(A(x))\big)\, g_{ij}(x)$: conformal geometry.

  35. $\tilde{g}_{ij}(x) = \chi_1(A)\, g_{ij}(x)$: rotation and expansion, a conformal transformation! $g_{ij}^1 = \chi_1\, \delta_{ij} \ \Rightarrow\ g_{ij}^l = \chi_1^l\, \delta_{ij}$.

  36. Domino Theorem. $\frac{\partial x^l}{\partial x^m} = B\,\frac{\partial x^{l-1}}{\partial x^m} = BB\,\frac{\partial x^{l-2}}{\partial x^m} = \cdots$, and likewise $\frac{\partial x^l}{\partial W^m} = B\,\frac{\partial x^{l-1}}{\partial W^m} = BB\,\frac{\partial x^{l-2}}{\partial W^m} = \cdots$. Layer by layer, $\sum_i B_{ii'}^L B_{ii''}^L = \chi_1\, \delta_{i'i''}$, so the whole product satisfies $\sum_i (B^L \cdots B^1)_{ii'}\, (B^L \cdots B^1)_{ii''} = \chi_1 \chi_1 \cdots \chi_1\, \delta_{i'i''}$.

  37. Dynamics of Curvature. $H_{ab}^i = \nabla_a e_b^i = \partial_a \partial_b x^i$; $\tilde{H}_{ab} = \varphi''(u)(w \cdot e_a)(w \cdot e_b) + \varphi'(u)\, w \cdot \partial_a e_b$; decompose $\tilde{H}_{ab} = H_{ab}^{\perp} + H_{ab}^{\parallel}$, with $H^2 = \sum |H_{ab}|^2$.

  38. $\chi_2(A) = \int \varphi''(\sqrt{A}\, v)^2\, Dv$; $(H_{ab}^{l+1})^2 = \frac{\chi_2(A)}{n}\,(2\delta_{ab} + 1) + \chi_1(A)\,(H_{ab}^l)^2$. Since $\chi_1 > 1$: exponential expansion! Creation is small!

  39. Poole et al. (2016): Deep neural networks.

  40. Distance: $D[x, y] = \frac{1}{n}\sum (x_i - y_i)^2$.

  41. Dynamics of Distance (Amari, 1974). $D(x, x') = \frac{1}{n}\sum (x_i - x_i')^2$, $C(x, x') = \frac{1}{n}\sum x_i x_i' = \frac{1}{n}\, x \cdot x'$, so $D = A + A' - 2C$. The preactivations $u_i = \sum_k w_{ik} y_k$ and $u_i' = \sum_k w_{ik} y_k'$ are jointly Gaussian, $\sim N(0, V)$ with $V = \begin{pmatrix} A & C \\ C & A' \end{pmatrix}$, giving $\tilde{C} = E\big[\varphi(\sqrt{A - C}\,\varepsilon + \sqrt{C}\,\nu)\ \varphi(\sqrt{A' - C}\,\varepsilon' + \sqrt{C}\,\nu)\big]$.

  42. $D^{l+1} = K(D^l)$; $\frac{d\tilde{D}}{dD} = \chi_1 > 1$.
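
A one-layer Monte Carlo check that the distance map has slope $\chi_1$ near $D = 0$: push two nearby inputs through a random tanh layer and compare output and input distances (all concrete values are illustrative):

```python
# Sketch: verify that D^{l+1} = K(D^l) has slope χ1 at D = 0 by propagating two
# nearby inputs through one random layer (φ = tanh; sizes and σ_w are ours).
import numpy as np

rng = np.random.default_rng(2)
n, sigma_w = 2000, 2.0
x = rng.normal(size=n)
for eps in (1e-2, 1e-3):
    xp = x + eps * rng.normal(size=n)                   # nearby input x'
    W = rng.normal(0, sigma_w / np.sqrt(n), (n, n))
    y, yp = np.tanh(W @ x), np.tanh(W @ xp)
    D_in = np.mean((x - xp) ** 2)                       # D(x, x')
    D_out = np.mean((y - yp) ** 2)                      # D(x̃, x̃')
    print(D_out / D_in)                                 # → χ1 as D → 0
```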
