statistical neurodynamics of deep networks

Statistical Neurodynamics of Deep Networks - Shun-ichi Amari, RIKEN



  1. Statistical Neurodynamics of Deep Networks. Shun-ichi Amari, RIKEN Brain Science Institute

  2. Statistical Neurodynamics: Rozonoer (1969); Amari (1971; 197…); Amari et al. (2013); Toyoizumi et al. (2015); Poole, …, Ganguli (2016). Random weights $w_{ij} \sim N(0, 1)$. Macroscopic behaviors common to almost all (typical) networks.

  3. Macroscopic variables: activity $A = \frac{1}{n}\sum_i x_i^2$; distance $D = D[\mathbf{x} : \mathbf{x}']$; curvature. Layer-to-layer maps: $A_{l+1} = F(A_l)$, $D_{l+1} = K(D_l)$.

  4. Deep Networks: $x_i^{l+1} = \varphi\big(\sum_j w_{ij}^{l}\, x_j^{l} + w_{0i}^{l}\big)$, with $w_{ij} \sim N(0, 1/n)$; activity $A_l = \frac{1}{n}\sum_i (x_i^{l})^2$; $A_{l+1} = F(A_l)$.

  5. Pullback Metric: $ds^2 = \frac{1}{n_l}\, d\mathbf{x}^{l} \cdot d\mathbf{x}^{l} = g^{l}_{ab}\, dx^a dx^b$

  6. $g^{l}_{ab} = \frac{1}{n_l}\, \mathbf{e}^{l}_a \cdot \mathbf{e}^{l}_b$

  7.   n n 1 l l

  8. Poole et al (2016) Deep neural networks

  9. Dynamics of Activity: $\tilde y = \varphi\big(\sum_k w_k y_k\big) = \varphi(u)$, $u \sim N(0, A)$; $\tilde A = \frac{1}{n}\sum \tilde y^2 = E[\varphi(u)^2] = \chi_0(A)$, where $\chi_0(A) = \int \varphi(\sqrt{A}\,v)^2\, Dv$, $v \sim N(0, 1)$.
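
The activity recursion on this slide can be sketched numerically. The block below is a minimal illustration, assuming a tanh activation and a plain trapezoidal rule for the Gaussian integral; the names `gaussian_expectation`, `chi0`, and `iterate_activity` are hypothetical and not taken from the talk.

```python
import math

def gaussian_expectation(f, half_width=8.0, steps=4001):
    """Approximate E[f(v)] for v ~ N(0, 1) with the trapezoidal rule."""
    h = 2.0 * half_width / (steps - 1)
    total = 0.0
    for k in range(steps):
        v = -half_width + k * h
        weight = 0.5 if k in (0, steps - 1) else 1.0
        total += weight * f(v) * math.exp(-0.5 * v * v)
    return total * h / math.sqrt(2.0 * math.pi)

def chi0(A, phi=math.tanh):
    """E[phi(u)^2] for u ~ N(0, A): the next-layer activity (bias ignored)."""
    s = math.sqrt(A)
    return gaussian_expectation(lambda v: phi(s * v) ** 2)

def iterate_activity(A0, depth, phi=math.tanh):
    """Apply the activity map `depth` times and return every intermediate value."""
    history = [A0]
    for _ in range(depth):
        history.append(chi0(history[-1], phi))
    return history

if __name__ == "__main__":
    for layer, A in enumerate(iterate_activity(1.0, 5)):
        print(layer, round(A, 4))
```

With tanh and no bias term, each step shrinks the activity because $\tanh(u)^2 < u^2$ for $u \neq 0$; other activations or an added bias variance would change the fixed point.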

  10.       (0) (0) 1   ( ) A A 0   converge x i

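  11. Dynamics of Metric: $d\tilde{\mathbf{y}} = B\, d\mathbf{y}$, $\tilde{\mathbf{e}}_a = B\, \mathbf{e}_a$, with $B = (B_{ik}) = (\varphi'(u_i)\, w_{ik})$. Then $\tilde g_{ab} = \frac{1}{n}\, \tilde{\mathbf{e}}_a \cdot \tilde{\mathbf{e}}_b = \frac{1}{n} \sum_{i,k,j} \varphi'(u_i)^2\, w_{ik} w_{ij}\, e_a^k e_b^j$. Mean field approximation: $E[\varphi'(u)^2\, w_k w_j] \approx E[\varphi'(u)^2]\, E[w_k w_j]$, with $\chi_1(A) = \int \varphi'(\sqrt{A}\,v)^2\, Dv$.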

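  12. Rotation, expansion: $\tilde g_{ab} = \chi_1(A)\, g_{ab}$, where $\chi_1(A) = E[\varphi'(u)^2]$ for $u \sim N(0, A)$. Conformal transformation! At the fixed point $\bar A$, with $\bar\chi_1 = \chi_1(\bar A)$, the metric after $l$ layers is $g^{l}_{ab} = \bar\chi_1^{\,l}\, g^{0}_{ab}$.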
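
As a rough check of the conformal scaling, one can compare the quadrature value of $E[\varphi'(u)^2]$ with the squared-length ratio produced by a single random layer. This is only a sketch under stated assumptions: tanh activation, equal layer widths, weights drawn from $N(0, 1/n)$, no bias, and a linearized displacement. The function names are hypothetical.

```python
import math
import random

def gaussian_expectation(f, half_width=8.0, steps=2001):
    """Approximate E[f(v)] for v ~ N(0, 1) with the trapezoidal rule."""
    h = 2.0 * half_width / (steps - 1)
    total = 0.0
    for k in range(steps):
        v = -half_width + k * h
        weight = 0.5 if k in (0, steps - 1) else 1.0
        total += weight * f(v) * math.exp(-0.5 * v * v)
    return total * h / math.sqrt(2.0 * math.pi)

def dtanh(u):
    """Derivative of tanh."""
    return 1.0 - math.tanh(u) ** 2

def chi1(A, dphi=dtanh):
    """E[phi'(u)^2] for u ~ N(0, A): the mean-field stretch factor."""
    s = math.sqrt(A)
    return gaussian_expectation(lambda v: dphi(s * v) ** 2)

def simulated_stretch(n, A, dphi, rng):
    """Ratio |B dx|^2 / |dx|^2 for one random layer with Jacobian B."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scale = math.sqrt(A * n / sum(xi * xi for xi in x))
    x = [scale * xi for xi in x]          # now (1/n) * sum x_i^2 == A
    dx = [rng.gauss(0.0, 1.0) for _ in range(n)]
    std = 1.0 / math.sqrt(n)              # weights w_ij ~ N(0, 1/n)
    dy2 = 0.0
    for _ in range(n):
        w = [rng.gauss(0.0, std) for _ in range(n)]
        u = sum(wk * xk for wk, xk in zip(w, x))
        dy = dphi(u) * sum(wk * dk for wk, dk in zip(w, dx))
        dy2 += dy * dy
    return dy2 / sum(d * d for d in dx)

if __name__ == "__main__":
    rng = random.Random(0)
    print("mean-field:", chi1(1.0))
    print("simulated :", simulated_stretch(400, 1.0, dtanh, rng))
```

The two numbers should agree only up to finite-width fluctuations, which shrink as `n` grows.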

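  13. Dynamics of Curvature: $H_{ab} = \nabla_a \mathbf{e}_b = \partial_a \partial_b \mathbf{y}$. After one layer, each component becomes $\varphi''(u)(\mathbf{w}\cdot\mathbf{e}_a)(\mathbf{w}\cdot\mathbf{e}_b) + \varphi'(u)\, \mathbf{w}\cdot\partial_a \mathbf{e}_b$. Decomposition $H_{ab} = H^{\perp}_{ab} + H^{\parallel}_{ab}$; squared magnitude $|H_{ab}|^2$.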

  14. $\chi_2(A) = \int \varphi''(\sqrt{A}\,v)^2\, Dv$. The squared curvature $|H^{l}_{ab}|^2$ obeys a layer-wise recursion driven by $\chi_1(A)$ and $\chi_2(A)$, with the factor $(2\delta_{ab} + 1)$: exponential expansion!

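  15. Dynamics of Distance (Amari, 1974): $D(\mathbf{x}, \mathbf{x}') = \frac{1}{n}\sum_i (x_i - x_i')^2$, $C(\mathbf{x}, \mathbf{x}') = \frac{1}{n}\sum_i x_i x_i'$, so $D = A + A' - 2C$. With $u = \sum_k w_k y_k$ and $u' = \sum_k w_k y_k'$, $(u, u') \sim N(0, V)$, $V = \begin{pmatrix} A & C \\ C & A' \end{pmatrix}$. Next-layer overlap: $\tilde C = E\big[\varphi(\sqrt{A - C}\,\varepsilon + \sqrt{C}\,\nu)\, \varphi(\sqrt{A' - C}\,\varepsilon' + \sqrt{C}\,\nu)\big]$, with $\varepsilon, \varepsilon', \nu$ independent $N(0, 1)$.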

  16. $D_{l+1} = K(D_l)$; slope near $D = 0$: $\frac{d\tilde D}{dD} = \chi_1 > 1$, where $\chi_1 = E[\varphi'(u)^2]$.
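
One step of the distance map can be sketched with nested one-dimensional Gaussian integrals. The block assumes both inputs have the same activity $A$, a non-negative overlap $C = A - D/2$, a tanh activation, and no bias; `distance_map` is a hypothetical name, and the decomposition of the correlated pair into a shared and two independent Gaussian parts is one standard way to evaluate the expectation.

```python
import math

def gaussian_expectation(f, half_width=8.0, steps=401):
    """Approximate E[f(v)] for v ~ N(0, 1) with the trapezoidal rule."""
    h = 2.0 * half_width / (steps - 1)
    total = 0.0
    for k in range(steps):
        v = -half_width + k * h
        weight = 0.5 if k in (0, steps - 1) else 1.0
        total += weight * f(v) * math.exp(-0.5 * v * v)
    return total * h / math.sqrt(2.0 * math.pi)

def distance_map(A, D, phi=math.tanh):
    """Return (A_next, D_next) for two inputs with equal activity A and distance D."""
    C = A - 0.5 * D                      # from D = A + A' - 2C with A' = A
    if not 0.0 <= C <= A:
        raise ValueError("this sketch needs 0 <= D <= 2A")
    a, c = math.sqrt(A - C), math.sqrt(C)

    def inner(nu):
        # Average over the private noise, given the shared component nu.
        return gaussian_expectation(lambda eps: phi(a * eps + c * nu))

    # The two private parts are independent, so the product factorizes.
    C_next = gaussian_expectation(lambda nu: inner(nu) ** 2)
    A_next = gaussian_expectation(lambda v: phi(math.sqrt(A) * v) ** 2)
    return A_next, 2.0 * (A_next - C_next)

if __name__ == "__main__":
    A, D = 1.0, 1.0
    for layer in range(3):
        print(layer, round(A, 4), round(D, 4))
        A, D = distance_map(A, D)
```

Because tanh is 1-Lipschitz and no weight gain is applied, this particular setting contracts distances; an expanding regime would need a larger weight variance, which is not modeled here.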

  17. Poole et al (2016) Deep neural networks

  18. Problem! $D_l = D(\mathbf{x}^{l}, \mathbf{x}'^{\,l})$, $D_{l+1} = K(D_l)$: equidistance property

  19. Shuttering Multiplicity Dynamics of recurrent net Dropout and backprop

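  20. Multilayer Perceptrons: $y = \sum_{i=1}^{m} v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x})$, input $\mathbf{x} = (x_1, x_2, \ldots, x_n)$; $f(\mathbf{x}, \theta) = \sum_i v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x})$, parameters $\theta = (\mathbf{w}_1, \ldots, \mathbf{w}_m;\, v_1, \ldots, v_m)$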

  21. Multilayer Perceptron: neuromanifold $S$ in the space of functions; $y = f(\mathbf{x}, \theta) = \sum_i v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x})$, $\theta = (\mathbf{w}_1, \ldots, \mathbf{w}_m;\, v_1, \ldots, v_m)$

  22. singularities

  23. Geometry of singular model: $y = v\, \varphi(\mathbf{w} \cdot \mathbf{x}) + n$; singular where $v\,|\mathbf{w}| = 0$ (figure axes: $\mathbf{w}$, $v$)

  24. Natural Gradient Stochastic Descent: $\theta_{t+1} = \theta_t - \eta_t\, G^{-1}(\theta_t)\, \nabla l(\mathbf{x}_t, y_t; \theta_t)$, where $G = E[\nabla l\, \nabla l^{\top}]$ is the Fisher Information Matrix; invariant; steepest descent
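
The update rule can be illustrated on a much simpler model than the perceptron of the talk. The sketch below swaps in a one-dimensional Gaussian with parameters $(\mu, \sigma)$, whose Fisher matrix is $\mathrm{diag}(1/\sigma^2,\ 2/\sigma^2)$, and uses the per-sample negative log-likelihood as the loss. The function names and the learning-rate schedule $1/(t + 10)$ are assumptions made for illustration.

```python
import math
import random

def natural_gradient_step(theta, grad, fisher, eta):
    """theta <- theta - eta * G^{-1} grad for a two-parameter model."""
    (a, b), (c, d) = fisher
    det = a * d - b * c
    # Apply the explicit 2x2 inverse of G to the gradient.
    step0 = (d * grad[0] - b * grad[1]) / det
    step1 = (-c * grad[0] + a * grad[1]) / det
    return (theta[0] - eta * step0, theta[1] - eta * step1)

def fit_gaussian(samples, theta=(0.0, 1.0)):
    """Online natural-gradient fit of (mu, sigma) to a stream of samples."""
    for t, x in enumerate(samples):
        mu, sigma = theta
        # Gradient of l = log(sigma) + (x - mu)^2 / (2 sigma^2).
        grad = (-(x - mu) / sigma ** 2,
                1.0 / sigma - (x - mu) ** 2 / sigma ** 3)
        # Fisher information of N(mu, sigma^2) in (mu, sigma) coordinates.
        fisher = ((1.0 / sigma ** 2, 0.0), (0.0, 2.0 / sigma ** 2))
        theta = natural_gradient_step(theta, grad, fisher, eta=1.0 / (t + 10))
    return theta

if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(3.0, 2.0) for _ in range(20000)]
    print(fit_gaussian(data))
```

Multiplying by $G^{-1}$ removes the dependence of the step on how the parameters are scaled, which is the invariance the slide refers to.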

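  25. Model with 2 hidden neurons: $f(\mathbf{x}, \theta) = w_1\, \varphi(\mathbf{J}_1 \cdot \mathbf{x}) + w_2\, \varphi(\mathbf{J}_2 \cdot \mathbf{x})$, $y = f(\mathbf{x}, \theta)$, with $\varphi(u) = \sqrt{\frac{2}{\pi}} \int_0^{u} e^{-t^2/2}\, dt$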

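  26. Singular Region in Parameter Space: $R = R_1 \cup R_2 \cup R_3$, with $R_1 = \{\mathbf{J}_1 = \mathbf{J}_2 = \mathbf{J},\ w_1 + w_2 = w\}$, $R_2 = \{w_1 = 0,\ w_2 = w,\ \mathbf{J}_2 = \mathbf{J}\}$, $R_3 = \{w_2 = 0,\ w_1 = w,\ \mathbf{J}_1 = \mathbf{J}\}$, for $f(\mathbf{x}, \theta) = w_1\, \varphi(\mathbf{J}_1 \cdot \mathbf{x}) + w_2\, \varphi(\mathbf{J}_2 \cdot \mathbf{x})$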

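  27. Coordinate transformation: $\mathbf{v} = \frac{w_1 \mathbf{J}_1 + w_2 \mathbf{J}_2}{w_1 + w_2}$, $w = w_1 + w_2$, $\mathbf{u} = \mathbf{J}_2 - \mathbf{J}_1$, $z = \frac{w_2 - w_1}{w_1 + w_2}$; new coordinates $\xi = (\mathbf{v}, w, \mathbf{u}, z)$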
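
The change of coordinates and its inverse can be sketched in a few lines. The sign convention $z = (w_2 - w_1)/(w_1 + w_2)$ is an assumption read off the slide's subscript order, and the function names are hypothetical; weight vectors are plain Python lists.

```python
def to_singular_coords(w1, w2, J1, J2):
    """Map (w1, w2, J1, J2) to (v, w, u, z); requires w1 + w2 != 0."""
    w = w1 + w2
    if w == 0:
        raise ValueError("w1 + w2 must be nonzero")
    v = [(w1 * a + w2 * b) / w for a, b in zip(J1, J2)]  # weighted mean of J1, J2
    u = [b - a for a, b in zip(J1, J2)]                  # difference J2 - J1
    z = (w2 - w1) / w                                    # assumed sign convention
    return v, w, u, z

def from_singular_coords(v, w, u, z):
    """Inverse map, recovering (w1, w2, J1, J2)."""
    w1 = 0.5 * w * (1.0 - z)
    w2 = 0.5 * w * (1.0 + z)
    J1 = [vi - 0.5 * (1.0 + z) * ui for vi, ui in zip(v, u)]
    J2 = [vi + 0.5 * (1.0 - z) * ui for vi, ui in zip(v, u)]
    return w1, w2, J1, J2

if __name__ == "__main__":
    coords = to_singular_coords(0.3, 0.9, [1.0, -2.0], [0.5, 4.0])
    print(coords)
    print(from_singular_coords(*coords))
```

In these coordinates the cases $\mathbf{J}_1 = \mathbf{J}_2$, $w_1 = 0$, and $w_2 = 0$ become $\mathbf{u} = 0$, $z = 1$, and $z = -1$ respectively.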

  28. Singular Region: $R = \{\xi \mid \mathbf{u} = 0\} \cup \{\xi \mid z = \pm 1\}$

  29. Milnor attractor

  30. Topology of singular R      : = e blow-down coordinates , ,       2 2 u 1 , c z u u 1      2 3 1 , c z z u 2 u    e e , 1 S n u

  31. Dynamic vector fields: Redundant case
