Statistical Neurodynamics of Deep Networks
Shun-ichi Amari
RIKEN Brain Science Institute
Statistical Neurodynamics
Rozonoer (1969)
Amari (1971; 197
Amari et al (2013)
Toyoizumi et al (2015)
Poole, …, Ganguli (2016)
$w_{ij} \sim N(0, 1)$
Macroscopic behaviors common to almost all (typical) networks
Macroscopic variables
activity: $A = \frac{1}{n} \sum_i x_i^2$
distance: $D = D[\mathbf{x} : \mathbf{x}']$
curvature
$A_{l+1} = F(A_l)$
$D_{l+1} = K(D_l)$
Deep Networks
$x_i^{l+1} = \varphi\left( \sum_j w_{ij}^{l} x_j^{l} + w_{0i}^{l} \right)$
$w_{ij}^{l} \sim N(0, 1/n)$
$A_l = \frac{1}{n} \sum_i (x_i^{l})^2$
$A_{l+1} = F(A_l)$
$\varphi'(0) = \text{const}$
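Pullback Metric
$ds^2 = \frac{1}{n_l}\, d\mathbf{x}^l \cdot d\mathbf{x}^l = \sum_{a,b} g^l_{ab}\, dx^a dx^b$
$g^l_{ab} = \frac{1}{n_l}\, \mathbf{e}^l_a \cdot \mathbf{e}^l_b$
$n_l$, $n_{l+1}$: numbers of units in layers $l$ and $l+1$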
Poole et al (2016) Deep neural networks
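Dynamics of Activity
$\tilde{y}_k = \varphi(u_k)$, $u_k = \sum_j w_{kj} y_j$, $u_k \sim N(0, A)$
$\tilde{A} = \frac{1}{n} \sum_k \tilde{y}_k^2 = E[\varphi(u_k)^2] = \chi_0(A)$
$\chi_0(A) = \int \varphi(\sqrt{A}\, v)^2\, Dv$, $v \sim N(0, 1)$
$A_{l+1} = \chi_0(A_l)$
$A_l$ converges to a fixed point $\bar{A} = \chi_0(\bar{A})$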
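The activity recursion $A_{l+1} = \chi_0(A_l)$, with $\chi_0(A) = E[\varphi(\sqrt{A}\, v)^2]$ and $v \sim N(0, 1)$, can be iterated numerically. The following is a minimal sketch, not the method used in the slides: it assumes $\varphi = \tanh$, evaluates the Gaussian integral with a plain trapezoidal rule, and introduces a hypothetical `gain` parameter (a multiplier on the weight variance, with `gain=1` matching $w_{ij} \sim N(0, 1/n)$) so that a nonzero fixed point can also be shown.

```python
import math

def gauss_expect(f, n=1601, lim=8.0):
    """Trapezoidal estimate of E[f(v)] for v ~ N(0, 1) on [-lim, lim]."""
    h = 2.0 * lim / (n - 1)
    total = 0.0
    for i in range(n):
        v = -lim + i * h
        edge = 0.5 if i in (0, n - 1) else 1.0
        total += edge * f(v) * math.exp(-0.5 * v * v)
    return total * h / math.sqrt(2.0 * math.pi)

def chi0(A, gain=1.0, phi=math.tanh):
    """chi_0(A) = E[phi(sqrt(gain * A) v)^2]; gain is an assumed extra parameter."""
    s = math.sqrt(gain * A)
    return gauss_expect(lambda v: phi(s * v) ** 2)

def iterate_activity(A0, depth, gain=1.0):
    """Return [A_0, A_1, ..., A_depth] under A_{l+1} = chi_0(A_l)."""
    traj = [A0]
    for _ in range(depth):
        traj.append(chi0(traj[-1], gain))
    return traj

unit_gain = iterate_activity(1.0, 30)            # tanh, gain 1: activity keeps shrinking
high_gain = iterate_activity(1.0, 30, gain=4.0)  # gain 4: settles at a nonzero fixed point
print(unit_gain[-1], high_gain[-1])
```

With `gain=1` and tanh the activity decreases at every layer because $\tanh(u)^2 < u^2$; with the assumed larger gain the sequence settles quickly at a fixed point $\bar{A} > 0$.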
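Dynamics of Metric
$d\tilde{y}_k = \sum_j B_{kj}\, dy_j$, $\tilde{\mathbf{e}}_a = B\, \mathbf{e}_a$
$B_{kj} = \varphi'(u_k)\, w_{kj}$
$\tilde{g}_{ab} = \frac{1}{n}\, (B \mathbf{e}_a) \cdot (B \mathbf{e}_b)$
mean field approximation: $E[\varphi'(u_k)^2\, w_{kj} w_{kj'}] \approx E[\varphi'(u_k)^2]\, E[w_{kj} w_{kj'}]$
$\chi_1(A) = \int \varphi'(\sqrt{A}\, v)^2\, Dv$
rotation, expansion
$\tilde{g}_{ab} = \chi_1(A)\, g_{ab}$: conformal transformation!
$g^{l+1}_{ab} = \chi_1(A_l)\, g^{l}_{ab}$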
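The conformal scaling $\tilde{g}_{ab} = \chi_1(A)\, g_{ab}$ can be checked against a single random layer. The sketch below is illustrative only and rests on stated assumptions: $\varphi = \tanh$, no bias, weights $w_{kj} \sim N(0, 1/n)$, and a hypothetical helper `empirical_scale` that pushes one tangent vector $\mathbf{e}$ through the Jacobian $B_{kj} = \varphi'(u_k) w_{kj}$ and returns $|B\mathbf{e}|^2 / |\mathbf{e}|^2$.

```python
import math
import random

def chi1(A, n=1601, lim=8.0):
    """chi_1(A) = E[phi'(sqrt(A) v)^2] for phi = tanh, v ~ N(0, 1) (trapezoidal rule)."""
    h = 2.0 * lim / (n - 1)
    s = math.sqrt(A)
    total = 0.0
    for i in range(n):
        v = -lim + i * h
        dphi = 1.0 - math.tanh(s * v) ** 2
        total += dphi ** 2 * math.exp(-0.5 * v * v)  # end-point weights are negligible
    return total * h / math.sqrt(2.0 * math.pi)

def empirical_scale(A, n=400, seed=0):
    """|B e|^2 / |e|^2 for one random layer whose input has activity A."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(t * t for t in y) / n)
    y = [math.sqrt(A) * t / norm for t in y]       # now (1/n) sum y_j^2 == A
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]    # arbitrary tangent vector
    std = 1.0 / math.sqrt(n)                       # w_kj ~ N(0, 1/n)
    num = 0.0
    for _ in range(n):                             # one output unit k per pass
        row = [rng.gauss(0.0, std) for _ in range(n)]
        u = sum(w * t for w, t in zip(row, y))
        be = (1.0 - math.tanh(u) ** 2) * sum(w * t for w, t in zip(row, e))
        num += be * be
    return num / sum(t * t for t in e)

predicted = chi1(1.0)
measured = empirical_scale(1.0)
print(predicted, measured)
```

The two numbers agree only up to finite-width fluctuations, which shrink as the assumed width `n` grows.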
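Dynamics of Curvature
$H_{ab} = \nabla_a \mathbf{e}_b = \partial_a \partial_b \mathbf{y}$
$\tilde{H}_{ab} = \varphi''(u)\, (\mathbf{w} \cdot \mathbf{e}_a)(\mathbf{w} \cdot \mathbf{e}_b) + \varphi'(u)\, \mathbf{w} \cdot \partial_a \mathbf{e}_b$
$|H_{ab}|^2$: squared magnitude of $H_{ab}$
$\chi_2(A) = \int \varphi''(\sqrt{A}\, v)^2\, Dv$
$|H^{l}_{ab}|^2$ obeys a layer-wise recursion with coefficients $\chi_2(A_l)$, $(2\delta_{ab} + 1)$ and $\chi_1(A_l)$
exponential expansion!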
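Dynamics of Distance (Amari, 1974)
$D(\mathbf{x}, \mathbf{x}') = \frac{1}{n} \sum_i (x_i - x_i')^2$
$C(\mathbf{x}, \mathbf{x}') = \frac{1}{n} \sum_i x_i x_i'$
$D = A + A' - 2C$
$u_k = \sum_j w_{kj} y_j$, $u_k' = \sum_j w_{kj} y_j'$, $(u_k, u_k') \sim N(0, V)$
$V = \begin{pmatrix} A & C \\ C & A' \end{pmatrix}$
$\tilde{C} = E[\varphi(\sqrt{A - C}\, \varepsilon + \sqrt{C}\, \nu)\, \varphi(\sqrt{A' - C}\, \varepsilon' + \sqrt{C}\, \nu)]$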
$D_{l+1} = K(D_l)$
$\frac{dD_{l+1}}{dD_l} > 1$
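The distance map $D_{l+1} = K(D_l)$ follows from the overlap $C$ through $D = A + A' - 2C$. The sketch below iterates it under assumptions that are not in the slides: $\varphi = \tanh$, no bias, both inputs already at the same fixed-point activity $\bar{A}$, and a hypothetical weight-variance multiplier `GAIN = 9` chosen so that $\bar{A} > 0$. The pair $(u, u')$ is written as $u = \sqrt{C}\, \nu + \sqrt{A - C}\, \varepsilon$ and $u' = \sqrt{C}\, \nu + \sqrt{A - C}\, \varepsilon'$ with independent standard normals, so the two-dimensional expectation reduces to nested one-dimensional sums.

```python
import math

H = 0.15
NODES = [-6.0 + H * i for i in range(81)]
WEIGHTS = [H * math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi) for v in NODES]

def chi0(A, gain):
    """E[tanh(sqrt(gain * A) v)^2] on the fixed quadrature grid."""
    s = math.sqrt(gain * A)
    return sum(w * math.tanh(s * v) ** 2 for v, w in zip(NODES, WEIGHTS))

def next_overlap(C, A, gain):
    """E[phi(u) phi(u')] for Var u = Var u' = gain * A and Cov(u, u') = gain * C."""
    a = math.sqrt(gain * max(C, 0.0))
    b = math.sqrt(gain * max(A - C, 0.0))
    total = 0.0
    for nu, w_nu in zip(NODES, WEIGHTS):
        # Inner expectation over eps; eps and eps' are i.i.d., so it is squared.
        inner = sum(w * math.tanh(a * nu + b * eps) for eps, w in zip(NODES, WEIGHTS))
        total += w_nu * inner * inner
    return total

def distance_map(D, A, gain):
    """K(D) for two inputs of equal activity A, using D = 2A - 2C."""
    return 2.0 * A - 2.0 * next_overlap(A - 0.5 * D, A, gain)

GAIN = 9.0  # assumed value, not taken from the slides
A_bar = 1.0
for _ in range(60):
    A_bar = chi0(A_bar, GAIN)

finals = []
for D0 in (0.01, 1.0):  # a nearby pair and a distant pair
    D = D0
    for _ in range(100):
        D = distance_map(D, A_bar, GAIN)
    finals.append(D)
print(A_bar, finals)
```

Under these assumptions both starting distances end at the same value, so after many layers every pair of distinct inputs is roughly equally far apart.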
Poole et al (2016) Deep neural networks
Problem!
$D(\mathbf{x}^l, \mathbf{x}'^l) \to \bar{D}$, where $\bar{D} = K(\bar{D})$
equidistance property
Shuttering Multiplicity Dynamics of recurrent net Dropout and backprop
Multilayer Perceptrons
$\mathbf{x} = (x_1, x_2, \ldots, x_n)$
$y = f(\mathbf{x}, \theta) = \sum_i v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x})$
$\theta = (\mathbf{w}_1, \ldots, \mathbf{w}_m; v_1, \ldots, v_m)$
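Multilayer Perceptron
neuromanifold $S$: space of functions $f(\mathbf{x}, \theta)$
$y = f(\mathbf{x}, \theta) = \sum_i v_i\, \varphi(\mathbf{w}_i \cdot \mathbf{x})$
$\theta = (\mathbf{w}_1, \ldots, \mathbf{w}_m; v_1, \ldots, v_m)$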
singularities
Geometry of singular model
$y = v\, \varphi(\mathbf{w} \cdot \mathbf{x}) + n$
singular where $v\, |\mathbf{w}| = 0$
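Natural Gradient Stochastic Descent
$\theta_{t+1} = \theta_t - \eta_t\, G^{-1}(\theta_t)\, \nabla l(\mathbf{x}_t, y_t; \theta_t)$
$G$: Fisher Information Matrix
$\tilde{\nabla} l = G^{-1} \nabla l$: invariant; steepest descent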
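A natural gradient step, $\theta \leftarrow \theta - \eta\, G^{-1} \nabla l$, can be sketched on a one-hidden-unit model. Everything specific here is an assumption made for illustration: the model $f(x) = v \tanh(w x)$ with scalar input, unit-variance Gaussian output noise (so that $G = E[\nabla f\, \nabla f^{\top}]$), a small `damping` term added to the diagonal of $G$, and full-batch updates on a fixed noise-free data set instead of stochastic samples so the run is reproducible.

```python
import math
import random

def model(x, v, w):
    return v * math.tanh(w * x)

def loss(data, v, w):
    return sum(0.5 * (model(x, v, w) - y) ** 2 for x, y in data) / len(data)

def natural_gradient_step(data, v, w, eta=0.5, damping=1e-6):
    """One update theta <- theta - eta * G^{-1} grad l for theta = (v, w)."""
    g_v = g_w = 0.0
    G_vv = G_vw = G_ww = 0.0
    for x, y in data:
        h = math.tanh(w * x)
        f_v = h                        # df/dv
        f_w = v * (1.0 - h * h) * x    # df/dw
        r = v * h - y                  # residual f(x) - y
        g_v += r * f_v
        g_w += r * f_w
        G_vv += f_v * f_v
        G_vw += f_v * f_w
        G_ww += f_w * f_w
    n = len(data)
    g_v, g_w = g_v / n, g_w / n
    G_vv, G_vw, G_ww = G_vv / n + damping, G_vw / n, G_ww / n + damping
    det = G_vv * G_ww - G_vw * G_vw
    # Solve the 2x2 system G d = g in closed form.
    d_v = (G_ww * g_v - G_vw * g_w) / det
    d_w = (G_vv * g_w - G_vw * g_v) / det
    return v - eta * d_v, w - eta * d_w

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
data = [(x, model(x, 1.0, 1.0)) for x in xs]  # teacher with v = w = 1

v, w = 0.6, 1.5
initial = loss(data, v, w)
for _ in range(50):
    v, w = natural_gradient_step(data, v, w)
final = loss(data, v, w)
print(initial, final, v, w)
```

The damping term matters near $v\,|w| = 0$, where the gradients with respect to $v$ and $w$ become nearly collinear and $G$ approaches a singular matrix.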
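model: 2 hidden neurons
$f(\mathbf{x}, \theta) = w_1 \varphi(\mathbf{J}_1 \cdot \mathbf{x}) + w_2 \varphi(\mathbf{J}_2 \cdot \mathbf{x})$
$y = f(\mathbf{x}, \theta) + \varepsilon$
$\varphi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{u} e^{-t^2/2}\, dt$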
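Singular Region in Parameter Space
$R = \{\mathbf{J}_1 = \mathbf{J}_2 = \mathbf{J},\ w_1 + w_2 = w\} \cup \{w_1 = 0,\ \mathbf{J}_2 = \mathbf{J},\ w_2 = w\} \cup \{\mathbf{J}_1 = \mathbf{J},\ w_2 = 0,\ w_1 = w\}$
$f(\mathbf{x}, \theta) = w_1 \varphi(\mathbf{J}_1 \cdot \mathbf{x}) + w_2 \varphi(\mathbf{J}_2 \cdot \mathbf{x})$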
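On the singular region, different parameter values realize the same single-unit function $w\, \varphi(\mathbf{J} \cdot \mathbf{x})$. The short check below takes one parameter setting from each of the three parts of $R$ and compares the outputs. The numeric values and the use of $\tanh$ as $\varphi$ are arbitrary choices for illustration; the identity itself holds for any $\varphi$.

```python
import math

def f(x, J1, J2, w1, w2, phi=math.tanh):
    """Two-hidden-unit model f(x) = w1 phi(J1 . x) + w2 phi(J2 . x)."""
    def dot(J):
        return sum(j * t for j, t in zip(J, x))
    return w1 * phi(dot(J1)) + w2 * phi(dot(J2))

J = [0.7, -1.2]
w = 1.5
other = [3.0, 0.4]  # free direction of the unit whose output weight is zero

settings = [
    (J, J, 0.4, w - 0.4),  # J1 = J2 = J, w1 + w2 = w
    (other, J, 0.0, w),    # w1 = 0, J2 = J, w2 = w
    (J, other, w, 0.0),    # w2 = 0, J1 = J, w1 = w
]
xs = [[0.3, 1.0], [-2.0, 0.5], [1.1, -0.9]]
outputs = [[f(x, *s) for x in xs] for s in settings]
reference = [w * math.tanh(sum(j * t for j, t in zip(J, x))) for x in xs]
print(outputs)
```

Because whole families of parameters map to one function, the parameters are not identifiable on $R$.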
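Coordinate transformation
$\mathbf{v} = \frac{w_1 \mathbf{J}_1 + w_2 \mathbf{J}_2}{w_1 + w_2}$
$w = w_1 + w_2$
$\mathbf{u} = \mathbf{J}_2 - \mathbf{J}_1$
$z = \frac{w_2 - w_1}{w_1 + w_2}$
new coordinates: $(\mathbf{v}, \mathbf{u}, w, z)$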
Singular Region
$R = \{\mathbf{u} = 0\} \cup \{z = \pm 1\}$
Milnor attractor
Topology of singular R
blow-down coordinates $c_1$, $c_2$, defined from $z$ and $u$
$\mathbf{u} = u\, \mathbf{e}$, $\mathbf{e} \in S^{n-1}$
Dynamic vector fields: Redundant case