
About me : ENS -> MVA -> FAIR (engineer) -> FAIR (PhD 3rd year)

About me : ENS -> MVA -> FAIR (engineer) -> FAIR (PhD 3rd year). About my PhD : interested in sign matrices and tensors (graphs / multi-graphs); observe a few entries, predict the remaining edges; factorization.


1. Learning / Gradient Descent. Back to minimizing. Problem : how do we compute the gradient $\nabla_\theta \frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i; \theta), y_i)$ ? Large $n$ -> Stochastic Gradient Descent. Complicated function (neural net) -> BackProp.

2. Learning / Gradient Descent. Back to minimizing. Problem : how do we compute the gradient $\nabla_\theta \frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i; \theta), y_i)$ ? Large $n$ -> Stochastic Gradient Descent.

3. Learning / Stochastic Gradient Descent. Killing $n$ : $\nabla_\theta \frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i; \theta), y_i) = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta \ell(f(x_i; \theta), y_i) \approx \nabla_\theta \ell(f(x_j; \theta), y_j)$ (one function).

4. Learning / Stochastic Gradient Descent. Killing $n$ : $\nabla_\theta \frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i; \theta), y_i)$ (the gradient of the average) $= \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta \ell(f(x_i; \theta), y_i) \approx \nabla_\theta \ell(f(x_j; \theta), y_j)$.

5. Learning / Stochastic Gradient Descent. Killing $n$ : $\nabla_\theta \frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i; \theta), y_i)$ (the gradient of the average) $= \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta \ell(f(x_i; \theta), y_i)$ (the average of the gradients) $\approx \nabla_\theta \ell(f(x_j; \theta), y_j)$.

6. Learning / Stochastic Gradient Descent. Killing $n$ : $\nabla_\theta \frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i; \theta), y_i)$ (the gradient of the average) $= \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta \ell(f(x_i; \theta), y_i)$ (the average of the gradients) $\approx \nabla_\theta \ell(f(x_j; \theta), y_j)$ (in expectation, for uniform $j$).
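
A quick numeric check of the identity on slides 3-6. This is a sketch assuming a simple squared loss $\ell(f(x;\theta), y) = \tfrac{1}{2}(\theta^\top x - y)^2$ (an illustrative choice, not taken from the slides): the gradient of the average equals the average of the per-example gradients, and the gradient at a uniformly sampled example $j$ is an unbiased, though noisy, estimate of it.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))          # n examples x_i
y = rng.normal(size=n)               # targets y_i
theta = rng.normal(size=d)           # current parameters

# Per-example gradient of the assumed squared loss 1/2 (theta^T x - y)^2
def grad(theta, x_i, y_i):
    return (x_i @ theta - y_i) * x_i

# Gradient of the average == average of the per-example gradients
full_grad = X.T @ (X @ theta - y) / n
avg_of_grads = np.mean([grad(theta, X[i], y[i]) for i in range(n)], axis=0)
print(np.allclose(full_grad, avg_of_grads))   # True

# A single uniformly sampled gradient: unbiased but noisy estimate of full_grad
j = rng.integers(n)
print(grad(theta, X[j], y[j]))
```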

7. Learning / Stochastic Gradient Descent. Killing $n$. For some number of iterations : pick some random example $(x_j, y_j)$ and take a gradient step $\theta_{n+1} \leftarrow \theta_n - \eta\, \nabla_\theta \ell(f(x_j; \theta_n), y_j)$, with learning rate $\eta$.
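
The update rule on slide 7 written out as a minimal SGD loop. Again a sketch under the same assumed squared loss; the learning rate and iteration count below are placeholder values, not the presenter's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
true_theta = rng.normal(size=d)
y = X @ true_theta + 0.01 * rng.normal(size=n)

theta = np.zeros(d)
eta = 0.01                                   # learning rate (assumed value)

for step in range(5000):                     # "for some number of iterations"
    j = rng.integers(n)                      # pick some random example (x_j, y_j)
    grad_j = (X[j] @ theta - y[j]) * X[j]    # grad_theta of l(f(x_j; theta), y_j)
    theta = theta - eta * grad_j             # gradient step

print(np.linalg.norm(theta - true_theta))    # should be close to zero
```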

8. Learning / Back Propagation. Computing the gradient. Problem : how do we compute the gradient $\nabla_\theta \frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i; \theta), y_i)$ ? Complicated function (neural net) -> BackProp.

9. Learning / BackProp. Computing the gradient. Problem : how do we compute the gradient ? Hidden layer $i$ : $f_i(x) = \sigma(A_i x + b_i)$.

10. Learning / BackProp. Computing the gradient. Problem : how do we compute the gradient ? Hidden layer $i$ : $f_i(x) = \sigma(A_i x + b_i)$. Complete neural network : $f = f_h(f_{h-1}(f_{h-2}(\ldots))) = (f_h \circ f_{h-1} \circ \ldots \circ f_1)(x)$.

11. Learning / BackProp. Computing the gradient. Problem : how do we compute the gradient ? Hidden layer $i$ : $f_i(x) = \sigma(A_i x + b_i)$. Complete neural network : $f = f_h(f_{h-1}(f_{h-2}(\ldots))) = (f_h \circ f_{h-1} \circ \ldots \circ f_1)(x)$. So how do we get $\nabla_\theta \ell(f(x_i; \theta), y_i)$ ??
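
A minimal sketch of slides 9-11: hidden layers $f_i(x) = \sigma(A_i x + b_i)$ composed into $f = f_h \circ \ldots \circ f_1$, assuming $\sigma = \tanh$ and random weights purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.tanh                                  # assumed choice of nonlinearity
dims = [4, 8, 8, 3]                              # input dim, two hidden dims, output dim

# Hidden layer i: f_i(x) = sigma(A_i x + b_i), stored as (A_i, b_i)
layers = [(rng.normal(size=(dims[i + 1], dims[i])), np.zeros(dims[i + 1]))
          for i in range(len(dims) - 1)]

def forward(x, layers):
    # Complete network: f = f_h o f_{h-1} o ... o f_1
    for A, b in layers:
        x = sigma(A @ x + b)
    return x

x = rng.normal(size=dims[0])
print(forward(x, layers))
```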

12. Learning / BackProp. Computing the gradient. Chain-rule : $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial x}$.

13. Learning / BackProp. Computing the gradient. The network as a chain : $x \to y_1(x) \to y_2(y_1) \to \ldots \to y_{h-2}(y_{h-3}) \to y_{h-1}(y_{h-2}) \to y_h(y_{h-1}) \to \ell(y_h)$, with parameters $\theta_1, \theta_2, \ldots, \theta_{h-2}, \theta_{h-1}, \theta_h$ attached to the corresponding layers.

14. Learning / BackProp. Computing the gradient $\nabla_{\theta_h} \ell(y_h)$ for the last layer, with parameters $\theta_h$.

15. Learning / BackProp. Computing the gradient $\nabla_{\theta_h} \ell(y_h)$.

16. Learning / BackProp. Computing the gradient $\nabla_{\theta_h} \ell(y_h)$. Chain-rule : $\frac{\partial \ell(y_h)}{\partial \theta_{h,i}} = \frac{\partial \ell(y_h)}{\partial y_h}\,\frac{\partial y_h}{\partial \theta_{h,i}}$.

17. Learning / BackProp. Computing the gradient $\nabla_{\theta_h} \ell(y_h)$ : $\frac{\partial \ell(y_h)}{\partial \theta_{h,i}} = \frac{\partial \ell(y_h)}{\partial y_h}\,\frac{\partial y_h}{\partial \theta_{h,i}}$. The first factor $\frac{\partial \ell(y_h)}{\partial y_h}$ doesn't depend on the current layer; it only depends on $\ell$.

18. Learning / BackProp. Computing the gradient $\nabla_{\theta_h} \ell(y_h)$ : $\frac{\partial \ell(y_h)}{\partial \theta_{h,i}} = \frac{\partial \ell(y_h)}{\partial y_h}\,\frac{\partial y_h}{\partial \theta_{h,i}}$. The second factor $\frac{\partial y_h}{\partial \theta_{h,i}}$ only depends on the current layer.
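
Slides 16-18 made concrete for one assumed case (a last layer $y_h = \sigma(A_h y_{h-1} + b_h)$ with $\sigma = \tanh$ and squared loss $\ell(y_h) = \tfrac{1}{2}\lVert y_h - t\rVert^2$, chosen only for illustration): the first chain-rule factor comes from the loss alone, the second from the current layer alone.

```python
import numpy as np

rng = np.random.default_rng(0)
y_prev = rng.normal(size=8)                 # y_{h-1}, output of the previous layer
A_h, b_h = rng.normal(size=(3, 8)), np.zeros(3)
t = rng.normal(size=3)                      # target, so l(y_h) = 1/2 ||y_h - t||^2

z = A_h @ y_prev + b_h
y_h = np.tanh(z)                            # y_h = sigma(A_h y_{h-1} + b_h)

dl_dy = y_h - t                             # dl/dy_h : depends only on the loss
dy_dz = 1 - y_h ** 2                        # tanh'(z) : depends only on this layer

# Chain rule: dl/dA_h[i, j] = (dl/dy_h[i]) * sigma'(z[i]) * y_prev[j]
grad_A = np.outer(dl_dy * dy_dz, y_prev)
grad_b = dl_dy * dy_dz
```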

19. Learning / BackProp. Computing the gradient $\nabla_{\theta_{h-1}} \ell(y_h)$ for layer $h-1$ : $\nabla_{\theta_{h-1}} \ell(y_h) = \Phi_{h-1}(\theta_{h-1}, \nabla_{y_{h-1}} \ell(y_h))$.

20. Learning / BackProp. Computing the gradient $\nabla_{\theta_{h-1}} \ell(y_h) = \Phi_{h-1}(\theta_{h-1}, \nabla_{y_{h-1}} \ell(y_h))$, where $\Phi_{h-1}$ depends on the current layer's structure.

21. Learning / BackProp. Computing the gradient $\nabla_{\theta_{h-1}} \ell(y_h) = \Phi_{h-1}(\theta_{h-1}, \nabla_{y_{h-1}} \ell(y_h))$ : $\theta_{h-1}$ is known, and $\Phi_{h-1}$ depends on the current layer's structure.

22. Learning / BackProp. Computing the gradient $\nabla_{\theta_{h-1}} \ell(y_h) = \Phi_{h-1}(\theta_{h-1}, \nabla_{y_{h-1}} \ell(y_h))$ : $\theta_{h-1}$ is known, $\nabla_{y_{h-1}} \ell(y_h)$ has already been computed, and $\Phi_{h-1}$ depends on the current layer's structure.

23. Learning / BackProp. It's backwards ! Start from $\nabla_{y_h} \ell(y_h)$.

24. Learning / BackProp. It's backwards ! Start from $\nabla_{y_h} \ell(y_h)$, then compute $\nabla_{\theta_h} \ell(y_h)$, and continue down the layers.
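
Putting slides 19-24 together, a minimal manual backward pass, again a sketch assuming $\tanh$ layers and a squared loss: compute $\nabla_{y_h}\ell$ first, then walk back layer by layer, each step turning the already-computed upstream gradient into this layer's parameter gradient (the $\Phi$ of slide 19) and into the gradient passed to the layer below.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, dsigma = np.tanh, lambda y: 1 - y ** 2    # tanh and its derivative (in terms of the output)
dims = [4, 8, 8, 3]
layers = [(rng.normal(size=(dims[i + 1], dims[i])), np.zeros(dims[i + 1]))
          for i in range(len(dims) - 1)]

x, t = rng.normal(size=dims[0]), rng.normal(size=dims[-1])

# Forward pass, keeping every intermediate y_i (needed by the backward pass)
ys = [x]
for A, b in layers:
    ys.append(sigma(A @ ys[-1] + b))

# Backward pass: start from grad_y = dl/dy_h, then go layer by layer, backwards
grad_y = ys[-1] - t                              # nabla_{y_h} l for l = 1/2 ||y_h - t||^2
grads = []
for (A, b), y_in, y_out in zip(reversed(layers), reversed(ys[:-1]), reversed(ys[1:])):
    delta = grad_y * dsigma(y_out)               # dl/dz for this layer
    grads.append((np.outer(delta, y_in), delta)) # (nabla_A l, nabla_b l): this layer's Phi
    grad_y = A.T @ delta                         # nabla_{y_{i-1}} l, reused by the next step

grads.reverse()                                  # parameter gradients, ordered layer 1 .. h
```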
