Learning From Data, Lecture 21
Neural Networks: Backpropagation

Forward propagation: algorithmic computation of $h(\mathbf{x})$
Backpropagation: algorithmic computation of $\dfrac{\partial e(\mathbf{x})}{\partial\,\text{weights}}$

M. Magdon-Ismail, CSCI 4100/6100
Recap: The Neural Network

[Figure: the neural network model, biology translated to engineering. Inputs $x_1, \dots, x_d$ feed the input layer ($\ell = 0$); signals pass through $\theta$ nodes in the hidden layers ($0 < \ell < L$); the output layer ($\ell = L$) produces $h(\mathbf{x})$.]
Zooming into a Hidden Node

[Figure: a node in layer $\ell$, receiving the signal $s^{(\ell)}$ through the "weights in" $W^{(\ell)}$ from layer $\ell-1$, producing the output $x^{(\ell)}$ via $\theta$, and feeding layer $\ell+1$ through the "weights out" $W^{(\ell+1)}$.]

Layers $\ell = 0, 1, 2, \dots, L$. Layer $\ell$ has "dimension" $d^{(\ell)}$, hence $d^{(\ell)} + 1$ nodes (including the bias node).

Layer $\ell$ parameters:
$s^{(\ell)}$ (signals in): $d^{(\ell)}$-dimensional input vector
$x^{(\ell)}$ (outputs): $(d^{(\ell)} + 1)$-dimensional output vector
$W^{(\ell)}$ (weights in): $(d^{(\ell-1)} + 1) \times d^{(\ell)}$ matrix, $W^{(\ell)} = \big[\, \mathbf{w}^{(\ell)}_1 \ \mathbf{w}^{(\ell)}_2 \ \cdots \ \mathbf{w}^{(\ell)}_{d^{(\ell)}} \,\big]$
$W^{(\ell+1)}$ (weights out): $(d^{(\ell)} + 1) \times d^{(\ell+1)}$ matrix

$\mathrm{W} = \{W^{(1)}, W^{(2)}, \dots, W^{(L)}\}$ specifies the network.
The Linear Signal

The input $s^{(\ell)}$ to layer $\ell$ is a linear combination (using the weights) of the outputs $x^{(\ell-1)}$ of the previous layer:

$$s^{(\ell)} = (W^{(\ell)})^{\mathrm{t}}\, x^{(\ell-1)}$$

Componentwise,

$$s^{(\ell)}_j = (\mathbf{w}^{(\ell)}_j)^{\mathrm{t}}\, x^{(\ell-1)}, \qquad j = 1, \dots, d^{(\ell)}$$

(recall the linear signal $s = \mathbf{w}^{\mathrm{t}}\mathbf{x}$). The layer then applies the transformation $\theta$: $s^{(\ell)} \xrightarrow{\ \theta\ } x^{(\ell)}$.
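As a concrete sketch (not from the slides), here is one layer's linear signal and output in numpy; the dimensions and numbers are made up for illustration, and the bias convention follows the slide ($x^{(\ell-1)}_0 = 1$, with a bias prepended to the output).

```python
import numpy as np

# One layer with d^(l-1) = 2 inputs and d^(l) = 3 outputs.
# W^(l) has shape (d^(l-1)+1) x d^(l): one row per node of layer l-1, including the bias row.
W = np.array([[ 0.1, -0.2,  0.3],   # bias row (multiplies x_0 = 1)
              [ 0.5,  0.4, -0.1],   # weights from x_1
              [-0.3,  0.2,  0.6]])  # weights from x_2

x_prev = np.array([1.0, 0.7, -1.2])            # x^(l-1), bias 1 prepended
s = W.T @ x_prev                               # s^(l) = (W^(l))^t x^(l-1)
x_next = np.concatenate(([1.0], np.tanh(s)))   # x^(l) = [1; theta(s^(l))]
print(s)        # the three signals s^(l)_j = (w^(l)_j)^t x^(l-1)
print(x_next)
```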
Forward Propagation: Computing h(x)

$$\mathbf{x} = x^{(0)} \xrightarrow{W^{(1)}} s^{(1)} \xrightarrow{\ \theta\ } x^{(1)} \xrightarrow{W^{(2)}} s^{(2)} \xrightarrow{\ \theta\ } x^{(2)} \ \cdots\ \xrightarrow{W^{(L)}} s^{(L)} \xrightarrow{\ \theta\ } x^{(L)} = h(\mathbf{x})$$

Forward propagation to compute $h(\mathbf{x})$:
1: $x^{(0)} \leftarrow \mathbf{x}$  [Initialization]
2: for $\ell = 1$ to $L$ do  [Forward Propagation]
3:   $s^{(\ell)} \leftarrow (W^{(\ell)})^{\mathrm{t}}\, x^{(\ell-1)}$
4:   $x^{(\ell)} \leftarrow \begin{bmatrix} 1 \\ \theta(s^{(\ell)}) \end{bmatrix}$
5: end for
6: $h(\mathbf{x}) = x^{(L)}$  [Output]
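A minimal numpy sketch of this loop, under some assumptions (not the lecture's code): `forward_propagate` is a made-up name, tanh is used for $\theta$, and the output layer is taken as $\theta(s^{(L)})$ without an extra bias component, so that $h(\mathbf{x})$ is directly the value used in the squared error later.

```python
import numpy as np

def forward_propagate(x, weights, theta=np.tanh):
    """Forward propagation for weights = [W1, ..., WL], W_l of shape (d^(l-1)+1, d^(l)).

    Returns the signals ss[l] = s^(l) and outputs xs[l] = x^(l) for every layer.
    """
    L = len(weights)
    xs = [np.concatenate(([1.0], x))]       # x^(0) = [1; x]
    ss = [None]                             # dummy entry so that ss[l] is s^(l)
    for l, W in enumerate(weights, start=1):
        s = W.T @ xs[-1]                    # s^(l) = (W^(l))^t x^(l-1)
        ss.append(s)
        if l < L:
            xs.append(np.concatenate(([1.0], theta(s))))   # hidden layer: prepend the bias
        else:
            xs.append(theta(s))             # output layer: h(x) = theta(s^(L))
    return ss, xs

# Example: a 2-3-1 network with small random weights.
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(3, 3)),   # W^(1): (d^(0)+1) x d^(1)
           rng.normal(scale=0.1, size=(4, 1))]   # W^(2): (d^(1)+1) x d^(2)
ss, xs = forward_propagate(np.array([0.5, -1.0]), weights)
h = xs[-1]                                  # h(x) = x^(L)
```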
Minimizing E_in

$$E_{\text{in}}(h) = E_{\text{in}}(\mathrm{W}) = \frac{1}{N}\sum_{n=1}^{N} \big(h(\mathbf{x}_n) - y_n\big)^2, \qquad \mathrm{W} = \{W^{(1)}, W^{(2)}, \dots, W^{(L)}\}$$

[Figure: left, the transformation functions sign, tanh, and linear; right, $E_{\text{in}}$ versus a weight $w$ for the sign and tanh networks.]

Using $\theta = \tanh$ makes $E_{\text{in}}$ differentiable, so we can use gradient descent to reach a local minimum.
Gradient Descent

$$\mathrm{W}(t+1) = \mathrm{W}(t) - \eta\, \nabla E_{\text{in}}\big(\mathrm{W}(t)\big)$$
Gradient of E_in

$$E_{\text{in}}(\mathbf{w}) = \frac{1}{N}\sum_{n=1}^{N} \underbrace{e\big(h(\mathbf{x}_n), y_n\big)}_{e_n}$$

$$\frac{\partial E_{\text{in}}(\mathbf{w})}{\partial W^{(\ell)}} = \frac{1}{N}\sum_{n=1}^{N} \frac{\partial e_n}{\partial W^{(\ell)}}$$

We need $\dfrac{\partial e(\mathbf{x})}{\partial W^{(\ell)}}$.
Numerical Approach

$$\frac{\partial e(\mathbf{x})}{\partial W^{(\ell)}_{ij}} \approx \frac{e\big(\mathbf{x} \,\big|\, W^{(\ell)}_{ij} + \Delta\big) - e\big(\mathbf{x} \,\big|\, W^{(\ell)}_{ij} - \Delta\big)}{2\Delta}$$

Approximate, and inefficient: every single weight requires its own pair of perturbed evaluations of $e(\mathbf{x})$.
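A sketch of this finite-difference estimate (useful mainly as a check on backpropagation); it reuses the hypothetical `forward_propagate` from the earlier sketch and assumes the pointwise squared error $e(\mathbf{x}) = (h(\mathbf{x}) - y)^2$.

```python
import numpy as np

def pointwise_error(x, y, weights):
    """e(x) = (h(x) - y)^2, using the hypothetical forward_propagate sketched above."""
    _, xs = forward_propagate(x, weights)
    return float((xs[-1][0] - y) ** 2)

def numerical_gradient(x, y, weights, delta=1e-5):
    """Central-difference estimate of d e(x) / d W^(l)_ij for every weight."""
    grads = [np.zeros_like(W) for W in weights]
    for l, W in enumerate(weights):
        for (i, j), w_ij in np.ndenumerate(W):
            W[i, j] = w_ij + delta
            e_plus = pointwise_error(x, y, weights)
            W[i, j] = w_ij - delta
            e_minus = pointwise_error(x, y, weights)
            W[i, j] = w_ij                              # restore the weight
            grads[l][i, j] = (e_plus - e_minus) / (2 * delta)
    return grads
```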
Algorithmic Approach

$e(\mathbf{x})$ depends on $W^{(\ell)}$ only through $s^{(\ell)}$, and $s^{(\ell)} = (W^{(\ell)})^{\mathrm{t}}\, x^{(\ell-1)}$, so

$$\frac{\partial e}{\partial W^{(\ell)}} = \left[\frac{\partial s^{(\ell)}}{\partial W^{(\ell)}}\right]^{\mathrm{t}} \cdot \frac{\partial e}{\partial s^{(\ell)}} = x^{(\ell-1)}\big(\delta^{(\ell)}\big)^{\mathrm{t}} \qquad \text{(chain rule)}$$

where the sensitivity is $\delta^{(\ell)} = \dfrac{\partial e}{\partial s^{(\ell)}}$.
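Written out entry by entry (a short check consistent with the slide, not shown on it): since $s^{(\ell)}_k = \sum_i W^{(\ell)}_{ik}\, x^{(\ell-1)}_i$, differentiating with respect to $W^{(\ell)}_{ij}$ leaves only the $k = j$ term,

$$\frac{\partial e}{\partial W^{(\ell)}_{ij}} = \sum_{k=1}^{d^{(\ell)}} \frac{\partial e}{\partial s^{(\ell)}_k}\,\frac{\partial s^{(\ell)}_k}{\partial W^{(\ell)}_{ij}} = \delta^{(\ell)}_j\, x^{(\ell-1)}_i,$$

which is exactly the $(i,j)$ entry of $x^{(\ell-1)}\big(\delta^{(\ell)}\big)^{\mathrm{t}}$.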
Computing δ(ℓ) Using the Chain Rule

$$\delta^{(1)} \longleftarrow \delta^{(2)} \longleftarrow \cdots \longleftarrow \delta^{(L-1)} \longleftarrow \delta^{(L)}$$

Multiple applications of the chain rule: a perturbation propagates forward as

$$\Delta s^{(\ell)} \xrightarrow{\ \theta\ } \Delta x^{(\ell)} \xrightarrow{\ W^{(\ell+1)}\ } \Delta s^{(\ell+1)} \longrightarrow \cdots \longrightarrow \Delta e(\mathbf{x}),$$

which gives the backward recursion

$$\delta^{(\ell)} = \theta'(s^{(\ell)}) \otimes \big[W^{(\ell+1)} \delta^{(\ell+1)}\big]^{d^{(\ell)}}_{1},$$

where $\otimes$ is componentwise multiplication and $[\,\cdot\,]^{d^{(\ell)}}_{1}$ means the 0th (bias) component is not used.

[Figure: layers ℓ and ℓ+1: δ(ℓ+1) is fed back through W(ℓ+1) and multiplied componentwise by θ'(s(ℓ)) to give δ(ℓ).]
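For a single component (a derivation consistent with the recursion above, not shown on the slide): $e$ depends on $s^{(\ell)}_j$ only through $x^{(\ell)}_j = \theta(s^{(\ell)}_j)$, which in turn feeds every signal of layer $\ell+1$, so

$$\delta^{(\ell)}_j = \frac{\partial e}{\partial s^{(\ell)}_j} = \sum_{k=1}^{d^{(\ell+1)}} \frac{\partial e}{\partial s^{(\ell+1)}_k}\, \frac{\partial s^{(\ell+1)}_k}{\partial x^{(\ell)}_j}\, \frac{\partial x^{(\ell)}_j}{\partial s^{(\ell)}_j} = \theta'(s^{(\ell)}_j) \sum_{k=1}^{d^{(\ell+1)}} W^{(\ell+1)}_{jk}\, \delta^{(\ell+1)}_k = \theta'(s^{(\ell)}_j)\, \big[W^{(\ell+1)}\delta^{(\ell+1)}\big]_j .$$

The bias $x^{(\ell)}_0 = 1$ does not depend on $s^{(\ell)}$, which is why the 0th component of $W^{(\ell+1)}\delta^{(\ell+1)}$ is discarded.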
The Backpropagation Algorithm

$$\delta^{(1)} \longleftarrow \delta^{(2)} \longleftarrow \cdots \longleftarrow \delta^{(L-1)} \longleftarrow \delta^{(L)}$$

Backpropagation to compute the sensitivities $\delta^{(\ell)}$ (assume $s^{(\ell)}$ and $x^{(\ell)}$ have been computed for all $\ell$):
1: $\delta^{(L)} \leftarrow 2\big(x^{(L)} - y\big)\, \theta'(s^{(L)})$  [Initialization]
2: for $\ell = L-1$ down to $1$ do  [Back-Propagation]
3:   Compute (for tanh hidden nodes): $\theta'(s^{(\ell)}) = \big[1 - x^{(\ell)} \otimes x^{(\ell)}\big]^{d^{(\ell)}}_{1}$
4:   $\delta^{(\ell)} \leftarrow \theta'(s^{(\ell)}) \otimes \big[W^{(\ell+1)}\delta^{(\ell+1)}\big]^{d^{(\ell)}}_{1}$   ($\otimes$: componentwise multiplication)
5: end for
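A numpy sketch of this loop, again reusing the hypothetical `forward_propagate` and assuming tanh units in every layer (including the output), so that $\theta'(s) = 1 - \theta(s)^2$ can be read off the stored outputs.

```python
def backpropagate(y, weights, ss, xs):
    """Sensitivities delta^(l) = de/ds^(l), l = 1..L, for tanh units everywhere.

    ss, xs are the per-layer signals and outputs from forward_propagate.
    Note: weights[l] is W^(l+1), because the Python list is 0-indexed.
    """
    L = len(weights)
    deltas = [None] * (L + 1)                          # deltas[l] holds delta^(l)
    out = xs[L]                                        # x^(L) = theta(s^(L)), no bias component
    deltas[L] = 2.0 * (out - y) * (1.0 - out ** 2)     # delta^(L) = 2(x^(L) - y) theta'(s^(L))
    for l in range(L - 1, 0, -1):
        theta_prime = 1.0 - xs[l][1:] ** 2             # theta'(s^(l)) = 1 - x^(l)^2, bias dropped
        back = (weights[l] @ deltas[l + 1])[1:]        # [W^(l+1) delta^(l+1)], bias component dropped
        deltas[l] = theta_prime * back                 # componentwise multiplication
    return deltas
```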
Algorithm for Gradient Descent on E_in

Algorithm to compute $E_{\text{in}}(\mathbf{w})$ and $\mathbf{g} = \nabla E_{\text{in}}(\mathbf{w})$:
Input: weights $\mathbf{w} = \{W^{(1)}, \dots, W^{(L)}\}$; data $\mathcal{D}$.
Output: error $E_{\text{in}}(\mathbf{w})$ and gradient $\mathbf{g} = \{G^{(1)}, \dots, G^{(L)}\}$.
1: Initialize: $E_{\text{in}} = 0$; for $\ell = 1, \dots, L$, $G^{(\ell)} = 0 \cdot W^{(\ell)}$.
2: for each data point $\mathbf{x}_n$ ($n = 1, \dots, N$) do
3:   Compute $x^{(\ell)}$ for $\ell = 0, \dots, L$.  [forward propagation]
4:   Compute $\delta^{(\ell)}$ for $\ell = 1, \dots, L$.  [backpropagation]
5:   $E_{\text{in}} \leftarrow E_{\text{in}} + \frac{1}{N}\big(x^{(L)} - y_n\big)^2$
6:   for $\ell = 1, \dots, L$ do
7:     $G^{(\ell)}(\mathbf{x}_n) = x^{(\ell-1)}\big(\delta^{(\ell)}\big)^{\mathrm{t}}$
8:     $G^{(\ell)} \leftarrow G^{(\ell)} + \frac{1}{N}\, G^{(\ell)}(\mathbf{x}_n)$
9:   end for
10: end for

Can do the batch version or the sequential version (SGD).
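Putting the sketches together, under the same assumptions and with the hypothetical routines above: a batch computation of $E_{\text{in}}$ and $\mathbf{g}$, followed by the gradient-descent update $\mathrm{W}(t+1) = \mathrm{W}(t) - \eta\,\nabla E_{\text{in}}(\mathrm{W}(t))$ from the earlier slide. The toy data, learning rate, and initialization are placeholders.

```python
import numpy as np

def error_and_gradient(X, Y, weights):
    """Batch E_in(w) and gradient g = {G^(1), ..., G^(L)} over the data set."""
    N = len(X)
    E_in = 0.0
    G = [np.zeros_like(W) for W in weights]
    for x, y in zip(X, Y):
        ss, xs = forward_propagate(x, weights)        # forward propagation
        deltas = backpropagate(y, weights, ss, xs)    # backpropagation
        E_in += float((xs[-1][0] - y) ** 2) / N
        for l in range(1, len(weights) + 1):
            G[l - 1] += np.outer(xs[l - 1], deltas[l]) / N   # G^(l) += (1/N) x^(l-1) (delta^(l))^t
    return E_in, G

# Gradient descent on E_in with toy placeholders throughout.
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(3, 3)),
           rng.normal(scale=0.1, size=(4, 1))]
X = [np.array([0.5, -1.0]), np.array([-0.2, 0.3])]
Y = [1.0, -1.0]
eta = 0.1
for t in range(1000):
    E_in, G = error_and_gradient(X, Y, weights)
    weights = [W - eta * Gl for W, Gl in zip(weights, G)]   # W(t+1) = W(t) - eta * grad
```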
Digits Data

[Figure: left, $\log_{10}(\text{error})$ versus $\log_{10}(\text{iteration})$ for gradient descent and SGD on the digits data; right, the digits data in the (average intensity, symmetry) feature plane.]