
Neural Networks. Petr Pošík, petr.posik@fel.cvut.cz, Czech Technical University in Prague - PowerPoint PPT Presentation



  1. CZECH TECHNICAL UNIVERSITY IN PRAGUE, Faculty of Electrical Engineering, Department of Cybernetics.
     Neural Networks.
     Petr Pošík, petr.posik@fel.cvut.cz
     Czech Technical University in Prague, Faculty of Electrical Engineering, Dept. of Cybernetics.

  2. Introduction and Rehearsal

  3. Notation
     In supervised learning, we work with
     ■ an observation described by a vector x = (x_1, ..., x_D),
     ■ the corresponding true value of the dependent variable y, and
     ■ the prediction of a model ŷ = f_w(x), where the model parameters are in vector w.

  4. Notation
     In supervised learning, we work with
     ■ an observation described by a vector x = (x_1, ..., x_D),
     ■ the corresponding true value of the dependent variable y, and
     ■ the prediction of a model ŷ = f_w(x), where the model parameters are in vector w.
     ■ Very often, we use homogeneous coordinates and matrix notation, and represent the
       whole training data set as T = (X, y), where

           X = [ x^(1)     1 ]            [ y^(1)   ]
               [   ...   ... ]    and y = [   ...   ]
               [ x^(|T|)   1 ]            [ y^(|T|) ]

  5. Notation
     In supervised learning, we work with
     ■ an observation described by a vector x = (x_1, ..., x_D),
     ■ the corresponding true value of the dependent variable y, and
     ■ the prediction of a model ŷ = f_w(x), where the model parameters are in vector w.
     ■ Very often, we use homogeneous coordinates and matrix notation, and represent the
       whole training data set as T = (X, y), where

           X = [ x^(1)     1 ]            [ y^(1)   ]
               [   ...   ... ]    and y = [   ...   ]
               [ x^(|T|)   1 ]            [ y^(|T|) ]

     Learning then amounts to finding such model parameters w* which minimize a certain loss
     (or energy) function:

         w* = arg min_w J(w, T)
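A short NumPy sketch of the notation above (the toy numbers are made up for illustration, not taken from the slides): the design matrix X in homogeneous coordinates is just the raw observations with a constant 1 appended to every row.

```python
import numpy as np

# Hypothetical toy data: |T| = 4 observations, D = 2 features each.
X_raw = np.array([[0.5, 1.2],
                  [1.0, 0.7],
                  [1.5, 2.3],
                  [2.0, 1.8]])
y = np.array([1.1, 1.4, 2.9, 3.1])

# Homogeneous coordinates: append a constant 1 to every observation,
# so a bias term can be absorbed into the weight vector w.
X = np.hstack([X_raw, np.ones((X_raw.shape[0], 1))])
print(X.shape)  # (4, 3): each row is (x_1, x_2, 1)
```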

  6. Multiple linear regression
     Multiple linear regression model:

         ŷ = f_w(x) = w_1 x_1 + w_2 x_2 + ... + w_D x_D = xw^T

     The minimum of

         J_MSE(w) = (1/|T|) Σ_{i=1..|T|} (y^(i) − ŷ^(i))^2

     is given by

         w* = (X^T X)^(−1) X^T y,

     or found by numerical optimization.

  7. Multiple linear regression
     Multiple linear regression model:

         ŷ = f_w(x) = w_1 x_1 + w_2 x_2 + ... + w_D x_D = xw^T

     The minimum of

         J_MSE(w) = (1/|T|) Σ_{i=1..|T|} (y^(i) − ŷ^(i))^2

     is given by

         w* = (X^T X)^(−1) X^T y,

     or found by numerical optimization.

     Multiple regression as a linear neuron:
     [Figure: a linear neuron summing the weighted inputs x_1, x_2, x_3 with weights w_d to produce ŷ.]
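A minimal sketch of the closed-form solution above, assuming X already contains the homogeneous column of ones; the names and the use of np.linalg.solve (instead of an explicit matrix inverse) are implementation choices, not part of the slides.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Least-squares weights minimizing J_MSE, i.e. w* = (X^T X)^{-1} X^T y.

    Solving the normal equations with np.linalg.solve evaluates the same
    formula as the explicit inverse, but is numerically preferable.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

def predict(X, w):
    """Predictions of the linear model: y_hat = X w."""
    return X @ w
```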

  8. Logistic regression
     Logistic regression model:

         ŷ = f(w, x) = g(xw^T),

     where

         g(z) = 1 / (1 + e^(−z))

     is the sigmoid (a.k.a. logistic) function.
     ■ No explicit equation for the optimal weights.
     ■ The only option is to find the optimum numerically, usually by some form of gradient descent.

  9. Logistic regression
     Logistic regression model:

         ŷ = f(w, x) = g(xw^T),

     where

         g(z) = 1 / (1 + e^(−z))

     is the sigmoid (a.k.a. logistic) function.
     ■ No explicit equation for the optimal weights.
     ■ The only option is to find the optimum numerically, usually by some form of gradient descent.

     Logistic regression as a non-linear neuron:
     [Figure: a non-linear neuron applying g(xw^T) to the weighted inputs x_1, x_2, x_3 to produce ŷ.]
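For concreteness, a small sketch of the logistic regression model as defined above; the function names are illustrative only.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid (logistic) function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w):
    """Logistic regression output y_hat = g(x w^T) for each row x of X."""
    return sigmoid(X @ w)
```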

  10. Question
      Logistic regression uses the sigmoid function to transform the result of the linear combination of inputs:

          g(z) = 1 / (1 + e^(−z))

      [Figure: plot of g(z) for z ∈ [−3, 3], rising from 0 towards 1 and passing through 0.5 at z = 0.]

      What is the value of the derivative of g(z) at z = 0?
      A  g′(0) = −1/2
      B  g′(0) = 0
      C  g′(0) = 1/4
      D  g′(0) = 1/2
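One way to check the question numerically (a quick aside, not part of the slides) is a finite-difference approximation of g′(0); the result agrees with the identity g′(a) = g(a)(1 − g(a)) stated on a later slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Central finite difference around z = 0.
h = 1e-6
print((sigmoid(h) - sigmoid(-h)) / (2 * h))  # ~0.25, i.e. g'(0) = g(0)(1 - g(0)) = 1/4
```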

  11. Gradient descent algorithm
      ■ Given a function J(w) that should be minimized,
      ■ start with a guess of w, and change it so that J(w) decreases, i.e.
      ■ update our current guess of w by taking a step in the direction opposite to the gradient:

            w ← w − η ∇J(w),    i.e.
            w_d ← w_d − η ∂J(w)/∂w_d,

        where all w_d's are updated simultaneously and η is a learning rate (step size).
      ■ For cost functions given as the sum across the training examples,

            J(w) = Σ_{i=1..|T|} E(w, x^(i), y^(i)),

        we can concentrate on a single training example because

            ∂J(w)/∂w_d = Σ_{i=1..|T|} ∂E(w, x^(i), y^(i))/∂w_d,

        and we can drop the indices over the training data set: E = E(w, x, y).
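A minimal sketch of the update rule above; the function signature, learning rate, and iteration count are assumptions for illustration, not values from the slides.

```python
import numpy as np

def gradient_descent(grad_J, w0, eta=0.1, n_steps=1000):
    """Generic gradient descent: repeatedly apply w <- w - eta * grad J(w).

    grad_J  -- function returning the gradient vector of J at w
    w0      -- initial guess of the weight vector
    eta     -- learning rate (step size)
    n_steps -- number of update steps
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        w = w - eta * grad_J(w)  # all components w_d updated simultaneously
    return w
```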

  12. Example: Gradient for multiple regression and squared loss
      [Figure: a linear neuron with inputs x_1, x_2, x_3, weights w_d, and output ŷ.]
      Assuming the squared error loss

          E(w, x, y) = 1/2 (y − ŷ)^2 = 1/2 (y − xw^T)^2,

      we can compute the derivatives using the chain rule as

          ∂E/∂w_d = (∂E/∂ŷ)(∂ŷ/∂w_d),    where
          ∂E/∂ŷ = ∂/∂ŷ [1/2 (y − ŷ)^2] = −(y − ŷ),    and
          ∂ŷ/∂w_d = ∂(xw^T)/∂w_d = x_d,

      and thus

          ∂E/∂w_d = (∂E/∂ŷ)(∂ŷ/∂w_d) = −(y − ŷ) x_d.
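A sketch of a single-example update using the per-weight gradient derived above; the function name and learning rate are illustrative assumptions.

```python
import numpy as np

def linreg_sgd_step(w, x, y, eta=0.01):
    """One gradient step for a single example (x, y) of a linear neuron
    with squared error E = 1/2 (y - x w^T)^2.

    Per-weight gradient from the chain rule: dE/dw_d = -(y - y_hat) * x_d.
    """
    y_hat = x @ w
    grad = -(y - y_hat) * x  # vector of partial derivatives dE/dw_d
    return w - eta * grad
```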

  13. Example: Gradient for logistic regression and crossentropy loss
      Nonlinear activation function:

          g(a) = 1 / (1 + e^(−a))

      Note that

          g′(a) = g(a)(1 − g(a)).

      [Figure: a non-linear neuron with inputs x_1, x_2, x_3, weights w_d, activation g(a), and output ŷ.]
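A sketch of the corresponding single-example update, assuming the usual cross-entropy loss; the final simplification (ŷ − y)·x_d follows from the identity g′(a) = g(a)(1 − g(a)) above, but is not spelled out in this excerpt.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def logreg_sgd_step(w, x, y, eta=0.01):
    """One gradient step for a single example (x, y) of a logistic neuron,
    assuming the cross-entropy loss
    E = -y log(y_hat) - (1 - y) log(1 - y_hat).

    With g'(a) = g(a)(1 - g(a)), the chain rule collapses to
    dE/dw_d = (y_hat - y) * x_d.
    """
    y_hat = sigmoid(x @ w)
    grad = (y_hat - y) * x
    return w - eta * grad
```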
