Deep Learning: Introduction to Neural Networks
Hamid Beigy
Sharif University of Technology
September 30, 2019
Table of contents
1. Brain
2. History of neural networks
3. Gradient-based learning
Brain
Functions of different parts of the brain. Figure: brain diagram with twelve numbered regions.
Brain network (figure).
Neuron. Figure: cell body (soma), nucleus, dendrites, axon, axonal arborization, synapses, and an axon from another cell.
History of neural networks
McCulloch and Pitts network (1943)
1. The first model of a neuron was proposed by McCulloch (a physiologist) and Pitts (a logician).
2. Inputs are binary.
3. The neuron has two types of inputs: excitatory inputs (shown by a) and inhibitory inputs (shown by b).
4. The output is binary: fires (1) or does not fire (0).
5. Until the inputs sum up to a certain threshold level, the output remains zero (see the sketch below).
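A minimal sketch of such a unit, assuming the standard formulation in which any active inhibitory input suppresses the output; the function and argument names are illustrative, not from the slides:

```python
# A McCulloch-Pitts unit: binary inputs, binary output, fixed threshold.
def mp_neuron(excitatory, inhibitory, threshold):
    """excitatory, inhibitory: lists of binary inputs (0 or 1)."""
    if any(inhibitory):                        # absolute inhibition (assumed standard MP rule)
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# With two excitatory inputs: threshold 2 gives AND, threshold 1 gives OR.
print(mp_neuron([1, 1], [], threshold=2))      # 1 (AND fires)
print(mp_neuron([1, 0], [], threshold=2))      # 0
print(mp_neuron([1, 0], [], threshold=1))      # 1 (OR fires)
print(mp_neuron([1, 1], [1], threshold=1))     # 0 (inhibitory input vetoes the output)
```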
McCulloch and Pitts network (logic functions). Figure: threshold units realizing AND (two excitatory inputs, threshold 2), OR (threshold 1), and NOT (threshold 0 with an inhibitory input).
Perceptron (Frank Rosenblatt (1958))
1. Problems with McCulloch-Pitts neurons:
   - Weights and thresholds are determined analytically (they cannot be learned).
   - It is very difficult to minimize the size of a network.
   - What about non-discrete and/or non-binary tasks?
2. Perceptron solution:
   - Weights and thresholds can be determined analytically or by a learning algorithm.
   - Continuous, bipolar, and multiple-valued versions exist.
   - Rosenblatt randomly connected the perceptrons and changed the weights in order to achieve learning.
   - Efficient minimization heuristics exist.
Perceptron (Frank Rosenblatt (1958))
Simplified mathematical model: the inputs combine linearly, and the unit uses threshold logic, firing if the combined input exceeds a threshold.
1. Let y be the correct output and f(x) the output of the network. The perceptron updates its weights as (Rosenblatt 1960, sketched below)
   $w_j^{(t+1)} \leftarrow w_j^{(t)} + \alpha\, x_j\, (y - f(x))$
2. The McCulloch and Pitts neuron is a better model of the electrochemical process inside the neuron than the perceptron.
3. But the perceptron is the basis and building block of modern neural networks.
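A small sketch of this update rule, assuming binary labels y in {0, 1}, a step activation, and a bias folded in as a constant input x_0 = 1; the data and hyperparameters are illustrative, not from the slides:

```python
import numpy as np

# Perceptron learning rule: update only via the prediction error (y - f(x)).
def train_perceptron(X, y, alpha=0.1, epochs=50):
    X = np.hstack([np.ones((X.shape[0], 1)), X])    # prepend x_0 = 1 for the bias weight
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            f_x = 1 if w @ x_i >= 0 else 0          # step activation
            w += alpha * (y_i - f_x) * x_i          # w_j += alpha * x_j * (y - f(x))
    return w

# Learning AND, which is linearly separable:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w = train_perceptron(X, y)
print([1 if w @ np.r_[1, x] >= 0 else 0 for x in X])   # expected [0, 0, 0, 1]
```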
Adaline (Bernard Widrow and Ted Hoff (1960))
1. The model is the same as the perceptron, but it uses a different learning algorithm.
2. A multilayer network of Adaline units is known as a Madaline.
Adaline learning (Bernard Widrow and Ted Hoff (1960))
1. Let y be the correct output and $f(x) = \sum_{j=0}^{n} w_j x_j$. Adaline updates its weights as
   $w_j^{(t+1)} \leftarrow w_j^{(t)} + \alpha\, x_j\, (y - f(x))$
2. Adaline converges to the least-squares solution, i.e., it minimizes the squared error $(y - f(x))^2$. This update rule is in fact the stochastic gradient descent update for linear regression (see the sketch below).
3. In the 1960s, there were many articles promising robots that could think.
4. It seems there was a general belief that the perceptron could solve any problem.
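A sketch of the Adaline (LMS) rule under the same conventions as the perceptron sketch above; the key difference is that the error y − f(x) is computed on the linear output, so each update is a stochastic-gradient step on the squared error. The data and learning rate are illustrative:

```python
import numpy as np

# Adaline / LMS: stochastic gradient descent on the squared error (y - w.x)^2.
def train_adaline(X, y, alpha=0.05, epochs=100):
    X = np.hstack([np.ones((X.shape[0], 1)), X])    # x_0 = 1 for the bias weight w_0
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            f_x = w @ x_i                            # linear output, no threshold
            w += alpha * (y_i - f_x) * x_i           # w_j += alpha * x_j * (y - f(x))
    return w

# Fit a noiseless line y = 2 + 3x; the weights should approach (2, 3).
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 + 3 * X.ravel()
print(train_adaline(X, y))    # approximately [2. 3.]
```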
Minsky and Papert (1968)
1. Minsky and Papert published their book Perceptrons. The book shows that perceptrons can only solve linearly separable problems: they are not universal.
2. In particular, they showed that it is not possible for a single-layer perceptron to learn the XOR function. Figure: the XOR truth table over inputs X and Y has no linear separator.
3. After Perceptrons was published, researchers lost interest in perceptrons and neural networks.
Multi-layer Perceptron (Minsky and Papert (1968))
1. XOR can be computed by a multi-layer perceptron: the first layer is a "hidden" layer (a sketch with one standard choice of weights follows).
2. This construction was also originally suggested by Minsky and Papert (1968).
Figure: a two-layer network over inputs X and Y computing XOR through a hidden layer.
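The exact weights in the figure are not fully recoverable here, so the sketch below uses one standard choice: a hidden OR unit and a hidden AND unit, combined as "OR and not AND":

```python
# A two-layer threshold network for XOR (illustrative weights, not necessarily the figure's).
def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(x, y):
    h1 = step(1 * x + 1 * y - 0.5)       # hidden unit 1: OR(x, y)
    h2 = step(1 * x + 1 * y - 1.5)       # hidden unit 2: AND(x, y)
    return step(1 * h1 - 2 * h2 - 0.5)   # output: OR(x, y) and not AND(x, y)

print([xor_mlp(x, y) for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]])   # [0, 1, 1, 0]
```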
History
1. Optimization
   - In 1969, Bryson and Ho described backpropagation as a multi-stage dynamic system optimization method.
   - In 1972, Stephen Grossberg proposed networks capable of learning the XOR function.
   - In 1974, Paul Werbos, and later David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams (1986), reinvented backpropagation and applied it in the context of neural networks. Backpropagation allowed perceptrons to be trained in a multilayer configuration.
2. In the 1980s, the field of artificial neural network research experienced a resurgence.
3. In the 2000s, neural networks fell out of favor, partly due to the limitations of backpropagation.
4. Since 2010, we have been able to train much larger networks using modern computing power such as GPUs.
Gradient-based learning
Cost function
1. The goal of machine learning algorithms is to construct a model (hypothesis) that can be used to estimate y based on x.
2. Let the model be of the form $h(x) = w_0 + w_1 x$.
3. The goal is to choose the parameters so that h(x) is close to y on the training data (x, y).
4. We need a function of the parameters to minimize over our dataset. A function that is often used is the mean squared error (computed in the sketch below),
   $J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right)^2$
5. How do we find the minimum value of the cost function?
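A short sketch of evaluating this cost for the linear hypothesis $h(x) = w_0 + w_1 x$ on a made-up dataset:

```python
import numpy as np

# Mean squared error cost for the linear hypothesis h(x) = w0 + w1 * x.
def cost(w0, w1, x, y):
    h = w0 + w1 * x                      # model predictions h(x_i)
    return np.mean((h - y) ** 2) / 2     # J(w) = (1 / 2m) * sum (h(x_i) - y_i)^2

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])       # generated by y = 1 + 2x
print(cost(1.0, 2.0, x, y))              # 0.0 at the true parameters
print(cost(0.0, 0.0, x, y))              # 10.5, larger for a poor fit
```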
Gradient descent
1. Gradient descent is by far the most popular optimization strategy used in machine learning and deep learning at the moment.
2. The cost (error) is a function of the weights (parameters).
3. We want to reduce/minimize the error.
4. Gradient descent: move towards the error minimum.
5. Compute the gradient: it points in the direction of steepest increase of the error.
6. Adjust the weights in the opposite direction, towards lower error, as in the sketch below.
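A minimal sketch of this loop on an illustrative one-dimensional cost $J(w) = (w - 3)^2$, which is not a cost from the slides:

```python
# Generic gradient descent: repeatedly step opposite to the gradient.
def gradient_descent(grad, w0, alpha=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= alpha * grad(w)     # move against the gradient direction
    return w

grad = lambda w: 2 * (w - 3)     # dJ/dw for J(w) = (w - 3)^2
print(gradient_descent(grad, w0=0.0))   # approaches the minimum at w = 3
```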
Gradient descent (linear regression)
1. We have the following hypothesis, which we need to fit to the training data:
   $h(x) = w_0 + w_1 x$
2. We use a cost function such as the mean squared error:
   $J(w) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right)^2$
3. This cost function can be minimized using gradient descent (see the sketch below):
   $w_0^{(t+1)} = w_0^{(t)} - \alpha \frac{\partial J(w^{(t)})}{\partial w_0}$
   $w_1^{(t+1)} = w_1^{(t)} - \alpha \frac{\partial J(w^{(t)})}{\partial w_1}$
   where $\alpha$ is the step size (learning rate).
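A small sketch of batch gradient descent for this two-parameter model; the dataset is made up (generated from $y = 1 + 2x$) and the learning rate and number of steps are illustrative:

```python
import numpy as np

# Batch gradient descent for h(x) = w0 + w1*x with J(w) = (1/2m) * sum (h(x_i) - y_i)^2.
def fit_linear(x, y, alpha=0.05, steps=2000):
    m = len(x)
    w0, w1 = 0.0, 0.0
    for _ in range(steps):
        err = (w0 + w1 * x) - y           # h(x_i) - y_i for all i
        grad_w0 = err.sum() / m           # dJ/dw0
        grad_w1 = (err * x).sum() / m     # dJ/dw1
        w0 -= alpha * grad_w0             # simultaneous update of both weights
        w1 -= alpha * grad_w1
    return w0, w1

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1 + 2 * x
print(fit_linear(x, y))    # approximately (1.0, 2.0)
```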
Gradient descent (effect of learning rate). Figure.
Gradient descent (landscape of cost function). Figure: 3D surface plots of a cost over the parameter plane.
Challenges with gradient descent
1. Local minima: a local minimum is a minimum within some neighborhood that need not be (but may be) a global minimum.
2. Saddle points: for non-convex functions, having the gradient equal to 0 is not good enough. Example: $f(x) = x_1^2 - x_2^2$ has zero gradient at $x = (0, 0)$, but that point is clearly not a local minimum, since $x = (0, \epsilon)$ has a smaller function value. The point $(0, 0)$ is called a saddle point of this function (checked numerically below).
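A quick numerical check of the saddle-point example:

```python
import numpy as np

# Saddle point of f(x1, x2) = x1^2 - x2^2 at the origin.
f = lambda x1, x2: x1**2 - x2**2
grad = lambda x1, x2: np.array([2 * x1, -2 * x2])

print(grad(0.0, 0.0))              # [ 0. -0.]  -> zero gradient at the origin
eps = 1e-3
print(f(0.0, 0.0), f(0.0, eps))    # 0.0 -1e-06 -> (0, eps) has a smaller value
```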