Department of Computer Science University of Bristol COMSM0045 – Applied Deep Learning 2020/21 comsm0045-applied-deep-learning.github.io Lecture 01 BASICS OF ARTIFICIAL NEURAL NETWORKS Tilo Burghardt | tilo@cs.bris.ac.uk 35 Slides
Agenda for Lecture 1
• Neurons and their Structure
• Single & Multi-Layer Perceptron
• Basics of Cost Functions
• Gradient Descent and the Delta Rule
• Notation and Structure of Deep Feed-Forward Networks
Biological Inspiration
Golgi’s First Drawings of Neurons (Camillo Golgi)
Computation in biological neural networks emerges from the co-operation of many individual computational components, namely neuron cells.
image source: www.the-scientist.com
Schematic Model of a Neuron
[Figure: labelled neuron schematic — dendrites, cell body, nucleus, axon, myelin sheath, axon terminals, synapse; the main flow of information is feed-forward along the axon.]
Pavlov and an Assistant Conditioning a Dog
The environment can condition the behaviour of biological neural networks, leading to the incorporation of new information.
image source: www.psysci.co
Neuro-Plasticity
• Plasticity refers to a system’s ability to adapt structure and/or behaviour to accommodate new information.
• The brain shows various forms of plasticity; natural forms include synaptic plasticity (mainly chemical), structural sprouting (growth), rerouting (functional changes), and neurogenesis (new neurons).
[Figure: temporal evolution of the system — example of structural sprouting. image source: www.cognifit.com]
Artificial Feed-forward Networks
Rosenblatt’s (left) development of the Perceptron (1950s). image source: csis.pace.edu
Simplification of a Neuron to a Computational Unit
Flow of information: feed-forward. Inputs $x_1, x_2, x_3, \dots$ are multiplied by weights $w_1, w_2, w_3, \dots$, summed, offset by a bias $b$, and passed through the sign activation function to give the output $y$:
$$y = \mathrm{sign}\Big(\sum_i w_i x_i - b\Big), \qquad \mathrm{sign}(v) \overset{\mathrm{def}}{=} \begin{cases} 1 & \text{if } v \ge 0 \\ -1 & \text{otherwise} \end{cases}$$
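To make the unit concrete, here is a minimal NumPy sketch of the forward pass above; the function and variable names (perceptron_output, w, b) and the example values are illustrative choices, not from the slides.

```python
import numpy as np

def sign(v):
    # sign(v) := 1 if v >= 0, -1 otherwise (as defined on this slide)
    return 1.0 if v >= 0 else -1.0

def perceptron_output(x, w, b):
    # y = sign( sum_i w_i * x_i - b )
    return sign(np.dot(w, x) - b)

x = np.array([0.5, -1.0, 2.0])    # example input
w = np.array([0.4, 0.3, 0.1])     # example weights
b = 0.2                           # example bias
print(perceptron_output(x, w, b)) # -> -1.0 for these values
```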
Notational Details for the Perceptron
The unit function y = f(x) is shorthand for f(x; w): the semicolon separates the input (left) from the parameters (right).
CONVENTION: the bias is incorporated into the parameter vector by adding a constant input $x_0 = -1$ with weight $w_0$, so that
$$y = \mathrm{sign}(\mathbf{w}^{\mathsf T}\mathbf{x}) = g(\mathbf{w}^{\mathsf T}\mathbf{x}),$$
where g is the activation function (here the sign function) applied to the weighted sum.
Various different variable names are used for parameters, e.g. $\mathbf{w} = [w_0\ w_1\ \dots]$ or $\boldsymbol{\theta} = [\theta_0\ \theta_1\ \dots]$; most often we will use w.
NOTATION: a lower-case letter in non-italic font refers to a vector, a capital letter in non-italic font refers to a matrix or a set, and italic font refers to scalars.
Geometrical Interpretation of the State Space
The basic Perceptron defines a hyperplane $\mathbf{w}^{\mathsf T}\mathbf{x} = 0$ in the x-state space that linearly separates two regions of that space (which corresponds to a two-class linear classification): a positive sign area where $\mathbf{w}^{\mathsf T}\mathbf{x} > 0$ and a negative sign area where $\mathbf{w}^{\mathsf T}\mathbf{x} < 0$. The hyperplane defined by the parameters w acts as the decision boundary; w is its normal vector, and in the 2-D case the boundary crosses the axes at $x_1 = w_0/w_1$ and $x_2 = w_0/w_2$.
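As a small illustration of the decision boundary (with the bias folded into the weight vector via x0 = -1, as on the previous slide), the following sketch checks which side of the hyperplane a point falls on and computes the 2-D boundary line; the weight values are assumptions made for illustration.

```python
# -w0 + w1*x1 + w2*x2 = 0  =>  x2 = (w0 - w1*x1) / w2,
# crossing the axes at x1 = w0/w1 and x2 = w0/w2.
import numpy as np

w = np.array([1.0, 1.0, 1.0])   # [w0, w1, w2]; w0 is the bias weight

def side(x1, x2, w):
    # +1 on the positive-sign side of the hyperplane, -1 otherwise
    return 1 if np.dot(w, [-1.0, x1, x2]) >= 0 else -1

def boundary_x2(x1, w):
    # the boundary line x2 as a function of x1 (assumes w2 != 0)
    return (w[0] - w[1] * x1) / w[2]

print(side(1.0, 1.0, w), side(0.0, 0.0, w))      # -> 1 -1
print(boundary_x2(0.0, w), boundary_x2(1.0, w))  # intercepts 1.0 and 0.0
```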
Basic Perceptron (Supervised) Learning Rule
Idea: whenever the system produces a misclassification with the current weights w, adjust the weights by Δw towards a better performing weight vector:
$$\Delta\mathbf{w} = \begin{cases} \eta\,\big(f^*(\mathbf{x}) - f(\mathbf{x})\big)\,\mathbf{x} & \text{if } f(\mathbf{x}) \ne f^*(\mathbf{x}) \\ \mathbf{0} & \text{otherwise} \end{cases}$$
where f*(x) is the ground truth, f(x) is the actual output, and η is the learning rate.
Training a Single-Layer Perceptron
1. Compute the output: $f(\mathbf{x}) = \mathrm{sign}(\mathbf{w}^{\mathsf T}\mathbf{x})$.
2. Compare the output and the ground truth: is $f(\mathbf{x}) = f^*(\mathbf{x})$?
3. Adjust the weights: $\Delta w_i = \eta\,\big(f^*(\mathbf{x}) - f(\mathbf{x})\big)\,x_i$ if $f(\mathbf{x}) \ne f^*(\mathbf{x})$, and $\Delta w_i = 0$ otherwise.
4. Consider the next (training) input pair (x, f*(x)) and repeat the cycle.
Perceptron Learning Example: OR
Perceptron training attempt of OR using $\Delta\mathbf{w} = \eta\,(f^*(\mathbf{x}) - f(\mathbf{x}))\,\mathbf{x}$ with $\eta = 0.5$.

OR ground truth (sampling some (x, f*)):
x1  x2  f*
 0   0  -1
 0   1   1
 1   0   1
 1   1   1

Learning progress:
x0  x1  x2   parameters w   f    f*   update Δw
-1   0   0   (0,0,0)        1   -1    (1,0,0)
-1   1   0   (1,0,0)       -1    1    (-1,1,0)
-1   0   0   (0,1,0)        1   -1    (1,0,0)
-1   0   1   (1,1,0)       -1    1    (-1,0,1)
-1   0   0   (0,1,1)        1   -1    (1,0,0)
-1   0   1   (1,1,1)        1    1    (0,0,0)
-1   1   0   (1,1,1)        1    1    (0,0,0)
-1   1   1   (1,1,1)        1    1    (0,0,0)
-1   0   0   (1,1,1)       -1   -1    (0,0,0)
...  ...  ...  ...          ...  ...  ...

Note: the -1 encoding could be changed to the traditional value 0 by adjusting the output of the sign function to 0; the training algorithm remains valid.
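A minimal NumPy sketch of this training procedure applied to the OR data above, using the -1/+1 encoding and the bias input x0 = -1; the epoch limit and the sample order are illustrative choices.

```python
import numpy as np

X = np.array([[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]], dtype=float)
t = np.array([-1, 1, 1, 1], dtype=float)   # OR ground truth f*(x)
w = np.zeros(3)
eta = 0.5

sign = lambda v: 1.0 if v >= 0 else -1.0

for epoch in range(100):
    errors = 0
    for x, target in zip(X, t):
        y = sign(np.dot(w, x))             # compute output
        if y != target:                    # compare with ground truth
            w += eta * (target - y) * x    # adjust weights (Perceptron rule)
            errors += 1
    if errors == 0:                        # converged: all samples correct
        break

print(w)
```

For this particular sample order the loop stops with w = [1, 1, 1], the same separating weight vector reached in the table above.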
Geometrical Interpretation of the OR Space Learned
[Figure: the learned hyperplane w^T x = 0 acts as the decision boundary in the (x1, x2) plane, crossing the axes at 1 = w0/w1 and 1 = w0/w2; the three points with class label 1 lie in the positive sign area (w^T x > 0) and the point with class label -1 lies in the negative sign area (w^T x < 0).]
Larger Example Visualisation. image source: datasciencelab.wordpress.com
Cost Functions
Cost (or Loss) Functions
Idea: given a set X of input vectors x of one or more variables and a parameterisation w, a cost function is a map J onto a real number representing a cost or loss associated with the input configurations. (It is negatively related to ‘goodness of fit’.)
$$\text{Expected Loss:}\quad J(X;\mathbf{w}) = \mathbb{E}_{(\mathbf{x},\, f^*(\mathbf{x})) \sim p}\Big[L\big(f(\mathbf{x};\mathbf{w}),\, f^*(\mathbf{x})\big)\Big]$$
$$\text{Empirical Risk:}\quad J(X;\mathbf{w}) = \frac{1}{|X|}\sum_{\mathbf{x} \in X} L\big(f(\mathbf{x};\mathbf{w}),\, f^*(\mathbf{x})\big)$$
$$\text{MSE Example:}\quad J_{\mathrm{MSE}}(X;\mathbf{w}) = \frac{1}{|X|}\sum_{\mathbf{x} \in X} \big(f(\mathbf{x};\mathbf{w}) - f^*(\mathbf{x})\big)^2$$
where L is the per-example loss function.
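As a small illustration (names and data are assumed, not from the slides), the following sketch evaluates the empirical-risk form of the cost with the squared-error per-example loss of the MSE example, using a simple linear model f(x; w) = w^T x.

```python
import numpy as np

def mse_cost(X, targets, w):
    preds = X @ w                           # f(x; w) for every x in X
    per_example_loss = (preds - targets) ** 2
    return per_example_loss.mean()          # J(X; w) = (1/|X|) * sum of losses

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # rows are inputs x
targets = np.array([0.0, 1.0, 2.0])                 # f*(x) for each row
print(mse_cost(X, targets, np.array([0.0, 1.0])))   # perfect fit -> 0.0
print(mse_cost(X, targets, np.array([0.0, 0.5])))   # worse fit   -> ~0.4167
```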
Energy Landscapes over Parameter Space
[Figure: the cost function J plotted as an energy landscape over the parameter dimensions of w.]
Steepest Gradient Descent
Idea of ‘Steepest’ Gradient Descent
$$\mathbf{w}_{t+1} = \mathbf{w}_t - \eta\,\nabla_{\mathbf{w}} J(X; \mathbf{w}_t)$$
The new parameters equal the old parameters minus the learning rate η times the steepest gradient of the cost function with respect to the parameter dimensions of w.
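A minimal sketch of this update rule on a toy quadratic cost whose gradient is known in closed form; the cost, its minimiser and the step count are illustrative assumptions, not from the slides.

```python
import numpy as np

w_star = np.array([2.0, -1.0])        # minimiser of the toy cost J(w) = ||w - w*||^2

def grad_J(w):
    return 2.0 * (w - w_star)         # gradient of ||w - w*||^2

w = np.zeros(2)                       # initial parameters
eta = 0.1                             # learning rate
for t in range(100):
    w = w - eta * grad_J(w)           # steepest-descent step

print(w)   # approaches [ 2., -1.]
```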
The Delta Rule
MSE-type cost function with the identity function as activation function:
$$J(X;\mathbf{w}) = \frac{1}{2|X|}\sum_{\mathbf{x} \in X}\big(\mathbf{w}^{\mathsf T}\mathbf{x} - f^*(\mathbf{x})\big)^2$$
The weight vector change is modelled as a move along the steepest descent:
$$\Delta\mathbf{w} = -\eta\,\nabla_{\mathbf{w}} J(X;\mathbf{w})$$
Change for a single weight w_k:
$$\Delta w_k = -\eta\,\frac{\partial J(X;\mathbf{w})}{\partial w_k} = -\frac{\eta}{|X|}\sum_{\mathbf{x} \in X}\big(\mathbf{w}^{\mathsf T}\mathbf{x} - f^*(\mathbf{x})\big)\,x_k$$
...and for a single sample:
$$\Delta w_k = -\eta\,\big(\mathbf{w}^{\mathsf T}\mathbf{x} - f^*(\mathbf{x})\big)\,x_k$$
Here (w^T x - f*(x)) is the error and x_k is the derivative of w^T x with respect to w_k; this term looks similar to the Perceptron learning rule. Also known as the Delta Rule (Widrow & Hoff, 1960).
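The single-sample form lends itself to a short sketch: the following (with assumed synthetic data, not from the slides) applies the Delta Rule update to noise-free linear targets and approximately recovers the generating weights.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])   # weights generating the targets
X = rng.normal(size=(200, 3))         # training inputs
targets = X @ true_w                  # f*(x) = true_w^T x

w = np.zeros(3)
eta = 0.05
for x, target in zip(X, targets):
    error = np.dot(w, x) - target     # (w^T x - f*(x))
    w -= eta * error * x              # Delta Rule update for all weights at once

print(w)   # close to [ 1.5, -2. ,  0.5]
```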
Linear Separability
Basic Learning Example: XOR
Perceptron training attempt of XOR using $\Delta\mathbf{w} = \eta\,(f^*(\mathbf{x}) - f(\mathbf{x}))\,\mathbf{x}$ with $\eta = 0.5$.

XOR ground truth (sampling some (x, f*)):
x1  x2  f*
 0   0  -1
 0   1   1
 1   0   1
 1   1  -1

Learning progress:
x0  x1  x2   parameters w   f    f*   update Δw
-1   0   0   (0,0,0)        1   -1    (1,0,0)
-1   1   0   (1,0,0)       -1    1    (-1,1,0)
-1   0   0   (0,1,0)        1   -1    (1,0,0)
-1   0   1   (1,1,0)       -1    1    (-1,0,1)
-1   0   0   (0,1,1)        1   -1    (1,0,0)
-1   0   1   (1,1,1)        1    1    (0,0,0)
-1   1   0   (1,1,1)        1    1    (0,0,0)
-1   1   1   (1,1,1)        1   -1    (1,-1,-1)
-1   1   0   (2,0,0)       -1    1    (-1,1,0)
-1   0   1   (1,1,0)       -1    1    (-1,0,1)
...  ...  ...  ...          ...  ...  ...

Will the learning process ever produce a solution?
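The question can be probed empirically: a minimal sketch (same update rule and encoding as in the OR example, with an assumed epoch limit) shows that the error count never reaches zero on XOR, because the four points cannot be separated by a single hyperplane.

```python
import numpy as np

X = np.array([[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]], dtype=float)
t = np.array([-1, 1, 1, -1], dtype=float)   # XOR ground truth f*(x)
w = np.zeros(3)
eta = 0.5

sign = lambda v: 1.0 if v >= 0 else -1.0

for epoch in range(1000):
    errors = 0
    for x, target in zip(X, t):
        y = sign(np.dot(w, x))
        if y != target:
            w += eta * (target - y) * x
            errors += 1
    if errors == 0:
        break

print(epoch, errors)   # even after 1000 epochs some samples remain misclassified
```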