

  1. COMP24111: Machine Learning and Optimisation Chapter 5: Neural Networks and Deep Learning Dr. Tingting Mu Email: tingting.mu@manchester.ac.uk

  2. Outline
  • Single-layer perceptron, the perceptron algorithm.
  • Multi-layer perceptron.
  • Back-propagation method.
  • Convolutional neural network.
  • Deep learning.

  3. Neuron Structure
  • Simulating a neuron: each artificial neural network (ANN) neuron receives multiple inputs, and generates one output. A neuron is an electrically excitable cell that processes and transmits information by electro-chemical signaling.
  Figure is from http://2centsapiece.blogspot.co.uk/2015/10/identifying-subatomic-particles-with.html

  4. Neuron Structure
  • Simulating a neuron: each artificial neural network (ANN) neuron receives multiple inputs, and generates one output. A neuron is an electrically excitable cell that processes and transmits information by electro-chemical signaling.
  • Input signals are sent from other neurons. Connection strengths determine how the signals are accumulated. If enough signals accumulate, the neuron fires a signal.
  Figure is from http://2centsapiece.blogspot.co.uk/2015/10/identifying-subatomic-particles-with.html

  5. History
  • 1951: the first randomly wired neural network learning machine, SNARC (Stochastic Neural Analog Reinforcement Calculator), was designed by Marvin Lee Minsky (09/08/1927 – 24/01/2016, American cognitive scientist).
  • 1957: the perceptron algorithm was invented at the Cornell Aeronautical Laboratory by Frank Rosenblatt (11/07/1928 – 11/07/1971, American psychologist notable in A.I.).
  • 1980: the Neocognitron (a type of artificial neural network), proposed by Kunihiko Fukushima, was used for handwritten character recognition and served as the inspiration for convolutional neural networks.

  6. History
  • 1982: the Hopfield network (a form of recurrent artificial neural network) was popularized by John Hopfield, but had been described earlier by Little in 1974.
  • 1986: the process of backpropagation was described by David Rumelhart, Geoff Hinton and Ronald J. Williams. The basics of continuous backpropagation had already been derived in the context of control theory by Henry J. Kelley (1926–1988, Professor of Aerospace and Ocean Engineering) in 1960.
  • 1997: the long short-term memory (LSTM) recurrent neural network was invented by Sepp Hochreiter and Jürgen Schmidhuber, improving the efficiency and practicality of recurrent neural networks.

  7. Single Neuron Model
  • An ANN neuron: multiple inputs $[x_1, x_2, \ldots, x_d]$ and one output $y$. Basic elements of a typical neuron include:
  • A set of synapses or connections. Each of these is characterised by a weight (strength).
  • An adder for summing the input signals, weighted by the respective synapses.
  • An activation function, which squashes the permissible amplitude range of the output signal.
  • The neuron output is $y = \phi\left(\sum_{i=1}^{d} w_i x_i + b\right)$, where $b$ is the bias. Given $d$ inputs, a neuron is modelled by $d+1$ parameters.
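  A minimal NumPy sketch of the single-neuron computation above. The input values, weights, bias and sigmoid activation used here are illustrative assumptions, not values from the slides.

```python
import numpy as np

def neuron_output(x, w, b, phi):
    """Single neuron: weighted sum of the inputs plus bias, passed through the activation phi."""
    v = np.dot(w, x) + b      # adder: v = sum_i w_i * x_i + b
    return phi(v)             # activation squashes the permissible output range

# Illustrative values for d = 3 inputs with a sigmoid activation (all assumed).
x = np.array([0.5, -1.0, 2.0])    # inputs x_1 ... x_d
w = np.array([0.2, 0.4, -0.1])    # synaptic weights w_1 ... w_d
b = 0.3                           # bias
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

print(neuron_output(x, w, b, sigmoid))   # one output y; the neuron has d + 1 = 4 parameters
```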

  8. Types of Activation Function
  • Identity function: $\phi(v) = v$
  • Threshold function: $\phi(v) = \begin{cases} +1 & \text{if } v \ge 0 \\ -1 & \text{if } v < 0 \end{cases}$
  • Sigmoid function ("S"-shaped curve): $\phi(v) = \frac{1}{1 + \exp(-v)} \in (0, +1)$, or $\phi(v) = \tanh(v) = \frac{\exp(2v) - 1}{\exp(2v) + 1} \in (-1, +1)$
  • Rectified linear unit (ReLU): $\phi(v) = \begin{cases} v & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}$
  [Plots of the identity, threshold, sigmoid, tanh and ReLU activation functions over $v$.]
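  The four activation functions above translate directly into short NumPy functions. This is a sketch for experimentation, not course-provided code; function names are my own.

```python
import numpy as np

def identity(v):
    return v

def threshold(v):
    # +1 if v >= 0, -1 otherwise
    return np.where(v >= 0, 1.0, -1.0)

def sigmoid(v):
    # logistic sigmoid, output in (0, +1)
    return 1.0 / (1.0 + np.exp(-v))

def tanh(v):
    # hyperbolic tangent, output in (-1, +1); equivalent to np.tanh(v)
    return (np.exp(2 * v) - 1) / (np.exp(2 * v) + 1)

def relu(v):
    # rectified linear unit: v if v >= 0, else 0
    return np.maximum(0.0, v)

v = np.linspace(-3, 3, 7)
print(threshold(v), relu(v))
```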

  9. The Perceptron Algorithm
  • When the activation function is set as the identity function, the single neuron model becomes the linear model we learned in the previous chapters. The neuron weights and bias are equivalent to the coefficient vector of the linear model: model output $= \mathbf{w}^T \mathbf{x} + b$.
  • When the activation function is set as the threshold function, the model is still linear, and it is known as the perceptron of Rosenblatt. Activation function: $\phi(v) = \begin{cases} +1 & \text{if } v \ge 0 \\ -1 & \text{if } v < 0 \end{cases}$
  • The perceptron algorithm is for two-class classification, and it occupies an important place in the history of pattern recognition algorithms.
  [Plots of the identity and threshold activation functions.]

  10. The Perceptron Algorithm
  • Perceptron output $= \begin{cases} +1 & \text{if } \mathbf{w}^T \tilde{\mathbf{x}} \ge 0 \\ -1 & \text{if } \mathbf{w}^T \tilde{\mathbf{x}} < 0 \end{cases}$, where $\mathbf{w} = [b, w_1, \ldots, w_d]$ and $\tilde{\mathbf{x}} = \begin{bmatrix} 1 \\ \mathbf{x} \end{bmatrix}$.
  • Training a perceptron: $\mathbf{w}(t+1) = \mathbf{w}(t) - \eta \nabla O(\mathbf{w}(t)) = \mathbf{w}(t) + \eta y_i \tilde{\mathbf{x}}_i$. Update using a misclassified sample in each iteration!
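  A sketch of the augmented-input convention and one update step described above. The learning rate and the sample values are made up for illustration.

```python
import numpy as np

def augment(x):
    # x_tilde = [1, x_1, ..., x_d], so that w = [b, w_1, ..., w_d] absorbs the bias
    return np.concatenate(([1.0], x))

def perceptron_output(w, x_tilde):
    # +1 if w^T x_tilde >= 0, -1 otherwise
    return 1.0 if np.dot(w, x_tilde) >= 0 else -1.0

# One update with a misclassified sample: w(t+1) = w(t) + eta * y_i * x_tilde_i
eta = 0.1
w = np.zeros(3)                          # [b, w_1, w_2], d = 2 here
x_i, y_i = np.array([2.0, -1.0]), -1.0   # made-up training sample
x_tilde_i = augment(x_i)
if perceptron_output(w, x_tilde_i) != y_i:   # only misclassified samples trigger an update
    w = w + eta * y_i * x_tilde_i
print(w)
```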

  11. The Perceptron Algorithm
  Update using one misclassified sample in each iteration: $\mathbf{w}(t+1) = \mathbf{w}(t) + \eta y_i \tilde{\mathbf{x}}_i$
  • Perceptron training:
  Initialise the weights (stored in $\mathbf{w}(0)$) to random numbers in the range -1 to +1.
  For t = 1 to NUM_ITERATIONS
    For each training sample $(\mathbf{x}_i, y_i)$
      Calculate the activation using the current weights (stored in $\mathbf{w}(t)$).
      Update the weights (stored in $\mathbf{w}(t+1)$) by the learning rule.
    end
  end
  • What weight changes do the following cases produce?
  • If (true label = -1, activation output = -1), then: no change.
  • If (true label = +1, activation output = +1), then: no change.
  • If (true label = -1, activation output = +1), then: add $-\eta \tilde{\mathbf{x}}_i$.
  • If (true label = +1, activation output = -1), then: add $+\eta \tilde{\mathbf{x}}_i$.
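  A minimal NumPy sketch of the training loop above. The toy data, learning rate and iteration count are assumptions for illustration, not values from the course.

```python
import numpy as np

def train_perceptron(X, y, eta=0.1, num_iterations=100):
    """Perceptron training: apply w(t+1) = w(t) + eta * y_i * x_tilde_i on misclassified samples."""
    rng = np.random.default_rng(0)
    X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend 1 to each sample
    w = rng.uniform(-1, 1, X_tilde.shape[1])             # initialise weights in the range -1 to +1
    for _ in range(num_iterations):
        for x_tilde_i, y_i in zip(X_tilde, y):
            activation = 1.0 if np.dot(w, x_tilde_i) >= 0 else -1.0
            if activation != y_i:                        # update only when misclassified
                w = w + eta * y_i * x_tilde_i
    return w

# Toy linearly separable two-class data (assumed for illustration).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = train_perceptron(X, y)
print(w)   # w = [b, w_1, w_2]
```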

  12. Why train like this?
  • The parameters stored in $\mathbf{w}$ are optimised by minimising an error function, called the perceptron criterion: $O(\mathbf{w}) = -\sum_{i \in \text{Misclassified Set}} y_i \mathbf{w}^T \tilde{\mathbf{x}}_i$. If a sample is correctly classified, it incurs an error penalty of zero; if incorrectly classified, it incurs an error penalty of $-y_i \mathbf{w}^T \tilde{\mathbf{x}}_i$.
  • We want to reduce the number of misclassified samples, and therefore minimise the above error.
  • Stochastic gradient descent is used for training. Estimate the gradient using a misclassified sample: $O_i(\mathbf{w}) = -y_i \mathbf{w}^T \tilde{\mathbf{x}}_i \Rightarrow \frac{\partial O_i(\mathbf{w})}{\partial \mathbf{w}} = -y_i \tilde{\mathbf{x}}_i$, giving $\mathbf{w}(t+1) = \mathbf{w}(t) - \eta \nabla O(\mathbf{w}(t)) = \mathbf{w}(t) + \eta y_i \tilde{\mathbf{x}}_i$.
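  A sketch, in the same notation as above, of evaluating the perceptron criterion and taking one stochastic gradient step. The weight vector and sample values are assumed for illustration.

```python
import numpy as np

def perceptron_criterion(w, X_tilde, y):
    """O(w) = - sum over misclassified samples of y_i * w^T x_tilde_i (zero if all are correct)."""
    scores = X_tilde @ w
    misclassified = np.where(scores >= 0, 1.0, -1.0) != y
    return -np.sum(y[misclassified] * scores[misclassified])

def stochastic_gradient(x_tilde_i, y_i):
    """Gradient of O_i(w) = -y_i * w^T x_tilde_i with respect to w, which is simply -y_i * x_tilde_i."""
    return -y_i * x_tilde_i

# One SGD step on a misclassified sample: w(t+1) = w(t) - eta * gradient = w(t) + eta * y_i * x_tilde_i
eta = 0.1
w = np.array([0.0, -1.0, 0.5])
x_tilde_i, y_i = np.array([1.0, 2.0, 1.0]), 1.0   # here w^T x_tilde = -1.5 < 0, so this sample is misclassified
w_new = w - eta * stochastic_gradient(x_tilde_i, y_i)
print(w_new)
```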

  13. One neuron can be used to construct a linear model. It has only one layer (the input layer), with input nodes $x_1, x_2, \ldots, x_d$, and is called a single-layer perceptron.
  [Diagram: each input $x_i$ is weighted by $w_i$, summed with the bias $b$ by the adder, and passed through the activation to give $y = \phi\left(\sum_{i=1}^{d} w_i x_i + b\right)$.]

  14. Adding Hidden Layers!
  • The presence of hidden layers allows more complex functions to be formulated.
  • Each hidden node finds a partial solution to the problem, to be combined in the next layer.
  [Example networks: inputs $x_1, \ldots, x_d$ feeding one hidden layer and then the output $y$; and inputs feeding two hidden layers and then the output $y$.]

  15. Multilayer Perceptron
  • A multilayer perceptron (MLP), also called a feedforward artificial neural network, consists of at least three layers of nodes (input, hidden and output layers).
  • Number of neurons in the input layer is equal to the number of input features.
  • Number of hidden layers is a hyperparameter to be set.
  • Numbers of neurons in the hidden layers are also hyperparameters to be set.
  • Number of neurons in the output layer depends on the task to be solved.
  [Diagram: input layer, hidden layer 1, hidden layer 2, output layer.]

  16. Multilayer Perceptron
  • An MLP example with one hidden layer consisting of 4 hidden neurons. It takes 9 input features and returns 2 output variables (9 input neurons in the input layer, 2 output neurons in the output layer).
  • Output of the j-th neuron in the hidden layer (j = 1, 2, 3, 4), for the n-th training sample: $z_j(n) = \phi_h\left(\sum_{i=1}^{9} w_{ij}^{(h)} x_i(n) + b_j\right)$
  • Output of the k-th neuron in the output layer (k = 1, 2), for the n-th training sample: $y_k(n) = \phi_o\left(\sum_{j=1}^{4} w_{jk}^{(o)} z_j(n) + b_k\right)$
  [Figure: feed-forward information flow through the hidden layer and output layer when computing the output variables.]

  17. Multilayer Perceptron
  • An MLP example with one hidden layer consisting of 4 hidden neurons. It takes 9 input features and returns 2 output variables (9 input neurons in the input layer, 2 output neurons in the output layer).
  • Output of the j-th neuron in the hidden layer (j = 1, 2, 3, 4), for the n-th training sample: $z_j(n) = \phi_h\left(\sum_{i=1}^{9} w_{ij}^{(h)} x_i(n) + b_j\right)$. Each hidden neuron has 9 + 1 = 10 weights.
  • Output of the k-th neuron in the output layer (k = 1, 2), for the n-th training sample: $y_k(n) = \phi_o\left(\sum_{j=1}^{4} w_{jk}^{(o)} z_j(n) + b_k\right)$. Each output neuron has 4 + 1 = 5 weights.
  • How many weights in total?
  [Figure: feed-forward information flow through the weights $W_{ij}^{(h)}$ and $W_{jk}^{(o)}$ of the hidden layer and output layer.]
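  A sketch of the feed-forward pass through this 9-4-2 MLP in NumPy. The layer shapes match the slide; the sigmoid activations and the random parameter and input values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes follow the slide: 9 input features, 4 hidden neurons, 2 output neurons.
W_h, b_h = rng.standard_normal((9, 4)), rng.standard_normal(4)   # hidden layer: (9 + 1) * 4 = 40 parameters
W_o, b_o = rng.standard_normal((4, 2)), rng.standard_normal(2)   # output layer: (4 + 1) * 2 = 10 parameters

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def mlp_forward(x):
    """Feed-forward pass: z_j = phi_h(sum_i w_ij x_i + b_j), then y_k = phi_o(sum_j w_jk z_j + b_k)."""
    z = sigmoid(x @ W_h + b_h)   # 4 hidden-layer outputs z_1 ... z_4
    y = sigmoid(z @ W_o + b_o)   # 2 output variables y_1, y_2
    return y

x_n = rng.standard_normal(9)     # one training sample with 9 features (random, for illustration)
print(mlp_forward(x_n))
```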
