Neural networks
Chapter 20, Section 5
Outline
♦ Brains
♦ Neural networks
♦ Perceptrons
♦ Multilayer perceptrons
♦ Applications of neural networks
Brains
10^11 neurons of > 20 types, 10^14 synapses, 1ms–10ms cycle time
Signals are noisy "spike trains" of electrical potential
[Figure: neuron anatomy, showing axonal arborization, axon from another cell, synapse, dendrite, axon, nucleus, synapses, and cell body or soma]
McCulloch–Pitts "unit"
Output is a "squashed" linear function of the inputs:
    a_i ← g(in_i) = g(Σ_j W_{j,i} a_j)
[Figure: unit diagram, showing input links with weights W_{j,i}, a bias weight W_{0,i} on the fixed input a_0 = −1, the summed input in_i = Σ_j W_{j,i} a_j, the activation function g, the output a_i = g(in_i), and output links]
A gross oversimplification of real neurons, but its purpose is to develop understanding of what networks of simple units can do
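The unit above can be sketched in a few lines of Python; function names here are my own, and the sigmoid is just one choice of g:

```python
import math

def unit_output(weights, inputs, g):
    """A McCulloch-Pitts unit: a_i = g(sum_j W_ji * a_j).
    By convention, inputs[0] is the fixed bias input a_0 = -1."""
    in_i = sum(w * a for w, a in zip(weights, inputs))
    return g(in_i)

def sigmoid(x):
    """Sigmoid activation 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

# A unit with bias weight 0.5 and a single input weight 1.0:
# in = 0.5 * (-1) + 1.0 * 1.0 = 0.5, so output is g(0.5)
out = unit_output([0.5, 1.0], [-1.0, 1.0], sigmoid)
```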
Activation functions
[Figure: two activation functions g(in_i), each saturating at +1: (a) a step, (b) a smooth sigmoid]
(a) is a step function or threshold function
(b) is a sigmoid function 1/(1 + e^−x)
Changing the bias weight W_{0,i} moves the threshold location
Implementing logical functions
[Figure: three threshold units]
AND: W_0 = 1.5, W_1 = 1, W_2 = 1
OR:  W_0 = 0.5, W_1 = 1, W_2 = 1
NOT: W_0 = −0.5, W_1 = −1
McCulloch and Pitts: every Boolean function can be implemented
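A minimal sketch of these three units, using exactly the weights above with a step activation:

```python
def threshold_unit(W0, weights, inputs):
    """Fire (output 1) iff sum_j W_j x_j > W_0; equivalent to a unit
    with a fixed bias input a_0 = -1 carrying weight W_0."""
    total = sum(w * x for w, x in zip(weights, inputs)) - W0
    return 1 if total > 0 else 0

def AND(x1, x2): return threshold_unit(1.5, [1, 1], [x1, x2])
def OR(x1, x2):  return threshold_unit(0.5, [1, 1], [x1, x2])
def NOT(x1):     return threshold_unit(-0.5, [-1], [x1])
```

Checking the truth tables confirms the weights: for AND, x1 + x2 exceeds 1.5 only when both inputs are 1.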
Network structures
Feed-forward networks:
– single-layer perceptrons
– multi-layer perceptrons
Feed-forward networks implement functions, have no internal state
Recurrent networks:
– Hopfield networks have symmetric weights (W_{i,j} = W_{j,i}); g(x) = sign(x), a_i = ±1; holographic associative memory
– Boltzmann machines use stochastic activation functions, ≈ MCMC in Bayes nets
– recurrent neural nets have directed cycles with delays ⇒ have internal state (like flip-flops), can oscillate etc.
Feed-forward example
[Figure: network with input units 1, 2, hidden units 3, 4, and output unit 5, connected by weights W_{1,3}, W_{1,4}, W_{2,3}, W_{2,4}, W_{3,5}, W_{4,5}]
Feed-forward network = a parameterized family of nonlinear functions:
    a_5 = g(W_{3,5} · a_3 + W_{4,5} · a_4)
        = g(W_{3,5} · g(W_{1,3} · a_1 + W_{2,3} · a_2) + W_{4,5} · g(W_{1,4} · a_1 + W_{2,4} · a_2))
Adjusting weights changes the function: do learning this way!
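The nested expression for a_5 can be evaluated directly; a sketch assuming sigmoid activation (the slide leaves g unspecified) and a dict of weights keyed by (source, destination):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(a1, a2, W, g=sigmoid):
    """Evaluate the 2-input, 2-hidden-unit, 1-output network:
    a_5 = g(W_35 * g(W_13 a_1 + W_23 a_2) + W_45 * g(W_14 a_1 + W_24 a_2))."""
    a3 = g(W[(1, 3)] * a1 + W[(2, 3)] * a2)
    a4 = g(W[(1, 4)] * a1 + W[(2, 4)] * a2)
    return g(W[(3, 5)] * a3 + W[(4, 5)] * a4)

# With all weights zero, every unit outputs g(0) = 0.5
zeros = {(1, 3): 0, (2, 3): 0, (1, 4): 0, (2, 4): 0, (3, 5): 0, (4, 5): 0}
a5 = forward(1.0, 1.0, zeros)  # 0.5
```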
Single-layer perceptrons
[Figure: input units connected by weights W_{j,i} directly to output units; surface plot of perceptron output over inputs x_1, x_2, showing a sigmoid "cliff"]
Output units all operate separately; no shared weights
Adjusting weights moves the location, orientation, and steepness of cliff
Expressiveness of perceptrons
Consider a perceptron with g = step function (Rosenblatt, 1957, 1960)
Can represent AND, OR, NOT, majority, etc., but not XOR
Represents a linear separator in input space:
    Σ_j W_j x_j > 0, i.e., W · x > 0
[Figure: x_1 vs. x_2 plots of (a) x_1 and x_2, (b) x_1 or x_2, each linearly separable, and (c) x_1 xor x_2, for which no separating line exists]
Minsky & Papert (1969) pricked the neural network balloon
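The separability claim can be illustrated by brute force: search a coarse grid of candidate weights for a line W·x > W_0 that classifies every example correctly. This is an illustration of the claim over a finite grid, not a general decision procedure (AND happens to be separable by weights on the grid, while XOR is separable by no weights at all):

```python
from itertools import product

def separable(examples, grid):
    """Search a grid of (W0, W1, W2) for a linear separator
    W1*x1 + W2*x2 > W0 consistent with all labeled examples."""
    for W0, W1, W2 in product(grid, repeat=3):
        if all((W1 * x1 + W2 * x2 > W0) == (y == 1)
               for (x1, x2), y in examples):
            return True
    return False

grid = [i / 2 for i in range(-4, 5)]  # weights -2.0 .. 2.0 in steps of 0.5
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```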
Perceptron learning
Learn by adjusting weights to reduce error on training set
The squared error for an example with input x and true output y is
    E = ½ Err² ≡ ½ (y − h_W(x))²
Perform optimization search by gradient descent:
    ∂E/∂W_j = Err × ∂Err/∂W_j = Err × ∂/∂W_j (y − g(Σ_{j=0}^n W_j x_j))
            = −Err × g′(in) × x_j
Simple weight update rule:
    W_j ← W_j + α × Err × g′(in) × x_j
E.g., +ve error ⇒ increase network output ⇒ increase weights on +ve inputs, decrease on −ve inputs
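A minimal sketch of this update rule, assuming a sigmoid g (so that g′ exists) and using the linearly separable OR function as illustrative training data:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_perceptron(examples, alpha=0.5, epochs=2000):
    """Gradient-descent perceptron learning:
    W_j <- W_j + alpha * Err * g'(in) * x_j, with g = sigmoid.
    Each example is (inputs, y); a fixed bias input -1 is prepended."""
    n = len(examples[0][0]) + 1
    W = [0.0] * n
    for _ in range(epochs):
        for x, y in examples:
            xs = [-1.0] + list(x)
            in_ = sum(w * xi for w, xi in zip(W, xs))
            a = sigmoid(in_)
            err = y - a
            gprime = a * (1 - a)  # derivative of the sigmoid at in
            W = [w + alpha * err * gprime * xi for w, xi in zip(W, xs)]
    return W

def predict(W, x):
    xs = [-1.0] + list(x)
    return sigmoid(sum(w * xi for w, xi in zip(W, xs)))

# OR is linearly separable, so the rule converges to a consistent fit
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
W = train_perceptron(OR_DATA)
```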
Perceptron learning contd.
Perceptron learning rule converges to a consistent function for any linearly separable data set
[Figure: proportion correct on test set vs. training set size (0–100) for the perceptron and decision-tree learning, on (left) MAJORITY on 11 inputs and (right) the RESTAURANT data]
Perceptron learns majority function easily, DTL is hopeless
DTL learns restaurant function easily, perceptron cannot represent it
Multilayer perceptrons
Layers are usually fully connected; numbers of hidden units typically chosen by hand
[Figure: output units a_i, connected by weights W_{j,i} to hidden units a_j, connected by weights W_{k,j} to input units a_k]
Expressiveness of MLPs
All continuous functions w/ 2 layers, all functions w/ 3 layers
[Figure: surface plots of h_W(x_1, x_2) showing a ridge and a bump built from sigmoid units]
Combine two opposite-facing threshold functions to make a ridge
Combine two perpendicular ridges to make a bump
Add bumps of various sizes and locations to fit any surface
Proof requires exponentially many hidden units (cf DTL proof)
Back-propagation learning
Output layer: same as for single-layer perceptron,
    W_{j,i} ← W_{j,i} + α × a_j × Δ_i
where Δ_i = Err_i × g′(in_i)
Hidden layer: back-propagate the error from the output layer:
    Δ_j = g′(in_j) Σ_i W_{j,i} Δ_i
Update rule for weights in hidden layer:
    W_{k,j} ← W_{k,j} + α × a_k × Δ_j
(Most neuroscientists deny that back-propagation occurs in the brain)
Back-propagation derivation
The squared error on a single example is defined as
    E = ½ Σ_i (y_i − a_i)²,
where the sum is over the nodes in the output layer.
    ∂E/∂W_{j,i} = −(y_i − a_i) ∂a_i/∂W_{j,i} = −(y_i − a_i) ∂g(in_i)/∂W_{j,i}
                = −(y_i − a_i) g′(in_i) ∂in_i/∂W_{j,i}
                = −(y_i − a_i) g′(in_i) ∂/∂W_{j,i} (Σ_j W_{j,i} a_j)
                = −(y_i − a_i) g′(in_i) a_j = −a_j Δ_i
Back-propagation derivation contd.
    ∂E/∂W_{k,j} = −Σ_i (y_i − a_i) ∂a_i/∂W_{k,j} = −Σ_i (y_i − a_i) ∂g(in_i)/∂W_{k,j}
                = −Σ_i (y_i − a_i) g′(in_i) ∂in_i/∂W_{k,j} = −Σ_i Δ_i ∂/∂W_{k,j} (Σ_j W_{j,i} a_j)
                = −Σ_i Δ_i W_{j,i} ∂a_j/∂W_{k,j} = −Σ_i Δ_i W_{j,i} ∂g(in_j)/∂W_{k,j}
                = −Σ_i Δ_i W_{j,i} g′(in_j) ∂in_j/∂W_{k,j}
                = −Σ_i Δ_i W_{j,i} g′(in_j) ∂/∂W_{k,j} (Σ_k W_{k,j} a_k)
                = −Σ_i Δ_i W_{j,i} g′(in_j) a_k = −a_k Δ_j
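The derivation can be checked numerically. This sketch, assuming a 2–2–1 sigmoid network with bias inputs of −1 (function names are mine), computes the Δ-based gradients ∂E/∂W = −a·Δ and compares them against finite differences of E:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(Wh, Wo, x1, x2):
    """2-2-1 network; Wh[j] = [bias, w1, w2] for hidden unit j,
    Wo = [bias, w_from_h0, w_from_h1] for the output unit."""
    xs = [-1.0, x1, x2]
    a = [sigmoid(sum(w * xi for w, xi in zip(Wj, xs))) for Wj in Wh]
    hs = [-1.0] + a
    return sigmoid(sum(w * h for w, h in zip(Wo, hs)))

def backprop_grads(Wh, Wo, x1, x2, y):
    """Gradients of E = 1/2 (y - a_out)^2 via the derivation:
    dE/dW_{j,i} = -a_j * Delta_i and dE/dW_{k,j} = -a_k * Delta_j."""
    xs = [-1.0, x1, x2]
    a = [sigmoid(sum(w * xi for w, xi in zip(Wj, xs))) for Wj in Wh]
    hs = [-1.0] + a
    out = sigmoid(sum(w * h for w, h in zip(Wo, hs)))
    delta_o = (y - out) * out * (1 - out)  # Err_i * g'(in_i)
    delta_h = [a[j] * (1 - a[j]) * Wo[j + 1] * delta_o for j in range(2)]
    gWo = [-h * delta_o for h in hs]
    gWh = [[-xi * delta_h[j] for xi in xs] for j in range(2)]
    return gWh, gWo
```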
Back-propagation learning contd.
At each epoch, sum gradient updates for all examples and apply
Training curve for 100 restaurant examples: finds exact fit
[Figure: total error on training set vs. number of epochs (0–400)]
Typical problems: slow convergence, local minima
Back-propagation learning contd.
Learning curve for MLP with 4 hidden units:
[Figure: proportion correct on test set vs. training set size (0–100) on the RESTAURANT data, for the multilayer network and decision-tree learning]
MLPs are quite good for complex pattern recognition tasks, but resulting hypotheses cannot be understood easily
Handwritten digit recognition
3-nearest-neighbor = 2.4% error
400–300–10 unit MLP = 1.6% error
LeNet: 768–192–30–10 unit MLP = 0.9% error
Current best (kernel machines, vision algorithms) ≈ 0.6% error
Summary
Most brains have lots of neurons; each neuron ≈ linear–threshold unit (?)
Perceptrons (one-layer networks) insufficiently expressive
Multi-layer networks are sufficiently expressive; can be trained by gradient descent, i.e., error back-propagation
Many applications: speech, driving, handwriting, fraud detection, etc.
Engineering, cognitive modelling, and neural system modelling subfields have largely diverged