
Neural Networks
Oliver Schulte, CMPT 726 (Bishop PRML Ch. 5)


Neural Networks
• Neural networks arise from attempts to model human/animal brains.
• Many models exist, with many claims of biological plausibility.
• We will focus on multi-layer perceptrons and their mathematical properties rather than biological plausibility.
• For biologically motivated models, see Prof. Hadley's CMPT 418.

Uses of Neural Networks
• Pros
  • Good for continuous input variables.
  • General continuous function approximators.
  • Highly non-linear.
  • Learn feature functions.
  • Good to use in continuous domains with little prior knowledge:
    • when you don't know good features;
    • when you don't know the form of a good functional model.
• Cons
  • Not interpretable; a "black box".
  • Learning is slow.
  • Good generalization can require many data points.

Applications
There are many, many applications.
• World-champion backgammon player.
• "No Hands Across America" driving tour.
• Digit recognition with 99.26% accuracy.
• ...

Outline
• Feed-forward Networks
• Network Training
• Error Backpropagation
• Applications


Feed-forward Networks
• We have looked at generalized linear models of the form

    y(x, w) = f\left( \sum_{j=1}^{M} w_j \phi_j(x) \right)

  for fixed non-linear basis functions \phi(\cdot).
• We now extend this model by allowing adaptive basis functions and learning their parameters.
• In feed-forward networks (a.k.a. multi-layer perceptrons) we let each basis function be another non-linear function of a linear combination of the inputs:

    \phi_j(x) = f\left( \sum_{i} w_{ji} x_i + w_{j0} \right)

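For reference, a minimal sketch (Python/NumPy, not from the slides) of the fixed-basis generalized linear model above; the Gaussian basis functions, their centres, and the identity choice of f are illustrative assumptions:

```python
import numpy as np

def gaussian_basis(x, mu, s=1.0):
    """Fixed non-linear basis function phi_j(x) with centre mu and width s."""
    return np.exp(-np.sum((x - mu) ** 2) / (2 * s ** 2))

def glm_predict(x, w, centres, f=lambda a: a):
    """y(x, w) = f( sum_j w_j * phi_j(x) ), with phi_0(x) = 1 as the bias basis."""
    phi = np.array([1.0] + [gaussian_basis(x, mu) for mu in centres])
    return f(np.dot(w, phi))

# Example: three fixed basis functions on a 2-D input.
centres = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([-1.0, 1.0])]
w = np.array([0.1, 0.5, -0.3, 0.8])        # one weight per basis (incl. bias)
print(glm_predict(np.array([0.5, 0.5]), w, centres))
```

The basis functions here stay fixed; the point of the next slides is to make them adaptive by giving each one its own learned weights.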

Feed-forward Networks
• Starting with input x = (x_1, \ldots, x_D), construct linear combinations

    a_j = \sum_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0}

  These a_j are known as activations.
• Pass each through an activation function h(\cdot) to get the unit output z_j = h(a_j).
• This is the model of an individual neuron (figure from Russell and Norvig, AIMA 2e).

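A minimal sketch of a single unit as defined above, assuming a tanh activation h; the weights and inputs are made-up numbers:

```python
import numpy as np

def unit_output(x, w, w0, h=np.tanh):
    """Compute a_j = sum_i w_ji * x_i + w_j0, then z_j = h(a_j)."""
    a = np.dot(w, x) + w0      # activation a_j
    return h(a)                # unit output z_j

x = np.array([0.2, -0.5, 1.0])         # input (x_1, ..., x_D)
w = np.array([0.4, 0.1, -0.7])         # first-layer weights w_ji for unit j
print(unit_output(x, w, w0=0.05))
```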

Activation Functions
• A variety of activation functions can be used:
  • Sigmoidal (S-shaped)
    • Logistic sigmoid 1/(1 + \exp(-a)) (useful for binary classification)
    • Hyperbolic tangent \tanh
  • Radial basis function z_j = \sum_i (x_i - w_{ji})^2
  • Softmax (useful for multi-class classification)
  • Identity (useful for regression)
  • Threshold
  • ...
• The activation function needs to be differentiable for gradient-based learning (later).
• Different activation functions can be used in each unit.
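Minimal sketches of several of these activation functions (softmax acts on a whole vector of activations, the others elementwise):

```python
import numpy as np

def logistic(a):                    # sigmoid, for binary classification outputs
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):                        # hyperbolic tangent, a rescaled sigmoid
    return np.tanh(a)

def identity(a):                    # identity, for regression outputs
    return a

def softmax(a):                     # multi-class classification outputs
    e = np.exp(a - np.max(a))       # shift for numerical stability
    return e / np.sum(e)

def threshold(a):                   # step function; not differentiable at 0
    return np.where(a > 0, 1.0, 0.0)

print(logistic(0.0), softmax(np.array([1.0, 2.0, 3.0])))
```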

Feed-forward Networks
[Network diagram: inputs x_0, x_1, ..., x_D; hidden units z_0, z_1, ..., z_M; outputs y_1, ..., y_K; first-layer weights w^{(1)}, second-layer weights w^{(2)}]
• Connect a number of these units together into a feed-forward network (a DAG).
• The network above has one layer of hidden units.
• It implements the function

    y_k(x, w) = h\left( \sum_{j=1}^{M} w^{(2)}_{kj} \, h\left( \sum_{i=1}^{D} w^{(1)}_{ji} x_i + w^{(1)}_{j0} \right) + w^{(2)}_{k0} \right)
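A minimal sketch of this one-hidden-layer forward pass, assuming tanh hidden units and a logistic-sigmoid output; the bias weights w^{(1)}_{j0}, w^{(2)}_{k0} are stored as the first column of each weight matrix, and the random weight values are placeholders:

```python
import numpy as np

def forward(x, W1, W2, h=np.tanh, out=lambda a: 1.0 / (1.0 + np.exp(-a))):
    """One-hidden-layer network: y_k(x, w) as in the formula above."""
    x_ext = np.concatenate(([1.0], x))      # prepend x_0 = 1 for the bias
    z = h(W1 @ x_ext)                       # hidden unit outputs z_j
    z_ext = np.concatenate(([1.0], z))      # prepend z_0 = 1 for the bias
    return out(W2 @ z_ext)                  # network outputs y_k

rng = np.random.default_rng(0)
D, M, K = 3, 4, 2                           # input, hidden, output dimensions
W1 = rng.normal(size=(M, D + 1))            # first-layer weights (bias in column 0)
W2 = rng.normal(size=(K, M + 1))            # second-layer weights (bias in column 0)
print(forward(np.array([0.1, -0.2, 0.3]), W1, W2))
```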

Hidden Units Compute Basis Functions
[Figure: network function fit to data]
• red dots = network function
• dashed lines = hidden unit activation functions
• blue dots = data points

Outline
• Feed-forward Networks
• Network Training
• Error Backpropagation
• Applications

Network Training
• Given a specified network structure, how do we set its parameters (weights)?
• As usual, we define a criterion that measures how well our network performs and optimize against it.
• For regression, the training data are (x_n, t_n) with t_n \in \mathbb{R}, and the squared error arises naturally:

    E(w) = \sum_{n=1}^{N} \{ y(x_n, w) - t_n \}^2

• For binary classification, this is another discriminative model; maximum likelihood gives

    p(t \mid w) = \prod_{n=1}^{N} y_n^{t_n} \{ 1 - y_n \}^{1 - t_n}

    E(w) = -\sum_{n=1}^{N} \{ t_n \ln y_n + (1 - t_n) \ln(1 - y_n) \}

  where y_n = y(x_n, w).

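A minimal sketch of both error functions, with placeholder network outputs y_n standing in for y(x_n, w):

```python
import numpy as np

def squared_error(y, t):
    """E(w) = sum_n ( y(x_n, w) - t_n )^2, for regression targets t_n in R."""
    return np.sum((y - t) ** 2)

def cross_entropy(y, t, eps=1e-12):
    """E(w) = -sum_n [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ], for t_n in {0, 1}."""
    y = np.clip(y, eps, 1 - eps)            # avoid log(0)
    return -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))

y = np.array([0.9, 0.2, 0.7])               # placeholder outputs y_n = y(x_n, w)
print(squared_error(y, np.array([1.0, 0.0, 0.5])))
print(cross_entropy(y, np.array([1, 0, 1])))
```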

Parameter Optimization
[Figure: error surface E(w) over weight space (w_1, w_2), showing points w_A, w_B, w_C and the gradient \nabla E]
• For either of these problems, the error function E(w) is nasty.
• Nasty = non-convex.
• Non-convex = has local minima.

Descent Methods
• The typical strategy for optimization problems of this sort is a descent method:

    w^{(\tau + 1)} = w^{(\tau)} + \Delta w^{(\tau)}

• These come in many flavours, differing in how \Delta w^{(\tau)} is chosen:
  • Gradient descent, using \nabla E(w^{(\tau)})
  • Stochastic gradient descent, using \nabla E_n(w^{(\tau)}) for a single data point
  • Newton-Raphson (second order), using the Hessian \nabla^2 E
• All of these can be used here; stochastic gradient descent is particularly effective.
  • It exploits redundancy in the training data and can escape local minima.

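A minimal sketch contrasting batch and stochastic gradient descent; the toy per-example error E_n(w) and the learning rate eta are stand-ins for the network error and for the gradients that backpropagation (next section) would supply:

```python
import numpy as np

# Toy per-example error E_n(w) = ||w - c_n||^2 and its gradient, standing in
# for the per-example network error and its backpropagated gradient.
centres = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([2.0, 1.0])]
grad_En = lambda w, n: 2 * (w - centres[n])
grad_E = lambda w: sum(grad_En(w, n) for n in range(len(centres)))

eta = 0.05                                   # learning rate (illustrative)
w = np.zeros(2)
for tau in range(100):                       # batch gradient descent:
    w = w - eta * grad_E(w)                  #   w(tau+1) = w(tau) - eta * grad E(w(tau))

w_sgd = np.zeros(2)
for tau in range(100):                       # stochastic gradient descent:
    n = tau % len(centres)                   #   cycle through (or sample) data points
    w_sgd = w_sgd - eta * grad_En(w_sgd, n)  #   step on grad E_n for one example

print(w, w_sgd)                              # both approach the mean of the centres
```

On this convex toy error both variants land in roughly the same place; on a real (non-convex) network error, the noise in the stochastic updates is part of what helps escape poor local minima.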
