Lecture 11: Multi-layer Perceptron, Forward Pass, Backpropagation

Lecture 11: − Multi-layer Perceptron − Forward Pass − Backpropagation Aykut Erdem November 2017 Hacettepe University

Administrative • Assignment 3 is out! − It is due November 24, 2017 − You will implement backpropagation to train multi-layer neural networks − Dataset: Fashion-MNIST

A reminder about course projects • From now on, you are required to write regular (weekly) blog posts about your progress on the course projects! • We will use medium.com

Last time… Linear classification slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

Interactive web demo time… http://vision.stanford.edu/teaching/cs231n/linear-classify-demo/ slide by Fei-Fei Li & Andrej Karpathy & Justin Johnson

Last time… Perceptron (figure: inputs $x_1, x_2, x_3, \ldots, x_n$ with synaptic weights $w_1, \ldots, w_n$ feeding a single output) $f(x) = \sum_i w_i x_i = \langle w, x \rangle$ slide by Alex Smola

This Week • Multi-layer perceptron • Forward Pass • Backward Pass

Introduction

A brief history of computers
        1970s   1980s   1990s   2000s   2010s
Data    10^2    10^3    10^5    10^8    10^11
RAM     ?       1MB     100MB   10GB    1TB
CPU     ?       10MF    1GF     100GF   1PF (GPU)
Methods along the timeline, in order: deep nets, kernel methods, deep nets
• Data grows at higher exponent • Moore’s law (silicon) vs. Kryder’s law (disks) • Early algorithms data bound, now CPU/RAM bound slide by Alex Smola

Not linearly separable data • Some datasets are not linearly separable! - e.g. XOR problem • Nonlinear separation is trivial slide by Alex Smola
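To make the XOR claim concrete, here is a small sketch that searches a grid of candidate weights for a linear threshold unit that classifies all four XOR points. The grid and the helper names are assumptions chosen for illustration; an empty search result is a demonstration, not a proof.

```python
from itertools import product

# The four XOR points and their labels.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
# For contrast, OR is linearly separable.
OR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def predict(w0, w1, w2, x):
    """Linear threshold unit: 1 if w0 + w1*x1 + w2*x2 >= 0, else 0."""
    return 1 if w0 + w1 * x[0] + w2 * x[1] >= 0 else 0

def separates(w0, w1, w2, data):
    """True if the unit classifies every point in data correctly."""
    return all(predict(w0, w1, w2, x) == y for x, y in data)

# Illustrative grid of candidate weights in [-2, 2] with step 0.25.
grid = [i / 4 for i in range(-8, 9)]
xor_solutions = [w for w in product(grid, repeat=3) if separates(*w, XOR)]
or_solutions = [w for w in product(grid, repeat=3) if separates(*w, OR)]

print(len(xor_solutions), len(or_solutions) > 0)
```

The search finds weight settings for OR but none for XOR, which matches the slide: no single line splits the XOR points.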

Addressing non-linearly separable data • Two options: - Option 1: Non-linear features - Option 2: Non-linear classifiers slide by Dhruv Batra

Option 1: Non-linear features • Choose non-linear features, e.g., - Typical linear features: $w_0 + \sum_i w_i x_i$ - Example of non-linear features: • Degree 2 polynomials, $w_0 + \sum_i w_i x_i + \sum_{ij} w_{ij} x_i x_j$ • Classifier $h_w(x)$ still linear in parameters $w$ - As easy to learn - Data is linearly separable in higher dimensional spaces - Express via kernels slide by Dhruv Batra
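As a sketch of Option 1, the snippet below adds a single degree-2 feature, the product $x_1 x_2$, and shows that XOR becomes separable by a classifier that is still linear in $w$. The weights are hand-picked for illustration rather than learned, and the squared terms of the full degree-2 expansion are left out for brevity.

```python
def degree2_features(x1, x2):
    """Map (x1, x2) to the feature vector (1, x1, x2, x1*x2)."""
    return (1.0, x1, x2, x1 * x2)

def h(w, x1, x2):
    """Classifier that is linear in w but non-linear in the inputs."""
    score = sum(wi * fi for wi, fi in zip(w, degree2_features(x1, x2)))
    return 1 if score >= 0 else 0

# Hand-picked weights (an assumption for illustration, not learned):
# score = -0.5 + x1 + x2 - 2*x1*x2
w = (-0.5, 1.0, 1.0, -2.0)
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_predictions = [h(w, x1, x2) for x1, x2 in points]
print(xor_predictions)
```

The predictions reproduce the XOR labels for the four points, so the data is linearly separable in the expanded feature space.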

Option 2: Non-linear classifiers • Choose a classifier $h_w(x)$ that is non-linear in parameters $w$, e.g., - Decision trees, neural networks,… • More general than linear classifiers • But, can often be harder to learn (non-convex optimization required) • Often very useful (outperforms linear classifiers) • In a way, both ideas are related slide by Dhruv Batra

Biological Neurons • Soma (CPU): cell body, combines signals • Dendrite (input bus): combines the inputs from several other nerve cells • Synapse (interface): interface and parameter store between neurons • Axon (cable): may be up to 1m long and will transport the activation signal to neurons at different locations slide by Alex Smola

Recall: The Neuron Metaphor • Neurons - accept information from multiple inputs, - transmit information to other neurons. • Multiply inputs by weights along edges • Apply some function to the set of inputs at each node slide by Dhruv Batra

Types of Neuron (figure: inputs weighted by $\theta_1, \theta_2, \ldots, \theta_D$ and a constant input 1 weighted by $\theta_0$ feed a unit $f(\vec{x}, \theta)$) • Linear Neuron: $y = \theta_0 + \sum_i x_i \theta_i$ • Logistic Neuron: $z = \theta_0 + \sum_i x_i \theta_i$, $y = \frac{1}{1 + e^{-z}}$ • Perceptron: $z = \theta_0 + \sum_i x_i \theta_i$, $y = 1$ if $z \geq 0$, $0$ otherwise • Potentially more. Requires a convex loss function for gradient descent training. slide by Dhruv Batra
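The three neuron types share the same weighted sum and differ only in what they apply to it. A minimal sketch, with function names and example values that are assumptions chosen for illustration:

```python
import math

def pre_activation(theta0, theta, x):
    """z = theta0 + sum_i x_i * theta_i, shared by all three neuron types."""
    return theta0 + sum(xi * ti for xi, ti in zip(x, theta))

def linear_neuron(theta0, theta, x):
    """Output is the weighted sum itself."""
    return pre_activation(theta0, theta, x)

def perceptron(theta0, theta, x):
    """Output is 1 if z >= 0, otherwise 0."""
    return 1 if pre_activation(theta0, theta, x) >= 0 else 0

def logistic_neuron(theta0, theta, x):
    """Output is the sigmoid of z."""
    return 1.0 / (1.0 + math.exp(-pre_activation(theta0, theta, x)))

# Example values chosen only for illustration: z = -1 + 2*1 + (-1)*1 = 0.
theta0, theta, x = -1.0, [2.0, -1.0], [1.0, 1.0]
z = linear_neuron(theta0, theta, x)
y_step = perceptron(theta0, theta, x)
y_sig = logistic_neuron(theta0, theta, x)
print(z, y_step, y_sig)
```

With these values the weighted sum is exactly zero, so the perceptron outputs 1 (the threshold is inclusive) and the logistic neuron outputs 0.5.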

Limitation • A single “neuron” is still a linear decision boundary • What to do? • Idea: Stack a bunch of them together! slide by Dhruv Batra

Nonlinearities via Layers • Cascade neurons together • The output from one layer is the input to the next • Each layer has its own sets of weights • Kernels: $y_{1i} = k(x_i, x)$ • Deep Nets: $y_{1i}(x) = \sigma(\langle w_{1i}, x \rangle)$, $y_2(x) = \sigma(\langle w_2, y_1 \rangle)$, optimize all weights slide by Alex Smola

Nonlinearities via Layers $y_{1i}(x) = \sigma(\langle w_{1i}, x \rangle)$, $y_{2i}(x) = \sigma(\langle w_{2i}, y_1 \rangle)$, $y_3(x) = \sigma(\langle w_3, y_2 \rangle)$ slide by Alex Smola

Representational Power • A neural network with at least one hidden layer is a universal approximator: with enough hidden units it can approximate any continuous function on a compact domain to arbitrary accuracy. Proof in: Approximation by Superpositions of a Sigmoidal Function, Cybenko (1989) • The capacity of the network increases with more hidden units and more hidden layers slide by Raquel Urtasun, Richard Zemel, Sanja Fidler

A simple example • Consider a neural network with two layers of neurons. - neurons in the top layer represent known shapes (the digits 0 1 2 3 4 5 6 7 8 9). - neurons in the bottom layer represent pixel intensities. • A pixel gets to vote if it has ink on it. - Each inked pixel can vote for several different shapes. • The shape that gets the most votes wins. slide by Geoffrey Hinton

How to display the weights (figure: output units 1 2 3 4 5 6 7 8 9 0 above the input image) Give each output unit its own “map” of the input image and display the weight coming from each pixel in the location of that pixel in the map. Use a black or white blob with the area representing the magnitude of the weight and the color representing the sign. slide by Geoffrey Hinton

How to learn the weights (figure: output units 1 2 3 4 5 6 7 8 9 0 above the image) Show the network an image and increment the weights from active pixels to the correct class. Then decrement the weights from active pixels to whatever class the network guesses. slide by Geoffrey Hinton
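The update rule on this slide can be sketched as follows. The toy sizes (2 classes, 4 binary pixels), the zero initial weights, the step size of 1, and the tie-breaking toward the lowest class index are all assumptions made for illustration.

```python
def guess(weights, pixels):
    """Return the class whose unit gets the most votes from inked pixels."""
    votes = [sum(w for w, p in zip(row, pixels) if p) for row in weights]
    return votes.index(max(votes))

def update(weights, pixels, correct):
    """Apply one step of the rule in place and return the network's guess."""
    guessed = guess(weights, pixels)
    for i, p in enumerate(pixels):
        if p:  # only active (inked) pixels change weights
            weights[correct][i] += 1  # increment toward the correct class
            weights[guessed][i] -= 1  # decrement the guessed class
    return guessed

# Toy setup (hypothetical): weights[k][i] links pixel i to class k.
weights = [[0, 0, 0, 0], [0, 0, 0, 0]]
image, label = [1, 0, 1, 0], 1
first_guess = update(weights, image, label)
second_guess = guess(weights, image)
print(first_guess, second_guess, weights)
```

When the guess is already correct, the increment and decrement hit the same weights and cancel, so the rule only changes the network on mistakes.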

(figure: output units 1 2 3 4 5 6 7 8 9 0 above the image) slide by Geoffrey Hinton

The learned weights (figure: output units 1 2 3 4 5 6 7 8 9 0 above the image) The details of the learning algorithm will be explained later. slide by Geoffrey Hinton

Why insufficient • A two layer network with a single winner in the top layer is equivalent to having a rigid template for each shape. - The winner is the template that has the biggest overlap with the ink. • The ways in which hand-written digits vary are much too complicated to be captured by simple template matches of whole shapes. - To capture all the allowable variations of a digit we need to learn the features that it is composed of. slide by Geoffrey Hinton

Multilayer Perceptron • Layer Representation: $y_i = W_i x_i$, $x_{i+1} = \sigma(y_i)$ • (typically) iterate between linear mapping $Wx$ and nonlinear function • Loss function $l(y, y_i)$ to measure quality of estimate so far (figure: stack of layers, bottom to top: x1, W1, x2, W2, x3, W3, x4, W4, y) slide by Alex Smola

Forward Pass

Forward Pass: What does the Network Compute? • Output of the network can be written as: $h_j(x) = f(v_{j0} + \sum_{i=1}^{D} x_i v_{ji})$, $o_k(x) = g(w_{k0} + \sum_{j=1}^{J} h_j(x) w_{kj})$ (j indexing hidden units, k indexing the output units, D number of inputs) • Activation functions $f$, $g$: sigmoid/logistic, tanh, or rectified linear (ReLU): $\sigma(z) = \frac{1}{1 + \exp(-z)}$, $\tanh(z) = \frac{\exp(z) - \exp(-z)}{\exp(z) + \exp(-z)}$, $\mathrm{ReLU}(z) = \max(0, z)$ slide by Raquel Urtasun, Richard Zemel, Sanja Fidler
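The two formulas translate almost term by term into code. The sketch below stores each unit's bias as the first entry of its weight list, a layout assumed here for convenience; the weights and inputs are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def forward(x, V, W, f=sigmoid, g=sigmoid):
    """One-hidden-layer forward pass.

    V[j] = [v_j0, v_j1, ..., v_jD] holds the bias and weights of hidden unit j.
    W[k] = [w_k0, w_k1, ..., w_kJ] holds the bias and weights of output unit k.
    """
    # h_j(x) = f(v_j0 + sum_i x_i * v_ji)
    h = [f(v[0] + sum(xi * vji for xi, vji in zip(x, v[1:]))) for v in V]
    # o_k(x) = g(w_k0 + sum_j h_j(x) * w_kj)
    o = [g(w[0] + sum(hj * wkj for hj, wkj in zip(h, w[1:]))) for w in W]
    return h, o

# Hypothetical weights: D = 2 inputs, J = 2 hidden units, 1 output unit.
V = [[0.0, 1.0, -1.0], [0.0, -1.0, 1.0]]
W = [[0.0, 1.0, 1.0]]
h, o = forward([1.0, 1.0], V, W, f=relu, g=sigmoid)
print(h, o)
```

Swapping `f` and `g` for `math.tanh`, `relu`, or `sigmoid` changes the activation without touching the rest of the computation.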

Forward Pass in Python • Example code for a forward pass for a 3-layer network in Python (code listing shown on the slide) [http://cs231n.github.io/neural-networks-1/] • Can be implemented efficiently using matrix operations • In that example: $W_1$ is a matrix of size 4 × 3, $W_2$ is 4 × 4. What about biases and $W_3$? slide by Raquel Urtasun, Richard Zemel, Sanja Fidler
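The slide's code listing did not survive extraction. The following is a minimal sketch of such a forward pass, written in the spirit of the cs231n notes linked above rather than copied from the slide. It assumes NumPy, random weights, a sigmoid activation, and a single output unit, so under these assumptions the biases are 4 × 1, 4 × 1, and 1 × 1, and $W_3$ is 1 × 4.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

# Assumed shapes: 3 inputs, two hidden layers of 4 units, 1 output unit.
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((4, 4)), rng.standard_normal((4, 1))
W3, b3 = rng.standard_normal((1, 4)), rng.standard_normal((1, 1))

x = rng.standard_normal((3, 1))   # random input column vector (3 x 1)
h1 = sigmoid(W1 @ x + b1)         # first hidden layer activations (4 x 1)
h2 = sigmoid(W2 @ h1 + b2)        # second hidden layer activations (4 x 1)
out = W3 @ h2 + b3                # output unit, no activation (1 x 1)

print(h1.shape, h2.shape, out.shape)
```

Each layer is one matrix-vector product plus a bias, which is what makes the matrix formulation efficient compared with looping over individual units.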

Special Case • What is a single layer (no hiddens) network with a sigmoid act. function? • Network: $o_k(x) = \frac{1}{1 + \exp(-z_k)}$, $z_k = w_{k0} + \sum_{j=1}^{J} x_j w_{kj}$ • Logistic regression! slide by Raquel Urtasun, Richard Zemel, Sanja Fidler
