CSC2515 Lecture 9: Convolutional Networks


  1. CSC2515 Lecture 9: Convolutional Networks. Marzyeh Ghassemi. Material and slides developed by Roger Grosse, University of Toronto.

  2. Neural Nets for Visual Object Recognition. People are very good at recognizing shapes ◮ Intrinsically difficult, computers are bad at it. Why is it difficult?

  3. Why is it a Problem? Difficult scene conditions [From: Grauman & Leibe]

  4. Why is it a Problem? Huge within-class variations. Recognition is mainly about modeling variation. [Pic from: S. Lazebnik]

  5. Why is it a Problem? Tons of classes [Biederman]

  6. Neural Nets for Object Recognition. People are very good at recognizing objects ◮ Intrinsically difficult, computers are bad at it. Some reasons why it is difficult: ◮ Segmentation: real scenes are cluttered ◮ Invariances: we are very good at ignoring all sorts of variations that do not affect class ◮ Deformations: natural object classes allow variations (faces, letters, chairs) ◮ A huge amount of computation is required

  7. How to Deal with Large Input Spaces. How can we apply neural nets to images? Images can have millions of pixels, i.e., x is very high-dimensional. How many parameters do I have?

  8. How to Deal with Large Input Spaces. How can we apply neural nets to images? Images can have millions of pixels, i.e., x is very high-dimensional. How many parameters do I have? Prohibitive to have fully-connected layers. What can we do? We can use a locally connected layer.

  9. Locally Connected Layer. Example: 200x200 image, 40K hidden units, filter size 10x10, so 4M parameters. Note: this parameterization is good when the input image is registered (e.g., face recognition). [Ranzato]
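
A quick back-of-the-envelope check of the numbers on this slide; a minimal sketch, assuming each hidden unit looks at its own 10x10 patch with no weight sharing, and ignoring biases:

```python
# Parameter counts for the 200x200 example above (biases ignored).
image_pixels = 200 * 200            # 40,000 input pixels
hidden_units = 40_000               # 40K hidden units
patch_weights = 10 * 10             # each unit sees its own 10x10 patch

fully_connected = image_pixels * hidden_units     # every unit sees every pixel
locally_connected = hidden_units * patch_weights  # every unit sees one patch

print(f"fully connected:   {fully_connected:,}")    # 1,600,000,000
print(f"locally connected: {locally_connected:,}")  # 4,000,000 (the 4M on the slide)
```

Local connectivity alone already cuts the weight count by a factor of 400 here; weight sharing (introduced a few slides later) reduces it much further.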

  10. When Will this Work? This is good when the input is (roughly) registered.

  11. General Images. The object can be anywhere [Slide: Y. Zhu]

  12. General Images. The object can be anywhere [Slide: Y. Zhu]

  13. General Images. The object can be anywhere [Slide: Y. Zhu]

  14. The Invariance Problem. Our perceptual systems are very good at dealing with invariances ◮ translation, rotation, scaling ◮ deformation, contrast, lighting. We are so good at this that it’s hard to appreciate how difficult it is ◮ It’s one of the main difficulties in making computers perceive ◮ We still don’t have generally accepted solutions

  15. Locally Connected Layer. STATIONARITY? Statistics are similar at different locations. Example: 200x200 image, 40K hidden units, filter size 10x10, so 4M parameters. Note: this parameterization is good when the input image is registered (e.g., face recognition). [Ranzato]

  16. The replicated feature approach. Adopt the approach apparently used in monkey visual systems: use many different copies of the same feature detector. [Figure: the red connections all have the same weight.] ◮ Copies have slightly different positions. ◮ Could also replicate across scale and orientation (tricky and expensive). ◮ Replication reduces the number of free parameters to be learned. Use several different feature types, each with its own replicated pool of detectors. ◮ Allows each patch of image to be represented in several ways.

  17. Convolutional Neural Net. Idea: statistics are similar at different locations (LeCun 1998). Connect each hidden unit to a small input patch and share the weights across space. This is called a convolution layer, and the network is a convolutional network.

  18. Convolution. Convolution layers are named after the convolution operation. If a and b are two arrays, $(a \ast b)_t = \sum_{\tau} a_\tau \, b_{t-\tau}$.
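
A direct (deliberately slow) implementation of this definition, checked against NumPy; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

def conv1d(a, b):
    # (a * b)_t = sum_tau a_tau * b_{t - tau}, summed over valid tau
    out = np.zeros(len(a) + len(b) - 1)
    for t in range(len(out)):
        for tau in range(len(a)):
            if 0 <= t - tau < len(b):
                out[t] += a[tau] * b[t - tau]
    return out

a = np.array([2.0, -1.0, 1.0])
b = np.array([1.0, 1.0, 2.0])
print(conv1d(a, b))        # [ 2.  1.  4. -1.  2.]
print(np.convolve(a, b))   # same result
```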

  19. Convolution. Method 1: translate-and-scale

  20. Convolution. Method 2: flip-and-filter
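
Both views compute the same thing. A small sketch of the flip-and-filter view (flip one array, then slide it over the zero-padded other array and take dot products), again checked against np.convolve; the padding is just enough for the filter to hang off both ends:

```python
import numpy as np

a = np.array([2.0, -1.0, 1.0])   # filter
b = np.array([1.0, 1.0, 2.0])    # signal

padded = np.pad(b, len(a) - 1)   # zero-pad both ends
flipped = a[::-1]                # flip the filter
flip_and_filter = np.array([flipped @ padded[i:i + len(a)]
                            for i in range(len(padded) - len(a) + 1)])

print(flip_and_filter)      # [ 2.  1.  4. -1.  2.]
print(np.convolve(a, b))    # matches
```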

  21. Convolution. Convolution can also be viewed as matrix multiplication: $(2, -1, 1) \ast (1, 1, 2) = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & 1 & 1 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}$. Aside: this is how convolution is typically implemented. (More efficient than the fast Fourier transform (FFT) for modern conv nets on GPUs!)
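
The matrix on this slide is a Toeplitz matrix whose columns are shifted copies of one of the arrays. A small sketch that builds it and checks the product against np.convolve (variable names are mine):

```python
import numpy as np

a = np.array([2.0, -1.0, 1.0])
b = np.array([1.0, 1.0, 2.0])

# (len(a) + len(b) - 1) x len(a) matrix; column j is b shifted down by j rows.
T = np.zeros((len(a) + len(b) - 1, len(a)))
for j in range(len(a)):
    T[j:j + len(b), j] = b

print(T @ a)               # [ 2.  1.  4. -1.  2.]
print(np.convolve(a, b))   # same result
```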

  22. Convolution. Some properties of convolution: Commutativity: $a \ast b = b \ast a$. Linearity: $a \ast (\lambda_1 b + \lambda_2 c) = \lambda_1 \, a \ast b + \lambda_2 \, a \ast c$.
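
Both properties are easy to spot-check numerically; a sketch (the random test arrays are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, c = rng.standard_normal(4), rng.standard_normal(6), rng.standard_normal(6)
lam1, lam2 = 0.7, -1.3

# Commutativity: a * b == b * a
print(np.allclose(np.convolve(a, b), np.convolve(b, a)))   # True

# Linearity: a * (lam1 b + lam2 c) == lam1 (a * b) + lam2 (a * c)
print(np.allclose(np.convolve(a, lam1 * b + lam2 * c),
                  lam1 * np.convolve(a, b) + lam2 * np.convolve(a, c)))  # True
```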

  23. 2-D Convolution. 2-D convolution is defined analogously to 1-D convolution. If A and B are two 2-D arrays, then: $(A \ast B)_{ij} = \sum_s \sum_t A_{st} \, B_{i-s,\, j-t}$.
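
The same brute-force translation of the definition works in 2-D; a sketch checked against SciPy's convolve2d (assumes SciPy is installed):

```python
import numpy as np
from scipy.signal import convolve2d

def conv2d(A, B):
    # (A * B)_{ij} = sum_s sum_t A_{st} * B_{i-s, j-t}
    out = np.zeros((A.shape[0] + B.shape[0] - 1, A.shape[1] + B.shape[1] - 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for s in range(A.shape[0]):
                for t in range(A.shape[1]):
                    if 0 <= i - s < B.shape[0] and 0 <= j - t < B.shape[1]:
                        out[i, j] += A[s, t] * B[i - s, j - t]
    return out

A = np.arange(9.0).reshape(3, 3)
B = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.allclose(conv2d(A, B), convolve2d(A, B)))   # True
```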

  24. 2-D Convolution. Method 1: Translate-and-Scale

  25. 2-D Convolution. Method 2: Flip-and-Filter

  26. 2-D Convolution. The thing we convolve by is called a kernel, or filter. What does this filter do? [image] $\ast \begin{bmatrix} 0 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 0 \end{bmatrix}$

  27. 2-D Convolution. The thing we convolve by is called a kernel, or filter. What does this filter do? [image] $\ast \begin{bmatrix} 0 & 1 & 0 \\ 1 & 4 & 1 \\ 0 & 1 & 0 \end{bmatrix}$

  28. 2-D Convolution. What does this filter do? [image] $\ast \begin{bmatrix} 0 & -1 & 0 \\ -1 & 8 & -1 \\ 0 & -1 & 0 \end{bmatrix}$

  29. 2-D Convolution. What does this filter do? [image] $\ast \begin{bmatrix} 0 & -1 & 0 \\ -1 & 8 & -1 \\ 0 & -1 & 0 \end{bmatrix}$

  30. 2-D Convolution. What does this filter do? [image] $\ast \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$

  31. 2-D Convolution. What does this filter do? [image] $\ast \begin{bmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{bmatrix}$

  32. 2-D Convolution. What does this filter do? [image] $\ast \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}$

  33. 2-D Convolution. What does this filter do? [image] $\ast \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}$
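
To see what these kernels do, one can convolve them with any grayscale image; a sketch using SciPy, with the usual interpretations of each kernel given in the names (blur, sharpen, Laplacian-style edge detection, Sobel-style vertical edges). The random array stands in for a real image, and the blur normalization is my addition:

```python
import numpy as np
from scipy.signal import convolve2d

# The four example kernels from the slides above.
# The blur kernel is divided by 8 here so its weights sum to 1 (not on the slide).
kernels = {
    "blur":           np.array([[0,  1,  0], [ 1, 4,  1], [0,  1,  0]]) / 8.0,
    "sharpen":        np.array([[0, -1,  0], [-1, 8, -1], [0, -1,  0]]),
    "edges":          np.array([[0, -1,  0], [-1, 4, -1], [0, -1,  0]]),
    "vertical edges": np.array([[1,  0, -1], [ 2, 0, -2], [1,  0, -1]]),
}

img = np.random.default_rng(0).random((64, 64))   # stand-in for a real image
for name, k in kernels.items():
    out = convolve2d(img, k, mode="same", boundary="symm")
    print(f"{name:15s} output shape {out.shape}, range [{out.min():.2f}, {out.max():.2f}]")
```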

  34. Convolutional Layer. Figure: left: a CNN; right: each neuron computes a linear function followed by an activation function. Hyperparameters of a convolutional layer: the number of filters (controls the depth of the output volume); the stride: how many units apart we apply the filter spatially (controls the spatial size of the output volume); the size w × h of the filters. [http://cs231n.github.io/convolutional-networks/]
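
The spatial size of the output volume follows from these hyperparameters; a small helper using the standard formula from the cs231n notes cited above (names and example numbers are mine):

```python
def conv_output_size(in_size, filter_size, stride=1, padding=0):
    # Output size along one spatial dimension of a conv (or pooling) layer.
    return (in_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(32, 5))             # 28: 32x32 input, 5x5 filters, stride 1
print(conv_output_size(28, 2, stride=2))   # 14: a typical 2x2, stride-2 pooling layer
```

The depth of the output volume is simply the number of filters.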

  35. Pooling Options. Max pooling: return the maximum of the arguments. Average pooling: return the average of the arguments. Other types of pooling exist.

  36. Pooling. Figure: left: pooling; right: a max pooling example. Hyperparameters of a pooling layer: the spatial extent F; the stride. [http://cs231n.github.io/convolutional-networks/]
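
A minimal max pooling sketch over a single 2-D feature map, with spatial extent F and stride as on this slide (NumPy assumed):

```python
import numpy as np

def max_pool2d(x, F=2, stride=2):
    H = (x.shape[0] - F) // stride + 1
    W = (x.shape[1] - F) // stride + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = x[i * stride:i * stride + F,
                          j * stride:j * stride + F].max()
    return out

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 7],
              [0, 2, 9, 8],
              [1, 1, 3, 4]], dtype=float)
print(max_pool2d(x))   # [[6. 7.]
                       #  [2. 9.]]
```

Average pooling is the same sketch with .max() replaced by .mean().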

  37. Backpropagation with Weight Constraints. The backprop procedure from last lecture can be applied directly to conv nets. This is covered in csc2516. As a user, you don’t need to worry about the details, since they’re handled by automatic differentiation packages.

  38. MNIST Dataset. MNIST dataset of handwritten digits ◮ Categories: 10 digit classes ◮ Source: scans of handwritten zip codes from envelopes ◮ Size: 60,000 training images and 10,000 test images, grayscale, of size 28 × 28 ◮ Normalization: centered within the image, scaled to a consistent size ◮ The assumption is that the digit recognizer would be part of a larger pipeline that segments and normalizes images. In 1998, Yann LeCun and colleagues built a conv net called LeNet which was able to classify digits with 98.9% test accuracy. ◮ It was good enough to be used in a system for automatically reading numbers on checks.

  39. LeNet. Here’s the LeNet architecture, which was applied to handwritten digit recognition on MNIST in 1998:
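
For intuition about the layer sequence, here is a rough LeNet-style network in PyTorch. This is a modernized sketch (ReLU units, and several 1998 details omitted), not the exact architecture in the figure:

```python
import torch
import torch.nn as nn

class LeNetStyle(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2),  # 1x28x28 -> 6x28x28
            nn.ReLU(),
            nn.AvgPool2d(2),                            # -> 6x14x14
            nn.Conv2d(6, 16, kernel_size=5),            # -> 16x10x10
            nn.ReLU(),
            nn.AvgPool2d(2),                            # -> 16x5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

x = torch.zeros(1, 1, 28, 28)    # dummy MNIST-sized input
print(LeNetStyle()(x).shape)     # torch.Size([1, 10])
```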

  40. Questions?

  41. Size of a Conv Net. Ways to measure the size of a network: ◮ Number of units. This is important because the activations need to be stored in memory during training (i.e. backprop).

  42. Size of a Conv Net. Ways to measure the size of a network: ◮ Number of units. This is important because the activations need to be stored in memory during training (i.e. backprop). ◮ Number of weights. This is important because the weights need to be stored in memory, and because the number of parameters determines the amount of overfitting.

  43. Size of a Conv Net. Ways to measure the size of a network: ◮ Number of units. This is important because the activations need to be stored in memory during training (i.e. backprop). ◮ Number of weights. This is important because the weights need to be stored in memory, and because the number of parameters determines the amount of overfitting. ◮ Number of connections. This is important because there are approximately 3 add-multiply operations per connection (1 for the forward pass, 2 for the backward pass).
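
These three quantities are easy to compute for a single convolution layer; a sketch with made-up example numbers (biases ignored), which also shows why a conv layer has far fewer weights than connections:

```python
def conv_layer_size(in_h, in_w, in_depth, num_filters, f, stride=1, padding=0):
    out_h = (in_h - f + 2 * padding) // stride + 1
    out_w = (in_w - f + 2 * padding) // stride + 1
    units = out_h * out_w * num_filters            # activations to store
    weights = num_filters * f * f * in_depth       # shared across spatial positions
    connections = units * f * f * in_depth         # each unit touches an f x f x depth patch
    return units, weights, connections

# e.g. a 32x32x3 input with 16 5x5 filters, stride 1, no padding:
print(conv_layer_size(32, 32, 3, 16, 5))   # (12544, 1200, 940800)
```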
