Lecture 18: Concluding Convolutional Neural Networks, Graphical - PowerPoint PPT Presentation

Lecture 18: Concluding Convolutional Neural Networks, Graphical Models as Foundation for Recurrent Neural Networks and Bayesian Networks
Reference: We will be referring to sections etc. of 'Deep Learning' by Yoshua Bengio, Ian J. Goodfellow and Aaron Courville
https://youtu.be/4PtaZVUbilI?list=PLyo3HAXSZD3zfv9O-y9DJhvrWQPscqATa&t=1187

The Lego Blocks in Modern Deep Learning
1. Depth/Feature Map [Eg: Red, Green and Blue feature maps]
2. Patches/Filters (provide for spatial interpolations)
3. Non-linear Activation unit (provided for detection/classification)
4. Strides (enable downsampling)
5. Padding (shrinking across layers)
6. Pooling (non-linear downsampling)
7. Inception [Optional: Extra slides]
8. RNN, Attention and LSTM (Backpropagation through time and Memory cell) [Optional: Extra slides]
9. Embeddings (Unsupervised learning) [Optional: Extra slides]

Convolution: Sparse Interactions through Filters K(.) (for Single Feature Map)
[Figure: input/(l−1)th layer nodes $x_1, \dots, x_5$ and lth layer nodes $h_1, \dots, h_5$; each $h_i$ is connected only to $x_{i-1}$, $x_i$ and $x_{i+1}$, through location-specific weights $w^l_{mi}$ ($w^l_{11}, w^l_{12}, w^l_{21}, \dots, w^l_{55}$).]
$h_i = \sum_m x_m w_{mi} K(i - m)$
In the figure, $K(i - m) = 1$ iff $|m - i| \le 1$.
For 2-D inputs (such as images): $h_{ij} = \sum_m \sum_n x_{mn} w_{ij,mn} K(i - m, j - n)$
Intuition: Neighboring signals $x_m$ (or pixels $x_{mn}$) are more relevant than ones further away; this reduces prediction time.
Can be viewed as multiplication with a Toeplitz matrix $K$ (which has each row as the row above shifted by one element).
Further, $K$ is sparse wrt parameter $\theta$ (eg: $K(i - m) = 1$ iff $|m - i| \le \theta$).
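The banded mask K(i − m) described above can be sketched in NumPy. This is only an illustrative sketch: the function names `banded_mask` and `sparse_layer`, the all-ones weight matrix and the input values are assumptions made for the example, not part of the lecture.

```python
import numpy as np

def banded_mask(n, theta=1):
    """Return K with K[i, m] = 1 iff |m - i| <= theta, else 0."""
    idx = np.arange(n)
    return (np.abs(idx[:, None] - idx[None, :]) <= theta).astype(float)

def sparse_layer(x, W, theta=1):
    """Compute h_i = sum_m x_m * W[m, i] * K(i - m)."""
    K = banded_mask(len(x), theta)
    # K is symmetric, so masking W elementwise zeroes every weight
    # that links nodes more than theta positions apart.
    return (W * K).T @ x

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
W = np.ones((5, 5))            # hypothetical weights, all set to 1
h = sparse_layer(x, W)         # each h_i sums x over its neighbourhood
nonzero_weights = int(banded_mask(5).sum())
print(h, nonzero_weights)
```

With θ = 1 and five nodes, only 13 of the 25 possible weights survive the mask, which is where the saving in prediction time comes from.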

Convolution: Shared parameters and Patches (for Single Feature Map)
[Figure: input/(l−1)th layer nodes $x_1, \dots, x_5$ and lth layer nodes $h_1, \dots, h_5$; every $h_i$ uses the same three weights: $w^l_0$ (from $x_i$), and $w^l_1$ and $w^l_{-1}$ (from its two neighbours).]
$h_i = \sum_m x_m w_{i-m} K(i - m)$
In the figure, $K(i - m) = 1$ iff $|m - i| \le 1$.
For 2-D inputs (such as images): $h_{ij} = \sum_m \sum_n x_{mn} w_{i-m,j-n} K(i - m, j - n)$
Intuition: Neighboring signals $x_m$ (or pixels $x_{mn}$) affect the output in a similar way irrespective of location (i.e., value of $m$ or $n$).
More Intuition: Corresponds to moving patches around the image.
Further reduces storage requirement; does not affect prediction time.
Further, $K$ is often sparse (eg: $K(i - m) = 1$ iff $|m - i| \le \theta$).
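As a rough sketch of the shared-weight formula above, the same three weights can be applied at every location and compared against NumPy's built-in `np.convolve`. The helper name `shared_conv1d` and the weight values are hypothetical; nodes outside the input are treated as zero.

```python
import numpy as np

def shared_conv1d(x, w):
    """h_i = sum_m x_m * w_{i-m}, where w maps offset d = i - m to a weight."""
    n = len(x)
    h = np.zeros(n)
    for i in range(n):
        for d, w_d in w.items():
            m = i - d
            if 0 <= m < n:         # inputs outside the range count as zero
                h[i] += x[m] * w_d
    return h

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = {-1: 1.0, 0: 0.0, 1: -1.0}     # hypothetical shared weights w_{-1}, w_0, w_1
h = shared_conv1d(x, w)

# np.convolve with mode="same" computes the same sum when the
# weights are listed in order of increasing offset.
h_ref = np.convolve(x, [w[-1], w[0], w[1]], mode="same")
print(h, h_ref)
```

Only three numbers are stored regardless of the input length, but every output still needs three multiplications, which matches the remark that sharing reduces storage without affecting prediction time.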

Convolution: Strides and Padding (for Single Feature Map)
[Figure: input/(l−1)th layer nodes $x_1, \dots, x_5$ connected to lth layer nodes $h_1, \dots, h_5$ through the shared weights $w^l_{-1}$, $w^l_0$, $w^l_1$.]
Consider only $h_i$'s where $i$ is a multiple of $s$.
Intuition: Stride of $s$ corresponds to moving the patch by $s$ steps at a time.
More Intuition: Stride of $s$ corresponds to downsampling by $s$.
What to do at the corners? Ans: Pad with 0's at the edges to create an output of the same size as the input (same padding), or have no padding at all and let the next layer have fewer nodes (valid padding).
Reduces storage requirement as well as prediction time.
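Stride and zero padding can be illustrated with a small 1-D sketch. The function `conv1d` below is a hypothetical helper written for this example; it slides the filter without flipping it (cross-correlation), which is an assumption about the convention and makes no difference for the symmetric filter used here.

```python
import numpy as np

def conv1d(x, w, stride=1, pad=0):
    """Slide filter w over zero-padded x, moving `stride` steps at a time."""
    x = np.pad(np.asarray(x, dtype=float), pad)   # zeros on both edges
    p = len(w)
    n_out = (len(x) - p) // stride + 1
    return np.array([np.dot(x[k * stride:k * stride + p], w)
                     for k in range(n_out)])

x = [1, 2, 3, 4, 5]
w = np.array([1.0, 1.0, 1.0])                 # hypothetical filter

valid = conv1d(x, w)                          # no padding: fewer outputs
same = conv1d(x, w, pad=1)                    # padded: same size as input
strided = conv1d(x, w, stride=2, pad=1)       # stride 2: downsampled by 2
print(valid, same, strided)
```

The strided result keeps every second entry of the same-padded result, which is the downsampling view of stride.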

Examples of Convolutional Filters: Guess what each does
Sobel Vertical edge detector:
+1  0 -1
+2  0 -2
+1  0 -1
Image blurring filter:
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
Sobel Horizontal edge detector:
+1 +2 +1
 0  0  0
-1 -2 -1
Image sharpening filter:
 0 -1  0
-1  3 -1
 0 -1  0
Illustration at https://www.saama.com/blog/different-kinds-convolutional-filters/
In CNNs, these filters [5] (i.e. weights $w_{i-m,j-n}$) are generally learnt from the data.
Filter size ⇒ Strong prior, Filter value ⇒ Posterior
[5] Also referred to as kernels, but not to be confused with the positive semi-definite kernel.
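To see what the hand-designed filters do, the sketch below applies the Sobel vertical edge detector and the blurring filter to a tiny synthetic image. The image, the helper `filter2d_valid` and the choice to slide the filter without flipping it (cross-correlation, no padding) are assumptions made for illustration.

```python
import numpy as np

SOBEL_VERTICAL = np.array([[1, 0, -1],
                           [2, 0, -2],
                           [1, 0, -1]], dtype=float)
BLUR = np.full((3, 3), 1.0 / 9.0)

def filter2d_valid(img, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    P, Q = kernel.shape
    M, N = img.shape
    out = np.zeros((M - P + 1, N - Q + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + P, j:j + Q] * kernel)
    return out

# Hypothetical 5 x 6 image: dark left half, bright right half.
img = np.zeros((5, 6))
img[:, 3:] = 1.0

edges = filter2d_valid(img, SOBEL_VERTICAL)   # non-zero only near the edge
blurred = filter2d_valid(img, BLUR)           # the step becomes a ramp
print(edges)
print(blurred)
```

The Sobel response is zero in the flat regions and non-zero only in the columns that straddle the brightness step, while the blurring filter spreads the step over several columns.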

The Convolutional Filter

Question: MLP Vs CNN
Convolution leverages three important ideas that can help improve a machine learning system: (a) sparse interactions, (b) parameter sharing and (c) equivariant representations: $f(g(x)) = g(f(x))$ when $f$ is convolution and $g$ is a shift function. We just saw these in action:
Input Image Size: 200 × 200 × 3
MLP: Hidden layer has 40k neurons, resulting in 4.8 billion parameters.
CNN: Say, the hidden layer has 20 feature maps, each of size 5 × 5 × 3, with stride = 1 and zero padding of 4 on each side, i.e., maximum overlapping of convolution windows.
A feature map corresponds to one set of weights $w^l_{ij}$. F feature maps ⇒ F times the number of weight parameters.
Question: How many parameters? Answer:
Question: How many neurons (location specific)? Answer:
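The equivariance property in (c) can be checked numerically. The sketch below assumes a circular (wrap-around) convolution and a circular shift so that the identity holds exactly at the boundaries; the names `circ_conv` and `shift` and the numbers are made up for the example.

```python
import numpy as np

def circ_conv(x, w):
    """Circular 1-D convolution: h_i = sum_d w[d] * x[(i - d) mod n]."""
    n = len(x)
    return np.array([sum(w[d] * x[(i - d) % n] for d in range(len(w)))
                     for i in range(n)])

def shift(x, k=1):
    """Circular shift by k positions (the function g)."""
    return np.roll(x, k)

x = np.array([1.0, 4.0, 2.0, 7.0, 3.0])   # hypothetical signal
w = np.array([0.5, -1.0, 2.0])            # hypothetical filter

conv_of_shift = circ_conv(shift(x), w)    # f(g(x))
shift_of_conv = shift(circ_conv(x, w))    # g(f(x))
print(np.allclose(conv_of_shift, shift_of_conv))
```

With zero padding instead of wrap-around, the two sides agree everywhere except near the borders.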

Answer: MLP Vs CNN
MLP: Hidden layer has 40k neurons, so it has 200 × 200 × 3 × 40,000 = 4.8 billion parameters.
CNN: Hidden layer has 20 feature maps, each of size 5 × 5 × 3, with stride = 1 and zero padding of 4 on each side, i.e., maximum overlapping of convolution windows.
Question: How many parameters? Answer: Just 1500 (20 × 5 × 5 × 3).
Question: How many neurons (location specific)? Let M × N × 3 be the dimension of the image and P × Q × 3 be the dimension of the filter for convolution. Let D be the number of zero paddings on each side and s be the stride length.
Answer: Output size = $\left(\left\lfloor \frac{M - P + 2D}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{N - Q + 2D}{s} \right\rfloor + 1\right)$.
In the current case, $D = P - 1 = Q - 1$ ⇒ Output size = $\left(\left\lfloor \frac{M + P - 2}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{N + Q - 2}{s} \right\rfloor + 1\right)$, which for $s = 1$ is $(M + P - 1) \times (N + Q - 1)$.
$20 \times (200 + 5 - 1) \times (200 + 5 - 1) = 832320$ (around 830 thousand, which can be reduced with max-pooling).
If $D = (P - 1)/2$ and $s = 1$,
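The counts above can be reproduced with a few lines of arithmetic. The function names are hypothetical, and bias terms are ignored, as they appear to be in the slide's figures.

```python
def mlp_params(m, n, channels, hidden):
    """Weights of a fully connected layer: every input feeds every neuron."""
    return m * n * channels * hidden

def cnn_params(p, q, channels, feature_maps):
    """Weights of a convolutional layer: one P x Q x channels filter per map."""
    return p * q * channels * feature_maps

def conv_output_size(m, n, p, q, d, s):
    """Output height and width for padding d on each side and stride s."""
    return ((m - p + 2 * d) // s + 1, (n - q + 2 * d) // s + 1)

M, N, C = 200, 200, 3          # input image
P, Q, F = 5, 5, 20             # filter size and number of feature maps
D, S = 4, 1                    # zero padding on each side, stride

n_mlp = mlp_params(M, N, C, 40_000)
n_cnn = cnn_params(P, Q, C, F)
height, width = conv_output_size(M, N, P, Q, D, S)
n_neurons = F * height * width
print(n_mlp, n_cnn, (height, width), n_neurons)
```

The convolutional layer has over three million times fewer weights than the MLP layer (1500 against 4.8 billion), even though it has about 830 thousand location-specific neurons.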
