Convolutional Neural Networks Computer Vision Jia-Bin Huang, - PowerPoint PPT Presentation

Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech

Today’s class • Overview • Convolutional Neural Network (CNN) • Training CNN • Understanding and Visualizing CNN

Image Categorization: Training phase Training Training Training Labels Images Image Classifier Trained Features Training Classifier

Image Categorization: Testing phase Training Training Training Labels Images Image Classifier Trained Features Training Classifier Testing Prediction Trained Image Classifier Features Outdoor Test Image

Features are the Keys HOG [Dalal and Triggs CVPR 05] SIFT [Loewe IJCV 04] SPM [Lazebnik et al. CVPR 06] DPM [Felzenszwalb et al. PAMI 10] Color Descriptor [Van De Sande et al. PAMI 10]

Learning a Hierarchy of Feature Extractors • Each layer of hierarchy extracts features from output of previous layer • All the way from pixels  classifier • Layers have the (nearly) same structure Labels Image/video Image/Video Simple Layer 1 Layer 2 Layer 3 Pixels Classifier

Biological neuron and Perceptrons A biological neuron An artificial neuron (Perceptron) - a linear classifier

Simple, Complex and Hypercomplex cells David H. Hubel and Torsten Wiesel Suggested a hierarchy of feature detectors in the visual cortex, with higher level features responding to patterns of activation in lower level cells, and propagating activation upwards to still higher level cells. David Hubel's Eye, Brain, and Vision

Hubel/Wiesel Architecture and Multi-layer Neural Network Hubel and Weisel’s architecture Multi-layer Neural Network - A non-linear classifier

Multi-layer Neural Network • A non-linear classifier • Training: find network weights w to minimize the error between true training labels 𝑧 𝑗 and estimated labels 𝑔 𝒙 𝒚 𝒋 • Minimization can be done by gradient descent provided 𝑔 is differentiable • This training method is called back-propagation

Convolutional Neural Networks • Also known as CNN, ConvNet, DCN • CNN = a multi-layer neural network with 1. Local connectivity 2. Weight sharing

CNN: Local Connectivity Hidden layer Input layer Global connectivity Local connectivity • # input units (neurons): 7 • # hidden units: 3 • Number of parameters – Global connectivity: 3 x 7 = 21 – Local connectivity: 3 x 3 = 9

CNN: Weight Sharing Hidden layer w 1 w 3 w 1 w 3 w 5 w 7 w 9 w 2 w 1 w 3 w 2 w 4 w 6 w 2 w 3 w 8 w 1 w 2 Input layer Without weight sharing With weight sharing • # input units (neurons): 7 • # hidden units: 3 • Number of parameters – Without weight sharing: 3 x 3 = 9 – With weight sharing : 3 x 1 = 3

CNN with multiple input channels Hidden layer Input layer Channel 1 Channel 2 Single input channel Multiple input channels Filter weights Filter weights

CNN with multiple output maps Hidden layer Map 1 Map 2 Input layer Single output map Multiple output maps Filter 1 Filter 2 Filter weights Filter weights

Putting them together • Local connectivity • Weight sharing • Handling multiple input channels • Handling multiple output maps Weight sharing Local connectivity # input channels # output (activation) maps Image credit: A. Karpathy

Neocognitron [Fukushima, Biological Cybernetics 1980] Deformation-Resistant Recognition S-cells: (simple) - extract local features C-cells: (complex) - allow for positional errors

LeNet [LeCun et al. 1998] Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998] LeNet-1 from 1993

What is a Convolution? • Weighted moving sum . . . Feature Activation Map Input slide credit: S. Lazebnik

Convolutional Neural Networks Feature maps Normalization Spatial pooling Non-linearity Convolution (Learned) Input Image slide credit: S. Lazebnik

Convolutional Neural Networks Feature maps Normalization Spatial pooling Non-linearity . . Convolution . (Learned) Feature Map Input Input Image slide credit: S. Lazebnik

Convolutional Neural Networks Feature maps Normalization Rectified Linear Unit (ReLU) Spatial pooling Non-linearity Convolution (Learned) Input Image slide credit: S. Lazebnik

Convolutional Neural Networks Feature maps Normalization Max pooling Spatial pooling Non-linearity Max-pooling: a non-linear down-sampling Convolution (Learned) Provide translation invariance Input Image slide credit: S. Lazebnik

Convolutional Neural Networks Feature maps Normalization Feature Maps Spatial pooling Feature Maps After Contrast Normalization Non-linearity Convolution (Learned) Input Image slide credit: S. Lazebnik

Convolutional Neural Networks Feature maps Normalization Spatial pooling Non-linearity Convolution (Learned) Input Image slide credit: S. Lazebnik

Engineered vs. learned features Label Convolutional filters are trained in a Dense supervised manner by back-propagating classification error Dense Dense Convolution/pool Label Convolution/pool Classifier Convolution/pool Pooling Convolution/pool Feature extraction Convolution/pool Image Image

Gradient-Based Learning Applied to Document Recognition , LeCun, Bottou, Bengio and Haffner, Proc. of the IEEE, 1998 Imagenet Classification with Deep Convolutional Neural Networks , Krizhevsky, Sutskever, and Hinton, NIPS 2012 Slide Credit: L. Zitnick

Gradient-Based Learning Applied to Document Recognition , LeCun, Bottou, Bengio and Haffner, Proc. of the IEEE, 1998 Imagenet Classification with Deep Convolutional Neural * Rectified activations and dropout Networks , Krizhevsky, Sutskever, and Hinton, NIPS 2012 Slide Credit: L. Zitnick

SIFT Descriptor Lowe [IJCV 2004] Image Apply gradient Pixels filters Spatial pool (Sum) Feature Normalize to unit Vector length

SIFT Descriptor Lowe [IJCV 2004] Image Apply Pixels oriented filters Spatial pool (Sum) Feature Normalize to unit Vector length slide credit: R. Fergus

Spatial Pyramid Matching Lazebnik, Schmid, SIFT Ponce Filter with Features [CVPR 2006] Visual Words Max Multi-scale spatial pool Classifier (Sum) slide credit: R. Fergus

Deformable Part Model Deformable Part Models are Convolutional Neural Networks [Girshick et al. CVPR 15]

AlexNet • Similar framework to LeCun’98 but: • Bigger model (7 hidden layers, 650,000 units, 60,000,000 params) More data (10 6 vs. 10 3 images) • • GPU implementation (50x speedup over CPU) • Trained on two GPUs for a week A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012

Using CNN for Image Classification Fully connected layer Fc7 d = 4096 AlexNet Averaging Fixed input size: 224x224x3 “Jia - Bin” Softmax d = 4096 Layer

Progress on ImageNet 15 ImageNet Image Classification Top5 Error 10 5 2012 2013 2014 2014 2016 2015 AlexNet ZF VGG GoogLeNet ResNet GoogLeNet-v4

VGG-Net • The deeper, the better • Key design choices: – 3x3 conv. Kernels - very small – conv. stride 1 - no loss of information • Other details: – Rectification (ReLU) non-linearity – 5 max-pool layers (x2 reduction) – no normalization – 3 fully-connected (FC) layers

VGG-Net • Why 3x3 layers? – Stacked conv. layers have a large receptive field – two 3x3 layers – 5x5 receptive field – three 3x3 layers – 7x7 receptive field • More non-linearity – Less parameters to learn – ~140M per net

ResNet • Can we just increase the #layer? • How can we train very deep network? - Residual learning

DenseNet • Shorter connections (like ResNet) help • Why not just connect them all?

Training Convolutional Neural Networks • Backpropagation + stochastic gradient descent with momentum – Neural Networks: Tricks of the Trade • Dropout • Data augmentation • Batch normalization • Initialization – Transfer learning

Training CNN with gradient descent • A CNN as composition of functions 𝑔 𝒙 𝒚 = 𝑔 𝑀 (… (𝑔 2 𝑔 1 𝒚; 𝒙 1 ; 𝒙 2 … ; 𝒙 𝑀 ) • Parameters 𝒙 = (𝒙 𝟐 , 𝒙 𝟑 , … 𝒙 𝑴 ) • Empirical loss function 𝑀 𝒙 = 1 𝑜 ෍ 𝑚(𝑨 𝑗 , 𝑔 𝒙 (𝒚 𝒋 )) 𝑗 • Gradient descent 𝜖𝒈 𝒙 𝒖+𝟐 = 𝒙 𝒖 − 𝜃 𝑢 𝜖𝒙 (𝒙 𝒖 ) New weight Old weight Learning rate Gradient

An Illustrative example 𝜖𝑔 𝜖𝑦 = 𝑧, 𝜖𝑔 𝑔 𝑦, 𝑧 = 𝑦𝑧, 𝜖𝑧 = 𝑦 Example: 𝑦 = 4, 𝑧 = −3 ⇒ 𝑔 𝑦, 𝑧 = −12 Partial derivatives Gradient 𝜖𝑔 𝜖𝑔 𝛼𝑔 = [ 𝜖𝑔 𝜖𝑦 , 𝜖𝑔 𝜖𝑦 = −3, 𝜖𝑧 = 4 𝜖𝑧] Example credit: Andrej Karpathy

𝑔 𝑦, 𝑧, 𝑨 = 𝑦 + 𝑧 𝑨 = 𝑟𝑨 𝑟 = 𝑦 + 𝑧 𝑔 = 𝑟𝑨 𝜖𝑟 𝜖𝑟 𝜖𝑔 𝜖𝑔 𝜖𝑦 = 1, 𝜖𝑧 = 1 𝜖𝑟 = 𝑨, 𝜖𝑨 = 𝑟 Goal: compute the gradient 𝛼𝑔 = [ 𝜖𝑔 𝜖𝑦 , 𝜖𝑔 𝜖𝑧 , 𝜖𝑔 𝜖𝑨] Example credit: Andrej Karpathy

Convolutional Neural Networks Computer Vision Jia-Bin Huang, - PowerPoint PPT Presentation

Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Todays class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization: Training phase Training

Convolutional Neural Networks Convolutional neural networks One of the major kinds of ANNs in use

Convolutional Neural Networks ---- Off the shelf top notch performances Convolutional Neural

Convolutional Kuan-Ting Lai 2020/3/31 Neural Network Convolutional Neural Networks (CNN)

Introduction CSCE 970 CSCE 970 Lecture 4: Lecture 4: Convolutional Convolutional Neural

Sequential Data with Neural Networks Recurrent Neural Networks Sequential input / output Greg

Convolutional Neural Networks for Sentence Classification Yoon Kim New York University 1 / 34

Convolutional Neural Networks 08, 10 & 17 Nov, 2016 J. Ezequiel Soto S. Image Processing

Convolutional Neural Nets 4-25-16 Reading Quiz Convolutional neural networks are most commonly

Neural Network Part 3: Convolutional Neural Networks CS 760@UW-Madison Goals for the lecture

Learning Neural Networks Learning Neural Networks Neural Networks can represent complex Neural

Semantic Segmentation of the sekleton in bone scintigraphy images with convolutional neural

Convolutional Neural Networks in Speech Lecture 20 CS 753 Instructor: Preethi Jyothi

Convolutional Neural Networks (Part III) 08, 10 & 17 Nov, 2016 J. Ezequiel Soto S. Image

MICROBOONE Taritree Wongjirad DPF 2017 Tufts/MIT Outline Convolutional neural networks

Neural Networks + Convolutional Neural Networks Last Class Global Features The perceptron

Neural Networks and Handwriting Recognition Background Neural Networks Neural Network Steven

Convolutional Neural Networks II Milan Straka March 30, 2020 Charles University in Prague

Announcements Class is 170. Matlab Grader homework, 1 and 2 (of less than 9) homeworks Due 22

CNN Case Studies M. Soleymani Sharif University of Technology Fall 2017 Slides are based on Fei

Learning Based Vision II Computer Vision Fall 2018 Columbia University Project Project

Convolutional Neural Networks Rachel Hu and Zhi Zhang Amazon AI d2l.ai Outline GPUs

Statistical Machine Translation Clause Restructuring for

Restructuring The Care Delivery System: Will Limited Growth In Revenues Force A Change? Stuart H.

Programming for Engineers Iteration ICEN 200 Spring 2018 Prof. Dola Saha 1 Data

Convolutional Neural Networks Computer Vision Jia-Bin Huang, - PowerPoint PPT Presentation

Convolutional Neural Networks Computer Vision Jia-Bin Huang, Virginia Tech Todays class Overview Convolutional Neural Network (CNN) Training CNN Understanding and Visualizing CNN Image Categorization: Training phase Training

Convolutional Neural Networks Convolutional neural networks One of the major kinds of ANNs in use

Convolutional Neural Networks ---- Off the shelf top notch performances Convolutional Neural

Convolutional Kuan-Ting Lai 2020/3/31 Neural Network Convolutional Neural Networks (CNN)

Introduction CSCE 970 CSCE 970 Lecture 4: Lecture 4: Convolutional Convolutional Neural

Sequential Data with Neural Networks Recurrent Neural Networks Sequential input / output Greg

Convolutional Neural Networks for Sentence Classification Yoon Kim New York University 1 / 34

Convolutional Neural Networks 08, 10 &amp; 17 Nov, 2016 J. Ezequiel Soto S. Image Processing

Convolutional Neural Nets 4-25-16 Reading Quiz Convolutional neural networks are most commonly

Neural Network Part 3: Convolutional Neural Networks CS 760@UW-Madison Goals for the lecture

Learning Neural Networks Learning Neural Networks Neural Networks can represent complex Neural

Semantic Segmentation of the sekleton in bone scintigraphy images with convolutional neural

Convolutional Neural Networks in Speech Lecture 20 CS 753 Instructor: Preethi Jyothi

Convolutional Neural Networks (Part III) 08, 10 &amp; 17 Nov, 2016 J. Ezequiel Soto S. Image

MICROBOONE Taritree Wongjirad DPF 2017 Tufts/MIT Outline Convolutional neural networks

Neural Networks + Convolutional Neural Networks Last Class Global Features The perceptron

Neural Networks and Handwriting Recognition Background Neural Networks Neural Network Steven

Convolutional Neural Networks II Milan Straka March 30, 2020 Charles University in Prague

Announcements Class is 170. Matlab Grader homework, 1 and 2 (of less than 9) homeworks Due 22

CNN Case Studies M. Soleymani Sharif University of Technology Fall 2017 Slides are based on Fei

Learning Based Vision II Computer Vision Fall 2018 Columbia University Project Project

Convolutional Neural Networks Rachel Hu and Zhi Zhang Amazon AI d2l.ai Outline GPUs

Statistical Machine Translation Clause Restructuring for

Restructuring The Care Delivery System: Will Limited Growth In Revenues Force A Change? Stuart H.

Programming for Engineers Iteration ICEN 200 Spring 2018 Prof. Dola Saha 1 Data

Convolutional Neural Networks 08, 10 & 17 Nov, 2016 J. Ezequiel Soto S. Image Processing

Convolutional Neural Networks (Part III) 08, 10 & 17 Nov, 2016 J. Ezequiel Soto S. Image