Deep Neural Networks II
Sen Wang
UDRC Co-I – WP3.1 and WP3.2
Assistant Professor in Robotics and Autonomous Systems
Institute of Signals, Sensors and Systems, Heriot-Watt University
UDRC-EURASIP Summer School, 26th June 2019, Edinburgh
Slides adapted from Andrej Karpathy and Kaiming He
Outline
Learning features for machines to solve problems
• Convolutional Neural Networks (CNNs)
• Deep Learning Architectures (focus on CNNs) - learning features
• Some Deep Learning Applications - problems:
  o Object detection (image, radar, sonar)
  o Semantic segmentation
  o Visual odometry
  o 3D reconstruction
  o Semantic mapping
  o Robot navigation
  o Manipulation and grasping
  o …
Deep Learning
Deep Learning: a learning technique that combines layers of neural networks to automatically identify features relevant to the problem being solved.
Training (supervised learning): labelled big data → forward pass → prediction → error against the label → backward pass (weight update).
Testing: test data → trained DNN → forward pass → prediction.
Learned feature hierarchy: raw data → low-level features → middle-level features → high-level features.
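A minimal sketch of this supervised forward/backward loop in PyTorch; the model, stand-in data batch, and optimizer settings are illustrative assumptions, not from the slides:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))  # placeholder model
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training: forward pass -> prediction -> error against the label -> backward pass.
for images, labels in [(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))]:  # stand-in batch
    prediction = model(images)            # forward
    error = loss_fn(prediction, labels)   # compare prediction with the label
    optimizer.zero_grad()
    error.backward()                      # backward: propagate the error
    optimizer.step()                      # update the weights

# Testing: forward pass only, with the trained DNN.
with torch.no_grad():
    test_prediction = model(torch.randn(1, 3, 32, 32))
```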
Deep Learning in Robotics
ICRA 2018 (~2500 submissions): deep learning was the most popular keyword. IJRR 2016, IJCV 2018.
Deep Learning in Robotics
Convolutional Neural Networks (CNNs)
From MLPs to CNNs
• Feed-forward Neural Networks, or Multi-Layer Perceptrons (MLPs)
  o many multiplications
• CNNs are similar to feed-forward neural networks
  o but use convolution instead of general matrix multiplication
CNNs
• 3 main types of layers:
  o convolutional layer
  o activation layer
  o pooling layer
• these repeat many times
Typical stack (bottom to top): input layer → convolutional layer → activation layer → pooling layer → … → fully-connected layer
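A minimal sketch of such a stack in PyTorch; the channel counts and the 10-class output are illustrative assumptions, not from the slides:

```python
import torch
import torch.nn as nn

# Conv -> activation -> pooling, repeated, then a fully-connected classifier.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer
    nn.ReLU(),                                    # activation layer
    nn.MaxPool2d(2),                              # pooling layer (32x32 -> 16x16)
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 16x16 -> 8x8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # fully-connected layer
)

x = torch.randn(1, 3, 32, 32)                     # one 32x32 RGB image
print(model(x).shape)                             # torch.Size([1, 10])
```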
CNNs: Convolution Layer
32x32x3 image (width 32, height 32, depth 3) and a 5x5x3 filter.
Convolve the filter with the image, i.e. "slide over the image spatially, computing dot products".
(Slides courtesy of Andrej Karpathy)
CNNs: Convolution Layer
Filters always extend the full depth of the input volume: the 32x32x3 image is convolved with a 5x5x3 filter, i.e. the filter is slid over the image spatially, computing dot products.
CNNs: Convolution Layer
At each spatial location, the 5x5x3 filter produces 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. a 5*5*3 = 75-dimensional dot product + bias).
2 important ideas:
• local connectivity
• parameter sharing
CNNs: Convolution Layer
Convolving (sliding) the 5x5x3 filter over all spatial locations of the 32x32x3 image produces a 28x28x1 activation map.
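A minimal numpy sketch of this sliding dot-product (no padding, stride 1); the random image, filter, and bias value are illustrative assumptions:

```python
import numpy as np

image = np.random.rand(32, 32, 3)       # 32x32x3 input volume
filt = np.random.rand(5, 5, 3)          # 5x5x3 filter (spans the full input depth)
bias = 0.1                              # illustrative bias value

H = image.shape[0] - filt.shape[0] + 1  # 32 - 5 + 1 = 28
W = image.shape[1] - filt.shape[1] + 1  # 28
activation_map = np.zeros((H, W))

for y in range(H):
    for x in range(W):
        patch = image[y:y+5, x:x+5, :]                       # small 5x5x3 chunk of the image
        activation_map[y, x] = np.sum(patch * filt) + bias   # 75-dimensional dot product + bias

print(activation_map.shape)  # (28, 28) -> one 28x28 activation map
```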
CNNs: Convolution Layer
Consider a second (green) 5x5x3 filter: sliding it over all spatial locations produces a second 28x28x1 activation map.
CNNs: Convolution Layer
For example, if we had 6 such 5x5 filters, we would get 6 separate activation maps. We stack these up to get a "new image" of size 28x28x6.
CNNs: Convolution Layer
We processed the [32x32x3] volume into a [28x28x6] volume.
Q: how many parameters would this be if we used a fully-connected layer instead?
(Courtesy of Andrej Karpathy)
CNNs: Convolution Layer
Q: how many parameters would this be if we used a fully-connected layer instead?
A: (32*32*3)*(28*28*6) ≈ 14.5M parameters, and ~14.5M multiplies.
CNNs: Convolution Layer
Q: how many parameters does the convolution layer use instead?
CNNs: Convolution Layer
Q: how many parameters does the convolution layer use instead --- and how many multiplies?
A: (5*5*3)*6 = 450 parameters.
CNNs: Convolution Layer
A: (5*5*3)*6 = 450 parameters and (5*5*3)*(28*28*6) ≈ 350K multiplies.
2 merits:
• vastly reduces the number of parameters
• more efficient (far fewer multiplies)
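A quick check of these counts, comparing the slide's arithmetic with PyTorch's own parameter count (the layer shapes are the ones from the slides):

```python
import torch.nn as nn

# Convolution: 6 filters of size 5x5x3, no padding, on a 32x32x3 input -> 28x28x6 output.
conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
conv_weights = sum(p.numel() for p in conv.parameters() if p.dim() > 1)  # exclude the 6 biases
print(conv_weights)                      # 5*5*3*6 = 450 weight parameters
print((5 * 5 * 3) * (28 * 28 * 6))       # 352800 ~ 350K multiplies

# Fully-connected alternative: every input value connected to every output value.
fc = nn.Linear(32 * 32 * 3, 28 * 28 * 6, bias=False)
print(sum(p.numel() for p in fc.parameters()))  # 14450688 ~ 14.5M parameters (and multiplies)
```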
CNNs: Activation Layer
• 3 main types of layers:
  o convolutional layer
  o activation layer - applies an element-wise non-linearity (e.g. ReLU)
  o pooling layer
Stack (bottom to top): input layer → convolutional layer → activation layer → pooling layer → … → fully-connected layer
CNNs: Pooling Layer
• 3 main types of layers:
  o convolutional layer
  o activation layer
  o pooling layer - makes the representations smaller and more manageable
• these repeat many times
Stack (bottom to top): input layer → convolutional layer → activation layer → pooling layer → … → fully-connected layer
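A minimal sketch of max pooling, assuming a 2x2 window with stride 2 (a common choice, not stated on the slide):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 6, 28, 28)        # e.g. the 28x28x6 volume from the convolution layer
y = F.max_pool2d(x, kernel_size=2)   # take the max over each 2x2 window, stride 2
print(y.shape)                       # torch.Size([1, 6, 14, 14]) -- spatially halved, depth unchanged
```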
CNNs: A Sequence of Convolutional Layers
A ConvNet is a sequence of CONV + ReLU layers, e.g.:
32x32x3 → [CONV + ReLU, 6 filters of 5x5x3] → 28x28x6 → [CONV + ReLU, 10 filters of 5x5x6] → 24x24x10 → …
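A minimal sketch of that sequence, checking the shape progression with PyTorch (the filter counts 6 and 10 are the ones from the slide):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                 # 32x32x3 input
conv1 = nn.Conv2d(3, 6, kernel_size=5)        # 6 filters of 5x5x3
conv2 = nn.Conv2d(6, 10, kernel_size=5)       # 10 filters of 5x5x6

h1 = torch.relu(conv1(x))
h2 = torch.relu(conv2(h1))
print(h1.shape, h2.shape)                     # [1, 6, 28, 28] and [1, 10, 24, 24]
```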
Deep Learning Architectures
Hand-Crafted Features by Humans
Traditional pipeline: pervasive data (time-series data, vision, point clouds) → feature extraction (hand-crafted) → inference (activities, context, locations, scene types, semantics, objects, structure, …)
Feature Engineering and Representation
Pervasive data (time-series data, vision, point clouds) in its raw form is a bad representation: e.g. an 800x600 colour image can take 256^(3x800x600) possible values.
Deep Learning: Representation Learning
Pervasive data (time-series data, vision, point clouds) → end-to-end learning → inference (activities, context, locations, scene types, structure, semantics, …)
Deep learning automatically learns an effective feature representation to solve the problem.
LeNet - 1998
• Convolution:
  o locally-connected
  o spatial weight-sharing (weight-sharing is a key idea in DL)
• Subsampling
• Fully-connected outputs
The foundation of modern ConvNets!
"Gradient-based learning applied to document recognition", LeCun et al. 1998
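A rough LeNet-5-style sketch in PyTorch. It uses modern ReLU/max-pool stand-ins for the original tanh/subsampling, and the 32x32 grayscale-digit input is an assumption, not stated on the slide:

```python
import torch
import torch.nn as nn

lenet = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),    # convolution: locally connected, weights shared spatially
    nn.ReLU(),
    nn.MaxPool2d(2),                   # "subsampling": 28x28 -> 14x14
    nn.Conv2d(6, 16, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(2),                   # 10x10 -> 5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),        # fully-connected outputs
    nn.ReLU(),
    nn.Linear(120, 84),
    nn.ReLU(),
    nn.Linear(84, 10),                 # 10 digit classes
)

x = torch.randn(1, 1, 32, 32)          # 32x32 grayscale input
print(lenet(x).shape)                  # torch.Size([1, 10])
```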
AlexNet – 2012
8 layers: 5 convolutional (with max-pooling) + 3 fully-connected.
LeNet-style backbone, plus:
• ReLU
  o accelerates training
  o better gradient propagation (vs. tanh)
• Dropout
  o reduces overfitting
• Data augmentation
  o image transformations
  o reduces overfitting
"ImageNet Classification with Deep Convolutional Neural Networks", Krizhevsky, Sutskever, Hinton. NIPS 2012
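A minimal sketch of the three ingredients highlighted above, using PyTorch and torchvision. The feature sizes and the specific augmentation choices are illustrative assumptions, not AlexNet's exact recipe:

```python
import torch.nn as nn
from torchvision import transforms

# ReLU + Dropout inside the fully-connected classifier head.
classifier_head = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),            # non-saturating activation: better gradient propagation than tanh
    nn.Dropout(p=0.5),    # randomly zero activations during training to reduce overfitting
    nn.Linear(4096, 1000),
)

# Data augmentation: random image transformations applied to each training sample.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```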
VGG16/19 - 2014
Very deep ConvNet with a modularized design:
• 3x3 conv as the module
• stack the same module
• same computation for each module
Stage-wise training: VGG-11 => VGG-13 => VGG-16
"Very Deep Convolutional Networks for Large-Scale Image Recognition", Simonyan & Zisserman. arXiv 2014 (ICLR 2015)
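A sketch of one such VGG-style module, assuming the common pattern of two 3x3 convs followed by 2x2 max pooling (the channel counts are illustrative):

```python
import torch.nn as nn

def vgg_stage(in_ch, out_ch):
    """One VGG-style module: stacked 3x3 convs, then 2x2 max pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),   # halve the spatial size; the next stage doubles the channels
    )

# The whole network is just the same module stacked, e.g.:
backbone = nn.Sequential(vgg_stage(3, 64), vgg_stage(64, 128), vgg_stage(128, 256))
```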
GoogLeNet / Inception - 2014
• 22 layers
• multiple branches per module: e.g. 1x1, 3x3, 5x5 convs and pooling, merged by concatenation
• reduce dimensionality with 1x1 convs before the expensive 3x3/5x5 convs
Szegedy et al. "Going deeper with convolutions". arXiv 2014 (CVPR 2015)
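A minimal sketch of an Inception-style module with the four branches described above; the branch channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Four parallel branches, merged by concatenation along the channel dimension."""
    def __init__(self, in_ch):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 64, kernel_size=1)
        self.b3 = nn.Sequential(               # 1x1 reduction before the expensive 3x3 conv
            nn.Conv2d(in_ch, 96, kernel_size=1), nn.ReLU(),
            nn.Conv2d(96, 128, kernel_size=3, padding=1))
        self.b5 = nn.Sequential(               # 1x1 reduction before the expensive 5x5 conv
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, padding=2))
        self.bp = nn.Sequential(               # pooling branch, projected back with a 1x1 conv
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

y = InceptionBlock(192)(torch.randn(1, 192, 28, 28))
print(y.shape)   # torch.Size([1, 256, 28, 28])  (64 + 128 + 32 + 32 channels)
```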
Going Deeper
Simply stacking layers?
• Plain nets: stacking 3x3 conv layers
• A 56-layer net has higher training error and test error than a 20-layer net
• Yet a deeper model should not have higher training error: it could copy the shallower model and set the extra layers to identity
Going Deeper
Simply going deeper does not work for plain networks!
Problem: deeper plain nets have higher training error on various datasets.
Optimization difficulties:
  o vanishing gradients
  o solvers struggle to find the solution when going deeper
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. "Deep Residual Learning for Image Recognition". CVPR 2016.
ResNets - 2016
Plain net: each stack of layers must learn the desired mapping H(x) directly.
Residual net: each block learns a residual F(x) and outputs F(x) + x through a skip connection, so gradients can flow directly backwards through the skip connections.
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. "Deep Residual Learning for Image Recognition". CVPR 2016.
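A minimal sketch of a basic residual block, assuming the common two-3x3-conv form with batch normalization:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """y = F(x) + x : the block learns the residual F, and x passes through a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)   # skip connection: gradients flow straight back through the '+ x' term

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64)(x).shape)      # torch.Size([1, 64, 56, 56])
```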
ResNets - 2016
• Deep ResNets can be trained more easily
• Deeper ResNets have lower training error, and also lower test error
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. "Deep Residual Learning for Image Recognition". CVPR 2016.
ImageNet experiments: top-5 error (%) of successive architectures.