Neural Network Basics Part II 冯远滔
Content
• Image-to-image
  • Why fully convolutional?
  • Fully Convolutional Networks (FCN)
    • Up-sampling
    • Network architecture
• Recurrent Neural Networks
  • Sequence data and representation
  • RNN model: forward & backward
  • Different types of RNNs
  • LSTM unit
• Deep Learning Frameworks
  • Deep learning frameworks & popularity
  • Data representation
  • Typical training steps
  • Model converters
  • Standard model format
Image-to-Image
Why fully convolutional?
• Detection: image (input) → deep CNNs → class + bounding box (output)
  • One-stage: YOLO, SSD, …
  • Two-stage: Faster R-CNN, …
Why fully convolutional?
• Graphics & …: image / volume (input) → deep CNNs → image / 3D mesh / … (output)?
  • ~’15: AlexNet, VGG, … — with fully connected layers ✖
Fixed input size in NNs with FC layers
• Fully connected layers in VGG-16 (a sketch follows below):
  • image (224, 224, 3) → conv & pooling layers → feature map (7, 7, 512) → flatten → vector → fully connected layer → output
  • f(X) = XW + b, with X of shape (1, 7 × 7 × 512) and W of shape (7 × 7 × 512, 4096)
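As a concrete illustration, here is a minimal NumPy sketch of this FC layer (shapes as on the slide; the random values are placeholders). Because W is created for exactly 7 × 7 × 512 inputs, a different image size would break the multiplication:

```python
import numpy as np

rng = np.random.default_rng(0)

# Feature map produced by VGG-16's conv & pooling layers for a 224 x 224 x 3 image.
feature_map = rng.standard_normal((7, 7, 512), dtype=np.float32)

# Flatten to a row vector X of shape (1, 7*7*512).
X = feature_map.reshape(1, -1)

# FC layer parameters: W is (7*7*512, 4096), b is (4096,).
W = rng.standard_normal((7 * 7 * 512, 4096), dtype=np.float32) * 0.01
b = np.zeros(4096, dtype=np.float32)

# f(X) = XW + b.  W's first dimension is hard-wired to 7*7*512,
# so a feature map of any other size makes this product undefined.
f = X @ W + b
print(f.shape)   # (1, 4096)
```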
Fully Connected vs. Fully Convolutional
• Input size: fully connected ✖ fixed vs. fully convolutional ✔ any
• Computation: fully connected ✖ intensive vs. fully convolutional ✔ less intensive
• Spatial information: fully connected ✖ lost vs. fully convolutional ✔ preserved
• Computation in AlexNet:
  • Weights: conv layers ~10% vs. FC layers ~90%
  • Computation: conv layers ~90% vs. FC layers ~10%
• Spatial information:
  • Conv layers: volume → volume
  • FC layers: volume → vector
(An FC-to-convolution conversion is sketched below.)
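The table's points can be made concrete with the standard FC-to-convolution trick. The sketch below uses PyTorch purely for illustration (the deck itself uses Caffe for its examples); it rewrites the first VGG-16 FC layer as a 7 × 7 convolution with 4096 filters, which computes the same values on a 7 × 7 × 512 map but also accepts larger inputs:

```python
import torch
import torch.nn as nn

# FC view: flatten a 7x7x512 map and multiply by a (7*7*512, 4096) weight matrix.
fc = nn.Linear(7 * 7 * 512, 4096)

# Convolutional view: the same weights reshaped into 4096 filters of size 512x7x7.
conv = nn.Conv2d(in_channels=512, out_channels=4096, kernel_size=7)
conv.weight.data = fc.weight.data.view(4096, 512, 7, 7)
conv.bias.data = fc.bias.data

x = torch.randn(1, 512, 7, 7)                # feature map from the conv layers
out_fc = fc(x.flatten(1))                     # shape (1, 4096)
out_conv = conv(x).flatten(1)                 # shape (1, 4096), same values
print(torch.allclose(out_fc, out_conv, atol=1e-5))

# Unlike the FC layer, the conv layer also accepts larger inputs,
# producing a spatial map of scores instead of a single vector.
bigger = torch.randn(1, 512, 14, 14)
print(conv(bigger).shape)                     # (1, 4096, 8, 8)
```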
Fully Convolutional Networks
• Questions:
  1. How to do up-sampling?
  2. How to recover the original size?
J. Long, et al., Fully Convolutional Networks for Semantic Segmentation, 2014
How to do up-sampling?
• Interpolation
  • Nearest neighbor interpolation: each output pixel copies the closest of its four surrounding input pixels (i, j), (i+1, j), (i, j+1), (i+1, j+1), depending on whether the fractional offsets are < 0.5 or ≥ 0.5
  • Linear interpolation
  • Bi-linear interpolation: the output pixel at (x, y) is a distance-weighted average of its four surrounding input pixels
  • Bi-cubic interpolation
• Drawbacks of the interpolation variants (see the sketch below)
  • Manual feature engineering
  • Nothing for the network to learn
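A small NumPy sketch of the two simplest interpolation variants (function names and the 2 × 2 test image are illustrative). Note that neither has any learnable parameters, which is the drawback listed above:

```python
import numpy as np

def nearest_neighbor_upsample(img, factor):
    """Repeat each pixel `factor` times along both axes."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

def bilinear_upsample(img, factor):
    """Interpolate each output pixel from its 4 nearest input pixels."""
    h, w = img.shape
    out = np.zeros((h * factor, w * factor), dtype=float)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Map the output coordinate back into the input grid.
            y, x = i / factor, j / factor
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy, wx = y - y0, x - x0
            out[i, j] = (img[y0, x0] * (1 - wy) * (1 - wx) +
                         img[y0, x1] * (1 - wy) * wx +
                         img[y1, x0] * wy * (1 - wx) +
                         img[y1, x1] * wy * wx)
    return out

img = np.array([[1., 2.], [3., 4.]])
print(nearest_neighbor_upsample(img, 2))
print(bilinear_upsample(img, 2))
```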
How to do up-sampling?
• Padding with zeros / un-pooling: record which position in each pooling window held the maximum ("switches"), place each pooled value back at that position, and fill the remaining positions with zeros (sketched below)
Matthew D. Zeiler, et al., Visualizing and Understanding Convolutional Networks, 2013
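A minimal sketch of this kind of un-pooling, assuming the pooling layer records the position of each maximum ("switches") as in Zeiler's paper; all names and the test matrix are illustrative:

```python
import numpy as np

def max_pool_with_switches(x, size=2):
    """2x2 max-pooling that also records where each maximum came from ("switches")."""
    h, w = x.shape
    pooled = np.zeros((h // size, w // size))
    switches = np.zeros_like(pooled, dtype=int)
    for i in range(0, h, size):
        for j in range(0, w, size):
            window = x[i:i+size, j:j+size]
            pooled[i//size, j//size] = window.max()
            switches[i//size, j//size] = window.argmax()   # flat index within the window
    return pooled, switches

def unpool(pooled, switches, size=2):
    """Place each pooled value back at its recorded position; pad the rest with zeros."""
    h, w = pooled.shape
    out = np.zeros((h * size, w * size))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(switches[i, j], size)
            out[i*size + di, j*size + dj] = pooled[i, j]
    return out

x = np.array([[1., 5., 2., 0.],
              [3., 4., 1., 6.],
              [7., 0., 2., 2.],
              [1., 2., 3., 1.]])
p, s = max_pool_with_switches(x)
print(unpool(p, s))
```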
How to do up-sampling?
• Transpose convolution
Transpose Convolution
• Convolution (output-size helper sketched below):
  • Input (n × n): 4 × 4 feature map
  • Kernel (f × f, padding p, stride s): 3 × 3 kernel, padding 0, stride 1
  • Output size: floor((n + 2p − f) / s) + 1 per dimension → 2 × 2 feature map
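The output-size formula as a small helper (hypothetical function name), checked against the numbers on the slide:

```python
import math

def conv_output_size(n, f, p=0, s=1):
    """floor((n + 2p - f) / s) + 1 for one spatial dimension."""
    return math.floor((n + 2 * p - f) / s) + 1

# 4x4 input, 3x3 kernel, padding 0, stride 1 -> 2x2 output
print(conv_output_size(4, 3, p=0, s=1))  # 2
```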
Transpose Convolution
• Going backward through a convolution:
  • Convolution: Y = CX
  • Transpose convolution: X = C^T Y
Transpose Convolution
• Convolution matrix C for Y = CX
  • Kernel (3 × 3): [[1, 4, 1], [1, 4, 3], [3, 3, 1]], padding 0, stride 1
  • Each row of C holds the kernel weights laid out over the flattened 4 × 4 input (zeros elsewhere), one row per output position, giving the convolution matrix C of shape (4, 16)
Transpose Convolution
• Flatten the input matrix: (4, 4) → (16, 1)
  • Input matrix (4, 4): [[4, 5, 8, 7], [1, 8, 8, 8], [3, 6, 6, 4], [6, 5, 7, 8]]
  • Flattened input vector X: shape (16, 1)
Transpose Convolution
• Perform the 'convolution' and resize
  • CX = Y gives an output of shape (4, 1): [122, 148, 126, 134]^T
  • Resize to the (2, 2) output Y: [[122, 148], [126, 134]]
Transpose Convolution
• Perform the transpose convolution (see the sketch below)
  • Convolution: CX = Y
  • Transpose convolution: X = C^T Y — the transposed convolution matrix C^T has shape (16, 4); it maps the input Y of shape (4, 1) to an output of shape (16, 1), which is resized to the (4, 4) output X
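Putting the last few slides together, a NumPy sketch of the matrix view of convolution and transpose convolution (the helper name conv_matrix is made up; the kernel and input values are the ones used above):

```python
import numpy as np

def conv_matrix(kernel, input_size):
    """Build the matrix C such that C @ x.flatten() equals a stride-1, no-padding
    convolution (cross-correlation) of the input x with `kernel`."""
    k, n = kernel.shape[0], input_size
    m = n - k + 1                                  # output size per dimension
    C = np.zeros((m * m, n * n))
    for oi in range(m):                            # one row of C per output position
        for oj in range(m):
            for ki in range(k):
                for kj in range(k):
                    C[oi * m + oj, (oi + ki) * n + (oj + kj)] = kernel[ki, kj]
    return C

# Kernel and input from the slides above.
kernel = np.array([[1., 4., 1.], [1., 4., 3.], [3., 3., 1.]])
x = np.array([[4., 5., 8., 7.], [1., 8., 8., 8.], [3., 6., 6., 4.], [6., 5., 7., 8.]])

C = conv_matrix(kernel, 4)        # shape (4, 16)
y = C @ x.flatten()               # forward convolution
print(y.reshape(2, 2))            # [[122. 148.] [126. 134.]]

# Transpose convolution: C^T maps the 2x2 output back up to a 4x4 map.
x_up = (C.T @ y).reshape(4, 4)
print(x_up.shape)                 # (4, 4)
```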
Transpose Convolution in Caffe
• Forward: im2col
Transpose Convolution in Caffe
• Forward: col2im folds the column buffer back into a C_out × H′ × W′ feature map
Transpose Convolution in Caffe
• Backward (layer l with input X_l, convolution matrix C_l, output Y_l, im2col/col2im buffers, and loss L):
  • ∂L/∂Y_{l−1} = ∂L/∂Y_l · ∂Y_l/∂Y_{l−1}
  • ∂L/∂X_l = C_l^T · ∂L/∂Y_l
  • i.e. the gradient w.r.t. the input is computed with the transposed convolution matrix — exactly the transpose convolution X = C^T Y
Transpose Convolution in Caffe
• 'deconv_layer.cpp'
• X = C^T Y
Original size
• Repeated pooling shrinks the feature maps: H/2 × W/2 → H/4 × W/4 → H/8 × W/8 → H/16 × W/16, so up-sampling stages are needed to recover the original H × W.
Network architectures for Image-to-image
• Encoder-decoder
Edgar Simo-Serra, et al., Learning to Simplify: Fully Convolutional Networks for Rough Sketch Cleanup, 2016
Network architectures for Image-to-image
• Encoder-decoder + skip connections (a toy sketch follows below)
Olaf Ronneberger, et al., U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015
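To make the two architectures concrete, here is a toy encoder-decoder with one skip connection, sketched in PyTorch for illustration only (layer sizes and names are assumptions, not taken from the papers above); it combines pooling for down-sampling, a transpose convolution for learned up-sampling, and concatenation of encoder features into the decoder:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """A two-level encoder-decoder with one skip connection (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)                                   # H x W -> H/2 x W/2
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2) # H/2 x W/2 -> H x W
        # The decoder sees the up-sampled features concatenated with the skip connection.
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 1, 1))                 # per-pixel output

    def forward(self, x):
        s = self.enc1(x)                         # skip connection, full resolution
        h = self.enc2(self.down(s))              # bottleneck, half resolution
        h = self.up(h)                           # learned up-sampling (transpose conv)
        h = torch.cat([h, s], dim=1)             # skip connection preserves fine detail
        return self.dec(h)

net = TinyUNet()
out = net(torch.randn(1, 3, 64, 64))
print(out.shape)    # torch.Size([1, 1, 64, 64])
```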
Summary for Image-to-Image
• Why fully convolutional?
  • Analysis of fully connected layers
  • Fully convolutional vs. fully connected
• Fully Convolutional Networks (FCN)
  • Up-sampling
    • Interpolation
    • Un-pooling
    • Transpose convolution: theory & implementation in Caffe
  • Network architecture
    • Encoder-decoder
    • Encoder-decoder + skip connections
Recurrent Neural Networks
Examples of sequence data
• Speech recognition: audio clip → "The quick brown fox jumped over the lazy dog."
• Sentiment classification: "There is nothing to like in this movie." → negative
• Machine translation: "The quick brown fox jumped over the lazy dog." → "快速的棕色狐狸跳过懒狗。"
• Named entity recognition: "Yesterday, John met Merry." → the names "John" and "Merry" are tagged as entities
• …
Sequence data:
✓ Elements come from a list
✓ Elements are arranged in order
One-hot representation
• Input: "Harry Potter and Hermione Granger invented a new spell."
• Vocabulary: 10,000 words, e.g. a (1), aaron (2), …, and (367), …, harry (4075), …, potter (6830), …, zulu (10000)
• Each word is a 10,000-dimensional vector with a 1 at the word's vocabulary index and 0 everywhere else, e.g. "harry" → a 1 in position 4075 (sketched below)
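A small Python sketch of this representation; the word-to-index mapping contains only the few entries shown on the slide and is otherwise hypothetical:

```python
import numpy as np

# Illustrative word -> index mapping; the real vocabulary has 10,000 entries.
word_to_index = {"a": 1, "aaron": 2, "and": 367, "harry": 4075, "potter": 6830, "zulu": 10000}
vocab_size = 10000

def one_hot(word):
    """Return a (vocab_size,) vector with a single 1 at the word's (1-based) index."""
    v = np.zeros(vocab_size)
    v[word_to_index[word] - 1] = 1.0
    return v

sentence = "harry potter and hermione granger invented a new spell".split()
# Words outside the mapping would normally map to an <UNK> token; skipped here for brevity.
x = [one_hot(w) for w in sentence if w in word_to_index]
print(len(x), x[0].shape)   # 4 vectors of shape (10000,)
```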
Why not standard networks?
Problems:
• Inputs and outputs can have different lengths in different examples.
• Features learned at one position of the text are not shared across other positions.
Recurrent Neural Networks
• Forward propagation
  • At each time step t, the RNN cell takes the input x^<t> and the previous activation a^<t−1> (starting from a^<0>) and produces the new activation a^<t> and the prediction ŷ^<t>.
  • Example inputs: "Teddy Roosevelt was a great President." / "Teddy bears are on sale!" — the word "Teddy" is a person's name in one sentence but not the other.
Recurrent Neural Networks
• RNN cell (see the sketch below)
  • a^<t> = g1(W_aa a^<t−1> + W_ax x^<t> + b_a), with g1 typically tanh or ReLU
  • ŷ^<t> = g2(W_ya a^<t> + b_y), with g2 e.g. sigmoid
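A NumPy sketch of one forward step of this cell (dimensions and initialization are illustrative, with g1 = tanh and g2 = sigmoid):

```python
import numpy as np

def rnn_cell_forward(x_t, a_prev, W_aa, W_ax, W_ya, b_a, b_y):
    """One forward step of the RNN cell shown above:
       a<t> = tanh(W_aa a<t-1> + W_ax x<t> + b_a),  yhat<t> = sigmoid(W_ya a<t> + b_y)."""
    a_t = np.tanh(W_aa @ a_prev + W_ax @ x_t + b_a)
    y_hat_t = 1.0 / (1.0 + np.exp(-(W_ya @ a_t + b_y)))
    return a_t, y_hat_t

n_x, n_a, n_y = 10000, 64, 1           # one-hot input, hidden size, scalar output (illustrative)
rng = np.random.default_rng(0)
W_aa, W_ax = rng.normal(0, 0.01, (n_a, n_a)), rng.normal(0, 0.01, (n_a, n_x))
W_ya, b_a, b_y = rng.normal(0, 0.01, (n_y, n_a)), np.zeros(n_a), np.zeros(n_y)

a = np.zeros(n_a)                       # a<0>
x_t = np.zeros(n_x); x_t[4074] = 1.0    # one-hot input for a single time step
a, y_hat = rnn_cell_forward(x_t, a, W_aa, W_ax, W_ya, b_a, b_y)
print(a.shape, y_hat.shape)             # (64,) (1,)
```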
Recurrent Neural Networks
• Backpropagation through time
  • Loss function: L(ŷ, y) = Σ_{t=1}^{T_y} L^<t>(ŷ^<t>, y^<t>)
Different types of RNNs
• T_x = T_y (e.g. many-to-many with equal input and output lengths, as in named entity recognition)
• T_x ≠ T_y (e.g. many-to-one, one-to-many, or many-to-many with different lengths, as in machine translation)
Vanishing gradient with RNNs
• "The cat, which already ate the food, was full."
• "The cats, which already ate the food, were full."
• The verb ("was" / "were") depends on a word ("cat" / "cats") many time steps earlier; gradients shrink as they flow back through many RNN cells, so such long-range dependencies are hard to learn.
Solutions to the vanishing gradient
• GRU – Gated Recurrent Unit
• TCN – Temporal Convolutional Network
• LSTM – Long Short-Term Memory unit (sketched below)
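The LSTM unit is listed as a topic in the contents; for reference, here is a minimal NumPy sketch of the standard LSTM forward step (not taken from the slides; all names and sizes are illustrative). The additive cell-state update is what lets gradients flow across many time steps:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_forward(x_t, a_prev, c_prev, W, b):
    """Standard LSTM step: gates decide what to forget, write, and expose,
    letting the cell state c carry information across many time steps."""
    z = np.concatenate([a_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])            # forget gate
    i = sigmoid(W["i"] @ z + b["i"])            # input (update) gate
    o = sigmoid(W["o"] @ z + b["o"])            # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])      # candidate cell state
    c_t = f * c_prev + i * c_tilde              # additive update -> better gradient flow
    a_t = o * np.tanh(c_t)
    return a_t, c_t

n_x, n_a = 8, 4
rng = np.random.default_rng(1)
W = {k: rng.normal(0, 0.1, (n_a, n_a + n_x)) for k in "fioc"}
b = {k: np.zeros(n_a) for k in "fioc"}
a, c = lstm_cell_forward(rng.normal(size=n_x), np.zeros(n_a), np.zeros(n_a), W, b)
print(a.shape, c.shape)   # (4,) (4,)
```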
Summary for RNNs
• What is sequence data?
• One-hot representation for words in a vocabulary
• Why not standard networks?
• RNNs
  • RNN cell
  • Forward & backward
• Different types of RNNs
• Solutions to the vanishing gradient
Deep Learning Frameworks
Deep Learning Frameworks
• Popular frameworks (shown as logos on the slide) and their language bindings: (Python); (C++, Python, Matlab); (Python, with backends supporting other languages); (Python, C, Java, Go)
• Less frequently used frameworks: (Python, C++); (Python); (Python); (Python, C++, C#); (Python, R, Julia, Scala, Go, JavaScript and more); (Matlab)
Recommended
More recommendations