Convolutional Neural Networks for Computer Vision
Caner Hazırbaş
Centrum für Informations- und Sprachverarbeitung
24 November 2015
Computer Vision Group
5 Postdocs, 24 PhD students
Caner Hazırbaş | vision.in.tum.de | Convolutional Neural Networks for Computer Vision
Research in Computer Vision
Convolutional Neural Networks for Computer Vision
What is deep learning?
• Representation learning method: learning good features automatically from raw data
• Learning representations of data with multiple levels of abstraction
[Figure: Google’s cat detection neural network]
Going deeper in the network
• Input: ‘Pixels’
• 1st and 2nd Layers: ‘Edges’
• 3rd Layer: ‘Object Parts’
• 4th Layer: ‘Objects’
[Figure: example categories: faces, cars, airplanes, motorbikes]
Deep Learning Methods
Unsupervised Methods
• Restricted Boltzmann Machines
• Deep Belief Networks
• Autoencoders: unsupervised feature extraction/learning
[Figure: autoencoder with encode and decode stages]
Deep Learning Methods
Supervised Methods
• Deep Neural Networks
• Recurrent Neural Networks
• Convolutional Neural Networks
[Figure: image captioning combining a Deep CNN (Vision) with a Generating RNN (Language). Example captions: “A group of people shopping at an outdoor market.” “There are many vegetables at the fruit stand.”]
How to train a deep network?
Stochastic Gradient Descent (supervised learning)
• show the input vectors for a few examples
• compute the outputs and the errors
• compute the average gradient
• update the weights accordingly
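The four steps above can be sketched on a toy problem. This is a minimal illustration, not the training code behind the talk: the model (a line y = w*x + b), the squared-error loss, the function name `sgd_linear_fit`, and all hyperparameter values are assumptions chosen to keep the example small.

```python
import random

def sgd_linear_fit(data, lr=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w*x + b with mini-batch SGD on the loss 0.5 * error**2."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]  # show a few examples
            grad_w = 0.0
            grad_b = 0.0
            for x, y in batch:
                err = (w * x + b) - y               # output and error
                grad_w += err * x
                grad_b += err
            grad_w /= len(batch)                    # average gradient
            grad_b /= len(batch)
            w -= lr * grad_w                        # update the weights
            b -= lr * grad_b
    return w, b

# Noise-free samples of y = 2x + 1 on [-1, 1] (hypothetical toy data).
points = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(-10, 11)]
w, b = sgd_linear_fit(points)
print(round(w, 3), round(b, 3))
```

A deep network replaces the line with a stack of layers and obtains the gradients by backpropagation, but the loop structure stays the same.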
How to train a deep network?
Alternatives:
• AdaGrad, AdaDelta, NAG (Nesterov’s Accelerated Gradient)…
• ADAM (now in Caffe - http://caffe.berkeleyvision.org/tutorial/solver.html)
Adam is a gradient-based optimization method (like SGD). It includes an “adaptive moment estimation” (m_t, v_t) and can be regarded as a generalization of AdaGrad. The update formulas are:
$$(m_t)_i = \beta_1 (m_{t-1})_i + (1 - \beta_1)\,(\nabla L(W_t))_i,$$
$$(v_t)_i = \beta_2 (v_{t-1})_i + (1 - \beta_2)\,(\nabla L(W_t))_i^2$$
and
$$(W_{t+1})_i = (W_t)_i - \alpha\,\frac{\sqrt{1 - (\beta_2)^t}}{1 - (\beta_1)^t}\,\frac{(m_t)_i}{\sqrt{(v_t)_i} + \varepsilon}.$$
D. Kingma, J. Ba. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations, 2015
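A small sketch of the Adam update, written directly from the three formulas on this slide. The function name `adam_minimize`, the toy quadratic objective, and the hyperparameter values are assumptions made for illustration; this is not Caffe's solver code.

```python
import math

def adam_minimize(grad_fn, w0, alpha=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Run Adam for a fixed number of steps; grad_fn(w) returns the gradient list."""
    w = list(w0)
    m = [0.0] * len(w)  # first-moment estimate m_t
    v = [0.0] * len(w)  # second-moment estimate v_t
    for t in range(1, steps + 1):
        g = grad_fn(w)
        # Bias-correction factor sqrt(1 - beta2^t) / (1 - beta1^t).
        correction = math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
        for i in range(len(w)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            w[i] -= alpha * correction * m[i] / (math.sqrt(v[i]) + eps)
    return w

def quad_grad(w):
    """Gradient of the toy loss (w0 - 3)^2 + (w1 + 1)^2, minimised at (3, -1)."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]

w_final = adam_minimize(quad_grad, [0.0, 0.0])
print([round(x, 3) for x in w_final])
```

Because the step is divided by the square root of v_t, the first update has magnitude close to alpha regardless of how large the raw gradient is.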
Convolutional Neural Networks
• CNNs are designed to process data that come in the form of multiple arrays (e.g. 2D images, 3D video/volumetric images)
• A typical architecture is composed of a series of stages: convolutional layers and pooling layers
• Each unit is connected to local patches in the feature maps of the previous layer
[Figure: example network with stages labelled conv1, pool1, conv, pool2]
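One convolution stage followed by one pooling stage can be sketched in plain Python. This is an illustrative toy, not a practical implementation: the helper names, the 6 x 6 input, and the hand-picked edge kernel are assumptions, and real networks learn their kernels and use optimised array libraries.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Each output unit only sees a local kh x kw patch of the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool2x2(fmap):
    """Non-overlapping 2 x 2 max pooling."""
    return [[max(fmap[r][c], fmap[r][c + 1],
                 fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

# 6 x 6 image with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked vertical-edge kernel (hypothetical; a CNN would learn this).
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d_valid(image, kernel)  # 4 x 4, rows are [0, 3, 3, 0]
pooled = max_pool2x2(feature_map)          # 2 x 2, all entries 3
print(feature_map[0], pooled)
```

The feature map responds only where the patch straddles the edge, and pooling keeps that response while halving the resolution.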
Key Idea behind Convolutional Networks
Convolutional networks take advantage of the properties of natural signals:
• local connections
• shared weights
• pooling
• the use of many layers
[Figure: image labelled ‘Person’]
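The effect of local connections and shared weights shows up in a simple parameter count. The layer sizes below (a 32 x 32 single-channel input and a 28 x 28 output map) are hypothetical numbers picked for the arithmetic, not taken from the talk.

```python
def fully_connected_params(n_inputs, n_units):
    """Every unit has its own weight for every input, plus one bias per unit."""
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """One small kernel (plus bias) per output channel, reused at every position."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels

# Map a 32 x 32 input to a 28 x 28 output in two different ways.
dense = fully_connected_params(32 * 32, 28 * 28)  # 803600 parameters
shared = conv_params(5, 5, 1, 1)                  # 26 parameters
print(dense, shared)
```

A 5 x 5 convolution without padding also yields a 28 x 28 map from a 32 x 32 input, so both layers produce the same output size with very different parameter budgets.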
FlowNet: Learning Optical Flow with Convolutional Networks
Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Thomas Brox
Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Daniel Cremers, Patrick van der Smagt
Flying Chairs
Flying Chairs
[Figure: histograms of displacement for Flying Chairs and Sintel. x-axis: Displacement (px), 0 to 100; y-axis: Number of pixels, shown in linear scale and in log scale]
Data Augmentation
• translation, rotation, scaling, additive Gaussian noise
• changes in brightness, contrast, gamma and colour
[Figure panels: Generated, Augmented]
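A few of the listed augmentations can be sketched for a grayscale image stored as nested lists with values in [0, 1]. The function names, the order in which the operations are applied, and the parameter values are assumptions for illustration; the slide does not state the ranges used for FlowNet. For optical flow, geometric changes would also have to be applied consistently to both frames and to the flow field, which this sketch leaves out.

```python
import random

def clip01(x):
    """Keep a pixel value inside [0, 1]."""
    return min(1.0, max(0.0, x))

def augment_photometric(image, brightness=0.0, contrast=1.0, gamma=1.0,
                        noise_sigma=0.0, rng=None):
    """Apply contrast, brightness, gamma and additive Gaussian noise."""
    rng = rng or random.Random(0)
    flat = [p for row in image for p in row]
    mean = sum(flat) / len(flat)
    out = []
    for row in image:
        new_row = []
        for p in row:
            q = (p - mean) * contrast + mean     # contrast about the mean
            q = q + brightness                   # additive brightness shift
            q = clip01(q) ** gamma               # gamma curve on [0, 1]
            q = q + rng.gauss(0.0, noise_sigma)  # additive Gaussian noise
            new_row.append(clip01(q))
        out.append(new_row)
    return out

def translate(image, dx, dy, fill=0.0):
    """Shift the image by (dx, dy) pixels, filling uncovered pixels."""
    h, w = len(image), len(image[0])
    return [[image[r - dy][c - dx]
             if 0 <= r - dy < h and 0 <= c - dx < w else fill
             for c in range(w)]
            for r in range(h)]

img = [[0.0, 0.5], [1.0, 0.5]]
noisy = augment_photometric(img, brightness=0.1, contrast=1.2, gamma=0.8,
                            noise_sigma=0.05, rng=random.Random(42))
shifted = translate([[1, 2], [3, 4]], dx=1, dy=0)  # [[0.0, 1], [0.0, 3]]
print(noisy, shifted)
```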
FlowNetSimple
[Figure: FlowNetSimple architecture with layers conv1, conv2, conv3, conv3_1, conv4, conv4_1, conv5, conv5_1, conv6, followed by refinement and prediction; kernel-size labels 7 x 7, 5 x 5 and 3 x 3]
FlowNetSimple - Flying Chairs
FlowNetSimple - Sintel
FlowNetCorr
[Figure: FlowNetCorr architecture with layers conv1, conv2, conv3, a correlation layer (corr) with conv_redir, then conv3_1, conv4, conv4_1, conv5, conv5_1, conv6, followed by refinement and prediction]
Correlation Layer
[Figure: detail of the correlation stage of FlowNetCorr, with labels conv_redir (1 x 1), corr, kernel 3 x 3, and the numbers 256 and 441]
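A correlation layer compares each feature vector in one map with feature vectors at displaced positions in a second map. The sketch below is a simplified pure-Python version under stated assumptions: single-pixel patches, zero scores outside the image, and hypothetical names `displacement_offsets` and `correlation`. The slide does not give the layer's parameters; a maximum displacement of 20 with stride 2 is one setting that yields 21 x 21 = 441 outputs per position, matching the number in the figure.

```python
def displacement_offsets(max_disp, stride=1):
    """Displacements considered along one axis."""
    return list(range(-max_disp, max_disp + 1, stride))

def correlation(f1, f2, max_disp, stride=1):
    """Dot-product correlation between two H x W x C feature maps.

    Returns an H x W x (D*D) nested list, where D is the number of
    offsets per axis. Positions outside f2 contribute a score of 0.
    """
    h, w = len(f1), len(f1[0])
    offsets = displacement_offsets(max_disp, stride)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            a = f1[y][x]
            scores = []
            for dy in offsets:
                for dx in offsets:
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        b = f2[yy][xx]
                        scores.append(sum(p * q for p, q in zip(a, b)))
                    else:
                        scores.append(0.0)
            row.append(scores)
        out.append(row)
    return out

def one_hot_map(h, w, y, x):
    """Single-channel map that is 1.0 at (y, x) and 0.0 elsewhere."""
    return [[[1.0 if (r, c) == (y, x) else 0.0] for c in range(w)]
            for r in range(h)]

f1 = one_hot_map(3, 3, 1, 1)  # feature at the centre
f2 = one_hot_map(3, 3, 1, 2)  # same feature moved one pixel to the right
corr = correlation(f1, f2, max_disp=1)
best = corr[1][1].index(max(corr[1][1]))  # index 5 means dy = 0, dx = +1
n_outputs = len(displacement_offsets(20, 2)) ** 2
print(best, n_outputs)
```

The index of the strongest response at the centre pixel encodes the shift between the two maps, which is the kind of matching signal later layers can turn into a flow estimate.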
FlowNetCorr - Flying Chairs
Simple vs. Corr - Flying Chairs
[Figure panels: FlowNetS, FlowNetCorr]
FlowNetCorr - Sintel
Simple vs. Corr - Sintel
[Figure panels: FlowNetS, FlowNetCorr]
FlowNetSimple + Variational Smoothing
FlowNet: Learning Optical Flow with Convolutional Networks
References
• Building High-level Features Using Large Scale Unsupervised Learning. Quoc V. Le, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng. ICML’12
• Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. ICML’09
• ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. NIPS’12
• Gradient-based learning applied to document recognition. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Proceedings of the IEEE’98
• FlowNet: Learning Optical Flow with Convolutional Networks. Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox