Deep Learning in Computer Vision
Caner Hazırbaş
Deep Learning in Action, 24 June ’15
Computer Vision Group
6 Postdocs, 16 PhD students
Caner Hazırbaş | vision.in.tum.de Deep Learning in Computer Vision
Research in Computer Vision
• Robot Vision
• Shape Analysis
• Image-based 3D Reconstruction
• RGB-D Vision
• Visual SLAM
• Image Segmentation
• Optical Flow
• Convex Relaxation Methods
Deep Learning in Computer Vision
How to teach a machine?
[Figure: image, then edges (or any other hand-crafted features), then classifier, then the label “Person”]
Not a good representation
What is deep learning?
Representation learning method
• Learning good features automatically from raw data
Learning representations of data with multiple levels of abstraction
• Google’s cat detection neural network
Construction of higher levels of abstraction
[Figure: a unit with weights w1, w2, w3 and bias b (on a constant input 1), followed by a “non-linear” transformation]
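The unit on this slide can be sketched in a few lines of NumPy. This is only an illustration: the input values, the weight values and the choice of ReLU as the non-linearity are assumptions, since the slide names none of them.

```python
import numpy as np

def neuron(x, w, b):
    """Weighted sum of the inputs plus a bias, passed through a non-linearity."""
    z = np.dot(w, x) + b        # linear part: w1*x1 + w2*x2 + w3*x3 + b
    return np.maximum(z, 0.0)   # ReLU, picked here only as an example non-linearity

x = np.array([1.0, 2.0, 3.0])     # three inputs (hypothetical values)
w = np.array([0.5, -1.0, 0.25])   # weights w1, w2, w3 (hypothetical values)
b = 1.0                           # bias
y = neuron(x, w, b)
```

Stacking many such units in layers, each layer fed by the outputs of the previous one, is what builds the higher levels of abstraction.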
Going deeper in the network
• Input: ‘Pixels’
• 1st and 2nd Layers: ‘Edges’
• 3rd Layer: ‘Object Parts’
• 4th Layer: ‘Objects’
[Figure: learned features for faces, cars, airplanes, motorbikes]
Deep Learning Methods
Unsupervised Methods
• Restricted Boltzmann Machines
• Deep Belief Networks
• Auto encoders: unsupervised feature extraction/learning (encode, decode)
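A minimal auto-encoder sketch may help to make the encode/decode idea concrete. Everything specific here is an assumption for illustration: the synthetic data, the code size of 3, the tanh encoder with a linear decoder, and plain full-batch gradient descent on the squared reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))     # 200 unlabeled samples, 8 features (synthetic)

n_hidden = 3                      # size of the learned code (assumption)
W_enc = rng.normal(scale=0.1, size=(8, n_hidden))
W_dec = rng.normal(scale=0.1, size=(n_hidden, 8))

def encode(x):
    return np.tanh(x @ W_enc)     # compress the input into a short code

def decode(h):
    return h @ W_dec              # reconstruct the input from the code

def reconstruction_error(x):
    return np.mean((decode(encode(x)) - x) ** 2)

initial_error = reconstruction_error(X)
lr = 0.05
for _ in range(200):
    H = encode(X)
    X_hat = decode(H)
    diff = (X_hat - X) / X.shape[0]           # gradient of the summed squared error / (2N)
    grad_dec = H.T @ diff                     # gradient w.r.t. decoder weights
    grad_H = diff @ W_dec.T                   # back-propagate into the code
    grad_enc = X.T @ (grad_H * (1 - H ** 2))  # through tanh into encoder weights
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
final_error = reconstruction_error(X)
```

No labels are used anywhere: the input itself is the training target, which is why the method counts as unsupervised.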
Deep Learning Methods
Supervised Methods
• Deep Neural Networks
• Recurrent Neural Networks
• Convolutional Neural Networks
[Figure: Vision (Deep CNN) and Language (Generating RNN) producing the caption “A group of people shopping at an outdoor market. There are many vegetables at the fruit stand.”]
How to train a deep network?
Stochastic Gradient Descent (supervised learning)
• show input vector of few examples
• compute the output and the errors
• compute average gradient
• update the weights accordingly
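The four steps above can be sketched on a toy problem. The linear model, the synthetic noiseless data, the learning rate and the batch size are all assumptions chosen to keep the example short; a deep network would obtain the gradient by back-propagation instead.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0])          # weights we hope to recover (hypothetical)
X = rng.normal(size=(1000, 2))
y = X @ true_w                          # noiseless targets for a linear model

w = np.zeros(2)
lr = 0.1                                # learning rate (assumption)
batch_size = 10
for step in range(500):
    idx = rng.integers(0, len(X), size=batch_size)
    xb, yb = X[idx], y[idx]             # 1. show the input vectors of a few examples
    err = xb @ w - yb                   # 2. compute the outputs and the errors
    grad = xb.T @ err / batch_size      # 3. compute the average gradient
    w -= lr * grad                      # 4. update the weights accordingly
```

Each update uses only a small random batch, so the gradient is a noisy estimate of the full-data gradient; that noise is what makes the method "stochastic".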
Convolutional Neural Networks
• CNNs are designed to process the data in the form of multiple arrays (e.g. 2D images, 3D video/volumetric images)
• Typical architecture is composed of series of stages: convolutional layers and pooling layers
• Each unit is connected to local patches in the feature maps of the previous layer
[Figure: example network with conv1, pool1, conv and pool2 stages]
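One convolution stage followed by one pooling stage can be sketched as below. The 6 x 6 input, the 2 x 2 kernel and the 2 x 2 max pooling are illustrative assumptions, and the "convolution" is implemented as cross-correlation, as is common in deep learning libraries.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output unit sees one local patch."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2x2(fmap):
    """Keep the maximum of each non-overlapping 2 x 2 block (odd borders are cropped)."""
    h = fmap.shape[0] // 2 * 2
    w = fmap.shape[1] // 2 * 2
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6 x 6 "image"
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])                   # hypothetical 2 x 2 filter
fmap = conv2d_valid(image, kernel)                 # feature map, 5 x 5
pooled = max_pool2x2(fmap)                         # pooled map, 2 x 2
```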
Key Idea behind Convolutional Networks
Convolutional networks take advantage of the properties of natural signals:
• local connections
• shared weights
• pooling
• the use of many layers
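A rough parameter count shows why local connections and shared weights matter. The layer sizes are assumptions picked for illustration: a 32 x 32 single-channel input mapped to a 28 x 28 output, once with a fully connected layer and once with a single shared 5 x 5 filter.

```python
def dense_params(in_h, in_w, out_h, out_w):
    """Fully connected: every output unit has its own weight for every input pixel."""
    n_in = in_h * in_w
    n_out = out_h * out_w
    return n_in * n_out + n_out          # weights plus one bias per output unit

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Convolution: one small filter per output channel, reused at every position."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

n_dense = dense_params(32, 32, 28, 28)   # 803600 parameters
n_conv = conv_params(5, 5, 1, 1)         # 26 parameters
```

The convolutional layer produces an output map of the same size with far fewer parameters, because the same 5 x 5 weights are applied at every image location.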
Pros & Cons
Pros
• Best performing method in many Computer Vision tasks
• No need of hand-crafted features
• Most applicable method for large-scale problems, e.g. classification of 1000 classes
• Easy parallelization on GPUs
Cons
• Need of huge amount of training data
• Hard to train (local minima problem, tuning hyper-parameters)
• Difficult to analyse (to be solved)
Deep Learning Applications in Computer Vision
Handwritten Digit Recognition
ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
FlowNet: Learning Optical Flow with Convolutional Networks
in collaboration with University of Freiburg, lmb.informatik.uni-freiburg.de
FlowNet: Learning Optical Flow with Convolutional Networks
[Figure: FlowNetSimple architecture: conv1, conv2, conv3, conv3_1, conv4, conv4_1, conv5, conv5_1, conv6, followed by refinement and prediction]
[Figure: FlowNetCorr architecture: conv1, conv2, conv3, corr layer (441 outputs) and conv_redir (1 x 1), then conv3_1, conv4, conv4_1, conv5, conv5_1, conv6, followed by refinement and prediction]
FlowNet: Learning Optical Flow with Convolutional Networks
[Figure: detail of FlowNetCorr: corr layer (441 outputs) and conv_redir (1 x 1)]
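The corr layer in the figure compares the feature maps of the two images. A simplified sketch, under stated assumptions, is given below: it uses single-pixel patches, a dense grid of displacements, zero padding and no striding, so it is not the exact layer used in FlowNet. With a maximum displacement of 10 in this sketch, the output has 21 x 21 = 441 channels, which matches the count on the slide.

```python
import numpy as np

def correlation(f1, f2, max_disp):
    """Dot product between each feature vector of f1 and the feature vectors of f2
    at every displacement within max_disp. f1 and f2 have shape (H, W, C);
    the result has shape (H, W, (2 * max_disp + 1) ** 2)."""
    H, W, _ = f1.shape
    D = 2 * max_disp + 1
    # Zero-pad f2 so that every displacement stays inside the array.
    padded = np.pad(f2, ((max_disp, max_disp), (max_disp, max_disp), (0, 0)))
    out = np.empty((H, W, D * D))
    k = 0
    for dy in range(D):
        for dx in range(D):
            shifted = padded[dy:dy + H, dx:dx + W, :]   # f2 displaced by (dy, dx) - max_disp
            out[:, :, k] = np.sum(f1 * shifted, axis=2)
            k += 1
    return out

rng = np.random.default_rng(0)
f1 = rng.normal(size=(4, 5, 3))      # toy feature maps (hypothetical sizes)
f2 = rng.normal(size=(4, 5, 3))
corr_small = correlation(f1, f2, max_disp=1)    # 9 displacement channels
corr_441 = correlation(f1, f2, max_disp=10)     # 441 displacement channels
```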
From Image to Caption
Vision: Deep CNN. Language: Generating RNN.
• A group of people shopping at an outdoor market. There are many vegetables at the fruit stand.
• A woman is throwing a frisbee in a park.
• A dog is standing on a hardwood floor.
• A stop sign is on a road with a mountain in the background.
• A little girl sitting on a bed with a teddy bear.
• A group of people sitting on a boat in the water.
• A giraffe standing in a forest with trees in the background.
Deep Learning in Computer Vision
End of Presentation
Questions?
Caner Hazırbaş | hazirbas@cs.tum.edu
References
• Building High-level Features Using Large Scale Unsupervised Learning. Quoc V. Le, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng. ICML’12
• Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. ICML’09
• ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. NIPS’12
• Gradient-based learning applied to document recognition. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Proceedings of the IEEE’98
• FlowNet: Learning Optical Flow with Convolutional Networks. Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox
References
• Google’s cat detection neural network: http://www.resnap.com/image-selection-technology/deep-learning-image-classification/
• Example auto-encoder: http://nghiaho.com/?p=1765
• SGD: http://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/