deep learning in computer vision

  1. Deep Learning in Computer Vision. Caner Hazırbaş, Deep Learning in Action, 24 June ’15

  2. Computer Vision Group: 6 postdocs, 16 PhD students. Caner Hazırbaş | vision.in.tum.de

  3. Research in Computer Vision
     • Robot Vision
     • Shape Analysis
     • Image-based 3D Reconstruction
     • Image Segmentation
     • RGB-D Vision
     • Visual SLAM
     • Optical Flow
     • Convex Relaxation Methods

  4. Deep Learning in Computer Vision

  5. How to teach a machine? edges (or any other hand-crafted features) → classifier → “Person”

  6. How to teach a machine? edges (or any other hand-crafted features) → classifier → “Person”. Not a good representation.

  7. What is deep learning? A representation learning method.
     • Learning good features automatically from raw data
     • Learning representations of data with multiple levels of abstraction
     • Example: Google’s cat detection neural network

  8. Construction of higher levels of abstraction: inputs are combined with weights w1, w2, w3 and a bias b, then passed through a “non-linear” transformation.
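The unit on slide 8 can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's code: the slide only says “non-linear”, so the sigmoid is an assumption, and the input and weight values are hypothetical.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a non-linearity."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Assumption: a sigmoid stands in for the slide's unspecified non-linearity.
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values, chosen only for illustration.
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.0]   # w1, w2, w3
b = 0.0
y = neuron(x, w, b)     # weighted sum is 0.5 - 0.5 + 0.0 = 0, so y = sigmoid(0)
print(y)
```

Stacking many such units in layers, each fed by the outputs of the previous layer, is what yields the higher levels of abstraction the slide refers to.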

  9. Going deeper in the network
     • Input: ‘Pixels’
     • 1st and 2nd layers: ‘Edges’
     • 3rd layer: ‘Object Parts’
     • 4th layer: ‘Objects’
     [Figure: learned features for faces, cars, airplanes and motorbikes]

  10. Deep Learning Methods: Unsupervised Methods
     • Restricted Boltzmann Machines
     • Deep Belief Networks
     • Auto-encoders: unsupervised feature extraction/learning (encode → decode)

  11. Deep Learning Methods: Supervised Methods
     • Deep Neural Networks
     • Recurrent Neural Networks (language: generating RNN)
     • Convolutional Neural Networks (vision: deep CNN)
     Example captions: “A group of people shopping at an outdoor market.” “There are many vegetables at the fruit stand.”

  12. How to train a deep network? Stochastic Gradient Descent (supervised learning):
     • show the input vectors of a few examples
     • compute the outputs and the errors
     • compute the average gradient
     • update the weights accordingly
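The four steps on slide 12 can be sketched on the smallest possible model. The example below is a hedged illustration, assuming a one-parameter-pair linear model y = w*x + b and a squared-error loss; the function name `sgd_linear`, the data and all hyper-parameters are hypothetical and not taken from the talk.

```python
import random

def sgd_linear(data, lr=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y = w*x + b by minibatch stochastic gradient descent."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):
            # 1. show a few examples
            batch = data[start:start + batch_size]
            # 2. compute the outputs and the errors
            errors = [(w * x + b) - y for x, y in batch]
            # 3. average gradient of 0.5 * error^2 over the batch
            grad_w = sum(e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(errors) / len(batch)
            # 4. update the weights
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Hypothetical noise-free data drawn from y = 2x + 1.
points = [(k / 10.0, 2.0 * k / 10.0 + 1.0) for k in range(-10, 11)]
w, b = sgd_linear(points)
print(w, b)  # should approach 2 and 1
```

A deep network follows the same loop; only the model and the gradient computation (backpropagation through the layers) become more involved.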

  13. Convolutional Neural Networks
     • CNNs are designed to process data in the form of multiple arrays (e.g. 2D images, 3D video/volumetric images)
     • A typical architecture is composed of a series of stages: convolutional layers and pooling layers
     • Each unit is connected to local patches in the feature maps of the previous layer
     [Figure: example architecture with convolution and pooling stages (labels: conv1, pool1, pool2)]
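To make the convolution and pooling stages of slide 13 concrete, here is a minimal pure-Python sketch. It assumes stride 1, no padding and 2 x 2 max pooling; the helper names `conv2d_valid` and `max_pool2x2`, the image and the kernel are hypothetical. As in most deep learning libraries, the "convolution" is computed without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over the image (stride 1, no padding, kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool2x2(fmap):
    """Keep the maximum of each non-overlapping 2 x 2 block."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

# Hypothetical 5 x 5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]          # responds to a dark-to-bright vertical edge
fmap = conv2d_valid(image, kernel)   # 4 x 4 feature map
pooled = max_pool2x2(fmap)           # 2 x 2 map after pooling
print(fmap)
print(pooled)
```

Each output value depends only on a 2 x 2 patch of the input, which is the local connectivity described in the last bullet; pooling then keeps the strongest response in each neighbourhood.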

  14. Key Idea behind Convolutional Networks. Convolutional networks take advantage of the properties of natural signals:
     • local connections

  15. Key Idea behind Convolutional Networks. Convolutional networks take advantage of the properties of natural signals:
     • local connections
     • shared weights

  16. Key Idea behind Convolutional Networks. Convolutional networks take advantage of the properties of natural signals:
     • local connections
     • shared weights
     • pooling

  17. Key Idea behind Convolutional Networks. Convolutional networks take advantage of the properties of natural signals:
     • local connections
     • shared weights
     • pooling
     • the use of many layers
     [Figure: network output “Person”]
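The effect of local connections and shared weights is easiest to see by counting parameters. The arithmetic below is an illustrative sketch; the layer sizes (a 100 x 100 grayscale image, sixteen 5 x 5 kernels) and the helper names are hypothetical, not figures from the talk.

```python
def dense_params(in_h, in_w, out_units):
    """Fully connected: every output unit sees every pixel, plus one bias each."""
    return in_h * in_w * out_units + out_units

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Convolutional: one small kernel per output channel, reused at every position."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

# Hypothetical sizes, chosen only for illustration.
dense = dense_params(100, 100, out_units=100 * 100)
conv = conv_params(5, 5, in_channels=1, out_channels=16)
print(dense)  # 100010000
print(conv)   # 416
```

Because the kernel weights do not depend on image position, the convolutional count stays the same for any input resolution, whereas the fully connected count grows with the product of input and output sizes.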

  18. Pros & Cons
     Pros:
     • Best performing method in many Computer Vision tasks
     • No need for hand-crafted features
     • Most applicable method for large-scale problems, e.g. classification of 1000 classes
     • Easy parallelization on GPUs
     Cons:
     • Needs a huge amount of training data
     • Hard to train (local minima problem, tuning hyper-parameters)
     • Difficult to analyse (to be solved)

  19. Deep Learning Applications in Computer Vision

  20. Handwritten Digit Recognition

  21. ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)

  22. FlowNet: Learning Optical Flow with Convolutional Networks, in collaboration with the University of Freiburg (lmb.informatik.uni-freiburg.de)

  23. FlowNet: Learning Optical Flow with Convolutional Networks

  24. FlowNet: Learning Optical Flow with Convolutional Networks
     [Figure: the two network architectures. FlowNetSimple: conv1, conv2, conv3, conv3_1, conv4, conv4_1, conv5, conv5_1, conv6, followed by refinement and prediction. FlowNetCorr: conv1, conv2, conv3, a correlation layer (corr) with conv_redir, then conv3_1, conv4, conv4_1, conv5, conv5_1, conv6, followed by refinement and prediction.]

  25. FlowNet: Learning Optical Flow with Convolutional Networks
     [Figure: detail of FlowNetCorr around the correlation layer (labels: conv_redir 1 x 1, kernel 3 x 3, corr, 256, 441)]
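The correlation layer in the FlowNetCorr figure compares feature vectors from the two images over a range of displacements. The sketch below is a simplified illustration, assuming single-position patches and zero contribution outside the map; the function name `correlation` and the toy feature maps are hypothetical. The 441 in the figure is consistent with a 21 x 21 grid of displacements (range ±20 sampled with stride 2), which is how the FlowNet paper describes it.

```python
def correlation(f1, f2, max_disp, stride=1):
    """Dot each feature vector in f1 with f2 at every displacement.

    f1, f2: feature maps as nested lists indexed [row][col][channel].
    Returns [row][col][k], one output channel k per displacement;
    displaced positions that fall outside f2 contribute 0.
    """
    h, w = len(f1), len(f1[0])
    disps = range(-max_disp, max_disp + 1, stride)
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            vals = []
            for di in disps:
                for dj in disps:
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        vals.append(sum(a * b for a, b in zip(f1[i][j], f2[ii][jj])))
                    else:
                        vals.append(0.0)
            row.append(vals)
        out.append(row)
    return out

# Toy 2 x 2 maps with one channel: the active feature moves one pixel to the right.
f1 = [[[1.0], [0.0]], [[0.0], [0.0]]]
f2 = [[[0.0], [1.0]], [[0.0], [0.0]]]
corr = correlation(f1, f2, max_disp=1)   # 3 x 3 = 9 displacement channels

# Displacements -20..20 in steps of 2 give 21 values per axis.
n_channels = len(range(-20, 21, 2)) ** 2
print(n_channels)  # 441
```

In the toy example the only non-zero response at position (0, 0) is in the channel for displacement (0, +1), which is exactly the motion between the two maps.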

  26. FlowNet: Learning Optical Flow with Convolutional Networks

  27. From Image to Caption (vision: deep CNN → language: generating RNN)
     • A group of people shopping at an outdoor market.
     • There are many vegetables at the fruit stand.
     • A woman is throwing a frisbee in a park.
     • A dog is standing on a hardwood floor.
     • A stop sign is on a road with a mountain in the background.
     • A little girl sitting on a bed with a teddy bear.
     • A group of people sitting on a boat in the water.
     • A giraffe standing in a forest with trees in the background.

  28. Deep Learning in Computer Vision. End of presentation. Questions? Caner Hazırbaş | hazirbas@cs.tum.edu

  29. References
     • Building High-level Features Using Large Scale Unsupervised Learning. Quoc V. Le, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeff Dean, Andrew Y. Ng. ICML’12.
     • Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. Honglak Lee, Roger Grosse, Rajesh Ranganath, Andrew Y. Ng. ICML’09.
     • ImageNet Classification with Deep Convolutional Neural Networks. Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. NIPS’12.
     • Gradient-based learning applied to document recognition. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner. Proceedings of the IEEE’98.
     • FlowNet: Learning Optical Flow with Convolutional Networks. Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas Brox.

  30. References
     • Google’s cat detection neural network: http://www.resnap.com/image-selection-technology/deep-learning-image-classification/
     • Example auto-encoder: http://nghiaho.com/?p=1765
     • SGD: http://blog.datumbox.com/tuning-the-learning-rate-in-gradient-descent/
