
Delving Deep into Computer Vision: Caner Hazirbas, Machine Learning Meetup #1


  1. Delving Deep into Computer Vision Caner Hazirbas Machine Learning Meetup #1

  2. Delving Deep into Computer Vision: FlowNet, FuseNet, PoseLSTM, DDFF. Caner Hazirbas | hazirbas@cs.tum.edu

  3. Delving Deep into Computer Vision: FlowNet. [Figure: FlowNetSimple architecture with layers conv1, conv2, conv3, conv3_1, conv4, conv4_1, conv5, conv5_1, conv6, followed by refinement and prediction.]

  4. FlowNet: Learning Optical Flow with Convolutional Networks (ICCV'15). [Figure: FlowNetSimple architecture (conv1 to conv6, refinement, prediction) and FlowNetCorr architecture (conv1 to conv3 per image, corr, conv_redir, conv3_1 to conv6, refinement, prediction).]

  5. FlowNet: Flying Chairs

  6. FlowNet: FlowNetSimple. [Figure: FlowNetSimple architecture with layers conv1, conv2, conv3, conv3_1, conv4, conv4_1, conv5, conv5_1, conv6, followed by refinement and prediction.]

  7. FlowNet: FlowNetCorr. [Figure: FlowNetCorr architecture with layers conv1, conv2, conv3, conv_redir, corr, conv3_1, conv4, conv4_1, conv5, conv5_1, conv6, followed by refinement and prediction.] Correlation of two patches centred at x_1 and x_2 in the feature maps f_1 and f_2, with patch size K:

$$c(\mathbf{x}_1, \mathbf{x}_2) = \sum_{\mathbf{o} \in [-k,k] \times [-k,k]} \langle \mathbf{f}_1(\mathbf{x}_1 + \mathbf{o}), \mathbf{f}_2(\mathbf{x}_2 + \mathbf{o}) \rangle, \qquad K := 2k + 1$$
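The patch correlation on the FlowNetCorr slide can be sketched in a few lines of plain Python. This is a minimal illustration for a single pair of positions, not the layer used in the network (which evaluates many displacements at once). The function name `correlation`, the `[row][col][channel]` list layout, and the zero-padding at the borders are assumptions made for this sketch.

```python
from typing import List, Tuple

# Hypothetical layout for this sketch: feature_map[row][col] is a channel vector.
FeatureMap = List[List[List[float]]]


def correlation(f1: FeatureMap, f2: FeatureMap,
                x1: Tuple[int, int], x2: Tuple[int, int], k: int) -> float:
    """Sum of dot products over the (2k+1) x (2k+1) patches centred at x1 and x2.

    Computes c(x1, x2) = sum over o in [-k, k] x [-k, k] of <f1(x1 + o), f2(x2 + o)>.
    Offsets that leave either feature map contribute zero (assumed zero padding).
    """
    def inside(fmap: FeatureMap, r: int, c: int) -> bool:
        return 0 <= r < len(fmap) and 0 <= c < len(fmap[0])

    total = 0.0
    for dr in range(-k, k + 1):
        for dc in range(-k, k + 1):
            r1, c1 = x1[0] + dr, x1[1] + dc
            r2, c2 = x2[0] + dr, x2[1] + dc
            if inside(f1, r1, c1) and inside(f2, r2, c2):
                # Inner product of the two channel vectors at this offset.
                total += sum(u * v for u, v in zip(f1[r1][c1], f2[r2][c2]))
    return total


# Usage: two constant 3 x 3 feature maps with 2 channels each.
f1 = [[[1.0, 2.0] for _ in range(3)] for _ in range(3)]
f2 = [[[3.0, 4.0] for _ in range(3)] for _ in range(3)]
print(correlation(f1, f2, (1, 1), (1, 1), k=0))  # 11.0 (single dot product)
print(correlation(f1, f2, (1, 1), (1, 1), k=1))  # 99.0 (9 offsets x 11.0)
```

Note that the operation has no trainable weights: it only compares the two feature maps, so patch size K = 2k + 1 is its main parameter.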

  8. FlowNet: Simple vs. Corr (Flying Chairs). [Figure: FlowNetCorr architecture; panels labeled FlowNetS and FlowNetCorr.]

  9. FlowNet: Simple vs. Corr (Sintel). [Figure: FlowNetCorr architecture; panels labeled FlowNetS and FlowNetCorr.]

  10. FlowNet: Learning Optical Flow with Convolutional Networks

  11. Delving Deep into Computer Vision: FlowNet, FuseNet

  12. FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-based CNN Architecture (ACCV'16)

  13. FuseNet: A conventional way: HHA. Source: Multi-Scale Convolutional Architecture for Semantic Segmentation, Raj et al., Tech. Report CMU-RI-TR-15-21, 2015.

  14. FuseNet: A deep way…

  15. FuseNet: Why a second encoder for depth input?

  16. FuseNet: Are we any better than HHA?
 • Proposed network improves all segmentation metrics

  17. FuseNet: What about the others?
 • Proposed network improves all segmentation metrics
 • Metrics
   • Global: percentage of correctly classified pixels over all pixels
   • Mean: average of the per-class accuracies
   • IoU: average of the per-class intersection over union
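The three metrics on this slide can all be read off a confusion matrix. The sketch below assumes the usual conventions: Global is taken as the fraction of correctly classified pixels, Mean as the average per-class accuracy, and IoU as the average per-class intersection over union. The function name `segmentation_metrics` and the choice to skip classes that never occur are assumptions of this example, not details from the talk.

```python
from typing import Dict, List


def segmentation_metrics(conf: List[List[int]]) -> Dict[str, float]:
    """Compute global accuracy, mean class accuracy and mean IoU.

    conf[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    true_pos = [conf[i][i] for i in range(n)]
    gt_count = [sum(conf[i]) for i in range(n)]                          # row sums
    pred_count = [sum(conf[i][j] for i in range(n)) for j in range(n)]  # column sums

    global_acc = sum(true_pos) / total

    # Per-class accuracy, skipping classes absent from the ground truth.
    class_acc = [true_pos[i] / gt_count[i] for i in range(n) if gt_count[i] > 0]

    # Per-class IoU = TP / (ground truth + predicted - TP), skipping empty unions.
    ious = []
    for i in range(n):
        union = gt_count[i] + pred_count[i] - true_pos[i]
        if union > 0:
            ious.append(true_pos[i] / union)

    return {
        "global": global_acc,
        "mean": sum(class_acc) / len(class_acc),
        "iou": sum(ious) / len(ious),
    }


# Usage: a toy two-class confusion matrix over 10 pixels.
metrics = segmentation_metrics([[3, 1], [2, 4]])
print(metrics)
```

Global accuracy is dominated by frequent classes, which is why Mean and IoU are reported alongside it.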

  18. Delving Deep into Computer Vision: FlowNet, FuseNet, PoseLSTM. [Figure: PoseLSTM architecture. Labels: pretrained GoogLeNet CNNs, FC, y ∈ R^2048, Y ∈ R^(32×64), LSTMs, z ∈ R^128, FC, p ∈ R^3, q ∈ R^4.]

  19. PoseLSTM: Image-based localization using LSTMs for structured feature correlation (ICCV'17). [Figure: PoseLSTM architecture. Labels: pretrained GoogLeNet CNNs, FC, y ∈ R^2048, Y ∈ R^(32×64), LSTMs, z ∈ R^128, FC, p ∈ R^3, q ∈ R^4.]

  20. PoseLSTM: PoseNet. [Figure: PoseNet architecture. Labels: pretrained GoogLeNet CNNs, y ∈ R^2048, FC, R^128, FC, p ∈ R^3, q ∈ R^4.]

  21. PoseLSTM: Structured Feature Correlation. [Figure: PoseLSTM architecture. Labels: pretrained GoogLeNet CNNs, FC, y ∈ R^2048, Y ∈ R^(32×64), LSTMs, z ∈ R^128, FC, p ∈ R^3, q ∈ R^4.]

  22. PoseLSTM: Winner in Outdoor: SIFT

  23. PoseLSTM: Where SIFT dies… TUM-LSI Dataset. The map cannot be reconstructed due to a lack of sufficient matches: repeated structures, textureless areas.

  24. Delving Deep into Computer Vision: FlowNet, FuseNet, PoseLSTM, DDFF

  25. DDFF: Deep Depth From Focus
 • The image of a point intersects the camera sensor when the point is in focus
 • Therefore, sharpness determines the focused regions in the images
 Source: https://inst.eecs.berkeley.edu/~cs39j/sp02/session12.html

  26. DDFF: Conventional DFF methods
 • The image of a point intersects the camera sensor when the point is in focus
 • Therefore, sharpness determines the focused regions in the images
 • The distance of a point from the camera can be formulated w.r.t. focus
 [Figure labels: measure of sharpness [Pertuz et al.], optimizer [Moeller et al.]]
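The conventional pipeline on this slide (a sharpness measure followed by a selection step) can be sketched as follows. This is a deliberately simple stand-in under stated assumptions: the absolute Laplacian is only one possible sharpness measure, and the per-pixel argmax replaces the optimizer cited on the slide. The names `laplacian_sharpness` and `depth_from_focus`, and the `focus_distances` parameter, are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def laplacian_sharpness(img: Image) -> Image:
    """Absolute 4-neighbour Laplacian per pixel, replicating edge pixels at the border."""
    h, w = len(img), len(img[0])

    def px(r: int, c: int) -> float:
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return img[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    return [
        [abs(px(r - 1, c) + px(r + 1, c) + px(r, c - 1) + px(r, c + 1) - 4 * px(r, c))
         for c in range(w)]
        for r in range(h)
    ]


def depth_from_focus(stack: List[Image], focus_distances: List[float]) -> Image:
    """For each pixel, return the focus distance of the sharpest image in the stack.

    stack[i] is assumed to be captured with the lens focused at focus_distances[i].
    Ties are resolved in favour of the earliest image in the stack.
    """
    sharpness = [laplacian_sharpness(img) for img in stack]
    h, w = len(stack[0]), len(stack[0][0])
    depth = []
    for r in range(h):
        row = []
        for c in range(w):
            best = max(range(len(stack)), key=lambda i: sharpness[i][r][c])
            row.append(focus_distances[best])
        depth.append(row)
    return depth


# Usage: a flat (blurry) image and one with a sharp detail at the centre.
blurry = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
sharp = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
depth = depth_from_focus([blurry, sharp], focus_distances=[1.0, 2.0])
print(depth[1][1])  # 2.0: the centre is sharpest in the second image
```

A per-pixel argmax like this is noisy in textureless regions, where no image in the stack is measurably sharper than the others.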

  27. DDFF: Deep Depth From Focus
 • Focus gradually changes on each image in the stack
 • End-to-end trained convolutional auto-encoder
 • Depth (disparity) from focal stack

  28. DDFF: How to get data?
