Delving Deep into Computer Vision
Caner Hazirbas
Machine Learning Meetup #1
Delving Deep into Computer Vision
• FlowNet
• FuseNet
• PoseLSTM
• DDFF
FlowNet
[Figure: FlowNetSimple architecture, conv1 through conv6 (7x7, 5x5 and 3x3 kernels, 64 up to 1024 channels) on a 384x512 input, followed by refinement and flow prediction]
FlowNet: Learning Optical Flow with Convolutional Networks (ICCV'15)
[Figure: the two proposed architectures. FlowNetSimple: a single conv1 through conv6 encoder on the stacked image pair, followed by refinement and flow prediction. FlowNetCorr: separate conv1 through conv3 streams for the two frames, a correlation layer (441 output channels) combined with a 1x1 conv_redir shortcut, then conv3_1 through conv6, refinement and flow prediction]
FlowNet: Flying Chairs
[Figure: examples from the synthetic Flying Chairs dataset used to train FlowNet]
FlowNet: FlowNetSimple
[Figure: FlowNetSimple, the two frames stacked into a 6-channel input and passed through a single conv1 through conv6 encoder, followed by refinement and flow prediction]
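To make the FlowNetSimple idea concrete, here is a minimal PyTorch sketch of the same principle: the two frames are stacked into a 6-channel input and a single convolutional encoder plus an upsampling refinement part regresses the 2-channel flow field. The class name, channel counts and number of layers are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class TinyFlowNetSimple(nn.Module):
    """Toy version of the FlowNetSimple idea: both RGB frames are stacked
    into a 6-channel input and a plain conv encoder-decoder regresses a
    2-channel flow field. Channel counts are illustrative only."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # "refinement": upconvolutions back toward input resolution
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.predict_flow = nn.Conv2d(64, 2, 3, padding=1)

    def forward(self, img1, img2):
        x = torch.cat([img1, img2], dim=1)      # B x 6 x H x W
        return self.predict_flow(self.decoder(self.encoder(x)))

flow = TinyFlowNetSimple()(torch.randn(1, 3, 384, 512), torch.randn(1, 3, 384, 512))
print(flow.shape)  # torch.Size([1, 2, 192, 256]); flow is predicted at reduced resolution
```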
FlowNet: FlowNetCorr
[Figure: FlowNetCorr, separate conv1 through conv3 streams for the two frames, a correlation layer (441 channels) combined with a 1x1 conv_redir shortcut, then conv3_1 through conv6, refinement and flow prediction]
The correlation layer compares patches of the two feature maps $f_1$ and $f_2$:

$$c(\mathbf{x}_1, \mathbf{x}_2) = \sum_{\mathbf{o} \in [-k,k] \times [-k,k]} \langle f_1(\mathbf{x}_1 + \mathbf{o}),\, f_2(\mathbf{x}_2 + \mathbf{o}) \rangle, \qquad K := 2k + 1$$
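The correlation layer can be written down naively as below; this is a sketch of the equation above for the k = 0 case (single-pixel comparisons) with a bounded displacement range, which is what produces the 441 channels in the diagram (21 x 21 displacements for a maximum displacement of 10). The function name and the normalisation by channel count are assumptions.

```python
import torch
import torch.nn.functional as F

def correlation(f1, f2, max_disp=10):
    """Naive correlation layer for the k = 0 case of the formula above.
    The displacement x2 - x1 is limited to [-max_disp, max_disp] in both
    directions, giving (2*max_disp + 1)^2 output channels: 441 for
    max_disp = 10, matching the diagram.
    f1, f2: B x C x H x W feature maps from the two images."""
    B, C, H, W = f1.shape
    D = 2 * max_disp + 1
    f2_pad = F.pad(f2, [max_disp] * 4)          # zero-pad height and width
    out = f1.new_zeros(B, D * D, H, W)
    for i, dy in enumerate(range(-max_disp, max_disp + 1)):
        for j, dx in enumerate(range(-max_disp, max_disp + 1)):
            f2_shift = f2_pad[:, :,
                              max_disp + dy: max_disp + dy + H,
                              max_disp + dx: max_disp + dx + W]
            out[:, i * D + j] = (f1 * f2_shift).sum(dim=1)  # dot product over channels
    return out / C  # the paper normalises the correlation; dividing by C is an assumption

cost = correlation(torch.randn(1, 256, 48, 64), torch.randn(1, 256, 48, 64))
print(cost.shape)  # torch.Size([1, 441, 48, 64])
```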
FlowNet: Simple vs. Corr (Flying Chairs)
[Figure: qualitative flow predictions on Flying Chairs, FlowNetS vs. FlowNetCorr]
FlowNet: Simple vs. Corr (Sintel)
[Figure: qualitative flow predictions on Sintel, FlowNetS vs. FlowNetCorr]
FlowNet: Learning Optical Flow with Convolutional Networks
Delving Deep into Computer Vision
• FlowNet
• FuseNet
FuseNet: Incorporating Depth into Semantic Segmentation via Fusion-Based CNN Architecture (ACCV'16)
FuseNet - A conventional way: HHA
Multi-Scale Convolutional Architecture for Semantic Segmentation, Raj et al., Tech. Report CMU-RI-TR-15-21, 2015
FuseNet: A deep way…
[Figure: FuseNet architecture, an RGB encoder and a depth encoder whose features are fused, followed by a decoder for semantic segmentation; a minimal sketch of the fusion follows]
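A minimal sketch of the fusion idea, assuming element-wise summation of depth-encoder features into the RGB encoder as in the FuseNet paper; the block structure and channel counts below are illustrative, not the paper's VGG-16 configuration.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class TinyFuseNetEncoder(nn.Module):
    """Sketch of the FuseNet fusion idea: a second encoder processes the
    depth map and its features are fused into the RGB encoder by
    element-wise summation before each pooling step. Two blocks only;
    channel counts are illustrative."""
    def __init__(self):
        super().__init__()
        self.rgb1, self.rgb2 = conv_block(3, 64), conv_block(64, 128)
        self.d1, self.d2 = conv_block(1, 64), conv_block(64, 128)
        self.pool = nn.MaxPool2d(2)

    def forward(self, rgb, depth):
        r, d = self.rgb1(rgb), self.d1(depth)
        r = self.pool(r + d)                    # fuse depth features, then downsample
        d = self.pool(d)
        r, d = self.rgb2(r), self.d2(d)
        return self.pool(r + d)                 # fused features handed to the decoder

feats = TinyFuseNetEncoder()(torch.randn(1, 3, 240, 320), torch.randn(1, 1, 240, 320))
print(feats.shape)  # torch.Size([1, 128, 60, 80])
```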
FuseNet: Why a second encoder for the depth input?
FuseNet: Are we any better than HHA?
• The proposed network improves all segmentation metrics.
FuseNet: What about the others?
• The proposed network improves all segmentation metrics.
• Metrics:
  • Global: fraction of correctly classified pixels over all pixels
  • Mean: average per-class accuracy
  • IoU: average intersection over union across classes
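For reference, all three metrics can be computed from a single confusion matrix; below is a small NumPy sketch (the function name and the ignore-label convention are assumptions).

```python
import numpy as np

def segmentation_metrics(pred, gt, num_classes):
    """Global pixel accuracy, mean class accuracy and mean IoU from a
    confusion matrix. pred, gt: integer label arrays of the same shape;
    labels >= num_classes in gt are treated as unlabelled and ignored."""
    mask = gt < num_classes
    conf = np.bincount(num_classes * gt[mask] + pred[mask],
                       minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(conf).astype(float)
    global_acc = tp.sum() / conf.sum()                            # "Global"
    class_acc = tp / np.maximum(conf.sum(axis=1), 1)              # per-class recall
    iou = tp / np.maximum(conf.sum(axis=1) + conf.sum(axis=0) - tp, 1)
    return global_acc, class_acc.mean(), iou.mean()               # Global, Mean, IoU

gt = np.random.randint(0, 5, (240, 320))
pred = np.random.randint(0, 5, (240, 320))
print(segmentation_metrics(pred, gt, num_classes=5))
```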
Delving Deep into Computer Vision
• FlowNet
• FuseNet
• PoseLSTM
[Figure: preview of the PoseLSTM pipeline, described on the next slide]
PoseLSTM: Image-Based Localization Using LSTMs for Structured Feature Correlation (ICCV'17)
[Figure: PoseLSTM pipeline, a pretrained GoogLeNet CNN, an FC layer producing y ∈ R^2048, reshaped to Y ∈ R^{32×64}, LSTMs producing z ∈ R^128, and an FC layer regressing position p ∈ R^3 and orientation quaternion q ∈ R^4]
PoseLSTM: PoseNet
[Figure: PoseNet baseline, a pretrained GoogLeNet CNN, an FC layer producing y ∈ R^2048, an FC layer (R^128), and a final regression of position p ∈ R^3 and orientation quaternion q ∈ R^4]
PoseLSTM: Structured Feature Correlation
[Figure: PoseLSTM, a pretrained GoogLeNet CNN, an FC layer producing y ∈ R^2048, reshaped to Y ∈ R^{32×64}, LSTMs applied to Y producing z ∈ R^128, and an FC layer regressing position p ∈ R^3 and orientation quaternion q ∈ R^4]
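A minimal sketch of this regression head, assuming the reshaped feature matrix Y is swept by LSTMs along its rows and columns in both directions; the hidden size is chosen so the concatenated final states match z ∈ R^128, but the class name and layer details are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PoseLSTMHead(nn.Module):
    """Sketch of the PoseLSTM regression head: the 2048-D CNN feature y is
    reshaped to a 32 x 64 matrix Y, bidirectional LSTMs sweep its rows and
    columns, and the concatenated final hidden states z (128-D) are mapped
    to position p in R^3 and orientation quaternion q in R^4."""
    def __init__(self, hidden=32):
        super().__init__()
        self.row_lstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.col_lstm = nn.LSTM(32, hidden, batch_first=True, bidirectional=True)
        self.fc_p = nn.Linear(4 * hidden, 3)    # position
        self.fc_q = nn.Linear(4 * hidden, 4)    # orientation quaternion

    def forward(self, y):                        # y: B x 2048 (e.g. GoogLeNet features)
        Y = y.view(-1, 32, 64)                   # B x 32 x 64
        _, (h_row, _) = self.row_lstm(Y)                      # sweep over the 32 rows
        _, (h_col, _) = self.col_lstm(Y.transpose(1, 2))      # sweep over the 64 columns
        z = torch.cat([h_row, h_col], dim=0).permute(1, 0, 2).flatten(1)  # B x 128
        return self.fc_p(z), self.fc_q(z)

p, q = PoseLSTMHead()(torch.randn(2, 2048))
print(p.shape, q.shape)  # torch.Size([2, 3]) torch.Size([2, 4])
```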
PoseLSTM - Winner in outdoor scenes: SIFT
PoseLSTM: Where SIFT dies… (TUM-LSI dataset)
• The SIFT-based map cannot be reconstructed due to a lack of sufficient matches: repeated structures and textureless areas.
Delving Deep into Computer Vision
• FlowNet
• FuseNet
• PoseLSTM
• DDFF
DDFF: Deep Depth From Focus
• The image of a point converges on the camera sensor exactly when the point is in focus.
• Therefore, sharpness identifies the in-focus regions of an image.
https://inst.eecs.berkeley.edu/~cs39j/sp02/session12.html
DDFF: Conventional DFF methods
• The image of a point converges on the camera sensor when the point is in focus.
• Therefore, sharpness identifies the in-focus regions of an image.
• The distance of a point from the camera can thus be formulated with respect to focus: a sharpness measure [Pertuz et al.] combined with an optimizer [Moeller et al.], as sketched below.
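A minimal sketch of such a conventional pipeline, using a simple Laplacian-energy focus measure and a per-pixel argmax in place of a proper optimizer; the function name and parameters are assumptions, and [Pertuz et al.] survey many alternative sharpness measures.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def depth_from_focus(stack, focus_dists):
    """Classical depth-from-focus baseline: score sharpness per focal slice
    (local Laplacian energy here) and pick, per pixel, the focus distance of
    the sharpest slice. Variational methods such as [Moeller et al.] replace
    this argmax with a regularised optimisation.
    stack: S x H x W grayscale focal stack; focus_dists: S focus distances."""
    sharpness = np.stack([uniform_filter(laplace(img.astype(float)) ** 2, size=9)
                          for img in stack])    # S x H x W focus measure
    best = sharpness.argmax(axis=0)             # index of the sharpest slice per pixel
    return np.asarray(focus_dists)[best]        # depth map in the same units

stack = np.random.rand(10, 120, 160)
depth = depth_from_focus(stack, focus_dists=np.linspace(0.1, 1.0, 10))
print(depth.shape)  # (120, 160)
```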
DDFF: Deep Depth From Focus
• Focus gradually changes across the images in the stack.
• End-to-end trained convolutional auto-encoder.
• Depth (disparity) regressed directly from the focal stack (a minimal sketch follows).
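A toy sketch of the idea, assuming the focal stack is concatenated along the channel axis and fed to a small convolutional auto-encoder; the real DDFF network is much deeper and its exact input handling may differ.

```python
import torch
import torch.nn as nn

class TinyDDFF(nn.Module):
    """Sketch of the DDFF idea: a whole focal stack of S images is fed to a
    convolutional auto-encoder that directly regresses a disparity map.
    Layer sizes and the channel-concatenation input are illustrative."""
    def __init__(self, stack_size=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * stack_size, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1),     # disparity map
        )

    def forward(self, stack):                    # stack: B x S x 3 x H x W
        B, S, C, H, W = stack.shape
        return self.net(stack.view(B, S * C, H, W))

disp = TinyDDFF()(torch.randn(1, 10, 3, 256, 384))
print(disp.shape)  # torch.Size([1, 1, 256, 384])
```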
DDFF: How to get data?