Learning 3D representations, disparity estimation, and structure from motion


  1. Learning 3D representations, disparity estimation, and structure from motion Thomas Brox University of Freiburg, Germany Research funded by the ERC Starting Grant VideoLearn, the German Research Foundation, and the Deutsche Telekom Stiftung Thomas Brox

  2. Outline 3D shape and texture from a single image FlowNet: end-to-end optical flow DispNet: end-to-end disparities DeMoN: end-to-end structure from motion Thomas Brox 2

  3. Single-view to multi-view Maxim Tatarchenko, Alexey Dosovitskiy, ECCV 2016 [Architecture figure: an analysis part encodes the input image into a canonical 3D representation(?), the desired output view is chosen as an additional input, and an up-convolutional part generates the new image from the arbitrary view plus an additional depth map] Thomas Brox 3
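The pipeline on this slide can be summarized in a few lines. Below is a minimal sketch of the encoder/decoder idea, assuming a PyTorch implementation; the layer sizes, the 3-D view encoding, and the class name are illustrative choices, not the published Tatarchenko & Dosovitskiy architecture.

```python
# Minimal sketch of the single-view-to-multi-view idea (illustrative, not the
# published architecture): an analysis (encoder) part maps the input image to a
# latent code, the desired output view is appended, and an up-convolutional
# (decoder) part renders the new image plus a depth map.
import torch
import torch.nn as nn

class ViewSynthesisNet(nn.Module):
    def __init__(self, view_dim=3, latent_dim=512):
        super().__init__()
        self.encoder = nn.Sequential(                        # analysis part
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, latent_dim), nn.ReLU(),
        )
        self.fuse = nn.Linear(latent_dim + view_dim, 128 * 8 * 8)
        self.decoder = nn.Sequential(                        # up-convolutional part
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 4, 4, stride=2, padding=1),  # RGB + depth
        )

    def forward(self, image, target_view):
        code = self.encoder(image)                            # latent 3D-ish representation
        z = self.fuse(torch.cat([code, target_view], dim=1))  # append desired view
        z = z.view(-1, 128, 8, 8)
        out = self.decoder(z)
        rgb, depth = out[:, :3], out[:, 3:]                   # new image and depth map
        return rgb, depth

# Example: one 64x64 input image and a 3-D view encoding (e.g. azimuth/elevation/radius).
rgb, depth = ViewSynthesisNet()(torch.randn(1, 3, 64, 64), torch.randn(1, 3))
```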

  4. Single-view to multi-view [Results on synthetic images and on real images] Thomas Brox 4

  5. Multi-view looks like 3D Thomas Brox 5

  6. Reconstructing explicit 3D models Thomas Brox 6

  7. Multiview morphing Thomas Brox 7

  8. Other interesting work: Yang et al. NIPS 2015 (recurrent network that incrementally rotates the object), Kar et al. CVPR 2015, Choy et al. 2016 [Comparison figure: input, ground truth, Choy, Kar, and ours] Thomas Brox 8

  9. Outline 3D shape and texture from a single image FlowNet: end-to-end optical flow DispNet: end-to-end disparities DeMoN: end-to-end structure from motion Thomas Brox 9

  10. FlowNet: estimating optical flow with a ConvNet • Can networks learn to find correspondences? • New learning task! (very different from classification, etc.) Dosovitskiy et al. ICCV 2015 Thomas Brox 10

  11. Can networks learn to find correspondences? → Help the network with an explicit correlation layer Dosovitskiy et al. ICCV 2015 Thomas Brox 11
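For reference, the operation such a correlation layer makes explicit is a dense feature comparison between the two images. A naive sketch, assuming PyTorch; the published FlowNetCorr layer adds striding and patch comparison, so this loop version only shows the idea.

```python
# Naive sketch of a correlation layer: for every spatial position, compare a
# feature vector from image 1 with feature vectors from image 2 at all
# displacements within +/- max_disp.
import torch
import torch.nn.functional as F

def correlation(feat1, feat2, max_disp=4):
    """feat1, feat2: (B, C, H, W) feature maps. Returns (B, (2*max_disp+1)**2, H, W)."""
    B, C, H, W = feat1.shape
    feat2 = F.pad(feat2, [max_disp] * 4)          # pad left/right/top/bottom with zeros
    out = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            shifted = feat2[:, :, dy:dy + H, dx:dx + W]
            out.append((feat1 * shifted).mean(dim=1, keepdim=True))  # normalized dot product
    return torch.cat(out, dim=1)

# Example: correlating two 64x64 feature maps with 96 channels.
cost = correlation(torch.randn(2, 96, 64, 64), torch.randn(2, 96, 64, 64))
print(cost.shape)  # torch.Size([2, 81, 64, 64])
```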

  12. Enough data to train such a network? • Getting ground truth optical flow for realistic videos is hard • Existing datasets are small (frames with ground truth): Middlebury 8, KITTI 194, Sintel 1041; needed: >10000 Thomas Brox 12

  13. Realism is overrated: the “flying chairs” dataset [Example: image pair and its ground-truth optical flow] Thomas Brox 13
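Synthetic data works here because the ground-truth flow of any known 2D transform is available analytically. A toy sketch of that core trick, assuming NumPy and OpenCV; the real Flying Chairs pipeline composites rendered chairs onto background photos, each layer with its own random affine motion, which this snippet does not reproduce.

```python
# Toy sketch of the core trick behind synthetic flow data: if the second image is
# produced from the first by a known 2x3 affine transform, the ground-truth
# optical flow at every pixel is known analytically.
import numpy as np
import cv2

def affine_pair(img, angle=5.0, scale=1.02, shift=(3.0, -2.0)):
    h, w = img.shape[:2]
    A = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)  # 2x3 affine matrix
    A[:, 2] += shift                                           # add a translation
    img2 = cv2.warpAffine(img, A, (w, h))

    # Ground-truth flow: displacement of every pixel under the affine map.
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).astype(np.float32)  # (H, W, 3)
    warped = pts @ A.T                                                      # (H, W, 2)
    flow = warped - pts[..., :2]
    return img2, flow

img = (np.random.rand(128, 128, 3) * 255).astype(np.uint8)
img2, flow = affine_pair(img)
print(flow.shape)  # (128, 128, 2)
```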

  14. Synthetic 3D datasets Mayer et al. CVPR 2016 Driving, Monkaa, FlyingThings3D datasets publicly available Thomas Brox 14

  15. Generalization: it works! [Figure: input images, ground truth, FlowNetSimple, FlowNetCorr] Although the network has only seen flying chairs during training, it predicts good optical flow on other data Thomas Brox 15

  16. Optical flow estimation in 18ms Thomas Brox 16

  17. FlowNet 2.0 Eddy Ilg et al., arXiv 2016 Major changes: • Improved data and training schedules • Stacking of networks with motion compensation • A special network for small displacements and a fusion network Thomas Brox 17
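A minimal sketch of the stacking-with-motion-compensation idea, assuming PyTorch; `net1` and `net2` stand in for trained FlowNet-style modules, and the exact inputs of the second network and whether it predicts a full flow or a residual are simplified here.

```python
# Minimal sketch of stacking with motion compensation: the second network also
# sees image 2 warped backward by the first network's flow, that flow itself,
# and the remaining brightness error.
import torch
import torch.nn.functional as F

def warp(img, flow):
    """Backward-warp img (B,3,H,W) with flow (B,2,H,W) given in pixels."""
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float().unsqueeze(0) + flow    # sampling coords
    # Normalize to [-1, 1] for grid_sample (x first, then y).
    grid_x = 2.0 * grid[:, 0] / (W - 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / (H - 1) - 1.0
    grid = torch.stack([grid_x, grid_y], dim=-1)                       # (B, H, W, 2)
    return F.grid_sample(img, grid, align_corners=True)

def stacked_flow(net1, net2, img1, img2):
    flow1 = net1(torch.cat([img1, img2], dim=1))
    img2_warped = warp(img2, flow1)
    err = (img1 - img2_warped).abs().sum(dim=1, keepdim=True)          # brightness error
    flow2 = net2(torch.cat([img1, img2, img2_warped, flow1, err], dim=1))
    return flow1 + flow2                                               # sketched as a residual update
```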

  18. FlowNet vs. FlowNet 2.0 Thomas Brox 18

  19. Numbers… (average end-point error in pixels, lower is better)
      Method                               Sintel   KITTI   Runtime
      DeepFlow (Weinzaepfel et al. 2013)    7.21     5.8    51940 ms
      FlowFields (Bailer et al. 2015)       5.81     3.5    22810 ms
      PCA Flow (Wulff & Black 2015)         8.65     6.2      140 ms
      FlowNet (Dosovitskiy et al. 2015)     7.52      -        18 ms
      FlowNet 2.0                           5.74     1.8      123 ms
      Thomas Brox 19
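The Sintel and KITTI columns are average end-point errors. A minimal sketch of that metric, assuming NumPy flow fields of shape (H, W, 2) in pixels:

```python
# Average end-point error (EPE): mean Euclidean distance between estimated and
# ground-truth flow vectors.
import numpy as np

def average_epe(flow_est, flow_gt, valid_mask=None):
    err = np.linalg.norm(flow_est - flow_gt, axis=-1)   # per-pixel end-point error
    if valid_mask is not None:                          # e.g. KITTI ground truth is sparse
        err = err[valid_mask]
    return float(err.mean())

# Example with random flows.
print(average_epe(np.random.randn(436, 1024, 2), np.random.randn(436, 1024, 2)))
```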

  20. DispNet: disparity estimation Mayer et al. CVPR 2016 Thomas Brox 20

  21. DispNet: disparity estimation Thomas Brox 21
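For a calibrated, rectified stereo pair, the disparity map DispNet outputs is equivalent to a depth map. A minimal sketch of the conversion, assuming NumPy; the calibration numbers in the example are only illustrative.

```python
# Depth from disparity for a rectified stereo pair: depth = focal_length * baseline / disparity.
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """disparity in pixels, focal length in pixels, baseline in meters -> depth in meters."""
    return focal_px * baseline_m / np.maximum(disparity, eps)

# Example with KITTI-like calibration values (illustrative numbers).
depth = disparity_to_depth(np.full((375, 1242), 50.0), focal_px=721.0, baseline_m=0.54)
print(depth[0, 0])  # ~7.8 m
```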

  22. Outline 3D shape and texture from a single image FlowNet: end-to-end optical flow DispNet: end-to-end disparities DeMoN: end-to-end structure from motion Thomas Brox 22

  23. DeMoN: Structure from motion with a network Benjamin Ummenhofer, Huizhong Zhou, arXiv 2016 Egomotion estimation and depth estimation are mutually dependent Thomas Brox 23
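The mutual dependence is visible in the reprojection equation: given a depth map and the relative camera pose, the optical flow of every pixel is determined, so errors in one quantity show up in the other. A minimal sketch, assuming NumPy, a pinhole camera with intrinsics K, and (R, t) mapping camera-1 coordinates to camera-2 coordinates.

```python
# Flow induced by depth and egomotion: back-project pixels with the depth map,
# move the 3D points with the relative pose, and project into the second camera.
import numpy as np

def flow_from_depth_and_pose(depth, K, R, t):
    H, W = depth.shape
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T   # 3 x N homogeneous pixels
    rays = np.linalg.inv(K) @ pix                                          # back-project
    pts = rays * depth.reshape(1, -1)                                      # 3D points in camera 1
    pts2 = R @ pts + t.reshape(3, 1)                                       # move to camera 2
    proj = K @ pts2
    proj = proj[:2] / proj[2]                                              # project into camera 2
    return (proj - pix[:2]).T.reshape(H, W, 2)

# Example: fronto-parallel plane at 5 m, translation of 0.1 m along x.
K = np.array([[500.0, 0, 160], [0, 500.0, 120], [0, 0, 1]])
flow = flow_from_depth_and_pose(np.full((240, 320), 5.0), K, np.eye(3), np.array([0.1, 0, 0]))
print(flow.mean(axis=(0, 1)))  # roughly [10, 0] px: uniform flow of f * tx / Z = 500 * 0.1 / 5
```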

  24. Straightforward idea: feed the image pair directly to the network → the network ignores the second image, so motion parallax is not learned Thomas Brox 24

  25. DeMoN architecture [Diagram: one part estimates optical flow, another part estimates depth and egomotion] Thomas Brox 25

  26. Iterative refinement [Figure: input images; ground-truth vs. estimated optical flow; ground-truth vs. estimated depth] Thomas Brox 26
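A minimal sketch of the refinement loop, with hypothetical placeholder networks rather than the published DeMoN blocks:

```python
# Minimal sketch of iterative refinement: a flow network and a depth/egomotion
# network repeatedly feed each other's estimates. `flow_net` and
# `depth_motion_net` are hypothetical placeholders, and the exact inputs of each
# block differ in the real architecture.
def iterative_refinement(flow_net, depth_motion_net, img1, img2, n_iters=3):
    depth, motion = None, None
    for _ in range(n_iters):
        flow = flow_net(img1, img2, prev_depth=depth, prev_motion=motion)
        depth, motion = depth_motion_net(img1, img2, flow)
    return depth, motion, flow
```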

  27. Outperforms two-frame SfM baselines Thomas Brox 27

  28. Two images generalize better than one image Thomas Brox 28

  29. Two images generalize better than one image Thomas Brox 29

  30. Structure from motion at 7fps Thomas Brox 30

  31. Estimated camera trajectory Example from RGB-D SLAM dataset (Sturm et al.) Red: DeMoN. Black: Ground truth. Thomas Brox 31
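A trajectory like the one shown here can be obtained by chaining the per-pair relative poses a two-frame network predicts. A minimal sketch, assuming NumPy, 4x4 homogeneous transforms, and the convention (an assumption for illustration) that each relative pose maps frame i+1 coordinates into frame i.

```python
# Chain per-pair relative poses into a camera trajectory of absolute poses.
import numpy as np

def chain_poses(relative_poses):
    trajectory = [np.eye(4)]                 # pose of frame 0 is the identity
    for T_rel in relative_poses:
        trajectory.append(trajectory[-1] @ T_rel)
    return trajectory

def make_pose(R, t):
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# Example: three identical forward steps of 0.1 m along z.
step = make_pose(np.eye(3), np.array([0.0, 0.0, 0.1]))
traj = chain_poses([step, step, step])
print(traj[-1][:3, 3])  # camera position of the last frame: [0, 0, 0.3]
```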

  32. Deep learning for 3D Vision is promising 3D shape and texture from a single image FlowNet: end-to-end optical flow DispNet: end-to-end disparities DeMoN: end-to-end structure from motion Thomas Brox 32
