
Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation



  1. Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation. Anurag Ranjan, Perceiving Systems, Max Planck Institute for Intelligent Systems

  2. Varun Jampani, Lukas Balles, Deqing Sun, Kihwan Kim, Jonas Wulff, Michael Black

  3. Tübingen, Germany

  4. Outline: Motion and Optical Flow; Deep Learning with Structure; Geometry; Competitive Collaboration; Unsupervised Learning of Everything. Supervised, Unsupervised.

  5. Motion and Optical Flow

  6. Optical Flow: 2D velocity for all pixels between two frames of a video sequence. $I(x, y, t-1) = I(x+u, y+v, t)$

  7. Why do we need Optical Flow? Applications: SLAM, Action Recognition, Super-resolution, Video Compression, Slomo, VFX, Unsupervised Segmentation, Motion Magnification. Credits: Unsupervised Segmentation: Mahendran et al.; VFX: Black et al.; Motion Magnification: Liu et al.; Action Recognition: Simonyan et al.

  8. Optical Flow: 2D velocity for all pixels between two frames of a video sequence. $I(x, y, t-1) = I(x+u, y+v, t)$

  9. Estimating Optical Flow: $I(x, y, t-1) = I(x+u, y+v, t)$; $\min_{u,v} \lVert I(x, y, t-1) - I(x+u, y+v, t) \rVert$; Photometric Loss: $\min_{u,v} \rho(I(t-1) - \mathrm{warp}(I(t), u, v))$
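The photometric loss on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the talk's implementation: the robust penalty $\rho$ is assumed here to be a Charbonnier function, sampling is bilinear with border clamping, and the names `bilinear`, `warp`, `charbonnier` and `photometric_loss` are hypothetical. Images are lists of rows so that no external library is needed.

```python
import math

def bilinear(img, x, y):
    """Sample img (a list of rows) at real-valued (x, y), clamping to the border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom

def warp(img, u, v):
    """Backward warp: output(x, y) = img(x + u(x, y), y + v(x, y))."""
    h, w = len(img), len(img[0])
    return [[bilinear(img, x + u[y][x], y + v[y][x]) for x in range(w)]
            for y in range(h)]

def charbonnier(r, eps=1e-3):
    """Robust penalty rho. Charbonnier is an assumed choice, not taken from the slides."""
    return math.sqrt(r * r + eps * eps)

def photometric_loss(img_prev, img_next, u, v):
    """Mean of rho(I(t-1) - warp(I(t), u, v)) over all pixels."""
    warped = warp(img_next, u, v)
    h, w = len(img_prev), len(img_prev[0])
    total = sum(charbonnier(img_prev[y][x] - warped[y][x])
                for y in range(h) for x in range(w))
    return total / (h * w)

# Synthetic pair: frame_prev is frame_t shifted by the flow (u, v) = (1, 0).
frame_t = [[float(x * x) for x in range(6)] for _ in range(4)]
ones = [[1.0] * 6 for _ in range(4)]
zeros = [[0.0] * 6 for _ in range(4)]
frame_prev = warp(frame_t, ones, zeros)

loss_true = photometric_loss(frame_prev, frame_t, ones, zeros)   # correct flow
loss_zero = photometric_loss(frame_prev, frame_t, zeros, zeros)  # zero flow
```

With the correct flow the residual vanishes and the loss drops to the penalty's floor `eps`, while the zero-flow hypothesis is penalised. Minimising this quantity over `u` and `v` is what the slide's `min` expresses.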

  10. Photometric Loss: $\min_{u,v} \rho(I(t-1) - \mathrm{warp}(I(t), u, v))$

  11. No prior on structure

  12. Can we learn from data?

  13. Optical Flow Estimation: $\in \mathbb{R}^{n \times n}$. Dosovitskiy et al. 2015

  14. FlowNet. Dosovitskiy et al. 2015

  15. Problem: FlowNet is too big. 33M parameters. Needs to learn both large and small motions. Does not perform well.

  16. Approach: Image statistics are scale invariant. Use an image pyramid. Train a small network for each pyramid level. Compute residual flow at each level. Network captures small displacements. Pyramid captures large displacements. Reference: Burt and Adelson. The Laplacian pyramid as a compact image code. IEEE COM, 1983.
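The coarse-to-fine idea above can be sketched as follows. This is a hedged illustration in plain Python, not SPyNet itself: the per-level network is replaced by a hypothetical callable `estimate_residual`, downsampling is a 2x2 average, warping is nearest-neighbour, and all function names are assumptions. Image sides must be divisible by `2 ** (levels - 1)`.

```python
def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def upsample_flow(flow):
    """Double the resolution (nearest neighbour) and double the flow vectors."""
    out = []
    for row in flow:
        wide = [(2.0 * fx, 2.0 * fy) for (fx, fy) in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def warp(img, flow):
    """Nearest-neighbour backward warp with border clamping."""
    h, w = len(img), len(img[0])
    def at(x, y):
        yi = min(max(int(round(y)), 0), h - 1)
        xi = min(max(int(round(x)), 0), w - 1)
        return img[yi][xi]
    return [[at(x + flow[y][x][0], y + flow[y][x][1]) for x in range(w)]
            for y in range(h)]

def coarse_to_fine(img1, img2, levels, estimate_residual):
    """Accumulate per-level residual flows from the coarsest level to the finest."""
    pyr1, pyr2 = [img1], [img2]
    for _ in range(levels - 1):
        pyr1.append(downsample(pyr1[-1]))
        pyr2.append(downsample(pyr2[-1]))
    h, w = len(pyr1[-1]), len(pyr1[-1][0])
    flow = [[(0.0, 0.0)] * w for _ in range(h)]   # start from zero flow
    for k in reversed(range(levels)):             # index 0 is the finest level here
        if k != levels - 1:
            flow = upsample_flow(flow)
        warped = warp(pyr2[k], flow)
        residual = estimate_residual(k, pyr1[k], warped, flow)  # hypothetical network
        flow = [[(f[0] + r[0], f[1] + r[1]) for f, r in zip(frow, rrow)]
                for frow, rrow in zip(flow, residual)]
    return flow

def constant_residual(level, img1, warped2, flow):
    """Stand-in for a trained network: always predicts a 1-pixel residual in x."""
    return [[(1.0, 0.0)] * len(flow[0]) for _ in flow]

image = [[float(x + y) for x in range(8)] for y in range(8)]
flow = coarse_to_fine(image, image, 3, constant_residual)
```

With three levels, a residual of one pixel per level composes to (1 * 2 + 1) * 2 + 1 = 7 pixels at full resolution, which is the sense in which the network only has to capture small displacements while the pyramid captures large ones.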

  17. SPyNet: Spatial Pyramid Network for Optical Flow Estimation. Ranjan et al. Optical Flow estimation using a Spatial Pyramid Network. CVPR 2017.

  18. [Figure: network at one pyramid level. Input $I_1, I_2$; convolutional layers 32x7x7, 64x7x7, 32x7x7, 16x7x7, 2x7x7; output $v_k$.]

  19. $G_k$

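  20. [Figure: two-level SPyNet pyramid. Networks $G_0$, $G_1$ take downsampled ($d$) images $I^1$, $I^2$ and output residual flows $v_0$, $v_1$; the flows $V_0$, $V_1$ are upsampled ($u$) and used to warp ($w$) the second image.]

  21. [Figure: three-level SPyNet pyramid. Networks $G_0$, $G_1$, $G_2$ take downsampled ($d$) images $I^1$, $I^2$ and output residual flows $v_0$, $v_1$, $v_2$; the flows $V_0$, $V_1$, $V_2$ are upsampled ($u$) and used to warp ($w$) the second image.]

  22. [Figure: Spatial and Temporal panels for SPyNet and FlowNet.]

  23. [Figure: Frames, Ground Truth, FlowNetS, FlowNetC, SPyNet.]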

  24. [Plot: Average EPE on Sintel (Clean + Final) versus Number of Model Parameters (in Millions). Methods shown: Voxel2Voxel*, FlowNetC, FlowNetS, SPyNet. *Error metric not consistent with the benchmarks.]

  25. [Plot: Average EPE on Sintel (Clean + Final) versus Number of Model Parameters (in Millions). Methods shown: Voxel2Voxel* [2016], SPyNet [2017], FlowNetS [2015], FlowNetC [2015], PWC-Net [2018], FlowNet2 [2017]. *Error metric not consistent with the benchmarks.]
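For reference, the average end-point error (EPE) plotted on these slides is the mean Euclidean distance between predicted and ground-truth flow vectors. A minimal sketch, assuming flows are stored as rows of `(u, v)` tuples; the function name `average_epe` is hypothetical:

```python
import math

def average_epe(flow_pred, flow_gt):
    """Mean Euclidean distance between predicted and ground-truth flow vectors."""
    total, count = 0.0, 0
    for pred_row, gt_row in zip(flow_pred, flow_gt):
        for (pu, pv), (gu, gv) in zip(pred_row, gt_row):
            total += math.hypot(pu - gu, pv - gv)
            count += 1
    return total / count

predicted = [[(3.0, 4.0), (0.0, 0.0)]]
ground_truth = [[(0.0, 0.0), (0.0, 0.0)]]
epe = average_epe(predicted, ground_truth)   # one pixel is off by 5, the other by 0
```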

  26. Results by Distance from Motion Boundaries (d columns) and Average Displacement (s columns).

      Sintel Clean    d0-10   d10-60  d60-140  s0-10   s10-40  s40+
      SPyNet+ft       5.501   3.122   1.719    0.832   3.343   43.442
      FlowNetS+ft     5.992   3.561   2.193    1.424   3.815   40.098
      FlowNetC+ft     5.575   3.182   1.993    1.622   3.974   33.369

      Sintel Final    d0-10   d10-60  d60-140  s0-10   s10-40  s40+
      SPyNet+ft       6.694   4.368   3.290    1.395   5.534   49.707
      FlowNetS+ft     7.252   4.610   2.993    1.873   5.826   43.236
      FlowNetC+ft     7.190   4.619   3.298    2.305   6.169   40.779

  27. Problem: SPyNet [1]. [1] Ranjan et al. Optical Flow estimation using a Spatial Pyramid Network. CVPR 2017.

  28. Why humans? • Useful for recognition problems. Scenes contain human actions. • Two-stream architectures use fast classical optical flow methods. • Deep Networks have massive GPU memory requirements. Left Image: Delaitre et al. Recognizing human actions in still images. BMVC 2010. Right Image: Simonyan et al. Two-stream convolutional networks for action recognition in videos. NIPS 2014.

  29. Problem: Flying Chairs [1], MPI Sintel [2], KITTI [3]. No dataset for human optical flow for training neural networks. [1] Dosovitskiy et al. Flownet: Learning optical flow with convolutional networks. ICCV 2015. [2] Butler et al. A naturalistic open source movie for optical flow evaluation. ECCV 2012. [3] Geiger et al. Vision meets robotics: The KITTI dataset. International Journal of Robotics Research 32.11 (2013): 1231-1237.

  30. Idea: Create a new dataset for human optical flow. Use it to train an existing fast and compact optical flow method.

  31. Human Flow Dataset: Human Motion Capture data [1] + Human Body Model [2] + Realistic Environment [3] + Cloth texture, Lighting, Noise, Motion Blur, Camera Blur. Blender: Simulate and Extract Motion Vectors. [1] Ionescu et al. Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. IEEE PAMI 2014. [2] Loper et al. MoSh: Motion and Shape Capture from Sparse Markers. SIGGRAPH Asia 2014. [3] Yu et al. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015).

  32. Human Flow Dataset

  33. SPyNet. [Figure: three-level SPyNet pyramid with networks $G_0$, $G_1$, $G_2$, residual flows $v_0$, $v_1$, $v_2$, flows $V_0$, $V_1$, $V_2$, downsampling $d$, upsampling $u$ and warping $w$.] Ranjan et al. Optical Flow estimation using a Spatial Pyramid Network. CVPR 2017.

  34. [Plot: Evaluation of Optical Flow Networks. Average EPE on the Human Flow Dataset versus Inference Time (s). Methods shown: SPyNet, PWC-Net, SPyNet+HF, PWC-Net+HF.]

  35. [Plot: Evaluation of Optical Flow Networks. Average EPE on the Human Flow Dataset versus Inference Time (s). Methods shown: FlowNetS, PCA Flow, SPyNet, Epic Flow, LDOF, PWC-Net, SPyNet+HF, FlowNet2, PWC-Net+HF, Flow Fields.]

  36. Visuals – Video: Ground Truth, Human Flow, SPyNet

  37. Visuals – Video: Ground Truth, Human Flow, SPyNet

  38. Visuals – Video: Ground Truth, Human Flow, SPyNet

  39. Visuals – Video: Human Flow, SPyNet

  40. Visuals – Video: Human Flow, SPyNet

  41. Human Flow may not work on other parts of the scene.

  42. Introduction to Scene Geometry

  43. Motion of a Static Scene. For static scenes: Depth + Camera Motion = Optical Flow

  44. Multi-view Geometry. Pinhole Camera Matrix $K$, focal length $f$, depth $d$: $x_1 = K X$, $x_2 = K [R, t] X$, $X = \frac{d}{f} x_1$. Brightness constancy: $\lVert I_1(x_1) - I_2(x_2) \rVert = 0$. Photometric Loss: $\min_{R,t,d} \rho(I_1 - \mathrm{warp}(I_2, R, t, d))$
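The statement that depth plus camera motion gives optical flow for a static scene can be made concrete with a small sketch. It assumes a pinhole camera with square pixels, focal length `f` and principal point `(cx, cy)`, and the convention that `R` and `t` map points from the first camera's frame into the second's. The function name `rigid_flow` and all parameter names are hypothetical.

```python
def rigid_flow(x, y, depth, f, cx, cy, R, t):
    """Flow of pixel (x, y) induced by camera motion (R, t) for a static point."""
    # Back-project the pixel to a 3D point X in the first camera's frame.
    X = [(x - cx) * depth / f, (y - cy) * depth / f, depth]
    # Move the point into the second camera's frame: X2 = R X + t.
    X2 = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Project back to pixel coordinates in the second image.
    x2 = f * X2[0] / X2[2] + cx
    y2 = f * X2[1] / X2[2] + cy
    return (x2 - x, y2 - y)

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sideways = [0.5, 0.0, 0.0]   # pure sideways translation of the scene points

near = rigid_flow(320.0, 240.0, 10.0, 100.0, 320.0, 240.0, identity, sideways)
far = rigid_flow(320.0, 240.0, 50.0, 100.0, 320.0, 240.0, identity, sideways)
```

Under the same camera motion the nearby point (depth 10) moves 5 pixels and the distant point (depth 50) moves 1 pixel. This is why a depth map and a camera pose together determine the flow of the static part of the scene.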

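  45. Static Scene and Moving Objects

  46. How to decompose a scene?

  47. Competitive Collaboration

  48. [Diagram: $R$ with data subset $\mathcal{D}_r$.]

  49. [Diagram: Competitors $R$ and $F$ with data subsets $\mathcal{D}_r$ and $\mathcal{D}_f$ of $\mathcal{D}$.]

  50. Competition. [Diagram: Competitors $R$ and $F$ with data subsets $\mathcal{D}_r$ and $\mathcal{D}_f$; Moderator $M$.]

  51. Collaboration. [Diagram: Competitors $R$ and $F$ with data subsets $\mathcal{D}_r$ and $\mathcal{D}_f$; Moderator $M$; two elements carry a $*$ mark.]

  52. Mixed Domain Learning: $A$, $B$, $M$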

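  53. Competition Loss: $E_{com} = m \cdot H(A(\cdot), 5) + (1 - m) \cdot H(B(\cdot), 5)$, where the argument of $A$ and $B$ is the input image shown on the slide and 5 is its label.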

  54. Collaboration Loss: $E_{col} = E_{com} + \begin{cases} -\log(M(y) + \epsilon) & \text{if } E_A < E_B \\ -\log(1 - M(y) + \epsilon) & \text{if } E_A \ge E_B \end{cases}$, where $E_A = H(A(\cdot), 5)$
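The competition and collaboration losses above can be sketched for a single sample as follows. This is an illustration under stated assumptions, not the authors' code: `m` is taken to be the moderator's output in (0, 1) for the sample, $H$ is taken to be cross-entropy against the true label, and the function names are hypothetical.

```python
import math

def cross_entropy(probs, label):
    """H(p, label): negative log-probability assigned to the true label."""
    return -math.log(probs[label])

def competition_loss(m, probs_a, probs_b, label):
    """E_com: the moderator weight m splits the sample between Alice and Bob."""
    return (m * cross_entropy(probs_a, label)
            + (1.0 - m) * cross_entropy(probs_b, label))

def collaboration_loss(m, probs_a, probs_b, label, eps=1e-8):
    """E_col: E_com plus a term pushing m toward whichever competitor has lower loss."""
    e_a = cross_entropy(probs_a, label)
    e_b = cross_entropy(probs_b, label)
    e_com = m * e_a + (1.0 - m) * e_b
    if e_a < e_b:
        return e_com - math.log(m + eps)        # Alice is better: reward m near 1
    return e_com - math.log(1.0 - m + eps)      # Bob is better: reward m near 0

alice = [0.1, 0.9]   # Alice is confident in the correct class (label 1)
bob = [0.6, 0.4]     # Bob prefers the wrong class

assigned_to_alice = collaboration_loss(0.9, alice, bob, 1)
assigned_to_bob = collaboration_loss(0.1, alice, bob, 1)
```

Because Alice has the lower cross-entropy on this sample, the loss is smaller when the moderator assigns the sample to her (`m = 0.9`) than when it assigns it to Bob (`m = 0.1`).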

  55. $A$, $B$, $M$

  56. Accuracy

      Model           Training   MNIST Error   SVHN Error   MNIST+SVHN Error
      Alice           Basic      1.34          11.88        8.96
      Alice           CC         1.41          11.55        8.74
      Bob             CC         1.24          11.75        8.84
      Alice+Bob+Mod   CC         1.24          11.55        8.70
      Alice 3x        Basic      1.33          10.86        8.22

  57. Moderator Behavior

      Dataset   Alice    Bob
      MNIST     0 %      100 %
      SVHN      100 %    0 %

  58. Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation

  59. [Diagram: $R$: Monocular Depth Prediction $D$ and Camera Motion Estimation $C$. Zhou et al. CVPR 2017.]

  60. [Diagram: $R$: Monocular Depth Prediction $D$ and Camera Motion Estimation $C$ (Zhou et al. CVPR 2017). $F$: Optical Flow Estimation (Meister et al. AAAI '18, Janai et al. ECCV '18).]

  61. [Diagram: $R$: Monocular Depth Prediction $D$ and Camera Motion Estimation $C$, with data subset $\mathcal{D}_r$. $F$: Optical Flow Estimation, with data subset $\mathcal{D}_f$. $M$: Motion Segmentation.]

  62. Photometric Loss: $E_R = \rho(I, \mathrm{warp}(I_+, c, d)) \cdot m$. Photometric Loss: $E_F = \rho(I, \mathrm{warp}(I_+, u_+)) \cdot (1 - m)$. Loss: $E_C = H(\mathbb{I}_{\lVert u_R - u_F \rVert < \lambda_c}, m)$. Networks: Monocular Depth Prediction $D$, Optical Flow Estimation $F$, Camera Motion Estimation $C$, Motion Segmentation $M$.
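The three losses on this slide can be sketched per pixel as follows. This is a hedged illustration rather than the authors' implementation: it assumes the photometric residuals `rho_r` (static-scene warp) and `rho_f` (optical-flow warp) have already been computed, that pixels are flattened into lists, that $H$ is binary cross-entropy, and that losses are averaged over pixels. The name `masked_losses` and the default `lam_c` are hypothetical.

```python
import math

def masked_losses(rho_r, rho_f, flow_r, flow_f, mask, lam_c=1.0, eps=1e-8):
    """Return (E_R, E_F, E_C) averaged over pixels.

    rho_r, rho_f : per-pixel photometric residuals of the two reconstructions
    flow_r       : per-pixel flow (u, v) implied by depth and camera motion
    flow_f       : per-pixel flow (u, v) from the optical flow network
    mask         : per-pixel m in (0, 1); close to 1 means static scene
    """
    e_r = e_f = e_c = 0.0
    for pr, pf, ur, uf, m in zip(rho_r, rho_f, flow_r, flow_f, mask):
        e_r += pr * m                 # static-scene loss counts where m is high
        e_f += pf * (1.0 - m)         # flow loss counts where m is low
        # Indicator: 1 when the two flows agree to within lam_c pixels.
        agree = 1.0 if math.hypot(ur[0] - uf[0], ur[1] - uf[1]) < lam_c else 0.0
        e_c -= agree * math.log(m + eps) + (1.0 - agree) * math.log(1.0 - m + eps)
    n = len(mask)
    return e_r / n, e_f / n, e_c / n

# Pixel 0: static background (flows agree). Pixel 1: moving object (flows differ).
rho_r = [0.1, 2.0]
rho_f = [0.2, 0.3]
flow_r = [(1.0, 0.0), (1.0, 0.0)]
flow_f = [(1.2, 0.0), (6.0, 0.0)]

good = masked_losses(rho_r, rho_f, flow_r, flow_f, [0.9, 0.1])
flipped = masked_losses(rho_r, rho_f, flow_r, flow_f, [0.1, 0.9])
```

The third loss is low when the mask marks pixels as static exactly where the two flows agree, and high when the mask is flipped.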
