LEARNING RIGIDITY IN DYNAMIC SCENES FOR SCENE FLOW ESTIMATION
Kihwan Kim, Senior Research Scientist
Zhaoyang Lv, Kihwan Kim, Alejandro Troccoli, Deqing Sun, James M. Rehg, Jan Kautz
CORRESPONDENCES IN COMPUTER VISION (image courtesy of Roy Shilkrot)
OPTICAL FLOW (Fan et al. 2014; Brox and Malik 2011; Castro M. 2017)
OPTICAL FLOW AND 3D SCENE FLOW (Fan et al. 2014; Brox and Malik 2011; Castro M. 2017; Letouzey et al. 2011)
APPLICATIONS OF 3D MOTION: AR and telepresence; 3D reconstruction of dynamic scenes [DynamicFusion, R. Newcombe, CVPR 2015] [Holoportation, Microsoft 2016]
APPLICATIONS OF 3D MOTION: 3D scene understanding for autonomous driving; robotics interaction [KITTI Dataset, A. Geiger, PAMI 2014] [SE3-Net, A. Byravan, ICRA 2017]
2D OPTICAL FLOW VS 3D SCENE FLOW: Why is 3D motion estimation challenging?
STATIC SCENE - MOVING CAMERA [Figure: a static 3D point x_0 seen from two camera views, 0 and 1; remaining symbols lost in extraction]
STATIC SCENE - MOVING CAMERA [Figure: the same point x_0; its image displacement from view 0 to view 1 is labeled "Optical flow from camera motion"]
STATIC SCENE - MOVING CAMERA [Figure: as in the previous slide, with the added label "Structure (3D) from (camera) Motion"]
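The "optical flow from camera motion" in these slides can be illustrated with a minimal NumPy sketch. It assumes a pinhole camera with known intrinsics, a depth map for view 0, and a rigid transform from view 0 to view 1; the function name `ego_motion_flow` and all numeric values are hypothetical and do not come from the talk.

```python
import numpy as np

def ego_motion_flow(depth, K, R, t):
    """Flow induced purely by camera motion, assuming a static scene.

    depth: (H, W) depth map of view 0.
    K:     (3, 3) pinhole intrinsics (assumed known).
    R, t:  rotation (3, 3) and translation (3,) mapping view-0 camera
           coordinates to view-1 camera coordinates.
    Returns an (H, W, 2) array of pixel displacements from view 0 to view 1.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)      # homogeneous pixels
    x0 = (pix @ np.linalg.inv(K).T) * depth[..., None]     # back-project to 3D
    x0_in_view1 = x0 @ R.T + t                             # apply camera motion
    proj = x0_in_view1 @ K.T                               # project into view 1
    uv1 = proj[..., :2] / proj[..., 2:3]
    return uv1 - pix[..., :2]

# Made-up example: a fronto-parallel plane at depth 2 and a sideways camera shift.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 2.0],
              [0.0, 0.0, 1.0]])
flow = ego_motion_flow(np.full((4, 4), 2.0), K, np.eye(3), np.array([0.1, 0.0, 0.0]))
```

With a known pose, the flow at each pixel depends only on depth, which is why a static scene reduces to structure from motion.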
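DYNAMIC SCENE - FIXED CAMERA [Figure: a 3D point x_0 seen from a single fixed view; remaining symbols lost in extraction]
DYNAMIC SCENE - FIXED CAMERA [Figure: the point moves from x_0 to x_1; the 3D displacement (0→1) is labeled "Scene flow" and its image displacement is labeled "Projected scene flow" (in frame 1)]
DYNAMIC SCENE - FIXED CAMERA [Figure: the point moves from x_0 to x_1; the 3D displacement (0→1) is labeled "Scene flow"]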
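For the fixed-camera case, scene flow and its projection can be sketched in a few lines of NumPy. This is an illustration under assumed pinhole intrinsics; the helper `project` and the point coordinates are hypothetical.

```python
import numpy as np

def project(K, x):
    """Pinhole projection of 3D points (N, 3) to pixel coordinates (N, 2)."""
    p = x @ K.T
    return p[:, :2] / p[:, 2:3]

K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])      # hypothetical intrinsics
x0 = np.array([[0.0, 0.0, 2.0]])     # point at time 0, camera coordinates
x1 = np.array([[0.2, 0.0, 2.0]])     # the same point at time 1

scene_flow = x1 - x0                                      # 3D displacement
projected_scene_flow = project(K, x1) - project(K, x0)    # its 2D image
```

Because the camera does not move, the projected scene flow here is the whole optical flow for that point.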
COMMON VIDEOS NOWADAYS (Giphy.com: #gopro, #drone, Sondra.T.)
DYNAMIC SCENE - MOVING CAMERA [Figure: a 3D point x_0 seen from view 0; remaining symbols lost in extraction]
DYNAMIC SCENE - MOVING CAMERA [Figure: the point moves from x_0 to x_1 (scene flow 0→1) while the camera moves from view 0 to view 1]
DYNAMIC SCENE - MOVING CAMERA [Figure: three image displacements from 0 to 1, labeled "Optical flow", "Optical flow from camera motion", and "Projected scene flow" (in frame 1)]
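The relation between the three flows in this figure can be checked numerically for a single point. The sketch below assumes one common convention: the projected scene flow is the observed optical flow minus the flow that camera motion alone would induce. The intrinsics, pose, and displacement are made up for illustration.

```python
import numpy as np

def project(K, x):
    """Pinhole projection of one 3D point (3,) to a pixel (2,)."""
    p = K @ x
    return p[:2] / p[2]

K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])                  # hypothetical intrinsics
R, t = np.eye(3), np.array([0.1, 0.0, 0.0])      # hypothetical camera motion, view 0 to view 1
x0 = np.array([0.0, 0.0, 2.0])                   # point at time 0, view-0 coordinates
dx = np.array([0.2, 0.0, 0.0])                   # its scene flow, view-0 coordinates

u0 = project(K, x0)                              # pixel in frame 0
u_if_static = project(K, R @ x0 + t)             # pixel in frame 1 if the point had not moved
u1 = project(K, R @ (x0 + dx) + t)               # pixel in frame 1 as observed

optical_flow = u1 - u0
ego_motion_flow = u_if_static - u0
projected_scene_flow = optical_flow - ego_motion_flow
```

For a rigid (static) point, `dx` is zero and the residual vanishes, which is what makes a per-pixel rigidity estimate useful.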
DYNAMIC SCENE - MOVING CAMERA [Figure labels: input sequence; optical flow; camera pose (transform); RIGIDITY; camera ego-motion flow; (projected) scene flow, or 3D motion field]
HOW DO OTHER WORKS SOLVE THIS? Non-rigid or rigid local motions are treated as outliers (Menze and Geiger, CVPR 2015; Yang et al., ICRA 2011)
HOW DO OTHER FLOW ALGORITHMS SOLVE THIS? (Quiroga et al., ECCV 2014; Vogel et al., ICCV 2013; Wulff et al., CVPR 2017; Jaimez et al., 3DV 2015; Jaimez et al., ICRA 2017)
OUR PROPOSAL: Learn which parts of the scene are (likely) rigid or non-rigid
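PIPELINE [Figure: the input frames feed the Rigidity Transform Network (RTN), which outputs a camera pose and a rigidity mask, and a flow network (PWC-Net), which outputs optical flow; a refinement step produces a refined pose; warping in 3D gives the ego-motion flow; subtraction from the optical flow gives the estimated projected scene flow. Input and pose symbols lost in extraction]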
RIGIDITY TRANSFORM NETWORK (RTN) [Figure: conv1-6 layers feed a pose regressor; deconv1-5 layers output the rigidity attention mask; input symbols lost in extraction]
RIGIDITY TRANSFORM NETWORK (RTN) [Figure: conv1-6 with global average pooling and two heads, conv-T (translation) and conv-R (rotation), trained with a Huber loss; deconv1-5 output the rigidity attention mask, trained with a binary cross entropy loss]
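The two loss terms named on this slide can be written out in NumPy as a rough sketch. The slide gives only the loss names; the Huber threshold `delta`, the mask weight `w_mask`, the pose parameterization as plain vectors, and the function names are all assumptions made for illustration, not the training setup used in the talk.

```python
import numpy as np

def huber(residual, delta=1.0):
    """Elementwise Huber loss: quadratic near zero, linear in the tails."""
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))

def binary_cross_entropy(prob, target, eps=1e-7):
    """Mean binary cross entropy between predicted probabilities and 0/1 labels."""
    p = np.clip(prob, eps, 1.0 - eps)
    return -np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

def rtn_loss(pred_rot, gt_rot, pred_trans, gt_trans, pred_mask, gt_mask, w_mask=1.0):
    """Hypothetical combined objective: Huber on pose plus weighted BCE on the mask."""
    pose_term = huber(pred_rot - gt_rot).sum() + huber(pred_trans - gt_trans).sum()
    return pose_term + w_mask * binary_cross_entropy(pred_mask, gt_mask)
```

The Huber term keeps large pose errors from dominating the gradient, while the cross entropy term treats rigidity as a per-pixel binary classification.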
2D OPTICAL FLOW: PWC-Net, "CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume" (Sun et al., CVPR 2018)
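POSE REFINEMENT AND FLOW [Equation: the refined pose is the arg min of a sum over pixels in Ω of a residual between flow correspondences in frames 0 and 1, gated by two indicator terms (one tested against 1, one against 0) for the occlusion mask and the rigidity mask; remaining symbols lost in extraction] We solve this objective with GTSAM, an off-the-shelf solver, using Gauss-Newton.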
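The idea of fitting a pose only on rigid, non-occluded correspondences can be sketched with a simpler stand-in. The code below is not the Gauss-Newton/GTSAM refinement from the slide: it swaps in a closed-form least-squares alignment (the Kabsch/SVD method) on 3D point pairs, with no robust cost. The function name and the boolean mask conventions are assumptions.

```python
import numpy as np

def fit_rigid_transform(x0, x1, rigid, visible):
    """Least-squares R, t such that x1 ~ R @ x0 + t on the kept points.

    x0, x1:  (N, 3) corresponding 3D points in frames 0 and 1.
    rigid:   (N,) bool, True where the point is believed to be static.
    visible: (N,) bool, True where the correspondence is not occluded.
    """
    keep = rigid & visible
    a, b = x0[keep], x1[keep]
    ca, cb = a.mean(axis=0), b.mean(axis=0)
    H = (a - ca).T @ (b - cb)                       # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cb - R @ ca
    return R, t

# Synthetic check: 20 points, of which the first 5 move independently.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(20, 3))
c, s = np.cos(0.1), np.sin(0.1)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
x1 = x0 @ R_true.T + t_true
x1[:5] += 1.0                                       # non-rigid motion
rigid = np.ones(20, dtype=bool)
rigid[:5] = False
R_est, t_est = fit_rigid_transform(x0, x1, rigid, np.ones(20, dtype=bool))
```

Masking out the moving points is what lets the fit recover the camera motion; an iterative robust solver would additionally tolerate errors in the masks themselves.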
SUPERVISION NEEDED (example images: Scene-net, Monkaa, RGB-D SLAM benchmark, SINTEL, FlyingThings 3D)
RGB-D dataset  Lay-out number  Total images  Scenes   Pose (GT)  Optical flow (GT)  Segmentation (GT)  Photo realistic  Depth realistic
Scene-net      47              5.1M          static   Yes        Yes (from pose)    Yes                No               Yes
RGB-D SLAM     18              230K          static   Yes        No                 No                 Yes              Yes
SINTEL         23              1018          dynamic  Yes        Yes                Yes                No               Yes
FlyingThings   -               25K           dynamic  Yes        Yes                Yes                No               No
Monkaa         -               10K           dynamic  Yes        Yes                Yes                No               Yes
SEMI-SYNTHETIC DYNAMIC SCENE DATASET
REFRESH DATASET
SINTEL EVALUATION: trained on our data, tested on SINTEL data
SINTEL EVALUATION (POSE)
REAL WORLD DATA EVALUATION
CONCLUSION
Proposed a learning-based approach to estimate the rigid regions in dynamic scenes observed by a moving camera:
- Robust per-pixel "rigidity" of dynamic scenes
- Camera pose refined jointly with 2D optical flow and rigid/occlusion masks
- Novel semi-synthetic dynamic scene dataset, REFRESH
- Our method outperforms the state of the art on SINTEL
Future work
- End-to-end framework that learns rigidity as well as correspondences
- Richer content in dynamic scene data to encourage better generalization