Inferring 3D Cues from a Single Image, by Wei-Cheng Su


  1. Inferring 3D Cues from a Single Image, Wei-Cheng Su

  2. Motivation 2 ¤ Humans can easily estimate 3D information from a single image. But how about computers? ¤ Possible cues: defocus, texture, shading, perspective, object size…

  3. Outline 3 ¤ Inferring Spatial Layout from A Single Image via Depth-Ordered Grouping, by Stella X. Yu, Hao Zhang, and Jitendra Malik, Workshop on Perceptual Organization in Computer Vision, 2008 ¤ Depth Estimation using Monocular and Stereo Cues, by A. Saxena, J. Schulte, and A. Ng. IJCAI 2007 ¤ Comparison

  4. Inferring Spatial Layout from A Single Image via Depth-Ordered Grouping 4 [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]

  5. Goal 5 ¤ Infer 3D spatial layout from a single 2D image ¤ Based on grouping ¤ Focus on indoor scenes

  6. 6 Pipeline: Edges → Lines → Line groups → Quadrilaterals → Depth-ordered planes [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]

  7. Edges 7 ¤ The most time-consuming operation ¤ Canny edge detection ¤ About 5 seconds for a 400×400 image on a 2 GHz CPU
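The deck gives no code; as a rough illustration of the edge step, here is a minimal NumPy sketch of gradient-magnitude edge detection. It is a simplification, not the authors' implementation: full Canny additionally does Gaussian smoothing, non-maximum suppression, and hysteresis thresholding. The function name and threshold are illustrative.

```python
import numpy as np

def gradient_edges(img, thresh=0.2):
    """Simplified edge map: Sobel gradient magnitude + global threshold.
    (Real Canny adds smoothing, non-maximum suppression, hysteresis.)"""
    sx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    sy = sx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    # valid-mode 2-D correlation, written out explicitly for clarity
    for i in range(3):
        for j in range(3):
            patch = img[i:i + h - 2, j:j + w - 2]
            gx += sx[i, j] * patch
            gy += sy[i, j] * patch
    mag = np.hypot(gx, gy)
    return mag > thresh * mag.max()
```

On a synthetic vertical step edge, only the columns straddling the step fire.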

  8. Lines 8 ¤ Link edge pixels into line segments ¤ Short lines are ignored [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]

  9. Line Groups 9 [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]

  10. Line Groups 10 ¤ Estimate vanishing points (one for each of the three line clusters) [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]
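A standard way to carry out this step (assumed here; the slide does not say which estimator the authors use) is a least-squares intersection in homogeneous coordinates: each segment defines a line l = p1 × p2, and the vanishing point is the smallest right singular vector of the stacked line matrix.

```python
import numpy as np

def vanishing_point(segments):
    """Estimate the common intersection of a cluster of line segments.
    segments: list of ((x1, y1), (x2, y2)) endpoint pairs.
    Each segment gives a homogeneous line l = p1 x p2; the vanishing
    point v minimizes sum_i (l_i . v)^2 with ||v|| = 1, i.e. the
    smallest right singular vector of the stacked line matrix."""
    L = []
    for (x1, y1), (x2, y2) in segments:
        p1 = np.array([x1, y1, 1.0])
        p2 = np.array([x2, y2, 1.0])
        L.append(np.cross(p1, p2))
    _, _, vt = np.linalg.svd(np.array(L))
    v = vt[-1]
    return v[:2] / v[2]  # inhomogeneous; fails for points at infinity
```

Three segments whose supporting lines meet at one point recover that point exactly.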

  11. Line Groups 11 ¤ A∥: measures how likely two lines belong to the same group (attraction) ¤ R⊥: measures how likely two lines belong to different groups (repulsion) ¤ Pairwise attraction and repulsion in a graph-cuts framework [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]

  12. Quadrilaterals 12 ¤ Quadrilaterals are determined by adjacent lines and their vanishing points. [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]

  13. Depth Ordered Planes 13 ¤ Coplanarity: based on the degree of overlap, A□ ¤ Rectify before measuring [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]
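The rectification step can be sketched with a standard 4-point homography (direct linear transform) that maps a quadrilateral's corners onto a rectangle before overlap is measured. This is a generic sketch, not the paper's code; function names are hypothetical.

```python
import numpy as np

def homography_4pt(src, dst):
    """DLT homography mapping 4 source points to 4 destination points,
    e.g. an image quadrilateral onto the unit square (rectification).
    Builds the usual 8x9 system A h = 0 and takes the null vector."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.array(A, dtype=float))
    return vt[-1].reshape(3, 3)

def apply_h(H, pt):
    """Apply a homography to a 2-D point."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]
```

Mapping a skewed quadrilateral to the unit square sends each corner exactly to its target.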

  14. Depth Ordered Planes 14 ¤ Relative Depth [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]

  15. Depth Ordered Planes 15 ¤ The relative depth between two quadrilaterals is determined by the relative depth of their endpoints, R_d [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]

  16. Depth Ordered Planes 16 ¤ Pairwise attraction and directional repulsion in a graph-cuts framework ⁄ Attraction: A□ ⁄ Repulsion: R_d [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]

  17. 17 Pipeline recap: Edges → Lines → Line groups → Quadrilaterals → Depth-ordered planes [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]

  18. Results 18 [Yu, Zhang, and Malik, Workshop on Perceptual Organization in Computer Vision 2008]

  19. Outline 19 ¤ Inferring Spatial Layout from A Single Image via Depth-Ordered Grouping, by Stella X. Yu, Hao Zhang, and Jitendra Malik, Workshop on Perceptual Organization in Computer Vision, 2008 ¤ Depth Estimation using Monocular and Stereo Cues, by A. Saxena, J. Schulte, and A. Ng. IJCAI 2007 ¤ Comparison

  20. Depth Estimation using Monocular and Stereo Cues 20 ¤ Shortcomings of stereo vision ⁄ Fails for texture-less regions ⁄ Inaccurate when the distance is large ¤ Monocular cues ⁄ Texture variations and gradients ⁄ Defocus ⁄ Haze ¤ Stereo and monocular cues are complementary ⁄ Stereo: image differences ⁄ Monocular: image content; prior knowledge about the environment and global structure is required

  21. Goal 21 ¤ 3-D scanner to collect training data ⁄ Stereo pairs ⁄ Ground truth depthmaps ¤ Estimate posterior distribution of the depths given the monocular image features and the stereo disparities ⁄ P(depths| monocular features, stereo disparities)

  22. Visual Cues for Depth Estimation 22 ¤ Monocular Cues ¤ Stereo Cues

  23. Monocular Features 23 ¤ 17 filters are used: 9 Laws' masks, 6 oriented edge filters, and 2 color filters ⁄ Texture variation ⁄ Texture gradients ⁄ Color [Saxena, Schulte, and Ng, IJCAI 2007] ¤ The image is divided into rectangular patches; a single depth value is estimated for each patch
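The nine Laws' masks are commonly built (and this construction is assumed here, since the slide does not spell it out) as all outer products of the three 1-D level/edge/spot kernels:

```python
import numpy as np

# 1-D Laws kernels: Level (local average), Edge, Spot
L3 = np.array([1, 2, 1])
E3 = np.array([-1, 0, 1])
S3 = np.array([-1, 2, -1])

# the nine 3x3 Laws masks are all outer products of these kernels
masks = {a + b: np.outer(k1, k2)
         for a, k1 in (("L3", L3), ("E3", E3), ("S3", S3))
         for b, k2 in (("L3", L3), ("E3", E3), ("S3", S3))}
```

L3L3 is a pure averaging mask (intensity), while every mask involving E3 or S3 is zero-mean and responds only to texture structure.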

  24. Monocular Features 24 ¤ Absolute features ⁄ Sum-squared energy of each filter output over each patch ⁄ To capture global information, the 4 neighboring patches at 3 spatial scales are concatenated ⁄ Feature vector: (1+4)×3×17 = 255 dimensions ¤ Relative features ⁄ 10-bin histograms of the filter outputs of the pixels in one patch: 10×17 = 170 dimensions
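The dimension bookkeeping above works out as follows (a trivial check, with the counts taken from the slide):

```python
n_filters = 17                      # 9 Laws' masks + 6 edge filters + 2 color filters
scales = 3                          # spatial scales
patches_per_scale = 1 + 4           # the patch itself plus its 4 neighbors
hist_bins = 10                      # bins per filter for relative features

abs_dim = patches_per_scale * scales * n_filters   # absolute feature vector
rel_dim = hist_bins * n_filters                    # relative feature vector
```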

  25. Monocular Features 25 [Saxena, Schulte, and Ng, IJCAI 2007]

  26. Stereo Cues 26 ¤ Use the sum-of-absolute-differences correlation as the metric score to find correspondences ¤ Find disparity ¤ Calculate the depth
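The three bullets above can be sketched in a few lines of NumPy: brute-force SAD block matching along each scanline of a rectified pair, followed by triangulation. This is an illustrative sketch, not the paper's implementation; patch size, disparity range, and function names are assumptions.

```python
import numpy as np

def sad_disparity(left, right, patch=5, max_disp=16):
    """Per-pixel disparity by minimizing the sum of absolute differences
    between a left-image patch and shifted right-image patches on the
    same scanline (rectified pair assumed)."""
    h, w = left.shape
    r = patch // 2
    disp = np.zeros((h, w), dtype=int)
    for y in range(r, h - r):
        for x in range(r, w - r):
            ref = left[y - r:y + r + 1, x - r:x + r + 1]
            best, best_d = np.inf, 0
            for d in range(min(max_disp, x - r) + 1):
                cand = right[y - r:y + r + 1, x - d - r:x - d + r + 1]
                sad = np.abs(ref - cand).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp

def depth_from_disparity(disp, f, baseline):
    """Triangulation: depth = f * baseline / disparity (infinite where
    the disparity is zero)."""
    disp = np.asarray(disp, dtype=float)
    safe = np.where(disp > 0, disp, 1.0)
    return np.where(disp > 0, f * baseline / safe, np.inf)
```

On a synthetic pair where the right view is the left view shifted by 3 pixels, the interior disparities recover 3 exactly, and the depth falls off as the inverse of disparity.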

  27. Probabilistic Model 27 ¤ Markov Random Field model ¤ P(d|X), X: monocular features of the patch, stereo disparity, and depths of other parts of the image ¤ The model's terms relate: the depth and the stereo disparity; the depth and the features of patch i; a smoothness constraint between neighboring patches
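Reading off the three term labels that were on the slide, the Gaussian MRF presumably has the following shape (a hedged reconstruction; the paper's exact indexing over spatial scales and neighborhoods is omitted):

```latex
P(d \mid X;\, \theta, \sigma) \;=\; \frac{1}{Z}\,\exp\!\Bigg(
  -\sum_i \frac{(d_i - d_{i,\mathrm{stereo}})^2}{2\sigma_{i,\mathrm{stereo}}^2} % depth vs. stereo disparity
  \;-\; \sum_i \frac{(d_i - x_i^{\top}\theta_r)^2}{2\sigma_{1r}^2}             % depth vs. features of patch i
  \;-\; \sum_i \sum_{j \in N(i)} \frac{(d_i - d_j)^2}{2\sigma_{2rs}^2}         % smoothness constraint
\Bigg)
```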

  28. Learning 28 ¤ θ_r: maximize p(d|X; θ_r) over the training data, assuming all σ's are constant ¤ Model σ²_{2rs} as a linear function of patches i and j's relative depth features y_ijs: σ²_{2rs} = u_rs^T |y_ijs| ¤ Model σ²_{1r} as a linear function of x_i: σ²_{1r} = v_r^T x_i

  29. Laplacian Model 29 ¤ Empirically, the histogram of (d_i − d_j) is close to a Laplacian distribution ¤ The Laplacian is more robust to outliers ¤ A Gaussian cannot give depthmaps with sharp edges
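The robustness claim can be seen numerically: the Laplacian negative log-likelihood grows linearly in the residual while the Gaussian's grows quadratically, so a large depth discontinuity (or outlier) is penalized far less under the Laplacian and sharp edges survive. A small illustrative sketch, with scale constants dropped:

```python
import numpy as np

# depth differences d_i - d_j, from a tiny one to a sharp discontinuity
residuals = np.array([0.5, 1.0, 5.0, 20.0])

gauss_penalty = residuals ** 2 / 2.0    # Gaussian NLL, up to constants
laplace_penalty = np.abs(residuals)     # Laplacian NLL, up to constants

# the Gaussian penalty overtakes the Laplacian ever faster with |r|
ratio = gauss_penalty / laplace_penalty
```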

  30. Experiments 30 ¤ Laser scanner on a panning motor ⁄ 67×54 depthmaps ¤ Stereo cameras ⁄ 1024×768 ¤ 257 stereo pairs + depthmaps obtained ⁄ 75% used for training, 25% for testing ¤ Scenes ⁄ Natural environments ⁄ Man-made environments ⁄ Indoor environments [Saxena, Schulte, and Ng, IJCAI 2007]

  31. Experiments 31 ¤ Baseline ¤ Stereo ¤ Stereo(smooth, Lap) ¤ Mono(Gaussian) ¤ Mono(Lap) ¤ Stereo+Mono(Lap)

  32. Results 32 [Saxena, Schulte, and Ng, IJCAI 2007]

  33. Results 33 Image Ground truth stereo mono Stereo+mono [Saxena, Schulte, and Ng, IJCAI 2007]

  34. Results 34 Image Ground truth stereo mono Stereo+mono [Saxena, Schulte, and Ng, IJCAI 2007]

  35. Test Images from Internet 35 [http://ai.stanford.edu/~asaxena/learningdepth/others.html]

  36. Test Images from Internet 36 [http://ai.stanford.edu/~asaxena/learningdepth/others.html]

  37. Test Images from Internet 37 [http://ai.stanford.edu/~asaxena/learningdepth/others.html]

  38. Results 38 [Saxena, Schulte, and Ng, IJCAI 2007]

  39. Outline 39 ¤ Inferring Spatial Layout from A Single Image via Depth-Ordered Grouping, by Stella X. Yu, Hao Zhang, and Jitendra Malik, Workshop on Perceptual Organization in Computer Vision, 2008 ¤ Depth Estimation using Monocular and Stereo Cues, by A. Saxena, J. Schulte, and A. Ng. IJCAI 2007 ¤ Comparison

  40. Comparison 40 ¤ Depth-ordered grouping [Yu, Zhang, and Malik] ⁄ Geometrical; learning is not required ⁄ Can be used only for indoor scenes ⁄ Estimates the relative depth between planes ⁄ Objects should be rectangular or quadrilateral ¤ Depth estimation [Saxena, Schulte, and Ng] ⁄ Statistical; learning is required ⁄ May not generalize well to images very different from the training samples ⁄ Can be used for both indoor and unstructured outdoor environments ⁄ Estimates absolute depth

  41. Thank you
