A Whirlwind Tour of Where We Are in Computational Binocular Stereo Vision: a beginner's tutorial for the uninitiated
Toby Breckon, School of Engineering and Computing Sciences, Durham University
Slides: www.durham.ac.uk/toby.breckon/teaching/tutorials/vihm_wks_2015_breckon.pdf
Slide material acknowledgements (some portions): R. Szeliski (Microsoft/Washington), B. Fisher (Edinburgh), O. Hamilton (Cranfield/Durham), J. Xiao, N. Snavely, J. Hays, S. Prince
ViiHM Mini-Workshop 2015 Stereo Vision : 1
Setting the Scene ... Breckon: ViiHM 2015 Stereo Vision : 2
The core problem: stereo vision
● Binocular Stereo Vision (i.e. only 2 cameras)
– 3D scene information is implicitly encoded in the differences between the two images
– Representation: noisy RGB intensity images
[Figure: left image]
[Figure: right image]
Stereo Vision – the key principle
Image features (e.g. point / line / pixel) project to different positions in the left and right images, depending on their distance from the camera (or from the eyes in human vision).
This difference in image position is known as disparity, d = |P_L - P_R|, where P_L and P_R are the positions of the same feature in the left and right images.
Stereo Vision - principle
- Matching every feature between the left and right images results in a 2D ‘disparity map’ or ‘depth map’ (computed as disparity, d, at every feature position)
- Real-world 3D information (distances to scene objects) can be recovered from this depth map
Concept: depth recovery
Depth of scene object indicated by greyscale value [image source: http://vision.middlebury.edu/stereo/]
But why is this computationally challenging?
[Figure: left image]
[Figure: right image]
In reality, images are noisy due to {encoding, sampling, illumination, camera alignment, camera variations, temperature}, so features appear differently in each image and simple image matching (most) often fails.
This is what makes stereo vision challenging.
Today, almost all computational stereo research addresses the matching problem [to some degree, at some level].
Disparity vs. Depth
● Computer Vision people often refer to disparity estimation
– disparity is a 2D measure of feature displacement, d, between the images (measured in pixels)
● Biological Vision people often refer to depth perception
– depth is an axis of positional measurement of distance within the scene (measured in metres / mm / cm)
[Figure labels: P_L, P_R, d; Scene Depth Ordering; Relative scene depth, Z]
… essentially the same thing
Depth of a scene object, Z, observed to have disparity, d, between two stereo images separated by a baseline distance, B, captured by cameras with lenses of focal length, f.
.... if you have one you can calculate the other
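The interchangeability of depth and disparity can be sketched in a few lines. This is a minimal illustration, assuming an ideal rectified pinhole camera pair where Z = f B / d, with f expressed in pixels, B in metres and d in pixels. The function names and the example values for f and B are hypothetical and not taken from the slides.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z (metres) from disparity d (pixels), using Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means a point at infinity)")
    return focal_px * baseline_m / disparity_px


def disparity_from_depth(depth_m, focal_px, baseline_m):
    """Disparity d (pixels) from depth Z (metres), using d = f * B / Z."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


# Hypothetical camera parameters, chosen only for illustration.
FOCAL_PX = 700.0     # focal length expressed in pixels
BASELINE_M = 0.12    # distance between the two cameras in metres

z = depth_from_disparity(42.0, FOCAL_PX, BASELINE_M)
d = disparity_from_depth(z, FOCAL_PX, BASELINE_M)
print(f"depth = {z:.2f} m, disparity = {d:.1f} px")
```

Note the inverse relationship: nearby objects produce large disparities, and distant objects produce small ones, so depth resolution degrades with distance.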
Stereo: Standard Formulation
● left / right views at a known (calibrated) distance apart (baseline, B)
[Figure: Camera 1 (left eye, L) and Camera 2 (right eye, R), separated by baseline B]
Stereo Vision – disparity to depth
Point P (in the world) is projected into the left image plane (as P_L) and the right image plane (as P_R).
P = (X, Y, Z) (in the world)
P_L = (x_L, y_L) (in left image)
P_R = (x_R, y_R) (in right image)
[Figure: left and right image planes at focal length f, cameras L and R separated by baseline B, point P at depth Z]
Stereo Vision – disparity to depth
The re-projection of P_L from the left image plane into the right image plane allows us to recover disparity as a pixel distance within the image: disparity, d = |P_L - P_R|.
P = (X, Y, Z) (in the world)
P_L = (x_L, y_L) (in left image)
P_R = (x_R, y_R) (in right image)
[Figure: left and right image planes at focal length f, baseline B, point P at depth Z; P_L re-projected into the right image plane with d marked]
Stereo Vision – disparity to depth
● Images are captured under a perspective transform
– point (X, Y, Z) in the scene (depth Z)
– imaged at position (x, y) on the image plane
– determined by the focal length of the camera, f (lens to image plane distance)
– image inverted during capture (fixed inside camera)
● Thus in stereo, to recover the 3D position of P = (X, Y, Z):
– depth of a feature, Z, with disparity, d, over a stereo baseline, B: Z = f B / d
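Recovering a full 3D position from one matched pair of image points might look like the sketch below. It assumes a rectified pinhole pair, Z = f B / d, and the standard perspective relations X = x Z / f and Y = y Z / f, with image coordinates measured relative to a principal point (cx, cy). The principal point parameters and all example values are illustrative assumptions, not something the slides specify.

```python
def triangulate(x_l, y_l, x_r, focal_px, baseline_m, cx=0.0, cy=0.0):
    """Return (X, Y, Z) in metres, in the left camera's coordinate frame.

    x_l, y_l : pixel position of the feature in the left image
    x_r      : pixel column of the matched feature in the right image
    cx, cy   : assumed principal point (image centre) in pixels
    """
    d = abs(x_l - x_r)                  # disparity in pixels
    if d == 0:
        raise ValueError("zero disparity: the point is at infinity")
    z = focal_px * baseline_m / d       # depth from disparity
    x = (x_l - cx) * z / focal_px       # invert the perspective projection
    y = (y_l - cy) * z / focal_px
    return (x, y, z)


# Hypothetical match: same row, 70 pixels of disparity.
point = triangulate(x_l=350.0, y_l=140.0, x_r=280.0,
                    focal_px=700.0, baseline_m=0.1)
print("P =", point)
```

Only the x coordinates differ between the two views here because the pair is assumed rectified, so the match lies on the same image row.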
Computational Stereo – An Outline [How do we solve the matching problem?]
Stereo Vision - Overview
● Stereo camera setup: two cameras, relative positions known (calibration) [Figure: 2 stereo cameras viewing a calibration target, Lukins '05]
● Image Capture
● Feature Extraction: what can we see in each image?
● Feature Matching: can we match features between images?
● Triangulation: depth recovery from matched features
Sparse Image Features
● State of the Art: feature points
– high dimensional local feature descriptions (e.g. 128D+)
– considerable research effort: initial work [Harris, 1988], then intensive [Period: 2004 → 2010+]
– robust matching performance beyond the stereo case: considerably beyond (!), strongly invariant (via RANSAC)
● Feature points in a nutshell:
– pixels described by local gradient histograms
– normalized for maximal invariance
– discard pixel regions that are not locally unique
[SIFT – Lowe, 2004 / SURF – Bay et al., 2006]
Sparse Image Features
Harris Feature Points: example [Fisher / Breckon et al., 2014]
Sparse Image Features
● Underpins …
– 3D reconstruction from tourist photos: http://www.cs.cornell.edu/projects/p2f/
– Real-time image mosaicking [Breckon et al., 2010]
– Deformable object matching: http://www.cvc.uab.es/~jcrubio/
– Object instance detection [SURF, SIFT et al.]
… + object recognition and a whole lot more.
Readily gives us feature-based stereo (i.e. sparse depth), e.g.
● Match locally unique “corner” feature points (obtain disparity/depth at these points)
● Interpolate a complete 3D depth solution / object positions etc.
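A toy version of feature-based stereo can be sketched as follows. It assumes features have already been extracted as (x, y, descriptor) tuples, matches them by nearest descriptor distance, and rejects ambiguous matches with a nearest/second-nearest ratio test. The tiny 3D descriptors, the 0.8 ratio and all function names are illustrative assumptions; real descriptors such as SIFT are far higher dimensional.

```python
import math


def descriptor_distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def match_features(left, right, ratio=0.8):
    """Return (left_index, right_index) pairs that pass the ratio test."""
    matches = []
    for i, (_, _, desc_l) in enumerate(left):
        ranked = sorted((descriptor_distance(desc_l, desc_r), j)
                        for j, (_, _, desc_r) in enumerate(right))
        if not ranked:
            continue
        best_dist, best_j = ranked[0]
        # Reject if the second-best candidate is almost as good (ambiguous).
        if len(ranked) > 1 and best_dist > ratio * ranked[1][0]:
            continue
        matches.append((i, best_j))
    return matches


def sparse_disparities(left, right, matches):
    """Disparity d = |x_L - x_R| at each matched left feature position."""
    return [(left[i][0], left[i][1], abs(left[i][0] - right[j][0]))
            for i, j in matches]


# Hypothetical features: (x, y, descriptor).
left_feats = [(120, 40, [1.0, 0.0, 0.0]),
              (200, 90, [0.0, 1.0, 0.0]),
              (60, 150, [0.5, 0.5, 0.5])]   # equally close to both right features
right_feats = [(100, 40, [0.9, 0.1, 0.0]),
               (170, 90, [0.1, 0.9, 0.0])]

matches = match_features(left_feats, right_feats)
disparities = sparse_disparities(left_feats, right_feats, matches)
print(matches)
print(disparities)
```

The third left feature is dropped because it is not distinctive enough, which mirrors the idea of keeping only locally unique features. Depth is then available only at the surviving points and must be interpolated elsewhere.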
Example: sparse stereo for HCI [Features = red/green blobs] [source: anon]
Example: sparse stereo for stereo odometry [Features = feature points] https://www.youtube.com/watch?v=lTQGTbrNssQ
Reality … nobody really uses sparse stereo any more [apart from bespoke applications like those just illustrated]
.. the world went dense.
Dense Stereo Vision
● Concept: compute depth for each and every scene pixel
Key challenge: any pixel in the left image could now potentially match any pixel in the right image. This is a lot of matches to evaluate! → a large search space of matches is computationally expensive (and prone to mis-matching errors).
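To get a feel for the size of this search space, the following back-of-envelope sketch counts candidate pairings for a hypothetical 640 x 480 image pair. It compares the unconstrained case (every left pixel against every right pixel) with a search restricted to the same image row. The resolution and function names are assumptions made for illustration only.

```python
def brute_force_candidates(width, height):
    """Every left pixel compared against every right pixel."""
    pixels = width * height
    return pixels * pixels


def same_row_candidates(width, height):
    """Every left pixel compared only against pixels on the same row."""
    return width * height * width


W, H = 640, 480   # hypothetical image resolution
full = brute_force_candidates(W, H)
rows = same_row_candidates(W, H)
print(f"unconstrained: {full:,} candidate matches")
print(f"same-row only: {rows:,} candidate matches")
print(f"reduction factor: {full // rows}")
```

The reduction factor equals the image height, since each left pixel is compared against one row instead of all rows.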
Stereo Correspondence Problem
Q: For a given feature in the left image, what is the correct correspondence in the right image?
● Different pairings result in different 3D results
– inconsistent correspondence = inconsistent 3D (!)
– key problem in all stereo vision approaches
In computational stereo vision this is addressed via three aspects:
● Camera calibration – leading to epipolar geometry
● Match aggregation – matching regions not pixels
● Match optimization – compute many possible matches, then select the best subset that is maximally inter-consistent
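The benefit of match aggregation (matching regions rather than single pixels) can be illustrated with a small sketch. It uses the sum of absolute differences (SAD) over a square window as the matching cost, which is one common choice and is assumed here rather than taken from the slides. The tiny greyscale images are made up so that a single-pixel comparison is ambiguous while the window comparison is not.

```python
def sad_window(left, right, row, col_left, col_right, half=1):
    """Sum of absolute differences over a (2*half+1) x (2*half+1) window.

    left, right : images as lists of rows of grey values
    The caller must keep both windows inside the image bounds.
    """
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col_left + dx]
                         - right[row + dy][col_right + dx])
    return total


# Hypothetical 3 x 7 images; the right view is the left view shifted by one pixel.
left_img = [[10, 20, 50, 90, 50, 30, 10]] * 3
right_img = [[20, 50, 90, 50, 30, 10, 10]] * 3

row, col = 1, 4                    # left pixel with grey value 50
for candidate in (1, 3):           # both right pixels also have grey value 50
    pixel_cost = abs(left_img[row][col] - right_img[row][candidate])
    window_cost = sad_window(left_img, right_img, row, col, candidate)
    print(f"right col {candidate}: pixel cost {pixel_cost}, window cost {window_cost}")
```

Both candidates have a single-pixel cost of zero, but only the true match (column 3) also has a zero window cost, because its neighbourhood agrees as well.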
Epipolar Geometry – reduces matching space
● Feature p_l in the left image lies on a ray r in space
– r projects to an epipolar line e in the right image
– along which the matching feature p_r must lie
● If the images are “rectified”, then the epipolar line is the image row
– i.e. camera images both perfectly axis aligned
Epipolar Geometry – reduces matching space
● Constrains L → R correspondence
– reduces 2D search to 1D
– images linked by the fundamental matrix, F
– for matched points, p_l^T F p_r = 0
– F generally derived from a prior calibration routine (with a pre-known target)
– points are homogeneous, F is 3x3
● Match for point p_l on ray r (left) must lie on epipolar line e (right).
[Figure: left image plane and right image plane]
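The epipolar constraint can be checked numerically with a short sketch. It evaluates p_l^T F p_r for homogeneous points and computes the epipolar line in the right image as F^T p_l. The matrix used below is the textbook fundamental matrix of an ideal rectified pair (pure horizontal translation, defined up to scale); it is an assumption for illustration, whereas in practice F would come from calibration.

```python
def epipolar_residual(p_l, F, p_r):
    """Evaluate p_l^T F p_r; zero (or near zero) for a valid correspondence."""
    F_p_r = [sum(F[i][k] * p_r[k] for k in range(3)) for i in range(3)]
    return sum(p_l[i] * F_p_r[i] for i in range(3))


def epipolar_line_in_right(p_l, F):
    """Line (a, b, c) with a*x + b*y + c = 0 in the right image: F^T p_l."""
    return [sum(F[i][j] * p_l[i] for i in range(3)) for j in range(3)]


# Assumed F for an ideal rectified pair (up to scale): enforces y_l == y_r.
F_RECTIFIED = [[0, 0, 0],
               [0, 0, -1],
               [0, 1, 0]]

p_left = (120, 40, 1)          # homogeneous pixel coordinates (x, y, 1)
same_row = (100, 40, 1)        # candidate on the same row
other_row = (100, 43, 1)       # candidate three rows lower

print(epipolar_residual(p_left, F_RECTIFIED, same_row))
print(epipolar_residual(p_left, F_RECTIFIED, other_row))
print(epipolar_line_in_right(p_left, F_RECTIFIED))
```

For this F the line coefficients describe y = 40, i.e. the same image row as the left point, which is the 2D to 1D search reduction in its simplest form.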
Example: rectified images [Figure: original vs. rectified image pair]
● “rectified” images = the epipolar line is the image row
● rectification is performed via calibration
Thus stereo is reduced to a 1D “scan-line matching” problem.
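A minimal scan-line matcher for a single rectified row might look like the sketch below. For each left pixel it tries every disparity up to a limit, scores each candidate with a 1D sum of absolute differences over a small window, and keeps the lowest-cost disparity (a simple winner-takes-all choice). The cost function, window size, disparity limit and test row are all assumptions for illustration, not the method of any particular system.

```python
def scanline_disparities(left_row, right_row, max_disparity, half=1):
    """Winner-takes-all disparity per pixel along one rectified image row.

    Assumes a left pixel at column x appears at column x - d in the right row.
    Pixels whose window does not fit inside the row get None.
    """
    width = len(left_row)
    result = []
    for x in range(width):
        best_d, best_cost = None, None
        if half <= x < width - half:
            for d in range(max_disparity + 1):
                if x - d - half < 0:
                    break                      # right window would leave the image
                cost = sum(abs(left_row[x + k] - right_row[x - d + k])
                           for k in range(-half, half + 1))
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
        result.append(best_d)
    return result


# Hypothetical row: the right view is the left view shifted by two pixels.
left_row = [10, 20, 50, 90, 50, 30, 10, 10]
right_row = [50, 90, 50, 30, 10, 10, 10, 10]

disparity_row = scanline_disparities(left_row, right_row, max_disparity=3)
print(disparity_row)
```

Interior pixels recover the true shift of two pixels, while pixels near the left border are unreliable because their true match falls outside the right image. Running this over every row yields a dense disparity map, though practical systems add 2D windows and an optimization step on top.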