  1. From 2D to 3D: Monocular Vision With application to robotics/AR

  2. Motivation How many sensors do we really need?

  3. Motivation ● What is the limit of what can be inferred from a single embodied (moving) camera frame?

  4. Aim ● AR with a hand-held camera ● Visual Tracking provides registration ● Track without prior model of world ● Challenges ● Speed ● Accuracy ● Robustness ● Interaction with real world

  5. Existing attempts: SLAM ● Simultaneous Localization and Mapping ● Well-established in robotics (using a rich array of sensors) ● Demonstrated with a single hand-held camera by Davison 2003

  6. Model-based tracking vs SLAM

  7. Model-based tracking vs SLAM ● Model-based tracking is ● More robust ● More accurate ● Why? ● SLAM fundamentally harder?

  8. Pinhole camera model: $(X, Y, Z) \mapsto (fX/Z,\; fY/Z)$. In homogeneous coordinates: $\begin{pmatrix} fX \\ fY \\ Z \end{pmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$, i.e. $x = PX$.

  9. Pinhole camera model with principal point $(p_x, p_y)$: $(X, Y, Z) \mapsto (fX/Z + p_x,\; fY/Z + p_y)$. In homogeneous coordinates: $\begin{pmatrix} fX + Zp_x \\ fY + Zp_y \\ Z \end{pmatrix} = \begin{bmatrix} f & 0 & p_x & 0 \\ 0 & f & p_y & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$, so $P = K[I \mid 0]$ with calibration matrix $K = \begin{bmatrix} f & 0 & p_x \\ 0 & f & p_y \\ 0 & 0 & 1 \end{bmatrix}$.

  10. Camera rotation and translation ● In non-homogeneous coordinates: $\tilde{X}_{cam} = R(\tilde{X} - \tilde{C})$ ● In homogeneous coordinates: $X_{cam} = \begin{bmatrix} R & -R\tilde{C} \\ 0 & 1 \end{bmatrix} X$ ● Projection: $x = K[I \mid 0]\, X_{cam} = K[R \mid -R\tilde{C}]\, X$, so $P = K[R \mid t]$ with $t = -R\tilde{C}$ ● Note: C is the null space of the camera projection matrix ($PC = 0$)
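For concreteness, a minimal numpy sketch of this projection with $P = K[R \mid -R\tilde{C}]$; the focal length, principal point, and camera pose below are illustrative values, not taken from the slides:

```python
import numpy as np

def project(K, R, C, X_world):
    """Project a 3D world point with a pinhole camera P = K [R | -R C]."""
    t = -R @ C                                   # translation from camera centre C
    P = K @ np.hstack([R, t.reshape(3, 1)])      # 3x4 projection matrix
    X_h = np.append(X_world, 1.0)                # homogeneous coordinates
    x_h = P @ X_h
    return x_h[:2] / x_h[2]                      # perspective division

# Illustrative calibration: f = 500, principal point (320, 240)
K = np.array([[500, 0, 320],
              [0, 500, 240],
              [0,   0,   1]], dtype=float)
R = np.eye(3)                                    # camera aligned with world axes
C = np.array([0.0, 0.0, -2.0])                   # camera centre 2 units behind origin
print(project(K, R, C, np.array([0.5, 0.2, 3.0])))   # -> [370. 260.]
```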

  11. Triangulation • Given the projections $x_1, x_2$ of a 3D point in two or more images (with known camera matrices and camera centres $O_1, O_2$), find the coordinates of the point $X$.
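One standard way to do this is linear (DLT) triangulation: each observation contributes two homogeneous equations, and X is recovered as the null vector of the stacked system. A hedged numpy sketch, with P1, P2, x1, x2 as placeholder inputs:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation: each image point x = (u, v) and its 3x4 camera
    matrix P contribute two rows to A; X is the right null vector of A."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]                  # singular vector of the smallest singular value
    return X_h[:3] / X_h[3]       # de-homogenise
```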

  12. Structure from Motion (SfM) • Given: m images of n fixed 3D points, $x_{ij} = P_i X_j$, $i = 1, \ldots, m$, $j = 1, \ldots, n$ • Problem: estimate the m projection matrices $P_i$ and the n 3D points $X_j$ from the mn correspondences $x_{ij}$

  13. SfM ambiguity • If we scale the entire scene by some factor k and, at the same time, scale the camera matrices by the factor 1/k, the projections of the scene points in the image remain exactly the same: $x = PX = \left(\tfrac{1}{k} P\right)(kX)$ • It is impossible to recover the absolute scale of the scene!

  14. Structure from Motion (SfM) • Given: m images of n fixed 3D points, $x_{ij} = P_i X_j$, $i = 1, \ldots, m$, $j = 1, \ldots, n$ • Problem: estimate the m projection matrices $P_i$ and the n 3D points $X_j$ from the mn correspondences $x_{ij}$ • With no calibration info, cameras and points can only be recovered up to a 4x4 projective transformation Q: $X \to QX$, $P \to PQ^{-1}$ • We can solve for structure and motion when $2mn \ge 11m + 3n - 15$ (each correspondence gives 2 equations; each camera has 11 DOF and each point 3, minus the 15-DOF projective ambiguity) • For two cameras, at least 7 points are needed

  15. Bundle Adjustment • Non-linear method for refining structure and motion (Levenberg-Marquardt) • Minimizes the re-projection error $E(P, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D\!\left(x_{ij},\, P_i X_j\right)^2$
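A hedged sketch of this minimization using scipy's Levenberg-Marquardt solver. For brevity each camera is parameterized as a raw 3x4 matrix and the Jacobian sparsity, gauge fixing, and minimal rotation parameterization that real bundle adjusters use are omitted; the function names and data layout are my own, not from the slides:

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(params, m, n, observations):
    """observations: list of (i, j, u, v) image measurements x_ij."""
    P = params[:12 * m].reshape(m, 3, 4)      # cameras (raw 3x4 matrices here)
    X = params[12 * m:].reshape(n, 3)         # 3D points
    res = []
    for i, j, u, v in observations:
        x_h = P[i] @ np.append(X[j], 1.0)
        res.extend([x_h[0] / x_h[2] - u, x_h[1] / x_h[2] - v])
    return np.asarray(res)

def bundle_adjust(P0, X0, observations):
    """Refine initial cameras P0 and points X0 by minimizing re-projection error."""
    m, n = len(P0), len(X0)
    x0 = np.concatenate([np.asarray(P0, float).ravel(), np.asarray(X0, float).ravel()])
    sol = least_squares(reprojection_residuals, x0, args=(m, n, observations),
                        method="lm")          # Levenberg-Marquardt, as on the slide
    return sol.x[:12 * m].reshape(m, 3, 4), sol.x[12 * m:].reshape(n, 3)
```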

  16. Self-calibration • Self-calibration (auto-calibration) is the process of determining intrinsic camera parameters directly from uncalibrated images • For example, when the images are acquired by a single moving camera, we can use the constraint that the intrinsic parameter matrix remains fixed for all the images ● Compute initial projective reconstruction and find 3D projective transformation matrix Q such that all camera matrices are in the form P i = K [ R i | t i ] • Can use constraints on the form of the calibration matrix: zero skew

  17. Why is this cool? http://www.youtube.com/watch?v=sQegEro5Bfo

  18. Why is this still cool? http://www.youtube.com/watch?v=p16frKJLVi0

  19. The SLAM Problem • Simultaneous Localization And Mapping • A robot is exploring an unknown, static environment • Given: • The robot's controls • Observations of nearby features • Estimate: • Map of features • Path of the robot

  20. Structure of the landmark-based SLAM Problem

  21. SLAM: a hard problem? ● SLAM: robot path and map are both unknown ● Robot path error correlates errors in the map

  22. SLAM: a hard problem? ● Robot pose uncertainty • In the real world, the mapping between observations and landmarks is unknown • Picking wrong data associations can have catastrophic consequences • Pose error correlates data associations

  23. SLAM ● Full SLAM: estimates the entire path and map: $p(x_{1:t}, m \mid z_{1:t}, u_{1:t})$ ● Online SLAM: estimates the most recent pose and map: $p(x_t, m \mid z_{1:t}, u_{1:t}) = \int\!\!\int \cdots \int p(x_{1:t}, m \mid z_{1:t}, u_{1:t})\, dx_1\, dx_2 \cdots dx_{t-1}$ ● Integrations are typically done one at a time

  24. Graphical Model of Full SLAM: $p(x_{1:t}, m \mid z_{1:t}, u_{1:t})$

  25. Graphical Model of Online SLAM: $p(x_t, m \mid z_{1:t}, u_{1:t}) = \int\!\!\int \cdots \int p(x_{1:t}, m \mid z_{1:t}, u_{1:t})\, dx_1\, dx_2 \cdots dx_{t-1}$

  26. Scan Matching ● Maximize the likelihood of the t-th pose relative to the (t−1)-th pose and map: $\hat{x}_t = \arg\max_{x_t} \left\{ p(z_t \mid x_t, \hat{m}^{[t-1]}) \cdot p(x_t \mid u_t, \hat{x}_{t-1}) \right\}$, where $z_t$ is the current measurement, $\hat{m}^{[t-1]}$ is the map constructed so far, and $u_t, \hat{x}_{t-1}$ encode the robot motion ● Calculate the map according to "mapping with known poses" based on the poses and observations

  27. SLAM approach

  28. PTAM approach

  29. Tracking & Mapping threads

  30. Mapping thread

  31. Stereo Initialization ● 5-point pose algorithm (Stewenius et al. '06) ● Requires a pair of frames and feature correspondences ● Provides initial (sparse) 3D point cloud
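A hedged sketch of such a two-view initialization using OpenCV, whose essential-matrix estimator wraps a 5-point solver (Nister's variant rather than Stewenius') inside RANSAC; pts1, pts2 (Nx2 matched features) and the calibration matrix K are assumed inputs:

```python
import cv2
import numpy as np

def stereo_init(pts1, pts2, K):
    """Initial two-view reconstruction from matched features in a pair of frames."""
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])    # first camera at the origin
    P2 = K @ np.hstack([R, t])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous points
    return R, t, (X_h[:3] / X_h[3]).T                    # initial sparse point cloud
```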

  32. Wait for new keyframe ● Keyframes are only added if: ● There is a baseline to the other keyframes ● Tracking quality is good ● When a keyframe is added: ● The mapping thread stops whatever it is doing ● All points in the map are measured in the keyframe ● New map points are found and added to the map

  33. Add new map points ● Want as many map points as possible ● Check all maximal FAST corners in the keyframe: ● Check Shi-Tomasi score ● Check if already in map ● Epipolar search in a neighboring keyframe ● Triangulate matches and add to map ● Repeat in four image pyramid levels
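A hedged sketch of the corner-candidate step on a single pyramid level (the epipolar search and triangulation are omitted); the thresholds are illustrative and the Shi-Tomasi score is read from OpenCV's min-eigenvalue map:

```python
import cv2
import numpy as np

def candidate_corners(gray, shi_tomasi_thresh=0.02, fast_thresh=10):
    """FAST corners filtered by a Shi-Tomasi (minimum eigenvalue) score."""
    fast = cv2.FastFeatureDetector_create(threshold=fast_thresh,
                                          nonmaxSuppression=True)
    keypoints = fast.detect(gray, None)
    min_eig = cv2.cornerMinEigenVal(gray, blockSize=3)    # Shi-Tomasi score map
    return [kp.pt for kp in keypoints
            if min_eig[int(kp.pt[1]), int(kp.pt[0])] > shi_tomasi_thresh]
```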

  34. Optimize map ● Use batch SfM method: bundle adjustment ● Adjusts map point positions and keyframe poses ● Minimizes re-projection error of all points in all keyframes (or uses only the last N keyframes) ● Cubic complexity in the number of keyframes, linear in the number of map points ● Compatible with M-estimators (we use Tukey)

  35. Map maintenance ● When camera is not exploring, mapping thread has idle time – use this to improve the map ● Data association in bundle adjustment is reversible ● Re-attempt outlier measurements ● Try to measure new map features in all old keyframes

  36. Tracking thread

  37. Pre-process frame ● Make mono and RGB version of image ● Make 4 pyramid levels ● Detect FAST corners
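A hedged OpenCV sketch of this per-frame preprocessing, assuming a half-sampling pyramid and a fixed FAST threshold (both illustrative):

```python
import cv2

def preprocess(frame_bgr, levels=4, fast_thresh=10):
    """Build a 4-level half-sampling pyramid and detect FAST corners on each level."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)   # mono version of the frame
    fast = cv2.FastFeatureDetector_create(threshold=fast_thresh)
    pyramid, corners = [], []
    img = gray
    for _ in range(levels):
        pyramid.append(img)
        corners.append(fast.detect(img, None))
        img = cv2.pyrDown(img)                           # half-sample for the next level
    return pyramid, corners
```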

  38. Project Points ● Use motion model to update camera pose ● Project all map points into image to see which are visible, and at what pyramid level ● Choose subset to measure ● ~50 biggest features for coarse stage ● 1000 randomly selected for fine stage

  39. Measure Points ● Generate 8x8 matching template (warped from source keyframe) ● Search a fixed radius around projected position ● Use zero-mean SSD ● Only search at FAST corner points ● Up to 10 inverse composition iterations for subpixel position (for some patches) ● Typically find 60-70% of patches
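The matching score itself is simple; a minimal numpy sketch of zero-mean SSD between the 8x8 warped template and a candidate patch (subtracting each patch's mean makes the score insensitive to uniform brightness changes):

```python
import numpy as np

def zero_mean_ssd(template, patch):
    """Sum of squared differences after removing each patch's mean intensity."""
    t = template.astype(np.float32) - template.mean()
    p = patch.astype(np.float32) - patch.mean()
    return float(np.sum((t - p) ** 2))

# The FAST corner within the search radius with the lowest score is taken as the
# match, and accepted only if the score falls below a fixed threshold.
```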

  40. Update camera pose ● 6-DOF problem ● 10 iterations ● Tukey M-estimator minimizes a robust objective function of the re-projection errors, where $e_j$ is the re-projection error vector of map point j
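A hedged sketch of one iteratively reweighted Gauss-Newton step with Tukey weights. Computing the 2x6 Jacobians of the re-projection errors with respect to the SE(3) pose, applying the update via the exponential map, and estimating the cutoff c from the residual scale are all assumed to happen elsewhere:

```python
import numpy as np

def tukey_weight(e_norm, c):
    """Tukey biweight: down-weights large residuals, zero beyond the cutoff c."""
    w = np.zeros_like(e_norm)
    inside = e_norm < c
    w[inside] = (1.0 - (e_norm[inside] / c) ** 2) ** 2
    return w

def pose_update_step(residuals, jacobians, c):
    """One reweighted Gauss-Newton step for the 6-DOF pose.
    residuals: (N, 2) re-projection errors e_j; jacobians: (N, 2, 6) d e_j / d pose."""
    w = tukey_weight(np.linalg.norm(residuals, axis=1), c)
    H = np.zeros((6, 6))
    b = np.zeros(6)
    for e, J, wj in zip(residuals, jacobians, w):
        H += wj * J.T @ J
        b += wj * J.T @ e
    return -np.linalg.solve(H, b)    # 6-vector increment, applied to the pose via exp()
```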

  41. Bundle adjustment ● Global bundle adjustment ● Local bundle adjustment: ● X – the newest 5 keyframes in the keyframe chain ● Z – all of the map points visible in any of these keyframes ● Y – the keyframes for which a measurement of any point in Z has been made ● That is, local bundle adjustment optimizes the pose of the most recent keyframe and its closest neighbors, and all of the map points seen by these, using all of the measurements ever made of these points.

  42. Video http://www.youtube.com/watch?v=Y9HMn6bd-v8 http://www.youtube.com/watch?v=pBI5HwitBX4

  43. Capabilities

  44. Capabilities ● Bundle-adjusted point cloud with PTAM ● Multi-scale compactly supported basis functions

  45. Video http://www.youtube.com/watch?v=CZiSK7OMANw

  46. RGB-D Sensor ● Principle: structured light ● IR projector + IR camera ● RGB camera ● Dense depth images
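Given the dense depth image, each pixel can be back-projected into a 3D point in the camera frame. A minimal numpy sketch, assuming metric depth and pinhole intrinsics fx, fy, cx, cy (placeholder names):

```python
import numpy as np

def depth_to_points(depth_m, fx, fy, cx, cy):
    """Back-project a dense depth image (metres) to 3D points in the camera frame."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.dstack([x, y, z])      # HxWx3 point map; zero depth = no IR return
```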

  47. Kinect-based mapping

  48. System Overview ● Frame-to-frame alignment ● Global optimization (SBA for loop closure)
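For the frame-to-frame alignment step, a common building block is a least-squares rigid transform between matched 3D feature points from consecutive RGB-D frames. A hedged sketch using the Kabsch/SVD method; the RANSAC loop over feature matches and any ICP refinement are omitted:

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst (both Nx3)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)                   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # avoid reflections
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t
```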

  49. Feature matching
