Structure From Motion
EECS 442, David Fouhey
Fall 2019, University of Michigan
http://web.eecs.umich.edu/~fouhey/teaching/EECS442_F19/
Structure from Motion
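Structure from motion
Have: 2D points $p_{ij}$ seen in $m$ images.
Assume: the points are generated from $n$ fixed 3D points $X_j$ and cameras $M_i$, i.e. $p_{ij} \equiv M_i X_j$.
Want: cameras $M_i$ and points $X_j$.
(Remember) $M_i \equiv K_i [R_i, t_i]$ and $\lambda p_{ij} = M_i X_j$ with $\lambda \neq 0$.
Known: the 2D points $p_{ij}$. Unknown: the cameras $M_i$ and the 3D points $X_j$.
[Diagram: cameras $M_1$, $M_2$, $M_3$ observe the 3D point $X_j$ at image points $p_{1j}$, $p_{2j}$, $p_{3j}$.]
Diagram credit: S. Lazebnik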
Is SFM always uniquely solvable?
• Necker cube
Source: N. Snavely
Structure from motion ambiguities
Let's first find one easy ambiguity.
$p_{ij} \equiv M_i X_j$, where $p_{ij}$ is 3x1, $M_i$ is 3x4, and $X_j$ is 4x1.
Image: Zoolander, 2001
Structure from motion ambiguities
Let's first find one easy ambiguity.
$p_{ij} \equiv M_i X_j$
We can pick any arbitrary scaling factor $k$ and adjust the cameras and points to match:
$p_{ij} \equiv M_i k^{-1} \, k X_j$
(This can usually be fixed in practice: we just need one number, obtainable from the heights of known objects or from an IMU.)
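Structure from motion ambiguity
Does this diagram change meaning if I use this coordinate system (axes x, y, z at origin 0), versus this other coordinate system?
The coordinate system is irrelevant! So a global $R$, $t$ is also ambiguous.
[Diagram: cameras $M_1$, $M_2$, $M_3$ observe $X_j$ at $p_{1j}$, $p_{2j}$, $p_{3j}$, drawn with two different world coordinate frames.]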
Structure from motion ambiguities
The ambiguity is not limited to scale. Given $p_{ij} \equiv M_i X_j$, we can insert any invertible global transform $H$:
$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$
$H$ is a 3D homography (also called a perspective or projective transform).
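The ambiguity is easy to check numerically. The sketch below is only an illustration: the camera, the points, and the transform `H` are made-up values, and `project` is a hypothetical helper, not something from the lecture. It shows that `M @ inv(H)` applied to `H @ X` gives the same pixels as `M` applied to `X`, and that the scale ambiguity is the special case `H = diag(k, k, k, 1)`.

```python
import numpy as np

def project(M, X):
    """Project homogeneous 3D points X (4 x n) with camera M (3 x 4) to pixels (2 x n)."""
    p = M @ X
    return p[:2] / p[2]

rng = np.random.default_rng(0)

# Made-up camera looking down the z axis, 5 units from the origin.
M = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
# Made-up points in the cube [-1, 1]^3, in homogeneous coordinates.
X = np.vstack([rng.uniform(-1, 1, size=(3, 6)), np.ones((1, 6))])

# An arbitrary invertible 4x4 transform (a small perturbation of the identity).
H = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
M_alt = M @ np.linalg.inv(H)   # transformed camera
X_alt = H @ X                  # transformed points
same_projective = np.allclose(project(M, X), project(M_alt, X_alt))

# Scale ambiguity as a special case: scale the whole scene by k.
k = 3.0
H_scale = np.diag([k, k, k, 1.0])
same_scale = np.allclose(
    project(M, X), project(M @ np.linalg.inv(H_scale), H_scale @ X)
)
```

Both flags come out true: the images alone cannot tell the two reconstructions apart.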
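Similarity/Affine/Perspective
Given the house image, each class of transform preserves a bit more (2D matrix forms shown):
Perspective: preserves lines. $\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$
Affine: + parallelism. $\begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}$
Similarity: + angles. $\begin{bmatrix} sR & t \\ 0 & 1 \end{bmatrix}$
3D: same idea, different dimensions.
House image: A. Efros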
Projective ambiguity
With no constraints on the camera matrices or the scene, we can only reconstruct up to a perspective ambiguity $H$:
$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$
Slide credit: S. Lazebnik
Projective ambiguity Slide credit: S. Lazebnik
Affine ambiguity
If we have constraints in the form of which lines are parallel, we can reduce the ambiguity to an affine ambiguity:
$H = \begin{bmatrix} A & t \\ 0 & 1 \end{bmatrix}$ (affine)
$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$
Slide credit: S. Lazebnik
Affine ambiguity Slide credit: S. Lazebnik
Similarity ambiguity
If we have orthogonality constraints, we can reconstruct up to a similarity transform. This is really the best we can do, and it is what we get with calibrated cameras.
$H = \begin{bmatrix} sR & t \\ 0 & 1 \end{bmatrix}$
$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$
Slide credit: S. Lazebnik
Similarity ambiguity Slide credit: S. Lazebnik
Affine structure from motion
We'll do the math with affine / weak perspective cameras (the math is much easier).
[Images: perspective versus weak perspective.]
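Recall: orthographic projection
Orthographic camera: things are infinitely far away, but you have an amazing camera.
Projection from world to image is along the z direction, $(x, y, z) \to (x, y)$:
$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$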
Field of view and focal length
[Images: standard, wide-angle, and telephoto views.]
Slide credit: F. Durand
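Affine Camera
$M = \begin{bmatrix} A_{2D} & t_{2D} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} A_{3D} & t_{3D} \\ 0 & 1 \end{bmatrix}$
The factors are, left to right: a 3x3 matrix (affine 2D), a 3x4 matrix (orthographic projection), and a 4x4 matrix (affine 3D).
After some tedious math:
$M = \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}$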
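Affine Camera
So what? Who cares? Examine the projection:
$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \equiv \begin{bmatrix} a_{11} & a_{12} & a_{13} & b_1 \\ a_{21} & a_{22} & a_{23} & b_2 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$
Projection becomes a linear mapping plus a translation and doesn't involve homogeneous coordinates!
$\begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$
$b$ is the projection of the origin. Can anyone see why?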
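A quick numerical check of the two forms of the projection. The camera entries below are made-up numbers chosen only for illustration. Because the last row of `M` is `[0, 0, 0, 1]`, the homogeneous coordinate stays 1, so no division is needed, and plugging in the origin leaves only `b`.

```python
import numpy as np

# Made-up affine camera: the bottom row must be [0, 0, 0, 1].
M = np.array([[2.0, 0.5, 0.1, 10.0],
              [0.3, 1.5, 0.2, 20.0],
              [0.0, 0.0, 0.0, 1.0]])
A, b = M[:2, :3], M[:2, 3]      # 2x3 linear part and 2-vector translation

X = np.array([1.0, -2.0, 3.0])  # an arbitrary 3D point

p_homogeneous = M @ np.append(X, 1.0)   # 3-vector; last entry is exactly 1
p_affine = A @ X + b                    # linear map plus translation

# The origin X = 0 projects to b, since A @ 0 = 0.
origin_projection = A @ np.zeros(3) + b
```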
Affine structure from motion
General structure from motion: $p_{ij} \equiv M_i X_j$, with sizes 3x1, 3x4, 4x1.
Assume $M$ is an affine camera: $p_{ij} = A_i X_j + b_i$, with sizes 2x1 = (2x3)(3x1) + 2x1.
We have $mn$ 2D points, $m$ cameras, and $n$ 3D points, recoverable up to an arbitrary 3D affine transform (12 DOF).
Need: $2mn \ge 8m + 3n - 12$
(m = 2): $n \ge 4$ (and the same for all $m$!)
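Why the bound is the same for every $m$ can be seen by rearranging the inequality. This short derivation assumes $m \ge 2$, so that $2m - 3 > 0$ and dividing does not flip the sign:

```latex
\begin{aligned}
2mn &\ge 8m + 3n - 12 \\
2mn - 3n &\ge 8m - 12 \\
n\,(2m - 3) &\ge 4\,(2m - 3) \\
n &\ge 4 \qquad (m \ge 2)
\end{aligned}
```

The counts on the right-hand side are 8 unknowns per affine camera (six in $A_i$, two in $b_i$), 3 per point, minus the 12 DOF of the affine ambiguity.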
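One simplifying trick
Subtract off the average 2D point in each image. Starting from $p_{ij} = A_i X_j + b_i$:
$\hat{p}_{ij} = p_{ij} - \frac{1}{n} \sum_{k=1}^{n} p_{ik} = A_i X_j + b_i - \frac{1}{n} \sum_{k=1}^{n} (A_i X_k + b_i)$
Gather the terms involving $A_i$ and push out $b_i$:
$\hat{p}_{ij} = A_i \left( X_j - \frac{1}{n} \sum_{k=1}^{n} X_k \right) + b_i - \frac{1}{n} \sum_{k=1}^{n} b_i$
The $b_i$ terms cancel to 0. Set the origin to the mean of the 3D points, so that $\frac{1}{n} \sum_k X_k = 0$. Then we can do this entirely in terms of $A$!
$\hat{p}_{ij} = A_i X_j$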
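The centering trick can be verified on synthetic data. In this sketch the cameras `A`, offsets `b`, and points `X` are random made-up values; the check is that subtracting each image's mean 2D point removes `b` and leaves `A_i` applied to the mean-centered 3D points.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 8                             # made-up numbers of cameras and points

A = rng.standard_normal((m, 2, 3))      # one 2x3 matrix A_i per camera
b = rng.standard_normal((m, 2, 1))      # one 2-vector b_i per camera
X = rng.standard_normal((3, n))         # 3D points as columns

p = A @ X + b                           # all projections, shape (m, 2, n)

# Subtract the mean 2D point of each image (average over the point index).
p_hat = p - p.mean(axis=2, keepdims=True)

# Move the world origin to the mean 3D point.
X_hat = X - X.mean(axis=1, keepdims=True)

centered_ok = np.allclose(p_hat, A @ X_hat)   # b has dropped out
```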
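Affine structure from motion
First, make the data (measurement) matrix $D$, consisting of all the centered points stacked together:
$D = \begin{bmatrix} \hat{p}_{11} & \cdots & \hat{p}_{1n} \\ \vdots & \ddots & \vdots \\ \hat{p}_{m1} & \cdots & \hat{p}_{mn} \end{bmatrix} = \begin{bmatrix} \hat{u}_{11} & \cdots & \hat{u}_{1n} \\ \hat{v}_{11} & \cdots & \hat{v}_{1n} \\ \vdots & \ddots & \vdots \\ \hat{u}_{m1} & \cdots & \hat{u}_{mn} \\ \hat{v}_{m1} & \cdots & \hat{v}_{mn} \end{bmatrix}$
Rows run over the $m$ cameras; columns run over the $n$ points.
How big is this matrix?
C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.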
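Affine structure from motion
Then, write all the equations as one, in terms of a product of cameras and points:
$\begin{bmatrix} \hat{p}_{11} & \cdots & \hat{p}_{1n} \\ \vdots & \ddots & \vdots \\ \hat{p}_{m1} & \cdots & \hat{p}_{mn} \end{bmatrix} = \begin{bmatrix} A_1 \\ \vdots \\ A_m \end{bmatrix} \begin{bmatrix} X_1 & \cdots & X_n \end{bmatrix}$
That is, $D = MS$, where $D$ is 2m x n, $M$ is 2m x 3, and $S$ is 3 x n.
What's the rank of $D$? 3!
C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: A factorization method. IJCV, 9(2):137-154, November 1992.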
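Making Matrices Rank Deficient
Repeat of the epipolar geometry class, but important enough to see twice.
Given a matrix $M$, take its SVD $M = U \Sigma V^T$, with rotation matrices $U_{m \times m}$, $V_{n \times n}$ and a diagonal scaling matrix $\Sigma_{m \times n}$:
$\Sigma = \begin{bmatrix} \sigma_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_n \end{bmatrix}$
Keep only the $k$ biggest $\sigma$ and set the others to 0, giving $\hat{\Sigma}$.
Then $\hat{M} = U \hat{\Sigma} V^T$ minimizes $\| M - \hat{M} \|_F$ (sum of squares) subject to $\mathrm{rank}(\hat{M}) \le k$.
See the Eckart-Young-Mirsky theorem if you're interested.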
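A minimal sketch of the truncation with NumPy. `best_rank_k` is a hypothetical helper name and the test matrix is random. The leftover error should equal the root of the sum of the squared singular values that were thrown away, which is what the Eckart-Young-Mirsky theorem predicts.

```python
import numpy as np

def best_rank_k(M, k):
    """Return the best rank-k approximation of M (Frobenius norm) and all singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scaling the first k columns of U by s[:k] is the same as U_k @ diag(s_k).
    M_hat = (U[:, :k] * s[:k]) @ Vt[:k]
    return M_hat, s

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 5))          # arbitrary example matrix
M_hat, s = best_rank_k(M, 2)

error = np.linalg.norm(M - M_hat)        # Frobenius norm is the default for matrices
expected = np.sqrt(np.sum(s[2:] ** 2))   # energy in the discarded singular values
```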
Affine structure from motion
We'd like to take the measurements and convert them into $M$ and $S$.
[Diagram: $D$ (2m x n) = $M$ (2m x 3) x $S$ (3 x n).]
Remake of M. Hebert diagram
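Affine structure from motion
Do the SVD (typically you don't make the full $U$, $\Sigma$, $V$):
[Diagram: $D$ (2m x n) = $U$ (2m x n) x $\Sigma$ (n x n) x $V^T$ (n x n).]
Truncate to the top 3 singular values:
[Diagram: $D$ = $U_3$ x $\Sigma_3$ x $V_3^T$.]
With noisy measurements the truncated product is the best rank-3 approximation of $D$ rather than exactly $D$.
Remake of M. Hebert diagram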
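Affine structure from motion
Nearly there, apart from this annoying $\Sigma_3$:
$D = U_3 \Sigma_3 V_3^T$
One solution (split $\Sigma_3$ in two): $D = (U_3 \Sigma_3^{1/2})(\Sigma_3^{1/2} V_3^T)$, so $M = U_3 \Sigma_3^{1/2}$ and $S = \Sigma_3^{1/2} V_3^T$, giving $D = MS$.
But remember that we can put $H H^{-1}$ in the middle: $D = (M H)(H^{-1} S)$ for any invertible 3x3 $H$.
Remake of M. Hebert diagram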
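Putting the steps together on synthetic, noise-free data gives a sketch like the one below. `factorize` is a hypothetical helper name and the cameras and points are random made-up values. The recovered `M` and `S` reproduce `D`, but they match the true cameras and points only up to a 3x3 matrix `H`, which is the affine ambiguity.

```python
import numpy as np

def factorize(D):
    """Split a centered 2m x n measurement matrix into motion (2m x 3) and shape (3 x n)."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    root = np.sqrt(s[:3])              # Sigma_3^{1/2}, stored as a vector
    M = U[:, :3] * root                # U_3 Sigma_3^{1/2}
    S = root[:, None] * Vt[:3]         # Sigma_3^{1/2} V_3^T
    return M, S

rng = np.random.default_rng(4)
m, n = 4, 10                                      # made-up problem size
A_true = rng.standard_normal((2 * m, 3))          # stacked 2x3 affine cameras
X_true = rng.standard_normal((3, n))
X_true -= X_true.mean(axis=1, keepdims=True)      # origin at the mean 3D point
D = A_true @ X_true                               # centered measurements, rank 3

M, S = factorize(D)

# The factors differ from the truth by an invertible 3x3 H:
# A_true = M @ H and X_true = inv(H) @ S.
H = np.linalg.lstsq(M, A_true, rcond=None)[0]
X_recovered = np.linalg.solve(H, S)
```

With real, noisy tracks `D` is only approximately rank 3, and the same code returns the best rank-3 fit.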
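Eliminating the affine ambiguity
The rows $a_1$, $a_2$ of each $A_i$ give the axes of the camera. We can multiply each projection $A_i$ by a 3x3 matrix $C$ to make $A_i C$, whose rows satisfy:
$a_1^T a_2 = 0$, $\| a_1 \| = 1$, $\| a_2 \| = 1$
This gives 3 equations per camera. We can then set $A_i C$ as the new camera and $C^{-1} S$ as the new points.
In general, this is a recipe for eliminating ambiguities.
[Diagram: camera axes $a_1$, $a_2$, a 3D point $X$, and its projection $p$.]
Remake of M. Hebert diagram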
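One common way to solve these equations, not necessarily the one used in the lecture, is to note that they are linear in the symmetric matrix $Q = C C^T$: for rows $a$, $a'$ of the original $A_i$, the constraints read $a^T Q a' = 0$ or $1$. The sketch below solves for the 6 unique entries of $Q$ by least squares and takes $C$ as its Cholesky factor. `metric_upgrade` is a hypothetical name, the data are synthetic and noise-free, and at least 3 cameras in general position are assumed so that $Q$ is unique.

```python
import numpy as np

def metric_upgrade(M):
    """Find a 3x3 C so that each camera's two rows of M @ C are orthonormal.

    M is the 2m x 3 motion matrix with consecutive row pairs (a_1, a_2) per camera.
    """
    def coeffs(u, v):
        # Coefficients of u^T Q v in the unknowns (q11, q12, q13, q22, q23, q33).
        return np.array([u[0] * v[0],
                         u[0] * v[1] + u[1] * v[0],
                         u[0] * v[2] + u[2] * v[0],
                         u[1] * v[1],
                         u[1] * v[2] + u[2] * v[1],
                         u[2] * v[2]])

    rows, rhs = [], []
    for a1, a2 in zip(M[0::2], M[1::2]):
        rows += [coeffs(a1, a1), coeffs(a2, a2), coeffs(a1, a2)]
        rhs += [1.0, 1.0, 0.0]          # unit length, unit length, orthogonal
    q = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    Q = np.array([[q[0], q[1], q[2]],
                  [q[1], q[3], q[4]],
                  [q[2], q[4], q[5]]])
    # Cholesky gives lower-triangular C with C @ C.T == Q.
    # It raises LinAlgError if Q is not positive definite (possible with noisy data).
    return np.linalg.cholesky(Q)

rng = np.random.default_rng(2)
# Made-up ground truth: the first two rows of 4 random orthogonal matrices.
A_true = np.vstack([np.linalg.qr(rng.standard_normal((3, 3)))[0][:2] for _ in range(4)])
G = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # made-up affine distortion
M_affine = A_true @ G                                # what factorization might return

C = metric_upgrade(M_affine)
M_metric = M_affine @ C
# Gram matrix of each camera's two rows; each should be the 2x2 identity.
grams = [M_metric[i:i + 2] @ M_metric[i:i + 2].T for i in range(0, 8, 2)]
```

The matrix `C` is itself only determined up to a rotation, which is harmless: that is the global rotation ambiguity already discussed.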
Reconstruction results C. Tomasi and T. Kanade, Shape and motion from image streams under orthography: A factorization method, IJCV 1992
Dealing with missing data
So far, we have assumed that we can see all points in all views.
In reality, the measurement matrix typically looks like this:
[Figure: a sparse cameras-by-points matrix in which most entries are missing.]
Possible solution: find dense blocks, solve within each block, then fuse the results.
In general, finding these dense blocks is NP-complete.
Figure credit: S. Lazebnik
But cameras aren't affine!
Want: $m$ cameras $M_i$, $n$ 3D points $X_j$
Given: $mn$ 2D points $p_{ij}$
$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$
When is this Possible?
Want: $m$ cameras $M_i$, $n$ 3D points $X_j$
Given: $mn$ 2D points $p_{ij}$
$p_{ij} \equiv M_i X_j = M_i H^{-1} H X_j$
Degrees of freedom: 2D point (2); 3x4 camera matrix (11), why?; 4x4 homography (15), why?; 3D point (3).
Need $2mn \ge 11m + 3n - 15$
(m = 2): $n \ge 7$
(m = 3): $n \ge 6$ (doesn't get better after)
(m = 1): $n \le 4$
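The counting argument can be turned into a tiny calculator. `min_points` is a hypothetical helper; its defaults are the projective counts from this slide (11 DOF per camera, 15 DOF of ambiguity), and passing 8 and 12 gives the affine case. It assumes $m \ge 2$, where $2m - 3 > 0$ and the inequality can be divided through.

```python
def min_points(m, camera_dof=11, ambiguity_dof=15):
    """Smallest integer n with 2*m*n >= camera_dof*m + 3*n - ambiguity_dof (m >= 2)."""
    if m < 2:
        raise ValueError("the bound is only meaningful for two or more views")
    # Rearranged: n * (2m - 3) >= camera_dof * m - ambiguity_dof.
    numerator = camera_dof * m - ambiguity_dof
    denominator = 2 * m - 3
    return -(-numerator // denominator)   # integer ceiling division

projective = [min_points(m) for m in (2, 3, 4, 10)]          # counts for several m
affine = [min_points(m, 8, 12) for m in (2, 3, 4, 10)]       # affine cameras instead
```

For the projective case the counts are 7 points for two views and 6 from three views onward; for the affine case the count is 4 for every $m$.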
Two Camera Case
For two cameras, we need 7 points. Hmm. What else (in theory) requires 7 points?
Compute the fundamental matrix $F$ and the epipole $b$ such that $F^T b = 0$. Then:
$M_1 = [I, 0]$
$M_2 = [-[b]_\times F, b]$
Remember: this is up to a projective ambiguity!
[Diagram: the 3D point $X$ projects to $p$ in camera $M_1$ and to $p'$ in camera $M_2$; $b$ is the epipole.]
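A sketch of this construction, under the assumption that $F$ follows the convention $p'^T F p = 0$ with $p$ in the first image and $p'$ in the second. `skew` and `cameras_from_F` are hypothetical helper names, and the rank-2 matrix `F` is random rather than estimated from real matches. Points projected through the resulting pair satisfy the epipolar constraint for `F`.

```python
import numpy as np

def skew(b):
    """Cross-product matrix [b]_x, so that skew(b) @ v == np.cross(b, v)."""
    return np.array([[0.0, -b[2], b[1]],
                     [b[2], 0.0, -b[0]],
                     [-b[1], b[0], 0.0]])

def cameras_from_F(F):
    """Return M1 = [I, 0] and M2 = [-[b]_x F, b], where F^T b = 0."""
    U, _, _ = np.linalg.svd(F)
    b = U[:, -1]                 # unit left null vector of a rank-2 F
    M1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    M2 = np.hstack([-skew(b) @ F, b[:, None]])
    return M1, M2

rng = np.random.default_rng(3)
# Made-up rank-2 matrix standing in for an estimated fundamental matrix.
U, s, Vt = np.linalg.svd(rng.standard_normal((3, 3)))
F = U @ np.diag([s[0], s[1], 0.0]) @ Vt

M1, M2 = cameras_from_F(F)

# Project arbitrary homogeneous 3D points and evaluate p'^T F p for each.
X = np.vstack([rng.standard_normal((3, 5)), np.ones((1, 5))])
p, p_prime = M1 @ X, M2 @ X
residuals = np.einsum('ij,ij->j', p_prime, F @ p)
```

The residuals are zero up to rounding, so the camera pair is consistent with `F`, even though it is only one member of a projectively equivalent family.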
Incremental SFM
Key idea: incrementally add cameras and points.
1. Initialize motion $M_i = [R_i, t_i]$ with the fundamental matrix.
[Diagram: cameras ($M_1$, $M_2$, ...) versus points, with unknown entries marked "?".]
Remake of S. Lazebnik material. Note: numbers of points aren't to scale.