C18 Computer Vision - Lecture 5: Imaging geometry, camera calibration


  1. C18 Computer Vision Lecture 5 Imaging geometry, camera calibration Victor Adrian Prisacariu http://www.robots.ox.ac.uk/~victor

  2. InfiniDense DEMO

  3. Course Content • Projective geometry, camera calibration. • Salient feature detection. • Recovering 3D from two images I: epipolar geometry. • Recovering 3D from two images II: stereo correspondences, triangulation, neural nets. Slides at http://www.robots.ox.ac.uk/~victor -> Teaching. Lots borrowed from David Murray + AV C18.

  4. Useful Texts • Multiple View Geometry in Computer Vision, Richard Hartley and Andrew Zisserman. • Computer Vision: A Modern Approach, David Forsyth and Jean Ponce, Prentice Hall, ISBN 0130851981. • 3-Dimensional Computer Vision: A Geometric Viewpoint, Olivier Faugeras.

  5. Computer Vision: This time… 5. Imaging geometry, camera calibration: 1. Introduction. 2. The perspective camera as a geometric device. 3. Perspective using homogeneous coordinates. 4. Calibrating the elements of the perspective model. 6. Salient feature detection and description. 7. Recovering 3D from two images I: epipolar geometry. 8. Recovering 3D from two images II: stereo correspondences, triangulation, neural nets.

  6. 5.1 Introduction The aim in geometric computational vision is to take a number of 2D images and obtain an understanding of the 3D environment: what is in it, and how it evolves over time. What do we have here…? … seems very easy…

  7. It isn’t …

  8. Organizing the tricks … Although human and (3D) computer vision might be bags of tricks, it is useful to place the tricks within larger processing paradigms. For example: a) Data-driven, bottom-up processing. b) Model-driven, top-down, generative processing. c) Dynamic Vision (mixes bottom-up with top-down feedback). d) Active Vision (task oriented). e) Data-driven discriminative approach (machine learning). These are neither all-embracing nor exclusive.

  9. (a) Data-driven, bottom-up processing • Image processing produces a map of salient 2D features. • Features are input into a range of shape-from-X processes, whose output is the 2.5D sketch. • Only in the last stage do we get a fully 3D object-centered description.

  10. (b) Model-driven, and (c) Dynamic vision • Model-driven, top-down, generative processing: – a model of the scene is assumed known. – Supply a pose for the object relative to the camera, and use projection to predict where salient features should be found in image space. – Search for the features, and refine the pose by minimizing the observed deviation. • Dynamic vision: mixes bottom-up and top-down processing by introducing feedback.

  11. (d) Active Vision • Introduces task-oriented sensing-perception-action loops: – Visual data need only be “good enough” to drive the particular action. • No need to build and maintain an overarching representation of the surroundings. • Computational resources are focused where they are needed.

  12. (e) Data-driven approach • The aim is to learn a description of the transformation between input and output using exemplars. • Geometry is not forgotten, but implicitly learned representations are favored.

  13. 5.2 The perspective camera as a geometric device

  14. This is (a picture of) my cat. [Figure: photo of the cat; the image axes run from 0 to 520 pixels, and the cat's nose is at the image point x = (295, 308).]

  15. My cat lives in a 3D world. The point $\mathbf{X} = (X_1, X_2, X_3)^\top$ in world space projects to the point $\mathbf{x} = (x_1, x_2)^\top$ in image space.

  16. Going from X in 3D to x in 2D? [Figure: the cat in front of a bare film/sensor, with $\mathbf{X} = (X_1, X_2, X_3)^\top$ and $\mathbf{x} = (x_1, x_2)^\top$ labelled.] The output would be blurry if the film were simply exposed to the cat.

  17. Going from X in 3D to x in 2D? [Figure: as before, but with a barrier containing a small aperture placed between the cat and the film/sensor.] Blur is reduced, and the image looks good.

  18. Pinhole Camera [Figure: the cat, a pinhole, and the image plane, with $\mathbf{X} = (X_1, X_2, X_3)^\top$ and $\mathbf{x} = (x_1, x_2)^\top$ labelled.] All rays pass through the center of projection (a single point). The image forms on the image plane.

  19. Pinhole Camera [Figure: image plane, optical axis, camera origin o, principal point p, focal length f.] The 3D point $\mathbf{X} = (X_1, X_2, X_3)^\top$ is imaged into $\mathbf{x} = (x_1, x_2)^\top$ as: $\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \frac{f}{X_3}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}$, where f is the focal length, o the camera origin, and p the principal point.
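A minimal Python sketch of this projection (not part of the slides; the function name and numbers are purely illustrative), assuming the principal point is at the image origin and $X_3 > 0$:

```python
import numpy as np

def project_pinhole(X, f):
    """Project a 3D point X = (X1, X2, X3) through an ideal pinhole of
    focal length f, returning image coordinates (x1, x2) = f/X3 * (X1, X2)."""
    return f * np.asarray(X[:2]) / X[2]

# A point 2 m in front of the camera, with a 10 mm focal length.
print(project_pinhole(np.array([0.3, 0.1, 2.0]), f=0.01))  # -> [0.0015 0.0005]
```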

  20. Homogeneous coordinates • The projection $\mathbf{x} = f\,(X_1, X_2)^\top / X_3$ is non-linear. • It can be made linear using homogeneous coordinates, which involves representing the image and the scene in a higher-dimensional space. • Limiting cases – e.g. vanishing points – are handled better. • Homogeneous coordinates allow transformations to be concatenated more easily.

  21. 3D Euclidean transforms: inhomogeneous coordinates • My cat moves through 3D space. • The movement of the tip of the nose can be described using a Euclidean transform: $\mathbf{X}'_{3\times1} = \mathbf{R}_{3\times3}\mathbf{X}_{3\times1} + \mathbf{t}_{3\times1}$, where $\mathbf{R}$ is a rotation and $\mathbf{t}$ a translation.

  22. 3D Euclidean transforms: inhomogeneous coordinates • Euclidean transform: $\mathbf{X}'_{3\times1} = \mathbf{R}_{3\times3}\mathbf{X}_{3\times1} + \mathbf{t}_{3\times1}$. • Concatenation of successive transforms is a mess! • $\mathbf{X}_1 = \mathbf{R}_1\mathbf{X} + \mathbf{t}_1$ • $\mathbf{X}_2 = \mathbf{R}_2\mathbf{X}_1 + \mathbf{t}_2$ • $\mathbf{X}_2 = \mathbf{R}_2(\mathbf{R}_1\mathbf{X} + \mathbf{t}_1) + \mathbf{t}_2 = \mathbf{R}_2\mathbf{R}_1\mathbf{X} + \mathbf{R}_2\mathbf{t}_1 + \mathbf{t}_2$.

  23. 3D Euclidean transforms: homogeneous coordinates • We replace the 3D point $(X, Y, Z)^\top$ with the four-vector $(X, Y, Z, 1)^\top$. • The Euclidean transform becomes: $\begin{pmatrix}\mathbf{X}' \\ 1\end{pmatrix} = \begin{pmatrix}\mathbf{R} & \mathbf{t} \\ \mathbf{0}^\top & 1\end{pmatrix}\begin{pmatrix}\mathbf{X} \\ 1\end{pmatrix} = \mathbf{E}\begin{pmatrix}\mathbf{X} \\ 1\end{pmatrix}$. • Transformations can now be concatenated by matrix multiplication: $\begin{pmatrix}\mathbf{X}_1 \\ 1\end{pmatrix} = \mathbf{E}_{10}\begin{pmatrix}\mathbf{X}_0 \\ 1\end{pmatrix}$, $\begin{pmatrix}\mathbf{X}_2 \\ 1\end{pmatrix} = \mathbf{E}_{21}\begin{pmatrix}\mathbf{X}_1 \\ 1\end{pmatrix}$ $\Rightarrow$ $\begin{pmatrix}\mathbf{X}_2 \\ 1\end{pmatrix} = \mathbf{E}_{21}\mathbf{E}_{10}\begin{pmatrix}\mathbf{X}_0 \\ 1\end{pmatrix}$.
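A short numerical sketch of this point (not from the slides; the rotation angles, translations and test point are arbitrary choices): chaining two transforms in inhomogeneous form gives the same answer as a single product of 4x4 homogeneous matrices.

```python
import numpy as np

def euclidean_4x4(R, t):
    """Pack rotation R (3x3) and translation t (3,) into E = [[R, t], [0, 1]]."""
    E = np.eye(4)
    E[:3, :3] = R
    E[:3, 3] = t
    return E

def rot_z(a):
    """Rotation by angle a about the z axis (just to get a valid R)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R1, t1 = rot_z(0.3), np.array([1.0, 0.0, 2.0])
R2, t2 = rot_z(-0.5), np.array([0.0, 3.0, 1.0])
X = np.array([0.2, -1.0, 4.0])

# Inhomogeneous chaining: X2 = R2 (R1 X + t1) + t2 = R2 R1 X + R2 t1 + t2.
X2_inhom = R2 @ (R1 @ X + t1) + t2

# Homogeneous chaining: one matrix product, then drop the trailing 1.
X2_hom = (euclidean_4x4(R2, t2) @ euclidean_4x4(R1, t1) @ np.append(X, 1.0))[:3]

assert np.allclose(X2_inhom, X2_hom)
```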

  24. Homogeneous coordinates – definition in $\mathbb{R}^3$ • A point $(X, Y, Z)^\top$ is represented in homogeneous coordinates by any 4-vector $(X_1, X_2, X_3, X_4)^\top$ such that $X = X_1/X_4$, $Y = X_2/X_4$ and $Z = X_3/X_4$. • So the following homogeneous vectors represent the same point, for any $\lambda \neq 0$: $(X_1, X_2, X_3, X_4)^\top$ and $\lambda\,(X_1, X_2, X_3, X_4)^\top$. • E.g. $(2, 3, 5, 1)^\top$ is the same as $(-3, -4.5, -7.5, -1.5)^\top$, and both represent the same inhomogeneous point $(2, 3, 5)^\top$.

  25. Homogeneous coordinates – definition in $\mathbb{R}^2$ • $\mathbf{x} = (x, y)^\top$ is represented in homogeneous coordinates by any 3-vector $(x_1, x_2, x_3)^\top$ such that $x = x_1/x_3$ and $y = x_2/x_3$. • E.g. $(1, 2, 3)^\top$ is the same as $(3, 6, 9)^\top$, and both represent the same inhomogeneous point $(1/3, 2/3)^\top \approx (0.33, 0.67)^\top$.

  26. Homogeneous notation – rules for use 1. Convert the inhomogeneous point to a homogeneous vector: $(X, Y, Z)^\top \rightarrow (X, Y, Z, 1)^\top$. 2. Apply a $4 \times 4$ transform. 3. Dehomogenize the resulting vector: $(X_1, X_2, X_3, X_4)^\top \rightarrow (X_1/X_4,\; X_2/X_4,\; X_3/X_4)^\top$.
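These three steps translate directly into two small helper functions. The sketch below is illustrative only (the transform T is an arbitrary example, not a camera model); it also checks the scale-equivalence example from slide 24.

```python
import numpy as np

def homogenize(X):
    """Step 1: append a 1 to an inhomogeneous point."""
    return np.append(X, 1.0)

def dehomogenize(Xh):
    """Step 3: divide by the last coordinate and drop it."""
    return Xh[:-1] / Xh[-1]

# Scale does not matter: both vectors below are the point (2, 3, 5) of slide 24.
assert np.allclose(dehomogenize(np.array([2.0, 3.0, 5.0, 1.0])),
                   dehomogenize(np.array([-3.0, -4.5, -7.5, -1.5])))

# Step 2 in between: apply some 4x4 transform T (an arbitrary example here).
T = np.diag([1.0, 2.0, 1.0, 0.5])
print(dehomogenize(T @ homogenize(np.array([2.0, 3.0, 5.0]))))  # -> [ 4. 12. 10.]
```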

  27. Projective transformations • A projective transformation is a linear transformation on homogeneous 4-vectors, represented by a non-singular $4 \times 4$ matrix: $\begin{pmatrix} X'_1 \\ X'_2 \\ X'_3 \\ X'_4 \end{pmatrix} = \begin{pmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \\ p_{41} & p_{42} & p_{43} & p_{44} \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix}$ • The effect on the homogeneous points is that the original and transformed points are linked through a projection center. • The $4 \times 4$ matrix is defined only up to scale, and so has 15 degrees of freedom.
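A quick check of the up-to-scale property (illustrative only; the matrix is a random, almost surely non-singular example):

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((4, 4))        # a generic non-singular 4x4 transform
Xh = np.array([2.0, 3.0, 5.0, 1.0])    # a homogeneous 3D point

dehom = lambda v: v[:-1] / v[-1]

# P and any non-zero scalar multiple of P send Xh to the same 3D point,
# which is why the 16 entries carry only 15 degrees of freedom.
assert np.allclose(dehom(P @ Xh), dehom((7.0 * P) @ Xh))
```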

  28. More 3D-3D and 2D-2D Transforms
3D-3D (homogeneous 4-vectors):
• Projective (15 dof): $(X'_1, X'_2, X'_3, X'_4)^\top = \mathbf{P}_{4\times4}\,(X_1, X_2, X_3, X_4)^\top$
• Affine (12 dof): $\begin{pmatrix}\mathbf{X}'\\1\end{pmatrix} = \begin{pmatrix}\mathbf{A}_{3\times3} & \mathbf{t}_3 \\ \mathbf{0}^\top & 1\end{pmatrix}\begin{pmatrix}\mathbf{X}\\1\end{pmatrix}$
• Similarity (7 dof): $\begin{pmatrix}\mathbf{X}'\\1\end{pmatrix} = \begin{pmatrix}s\mathbf{R}_{3\times3} & \mathbf{t}_3 \\ \mathbf{0}^\top & 1\end{pmatrix}\begin{pmatrix}\mathbf{X}\\1\end{pmatrix}$
• Euclidean (6 dof): $\begin{pmatrix}\mathbf{X}'\\1\end{pmatrix} = \begin{pmatrix}\mathbf{R}_{3\times3} & \mathbf{t}_3 \\ \mathbf{0}^\top & 1\end{pmatrix}\begin{pmatrix}\mathbf{X}\\1\end{pmatrix}$
2D-2D (homogeneous 3-vectors):
• Projective, aka homography (8 dof): $(x'_1, x'_2, x'_3)^\top = \mathbf{H}_{3\times3}\,(x_1, x_2, x_3)^\top$
• Affine (6 dof): $\begin{pmatrix}\mathbf{x}'\\1\end{pmatrix} = \begin{pmatrix}\mathbf{A}_{2\times2} & \mathbf{t}_2 \\ \mathbf{0}^\top & 1\end{pmatrix}\begin{pmatrix}\mathbf{x}\\1\end{pmatrix}$
• Similarity (4 dof): $\begin{pmatrix}\mathbf{x}'\\1\end{pmatrix} = \begin{pmatrix}s\mathbf{R}_{2\times2} & \mathbf{t}_2 \\ \mathbf{0}^\top & 1\end{pmatrix}\begin{pmatrix}\mathbf{x}\\1\end{pmatrix}$
• Euclidean (3 dof): $\begin{pmatrix}\mathbf{x}'\\1\end{pmatrix} = \begin{pmatrix}\mathbf{R}_{2\times2} & \mathbf{t}_2 \\ \mathbf{0}^\top & 1\end{pmatrix}\begin{pmatrix}\mathbf{x}\\1\end{pmatrix}$

  29. 2D-2D Transform Examples
Euclidean (3 DoF): $\begin{pmatrix}\cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1\end{pmatrix}$
Similarity (4 DoF): $\begin{pmatrix}s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1\end{pmatrix}$
Affine (6 DoF): $\begin{pmatrix}a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1\end{pmatrix}$
Projective (8 DoF): $\begin{pmatrix}h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33}\end{pmatrix}$
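As a sketch of how these parameterisations might be built in code (illustrative only; the function names and test values are made up), the Euclidean, similarity and affine cases are constructed below and applied to a homogeneous image point:

```python
import numpy as np

def euclidean_2d(theta, tx, ty):
    """2D Euclidean transform (3 dof): rotation by theta plus translation (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def similarity_2d(scale, theta, tx, ty):
    """2D similarity (4 dof): a Euclidean transform with an extra isotropic scale."""
    H = euclidean_2d(theta, tx, ty)
    H[:2, :2] *= scale
    return H

def affine_2d(A, tx, ty):
    """2D affine (6 dof): arbitrary 2x2 block A plus translation (tx, ty)."""
    H = np.eye(3)
    H[:2, :2] = A
    H[:2, 2] = [tx, ty]
    return H

# Apply to a homogeneous image point x = (x1, x2, 1)^T.
x = np.array([1.0, 2.0, 1.0])
print(similarity_2d(2.0, np.pi / 6, 5.0, -1.0) @ x)
```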

  30. Perspective 3D-2D Transforms • Similar to a 3D-3D projective transform, but constrain the transformed point to the plane $Z = f$ (the image plane): $Z = f \;\rightarrow\; \mathbf{X}_{\text{image}} = (x_1, x_2, f, 1)^\top$. • Because $Z = f$ is fixed, we can write:
$\lambda\begin{pmatrix}x_1\\x_2\\f\\1\end{pmatrix} = \begin{pmatrix}p_{11} & p_{12} & p_{13} & p_{14}\\ p_{21} & p_{22} & p_{23} & p_{24}\\ f\,p_{31} & f\,p_{32} & f\,p_{33} & f\,p_{34}\\ p_{31} & p_{32} & p_{33} & p_{34}\end{pmatrix}\begin{pmatrix}X_1\\X_2\\X_3\\X_4\end{pmatrix}$
• The 3rd row is just $f$ times the 4th, so it is redundant and can be dropped:
$\lambda\begin{pmatrix}x_1\\x_2\\1\end{pmatrix} = \begin{pmatrix}p_{11} & p_{12} & p_{13} & p_{14}\\ p_{21} & p_{22} & p_{23} & p_{24}\\ p_{31} & p_{32} & p_{33} & p_{34}\end{pmatrix}\begin{pmatrix}X_1\\X_2\\X_3\\X_4\end{pmatrix} = \mathbf{P}_{3\times4}\begin{pmatrix}X_1\\X_2\\X_3\\X_4\end{pmatrix}$
$\mathbf{P}_{3\times4}$ is the projection matrix, and this is a perspective transform.
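As an illustrative special case (not stated on this slide), the ideal pinhole of slide 19 corresponds to the particular projection matrix with rows (f, 0, 0, 0), (0, f, 0, 0), (0, 0, 1, 0); a general camera fills in all twelve entries. The sketch below projects the earlier example point with it (the focal length and point are assumed values):

```python
import numpy as np

f = 0.01   # focal length in metres (assumed for the example)

# 3x4 projection matrix of the ideal pinhole: x = f/X3 * (X1, X2).
P = np.array([[f,   0.0, 0.0, 0.0],
              [0.0, f,   0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

Xh = np.array([0.3, 0.1, 2.0, 1.0])   # homogeneous 3D point
xh = P @ Xh                           # homogeneous image point (lambda*x1, lambda*x2, lambda)
print(xh[:2] / xh[2])                 # -> [0.0015 0.0005], matching f * (X1, X2) / X3
```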
