
6.869 Advances in Computer Vision: Model-based Vision



  1. 6.869 Advances in Computer Vision
     Prof. Bill Freeman
     Model-based vision
     • Hypothesize and test
     • Interpretation Trees
     • Alignment
     • Pose Clustering
     • Geometric Hashing
     Readings: F&P Ch 18.1-18.5

     Model-based Vision Topics:
     – Hypothesize and test
       • Interpretation Trees
       • Alignment
     – Interpretation trees
     – Hypothesis generation methods
       • Pose clustering
       • Invariances
       • Geometric hashing
     – Verification methods

     Object recognition as a function of time in computer vision research
     • Tasks shown: picking identical parts from a pile; recognizing instances of textured objects; recognizing object classes, material properties
     • Timeline marks: ~1985, ~1995, ~2005
     • Image sources:
       – http://www.fanuc.co.jp/en/product/robot/robotshow2003/image/m-16ib20_3dv_e.gif
       – dollarfifty.tripod.com/pho/004lg.jpg
       – http://images.google.com/imgres?imgurl=http://www.displayit-info.com/food/images/desserts/2131.JPG&imgrefurl=http://www.displayit-info.com/food/dessert6.html&h=504&w=501&sz=181&tbnid=FXJATGzVyA4J:&tbnh=128&tbnw=127&start=13&prev=/images%3Fq%3Dice%2Bcream%2Bsundae%26hl%3Den%26lr%3D%26sa%3DG

     Paths to computer vision research
     • Computer science. Tools: binary numbers, counting, threshold tests, graph cuts.
     • Electrical engineering, physics. Tools: real numbers, probabilities, soft decisions, belief propagation.
     • Both paths lead to computer vision.

     Approach
     • Given
       – CAD models (with features)
       – Detected features in an image
     • Hypothesize and test recognition…
       – Guess
       – Render
       – Compare

     Hypothesize and Test Recognition
     • Hypothesize object identity and correspondence
       – Recover pose
       – Render object in camera
       – Compare to image
     • Issues
       – Where do the hypotheses come from?
       – How do we compare to image (verification)?

  2. Features?
     • Points
     • Lines
     • Conics
     • Other fitted curves
     • Regions (particularly the center of a region, etc.)
     • More descriptive local features (e.g. work by Schmid and Lowe): “…of intermediate complexity, which means that they are distinctive enough to determine likely matches in a large database of features, but are sufficiently local to be insensitive to clutter and occlusion”. (Lowe, CVPR01)

     How to generate hypotheses?
     • Brute force
       – Construct a correspondence for all object features to every correctly sized subset of image points
       – Expensive search, which is also redundant.
       – L objects with N features
       – M features in image
       – O(L M^N)!

     Brute force method
     • Figure: L models (A, B, C, …), each with N points; an image with M points.
     • Try all M image feature points for a model point, then all M-1 remaining image feature points for another model point, then all M-2 for the next, etc.
     • M * (M-1) * (M-2) * … * (M-N+1) for each of L models = O(L M^N)

     Ways around that combinatorial explosion
     • Add geometric constraints to prune search, leading to interpretation tree search
     • Try subsets of features (frame groups)…

     Frame groups
     • A group of features that can yield a camera hypothesis.
     • If you know the intrinsic parameters of your camera, then these are the set of features needed to specify the object’s pose relative to the camera.
     • With a perspective camera model and known intrinsic camera parameters, some frame groups are:
       – 3 points
       – A trihedral vertex and a point (for scale)
       – A dihedral vertex and a point

     Adding constraints
     • Correspondences between image features and model features are not independent.
     • A small number of good correspondences yields a reliable pose estimate; the others must be consistent with it.
     • Generate hypotheses using small numbers of correspondences (e.g. triples of points for a calibrated perspective camera, etc.)
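To get a feel for the brute-force count above, here is a minimal sketch that evaluates M * (M-1) * … * (M-N+1) for each of L models and compares it with the L * M^N bound. The function name and the example sizes (L = 3, M = 20, N = 5) are illustrative assumptions, not values from the slides.

```python
from math import perm


def brute_force_hypotheses(num_models: int, image_features: int, model_features: int) -> int:
    """Count ordered assignments of N model features to distinct image
    features, summed over L models: L * M * (M-1) * ... * (M-N+1)."""
    return num_models * perm(image_features, model_features)


# Hypothetical sizes, chosen only for illustration.
L, M, N = 3, 20, 5
exact = brute_force_hypotheses(L, M, N)
bound = L * M ** N
print(exact)   # 5581440
print(bound)   # 9600000
```

Even with only 20 image features and 5 model features, millions of correspondences would have to be verified, which is why the slides turn to geometric constraints and frame groups.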

  3. Pose consistency / Alignment
     • Given known camera type in some unknown configuration (pose)
       – Hypothesize configuration from set of initial features
       – Backproject
       – Test

     Rendering an object into the image: perspective camera

     Rendering an object into the image: affine camera
     Rendering the i-th 3-D point to its 2-D image position:
     $$p_i = \Pi A P_i$$
     Orthographic camera:
     $$\Pi = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
     General affine transformation:
     $$A = \begin{pmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

     A frame group for an affine camera model
     With the same $p_i = \Pi A P_i$, $\Pi$ and $A$ as above, relating observed 2-D positions to 3-D model positions:
     $$\begin{pmatrix} p_{i0} \\ p_{i1} \end{pmatrix} = \begin{pmatrix} a_{00} P_{i0} + a_{01} P_{i1} + a_{02} P_{i2} + a_{03} P_{i3} \\ a_{10} P_{i0} + a_{11} P_{i1} + a_{12} P_{i2} + a_{13} P_{i3} \end{pmatrix}$$
     Need at least 4 points in general position to determine the affine camera parameters. (Note: only the first 2 rows of A contribute to the projection, so we only need to estimate them.)

     Alignment algorithm
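The hypothesize, backproject, test loop for the affine camera can be sketched as follows. This is a minimal illustration, assuming exactly four correspondences in general position and homogeneous model points with last coordinate 1; the helper names (`solve_linear`, `fit_affine_camera`, `project`, `count_supported`) and the pixel tolerance are hypothetical, not taken from the slides.

```python
from math import dist


def solve_linear(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(x) for x in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("points are not in general position")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_affine_camera(model_pts, image_pts):
    """Hypothesize: recover the first two rows of A from four 3-D/2-D correspondences."""
    homog = [[X, Y, Z, 1.0] for X, Y, Z in model_pts]
    row0 = solve_linear(homog, [u for u, _ in image_pts])
    row1 = solve_linear(homog, [v for _, v in image_pts])
    return row0, row1


def project(camera, point):
    """Backproject: render one 3-D model point into the image."""
    P = (*point, 1.0)
    return tuple(sum(a * p for a, p in zip(row, P)) for row in camera)


def count_supported(camera, model_pts, image_pts, tol=1.0):
    """Test: count model points whose projection lies within tol of some image feature."""
    return sum(
        any(dist(project(camera, P), q) <= tol for q in image_pts)
        for P in model_pts
    )


# Four non-coplanar model points and their (synthetic) image positions.
frame_model = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
frame_image = [(2.0, -1.0), (3.0, -1.0), (2.0, 1.0), (2.5, -1.0)]
camera = fit_affine_camera(frame_model, frame_image)
print(project(camera, (1, 2, 3)))
```

A hypothesis would be accepted when `count_supported` over the whole model is high enough; the threshold is a design choice the slides leave to the verification stage.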

  4. More than 1 object in image
     • Require same intrinsic camera parameters for each object.

     Model-based Vision Topics:
     – Hypothesize and test
       • Interpretation Trees
       • Alignment
     – Interpretation trees
     – Hypothesis generation methods
       • Pose clustering
       • Invariances
       • Geometric hashing
     – Verification methods

     Interpretation Trees
     • Tree of possible model-image feature assignments
     • Depth-first search
     • Prune when unary (binary, …) constraint violated
       – length
       – area
       – orientation
     • Example tree nodes: (a,1), (b,2), …
     • “Wild cards” handle spurious image features

     Interpretation Trees
     • Figure credit: [A.M. Wallace. 1988.]
     • http://faculty.washington.edu/cfolson/papers/pdf/icpr04.pdf

     Model-based Vision Topics: (same outline as above)

     • How does the hypothesize and test method fail?
       – False matches
       – Too many hypotheses to consider
     • To add robustness and efficiency, use other heuristics to select candidate object poses
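The interpretation-tree search described above can be sketched as a depth-first search with pruning. This is a minimal sketch under stated assumptions: features are 2-D points, the only constraint is a binary one (pairwise distances must agree within a tolerance), each tree level is one image feature, and `None` plays the role of the wild card. The function name, the tolerance, and the example points are illustrative.

```python
from math import dist


def interpretation_tree(image_pts, model_pts, tol=0.1):
    """Depth-first search for the assignment of image points to model points
    (or to the wild card None) that matches the most points while keeping
    all pairwise distances consistent within tol."""
    best = [None] * len(image_pts)

    def consistent(assignment, i, m):
        # Binary constraint: distances to every earlier matched pair must agree.
        for j, mj in enumerate(assignment):
            if mj is None:
                continue
            if mj == m:
                return False  # each model point may be used only once
            if abs(dist(image_pts[i], image_pts[j]) - dist(model_pts[m], model_pts[mj])) > tol:
                return False
        return True

    def search(assignment, matched):
        nonlocal best
        i = len(assignment)
        # Bound: prune if even matching every remaining point cannot beat the best.
        if matched + (len(image_pts) - i) <= sum(a is not None for a in best):
            return
        if i == len(image_pts):
            best = list(assignment)
            return
        for m in range(len(model_pts)):
            if consistent(assignment, i, m):
                search(assignment + [m], matched + 1)
        search(assignment + [None], matched)  # wild card: image point i is spurious

    search([], 0)
    return best


# A 3-4-5 triangle model, translated in the image, plus one spurious point.
model = [(0, 0), (4, 0), (0, 3)]
image = [(10, 10), (14, 10), (7, 7), (10, 13)]
print(interpretation_tree(image, model))  # [0, 1, None, 2]
```

Distance constraints alone cannot tell a shape from its mirror image, so a real system would add further constraints (orientation, area) or verify the recovered pose afterwards.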

  5. Pose Clustering

     Pose clustering
     • Each model leads to many correct sets of correspondences, each of which has the same pose
     • Vote on object pose, in an accumulator array (per object)
     • This is a computer science approach to doing a more probabilistic thing: treating each set of feature observations as statistically independent and multiplying together their probabilities of occurrence to obtain a likelihood function.

     Two models used in an early pose clustering system

     Pose clustering
     • Problems
       – Clutter may lead to more votes than the target!
       – Difficult to pick the right bin size
     • Confidence-weighted clustering
       – See where model frame group is reliable (visible!)
       – Downweight / discount votes from frame groups at poses where that frame group is unreliable…
       – Again, we can make this more precise in a probabilistic framework later.

     Test image, with edge points marked

     Pick feature pair; dark regions show reliable-pose-estimate views of those features over the viewing sphere
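Voting in an accumulator can be illustrated with a toy 2-D version. The sketch below assumes a rigid 2-D pose (rotation plus translation), uses pairs of points as the frame group, and stores votes in a `Counter` keyed by quantized pose; the bin sizes, length tolerance, and example points are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations
from math import atan2, cos, dist, pi, sin


def pose_from_pair(m0, m1, i0, i1):
    """2-D rigid pose (theta, tx, ty) mapping model segment m0->m1 onto image segment i0->i1."""
    theta = atan2(i1[1] - i0[1], i1[0] - i0[0]) - atan2(m1[1] - m0[1], m1[0] - m0[0])
    c, s = cos(theta), sin(theta)
    tx = i0[0] - (c * m0[0] - s * m0[1])
    ty = i0[1] - (s * m0[0] + c * m0[1])
    return theta % (2 * pi), tx, ty


def pose_clustering(model_pts, image_pts, angle_bin=pi / 18, trans_bin=1.0, len_tol=0.1):
    """Let every length-compatible pair of (model pair, image pair) vote for a pose bin."""
    votes = Counter()
    for m0, m1 in permutations(model_pts, 2):
        for i0, i1 in permutations(image_pts, 2):
            if abs(dist(m0, m1) - dist(i0, i1)) > len_tol:
                continue  # a rigid pose cannot change segment length
            theta, tx, ty = pose_from_pair(m0, m1, i0, i1)
            # Quantize the pose; votes near a bin edge may split across bins.
            key = (round(theta / angle_bin), round(tx / trans_bin), round(ty / trans_bin))
            votes[key] += 1
    return votes


# Model rotated by 90 degrees and shifted by (5, 2), plus two clutter points.
model = [(0, 0), (2, 0), (0, 1)]
image = [(5, 2), (5, 4), (4, 2), (9, 9), (1, 7)]
votes = pose_clustering(model, image)
print(votes.most_common(1))
```

In this toy case the correct pose collects one vote per ordered model pair, while wrong pairings scatter across other bins. With heavy clutter the scattered votes can outnumber the target, which is the first problem listed on the slide.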

  6. Image with edges of found models overlaid

     Detected airplanes, rerendered at their detected poses. (Note mis-estimated pose of plane on runway.)

     A more recent pose/view clustering example
     • “Local feature view clustering for 3D object recognition”, by David Lowe (see his web page for copy).
     • Schmid, Lowe incorporate “super-features”, point features with robust local image descriptors

     Detecting 0.1% inliers among 99.9% outliers?
     • Example: David Lowe’s SIFT-based recognition system
     • Goal: recognize clusters of just 3 consistent features among 3000 feature match hypotheses
     • Approach
       – Vote for each potential match according to model ID and pose
       – Insert into multiple bins to allow for error in similarity approximation
       – Using a hash table instead of an array avoids need to form empty bins or predict array size
     [Lowe]

     Lowe’s model verification step
     • Examine all clusters with at least 3 features
     • Perform least-squares affine fit to model.
     • Discard outliers and perform top-down check for additional features.
     • Evaluate probability that match is correct
       – Use Bayesian model, with probability that features would arise by chance if object was not present
       – Takes account of object size in image, textured regions, model feature count in database, accuracy of fit (Lowe, CVPR 01)
     [Lowe]
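The voting scheme in the approach above (vote by model ID and pose, insert into multiple bins, use a hash table) might be sketched like this. It is not Lowe's implementation: the choice of the two nearest bins per pose dimension, the four-parameter pose (orientation in degrees, log2 scale, x, y), the bin widths, and all names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product
from math import floor


def two_nearest_bins(value, width):
    """Indices of the two bins whose centres are closest to value
    (bin k covers [k*width, (k+1)*width))."""
    lower = floor(value / width - 0.5)
    return (lower, lower + 1)


def vote(matches, widths):
    """matches: iterable of (match_id, model_id, pose); pose is a tuple of floats.
    Each match is inserted into 2**len(pose) hash-table entries."""
    table = defaultdict(set)
    for match_id, model_id, pose in matches:
        choices = [two_nearest_bins(v, w) for v, w in zip(pose, widths)]
        for bins in product(*choices):
            table[(model_id, bins)].add(match_id)
    return table


def candidate_clusters(table, min_features=3):
    """Keep only bins with at least min_features distinct matches."""
    return {key: ids for key, ids in table.items() if len(ids) >= min_features}


# Hypothetical bin widths: 30 degrees, one octave of scale, 50 pixels in x and y.
widths = (30.0, 1.0, 50.0, 50.0)
matches = [
    (0, "mug", (10.0, 0.1, 120.0, 80.0)),
    (1, "mug", (14.0, 0.0, 128.0, 90.0)),
    (2, "mug", (20.0, 0.2, 140.0, 95.0)),
    (3, "mug", (200.0, 2.0, 400.0, 10.0)),  # outlier pose
    (4, "book", (12.0, 0.1, 125.0, 85.0)),  # different model
]
clusters = candidate_clusters(vote(matches, widths))
for key, ids in clusters.items():
    print(key, sorted(ids))
```

Because only occupied bins are ever created, the dictionary stays small even when the pose space is large, which is the benefit the slide attributes to using a hash table instead of an array.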

  7. Solution for affine parameters
     • Affine transform of [x,y] to [u,v]: [equation not preserved in this copy]
     • Rewrite to solve for transform parameters: [equation not preserved in this copy]
     [Lowe]

     Models for planar surfaces with SIFT keys
     [Lowe]

     3D Object Recognition
     • Extract outlines with background subtraction
     [Lowe]

     Planar recognition
     • Planar surfaces can be reliably recognized at a rotation of 60° away from the camera
     • Affine fit approximates perspective projection
     • Only 3 points are needed for recognition
     [Lowe]

     3D Object Recognition
     • Only 3 keys are needed for recognition, so extra keys provide robustness
     • Affine model is no longer as accurate
     [Lowe]

     Recognition under occlusion
     [Lowe]
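The slide's equations did not survive in this copy, so the sketch below uses the standard 2-D affine parametrization u = m1*x + m2*y + tx, v = m3*x + m4*y + ty as an assumption; the symbol names m1..m4, tx, ty and all function names are illustrative. Each correspondence contributes two linear equations in the six parameters, so three non-collinear points determine the transform and additional points give a least-squares fit, solved here through the normal equations.

```python
def solve_linear(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(x) for x in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("model points are collinear or too few")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_affine_2d(model_xy, image_uv):
    """Least-squares fit of u = m1*x + m2*y + tx, v = m3*x + m4*y + ty."""
    rows = [(x, y, 1.0) for x, y in model_xy]
    # Normal equations (R^T R) p = R^T b; the same matrix serves both coordinates.
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

    def solve_for(values):
        rhs = [sum(r[i] * b for r, b in zip(rows, values)) for i in range(3)]
        return solve_linear(gram, rhs)

    m1, m2, tx = solve_for([u for u, _ in image_uv])
    m3, m4, ty = solve_for([v for _, v in image_uv])
    return m1, m2, m3, m4, tx, ty


def residuals(params, model_xy, image_uv):
    """Distance between each image point and its predicted position."""
    m1, m2, m3, m4, tx, ty = params
    return [
        ((m1 * x + m2 * y + tx - u) ** 2 + (m3 * x + m4 * y + ty - v) ** 2) ** 0.5
        for (x, y), (u, v) in zip(model_xy, image_uv)
    ]


# Synthetic correspondences generated from a known transform.
model = [(0, 0), (1, 0), (0, 1), (2, 3)]
image = [(10.0, -3.0), (12.0, -3.5), (10.5, -1.5), (15.5, 0.5)]
params = fit_affine_2d(model, image)
print(params)
print(residuals(params, model, image))
```

Correspondences with large residuals are the natural candidates for the "discard outliers" step, after which the fit can be repeated on the remaining matches.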
