6.869 Advances in Computer Vision: Model-based Vision

6.869 Advances in Computer Vision
Prof. Bill Freeman

Model-based vision
• Hypothesize and test
• Interpretation Trees
• Alignment
• Pose Clustering
• Geometric Hashing
Readings: F&P Ch 18.1-18.5

Model-based Vision Topics:
– Hypothesize and test
  • Interpretation Trees
  • Alignment
– Interpretation trees
– Hypothesis generation methods
  • Pose clustering
  • Invariances
  • Geometric hashing
– Verification methods

Object recognition as a function of time in computer vision research
• Picking identical parts from a pile
• Recognizing instances of textured objects
• Recognizing object classes, material properties
Timeline marks: ~1985, ~1995, ~2005
Image sources: dollarfifty.tripod.com/pho/004lg.jpg, http://www.fanuc.co.jp/en/product/robot/robotshow2003/image/m-16ib20_3dv_e.gif, http://www.displayit-info.com/food/images/desserts/2131.JPG

Paths to computer vision research
• Computer science. Tools: binary numbers, counting, threshold tests, graph cuts.
• Electrical engineering, physics. Tools: real numbers, probabilities, soft decisions, belief propagation.
→ Computer vision

Approach
• Given
  – CAD Models (with features)
  – Detected features in an image
• Hypothesize and test recognition…
  – Guess
  – Render
  – Compare

Hypothesize and Test Recognition
• Hypothesize object identity and correspondence
  – Recover pose
  – Render object in camera
  – Compare to image
• Issues
  – Where do the hypotheses come from?
  – How do we compare to image (verification)?

Features?
• Points
• Lines
• Conics
• Other fitted curves
• Regions (particularly the center of a region, etc.)
• More descriptive local features (e.g. work by Schmid and Lowe): "…of intermediate complexity, which means that they are distinctive enough to determine likely matches in a large database of features, but are sufficiently local to be insensitive to clutter and occlusion". (Lowe, CVPR01)

How to generate hypotheses?
• Brute force but also,
  – Construct a correspondence for all object features to every correctly sized subset of image points
  – Expensive search, which is also redundant.
  – L objects with N features
  – M features in image
  – O(L M^N)!

Brute force method
L models (A, B, C, …) with N points each; image with M points.
Try all M image feature points for a model point, then try all M-1 remaining image feature points for another model point, then all M-2 for the next, etc.
M * (M-1) * (M-2) * … * (M-N+1) for each of L models = O(L M^N)

Ways around that combinatorial explosion
• Add geometric constraints to prune search, leading to interpretation tree search
• Try subsets of features (frame groups)…

Adding constraints
• Correspondences between image features and model features are not independent.
• A small number of good correspondences yields a reliable pose estimation; the others must be consistent with this.
• Generate hypotheses using small numbers of correspondences (e.g. triples of points for a calibrated perspective camera, etc., etc.)

Frame groups
• A group of features that can yield a camera hypothesis.
• If you know the intrinsic parameters of your camera, then these are the set of features needed to specify the object's pose relative to the camera.
• With a perspective camera model, known intrinsic camera parameters, some frame groups are:
  – 3 points
  – Trihedral vertex and a point (for scale)
  – Dihedral vertex and a point
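As a rough illustration of the brute-force count above, the following Python sketch evaluates M * (M-1) * … * (M-N+1) for each of L models and compares it with the L M^N bound. The function name and the sizes L = 3, M = 100, N = 5 are hypothetical values chosen only for illustration.

```python
import math


def brute_force_hypotheses(num_models: int, num_image_features: int,
                           num_model_features: int) -> int:
    """Count ordered assignments of N model features to distinct image
    features, summed over L models: L * M * (M-1) * ... * (M-N+1)."""
    return num_models * math.perm(num_image_features, num_model_features)


# Hypothetical problem size: L models, M image features, N features per model.
L, M, N = 3, 100, 5
exact = brute_force_hypotheses(L, M, N)
bound = L * M ** N

print(f"exact number of correspondences: {exact}")
print(f"L * M^N upper bound:             {bound}")
```

Even at this small size the count is in the tens of billions, which is why the constraints and frame groups described above are used to cut the search down.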

Pose consistency / Alignment
• Given known camera type in some unknown configuration (pose)
  – Hypothesize configuration from set of initial features
  – Backproject
  – Test

Rendering an object into the image: Perspective camera

Rendering an object into the image: Affine camera
Rendering the ith 3d pt to its 2d image position:
$$p_i = \Pi A P_i$$
Orthographic camera:
$$\Pi = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
General affine transformation:
$$A = \begin{pmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

A frame group for an affine camera model
Relating observed 2-d positions to 3-d model positions:
$$\begin{pmatrix} p_{i0} \\ p_{i1} \end{pmatrix} = \begin{pmatrix} a_{00} P_{i0} + a_{01} P_{i1} + a_{02} P_{i2} + a_{03} P_{i3} \\ a_{10} P_{i0} + a_{11} P_{i1} + a_{12} P_{i2} + a_{13} P_{i3} \end{pmatrix}$$
Need at least 4 points in general position to determine the affine camera parameters. (Note: only the first 2 rows of A contribute to the projection, so we only need to estimate them.)

Alignment algorithm
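The affine frame group can be sketched in code. The example below is a minimal illustration, not the lecture's implementation: it assumes homogeneous model points with P_i3 = 1, uses exactly four correspondences, and solves for the first two rows of A with a small exact Gauss-Jordan solver. The function names and the sample camera rows are hypothetical.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination,
    using exact rational arithmetic."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)]
           for row, rhs in zip(A, b)]
    for col in range(n):
        # Look for a usable pivot; none means the points are degenerate.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: points not in general position")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]


def estimate_affine_camera(model_points, image_points):
    """Recover the first two rows of A from four 3-D to 2-D correspondences."""
    # One homogeneous model point (P_i0, P_i1, P_i2, 1) per row.
    H = [list(P) + [1] for P in model_points]
    row0 = solve_linear(H, [p[0] for p in image_points])
    row1 = solve_linear(H, [p[1] for p in image_points])
    return [row0, row1]


def project(rows, P):
    """Apply the two estimated rows to a 3-D point, giving (p_i0, p_i1)."""
    hp = list(P) + [1]
    return tuple(sum(a * x for a, x in zip(row, hp)) for row in rows)


# Hypothetical ground-truth camera rows and four non-coplanar model points.
true_rows = [[2, 0, 1, 5], [0, 3, -1, 7]]
model_points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
image_points = [project(true_rows, P) for P in model_points]

estimated = estimate_affine_camera(model_points, image_points)
print(estimated == true_rows)
```

With four coplanar model points the 4x4 system becomes singular and the solver raises an error, which is the "general position" requirement stated above. With more than four correspondences, a least-squares solve would take the place of the exact one.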

More than 1 object in image
• Require same intrinsic camera parameters for each object.

Interpretation Trees
• Tree of possible model-image feature assignments
• Depth-first search
• Prune when unary (binary, …) constraint violated
  – length
  – area
  – orientation
• (a,1) (b,2) …
• "Wild cards" handle spurious image features
[A.M. Wallace. 1988.]
http://faculty.washington.edu/cfolson/papers/pdf/icpr04.pdf

• How does the hypothesize and test method fail?
  – False matches
  – Too many hypotheses to consider
• To add robustness and efficiency, use other heuristics to select candidate object poses
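The interpretation tree search can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited papers: features are 2-D points, the only constraint is a binary one (distances between matched pairs must agree), each model point is used at most once, and each image feature is assigned in turn to a model feature or to a wild card (`None`). All names and coordinates are hypothetical.

```python
import math


def interpretation_tree(model, image, tol=1e-6):
    """Depth-first search over assignments of image features to model features.

    Returns one entry per image feature (None = wild card), chosen to
    maximise the number of real matches under the distance constraint.
    """
    best = []

    def consistent(assignment, img_idx, mdl_idx):
        # Binary constraint: distances between matched pairs must agree.
        for prev_img, prev_mdl in enumerate(assignment):
            if prev_mdl is None:
                continue
            if prev_mdl == mdl_idx:
                return False  # assumption: a model point is used at most once
            d_img = math.dist(image[img_idx], image[prev_img])
            d_mdl = math.dist(model[mdl_idx], model[prev_mdl])
            if abs(d_img - d_mdl) > tol:
                return False
        return True

    def matches(assignment):
        return sum(m is not None for m in assignment)

    def search(assignment):
        nonlocal best
        depth = len(assignment)
        remaining = len(image) - depth
        # Bound: this branch cannot beat the best complete assignment so far.
        if best and matches(assignment) + remaining <= matches(best):
            return
        if depth == len(image):
            best = list(assignment)
            return
        for mdl_idx in range(len(model)):
            if consistent(assignment, depth, mdl_idx):
                search(assignment + [mdl_idx])
        # Wild-card branch: treat this image feature as spurious.
        search(assignment + [None])

    search([])
    return best


# Hypothetical model triangle; the image is a translated copy plus one
# spurious point at index 1.
model = [(0, 0), (4, 0), (0, 3)]
image = [(10, 5), (50, 50), (14, 5), (10, 8)]
print(interpretation_tree(model, image))
```

Distance constraints alone cannot tell an object from its mirror image, so a practical system would add further unary or binary constraints (such as the length, area, and orientation tests listed above) or verify the recovered pose.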

Pose Clustering

Pose clustering
• Each model leads to many correct sets of correspondences, each of which has the same pose
• Vote on object pose, in an accumulator array (per object)
• This is a computer science approach to doing a more probabilistic thing: treating each set of feature observations as statistically independent and multiplying together their probabilities of occurrence to obtain a likelihood function.

Two models used in an early pose clustering system

Pose clustering
Problems
  – Clutter may lead to more votes than the target!
  – Difficult to pick the right bin size
Confidence-weighted clustering
  – See where model frame group is reliable (visible!)
  – Downweight / discount votes from frame groups at poses where that frame group is unreliable…
  – Again, we can make this more precise in a probabilistic framework later.

Test image, with edge points marked

Pick feature pair; dark regions show reliable-pose-estimate views of those features over the viewing sphere
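A minimal sketch of voting in a pose accumulator, under simplifying assumptions that are not from the slides: the pose is a 2-D rigid motion (rotation plus translation), the frame group is an ordered pair of points, and the bin sizes (10 degrees, 5 units) are arbitrary. All names and coordinates are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def pose_from_pair(a, b, p, q):
    """2-D rigid pose (theta, tx, ty) mapping model points a, b onto p, q."""
    theta = (math.atan2(q[1] - p[1], q[0] - p[0])
             - math.atan2(b[1] - a[1], b[0] - a[0]))
    theta = math.remainder(theta, math.tau)  # wrap to [-pi, pi]
    c, s = math.cos(theta), math.sin(theta)
    tx = p[0] - (c * a[0] - s * a[1])
    ty = p[1] - (s * a[0] + c * a[1])
    return theta, tx, ty


def cluster_poses(model, image, angle_bin_deg=10.0, trans_bin=5.0, tol=1e-6):
    """Vote for a quantised pose from every length-compatible pair of pairs."""
    votes = Counter()
    for a, b in permutations(model, 2):
        for p, q in permutations(image, 2):
            # Only pairs of equal length can be related by a rigid motion.
            if abs(math.dist(a, b) - math.dist(p, q)) > tol:
                continue
            theta, tx, ty = pose_from_pair(a, b, p, q)
            key = (round(math.degrees(theta) / angle_bin_deg),
                   round(tx / trans_bin),
                   round(ty / trans_bin))
            votes[key] += 1
    return votes


def transform(pt, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return (c * pt[0] - s * pt[1] + tx, s * pt[0] + c * pt[1] + ty)


# Hypothetical model, placed in the image at 30 degrees and (20, -10),
# with two far-away clutter points added.
model = [(0, 0), (4, 0), (4, 2), (0, 3)]
image = [transform(pt, math.radians(30), 20, -10) for pt in model]
image += [(60, 60), (-40, 30)]

votes = cluster_poses(model, image)
print(votes.most_common(3))
```

Pairs matched in reversed order also cast votes, at a rotation 180 degrees away from the true one, and with coarse bins several of them can land in the same cell. This is a small instance of the bin-size and clutter problems listed above.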

Image with edges of found models overlaid

Detected airplanes, rerendered at their detected poses. (Note mis-estimated pose of plane on runway.)

A more recent pose/view clustering example
• "Local feature view clustering for 3D object recognition", by David Lowe (see his web page for copy).
• Schmid, Lowe incorporate "super-features", point features with robust local image descriptors

Detecting 0.1% inliers among 99.9% outliers?
• Example: David Lowe's SIFT-based Recognition system
• Goal: recognize clusters of just 3 consistent features among 3000 feature match hypotheses
• Approach
  – Vote for each potential match according to model ID and pose
  – Insert into multiple bins to allow for error in similarity approximation
  – Using a hash table instead of an array avoids need to form empty bins or predict array size
[Lowe]

Lowe's Model verification step
• Examine all clusters with at least 3 features
• Perform least-squares affine fit to model.
• Discard outliers and perform top-down check for additional features.
• Evaluate probability that match is correct
  – Use Bayesian model, with probability that features would arise by chance if object was not present
  – Takes account of object size in image, textured regions, model feature count in database, accuracy of fit (Lowe, CVPR 01)
[Lowe]
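The voting approach above (model ID plus pose, multiple bins per vote, hash table rather than array) can be sketched as follows. This is an illustrative sketch, not Lowe's code. It makes three assumptions: each match votes for the two closest bins in every pose dimension; the pose is reduced to a hypothetical (x, y, orientation) tuple with arbitrary bin widths; and orientation wrap-around is ignored.

```python
import math
from collections import defaultdict
from itertools import product


def two_nearest_bins(value, width):
    """Indices of the two bins whose centres are closest to value."""
    scaled = value / width
    primary = math.floor(scaled)
    neighbour = primary + 1 if scaled - primary >= 0.5 else primary - 1
    return (primary, neighbour)


def vote(matches, widths):
    """Insert every match into 2^d bins of a hash table keyed by
    (model_id, bin indices). matches: (match_id, model_id, pose) triples."""
    table = defaultdict(set)
    for match_id, model_id, pose in matches:
        per_dim = [two_nearest_bins(v, w) for v, w in zip(pose, widths)]
        for bins in product(*per_dim):
            table[(model_id, bins)].add(match_id)
    return table


def candidate_clusters(table, min_features=3):
    """Keep only bins holding at least min_features distinct matches."""
    return {key: ids for key, ids in table.items() if len(ids) >= min_features}


# Hypothetical matches: three roughly consistent votes for model "A" that
# straddle a bin boundary in x and orientation, plus two outliers.
widths = (32.0, 32.0, 30.0)  # x, y, orientation in degrees (assumed sizes)
matches = [
    (0, "A", (95.0, 40.0, 28.0)),
    (1, "A", (97.0, 45.0, 31.0)),
    (2, "A", (94.0, 38.0, 33.0)),
    (3, "A", (300.0, 200.0, 170.0)),
    (4, "B", (95.0, 40.0, 28.0)),
]

clusters = candidate_clusters(vote(matches, widths))
for key in sorted(clusters):
    print(key, sorted(clusters[key]))
```

With single-bin voting the three consistent matches would fall into different cells (x bins 2, 3, 2), so no bin would reach the 3-feature threshold. Only occupied bins are ever created in the dictionary, which is the advantage of a hash table noted above.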

Solution for affine parameters
• Affine transform of [x,y] to [u,v]:
• Rewrite to solve for transform parameters:
[Lowe]

Models for planar surfaces with SIFT keys:
[Lowe]

3D Object Recognition
• Extract outlines with background subtraction
[Lowe]

Planar recognition
• Planar surfaces can be reliably recognized at a rotation of 60° away from the camera
• Affine fit approximates perspective projection
• Only 3 points are needed for recognition
[Lowe]

3D Object Recognition
• Only 3 keys are needed for recognition, so extra keys provide robustness
• Affine model is no longer as accurate
[Lowe]

Recognition under occlusion
[Lowe]
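The equations on the "Solution for affine parameters" slide did not survive extraction. As a hedged reconstruction in the standard form for a 2-D affine fit, with the parameter names $m_1, \dots, m_4, t_x, t_y$ and the matrix name $M$ chosen here as assumptions, the transform of a model point $[x, y]$ to an image point $[u, v]$ can be written as

$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} m_1 & m_2 \\ m_3 & m_4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$

Stacking two rows per matched feature gives a system that is linear in the unknown parameters:

$$\begin{pmatrix} x & y & 0 & 0 & 1 & 0 \\ 0 & 0 & x & y & 0 & 1 \\ & & \vdots & & & \end{pmatrix} \begin{pmatrix} m_1 \\ m_2 \\ m_3 \\ m_4 \\ t_x \\ t_y \end{pmatrix} = \begin{pmatrix} u \\ v \\ \vdots \end{pmatrix}$$

Writing this as $M \mathbf{x} = \mathbf{b}$, the least-squares solution is $\mathbf{x} = (M^\top M)^{-1} M^\top \mathbf{b}$. Each match contributes two equations and there are six unknowns, so three matches are the minimum, which is consistent with the "only 3 points are needed" remark above.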
