Hierarchical Matching of Deformable Shapes
Pedro Felzenszwalb, Department of Computer Science, University of Chicago
Joint work with Joshua Schwartz
Shape-based recognition
• Humans can recognize many objects based on shape alone.
• Fundamental cue for many object categories.
• Classical approach for recognizing rigid objects.
• Invariant to photometric variation.
Comparing and matching shapes
• Related problems:
  - Measuring the similarity between shapes.
  - Finding a set of correspondences between shapes.
  - Finding a shape similar to a model in an image.
Elastic matching
• Measure the amount of bending and stretching necessary to turn one curve into another [Basri et al 95], [Sebastian et al 03].
  - Similar to computing edit distance between strings.
  - Efficient dynamic programming algorithms.
  - Can capture some but not all important shape aspects.
• Failure modes: some curves can be turned into each other without much bending anywhere, and similar objects can have completely different local boundary properties.
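As a point of reference for the edit-distance analogy above, here is a generic string edit-distance DP in Python. It only illustrates the style of dynamic program; it is not the elastic-matching cost of [Basri et al 95] or [Sebastian et al 03].

```python
# Generic edit-distance DP (illustration of the analogy only; elastic matching
# replaces symbol substitutions with bending/stretching costs between curve pieces).
def edit_distance(s, t):
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i                      # delete all of s[:i]
    for j in range(m + 1):
        D[0][j] = j                      # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # delete s[i-1]
                          D[i][j - 1] + 1,        # insert t[j-1]
                          D[i - 1][j - 1] + sub)  # match or substitute
    return D[n][m]

print(edit_distance("kitten", "sitting"))  # 3
```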
Our approach: compositional method
• Compose matchings between subcurves to get longer matchings.
  - A different kind of dynamic programming.
• Cost of a combination depends on:
  - Cost of the matchings being combined.
  - Arrangement of the endpoints.
• For long matchings the endpoints are far away, and we capture global geometric properties.
(Figure: subcurves A1, A2 matched to B1, B2, with endpoints p1, q1, p2, q2, p3, q3.)
Shape-tree
• Shape-tree of a curve from a to b:
  - Select midpoint c, store the relative location c | a,b.
  - Left child is a shape-tree of the sub-curve from a to c.
  - Right child is a shape-tree of the sub-curve from c to b.
(Figure: example curve a, f, e, g, c, h, d, i, b with tree root c | a,b, children e | a,c and d | c,b, and leaves f | a,e, g | e,c, h | c,d, i | d,b.)
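A minimal Python sketch of the construction above (my own illustration, not the authors' implementation). Each node stores its midpoint as Bookstein coordinates, which are defined two slides below on the "Relative locations" slide; names such as `ShapeTreeNode` and the midpoint-selection rule are assumptions.

```python
# Shape-tree construction sketch. A curve is a list of (x, y) samples; each node
# stores the midpoint's relative location c | a,b as Bookstein coordinates
# (defined on the "Relative locations" slide below).

class ShapeTreeNode:
    def __init__(self, rel_loc, left=None, right=None):
        self.rel_loc = rel_loc   # c | a,b for this node's sub-curve
        self.left = left         # shape-tree of the sub-curve a..c
        self.right = right       # shape-tree of the sub-curve c..b

def bookstein(c, a, b):
    # Relative location of c with respect to the endpoints (a, b).
    a, b, c = complex(*a), complex(*b), complex(*c)
    t = (c - (a + b) / 2) / (b - a)
    return (t.real, t.imag)

def build_shape_tree(curve, lo=0, hi=None):
    """Shape-tree of curve[lo..hi]; returns None when no interior sample is left."""
    if hi is None:
        hi = len(curve) - 1
    if hi - lo < 2:
        return None
    mid = (lo + hi) // 2         # pick (roughly) the middle sample as c
    node = ShapeTreeNode(bookstein(curve[mid], curve[lo], curve[hi]))
    node.left = build_shape_tree(curve, lo, mid)
    node.right = build_shape_tree(curve, mid, hi)
    return node
```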
Shape-tree
• Invariant to similarity transformations.
• A sub-tree is the shape-tree of a sub-curve.
• Given a placement for a and b we can reconstruct the curve.
• Bottom nodes capture local curvature.
• Top nodes capture the curvature of a sub-sampled curve.
Relative locations
• Bookstein coordinates for representing B | A,C.
• There exists a unique similarity transformation T taking:
  - A to (-1/2, 0)
  - C to (1/2, 0)
• Look at T(B).
(Figure: triangle A, B, C mapped so that A and C land at (-1/2, 0) and (1/2, 0).)
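A short numerical check of the statement above (a sketch under my own assumptions, not from the talk): writing points as complex numbers, the unique similarity T with T(A) = (-1/2, 0) and T(C) = (1/2, 0) is T(z) = (z - (A + C)/2) / (C - A), and T(B) gives the Bookstein coordinates of B | A,C.

```python
# Verify the unique similarity T sending A -> (-1/2, 0) and C -> (1/2, 0),
# using complex arithmetic; T(B) are the Bookstein coordinates of B | A,C.
A, B, C = complex(0, 0), complex(0.5, 0.5), complex(1, 0)
T = lambda z: (z - (A + C) / 2) / (C - A)
print(T(A), T(C), T(B))   # (-0.5+0j) (0.5+0j) 0.5j  ->  B | A,C = (0, 0.5)
```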
Deformation model
• Independently perturb the relative locations stored in a shape-tree.
  - The reconstructed curve is perceptually similar to the original.
  - Local and global properties are preserved.
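Building on the shape-tree sketch above, the following illustrative code reconstructs a curve from a placement of a and b, independently jittering each stored relative location to mimic the deformation model; the Gaussian noise and sigma parameter are my assumptions, not the authors' exact distribution.

```python
import random

def unbookstein(t, a, b):
    # Inverse of the Bookstein map: the point whose relative location w.r.t. (a, b) is t.
    a, b, t = complex(*a), complex(*b), complex(t[0], t[1])
    p = t * (b - a) + (a + b) / 2
    return (p.real, p.imag)

def reconstruct(node, a, b, sigma=0.0):
    """Curve from a to b encoded by shape-tree `node`; sigma > 0 independently
    perturbs each relative location (a simple stand-in for the deformation model)."""
    if node is None:
        return [a, b]
    rx, ry = node.rel_loc
    c = unbookstein((rx + random.gauss(0, sigma), ry + random.gauss(0, sigma)), a, b)
    left = reconstruct(node.left, a, c, sigma)
    right = reconstruct(node.right, c, b, sigma)
    return left[:-1] + right      # drop the duplicate c when joining the halves

# Usage: tree = build_shape_tree(curve); reconstruct(tree, curve[0], curve[-1], sigma=0.05)
```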
Distance between curves
• Given curves A and B.
• Can't compare shape-trees for A and B built separately.
• Search over shape-trees for A and B looking for a similar pair.
  - Can be done in O(n^3 m^3) time using DP (n = |A|, m = |B|).
• Our current approach:
  - Fix a shape-tree for A and look for a map from points in A to points in B that doesn't deform the shape-tree much.
  - Efficient O(n m^3) DP algorithm.
Matching open curves
• Curves: A = (a_1, ..., a_n) and B = (b_1, ..., b_m).
• Assume a_1 → b_1 and a_n → b_m.
• The shape-tree defines a midpoint a_i dividing A into A_1 and A_2.
• Search over the corresponding point b_j dividing B:

  ψ(A, B) = min over b_j of [ ψ(A_1, B_1) + ψ(A_2, B_2) + λ · dif((a_i | a_1, a_n), (b_j | b_1, b_m)) ]

• ψ(A, B) is the similarity between A and B, dif measures the difference in relative locations, and λ is a scaling factor.
Dynamic programming
• Let v be a node in the shape-tree of A.
  - It corresponds to a subcurve A'.
• Table T(v):
  - T(v)[i][j] is the cost of matching A' to (b_i, ..., b_j).
  - T(v) can be computed using T(u) and T(w), where u and w are the children of v in the shape-tree.
• O(n) tables, O(m^2) entries per table, O(m) to compute each entry.
  - O(n m^3) algorithm.
• A generalization can cut off sub-trees to allow for missing parts.
• Can also handle closed curves...
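A compact, self-contained sketch of the O(n m^3) dynamic program described above, written as a memoized recursion rather than explicit tables; the zero leaf cost and the Euclidean form of dif are simplified assumptions on my part, not the paper's exact costs.

```python
import math

def match_open_curves(A, B, tree, lam=1.0):
    """Cost of matching curve A (with fixed shape-tree `tree`, built as in the
    earlier sketch) to curve B, assuming a_1 -> b_1 and a_n -> b_m."""

    def rel(p, a, b):
        # Bookstein coordinates of p relative to (a, b).
        p, a, b = complex(*p), complex(*a), complex(*b)
        t = (p - (a + b) / 2) / (b - a)
        return (t.real, t.imag)

    def dif(u, v):
        # Difference between two relative locations (simplified: Euclidean distance).
        return math.hypot(u[0] - v[0], u[1] - v[1])

    memo = {}   # one entry per (shape-tree node, i, j): the table value T(v)[i][j]

    def cost(node, lo, hi, i, j):
        # Cost of matching A[lo..hi] to B[i..j] under the subtree `node`.
        if node is None or j - i < 2:
            return 0.0                       # simplified base case
        key = (id(node), i, j)
        if key not in memo:
            mid = (lo + hi) // 2             # midpoint a_i chosen by the shape-tree
            best = math.inf
            for k in range(i + 1, j):        # search over the corresponding point b_k
                d = dif(rel(A[mid], A[lo], A[hi]), rel(B[k], B[i], B[j]))
                best = min(best, cost(node.left, lo, mid, i, k)
                                 + cost(node.right, mid, hi, k, j) + lam * d)
            memo[key] = best
        return memo[key]

    return cost(tree, 0, len(A) - 1, 0, len(B) - 1)
```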
Recognition results - Swedish leaves
• Nearest neighbor classification.
• 15 species, 75 examples per species (25 training, 50 test).

  Method          Accuracy (%)
  Shape-tree      96.28
  Inner distance  94.13
  Shape context   88.12
Recognition results - MPEG7
• 70 categories, 20 shapes per category.
(Figure: example categories.)

  Method            Bullseye score
  Shape-tree        87.70
  Hier. procrustes  86.35
  Inner distance    85.40
  Curve edit        78.14
  Shape context     76.51
Cluttered images
(Figure: pipeline showing (1) input, (2) edges, (3) contours, (4) detection, together with the model.)
Matching to cluttered images
• M: model (closed curve).
• C: curves in the image.
• P: endpoints of the curves in C.
• Match([a,b], [p,q]): matching of the subcurve of M from a to b to a subset of C, with a → p and b → q.
(Figure: model M with points a, b and image curves C with points p, q.)
Matching to cluttered images
• Use DP to match each curve in C to every subcurve of M.
  - Generates a set of initial matchings Match([a,b], [p,q]).
  - Running time is linear in the total length of the image contours.
• Define a gap matching Match([a,b], [p,q]) from every subcurve of M to every pair of anchor points in the image.
  - Cost depends on the length of [a,b].
• Stitch partial matchings together to form longer matchings.
  - Using the compositional rule.
  - Second phase of DP.
Compositional rule
• If ||q - r|| < T we can compose Match([a,b], [p,q]) and Match([b,c], [r,s]).
• Let m = (q + r) / 2.
• If Match([a,b], [p,q]) has cost w_1 and Match([b,c], [r,s]) has cost w_2, then

  Match([a,c], [p,s]) has cost w_1 + w_2 + dif((b | a,c), (m | p,s))

(Figure: model points a, b, c on M and image points p, q, r, s on C.)
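A small sketch of the composition step, with matchings stored as plain dictionaries; the threshold T, the dictionary layout, and the Euclidean form of dif are my assumptions, not details from the talk.

```python
import math

def rel(p, a, b):
    # Bookstein coordinates of p relative to (a, b), as in the earlier sketches.
    p, a, b = complex(*p), complex(*a), complex(*b)
    t = (p - (a + b) / 2) / (b - a)
    return (t.real, t.imag)

def compose(m1, m2, T=5.0):
    """Compose m1 = Match([a,b],[p,q]) with m2 = Match([b,c],[r,s]).
    A matching is a dict with model endpoints 'a','b', image endpoints 'p','q',
    and cost 'w'. Returns Match([a,c],[p,s]), or None if q and r are too far apart."""
    q, r = m1['q'], m2['p']
    if math.hypot(q[0] - r[0], q[1] - r[1]) >= T:
        return None
    m = ((q[0] + r[0]) / 2, (q[1] + r[1]) / 2)    # merged image point m = (q + r)/2
    u = rel(m1['b'], m1['a'], m2['b'])            # (b | a, c)
    v = rel(m, m1['p'], m2['q'])                  # (m | p, s)
    w = m1['w'] + m2['w'] + math.hypot(u[0] - v[0], u[1] - v[1])
    return {'a': m1['a'], 'b': m2['b'], 'p': m1['p'], 'q': m2['q'], 'w': w}
```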
Example compositions
• Composing Match([c,d], [r,s]) and Match([d,e], [t,u]) gives Match([c,e], [r,u]).
• Composing Match([a,b], [p,q]) with the gap matching Match([b,c], [q,r]) gives Match([a,c], [p,r]).
• ...
Example results
(Figure: model and the best match in each image.)
More results
(Figure: model and the best match in each image.)
Object detection
Part-based models
• Sub-trees usually represent fairly generic curves.
• We can share sub-trees among different models.
  - Leads to a notion of parts.
  - Useful for bottom-up matching.
• We can generalize shape-tree models using grammars.
  - Allows models to share parts.
  - Parts can share sub-parts.
  - Objects can have variable structure.
Hierarchical curve models (HCM)
• Underlying PCFG defining the "syntactic" structure of objects.
  - Single terminal l corresponding to a line segment.
  - Productions:
    • X → l
    • X → YZ
• X(a,b) is a curve of type X from a to b.
• The geometry of a curve is defined by its structure and by conditional distributions over midpoint choices.
  - For each rule r = X → YZ we have P_r(c | a,b).
• To generate a curve of type X from a to b:
  - Pick a production r = X → YZ.
  - Pick a midpoint c from P_r(c | a,b).
  - Generate a curve of type Y from a to c and one of type Z from c to b.
• We get a new stochastic grammar:
  - Nonterminals X(a,b) and terminals l(a,b).
  - Sentences are polygonal chains: l(a_1,a_2) l(a_2,a_3) ... l(a_{n-1},a_n).
  - P(X(a,b) → Y(a,c) Z(c,b)) = P(X → YZ) P_r(c | a,b).
  - P(X(a,b) → l(a,b)) = P(X → l).
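A rough sketch of this generative process (not the authors' code): the grammar is a dict of weighted rule lists, and the midpoint model P_r(c | a,b) is approximated by a Gaussian around the segment midpoint; all names, parameters, and the example grammar are assumptions.

```python
import math
import random

def sample_curve(grammar, midpoint, X, a, b):
    """Generate a polygonal chain of type X from a to b as a list of points.
    grammar[X] is a list of (prob, rhs), where rhs is the terminal 'l' or a pair (Y, Z)."""
    probs, rhss = zip(*grammar[X])
    rhs = random.choices(rhss, weights=probs)[0]        # pick production r = X -> rhs
    if rhs == 'l':
        return [a, b]                                   # X -> l: a single line segment
    Y, Z = rhs
    c = midpoint((X, rhs), a, b)                        # c ~ P_r(c | a, b)
    left = sample_curve(grammar, midpoint, Y, a, c)     # curve of type Y from a to c
    right = sample_curve(grammar, midpoint, Z, c, b)    # curve of type Z from c to b
    return left[:-1] + right                            # join, dropping the duplicate c

def gaussian_midpoint(rule, a, b, sigma=0.05):
    # Stand-in for P_r(c | a, b): Gaussian around the segment midpoint, scaled by |b - a|.
    scale = sigma * math.hypot(b[0] - a[0], b[1] - a[1])
    return (random.gauss((a[0] + b[0]) / 2, scale), random.gauss((a[1] + b[1]) / 2, scale))

# Example: the "almost straight curve" model L(a,b) from the next slide.
grammar = {'L': [(0.4, ('L', 'L')), (0.6, 'l')]}
chain = sample_curve(grammar, gaussian_midpoint, 'L', (0.0, 0.0), (1.0, 0.0))
```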
Examples
• The shape-tree deformation model is an HCM with fixed structure.
  - The underlying grammar has one non-terminal and one production for each node in the shape-tree.
  - It always generates the same structure.
  - P_r(c | a,b) is a parametric model defined by the midpoint location.
• L(a,b) generates an "almost straight curve" from a to b:
  - L(a,b) → L(a,c) L(c,b) where c ~ (a+b)/2
  - L(a,b) → l(a,b)
  - A recursive model with a single non-terminal.
• These are two extremes...
Current and future work...
• Relationship to wavelets.
• Learning HCMs from example shapes.
• Using HCMs for parsing images.
• ...
(Figure: random shapes.)