Object Detection and Segmentation from Joint Embedding of Parts and Pixels
Michael Maire (1), Stella X. Yu (2), Pietro Perona (1)
(1) California Institute of Technology, Pasadena, CA 91125
(2) Boston College, Chestnut Hill, MA 02467
Segmentation + Detection ⇒ Perceptual Grouping Framework
Ingredients
Plug in state-of-the-art components:
◮ Low-level cues: color, texture, edges [Arbeláez, Maire, Fowlkes, Malik, PAMI 2011]
◮ Top-down parts: poselets for person detection [Bourdev, Maji, Brox, Malik, ECCV 2010]
PASCAL VOC 2010 Person Category: Improved Detection and Segmentation
Grouping Relationships
Pixel Affinity: Color, Texture Similarity
Part Affinity: Geometric Compatibility
[Figure: graph with pixel nodes and part nodes; part nodes also carry surround and figure/ground (prior) relationships]
Graph ⇒ Angular Embedding ⇒ segmentation, objects, figure/ground
Angular Embedding
Given:
◮ Relative ordering $\Theta(\cdot, \cdot)$
◮ Confidence on relationships $C(\cdot, \cdot)$
Compute:
◮ Global ordering $\theta(\cdot)$
◮ Embed into unit circle: $p \rightarrow z(p) = e^{i\theta(p)}$
Subject to:
◮ Linear constraints on embedding solution in columns of $U$
[Figure: unit circle in the complex plane; neighbors $z(q)$ and $z(r)$, rotated by $\Theta(p,q)$ and $\Theta(p,r)$ and weighted by $C(p,q)$ and $C(p,r)$, combine into the estimate $\tilde{z}(p)$ of $z(p)$]
minimize: $\varepsilon = \sum_p \frac{\sum_q C(p,q)}{\sum_{p,q} C(p,q)} \cdot |z(p) - \tilde{z}(p)|^2$ [Yu, PAMI 2011]
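The objective can be evaluated numerically for a candidate embedding. The following is a minimal sketch, not the authors' code: it assumes that $\tilde{z}(p)$ is the confidence-weighted average of the rotated neighbors $z(q) e^{i\Theta(p,q)}$, as the figure suggests, and that $\Theta(p,q) = \theta(p) - \theta(q)$. The function name and the toy data are hypothetical.

```python
import numpy as np

def angular_embedding_error(z, C, Theta):
    """Weighted squared error between each z(p) and its neighbor-based estimate."""
    W = C * np.exp(1j * Theta)       # elementwise: confidence times rotation
    degree = C.sum(axis=1)           # sum_q C(p, q)
    z_tilde = (W @ z) / degree       # assumed form of the estimate z~(p)
    weights = degree / C.sum()       # sum_q C(p, q) / sum_{p,q} C(p, q)
    return float(np.sum(weights * np.abs(z - z_tilde) ** 2))

# Toy data: three nodes whose pairwise orderings agree with one global ordering.
theta = np.array([0.0, 0.5, 1.2])
Theta = theta[:, None] - theta[None, :]   # assumed convention: theta(p) - theta(q)
C = np.ones((3, 3)) - np.eye(3)           # equal confidence, no self-links
z = np.exp(1j * theta)

error_consistent = angular_embedding_error(z, C, Theta)
error_shuffled = angular_embedding_error(z[::-1], C, Theta)
```

Under these assumptions the consistent embedding has essentially zero error, while an embedding that contradicts the pairwise orderings is penalized.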
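[Figure: the graph annotated with its relationship labels: $C_p$ between pixels, $C_q$ between parts, $(C_s, \Theta_s)$ for surround, $(C_f, \Theta_f)$ for the figure/ground prior, and $U$ for the linear constraints]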
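Block structure over node types (pixels, parts, surround, prior):
$$C = \begin{bmatrix} C_p & 0 & 0 & 0 \\ 0 & \alpha \cdot C_q & \beta \cdot C_s & \gamma \cdot C_f \\ 0 & \beta \cdot C_s^T & 0 & 0 \\ 0 & \gamma \cdot C_f^T & 0 & 0 \end{bmatrix}$$
$$\Theta = \Sigma^{-1} \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & \Theta_s & \Theta_f \\ 0 & -\Theta_s^T & 0 & 0 \\ 0 & -\Theta_f^T & 0 & 0 \end{bmatrix}$$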
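Assembling the two block matrices is mechanical once the per-relationship blocks exist. The sketch below is illustrative only: the function name, block sizes, and random test data are hypothetical, the block order (pixels, parts, surround, prior) is inferred from the subscripts, and the $\Sigma^{-1}$ scaling is left out because the slide does not define $\Sigma$.

```python
import numpy as np

def assemble_relationships(C_p, C_q, C_s, C_f, Theta_s, Theta_f,
                           alpha=1.0, beta=1.0, gamma=1.0):
    """Stack per-relationship blocks into one confidence matrix C and one
    ordering matrix Theta (Sigma^{-1} scaling omitted, see lead-in)."""
    n_p, n_q = C_p.shape[0], C_q.shape[0]
    n_s, n_f = C_s.shape[1], C_f.shape[1]

    def Z(rows, cols):
        return np.zeros((rows, cols))

    C = np.block([
        [C_p,         Z(n_p, n_q),   Z(n_p, n_s), Z(n_p, n_f)],
        [Z(n_q, n_p), alpha * C_q,   beta * C_s,  gamma * C_f],
        [Z(n_s, n_p), beta * C_s.T,  Z(n_s, n_s), Z(n_s, n_f)],
        [Z(n_f, n_p), gamma * C_f.T, Z(n_f, n_s), Z(n_f, n_f)],
    ])
    Theta = np.block([
        [Z(n_p, n_p), Z(n_p, n_q), Z(n_p, n_s), Z(n_p, n_f)],
        [Z(n_q, n_p), Z(n_q, n_q), Theta_s,     Theta_f],
        [Z(n_s, n_p), -Theta_s.T,  Z(n_s, n_s), Z(n_s, n_f)],
        [Z(n_f, n_p), -Theta_f.T,  Z(n_f, n_s), Z(n_f, n_f)],
    ])
    return C, Theta

# Hypothetical sizes: 4 pixels, 2 parts, 3 surround nodes, 1 prior node.
rng = np.random.default_rng(0)
A = rng.random((4, 4))
C_p = A + A.T                                # symmetric pixel affinities
C_q = np.ones((2, 2)) - np.eye(2)            # part-part compatibility
C_s, C_f = rng.random((2, 3)), rng.random((2, 1))
Theta_s, Theta_f = np.full((2, 3), 0.5), np.full((2, 1), -0.5)
C, Theta = assemble_relationships(C_p, C_q, C_s, C_f, Theta_s, Theta_f,
                                  alpha=2.0, beta=0.5, gamma=0.5)
```

By construction C comes out symmetric and Theta antisymmetric, which is what a confidence and a relative ordering between node pairs require.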
Angular Embedding
Relax to generalized eigenproblem $QPQz = \lambda z$:
$P = D^{-1} W$
$Q = I - D^{-1} U (U^T D^{-1} U)^{-1} U^T$
with $D$ and $W$ defined as:
$D = \mathrm{Diag}(C 1_n)$
$W = C \bullet e^{i\Theta}$
Eigenvectors $\{z_0, z_1, \ldots, z_{m-1}\}$ embed pixels and parts into $\mathbb{C}^m$
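For a small graph the relaxed problem can be solved with a dense eigensolver. This is a sketch under assumptions, not the authors' implementation: it reads $\bullet$ as the elementwise product, keeps the $m$ eigenvectors with the largest real eigenvalue, and uses a hypothetical function name and toy data. A real image would need sparse matrices and an iterative solver.

```python
import numpy as np

def angular_embedding(C, Theta, U=None, m=3):
    """Return the m leading eigenpairs of Q P Q (dense, small graphs only)."""
    n = C.shape[0]
    d = C.sum(axis=1)                      # diagonal of D = Diag(C 1_n)
    W = C * np.exp(1j * Theta)             # W = C (elementwise) e^{i Theta}
    P = W / d[:, None]                     # P = D^{-1} W
    Q = np.eye(n, dtype=complex)
    if U is not None:
        DinvU = U / d[:, None]             # D^{-1} U
        Q = Q - DinvU @ np.linalg.solve(U.T @ DinvU, U.T)
    vals, vecs = np.linalg.eig(Q @ P @ Q)
    order = np.argsort(-vals.real)[:m]     # assumed selection: largest eigenvalues
    return vals[order], vecs[:, order]

# Toy data: four nodes with pairwise orderings taken from a known global ordering.
theta = np.array([0.0, 0.4, 0.9, 1.5])
Theta = theta[:, None] - theta[None, :]
C = np.ones((4, 4)) - np.eye(4)
vals, vecs = angular_embedding(C, Theta, m=2)

# The leading eigenvector is defined up to a global rotation; fix node 0 at angle 0.
z0 = vecs[:, 0]
recovered = np.angle(z0 * np.conj(z0[0]))
```

On this noise-free example the angles of the leading eigenvector reproduce the global ordering. Passing a matrix `U` restricts the solution to embeddings with $U^T z = 0$.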
Angular Embedding
◮ $\angle z_0$ encodes global ordering
◮ $z_1, z_2, \ldots, z_{m-1}$ encode grouping
◮ If $\Theta = 0$ ⇒ Normalized Cuts (grouping without ordering)
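The reduction to Normalized Cuts follows from substituting $\Theta = 0$ into the definitions of $W$ and $P$. A short derivation, assuming no linear constraints (so $Q = I$):

```latex
\begin{align*}
\Theta = 0 \;&\Rightarrow\; W = C \bullet e^{i \cdot 0} = C, \qquad P = D^{-1} C \\
P z = \lambda z \;&\Leftrightarrow\; C z = \lambda D z \;\Leftrightarrow\; (D - C)\, z = (1 - \lambda)\, D z
\end{align*}
```

The last form is the generalized eigenproblem of Normalized Cuts with affinity matrix $C$. All entries of $W$ are then real, so the embedding carries grouping information but no ordering.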
Decoding Eigenvectors: Object Detection
[Figure: nodes plotted in the complex plane for each eigenvector, with axes $\Re(z_k)$ and $\Im(z_k)$ for $k = 0, 1, 2$; panels labeled Ordering and Grouping]
Decoding Eigenvectors: Figure/Ground
[Figure: real and imaginary parts $\Re(z)$, $\Im(z)$ of eigenvectors $z_0, \ldots, z_4$ shown as images, from which $\angle z_0$ and $\nabla z_1, \ldots, \nabla z_4$ are derived]
Decoding Eigenvectors: Segmentation
◮ $\angle z_0$ → Figure/Ground
◮ $\nabla z_1, \nabla z_2, \nabla z_3, \nabla z_4$ → Hierarchical Segmentation [Arbeláez, Maire, Fowlkes, Malik, PAMI 2011]
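The two pixel-level read-outs can be sketched as follows. The slide only states that $\angle z_0$ gives figure/ground and that the gradients of the remaining eigenvectors feed a hierarchical segmentation; summing gradient magnitudes over real and imaginary parts is an assumption made here for illustration, and the hierarchical segmentation step itself is not reproduced. The function name and the toy image are hypothetical.

```python
import numpy as np

def decode_pixel_maps(eigvecs, height, width):
    """eigvecs: (height*width, m) complex array, pixel rows in row-major order."""
    maps = eigvecs.reshape(height, width, -1)   # one complex image per eigenvector
    figure_ground = np.angle(maps[..., 0])      # ordering read from the angle of z_0
    boundary = np.zeros((height, width))
    for k in range(1, maps.shape[-1]):
        for channel in (maps[..., k].real, maps[..., k].imag):
            gy, gx = np.gradient(channel)       # finite differences along rows, columns
            boundary += np.hypot(gx, gy)        # assumed way of combining the gradients
    return figure_ground, boundary

# Toy 4x6 image whose left and right halves differ in both z_0 and z_1.
h, w = 4, 6
left = np.tile(np.arange(w) < 3, (h, 1))
z0 = np.exp(1j * np.where(left, 0.5, -0.5))
z1 = np.where(left, 1.0, -1.0).astype(complex)
eigvecs = np.stack([z0.ravel(), z1.ravel()], axis=1)
figure_ground, boundary = decode_pixel_maps(eigvecs, h, w)
```

In the toy image the boundary response is nonzero only in the two columns that straddle the split, and the figure/ground map takes one value per half.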
Decoding Eigenvectors: Object Segmentation
Assign pixels $p_k$ to objects $Q_i$ via parts $q_j$:
$$p_k \rightarrow \operatorname*{argmin}_{Q_i} \left\{ \min_{q_j \in Q_i} \mathrm{Dist}(p_k, q_j) \right\}$$
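Minimizing over objects the minimum distance to any of their parts amounts to finding the nearest part overall and taking its object. A minimal sketch, assuming Dist is the Euclidean distance between embedding coordinates in $\mathbb{C}^m$ (the slide does not specify it); the function name and toy coordinates are hypothetical.

```python
import numpy as np

def assign_pixels_to_objects(pixel_embed, part_embed, part_object):
    """pixel_embed: (n_pixels, m) complex; part_embed: (n_parts, m) complex;
    part_object: (n_parts,) integer id of the object each part belongs to."""
    # Pairwise distances between pixels and parts (assumed Euclidean in C^m).
    diff = pixel_embed[:, None, :] - part_embed[None, :, :]
    dist = np.linalg.norm(diff, axis=2)
    nearest_part = dist.argmin(axis=1)     # closest part over all objects
    return part_object[nearest_part]       # the object owning that part

# Toy data with m = 1: parts 0 and 1 form object 0, part 2 forms object 1.
part_embed = np.array([[0.0 + 0.0j], [1.0 + 0.0j], [0.0 + 1.0j]])
part_object = np.array([0, 0, 1])
pixel_embed = np.array([[0.1 + 0.0j], [0.9 + 0.1j], [0.0 + 0.9j]])
labels = assign_pixels_to_objects(pixel_embed, part_embed, part_object)
```

The first two pixels land on object 0 and the third on object 1.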
Decoding Eigenvectors
Results: PASCAL 2010 Person Category
[Figure: example results, with columns Detections, Poselet Mask, F/G Mask, Segmentation]
Results: PASCAL 2010 Person Category
◮ Segmentation task score: 41.1 (35.5 for poselet baseline)
◮ 11% relative improvement due to better detection
Summary
◮ Simultaneous segmentation and detection:
  ◮ Part detectors → figure pop-out, object grouping
  ◮ Color, texture → pixel grouping
◮ Graph:
  ◮ Parts and pixels as nodes
  ◮ Links encode multiple relationship types
◮ Embedding: graph nodes → $\mathbb{C}^m$
◮ Decode:
  ◮ Figure/ground
  ◮ Image segmentation
  ◮ Detected objects
  ◮ Segmentation of each object instance
◮ Better person detection and segmentation on PASCAL
Thank You