Deformable part models Ross Girshick UC Berkeley CS231B Stanford University Guest Lecture April 16, 2013
Image understanding Snack time in the lab photo by “thomas pix” http://www.flickr.com/photos/thomaspix/2591427106
What objects are where? I see twinkies! . . . robot: “I see a table with twinkies, pretzels, fruit, and some mysterious chocolate things...”
DPM lecture overview Part 1: modeling Part 2: learning [Chart: detection AP over time: 12% (2005), 27% (2008), 36% (2009), 45% (2010), 49% (2011)]
Formalizing the object detection task Many possible ways, this one is popular: the input is an image; the desired output is a set of class-labeled bounding boxes (person; cat, motorbike; dog, chair, cow, person, motorbike, car, ...) Performance summary: Average Precision (AP); 0 is worst, 1 is perfect
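To make the performance metric concrete, here is a minimal sketch of how an AP number can be computed from ranked detections. The function, its arguments, and the IoU >= 0.5 matching rule are illustrative assumptions; the official PASCAL VOC evaluation code differs in details (e.g. the interpolated precision used in early years).

```python
import numpy as np

def average_precision(scores, is_true_positive, num_ground_truth):
    """Generic ranked-detection AP (0 = worst, 1 = perfect).

    scores: confidence of each detection over the test set.
    is_true_positive: 1 if that detection matched a previously unmatched
    ground-truth box (e.g. IoU >= 0.5), else 0; matching is assumed done.
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_ground_truth
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)

    # Area under the precision/recall curve, accumulated at each new recall level.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap
```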
Benchmark datasets PASCAL VOC 2005 – 2012 - 54k objects in 22k images - 20 object classes - annual competition
Reduction to binary classification pos = { ... ... } neg = { ... background patches ... } HOG SVM “Sliding window” detector Dalal & Triggs (CVPR’05)
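A minimal sketch of this reduction, assuming fixed-size positive and negative patches have already been cropped. It uses skimage's HOG and scikit-learn's linear SVM as stand-ins for the original Dalal & Triggs implementation; the parameter values are illustrative, not the ones from the paper.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def train_window_classifier(pos_patches, neg_patches):
    """Train a linear SVM on HOG features of fixed-size grayscale patches.

    pos_patches: crops containing the object (e.g. 128x64 pedestrian windows).
    neg_patches: background crops of the same size.
    Returns the learned weight vector w, i.e. the template used below.
    """
    def features(patch):
        return hog(patch, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))

    X = np.array([features(p) for p in pos_patches + neg_patches])
    y = np.array([+1] * len(pos_patches) + [-1] * len(neg_patches))
    clf = LinearSVC(C=0.01).fit(X, y)
    return clf.coef_.ravel()   # score a new window as w . phi(window)
```

The learned w is the template that the sliding-window detector on the next slide evaluates at every position and scale.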
Sliding window detection score(x, p) = w · φ(x, p) [Figure: image pyramid and HOG feature pyramid] • Compute HOG of the whole image at multiple resolutions • Score every subwindow of the feature pyramid • Apply non-maxima suppression
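A sketch of the scoring step, assuming a HOG feature pyramid (a list of per-scale feature maps) is already available; how those maps are computed is outside this sketch. Real implementations do this as a cross-correlation rather than explicit Python loops, and follow it with non-maxima suppression over the thresholded responses.

```python
import numpy as np

def score_feature_pyramid(feature_pyramid, w):
    """Score every subwindow of a HOG feature pyramid with one linear filter.

    feature_pyramid: list of (H, W, D) HOG feature maps, one per scale.
    w: (fh, fw, D) filter; each response is the dot product w . phi(x, p).
    """
    fh, fw, fd = w.shape
    responses = []
    for level in feature_pyramid:
        H, W, D = level.shape
        if H < fh or W < fw or D != fd:
            responses.append(np.zeros((0, 0)))   # level too small for the filter
            continue
        resp = np.zeros((H - fh + 1, W - fw + 1))
        for y in range(H - fh + 1):
            for x in range(W - fw + 1):
                resp[y, x] = np.sum(level[y:y + fh, x:x + fw, :] * w)
        responses.append(resp)
    return responses
```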
Detection: number of locations p ~ 250,000 per image; test set has ~ 5,000 images >> 1.3 × 10^9 windows to classify; typically only ~ 1,000 true positive locations. Extremely unbalanced binary classification
Dalal & Triggs detector on INRIA [Plot: recall-precision curves for different descriptors on the INRIA static person database: Ker. R-HOG, Lin. R-HOG, Lin. R2-HOG, Wavelet, PCA-SIFT, Lin. E-ShapeC] • AP = 75% • (79% in my implementation) • Very good • Declare victory and go home?
Dalal & Triggs on PASCAL VOC 2007 AP = 12% (using my implementation)
How can we do better? Revisit an old idea: part-based models (“pictorial structures”) - Fischler & Elschlager ‘73, Felzenszwalb & Huttenlocher ’00 Combine with modern features and machine learning
Part-based models • Parts — local appearance templates • “Springs” — spatial connections between parts (geom. prior) Image: [Felzenszwalb and Huttenlocher 05]
Part-based models • Local appearance is easier to model than global appearance - Training data is shared across deformations - a "part" can be local or global depending on resolution • Generalizes to previously unseen configurations
General formulation: G = (V, E), V = (v_1, ..., v_n), E ⊆ V × V; a placement (p_1, ..., p_n) ∈ P^n, where P is the set of part locations in the image (or feature pyramid)
Part configuration score function: score(p_1, ..., p_n) = Σ_{i=1}^{n} m_i(p_i) − Σ_{(i,j) ∈ E} d_ij(p_i, p_j), i.e. part match scores minus spring costs. [Figure: highest scoring configurations]
Part configuration score function: score(p_1, ..., p_n) = Σ_{i=1}^{n} m_i(p_i) − Σ_{(i,j) ∈ E} d_ij(p_i, p_j) (part match scores minus spring costs) • Objective: maximize the score over p_1, ..., p_n • h^n configurations! (h = |P|, about 250,000) • Dynamic programming - If G = (V, E) is a tree, O(nh^2) general algorithm ‣ O(nh) with some restrictions on d_ij; a small sketch of this dynamic program follows
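The sketch below spells out that dynamic program for the star-shaped case (one root connected to every other part), with candidate locations flattened to a 1-D index and an arbitrary spring cost passed in as a function. It is the O(nh^2) version; replacing the inner maximization with a generalized distance transform (possible for quadratic d_ij) gives the O(nh) variant. All names and the toy data are illustrative.

```python
import numpy as np

def best_star_configuration(match_scores, spring_cost):
    """Maximize sum_i m_i(p_i) - sum_i d_1i(p_1, p_i) for a star graph.

    match_scores: list of n 1-D arrays over the h candidate locations;
    match_scores[0] is the root's match score, the rest are the leaves.
    spring_cost(i, p_root, p_i): deformation cost between root and part i.
    """
    n, h = len(match_scores), len(match_scores[0])
    total = np.array(match_scores[0], dtype=float)   # score as a function of the root
    best_part = np.zeros((n, h), dtype=int)
    for i in range(1, n):
        for p_root in range(h):
            # Best placement of part i given this root location.
            contrib = [match_scores[i][p] - spring_cost(i, p_root, p)
                       for p in range(h)]
            best_part[i, p_root] = int(np.argmax(contrib))
            total[p_root] += max(contrib)
    p_root = int(np.argmax(total))
    placement = [p_root] + [int(best_part[i, p_root]) for i in range(1, n)]
    return float(total[p_root]), placement

# Toy usage: 5 locations, quadratic springs preferring part i to sit i steps right of the root.
scores = [np.array([1., 0., 2., 0., 1.]),
          np.array([0., 3., 0., 0., 0.]),
          np.array([0., 0., 0., 2., 0.])]
quad = lambda i, p_root, p: 0.5 * (p - p_root - i) ** 2
print(best_star_configuration(scores, quad))
```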
Star-structured deformable part models root part “star” model test image detection
Recall the Dalal & Triggs detector score(x, p) = w · φ(x, p) [Figure: image pyramid and HOG feature pyramid] • HOG feature pyramid • Linear filter / sliding-window detector • SVM training to learn parameters w
D&T + parts [FMR CVPR'08, FGMR PAMI'10] [Figure: image pyramid and HOG feature pyramid, with root location p_0 and part placements z] • Add parts to the Dalal & Triggs detector - HOG features - Linear filters / sliding-window detector - Discriminative training
Sliding window DPM score function: with placements z = (p_0, ..., p_n), score(x, p_0) = max_{p_1, ..., p_n} [ Σ_{i=0}^{n} m_i(x, p_i) − Σ_{i=1}^{n} d_i(p_0, p_i) ] (filter scores minus spring costs) [Figure: image pyramid and HOG feature pyramid with root location p_0]
Detection in a slide: compute the HOG feature map of the test image, plus a feature map at 2x the resolution for the parts; take the responses of the root filter and of the 1st through n-th part filters on these maps; transform the part responses, computing max_{p_i} [ m_i(p_i) − d_i(p_0, p_i) ] for every root location p_0; add the transformed responses to the root filter response to get a detection score for each root location (shown color-coded from low to high value). A simplified sketch of this combination step follows.
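Two simplifications in the sketch: root and part responses are assumed to live on the same grid (the real system scores parts at twice the root's resolution), and the maximization is done by brute force where the real system uses a generalized distance transform. The anchor is the part's ideal offset from the root; d holds the learned deformation coefficients.

```python
import numpy as np

def transform_response(part_resp, anchor, d):
    """Spread one part's response map by its spring cost.

    Computes T(p_0) = max_p [ R(p) - d . (dx^2, dy^2, dx, dy) ] for every root
    location p_0, with (dy, dx) = p - p_0 - anchor. Brute force for clarity;
    a generalized distance transform computes the same thing in linear time.
    """
    H, W = part_resp.shape
    out = np.full((H, W), -np.inf)
    for y0 in range(H):
        for x0 in range(W):
            for y in range(H):
                for x in range(W):
                    dy, dx = y - y0 - anchor[0], x - x0 - anchor[1]
                    cost = d[0] * dx * dx + d[1] * dy * dy + d[2] * dx + d[3] * dy
                    out[y0, x0] = max(out[y0, x0], part_resp[y, x] - cost)
    return out

def score_roots(root_resp, part_resps, anchors, deformations):
    """Detection score per root location: root response plus transformed part responses."""
    score = root_resp.astype(float).copy()
    for resp, anchor, d in zip(part_resps, anchors, deformations):
        score += transform_response(resp, anchor, d)
    return score
```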
What are the parts?
Aspect soup General philosophy: enrich models to better represent the data
Mixture models Data driven: aspect, occlusion modes, subclasses FMR CVPR ’08: AP = 0.27 (person) FGMR PAMI ’10: AP = 0.36 (person)
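One way to read the mixture: each component is a full star model, and at detection time the best-scoring component wins at each root location. A minimal sketch, assuming each component's score map has already been computed as above and that each component carries a learned bias (the per-component biases reappear in the learning section):

```python
import numpy as np

def score_mixture(component_score_maps, component_biases):
    """Score a mixture model at every root location.

    component_score_maps: list of (H, W) score maps, one per component.
    component_biases: one learned bias per component, making scores comparable.
    The mixture's score at each location is the max over components.
    """
    stacked = np.stack([s + b for s, b in zip(component_score_maps, component_biases)])
    best_score = np.max(stacked, axis=0)
    best_component = np.argmax(stacked, axis=0)   # which aspect/subclass fired
    return best_score, best_component
```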
Pushmi–pullyu? Good generalization properties on Doctor Dolittle's farm [Figure: averaging a left-facing horse template with a right-facing one yields a two-headed "pushmi-pullyu" template] This was supposed to detect horses
Latent orientation Unsupervised left/right orientation discovery [horse AP: 0.42 → 0.47 → 0.57] FGMR PAMI '10: AP = 0.36 (person) voc-release5: AP = 0.45 (person) Publicly available code for the whole system (currently voc-release5)
Summary of results: [DT'05] AP 0.12; [FMR'08] AP 0.27; [FGMR'10] AP 0.36; [GFM voc-release5] AP 0.45; [GFM'11] AP 0.49
Part 2: DPM parameter learning. Fixed model structure (component 1, component 2) with unknown parameters; training images labeled y = +1 or y = -1. Parameters to learn: – biases (per component) – deformation costs (per part) – filter weights
Linear parameterization. With z = (p_0, ..., p_n): score(x, p_0) = max_{p_1, ..., p_n} [ Σ_{i=0}^{n} m_i(x, p_i) − Σ_{i=1}^{n} d_i(p_0, p_i) ] (filter scores minus spring costs). Filter scores: m_i(x, p_i) = w_i · φ(x, p_i). Spring costs: d_i(p_0, p_i) = d_i · (dx_i^2, dy_i^2, dx_i, dy_i). Stacking all filters and deformation coefficients into a single vector w gives score(x, p_0) = max_z w · Φ(x, (p_0, z)); a small sketch of this stacking follows.
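To make the linear form concrete, here is a sketch of how w and Φ(x, z) can be stacked so that the configuration score becomes a single dot product. It is simplified to one resolution (no 2x part maps) and omits the per-component bias; all names and shapes are illustrative.

```python
import numpy as np

def stack_model(filters, deformations):
    """w = concatenation of all filter weights w_0..w_n and deformation coefficients d_1..d_n."""
    pieces = [f.ravel() for f in filters]
    pieces += [np.asarray(d, dtype=float) for d in deformations]
    return np.concatenate(pieces)

def stack_features(feature_maps, placements, filters, anchors):
    """Phi(x, z): subwindow features per filter, then negated spring features.

    feature_maps[i]: HOG map the i-th filter is scored on; placements[i]: that
    filter's (y, x) location; anchors[i]: part i's ideal offset from the root
    (anchors[0] is unused).
    """
    pieces = []
    for fmap, (y, x), f in zip(feature_maps, placements, filters):
        fh, fw, _ = f.shape
        pieces.append(fmap[y:y + fh, x:x + fw, :].ravel())            # phi(x, p_i)
    y0, x0 = placements[0]
    for (y, x), (ay, ax) in zip(placements[1:], anchors[1:]):
        dy, dx = y - y0 - ay, x - x0 - ax
        pieces.append(-np.array([dx * dx, dy * dy, dx, dy], float))   # -(dx^2, dy^2, dx, dy)
    return np.concatenate(pieces)

# stack_model(...) @ stack_features(...) reproduces
# sum_i w_i . phi(x, p_i) - sum_i d_i . (dx^2, dy^2, dx, dy).
```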
Positive examples (y = +1): x specifies an image and a bounding box (e.g. a person). We want f_w(x) = max_{z ∈ Z(x)} w · Φ(x, z) to score >= +1. Z(x) includes all z with more than 70% overlap with the ground truth
Negative examples (y = -1): x specifies an image and a HOG pyramid location p_0. We want f_w(x) = max_{z ∈ Z(x)} w · Φ(x, z) to score <= -1. Z(x) restricts the root to p_0 and allows any placement of the other filters
Typical dataset 300 – 8,000 positive examples 500 million to 1 billion negative examples (not including latent configurations!) Large-scale* *unless someone from Google is here
How we learn parameters: latent SVM. Objective: E(w) = (1/2) ||w||^2 + C Σ_{i=1}^{n} max{0, 1 − y_i f_w(x_i)}. Expanding f_w(x) = max_{z ∈ Z(x)} w · Φ(x, z) and splitting the sum into positives and negatives: E(w) = (1/2) ||w||^2 + C Σ_{i ∈ pos} max{0, 1 − max_{z ∈ Z(x_i)} w · Φ(x_i, z)} + C Σ_{i ∈ neg} max{0, 1 + max_{z ∈ Z(x_i)} w · Φ(x_i, z)}. [Figure: for a fixed example, the score max_z w · Φ(x, z) is a pointwise max of linear functions of w, one per latent choice z_1, z_2, z_3, z_4, and is therefore convex in w]
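The training procedure described in FGMR PAMI '10 optimizes this objective by coordinate descent (alternating latent completion on the positives with a convex SVM solve, plus hard negative mining over the huge negative set). Purely to make the objective concrete, here is a runnable toy in which each example's latent set Z(x) is given explicitly as a few candidate feature vectors and the objective is minimized by batch subgradient steps; it ignores the semi-convex structure that the real training exploits, and all numbers are illustrative.

```python
import numpy as np

def latent_svm_subgradient(examples, labels, C=1.0, lr=0.01, num_iters=200):
    """Toy latent SVM: minimize 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i f_w(x_i)).

    examples[i] is a (k_i, d) array holding the candidate feature vectors
    {Phi(x_i, z) : z in Z(x_i)}, so f_w(x_i) = max over its rows of w . Phi.
    """
    w = np.zeros(examples[0].shape[1])
    for _ in range(num_iters):
        grad = w.copy()                            # gradient of the 1/2 ||w||^2 term
        for feats, y in zip(examples, labels):
            scores = feats @ w
            z_star = int(np.argmax(scores))        # latent completion under current w
            if 1.0 - y * scores[z_star] > 0.0:     # hinge loss is active
                grad -= C * y * feats[z_star]      # (sub)gradient of the hinge term
        w -= lr * grad
    return w

# Toy data: two latent choices per example; positives have one strongly matching choice.
pos = [np.array([[2.0, 0.0], [0.0, 0.1]]), np.array([[1.5, 0.2], [0.1, 0.0]])]
neg = [np.array([[0.2, 1.0], [0.1, 0.8]]), np.array([[0.0, 1.5], [0.3, 0.9]])]
print(latent_svm_subgradient(pos + neg, [+1, +1, -1, -1]))
```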