  1. Deformable Parts Model Rafi Witten, David Knight 4 April 2011

  2. Overview 1. Labeled dataset (PASCAL) 2. Making image pyramids 3. Feature extraction (HoG) 4. Parts model 5. Generating random negative examples 6. Generating hard negative examples 7. Train Latent SVM iteratively

  3. 1. Labeled dataset (PASCAL) 2. Making image pyramids 3. Feature extraction (HoG) 4. Parts model 5. Generating random negative examples 6. Generating hard negative examples 7. Train Latent SVM iteratively

  4. PASCAL Challenge • ~10,000 images with ~25,000 target objects – Objects from 20 categories (e.g. person, car, bicycle, cow, table) – Objects annotated with labeled bounding boxes – GOAL: produce a bounding box that overlaps the ground-truth bounding box by 50% or more

  5. 1. Labeled dataset (PASCAL) 2. Making image pyramids 3. Feature extraction (HoG) 4. Parts model 5. Generating random negative examples 6. Generating hard negative examples 7. Train Latent SVM iteratively

  6. Image Pyramid • Collection of the same image at different sizes • Lower levels capture higher spatial frequencies • Higher levels capture lower spatial frequencies

  7. Implementation
     image[0] = originalImage
     for i = 1 to (numPyramidLevels - 1)
         image[i] = gaussFilter(image[i-1], sigma)
         image[i] = downSample(image[i], factor)
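  A minimal runnable sketch of this loop in Python; SciPy's gaussian_filter stands in for gaussFilter and simple array striding for downSample, and the default sigma and factor values are placeholders rather than the presenters' settings:

      import numpy as np
      from scipy.ndimage import gaussian_filter

      def build_pyramid(image, num_levels=5, sigma=1.0, factor=2):
          """Gaussian image pyramid: smooth the previous level, then subsample it."""
          levels = [np.asarray(image, dtype=np.float64)]
          for _ in range(1, num_levels):
              smoothed = gaussian_filter(levels[-1], sigma=sigma)
              levels.append(smoothed[::factor, ::factor])  # down-sample by striding
          return levels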

  8. Parameters • Number of levels • Gaussian filter σ • Per-level down-sample factor

  9. 1. Labeled dataset (PASCAL) 2. Making image pyramids 3. Feature extraction (HoG) 4. Parts model 5. Generating random negative examples 6. Generating hard negative examples 7. Train Latent SVM iteratively

  10. Histogram of Gradients • Intuition: normalized counts of gradient directions in a local region are... – Invariant to uniform lighting changes – Invariant to small shape deformations • Grayscale images are straightforward • Color images – At each pixel, take the gradient of each RGB channel and keep the gradient with the greatest magnitude
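  A hedged NumPy sketch of that per-pixel rule for color images (the function name is mine, not from the presenters' code): compute the gradient of each RGB channel and keep the channel with the largest magnitude at each pixel.

      import numpy as np

      def strongest_channel_gradient(rgb):
          """Per pixel, keep the RGB channel whose gradient has the largest magnitude."""
          rgb = np.asarray(rgb, dtype=np.float64)
          gy, gx = np.gradient(rgb, axis=(0, 1))   # per-channel gradients, shape (H, W, 3)
          mag = np.hypot(gx, gy)                   # gradient magnitude per channel
          best = mag.argmax(axis=2)                # winning channel at each pixel
          rows, cols = np.indices(best.shape)
          return gx[rows, cols, best], gy[rows, cols, best], mag[rows, cols, best]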

  11. Histogram of Gradients • HoG features calculated for each level of an image pyramid

  12. Histogram of Gradients • Cells – 8x8 pixels – Non-overlapping • Blocks – 2x2 cells – Overlapping

  13. Histogram of Gradients • Gradient histograms generated per cell – 9 bins

  14. Implementation
      for each level in pyramid:
          levelGrad = gradient(level)                          # take gradient
          for each cell in level:
              cell.hist = new Histogram(9 buckets)
              for each pixel in cell:                          # binning
                  cell.hist.vote(levelGrad.angle, levelGrad.magnitude)
          for each block in level:
              energySum = epsilon                              # normalize (L1-norm)
              for each blockCell in block:
                  energySum = energySum + sum(blockCell.hist.values)
              featureVec = new Vector()                        # concatenate
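  A rough, self-contained Python version of this pipeline, assuming the 8x8-pixel cells, 2x2-cell blocks and 9 orientation bins from the earlier slides; it is a simplification of HoG (no vote interpolation, L1 block normalization only), not the presenters' implementation:

      import numpy as np

      def hog_features(gray, cell=8, block=2, bins=9, eps=1e-5):
          """Per-cell orientation histograms, then L1-normalized overlapping blocks."""
          gray = np.asarray(gray, dtype=np.float64)
          gy, gx = np.gradient(gray)
          mag = np.hypot(gx, gy)
          ang = np.rad2deg(np.arctan2(gy, gx)) % 180            # unsigned orientation

          ch, cw = gray.shape[0] // cell, gray.shape[1] // cell
          hists = np.zeros((ch, cw, bins))
          for i in range(ch):
              for j in range(cw):
                  m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
                  a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
                  idx = np.minimum((a / (180.0 / bins)).astype(int), bins - 1)
                  hists[i, j] = np.bincount(idx, weights=m, minlength=bins)

          # Overlapping 2x2-cell blocks, L1-normalized and concatenated.
          feats = []
          for i in range(ch - block + 1):
              for j in range(cw - block + 1):
                  b = hists[i:i+block, j:j+block].ravel()
                  feats.append(b / (b.sum() + eps))
          return np.concatenate(feats)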

  15. Parameters • Cell size • Block size • Histogram normalization method

  16. 1. Labeled dataset (PASCAL) 2. Making image pyramids 3. Feature extraction (HoG) 4. Parts model 5. Generating random negative examples 6. Generating hard negative examples 7. Train Latent SVM iteratively

  17. Parts Model • Root window region (cyan) – Object of interest – Should coincide with PASCAL bounding box

  18. Parts Model • Parts window regions (yellow) – Positioned relative to root window – Located at a lower pyramid level (higher detail)

  19. Filters • Fixed-size templates • A set of weights applied to a local region of a HoG pyramid level • Applied with a dot product between... – the vector of filter weights – the vector of HoG features for a region
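  A small sketch of applying one filter at every position of a HoG pyramid level (plain loops for clarity; real implementations use cross-correlation or FFTs, and the array shapes here are my assumption):

      import numpy as np

      def filter_response(hog_level, filt):
          """Dot product of a fixed-size filter with the HoG features under it,
          at every valid placement in one pyramid level.
          hog_level: (H, W, D) feature array; filt: (h, w, D) weight template."""
          H, W, D = hog_level.shape
          h, w, _ = filt.shape
          scores = np.zeros((H - h + 1, W - w + 1))
          for y in range(H - h + 1):
              for x in range(W - w + 1):
                  window = hog_level[y:y + h, x:x + w, :]
                  scores[y, x] = float(np.dot(window.ravel(), filt.ravel()))
          return scores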

  20. Filter Scoring • A placement's score combines the filter responses (filter weights dotted with HoG features) with a quadratic spatial penalty on each part's distance from the root center
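  Spelled out, this is the scoring function from Felzenszwalb et al.'s deformable parts model, which the slide appears to diagram; the notation below is theirs rather than the presenters':

      \mathrm{score}(p_0, \dots, p_n)
          = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i)
          - \sum_{i=1}^{n} d_i \cdot \bigl(dx_i,\; dy_i,\; dx_i^2,\; dy_i^2\bigr)

  Here F_i are the filter weights, φ(H, p_i) the HoG features at placement p_i, d_i the learned spatial-penalty coefficients, and (dx_i, dy_i) part i's displacement from its anchor relative to the root.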

  21. Parameters • Number of parts • Part filter sizes – Width – Height • Spatial penalty s_i

  22. Framing as a Learning Problem • The PASCAL data is “weakly labeled” • Training images only have whole-object ground truth • Part filters and part locations must be learned blindly

  23. Root Window Initialization • Dimensions – Aspect ratio: the most common among the training data – Size: the largest size smaller than 80% of the images • Filter – Trained with a classical SVM – On positive examples resized to that aspect ratio

  24. Part Initialization • Where n is the number of parts, select an area a such that n·a = 80% of the root filter area – Pick the rectangular region of area a with the largest positive energy, then zero that region out – Repeat the above step (n-1) more times • Copy (and resize) the root filter content for each region; this becomes the initial part filter • a_i = (0, 0), b_i = (-1, -1)
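  A hedged Python sketch of that greedy placement, operating on the root filter's positive weights; the fixed rectangular part size and the energy definition (sum of squared positive weights per cell) are my assumptions:

      import numpy as np

      def init_part_locations(root_filter, n_parts, part_h, part_w):
          """Greedily place n_parts rectangles over the highest-energy regions
          of the root filter, zeroing each chosen region before the next pick."""
          energy = (np.maximum(root_filter, 0.0) ** 2).sum(axis=2)  # positive energy per cell
          placements = []
          for _ in range(n_parts):
              best_score, best_yx = -1.0, (0, 0)
              for y in range(energy.shape[0] - part_h + 1):
                  for x in range(energy.shape[1] - part_w + 1):
                      score = energy[y:y + part_h, x:x + part_w].sum()
                      if score > best_score:
                          best_score, best_yx = score, (y, x)
              y, x = best_yx
              energy[y:y + part_h, x:x + part_w] = 0.0   # zero the chosen region out
              placements.append(best_yx)
          return placements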

  25. Framing as a Learning Problem • Filters and penalty terms become classifier weights (β) • Root and part window locations become latent variables (z) • The entire HoG pyramid becomes the classifier feature input (x) for the latent variable search • HoG features in the window regions become classifier feature vectors (Φ) for the final SVM training
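  Putting those pieces together, the classifier being set up is the usual latent SVM scoring function; this formula is reconstructed from the paper's notation, since the slide's own equations did not survive the transcript:

      f_\beta(x) = \max_{z \in Z(x)} \beta \cdot \Phi(x, z)

  where Z(x) ranges over the possible root and part placements in the HoG pyramid of x.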

  26. 1. Labeled dataset (PASCAL) 2. Making image pyramids 3. Feature extraction (HoG) 4. Parts model 5. Generating random negative examples 6. Generating hard negative examples 7. Train Latent SVM iteratively

  27. • We simply place bounding boxes at random positions in images. • Many of them don’t look very much like humans. • Intuitively this makes training slower – we want fewer, ‘harder’ negatives.

  28. 1. Labeled dataset (PASCAL) 2. Making image pyramids 3. Feature extraction (HoG) 4. Parts model 5. Generating random negative examples 6. Generating hard negative examples 7. Train Latent SVM iteratively

  29. • Using the feature extraction methods explained earlier, fit an SVM on the random negatives plus the original labeled dataset. • Keep the hard negatives.

  30. Why Find Hard Examples? • Given the intuition behind SVMs this isn’t surprising: fitting an SVM gives exactly the same answer if you exclude the easy examples. • More formally, there is typically a relatively small subset of the examples that, when trained on, gives the same separating hyperplane as the full dataset. • Since the next step will be the bottleneck, it helps to spend a lot of time finding that subset.

  31. Finding Hard Examples • This step is very important, so the system they use is quite costly. • The algorithm follows. Here x_i is the feature vector extracted from an image and bounding box. (1) Start with the positive examples (x_1, x_2, …, x_k). (2) Fill the cache with random negative examples up to size n: {(x_1, 1), …, (x_k, 1), (x_{k+1}, -1), …, (x_n, -1)}. (3) Fit a binary SVM on the data from (2), using only the root filter. (4) Keep only the hard examples (y_i f_β(x_i) < 1). (5) Go to (2).
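  A schematic Python version of this cache loop. The helpers train_linear_svm and sample_negative are hypothetical stand-ins for the SVM solver and the random-negative feature extraction; only the caching and pruning logic follows the slide:

      import numpy as np

      def mine_hard_examples(X_pos, sample_negative, train_linear_svm,
                             cache_size, n_rounds=10):
          """Cache-based hard example mining: fill with random negatives,
          fit an SVM, keep only the margin violators, repeat."""
          X = list(X_pos)
          y = [1] * len(X_pos)
          for _ in range(n_rounds):
              # (2) fill the cache with random negatives up to size n
              while len(X) < cache_size:
                  X.append(sample_negative())
                  y.append(-1)
              # (3) fit a binary linear SVM (stand-in returning weights and bias)
              w, b = train_linear_svm(np.array(X), np.array(y))
              # (4) keep only hard examples: y_i * f(x_i) < 1
              scores = np.array(X) @ w + b
              hard = np.asarray(y) * scores < 1
              X = [x for x, h in zip(X, hard) if h]
              y = [yi for yi, h in zip(y, hard) if h]
          return np.array(X), np.array(y)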

  32. Theoretical Guarantees • Assume that there is indeed a subset of all the examples, of size n, that gives the same hyperplane as the full dataset. • If at each visit to step (2) we have space to add more examples, the procedure can be shown to converge to one such subset. • In practice, they only run the loop for 10 iterations and use the resulting dataset as the training set for the latent SVM that we get to in part (7).

  33. • We’re kind of cheating in the last step: we’re assuming we know the bounding boxes. • The location of the bounding box is a latent variable. • Without the bounding box we can’t do the feature extraction. Additionally, we aren’t using the part model! • (Obviously, the PASCAL challenge doesn’t give us the bounding box.) • Solution: Latent SVM.

  34. 1. Labeled dataset (PASCAL) 2. Making image pyramids 3. Feature extraction (HoG) 4. Parts model 5. Generating random negative examples 6. Generating hard negative examples 7. Train Latent SVM iteratively

  35. Latent SVM

  36. How To Optimize A Latent SVM • For z fixed, f_ω is a linear function of ω; let’s call it g_ω(x, z). • Hence, with all the z_i fixed, what we have is: • ...which is just an SVM!
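  The formula the slide refers to is presumably the ordinary SVM objective with the latent values frozen; a reconstruction, writing g_ω(x, z) = ω · Φ(x, z):

      \min_{\omega} \; \tfrac{1}{2}\lVert \omega \rVert^{2}
          + C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i \, g_\omega(x_i, z_i)\bigr)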

  37. Latent SVM Algorithm • Given labeled data {(x_1, 1), …, (x_k, 1), (x_{k+1}, -1), …, (x_n, -1)} and some initial ω: (1) Find the z_i that attain f_ω(x_i). (2) Calculate ω by solving the SVM. (3) Go to (1).
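  A compact sketch of that alternation in Python; best_latent_z, phi and fit_svm are hypothetical stand-ins for the latent search, the feature extraction and the SVM solver:

      import numpy as np

      def train_latent_svm(X, y, w0, best_latent_z, phi, fit_svm, n_iters=10):
          """Alternate: (1) pick the latent z maximizing the current score,
          (2) refit a standard SVM with those z held fixed."""
          w = w0
          for _ in range(n_iters):
              # (1) latent step: z_i = argmax_z  w . phi(x_i, z)
              Z = [best_latent_z(w, x) for x in X]
              # (2) SVM step: solve an ordinary SVM on the frozen features
              feats = np.array([phi(x, z) for x, z in zip(X, Z)])
              w = fit_svm(feats, np.asarray(y))
          return w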

  38. Some Notes • Computing step (1) requires looking at all bounding boxes. • The algorithm is a descent algorithm. • It won’t necessarily converge to a global optimum, but its fixed points are strong local optima. • According to other sources, the algorithm does converge in the mathematical sense. • If you want better latent SVM performance, try self-paced learning!

  39. What They Actually Do • (1) Choose hard examples with ω fixed; denote these {(x_1, 1), …, (x_k, 1), (x_{k+1}, -1), …, (x_n, -1)}. (2) Choose the z_i that attain the max for each x_i. (3) Solve the SVM to calculate ω. (4) Go to (1). • Note that “hard examples” means negatives that fire, or positives on which nothing fires, under the current ω.

  40. Direction For Improvement? • The latent SVM algorithm doesn’t depend on this, but f_ω is a convex function of ω (since it’s a pointwise max of linear functions). • Hence, looking at the optimization problem defined by the latent SVM: • the terms in the sum are convex for y_i < 0, but not for y_i > 0.
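  For reference, the latent SVM objective being discussed can be written as follows (a reconstruction; the slide's formula image is missing from the transcript):

      \min_{\omega} \; \tfrac{1}{2}\lVert \omega \rVert^{2}
          + C \sum_{i=1}^{n} \max\Bigl(0,\; 1 - y_i \max_{z \in Z(x_i)} \omega \cdot \Phi(x_i, z)\Bigr)

  For y_i = -1 the hinge term is a maximum of convex functions of ω and hence convex; for y_i = +1 the inner maximization sits inside a negation, so convexity is lost. This is the “semi-convexity” exploited on the next slide.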

  41. Directions For Improvement? • This suggests an alternative algorithm, taking advantage of so-called “semi-convexity”. • Given labeled data {(x_1, 1), …, (x_k, 1), (x_{k+1}, -1), …, (x_n, -1)} and some initial ω: (1) With ω fixed, find the optimal z_i for the positive examples, {z_1, …, z_k}. (2) Solve the resulting convex problem with those positive z_i fixed. (3) Go to (1).
