Deep Epitomic Nets and Scale/Position Search for Image Classification


  1. TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search. Deep Epitomic Nets and Scale/Position Search for Image Classification. TTIC_ECP team: George Papandreou (Toyota Technological Institute at Chicago) and Iasonas Kokkinos (Ecole Centrale Paris/INRIA).

  2. TTIC_ECP entry in a nutshell. Goal: invariance in deep CNNs. Part 1: deep epitomic nets for local translation (deformation). Part 2: global scaling and translation. Top-5 error: (0) baseline max-pooled net, 13.0%; (1) epitomic DCNN, 11.9% (~1% gain); (2) epitomic DCNN + search, 10.56% (~1.5% gain); fusion (1)+(2), 10.22%. All DCNNs have 6 convolutional and 2 fully-connected layers.

  3. Deep Convolutional Neural Networks (DCNNs): convolutional layers followed by fully connected layers. A cascade of convolution + max-pooling blocks (deformation-invariant template matching). Our work: different blocks (Part 1) and a different architecture (Part 2). References: LeCun et al., Gradient-Based Learning Applied to Document Recognition, Proc. IEEE 1998; Krizhevsky et al., ImageNet Classification with Deep CNNs, NIPS 2012.

  4. Part 1: Deep epitomic nets

  5. Epitomes: translation-invariant patch models. Patch templates: separate modeling, more data and less power per parameter. Epitomes: a lot more for just a bit more; EM-based training. References: Jojic, Frey, Kannan, Epitomic analysis of appearance and shape, ICCV 2003; Benoit, Mairal, Bach, Ponce, Sparse image representation with epitomes, CVPR 2011; Grosse, Raina, Kwong, Ng, Shift-invariant sparse coding, UAI 2007.

  6. Mini-epitomes for image classification. Dictionary of mini-epitomes compared with a dictionary of patches (K-means). Gains in (flat) BoW classification. Reference: Papandreou, Chen, Yuille, Modeling Image Patches with a Dictionary of Mini-Epitomes, CVPR 2014.

  7. From flat to deep: epitomic convolution. Max-pooling (k = 1, 2, ...): max over image positions. Epitomic convolution (k = 1, 2, ...): max over epitome positions. Reference: G. Papandreou, Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.

  8. Deep Epitomic Convolutional Nets. Epitomic convolution in place of convolution + max-pooling. Supervised dictionary learning by back-propagation. Reference: G. Papandreou, Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.

  9. Deep Epitomic Convolutional Nets. Parameter sharing: faster and more reliable model learning. Consistent improvements: (0) baseline max-pooled net, 13.0%; (1) epitomic DCNN, 11.9% (~1% gain).
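The contrast on slide 7 between "max over image positions" and "max over epitome positions" can be illustrated with a small NumPy sketch. This is only an illustration under simplifying assumptions (a single 2-D channel, plain inner-product matching, no bias or nonlinearity); the function names `epitomic_response` and `maxpooled_response` are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def epitomic_response(patch, epitome):
    """Max over epitome positions: slide the small patch over the larger epitome."""
    h, w = patch.shape
    H, W = epitome.shape
    best = -np.inf
    for dy in range(H - h + 1):          # every vertical offset inside the epitome
        for dx in range(W - w + 1):      # every horizontal offset inside the epitome
            window = epitome[dy:dy + h, dx:dx + w]
            best = max(best, float(np.sum(window * patch)))
    return best

def maxpooled_response(region, template):
    """Max over image positions: slide the small template over the larger image region."""
    return epitomic_response(template, region)  # same search with the roles swapped

patch = np.array([[0.0, 1.0], [2.0, 3.0]])
epitome = np.zeros((4, 4))
epitome[1:3, 2:4] = patch                # the epitome contains the patch at offset (1, 2)
score = epitomic_response(patch, epitome)
```

In both cases the search is the same; what differs is which side is larger. In max-pooling the filter is small and the search runs over a neighbourhood of the image, while in epitomic convolution the input patch is small and the search runs over the larger learned epitome.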

  10. Part 2: Global scaling and translation

  11. Scale invariance challenge. Figure labels: category-dependent (ear detector); scale-dependent (area); dogs.

  12. Scale invariance challenge. Figure labels: category-dependent (ear detector); scale-dependent; dogs; skyscrapers.

  13. Scale invariance challenge. Figure labels: category-dependent (ear detector); scale-dependent; training set; dogs; skyscrapers.

  14. Scale invariance challenge. Rule: large skyscrapers have ears, large dogs don’t. Figure labels: category-dependent (ear detector); scale-dependent; dogs; skyscrapers.

  15. Scale-invariant classification. The image is replaced by a set of rescaled copies, $x \to \{x_{s_1}, \ldots, x_{s_K}\}$, and the feature by a 'bag' of features, $F(x) \to \{F(x_{s_1}), \ldots, F(x_{s_K})\}$. Averaging over scales: $F'(x) = \frac{1}{K} \sum_{k=1}^{K} F(x_{s_k})$. This work: $F'(x) = \max_k F(x_{s_k})$. MIL: end-to-end training! Figure labels: category-dependent; scale-dependent. References: A. Howard, Some improvements on deep convolutional neural network based image classification, 2013; K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014; T. Dietterich et al., Solving the multiple-instance problem with axis-parallel rectangles, Artificial Intelligence, 1997.
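The two ways of combining the bag of per-scale features on slide 15 (averaging versus the max used in this work) can be compared with a short NumPy sketch. The function name `pool_over_scales` and the toy scores are hypothetical, chosen only to show how the two rules can disagree.

```python
import numpy as np

def pool_over_scales(scores, mode="max"):
    """Combine per-scale class scores of shape (K, C) into one score vector of shape (C,)."""
    scores = np.asarray(scores, dtype=float)
    if mode == "mean":                   # F'(x) = (1/K) * sum_k F(x_{s_k})
        return scores.mean(axis=0)
    if mode == "max":                    # F'(x) = max_k F(x_{s_k})
        return scores.max(axis=0)
    raise ValueError("mode must be 'mean' or 'max'")

# Toy scores for K = 3 scales and C = 2 classes (made-up numbers).
scores = [[0.1, 0.7],
          [0.9, 0.2],
          [0.2, 0.3]]
mean_pooled = pool_over_scales(scores, "mean")
max_pooled = pool_over_scales(scores, "max")
```

With these toy numbers the average cannot separate the two classes, whereas the max keeps the single scale at which the first class responds strongly.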

  16. Step 1: Efficient multi-scale convolutional features. Pipeline: image I(x,y) → pyramid I(x,y,s) → stitch → Patchwork(x,y) → GPU → C(x,y) → unstitch → multi-scale convolutional features C(x,y,s). A 220x220x3 input gives 5x5x512 features. References: Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y., OverFeat, ICLR 2014; Dubout, C., Fleuret, F., Exact acceleration of linear object detectors, ECCV 2012; Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., Keutzer, K., DenseNet, arXiv 2014.

  17. Step 2: From fully connected to fully convolutional. Figure labels: convolutional; fully connected.

  18. Step 2: From fully connected to fully convolutional. Figure labels: convolutional.

  19. Step 2: From fully connected to fully convolutional. Pipeline: image I(x,y) → pyramid I(x,y,s) → stitch → Patchwork(x,y) → GPU → F(x,y). A 220x220x3 input gives 1x1x4096 features. Figure labels: convolutional; fully connected.
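Step 2 relies on the fact that a fully connected layer trained on a fixed-size feature block can be applied as a convolution over a larger feature map. The following NumPy sketch shows that equivalence under stated assumptions: no bias term, stride 1, and small made-up dimensions rather than the 5x5x512 and 4096 of the slides. The name `fc_as_conv` is hypothetical.

```python
import numpy as np

def fc_as_conv(features, fc_weights, k):
    """Apply a fully connected layer as a k x k convolution.

    features:   (H, W, C) feature map, possibly larger than k x k
    fc_weights: (k * k * C, D) weights trained on flattened k x k x C blocks
    returns:    (H - k + 1, W - k + 1, D), one output vector per window position
    """
    H, W, C = features.shape
    D = fc_weights.shape[1]
    out = np.empty((H - k + 1, W - k + 1, D))
    for y in range(H - k + 1):
        for x in range(W - k + 1):
            window = features[y:y + k, x:x + k, :].reshape(-1)  # flatten like the FC input
            out[y, x] = window @ fc_weights
    return out

rng = np.random.default_rng(0)
weights = rng.standard_normal((5 * 5 * 3, 4))
small = rng.standard_normal((5, 5, 3))   # the training-time input size
large = rng.standard_normal((7, 7, 3))   # a larger map, e.g. from a mosaic
single = fc_as_conv(small, weights, k=5) # shape (1, 1, 4): ordinary FC output
dense = fc_as_conv(large, weights, k=5)  # shape (3, 3, 4): one output per position
```

On the training-size input the result is a single 1x1 vector, matching the ordinary fully connected output; on a larger map the same weights produce a grid of outputs, one per window position.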

  20. Step 3: Global max-pooling. Pipeline: image I(x,y) → pyramid I(x,y,s) → stitch → Patchwork(x,y) → GPU → $F_c(x, y)$. Class score: $G_c = \max_{x,y} \left[ F_c(x, y) + w_c(x, y) \right]$, where $w_c(x, y)$ is a learned class-specific bias. Consistent, explicit position and scale search during training and testing. For free: argmax yields 48% localization error. Results: (0) baseline max-pooled net, 13.0%; (1) epitomic DCNN, 11.9% (~1% gain); (2) epitomic DCNN + search, 10.56% (~1.5% gain); fusion (1)+(2), 10.22%.
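The global max-pooling step, including the argmax that comes "for free", can be sketched as follows. This assumes the score map and the bias are stored as arrays of shape (H, W, C); the function name `global_max_pool` and the toy values are hypothetical.

```python
import numpy as np

def global_max_pool(F, w):
    """Return G_c = max over (x, y) of F_c + w_c, plus the (row, col) of each maximum.

    F, w: arrays of shape (H, W, C), class score map and learned bias map
    """
    S = F + w
    H, W, C = S.shape
    flat = S.reshape(H * W, C)
    idx = flat.argmax(axis=0)                    # best flat position per class
    G = flat[idx, np.arange(C)]                  # the pooled score per class
    rows, cols = np.unravel_index(idx, (H, W))   # back to 2-D mosaic coordinates
    return G, np.stack([rows, cols], axis=1)

F = np.zeros((3, 4, 2))
F[1, 2, 0] = 5.0          # class 0 responds at row 1, col 2
F[2, 3, 1] = 3.0          # class 1 responds at row 2, col 3
w = np.zeros((3, 4, 2))
w[0, 0, 1] = 4.0          # the bias favours row 0, col 0 for class 1
G, positions = global_max_pool(F, w)
```

Because each mosaic position corresponds to a fixed image location and scale, the winning coordinates can be read as a rough position and scale estimate for the class.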

  21. Deep Epitomic Nets and Scale/Position Search for Image Classification. Goal: invariance in deep CNNs. Results: (0) baseline max-pooled net, 13.0%; (1) epitomic DCNN, 11.9% (~1% gain); (2) search, 10.56% (~1.5% gain); fusion (1)+(2), 10.22%; next: ? DCNN: 6 convolutional + 2 fully connected layers. The deeper the better: stay tuned!

  22. Epitomic implementation details
      - Architecture of our deep epitomic net (11.94%)
      - Training took 3 weeks on a single Titan (60 epochs)
      - Standard choices for learning rate, momentum, etc.

  23. Pyramidal search implementation details
      - The image is warped to a square image. Position in the mosaic is fixed
      - Scales: 400, 300, 220, 160, 120, 90 pixels → Mosaic: 720 pixels
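To make the stitch/unstitch idea concrete, here is a minimal NumPy sketch that packs the six scales listed above into a 720x720 mosaic at fixed positions. The slides state only that positions are fixed, so the coordinates in `LAYOUT` are an assumed, non-overlapping placement chosen for illustration, and the nearest-neighbour resize stands in for whatever warping the authors used.

```python
import numpy as np

MOSAIC = 720
# Hypothetical top-left (row, col) for each scale; the slides do not give the real layout.
LAYOUT = {400: (0, 0), 300: (0, 420), 220: (420, 0),
          160: (420, 240), 120: (320, 420), 90: (320, 560)}

def resize_nearest(image, side):
    """Nearest-neighbour warp of a square (n, n, 3) image to (side, side, 3)."""
    n = image.shape[0]
    idx = np.arange(side) * n // side    # source index for each target pixel
    return image[idx][:, idx]

def stitch(image):
    """Place every pyramid level at its fixed position in one mosaic image."""
    mosaic = np.zeros((MOSAIC, MOSAIC, image.shape[2]), dtype=image.dtype)
    for side, (r, c) in LAYOUT.items():
        mosaic[r:r + side, c:c + side] = resize_nearest(image, side)
    return mosaic

def unstitch(mosaic):
    """Cut the mosaic back into one array per scale."""
    return {side: mosaic[r:r + side, c:c + side] for side, (r, c) in LAYOUT.items()}

image = np.random.default_rng(0).random((64, 64, 3))  # stand-in for a warped square image
mosaic = stitch(image)
levels = unstitch(mosaic)
```

In the pipeline of the slides the network runs once on the mosaic and it is the resulting feature map that gets unstitched; the sketch applies both steps to pixels only to keep the round trip easy to check. The assumed layout also leaves gaps between levels.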
