Understanding image representations by measuring their equivariance and equivalence
  1. Understanding image representations by measuring their equivariance and equivalence
     Karel Lenc, Andrea Vedaldi
     Visual Geometry Group, Department of Engineering Science

  2. Representations for image understanding
     image space → representation ρ → feature space → classifier ω → semantic space
     e.g. ρ(y) → "bike", ρ(z) → "bike", ρ(x) → "dog"
     Ultimate goal of a representation: simplify a task such as image classification.
     Many representations:
     Local image descriptors ▶ SIFT [Lowe 04], HOG [Dalal et al. 05], SURF [Bay et al. 06], LBP [Ojala et al. 02], …
     Feature encoders ▶ BoVW [Sivic et al. 02], Fisher Vector [Perronnin et al. 07], VLAD [Jégou et al. 10], sparse coding, …
     Deep convolutional neural networks ▶ [Fukushima 1974–1982, LeCun et al. 89, Krizhevsky et al. 12, …]

  3. Design of representations
     Many designs are empirical; the main theoretical design principle is invariance.
     A transformation h of the image leaves the feature unchanged:
     Invariant: ρ(y) = ρ(hy)

  4. Design of representations
     However, many representations such as HOG are not invariant, even to simple transformations:
     Not invariant: HOG(y) ≠ HOG(hy), i.e. ρ(y) ≠ ρ(hy)

  5. Design of representations
     But they often transform in a simple and predictable manner:
     Equivariant: ∀y: ρ(hy) = M_h ρ(y)
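As a toy illustration of equivariance (not from the slides), consider circular translation: for a convolutional feature map, shifting the image shifts the features by the same amount, so M_h is simply a permutation of feature components. A minimal numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

def conv_same_circular(x, k):
    """'Same'-size circular 2-D cross-correlation: a stand-in for one conv layer."""
    out = np.zeros_like(x)
    for dy in range(-1, 2):
        for dx in range(-1, 2):
            out += k[dy + 1, dx + 1] * np.roll(x, (-dy, -dx), axis=(0, 1))
    return out

shift = (2, 3)  # the image transformation h: a circular translation
rho_hy = conv_same_circular(np.roll(image, shift, axis=(0, 1)), kernel)    # rho(h y)
Mh_rho_y = np.roll(conv_same_circular(image, kernel), shift, axis=(0, 1))  # M_h rho(y)
print(np.allclose(rho_hy, Mh_rho_y))  # True: rho(h y) = M_h rho(y)
```

Here M_h happens to be exact; for the transformations studied in the talk (rotation, scaling, affine) it only holds approximately and must be estimated.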

  6. Design of representations
     But what happens with more complex transformations, like affine ones?
     HOG(y) → ? → HOG(hy)

  7. Design of representations
     What happens with more complex representations, like CNNs?
     Invariance of CNN representations was studied in [Goodfellow et al. 09] and [Zeiler, Fergus 13].
     CNN(y) → ? → CNN(hy)
     Contribution: transformations in CNNs

  8. Representation properties
     Equivariance? How does a representation ρ reflect image transformations?

  9. When are two representations the same?
     Learning representations means that there is an endless number of them:
     variants obtained by learning on different datasets, or from different local optima.
     CNN-A computes ρ, CNN-B computes ρ′.
     Equivalence: ρ′(y) = F(ρ(y))

  10. Representation properties
      Equivariance? How does a representation ρ reflect image transformations?
      Equivalence? Do different representations ρ and ρ′ have different meanings?

  11. Finding equivariance empirically
      Regularized linear regression: find M_h such that ρ(hy) ≃ M_h ρ(y).
      With an affine map M_h = (A_h, b_h): ρ(hy) ≃ A_h ρ(y) + b_h (learned empirically)

  12. Finding equivariance empirically
      Convolutional structure: restrict A_h to a spatial permutation followed by
      convolution with a 1 ⨉ 1 filter bank, so that ρ(hy) ≃ A_h ρ(y) + b_h
      can be learned empirically with far fewer parameters.
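The regression on the previous slide can be sketched in a few lines. This is a toy reconstruction, not the authors' code: feature pairs are simulated as random vectors, and an affine map M_h = (A_h, b_h) is fit by ridge regression, with the bias absorbed by appending a constant-1 column.

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, lam = 16, 500, 0.1            # feature dim, number of samples, ridge weight

# Simulated feature pairs: rows of F are rho(y_i), rows of G are rho(h y_i).
A_true = rng.standard_normal((D, D)) / np.sqrt(D)
b_true = rng.standard_normal(D)
F = rng.standard_normal((N, D))
G = F @ A_true.T + b_true

# Ridge regression with the bias b_h absorbed by a constant-1 column.
F1 = np.hstack([F, np.ones((N, 1))])
W = np.linalg.solve(F1.T @ F1 + lam * np.eye(D + 1), F1.T @ G)
A_h, b_h = W[:D].T, W[D]

residual = np.max(np.abs(F @ A_h.T + b_h - G))
print(residual)  # small on this noiseless toy problem
```

On real HOG or CNN features the fit is only approximate, and the slide's convolutional restriction on A_h (permutation + 1 ⨉ 1 filter bank) replaces this dense solve.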

  13. Finding equivariance empirically
      HOG features, rotation 45°
      [Figure: example images y and hy, their HOG features, and the predictions M_h ρ(y) ≃ ρ(hy).]

  14. Finding equivariance empirically
      HOG features, rotation 45°: features inverted with MIT HOGgles [Vondrick et al. 13]
      [Figure: HOGgles reconstructions of ρ(hy) and of the predictions M_h ρ(y).]

  15. Finding equivariance empirically
      HOG features, 1.25⨉ upscale: features inverted with MIT HOGgles [Vondrick et al. 13]
      [Figure: HOGgles reconstructions of ρ(hy) and of the predictions M_h ρ(y).]

  16. Equivariance of representations: findings
      Transformations ▶ scaling, rotation, flipping, translation
      Equivariant representations ▶ HOG

  17. Finding equivariance empirically: CNN case
      We run the same analysis on a typical CNN architecture:
      AlexNet [Krizhevsky et al. 12] ▶ 5 convolutional layers + fully-connected layers ▶ trained on ImageNet ILSVRC
      [Diagram: y → conv1 … conv5 → FC → "dog"; ρ = convolutional layers, ω = fully-connected layers, z = label.]

  18. Learning mappings empirically: CNN case
      [Diagram: the features ρ(y) taken after an early layer are fed into a copy of the
      remaining layers, which are trained against the classification loss ℓ with label z.]

  19. Learning mappings empirically: CNN case
      The transformed image hy is passed through the first layers to obtain ρ(hy);
      the map M_h^{-1} is inserted at that depth (after conv3 in this example) and
      M_h^{-1} ρ(hy) is fed to the remaining layers (conv4, conv5, FC), trained with
      the classification loss ℓ against label z. M_h^{-1} is learned empirically.

  20. Results: vertical flip
      [Bar chart: Top-5 error [%], 0–60, for M_h^{-1} inserted after conv1 … conv5.
      Bars: original classifier without the transformation; original classifier on
      flipped images; stitched network with M_h^{-1} before training; after training.]

  21. Equivariance of representations: findings
      Transformations ▶ scaling, rotation, flipping, translation
      Equivariant representations ▶ HOG ▶ early convolutional layers in CNNs
      Equivariant to a lesser degree ▶ deeper convolutional layers in CNNs

  22. Representation properties
      Equivariance? How does a representation ρ reflect image transformations?
      Equivalence? Do different representations ρ and ρ′ have different meanings?

  23. Equivalence: CNN transplantation crash course
      AlexNet [Krizhevsky et al. 12], same training data, different parametrization:
      CNN-A computes ρ, CNN-B computes ρ′.
      Are ρ and ρ′ equivalent features?

  24. Equivalence: CNN transplantation crash course
      Same training data, different parametrization.
      A stitching layer (a linear convolution F) maps CNN-A features into CNN-B:
      CNN-A layers 1–4 → F → CNN-B layer 5 + FC → classification loss ℓ, label z.
      Train F with SGD.
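A minimal sketch of what a stitching layer computes (a toy numpy stand-in, not the paper's SGD-trained implementation, and all shapes here are hypothetical): a 1 ⨉ 1 convolution is one linear map applied to the channel vector at every spatial location, so stitching amounts to learning a single C_B ⨉ C_A matrix F (bias omitted).

```python
import numpy as np

def stitch_1x1(features_a, F):
    """Apply a 1x1 'convolution' F (C_B x C_A) to a C_A x H x W feature map."""
    C_A, H, W = features_a.shape
    return (F @ features_a.reshape(C_A, H * W)).reshape(-1, H, W)

rng = np.random.default_rng(2)
feats_a = rng.standard_normal((64, 7, 7))  # hypothetical CNN-A conv output
F = rng.standard_normal((96, 64))          # stitching matrix (trained with SGD in the talk; random here)
feats_b = stitch_1x1(feats_a, F)           # now shaped like a CNN-B feature map
print(feats_b.shape)                       # (96, 7, 7)
```

Because F acts identically at every spatial location, it can only re-mix channels; if the stitched network recovers the baseline error after training F, the two feature spaces are equivalent up to such a linear map.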

  25. Franken-network: stitch CNN-A → CNN-B
      Training data is the same, but the parametrization is entirely different.
      [Bar chart: Top-5 error [%], 0–100, vs. stitching depth (after conv1 … conv5).
      Bars: CNN-B baseline; stitched network before training F_{ρ→ρ′}; after training F_{ρ→ρ′}.]

  26. Equivalence of similar architectures
      Compare training on the same or different data:
      CNN-IMNET trained on the ILSVRC12 dataset, CNN-PLACES trained on the Places dataset.

  27. Franken-network: stitch CNN-PLACES → CNN-IMNET
      Now even the training sets differ.
      [Bar chart: Top-5 error [%], 0–100, vs. stitching depth (after conv1 … conv5).
      Bars: CNN-IMNET baseline; stitched network before training F_{ρ→ρ′}; after training F_{ρ→ρ′}.]

  28. Example application: structured-output pose detection
      Equivariant maps let us transform features instead of images:
      h* = argmax_{h ∈ H} ⟨x, ρ(h^{-1} y)⟩ = argmax_{h ∈ H} ⟨x, M_{h^{-1}} ρ(y)⟩
      This allows a significant speed-up at test time.
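The speed-up can be sketched as follows (everything below is a hypothetical stand-in: random features and random maps M_{h^{-1}}): compute ρ(y) once, then score every candidate pose h with a precomputed linear map instead of warping the image and re-extracting features.

```python
import numpy as np

rng = np.random.default_rng(3)
D, n_poses = 32, 10
x = rng.standard_normal(D)                    # model / template vector
rho_y = rng.standard_normal(D)                # rho(y): extracted from the image once
M_inv = rng.standard_normal((n_poses, D, D))  # one matrix M_{h^-1} per candidate pose h

# Score all poses with matrix products on features, no image warping or re-extraction:
scores = np.einsum('d,pde,e->p', x, M_inv, rho_y)  # <x, M_{h^-1} rho(y)> for each h
h_star = int(np.argmax(scores))
print(h_star)
```

The saving comes from replacing one feature extraction per pose with one (often sparse, as on slide 12) matrix-vector product per pose.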

  29. Conclusions
      Representing geometry
      ▶ Beyond invariance: equivariance
      ▶ Transforming the image results in a simple and predictable transformation of HOG and early CNN layers
      ▶ Application to accelerated structured-output regression
      Representation equivalence
      ▶ CNNs trained from different random seeds are very different, but only on the surface
      ▶ Early CNN layers are interchangeable, even between tasks
      General idea
      ▶ Study mathematical properties of representations empirically
