Metric Learning for Large-Scale Image Classification


  1. Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost. Florent Perronnin¹; work published at ECCV 2012 with Thomas Mensink¹,², Jakob Verbeek², Gabriela Csurka¹. ¹Xerox Research Centre Europe, ²INRIA. NIPS BigVision Workshop, December 7, 2012.

  2. Motivation. Real-life image datasets are always evolving: • new images are added every second • new labels, tags, faces and products appear over time • for example: Facebook, Flickr, Twitter, Amazon... We need to annotate these items for indexing and retrieval. Therefore, we are interested in methods for large-scale visual classification that can add new images and new classes on the fly, at near-zero cost.

  3. Outline: 1. Introduction 2. Distance-Based Classifiers 3. Metric Learning for the NCM Classifier 4. Experimental Evaluation 5. Conclusion

  4. Introduction. Recent focus on large-scale image classification: • ImageNet data set [1], currently over 14 million images and 20 thousand classes. Standard large-scale classification pipeline: • high-dimensional features: Super Vector [3] and Fisher Vector [4] • linear 1-vs-Rest SVM classifiers [2,3,4] • Stochastic Gradient Descent (SGD) training [3,4]. → In this work, we take the features for granted and focus on the learning problem.
  1. Deng et al., ImageNet: A large-scale hierarchical image database, CVPR'09
  2. Deng et al., What does classifying more than 10,000 image categories tell us?, ECCV'10
  3. Lin et al., Large-scale image classification: Fast feature extraction and SVM training, CVPR'11
  4. Sánchez and Perronnin, High-dimensional signature compression for large-scale image classification, CVPR'11

  5. Challenges of Open-Ended Datasets. 1-vs-Rest + SGD might look ideal for our problem: • 1-vs-Rest: classes are trained independently • SGD: an online algorithm can accommodate new data. Still, several issues need to be addressed: • Given a new sample, do we feed it to all classifiers? → costly and suboptimal [1] • How to balance the negatives and positives? • How to regularize (and choose the step size)? → We turn to distance-based classifiers.
  1. Perronnin et al., Towards good practice in large-scale learning for image classification, CVPR'12

  6. Outline: 1. Introduction 2. Distance-Based Classifiers 3. Metric Learning for the NCM Classifier 4. Experimental Evaluation 5. Conclusion

  7. Distance-Based Classifiers. Classify based on the distance between images, or between an image and class representatives: • k-Nearest Neighbors • Nearest Class Mean classification. They allow trivial addition of new images or new classes, but performance critically depends on the distance function.

  10. k-Nearest Neighbor Classifier. Assign an image to the most common class among the k closest images in the training set. ✓ Very flexible, non-linear model ✓ Easy to integrate new images ✓ Easy to integrate new classes ✗ Expensive at test time! Metric learning: Large Margin Nearest Neighbors (LMNN) [1].
  1. Weinberger et al., Distance Metric Learning for Large Margin Nearest Neighbor Classification, NIPS'06
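A minimal NumPy sketch of this rule under a learned projection W of size m × D; the function and variable names (knn_predict, X_train, ...) are illustrative assumptions, not the authors' code:

```python
import numpy as np

def knn_predict(W, X_train, y_train, x_query, k=5):
    """k-NN classification with distances measured in the projected
    space z = W x (a low-rank Mahalanobis metric).

    W: (m, D) projection, X_train: (N, D), y_train: (N,) integer labels.
    """
    Z_train = X_train @ W.T                        # project all training images: (N, m)
    z_query = W @ x_query                          # project the query: (m,)
    d2 = np.sum((Z_train - z_query) ** 2, axis=1)  # squared distances to the query
    nearest = np.argsort(d2)[:k]                   # indices of the k closest training images
    votes = np.bincount(y_train[nearest])          # count class labels among the neighbors
    return int(np.argmax(votes))                   # most common class wins
```

The ✗ above is visible in the sketch: every query requires a pass over the entire (projected) training set.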

  13. Nearest Class Mean Classifier. Assign an image x to the class with the closest class mean, where μ_c = (1/N_c) Σ_{i: y_i = c} x_i and c* = argmin_c d(x, μ_c). ✓ Very fast at test time: linear model ✓ Easy to integrate new images ✓ Easy to integrate new classes ✗ Each class is represented only by its mean: is that flexible enough? We introduce metric learning.
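A minimal sketch of this classifier in NumPy, assuming every class has at least one training image; names are illustrative:

```python
import numpy as np

def class_means(X, y, n_classes):
    """Per-class means: mu_c = (1 / N_c) * sum over {i : y_i = c} of x_i."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def ncm_predict(W, means, x):
    """Nearest class mean under the learned metric:
    c* = argmin_c ||W x - W mu_c||^2."""
    d2 = np.sum((W @ x - means @ W.T) ** 2, axis=1)  # distance to every class mean
    return int(np.argmin(d2))
```

Adding a new class only requires computing its mean and appending a row to `means`, which is what makes extension to new classes near-zero cost.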

  14. Outline: 1. Introduction 2. Distance-Based Classifiers 3. Metric Learning for the NCM Classifier 4. Experimental Evaluation 5. Conclusion

  17. Mahalanobis Distance Learning. d_M(x, x') = (x − x')^⊤ M (x − x'); with M = W^⊤ W this equals d_W(x, x') = ||W x − W x'||₂². Three choices for M:
  1. M = I: Euclidean distance • likely to be suboptimal
  2. M full (D × D): Mahalanobis distance • huge number of parameters for large D • expensive distance computation, O(D²)
  3. M = W^⊤ W with a low-rank projection W (m × D) • controllable number of parameters, m × D • allows compression of images to only m dimensions • cheap distance computation, O(m²)
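A small NumPy sketch contrasting the two forms of the distance (illustrative helper names, not the authors' code):

```python
import numpy as np

def mahalanobis_full(x, xp, M):
    """d_M(x, x') = (x - x')^T M (x - x') with a full D x D matrix M."""
    diff = x - xp
    return float(diff @ M @ diff)

def mahalanobis_low_rank(x, xp, W):
    """Equivalent distance for M = W^T W, with W of size m x D:
    d_W(x, x') = ||W x - W x'||^2, evaluated in the m-dimensional space."""
    z = W @ (x - xp)
    return float(z @ z)
```

Storing only the m-dimensional projections W x also provides the compression mentioned in point 3.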

  18. NCM Metric Learning (NCMML). Probabilistic formulation using the soft-min function: p(c | x) = exp(−d_W(x, μ_c)) / Σ_{c'=1..C} exp(−d_W(x, μ_{c'})). This corresponds to the class posterior in a generative model p(x | c) = N(x; μ_c, Σ) with a shared covariance matrix. Crucial point: the parameters W and {μ_c, c = 1, ..., C} can be learned independently, on different data subsets.
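A minimal sketch of this posterior, assuming `means` is the C × D matrix of class means and W the m × D projection; names are illustrative:

```python
import numpy as np

def ncm_posteriors(W, means, x):
    """Soft-min over projected distances:
    p(c | x) = exp(-d_W(x, mu_c)) / sum_c' exp(-d_W(x, mu_c'))."""
    d2 = np.sum((W @ x - means @ W.T) ** 2, axis=1)  # d_W(x, mu_c) for every class
    logits = -d2
    logits -= logits.max()        # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()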

  19. NCM Metric Learning (NCMML). Discriminative maximum-likelihood training: we maximize with respect to W the log-likelihood L(W) = Σ_{i=1..N} ln p(y_i | x_i). Regularization is implicit, through the rank of W. Stochastic Gradient Descent (SGD): at step t, pick a random sample (x_t, y_t) and update W^(t) = W^(t−1) + η_t ∇_W ln p(y_t | x_t), with the gradient evaluated at W^(t−1). → Mini-batches are more efficient.
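A minimal sketch of one such SGD step. Differentiating the soft-min formulation of the previous slide gives ∇_W ln p(y | x) = 2 W Σ_c (p(c | x) − [c = y]) (x − μ_c)(x − μ_c)^⊤; the single-sample form and function name below are illustrative assumptions:

```python
import numpy as np

def ncmml_sgd_step(W, means, x, y, lr):
    """One gradient-ascent step on ln p(y | x) for a single sample (x, y)."""
    diffs = x - means                          # (C, D): row c is x - mu_c
    d2 = np.sum((diffs @ W.T) ** 2, axis=1)    # d_W(x, mu_c) for every class
    p = np.exp(-(d2 - d2.min()))               # unnormalized, numerically stable
    p /= p.sum()                               # p(c | x)
    alpha = p.copy()
    alpha[y] -= 1.0                            # p(c | x) - [c == y]
    # grad = 2 W * sum_c alpha_c (x - mu_c)(x - mu_c)^T
    grad = 2.0 * W @ (diffs.T * alpha) @ diffs
    return W + lr * grad                       # ascent: we maximize the log-likelihood
```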

  21. Illustration of Learned Distances (figure).

  22. Relationship to FDA (figure): three non-linearly separable classes.

  23. Relationship to FDA (figure): Fisher Discriminant Analysis maximizes the variance between all class means.

  24. Relationship to FDA (figure): NCMML maximizes the variance between nearby class means.

  25. Relation to Other Linear Classifiers. All of these can be written as f_c(x) = b_c + w_c^⊤ x:
  • Linear SVM: learn {b_c, w_c} per class.
  • WSABIE [1]: w_c = W^⊤ v_c with a shared low-rank projection W (m × D); learn {v_c} per class and the shared W.
  • Nearest Class Mean: b_c = −½ ||W μ_c||₂², w_c = W^⊤ W μ_c; only the shared W is learned.
  1. Weston et al., Scaling up to large vocabulary image annotation, IJCAI'11
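A small sketch of how the learned NCM model collapses into per-class linear scores: the ||W x||² term is shared by all classes and can be dropped. Function names are illustrative:

```python
import numpy as np

def ncm_as_linear(W, means):
    """Per-class linear parameters for the NCM rule:
    w_c = W^T W mu_c and b_c = -0.5 * ||W mu_c||^2."""
    P = means @ W.T                      # (C, m): projected means W mu_c
    Wc = P @ W                           # (C, D): rows are w_c
    bc = -0.5 * np.sum(P ** 2, axis=1)   # (C,) biases
    return Wc, bc

def linear_predict(Wc, bc, x):
    """argmax_c (b_c + w_c^T x) reproduces the nearest-class-mean decision."""
    return int(np.argmax(Wc @ x + bc))
```

With Wc and bc precomputed, test-time scoring has the same cost as any other linear classifier.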

  26. Outline: 1. Introduction 2. Distance-Based Classifiers 3. Metric Learning for the NCM Classifier 4. Experimental Evaluation 5. Conclusion

  27. Experimental Evaluation. Data sets: • ILSVRC'10: 1,000 classes; 1.2M training, 50K validation and 150K test images • INET10K: ≈10K classes; 4.5M training, 50K validation and 4.5M test images. Features: • 4K- and 64K-dimensional Fisher Vectors [1] • PQ compression on the 64K features [2].
  1. Perronnin et al., Improving the Fisher kernel for large-scale image classification, ECCV'10
  2. Jégou et al., Product quantization for nearest neighbor search, PAMI'11

  29. Evaluation: ILSVRC'10 (top-5 accuracy, 4K Fisher Vectors). k-NN and NCM improve with metric learning; NCM outperforms the more flexible k-NN; NCM is competitive with SVM and WSABIE.

  Projection dimensionality     256    512    1024   ℓ2 (no learned metric)
  k-NN, LMNN [1], dynamic       61.0   60.9   59.6   44.1
  NCM, learned metric           62.6   63.0   63.0   32.0
  WSABIE [2]                    61.6   61.3   61.5     –
  Baseline: 1-vs-Rest SVM       61.8

  1. Weinberger et al., Distance Metric Learning for Large Margin Nearest Neighbor Classification, NIPS'06
  2. Weston et al., Scaling up to large vocabulary image annotation, IJCAI'11
