Addressing Inter-Class Similarity in Fine-Grained Visual Classification
Abhimanyu Dubey
Collaborators: Nikhil Naik, Ryan Farrell, Otkrist Gupta, Pei Guo, Ramesh Raskar
Fine-Grained Visual Classification
- Image classification with target categories that are visually very similar
- Classification within subcategories of the same larger visual category
- Examples:
  - Identifying the make and model of a vehicle
  - Identifying species of flora/fauna
How is this different from large-scale classification?
How is this different from large-scale classification?
- Foreground vs. background:
  - Diverse (large-scale) problems: background context can be relevant to foreground classification
    - e.g., we probably won't come across an image of an airplane in someone's living room
  - Fine-grained problems: the background usually varies independently of the foreground class
    - e.g., many bird species can be photographed in the same setting
How is this different from large-scale classification?
- Inter-class and intra-class diversity:
  - In large-scale classification, the average visual diversity between classes is typically much larger than the variation within samples of the same class
  - In fine-grained classification:
    - samples within a class can vary significantly with background, pose, and lighting
    - samples across classes, on average, exhibit smaller diversity due to the minute differences between the foreground objects
How is this different from large-scale classification?
[Figure: samples from the same class (Labrador Retriever) vs. samples from different classes (Norfolk Terrier vs. Cairn Terrier)]
How is this different from large-scale classification?
- Data collection is harder:
  - Domains require expert knowledge
  - Datasets are smaller on average, with too little data to train CNNs from scratch
- Data is imbalanced:
  - Large-scale classification typically has a uniform distribution of labels in the training set
  - In FGVC, some classes may be harder to photograph, giving the label distribution a heavier tail
Approaches to Fine-Grained Visual Classification
- Object parts are common across classes:
  - We can use object-part annotations to remove unwanted context
  - This removes background sensitivity, and part-based pooling introduces pose invariance [Cui et al, CVPR09]
Explicit Part Localization [Cui et al, CVPR09]
Part Alignment via co-segmentation [Krause et al, CVPR15]
Bilinear Pooling [Lin et al, ICCV15; Cui et al, ICCV17; Gao et al, CVPR16]
Our Intuition:
- Only experts among humans can identify a fine-grained class with certainty
- Typically, we would expect confusion between classes during training, rather than memorization of each sample with complete confidence
- For example:
[Figure: two example p(y|x) distributions over dog classes dog1 … dogN]
Our Hypothesis:
- Foreground objects in fine-grained samples do not have enough diversity between classes to generalize well under strongly-discriminative training (minimizing cross-entropy on the training set)
- Therefore, to reduce training error, models probably memorize samples based on non-generalizable artefacts (background, distractor objects, occlusions)
A solution: make training less discriminative [ECCV18]
- Cross-entropy forces samples from different classes to have predictions very different from each other by the end of training
- The most obvious fix: can we bring predictions from different classes closer together?
[Figure: a distance d(·, ·) between two p(y|x) distributions over dog classes dog1 … dogN]
Measuring divergence between predictions
- KL-divergence: the standard divergence between probability distributions
  - Problem: asymmetric
- Solution: consider the Jeffreys divergence, KL(p || q) + KL(q || p)
  - New problem: it grows arbitrarily large as predictions concentrate on one class
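A minimal sketch, assuming PyTorch and batched class-probability vectors, of the Jeffreys divergence; the example values illustrate how it blows up as two predictions concentrate on different classes:

```python
import torch

def jeffreys_divergence(p, q, eps=1e-8):
    """Symmetrized KL: KL(p || q) + KL(q || p).

    p, q: (batch, num_classes) probability vectors.
    """
    p = p.clamp_min(eps)
    q = q.clamp_min(eps)
    return ((p - q) * (p.log() - q.log())).sum(dim=-1)

# As the two predictions concentrate on different classes, the value
# grows without bound:
p = torch.tensor([[0.999, 0.001]])
q = torch.tensor([[0.001, 0.999]])
print(jeffreys_divergence(p, q))  # ~13.8, and -> infinity as mass -> 1
```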
Alternative: Euclidean Distance
- Symmetric
- Easy to compute
- Well-behaved: for probability vectors p and q, ||p − q||² ≤ 2, so the penalty stays bounded
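A quick check, again assuming PyTorch, that the squared Euclidean distance between probability vectors is bounded; the worst case is two one-hot vectors on different classes:

```python
import torch

def euclidean_confusion(p, q):
    """Squared Euclidean distance between probability vectors."""
    return ((p - q) ** 2).sum(dim=-1)

# Worst case: two one-hot predictions on different classes give exactly 2.0,
# in contrast to the unbounded Jeffreys divergence above.
p = torch.tensor([[1.0, 0.0, 0.0]])
q = torch.tensor([[0.0, 1.0, 0.0]])
print(euclidean_confusion(p, q))  # tensor([2.])
```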
Training Pipeline: Pairwise Confusion
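The pipeline figure is not reproduced here, but the following is a hedged sketch of one way to wire the pairwise-confusion term into a standard training step; the function name, the value of lambda_pc, and the split-the-batch-in-half pairing are illustrative assumptions, not the paper's exact recipe:

```python
import torch
import torch.nn.functional as F

def pairwise_confusion_step(model, images, labels, optimizer, lambda_pc=10.0):
    """One training step: cross-entropy plus a pairwise Euclidean-confusion term.

    Pairs are formed by splitting the batch in half; lambda_pc is an
    illustrative weight, not a value taken from the paper.
    """
    logits = model(images)
    ce = F.cross_entropy(logits, labels)

    probs = F.softmax(logits, dim=-1)
    half = probs.size(0) // 2
    p1, p2 = probs[:half], probs[half:2 * half]
    # Pull paired predictions toward each other (bounded, per the previous slide)
    pc = ((p1 - p2) ** 2).sum(dim=-1).mean()

    loss = ce + lambda_pc * pc
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```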
Results: Fine-Grained Classification
- We take baseline models and train them with our modified objective
- "Basic" models (ResNets, Inception, DenseNets): average improvement of 4.5% in top-1 accuracy across 6 datasets
- "Fine-grained" models (Bilinear Pooling, Spatial Transformer Networks): average improvement of 1.9% in top-1 accuracy across 6 datasets (a 4.5× larger relative improvement)
- Training time and the learning-rate schedule are unchanged
- Performance varies only slightly with the choice of hyperparameter
Results: Large-Scale vs. Fine-Grained
- We want to measure how much "low visual diversity" matters for the benefit of weakly-discriminative training
- We subsampled all the dog classes from ImageNet (116 classes, ~117K samples) and compared performance on this subset with a similarly sized random subsample of ImageNet
- We obtained an average top-1 improvement of around 2.7% on the dog subset, but only 0.18% on the random subset
Results: Robustness to Distractors
- We compared the overlap between the heatmaps returned by Grad-CAM on our models and the ground-truth object annotations:
[Figure: Grad-CAM heatmap overlap with object annotations]
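As a rough sketch of the kind of overlap measurement involved, assuming NumPy arrays, a thresholded heatmap, and a binary object mask (the 0.5 threshold is an assumption, not the paper's protocol):

```python
import numpy as np

def heatmap_overlap(heatmap, object_mask, threshold=0.5):
    """IoU between a thresholded Grad-CAM heatmap and a ground-truth mask.

    heatmap: (H, W) array with values in [0, 1]
    object_mask: (H, W) boolean array from the object annotation
    """
    pred = heatmap >= threshold
    inter = np.logical_and(pred, object_mask).sum()
    union = np.logical_or(pred, object_mask).sum()
    return inter / union if union > 0 else 0.0
```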
A lot was left to be desired:
- Not really "principled": many different formulations can be derived from the intuition of "weakly" discriminative training
- How does this objective affect generalization performance?
- Can we quantify the notion of visual diversity?
Entropy and Weakly Discriminative Training [NeurIPS18]
- We would like the probability distribution learnt during training to have the weakest discriminatory power possible while still predicting the correct class
- More formally, for the distribution p(y|x), we would like it to have the maximum entropy possible while ensuring that MODE(p(y|x)) = training label
- However, directly enforcing a mode-alignment constraint is non-differentiable, so we relax this constraint and minimize cross-entropy instead
- Additionally, we wish to maximize the entropy, giving the objective

$$\mathcal{L} \;=\; \underbrace{\mathbb{E}_{(x,y)}\!\left[-\log p(y \mid x)\right]}_{\text{cross-entropy}} \;-\; \gamma\, \underbrace{H\!\left(p(\cdot \mid x)\right)}_{\text{entropy}}$$
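A minimal sketch of this objective, assuming PyTorch; gamma is an illustrative hyperparameter, not a recommended value:

```python
import torch
import torch.nn.functional as F

def max_entropy_loss(logits, labels, gamma=0.1):
    """Cross-entropy minus a weighted entropy of the predicted distribution.

    Minimizing this fits the labels while keeping p(y|x) as high-entropy
    (weakly discriminative) as possible; gamma is illustrative.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, labels)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    return ce - gamma * entropy
```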
Maximum-Entropy and Generalization: Analysis
Preliminaries:
- Since we are performing a fine-tuning task, we assume the pre-trained feature map $\phi(x)$ to be a multivariate mixture of $m$ (unknown and possibly very large) Gaussians for any data distribution $p_{\mathcal{X}}$:

$$\phi(x) \sim \sum_{i=1}^{m} \pi_i\, \mathcal{N}(\mu_i, \Sigma_i)$$
Maximum-Entropy and Generalization: Analysis
Preliminaries:
- Under this assumption, the variance of the feature space, given by the overall covariance matrix $\Sigma^*$, characterizes the variation of the features under the data distribution.
Maximum-Entropy and Generalization: Diversity
Maximum-Entropy and Generalization: Diversity
To see how well this measure of diversity characterizes fine-grained problems, we look at the spread of features projected onto the top-2 eigenvectors for the ImageNet training set (red) and the CUB-200-2011 training set (blue), using GoogLeNet pool5 features:
[Figure: top-2 eigenvector projections of ImageNet vs. CUB-200-2011 features]
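A sketch of how such a projection can be computed, assuming PyTorch and a matrix of pre-extracted features:

```python
import torch

def top2_projection(features):
    """Project features onto the top-2 eigenvectors of their covariance.

    features: (n_samples, dim) tensor of pre-trained CNN features
    (e.g. GoogLeNet pool5). Returns (n_samples, 2) coordinates, the kind
    of 2-D spread visualized on the slide.
    """
    centered = features - features.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (features.size(0) - 1)  # overall covariance
    eigvals, eigvecs = torch.linalg.eigh(cov)             # eigenvalues ascending
    return centered @ eigvecs[:, -2:]                     # top-2 eigenvectors
```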