Laplacian Regularized Few-Shot Learning (LaplacianShot)
Imtiaz Masud Ziko, Jose Dolz, Eric Granger and Ismail Ben Ayed
ETS Montreal
Overview
- Few-Shot Learning: What and why? Brief discussion on existing approaches.
- Proposed LaplacianShot: The context; proposed formulation; optimization; proposed algorithm.
- Experiments: Experimental setup; SOTA results on 5 different few-shot benchmarks.
Few-Shot Learning (an example)
- Given C = 5 classes, each class c having 1 example.
- From these examples, learn a model to classify a query image (5-way 1-shot).
Few-Shot Learning
Humans recognize perfectly with only a few examples.
Few-Shot Learning
❏ Modern ML methods generalize poorly from few examples.
❏ We need a better way.
Few-Shot Learning
A very large body of recent works, mostly based on the meta-learning framework.
Meta-Learning Framework
- Training set with enough labeled data (base classes, different from the test classes) to learn an initial model.
Meta-Learning Framework
- Create episodes and do episodic training to learn a meta-learner.
Vinyals et al. (NeurIPS '16), Snell et al. (NeurIPS '17), Sung et al. (CVPR '18), Finn et al. (ICML '17), Ravi et al. (ICLR '17), Lee et al. (CVPR '19), Hu et al. (ICLR '20), Ye et al. (CVPR '20), ...
Taking a few steps backward...
Recently [Chen et al., ICLR '19; Wang et al., '19; Dhillon et al., ICLR '20]: simple baselines outperform the overly convoluted meta-learning based approaches.
Baseline Framework
- No need to meta-train: simple conventional cross-entropy training.
- The approaches mostly differ during inference.
Inductive vs. Transductive Inference
Inductive: classify each query/test point independently, given the support examples.
- Vinyals et al., NeurIPS '16 (attention mechanism)
- Snell et al., NeurIPS '17 (nearest prototype)
Transductive: predict for all test points jointly, instead of one at a time.
- Liu et al., ICLR '19 (label propagation)
- Dhillon et al., ICLR '20 (transductive fine-tuning)
Proposed LaplacianShot
Laplacian-regularized objective, defined over:
- a latent assignment matrix for the N query samples,
- a label assignment for each query,
- and simplex constraints.
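The objective itself is an equation image lost in extraction; reconstructed from the paper's notation (N queries, C classes, prototypes m_c, embedded query features z_q), it reads approximately:

```latex
\min_{Y}\; \sum_{q=1}^{N} \mathbf{y}_q^{\top}\mathbf{a}_q
\;+\; \frac{\lambda}{2}\sum_{q,p} w(\mathbf{x}_q,\mathbf{x}_p)\,
\big\|\mathbf{y}_q-\mathbf{y}_p\big\|^{2},
\qquad
\mathbf{a}_q = \big[d(\mathbf{z}_q,\mathbf{m}_c)\big]_{c=1}^{C},
```

subject to the simplex constraints $\mathbf{y}_q \in [0,1]^{C}$ and $\mathbf{1}^{\top}\mathbf{y}_q = 1$ for each query.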
Proposed LaplacianShot
The Laplacian-regularized objective has two terms:
- Nearest-prototype classification: with the Laplacian term dropped, similar to ProtoNet (Snell '17) or SimpleShot (Wang '19).
- Laplacian regularization: well known in graph-Laplacian methods, e.g. spectral clustering (Shi '00, von Luxburg '07), SLK (Ziko '18), and SSL (Weston '12, Belkin '06).
LaplacianShot Takeaways
✓ SOTA results without bells and whistles.
✓ Simple constrained graph clustering works very well.
✓ No network fine-tuning, nor meta-learning.
✓ Model agnostic.
✓ Fast transductive inference: almost inductive time.
LaplacianShot: More Details
Proposed LaplacianShot: Nearest-Prototype Classification
Labeling according to the nearest support prototypes:
- Feature embedding from the trained network.
- The prototype can be:
  - the support example in 1-shot, or
  - the simple mean of the support examples, or
  - a weighted mean of both the support and the initially predicted query samples.
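As a rough sketch of this step (a hypothetical NumPy illustration, not the released code; `prototypes` and `nearest_prototype` are made-up names), the mean-of-supports prototype and the nearest-prototype labeling could look like:

```python
import numpy as np

def prototypes(support_feats, support_labels, num_classes):
    """Class prototypes as the mean of support embeddings.

    In 1-shot the mean reduces to the single support example; the
    weighted-mean (rectified) variant mentioned on the slide is
    omitted here for brevity.
    """
    return np.stack([
        support_feats[support_labels == c].mean(axis=0)
        for c in range(num_classes)
    ])

def nearest_prototype(query_feats, protos):
    """Label each query by its nearest prototype (squared Euclidean)."""
    # dists[q, c] = ||z_q - m_c||^2
    dists = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)
```

In 5-way 1-shot, `prototypes` simply returns the five support embeddings themselves.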
Proposed LaplacianShot: Pairwise Similarity
Laplacian regularization, well known in graph-Laplacian methods: it encourages nearby points to have similar assignments.
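The pairwise weights can be built as a simple binary kNN graph over the embedded features; the sketch below is an illustration under that assumption (the paper may use a different affinity), with `knn_affinity` a hypothetical helper:

```python
import numpy as np

def knn_affinity(feats, k=3):
    """Binary k-nearest-neighbour affinity W for the Laplacian term.

    w(x_q, x_p) = 1 if x_p is among the k nearest neighbours of x_q,
    else 0; this is the simplest choice of pairwise similarity.
    """
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-similarity
    W = np.zeros_like(d2)
    nn = np.argsort(d2, axis=1)[:, :k]    # indices of k nearest neighbours
    rows = np.repeat(np.arange(len(feats)), k)
    W[rows, nn.ravel()] = 1.0
    return W
```

Each row of `W` then has exactly `k` nonzeros, linking every query to its nearest neighbours in feature space.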
Proposed Optimization
The Laplacian-regularized objective is tricky to optimize due to:
✖ simplex/integer constraints;
✖ the Laplacian over discrete variables.
Proposed Optimization
Relaxing the integer constraints gives a convex quadratic problem, but:
✖ it requires solving for all N×C variables together;
✖ it needs extra projection steps for the simplex constraints.
Proposed Optimization
We do instead:
✓ independent and closed-form updates for each assignment variable;
✓ a concave relaxation;
✓ efficient bound optimization.
Concave Laplacian
For integer assignments, each pairwise term vanishes when the two assignments are equal; when they differ, it can be rewritten via the node degrees. Removing the constant terms leaves a term that is concave for a PSD affinity matrix.
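The equation images for these steps are lost; for integer simplex assignments, the derivation the slides walk through can be reconstructed as follows (with $w_{qp}$ the affinities and $d_q = \sum_p w_{qp}$ the degree of point $q$):

```latex
\|\mathbf{y}_q-\mathbf{y}_p\|^{2}
 = \underbrace{\mathbf{y}_q^{\top}\mathbf{y}_q}_{=1}
 + \underbrace{\mathbf{y}_p^{\top}\mathbf{y}_p}_{=1}
 - 2\,\mathbf{y}_q^{\top}\mathbf{y}_p
 = 2\big(1-\mathbf{y}_q^{\top}\mathbf{y}_p\big),
```

so the Laplacian term becomes

```latex
\frac{1}{2}\sum_{q,p} w_{qp}\,\|\mathbf{y}_q-\mathbf{y}_p\|^{2}
 = \underbrace{\sum_{q} d_q}_{\text{constant}}
 \;-\; \sum_{q,p} w_{qp}\,\mathbf{y}_q^{\top}\mathbf{y}_p ,
```

and after removing the constant, the remaining term $-\operatorname{tr}(Y^{\top} W Y)$ is concave in the relaxed $Y$ whenever $W$ is PSD.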
Concave-Convex Relaxation
Putting it all together with a convex barrier function:
● avoids extra dual variables for the simplex constraint;
● yields closed-form updates.
Bound Optimization
First-order approximation of the concave term, with the unary term fixed.
Bound Optimization
Iteratively optimizing this approximation yields an iterative tight upper bound on the objective.
Bound Optimization
The upper bound decomposes into independent terms, one per assignment variable; minimizing it via the KKT conditions brings closed-form updates.
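Assuming the softmax-style closed form that the KKT conditions of such entropy-barrier bounds typically produce, one update step might be sketched as follows (`laplacianshot_update` is a hypothetical helper, not the released code):

```python
import numpy as np

def laplacianshot_update(A, W, Y, lam):
    """One closed-form bound-optimization step.

    A[q, c] : unary cost d(z_q, m_c);  W : pairwise affinities;
    Y : current relaxed assignments;   lam : Laplacian weight.
    Each row update is independent: linearize the concave term at
    the current Y, then the bound minimizer is a per-row softmax.
    """
    logits = -A + lam * (W @ Y)                   # linearized concave term
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    Y_new = np.exp(logits)
    return Y_new / Y_new.sum(axis=1, keepdims=True)
```

Note that with `lam = 0` this reduces to a plain softmax over negative prototype distances, i.e. nearest-prototype classification.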
LaplacianShot Algorithm
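A minimal end-to-end sketch of the inference loop, under the same assumptions as above (squared-Euclidean unaries, a given affinity `W`, softmax bound updates); `laplacian_shot` is an illustrative name, not the repository's API:

```python
import numpy as np

def laplacian_shot(query_feats, protos, W, lam=1.0, iters=20):
    """Sketch of LaplacianShot inference: unary costs from prototype
    distances, then iterated closed-form bound updates over the
    affinity graph W, and a final hard argmax labeling."""
    A = ((query_feats[:, None, :] - protos[None, :, :]) ** 2).sum(-1)
    Y = np.exp(-A)
    Y /= Y.sum(axis=1, keepdims=True)             # initial soft assignments
    for _ in range(iters):
        logits = -A + lam * (W @ Y)               # bound-optimization step
        logits -= logits.max(axis=1, keepdims=True)
        Y = np.exp(logits)
        Y /= Y.sum(axis=1, keepdims=True)
    return Y.argmax(axis=1)
```

No gradients and no fine-tuning are involved: the loop is a handful of matrix products per iteration, which is what makes the transductive inference almost as fast as inductive.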
Experiments: Datasets
Generic classification:
1. mini-ImageNet: 64 base, 16 validation, and 20 test classes.
2. tiered-ImageNet: 351 base, 97 validation, and 160 test classes.
Fine-grained classification:
3. CUB-200-2011: 100 base, 50 validation, and 50 test classes.
4. iNat.
Experiments: Evaluation Protocol (mini-ImageNet, tiered-ImageNet, CUB)
- 5-way 1-shot / 5-shot.
- 15 query samples per class (N = 75).
- Average accuracy over 10,000 few-shot tasks, with 95% confidence intervals.
Experiments: iNat
- More realistic and challenging; recently introduced (Wertheimer & Hariharan, 2019).
- Slight class distinction.
- Imbalanced class distribution, with a variable number of support/query samples per class.
Experiments: Evaluation Protocol (iNat)
- 227-way multi-shot.
- Top-1 accuracy averaged over the test images per class.
- Top-1 accuracy averaged over all the test images (mean).
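The two top-1 metrics can be computed as follows (a small illustrative helper, `top1_metrics`, not from the paper's code):

```python
import numpy as np

def top1_metrics(preds, labels):
    """Mean top-1 over all test images, and top-1 averaged per class.

    The per-class average down-weights frequent classes, which matters
    for imbalanced test sets such as iNat.
    """
    mean_acc = (preds == labels).mean()
    per_class = np.mean([
        (preds[labels == c] == c).mean()
        for c in np.unique(labels)
    ])
    return mean_acc, per_class
```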
Experiments
We do cross-entropy training on the base classes, then apply LaplacianShot during inference.
Results (mini-ImageNet)
Results (tiered-ImageNet)
Results (CUB): Cross-Domain
Results (iNat)
Ablation: Choosing λ
Ablation: Convergence
Ablation: Average Inference Time (Transductive)
LaplacianShot Takeaways
✓ SOTA results without bells and whistles.
✓ Simple constrained graph clustering works very well.
✓ No network fine-tuning, nor meta-learning.
✓ Model agnostic: apply during inference with any training model and gain up to 4-5%!
✓ Fast transductive inference: almost inductive time.
Thank you
Code at: https://github.com/imtiazziko/LaplacianShot