Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks
Micah Goldblum, Steven Reich, Liam Fowl, Renkun Ni, Valeriia Cherepanova, Tom Goldstein
University of Maryland, College Park, Maryland, USA
goldblum@umd.edu
August 14, 2020
A Brief Synopsis

What is the difference between meta-learned and classically trained networks?
• Meta-learners which fix the feature extractor during fine-tuning perform clustering in feature space.
• Improve the performance of classical training for few-shot problems by encouraging feature-space clustering.
• Relate Reptile to consensus optimization and improve its performance by enforcing a consensus penalty.
Meta-Learning for Few-Shot Classification

Algorithm 1: The meta-learning framework
1: Require: base model F_θ, fine-tuning algorithm A, learning rate γ, and distribution over tasks p(T)
2: Initialize θ, the weights of F
3: while not done do
4:   Sample batch of tasks {T_i}_{i=1}^n, where T_i ∼ p(T) and T_i = (T_i^s, T_i^q)
5:   for i = 1, ..., n do
6:     Fine-tune model on T_i (inner loop); new network parameters are written θ_i = A(θ, T_i^s)
7:     Compute gradient g_i = ∇_θ L(F_{θ_i}, T_i^q)
8:   end for
9:   Update base model parameters (outer loop): θ ← θ − γ ∑_i g_i
10: end while
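Algorithm 1 can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's implementation: the base model is linear with squared loss, the fine-tuning algorithm A is a few SGD steps on the support set, and the query gradient is taken at the adapted parameters (a first-order simplification of g_i = ∇_θ L(F_{θ_i}, T_i^q), in the spirit of first-order MAML). The task distribution (random linear regression tasks) and all hyperparameters are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(theta, X, y):
    """Squared loss of the linear model f(x) = x @ theta, and its gradient."""
    resid = X @ theta - y
    return 0.5 * np.mean(resid ** 2), X.T @ resid / len(y)

def fine_tune(theta, X_s, y_s, alpha=0.1, steps=5):
    """Inner loop A: a few SGD steps on the support set T^s."""
    for _ in range(steps):
        _, g = loss_and_grad(theta, X_s, y_s)
        theta = theta - alpha * g
    return theta

# Outer loop of Algorithm 1 (first-order variant): sample a batch of tasks,
# fine-tune on each support set, then step on the averaged query gradients.
theta = np.zeros(3)           # base model parameters
gamma = 0.05                  # outer-loop learning rate
for step in range(100):
    grads = []
    for _ in range(4):        # batch of n = 4 tasks
        w = rng.normal(size=3)                       # task-specific ground truth
        X_s, X_q = rng.normal(size=(10, 3)), rng.normal(size=(10, 3))
        y_s, y_q = X_s @ w, X_q @ w                  # support / query sets
        theta_i = fine_tune(theta, X_s, y_s)         # inner loop: θ_i = A(θ, T^s)
        _, g_i = loss_and_grad(theta_i, X_q, y_q)    # query gradient g_i
        grads.append(g_i)
    theta = theta - gamma * np.mean(grads, axis=0)   # outer update
```

The key structural point is the two nested loops: the inner loop adapts a copy of θ per task, while the outer loop updates the shared initialization from query-set gradients.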
Meta-Learning for Few-Shot Classification

• Meta-learning methods mainly differ in the fine-tuning procedure.
• MAML: SGD to fine-tune all network parameters [Finn et al. 2017].
• R2-D2: Ridge regression on the one-hot labels (only fine-tune last linear layer) [Bertinetto et al. 2018].
• MetaOptNet: Differentiable solver for SVM (only fine-tune last linear layer) [Lee et al. 2019].
• ProtoNet: Nearest neighbors with class prototypes (only fine-tune last layer) [Snell et al. 2017].
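The R2-D2-style head above has a convenient closed form: with the feature extractor frozen, fitting the last linear layer by ridge regression on one-hot labels is just W = (XᵀX + λI)⁻¹ XᵀY. A minimal sketch, assuming toy Gaussian-blob "features" and an illustrative `ridge_head` helper (not the paper's or R2-D2's actual code):

```python
import numpy as np

def ridge_head(feats, labels, n_classes, lam=1.0):
    """Closed-form ridge regression of one-hot labels on frozen features.

    Solves (X^T X + lam*I) W = X^T Y for the d x C last-layer weights W.
    """
    Y = np.eye(n_classes)[labels]                       # one-hot targets
    A = feats.T @ feats + lam * np.eye(feats.shape[1])
    return np.linalg.solve(A, feats.T @ Y)

def predict(W, feats):
    """Classify by the largest regression score."""
    return (feats @ W).argmax(axis=1)

# Toy 2-way task: two well-separated Gaussian blobs standing in for the
# frozen feature extractor's outputs on a support set.
rng = np.random.default_rng(1)
f0 = rng.normal(loc=-2.0, size=(20, 8))
f1 = rng.normal(loc=+2.0, size=(20, 8))
feats = np.vstack([f0, f1])
labels = np.array([0] * 20 + [1] * 20)

W = ridge_head(feats, labels, n_classes=2)
acc = (predict(W, feats) == labels).mean()
```

Because the solve is a differentiable linear-algebra operation, gradients of the query loss can flow through it back into the feature extractor during the outer loop, which is what makes this fine-tuning procedure usable inside Algorithm 1.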
Meta-Learned Feature Extractors Are Better for Few-Shot Classification

Table 1: Comparison of meta-learning and classical transfer learning models on 5-way 1-shot mini-ImageNet. Column headers denote the fine-tuning algorithm used for evaluation.

Model                  SVM    RR     ProtoNet  MAML
MetaOptNet-Meta        62.64  60.50  51.99     55.77
MetaOptNet-Classical   56.18  55.09  41.89     46.39
R2-D2-Meta             51.80  55.89  47.89     53.72
R2-D2-Classical        48.39  48.29  28.77     44.31

• Meta-learned models perform better than models of the same architecture trained with SGD.
• Meta-learned models are not simply well-tuned for their own fine-tuning algorithm.
Clustering in Feature Space

Hypothesis: meta-learning algorithms which fix the feature extractor during the inner loop cluster each class around a point.
• Visualize feature clustering.
• Measure feature clustering.
• Sufficient condition for good few-shot classification.
• Clustering regularizers improve few-shot performance.
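One simple way to quantify this kind of clustering is the ratio of within-class to between-class feature variance: tight clusters around per-class points give a small ratio. The sketch below is an illustrative measure in this spirit, not the paper's exact statistic or regularizer; the toy "tight" and "loose" feature sets are assumptions for demonstration.

```python
import numpy as np

def clustering_score(feats, labels):
    """Within-class variance divided by between-class variance.

    Lower values mean each class is concentrated around its own
    mean (prototype) relative to the spread between class means.
    """
    mu = feats.mean(axis=0)                      # global feature mean
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        fc = feats[labels == c]
        mu_c = fc.mean(axis=0)                   # class prototype
        within += ((fc - mu_c) ** 2).sum()       # scatter around prototype
        between += len(fc) * ((mu_c - mu) ** 2).sum()
    return within / between

# Two synthetic feature sets with the same class means but different spread.
rng = np.random.default_rng(0)
labels = np.array([0] * 50 + [1] * 50)
tight = np.vstack([rng.normal(-3, 0.5, (50, 4)), rng.normal(3, 0.5, (50, 4))])
loose = np.vstack([rng.normal(-3, 3.0, (50, 4)), rng.normal(3, 3.0, (50, 4))])
```

A measure like this can also be added to the training loss as a penalty, which is the route the synopsis describes for bringing feature-space clustering into classical training.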