LEEP: A New Measure to Evaluate Transferability of Learned Representations


  1. LEEP: A New Measure to Evaluate Transferability of Learned Representations
  Cuong V. Nguyen (Amazon Web Services), Tal Hassner* (Facebook AI), Matthias Seeger (Amazon Web Services), Cedric Archambeau (Amazon Web Services)
  * Work done prior to joining Facebook AI
  Correspondence to: nguycuo@amazon.com

  2. Problem: transferability estimation
  Estimating how easy it is to transfer knowledge from one classification task to another.
  ◮ Given a pre-trained source model and a target data set
  ◮ Develop a measure (a score) of how effectively knowledge can be transferred from the source model to the target data
  ◮ The transferability measure should be easy and cheap to compute → ideally without any training

  3. Why do we need transferability estimation?
  ◮ Help understand the relationships/structures between tasks
  ◮ Select groups of highly transferable tasks for joint training
  ◮ Select good source models for transfer learning
  ◮ Potentially reduce training data size and training time

  4. Our contributions
  ◮ We develop a novel transferability measure, the Log Expected Empirical Prediction (LEEP), for deep networks
  ◮ Properties of LEEP:
    ◮ Very simple
    ◮ Clear interpretation: the average log-likelihood of the expected empirical predictor
    ◮ Easy to compute: no training needed, only one forward pass through the target data set
    ◮ Can be applied to most modern deep networks

  5. Log Expected Empirical Prediction (LEEP) (1)
  ◮ Assume a source model θ and a target data set D = {(x_1, y_1), ..., (x_n, y_n)}
  ◮ We compute the LEEP score between θ and D in 3 steps.
  1. Apply θ to each input x_i to get the dummy label distribution θ(x_i).
    ◮ θ(x_i) is a distribution over the source label set Z
    ◮ Labels in Z may not semantically relate to the true label y_i of x_i, e.g., Z is the ImageNet label set but (x_i, y_i) comes from CIFAR
  2. Compute the empirical conditional distribution of the target label y given the dummy source label z (sketched in code below):
    Empirical joint distribution: $\hat{P}(y, z) = \frac{1}{n} \sum_{i : y_i = y} \theta(x_i)_z$
    Empirical marginal distribution: $\hat{P}(z) = \sum_{y} \hat{P}(y, z)$
    Empirical conditional distribution: $\hat{P}(y \mid z) = \hat{P}(y, z) / \hat{P}(z)$
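
To make step 2 concrete, here is a minimal NumPy sketch (an illustration, not the authors' reference code); the function and variable names are assumptions:

```python
import numpy as np

def empirical_conditional(theta_x, y, num_target_classes):
    """Step 2 of LEEP: the empirical conditional distribution P_hat(y | z).

    theta_x: (n, |Z|) array whose row i is the dummy label distribution theta(x_i)
             over the source label set Z (softmax outputs of one forward pass).
    y:       (n,) integer array of target labels in {0, ..., num_target_classes - 1}.
    """
    n, num_source_classes = theta_x.shape
    # Empirical joint distribution: P_hat(y, z) = (1/n) * sum over {i : y_i = y} of theta(x_i)_z
    P_joint = np.zeros((num_target_classes, num_source_classes))
    for cls in range(num_target_classes):
        P_joint[cls] = theta_x[y == cls].sum(axis=0) / n
    # Empirical marginal distribution: P_hat(z) = sum over y of P_hat(y, z)
    P_marginal = P_joint.sum(axis=0)
    # Empirical conditional distribution: P_hat(y | z) = P_hat(y, z) / P_hat(z)
    return P_joint / P_marginal  # shape (num_target_classes, |Z|)
```

Because each row of theta_x is a softmax output, every marginal P_hat(z) is strictly positive, so the final division is well defined.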

  6. Log Expected Empirical Prediction (LEEP) (2)
  Expected Empirical Predictor (EEP): a classifier that predicts the label y of an input x as follows:
  ◮ First, randomly draw a dummy label z from θ(x)
  ◮ Then, randomly draw y from $\hat{P}(y \mid z)$
  Equivalently, $y \sim \sum_{z} \hat{P}(y \mid z)\, \theta(x)_z$
  3. LEEP is the average log-likelihood of the EEP on the data D (see the sketch below):
    $T(\theta, D) = \frac{1}{n} \sum_{i=1}^{n} \log \Big( \sum_{z} \hat{P}(y_i \mid z)\, \theta(x_i)_z \Big)$
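
Putting the three steps together, the following self-contained sketch computes the LEEP score from the source model's softmax outputs on the target set (again an illustration under assumed names, not the authors' code):

```python
import numpy as np

def leep_score(theta_x, y):
    """LEEP score T(theta, D): average log-likelihood of the EEP on the target data.

    theta_x: (n, |Z|) dummy label distributions theta(x_i) from the source model.
    y:       (n,) integer target labels in {0, ..., Y - 1}.
    """
    n, _ = theta_x.shape
    num_target_classes = int(y.max()) + 1
    # Steps 1-2: empirical joint, marginal, and conditional distributions (see previous sketch).
    P_joint = np.stack([theta_x[y == c].sum(axis=0) / n for c in range(num_target_classes)])
    P_cond = P_joint / P_joint.sum(axis=0, keepdims=True)   # P_hat(y | z), shape (Y, |Z|)
    # Step 3: the EEP assigns x_i the likelihood sum over z of P_hat(y_i | z) * theta(x_i)_z;
    # LEEP is the average log of that likelihood over the target set.
    eep_likelihood = (theta_x @ P_cond.T)[np.arange(n), y]  # shape (n,)
    return np.log(eep_likelihood).mean()
```

Since the EEP likelihood is a probability, T(θ, D) is always negative; a larger (less negative) score indicates better expected transferability. For source model selection (slide 12), candidate models can simply be ranked by this score on the target data.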

  7. Experiment: overview
  ◮ Aim: show that LEEP can predict actual transfer accuracy
  ◮ Procedure:
    ◮ Consider many random transfer learning tasks
    ◮ Compute LEEP scores for these tasks
    ◮ Compute the actual test accuracy of transfer learning methods on these tasks
    ◮ Evaluate correlations between LEEP scores and the test accuracies
  ◮ Transfer methods (sketched below):
    ◮ Retrain head: only retrain the last fully connected layer using the target data set
    ◮ Fine-tune: replace the head classifier and fine-tune all model parameters with SGD
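
For concreteness, a minimal PyTorch sketch of the two transfer methods, assuming an ImageNet-pretrained ResNet18 and a 100-class target task (the setup and names are illustrative, not the exact experimental code):

```python
import torch.nn as nn
from torchvision import models

# Pre-trained source model theta (newer torchvision versions use the `weights=` argument instead).
model = models.resnet18(pretrained=True)
# Replace the head with a new classifier for the (assumed) 100-class target task.
model.fc = nn.Linear(model.fc.in_features, 100)

# "Retrain head": freeze every backbone parameter and train only the new last layer.
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

# "Fine-tune": skip the freezing loop above and update all parameters with SGD
# on the target data set (training loop not shown).
```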

  8. Experiment: LEEP vs. Transfer Accuracy
  ◮ Compare the LEEP score with the test accuracy of transferred models on 200 random target tasks
  ◮ Result: LEEP scores are highly correlated with actual test accuracies (correlation coefficients > 0.94)
  [Scatter plots: test accuracy vs. LEEP score for the fine-tune and retrain-head methods; left: ImageNet → CIFAR100 (ResNet18), right: CIFAR10 → CIFAR100 (ResNet20)]

  9. Experiment: LEEP with Small Data
  ◮ Restrict target data sets to 5 random classes and 50 examples per class
  ◮ Partition the range of LEEP scores into 5 transferability levels and average the test accuracies of tasks within each level (see the sketch below)
  ◮ Result: a higher transferability level according to LEEP → easier to transfer
  ◮ Similar results when the target data sets are imbalanced
  [Plot: average test accuracy vs. transferability level (1–5) for the fine-tune and retrain-head methods]
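
A small sketch of how such levels can be formed, assuming equal-width bins over the observed score range (the binning rule and all names here are assumptions for illustration):

```python
import numpy as np

def transferability_levels(leep_scores, num_levels=5):
    """Assign each task a level 1..num_levels by binning its LEEP score (equal-width bins)."""
    edges = np.linspace(leep_scores.min(), leep_scores.max(), num_levels + 1)
    # np.digitize maps the maximum score to bin num_levels + 1, so clip it back into range.
    return np.clip(np.digitize(leep_scores, edges), 1, num_levels)

# Example: average the test accuracy of all tasks within each level.
# levels = transferability_levels(leep_scores)
# avg_acc = [test_acc[levels == k].mean() for k in range(1, 6)]
```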

  10. Experiment: LEEP vs. Meta-Transfer Accuracy
  ◮ Compare the LEEP score with the test accuracy of Conditional Neural Adaptive Processes (CNAPs) (Requeima et al., 2019)
  ◮ CNAPs was trained on the Meta-Dataset (Triantafillou et al., 2020)
  ◮ Target tasks are drawn from CIFAR100
  ◮ Result: a higher transferability level according to LEEP → easier to meta-transfer
  [Plot: average test accuracy vs. transferability level (1–5)]

  11. Experiment: LEEP vs. Convergence of Fine-tuned Models
  ◮ Compare convergence speed against a reference model
  ◮ Reference model: trained from scratch using only the target data set
  ◮ Result: a higher transferability level according to LEEP → better convergence
  [Plots: accuracy difference to the reference model vs. training epoch (1–15) for transferability levels 1–5; left: ImageNet → CIFAR100 (ResNet18), right: CIFAR10 → CIFAR100 (ResNet20)]

  12. Experiment: LEEP for Source Model Selection
  ◮ Select from 9 candidate models and transfer to CIFAR100
  ◮ Compare with:
    ◮ Negative Conditional Entropy (NCE) (Tran et al., 2019)
    ◮ H-score (Bao et al., 2019)
    ◮ ImageNet top-1 accuracy (Kornblith et al., 2019)
  ◮ Result: LEEP predicts the test accuracies better
  [Scatter plots: CIFAR100 test accuracy of the 9 candidate architectures (ResNet18, ResNet34, ResNet50, SENet154, DarkNet53, MobileNet 0.25/0.5/0.75/1.0) vs. LEEP score, NCE score, H score, and ImageNet accuracy]

  13. Discussion
  ◮ Model selection results are very sensitive to the architecture and size of the source networks → the scores may need to be calibrated/normalized for better performance
  ◮ LEEP is potentially useful for feature selection as well
  ◮ For very small data sets, retraining the head directly with 2nd-order optimization methods could also be efficient

  14. Thank you.
  Correspondence to: nguycuo@amazon.com
