

  1. Advanced Meta-Learning: Task Construction (CS 330)

  2. Logistics: Homework 2 out, due Friday, October 16th. Project group form due Weds, October 7th (encouraged to do it early). Project proposal due & presentations on October 14th.

  3. Question of the Day: How should tasks be defined for good meta-learning performance?

  4. Plan for Today: Brief Recap of Meta-Learning & Task Construction; Memorization in Meta-Learning - when it arises - a potential solution; Meta-Learning without Tasks Provided - Unsupervised Meta-Learning - Meta-Learning from Unsegmented Task Stream (time permitting). 🚩 Disclaimer 🚩: These topics are at the bleeding edge of research. Goals for the end of lecture: - understand when & how memorization in meta-learning may occur - understand techniques for constructing tasks automatically.

  5. Recap: Black-Box Meta-Learning. [Diagram: a neural network f_θ reads D_i^tr and produces task parameters φ_i, which are used to predict y^ts from x^ts.] Key idea: parametrize the learner as a neural network. + expressive; − challenging optimization problem.
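A minimal sketch of the black-box approach, assuming an LSTM reads the support set as (input, one-hot label) pairs and its hidden state plays the role of φ_i; all names and shapes here are illustrative, not the lecture's exact architecture:

```python
import torch
import torch.nn as nn

class BlackBoxLearner(nn.Module):
    """Sketch: an RNN "learner" that ingests D_i^tr and predicts y^ts."""
    def __init__(self, x_dim, n_classes, hidden=128):
        super().__init__()
        self.n_classes = n_classes
        self.rnn = nn.LSTM(x_dim + n_classes, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x_tr, y_tr, x_ts):
        # support set: concatenate each input with its one-hot label
        y_onehot = nn.functional.one_hot(y_tr, self.n_classes).float()
        _, state = self.rnn(torch.cat([x_tr, y_onehot], dim=-1))  # state ~ phi_i
        # queries: a zero vector stands in for the unknown label
        pad = x_ts.new_zeros(x_ts.size(0), x_ts.size(1), self.n_classes)
        out, _ = self.rnn(torch.cat([x_ts, pad], dim=-1), state)
        return self.head(out)  # logits for y^ts
```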

  6. Recap: Optimization-Based Meta-Learning. [Diagram: φ_i = θ − α∇_θ L(θ, D_i^tr), then predict y^ts from x^ts.] Key idea: embed optimization inside the inner learning process. + structure of optimization embedded into meta-learner; − typically requires second-order optimization.
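A minimal sketch of the inner adaptation step in PyTorch, assuming a model, a loss function, and task data D_i^tr (all illustrative names); evaluating the test loss at φ_i then requires a functional forward pass, e.g. torch.func.functional_call in recent PyTorch:

```python
import torch

def inner_adapt(model, loss_fn, x_tr, y_tr, inner_lr=0.01):
    """Sketch: phi_i = theta - alpha * grad_theta L(theta, D_i^tr)."""
    theta = list(model.parameters())
    loss = loss_fn(model(x_tr), y_tr)
    # create_graph=True keeps the graph so the outer (meta) loss can
    # backpropagate through this update -- the second-order cost noted above
    grads = torch.autograd.grad(loss, theta, create_graph=True)
    return [p - inner_lr * g for p, g in zip(theta, grads)]
```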

  7. Recap: Non-Parametric Meta-Learning. [Diagram: embed x^ts and compare it to the embedded examples in D_i^tr.] Key idea: non-parametric learner (e.g. nearest neighbor to examples, prototypes) with parametric embedding space / distance metric. + easy to optimize, computationally fast; − largely restricted to classification.
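A minimal sketch of the prototype variant, assuming an embedding network `embed` and a labeled support set (names illustrative):

```python
import torch

def prototype_logits(embed, x_support, y_support, x_query, n_way):
    """Sketch: classify queries by distance to class-mean prototypes."""
    z_s = embed(x_support)  # [N*K, d] support embeddings
    z_q = embed(x_query)    # [Q, d]   query embeddings
    # one prototype per class: the mean embedding of its support examples
    protos = torch.stack([z_s[y_support == c].mean(0) for c in range(n_way)])
    # logits: negative squared Euclidean distance to each prototype
    return -torch.cdist(z_q, protos) ** 2
```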

  8. Recap: Task Construction Techniques. For N-way image classification: use labeled images from prior classes. For adapting to regional differences: use labeled images from prior regions (Rußwurm et al. Meta-Learning for Few-Shot Land Cover Classification. CVPR 2020 EarthVision Workshop). For few-shot imitation learning: use demonstrations for prior tasks (Yu et al. One-Shot Imitation Learning from Observing Humans. RSS 2018).

  9. Plan for Today: Brief Recap of Meta-Learning & Task Construction; Memorization in Meta-Learning - when it arises - a potential solution; Meta-Learning without Tasks Provided - Unsupervised Meta-Learning - Meta-Learning from Unsegmented Task Stream (time permitting).

  10. How we construct tasks for meta-learning. [Diagram: across tasks T_1, …, T_i, the same image classes receive different, randomly shuffled labels in each D_i^tr.] Randomly assign class labels to image classes for each task —> tasks are mutually exclusive. Algorithms must use the training data to infer the label ordering.

  11. What if label order is consistent? [Diagram: across tasks, each image class always receives the same label.] Tasks are non-mutually exclusive: a single function can solve all tasks. The network can simply learn to classify inputs, irrespective of D_i^tr.
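A minimal sketch contrasting the two constructions, assuming `images_by_class` maps class ids to lists of images (all names illustrative):

```python
import random

def make_task(images_by_class, n_way=5, k_shot=1, mutually_exclusive=True):
    """Sketch: sample an N-way, K-shot support set D_i^tr."""
    classes = random.sample(sorted(images_by_class), n_way)
    if mutually_exclusive:
        random.shuffle(classes)  # fresh, random class -> label map per task
    else:
        classes.sort()  # consistent label order: class identity fixes its label,
                        # so one function can solve all tasks and ignore D_i^tr
    return [(x, label) for label, c in enumerate(classes)
            for x in random.sample(images_by_class[c], k_shot)]
```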

  12. [Diagram: the network maps x^ts directly to its label, bypassing D_i^tr.] The network can simply learn to classify inputs, irrespective of D_i^tr.

  13. What if label order is consistent? [Diagram: at meta-test time, T_test contains new image classes.] For new image classes: can't make predictions without the training data D^tr.

  14. Is this a problem? - No: for image classification, we can just shuffle labels* - No, if we see the same image classes as training (& don't need to adapt at meta-test time) - But, yes, if we want to be able to adapt with data for new tasks.

  15. Another example. [Diagram: meta-training tasks T_1 … T_50 such as “hammer”, “close drawer”, “stack”; meta-test task T_test: “close box”.] If you tell the robot the task goal, the robot can ignore the trials. (T Yu, D Quillen, Z He, R Julian, K Hausman, C Finn, S Levine. Meta-World. CoRL '19)

  16. Another example: the model can memorize the canonical orientations of the training objects. (Yin, Tucker, Yuan, Levine, Finn. Meta-Learning without Memorization. ICLR '20)

  17. Can we do something about it?

  18. If tasks are mutually exclusive: a single function cannot solve all tasks (e.g. due to label shuffling, or hiding information). If tasks are non-mutually exclusive: a single function can solve all tasks, i.e. y^ts = f_θ(D_i^tr, x^ts) has multiple solutions to the meta-learning problem. One solution: memorize canonical pose info in θ & ignore D_i^tr. Another solution: carry no info about canonical pose in θ, acquire it from D_i^tr. There is an entire spectrum of solutions based on how information flows. This suggests a potential approach: control information flow. (Yin, Tucker, Yuan, Levine, Finn. Meta-Learning without Memorization. ICLR '20)

  19. If tasks are non-mutually exclusive, a single function can solve all tasks: y^ts = f_θ(D_i^tr, x^ts) has multiple solutions to the meta-learning problem. One solution: memorize canonical pose info in θ & ignore D_i^tr. Another solution: carry no info about canonical pose in θ, acquire it from D_i^tr. An entire spectrum of solutions exists based on how information flows. One option: maximize I(ŷ^ts; D^tr | x^ts). Meta-regularization: minimize the meta-training loss plus the information in θ, i.e. $\mathcal{L}(\theta, \mathcal{D}_{\text{meta-train}}) + \beta\, D_{KL}\!\big(q(\theta; \theta_\mu, \theta_\sigma) \,\|\, p(\theta)\big)$. This places precedence on using information from D^tr over storing info in θ. Can combine with your favorite meta-learning algorithm. (Yin, Tucker, Yuan, Levine, Finn. Meta-Learning without Memorization. ICLR '20)
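A schematic sketch of this objective with a Gaussian distribution over meta-parameters (mean θ_μ, log-std θ_logσ) and a standard normal prior; the names and the closed-form KL are illustrative of the idea, not the paper's exact implementation:

```python
import torch

def meta_regularized_loss(meta_train_loss, theta_mu, theta_logsigma, beta=1e-4):
    """Sketch: meta-training loss + beta * KL(q(theta) || N(0, I))."""
    # closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over all parameters
    kl = 0.5 * torch.sum(theta_mu ** 2 + torch.exp(2 * theta_logsigma)
                         - 2 * theta_logsigma - 1)
    return meta_train_loss + beta * kl  # penalizes information stored in theta
```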

  20. [Results figures: Omniglot without label shuffling (“non-mutually-exclusive” Omniglot); the pose prediction task (and it's not just as simple as standard regularization).] TAML: Jamal & Qi. Task-Agnostic Meta-Learning for Few-Shot Learning. CVPR '19. Yin, Tucker, Yuan, Levine, Finn. Meta-Learning without Memorization. ICLR '20.

  21. Does meta-regularization lead to better generalization? Let P(θ) be an arbitrary distribution over θ that doesn't depend on the meta-training data (e.g. P(θ) = 𝒩(θ; 0, I)). For MAML, with probability at least 1 − δ, for all θ_μ, θ_σ: generalization error ≤ error on the meta-training set + a meta-regularization term. With a Taylor expansion of the RHS + a particular value of β —> recover the MR-MAML objective. Proof: draws heavily on Amit & Meir '18. (Yin, Tucker, Yuan, Levine, Finn. Meta-Learning without Memorization. ICLR '20)
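For intuition, here is the generic PAC-Bayes shape that bounds of this family build on (illustrative only, not the paper's exact statement): with probability at least 1 − δ over n meta-training tasks, for all q(θ; θ_μ, θ_σ),

```latex
\mathbb{E}_{q}\!\left[\mathcal{L}_{\text{test}}\right]
  \;\le\;
\mathbb{E}_{q}\!\left[\hat{\mathcal{L}}_{\text{meta-train}}\right]
  \;+\;
\sqrt{\frac{D_{KL}\!\left(q(\theta;\theta_\mu,\theta_\sigma)\,\middle\|\,p(\theta)\right) + \log\frac{n}{\delta}}{2(n-1)}}
```

Taylor-expanding the square-root term and choosing a particular β turns the right-hand side into a loss-plus-β·KL objective of the MR-MAML form.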

  22. Summary of the Memorization Problem.
      meta-learning: meta overfitting; memorize training functions f_i corresponding to tasks in your meta-training dataset; meta-regularization controls information flow / regularizes description length of meta-parameters.
      standard supervised learning: standard overfitting; memorize training datapoints (x_i, y_i) in your training dataset; standard regularization regularizes hypothesis class (though not always for DNNs).

  23. Plan for Today: Brief Recap of Meta-Learning & Task Construction; Memorization in Meta-Learning - when it arises - a potential solution; Meta-Learning without Tasks - Unsupervised Meta-Learning - Meta-Learning from Unsegmented Task Stream (time permitting).

  24. Where do tasks come from? Requires labeled data from other regions (Rußwurm et al. Meta-Learning for Few-Shot Land Cover Classification. 2020). What if we only have unlabeled data? Few-shot meta-learning from: unlabeled images, unlabeled text.

  25. A general recipe for unsupervised meta-learning: given unlabeled dataset(s), propose tasks, then run meta-learning. Goal of unsupervised meta-learning methods: automatically construct tasks from unlabeled data. Question: what do you want the task set to look like? (answer in chat or raise hand) 1. diverse (more likely to cover test tasks); 2. structured (so that few-shot meta-learning is possible). Next: task construction from unlabeled image data, then task construction from unlabeled text data.

  26. Can we meta-learn with only unlabeled images? Task construction: unsupervised learning (to get an embedding space) —> propose cluster discrimination tasks —> run meta-learning. [Diagram: embedded unlabeled points grouped into clusters that serve as class 1, class 2, ….] Result: a representation suitable for learning downstream tasks. (Hsu, Levine, Finn. Unsupervised Learning via Meta-Learning. ICLR '19)
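A minimal sketch of cluster-discrimination task construction in this spirit, assuming `embeddings` is an [N, d] array from any unsupervised embedding method; the real CACTUs procedure uses multiple k-means partitions, so this is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def make_cluster_task(embeddings, n_way=5, k_shot=5, n_clusters=50, seed=0):
    """Sketch: cluster unlabeled embeddings, treat cluster ids as task labels."""
    rng = np.random.default_rng(seed)
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10,
                         random_state=seed).fit_predict(embeddings)
    # only clusters big enough to supply k_shot examples can act as classes
    valid = [c for c in range(n_clusters) if (cluster_ids == c).sum() >= k_shot]
    chosen = rng.choice(valid, size=n_way, replace=False)
    task = []
    for label, c in enumerate(chosen):
        idx = rng.choice(np.flatnonzero(cluster_ids == c), size=k_shot,
                         replace=False)
        task.extend((int(i), label) for i in idx)  # (image index, task label)
    return task
```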

  27. Can we meta-learn with only unlabeled images? Clustering to Automatically Construct Tasks for Unsupervised Meta-Learning (CACTUs). A few options for the embedding: BiGAN (Donahue et al. '17), DeepCluster (Caron et al. '18); for the meta-learner: MAML (Finn et al. '17), ProtoNets (Snell et al. '17).
      miniImageNet 5-way 5-shot accuracy:
      method                    | accuracy
      MAML with labels          | 62.13%
      BiGAN kNN                 | 31.10%
      BiGAN logistic            | 33.91%
      BiGAN MLP + dropout       | 29.06%
      BiGAN cluster matching    | 29.49%
      BiGAN CACTUs MAML         | 51.28%
      DeepCluster CACTUs MAML   | 53.97%
      Same story for: 4 different embedding methods; 4 datasets (Omniglot, CelebA, miniImageNet, MNIST); 2 meta-learning methods*; test tasks with larger datasets. *ProtoNets underperforms in some cases. (Hsu, Levine, Finn. Unsupervised Learning via Meta-Learning. ICLR '19)

  28. Can we use domain knowledge when constructing tasks? E.g. an image's label often won't change when you: drop out some pixels; translate the image; reflect the image. Task construction, for each task i: (i) randomly sample N images from the unlabeled set 𝒰 & assign labels 1, …, N —> store in D_i^tr; (ii) for each datapoint in D_i^tr, augment the image using domain knowledge —> store in D_i^ts. (Khodadadeh, Bölöni, Shah. Unsupervised Meta-Learning for Few-Shot Image Classification. NeurIPS '19)
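A minimal sketch of this augmentation-based construction, assuming a list of unlabeled images and an `augment` function encoding the domain knowledge above (crop/translate/reflect); names are illustrative:

```python
import random

def make_augmentation_task(unlabeled_images, augment, n_way=5):
    """Sketch: N sampled images act as N one-shot 'classes'; queries are augmentations."""
    sampled = random.sample(unlabeled_images, n_way)
    support = [(x, label) for label, x in enumerate(sampled)]         # D_i^tr
    query = [(augment(x), label) for label, x in enumerate(sampled)]  # D_i^ts
    return support, query
```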
