

  1. The Meta-Learning Problem & Black-Box Meta-Learning CS 330

  2. Logistics Homework 1 posted today, due Wednesday, September 30 Project guidelines will be posted by tomorrow.

  3. Plan for Today
     Transfer Learning
     - Problem formulation
     - Fine-tuning
     Meta-Learning
     - Problem formulation
     - General recipe of meta-learning algorithms  } Topic of Homework 1!
     - Black-box adaptation approaches
     - Case study of GPT-3 (time-permitting)
     Goals for the end of lecture:
     - Differences between multi-task learning, transfer learning, and meta-learning problems
     - Basics of transfer learning via fine-tuning
     - Training set-up for few-shot meta-learning algorithms
     - How to implement black-box meta-learning techniques

  4. Multi-Task Learning vs. Transfer Learning
     Multi-Task Learning: tasks 𝒯_1, ⋯, 𝒯_T. Solve multiple tasks at once:
         min_θ Σ_{i=1}^T ℒ_i(θ, 𝒟_i)
     Transfer Learning: solve target task 𝒯_b after solving source task 𝒯_a, by transferring knowledge learned from 𝒯_a.
     Key assumption: cannot access data 𝒟_a during transfer.
     Transfer learning is a valid solution to multi-task learning (but not vice versa).
     Question: What are some problems/applications where transfer learning might make sense? (answer in chat or raise hand)
     - when you don't care about solving 𝒯_a & 𝒯_b simultaneously
     - when 𝒟_a is very large (don't want to retain & retrain on 𝒟_a)
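To make the multi-task objective above concrete, here is a minimal Python sketch (the `model`, per-task batches, and per-task loss functions are hypothetical placeholders, not from the slides):

```python
import torch

def multitask_loss(model, task_batches, loss_fns):
    """min_theta sum_i L_i(theta, D_i): one shared model (theta),
    one loss term per task, summed before each gradient step."""
    total = torch.zeros(())
    for (x_i, y_i), loss_i in zip(task_batches, loss_fns):
        total = total + loss_i(model(x_i), y_i)
    return total
```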

  5. Transfer learning via fine-tuning
     φ ← θ − α ∇_θ ℒ(θ, 𝒟^tr)
     θ: parameters pre-trained on 𝒟_a;  𝒟^tr: training data for new task 𝒯_b (typically run for many gradient steps)
     Where do you get the pre-trained parameters?
     - ImageNet classification
     - Models trained on large language corpora (BERT, LMs)
     - Other unsupervised learning techniques
     - Whatever large, diverse dataset you might have
     Pre-trained models often available online.
     Some common practices
     - Fine-tune with a smaller learning rate
     - Smaller learning rate for earlier layers
     - Freeze earlier layers, gradually unfreeze
     - Reinitialize last layer
     - Search over hyperparameters via cross-validation
     - Architecture choices matter (e.g. ResNets)
     What makes ImageNet good for transfer learning? Huh, Agrawal, Efros. '16
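A minimal PyTorch sketch of several of these practices (ResNet-18, the layer choices, and the learning rate are illustrative assumptions, not prescribed by the slide):

```python
import torch
from torchvision import models

num_new_classes = 10  # illustrative: size of task T_b's label space

# theta: parameters pre-trained on a large source dataset (ImageNet)
model = models.resnet18(pretrained=True)

# Reinitialize the last layer for the new task's label space
model.fc = torch.nn.Linear(model.fc.in_features, num_new_classes)

# Freeze earlier layers (gradually unfreeze them later in training)
for name, p in model.named_parameters():
    if not name.startswith("fc"):
        p.requires_grad = False

# Fine-tune with a smaller learning rate than pre-training used
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```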

  6. Universal Language Model Fine-Tuning for Text Classification. Howard, Ruder. '18
     Fine-tuning doesn't work well with small target task datasets.
     This is where meta-learning can help.

  7. Plan for Today Transfer Learning - Problem formulation - Fine-tuning Meta-Learning - Problem formulation - General recipe of meta-learning algorithms - Black-box adaptation approaches - Case study of GPT-3 (time-permitting)

  8. The Meta-Learning Problem Statement (that we will consider in this class)

  9. Two ways to view meta-learning algorithms
     Mechanistic view
     ➢ Deep network that can read in an entire dataset and make predictions for new datapoints
     ➢ Training this network uses a meta-dataset, which itself consists of many (small) datasets, each for a different task
     Probabilistic view
     ➢ Extract prior knowledge from a set of tasks that allows efficient learning of new tasks
     ➢ Learning a new task uses this prior and a (small) training set to infer the most likely posterior parameters
     Today: Focus primarily on the mechanistic view. (Bayes will come back later)

  10. How does meta-learning work? An example.
      Given 1 example of each of 5 classes (training data): classify new examples (test set).
      [figure: a 1-shot, 5-way image classification task, showing the training data and test set]

  11. How does meta-learning work? An example.
      Given 1 example of each of 5 classes: classify new examples.
      [figure: meta-training, with many tasks built from the training classes, and meta-testing on a held-out task 𝒯_test with its own training data and test set]
      Can replace image classification with any ML problem: regression, language generation, skill learning, …

  12. The Meta-Learning Problem
      Given data from 𝒯_1, …, 𝒯_n, quickly solve new task 𝒯_test.
      Key assumption: meta-training tasks and the meta-test task are drawn i.i.d. from the same task distribution: 𝒯_1, …, 𝒯_n ∼ p(𝒯), 𝒯_test ∼ p(𝒯)
      Like before, tasks must share structure.
      What do the tasks correspond to?
      - recognizing handwritten digits from different languages (see homework 1!)
      - spam filter for different users
      - classifying species in different regions of the world
      - a robot performing different tasks
      How many tasks do you need? The more the better. (analogous to more data in ML)

  13. Some terminology
      𝒟_i^tr: task training set ("support set")
      𝒟_i^test: task test dataset ("query set")
      k-shot learning: learning with k examples per class (or k examples total, for regression)
      N-way classification: choosing between N classes
      Question: What are k and N for the above example? (answer in chat)
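A sketch of how an N-way, k-shot episode might be assembled from this terminology (the `images_by_class` mapping and query size `q` are hypothetical; the exact data pipeline is the subject of Homework 1):

```python
import random

def sample_episode(images_by_class, N=5, k=1, q=5):
    """Build one N-way, k-shot episode: a support set D_tr with k
    examples per class and a query set D_test with q per class."""
    classes = random.sample(list(images_by_class), N)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(images_by_class[cls], k + q)
        support += [(x, label) for x in examples[:k]]
        query += [(x, label) for x in examples[k:]]
    return support, query
```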

  14. Problem Settings Recap
      Multi-Task Learning: tasks 𝒯_1, ⋯, 𝒯_T. Solve multiple tasks at once:  min_θ Σ_{i=1}^T ℒ_i(θ, 𝒟_i)
      Transfer Learning: solve target task 𝒯_b after solving source task 𝒯_a, by transferring knowledge learned from 𝒯_a.
      The Meta-Learning Problem: given data from 𝒯_1, …, 𝒯_n, quickly solve new task 𝒯_test.
      In transfer learning and meta-learning: generally impractical to access prior tasks.
      In all settings: tasks must share structure.

  15. Plan for Today Transfer Learning - Problem formulation - Fine-tuning Meta-Learning - Problem formulation - General recipe of meta-learning algorithms - Black-box adaptation approaches - Case study of GPT-3 (time-permitting)

  16. General recipe: How to evaluate a meta-learning algorithm
      The Omniglot dataset (Lake et al. Science 2015): 1623 characters from 50 different alphabets, 20 instances of each character.
      Many classes, few examples (the "transpose" of MNIST); statistics more reflective of the real world.
      Proposes both few-shot discriminative & few-shot generative problems.
      Initial few-shot learning approaches used Bayesian models and non-parametrics: Fei-Fei et al. '03, Lake et al. '11, Salakhutdinov et al. '12, Lake et al. '13
      Other datasets used for few-shot image recognition: tieredImageNet, CIFAR, CUB, CelebA, others
      Other benchmarks: molecular property prediction (Nguyen et al. '20), object pose prediction (Yin et al. ICLR '20)

  17. Another View on the Meta-Learning Problem
      Supervised Learning:       Inputs: x        Outputs: y    Data: 𝒟 = {(x, y)_j}
      Meta Supervised Learning:  Inputs: 𝒟^tr, x  Outputs: y    Data: {𝒟_i}, where each 𝒟_i = {(x, y)_j}
      Why is this view useful? Reduces the meta-learning problem to the design & optimization of h, the function that reads 𝒟^tr and x and outputs y.
      Finn. Learning to Learn with Gradients. PhD Thesis 2018
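In code, this view says a meta-learner is just a function h_θ that takes a task training set 𝒟^tr and a new input x and returns a prediction. A minimal sketch (the LSTM-based architecture and all dimensions are illustrative assumptions, not the slide's specific model):

```python
import torch
import torch.nn as nn

class BlackBoxMetaLearner(nn.Module):
    """h_theta: reads an entire task training set D_tr, then
    predicts labels for new inputs x."""
    def __init__(self, x_dim, n_way, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(x_dim + n_way, hidden, batch_first=True)
        self.head = nn.Linear(hidden + x_dim, n_way)

    def forward(self, support_x, support_y, query_x):
        # support_x: (B, K, x_dim); support_y: one-hot (B, K, n_way)
        # Concatenate (x, y) pairs and summarize D_tr into a task vector.
        pairs = torch.cat([support_x, support_y], dim=-1)
        _, (h, _) = self.encoder(pairs)
        task_vec = h[-1]                                   # (B, hidden)
        # Condition each query prediction on the task summary.
        K_q = query_x.shape[1]
        task_rep = task_vec.unsqueeze(1).expand(-1, K_q, -1)
        return self.head(torch.cat([task_rep, query_x], dim=-1))
```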

  18. General recipe: How to design a meta-learning algorithm
      1. Choose a form of h_θ(𝒟^tr, x)  (θ: the meta-parameters)
      2. Choose how to optimize θ w.r.t. the max-likelihood objective using the meta-training data
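Putting the two steps together, a sketch of the meta-training loop (using the hypothetical BlackBoxMetaLearner from the previous slide; `sample_task_batch` is an assumed tensorized version of the earlier `sample_episode`):

```python
import torch
import torch.nn.functional as F

n_way, num_meta_train_steps = 5, 10_000               # illustrative
model = BlackBoxMetaLearner(x_dim=784, n_way=n_way)   # step 1: form of h_theta
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(num_meta_train_steps):
    # Sample a meta-training task and split it into D_tr / D_test.
    support_x, support_y, query_x, query_y = sample_task_batch()
    # Step 2: optimize theta by maximizing the likelihood of the
    # D_test labels given D_tr, i.e. minimizing query cross-entropy.
    logits = model(support_x, support_y, query_x)
    loss = F.cross_entropy(logits.reshape(-1, n_way), query_y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```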

  19. Plan for Today Transfer Learning - Problem formulation - Fine-tuning Meta-Learning - Problem formulation - General recipe of meta-learning algorithms - Black-box adaptation approaches - Case study of GPT-3 (time-permitting)
