Optimization-Based Meta-Learning (CS 330)


  1. Optimization-Based Meta-Learning (CS 330)

  2. Course Reminders: HW1 due next Wednesday (9/30). Project guidelines posted — start forming groups & formulating ideas. Guest lecture by Matt Johnson on Monday!

  3. Plan for Today: Recap of the meta-learning problem & black-box meta-learning. Optimization-Based Meta-Learning (part of Homework 2!): overall approach; comparison of optimization-based vs. black-box; challenges & solutions; case study of land cover classification (time-permitting). Goals for the end of lecture: basics of optimization-based meta-learning techniques (& how to implement them); trade-offs between black-box and optimization-based meta-learning.

  4. Problem Settings Recap. Multi-task learning: solve multiple tasks 𝒯_1, ..., 𝒯_T at once, min_θ Σ_{i=1}^T ℒ_i(θ, 𝒟_i). Transfer learning: solve target task 𝒯_b after solving source task 𝒯_a, by transferring knowledge learned from 𝒯_a. The meta-learning problem: given data from 𝒯_1, ..., 𝒯_n, quickly solve a new task 𝒯_test. In transfer learning and meta-learning it is generally impractical to access data from prior tasks. In all settings, the tasks must share structure.

  5. Example Meta-Learning Problem: 5-way, 1-shot image classification (MiniImagenet). Given 1 example of each of 5 classes, classify new examples. Meta-training uses the training classes; evaluation uses held-out classes. Image classification can be replaced with any ML problem: regression, language generation, skill learning, ... (a small episode-sampling sketch follows below).
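
For concreteness, here is a tiny sketch of how a 5-way, 1-shot episode could be sampled. The `class_to_images` dictionary layout and parameter names are assumptions for illustration, not the MiniImagenet loader used in the homework.

```python
import random

def sample_episode(class_to_images, n_way=5, k_shot=1, k_query=15):
    """Sample one N-way, K-shot episode from a dict {class_name: [examples, ...]}."""
    classes = random.sample(list(class_to_images), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(class_to_images[cls], k_shot + k_query)
        support += [(ex, label) for ex in examples[:k_shot]]   # 1 labeled example per class
        query   += [(ex, label) for ex in examples[k_shot:]]   # new examples to classify
    return support, query
```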

  6. Black-Box Adaptation. General form: y^ts = f_black-box(D^tr_i, x^ts), where the task parameters are φ_i = f_θ(D^tr_i). Pros/cons: + expressive, - challenging optimization problem. How else can we represent φ_i? What if we treat it as the result of an optimization procedure? (A toy sketch of the black-box form is below.)
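
A minimal sketch of the black-box form y^ts = f_black-box(D^tr_i, x^ts) with φ_i = f_θ(D^tr_i), written here in PyTorch. The module names, dimensions, and the mean-pooling aggregator are illustrative assumptions, not the specific black-box architecture from lecture.

```python
import torch
import torch.nn as nn

class BlackBoxAdapter(nn.Module):
    """Toy black-box meta-learner: aggregate the support set into task
    parameters phi_i, then condition predictions on phi_i."""
    def __init__(self, x_dim=16, y_dim=5, phi_dim=64):
        super().__init__()
        # f_theta: maps D_tr_i (support examples) to task parameters phi_i
        self.encoder = nn.Sequential(nn.Linear(x_dim + y_dim, 128), nn.ReLU(),
                                     nn.Linear(128, phi_dim))
        # predictor conditioned on (phi_i, x_ts)
        self.predictor = nn.Sequential(nn.Linear(phi_dim + x_dim, 128), nn.ReLU(),
                                       nn.Linear(128, y_dim))

    def forward(self, x_tr, y_tr, x_ts):
        # phi_i = f_theta(D_tr_i): encode (x, y) pairs and mean-pool over the support set
        phi = self.encoder(torch.cat([x_tr, y_tr], dim=-1)).mean(dim=0)
        # y_ts = f_black-box(D_tr_i, x_ts)
        phi_rep = phi.unsqueeze(0).expand(x_ts.shape[0], -1)
        return self.predictor(torch.cat([phi_rep, x_ts], dim=-1))
```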

  7. Plan for Today: Recap of the meta-learning problem & black-box meta-learning. Optimization-Based Meta-Learning (part of Homework 2!): overall approach; comparison of optimization-based vs. black-box; challenges & solutions; case study of land cover classification (time-permitting).

  8. Black-Box Adaptation vs. Optimization-Based Adaptation. [Figure: black-box adaptation, where f_θ maps D^tr_i to φ_i, which predicts y^ts from x^ts.]

  9. Black-Box Adaptation vs. Optimization-Based Adaptation. [Figure: in the optimization-based version, φ_i is produced by ∇_θ L on D^tr_i rather than by a black-box network.] Key idea: embed optimization inside the inner learning process. Why might this make sense?

  10. Recall: Fine-tuning. Start from pre-trained parameters θ and compute φ ← θ − α ∇_θ L(θ, D^tr), where D^tr is the training data for the new task (typically for many gradient steps). Universal Language Model Fine-Tuning for Text Classification, Howard & Ruder '18. Fine-tuning is less effective with very small datasets. (A minimal fine-tuning sketch follows below.)
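
A minimal sketch of the fine-tuning update φ ← θ − α ∇_θ L(θ, D^tr) in PyTorch, assuming a generic pre-trained `model` and loss function; the function name and defaults are illustrative.

```python
import torch

def fine_tune(model, loss_fn, x_tr, y_tr, alpha=1e-3, steps=100):
    """Standard fine-tuning: start from pre-trained parameters theta and take
    (typically many) gradient steps on the new task's small training set."""
    opt = torch.optim.SGD(model.parameters(), lr=alpha)
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x_tr), y_tr)   # L(theta, D_tr)
        loss.backward()
        opt.step()                          # theta <- theta - alpha * grad_theta L
    return model                            # fine-tuned parameters phi
```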

  11. Optimization-Based Adaptation. Fine-tuning (test-time): φ ← θ − α ∇_θ L(θ, D^tr), starting from pre-trained parameters θ and the new task's training data. Meta-learning: min_θ Σ_{task i} L(θ − α ∇_θ L(θ, D^tr_i), D^ts_i). Key idea: over many tasks, learn a parameter vector θ that transfers via fine-tuning. Finn, Abbeel, Levine. Model-Agnostic Meta-Learning. ICML 2017.
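
Written out cleanly, the inner update and the meta-objective from this slide are:

```latex
% One inner gradient step yields task-adapted parameters:
\phi_i = \theta - \alpha \,\nabla_\theta \mathcal{L}\!\left(\theta, \mathcal{D}^{\mathrm{tr}}_i\right)
% MAML meta-objective: learn an initialization \theta such that the
% adapted parameters do well on each task's held-out data:
\min_\theta \sum_{\text{task } i} \mathcal{L}\!\left(\theta - \alpha \,\nabla_\theta \mathcal{L}\!\left(\theta, \mathcal{D}^{\mathrm{tr}}_i\right),\; \mathcal{D}^{\mathrm{ts}}_i\right)
```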

  12. Optimization-Based Adaptation: min_θ Σ_{task i} L(θ − α ∇_θ L(θ, D^tr_i), D^ts_i). Here θ is the parameter vector being meta-learned, and φ*_i is the optimal parameter vector for task i. Model-Agnostic Meta-Learning (MAML). Finn, Abbeel, Levine. Model-Agnostic Meta-Learning. ICML 2017.

  13. Optimization-Based Adaptation. Key idea: acquire φ_i through optimization. General algorithm: 1. Sample task T_i (or a mini-batch of tasks). 2. Sample disjoint datasets D^tr_i, D^test_i from D_i. 3. Black-box approach: compute φ_i ← f_θ(D^tr_i); optimization-based approach: optimize φ_i ← θ − α ∇_θ L(θ, D^tr_i). 4. Update θ using ∇_θ L(φ_i, D^test_i). This brings up second-order derivatives. Do we need to compute the full Hessian? (-> whiteboard) Do we get higher-order derivatives with more inner gradient steps? (A PyTorch-style sketch of this loop follows below.)
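
A minimal PyTorch-style sketch of the optimization-based version of steps 1-4, for a toy regression setting. `sample_task` is a hypothetical helper, the architecture and sizes are illustrative, and `create_graph=True` is what retains the second-order terms discussed above.

```python
import torch

def forward(params, x):
    """Tiny 2-layer network with explicit parameters so we can plug in phi_i."""
    w1, b1, w2, b2 = params
    return torch.relu(x @ w1 + b1) @ w2 + b2

theta = [(torch.randn(1, 40) * 0.1).requires_grad_(),   # meta-learned initialization
         torch.zeros(40, requires_grad=True),
         (torch.randn(40, 1) * 0.1).requires_grad_(),
         torch.zeros(1, requires_grad=True)]
meta_opt = torch.optim.Adam(theta, lr=1e-3)
alpha, mse = 0.01, torch.nn.functional.mse_loss

for iteration in range(1000):
    meta_opt.zero_grad()
    for _ in range(4):                                    # 1. sample a mini-batch of tasks
        x_tr, y_tr, x_ts, y_ts = sample_task()            # 2. disjoint D_tr_i, D_ts_i (hypothetical helper)
        # 3. inner step: phi_i = theta - alpha * grad_theta L(theta, D_tr_i)
        grads = torch.autograd.grad(mse(forward(theta, x_tr), y_tr),
                                    theta, create_graph=True)  # keep second-order terms
        phi = [t - alpha * g for t, g in zip(theta, grads)]
        # 4. outer loss L(phi_i, D_ts_i), backpropagated through the inner step
        mse(forward(phi, x_ts), y_ts).backward()
    meta_opt.step()                                        # update theta
```

Dropping `create_graph=True` gives the first-order approximation, which is cheaper but ignores the Hessian-vector terms in the meta-gradient.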

  14. Plan for Today: Recap of the meta-learning problem & black-box meta-learning. Optimization-Based Meta-Learning (part of Homework 2!): overall approach; comparison of optimization-based vs. black-box; challenges & solutions; case study of land cover classification (time-permitting).

  15. Optimization vs. Black-Box Adaptation. Black-box adaptation, general form: y^ts = f_black-box(D^tr_i, x^ts). Model-agnostic meta-learning: MAML can be viewed as a computation graph with an embedded gradient operator. Note: can mix & match components of the computation graph, e.g. learn the initialization but replace the gradient update with a learned network f(θ, D^tr_i, ∇_θ L). Ravi & Larochelle, ICLR '17 (actually precedes MAML). This computation-graph view of meta-learning will come back again! (A hedged sketch of the learned-update idea follows below.)
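
A hedged sketch of the mix-and-match idea: keep a learned initialization, but let a small learned network produce the inner update from (θ, ∇_θ L) instead of a plain gradient step. This is only in the spirit of Ravi & Larochelle's learned optimizer; `UpdateNet` and its per-parameter MLP form are illustrative assumptions, not their exact LSTM meta-learner.

```python
import torch
import torch.nn as nn

class UpdateNet(nn.Module):
    """Learned per-parameter update: maps (theta_j, grad_j) -> delta_j.
    Illustrative stand-in for a learned inner-loop optimizer."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, theta_flat, grad_flat):
        inp = torch.stack([theta_flat, grad_flat], dim=-1)   # (num_params, 2)
        return self.net(inp).squeeze(-1)                      # learned update per parameter

def learned_inner_step(theta, grads, update_net):
    # phi_i = f(theta, D_tr_i, grad_theta L): replace "- alpha * grad" with a learned update
    return [t + update_net(t.reshape(-1), g.reshape(-1)).view_as(t)
            for t, g in zip(theta, grads)]
```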

  16. Optimization vs. Black-Box Adaptation. How well can learning procedures generalize to similar, but extrapolated tasks? [Figure: Omniglot image classification; performance vs. task variability for MAML compared to SNAIL and MetaNetworks.] Does this structure come at a cost? Finn & Levine, ICLR '18.
