

  1. Parameter-Efficient Transfer Learning for NLP N. Houlsby, A. Giurgiu*, S. Jastrzębski*, B. Morrone, Q. de Laroussilhe, A. Gesmundo, M. Attariyan, S. Gelly

  2. Imagine doing Transfer Learning for NLP. Ingredients: ● a large pretrained model (BERT) ● fine-tuning

  3. Imagine doing Transfer Learning for NLP. Ingredients: ● a large pretrained model (BERT) ● fine-tuning. Diagram: one fine-tuned copy of BERT per task (BERT → Task 1, BERT → Task 2, …, BERT → Task N) — a problem for large N.

  4. Imagine doing Transfer Learning for NLP. Ingredients: ● a large pretrained model (BERT) ● fine-tuning. Diagram: a single shared BERT plus one small per-task module (BERT + Adapter 1 → Task 1, …, BERT + Adapter N → Task N).

  5. BERT + Adapters. Solution: train tiny adapter modules at each layer.


  8. BERT + Adapters. Each adapter is a bottleneck: it projects activations down to a small dimension and back up, with a skip connection.
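The bottleneck adapter on this slide can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' code; the sizes `d` and `m` and the near-identity initialisation scheme shown here are assumptions chosen so the module initially passes activations through unchanged.

```python
import numpy as np

def adapter_forward(h, W_down, b_down, W_up, b_up):
    """Bottleneck adapter: down-project, nonlinearity, up-project,
    plus a skip connection around the whole module."""
    z = np.maximum(0.0, h @ W_down + b_down)  # ReLU in the bottleneck
    return h + z @ W_up + b_up                # residual connection

# Illustrative sizes: hidden dimension d, bottleneck dimension m << d.
d, m = 16, 4
rng = np.random.default_rng(0)
h = rng.standard_normal((2, d))  # a batch of two hidden states

# Near-identity initialisation: the up-projection starts at zero,
# so at the start of training the adapter is the identity map.
W_down = rng.standard_normal((d, m)) * 0.01
b_down = np.zeros(m)
W_up = np.zeros((m, d))
b_up = np.zeros(d)

out = adapter_forward(h, W_down, b_down, W_up, b_up)
```

Because only `W_down`, `b_down`, `W_up`, `b_up` are trained per task, the per-task footprint is tiny compared with fine-tuning all of BERT.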

  9. Results on GLUE Benchmark.


  13. Results on GLUE Benchmark. Plot annotations: smaller adapters give fewer parameters with similar performance; shrinking them too far gives fewer parameters but degraded performance.

  14. Results on GLUE Benchmark: a 0.4% accuracy drop for a 96.4% reduction in the number of parameters per task.
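The "96.4% reduction" means the trained parameters per task are only a few percent of the full model. A back-of-the-envelope sketch of that arithmetic, using assumed BERT-large-like sizes (hidden size 1024, 24 layers, two adapters per layer); the paper's exact accounting differs, e.g. it also trains layer-norm parameters and varies the bottleneck size:

```python
def adapter_params(d, m):
    """Parameters in one bottleneck adapter: down-projection and
    up-projection matrices plus their biases."""
    return d * m + m + m * d + d

# Illustrative, assumed sizes (not the paper's exact configuration).
d, m = 1024, 64            # hidden size, bottleneck size
layers, per_layer = 24, 2  # transformer layers, adapters per layer

per_task = adapter_params(d, m) * layers * per_layer
full_model = 340_000_000   # approximate BERT-large parameter count

print(f"adapter params/task: {per_task:,}")
print(f"fraction of full model: {per_task / full_model:.1%}")
```

Even this rough count puts the per-task parameters around 2% of the full model, the same order as the slide's 100% − 96.4% = 3.6%.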

  15. Conclusions: 1. If we move towards a single-model future, we need to improve the parameter efficiency of transfer learning. 2. We propose a module that drastically reduces the number of parameters per task for NLP, e.g. by 30× at only a 0.4% accuracy drop. Related work (@ ICML): “BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning”, A. Stickland & I. Murray. Please come to our poster today at 6:30 PM (#102).
