Meta Learning: A Brief Introduction




  1. Meta Learning: A Brief Introduction. Xiachong Feng, TG Ph.D. Student, 2018-12-01

  2. Outline
  • Introduction to Meta Learning
  • Types of Meta-Learning Models
  • Papers:
    • Optimization as a Model for Few-Shot Learning (ICLR 2017)
    • Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017)
    • Meta-Learning for Low-Resource Neural Machine Translation (EMNLP 2018)
  • Conclusion

  3. Meta-learning
  • Figure: how Meta Learning relates to Machine Learning, Deep Learning, and Reinforcement Learning (the figure's annotations are in Chinese and not recoverable in this transcript)
  • Meta Learning / Learning to learn
  • https://zhuanlan.zhihu.com/p/28639662

  4. Meta-learning
  • Learning to learn
  • Meta learning, also called Learning to Learn, as a research direction in AI (the slide's explanation is in Chinese and not recoverable in this transcript)
  • https://zhuanlan.zhihu.com/p/27629294

  5. Example
  • Machine or deep learning: the learner is a model, trained with an optimizer such as SGD/Adam
  • Meta learning: a meta-learner determines the learner's training choices, e.g. learning rate, decay, ...

  6. Types of Meta-Learning Models
  • Humans learn following different methodologies tailored to specific circumstances.
  • In the same way, not all meta-learning models follow the same techniques.
  • Types of Meta-Learning Models:
    1. Few-Shot Meta-Learning
    2. Optimizer Meta-Learning
    3. Metric Meta-Learning
    4. Recurrent Model Meta-Learning
    5. Initializations Meta-Learning
  • Source: What's New in Deep Learning Research: Understanding Meta-Learning

  7. Few-Shot Meta-Learning
  • Create models that can learn from minimalistic datasets, mimicking how humans can learn from just a few examples (learn from tiny data).
  • Papers:
    • Optimization As a Model For Few-Shot Learning (ICLR 2017)
    • One-Shot Generalization in Deep Generative Models (ICML 2016)
    • Meta-Learning with Memory-Augmented Neural Networks (ICML 2016)

  8. Optimizer Meta-Learning
  • Task: learning how to optimize a neural network to better accomplish a task.
  • There is one network (the meta-learner) which learns to update another network (the learner) so that the learner effectively learns the task.
  • Papers:
    • Learning to learn by gradient descent by gradient descent (NIPS 2016)
    • Learning to Optimize Neural Nets

  9. Metric Meta-Learning
  • The goal is to determine a metric space in which learning is particularly efficient. This approach can be seen as a subset of few-shot meta-learning: a learned metric space is used to evaluate the quality of learning from a few examples (see the sketch after this list).
  • Papers:
    • Prototypical Networks for Few-shot Learning (NIPS 2017)
    • Matching Networks for One Shot Learning (NIPS 2016)
    • Siamese Neural Networks for One-shot Image Recognition
    • Learning to Learn: Meta-Critic Networks for Sample Efficient Learning
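For concreteness, here is a small sketch in the spirit of Prototypical Networks: queries are classified by their distance to class prototypes in the learned metric space. The embedding network `embed` and all names are illustrative assumptions, not code from the cited papers.

```python
# Hedged sketch of metric meta-learning (Prototypical Networks style).
import torch

def prototypical_logits(embed, support_x, support_y, query_x, n_classes):
    """Classify queries by distance to class prototypes in a learned metric space."""
    z_support = embed(support_x)                 # [N*K, D] embeddings of the support set
    z_query = embed(query_x)                     # [Q, D]   embeddings of the query set
    prototypes = torch.stack([
        z_support[support_y == c].mean(dim=0)    # prototype = mean embedding of class c
        for c in range(n_classes)
    ])                                           # [N, D]
    # negative squared Euclidean distance serves as the classification score
    return -torch.cdist(z_query, prototypes) ** 2
```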

  10. Recurrent Model Meta-Learning
  • The meta-learner algorithm trains an RNN model that processes a dataset sequentially and then processes new inputs from the task.
  • Papers:
    • Meta-Learning with Memory-Augmented Neural Networks
    • Learning to reinforcement learn
    • RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning

  11. Initializations Meta-Learning
  • Optimize for an initial representation that can be effectively fine-tuned from a small number of examples.
  • Papers:
    • Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017)
    • Meta-Learning for Low-Resource Neural Machine Translation (EMNLP 2018)
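To make this concrete, here is a minimal MAML-style sketch in PyTorch, assuming a model compatible with torch.func.functional_call. The task format, loss_fn, and hyperparameters are illustrative placeholders, not the paper's reference implementation.

```python
# Minimal sketch of initializations meta-learning (MAML-style inner/outer loop).
import torch
from torch.func import functional_call

def maml_step(model, meta_opt, tasks, loss_fn, inner_lr=0.01):
    meta_loss = 0.0
    for (x_tr, y_tr), (x_te, y_te) in tasks:             # each task = (support set, query set)
        params = dict(model.named_parameters())
        # inner loop: one gradient step on the support set (fast adaptation)
        loss = loss_fn(functional_call(model, params, (x_tr,)), y_tr)
        grads = torch.autograd.grad(loss, params.values(), create_graph=True)
        adapted = {name: p - inner_lr * g
                   for (name, p), g in zip(params.items(), grads)}
        # outer loss: evaluate the adapted parameters on the query set
        meta_loss = meta_loss + loss_fn(functional_call(model, adapted, (x_te,)), y_te)
    meta_opt.zero_grad()
    meta_loss.backward()      # meta-gradient flows back to the shared initialization
    meta_opt.step()
```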

  12. Papers
  • Optimization As a Model For Few Shot Learning (ICLR 2017): Few-Shot, Recurrent Model, Optimizer, Initializations, and Supervised Meta-Learning
  • Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017): Modern Meta Learning
  • Meta-Learning for Low-Resource Neural Machine Translation (EMNLP 2018): Meta Learning in NLP

  13. Optimization As a Model For Few Shot Learning
  • Twitter; Sachin Ravi, Hugo Larochelle; ICLR 2017
  • Categories: Few-Shot Meta-Learning, Recurrent Model Meta-Learning, Optimizer Meta-Learning, Supervised Meta Learning, Initializations Meta-Learning

  14. Few-Shot Learning
  • Given a tiny labelled training set D with only k examples, D = {(x_1, y_1), ..., (x_k, y_k)}
  • In a classification problem: k-shot learning
    • N classes
    • k labelled examples per class (k is always less than 20)
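As a concrete illustration, a minimal sketch of sampling one N-way, k-shot episode. The `dataset` argument (a mapping from class label to a list of examples) and all names are assumptions made for this sketch.

```python
# Hedged sketch of N-way, k-shot episode construction.
import random

def sample_episode(dataset, n_way=5, k_shot=1, n_query=15):
    """Sample one few-shot episode: a support set and a query set over N classes."""
    classes = random.sample(list(dataset.keys()), n_way)
    support, query = [], []
    for new_label, cls in enumerate(classes):
        examples = random.sample(dataset[cls], k_shot + n_query)
        support += [(x, new_label) for x in examples[:k_shot]]   # k labelled examples per class
        query   += [(x, new_label) for x in examples[k_shot:]]   # held-out examples
    return support, query
```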

  15. LSTM Cell-State Update
  • new cell state = old cell state, keeping what the forget gate lets through (forgetting the things we decided to forget earlier), plus the new candidate values scaled by the input gate:
  • c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
  • https://www.jianshu.com/p/9dc9f41f0b29

  16. Supervised Learning
  • Train a neural network NN to learn f(x) → y, e.g. image → label
  • Optimizer: SGD, Adam, ...

  17. Meta Learning
  • Meta-learning suggests framing the learning problem at two levels. (Thrun, 1998; Schmidhuber et al., 1997)
  • The first is quick acquisition of knowledge within each separate task presented. (Fast adaptation)
  • This process is guided by the second, which involves slower extraction of information learned across all the tasks. (Learning)

  18. Motivation
  • Deep learning has shown great success in a variety of tasks with large amounts of labeled data.
  • Gradient-based optimization (momentum, Adagrad, Adadelta, ADAM) of high-capacity classifiers requires many iterative steps over many examples to perform well.
  • Such training starts from a random initialization of the parameters.
  • As a result, it performs poorly on few-shot learning tasks.
  • Is there an optimizer that can finish the optimization task using just a few examples?

  19. Method
  • Key observation: the gradient-based parameter update has the same form as the LSTM cell-state update.
  • Propose an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network classifier in the few-shot regime.
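The analogy restated as standard formulas (these are the standard forms, written out here rather than copied from the slides):

```latex
% Gradient-descent update for the learner's parameters
\theta_t = \theta_{t-1} - \alpha_t \,\nabla_{\theta_{t-1}} \mathcal{L}_t
% LSTM cell-state update
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
% The two coincide when
c_t = \theta_t, \qquad f_t = 1, \qquad i_t = \alpha_t, \qquad \tilde{c}_t = -\nabla_{\theta_{t-1}} \mathcal{L}_t
```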

  20. Method
  • The LSTM-based meta-learner is an optimizer that is trained to optimize a learner neural network classifier.
  • Learner: the neural network classifier.
  • Meta-learner: learns the optimization algorithm, i.e. knows how to quickly optimize the learner's parameters.
  • Inputs at step t: the current parameters θ_{t-1} and the gradient ∇_{θ_{t-1}} ℒ.
  • Gradient-based optimization: θ_t = θ_{t-1} - α_t ∇_{θ_{t-1}} ℒ
  • Meta-learner optimization: θ_t = metalearner(θ_{t-1}, ∇_{θ_{t-1}} ℒ)
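A minimal sketch of what such a learned update rule could look like, assuming per-coordinate gates computed from (θ_{t-1}, ∇ℒ, ℒ). The paper's actual meta-learner is an LSTM and differs in detail; class and variable names here are illustrative.

```python
# Hedged sketch of a learned update rule generalizing gradient descent.
import torch
import torch.nn as nn

class MetaLearnerCell(nn.Module):
    def __init__(self):
        super().__init__()
        # gates are computed per parameter coordinate from (theta, grad, loss)
        self.input_gate  = nn.Linear(3, 1)
        self.forget_gate = nn.Linear(3, 1)

    def forward(self, theta, grad, loss):
        feats = torch.stack([theta, grad, loss.expand_as(theta)], dim=-1)  # [P, 3]
        i_t = torch.sigmoid(self.input_gate(feats)).squeeze(-1)   # learned "learning rate"
        f_t = torch.sigmoid(self.forget_gate(feats)).squeeze(-1)  # learned "forget" gate
        # theta_t = f_t * theta_{t-1} - i_t * grad  (SGD is the special case f_t = 1, i_t = alpha)
        return f_t * theta - i_t * grad
```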

  21. Model
  • Figure: the meta-learner model; the loss and the gradient fed in at each step are given by the learner.

  22. Task Description
  • An episode consists of D_train (used to train the learner) and D_test (used to train the meta-learner).

  23. Training
  • Example: 5 classes, 1-shot learning
  • D_train, D_test ← a random episode drawn from D_meta-train
  • Inner loop: the learner (a neural network classifier with parameters θ_{t-1}) computes the loss ℒ and gradient ∇_{θ_{t-1}} ℒ on D_train; the meta-learner (parameters Θ_{d-1}), which has learned the optimization algorithm, takes these together with θ_{t-1} and outputs the updated learner parameters θ_t.
  • Outer loop: after the learner has been updated, its loss ℒ_test on D_test is used to update the meta-learner: Θ_d = Θ_{d-1} − α ∇_{Θ_{d-1}} ℒ_test
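A hedged sketch of one such meta-training iteration, reusing the illustrative pieces above (sample_episode and the meta-learner cell) plus a hypothetical learner_loss helper that evaluates the classifier with a given parameter vector. This follows the slide's flow, not the authors' code.

```python
# Hedged sketch of one meta-training iteration (5-way, 1-shot as in the slide).
import torch

def meta_train_step(learner, meta_learner, meta_opt, meta_train_set, n_inner_steps=5):
    d_train, d_test = sample_episode(meta_train_set, n_way=5, k_shot=1)
    theta = meta_learner.initial_theta()           # c_0 = theta_0, a learned initialization
    for _ in range(n_inner_steps):
        loss = learner_loss(learner, theta, d_train)            # hypothetical helper
        grad, = torch.autograd.grad(loss, theta, create_graph=True)
        theta = meta_learner(theta, grad, loss)    # theta_t = metalearner(theta_{t-1}, grad)
    test_loss = learner_loss(learner, theta, d_test)
    meta_opt.zero_grad()
    test_loss.backward()      # Theta_d = Theta_{d-1} - alpha * gradient of the test loss
    meta_opt.step()
```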

  24. Initializations Meta-Learning
  • Initial value of the cell state: c_0
  • Initial weights of the classifier: θ_0
  • c_0 = θ_0
  • Learning this initial value lets the meta-learner determine the optimal initial weights of the learner.
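A small sketch of how c_0 = θ_0 can be exposed as a learnable parameter, so it is trained together with the rest of the meta-learner; names and shapes are illustrative, and this is what the initial_theta() call in the training sketch above would return.

```python
# Hedged sketch: the initial cell state c_0 (= the learner's initial weights theta_0)
# is itself a trainable parameter of the meta-learner.
import torch
import torch.nn as nn

class LearnedInit(nn.Module):
    def __init__(self, n_learner_params):
        super().__init__()
        self.c0 = nn.Parameter(torch.randn(n_learner_params) * 0.01)  # c_0 = theta_0

    def initial_theta(self):
        return self.c0   # used as the learner's starting weights in every episode
```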

  25. Testing
  • Example: 5 classes, 1-shot learning
  • D_train, D_test ← a random episode drawn from D_meta-test
  • The learner is initialized with the learned c_0 = θ_0; at each step it computes the loss ℒ and gradient ∇_{θ_{t-1}} ℒ on D_train, and the trained meta-learner (parameters Θ) outputs the updated learner parameters θ_t.
  • Finally, the adapted learner (neural network classifier) is evaluated on D_test with the chosen metric.

  26. Training
  • Learner update
  • Meta-learner update

  27. Tricks
  • Parameter sharing
    • The meta-learner produces updates for deep neural networks that consist of tens of thousands of parameters, so some form of parameter sharing is needed to prevent an explosion of meta-learner parameters.
  • Batch normalization
    • Speeds up learning of deep neural networks by reducing internal covariate shift within the learner's hidden layers.
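A rough sketch of the parameter-sharing idea: one small LSTM is shared across all coordinates of the learner's parameter vector, so the meta-learner's size is independent of the learner's size. The gradient preprocessing and other details from the paper are omitted, and all names are illustrative.

```python
# Hedged sketch of coordinate-wise parameter sharing in the meta-learner.
import torch
import torch.nn as nn

class CoordinatewiseMetaLSTM(nn.Module):
    def __init__(self, hidden_size=20):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, grad, loss, state=None):
        # treat each of the learner's P parameters as an independent "batch" element
        inp = torch.stack([grad, loss.expand_as(grad)], dim=-1).unsqueeze(1)  # [P, 1, 2]
        h, state = self.lstm(inp, state)
        update = self.out(h).squeeze(1).squeeze(-1)   # one update value per coordinate
        return update, state
```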
