Deep multi-task learning with evolving weights

  1. Deep multi-task learning with evolving weights (ESANN 2016)
  Soufiane Belharbi, Romain Hérault, Clément Chatelain, Sébastien Adam
  soufiane.belharbi@insa-rouen.fr
  LITIS lab., DocApp team - INSA de Rouen, France
  27 April 2016

  2.–3. Context: Training deep neural networks
  Deep neural networks are appealing models (complex/hierarchical features, complex mappings) ⇒ improved performance.
  Training deep neural networks is difficult ⇒ vanishing gradient; more parameters ⇒ more data needed.
  Some solutions: the pre-training technique [Y. Bengio et al. 2006, G. E. Hinton et al. 2006]; exploiting unlabeled data.

  4.–5. Context: Semi-supervised learning
  General case: Data = {labeled data, unlabeled data}, where labeled data are expensive (money, time) and few, while unlabeled data are cheap and abundant. E.g., medical images.
  ⇒ Semi-supervised learning: exploit the unlabeled data to improve generalization.

  6. Context: Pre-training and semi-supervised learning
  The pre-training technique can exploit the unlabeled data. It is a sequential transfer learning performed in 2 steps:
  1) Unsupervised task (uses x from both labeled and unlabeled data)
  2) Supervised task (uses the labeled pairs (x, y))

  7. Pre-training technique and semi-supervised learning: Layer-wise pre-training with auto-encoders
  [Diagram: the DNN to train, with inputs x1 ... x5 and outputs ŷ1, ŷ2.]

  8.–13. Pre-training technique and semi-supervised learning: Layer-wise pre-training with auto-encoders
  Step 1: Unsupervised layer-wise training. Train layer by layer sequentially using only x (labeled or unlabeled).
  [Diagrams across slides 8-13: each layer is trained in turn as an auto-encoder; the first hidden layer learns to reconstruct the input x, then each deeper hidden layer learns to reconstruct the activations of the layer below, using those activations as its input.]

  14. Pre-training technique and semi-supervised learning: Layer-wise pre-training with auto-encoders
  Step 1: Unsupervised layer-wise training. Train layer by layer sequentially using only x (labeled or unlabeled).
  At each layer: ⇒ When to stop training? ⇒ What hyper-parameters to use? ⇒ How to make sure that this training improves the supervised task?
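To make step 1 concrete, here is a minimal PyTorch sketch of greedy layer-wise auto-encoder pre-training. It is not the authors' implementation; the layer sizes, activation, reconstruction loss, optimizer, and epoch count are placeholder assumptions.

```python
import torch
import torch.nn as nn

def pretrain_layerwise(x_all, layer_sizes, epochs=10, lr=0.01):
    """Step 1: greedily pre-train each layer as an auto-encoder on x (labeled + unlabeled)."""
    encoders = []
    h = x_all  # current representation; starts as the raw input x
    for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        enc = nn.Sequential(nn.Linear(d_in, d_out), nn.Sigmoid())
        dec = nn.Linear(d_out, d_in)  # decoder is only needed during pre-training
        opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(dec(enc(h)), h)  # reconstruct this layer's input
            loss.backward()
            opt.step()
        h = enc(h).detach()  # the learned features become the next layer's input
        encoders.append(enc)
    return encoders
```

Here `layer_sizes` would start with the input dimension, e.g. `layer_sizes = [n_in, 500, 300]` (hypothetical widths chosen only for illustration).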

  15. Pre-training technique and semi-supervised learning: Layer-wise pre-training with auto-encoders
  Step 2: Supervised training. Train the whole network on the labeled pairs (x, y) with back-propagation.
  [Diagram: the full DNN with inputs x1 ... x5 and outputs ŷ1, ŷ2, fine-tuned end-to-end.]
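A matching sketch of step 2, assuming the `encoders` returned by the pre-training sketch above: the pre-trained layers are stacked under a randomly initialized output layer and the whole network is fine-tuned with back-propagation on the labeled pairs (again, the optimizer, loss, and output size are assumptions).

```python
import torch
import torch.nn as nn

def finetune(encoders, x_lab, y_lab, n_classes, epochs=50, lr=0.01):
    """Step 2: supervised training of the whole (pre-initialized) network on labeled (x, y)."""
    out_dim = encoders[-1][0].out_features  # width of the last pre-trained layer
    net = nn.Sequential(*encoders, nn.Linear(out_dim, n_classes))
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(net(x_lab), y_lab)  # back-propagation through all layers
        loss.backward()
        opt.step()
    return net
```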

  16.–17. Pre-training technique and semi-supervised learning: Pros and cons
  Pros: improves generalization; can exploit unlabeled data; provides a better initialization than random; makes deep networks trainable ⇒ circumvents the vanishing-gradient problem.
  Cons: adds more hyper-parameters; no good stopping criterion during the pre-training phase (a criterion that is good for the unsupervised task may not be good for the supervised task).

  18.–20. Pre-training technique and semi-supervised learning: Proposed solution
  Why is pre-training difficult in practice? ⇒ It is a sequential transfer learning.
  Possible solution: ⇒ parallel transfer learning.
  Why in parallel? Interaction between the tasks; fewer hyper-parameters to tune; a single stopping criterion.

  21. Proposed approach: Parallel transfer learning, weighted tasks combination
  Training cost = supervised task + unsupervised (reconstruction) task.
  Notation: l labeled samples, u unlabeled samples, $w_{sh}$ the shared parameters.
  Reconstruction (auto-encoder) task: $J_r(D; w') = \sum_{i=1}^{l+u} C_r(R(x_i; w'), x_i)$, with $w' = \{w_{sh}, w_r\}$.
  Supervised task: $J_s(D; w) = \sum_{i=1}^{l} C_s(M(x_i; w), y_i)$, with $w = \{w_{sh}, w_s\}$.
  Weighted tasks combination: $J(D; \{w_{sh}, w_s, w_r\}) = \lambda_s \cdot J_s(D; \{w_{sh}, w_s\}) + \lambda_r \cdot J_r(D; \{w_{sh}, w_r\})$, where $\lambda_s, \lambda_r \in [0, 1]$ are importance weights and $\lambda_s + \lambda_r = 1$.
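A minimal sketch of one update of this combined cost, with a shared encoder (w_sh), a supervised head M (w_s), and a reconstruction head R (w_r). The linear schedule for λ_s and λ_r below is only one plausible way to let the weights evolve during training and is not taken from the paper; the choices of C_s, C_r and the architecture are likewise assumptions.

```python
import torch.nn as nn

def joint_step(enc, sup_head, rec_head, opt, x_lab, y_lab, x_all, epoch, n_epochs):
    """One update of J = lambda_s * J_s + lambda_r * J_r with evolving weights."""
    lam_s = min(1.0, epoch / n_epochs)  # supervised weight grows as training proceeds
    lam_r = 1.0 - lam_s                 # reconstruction weight decays; lam_s + lam_r = 1
    opt.zero_grad()
    j_s = nn.functional.cross_entropy(sup_head(enc(x_lab)), y_lab)  # C_s over the l labeled samples
    j_r = nn.functional.mse_loss(rec_head(enc(x_all)), x_all)       # C_r over all l + u samples
    loss = lam_s * j_s + lam_r * j_r    # both terms back-propagate into the shared parameters w_sh
    loss.backward()
    opt.step()
    return loss.item()
```

Because both terms are optimized at every step, the reconstruction task regularizes the shared parameters in parallel with the supervised task, instead of in a separate pre-training phase.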
