
Tackling the Limits of Deep Learning for NLP, Richard Socher



  1. Tackling the Limits of Deep Learning for NLP
     Richard Socher, Salesforce Research
     Caiming Xiong, Stephen Merity, James Bradbury, Victor Zhong, Kazuma Hashimoto
     and Stanford: Hakan Inan, Khashayar Khosravi

  2. The Limits of Single Task Learning
     • Great performance improvements
     • Projects start from random
     • Single unsupervised task can’t fix it
     • How to express different tasks in the same framework, e.g.
       – sequence tagging
       – sentence-level classification
       – seq2seq?

  3. Framework for Tackling NLP A joint model for comprehensive QA

  4. QA Examples
     • I: Mary walked to the bathroom. Sandra went to the garden. Daniel went back to the garden. Sandra took the milk there. Q: Where is the milk? A: garden
     • I: Everybody is happy. Q: What’s the sentiment? A: positive
     • I: I think this model is incredible Q: In French? A: Je pense que ce modèle est incroyable.
     • I: [image] Q: What color are the bananas? A: Green.
     • A: NNP VBZ DT NN IN NNP .
     Move from {x_i, y_i} to {x_i, q_i, y_i}
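
The move from {x_i, y_i} to {x_i, q_i, y_i} can be pictured as a change in the shape of one training example. The following is only a minimal sketch: the `QAExample` type and the `to_single_task` helper are hypothetical names introduced for illustration, and the triples are copied from the slide.

```python
from typing import NamedTuple, Tuple


class QAExample(NamedTuple):
    """One training triple (x_i, q_i, y_i): input, question, answer."""
    x: str  # input text (hypothetical field name)
    q: str  # question that selects the task
    y: str  # expected answer


# Different tasks expressed in the same (input, question, answer) format.
examples = [
    QAExample("Everybody is happy.", "What's the sentiment?", "positive"),
    QAExample("I think this model is incredible", "In French?",
              "Je pense que ce modèle est incroyable."),
    QAExample("Mary walked to the bathroom. Sandra went to the garden. "
              "Daniel went back to the garden. Sandra took the milk there.",
              "Where is the milk?", "garden"),
]


def to_single_task(example: QAExample) -> Tuple[str, str]:
    """Drop the question, recovering the single-task pair (x_i, y_i)."""
    return example.x, example.y
```

Under this framing the question, not the model architecture, says which task is being asked of the input.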

  5. First of Six Major Obstacles
     • For NLP no single model architecture with consistent state of the art results across tasks
     Task: State of the art model
       – Question answering (bAbI): Strongly Supervised MemNN (Weston et al. 2015)
       – Sentiment Analysis (SST): Tree-LSTMs (Tai et al. 2015)
       – Part of speech tagging (PTB-WSJ): Bi-directional LSTM-CRF (Huang et al. 2015)

  6. Tackling Obstacle 1: Dynamic Memory Network
     [Figure: Dynamic Memory Network. The Input Module encodes sentences s_1 … s_8 and the Question Module encodes the question q. The Episodic Memory Module makes two passes over the sentences and feeds the Answer module. Attention gates for s_1 … s_8: pass 1 (e^1, memory m^1): 0.3 0.0 0.0 0.0 0.0 0.0 1.0 0.0; pass 2 (e^2, memory m^2): 0.0 0.3 0.0 0.0 0.0 0.9 0.0 0.0.]

  7. The Modules: Episodic Memory
     [Figure: the DMN modules: Semantic Memory Module (GloVe vectors), Input Module, Question Module, Episodic Memory Module, Answer module.]
     Question: Where is the football?
     Input (s_1 … s_8): Mary got the milk there. John moved to the bedroom. Sandra went back to the kitchen. Mary travelled to the hallway. John got the football there. John went to the hallway. John put down the football. Mary went to the garden.
     Attention gates, pass 1 (e^1): 0.3 0.0 0.0 0.0 0.0 0.0 1.0 0.0
     Attention gates, pass 2 (e^2): 0.0 0.3 0.0 0.0 0.0 0.9 0.0 0.0
     Answer: hallway <EOS>
     $h_i^t = g_i^t \, \mathrm{GRU}(s_i, h_{i-1}^t) + (1 - g_i^t) \, h_{i-1}^t$
     Last hidden state: $m^t$
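
The gated update on this slide, in which each sentence either overwrites the hidden state through the GRU (gate near 1) or is skipped (gate near 0), can be sketched as follows. This is a minimal illustration, not the authors' implementation: `gated_update`, `run_episode` and `toy_step` are hypothetical names, and `toy_step` is a stand-in for a trained GRU cell that simply returns the sentence vector as the candidate state.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def gated_update(g: float, candidate: Sequence[float],
                 h_prev: Sequence[float]) -> Vector:
    """h_i = g * candidate + (1 - g) * h_prev, applied elementwise."""
    return [g * c + (1.0 - g) * h for c, h in zip(candidate, h_prev)]


def run_episode(sentences: Sequence[Sequence[float]],
                gates: Sequence[float],
                gru_step: Callable[[Sequence[float], Sequence[float]], Vector],
                h0: Sequence[float]) -> Vector:
    """One pass over the sentences; the last hidden state is the memory m^t."""
    h = list(h0)
    for s, g in zip(sentences, gates):
        h = gated_update(g, gru_step(s, h), h)
    return h


def toy_step(s: Sequence[float], h_prev: Sequence[float]) -> Vector:
    # Assumption for illustration only: a real GRU cell would mix s and h_prev.
    return list(s)


sentences = [[1.0, 1.0], [2.0, 3.0], [5.0, 5.0]]
gates = [0.0, 1.0, 0.0]  # only the second sentence is attended to
memory = run_episode(sentences, gates, toy_step, [0.0, 0.0])
```

With these toy gates the memory ends up equal to the second sentence vector, because the ungated sentences leave the hidden state untouched.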

  8. The Modules: Episodic Memory
     • Gates are activated if sentence relevant to the question or memory
       $z_i^t = [\, s_i \circ q \,;\; s_i \circ m^{t-1} \,;\; |s_i - q| \,;\; |s_i - m^{t-1}| \,]$
     • When the end of the input is reached, the relevant facts are summarized in another GRU
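
The feature vector that drives each gate concatenates elementwise products and absolute differences between the sentence, the question, and the previous memory. A minimal sketch under stated assumptions: the function name `attention_features` is hypothetical, and the scoring network that turns these features into a gate value is not shown on the slide, so it is left out here.

```python
from typing import List, Sequence


def attention_features(s: Sequence[float], q: Sequence[float],
                       m_prev: Sequence[float]) -> List[float]:
    """Build z = [s*q ; s*m_prev ; |s - q| ; |s - m_prev|] for one sentence."""
    def hadamard(a: Sequence[float], b: Sequence[float]) -> List[float]:
        return [x * y for x, y in zip(a, b)]

    def abs_diff(a: Sequence[float], b: Sequence[float]) -> List[float]:
        return [abs(x - y) for x, y in zip(a, b)]

    # List concatenation mirrors the ';' in the slide's formula.
    return (hadamard(s, q) + hadamard(s, m_prev)
            + abs_diff(s, q) + abs_diff(s, m_prev))


z = attention_features([1.0, 2.0], [3.0, 4.0], [0.5, 2.0])
```

For d-dimensional vectors the result has 4d entries, so a sentence that matches the question or the memory shows up as large products and small differences.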

  9. Related work
     • Sequence to Sequence (Sutskever et al. 2014)
     • Neural Turing Machines (Graves et al. 2014)
     • Teaching Machines to Read and Comprehend (Hermann et al. 2015)
     • Learning to Transduce with Unbounded Memory (Grefenstette 2015)
     • Structured Memory for Neural Turing Machines (Wei Zhang 2015)
     • Memory Networks (Weston et al. 2015)
     • End-to-End Memory Networks (Sukhbaatar et al. 2015)
     → Main difference: the DMN uses sequence models for all functions, allowing for greater generality of tasks that can be “answered”

  10. Comparison to MemNets
     Similarities:
     • MemNets and DMNs have input, scoring, attention and response mechanisms
     Differences:
     • For input representations, MemNets use bag-of-words, nonlinear or linear embeddings that explicitly encode position
     • MemNets iteratively run functions for attention and response
     • DMNs show that neural sequence models can be used for input representation, attention and response mechanisms → naturally captures position and temporality
     • Enables a broader range of applications

  11. Analysis of Number of Episodes
     • How many attention + memory passes are needed in the episodic memory?
     • Results on bAbI dataset and Stanford Sentiment
     Max passes   task 3        task 7   task 8       sentiment
                  three-facts   count    lists/sets   (fine grain)
     0 pass       0             48.8     33.6         50.0
     1 pass       0             48.8     54.0         51.5
     2 pass       16.7          49.1     55.6         52.1
     3 pass       64.7          83.4     83.4         50.1
     5 pass       95.2          96.9     96.5         N/A

  12. Analysis of Attention for Sentiment • Sharper attention when 2 passes are allowed. • Examples that are wrong with just one pass

  13. Analysis of Attention for Sentiment • Examples where full sentence context from first pass changes attention to words more relevant for final prediction

  14. Analysis of Attention for Sentiment • Examples where full sentence context from first pass changes attention to words more relevant for final prediction

  15. Analysis of Attention for Sentiment

  16. Modularization Allows for Different Inputs
     (a) Text Question-Answering
       Input Module: John moved to the garden. John got the apple there. John moved to the kitchen. Sandra picked up the milk there. John dropped the apple. John moved to the office.
       Question: Where is the apple?
       Episodic Memory → Answer: Kitchen
     (b) Visual Question-Answering
       Input Module: [image]
       Question: What kind of tree is in the background?
       Episodic Memory → Answer: Palm
     Dynamic Memory Networks for Visual and Textual Question Answering, Caiming Xiong, Stephen Merity, Richard Socher

  17. Input Module for Images
     [Figure: Input Module for images, bottom to top: visual feature extraction (CNN, feature map labeled 512, 14, 14); feature embedding (W applied to each feature); input fusion layer (GRUs).]
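
The first two stages of this module, turning a grid of CNN features into a sequence of embedded region vectors, can be sketched on a tiny grid. Everything here is an assumption for illustration: the names `grid_to_regions` and `embed` are hypothetical, the traversal is plain row-major, the embedding is a bare matrix-vector product with no nonlinearity, and the input fusion GRUs are omitted.

```python
from typing import List, Sequence

Vector = List[float]


def grid_to_regions(feature_map: Sequence[Sequence[Sequence[float]]]) -> List[Vector]:
    """Flatten an H x W grid of feature vectors into H*W region vectors."""
    return [list(vec) for row in feature_map for vec in row]


def embed(region: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Apply a linear map W (one row per output dimension) to a region vector."""
    return [sum(w * x for w, x in zip(row, region)) for row in weights]


# Toy stand-in for the CNN output: a 2 x 2 grid of 3-dimensional features
# (the slide's feature map is labeled 512, 14, 14).
feature_map = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
]
W = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 0.0]]

regions = grid_to_regions(feature_map)
embedded = [embed(r, W) for r in regions]
```

After this step the image is a sequence of vectors, the same kind of input the rest of the network receives for text.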

  18. Accuracy: Visual Question Answering
     VQA test-dev and test-standard:
                 test-dev                     test-std
     Method      All    Y/N    Other  Num    All
     Image       28.1   64.0   3.8    0.4    -
     Question    48.1   75.7   27.1   36.7   -
     Q+I         52.6   75.6   37.4   33.7   -
     LSTM Q+I    53.7   78.9   36.4   35.2   54.1
     ACK         55.7   79.2   40.1   36.1   56.0
     iBOWIMG     55.7   76.5   42.6   35.0   55.9
     DPPnet      57.2   80.7   41.7   37.2   57.4
     D-NMN       57.9   80.5   43.1   37.4   58.0
     SAN         58.7   79.3   46.1   36.6   58.9
     DMN+        60.3   80.5   48.3   36.8   60.4
     • VQA: Antol et al. (2015)
     • ACK: Wu et al. (2015)
     • iBOWIMG: Zhou et al. (2015)
     • DPPnet: Noh et al. (2015)
     • D-NMN: Andreas et al. (2016)
     • SAN: Yang et al. (2015)

  19. Attention Visualization
     • What is this sculpture made out of? Answer: metal
     • What color are the bananas? Answer: green
     • What is the pattern on the cat's fur on its tail? Answer: stripes
     • Did the player hit the ball? Answer: yes
     Figure 4. Examples of qualitative results of attention for VQA. Each image (left) is shown

  20. Attention Visualization
     • What is the main color on the bus? Answer: blue
     • What type of trees are in the background? Answer: pine
     • How many pink flags are there? Answer: 2
     • Is this in the wild? Answer: no

  21. Attention Visualization
     • Which man is dressed more flamboyantly? Answer: right
     • Who is on both photos? Answer: girl
     • What is the boy holding? Answer: surfboard
     • What time of day was this picture taken? Answer: night
     shown with the attention that the episodic memory

  22. • DEMO

  23. Obstacle 2: Joint Many-task Learning
     • Fully joint multitask learning* is hard:
       – Usually restricted to lower layers
       – Usually helps only if tasks are related
       – Often hurts performance if tasks are not related
     * meaning: same decoder/classifier and not only transfer learning with source target task pairs

  24. Tackling Joint Training
     • A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks. Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka & Richard Socher
     • Final Model →
     [Figure: the same stack of layers applied to Sentence 1 and Sentence 2, bottom to top: word representation; word level (POS, CHUNK); syntactic level (DEP); semantic level (Relatedness and Entailment, each with an encoder).]
