A Decomposable Attention Model for Natural Language Inference



  1. A Decomposable Attention Model for Natural Language Inference
     Ankur Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit
     Presented by: Xikun Zhang, University of Illinois, Urbana-Champaign

  2. Natural Language Inference
     - A key part of our understanding of natural language is the ability to understand sentence semantics.
     - Semantic Entailment or, more popularly, the task of Natural Language Inference (NLI) is a core Natural Language Understanding (NLU) task. While it poses as a classification task, it is uniquely well-positioned to serve as a benchmark task for research on NLU. It attempts to judge whether one sentence can be inferred from another.
     - More specifically, it tries to identify the relationship between the meanings of a pair of sentences, called the premise and the hypothesis. The relationship can be one of the following:
       - Entailment: the hypothesis is a sentence with a similar meaning as the premise
       - Contradiction: the hypothesis is a sentence with a contradictory meaning
       - Neutral: the hypothesis is a sentence with mostly the same lexical items as the premise but a different meaning

  3. Natural Language Inference (Cont'd)
     Determine entailment/contradiction/neutral relationships between a premise and a hypothesis.
     Premise: Bob is in his room, but because of the thunder and lightning outside, he cannot sleep.
     Hypothesis 1: Bob is awake. (entailment)
     Hypothesis 2: It is sunny outside. (contradiction)
     Hypothesis 3: Bob has a big house. (neutral)

  4. Recent Work (Sentence Encoding) [figure: input words]

  5. Recent Work (Sentence Encoding) [figure: word vector representations]

  6. Recent Work (Sentence Encoding) [figure: representation layer]

  7. Recent Work (Sentence Encoding) [figure: similarity layer]

  8. Recent Work (Sentence Encoding) [figure: output]

  9. Recent Work (Sentence Encoding) Many papers use this family of neural architectures: Hu et al. (2014), Bowman et al. (2015), He et al. (2015)

  10. Recent Work (Seq2Seq) [figure: encoder recurrent neural network reading "How are you <EOS>"] Sequence-to-sequence model for machine translation (Sutskever et al. 2014, Cho et al. 2014)

  11. Recent Work (Seq2Seq) [figure: decoder recurrent neural network generating "I am fine <EOS>" from the encoded "How are you <EOS>"] Sequence-to-sequence model for machine translation (Sutskever et al. 2014, Cho et al. 2014)

  12. Recent Work [figure: decoder recurrent neural network attending over the encoder states] Sequence-to-sequence model with attention (Bahdanau et al. 2014)

  13. [figure: decoder recurrent neural network with attention, as on the previous slide] Applications of attention: machine translation (Bahdanau et al. 2014), reading comprehension (Hermann et al. 2015), sentence similarity/entailment (Rocktäschel et al. 2015, Wang and Jiang 2015, Cheng et al. 2016)

  14. Motivation for this Work
     - Alignment plays a key role in many NLP tasks:
       - Machine Translation [Koehn, 2009]
       - Sentence Similarity [Haghighi et al., 2005; Koehn, 2009; Das and Smith, 2009; Chang et al., 2010; Fader et al., 2013]
       - Natural Language Inference [Marsi and Krahmer, 2005; MacCartney et al., 2006; Hickl and Bensley, 2007; MacCartney et al., 2008]
       - Semantic Parsing [Andreas et al., 2013]
     - Attention is the neural counterpart to alignment [Bahdanau et al. 2014]

  15. Motivation for this Work
     How well can we do with just alignment/attention, without building complex sentence representations?
     Premise: Bob is in his room, but because of the thunder and lightning outside, he cannot sleep.
     Hypothesis 1: Bob is awake.
     Premise: Bob is in his room, but because of the thunder and lightning outside, he cannot sleep.
     Hypothesis 2: It is sunny outside.

  16. Decomposable Attention [figure: the three steps on the sentence pair "alice plays a flute solo" / "someone playing music outside in the park": 1. Attend uses F( , ) to softly align sub-phrases (e.g. "flute solo" with "music", "alice" with "someone"); 2. Compare feeds each aligned pair to G( , ); 3. Aggregate sums the comparison vectors and applies H( )]

  17. Step 1: Attend
     Unnormalized attention weights: $e_{ij} := F'(\bar{a}_i, \bar{b}_j)$. In practice, $F'$ is decomposed as $F'(\bar{a}_i, \bar{b}_j) := F(\bar{a}_i)^\top F(\bar{b}_j)$, where $F$ is a feed-forward network.
     Sub-phrase in sentence 2 softly aligned to $\bar{a}_i$: $\beta_i = \sum_{j=1}^{\ell_b} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_b} \exp(e_{ik})} \, \bar{b}_j$
     Sub-phrase in sentence 1 softly aligned to $\bar{b}_j$: $\alpha_j = \sum_{i=1}^{\ell_a} \frac{\exp(e_{ij})}{\sum_{k=1}^{\ell_a} \exp(e_{kj})} \, \bar{a}_i$
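A minimal NumPy sketch of this step; the row-wise network F, the shapes, and the variable names are illustrative assumptions, not the authors' code:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(a, b, F):
    """a: (len_a, d) premise vectors; b: (len_b, d) hypothesis vectors.
    F: feed-forward net applied row-wise, any (n, d) -> (n, d_f) map."""
    e = F(a) @ F(b).T                  # e_ij, shape (len_a, len_b)
    beta = softmax(e, axis=1) @ b      # beta_i: subphrase of b aligned to a_i
    alpha = softmax(e, axis=0).T @ a   # alpha_j: subphrase of a aligned to b_j
    return beta, alpha
```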

  18. Step 2: Compare
     Separately compare the aligned sub-phrases: $\mathbf{v}_{1,i} := G([\bar{a}_i; \beta_i])$ for each word in sentence 1, and $\mathbf{v}_{2,j} := G([\bar{b}_j; \alpha_j])$ for each word in sentence 2, where $G$ is a feed-forward network and $[\cdot;\cdot]$ denotes concatenation.
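Continuing the sketch with the same assumed shapes (G is again an assumed row-wise feed-forward map):

```python
def compare(a, b, beta, alpha, G):
    """Compare each word vector with its softly aligned subphrase.
    G: feed-forward net applied row-wise to the concatenated pairs."""
    v1 = G(np.concatenate([a, beta], axis=1))    # v_{1,i}, shape (len_a, d_g)
    v2 = G(np.concatenate([b, alpha], axis=1))   # v_{2,j}, shape (len_b, d_g)
    return v1, v2
```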

  19. Step 3: Aggregate
     Combine the results and classify: $\mathbf{v}_1 = \sum_{i=1}^{\ell_a} \mathbf{v}_{1,i}$, $\mathbf{v}_2 = \sum_{j=1}^{\ell_b} \mathbf{v}_{2,j}$, and $\hat{y} = H([\mathbf{v}_1; \mathbf{v}_2])$. In practice, $H$ is a feed-forward neural network followed by a linear layer that scores the three classes.
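And the aggregate step, completing the sketch (H is an assumed feed-forward classifier head):

```python
def aggregate(v1, v2, H):
    """Sum the comparison vectors over each sentence and classify.
    H: feed-forward net + linear layer mapping (2*d_g,) -> 3 class scores."""
    v1_sum = v1.sum(axis=0)                        # aggregate sentence 1
    v2_sum = v2.sum(axis=0)                        # aggregate sentence 2
    scores = H(np.concatenate([v1_sum, v2_sum]))   # entail/contradict/neutral
    return int(scores.argmax())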

  20. Decomposable Attention (recap) [figure: the same three steps, 1. Attend, 2. Compare, 3. Aggregate, on the "alice plays a flute solo" example]
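To make the three sketches above concrete, here is a hypothetical end-to-end run with random embeddings and untrained stand-in networks for F, G, and H (all illustrative assumptions, not trained components):

```python
rng = np.random.default_rng(0)
a = rng.normal(size=(6, 50))     # premise: 6 words, 50-dim embeddings
b = rng.normal(size=(4, 50))     # hypothesis: 4 words

W_g = rng.normal(size=(100, 50))         # stand-in weights for G
W_h = rng.normal(size=(100, 3))          # stand-in weights for H
F = lambda x: np.maximum(x, 0.0)         # toy "network": just a ReLU
G = lambda x: np.maximum(x @ W_g, 0.0)
H = lambda x: x @ W_h

beta, alpha = attend(a, b, F)
v1, v2 = compare(a, b, beta, alpha, G)
label = aggregate(v1, v2, H)   # index into (entailment, contradiction, neutral)
```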

  21. Beyond Unordered Words
     - Intra-Attention: construct a "context" using an extra attention layer
     - Uses weak word-order information via a distance bias: the distance-sensitive bias terms $d_{i-j} \in \mathbb{R}$ provide the model with a minimal amount of sequence information, while remaining parallelizable. These terms are bucketed such that all distances greater than 10 words share the same bias.
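A sketch of this intra-attention under the same assumptions; F_intra is an assumed row-wise network, bias an assumed learned vector of 11 bucketed terms, and absolute distance is an assumption here:

```python
def intra_attend(a, F_intra, bias):
    """Self-align a sentence with a distance-sensitive bias.
    bias: array of 11 terms; distances > 10 all share bias[10]."""
    n = a.shape[0]
    f = F_intra(a) @ F_intra(a).T                        # f_ij, shape (n, n)
    dist = np.abs(np.arange(n)[:, None] - np.arange(n))  # |i - j|, assumed abs
    a_prime = softmax(f + bias[np.minimum(dist, 10)], axis=1) @ a
    return np.concatenate([a, a_prime], axis=1)          # [a_i; a'_i]
```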

  22. Empirical Results
     Dataset: Stanford Natural Language Inference Corpus (SNLI, Bowman et al. 2015)
     http://nlp.stanford.edu/projects/snli/
     549,367 sentence pairs for training; 9,842 pairs for development; 9,824 pairs for testing

  23. Empirical Results

     Test Acc.  Method                                           #Params
     78         Lexicalized Classifiers (Bowman et al. 2015)     n/a
     81         LSTM RNN Encoders (Bowman et al. 2016)           3M
     81         Pretrained GRU Encoders (Vendrov et al. 2015)    15M
     82         Tree-Based CNN Encoders (Mou et al. 2015)        3.5M
     83         SPINN-PI Encoders (Bowman et al. 2016)           3.7M
     84         LSTM with Attention (Rocktäschel et al. 2016)    252K
     86         mLSTM (Wang and Jiang 2016)                      1.9M
     86         LSTMN w/ Attention Fusion (Cheng et al. 2016)    3.4M
     86         This Work                                        382K
     87         This Work with Self Attention                    582K

  24. Empirical Results [figure: bar chart of per-class accuracy (Neutral, Entailment, Contradiction) for the compared models]

  25. Error Analysis - Wins

     Sentence 1: Two kids are standing in the ocean hugging each other.
     Sentence 2: Two kids enjoy their day at the beach.
     DA (vanilla): N | DA (intra att.): N | SPINN-PI: E | mLSTM: E | Gold: N

     Sentence 1: A dancer in costumer performs on stage while a man watches.
     Sentence 2: the man is captivated
     DA (vanilla): N | DA (intra att.): N | SPINN-PI: E | mLSTM: E | Gold: N

     Sentence 1: They are sitting on the edge of a fountain
     Sentence 2: The fountain is splashing the persons seated
     DA (vanilla): N | DA (intra att.): N | SPINN-PI: C | mLSTM: C | Gold: N

  26. Error Analysis - Losses

     Sentence 1: Two dogs play with tennis ball in field.
     Sentence 2: Dogs are watching a tennis match.
     DA (vanilla): N | DA (intra att.): C | SPINN-PI: C | mLSTM: C | Gold: C

     Sentence 1: Two kids begin to make a snowman on a sunny winter day.
     Sentence 2: Two penguins making a snowman.
     DA (vanilla): N | DA (intra att.): C | SPINN-PI: C | mLSTM: C | Gold: C

     Sentence 1: The horses pull the carriage, holding people and a dog, through the rain.
     Sentence 2: Horses ride in a carriage pulled by a dog.
     DA (vanilla): E | DA (intra att.): E | SPINN-PI: C | mLSTM: C | Gold: C

  27. Headroom

     Sentence 1: A woman closes her eyes as she plays her cello.
     Sentence 2: The woman has her eyes open
     DA (vanilla): E | DA (intra att.): E | SPINN-PI: E | mLSTM: E | Gold: C

     Sentence 1: Two women having drinks and smoking cigarettes at the bar.
     Sentence 2: Three women are at a bar.
     DA (vanilla): E | DA (intra att.): E | SPINN-PI: E | mLSTM: E | Gold: C

     Sentence 1: A band playing with fans watching.
     Sentence 2: A band watches the fans play
     DA (vanilla): E | DA (intra att.): E | SPINN-PI: E | mLSTM: E | Gold: C

  28. Conclusion
     - We presented a simple attention-based approach to text similarity that is trivially parallelizable.
     - Our results suggest that, at least for the SNLI task, pairwise comparisons are relatively more important than global sentence-level representations.

  29. Thank You
