
Semi-supervised Question Retrieval with Gated Convolutions, Tao Lei

Semi-supervised Question Retrieval with Gated Convolutions. Tao Lei, joint work with Hrishikesh Joshi, Regina Barzilay, Tommi Jaakkola, Kateryna Tymoshenko, Alessandro Moschitti and Lluís Màrquez. NAACL 2016. QCRI/MIT-CSAIL Annual Meeting


  1. Semi-supervised Question Retrieval with Gated Convolutions. Tao Lei, joint work with Hrishikesh Joshi, Regina Barzilay, Tommi Jaakkola,
 Kateryna Tymoshenko, Alessandro Moschitti and Lluís Màrquez. NAACL 2016. QCRI/MIT-CSAIL Annual Meeting – March 2015 QCRI/MIT-CSAIL Annual Meeting – March 2014

  2. Our Task: find similar questions given the user’s input question
     Example: a question (title, body) from Stack Exchange AskUbuntu

  3. Our Task: find similar questions given the user’s input question
     Example: a question from Stack Exchange AskUbuntu with a user-marked similar question
     Our goal: automate this process as a solution for QA

  4. Challenges
     • Multi-sentence text contains irrelevant details
       Title: How can I boot Ubuntu from a USB ?
       Body: I bought a Compaq pc with Windows 8 a few months ago and now I want to install Ubuntu but still keep Windows 8. I tried Webi but when my pc restarts it read ERROR 0x000007b. I know that Windows 8 has a thing about not letting you have Ubuntu ...
       Title: When I want to install Ubuntu on my laptop I’ll have to erase all my data. “Alonge side windows” doesnt appear
       Body: I want to install Ubuntu from a Usb drive. It says I have to erase all my data but I want to install it along side Windows 8. The “Install alongside windows” option doesn’t appear …
     • Forum user annotation is limited and noisy (more on this later)

  5. Solution
     (1) a model to better represent the question text
     (2) semi-supervised training to leverage raw text data

  6. Model
     Model architecture*: question 1 and question 2 each pass through an encoder, then pooling; the two pooled vectors are compared by cosine similarity.
     Choice of encoder: LSTM, GRU, CNN … or:
     $$
     \begin{aligned}
     c^{(3)}_t &= \lambda_t \odot c^{(3)}_{t-1} + (1 - \lambda_t) \odot \left( c^{(2)}_{t-1} + W_3 x_t \right) \\
     c^{(2)}_t &= \lambda_t \odot c^{(2)}_{t-1} + (1 - \lambda_t) \odot \left( c^{(1)}_{t-1} + W_2 x_t \right) \\
     c^{(1)}_t &= \lambda_t \odot c^{(1)}_{t-1} + (1 - \lambda_t) \odot \left( W_1 x_t \right) \\
     h_t &= \tanh\left( c^{(3)}_t + b \right)
     \end{aligned}
     $$
     Why this encoder (or equations)? How to understand it?
     *Other architectures possible: (Feng et al. 2015), (Tan et al. 2015) etc.
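
The scoring path on this slide (encoder, pooling, cosine similarity) can be sketched as follows. This is a minimal illustration, not the authors' code: the slide only says "pooling", so mean pooling is an assumption here, and `toy_encoder`, `score_pair`, and the random inputs are hypothetical stand-ins for a trained encoder and real word embeddings.

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Average per-token encoder states of shape [T, d] into one vector [d].
    Mean pooling is an assumption; the slide does not name the pooling type."""
    return hidden_states.mean(axis=0)

def cosine_similarity(u: np.ndarray, v: np.ndarray, eps: float = 1e-8) -> float:
    """Cosine of the angle between u and v (eps guards against division by zero)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

def score_pair(encode, q1: np.ndarray, q2: np.ndarray) -> float:
    """Encode both questions with the same encoder, pool, and compare."""
    return cosine_similarity(mean_pool(encode(q1)), mean_pool(encode(q2)))

# Hypothetical stand-in encoder: one linear map plus tanh per token.
rng = np.random.default_rng(0)
W_toy = rng.normal(size=(4, 3))

def toy_encoder(x: np.ndarray) -> np.ndarray:
    return np.tanh(x @ W_toy)

q1 = rng.normal(size=(5, 4))  # 5 tokens, 4-dimensional embeddings
q2 = rng.normal(size=(7, 4))  # 7 tokens
s_same = score_pair(toy_encoder, q1, q1)
s_diff = score_pair(toy_encoder, q1, q2)
```

Because both questions share one encoder, any of the encoders listed on the slide can be swapped in for `toy_encoder` without changing the scoring code.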

  7. Sentence: “the movie is not that good”
     Bag of words, TF-IDF / Neural Bag-of-words (average embedding)
     [Figure: word features (the, movie, is, not, that, good, …) beside the word embeddings combined into a single vector]

  8. Sentence: “the movie is not that good”
     Ngram Kernel / CNNs (N=2): bigram features the movie, movie is, is not, not that, that good, …
     Neural methods as a dimension-reduction of traditional methods

  9. Sentence: “the movie is not that good”
     String Kernel: the feature space also contains skip bigrams (the movie, is not, is _ that, movie _ not, not _ good, is _ _ good, …), a bigger feature space.
     Each feature is weighted by a power of $\lambda \in (0, 1)$ that penalizes skips: the movie ↦ $\lambda^0$, not _ good ↦ $\lambda^1$, is _ _ good ↦ $\lambda^2$; features that do not occur get 0.
     Neural model inspired by this kernel method?
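
To make the decay weighting concrete, here is a small sketch that enumerates skip bigrams for the example sentence and weights each by the decay raised to the number of skipped words. The function name `skip_bigram_features` and the `max_skip` cap are illustrative assumptions; the slide does not state whether skips are bounded.

```python
from collections import defaultdict

def skip_bigram_features(tokens, decay, max_skip=2):
    """Map each ordered word pair (w_i, w_j), i < j, to the sum of
    decay ** (number of words skipped between them).
    max_skip is a hypothetical cap on the gap, added to keep the output small."""
    feats = defaultdict(float)
    for i, left in enumerate(tokens):
        last = min(i + 2 + max_skip, len(tokens))
        for j in range(i + 1, last):
            skipped = j - i - 1
            feats[(left, tokens[j])] += decay ** skipped
    return dict(feats)

sentence = "the movie is not that good".split()
feats = skip_bigram_features(sentence, decay=0.5)

print(feats[("the", "movie")])  # adjacent pair, weight decay**0
print(feats[("not", "good")])   # one skipped word, weight decay**1
print(feats[("is", "good")])    # two skipped words, weight decay**2
```

With `decay=0.5` the three printed weights are 1.0, 0.5 and 0.25, matching the $\lambda^0$, $\lambda^1$, $\lambda^2$ pattern on the slide.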

  10. “String” convolution
      [Figure: convolution over the words of “the movie is not that good”]

  11. “String” convolution
      [Figure: convolution over the words of “the movie is not that good”]

  12. “String” convolution
      [Figure: convolution over the words of “the movie is not that good”]

  13. Formulas in the case of 3gram
      $$
      \begin{aligned}
      c^{(3)}_t &= \lambda \cdot c^{(3)}_{t-1} + (1 - \lambda) \cdot \left( c^{(2)}_{t-1} + W_3 x_t \right) \\
      c^{(2)}_t &= \lambda \cdot c^{(2)}_{t-1} + (1 - \lambda) \cdot \left( c^{(1)}_{t-1} + W_2 x_t \right) \\
      c^{(1)}_t &= \lambda \cdot c^{(1)}_{t-1} + (1 - \lambda) \cdot \left( W_1 x_t \right) \\
      h_t &= \tanh\left( c^{(3)}_t + b \right)
      \end{aligned}
      $$

  14. Formulas in the case of 3gram
      $$
      \begin{aligned}
      c^{(3)}_t &= \lambda \cdot c^{(3)}_{t-1} + (1 - \lambda) \cdot \left( c^{(2)}_{t-1} + W_3 x_t \right) \\
      c^{(2)}_t &= \lambda \cdot c^{(2)}_{t-1} + (1 - \lambda) \cdot \left( c^{(1)}_{t-1} + W_2 x_t \right) \\
      c^{(1)}_t &= \lambda \cdot c^{(1)}_{t-1} + (1 - \lambda) \cdot \left( W_1 x_t \right) \\
      h_t &= \tanh\left( c^{(3)}_t + b \right)
      \end{aligned}
      $$
      Annotations: penalize skip grams; weighted average of 1grams (to 3grams) up to position t
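
The 3gram recurrence with a constant decay can be written directly as a loop over tokens. The sketch below is an illustration under stated assumptions, not the released implementation: the name `string_conv_3gram`, the zero initial states, and the random toy shapes are all choices made here for the example.

```python
import numpy as np

def string_conv_3gram(X, W1, W2, W3, b, lam):
    """Run the constant-decay 3gram recurrence over a sequence.

    X: [T, d_in] token vectors; W1, W2, W3: [d, d_in]; b: [d]; lam: scalar in [0, 1).
    Returns H: [T, d] with h_t = tanh(c3_t + b).
    Initial states c_0 = 0 are an assumption, not stated on the slide.
    """
    d = W1.shape[0]
    c1, c2, c3 = np.zeros(d), np.zeros(d), np.zeros(d)
    H = []
    for x in X:
        # Each right-hand side uses the previous step's states,
        # so update from the highest order down to the lowest.
        c3 = lam * c3 + (1 - lam) * (c2 + W3 @ x)
        c2 = lam * c2 + (1 - lam) * (c1 + W2 @ x)
        c1 = lam * c1 + (1 - lam) * (W1 @ x)
        H.append(np.tanh(c3 + b))
    return np.stack(H)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))                      # 6 tokens, 4-dim inputs
W1, W2, W3 = (rng.normal(size=(3, 4)) for _ in range(3))
b = rng.normal(size=3)
H = string_conv_3gram(X, W1, W2, W3, b, lam=0.3)
print(H.shape)
```

The update order matters: computing `c1` first would feed the current step's value into `c2`, which is not what the formulas specify.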

  15. Formulas
      $$
      \begin{aligned}
      c^{(3)}_t &= \lambda \cdot c^{(3)}_{t-1} + (1 - \lambda) \cdot \left( c^{(2)}_{t-1} + W_3 x_t \right) \\
      c^{(2)}_t &= \lambda \cdot c^{(2)}_{t-1} + (1 - \lambda) \cdot \left( c^{(1)}_{t-1} + W_2 x_t \right) \\
      c^{(1)}_t &= \lambda \cdot c^{(1)}_{t-1} + (1 - \lambda) \cdot \left( W_1 x_t \right) \\
      h_t &= \tanh\left( c^{(3)}_t + b \right)
      \end{aligned}
      $$
      $\lambda = 0$: $c^{(3)}_t = W_1 x_{t-2} + W_2 x_{t-1} + W_3 x_t$ (one-layer CNN)
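
The one-layer CNN special case follows by substituting $\lambda = 0$ into the three recurrences and unrolling two steps. The derivation below assumes $t \ge 3$ (or, equivalently, zero initial states with inputs before the first token treated as zero).

```latex
\begin{aligned}
c^{(1)}_t &= W_1 x_t \\
c^{(2)}_t &= c^{(1)}_{t-1} + W_2 x_t = W_1 x_{t-1} + W_2 x_t \\
c^{(3)}_t &= c^{(2)}_{t-1} + W_3 x_t = W_1 x_{t-2} + W_2 x_{t-1} + W_3 x_t
\end{aligned}
```

So with no decay the state only sees the three most recent tokens, which is a width-3 convolution filter applied at position $t$.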

  16. Gated version
      $$
      \begin{aligned}
      c^{(3)}_t &= \lambda_t \odot c^{(3)}_{t-1} + (1 - \lambda_t) \odot \left( c^{(2)}_{t-1} + W_3 x_t \right) \\
      c^{(2)}_t &= \lambda_t \odot c^{(2)}_{t-1} + (1 - \lambda_t) \odot \left( c^{(1)}_{t-1} + W_2 x_t \right) \\
      c^{(1)}_t &= \lambda_t \odot c^{(1)}_{t-1} + (1 - \lambda_t) \odot \left( W_1 x_t \right) \\
      h_t &= \tanh\left( c^{(3)}_t + b \right) \\
      \lambda_t &= \sigma\left( W x_t + U h_{t-1} + b' \right)
      \end{aligned}
      $$
      Adaptive decay controlled by gate
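
A minimal NumPy sketch of the gated recurrence follows, with the decay recomputed at every step from the current input and the previous hidden state. The names `gated_conv_3gram`, `Wg`, `Ug`, `bg` (standing in for the slide's $W$, $U$, $b'$) and the zero initial states are assumptions for illustration; this is not the code from the linked repositories.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_conv_3gram(X, W1, W2, W3, b, Wg, Ug, bg):
    """Gated 3gram recurrence.

    X: [T, d_in]; W1, W2, W3, Wg: [d, d_in]; Ug: [d, d]; b, bg: [d].
    Returns (H, gates), both of shape [T, d].
    Zero initial states (c_0 = 0, h_0 = 0) are an assumption.
    """
    d = W1.shape[0]
    c1, c2, c3 = np.zeros(d), np.zeros(d), np.zeros(d)
    h = np.zeros(d)
    H, gates = [], []
    for x in X:
        lam = sigmoid(Wg @ x + Ug @ h + bg)   # lambda_t, one decay per dimension
        # Update from highest order to lowest so each line sees step t-1 states.
        c3 = lam * c3 + (1 - lam) * (c2 + W3 @ x)
        c2 = lam * c2 + (1 - lam) * (c1 + W2 @ x)
        c1 = lam * c1 + (1 - lam) * (W1 @ x)
        h = np.tanh(c3 + b)
        H.append(h)
        gates.append(lam)
    return np.stack(H), np.stack(gates)

rng = np.random.default_rng(2)
T, d_in, d = 6, 4, 3
X = rng.normal(size=(T, d_in))
W1, W2, W3, Wg = (rng.normal(size=(d, d_in)) for _ in range(4))
Ug = rng.normal(size=(d, d))
b, bg = rng.normal(size=d), rng.normal(size=d)
H, gates = gated_conv_3gram(X, W1, W2, W3, b, Wg, Ug, bg)
print(H.shape, gates.shape)
```

Returning the gate values alongside the hidden states makes it easy to inspect how the decay changes across the tokens of a question.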

  17. Training
      • Amount of annotation is scarce
        # of unique questions    167,765
        # of marked questions     12,584
        # of marked pairs         16,391
        Forum users only identify a few similar pairs: only 10% of the number of unique questions
      Ideally, want to use all questions available

  18. Pre-training
      Encoder-Decoder Network: encoder trained to pull out important (summarized) information
      [Diagram: the encoder encodes the question body/title; the decoder, running from <s> to </s>, re-generates the question title]
      Pre-training recently applied to classification task
      • Semi-supervised Sequence Learning. Dai and Le. 2015

  19. Evaluation Set-up
      Dataset: AskUbuntu 2014 dump; pre-train on 167k, fine-tune on 16k; evaluate using 8k pairs (50/50 split for dev/test)
      Baselines: TF-IDF, BM25 and SVM reranker; CNNs, LSTMs and GRUs
      Grid-search: learning rate, dropout, pooling, filter size, pre-training, …; 5 independent runs for each config.; > 500 runs in total

  20. Overall Results
      Method   MAP    MRR
      BM25     56.0   68.0
      LSTM     56.8   70.1
      CNN      57.6   71.4
      GRU      59.3   71.3
      Ours     62.3   75.6
      Our improvement is significant
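
For readers unfamiliar with the ranking metrics in this deck (MAP, MRR, and P@1), here is a small sketch of how they are commonly computed from ranked candidate lists. The function names and the toy relevance lists are illustrative; average precision is normalized by the number of relevant items found in the ranked list, which assumes every relevant candidate appears in it.

```python
def average_precision(ranked_relevance):
    """ranked_relevance: list of 0/1 flags, best-ranked candidate first."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank          # precision at this relevant rank
    return total / hits if hits else 0.0

def reciprocal_rank(ranked_relevance):
    """1 / rank of the first relevant candidate, or 0 if there is none."""
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def precision_at_1(ranked_relevance):
    """1 if the top-ranked candidate is relevant, else 0."""
    return 1.0 if ranked_relevance and ranked_relevance[0] else 0.0

def evaluate(all_rankings):
    """Average each metric over all query questions."""
    n = len(all_rankings)
    return {
        "MAP": sum(average_precision(r) for r in all_rankings) / n,
        "MRR": sum(reciprocal_rank(r) for r in all_rankings) / n,
        "P@1": sum(precision_at_1(r) for r in all_rankings) / n,
    }

# Two toy queries: relevance flags of their ranked candidates.
rankings = [[1, 0, 1, 0], [0, 1, 0, 0]]
metrics = evaluate(rankings)
print(metrics)
```

For the toy input, the first query has average precision (1/1 + 2/3) / 2 and the second 1/2, so MAP is about 0.667, MRR is 0.75, and P@1 is 0.5.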

  21. Analysis
      [Bar chart: MAP, MRR and P@1 for the full model, w/o pretraining, and w/o body. Bar labels in extraction order: 75.6, 72.9, 70.7, 62.3, 62.0, 60.7, 59.1, 58.2, 56.6]

  22. Pre-training
      [Plot: MRR on the dev set versus perplexity on a held-out corpus]
      PPLs are close; MRRs quite different

  23. Decay Factor (Neural Gate)
      $$
      c^{(3)}_t = \lambda \odot c^{(3)}_{t-1} + (1 - \lambda) \odot \left( c^{(2)}_{t-1} + W_3 x_t \right)
      $$
      Analyze the weight vector over time

  24. Case Study (using a scalar decay)

  25. Case Study (using a scalar decay)

  26. Case Study (using a scalar decay)

  27. Conclusions
      • AskUbuntu data as a natural benchmark for retrieval and summarization tasks
      • Neural model with good intuition and understanding (e.g. attention) can potentially lead to good performance
      https://github.com/taolei87/askubuntu
      https://github.com/taolei87/rcnn

