


  1. Attention for Machine Comprehension Made by : Rishab Goel Based on slides by: Alex Graves, Hien Quoc, Renjie Liao

  2. Highway Networks

  3. Benefits ...

  4. Benefits ...

  5. Importance ... For training very deep architectures: allows better information flow and better optimization. Intuition: does a linear transformation of the input suffice for learning language at a higher level of abstraction? (See http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)
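The gating idea behind highway networks can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' code: it assumes tanh for the transform H, a single layer, and illustrative weight names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = T * H(x) + (1 - T) * x.

    H is a nonlinear transform and T a learned "transform gate";
    the complementary (1 - T) "carry gate" lets the input flow
    through unchanged, which is what eases optimization of very
    deep stacks (better information flow).
    """
    H = np.tanh(x @ W_h + b_h)    # candidate transform of the input
    T = sigmoid(x @ W_t + b_t)    # transform gate in (0, 1)
    return T * H + (1.0 - T) * x  # gated mix of transform and carry
```

With a strongly negative gate bias the layer starts out close to the identity, which is the usual initialization trick for training deep highway stacks.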

  6. Hien Quoc Dang

  7. Idea of Maxout Hien Quoc Dang

  8. Intuitions Inspired by dropout; similar to bagging, but integrated as part of a single network Hien Quoc Dang

  9. Idea of Maxout ... Hien Quoc Dang

  10. Idea of Maxout ... Hien Quoc Dang

  11. Comparison to Rectifiers Hien Quoc Dang

  12. Why Maxout Work ? Hien Quoc Dang
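The maxout idea from the slides above can be sketched directly: each output unit takes the max over k affine "pieces", giving a learned convex piecewise-linear activation. A minimal NumPy sketch with illustrative shapes:

```python
import numpy as np

def maxout(x, W, b):
    """Maxout unit: max over k affine pieces per output unit.

    x is (n, d_in), W is (d_in, d_out, k), b is (d_out, k).
    With k >= 2 a maxout unit can represent ReLU, absolute value,
    or the identity, which is one reason it pairs well with dropout.
    """
    z = np.einsum('nd,dok->nok', x, W) + b  # evaluate all k pieces at once
    return z.max(axis=-1)                   # keep the best piece per unit
```

For comparison with rectifiers: setting the two pieces to x and 0 recovers ReLU exactly, while pieces x and -x give the absolute value.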

  13. Slides : Santi Pascual

  14. LSTMs ... Chris Olah’s blog

  15. Need for Attention A fixed-size embedding is not sufficient to encode information over long distances Helps the model attend to the important patch of the data Adds interpretability to the model
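The attention mechanism motivated above can be sketched as simple dot-product attention: instead of compressing a sequence into one embedding, the model forms a weighted sum over all positions, with weights showing which patch it attends to (this is what gives interpretability). A minimal sketch, not any specific paper's variant:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend(query, keys, values):
    """Dot-product attention over a sequence.

    query is (d,), keys is (T, d), values is (T, d_v).
    Each position is weighted by how well its key matches the
    query, so distant but relevant positions are not lost.
    """
    scores = keys @ query       # (T,) similarity of query to each position
    weights = softmax(scores)   # (T,) distribution over positions
    context = weights @ values  # weighted sum: the attended summary
    return context, weights
```

The returned weights can be plotted per example to inspect what the model focused on.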

  16. Attentive Reader

  17. DYNAMIC COATTENTION NETWORKS FOR QUESTION ANSWERING Authors : Caiming Xiong, Victor Zhong, Richard Socher

  18. Introduction Machine comprehension: no knowledge base required Until SQuAD, no large-scale natural dataset Cloze-style datasets like CNN/Daily Mail are synthetic or small

  19. About SQuAD Consists of questions on a set of Wikipedia articles (wh-type questions) The answer is a segment of text, or span Source: Rajpurkar et al.

  20. Model in a nutshell ... Socher et al

  21. Doc and Query Encoder Socher et al
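The coattention step on top of the document and query encoders can be sketched as follows. This is a rough NumPy sketch of the paper's equations under assumed shapes (D is (m, d) document encodings, Q is (n, d) question encodings; sentinel vectors and the question projection are omitted for brevity):

```python
import numpy as np

def _softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def coattention(D, Q):
    """Coattention: attend in both directions at once.

    The affinity matrix L scores every document/question word pair.
    Normalizing it along each axis gives question-to-document
    attention (A_Q) and document-to-question attention (A_D), which
    are combined into a question-aware document representation.
    """
    L = D @ Q.T                # (m, n) affinity of every doc/question word pair
    A_Q = _softmax(L, axis=0)  # per question word: attention over doc words
    A_D = _softmax(L, axis=1)  # per doc word: attention over question words
    C_Q = A_Q.T @ D            # (n, d) doc summaries attended by each question word
    C_D = A_D @ np.concatenate([Q, C_Q], axis=1)  # (m, 2d) coattention context
    return C_D
```

In the full model this context is concatenated with D and passed through a bidirectional LSTM to produce the encoding U that the decoder scores.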

  22. Liked ● Gagan Socher et al

  23. Dynamic Decoder (Liked: all) Socher et al

  24. Highway Maxout Network ... Socher et al
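A rough sketch of the Highway Maxout Network scoring function, combining the maxout and highway ideas from the earlier slides: two maxout layers, with a highway-style connection that feeds m1 alongside m2 into the output layer. Shapes and weight names are illustrative and pooling-size bookkeeping is simplified; this is a sketch of the paper's equations, not the authors' implementation.

```python
import numpy as np

def _maxout(x, W, b):
    """Max over p affine pieces; W is (d_in, d_out, p), b is (d_out, p)."""
    return (np.einsum('md,dop->mop', x, W) + b).max(axis=-1)

def hmn_score(h, u_s, u_e, U, W_d, W_1, b_1, W_2, b_2, W_3, b_3):
    """Score every document position given decoder state h and the
    current start/end word encodings u_s, u_e.

    U is (m, d): the coattention encoding of each document word.
    Returns an (m,) vector of scores over positions.
    """
    # r conditions the scorer on the decoder state and current span estimate
    r = np.tanh(np.concatenate([h, u_s, u_e]) @ W_d)
    m_in = np.concatenate([U, np.tile(r, (U.shape[0], 1))], axis=1)
    m1 = _maxout(m_in, W_1, b_1)                   # first maxout layer
    m2 = _maxout(m1, W_2, b_2)                     # second maxout layer
    cat = np.concatenate([m1, m2], axis=1)         # highway connection [m1; m2]
    return _maxout(cat, W_3, b_3).ravel()          # (m,) position scores
```

In the dynamic decoder, one HMN scores start positions and another scores end positions; the argmax of each gives the next span estimate, iterated until convergence.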

  25. Socher et al

  26. Socher et al

  27. Implementation (Disliked ● Gagan: pt. 3 ● Akshay: pt. 4, claim not proven) 1. CoreNLP for preprocessing 2. GloVe word vectors pretrained on the 840B-token Common Crawl corpus 3. OOV words set to 0 4. Sentinel vectors randomly initialized and optimized during training

  28. Iterative process visualisation ... Socher et al

  29. Socher et al

  30. Results (Disliked ● Haroun: ensemble gain too much) Socher et al

  31. Liked ● Barun ● Nupur Socher et al

  32. Performance across different types of questions (Liked ● Shantanu) Socher et al

  33. Ablation studies ... (Liked ● Prachi) Socher et al

  34. Predictions Socher et al

  35. Logistic Regression Prediction : Theatre Museum Socher et al

  36. Comments: Trouble decoding when multiple answers are intuitive Socher et al

  37. Cons Lacks error analysis; needs more ablation studies [Barun, Surag] System gives extractive answers, not abstractive ones [Nupur] Does not compare HMN and MN [all] Unintuitive decoder [Dinesh]

  38. Doubts ... Why does HMN work? What is the role of the sentinel vectors? Error propagation through the argmax function Maxout for LSTMs as well (not clear) Use multiple initialisations of the start and end pointers (how?)

  39. Extensions ... Use the approach on other datasets like CNN/Daily Mail and MS COCO QA [Barun] Use different attention, e.g. Match-LSTM [Barun] Bi-directional attention [Gagan] Apply the iterative idea to visual QA, classification, NER, SRL, etc. [Akshay, Surag] Find synonyms [Haroun]

  40. Extensions ... Combine char2vec and word2vec embeddings to represent the document and query

  41. Thanks!
