Attention for Machine Comprehension
Made by: Rishab Goel
Based on slides by: Alex Graves, Hien Quoc Dang, Renjie Liao
Highway Networks
Benefits ...
Importance ...
● For training very deep architectures
● By allowing better information flow
● Better optimization
● Intuition: linear transformation/input suffice for learning; language at a higher level of abstraction???
● See http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
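The "better information flow" point can be made concrete with a small sketch of one highway layer. This assumes the usual formulation y = T(x) * H(x) + (1 - T(x)) * x with a sigmoid transform gate T and a tanh transform H; the slides do not give the formula, and all names and toy weights below are illustrative.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def highway_layer(x, W_h, b_h, W_t, b_t):
    """Assumed form: y = T(x) * H(x) + (1 - T(x)) * x."""
    h = [math.tanh(a + b) for a, b in zip(matvec(W_h, x), b_h)]  # transform H(x)
    t = [sigmoid(a + b) for a, b in zip(matvec(W_t, x), b_t)]    # gate T(x)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

x = [1.0, -2.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]  # toy weights, not trained values

# Strongly negative gate bias: the gate closes and the input is carried through.
carried = highway_layer(x, zeros, [0.0, 0.0], zeros, [-20.0, -20.0])
# Strongly positive gate bias: the gate opens and the output follows H(x), here tanh(0) = 0.
transformed = highway_layer(x, zeros, [0.0, 0.0], zeros, [20.0, 20.0])
```

When the gate is closed the layer behaves like an identity connection, which is one way to read the claim that deep stacks become easier to optimize.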
Hien Quoc Dang
Idea of Maxout
Intuitions
● Inspired by dropout
● Similar to bagging, but integrated as part of a single network
Idea of Maxout ...
Comparison to Rectifiers
Why Does Maxout Work?
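A minimal sketch of a maxout unit and its relation to rectifiers. It assumes the standard definition h(x) = max_j (w_j . x + b_j); the weights and inputs are made up for illustration. With two pieces, one of them fixed at zero, the unit reduces to a ReLU.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def maxout_unit(x, pieces):
    """pieces is a list of (w, b) pairs; the unit returns the largest affine piece."""
    return max(dot(w, x) + b for w, b in pieces)

def relu(z):
    return max(0.0, z)

# Illustrative weights: one learned piece plus an all-zero piece.
w, b = [0.5, -1.0], 0.25
pieces = [(w, b), ([0.0, 0.0], 0.0)]
inputs = [[2.0, 0.5], [-1.0, 3.0], [0.0, 0.0]]

maxout_values = [maxout_unit(x, pieces) for x in inputs]
relu_values = [relu(dot(w, x) + b) for x in inputs]
```

With more pieces, and none pinned to zero, the unit can represent other piecewise-linear convex activations, which is the sense in which it generalizes the rectifier.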
Slides: Santi Pascual
LSTMs ... Chris Olah’s blog
Need for Attention
● Embeddings alone are not sufficient to encode information over long distances
● Helps the model attend to the important patches of the data
● Adds interpretability to the model
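As a rough illustration of what "attending" means, here is a sketch of soft dot-product attention over a sequence of encoder states. The scoring function is an assumption (the slides do not specify one), and the vectors are toy values.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, states):
    """Return attention weights over states and the weighted-sum context vector."""
    weights = softmax([dot(query, h) for h in states])
    dim = len(states[0])
    context = [sum(w * h[k] for w, h in zip(weights, states)) for k in range(dim)]
    return weights, context

# Toy encoder states and query (illustrative values only).
states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights, context = attend([4.0, 0.0], states)
```

The weights form a distribution over positions, so they can be inspected directly, which is where the interpretability benefit comes from.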
Attentive Reader
DYNAMIC COATTENTION NETWORKS FOR QUESTION ANSWERING
Authors: Caiming Xiong, Victor Zhong, Richard Socher
Introduction
● Machine comprehension: no knowledge base required
● Until SQuAD, there was no large-scale, natural dataset
● Cloze-style datasets like CNN/Daily Mail
● Synthetic or small-size datasets
About SQuAD
● Consists of questions on a set of Wikipedia articles
● Wh-type questions
● The answer is a segment of text, or span
Source: Rajpurkar et al.
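Since the answer is a span, a prediction can be represented as a pair of token indices into the passage. A small sketch, using an inclusive (start, end) convention and a made-up passage (both are assumptions for illustration, not taken from the dataset):

```python
def extract_span(tokens, start, end):
    """Return the text of tokens[start..end], with both indices inclusive."""
    if not (0 <= start <= end < len(tokens)):
        raise ValueError("span must satisfy 0 <= start <= end < len(tokens)")
    return " ".join(tokens[start:end + 1])

# Hypothetical passage, tokenized on whitespace.
tokens = "The tower was completed in 1889 in Paris".split()
answer = extract_span(tokens, 5, 5)
longer = extract_span(tokens, 3, 5)
```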
Model in a nutshell ... Socher et al.
Doc and Query Encoder
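Once document and query are encoded, the model named in the title attends in both directions between them. Below is a rough sketch of that coattention step as I understand it from the paper: an affinity matrix between every document word and every question word, normalized in each direction. Sentinel vectors and the recurrent layers around this step are left out, and all names and values are illustrative.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]

def coattention(doc, que):
    """doc and que are lists of encoding vectors; returns one context per document word."""
    # affinity[i][j] = similarity of document word i and question word j
    affinity = [[dot(d, q) for q in que] for d in doc]
    # For each question word, attention weights over document words.
    att_q = [softmax([affinity[i][j] for i in range(len(doc))]) for j in range(len(que))]
    # For each document word, attention weights over question words.
    att_d = [softmax(row) for row in affinity]
    # Document summary for each question word.
    c_q = [weighted_sum(a, doc) for a in att_q]
    # Concatenate question encodings with their summaries, then attend from each document word.
    que_aug = [q + c for q, c in zip(que, c_q)]
    return [weighted_sum(a, que_aug) for a in att_d]

doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy document encodings
que = [[1.0, 0.0], [0.0, 2.0]]              # toy question encodings
c_d = coattention(doc, que)
```

Each document word ends up with a context of twice the encoding size: a question summary plus a document summary routed through the question.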
Liked ● Gagan
Dynamic Decoder
Liked: all
Highway Maxout Network ...
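A hedged sketch of how a highway maxout network might score one candidate token, following my reading of the paper: a tanh projection of the decoder state and the previous start/end encodings, two maxout layers, and a final maxout layer that sees both earlier layers (the highway-style connection). Dimensions are shrunk to 1 and all weights are made up so the arithmetic can be followed by hand.

```python
import math

def affine(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def maxout_layer(x, pieces):
    """pieces is a list of (W, b); take the elementwise max over the affine pieces."""
    outs = [affine(W, b, x) for W, b in pieces]
    return [max(vals) for vals in zip(*outs)]

def hmn_score(u_t, h, u_s, u_e, W_d, pieces1, pieces2, pieces3):
    # Project decoder state h and previous start/end encodings (list + is concatenation).
    r = [math.tanh(v) for v in affine(W_d, [0.0] * len(W_d), h + u_s + u_e)]
    m1 = maxout_layer(u_t + r, pieces1)   # first maxout layer
    m2 = maxout_layer(m1, pieces2)        # second maxout layer
    out = maxout_layer(m1 + m2, pieces3)  # final layer sees both m1 and m2
    return out[0]

# Toy 1-dimensional setup with hand-picked weights (illustrative only).
W_d = [[0.0, 0.0, 0.0]]
pieces1 = [([[1.0, 0.0]], [0.0]), ([[-1.0, 0.0]], [0.0])]
pieces2 = [([[2.0]], [0.0]), ([[0.0]], [0.5])]
pieces3 = [([[1.0, 1.0]], [0.0]), ([[0.0, 0.0]], [0.0])]
score = hmn_score([1.0], [0.0], [0.0], [0.0], W_d, pieces1, pieces2, pieces3)
```

In this toy case r = 0, m1 = 1, m2 = 2, and the final layer returns 1 + 2 = 3. In the model, one such network would score start positions and a separate one would score end positions.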
Implementation
1. CoreNLP for preprocessing
2. GloVe word vectors pretrained on the 840B Common Crawl corpus
3. OOV word embeddings set to 0
4. Sentinel vectors randomly initialized and optimized during training
Disliked ● Gagan (pt. 3) ● Akshay (pt. 4): claim not proven
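Implementation points 3 and 4 can be sketched as follows. A toy 3-dimensional vocabulary stands in for the pretrained GloVe vectors, and the initialization range for the sentinel is an arbitrary choice; neither comes from the slides.

```python
import random

# Toy stand-in for pretrained word vectors (hypothetical values).
toy_vectors = {"the": [0.1, 0.2, 0.3], "tower": [0.4, 0.5, 0.6]}
DIM = 3

def embed(tokens, vectors, dim):
    """Look up each token; out-of-vocabulary tokens get an all-zero vector."""
    return [list(vectors[tok]) if tok in vectors else [0.0] * dim for tok in tokens]

def init_sentinel(dim, seed=0):
    """Random starting value for a sentinel vector; in training it would be a learned parameter."""
    rng = random.Random(seed)
    return [rng.uniform(-0.1, 0.1) for _ in range(dim)]

embedded = embed(["the", "zzyzx", "tower"], toy_vectors, DIM)
sentinel = init_sentinel(DIM)
```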
Iterative process visualisation ...
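The iterative process can be approximated by the loop below: alternately re-estimate the start and end pointers with argmax, each conditioned on the current estimates, until they stop changing. This is a simplification. The scoring functions are hypothetical stand-ins for the learned networks, the recurrent state update is omitted, and the initial pointers and iteration cap are illustrative choices.

```python
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

def iterative_span(start_scores, end_scores, n_tokens, max_iters=4):
    """Alternate start/end argmax updates until the span is stable or max_iters is hit."""
    s, e = 0, n_tokens - 1  # assumed initial guess: the whole passage
    for _ in range(max_iters):
        new_s = argmax([start_scores(t, s, e) for t in range(n_tokens)])
        new_e = argmax([end_scores(t, new_s, e) for t in range(n_tokens)])
        if (new_s, new_e) == (s, e):
            break  # converged
        s, e = new_s, new_e
    return s, e

# Hypothetical scores: a base preference per token, penalized when inconsistent
# with the other pointer (start after the end, or end before the start).
base_start = [0.1, 0.2, 0.9, 0.3, 0.1, 0.0]
base_end = [0.0, 0.9, 0.1, 0.2, 0.8, 0.1]

def toy_start(t, s, e):
    return base_start[t] - (10.0 if t > e else 0.0)

def toy_end(t, s, e):
    return base_end[t] - (10.0 if t < s else 0.0)

span = iterative_span(toy_start, toy_end, 6)
```

Token 1 has the highest unconditioned end score, but once the start settles on token 2 the end moves to token 4. Because each step is a hard argmax, an early mistake can carry into later iterations.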
Results
Disliked ● Haroun (ensemble gain too much)
Liked ● Barun ● Nupur
Performance across different types of questions
Liked ● Shantanu
Ablation studies ...
Liked ● Prachi
Predictions
Logistic Regression Prediction: Theatre Museum
Comments: Trouble decoding when there are multiple intuitive answers
Cons
● Lacks error analysis; needs more ablation studies [Barun, Surag]
● System gives extractive answers, not abstractive ones [Nupur]
● Does not compare HMN and MN [all]
● Unintuitive decoder [Dinesh]
Doubts ...
● Why did HMN work?
● Role of sentinel vectors?
● Error propagation in the argmax function
● Maxout for LSTMs as well (not clear)
● Use multiple initialisations of the start and end pointers (how?)
Extensions ...
● Use the approach for other datasets like CNN/Daily Mail and MS COCO QA [Barun]
● Use different attention, Match-LSTM [Barun]
● Bi-directional attention [Gagan]
● Apply the iterative idea to visual QA, classification, NER, SRL, etc. [Akshay, Surag]
● Find synonyms [Haroun]
Extensions ... Combine char2vec and word2vec embeddings to represent the document and query
Thanks!