Automating reading comprehension by generating question and answer pairs
Vishwajeet Kumar 1, Kireeti Boorla 2, Ganesh Ramakrishnan 2, Yuan-Fang Li 3
1 IITB-Monash Research Academy, India  2 IIT Bombay, India  3 Monash University, Australia
1 Automatic question and answer generation

A system to automatically generate questions and answers from text.

Text: Sachin Tendulkar received the Arjuna Award in 1994 for his outstanding sporting achievement, the Rajiv Gandhi Khel Ratna award in 1997...

Questions:
1. When did Sachin Tendulkar receive the Arjuna Award? (Ans: 1994)
2. Which award did Sachin Tendulkar receive in 1994 for his outstanding sporting achievement? (Ans: Arjuna Award)
3. When did Sachin Tendulkar receive the Rajiv Gandhi Khel Ratna Award? (Ans: 1997)
2 Motivation

How would someone tell that you have read this text?

Sachin Ramesh Tendulkar is a former Indian cricketer and captain, widely regarded as one of the greatest batsmen of all time. He took up cricket at the age of eleven, made his Test debut on 15 November 1989 against Pakistan in Karachi at the age of sixteen, and went on to represent Mumbai domestically and India internationally for close to twenty-four years...
3 Why is this problem challenging?

• The question must be relevant to the text
• The answer must be unambiguous
• The question must be challenging and well formed
4 Existing work

Template based [Mazidi and Nielsen, 2014; Mostow and Chen, 2009]
• Use crowd-sourced templates such as "What is X?"

Syntax based [Heilman, 2011]
• Rules for declarative-to-interrogative sentence transformation
• Only syntax is considered, not semantics
• Relies heavily on NLP tools

Vanilla Seq2Seq for question generation [Du et al., 2017]
• First approach to question generation from text using a neural network
• Uses a vanilla Seq2Seq model for question generation
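To make the syntax-based idea concrete, here is a toy sketch (not Heilman's actual system) of one declarative-to-interrogative rule: fronting an auxiliary verb to form a yes/no question. All names here are illustrative.

```python
def declarative_to_interrogative(sentence):
    """Toy syntax-based question generation: move an auxiliary verb
    to the front of a declarative sentence to form a yes/no question.
    Returns None when no auxiliary is found (rule does not apply)."""
    auxiliaries = {"is", "was", "are", "were", "can", "will", "has", "had"}
    words = sentence.rstrip(".").split()
    for i, w in enumerate(words):
        if w.lower() in auxiliaries and i > 0:
            subject = " ".join(words[:i])        # everything before the aux
            rest = " ".join(words[i + 1:])       # everything after the aux
            return f"{w.capitalize()} {subject} {rest}?"
    return None

print(declarative_to_interrogative("Sachin Tendulkar is a former Indian cricketer."))
# -> Is Sachin Tendulkar a former Indian cricketer?
```

Real syntax-based systems operate on parse trees rather than flat word lists, which is exactly why they rely so heavily on NLP tools.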
5 Some other related work

Goal: generate a question given a fact/triple from a KB/ontology.

Template based [Seyler et al., 2015]
• Assumption: facts are present in a domain-dependent knowledge base
• Generates questions from facts using templates

Factoid question generation using RNN [Serban et al., 2016]
• Proposes factoid question generation from Freebase triples (subject, relation, object)
• Embeds the fact using KG embedding techniques such as TransE

Example: <Fires Creek, contained by, Nantahala National Forest> ⇒ Which forest is Fires Creek in?
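The template-based approach can be sketched in a few lines: a relation maps to a question pattern, and the object of the triple becomes the answer. This is only an illustrative toy, not the Seyler et al. system; the template table and function names are assumptions.

```python
# Hypothetical relation -> question-pattern table.
TEMPLATES = {
    "contained by": "Which {object_type} is {subject} in?",
}

def question_from_triple(subject, relation, obj, object_type):
    """Fill a question template for a KB triple (subject, relation, object).
    The object of the triple serves as the answer, not part of the question."""
    template = TEMPLATES.get(relation)
    if template is None:
        return None  # no template for this relation
    question = template.format(subject=subject, object_type=object_type)
    return question, obj  # (generated question, answer)

print(question_from_triple("Fires Creek", "contained by",
                           "Nantahala National Forest", "forest"))
# -> ('Which forest is Fires Creek in?', 'Nantahala National Forest')
```

The RNN-based approach of Serban et al. replaces this fixed table with a learned decoder conditioned on an embedding of the triple.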
6 Limitations of previous approaches

• Do not generate the answer corresponding to the question
• Mostly rule based or template based
• Overly simple set of linguistic features
7 Our contribution

• A sequence-to-sequence model with attention, augmented with a rich set of linguistic features and answer encoding
• A pointer-network-based method for automatic answer selection
8 Automatic question and answer generation using a seq2seq model with a pointer network

[Figure 1: High-level architecture of our question generation model. Pipeline: Named Entity Selection → Answer Selection (pointer network) → Answer and Features Encoding → Sentence Encoder (producing a thought vector for the sentence) → Question Decoder. Example: "Donald Trump is the current President of the United States of America." yields the answer "Donald Trump" and the question "Who is the current president of the United States of America?"]
9 Named Entity Selection

• The sentence $S = (w_1, w_2, \ldots, w_n)$ is encoded by a 2-layer LSTM network into hidden states $H = (h^s_1, h^s_2, \ldots, h^s_n)$.
• For each named entity $NE = (n_i, \ldots, n_j)$, create a representation $R = \langle h^{ne}_{mean} \rangle$, where $h^{ne}_{mean}$ is the mean of the activations $(h^s_i, \ldots, h^s_j)$ in the NE span.
• $R$ is fed to an MLP along with $\langle h^s_n ; h^s_{mean} \rangle$, where $h^s_n$ is the final state and $h^s_{mean}$ is the mean of all activations, to get the probability of the named entity being the pivotal answer $a$:

$$P(a = NE_i \mid S) = \mathrm{softmax}(R_i W + B)$$

The named entity with the highest probability is the most relevant answer to ask a question about.
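The named-entity scoring step above can be sketched numerically. This is a minimal NumPy illustration of the computation (not the trained model): it assumes the LSTM hidden states `H` are already given, and that the MLP is a single linear layer with weight vector `W` and bias `b` followed by a softmax over the candidate entities.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def select_pivotal_answer(H, ne_spans, W, b):
    """Score each named-entity span as the pivotal answer.

    H        : (n, d) array of LSTM hidden states h^s_1..h^s_n
    ne_spans : list of (i, j) inclusive index ranges, one per named entity
    W, b     : linear-layer parameters, shapes (3*d,) and scalar (assumed)
    Returns P(a = NE_i | S) over the candidate entities.
    """
    h_final = H[-1]          # h^s_n, final hidden state
    h_mean = H.mean(axis=0)  # h^s_mean, mean of all activations
    scores = []
    for i, j in ne_spans:
        h_ne_mean = H[i:j + 1].mean(axis=0)  # h^ne_mean over the NE span
        r = np.concatenate([h_ne_mean, h_final, h_mean])
        scores.append(r @ W + b)
    return softmax(np.array(scores))

rng = np.random.default_rng(0)
H = rng.normal(size=(7, 4))                     # 7 tokens, hidden size 4
probs = select_pivotal_answer(H, [(0, 1), (5, 6)],
                              rng.normal(size=12), 0.0)
print(probs)  # one probability per named entity, summing to 1
```

The entity with the highest probability is then taken as the pivotal answer that the question decoder conditions on.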