Neural Models for Key Phrase Extraction and Question Generation

Sandeep Subramanian ♠♣  Tong Wang ♣  Xingdi Yuan ♣  Saizheng Zhang ♠  Yoshua Bengio ♠†  Adam Trischler ♣
♣ Microsoft Research, Montréal   ♠ MILA, Université de Montréal   † CIFAR Senior Fellow
sandeep.subramanian.1@umontreal.ca

Abstract

We propose a two-stage neural model to tackle question generation from documents. First, our model estimates the probability that word sequences in a document are ones that a human would pick when selecting candidate answers, by training a neural key-phrase extractor on the answers in a question-answering corpus. Predicted key phrases then act as target answers and condition a sequence-to-sequence question-generation model with a copy mechanism. Empirically, our key-phrase extraction model significantly outperforms an entity-tagging baseline and existing rule-based approaches. We further demonstrate that our question generation system formulates fluent, answerable questions from key phrases. This two-stage system could be used to augment or generate reading comprehension datasets, which may be leveraged to improve machine reading systems or in educational settings.

1 Introduction

Question answering and machine comprehension have gained increased interest in the past few years. An important contributing factor is the emergence of several large-scale QA datasets (Rajpurkar et al., 2016; Trischler et al., 2016; Nguyen et al., 2016; Joshi et al., 2017). However, the creation of these datasets is a labour-intensive and expensive process that usually comes at significant financial cost. Meanwhile, given the complexity of the problem space, even the largest QA dataset can still exhibit strong biases in many aspects, including question and answer types, domain coverage, linguistic style, etc.

To address this limitation, we propose and evaluate neural models for automatic question-answer pair generation that involves two inter-related components: first, a system to identify candidate answer entities or events (key phrases) within a passage or document (Becker et al., 2012); second, a question generation module to construct questions about a given key phrase. As a financially more efficient and scalable alternative to the human curation of QA datasets, the resulting system can potentially accelerate further progress in the field.

Specifically, we formulate the key phrase extraction component as modeling the probability of potential answers conditioned on a given document, i.e., P(a | d). Inspired by successful work in question answering, we propose a sequence-to-sequence model that generates a set of key-phrase boundaries. This model can flexibly select an arbitrary number of key phrases from a document. To teach it to assign high probability to human-selected answers, we train the model on large-scale, crowd-sourced question-answering datasets.

We thus take a purely data-driven approach to understand the priors that humans have when selecting answer candidates, working from the premise that crowdworkers tend to select entities or events that interest them when formulating their own comprehension questions. If this premise is correct, then the growing collection of crowd-sourced question-answering datasets (Rajpurkar et al., 2016; Trischler et al., 2016) can be harnessed to learn models for key phrases of interest to human readers.

Given a set of extracted key phrases, we then approach the question generation component by modeling the conditional probability of a question given a document-answer pair, i.e., P(q | a, d).
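To make the factorization concrete, the following is a minimal sketch of how the two stages compose at inference time. The function names and the trivial stub implementations are illustrative assumptions, not the authors' code: in the actual system, extract_key_phrases and generate_question would be the trained neural models described above, sampling or decoding from P(a | d) and P(q | a, d), respectively.

```python
# Sketch of the two-stage factorization: P(a | d), then P(q | a, d).
# Stubs below stand in for the trained neural components; names are illustrative.

from typing import List, Tuple


def extract_key_phrases(document: str, max_phrases: int = 5) -> List[Tuple[int, int]]:
    """Stand-in for the neural key-phrase extractor: return (start, end) token
    boundaries of candidate answers ranked by estimated P(a | d). As a trivial
    placeholder, we return capitalized unigrams."""
    tokens = document.split()
    spans = [(i, i + 1) for i, tok in enumerate(tokens) if tok[:1].isupper()]
    return spans[:max_phrases]


def generate_question(document: str, answer_span: Tuple[int, int]) -> str:
    """Stand-in for the seq2seq question generator conditioned on (a, d),
    i.e., a decode from P(q | a, d)."""
    tokens = document.split()
    answer = " ".join(tokens[answer_span[0]:answer_span[1]])
    return f"What does the passage say about {answer}?"


def generate_qa_pairs(document: str) -> List[Tuple[str, str]]:
    """Compose the two stages: pick likely answers, then ask about each one."""
    tokens = document.split()
    pairs = []
    for start, end in extract_key_phrases(document):
        answer = " ".join(tokens[start:end])
        question = generate_question(document, (start, end))
        pairs.append((question, answer))
    return pairs


if __name__ == "__main__":
    doc = "Tesla was born in Smiljan and later moved to the United States ."
    for q, a in generate_qa_pairs(doc):
        print(q, "->", a)
```

In the full system, each stage is trained separately on crowd-sourced QA data, as described in the following paragraphs.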
To this end, we use a sequence-to-sequence model with attention (Bahdanau et al., 2014) and the pointer-softmax mechanism (Gulcehre et al., 2016). This component is likewise trained to maximize the likelihood of the questions in a QA dataset; when training this component, the model sees the ground-truth answers from the dataset.
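As a rough illustration of the pointer-softmax idea, the decoder's output layer mixes a generation distribution over the vocabulary with a copy distribution over source-document positions, using a learned switch. The sketch below is not the authors' implementation or the exact formulation of Gulcehre et al. (2016); tensor shapes, module names, and the choice of inputs to the switch are assumptions.

```python
# Sketch of a pointer-softmax-style output layer: interpolate between
# generating a word from the vocabulary and copying a word from the source.
# Shapes and parameter names are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class PointerSoftmax(nn.Module):
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.gen_proj = nn.Linear(hidden_size, vocab_size)  # vocabulary softmax
        self.switch = nn.Linear(2 * hidden_size, 1)          # copy vs. generate gate

    def forward(self, dec_state, context, copy_attn, src_token_ids):
        """
        dec_state:     (batch, hidden)   decoder hidden state
        context:       (batch, hidden)   attention-weighted encoder context
        copy_attn:     (batch, src_len)  attention over source positions
        src_token_ids: (batch, src_len)  vocabulary ids of the source tokens
        Returns a (batch, vocab) distribution over the next output token.
        """
        p_gen = F.softmax(self.gen_proj(dec_state), dim=-1)
        g = torch.sigmoid(self.switch(torch.cat([dec_state, context], dim=-1)))
        # Scatter copy probabilities from source positions onto their vocab ids.
        p_copy = torch.zeros_like(p_gen).scatter_add_(1, src_token_ids, copy_attn)
        return g * p_gen + (1.0 - g) * p_copy
```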
Empirically, our proposed model for key phrase extraction outperforms two baseline systems by a significant margin. We support these quantitative findings with qualitative examples of generated question-answer pairs given documents.

2 Related Work

2.1 Key Phrase Extraction

An important aspect of question generation is identifying which elements of a given document are important or interesting to inquire about. Existing studies formulate key-phrase extraction in two steps. In the first, lexical features (e.g., part-of-speech tags) are used to extract a key-phrase candidate list exhibiting certain types (Liu et al., 2011; Wang et al., 2016; Le et al., 2016; Yang et al., 2017). In the second, ranking models are often used to select a phrase from among the candidates. Medelyan et al. (2009) and Lopez and Romary (2010) used bagged decision trees, the latter also using a Multi-Layer Perceptron (MLP) and a Support Vector Machine to perform binary classification on the candidates. Mihalcea and Tarau (2004), Wan and Xiao (2008), and Le et al. (2016) scored key phrases using PageRank. Heilman and Smith (2010b) asked crowdworkers to rate the acceptability of computer-generated natural language questions as quiz questions, and Becker et al. (2012) solicited quality ratings of text chunks as potential gaps for Cloze-style questions.

These studies are closely related to our proposed work in the common goal of modeling the distribution of key phrases given a document. The major difference is that previous studies begin with a prescribed list of candidates, which might significantly bias the distribution estimate. In contrast, we adopt a dataset that was originally designed for question answering, where crowdworkers presumably tend to pick the entities or events that interest them most. We postulate that the resulting distribution, learned directly from data, is more likely to reflect the true relevance of potential answer phrases.

Recently, Meng et al. (2017) proposed a generative model for key phrase prediction with an encoder-decoder framework that is able both to generate words from a vocabulary and to point to words from the document. Their model achieved state-of-the-art results on multiple keyword-extraction datasets. This model shares certain similarities with our key phrase extractor, i.e., using a single neural model to learn the probabilities that words are key phrases. Since their focus was on a hybrid abstractive-extractive task, in contrast to the purely extractive task in this work, a direct comparison between the two works is difficult.

Yang et al. (2017) used rule-based methods to extract potential answers from unlabeled text, and then generated questions given documents and extracted answers using a pre-trained question generation model. The model-generated questions were then combined with human-generated questions for training question answering models. Experiments showed that question answering models can benefit from the augmented data provided by their approach.

2.2 Question Generation

Automatic question generation systems are often used to alleviate (or eliminate) the burden of human generation of questions to assess reading comprehension (Mitkov and Ha, 2003; Kunichika et al., 2004). Various NLP techniques have been adopted in these systems to improve generation quality, including parsing (Heilman and Smith, 2010a; Mitkov and Ha, 2003), semantic role labeling (Lindberg et al., 2013), and the use of lexicographic resources like WordNet (Miller, 1995; Mitkov and Ha, 2003). However, the majority of the proposed methods resort to simple, rule-based techniques such as template-based slot filling (Lindberg et al., 2013; Chali and Golestanirad, 2016; Labutov et al., 2015) or syntactic transformation heuristics (Agarwal and Mannem, 2011; Ali et al., 2010), e.g., subject-auxiliary inversion (Heilman and Smith, 2010a). These techniques generally do not capture the diversity of human-generated questions.

To address this limitation, end-to-end-trainable neural models have recently been proposed for question generation in both vision (Mostafazadeh et al., 2016) and language. For the latter, Du et al. (2017) used a sequence-to-sequence model with an attention mechanism derived from the encoder states. Yuan et al. (2017) proposed a similar architecture but further improved model performance.
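These neural generators, like our own question generation component, rely on attention over the encoder states of the document. The following is a minimal sketch of additive (Bahdanau-style) attention; the tensor shapes and parameter names are assumptions and do not correspond to any particular released implementation.

```python
# Sketch of additive (Bahdanau-style) attention over encoder states.
# Shapes and names are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class AdditiveAttention(nn.Module):
    def __init__(self, enc_size: int, dec_size: int, attn_size: int):
        super().__init__()
        self.enc_proj = nn.Linear(enc_size, attn_size, bias=False)
        self.dec_proj = nn.Linear(dec_size, attn_size, bias=False)
        self.score = nn.Linear(attn_size, 1, bias=False)

    def forward(self, dec_state, enc_states):
        """
        dec_state:  (batch, dec_size)           current decoder state
        enc_states: (batch, src_len, enc_size)  encoder states for the document
        Returns the attention-weighted context vector and the normalized
        attention weights over source positions.
        """
        # e_ij = v^T tanh(W_enc h_j + W_dec s_i)
        scores = self.score(torch.tanh(
            self.enc_proj(enc_states) + self.dec_proj(dec_state).unsqueeze(1)))
        weights = F.softmax(scores.squeeze(-1), dim=-1)          # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
        return context, weights
```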