A Recurrent BERT-based Model for Question Generation

Ying-Hong Chan and Yao-Chung Fan
Department of Computer Science, National Chung Hsing University, Taichung, Taiwan
harry831120@gmail.com, yfan@nchu.edu.tw


Abstract

In this study, we investigate the employment of the pre-trained BERT language model to tackle question generation tasks. We introduce three neural architectures built on top of BERT for question generation. The first one is a straightforward BERT employment, which reveals the defects of directly using BERT for text generation. Accordingly, we propose two further models that restructure the BERT employment into a sequential manner, so that the model can take information from previously decoded results. Our models are trained and evaluated on the question-answering dataset SQuAD. Experiment results show that our best model yields state-of-the-art performance, advancing the BLEU 4 score of the existing best models from 16.85 to 22.17.

1 Introduction

The question generation (QG) problem, which takes a context text and an answer phrase as input and generates a question corresponding to the given answer phrase, has received tremendous interest in recent years from both the industrial and academic natural language processing communities (Zhao et al., 2018; Zhou et al., 2017; Du et al., 2017). The state-of-the-art models mainly adopt neural QG approaches, training a neural network based on the sequence-to-sequence framework. So far, the best performing result is reported in (Zhao et al., 2018), which advances the state-of-the-art result from 13.9 to 16.85 (BLEU 4).

The existing QG models mainly rely on recurrent neural networks (RNN), e.g. the long short-term memory (LSTM) network (Hochreiter and Schmidhuber, 1997) or the gated recurrent unit (Chung et al., 2014), augmented by attention mechanisms (Luong et al., 2015). However, the inherently sequential nature of RNN models makes handling long sequences difficult. Therefore, the existing QG models (Du et al., 2017; Zhou et al., 2017) mainly use only sentence-level information as the context text for question generation. When applied to a paragraph-level context, the existing models show significant performance degradation. However, as indicated by (Du et al., 2017), providing paragraph-level information can improve QG performance. For handling long contexts, the work of (Zhao et al., 2018) introduces a maxout pointer mechanism with a gated self-attention encoder for processing paragraph-level input, and reports state-of-the-art performance.

Recently, the NLP community has seen excitement around neural models that make use of pre-trained language models (Devlin et al., 2018; Radford et al., 2018). The latest development is BERT, which has shown significant performance improvements over various natural language understanding tasks, such as document summarization and document classification. Given the success of the BERT model, a natural question follows: can we leverage BERT models to further advance the state of the art for QG tasks? By our study, the answer is yes. Intuitively, the BERT employment brings two advantages for tackling the QG problem. First, as reported by (Devlin et al., 2018; Radford et al., 2018), employing pre-trained language models has been shown to be effective for improving NLP tasks. Second, the BERT model is a stack of multi-layer Transformer blocks (Vaswani et al., 2017), which eschews recurrence and relies entirely on a self-attention mechanism to draw global dependencies between input sequences. With the Transformer blocks, processing paragraph-level contexts for QG therefore becomes possible.

In this study, we investigate the employment of the pre-trained BERT language model to tackle question generation tasks. We introduce three neural architectures built on top of BERT for question generation. The first one is a straightforward BERT employment, which reveals the defects of directly using BERT for text generation. As will be shown in the experiments, the naive BERT employment (called BERT-QG, BERT Question Generation) offers poor performance, since by construction BERT produces all tokens at once without considering the decoding results of previous steps. We find that the questions generated by the naive employment are not even readable sentences. As a result, we propose a sequential question generation model based on BERT as our second model, called BERT-SQG (BERT Sequential Question Generation), which takes information from previously decoded results. As shown in the performance evaluation, the BERT-SQG model outperforms the existing best model (Zhao et al., 2018) by advancing the state-of-the-art result from 16.85 to 21.04 (BLEU 4).

Furthermore, we propose an augmented model called BERT-HLSQG (Highlight Sequential Question Generation) to further enhance the performance of BERT-SQG. Our BERT-HLSQG model works by marking the answer with [HL] tokens to avoid possible ambiguity in specifying answers for question generation. This design further improves the BLEU 4 score from 21.04 to 22.17.
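To make these two ideas concrete (sequential decoding conditioned on previously generated tokens, and [HL]-based answer highlighting), the sketch below shows one possible way to realize them with the Hugging Face transformers library. It is a minimal illustration under stated assumptions, not the authors' implementation: the exact input layout, the greedy decoding, the reuse of [SEP] as a stop symbol, and the helper generate_question are our own choices, and the bert-base-uncased checkpoint would still need to be fine-tuned on SQuAD (context, answer, question) triples before it produces sensible questions.

```python
# Minimal sketch of [HL] answer highlighting plus step-by-step question decoding
# with a BERT masked-LM head (hypothetical layout; NOT the paper's released code).
# Requires: pip install torch transformers
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Register [HL] as an extra special token used to bracket the answer span.
tokenizer.add_special_tokens({"additional_special_tokens": ["[HL]"]})
model.resize_token_embeddings(len(tokenizer))
model.eval()


def generate_question(context: str, answer: str, max_len: int = 30) -> str:
    # Highlight the (first occurrence of the) answer inside the context.
    marked = context.replace(answer, f"[HL] {answer} [HL]", 1)
    question_ids = []
    for _ in range(max_len):
        # Assumed input layout at each decoding step:
        #   [CLS] highlighted context [SEP] question-so-far [MASK] [SEP]
        partial = tokenizer.decode(question_ids)
        text = f"{marked} [SEP] {partial} [MASK]"
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[-1].item()
        with torch.no_grad():
            logits = model(**enc).logits
        next_id = int(logits[0, mask_pos].argmax())  # greedy decoding
        if next_id == tokenizer.sep_token_id:  # treat [SEP] as end-of-question
            break
        question_ids.append(next_id)
    return tokenizer.decode(question_ids)


# Without fine-tuning on question-generation data, the pre-trained checkpoint is
# not expected to emit a well-formed question; the call below only exercises the loop.
print(generate_question("The quick brown fox jumps over the lazy dog.", "the lazy dog"))
```

The point of the sketch is the decoding loop: at every step the model sees the highlighted context and all previously generated question tokens, and only the single [MASK] position is predicted, which is the sequential behavior the BERT-SQG and BERT-HLSQG models introduce on top of a vanilla BERT encoder.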

The contributions of this paper are summarized as follows.

• We investigate the employment of the BERT model for QG tasks. We show that a sequential structure is important for decoding in text generation. Aiming at this point, we propose two sequential question generation models based on BERT.

• Furthermore, we propose a simple but effective input encoding scheme, which inserts special highlighting tokens [HL] before and after the given answer span, to address the ambiguity issue that arises when an answer phrase appears multiple times in the context.

• Extensive experiments are conducted using benchmark datasets, and the results show the effectiveness of our question generation models. Our best model outperforms the existing best model (Zhao et al., 2018) and pushes the state-of-the-art result from 16.85 to 22.17 (BLEU 4).

The rest of this paper is organized as follows. In Section 2, we discuss related work on question generation. In Section 3, we review the BERT model, the basic building block of our models. In Section 4, we introduce our models for question generation, and in Section 5 we provide the experiment results. In Section 6, we conclude the paper and discuss future work.

2 Related Work

Question generation has mainly been tackled with two types of approaches. The first is built on top of heuristic rules that create questions from manually constructed templates and rank the generated results (Heilman and Smith, 2010; Mazidi and Nielsen, 2014; Labutov et al., 2015). In (Labutov et al., 2015), the authors propose a crowdsourcing policy to generate question templates from a large amount of text, which are then used to generate questions. The research in (Heilman and Smith, 2010) uses manually written rules to perform a sequence of general-purpose syntactic transformations that turn declarative sentences into questions; the generated questions are then ranked by a logistic regression model to select qualified questions for later use. The research in (Yao et al., 2012) proposes to convert a sentence into a Minimal Recursion Semantics (MRS) representation through linguistic parsing, and then to construct semantic structures and grammar rules from that representation to generate questions through manually designed rules. These approaches depend heavily on human effort, which makes them hard to scale up and to generalize across domains.

The other type of approach, which is becoming increasingly popular, is to train an end-to-end neural network from scratch using a sequence-to-sequence or encoder-decoder framework, e.g. (Du et al., 2017; Yuan et al., 2017; Song et al., 2017; Zhou et al., 2017; Zhao et al., 2018).

(Du et al., 2017) pioneered the work on automatic QG with an end-to-end trainable seq2seq neural model. Automatic and human evaluation results showed that the proposed model outperformed the previous rule-based systems (Heilman and Smith, 2010; Rus et al., 2010). However, in their study, there was no control over which part of the context text the generated question was asking about.

On the other hand, the work (Zhou et al., 2017;
