

  1. Query Focused Abstractive Summarization via Incorporating Query Relevance and Transfer Learning with Transformer Models
     Md Tahmid Rahman Laskar 1,3, Enamul Hoque 2, Jimmy Huang 2,3
     1 Department of Electrical Engineering and Computer Science, 2 School of Information Technology, 3 Information Retrieval & Knowledge Management Research Lab, York University, Toronto, Canada

  2. Introduction

  3. Query Focused Abstractive Text Summarization
     • Problem Statement: Given a set of documents along with a query, the goal is to generate an abstractive summary of the document(s) based on the given query.
     • Abstractive summaries can contain novel words that did not appear in the source document.
     Document: Even if reality shows were not enlightening, they generate massive revenues that can be used for funding more sophisticated programs. Take BBC for example, it offers entertaining reality shows such as total wipeout as well as brilliant documentaries.
     Query: What is the benefit of reality shows?
     Summary: Reality show generates revenues.

  4. Motivation
     • Challenges:
       • Lack of datasets.
       • Available datasets: Debatepedia, DUC.
       • The available datasets are very small, e.g., Debatepedia has only around 10,000 training instances.
       • Few-shot learning problem: training a neural model end-to-end with small training data is challenging.
     • Solution: We introduce a transfer learning technique utilizing the Transformer architecture [Vaswani et al., 2017]:
       • First, we pre-train a transformer-based model on a large generic abstractive summarization dataset.
       • Then, we fine-tune the pre-trained model on the target query focused abstractive summarization dataset.

  5. Contributions
     • Our proposed approach:
       • is the first work to utilize transfer learning with the Transformer architecture for the Query Focused Abstractive Summarization task.
       • sets a new state-of-the-art result on the Debatepedia dataset.
       • does not require any in-domain data augmentation for few-shot learning.
     • The source code of our proposed model is publicly available: https://github.com/tahmedge/QR-BERTSUM-TL-for-QFAS

  6. Literature Review

  7. Related Work
     • Generic Abstractive Summarization
       • Pointer Generator Network (PGN) [See et al., 2017]:
         • A sequence-to-sequence model based on Recurrent Neural Networks (RNNs).
         • Mitigates repetition of the same word in the generated summaries via the copy and coverage mechanisms.
       • BERT for SUMmarization (BERTSUM) [Liu and Lapata, 2019]:
         • Uses BERT [Devlin et al., 2018] as the encoder and a Transformer decoder as the decoder (see the sketch below).
         • Outperforms PGN for abstractive text summarization on several datasets.
       • Limitation: cannot incorporate query relevance.
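A minimal sketch of the BERTSUM-style encoder-decoder wiring described above, pairing Hugging Face's BertModel with PyTorch's TransformerDecoder. The layer/head counts and the omission of decoder positional encodings are illustrative assumptions, not the paper's exact configuration:

```python
# BERTSUM-style abstractive model sketch (assumptions: transformers installed,
# PyTorch >= 1.9 for batch_first; hyperparameters are illustrative only).
import torch
import torch.nn as nn
from transformers import BertModel

class AbstractiveSummarizer(nn.Module):
    def __init__(self, vocab_size=30522, d_model=768, n_heads=8, n_layers=6):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")  # BERT encoder
        decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=n_heads,
                                                   batch_first=True)
        self.decoder = nn.TransformerDecoder(decoder_layer, num_layers=n_layers)
        self.tgt_embed = nn.Embedding(vocab_size, d_model)  # decoder-side embeddings
        # (decoder positional encodings omitted for brevity)
        self.generator = nn.Linear(d_model, vocab_size)     # projects to vocab logits

    def forward(self, src_ids, src_mask, tgt_ids):
        # Encode the source document with BERT.
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_mask).last_hidden_state
        tgt = self.tgt_embed(tgt_ids)
        # Causal mask: each summary token attends only to previous tokens.
        causal = nn.Transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(tgt.device)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.generator(out)  # (batch, tgt_len, vocab_size)
```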

  8. Related Work (cont’d)
     • Query Focused Abstractive Summarization (QFAS)
       • Diversity Driven Attention (DDA) Model [Nema et al., 2017]:
         • A neural encoder-decoder model based on RNNs.
         • Introduced a new dataset for the QFAS task from Debatepedia.
       • Limitation: only performs well when the Debatepedia dataset is augmented.

  9. Related Work (cont’d)
     • Query Focused Abstractive Summarization (QFAS)
       • Relevance Sensitive Attention for Query Focused Summarization (RSA-QFS) [Baumel et al., 2018]:
         • First, pre-trained the PGN model on a generic abstractive summarization dataset.
         • Then, incorporated query relevance into the pre-trained model to predict query focused summaries in the target datasets.
       • Limitations:
         • Did not fine-tune the model on the QFAS datasets.
         • Obtained a very low Precision score on the Debatepedia dataset.

  10. Methodology

  11. Proposed Approach
      • Our proposed model works in two steps via transfer learning (see the sketch below):
        • Step 1: Pre-train the BERTSUM model on a generic abstractive summarization corpus (e.g., XSUM).
        • Step 2 (Transfer Learning): Incorporate query relevance into the pre-trained model and fine-tune it for the QFAS task in the target domain (i.e., Debatepedia).
      • We choose the XSUM dataset for pre-training since the summaries in this dataset are more abstractive than those in other datasets [Liu et al., 2019].
      • To incorporate query relevance in BERTSUM, we concatenate the query with the document as the input to the encoder [Lewis et al., 2019].
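A high-level sketch of this two-step recipe. The helper names (train_summarizer, prepend_query) are hypothetical stand-ins for illustration, not the released code:

```python
# Sketch of the two-step transfer learning recipe; train_summarizer and
# prepend_query are hypothetical placeholders, not the authors' actual API.

def prepend_query(example):
    # Query relevance is incorporated by concatenating the query
    # before the document on the encoder side.
    return {"source": example["query"] + " " + example["document"],
            "target": example["summary"]}

def run_transfer_learning(model, xsum_data, debatepedia_data, train_summarizer):
    # Step 1: pre-train on a large generic abstractive summarization corpus.
    model = train_summarizer(model, xsum_data)
    # Step 2: fine-tune on the query focused target domain, with the query
    # concatenated to each document.
    qfas_data = [prepend_query(ex) for ex in debatepedia_data]
    return train_summarizer(model, qfas_data)
```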

  12. Proposed Approach (cont’d)
      [Figure: two-panel overview of the proposed approach]
      (a) Pre-train the BERTSUM model (BERT encoder + Transformer decoder) on a large generic abstractive summarization dataset.
          Encoder input: [CLS] Sent 1 [SEP] [CLS] Sent 2 [SEP] … [CLS] Sent N [SEP]
          Input: Document {Sent 1, Sent 2, ..., Sent N}
          Example Document: The argument that too evil can be prevented by assassination is highly questionable. The figurehead of an evil government is not necessarily the lynchpin that hold it together. Therefore, if Hitler had been assassinated, it is pure supposition that the Nazi would have acted any differently to how they did act.
          Example Summary: The idea that assassinations can prevent injustice is questionable.
      (b) Transfer learning: incorporate query relevance into the pre-trained BERTSUM and fine-tune for the QFAS task on the target domain.
          Encoder input: [CLS] Sent Q [SEP] [CLS] Sent 1 [SEP] … [CLS] Sent N [SEP]
          Input: Query {Sent Q}, Document {Sent 1 ... Sent N}
          Example Query: What is the benefit of reality shows?
          Example Document: Even if reality shows were not enlightening, they generate massive revenues that can be used for funding more sophisticated programs. Take BBC for example, it offers entertaining reality shows such as total wipeout as well as brilliant documentaries.
          Example Summary: Reality show generates revenues.
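A small sketch of how the encoder input in panel (b) can be built: each sentence (query first) is wrapped in [CLS] … [SEP], and segment ids alternate per sentence in the style of BERTSUM's interval segment embeddings. The tokenizer choice (bert-base-uncased) is an assumption:

```python
# Building a BERTSUM-style encoder input with the query prepended.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def build_input(query, sentences):
    tokens, segment_ids = [], []
    # Query sentence first, then the document sentences.
    for i, sent in enumerate([query] + sentences):
        sent_tokens = ["[CLS]"] + tokenizer.tokenize(sent) + ["[SEP]"]
        tokens += sent_tokens
        segment_ids += [i % 2] * len(sent_tokens)  # alternate 0/1 per sentence
    return tokenizer.convert_tokens_to_ids(tokens), segment_ids

ids, segs = build_input(
    "What is the benefit of reality shows?",
    ["Even if reality shows were not enlightening, they generate massive "
     "revenues that can be used for funding more sophisticated programs."])
```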

  13. Datasets 13

  14. Debatepedia Dataset
      Original Version:
      • The Debatepedia dataset was created from the Debatepedia 1 website.
      • Previous work on this dataset for the QFAS task used 10-fold cross-validation.

      Debatepedia (Original Dataset): average number of instances in each fold
        Train: 10,859 | Dev: 1,357 | Test: 1,357

      1 http://www.debatepedia.org/en/index.php/Welcome_to_Debatepedia%21

  15. Debatepedia Dataset (cont’d)
      Augmented Version:
      • We find in the official source code of the DDA model that the dataset was augmented by creating more instances in the training set.
      • In the augmented dataset:
        • The average number of training instances in each fold was 95,843.
        • The test and validation data were the same as in the original.

      Debatepedia (Augmented Dataset): average number of instances in each fold
        Train: 95,843 | Dev: 1,357 | Test: 1,357

  16. Data Augmentation Approach: Debatepedia Dataset
      • We describe the data augmentation approach based on the source code 2 of DDA (see the sketch below).
      • We find that for each training instance, 8 new training instances were created.
      • First, a pre-defined vocabulary of 24,822 words with their synonyms was created.
        i. Each new training instance was then created by randomly replacing:
           • M (1 ≤ M ≤ 3) words in each query.
           • N (10 ≤ N ≤ 17) words in each document.
        ii. Each word was replaced with its synonym found in the pre-defined vocabulary.
        iii. When a word was not found in the pre-defined vocabulary, the GloVe vocabulary was used.
        iv. Steps i, ii, and iii were repeated 8 times to create 8 new training instances.

      2 Source Code of DDA: https://git.io/JeBZX
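A sketch paraphrasing the augmentation procedure above; it is not DDA's actual code, and the synonyms dict and glove_neighbor lookup are assumed inputs:

```python
# Synonym-replacement augmentation as described: 8 variants per training pair,
# replacing M (1-3) query words and N (10-17) document words; pre-defined
# vocabulary first, GloVe-based fallback otherwise (both lookups assumed given).
import random

def replace_words(words, k, synonyms, glove_neighbor):
    words = list(words)
    for idx in random.sample(range(len(words)), min(k, len(words))):
        w = words[idx]
        # Prefer the pre-defined synonym vocabulary; fall back to GloVe,
        # and keep the original word if neither lookup has it.
        words[idx] = synonyms.get(w) or glove_neighbor(w) or w
    return words

def augment(query, document, synonyms, glove_neighbor, copies=8):
    out = []
    for _ in range(copies):
        q = replace_words(query.split(), random.randint(1, 3),
                          synonyms, glove_neighbor)
        d = replace_words(document.split(), random.randint(10, 17),
                          synonyms, glove_neighbor)
        out.append((" ".join(q), " ".join(d)))
    return out
```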

  17. Experimental Details

  18. Experimental Setup
      • Dataset:
        • We used the original version of the Debatepedia dataset to evaluate our proposed model.
      • Evaluation Metrics:
        • ROUGE-1, ROUGE-2, and ROUGE-L, each reported with Recall, Precision, and F1 (see the evaluation sketch below).
      • Baselines:
        • QR-BERTSUM: BERTSUM model incorporating only query relevance.
        • BERTSUM XSUM: BERTSUM model pre-trained on the XSUM dataset without any fine-tuning.
        • RSA-QFS: The result of the RSA-QFS model reported in [Baumel et al., 2018].
        • DDA: The result of the DDA model reported in [Nema et al., 2017].
        • DDA (Original dataset): Our run of the DDA model on the original version of Debatepedia.
        • DDA (Augmented dataset): Our run of the DDA model on the augmented version of Debatepedia.
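One way to compute these metrics, using the rouge-score Python package; this is a common choice for illustration, not necessarily the authors' exact evaluation toolkit:

```python
# ROUGE-1/2/L with Recall, Precision, and F1 via the rouge-score package.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
scores = scorer.score(
    target="Reality show generates revenues.",     # reference summary
    prediction="Reality shows generate revenue.")  # model output (made-up)
for name, s in scores.items():
    print(f"{name}: R={s.recall:.4f} P={s.precision:.4f} F={s.fmeasure:.4f}")
```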

  19. Results and Analyses

  20. Results
      Here, ‘Recall’, ‘Precision’, and ‘F1’ are denoted by ‘R’, ‘P’, and ‘F’ respectively. ‘*’ denotes our implementation of the DDA model.

      Models                      ROUGE-1 (R / P / F)      ROUGE-2 (R / P / F)      ROUGE-L (R / P / F)
      QR-BERTSUM                  22.31 / 35.68 / 26.42     9.94 / 16.73 / 11.90    21.22 / 33.85 / 25.09
      BERTSUM XSUM                17.36 / 11.48 / 13.32     3.03 /  2.47 /  2.75    14.96 /  9.88 / 11.46
      RSA-QFS [Baumel et al.]     53.09 /   -   /   -      16.10 /   -   /   -      46.18 /   -   /   -
      DDA [Nema et al.]           41.26 /   -   /   -      18.75 /   -   /   -      40.43 /   -   /   -
      DDA* (Original Dataset)      7.52 /  7.67 /  7.35     2.83 /  2.88 /  2.84     7.13 /  7.54 /  7.24
      DDA* (Augmented Dataset)    37.80 / 47.38 / 40.49    27.55 / 33.74 / 29.37    37.27 / 46.68 / 39.90
      Our Model: QR-BERTSUM-TL    57.96 / 60.44 / 58.50    45.20 / 46.11 / 45.47    57.05 / 59.33 / 57.73

      • An improvement of 9.17% and 23.54% in terms of ROUGE-1 and ROUGE-L respectively over RSA-QFS + PGN.
      • A large gain in terms of ROUGE-2 compared to the previous models, with an improvement of 141.67% over DDA and 180.75% over RSA-QFS + PGN.
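The quoted gains are relative (percentage) improvements over the baseline's Recall scores from the table above; a quick check of the ones that follow directly from the table:

```python
# Relative improvement = 100 * (ours - baseline) / baseline, on Recall scores.
def relative_gain(ours, baseline):
    return 100 * (ours - baseline) / baseline

print(f"{relative_gain(57.96, 53.09):.2f}%")  # -> 9.17%   (ROUGE-1 R vs RSA-QFS)
print(f"{relative_gain(57.05, 46.18):.2f}%")  # -> 23.54%  (ROUGE-L R vs RSA-QFS)
print(f"{relative_gain(45.20, 16.10):.2f}%")  # -> 180.75% (ROUGE-2 R vs RSA-QFS)
```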
