Improving Neural Abstractive Text Summarization with Prior Knowledge - PowerPoint PPT Presentation

  1. Improving Neural Abstractive Text Summarization with Prior Knowledge. Gaetano Rossiello, Pierpaolo Basile, Giovanni Semeraro, Marco Di Ciano and Gaetano Grasso. gaetano.rossiello@uniba.it. Department of Computer Science, University of Bari Aldo Moro, Italy. URANIA 16 - 1st Italian Workshop on Deep Understanding and Reasoning: A Challenge for Next-generation Intelligent Agents. 28 November 2016, AI*IA 16 - Genoa, Italy.

  2. Text Summarization The goal of summarization is to produce a shorter version of a source text while preserving the meaning and the key content of the original. A well-written summary can significantly reduce the amount of cognitive work needed to digest large amounts of text.

  3. Information Overload Information overload is a problem of the modern digital society, caused by the explosion in the amount of information produced both on the World Wide Web and in enterprise environments.

  4. Text Summarization - Approaches Input: single-document or multi-document. Output: extract, abstract, or headline. Extractive summarization: the generated summary is a selection of relevant sentences from the source text, in a copy-paste fashion. Abstractive summarization: the generated summary is a new cohesive text not necessarily present in the original source.

  5. Extractive Summarization - Methods Statistical methods: feature-based, machine learning, fuzzy logic, graph-based. Distributional semantics: LSA (Latent Semantic Analysis), NMF (Non-Negative Matrix Factorization), Word2Vec.

  6. Abstractive Summarization: a Challenging Task Abstractive summarization requires deep understanding and reasoning over the text: determining the explicit or implicit meaning of each element, such as words, phrases, sentences and paragraphs, and making inferences about their properties in order to generate new sentences which compose the summary [Norvig, P.: Inference in text understanding. AAAI, 1987]. – Abstractive Example – Original: Russian defense minister Ivanov called Sunday for the creation of a joint front for combating global terrorism. Summary: Russia calls for joint front against terrorism.

  7. Deep Learning for Abstractive Text Summarization Idea: cast the summarization task as a neural machine translation problem, where models trained on a large amount of data learn the alignments between the input text and the target summary through an attentional encoder-decoder paradigm. Rush, A., et al. A neural attention model for abstractive sentence summarization. EMNLP 2015. Nallapati, R., et al. Abstractive Text Summarization using Sequence-to-Sequence RNNs and Beyond. CoNLL 2016. Chopra, S., et al. Abstractive Sentence Summarization with Attentive Recurrent Neural Networks. NAACL 2016.

  8. Deep Learning for Abstractive Text Summarization (slide figure: the attentional encoder-decoder model of Rush, A., et al.: A neural attention model for abstractive sentence summarization. EMNLP 2015)

  9. Abstractive Summarization - Problem Formulation Let us consider an original text x = {x_1, x_2, ..., x_n} and a summary y = {y_1, y_2, ..., y_m}, where n >> m and x_i, y_j ∈ V (V is the vocabulary). A probabilistic perspective: the summarization problem consists in finding an output sequence y that maximizes the conditional probability of y given an input sequence x, arg max_{y ∈ V} P(y | x), with P(y | x; θ) = ∏_{t=1}^{|y|} P(y_t | {y_1, ..., y_{t−1}}, x; θ), where θ denotes a set of parameters learnt from a training set of source text and target summary pairs.
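
As an illustration of this factorization, the following minimal Python sketch computes log P(y | x; θ) as a sum of per-token log-probabilities and approximates the arg max with greedy decoding. The model object and its next_token_probs method are hypothetical placeholders, not part of the presented work.

import math

def sequence_log_prob(model, x, y):
    # log P(y | x; theta) = sum_t log P(y_t | y_1..y_{t-1}, x; theta)
    log_p = 0.0
    for t in range(len(y)):
        probs = model.next_token_probs(y[:t], x)  # dict: token -> probability
        log_p += math.log(probs[y[t]])
    return log_p

def greedy_decode(model, x, max_len=30, eos="</s>"):
    # Approximates arg max_y P(y | x) by picking the most probable token at each step.
    y = []
    for _ in range(max_len):
        probs = model.next_token_probs(y, x)
        token = max(probs, key=probs.get)
        y.append(token)
        if token == eos:
            break
    return y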

  10. Recurrent Neural Networks A recurrent neural network (RNN) is a neural network model proposed in the 1980s for modelling time series. The structure of the network is similar to that of a feedforward neural network, with the distinction that it allows a recurrent hidden state whose activation at each time step depends on that of the previous time step (a cycle).
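
A minimal NumPy sketch of such a recurrence (an Elman-style cell; the weight names and the tanh nonlinearity are illustrative assumptions):

import numpy as np

def elman_step(x_t, h_prev, W_xh, W_hh, b_h):
    # The hidden state at time t depends on the current input x_t and on the
    # hidden state of the previous time step h_{t-1}: this is the recurrent cycle.
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

def run_rnn(xs, h0, W_xh, W_hh, b_h):
    # Unroll the recurrence over a whole input sequence.
    h, states = h0, []
    for x_t in xs:
        h = elman_step(x_t, h, W_xh, W_hh, b_h)
        states.append(h)
    return states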

  11. Sequence to Sequence Learning The sequence-to-sequence learning problem can be modeled by RNNs using an encoder-decoder paradigm. The encoder is an RNN that reads one token at a time from the input source and returns a fixed-size vector representing the input text. The decoder is another RNN that generates the words of the summary, conditioned on the vector representation returned by the first network.

  12. Abstractive Summarization and Sequence to Sequence P(y | x; θ) = ∏_{t=1}^{|y|} P(y_t | {y_1, ..., y_{t−1}}, x; θ), with P(y_t | {y_1, ..., y_{t−1}}, x; θ) = g_θ(h_t, c) and h_t = g_θ(y_{t−1}, h_{t−1}, c). The context vector c is the output of the encoder and encodes the representation of the whole input source. g_θ is an RNN and can be modeled using an Elman RNN, an LSTM (Long Short-Term Memory) or a GRU (Gated Recurrent Unit). At time t the decoder RNN computes the probability of the word y_t given the last hidden state h_t and the input context c.
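
A compact PyTorch sketch of this scheme, under simplifying assumptions (GRU encoder and decoder cell, the context c taken as the last encoder hidden state, teacher forcing, and no attention); all class and variable names are illustrative:

import torch
import torch.nn as nn

class Seq2SeqSummarizer(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRUCell(emb_dim + hid_dim, hid_dim)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, x, y_in):
        # y_in is the target summary shifted right (it starts with a <s> token),
        # so the token fed at step t is y_{t-1}.
        _, h_enc = self.encoder(self.embed(x))   # h_enc: (1, batch, hid_dim)
        c = h_enc.squeeze(0)                     # context vector c for the whole source
        h_t = c                                  # initialize the decoder state with c
        logits = []
        for t in range(y_in.size(1)):
            step_in = torch.cat([self.embed(y_in[:, t]), c], dim=-1)
            h_t = self.decoder(step_in, h_t)     # h_t = g(y_{t-1}, h_{t-1}, c)
            logits.append(self.out(h_t))         # scores for P(y_t | y_<t, x)
        return torch.stack(logits, dim=1)        # softmax over the last dim gives the probabilities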

  13. Limits of the State-of-the-Art Neural Models The proposed neural attention-based models for abstractive summarization are still at an early stage, thus they show some limitations: problems in distinguishing rare and unknown words; grammar errors in the generated summaries. – Example – Suppose that neither of the two tokens 10 and Genoa belongs to the vocabulary; then the model cannot distinguish the probabilities of the two sentences: The airport is about 10 kilometers. The airport is about Genoa kilometers.

  14. Infuse Prior Knowledge into Neural Networks Our idea: infuse prior knowledge, such as linguistic features, into RNNs in order to overcome these limits. Motivation: The airport is about ? kilometers / DT NN VBZ IN CD NNS, where CD is the part-of-speech (POS) tag that identifies a cardinal number. Thus, 10 is the unknown token with the highest probability, because it is tagged as CD. By introducing information about the syntactic role of each word, the neural network can learn the right collocation of words belonging to a certain part-of-speech class.

  15. Infuse Prior Knowledge into Neural Networks Preliminary approach: combine hand-crafted linguistic features and word embeddings as input vectors to the RNNs; substitute the softmax layer of the neural network with a log-linear model.
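
A rough PyTorch sketch of the first point, assuming POS tags as the linguistic feature, encoded one-hot and concatenated with the word embedding before the recurrent layer (the log-linear output layer is not shown; names are illustrative):

import torch
import torch.nn as nn

class WordPlusPOSEncoder(nn.Module):
    def __init__(self, vocab_size, num_pos_tags, emb_dim=128, hid_dim=256):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, emb_dim)
        self.num_pos_tags = num_pos_tags
        # The RNN input is the word embedding concatenated with the POS feature vector.
        self.rnn = nn.GRU(emb_dim + num_pos_tags, hid_dim, batch_first=True)

    def forward(self, word_ids, pos_ids):
        words = self.word_embed(word_ids)                                # (batch, seq, emb_dim)
        pos = nn.functional.one_hot(pos_ids, self.num_pos_tags).float()  # (batch, seq, num_pos_tags)
        return self.rnn(torch.cat([words, pos], dim=-1))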

  16. Evaluation Plan - Datasets We plan to evaluate our models on gold-standard datasets for the summarization task: DUC (Document Understanding Conference) 2002-2007 (http://duc.nist.gov/), TAC (Text Analysis Conference) 2008-2011 (http://tac.nist.gov/data/index.html), Gigaword (https://catalog.ldc.upenn.edu/LDC2012T21), CNN/DailyMail (https://github.com/deepmind/rc-data), Cornell University Library (https://arxiv.org/), and local government documents made available by InnovaPuglia S.p.A.

  17. Evaluation Plan - Metric ROUGE (Recall-Oriented Understudy for Gisting Evaluation) ROUGE metrics [Lin, Chin-Yew. ROUGE: a Package for Automatic Evaluation of Summaries. WAS 2004] compare an automatically produced summary against a reference or a set of reference (human-produced) summaries. ROUGE-N: N-gram based co-occurrence statistics. ROUGE-L: Longest Common Subsequence (LCS) based statistics. ROUGE-N(X) = (Σ_{S ∈ {Ref Summaries}} Σ_{gram_n ∈ S} count_match(gram_n, X)) / (Σ_{S ∈ {Ref Summaries}} Σ_{gram_n ∈ S} count(gram_n))
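
A small illustrative implementation of the ROUGE-N recall formula above (a simplified sketch; the official ROUGE toolkit additionally provides options such as stemming and stopword removal):

from collections import Counter

def rouge_n(candidate, references, n=1):
    # ROUGE-N = (clipped n-gram matches between candidate X and the references)
    #           / (total number of n-grams in the references)
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand_counts = ngrams(candidate)
    matched, total = 0, 0
    for ref in references:
        ref_counts = ngrams(ref)
        matched += sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0

# Illustrative call, using the abstractive example of slide 6 as the reference:
print(rouge_n("russia urges joint front against terrorism".split(),
              ["russia calls for joint front against terrorism".split()], n=1))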

  18. Future Work Evaluate the proposed approach by comparing it with the state-of-the-art models. Integrate relational semantic knowledge into RNNs in order to jointly learn word and knowledge embeddings by exploiting knowledge bases and lexical thesauri. Generate abstractive summaries from whole documents or multiple documents.
