A Neural Attention Model for Abstractive Sentence Summarization

Alexander Rush, Sumit Chopra, Jason Weston
Facebook AI Research / Harvard SEAS
Sentence Summarization

Source: Russian Defense Minister Ivanov called Sunday for the creation of a joint front for combating global terrorism.

Target: Russia calls for joint front against terrorism.

Summarization phenomena:
- Generalization
- Deletion
- Paraphrase
Types of Sentence Summary [Not Standardized]

Compressive (deletion only): "Russian Defense Minister Ivanov called Sunday for the creation of a joint front for combating global terrorism."

Extractive (deletion and reordering)

Abstractive (arbitrary transformation): "Russia calls for joint front against terrorism."
Elements of Human Summary [Jing 2002]

Phenomenon                            Abstract   Compress   Extract
(1) Sentence Reduction                   ✓          ✓          ✓
(2) Sentence Combination                 ✓          ✓          ✓
(3) Syntactic Transformation             ✓          ✓
(4) Lexical Paraphrasing                 ✓
(5) Generalization or Specification      ✓
(6) Reordering                           ✓                     ✓
Related Work: Extractive/Abstractive Sentence Summary

- Syntax-based [Dorr, Zajic, and Schwartz 2003; Cohn and Lapata 2008; Woodsend, Feng, and Lapata 2010]
- Topic-based [Zajic, Dorr, and Schwartz 2004]
- Machine translation-based [Banko, Mittal, and Witbrock 2000]
- Semantics-based [Liu et al. 2015]
Related Work: Attention-Based Neural MT [Bahdanau, Cho, and Bengio 2014]

- Uses attention ("soft alignment") over the source to determine the next word.
- Robust to longer sentences compared to encoder-decoder style models.
- No explicit alignment step; trained end-to-end.
A Neural Attention Model for Summarization

Question: Can a data-driven model capture the abstractive phenomena necessary for summarization without explicit representations?

Properties:
- Uses a simple attention-based neural conditional language model.
- No syntax or other pipelining steps; strictly data-driven.
- Generation is fully abstractive.
Attention-Based Summarization (ABS)
Model
Summarization Model

Notation:
- x: source sentence of length M
- y: summarized sentence of length N (we assume N is given), with M >> N

Past work: noisy-channel summarization [Knight and Marcu 2002]

\arg\max_y \log p(y \mid x) = \arg\max_y \log p(y)\, p(x \mid y)

Neural machine translation: direct neural-network parameterization

p(y_{i+1} \mid y_c, x; \theta) \propto \exp(\mathrm{NN}(x, y_c; \theta))

where y_{i+1} is the current word and y_c is the context. Most neural MT is non-Markovian, i.e., y_c is the full history (RNN, LSTM) [Kalchbrenner and Blunsom 2013; Sutskever, Vinyals, and Le 2014; Bahdanau, Cho, and Bengio 2014].
Feed-Forward Neural Language Model [Bengio et al. 2003]

\tilde{y}_c = [E y_{i-C+1}, \ldots, E y_i]
h = \tanh(U \tilde{y}_c)
p(y_{i+1} \mid y_c, x; \theta) \propto \exp(V h)

Conditioning on the source through an encoder module src adds one term:

p(y_{i+1} \mid y_c, x; \theta) \propto \exp(V h + W\, \mathrm{src}(x, y_c))
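As a concrete illustration, here is a minimal numpy sketch of one decoding step of this model. It is not the authors' code; the shapes and the generic src encoder argument are assumptions read off the equations above.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())      # shift for numerical stability
        return e / e.sum()

    def nnlm_step(y_ctx, x, E, U, V, W, src):
        """One step: p(y_{i+1} | y_c, x; theta).
        y_ctx: C context word ids; E: target embeddings (vocab x d);
        U, V, W: parameters; src: encoder function src(x, y_ctx)."""
        y_tilde = np.concatenate([E[w] for w in y_ctx])  # [E y_{i-C+1}, ..., E y_i]
        h = np.tanh(U @ y_tilde)                         # hidden layer
        return softmax(V @ h + W @ src(x, y_ctx))        # distribution over the vocab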
Source Model 1: Bag-of-Words

\tilde{x} = [F x_1, \ldots, F x_M]
p = [1/M, \ldots, 1/M]   [uniform distribution]
\mathrm{src}_1(x, y_c) = p^\top \tilde{x}
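A sketch of src_1 under the same assumed shapes (F is the source-side embedding matrix):

    import numpy as np

    def src_bow(x, y_ctx, F):
        """Bag-of-words encoder: uniform average of source word embeddings;
        the context y_ctx is ignored."""
        x_tilde = np.stack([F[w] for w in x])    # M x d
        p = np.full(len(x), 1.0 / len(x))        # uniform attention p = [1/M, ..., 1/M]
        return p @ x_tilde                       # p^T x_tilde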
Source Model 2: Convolutional Model
Source Model 3: Attention-Based Model

\tilde{x} = [F x_1, \ldots, F x_M]
\tilde{y}'_c = [G y_{i-C+1}, \ldots, G y_i]
p \propto \exp(\tilde{x} P \tilde{y}'_c)   [attention distribution]
\bar{x}_i = \sum_{q=i-(Q-1)/2}^{i+(Q-1)/2} \tilde{x}_q / Q \quad \forall i   [local smoothing]
\mathrm{src}_3(x, y_c) = p^\top \bar{x}
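A sketch of src_3, again with assumed shapes; the boundary handling of the smoothing window and the default Q are implementation choices not specified on the slide.

    import numpy as np

    def src_attention(x, y_ctx, F, G, P, Q=5):
        """Attention encoder: context-dependent weighted average of locally
        smoothed source embeddings. Q is an (assumed odd) smoothing width."""
        x_tilde = np.stack([F[w] for w in x])            # M x d
        y_prime = np.concatenate([G[w] for w in y_ctx])  # C * d'
        scores = x_tilde @ P @ y_prime                   # x_tilde P y'_c, one score per position
        p = np.exp(scores - scores.max())
        p /= p.sum()                                     # attention distribution
        half = (Q - 1) // 2                              # local smoothing window
        x_bar = np.stack([x_tilde[max(0, i - half): i + half + 1].mean(axis=0)
                          for i in range(len(x))])
        return p @ x_bar                                 # p^T x_bar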
ABS Example

At each step, the model attends over the source x and conditions on the context window y_c to generate the next word y_{i+1}:

y_c = [<s> Russia calls]                   →  for
y_c = [<s> Russia calls for]               →  joint
y_c = [<s> Russia calls for joint]         →  front
y_c = [Russia calls for joint front]       →  against
y_c = [calls for joint front against]      →  terrorism
y_c = [for joint front against terrorism]  →  .
Headline Generation Training Set [Graff et al. 2003; Napoles, Gormley, and Van Durme 2012]

Uses the Gigaword dataset:

Total Sentences                       3.8 M
Newswire Services                     7
Source Word Tokens                    119 M
Source Word Types                     110 K
Average Source Length                 31.3 tokens
Summary Word Tokens                   31 M
Summary Word Types                    69 K
Average Summary Length                8.3 tokens
Average Overlap                       4.6 tokens
Average Overlap in first 75 chars     2.6 tokens

For comparison, [Filippova and Altun 2013] use 250K compressive pairs (although Filippova et al. 2015 use 2 million).

Training is done with mini-batch stochastic gradient descent.
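Training minimizes the negative log-likelihood of the gold summary words. A schematic of the mini-batch objective, reusing a step function like nnlm_step above; the start-symbol padding is an assumption.

    import numpy as np

    def batch_nll(batch, step_fn, C, start_id=0):
        """Mini-batch NLL: -sum_i log p(y_{i+1} | y_c, x) over gold pairs.
        batch: list of (source ids, summary ids);
        step_fn(y_ctx, x) -> probability vector over the vocabulary."""
        loss = 0.0
        for x, y in batch:
            padded = [start_id] * C + list(y)       # pad the context with start symbols
            for i in range(len(y)):
                p = step_fn(padded[i:i + C], x)     # context window of size C
                loss -= np.log(p[y[i]])             # gold next word
        return loss
    # theta is then updated with mini-batch stochastic gradient descent on this loss.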
Generation: Beam Search

Example beam hypotheses:
  russia calls for joint
  defense minister calls joint
  joint front calls terrorism
  russia calls for terrorism
  ...

The Markov assumption (fixed context window) allows for hypothesis recombination.
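A minimal beam-search sketch; the names and pruning details are illustrative, but the recombination step relies on exactly the Markov property mentioned above (only the last C words of a hypothesis affect its future scores).

    import numpy as np

    def beam_search(x, step_fn, N, K=5, C=3, start_id=0):
        """step_fn(y_ctx, x) -> probability vector over the vocabulary.
        Returns the best length-N summary under the beam approximation."""
        beam = [([start_id] * C, 0.0)]                   # (padded hypothesis, log-prob)
        for _ in range(N):
            cands = []
            for hyp, score in beam:
                p = step_fn(hyp[-C:], x)                 # Markov: last C words suffice
                for w in np.argsort(p)[-K:]:             # top-K extensions
                    cands.append((hyp + [int(w)], score + float(np.log(p[w]))))
            best = {}
            for hyp, score in cands:                     # recombination: hypotheses with the
                key = tuple(hyp[-C:])                    # same last C words share all future
                if key not in best or score > best[key][1]:
                    best[key] = (hyp, score)             # scores, so keep only the best one
            beam = sorted(best.values(), key=lambda t: t[1])[-K:]
        return max(beam, key=lambda t: t[1])[0][C:]      # strip the start padding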
Extension: Extractive Tuning

Low-dimensional word embeddings are unaware of exact matches.

Log-linear parameterization:

p(y \mid x; \theta, \alpha) \propto \exp\left( \alpha^\top \sum_{i=0}^{N-1} f(y_{i+1}, x, y_c) \right)

Features f:
1. Model score (neural model)
2. Unigram overlap
3. Bigram overlap
4. Trigram overlap
5. Word out-of-order

Similar to the rare-word issue in neural MT [Luong et al. 2015]. α is estimated with MERT as a post-processing step (not end-to-end).
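A sketch of these features on word-id sequences; the exact definitions (in particular the out-of-order feature) are assumptions, and the weight vector alpha over them would be fit with MERT.

    import numpy as np

    def ngram_overlap(y, x, n):
        """Number of n-grams of the candidate y that also occur in the source x."""
        src = {tuple(x[i:i + n]) for i in range(len(x) - n + 1)}
        return sum(tuple(y[i:i + n]) in src for i in range(len(y) - n + 1))

    def out_of_order(y, x):
        """Rough proxy: adjacent candidate words that appear in x in reversed order."""
        pos = {w: i for i, w in enumerate(x)}
        return sum(1 for a, b in zip(y, y[1:])
                   if a in pos and b in pos and pos[a] > pos[b])

    def features(y, x, model_logprob):
        return np.array([model_logprob,            # 1. neural model score
                         ngram_overlap(y, x, 1),   # 2. unigram overlap
                         ngram_overlap(y, x, 2),   # 3. bigram overlap
                         ngram_overlap(y, x, 3),   # 4. trigram overlap
                         out_of_order(y, x)])      # 5. word out-of-order
    # A candidate is then scored as alpha @ features(y, x, model_logprob).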
Results