Simple and Effective Retrieve-Edit-Rerank Text Generation

Nabil Hossain (University of Rochester, nhossain@cs.rochester.edu)
Marjan Ghazvininejad (Facebook AI Research)
Luke Zettlemoyer (Facebook AI Research)

Overview
• Retrieve-and-edit
  • Generate text using retrieved examples from training set
  • Uses: Summarization, Machine Translation, Conversation Generation
• We apply post-generation ranking
  • Retrieve N examples, generate a candidate output with each
  • Rank these candidates
• Post-ranking improves results on:
  • 2 Machine Translation tasks
  • Gigaword Summarization task

Retrieve (Gigaword)
Pipeline: Training Set (x', y') -> Retrieve -> {y'_1, y'_2, y'_3} -> Augmented Input
• Task: 1st sentence of news article (x) -> title (y)
• Retrieval: given x, find closest x', then obtain its title y'
• LUCENE (TF-IDF based)
• Augmented Input (for test data x): x [SEP] y'_1, x [SEP] y'_2, x [SEP] y'_3
• Examples:
  Example 1
    Article (x): factory orders for manufactured goods rose #.# percent in september , the commerce department said here thursday .
    Best retrieved (y'): u.s. factory orders rises #.# percent in october
    Title (y): us september factory orders up #.# percent
  Example 2
    Article (x): france , still high after their convincing ##-## win over new zealand have named the same team for the second test next saturday in paris .
    Best retrieved (y'): france poised to make history in #nd test
    Title (y): french keep same team for #nd test
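The retrieval step above can be sketched in a few lines. This is a minimal illustration, not the authors' setup: it swaps Lucene for a hand-rolled TF-IDF cosine similarity, tokenizes on whitespace, and uses a tiny in-memory training set whose articles x' are invented for the example. The function names (`build_idf`, `tfidf`, `cosine`, `retrieve`, `augment`) are hypothetical.

```python
import math
from collections import Counter

SEP = "[SEP]"

def build_idf(docs):
    """Smoothed inverse document frequency over whitespace-tokenized docs."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc.split()))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(text, idf):
    """Sparse TF-IDF vector; tokens never seen in the training set are ignored."""
    tf = Counter(text.split())
    return {tok: cnt * idf[tok] for tok, cnt in tf.items() if tok in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(x, train_pairs, n=3):
    """Return the titles y' of the n training articles x' closest to x."""
    idf = build_idf([xp for xp, _ in train_pairs])
    query = tfidf(x, idf)
    scored = [(cosine(query, tfidf(xp, idf)), yp) for xp, yp in train_pairs]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [yp for _, yp in scored[:n]]

def augment(x, retrieved):
    """Build one augmented input x [SEP] y'_i per retrieved title."""
    return [f"{x} {SEP} {yp}" for yp in retrieved]

# Assumption: the articles x' below are made up; only two titles come from the slides.
train_pairs = [
    ("factory orders rose in october , the commerce department said",
     "u.s. factory orders rises #.# percent in october"),
    ("france named the same team for the test in paris",
     "france poised to make history in #nd test"),
    ("oil prices fell sharply on monday", "oil prices fall"),
]
x = ("factory orders for manufactured goods rose #.# percent in september , "
     "the commerce department said here thursday .")
retrieved = retrieve(x, train_pairs, n=2)
augmented = augment(x, retrieved)
```

Each string in `augmented` is one input to the generation step, so N retrieved titles yield N candidate outputs.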

Edit (Generate)
Pipeline: Training Set (x', y') -> Module 1: Retrieve -> {y'_1, y'_2, y'_3} -> Augmented Input -> Module 2: Generate -> Candidate Outputs
• Augmented Input (for test data x): x [SEP] y'_1, x [SEP] y'_2, x [SEP] y'_3
• Candidate Outputs: ŷ_1, ŷ_2, ŷ_3
• For each augmented input x [SEP] y'_i, generate ŷ_i

Edit (Generate): example
• Article (x): factory orders for manufactured goods rose #.# percent in september , the commerce department said here thursday .
• Best retrieved (y'): u.s. factory orders rises #.# percent in october
• Title (y): us september factory orders up #.# percent
• Generated ŷ_1 from x [SEP] y'_1: factory orders rises #.# percent in september
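The generation loop itself is simple: one decoder call per augmented input. The sketch below assumes a generic `generate_fn` callable standing in for the trained model; `toy_generate` is a hypothetical placeholder that merely echoes the retrieved title, so it shows the control flow only and not any real editing behavior.

```python
from typing import Callable, List

SEP = "[SEP]"

def generate_candidates(x: str,
                        retrieved: List[str],
                        generate_fn: Callable[[str], str]) -> List[str]:
    """Run the generator once per augmented input x [SEP] y'_i."""
    return [generate_fn(f"{x} {SEP} {y_prime}") for y_prime in retrieved]

def toy_generate(augmented: str) -> str:
    """Placeholder for the trained model: returns the retrieved title unchanged."""
    _, y_prime = augmented.split(f" {SEP} ", 1)
    return y_prime

candidates = generate_candidates(
    "factory orders rose", ["title a", "title b", "title c"], toy_generate
)
```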

Post-gen Rerank
Pipeline: Training Set (x', y') -> Module 1: Retrieve -> {y'_1, y'_2, y'_3} -> Augmented Input -> Module 2: Generate -> Candidate Outputs -> Module 3: Post-Gen Rerank -> Ranked Outputs
• Augmented Input (for test data x): x [SEP] y'_1, x [SEP] y'_2, x [SEP] y'_3
• Candidate Outputs: ŷ_1, ŷ_2, ŷ_3
• Ranked Outputs: the candidates ŷ_1, ŷ_2, ŷ_3 reordered by rank
• Given:
• Estimate:

Post-gen Rerank: example
• Article (x): factory orders for manufactured goods rose #.# percent in september , the commerce department said here thursday .
• Best retrieved (y'): u.s. factory orders rises #.# percent in october
• Title (y): us september factory orders up #.# percent
• Candidate outputs (labeled ŷ_2, ŷ_1, ŷ_3 on the slide):
  • factory orders rises #.# percent in september
  • us september factory orders rose #.# percent
  • factory orders for good rose #.# percent in september

Model
• BPE
• Transformer base
• Segment Embeddings
• A [RANK] token similar to [CLS] token in BERT
  • to estimate salience of the retrieved y'
• Generate with beam = 5
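One way to picture the model input is as a token sequence with parallel segment ids. The sketch below is an assumption-laden illustration: it prepends [RANK] the way BERT prepends [CLS], uses segment 0 for the source side and segment 1 for the retrieved side, and replaces BPE with whitespace splitting. The slides do not specify these details, and `build_input` is a hypothetical helper.

```python
from typing import List, Tuple

RANK, SEP = "[RANK]", "[SEP]"

def build_input(x: str, y_prime: str) -> Tuple[List[str], List[int]]:
    """Return (tokens, segment_ids) for the augmented input x [SEP] y'."""
    x_toks = x.split()        # assumption: whitespace split stands in for BPE
    y_toks = y_prime.split()
    tokens = [RANK] + x_toks + [SEP] + y_toks
    # Assumption: [RANK], x and [SEP] share segment 0; y' gets segment 1.
    segments = [0] * (len(x_toks) + 2) + [1] * len(y_toks)
    return tokens, segments

tokens, segments = build_input("factory orders rose", "u.s. factory orders rises")
```

The segment ids let the model tell the source sentence apart from the retrieved title, while the hidden state at [RANK] gives a single vector from which a salience score could be read.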

Machine Translation
• Data: EN-NL (Dutch) and EN-HU (Hungarian), from EU meetings
• Current SOTA is NFR: Retrieval-based LSTM model
  • Uses SetSimilaritySearch for retrieval (retrieves top 3)
• Our ranker: select the highest-scored output from the trained MT model
• Post-generation ranking amounts to an extended beam search
[Chart: BLEU]

Bulté, Bram, and Arda Tezcan. "Neural Fuzzy Repair: Integrating Fuzzy Matches into Neural Machine Translation." In ACL 2019.
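The MT ranker described above reduces to an argmax over model scores. A minimal sketch, assuming each candidate arrives paired with a scalar score from the trained model (for instance a log-probability); the candidate strings and scores below are invented for illustration, and `select_best` is a hypothetical name.

```python
from typing import List, Tuple

def select_best(scored: List[Tuple[str, float]]) -> str:
    """Return the candidate with the highest model score (first one wins ties)."""
    if not scored:
        raise ValueError("no candidates to rank")
    return max(scored, key=lambda pair: pair[1])[0]

# Assumption: made-up candidates and scores, one per retrieved example.
scored = [
    ("de kat zit op de mat", -4.2),
    ("de kat zat op de mat", -3.1),
    ("een kat op mat", -5.0),
]
best = select_best(scored)
```

Because every candidate is itself the top result of a beam search run on a different augmented input, taking the best score across them behaves like one search over a wider set of hypotheses.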

Gigaword Summarization
• Metric: Rouge F-scores
• Re3Sum model: LSTM, retrieve-and-edit, pre-ranking
  • uses 30 retrieved examples
• Our ranker: select the most frequent of the 30 candidate outputs

Method                         Rouge-1   Rouge-2   Rouge-LCS
LSTM                           35.01     16.55     32.42
Re3Sum                         37.04     19.03     34.46
Transformer (Tr)               37.68     18.79     34.87
Tr + Lucene + x [SEP] y'_1     37.51     19.15     34.86
Tr + Lucene + pre-rank         36.46     18.01     33.85
Tr + Lucene + post-rank        38.23     19.58     35.60
BiSET                          39.11     19.78     36.87

Cao, Ziqiang, et al. "Retrieve, rerank and rewrite: Soft template based neural summarization." In ACL 2018.
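The frequency-based ranker can be sketched as a majority vote over the candidate strings. The tie-breaking rule (earliest-seen candidate wins) is an assumption of this sketch, not something the slides specify, and `most_frequent` is a hypothetical name.

```python
from collections import Counter
from typing import List

def most_frequent(candidates: List[str]) -> str:
    """Return the candidate string generated most often across retrieved examples."""
    if not candidates:
        raise ValueError("no candidates to rank")
    counts = Counter(candidates)
    # max() keeps the first maximal item, and Counter preserves insertion order,
    # so ties go to the candidate that appeared first.
    return max(counts.items(), key=lambda kv: kv[1])[0]

candidates = [
    "factory orders rises #.# percent in september",
    "us september factory orders rose #.# percent",
    "factory orders rises #.# percent in september",
]
winner = most_frequent(candidates)
```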

Gigaword oracle experiments
• Room for improvement with better post-ranking
  • use x, x', y' in post-ranking ŷ
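An oracle ranker gives an upper bound on what post-ranking could achieve. The sketch below assumes the oracle picks the candidate that scores best against the gold title, and it uses a plain unigram F1 as a simplified stand-in for Rouge-1 (no stemming or other preprocessing). Both choices are assumptions made for illustration; the candidate strings are the ones shown in the rerank example.

```python
from collections import Counter
from typing import List

def unigram_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between whitespace-tokenized strings."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def oracle_select(candidates: List[str], reference: str) -> str:
    """Pick the candidate closest to the reference; needs the gold title."""
    return max(candidates, key=lambda cand: unigram_f1(cand, reference))

reference = "us september factory orders up #.# percent"
candidates = [
    "factory orders rises #.# percent in september",
    "us september factory orders rose #.# percent",
    "factory orders for good rose #.# percent in september",
]
best = oracle_select(candidates, reference)
```

Since the oracle needs the reference, it is only usable for analysis: the gap between its score and the deployed ranker's score measures how much a better post-ranker could gain.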

Summary
• We extended the retrieve-and-edit framework with post-generation ranking:
  1. Retrieve N training set outputs y' for input x
  2. Edit each input x [SEP] y' to produce N candidate outputs ŷ
  3. Re-rank ŷ to select the best ranked output
• Simple post-ranking improved results on MT and summarization
• Interesting to explore better post-ranking using x, x', y', ŷ

Questions: nhossain@cs.rochester.edu
