BERTScore: Evaluating Text Generation with BERT Varsha Kishore - PowerPoint PPT Presentation

BERTScore: Evaluating Text Generation with BERT
Varsha Kishore, Tianyi Zhang, Felix Wu, Kilian Q. Weinberger, Yoav Artzi

Translate: "ich liebe es"
Candidates: "I like it", "I love it", "I am loving it"
Reference: "I love it"
Metric: 0.88/1.00

Text Generation Evaluation Metrics
N-gram matching approaches: BLEU (Papineni et al., 2002), METEOR (Banerjee & Lavie, 2005), ROUGE (Lin, 2004), chrF (Popović, 2015)
Embedding-based metrics: MEANT 2.0 (Lo, 2017), YiSi-1 (Lo et al., 2018), BERTScore

BLEU: N-gram Matching
Reference: The weather is cold today
Candidate 1: The weather is sunny today
Candidate 2: It is freezing today
BLEU cannot identify synonyms, so it gives a higher score to Candidate 1.
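The n-gram overlap behind this slide can be sketched in a few lines. This is only the clipped n-gram precision that BLEU builds on, not full BLEU (no brevity penalty, no averaging over n-gram orders); the function and variable names are illustrative, not from the slides.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams also found in the reference (counts clipped)."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


reference = "the weather is cold today".split()
candidate_sunny = "the weather is sunny today".split()
candidate_freezing = "it is freezing today".split()

for name, cand in [("sunny", candidate_sunny), ("freezing", candidate_freezing)]:
    p1 = clipped_precision(cand, reference, 1)
    p2 = clipped_precision(cand, reference, 2)
    print(f"{name}: unigram precision={p1:.2f}, bigram precision={p2:.2f}")
```

The "sunny" candidate shares more surface n-grams with the reference, so it scores higher even though "freezing" is the better paraphrase.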

BERTScore: an evaluation metric that uses BERT embeddings

BERT: a Transformer model pre-trained on masked language modeling and next sentence prediction. It generates token embeddings that reflect their context.

BERTScore
Reference: "the weather is cold today"
Candidate: "it is freezing today"
Both sentences are mapped to contextual embeddings, then compared token by token with pairwise cosine similarity.
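As a rough sketch of the pairwise cosine similarity step: the 3-dimensional vectors below are made-up stand-ins for BERT's contextual embeddings (which are much higher-dimensional and come from the model), and the helper names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_matrix(ref_vecs, cand_vecs):
    """Rows index reference tokens, columns index candidate tokens."""
    return [[cosine(r, c) for c in cand_vecs] for r in ref_vecs]


# Toy embeddings (assumed values, for illustration only).
toy_reference = {"cold": [0.9, 0.1, 0.0], "today": [0.0, 0.2, 0.9]}
toy_candidate = {"freezing": [0.8, 0.3, 0.1], "today": [0.1, 0.1, 0.95]}

sim = similarity_matrix(list(toy_reference.values()), list(toy_candidate.values()))
for ref_token, row in zip(toy_reference, sim):
    print(ref_token, [round(value, 3) for value in row])
```

With these toy vectors, "cold" is closest to "freezing" and "today" is closest to "today", which is the kind of soft match exact n-gram overlap cannot express.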

Greedy Matching
[Figure: tokens of the reference ("the weather is cold today") aligned with tokens of the candidate ("it is freezing today")]

Greedy Matching
Precision: match words in candidate to reference
Recall: match words in reference to candidate
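The two matching directions can be sketched from a similarity matrix. This assumes rows index reference tokens and columns index candidate tokens, and that each token simply takes its best match and the scores are averaged; the matrix values are invented for illustration.

```python
def greedy_precision_recall(sim):
    """sim[i][j] = similarity of reference token i and candidate token j."""
    n_ref, n_cand = len(sim), len(sim[0])
    # Precision: every candidate token picks its most similar reference token.
    precision = sum(max(sim[i][j] for i in range(n_ref)) for j in range(n_cand)) / n_cand
    # Recall: every reference token picks its most similar candidate token.
    recall = sum(max(row) for row in sim) / n_ref
    return precision, recall


# Hypothetical 3 reference tokens x 2 candidate tokens.
toy_sim = [
    [0.9, 0.2],
    [0.3, 0.8],
    [0.4, 0.5],
]
precision, recall = greedy_precision_recall(toy_sim)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

The matching is greedy in the sense that each token is matched independently, so one token may serve as the best match for several tokens on the other side.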

Greedy Matching - Aggregate
Precision (best match for each of the 4 candidate words): 0.713, 0.858, 0.796, 0.913; average 0.820
Recall (best match for each of the 5 reference words): 0.713, 0.515, 0.858, 0.796, 0.913; average 0.759
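A quick arithmetic check of the aggregate step. It assumes the four values without 0.515 belong to the four candidate words (precision) and all five values belong to the five reference words (recall); that grouping is inferred from the token counts and the two averages, not stated on the slide.

```python
# Best-match similarities read off the slide (grouping is an assumption).
precision_matches = [0.713, 0.858, 0.796, 0.913]         # one per candidate word
recall_matches = [0.713, 0.515, 0.858, 0.796, 0.913]     # one per reference word

precision = sum(precision_matches) / len(precision_matches)
recall = sum(recall_matches) / len(recall_matches)

print(f"precision={precision:.3f} recall={recall:.3f}")
```

Under that grouping the averages reproduce the two aggregate values on the slide, 0.820 and 0.759.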

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
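The F1 formula can be applied to the two aggregate values from the preceding slide (0.820 and 0.759). Because F1 is symmetric, the result does not depend on which of the two is precision; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.820, 0.759)
print(f"F1={f1:.3f}")
```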

Summary
Reference: "the weather is cold today"
Candidate: "it is freezing today"
Pipeline: contextual embedding, then pairwise cosine similarity, then F1 score

Evaluation: WMT Translation Benchmark
Compute the correlation between human scores and metric scores:
Reference: The weather is cold today. / Candidate: It is freezing today. Human: 0.85, Metric: 0.77
Reference: The garden is nice. / Candidate: The garden was pretty. Human: 0.77, Metric: 0.71
Reference: I like apples very much. / Candidate: I love apples. Human: 0.80, Metric: 0.79
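A minimal sketch of the "compute correlation" step, using the three human/metric pairs from this slide. Pearson correlation is an assumption here, since the slide does not say which correlation coefficient is used, and three points are far too few for a meaningful estimate.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


human_scores = [0.85, 0.77, 0.80]   # human judgments from the slide
metric_scores = [0.77, 0.71, 0.79]  # metric outputs from the slide

r = pearson(human_scores, metric_scores)
print(f"Pearson r = {r:.3f}")
```

A metric is considered better the more closely its scores track the human judgments across many such pairs.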

Correlation Study
[Bar chart: correlation (y-axis, 0 to 0.8) by language pair (Czech-English, German-English, English-Czech, English-German) for BLEU, ITER, YiSi-1, RUSE, and BERTScore F1]

4 tasks 8 languages 363 systems

Download here: https://pypi.org/project/bert-score/
Or just: pip install bert_score
Also on GitHub.
