BERTScore: Evaluating Text Generation with BERT Varsha Kishore - PowerPoint PPT Presentation

  1. BERTScore: Evaluating Text Generation with BERT. Varsha Kishore, Tianyi Zhang, Felix Wu, Kilian Q. Weinberger, Yoav Artzi

  2. Translating the German "ich liebe es" ("I love it") can produce many different outputs: "I like it", "I love it", "I am loving it", "I am like", "I like".

  3. A metric compares a candidate translation of "ich liebe es" (e.g. "I like it", "I am loving it", "I am like", "I like") with a reference ("I love it") and outputs a score, e.g. 0.88/1.00.

  4. Text Generation Evaluation Metrics
     N-gram matching metrics: BLEU (Papineni et al., 2002), METEOR (Banerjee & Lavie, 2005), ROUGE (Lin, 2004), chrF (Popović, 2015)
     Embedding-based approaches: MEANT 2.0 (Lo, 2017), YiSi-1 (Lo et al., 2018), BERTScore

  5. BLEU: N-gram Matching
     Reference: The weather is cold today
     Candidate 1: The weather is sunny today
     Candidate 2: It is freezing today
     BLEU cannot identify synonyms, so it gives a higher score to Candidate 1, even though Candidate 2 preserves the meaning.
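The effect can be checked with a minimal sketch of BLEU's modified (clipped) n-gram precision. This is a simplification, not full BLEU: it leaves out the geometric mean over n-gram orders and the brevity penalty, and it assumes lowercased whitespace tokenization.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate, reference, n):
    """BLEU-style modified n-gram precision: each candidate n-gram is
    credited at most as many times as it occurs in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

reference = "The weather is cold today"
candidates = {
    "Candidate 1": "The weather is sunny today",
    "Candidate 2": "It is freezing today",
}
for name, cand in candidates.items():
    p1 = clipped_precision(cand, reference, 1)
    p2 = clipped_precision(cand, reference, 2)
    print(f"{name}: unigram={p1:.2f} bigram={p2:.2f}")
```

Candidate 1 shares four unigrams and two bigrams with the reference, while "freezing" earns no credit against "cold", so exact matching favors the sentence with the wrong meaning.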

  6. BERTScore: an evaluation metric that uses BERT embeddings

  7. BERT: a Transformer model pre-trained on masked language modeling and next sentence prediction. It generates token embeddings that reflect each token's context.

  8. BERTScore: compute contextual embeddings for the reference ("the weather is cold today") and the candidate ("it is freezing today"), then compute the pairwise cosine similarity between every candidate token and every reference token.
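A minimal sketch of the pairwise cosine-similarity step. The vectors are hypothetical 3-dimensional stand-ins chosen for illustration; in BERTScore each token would instead get a contextual BERT embedding with hundreds of dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pairwise_cosine(cand_vecs, ref_vecs):
    """Matrix whose entry [i][j] is the cosine similarity between
    candidate token i and reference token j."""
    return [[cosine(c, r) for r in ref_vecs] for c in cand_vecs]

# Hypothetical toy "embeddings" (NOT real BERT output), for illustration only.
candidate = {"freezing": [0.9, 0.1, 0.0], "today": [0.0, 0.2, 0.9]}
reference = {
    "cold": [0.8, 0.2, 0.1],
    "weather": [0.3, 0.9, 0.1],
    "today": [0.0, 0.2, 0.9],
}

sim = pairwise_cosine(list(candidate.values()), list(reference.values()))
for token, row in zip(candidate, sim):
    print(token, [round(s, 3) for s in row])
```

In this toy setup "freezing" is closest to "cold", which is the kind of synonym match that exact n-gram overlap cannot give credit for.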

  9. Greedy Matching: each token is matched (≈) to its most similar token in the other sentence. Candidate: "it is freezing today"; Reference: "the weather is cold today".

  10. Greedy Matching. Precision: match each word in the candidate to its most similar word in the reference. Recall: match each word in the reference to its most similar word in the candidate.
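Greedy matching can be sketched on a precomputed similarity matrix: precision averages each candidate token's best match, and recall averages each reference token's best match. The matrix values are hypothetical, and the sketch uses plain unweighted averages.

```python
def greedy_precision_recall(sim):
    """sim[i][j] is the similarity of candidate token i and reference token j."""
    # Precision: each candidate token (row) takes its best reference match.
    precision = sum(max(row) for row in sim) / len(sim)
    # Recall: each reference token (column) takes its best candidate match.
    columns = list(zip(*sim))
    recall = sum(max(col) for col in columns) / len(columns)
    return precision, recall

# Hypothetical similarities: 2 candidate tokens (rows) x 3 reference tokens (columns).
sim = [
    [0.9, 0.4, 0.1],
    [0.2, 0.3, 1.0],
]
p, r = greedy_precision_recall(sim)
print(f"precision={p:.3f} recall={r:.3f}")
```

Matching is greedy in the sense that each token independently takes its maximum; several tokens may share the same best match.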

  14. Greedy Matching: maximum cosine similarity for each token.
     Precision (candidate tokens): 0.713, 0.858, 0.796, 0.913
     Recall (reference tokens): 0.713, 0.515, 0.858, 0.796, 0.913

  16. Greedy Matching, Aggregate: Precision = (0.713 + 0.858 + 0.796 + 0.913) / 4 = 0.820; Recall = (0.713 + 0.515 + 0.858 + 0.796 + 0.913) / 5 = 0.759
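The aggregation is a plain average over tokens. In this sketch, the split of the nine per-token scores into a 4-value candidate group and a 5-value reference group is inferred from the aggregates 0.820 and 0.759; which token each score belongs to is not given on the slides, so the order within each list is arbitrary.

```python
# Per-token maximum cosine similarities listed on the slides.
# Precision side: one value per candidate token ("it is freezing today").
precision_scores = [0.713, 0.858, 0.796, 0.913]
# Recall side: one value per reference token ("the weather is cold today").
recall_scores = [0.713, 0.515, 0.858, 0.796, 0.913]

precision = sum(precision_scores) / len(precision_scores)
recall = sum(recall_scores) / len(recall_scores)
print(f"Precision = {precision:.3f}, Recall = {recall:.3f}")
# prints: Precision = 0.820, Recall = 0.759
```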

  17. $F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
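F1 is the harmonic mean of precision and recall. As a worked check (not a figure stated on the slides), this sketch plugs in the example's aggregates, precision 0.820 (mean of the 4 candidate-token scores) and recall 0.759 (mean of the 5 reference-token scores).

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

score = f1(0.820, 0.759)
print(round(score, 3))  # prints 0.788
```

The harmonic mean sits closer to the smaller of the two values, so a candidate cannot score well by covering the reference poorly while matching its own tokens well, or vice versa.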

  18. Summary: Reference ("the weather is cold today") and Candidate ("it is freezing today") → contextual embedding → pairwise cosine similarity → F1 score

  19. Evaluation: WMT Translation Benchmark. Compute the correlation between human judgments and metric scores:
     Reference: The weather is cold today. / Candidate: It is freezing today. (Human 0.85, Metric 0.77)
     Reference: The garden is nice. / Candidate: The garden was pretty. (Human 0.77, Metric 0.71)
     Reference: I like apples very much. / Candidate: I love apples. (Human 0.80, Metric 0.79)
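A minimal sketch of the "compute correlation" step on the three illustrative pairs. The slide does not name the coefficient; Pearson correlation is assumed here, and three pairs are far too few for a meaningful estimate, so this only shows the mechanics.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Human and metric scores for the three example sentence pairs on the slide.
human = [0.85, 0.77, 0.80]
metric = [0.77, 0.71, 0.79]
r = pearson(human, metric)
print(f"Pearson r = {r:.2f}")  # prints: Pearson r = 0.61
```

A better metric is one whose scores rise and fall with the human judgments, i.e. one with a higher correlation over many segments or systems.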

  20. Correlation Study (bar chart, y-axis: correlation, 0 to 0.8): BLEU, ITER, YiSi-1, RUSE, and BERTScore F1 on four language pairs: Czech-English, German-English, English-Czech, English-German.

  21. 4 tasks, 8 languages, 363 systems

  22. Download here: https://pypi.org/project/bert-score/, or just run: pip install bert_score (source code on GitHub)
