Learning Unified Multi-Document Summarization From Collaborative Journalism
Master’s Thesis by Yasar Naci Gündüz
First Referee: Prof. Dr. Benno Stein
Second Referee: Prof. Dr. Andreas Jakoby

Introduction: New age, new habits

Introduction: How about journalism?
Several studies have reported:
● Reading attention spans are getting shorter
● The young generation is the least informed…
● ...and more interested in social media
● Information pollution: reliable sources are more important than ever

Introduction: Our proposal
Make the content:
● Less time-consuming
● Yet still adequately informative
Solution: Automatic Summarization

Introduction: Automatic Summarization for Journalism
“Journalism is the activity of gathering, assessing, creating, and presenting news and information.”
American Press Institute

Introduction: Automatic Summarization for Journalism
“Journalism is the activity of gathering, assessing, creating, and presenting news and information.”
● Whole
● Extensive
● Unbiased
Solution: Multi-document Summarization

Introduction: Automatic Summarization for Journalism
“Journalism is the activity of gathering, assessing, creating, and presenting news and information.”
● Extractive and Abstractive
● Neural Abstractive Summarization
  ○ Methods are generally for Single-Document
● Unified Model: Extractive + Abstractive
  ○ Content Selection
  ○ Multi-Document -> Single Document

● Dataset
● Unified Summarization Pipeline
● Experiments & Evaluation

Dataset

Dataset: What do we need?
● Neural abstractive summarization typically needs a dataset of thousands of documents
● e.g. CNN/Daily Mail: > 90k/197k documents (single-document dataset)

Dataset: What do we have?
● Multi-document datasets are typically small
● One of the best known contains no more than 60 clusters and 600 documents

Data Source   Cluster/Sample Summaries   Documents
DUC 2001      30                         309
DUC 2002      59                         567
DUC 2004      50                         500
Total         139                        1,376

Dataset: Solution
● We created the Webis-wikinews-corpus
● One of the first of its kind...
  ○ Large-scale
  ○ Multi-document
  ○ For the news domain

Dataset: Source
● Wikimedia Projects: Wikinews & Wikipedia
  ○ Unbiased
  ○ Open-source
  ○ Up-to-date
  ○ Clustered news from reliable sources

Dataset: Construction
Extract the useful information from the dump file:
● Article, source links, auxiliary information
● For Wikipedia, only the pages with news sources
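The extraction step above could look roughly like the following sketch. It is an illustration only, not the thesis's code: it assumes that an article's wikitext lists its sources with `{{source|url=...|title=...}}` templates, and the helper name `extract_source_urls` and the sample text are hypothetical.

```python
import re

# Assumption: sources appear in the wikitext as {{source|url=...|title=...|...}}.
SOURCE_TEMPLATE = re.compile(r"\{\{\s*source\s*\|(.*?)\}\}", re.IGNORECASE | re.DOTALL)


def extract_source_urls(wikitext):
    """Return the url= value of every source template found in the wikitext."""
    urls = []
    for match in SOURCE_TEMPLATE.finditer(wikitext):
        # Template fields are separated by "|" and written as key=value.
        for field in match.group(1).split("|"):
            key, _, value = field.partition("=")
            if key.strip().lower() == "url" and value.strip():
                urls.append(value.strip())
    return urls


# Hypothetical article text with two source templates.
sample = """Some article text.
{{source|url=http://example.org/a|title=First report|pub=Example News}}
{{source|url=http://example.org/b|title=Second report}}
"""
source_urls = extract_source_urls(sample)
print(source_urls)
```

The collected links would then feed the retrieval step, which downloads the source documents for each article.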

Dataset: Construction
Retrieval:

Dataset: Size & Folder Structure

Data Source   Cluster/Sample Summaries   Documents
Wikinews      9,514                      21,314
Wikipedia     2,174                      17,807
Total         11,688                     39,121

Unified Summarization Pipeline

Unified Summarization
● Extractive Summarization: Wikisummarizer
  ○ A Google Brain project [Liu et al., 2018]: extraction from a similar source (Wikipedia)
  ○ CST (Cross-document Structure Theory): filters out duplication [Radev and Zhang, 2004]
● Abstractive Summarization: Pointer-Generator Network [See et al., 2017]
  ○ Solves the problems of earlier approaches, such as repetitiveness, senseless sentences, and inaccurate facts
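The extractive stage, which condenses several source documents into one input for the abstractive model, might be sketched as below. This is a simplified illustration under stated assumptions: tf-idf scoring against a query and a Jaccard-overlap filter are stand-ins chosen for brevity, since the slides do not specify how Wikisummarizer ranks text or how the CST-based filter is implemented. All function names and sample passages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_rank(query, passages):
    """Order passages by the summed tf-idf weight of the query terms, best first."""
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    scores = []
    for doc in docs:
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for d in docs if term in d)
            if df:
                score += doc[term] * math.log((1 + n) / df)
        scores.append(score)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in order]


def jaccard(a, b):
    """Word-set overlap between two passages, between 0.0 and 1.0."""
    wa, wb = set(tokenize(a)), set(tokenize(b))
    return len(wa & wb) / max(1, len(wa | wb))


def drop_near_duplicates(passages, threshold=0.8):
    """Keep a passage only if it overlaps less than threshold with every kept one."""
    kept = []
    for p in passages:
        if all(jaccard(p, k) < threshold for k in kept):
            kept.append(p)
    return kept


# Hypothetical passages drawn from several source documents of one cluster.
passages = [
    "The storm hit the coast on Monday.",
    "The storm hit the coast on Monday!",
    "Stock markets closed higher today.",
    "Officials said the storm damaged hundreds of homes.",
]
selected = drop_near_duplicates(tfidf_rank("storm damage coast", passages))[:2]
print(selected)
```

The selected passages, concatenated, would play the role of the single document that the pointer-generator network then summarizes.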

Experiments & Evaluation

Experiments and Evaluation: Training Models
● Double-abstractive
● Extractive + Abstractive Full Target
● Extractive + Abstractive Short Target

Experiments and Evaluation: Training Models
Double-abstractive
● Trivial method
● To examine the unified model

Experiments and Evaluation: Training Models
Unified Models: Extractive + Abstractive
● ea-full-target - Target document size: full size
● ea-short-target - Target document size: 3 sentences
● To examine the effects of different ratios between input and target

Experiments and Evaluation: Aspects
“Journalism is the activity of gathering, assessing, creating, and presenting news and information.”
● Aspects:
  ○ Content
  ○ Readability

Experiments and Evaluation: Aspects
● Aspects:
  ○ Content
    ■ Automatic > a state-of-the-art method exists
  ○ Readability

Experiments and Evaluation: ROUGE
Computer-Generated Summary: the cat was found under the bed
Ground-Truth Summary: the cat was under the bed

Experiments and Evaluation: ROUGE
● ROUGE-N (e.g. ROUGE-1): overlapping n-grams > word-wise similarity
● ROUGE-L: Longest Common Subsequence > sequence-wise similarity
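The two measures can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenization, no stemming or stopword removal, and the balanced F-measure. It is not the official ROUGE toolkit, and the function names are made up for the example.

```python
from collections import Counter


def _scores(match, ref_len, cand_len):
    """Turn a match count into (recall, precision, f1)."""
    recall = match / ref_len if ref_len else 0.0
    precision = match / cand_len if cand_len else 0.0
    f1 = 2 * precision * recall / (precision + recall) if match else 0.0
    return recall, precision, f1


def rouge_n(candidate, reference, n=1):
    """ROUGE-N: clipped n-gram overlap between candidate and reference."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate.split()), ngrams(reference.split())
    overlap = sum((cand & ref).values())  # each n-gram counts at most min(cand, ref) times
    return _scores(overlap, sum(ref.values()), sum(cand.values()))


def rouge_l(candidate, reference):
    """ROUGE-L: length of the longest common subsequence of words."""
    c, r = candidate.split(), reference.split()
    table = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, cw in enumerate(c, 1):
        for j, rw in enumerate(r, 1):
            if cw == rw:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return _scores(table[len(c)][len(r)], len(r), len(c))


candidate = "the cat was found under the bed"
reference = "the cat was under the bed"
r1 = rouge_n(candidate, reference)
rl = rouge_l(candidate, reference)
print("ROUGE-1 (R, P, F):", r1)
print("ROUGE-L (R, P, F):", rl)
```

For this pair, all six reference words are recovered in order, so recall is 1.0 for both measures, while precision is 6/7 because the candidate contains one extra word ("found").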

Experiments and Evaluation: Results
● Aspects:
  ○ Content
    ■ Automatic > a state-of-the-art method exists

ROUGE     double-abstractive   ea-full-target   ea-short-target
ROUGE-1   0.23                 0.29             0.54
ROUGE-L   0.16                 0.21             0.49

Experiments and Evaluation: ROUGE for readability?

Computer-Generated Summary: was the found under the cat
Ground-Truth Summary: the cat was under the bed
1 ROUGE-1 Average_R: 0.83333
1 ROUGE-1 Average_P: 0.83333
1 ROUGE-1 Average_F: 0.83333
1 ROUGE-L Average_R: 0.50000
1 ROUGE-L Average_P: 0.50000
1 ROUGE-L Average_F: 0.50000

Computer-Generated Summary: he found no lights on
Ground-Truth Summary: all of the lamps were off already when he walked into the room
1 ROUGE-1 Average_R: 0.07692
1 ROUGE-1 Average_P: 0.20000
1 ROUGE-1 Average_F: 0.11111
1 ROUGE-L Average_R: 0.07692
1 ROUGE-L Average_P: 0.20000
1 ROUGE-L Average_F: 0.11111
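The scores for the pair "he found no lights on" / "all of the lamps were off already when he walked into the room" can be checked by hand, assuming plain whitespace tokenization and the balanced F-measure. Only the word "he" is shared, the generated summary has 5 words, and the ground truth has 13:

```latex
\[
R = \frac{1}{13} \approx 0.07692, \qquad
P = \frac{1}{5} = 0.20000, \qquad
F = \frac{2PR}{P + R}
  = \frac{2 \cdot \frac{1}{5} \cdot \frac{1}{13}}{\frac{1}{5} + \frac{1}{13}}
  = \frac{1}{9} \approx 0.11111
\]
```

The longest common subsequence is also the single word "he", so ROUGE-L yields the same three values. A readable paraphrase therefore scores far lower than the scrambled sentence.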

Experiments and Evaluation: Aspects
● Aspects:
  ○ Content
    ■ Automatic > a state-of-the-art method exists
  ○ Readability
    ■ ROUGE is not reliable for readability
    ■ Manual > there are few automatic methods; evaluation is mostly manual

Experiments and Evaluation: Readability Aspects by DUC
● Grammaticality
● Non-redundancy
● Referential clarity
● Focus
● Structure and coherence

Experiments and Evaluation: Survey
● Grammaticality
● Non-redundancy
● Referential clarity
● Focus
● Structure and coherence
First Survey
