Automatic Summarization Project - Deliverable 3
Anca Burducea, Joe Mulvey, Nate Perkins
May 19, 2015
Outline
◮ Deliverable 2 Summary
◮ System overview
◮ Sentence scoring
  ◮ Topic orientation
  ◮ Other methods
  ◮ Score combination
◮ Topic clustering
◮ Information ordering
◮ Results and conclusions
Deliverable 2 Summary
◮ MEAD-style approach
◮ TF*IDF sentence scoring + redundancy reduction
◮ ROUGE scores:

            R        P        F
ROUGE-1  0.25909  0.30675  0.27987
ROUGE-2  0.06453  0.07577  0.06942
ROUGE-3  0.01881  0.02138  0.01992
ROUGE-4  0.00724  0.00774  0.00745
D2 system
◮ score all sentences – CS (content selection)
◮ choose the highest-scored sentences – CS
◮ order sentences – IO (information ordering)
D3 system
◮ score all sentences – CS
◮ cluster sentences by their similarity – CS
◮ choose the highest-scored sentences from each cluster – CS
◮ order sentences using block ordering – IO
New features
◮ experimented with different methods for sentence scoring
◮ added an option for combining scores
◮ added topic clustering
◮ added information ordering
Sentence scoring - Topic orientation
◮ TAC topic as query (e.g. "Columbine Massacre")
◮ use a TF*IDF-like measure over sentences and the query:

$$\mathrm{idf}(w) = \log\frac{N + 1}{0.5 + \mathrm{sf}(w)}$$

$$\mathrm{rel}(s \mid q) = \sum_{w \in q} \log(\mathrm{tf}_{w,s} + 1) \cdot \log(\mathrm{tf}_{w,q} + 1) \cdot \mathrm{idf}(w)$$

where N is the total number of sentences and sf(w) is the number of sentences containing w.
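A minimal sketch of this relevance score in Python; the data structures (a `sent_freq` map from word to sentence frequency, pre-tokenized sentences) are illustrative assumptions, not the project's actual code:

```python
import math
from collections import Counter

def idf(word, sent_freq, n_sents):
    # idf(w) = log((N + 1) / (0.5 + sf(w))), where sf(w) is the number
    # of sentences containing w and N is the total number of sentences.
    return math.log((n_sents + 1) / (0.5 + sent_freq.get(word, 0)))

def relevance(sentence, query, sent_freq, n_sents):
    # rel(s|q) = sum over query words w of
    #   log(tf_{w,s} + 1) * log(tf_{w,q} + 1) * idf(w)
    tf_s, tf_q = Counter(sentence), Counter(query)
    return sum(math.log(tf_s[w] + 1) * math.log(tf_q[w] + 1)
               * idf(w, sent_freq, n_sents)
               for w in tf_q)

# Example: score a tokenized sentence against the TAC topic query.
sentences = [s.split() for s in [
    "the columbine shooting shocked the nation",
    "whales migrate south in winter",
]]
sent_freq = Counter(w for s in sentences for w in set(s))
score = relevance(sentences[0], "columbine massacre".split(),
                  sent_freq, len(sentences))
```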
Sentence scoring - Topic orientation
ROUGE scores:

            R        P        F
ROUGE-1  0.20103  0.21993  0.20954
ROUGE-2  0.04781  0.05200  0.04968
ROUGE-3  0.01533  0.01669  0.01593
ROUGE-4  0.00689  0.00751  0.00716
Sentence scoring - Other methods
We tried other sentence scoring methods:
◮ LLR (log-likelihood ratio)
◮ sentence position
◮ document headline similarity
◮ number of named entities (NERs)
... but on their own, all of these scored lower than our D2 results.
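The deck gives no formula for the position feature; one plausible form, purely as an illustration, is a linear decay over the document:

```python
def position_score(index, n_sents):
    # Illustrative assumption only: earlier sentences score higher,
    # with the first sentence at 1.0 and the last at 0.0.
    return 1.0 - index / max(n_sents - 1, 1)
```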
Sentence scoring - Score combination
◮ scale all scores to the [0, 1] range
◮ linearly combine different scoring methods using weights,
  e.g. 0.5 * TF*IDF-score + 0.5 * headline-similarity-score
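A sketch of the combination step; the deck does not say how the scaling was done, so min-max rescaling is assumed here:

```python
def minmax_scale(scores):
    # Rescale raw scores into [0, 1]; a constant list maps to all zeros.
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def combine_scores(score_lists, weights):
    # Weighted linear combination of several per-sentence score lists.
    scaled = [minmax_scale(s) for s in score_lists]
    return [sum(w * col[i] for w, col in zip(weights, scaled))
            for i in range(len(scaled[0]))]

# e.g. 0.5 * TF*IDF-score + 0.5 * headline-similarity-score:
tfidf = [2.1, 0.4, 1.3]
headline_sim = [0.9, 0.1, 0.5]
combined = combine_scores([tfidf, headline_sim], [0.5, 0.5])
```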
Topic clustering
◮ cluster sentences into at most 5 clusters using cosine similarity
◮ within each cluster, remove sentences that are too similar (similarity > 0.5)
◮ select the highest-ranked sentences across all topic clusters
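The deck does not name the clustering algorithm; a sketch assuming scikit-learn's agglomerative clustering over TF-IDF vectors (note: recent scikit-learn versions take `metric="cosine"`; older ones used `affinity`), with the slide's thresholds:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import cosine_similarity

def cluster_and_filter(sentences, n_clusters=5, threshold=0.5):
    # Vectorize the sentences and group them into at most 5 clusters
    # by cosine similarity.
    vectors = TfidfVectorizer().fit_transform(sentences).toarray()
    labels = AgglomerativeClustering(
        n_clusters=min(n_clusters, len(sentences)),
        metric="cosine", linkage="average").fit_predict(vectors)
    # Within each cluster, drop a sentence if its cosine similarity
    # to an already-kept sentence exceeds the 0.5 threshold.
    kept = {label: [] for label in set(labels)}
    for i, label in enumerate(labels):
        if all(cosine_similarity(vectors[i:i + 1],
                                 vectors[j:j + 1])[0, 0] <= threshold
               for j in kept[label]):
            kept[label].append(i)
    return kept  # cluster id -> indices of retained sentences
```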
Information ordering
◮ similar to Barzilay et al. (2002)
◮ sentences A and B belong to the same topic block if sim(A,B) > 0.6
◮ over all sentence pairs (A_i, B_j), with A_i from cluster(A) and B_j from cluster(B):

$$\mathrm{sim}(A, B) = \frac{\#AB^{+}}{\#AB}$$

◮ #AB – number of pairs (A_i, B_j) coming from the same document
◮ #AB+ – number of pairs (A_i, B_j) coming from the same document and the same topic segment
◮ tweak: "within the same topic segment" means within a sentence window (of 5)
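A sketch of the block-similarity computation, assuming each sentence is tracked as a (document id, position) pair; the representation is illustrative, not the project's actual code:

```python
def block_sim(cluster_a, cluster_b, window=5):
    # sim(A, B) = #AB+ / #AB.
    # #AB  counts pairs (A_i, B_j) drawn from the same document;
    # #AB+ counts those that also fall within the same topic segment,
    # approximated here as a 5-sentence window.
    # Each cluster is a list of (doc_id, sentence_position) pairs.
    same_doc = same_segment = 0
    for doc_a, pos_a in cluster_a:
        for doc_b, pos_b in cluster_b:
            if doc_a == doc_b:
                same_doc += 1
                if abs(pos_a - pos_b) <= window:
                    same_segment += 1
    return same_segment / same_doc if same_doc else 0.0

# Clusters A and B are merged into one topic block when
# block_sim(A, B) > 0.6.
```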
Final system
◮ sentence scoring: 0.7 * TF*IDF + 0.3 * sentence position
◮ topic clustering
◮ block ordering
Results
ROUGE scores:

            R        P        F
ROUGE-1  0.25467  0.28628  0.26853
ROUGE-2  0.06706  0.07494  0.07052
ROUGE-3  0.02043  0.02219  0.02119
ROUGE-4  0.00642  0.00673  0.00655
Results - Comparison
ROUGE R scores:

           LEAD      D2       D3
ROUGE-1  0.19143  0.25909  0.25467
ROUGE-2  0.04542  0.06453  0.06706
ROUGE-3  0.01196  0.01881  0.02043
ROUGE-4  0.00306  0.00724  0.00642
Summary example: D2
◮ Japan, where whale meat is part of the traditional cuisine, reluctantly accepted a 1986 moratorium on commercial whaling by the International Whaling Commission (IWC).
◮ "The humpback whale was almost hunted into extinction.
◮ We're very, very keen to see firstly, no reopening of commercial whaling, and very importantly, no scientific whaling in the future," he said.
◮ Opponents of the plan have claimed that Japan is seeking to double to 800 the number of minke whales it will slaughter each year, and to add 50 humpback whales and 50 fin whales.
Summary example: D3
◮ International Whaling Commission, or IWC, banned commercial whaling in 1986, but grants limited permits to countries such as Japan that maintain whaling programs for scientific purposes.
◮ Japan, where whale meat is part of culinary culture, reluctantly halted commercial whaling in line with a 1986 IWC moratorium, but the next year resumed catches under a loophole that allows "research whaling".
◮ An animal rights group on Friday lost a bid to sue a Japanese whaling company for allegedly killing hundreds of whales inside an Australian whale sanctuary.
◮ "Whaling is also part of the Japanese culture," he said.
Future improvements
◮ improve redundancy elimination inside topic clustering
◮ anaphora resolution
◮ remove temporal expressions