The LIA Summarization Systems at DUC 2007
florian.boudin@univ-avignon.fr
Laboratoire Informatique d’Avignon, France
Co-authors: Frédéric Béchet, Marc El-Bèze, Benoit Favre, Laurent Gillard and Juan-Manuel Torres-Moreno
April 26, 2007
Outline
• Main task
  – Using a fusion process?
  – Results
  – Discussion
• Update task
  – Cosine maximization-minimization approach
  – Novelty boosting
  – Results
  – Discussion
Main Task
How does it work
• Several different summarizers are used as sentence selection components
Using a fusion process?
• Successful in other domains
  – Classification
  – Speaker recognition
• Robustness
  – Small training dataset
• Reliability
  – Smooths out system performance variations
More summarizers
• 5 systems in 2006, 7 systems in 2007
  – (S1) MMR+LSA (2006 & 2007)
  – (S2) Neo-Cortex (2006 & 2007)
  – (S3) n-term with variable length insertion (2006 & 2007)
  – (S4) LNU*LTC (2007)
  – (S5) Okapi similarity (2007)
  – (S6) Prosit similarity (2007)
  – (S7) Compactness score (2006 & 2007)
  – (S8) Passage retrieval (2006)
Fusion strategy
• Combining each system’s output
  – Ranked sentence lists
• Building a sentence graph
  – Sentences weighted according to their ranks and scores
• Output summary
  – The best path in the graph
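The rank-and-score fusion above can be sketched as follows. This is a simplified stand-in for the actual graph/best-path implementation (built with the AT&T FSM toolkit): here each sentence simply accumulates a reciprocal-rank weight from every system that ranked it, and the combined ranking is read off the totals. The weighting scheme is an illustrative assumption, not the paper's exact formula.

```python
def fuse(ranked_lists):
    """Combine ranked sentence lists from several summarizers.

    Each sentence accumulates a reciprocal-rank weight (1/rank) from
    every system that ranked it; sentences are then re-ranked by their
    accumulated weight."""
    weights = {}
    for ranking in ranked_lists:
        for rank, sentence in enumerate(ranking, start=1):
            weights[sentence] = weights.get(sentence, 0.0) + 1.0 / rank
    return sorted(weights, key=weights.get, reverse=True)
```

A sentence ranked first by two systems (weight 2.0) beats one ranked second and third (weight 0.83), which is the smoothing effect fusion is meant to provide.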
Post-processing
• Person name rewriting
• Acronym rewriting
• Redundancy removal
  – Word overlap
• Fusion, a second pass
  – New sentence lengths, redundancy and rewriting are backpropagated
Results
Comparison between 2006 and 2007
Automatic evaluation
[Figure: automatic evaluation scores for the 7-system fusion vs. without fusion]
Manual evaluation (1)

                    DUC 2006    DUC 2007
  Our score           2.78        2.933
  Mean                2.542       2.61
  Std. deviation      0.288       0.462
Manual evaluation (2)
• Linguistic quality scores of our submissions in 2006 and 2007
• Unchanged linguistic processing module
• Small difference between the two evaluations
Fusion - Conclusions
• Outperforms the best single system
• Prevents overfitting
• Toolkits available (we use the AT&T FSM toolkit)
• Flexible
• Parameter tuning using a development corpus
Update Task
Principle
• Based on a very simple user-focused Multi-Document Summarizer (MDS)
  – Similarity with the topic
• Added features:
  – Cross-summary redundancy removal
    • Cosine maximization-minimization
  – Novelty boosting
    • Topic enrichment
How does it work
A simple user-oriented MDS
• Documents are segmented into sentences
• Sentences are filtered and stemmed
• Each sentence is scored against the topic
  – Cosine similarity
  – tf.idf weights
• Drawback
  – Summaries do not inform the reader of new facts
    • Cross-summary redundancy removal techniques
    • Novelty boosting
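The base scoring step can be sketched as a tf.idf-weighted cosine between a sentence and the topic. This is a minimal illustration assuming tokens are already filtered and stemmed and that an idf table is available; the actual system's weighting details may differ.

```python
import math
from collections import Counter

def tfidf_cosine(sentence_tokens, topic_tokens, idf):
    """Score a (filtered, stemmed) sentence against the topic with a
    tf.idf-weighted cosine similarity. `idf` maps term -> idf weight."""
    sv = {w: c * idf.get(w, 0.0) for w, c in Counter(sentence_tokens).items()}
    tv = {w: c * idf.get(w, 0.0) for w, c in Counter(topic_tokens).items()}
    dot = sum(sv[w] * tv[w] for w in sv if w in tv)
    norm = (math.sqrt(sum(v * v for v in sv.values()))
            * math.sqrt(sum(v * v for v in tv.values())))
    return dot / norm if norm else 0.0
```

Identical term vectors score 1.0, disjoint ones 0.0; every document sentence is ranked by this score against the same topic, which is exactly the drawback the update task addresses.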
Two-step cosine maximization-minimization (1)
• Improved sentence scoring method
  – Cross-summary redundancy removal
• Maximize cos(sentence | topic); minimize cos(sentence | early summaries)
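A sketch of the maximization-minimization idea: reward similarity to the topic and penalize similarity to sentences already used in earlier summaries. The toy Jaccard similarity and the plain-difference combination below are assumptions for illustration; the system uses tf.idf cosine and its own combination.

```python
def jaccard(a, b):
    """Toy word-set similarity, standing in for the tf.idf cosine."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def maxmin_score(sentence, topic, early_summary_sentences, sim=jaccard):
    """Maximize similarity to the topic while minimizing similarity to
    sentences from earlier summaries (cross-summary redundancy removal)."""
    penalty = max((sim(sentence, s) for s in early_summary_sentences),
                  default=0.0)
    return sim(sentence, topic) - penalty
```

A sentence already present in an earlier summary is penalized below a fresh, on-topic sentence, so summaries favor new facts over repeats.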
Two-step cosine maximization-minimization (2)
• Limits
  – All sentences are scored against the same topic
    • Selected sentences are syntactically related
  – Forces irrelevant sentences to enter the summary
• → We propose a novelty boosting technique
Novelty boosting
• Point the summary to the major cluster novelty
  – Novelty in comparison to earlier clusters
  – Extraction of highly weighted term lists
• Topic enrichment using the unique terms
[Diagram: the topic’s bag of words is enriched (boosted) with terms absent from the earlier clusters’ bags of words]
Example (novelty boosting for the cluster C summary)
[Diagram: high-weighted terms are extracted from clusters A, B and C; the terms unique to C enrich the topic fed to the summarization engine]
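The enrichment step above can be sketched as: extract the top-weighted terms of the current cluster, keep those that never appear in earlier clusters, and append them to the topic. Weighting terms by raw frequency is an assumption for brevity; the system uses its own high-weight term extraction.

```python
from collections import Counter

def enrich_topic(topic_terms, current_cluster, earlier_clusters, k=10):
    """Novelty boosting: enrich the topic with the current cluster's
    top-k terms that are unique w.r.t. all earlier clusters."""
    top = [w for w, _ in Counter(current_cluster).most_common(k)]
    seen = set()
    for cluster in earlier_clusters:
        seen.update(cluster)
    unique = [w for w in top if w not in seen]
    return list(topic_terms) + unique
```

For cluster C, only terms unseen in clusters A and B survive, so the enriched topic pulls the summary toward what is genuinely new in C.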
Summary construction (1)
• Arrange the highest-scored sentences
• No special ordering within the summary
• 100-word limit → high probability that the last sentence is truncated
• → We propose a better last-sentence selection method
Summary construction (2)
• Last-sentence selection method:
  – If the number of remaining words > 5
    • The after-last sentence is preferred if
      – Its length is at least 1/3 shorter
      – Its score is greater than a threshold (obtained empirically)
    • Otherwise, truncate the sentence
  – Else, produce a non-optimal-sized summary
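The selection rule above can be written directly as a small function. The pair representation and the exact "1/3 shorter" comparison below are modeling assumptions; the score threshold is obtained empirically, as on the slide.

```python
def pick_last_sentence(remaining_words, current, after_last, score_threshold):
    """Last-sentence selection: `current` and `after_last` are
    (text, score) pairs; lengths are counted in words."""
    if remaining_words <= 5:
        return None  # give up: produce a non-optimal-sized summary
    cur_text, _ = current
    alt_text, alt_score = after_last
    cur_len, alt_len = len(cur_text.split()), len(alt_text.split())
    # prefer the after-last sentence if it is at least 1/3 shorter
    # and its score exceeds the empirical threshold
    if alt_len <= cur_len * (2 / 3) and alt_score > score_threshold:
        return alt_text
    # otherwise truncate the current sentence to fit the word budget
    return " ".join(cur_text.split()[:remaining_words])
```

This trades a slightly lower-scored but complete sentence against a truncated high-scoring one, reducing the chance of a dangling final sentence under the 100-word limit.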
Post-processing (1)
• Within-summary redundancy removal
  – Cosine similarity with a threshold
  – Threshold obtained empirically (~ 0.4)
• Sentence rewriting techniques
  – Person name rewriting
    • "Vice President Al Gore …" → "… Al Gore …"
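The within-summary redundancy filter can be sketched as a greedy pass: keep a sentence only if its cosine similarity with every already-kept sentence stays below the empirical ~0.4 threshold. Raw word counts stand in for tf.idf weights here for brevity.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity on raw word counts (tf.idf omitted for brevity)."""
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca if w in cb)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def remove_redundant(ranked_sentences, threshold=0.4):
    """Greedy within-summary redundancy removal: a sentence is kept only
    if it is sufficiently dissimilar from every kept sentence."""
    kept = []
    for s in ranked_sentences:
        if all(cosine(s, k) < threshold for k in kept):
            kept.append(s)
    return kept
```

Because sentences arrive in score order, the higher-scored member of a redundant pair always survives.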
Post-processing (2)
• Sentence rewriting techniques
  – Acronym rewriting
    • "Massachusetts Institute of Technology …" → "… MIT …"
  – Link-word removal, "say"-clause removal
    • "Moreover, the president is ..."
    • "... said the judge."
  – Punctuation cleanup
Experiments (1)
Automatic evaluations (ROUGE-2 and SU4) as a function of the number of extracted terms
• Novelty boosting introduces "noise"
• Enhances readability
Experiments (2)
Automatic evaluations (ROUGE-2 and SU4) for each cluster of documents (A ~10, B ~8 and C ~7 articles)
• Enhances system stability and reliability
• Non-optimal enrichment
• Slight decrease on cluster B
Results at DUC 2007
Results (1)
Correlation between automatic evaluations (ROUGE-2 and SU4) and responsiveness scores
• Responsiveness score: 2.633 (mean 2.32, standard deviation 0.35)
• Poor sentence rewriting
Results (2)
Automatic evaluations (Basic Elements) for each system at DUC 2007
• BE score: 0.0546 (mean 0.0409, standard deviation 0.0139)
Conclusion
• Very simple approach
• Summary quality enhanced over time
• Novelty boosting
  – Helps prevent within-summary redundancy
  – Introduces "noise"
• Language independent
What’s next?
• Enhance the cross-summary redundancy removal process
  – Change granularity
    • Consider previous sentences instead of whole summaries
• Dynamic novelty boosting
• Improve sentence rewriting techniques
Thank you!
florian.boudin@univ-avignon.fr
Co-authors: Frédéric Béchet, Marc El-Bèze, Benoit Favre, Laurent Gillard and Juan-Manuel Torres-Moreno
This work was partially supported by the Laboratoire de chimie organique de synthèse, FUNDP (Facultés Universitaires Notre-Dame de la Paix), Namur, Belgium