CLASSY Summarization-- English and Beyond Judith D. Schlesinger - PowerPoint PPT Presentation

CLASSY Summarization -- English and Beyond
Judith D. Schlesinger and John M. Conroy, IDA Center for Computing Sciences
Joint work with Jeff Kubina, DOD, and Dianne P. O’Leary, University of Maryland

Overview
• Linguistic Processing
  – Guided Summarization
  – Multi-lingual Summarization
  – Future Tasks
• Scoring and Selection
  – Guided Summarization
  – Multi-lingual Summarization
  – Future Tasks

Guided Summarization: Linguistic Processing
• Tasks
  – Classify sentences: -1, 0, 1
  – Sentence split: FASST-E
  – Tokenize and trim
  – Query term generation

Guided Summarization: Linguistic Processing (cont.)
• Basically very stable
  – Changed only to correct errors or to handle new situations
• But …
  – Errors in “clean” data
  – Others

Multi-lingual Summarization: Linguistic Processing
• New: 2 sentence-splitter variations for other languages
  – Based on FASST-E
  – One for upper/lower-case alphabets; one for single-case alphabets
• Growing-pains errors
  – Missed splits after numbers
• New formats, new problems
  – Datelines, including English
  – Catch-22 on how to handle them

Linguistic Processing: Future Tasks
• Strengthen non-English sentence splitters
  – 2nd pass for datelines, quotes, short sentences, etc.
• Non-English trimming
  – Lead phrases
  – Other trims?
• English: anaphora resolution

Questions?

Examples of new dateline formats
• Tuesday, July 18, 2005
• Meadow Lake, Saskatchewan --
• On the same line as the following text
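A minimal regex sketch of detecting the first two dateline shapes above when they sit on the same line as the following text. The patterns and the helper name strip_dateline are hypothetical illustrations, not the rules used in the CLASSY splitters.

```python
import re

# Hypothetical pattern for "Tuesday, July 18, 2005"-style datelines.
WEEKDAY_DATE = re.compile(
    r"^(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day,\s+"
    r"(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December)\s+\d{1,2},\s+\d{4}\s*"
)
# Hypothetical pattern for "Place, Region --" datelines; the lazy
# quantifiers stop at the first comma and the first " --".
PLACE_DASH = re.compile(r"^[A-Z][\w .'-]*?,\s+[A-Z][\w .'-]*?\s+--\s*")


def strip_dateline(line: str) -> tuple[str, str]:
    """Split a leading dateline off a line of text.

    Returns (dateline, remaining_text); dateline is '' when no pattern matches.
    """
    for pattern in (WEEKDAY_DATE, PLACE_DASH):
        m = pattern.match(line)
        if m:
            return m.group(0).strip(), line[m.end():]
    return "", line
```

For example, strip_dateline("Meadow Lake, Saskatchewan -- Officials said.") separates the place dateline from the sentence. The place pattern will also fire on an ordinary sentence such as "However, Smith said -- ...", which shows why deciding how to handle datelines is hard.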

Human Summary Space
Given a cluster of documents on topic τ, $P(t \mid \tau)$ is the probability that a human will include term t in a summary of τ, and $\hat{P}(t \mid \tau)$ is an estimate of that probability.

General Recipe
1. Estimate the probability that a term (bigram) will be included by a human.
2. Optionally, project the term-sentence matrix to be orthogonal to the previously generated summary.
3. Select a non-redundant subset of sentences with a high density of terms likely to be chosen by a human.
4. Order the sentences to improve flow (approximate TSP).
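Step 4 treats sentence ordering as a travelling-salesman problem over the selected sentences. A minimal sketch, assuming a distance of 1 minus the cosine similarity of bag-of-words vectors and a greedy nearest-neighbour heuristic; both choices, and the helper names, are illustrative assumptions rather than the CLASSY implementation.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def order_sentences(sentences: list[str]) -> list[int]:
    """Greedy nearest-neighbour tour: start at sentence 0, then repeatedly
    move to the unvisited sentence at the smallest distance
    1 - cosine (ties broken by lower index)."""
    if not sentences:
        return []
    bags = [Counter(s.lower().split()) for s in sentences]
    order = [0]
    unvisited = set(range(1, len(sentences)))
    while unvisited:
        last = order[-1]
        nxt = min(unvisited, key=lambda j: (1 - cosine(bags[last], bags[j]), j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order
```

Nearest neighbour is the simplest tour heuristic; a 2-opt pass over the resulting tour would be a natural refinement if flow still suffers.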

Submission 25
$$P_{qs\rho}(t \mid \tau) = \alpha_q\, q(t) + \alpha_s\, s(t) + \alpha_\rho\, \rho(t)$$
where
$$s(t)\ [q(t)] = \begin{cases} 1 & \text{if } t \text{ is a signature [query] term} \\ 0 & \text{if } t \text{ is not a signature [query] term} \end{cases}$$
and $\rho(t) = \rho(t \mid \tau)$ is the probability that t occurs in a sentence considered for selection.
This is followed by non-negative QR, a knapsack step to ensure the summary is 100 words or less, and an approximate TSP to improve flow.
Major changes: bigrams and an expanded query set.
Parameters were set by optimizing ROUGE-2 and ROUGE-SU4, as well as their nouveau (Nouveau-ROUGE) variants for the update summaries.
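A minimal sketch of the $P_{qs\rho}$ term weighting followed by a 0/1 knapsack that keeps the summary within a word budget. Assumptions: terms are lower-cased whitespace bigrams, ρ(t) is the fraction of candidate sentences containing t, the α defaults are placeholders rather than the tuned values, and sentences are scored by summing term weights directly, which stands in for the non-negative QR selection step.

```python
def bigrams(sentence: str) -> set:
    """Lower-cased word bigrams of a whitespace-tokenized sentence."""
    toks = sentence.lower().split()
    return set(zip(toks, toks[1:]))


def estimate_p(sentences, query_terms, signature_terms,
               alpha_q=0.5, alpha_s=0.25, alpha_rho=0.25):
    """P_qsrho(t|tau) = alpha_q*q(t) + alpha_s*s(t) + alpha_rho*rho(t) for every
    bigram t in the candidate sentences (alpha values are placeholders)."""
    sets = [bigrams(s) for s in sentences]
    n = len(sets)
    p = {}
    for t in set().union(*sets):
        q = 1.0 if t in query_terms else 0.0
        s = 1.0 if t in signature_terms else 0.0
        rho = sum(t in b for b in sets) / n  # fraction of sentences containing t
        p[t] = alpha_q * q + alpha_s * s + alpha_rho * rho
    return p


def knapsack(lengths, scores, budget=100):
    """0/1 knapsack: indices maximizing total score with total length <= budget.
    best[w] holds (score, chosen indices) using at most w words."""
    best = [(0.0, [])] * (budget + 1)
    for i, (ln, sc) in enumerate(zip(lengths, scores)):
        for w in range(budget, ln - 1, -1):  # descending: each sentence used once
            cand = best[w - ln][0] + sc
            if cand > best[w][0]:
                best[w] = (cand, best[w - ln][1] + [i])
    return best[budget][1]


def summarize(sentences, query_terms, signature_terms, budget=100):
    """Hypothetical end-to-end helper: weight terms, score sentences, select."""
    p = estimate_p(sentences, query_terms, signature_terms)
    scores = [sum(p[t] for t in bigrams(s)) for s in sentences]
    lengths = [len(s.split()) for s in sentences]
    return sorted(knapsack(lengths, scores, budget))
```

Because a query term gets a flat boost of α_q, sentences that share a query bigram dominate the selection in small examples; the tuned α values control that trade-off.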

Submission 42
$$P_{NB}(t \mid \tau) = \sum_{i=0}^{4} \frac{i}{4}\, P(i \mid f_1, f_2)$$
where $P(i \mid f_1, f_2)$ is the Bayes posterior probability that i humans would include a term whose features are $f_1$ and $f_2$; $P_{NB}(t \mid \tau)$ is thus the expected fraction of the 4 human summaries that include t.
Initial summaries:
  $f_1^A = \log(p\text{-value})$, the p-value used in the signature term computation
  $f_2^A$ = TextRank of term t
Update summaries:
  $f_1^B = \log(f_2^B / f_2^A)$
Low-scoring non-query terms are removed before computing TextRank.
This is followed by non-negative QR, a knapsack step to ensure 100 words or less, and an approximate TSP to improve flow.
Major changes: bigrams and an expanded query set.
Trained on TAC 2010 using naïve Bayes with a normal approximation.
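A minimal sketch of the $P_{NB}$ computation on the Submission 42 slide, assuming "normal approximation" means a Gaussian naïve Bayes model: each class i (number of humans, 0 to 4, who used the term) gets a prior and a per-feature mean and variance. The function names and the small variance smoothing constant are illustrative assumptions.

```python
import math
from collections import defaultdict


def train_gaussian_nb(features, labels):
    """features: list of (f1, f2); labels: number of humans (0..4) using the term.
    Returns {class: (prior, [(mean, variance) per feature])}."""
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    n = len(features)
    model = {}
    for y, rows in by_class.items():
        stats = []
        for j in range(len(rows[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-6  # smoothing
            stats.append((mu, var))
        model[y] = (len(rows) / n, stats)
    return model


def posterior(model, x):
    """P(i | f1, f2) under independent normal likelihoods, normalized stably."""
    log_scores = {}
    for y, (prior, stats) in model.items():
        s = math.log(prior)
        for v, (mu, var) in zip(x, stats):
            s += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        log_scores[y] = s
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {y: math.exp(s - m) / z for y, s in log_scores.items()}


def p_nb(model, x, n_humans=4):
    """P_NB(t|tau) = sum_i (i / n_humans) * P(i | f1, f2)."""
    return sum(i / n_humans * p for i, p in posterior(model, x).items())
```

Working in log space before normalizing avoids underflow when a class's likelihood is tiny, which is common with well-separated feature values.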

Results
Ranks by metric; the ROUGE-2 column also gives, in parentheses, the number of humans the submission beat.

Submission  Set  Responsiveness  Pyramid  Readability  ROUGE-2
25          A    1               10       6            3 (7)
25          B    3               4        2            2 (4)
42          A    18              28       9            9 (5)
42          B    17              26       9            15 (1)

A View of the Results

View of the Update Results

Multi-lingual Task
Goal: Develop a language-independent summarizer.
Approach:
1. Collect a background model for each target language (Wikinews).
2. Compute language-independent features.
3. Train a naïve Bayes classifier on DUC 2005-2007 to compute $P_{NB}(t \mid \tau)$.
4. Use a binary integer linear program to achieve a maximum covering (better than non-negative QR for summaries longer than 100 words).
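Step 4's maximum covering can be written as a 0/1 program: choose sentences $x_j$ and covered terms $c_t$ to maximize $\sum_t w_t c_t$ subject to $\sum_j \ell_j x_j \le L$ and $c_t \le \sum_{j : t \in S_j} x_j$. The sketch below solves that program exactly by enumerating sentence subsets, which is only feasible for a handful of sentences; a real system would hand the same constraints to an ILP solver. The term weights (for instance $P_{NB}(t \mid \tau)$), sentence lengths, and budget are assumed inputs, and max_coverage is a hypothetical name.

```python
from itertools import combinations


def max_coverage(sentence_terms, lengths, weights, budget):
    """Exactly solve
        max  sum_t w_t * c_t
        s.t. sum_j len_j * x_j <= budget,
             c_t <= sum_{j : t in S_j} x_j,   x_j, c_t in {0, 1}
    by brute-force enumeration (toy sizes only). Each covered term counts once."""
    n = len(sentence_terms)
    best_value, best_set = 0.0, ()
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            if sum(lengths[j] for j in subset) > budget:
                continue  # violates the length constraint
            covered = set().union(*(sentence_terms[j] for j in subset))
            value = sum(weights.get(t, 0.0) for t in covered)
            if value > best_value:
                best_value, best_set = value, subset
    return best_value, list(best_set)
```

Because each term is counted once no matter how many chosen sentences contain it, the objective itself discourages redundant sentences, unlike a per-sentence score.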

Features
1. log(p), where p is the p-value of the Dunning G-statistic (signature terms).
2. Sentence TextRank; terms with p-value < 0.001 are included (an automatic stop list).
3. $\log P(t_j \mid S_0)$: log probability that a term occurs in a sentence in the cluster of documents to be summarized.
4. $\log P(t_j \mid S_1)$: log probability that a term occurs in a sentence with one or more signature terms in the cluster of documents to be summarized.
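A minimal sketch of features 1, 3 and 4. Feature 1 uses the standard 2×2 Dunning log-likelihood ratio (cluster vs. background counts) and the chi-square(1) upper tail, $P(X > G) = \operatorname{erfc}(\sqrt{G/2})$. For features 3 and 4 each sentence is assumed to be a set of terms. The function names and the floor that keeps log(0) from occurring are illustrative assumptions.

```python
import math


def dunning_g(k1, n1, k2, n2):
    """Dunning log-likelihood ratio G^2 for a term seen k1 times in the n1
    tokens of the cluster and k2 times in the n2 tokens of the background."""
    observed = [k1, n1 - k1, k2, n2 - k2]
    p_term = (k1 + k2) / (n1 + n2)  # pooled rate under "no difference"
    expected = [n1 * p_term, n1 * (1 - p_term), n2 * p_term, n2 * (1 - p_term)]
    # Zero cells contribute 0 (the limit of o * log(o / e) as o -> 0).
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)


def chi2_1df_pvalue(g):
    """Upper-tail p-value of a chi-square(1) statistic: erfc(sqrt(g / 2))."""
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))


def log_p_feature(k1, n1, k2, n2, floor=1e-300):
    """Feature 1: log of the p-value, floored so log(0) cannot occur."""
    return math.log(max(chi2_1df_pvalue(dunning_g(k1, n1, k2, n2)), floor))


def safe_log(x):
    return math.log(x) if x > 0 else float("-inf")


def sentence_log_probs(sentences, term, signature_terms):
    """Features 3 and 4: log P(t|S0) over all cluster sentences and
    log P(t|S1) over sentences containing at least one signature term."""
    s1 = [s for s in sentences if s & signature_terms]
    p0 = sum(term in s for s in sentences) / len(sentences) if sentences else 0.0
    p1 = sum(term in s for s in s1) / len(s1) if s1 else 0.0
    return safe_log(p0), safe_log(p1)
```

G is two-sided, so a term that is under-represented in the cluster can also get a small p-value; signature-term selection usually keeps only terms whose cluster rate k1/n1 exceeds the background rate.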

Multilingual Results

Things to Do
• Investigate further why ML did not do as well.
• Investigate to what extent the current features are language independent.
• Make further use of pairwise testing to determine the best approach. (See Peter Rankel’s talk.)
