Multiple Alternative Sentence Compressions (MASC): A Framework for Automatic Summarization
Nitin Madnani, David Zajic, Bonnie Dorr, Necip Fazil Ayan, Jimmy Lin
University of Maryland, College Park
Outline
• Problem Description
• MASC Architecture
• MASC Results
• Improving Candidate Selection
• Summary & Future Work
Problem Description
• Sentence-level extractive summarization
– Source sentences contain a mixture of relevant/non-relevant and novel/redundant information.
• Compression
– A single output compression can't provide the best compression of each sentence for every user need.
• Multiple Alternative Sentence Compressions
– Generate multiple candidate compressions of source sentences.
– Use feature-based selection to choose among the candidates.
MASC Architecture
Documents → Sentence Filtering → Sentences → {HMM Hedge, Trimmer, Topiary} → Compression Candidates → Candidate Selection (using task-specific features, e.g. the query) → Summary
(Zajic et al., 2005; Zajic et al., 2006)
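To make the data flow concrete, here is a minimal Python sketch of the pipeline's control flow; the function names and the toy compressor/selector below are hypothetical stand-ins, not the actual UMD implementation.

```python
from typing import Callable, List

def masc_summarize(
    sentences: List[str],
    compressors: List[Callable[[str], List[str]]],
    select: Callable[[List[str]], List[str]],
) -> List[str]:
    """Generate multiple alternative compressions per sentence, then
    choose among all candidates with a feature-based selector."""
    candidates: List[str] = []
    for sentence in sentences:                 # output of sentence filtering
        for compress in compressors:           # HMM Hedge, Trimmer, Topiary
            candidates.extend(compress(sentence))
    return select(candidates)                  # uses task-specific features

# Toy usage: a "compressor" that also drops the leading word, and a
# selector that keeps the two shortest candidates.
drop_first = lambda s: [s, s.split(" ", 1)[1]]
shortest_two = lambda cands: sorted(cands, key=len)[:2]
print(masc_summarize(["Yesterday the flood crest passed Chongqing ."],
                     [drop_first], shortest_two))
```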
HMM Hedge Architecture
A sentence is part-of-speech tagged with TreeTagger (Schmid, 1994) to mark verbs; HMM Hedge then generates compressions using a headline language model and a story language model, built from 242,918 AP headlines and stories in the Tipster Corpus.
HMM Hedge: Multiple Alternative Compressions
• Calculate the best compression at each word length from 5 to 15 words.
• Calculate the 5 best compressions at each word length.
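As a sketch of the k-best bookkeeping, assuming each compression already carries an HMM Hedge score (the scoring model itself is not shown here):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def kbest_by_length(scored: List[Tuple[str, float]],
                    min_len: int = 5, max_len: int = 15,
                    k: int = 5) -> Dict[int, List[str]]:
    """Keep the k highest-scoring compressions at each word length."""
    by_len: Dict[int, List[Tuple[str, float]]] = defaultdict(list)
    for text, score in scored:
        n = len(text.split())
        if min_len <= n <= max_len:
            by_len[n].append((text, score))
    return {n: [t for t, _ in sorted(cands, key=lambda c: -c[1])[:k]]
            for n, cands in by_len.items()}
```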
Trimmer Architecture
A sentence is tagged with entities (e.g. PERSON, TIME_EXPR) by BBN IdentiFinder (Bikel et al., 1999) and parsed with the Charniak parser (Charniak, 2000); Trimmer then generates compressions from the parse.
Multi-candidate Trimmer
• How to generate multiple candidate compressions?
– Use the state of the parse tree after each rule application as a candidate.
– Use rules that generate multiple candidates.
– 9 single-output rules, 3 multi-output rules.
(Zajic et al., 2005, 2006; Zajic, 2007)
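A sketch of the first strategy, using NLTK's constituency trees as a stand-in for Trimmer's internal parse representation; each rule is assumed to be a function from a tree to one or more trimmed trees:

```python
from nltk.tree import Tree  # stand-in for Trimmer's parse representation

def trimmer_candidates(tree: Tree, rules) -> list:
    """Apply trimming rules in sequence; the tree state after each rule
    application is itself emitted as a candidate compression."""
    candidates = [" ".join(tree.leaves())]   # the untrimmed sentence
    for rule in rules:                       # 9 single-output + 3 multi-output
        variants = rule(tree)                # a rule returns one or more trees
        for v in variants:
            candidates.append(" ".join(v.leaves()))
        tree = variants[0]                   # continue trimming the first variant
    return candidates
```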
Trimmer Rule: Root-S
• Select the node to be the root of the compression.
• Consider any S node with NP and VP children.
(Figure: a parse tree with three candidate root S nodes, S1–S3, for the sentence "The latest flood crest passed Chongqing in southwest China and waters were rising in Yichang on the middle reaches of the Yangtze, state television reported Sunday.")
Trimmer Rule: Conjunction
• The Conjunction rule removes the right child, the left child, or neither.
(Figure: a parse tree with a VP–CC–VP conjunction for the sentence "Illegal fireworks injured hundreds of people and started six fires.")
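A minimal NLTK sketch of this multi-output rule; the tree below is a hand-built approximation of the slide's example, not actual Charniak parser output:

```python
from nltk.tree import Tree

def conjunction_rule(tree: Tree) -> list:
    """For each VP -> 'VP CC VP' conjunction, emit the original tree plus
    variants that keep only the left or only the right conjunct."""
    variants = [tree]
    for pos in tree.treepositions():
        node = tree[pos]
        if (isinstance(node, Tree) and
                [c.label() for c in node if isinstance(c, Tree)] == ["VP", "CC", "VP"]):
            for keep in (0, 2):              # left conjunct, right conjunct
                variant = tree.copy(deep=True)
                variant[pos] = node[keep].copy(deep=True)
                variants.append(variant)
    return variants

sent = Tree.fromstring(
    "(S (NP (JJ Illegal) (NNS fireworks))"
    " (VP (VP (VBD injured) (NP hundreds (PP of people)))"
    " (CC and) (VP (VBD started) (NP six fires))))")
for v in conjunction_rule(sent):
    print(" ".join(v.leaves()))
# Illegal fireworks injured hundreds of people and started six fires
# Illegal fireworks injured hundreds of people
# Illegal fireworks started six fires
```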
Topiary Architecture
Sentence compressions from Trimmer are combined with topic terms assigned to the document by BBN's Unsupervised Topic Detection over the document corpus, yielding Topiary candidates.
Topiary Examples (DUC2004)
• PINOCHET: wife appealed saying he too sick to be extradited to face charges
• MAHATHIR ANWAR_IBRAHIM: Lawyers went to court to demand client's release
– Mahathir Mohamad is the former Prime Minister of Malaysia.
– Anwar bin Ibrahim is a former deputy prime minister and finance minister of Malaysia, convicted of corruption in 1998.
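A sketch of the combination step that produces such candidates, assuming topic terms are already assigned and a 75-byte headline budget (the limit suggested by the "First 75" baseline on the DUC2004 task):

```python
from typing import List

def topiary_candidates(topic_terms: List[str],
                       compressions: List[str],
                       budget_bytes: int = 75) -> List[str]:
    """Prepend 0..k topic terms to each compression, keeping the
    combinations that fit within the headline byte budget."""
    out = []
    for k in range(len(topic_terms) + 1):
        prefix = " ".join(topic_terms[:k])
        for comp in compressions:
            headline = f"{prefix}: {comp}" if prefix else comp
            if len(headline.encode("utf8")) <= budget_bytes:
                out.append(headline)
    return out

print(topiary_candidates(
    ["PINOCHET"],
    ["wife appealed saying he too sick to be extradited"]))
```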
Selector Architecture
Candidates, together with the query and document set, are scored for relevance and centrality by the Uniform Retrieval Architecture (URA), UMD's software infrastructure for IR tasks. The candidates, now carrying these additional features, are culled and rescored by the sentence selector using a set of feature weights to produce the summary.
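The final selection can be pictured as a linear model over candidate features; this is a simplified greedy sketch (the real selector's cull-and-rescore loop and feature set are richer, and the feature names here are illustrative):

```python
from typing import Dict, List

def select_summary(candidates: List[dict],
                   weights: Dict[str, float],
                   max_words: int = 250) -> List[str]:
    """Each candidate is {'text': str, 'features': {name: value}}; pick
    the highest-scoring candidates until the word budget is exhausted."""
    score = lambda c: sum(weights.get(f, 0.0) * v
                          for f, v in c["features"].items())
    summary, used = [], 0
    for c in sorted(candidates, key=score, reverse=True):
        n = len(c["text"].split())
        if used + n <= max_words:
            summary.append(c["text"])
            used += n
    return summary
```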
Evaluation of Headline Generation Systems
(Figure: ROUGE-1 recall on DUC2004 test data, comparing the First-75-bytes baseline, UTD topic terms, and HMM Hedge, Trimmer, and Topiary, each run without and with MASC.)
Evaluation of Multi-Document Summarization Systems
(Figure: ROUGE-2 recall on DUC2006 test data for No Compression, HMM Hedge, and Trimmer.)
Tuning Feature Weights with ΔROUGE
Initialize: S = {}, H = {}
Repeat until |S| > L:
– C ← current k-best candidates c1 … ck
– For each c ∈ C: ΔROUGE(c) = R2R(S ∪ {c}) − R2R(S), where R2R is ROUGE-2 recall
– Add the (candidate, ΔROUGE) hypotheses to H
– S ← S ∪ {c1}
– Update the remaining candidates
w_opt ← powell_ROUGE(H, w0)
Output: Summary(S)
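A sketch of this loop, with rouge_2_recall standing in for the R2R scorer against the reference summaries and SciPy's Powell method standing in for powell_ROUGE (the ranking loss over hypotheses is a hypothetical placeholder):

```python
def collect_hypotheses(candidates, rouge_2_recall, k=10, max_words=250):
    """Greedy ΔROUGE selection: score the current k-best candidates by how
    much each would improve ROUGE-2 recall, record all (candidate, ΔROUGE)
    pairs as tuning hypotheses, and add the best candidate to the summary."""
    summary, hypotheses = [], []
    while candidates and sum(len(c.split()) for c in summary) <= max_words:
        deltas = [(c, rouge_2_recall(summary + [c]) - rouge_2_recall(summary))
                  for c in candidates[:k]]
        hypotheses.extend(deltas)
        best = max(deltas, key=lambda d: d[1])[0]
        summary.append(best)
        candidates = [c for c in candidates if c != best]  # update remaining
    return summary, hypotheses

# Feature weights would then be tuned on the recorded hypotheses, e.g.:
# from scipy.optimize import minimize
# w_opt = minimize(ranking_loss, w0, args=(hypotheses,), method="Powell").x
```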
Optimization Results (DUC2007 data; all differences significant at p < 0.05)

ROUGE   Manual   ΔROUGE (k=10)
1       0.363    0.403
2       0.081    0.104
SU-4    0.126    0.154

Manual: feature weights optimized manually to maximize ROUGE-2 recall on the final system output.
Key insights for ΔROUGE optimization:
• Uses multiple alternative sentence compressions.
• Directly optimizes the candidate selection process.
Redundancy
• Candidate words can be emitted by two disparate word distributions, a redundant one and a non-redundant one:
P(w) = λ P(w|S) + (1 − λ) P(w|L), where P(w|S) = n(w, S) / |S| (redundant) and P(w|L) = n(w, L) / |L| (non-redundant)
S = summary; L = general English language (other documents in the same cluster are used to represent the general language).
• Assuming candidate words are i.i.d., the redundancy feature for a given candidate c is:
R(c) = log P(c) = Σ_{w ∈ c} log [ λ P(w|S) + (1 − λ) P(w|L) ]
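A sketch of the feature, with λ as a tunable mixture weight; a small floor keeps the log finite when a word is unseen in both bags:

```python
import math
from collections import Counter
from typing import Iterable

def redundancy(candidate: Iterable[str],
               summary_words: list,
               language_words: list,
               lam: float = 0.5) -> float:
    """R(c) = Σ_w log[λ·P(w|S) + (1−λ)·P(w|L)]; higher means the candidate
    looks more like the current summary, i.e. more redundant."""
    n_s, n_l = Counter(summary_words), Counter(language_words)
    len_s, len_l = max(len(summary_words), 1), max(len(language_words), 1)
    return sum(
        math.log(max(lam * n_s[w] / len_s + (1 - lam) * n_l[w] / len_l, 1e-12))
        for w in candidate)
```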
Incorporating Paraphrases
• Redundancy uses bags-of-words to compute P(w|S) = n(w, S) / |S|.
• This is not useful if a candidate word is a paraphrase of a summary word: the candidate word is classified as non-redundant.
• Add another bag-of-words P, such that P = { a paraphrase of w | w ∈ S }.
• Use n(w, P) in the redundancy computation whenever n(w, S) = 0.
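The backoff itself is a one-line change to the count lookup; paraphrase_of is a hypothetical word-to-paraphrase mapping built as described on the next slide:

```python
from collections import Counter

def count_with_paraphrases(word: str,
                           summary_counts: Counter,
                           paraphrase_of: dict) -> int:
    """Use n(w, S) when available; otherwise fall back to the count of
    w's paraphrase in the summary (the extra bag-of-words P)."""
    if summary_counts[word] > 0:
        return summary_counts[word]
    p = paraphrase_of.get(word)          # e.g. {"climbed": "increased"}
    return summary_counts[p] if p else 0
```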
Generating Paraphrases
• Leverage a phrase-based MT system:
– Use E–F correspondences extracted from word-aligned bitext.
– Pivot each pair of E–F correspondences with a common foreign side to get an E–E correspondence:
c(e1, e2) = Σ_f c(e1, f) · c(f, e2)
• Example:
上升 ||| increased ||| 2.0        increased ||| climbed ||| 2.0
上升 ||| climbed ||| 1.0    →     climbed ||| uplifted ||| 1.0
上升 ||| uplifted ||| 1.0         uplifted ||| increased ||| 2.0
• Pick the most frequent correspondence for w.
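A sketch of the pivot computation over (foreign, english, count) phrase-table entries; it reproduces the counts in the example above:

```python
from collections import defaultdict

def pivot_paraphrases(fe_counts):
    """c(e1, e2) = Σ_f c(e1, f) · c(f, e2): join E-F correspondences
    that share the same foreign side."""
    by_foreign = defaultdict(list)
    for f, e, c in fe_counts:
        by_foreign[f].append((e, c))
    para = defaultdict(float)
    for entries in by_foreign.values():
        for e1, c1 in entries:
            for e2, c2 in entries:
                if e1 != e2:
                    para[(e1, e2)] += c1 * c2
    return para

table = [("上升", "increased", 2.0), ("上升", "climbed", 1.0),
         ("上升", "uplifted", 1.0)]
pairs = pivot_paraphrases(table)
print(pairs[("increased", "climbed")])   # 2.0, as in the example
```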
Paraphrase Results
• Using paraphrases yields no significant improvements.
• This is unrelated to the quality of the paraphrases.
• The anomalous cases occur extremely rarely:
– The original bag-of-words is sufficient to capture candidate redundancy almost all the time.
DUC 2007 Results
• Systems 7 and 36
• Main task:
– Responsiveness = 3.089 (4th)
– ROUGE-2 = 0.108 (8th)
– ROUGE-SU4 = 0.158 (11th)
• Update task:
– Responsiveness = 2.800 (2nd)
– ROUGE-2 = 0.086 (9th)
– ROUGE-SU4 = 0.124 (8th)
Summary
• MASC with feature-based candidate selection improves headline generation and shows promise for multi-document summarization.
• Optimizing for ΔROUGE provides significant improvements over the previous approach.
• The redundancy feature works at the lexical as well as the document level.
• Using paraphrases requires a novel formulation.