Entity Type Modeling for Multi-Document Summarization: Generating Descriptive Summaries of Geo-Located Entities
Ahmet Aker
Natural Language Processing Group, Department of Computer Science
Research question
Multi-document summarization has several challenges:
─ Identifying the most relevant sentences in the documents
─ Reducing redundancy within the summary
─ Producing a coherent summary
Can entity type models be used to address these challenges?
What are entity type models?
Sets of patterns that capture the ways an entity is described in natural language, for example (a minimal sketch of such attribute sets follows below):

Tower:        Church:       Volcano:
  Visiting      When built    When last erupted
  Location      Location      Visiting
  When built    Visiting      Location
  Design        Preacher      Surroundings
  Purpose       Events        Height
  Height        History       Status
  …             …             …
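To make the idea concrete, here is a minimal sketch of entity type models as plain attribute sets, using the attribute names from the slide. The dictionary is hand-written purely for illustration; in the work itself such models are derived from text corpora (see the later slides).

```python
# Minimal sketch: entity type models as hand-written attribute sets,
# mirroring the table on this slide. Illustration only; the real models
# are learned from entity-type corpora.
ENTITY_TYPE_MODELS = {
    "tower":   {"visiting", "location", "when built", "design", "purpose", "height"},
    "church":  {"when built", "location", "visiting", "preacher", "events", "history"},
    "volcano": {"when last erupted", "visiting", "location", "surroundings", "height", "status"},
}

def shared_attributes(type_a, type_b):
    """Attributes two entity types share, e.g. towers and churches are
    both described by visiting info, location and construction date."""
    return ENTITY_TYPE_MODELS[type_a] & ENTITY_TYPE_MODELS[type_b]
```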
Further research questions
Do entity type models exist?
How can we derive entity type models?
Do entity type models help to select relevant sentences from the documents?
Do entity type models help to reduce redundancy and lead to more coherent summaries?
─ Manual approach
─ Automatic approach
Do humans associate sets of attributes with entity types?
[Image of a landmark] → Height, Year, Location, Designer, Entrance Fee
Investigation
─ Where is it located? → Location
─ When was it constructed? → Year
─ How tall is it? → Height
Investigation shows
Humans have an "entity type model" of what is salient regarding a certain entity type, and this model informs their choice of which attributes to seek when seeing an instance of this type (Aker & Gaizauskas 2011, Aker et al. 2013).

Tower:        Church:       Volcano:
  Visiting      When built    When last erupted
  Location      Location      Visiting
  When built    Visiting      Location
  Design        Preacher      Surroundings
  Purpose       Events        Height
  Height        History       Status
  …             …             …
Further research questions
Do entity type models exist?
How can we derive entity type models?
Do entity type models help to select relevant sentences from the documents?
Do entity type models help to reduce redundancy and lead to more coherent summaries?
─ Manual approach
─ Automatic approach
Can we derive entity type models from existing text resources such as Wikipedia articles?
Wikipedia articles → Height, Year, Location, Designer, Entrance Fee
Investigation
Do the attributes extracted from Wikipedia articles (Height, Year, Location, Designer, Entrance Fee) match the attributes humans name?
Results show:
─ Attributes humans associate with entity types are also found in Wikipedia articles
─ Entity type models can be derived from existing text resources
Entity type corpus collection (Aker & Gaizauskas 2009)
For each Wikipedia article: extract the entity type, then add the article to the corpus for that type (Entity Type 1, Entity Type 2, Entity Type 3, …) – a sketch of this step follows below.
• 107 different entity types are extracted: village (40k), school (15k), mountain (5k), church (3k), lake (3k), etc.
• Accuracy: 90%
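Below is a hedged sketch of this corpus-collection loop. The "is a/an <type>" heuristic is an illustrative stand-in, not the exact rules of Aker & Gaizauskas (2009), and all names are hypothetical.

```python
import re

# Guess an article's entity type from an "is a/an <type>" pattern in its
# first sentence, then file the article under that type.
def extract_entity_type(first_sentence):
    # Naive: take the word right after "is a/an". A real system would
    # locate the head noun of the copula complement instead, so that
    # e.g. "is an 1889 iron lattice tower" still yields "tower".
    match = re.search(r"\bis an? ([a-z]+)", first_sentence.lower())
    return match.group(1) if match else None

corpus = {}  # entity type -> list of articles

def add_article(article_text):
    first_sentence = article_text.split(". ")[0]
    entity_type = extract_entity_type(first_sentence)
    if entity_type is not None:
        corpus.setdefault(entity_type, []).append(article_text)
```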
How can we represent the entity type models?
─ Signature words
─ N-gram language models
─ Dependency patterns
How can we represent the entity type models?
Examples derived from a church corpus:
─ Signature words, with corpus counts: located (200), constructed (150)
─ N-gram language models, counts normalized to probabilities: "is located" (200 → 0.1), "is constructed" (150 → 0.08)
─ Dependency patterns, with counts: "[Entity] is [entityType]" (400), "was built [date]" (300), "[Entity] has design" (200)
(A sketch of the n-gram representation follows below.)
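As a sketch of the n-gram representation, the snippet below estimates bigram probabilities by relative frequency over an entity-type corpus, mirroring the slide's numbers ("is located" seen 200 times → probability 0.1). The flat normalization is an assumption; a real language model would smooth and condition on the preceding word.

```python
from collections import Counter

# Relative-frequency bigram model over one entity-type corpus.
def bigram_model(sentences):
    """sentences: list of token lists drawn from one entity-type corpus."""
    counts = Counter(
        (w1, w2)
        for tokens in sentences
        for w1, w2 in zip(tokens, tokens[1:])
    )
    total = sum(counts.values())
    # A bigram's probability is its count over the total bigram count.
    return {bigram: count / total for bigram, count in counts.items()}
```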
Further research questions
Do entity type models exist?
How can we derive entity type models?
Do entity type models help to select relevant sentences from the documents?
Do entity type models help to reduce redundancy and lead to more coherent summaries?
─ Manual approach
─ Automatic approach
Summary generation process
Web search for the entity (e.g. "Eiffel Tower, Paris, France") retrieves web documents, e.g.: "The Eiffel Tower (French: Tour Eiffel, [tuʁ ɛfɛl], nickname La dame de fer, the iron lady) is an 1889 iron lattice tower located on the Champ de Mars in Paris that…"
Preprocessing: sentence splitting, tokenizing, POS tagging, lemmatizing, NE tagging
Feature extraction: sentence position, centroid similarity, query similarity, starter similarity, entity type model
Sentence scoring: score and sort the sentences
Sentence selection: select sentences from the sorted list, with redundancy reduction
(A skeleton of the scoring and selection stages appears below.)
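Here is a minimal skeleton of the scoring-and-selection stages, assuming the feature functions, their weights, and a sentence-similarity function are supplied. All names and the length/similarity thresholds are illustrative assumptions, not the system's actual API.

```python
# Score sentences as a weighted feature sum, then greedily select
# non-redundant sentences from the sorted list.
def summarize(sentences, features, weights, sim, max_words=200, threshold=0.5):
    ranked = sorted(
        sentences,
        key=lambda s: sum(w * f(s) for f, w in zip(features, weights)),
        reverse=True,
    )
    summary, words = [], 0
    for sentence in ranked:
        # Redundancy reduction: skip sentences too similar to kept ones.
        if any(sim(sentence, kept) > threshold for kept in summary):
            continue
        summary.append(sentence)
        words += len(sentence.split())
        if words >= max_words:
            break
    return summary
```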
Entity type model features
─ Signature words model feature
─ N-gram language model feature
─ Dependency pattern model feature
(A sketch of the n-gram language model feature follows below.)
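As a sketch of the n-gram language model feature, the function below scores a sentence by its average bigram log-probability under the entity type's model (as built in the earlier sketch); a higher score means the sentence reads like typical text about that entity type. The floor value stands in for proper smoothing and is an assumption.

```python
import math

# Average bigram log-probability of a sentence under an entity-type model.
def lm_feature(tokens, model, floor=1e-6):
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return math.log(floor)
    total = sum(math.log(model.get(bigram, floor)) for bigram in bigrams)
    return total / len(bigrams)
```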
Experiments
Evaluation settings – image set
The image collection contains 310 images from sites worldwide (Aker & Gaizauskas 2010a)
Each image (e.g. "Eiffel Tower, Paris, France") is paired with 10 web documents
Split: 205 images for training, 105 for testing
Evaluation settings – ROUGE evaluation
We use ROUGE (Lin, 2004) to evaluate our image captions automatically
─ This requires model captions: we use the model captions described in Aker & Gaizauskas (2010a), following the 205/105 training/testing split
For comparison, two baselines are generated:
─ From the top retrieved web document (FirstDoc)
─ From the Wikipedia article (Wiki) – an upper bound
(A minimal ROUGE-2 sketch follows below.)
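For readers unfamiliar with the metric, here is a minimal ROUGE-2 recall sketch (Lin, 2004): clipped bigram overlap with a single model caption, divided by the model caption's bigram count. The actual evaluation uses the ROUGE toolkit, multiple model captions, and ROUGE-SU4 as well.

```python
from collections import Counter

# ROUGE-2 recall against one model (reference) caption.
def rouge2_recall(candidate, model):
    def bigrams(text):
        tokens = text.lower().split()
        return Counter(zip(tokens, tokens[1:]))
    cand, ref = bigrams(candidate), bigrams(model)
    # Clip each candidate bigram's count at its count in the reference.
    overlap = sum(min(count, ref[bg]) for bg, count in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```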
Evaluation settings – Manual evaluation
We also evaluated our summaries using a readability assessment as in DUC
Five criteria: grammaticality, redundancy, clarity, focus and coherence
Each criterion is scored on a five-point scale, with higher scores indicating a better result
We asked four humans to perform this task
Experimental results – ROUGE evaluation

        FirstDoc  Wiki   centroidSim  sentencePos  querySim  starterSim  SigSim  LMSim  DpMSim
R2      .042      .097   .0734        .066         .0774     .0869       .079    .0895  .093
RSU4    .079      .14    .12          .11          .12       .137        .133    .142   .145

Entity type models help to achieve better results
─ However, this is not the case for the signature words model
The representation method is also relevant
DpMSim captions are significantly better than all other automated captions (except LMSim captions)
The improvement of DpMSim over LMSim is only moderate, as is its improvement over the Wiki baseline captions (in RSU4)
Experimental results – ROUGE evaluation

        starterSim + LMSim  DpMSim  Wiki
R2      .095                .093    .097
RSU4    .145                .145    .14

We performed different feature combinations
The best performing feature combination is starterSim + LMSim, which performs on a par with or slightly better than DpMSim
Experimental results – Manual evaluation

             starterSim + LMSim  Wiki
Clarity      80%                 94.3%
Focus        75%                 92.6%
Coherence    70%                 90.7%
Redundancy   60%                 91.5%
Grammar      84%                 81.6%

Table shows scores for levels 5 and 4
Each score reads as "X% of the summaries were judged at least 4 for criterion Y"
There is a lot of room for improvement
Further research questions
Do entity type models exist?
How can we derive entity type models?
Do entity type models help to select relevant sentences from the documents?
Do entity type models help to reduce redundancy and lead to more coherent summaries?
─ Manual approach
─ Automatic approach
We also manually categorized the dependency patterns and used them for redundancy reduction and sentence ordering (DepCat feature)
Categories: type, location, year, background, surrounding, visiting
(A sketch of category-based ordering follows below.)
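Below is a hedged sketch of how such categories could drive ordering and redundancy reduction: keep the best-scoring sentence per category and emit categories in a fixed order. Both the one-sentence-per-category rule and the specific ordering are illustrative assumptions, not the paper's exact procedure.

```python
# Order summary sentences by dependency-pattern category; keeping only
# the top sentence per category doubles as redundancy reduction.
CATEGORY_ORDER = ["type", "location", "year", "background", "surrounding", "visiting"]

def order_by_category(scored):
    """scored: list of (sentence, category, score) triples."""
    summary = []
    for category in CATEGORY_ORDER:
        candidates = [item for item in scored if item[1] == category]
        if candidates:
            best = max(candidates, key=lambda item: item[2])
            summary.append(best[0])
    return summary
```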
Experiments
Experimental results – ROUGE evaluation

        starterSim + LMSim  starterSim + LMSim + DepCat  Wiki
R2      .095                .102                         .097
RSU4    .145                .155                         .14

We performed different feature combinations; the best was starterSim + LMSim (> DpMSim)
To this best performing combination we added the DepCat feature
Both R2 and RSU4 results are now significantly better than the Wikipedia baseline captions
Experimental results – Manual evaluation

             starterSim + LMSim  starterSim + LMSim + DepCat  Wiki
Clarity      80%                 85%                          94.3%
Focus        75%                 76.4%                        92.6%
Coherence    70%                 74%                          90.7%
Redundancy   60%                 83%                          91.5%
Grammar      84%                 92%                          81.6%

Table shows scores for levels 5 and 4
Each score reads as "X% of the summaries were judged at least 4 for criterion Y"
Adding DepCat (for sentence ordering) helps to improve readability
On all criteria the starterSim + LMSim + DepCat summaries obtain better results than starterSim + LMSim