Entity Type Modeling for Multi-Document Summarization: Generating Descriptive Summaries of Geo-Located Entities
Ahmet Aker
Natural Language Processing Group, Department of Computer Science
Research question
Multi-document summarization has several challenges:
─ Identifying the most relevant sentences in the documents
─ Reducing redundancy within the summary
─ Producing a coherent summary
Can entity type models be used to address these challenges?
What are entity type models?
Sets of patterns that capture the ways an entity is described in natural language, for example (a minimal sketch of such attribute sets follows below):

Tower:        Church:       Volcano:
  Visiting      When built    When last erupted
  Location      Location      Visiting
  When built    Visiting      Location
  Design        Preacher      Surroundings
  Purpose       Events        Height
  Height        History       Status
  …             …             …
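To make the idea concrete, here is a minimal sketch of entity type models as plain attribute sets, using the attribute names from the slide. The dictionary is hand-written purely for illustration; in the work itself such models are derived from text corpora (see the later slides).

```python
# Minimal sketch: entity type models as hand-written attribute sets,
# mirroring the table on this slide. Illustration only; the real models
# are learned from entity-type corpora.
ENTITY_TYPE_MODELS = {
    "tower":   {"visiting", "location", "when built", "design", "purpose", "height"},
    "church":  {"when built", "location", "visiting", "preacher", "events", "history"},
    "volcano": {"when last erupted", "visiting", "location", "surroundings", "height", "status"},
}

def shared_attributes(type_a, type_b):
    """Attributes two entity types share, e.g. towers and churches are
    both described by visiting info, location and construction date."""
    return ENTITY_TYPE_MODELS[type_a] & ENTITY_TYPE_MODELS[type_b]
```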
Further research questions
Do entity type models exist?
How can we derive entity type models?
Do entity type models help to select relevant sentences from the documents?
Do entity type models help to reduce redundancy and lead to more coherent summaries?
─ Manual approach
─ Automatic approach
Do humans associate sets of attributes with entity types?
[Image of a landmark] → Height, Year, Location, Designer, Entrance Fee
Investigation
─ Where is it located? → Location
─ When was it constructed? → Year
─ How tall is it? → Height
Investigation shows
Humans have an "entity type model" of what is salient regarding a certain entity type, and this model informs their choice of which attributes to seek when seeing an instance of this type (Aker & Gaizauskas 2011, Aker et al. 2013).

Tower:        Church:       Volcano:
  Visiting      When built    When last erupted
  Location      Location      Visiting
  When built    Visiting      Location
  Design        Preacher      Surroundings
  Purpose       Events        Height
  Height        History       Status
  …             …             …
Further research questions
Do entity type models exist?
How can we derive entity type models?
Do entity type models help to select relevant sentences from the documents?
Do entity type models help to reduce redundancy and lead to more coherent summaries?
─ Manual approach
─ Automatic approach
Can we derive entity type models from existing text resources such as Wikipedia articles?
Wikipedia articles → Height, Year, Location, Designer, Entrance Fee
Investigation
Do the attributes extracted from Wikipedia articles (Height, Year, Location, Designer, Entrance Fee) match the attributes humans name?
Results show:
─ Attributes humans associate with entity types are also found in Wikipedia articles
─ Entity type models can be derived from existing text resources
Entity type corpus collection (Aker & Gaizauskas 2009)
For each Wikipedia article: extract the entity type, then add the article to the corpus for that type (Entity Type 1, Entity Type 2, Entity Type 3, …) – a sketch of this step follows below.
• 107 different entity types are extracted: village (40k), school (15k), mountain (5k), church (3k), lake (3k), etc.
• Accuracy: 90%
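Below is a hedged sketch of this corpus-collection loop. The "is a/an <type>" heuristic is an illustrative stand-in, not the exact rules of Aker & Gaizauskas (2009), and all names are hypothetical.

```python
import re

# Guess an article's entity type from an "is a/an <type>" pattern in its
# first sentence, then file the article under that type.
def extract_entity_type(first_sentence):
    # Naive: take the word right after "is a/an". A real system would
    # locate the head noun of the copula complement instead, so that
    # e.g. "is an 1889 iron lattice tower" still yields "tower".
    match = re.search(r"\bis an? ([a-z]+)", first_sentence.lower())
    return match.group(1) if match else None

corpus = {}  # entity type -> list of articles

def add_article(article_text):
    first_sentence = article_text.split(". ")[0]
    entity_type = extract_entity_type(first_sentence)
    if entity_type is not None:
        corpus.setdefault(entity_type, []).append(article_text)
```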
How can we represent the entity type models?
─ Signature words
─ N-gram language models
─ Dependency patterns
How can we represent the entity type models?
Examples derived from a church corpus:
─ Signature words, with corpus counts: located (200), constructed (150)
─ N-gram language models, counts normalized to probabilities: "is located" (200 → 0.1), "is constructed" (150 → 0.08)
─ Dependency patterns, with counts: "[Entity] is [entityType]" (400), "was built [date]" (300), "[Entity] has design" (200)
(A sketch of the n-gram representation follows below.)
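As a sketch of the n-gram representation, the snippet below estimates bigram probabilities by relative frequency over an entity-type corpus, mirroring the slide's numbers ("is located" seen 200 times → probability 0.1). The flat normalization is an assumption; a real language model would smooth and condition on the preceding word.

```python
from collections import Counter

# Relative-frequency bigram model over one entity-type corpus.
def bigram_model(sentences):
    """sentences: list of token lists drawn from one entity-type corpus."""
    counts = Counter(
        (w1, w2)
        for tokens in sentences
        for w1, w2 in zip(tokens, tokens[1:])
    )
    total = sum(counts.values())
    # A bigram's probability is its count over the total bigram count.
    return {bigram: count / total for bigram, count in counts.items()}
```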
Further research questions
Do entity type models exist?
How can we derive entity type models?
Do entity type models help to select relevant sentences from the documents?
Do entity type models help to reduce redundancy and lead to more coherent summaries?
─ Manual approach
─ Automatic approach
Summary generation process
Web search for the entity (e.g. "Eiffel Tower, Paris, France") retrieves web documents, e.g.: "The Eiffel Tower (French: Tour Eiffel, [tuʁ ɛfɛl], nickname La dame de fer, the iron lady) is an 1889 iron lattice tower located on the Champ de Mars in Paris that…"
Preprocessing: sentence splitting, tokenizing, POS tagging, lemmatizing, NE tagging
Feature extraction: sentence position, centroid similarity, query similarity, starter similarity, entity type model
Sentence scoring: score and sort the sentences
Sentence selection: select sentences from the sorted list, with redundancy reduction
(A skeleton of the scoring and selection stages appears below.)
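Here is a minimal skeleton of the scoring-and-selection stages, assuming the feature functions, their weights, and a sentence-similarity function are supplied. All names and the length/similarity thresholds are illustrative assumptions, not the system's actual API.

```python
# Score sentences as a weighted feature sum, then greedily select
# non-redundant sentences from the sorted list.
def summarize(sentences, features, weights, sim, max_words=200, threshold=0.5):
    ranked = sorted(
        sentences,
        key=lambda s: sum(w * f(s) for f, w in zip(features, weights)),
        reverse=True,
    )
    summary, words = [], 0
    for sentence in ranked:
        # Redundancy reduction: skip sentences too similar to kept ones.
        if any(sim(sentence, kept) > threshold for kept in summary):
            continue
        summary.append(sentence)
        words += len(sentence.split())
        if words >= max_words:
            break
    return summary
```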
Entity type model features
─ Signature words model feature
─ N-gram language model feature
─ Dependency pattern model feature
(A sketch of the n-gram language model feature follows below.)
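As a sketch of the n-gram language model feature, the function below scores a sentence by its average bigram log-probability under the entity type's model (as built in the earlier sketch); a higher score means the sentence reads like typical text about that entity type. The floor value stands in for proper smoothing and is an assumption.

```python
import math

# Average bigram log-probability of a sentence under an entity-type model.
def lm_feature(tokens, model, floor=1e-6):
    bigrams = list(zip(tokens, tokens[1:]))
    if not bigrams:
        return math.log(floor)
    total = sum(math.log(model.get(bigram, floor)) for bigram in bigrams)
    return total / len(bigrams)
```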
Experiments
Evaluation settings – image set
The image collection contains 310 images from sites worldwide (Aker & Gaizauskas 2010a)
Each image (e.g. "Eiffel Tower, Paris, France") is paired with 10 web documents
Split: 205 images for training, 105 for testing
Evaluation settings – ROUGE evaluation
We use ROUGE (Lin, 2004) to evaluate our image captions automatically
─ This requires model captions: we use the model captions described in Aker & Gaizauskas (2010a), following the 205/105 training/testing split
For comparison, two baselines are generated:
─ From the top retrieved web document (FirstDoc)
─ From the Wikipedia article (Wiki) – an upper bound
(A minimal ROUGE-2 sketch follows below.)
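For readers unfamiliar with the metric, here is a minimal ROUGE-2 recall sketch (Lin, 2004): clipped bigram overlap with a single model caption, divided by the model caption's bigram count. The actual evaluation uses the ROUGE toolkit, multiple model captions, and ROUGE-SU4 as well.

```python
from collections import Counter

# ROUGE-2 recall against one model (reference) caption.
def rouge2_recall(candidate, model):
    def bigrams(text):
        tokens = text.lower().split()
        return Counter(zip(tokens, tokens[1:]))
    cand, ref = bigrams(candidate), bigrams(model)
    # Clip each candidate bigram's count at its count in the reference.
    overlap = sum(min(count, ref[bg]) for bg, count in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```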
Evaluation settings – Manual evaluation
We also evaluated our summaries using a readability assessment as in DUC
Five criteria: grammaticality, redundancy, clarity, focus and coherence
Each criterion is scored on a five-point scale, with higher scores indicating a better result
We asked four humans to perform this task
Experimental results – ROUGE evaluation

        FirstDoc  Wiki   centroidSim  sentencePos  querySim  starterSim  SigSim  LMSim  DpMSim
R2      .042      .097   .0734        .066         .0774     .0869       .079    .0895  .093
RSU4    .079      .14    .12          .11          .12       .137        .133    .142   .145

Entity type models help to achieve better results
─ However, this is not the case for the signature words model
The representation method is also relevant
DpMSim captions are significantly better than all other automated captions (except LMSim captions)
The improvement of DpMSim over LMSim is only moderate, as is its improvement over the Wiki baseline captions (in RSU4)
Experimental results – ROUGE evaluation

        starterSim + LMSim  DpMSim  Wiki
R2      .095                .093    .097
RSU4    .145                .145    .14

We performed different feature combinations
The best performing feature combination is starterSim + LMSim, which performs on a par with or slightly better than DpMSim
Experimental results – Manual evaluation

             starterSim + LMSim  Wiki
Clarity      80%                 94.3%
Focus        75%                 92.6%
Coherence    70%                 90.7%
Redundancy   60%                 91.5%
Grammar      84%                 81.6%

Table shows scores for levels 5 and 4
Each score reads as "X% of the summaries were judged at least 4 for criterion Y"
There is a lot of room for improvement
Further research questions
Do entity type models exist?
How can we derive entity type models?
Do entity type models help to select relevant sentences from the documents?
Do entity type models help to reduce redundancy and lead to more coherent summaries?
─ Manual approach
─ Automatic approach
We also manually categorized the dependency patterns and used them for redundancy reduction and sentence ordering (DepCat feature)
Categories: type, location, year, background, surrounding, visiting
(A sketch of category-based ordering follows below.)
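Below is a hedged sketch of how such categories could drive ordering and redundancy reduction: keep the best-scoring sentence per category and emit categories in a fixed order. Both the one-sentence-per-category rule and the specific ordering are illustrative assumptions, not the paper's exact procedure.

```python
# Order summary sentences by dependency-pattern category; keeping only
# the top sentence per category doubles as redundancy reduction.
CATEGORY_ORDER = ["type", "location", "year", "background", "surrounding", "visiting"]

def order_by_category(scored):
    """scored: list of (sentence, category, score) triples."""
    summary = []
    for category in CATEGORY_ORDER:
        candidates = [item for item in scored if item[1] == category]
        if candidates:
            best = max(candidates, key=lambda item: item[2])
            summary.append(best[0])
    return summary
```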
Experiments
Experimental results – ROUGE evaluation

        starterSim + LMSim  starterSim + LMSim + DepCat  Wiki
R2      .095                .102                         .097
RSU4    .145                .155                         .14

We performed different feature combinations; the best was starterSim + LMSim (> DpMSim)
To this best performing combination we added the DepCat feature
Both R2 and RSU4 results are now significantly better than the Wikipedia baseline captions
Experimental results – Manual evaluation

             starterSim + LMSim  starterSim + LMSim + DepCat  Wiki
Clarity      80%                 85%                          94.3%
Focus        75%                 76.4%                        92.6%
Coherence    70%                 74%                          90.7%
Redundancy   60%                 83%                          91.5%
Grammar      84%                 92%                          81.6%

Table shows scores for levels 5 and 4
Each score reads as "X% of the summaries were judged at least 4 for criterion Y"
Adding DepCat (for sentence ordering) helps to improve readability
On all criteria the starterSim + LMSim + DepCat summaries obtain better results than starterSim + LMSim