Category Specific Information for Guided Summarization Jun-Ping Ng - PowerPoint PPT Presentation



  1. Exploiting Category Specific Information for Guided Summarization • Jun-Ping Ng, Praveen Bysani, Ziheng Lin, Min-Yen Kan, Chew-Lim Tan • National University of Singapore • Monday, November 14, 11

  2. Outline • System Overview • Category Specific Features • Evaluation and Discussion

  3. System Overview

  4. Hypothesis • Word frequency distribution across different categories should be different • Some words are more important in certain categories • e.g. ‘health’ is more salient in “Health and Safety Issues”

  5. What are those words? (by category) • Attacks: people, minister, told, government, two • Health: people, food, years, new, health • Endangered: years, state, national, ---, water

  6. A Hint of Sentence Saliency • Two ways to look at the difference in word distribution • Frequency - Words which are used more are more important • Difference in usage - Words which are used differently from the “usual” are more important

  7. Category Specific Information • Category Relevance Score • Category KL-Divergence

  8. Category Relevance Score • Intuition - A word that appears across many documents within a topic and category is more useful • Linearly weight topic and document frequency scores
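
The slide does not give the formula, so the following is only a minimal sketch of one way to "linearly weight topic and document frequency scores". It assumes both frequencies are normalised to [0, 1] and mixed with a single weight; the function name `category_relevance_scores`, the parameter `alpha`, and the toy data are hypothetical and not taken from the presentation.

```python
from collections import defaultdict

def category_relevance_scores(category_docs, alpha=0.5):
    """Score every word seen in ONE category.

    category_docs: {topic_id: [doc_tokens, ...]} for that category.
    alpha: assumed interpolation weight (not given on the slide).
    """
    n_topics = len(category_docs)
    n_docs = sum(len(docs) for docs in category_docs.values())
    topic_freq = defaultdict(int)  # number of topics containing the word
    doc_freq = defaultdict(int)    # number of documents containing the word
    for docs in category_docs.values():
        topic_vocab = set()
        for tokens in docs:
            vocab = set(tokens)
            for w in vocab:
                doc_freq[w] += 1
            topic_vocab |= vocab
        for w in topic_vocab:
            topic_freq[w] += 1
    # Linear mix of the two normalised frequencies.
    return {
        w: alpha * topic_freq[w] / n_topics + (1 - alpha) * doc_freq[w] / n_docs
        for w in doc_freq
    }

# Toy "Health" category with two topics of two documents each.
health = {
    "topic1": [["health", "food", "recall"], ["health", "officials"]],
    "topic2": [["health", "water"], ["water", "supply"]],
}
scores = category_relevance_scores(health)
```

In this toy data `health` occurs in both topics and in three of the four documents, so it outranks `water` and `food`, which matches the intuition stated on the slide.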

  9. Category KL-Divergence • Intuition - The use of a word varies according to the category an article is written in. • KL-Divergence between frequency of word across all categories vs specific category
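
As a rough illustration of the idea, the sketch below scores each word by its pointwise contribution to the KL divergence between the category distribution and the all-category distribution, p_cat(w) · log(p_cat(w) / p_all(w)). The direction of the divergence, the lack of smoothing, and the name `category_kld_scores` are assumptions; the slide does not specify them.

```python
import math
from collections import Counter

def category_kld_scores(category_tokens, all_tokens):
    """Per-word KL contribution of a category against all categories.

    all_tokens must include category_tokens, so every word seen in the
    category has non-zero probability in the overall distribution.
    """
    cat_counts = Counter(category_tokens)
    all_counts = Counter(all_tokens)
    n_cat = len(category_tokens)
    n_all = len(all_tokens)
    scores = {}
    for w, c in cat_counts.items():
        p_cat = c / n_cat
        p_all = all_counts[w] / n_all
        scores[w] = p_cat * math.log(p_cat / p_all)
    return scores

health_tokens = ["health", "health", "food", "people"]
attack_tokens = ["people", "minister", "told", "people"]
kld = category_kld_scores(health_tokens, health_tokens + attack_tokens)
```

Words over-represented in the category (here `health` and `food`) get positive scores, while a word that is relatively more common elsewhere (`people`) gets a negative one.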

  10. Generic Features • Bigram document frequency • Backoff model with unigram and bigram document frequencies • Sentence position • Sentence length
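
To make the first generic feature concrete, here is a small sketch of a bigram document frequency score for a sentence. It assumes the sentence score is the average, over the sentence's bigrams, of the fraction of documents containing each bigram; the backoff to unigrams mentioned on the slide is not shown, and both function names are hypothetical.

```python
from collections import Counter

def bigram_document_frequency(docs):
    """docs: list of token lists. Returns {bigram: fraction of docs containing it}."""
    df = Counter()
    for tokens in docs:
        # Count each bigram at most once per document.
        df.update(set(zip(tokens, tokens[1:])))
    return {bg: n / len(docs) for bg, n in df.items()}

def sentence_bdf_score(sentence_tokens, df):
    """Average document frequency of the sentence's bigrams (0.0 if it has none)."""
    bgs = list(zip(sentence_tokens, sentence_tokens[1:]))
    if not bgs:
        return 0.0
    return sum(df.get(bg, 0.0) for bg in bgs) / len(bgs)

docs = [
    ["bird", "flu", "spreads"],
    ["bird", "flu", "vaccine"],
    ["flu", "vaccine", "shortage"],
]
df = bigram_document_frequency(docs)
bdf = sentence_bdf_score(["bird", "flu", "vaccine"], df)
```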

  11. Update Summarization • Update summaries generated in similar fashion • But we take into account existing snippets from Set A • [Diagram labels: Typical MMR; Penalise sentences similar to those in Set A]
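
The penalty described on this slide can be sketched as a greedy MMR-style selection in which redundancy is measured against both the sentences already picked and the Set A sentences. Jaccard token overlap stands in for whatever similarity measure the system actually uses, and `select_update_sentences`, `k` and `lam` are hypothetical names and parameters.

```python
def jaccard(a, b):
    """Token-set overlap, used here as a stand-in similarity measure."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def select_update_sentences(candidates, set_a_sentences, k=2, lam=0.7):
    """Greedy MMR-style selection for an update summary.

    candidates: list of (tokens, saliency) pairs from Set B.
    set_a_sentences: token lists the reader has already seen.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr(item):
            tokens, saliency = item
            seen = [s for s, _ in selected] + list(set_a_sentences)
            redundancy = max((jaccard(tokens, s) for s in seen), default=0.0)
            return lam * saliency - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return [tokens for tokens, _ in selected]

set_a = [["quake", "kills", "hundreds"]]
candidates = [
    (["quake", "kills", "hundreds", "people"], 0.9),
    (["rescue", "teams", "arrive"], 0.8),
    (["aid", "pledged", "by", "neighbours"], 0.6),
]
picked = select_update_sentences(candidates, set_a)
```

In this toy run the most salient candidate is skipped because it largely repeats a Set A sentence, so the two novel sentences are selected instead.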

  12. Evaluation • Against ROUGE-2 • [Bar chart: ROUGE-2 (y-axis, 0 to 0.14) for NUS1, NUS2, Baseline2 and Baseline1 on Set A and Set B]
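
For readers unfamiliar with the metric, the sketch below computes a simplified ROUGE-2 recall: the share of reference bigrams that also appear in the candidate summary, with clipped counts. It uses a single reference and no stemming or stopword handling, so it is not a substitute for the official ROUGE toolkit used in evaluations; the function names are hypothetical.

```python
from collections import Counter

def bigrams(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))

def rouge_2_recall(candidate_tokens, reference_tokens):
    """Clipped bigram overlap divided by the number of reference bigrams."""
    cand = bigrams(candidate_tokens)
    ref = bigrams(reference_tokens)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total

reference = "the quake killed hundreds of people".split()
candidate = "the quake killed many people".split()
score = rouge_2_recall(candidate, reference)
```

Here two of the five reference bigrams ("the quake", "quake killed") are matched, giving a recall of 0.4.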

  13. What is Important? • [Bar chart: ROUGE-2 (y-axis, -0.003 to 0.003) on Set A and Set B; series: -CRS, -CKLD, -CRS-CKLD]

  14. All Features • [Bar chart: ROUGE-2 (y-axis, -0.05 to 0.013) on Set A and Set B; series: -CRS, -CKLD, -CRS-CKLD, -BDFS, -SL, -SP]

  15. Future Work • Do better studies to determine influence of category specific information • Exploit aspect-level information

  16. Thank You • Word distribution within and outside a category plays a significant role in sentence selection • Category relevance score • Category KL-Divergence score
