GOFAI SUM: A Symbolic Summarizer
Fabrizio Gotti, Guy Lapalme (Université de Montréal)
Luka Nerima, Éric Wehrli (Université de Genève)
Originality of our approach
• Symbolic approach
- Syntactic parser that produces an XML file
- Tree transformations using only XSLT rules (700 lines): no Java, no C++, no Perl, no Python...
• No outside language information
- no gazetteer, no WordNet
- ROUGE used only for evaluation, not within the system itself
FIPS output for: « The U.S. government said today that the European Union (EU) should accept Turkey as its new member in the future. »
[Figure: FIPS parse tree of the sentence]
Node labels:
- AdvP: Adverbial phrase
- AP: Adjectival phrase
- CP: Complementizer phrase
- DP: Determiner phrase
- NP: Noun phrase
- PAR: Parenthetical phrase
- PONC: Punctuation
- PP: Prepositional phrase
- TP: Tense phrase
- VP: Verb phrase
FIPS parsing coverage
• Full parse of sentences: 71% for docs, 93% for topics
[Figure: processing pipeline] News cluster + Topic ⇒ Cluster preprocessing / Topic preprocessing ⇒ FIPS ⇒ Sentence scoring ⇒ Sentence selection ⇒ Sentence post-processing ⇒ Summary
Minutes per cluster (25 articles)
• Preprocessing (topic and cluster): 0.1
• FIPS: 4.0
• Sentence scoring: 4.0
• Sentence selection and post-processing: 0.1
• Total: 8.2
Sentence scoring
• Word-based tf·idf similarity score (15%)
• Lemma-based tf·idf similarity score (50%)
• Lemma-based tf·idf similarity score with node depth (5%)
• Sentence weight (20%)
• Absolute sentence position (10%)
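The weighted combination above can be sketched in a few lines. This is only an illustration in Python, not the authors' XSLT rules: the feature names, the cosine similarity, and the `log(n_docs / df)` form of idf are assumptions, since the slide gives only the weights. Each feature is assumed to be already normalized to the range 0 to 1.

```python
import math
from collections import Counter

# Weights from the slide; the keys are hypothetical feature names.
WEIGHTS = {
    "word_tfidf": 0.15,
    "lemma_tfidf": 0.50,
    "lemma_tfidf_depth": 0.05,
    "sentence_weight": 0.20,
    "position": 0.10,
}


def tfidf_vector(tokens, doc_freq, n_docs):
    """Map each token to tf * idf, assuming idf = log(n_docs / df)."""
    counts = Counter(tokens)
    return {
        tok: tf * math.log(n_docs / doc_freq[tok])
        for tok, tf in counts.items()
        if doc_freq.get(tok, 0) > 0  # skip tokens with unknown document frequency
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def combined_score(features):
    """Weighted sum of the five features, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
```

Under these assumptions, `word_tfidf` would be `cosine(tfidf_vector(sentence_words, ...), tfidf_vector(topic_words, ...))`, and `lemma_tfidf` the same computation over lemmas taken from the FIPS parse.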
Sentence selection
• Keep sentences with the highest scores
• Sentences are dismissed (regardless of score) if they
- cannot be parsed by FIPS (29%)
- duplicate a sentence from a different document (4%)
- have no verb (5%)
- contain the pronoun « I » (3%)
- end with « : » or « ? » (2%)
- consist only of upper-case words or have fewer than 5 words (4%)
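The dismissal rules above can be sketched as a simple filter. This Python version is illustrative only: the actual system applies XSLT rules to the FIPS parse tree, so here the parse status and the presence of a verb are passed in as booleans, and `dismissal_reason`, `select`, and the `max_sentences` budget are hypothetical names.

```python
import re


def normalize(text):
    """Collapse whitespace and lower-case, for duplicate detection."""
    return " ".join(text.split()).lower()


def dismissal_reason(text, parsed_ok, has_verb, seen_texts):
    """Return the reason a sentence is dismissed, or None if it is kept."""
    words = text.split()
    if not parsed_ok:
        return "unparsed"
    if normalize(text) in seen_texts:
        return "duplicate"
    if not has_verb:
        return "no verb"
    if re.search(r"\bI\b", text):  # crude: also matches "World War I"
        return "first-person pronoun"
    if text.rstrip().endswith((":", "?")):
        return "ends with ':' or '?'"
    alpha_words = [w for w in words if any(c.isalpha() for c in w)]
    all_upper = bool(alpha_words) and all(w.isupper() for w in alpha_words)
    if len(words) < 5 or all_upper:
        return "too short or all upper case"
    return None


def select(candidates, max_sentences):
    """Keep the highest-scoring sentences that pass every filter.

    candidates: dicts with keys "text", "score", "parsed_ok", "has_verb".
    """
    seen, kept = set(), []
    for cand in sorted(candidates, key=lambda c: c["score"], reverse=True):
        reason = dismissal_reason(
            cand["text"], cand["parsed_ok"], cand["has_verb"], seen
        )
        seen.add(normalize(cand["text"]))
        if reason is None:
            kept.append(cand["text"])
            if len(kept) == max_sentences:
                break
    return kept
```

Duplicate detection here is exact match after normalization; the slide does not say how the system decides that two sentences are duplicates.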
Sentence post-processing
• Referential clarity
- some pronouns are removed: « Climate is changing, he said » ⇒ « Climate is changing »
- ambiguous temporal references are fixed
  • Reference to the present day ⇒ date of document
  • Day of the week ⇒ month and year of document
  • No repetition of a date within a summary
• Sentence compression by pruning non-essential subtrees (e.g. parenthetical expressions)
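The two temporal rewriting rules can be illustrated on plain strings. This is a minimal Python sketch under stated assumptions, not the authors' method: the real system rewrites nodes of the parse tree with XSLT, and the function name, the regular expressions, and the output date format are all hypothetical. It handles only lower-case "today" and English weekday names, and it does not cover the rule against repeating a date within a summary.

```python
import re
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June", "July",
    "August", "September", "October", "November", "December",
]

# Lower-case "today" only, to avoid producing a sentence that starts with "on".
TODAY = re.compile(r"\btoday\b")
# A weekday name, together with a preceding "on" if there is one.
WEEKDAY = re.compile(
    r"\b(?:on\s+)?(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b"
)


def fix_temporal_references(sentence, doc_date):
    """Replace "today" and weekday names using the document's date."""
    month = MONTHS[doc_date.month - 1]
    # Reference to the present day => date of document
    sentence = TODAY.sub(f"on {month} {doc_date.day}, {doc_date.year}", sentence)
    # Day of the week => month and year of document
    sentence = WEEKDAY.sub(f"in {month} {doc_date.year}", sentence)
    return sentence
```

For instance, with a document dated `date(2007, 4, 26)`, "The talks resumed on Monday." becomes "The talks resumed in April 2007."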
Results
• Content (11th)
• Linguistic quality (5th)
• Poor non-redundancy (23rd)
• Pyramid: 8th out of 11
DUC 2007 Average Scores
[Figure: chart of average scores per participant (A to J and 1 to 32), with the RALI system marked. Left axis: Content & Linguistic Quality Scores (0.0 to 5.5). Right axis: Rouge & Pyramid Scores (0.00 to 0.40). Series: Avg. Content, Avg. Linguistic Quality, Basic Elements, Avg. Pyramid.]
Possible Improvements
• Parsing: dedicated lexicons
• Anaphora resolution, in particular for pronouns
• Reduce redundancy with internal tf·idf
• Better pruning of subordinate clauses, adjectival and adverbial modifiers
• Combine with WordNet, gazetteers, etc.
Conclusion
• Simple and powerful
• Back to the roots of AI
• Modern tools and reliable syntactic parsers open new possibilities for principled summarization