

  1. Chinese Text Summarization Using A Trainable Summarizer and Latent Semantic Analysis
  Jen-Yuan Yeh¹, Hao-Ren Ke², and Wei-Pang Yang¹
  ¹ Department of Computer & Information Science, National Chiao-Tung University, Taiwan, R.O.C.
  ² Digital Library & Information Section of Library, National Chiao-Tung University, Taiwan, R.O.C.

  2. Outline
  - Introduction and related works
  - Modified Corpus-based approach
  - LSA-based Text Relationship Map approach
  - Evaluations
  - Conclusions

  3. Outline
  - Introduction and related works
  - Modified Corpus-based approach
  - LSA-based Text Relationship Map approach
  - Evaluations
  - Conclusions

  4. Text summarization
  - The process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks) [Mani99].
  [Figure: Documents → Analysis → Transformation → Synthesis → Summaries, controlled by a compression ratio.]

  5. Corpus-based Approach: A Trainable Document Summarizer [Kupiec95]
  [Figure: training phase — training corpus → labeler → feature vectors → learning algorithm → rules; test phase — test corpus → feature extractor → rule application → machine-generated summary.]
  $$P(s \in S \mid f_1, f_2, \ldots, f_k) = \frac{\prod_{j=1}^{k} P(f_j \mid s \in S) \cdot P(s \in S)}{\prod_{j=1}^{k} P(f_j)}$$
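  A minimal sketch (not the authors' code) of how the Kupiec-style Bayesian rule above can be evaluated for one sentence. The probability tables `p_f_given_in_summary`, `p_f`, and the prior `p_in_summary` are assumed to have been estimated from a labeled training corpus beforehand; their names are illustrative only.

```python
def kupiec_score(feature_values, p_f_given_in_summary, p_f, p_in_summary):
    """P(s in S | f1..fk) under the naive feature-independence assumption.

    feature_values       : observed feature values for sentence s, indexed by j
    p_f_given_in_summary : dict (j, value) -> P(f_j | s in S)
    p_f                  : dict (j, value) -> P(f_j)
    p_in_summary         : prior P(s in S), e.g. the compression ratio
    """
    numerator = p_in_summary
    denominator = 1.0
    for j, value in enumerate(feature_values):
        numerator *= p_f_given_in_summary[(j, value)]
        denominator *= p_f[(j, value)]
    return numerator / denominator
```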

  6. Text Relationship Map (T.R.M.) Approach: Automated Text Structure and Summarization [Salton97]
  - Each node is represented as a keyword vector $P_i = (k_1, k_2, \ldots, k_n)$.
  - $P_i$ and $P_j$ are said to be connected when their vector similarity is greater than the threshold:
  $$Sim(P_i, P_j) = \frac{P_i \cdot P_j}{\left| P_i \right| \left| P_j \right|}$$
  - Three heuristic methods: global bushy path, depth-first path, segmented bushy path.
  [Figure: example map with node degrees — P1:6, P2:2, P3:7, P4:3, P5:7, P6:6, P7:5, P8:9, P9:8, P10:3, P11:2.]
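  A minimal sketch (an assumed implementation, not Salton's code) of building a text relationship map from keyword-count vectors and extracting sentences along a global bushy path; the `threshold` and `n_extract` values are illustrative.

```python
import math

def cosine(p_i, p_j):
    """Vector similarity Sim(P_i, P_j) between two keyword-count dicts."""
    dot = sum(p_i[k] * p_j[k] for k in set(p_i) & set(p_j))
    norm = math.sqrt(sum(v * v for v in p_i.values())) * \
           math.sqrt(sum(v * v for v in p_j.values()))
    return dot / norm if norm else 0.0

def global_bushy_path(node_vectors, threshold=0.1, n_extract=3):
    """Link nodes whose similarity exceeds the threshold, then return the
    n_extract bushiest (highest-degree) nodes in original document order."""
    n = len(node_vectors)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(node_vectors[i], node_vectors[j]) > threshold:
                degree[i] += 1
                degree[j] += 1
    bushiest = sorted(range(n), key=lambda i: degree[i], reverse=True)[:n_extract]
    return sorted(bushiest)
```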

  7. Outline
  - Introduction and related works
  - Modified Corpus-based approach
  - LSA-based Text Relationship Map approach
  - Evaluations
  - Conclusions

  8. Modified Corpus-based Approach
  - We use a "Score Function" to measure the significance of a sentence:
  $$Score_{Overall}(s) = w_1 \cdot Score_{f_1}(s) + w_2 \cdot Score_{f_2}(s) - w_3 \cdot Score_{f_3}(s) + w_4 \cdot Score_{f_4}(s) + w_5 \cdot Score_{f_5}(s)$$
  where $f_1$ represents "Position", $f_2$ represents "Positive Keyword", $f_3$ represents "Negative Keyword", $f_4$ represents "Resemblance to the Title", $f_5$ represents "Centrality", and $w_i$ indicates the importance of each feature.
  - The original approach computes the probability that a sentence will be included in the summary:
  $$P(s \in S \mid f_1, f_2, \ldots, f_k) = \frac{\prod_{j=1}^{k} P(f_j \mid s \in S) \cdot P(s \in S)}{\prod_{j=1}^{k} P(f_j)}$$
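  A minimal sketch of the weighted combination above; the five feature-score functions and the weight vector are assumed to be supplied (the weights are learned later with the genetic algorithm). Note the negative-keyword term is subtracted.

```python
def overall_score(s, weights, score_f1, score_f2, score_f3, score_f4, score_f5):
    """Score_Overall(s) as a weighted sum of the five feature scores."""
    w1, w2, w3, w4, w5 = weights
    return (w1 * score_f1(s) + w2 * score_f2(s) - w3 * score_f3(s)
            + w4 * score_f4(s) + w5 * score_f5(s))
```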

  9. f_1: Position
  - For a sentence s, this feature-score is obtained as
  $$Score_{f_1}(s) = P(s \in S \mid Position_i) \times \frac{\text{Average rank of } Position_i}{5.0}$$
  where s comes from $Position_i$, and a five-level rank from 1 to 5 is used to emphasize the significance of positions.

  10. Word Aggregation for f_2, f_3, f_4, and f_5
  - Use word co-occurrence to reshape the word unit, e.g., 個人 ("personal") + 電腦 ("computer") → 個人電腦 ("personal computer"); the keyword sequence ABCD becomes AED.
  - Assume A, B, C, D, E are keywords and E is composed of B and C in that order; if WC(B, C) > threshold, replace B and C with E, where
  $$WC(B, C) = \frac{freq_E}{freq_B \times freq_C}$$ [Kowalski97]
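  A minimal sketch of this aggregation step. The frequency table `freq` (raw counts of single and concatenated keywords) and the `threshold` are assumed inputs; the merge is applied greedily left to right over a segmented keyword sequence.

```python
def word_cooccurrence(freq_e, freq_b, freq_c):
    """WC(B, C) = freq_E / (freq_B * freq_C)."""
    return freq_e / (freq_b * freq_c)

def aggregate(tokens, freq, threshold):
    """Replace adjacent keywords (B, C) with E = B+C when WC(B, C) > threshold."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            b, c = tokens[i], tokens[i + 1]
            e = b + c
            if e in freq and word_cooccurrence(freq[e], freq[b], freq[c]) > threshold:
                out.append(e)
                i += 2
                continue
        out.append(tokens[i])
        i += 1
    return out

# Illustrative use: aggregate(["個人", "電腦"], {"個人": 40, "電腦": 55, "個人電腦": 30}, 0.01)
```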

  11. f_2: Positive Keyword
  - For a sentence s, assume s contains Keyword_1, Keyword_2, ..., Keyword_n; this feature-score is obtained as
  $$Score_{f_2}(s) = \sum_{k=1,2,\ldots,n} c_k \cdot P(s \in S \mid Keyword_k)$$
  where $c_k$ is the number of occurrences of $Keyword_k$ in s.
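  A minimal sketch of the positive-keyword score; the conditional probabilities P(s ∈ S | Keyword_k) are assumed to have been estimated from the training corpus and passed in as a dict.

```python
def score_f2(sentence_keyword_counts, p_in_summary_given_keyword):
    """Sum over keywords in s of c_k * P(s in S | Keyword_k)."""
    return sum(count * p_in_summary_given_keyword.get(keyword, 0.0)
               for keyword, count in sentence_keyword_counts.items())
```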

  12. f_3: Negative Keyword
  - For a sentence s, assume s contains Keyword_1, Keyword_2, ..., Keyword_n; this feature-score is obtained as
  $$Score_{f_3}(s) = \sum_{k=1,2,\ldots,n} c_k \cdot P(s \notin S \mid Keyword_k)$$
  where $c_k$ is the number of occurrences of $Keyword_k$ in s.
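  A minimal sketch of the negative-keyword score; it has the same form as f_2 but uses P(s ∉ S | Keyword_k), which is why this term is subtracted in the overall score.

```python
def score_f3(sentence_keyword_counts, p_not_in_summary_given_keyword):
    """Sum over keywords in s of c_k * P(s not in S | Keyword_k)."""
    return sum(count * p_not_in_summary_given_keyword.get(keyword, 0.0)
               for keyword, count in sentence_keyword_counts.items())
```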

  13. f_4: Resemblance to the Title
  - For a sentence s, this feature-score is obtained as
  $$Score_{f_4}(s) = \frac{\left| \text{Keywords in } s \cap \text{Keywords in Title} \right|}{\left| \text{Keywords in } s \cup \text{Keywords in Title} \right|}$$
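  A minimal sketch of this score as the Jaccard overlap between the sentence's keyword set and the title's keyword set.

```python
def score_f4(sentence_keywords, title_keywords):
    """Jaccard overlap between the keywords of s and of the title."""
    s, t = set(sentence_keywords), set(title_keywords)
    return len(s & t) / len(s | t) if s | t else 0.0
```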

  14. f_5: Centrality
  - For a sentence s, this feature-score is obtained as
  $$Score_{f_5}(s) = \frac{\left| \text{Keywords in } s \cap \text{Keywords in other sentences} \right|}{\left| \text{Keywords in } s \cup \text{Keywords in other sentences} \right|}$$
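  A minimal sketch of the centrality score, the same Jaccard form as f_4 but measured against the pooled keywords of all other sentences in the document.

```python
def score_f5(sentence_keywords, other_sentences_keywords):
    """Jaccard overlap between the keywords of s and of all other sentences."""
    s = set(sentence_keywords)
    others = set().union(*(set(ks) for ks in other_sentences_keywords))
    return len(s & others) / len(s | others) if s | others else 0.0
```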

  15. Train the Score Function by the Genetic Algorithm
  - Helps to find a suitable combination of feature weights.
  - Represent a genome as (w_1, w_2, w_3, w_4, w_5) and run the genetic algorithm (GA) to determine the value of each w_i.
  - Fitness: the average recall obtained with the genome when applied to the training corpus.
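  A minimal GA sketch for this weight search. The population size, generation count, mutation rate, and selection scheme are illustrative assumptions; `evaluate_recall` is an assumed callback that scores a genome by the average recall of the summaries it produces on the training corpus.

```python
import random

def train_weights(evaluate_recall, pop_size=50, generations=100, mutation_rate=0.1):
    """Evolve a (w1..w5) genome maximizing average recall on the training corpus."""
    population = [[random.random() for _ in range(5)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate_recall, reverse=True)
        parents = ranked[: pop_size // 2]                 # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randint(1, 4)                    # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mutation_rate:           # mutate one gene
                child[random.randrange(5)] = random.random()
            children.append(child)
        population = parents + children
    return max(population, key=evaluate_recall)
```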

  16. Outline
  - Introduction and related works
  - Modified Corpus-based approach
  - LSA-based Text Relationship Map approach
  - Evaluations
  - Conclusions

  17. LSA-based T.R.M. Approach
  - Combine T.R.M. [Salton97] with the semantic representations derived by LSA to promote summarization to the semantic level.
  [Figure: system architecture. A Chinese document goes through Preprocessing (sentence identification, word segmentation & keyword-frequency calculation), Semantic Model Analysis (word-by-sentence matrix construction, singular value decomposition, dimension reduction, semantic matrix reconstruction) yielding semantic sentence/word representations, Text Relationship Map construction (sentence relationship analysis, semantic sentence links), and summarization (global bushy path, sentence selection / semantic-related sentence selection) to produce the Chinese summary.]

  18. Semantic Representations
  - Represent a document D as a word-by-sentence matrix A and apply SVD to A to derive latent semantic structures of D from A.
  - A is an M × N matrix whose rows correspond to keywords W_1, ..., W_M (nouns and verbs) and whose columns correspond to sentences S_1, ..., S_N, with entries
  $$a_{ij} = L_{ij} \cdot G_i, \qquad L_{ij} = \log\left(1 + \frac{c_{ij}}{n_j}\right), \qquad G_i = 1 - E_i, \qquad E_i = -\frac{1}{\log N} \sum_{j=1}^{N} f_{ij} \log f_{ij} \quad \text{[Bellegarda96]}$$
  where $c_{ij}$ is the frequency of $W_i$ in $S_j$, $n_j$ is the number of words in $S_j$, and $E_i$ is the normalized entropy of $W_i$.
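  A minimal numpy sketch of this weighting and decomposition. The input `counts` (the raw c_ij matrix) is assumed, and f_ij is taken as c_ij normalized by the total count of W_i, following Bellegarda's convention, since the slide does not spell f_ij out.

```python
import numpy as np

def build_weighted_matrix(counts):
    """Word-by-sentence matrix with a_ij = L_ij * G_i (log local, entropy global)."""
    counts = np.asarray(counts, dtype=float)              # c_ij, shape (M, N)
    M, N = counts.shape
    n_j = counts.sum(axis=0, keepdims=True)               # words per sentence
    L = np.log1p(counts / np.maximum(n_j, 1.0))           # L_ij = log(1 + c_ij / n_j)
    f = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)  # assumed f_ij
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(f > 0, f * np.log(f), 0.0)
    E = -plogp.sum(axis=1) / np.log(N)                    # normalized entropy E_i
    G = 1.0 - E                                           # global weight G_i
    return L * G[:, None]

def reduced_representation(A, k):
    """Truncated SVD of A: rank-k latent word and sentence representations."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]
```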
