Text Summarization Using A Trainable Summarizer and Latent Semantic Analysis
Jen-Yuan Yeh¹, Hao-Ren Ke², and Wei-Pang Yang¹
¹ Department of Computer & Information Science, National Chiao-Tung University, Taiwan, R.O.C.
² Digital Library & Information Section of Library, National Chiao-Tung University, Taiwan, R.O.C.
Outline
• Introduction and related work
• Modified Corpus-based Approach (MCBA)
• LSA-based Text Relationship Map approach (LSA + T.R.M.)
• Evaluation
• Conclusion
Text Summarization
• The process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks) [Mani & Bloedorn, 1999].
[Figure: the summarization process — Documents → Analysis → Transformation → Synthesis → Summaries, controlled by a compression ratio]
Corpus-based Approach: A Trainable Document Summarizer [Kupiec et al., 1995]
[Figure: training phase — a Labeler and a Feature Extractor turn the training corpus into labeled feature vectors, from which a Learning Algorithm derives rules; test phase — Rule Application to the test corpus yields a machine-generated summary]
• The probability that a sentence s belongs to the summary S, given its feature values f₁, f₂, ..., f_k:
$$P(s \in S \mid f_1, f_2, \ldots, f_k) = \frac{P(s \in S)\,\prod_{j=1}^{k} P(f_j \mid s \in S)}{\prod_{j=1}^{k} P(f_j)}$$
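To make the classification rule concrete, here is a minimal sketch of the Bayes formula above. The prior and the two probability tables are hypothetical stand-ins for statistics that Kupiec et al. estimate from a labeled training corpus; the feature names are illustrative.

```python
import math

def p_in_summary(features, prior, p_feat_given_summary, p_feat):
    """Score from Kupiec et al.'s formula:
    P(s in S | f1..fk) = P(s in S) * prod_j P(fj | s in S) / prod_j P(fj).
    Computed in log space for numerical stability and used as a
    ranking score over candidate sentences."""
    log_p = math.log(prior)
    for name, value in features.items():
        log_p += math.log(p_feat_given_summary[name][value])
        log_p -= math.log(p_feat[name][value])
    return math.exp(log_p)

# Hypothetical estimates for two binary features.
prior = 0.2  # fraction of training sentences labeled as summary-worthy
p_feat_given_summary = {"in_first_para": {True: 0.6, False: 0.4},
                        "has_cue_phrase": {True: 0.5, False: 0.5}}
p_feat = {"in_first_para": {True: 0.3, False: 0.7},
          "has_cue_phrase": {True: 0.3, False: 0.7}}

print(p_in_summary({"in_first_para": True, "has_cue_phrase": True},
                   prior, p_feat_given_summary, p_feat))
```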
Text Relationship Map (T.R.M.) Approach: Automatic Text Structuring and Summarization [Salton et al., 1997]
[Figure: a text relationship map — nodes P₁ ... P₁₁ annotated with their bushiness (number of links), e.g. P₁:6, P₈:9]
• Each node is represented as a term vector P_i = (k₁, k₂, ..., k_n).
• P_i and P_j are judged to be connected when their similarity is greater than the threshold:
$$Sim(P_i, P_j) = \frac{P_i \cdot P_j}{\lVert P_i \rVert\,\lVert P_j \rVert}$$
• Three heuristic methods: global bushy path, depth-first path, segmented bushy path.
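A minimal sketch of how such a map could be built and the global bushy path extracted. The raw bag-of-words vectors, the threshold value, and the greedy top-k selection are illustrative assumptions, not details prescribed by Salton et al.

```python
import math
from collections import Counter

def cosine(p, q):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(p[w] * q[w] for w in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * \
           math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def global_bushy_path(paragraphs, threshold=0.1, k=3):
    """Link nodes whose similarity exceeds `threshold`, then take the
    k bushiest nodes (most links) and return them in text order."""
    vecs = [Counter(p.lower().split()) for p in paragraphs]
    n = len(vecs)
    bushiness = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vecs[i], vecs[j]) > threshold:
                bushiness[i] += 1
                bushiness[j] += 1
    top = sorted(range(n), key=lambda i: bushiness[i], reverse=True)[:k]
    return [paragraphs[i] for i in sorted(top)]  # keep original order
```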
Modified Corpus-based Approach
• We use a score function to measure the significance of a sentence:
$$Score_{Overall}(s) = w_1 \cdot Score_{f_1}(s) + w_2 \cdot Score_{f_2}(s) - w_3 \cdot Score_{f_3}(s) + w_4 \cdot Score_{f_4}(s) + w_5 \cdot Score_{f_5}(s)$$
where f₁ represents "Position", f₂ "Positive Keyword", f₃ "Negative Keyword", f₄ "Centrality", f₅ "Resemblance to the Title", and w_i indicates the importance of each feature.
• By contrast, Kupiec et al. (1995) compute the probability that a sentence will be included in the summary:
$$P(s \in S \mid f_1, f_2, \ldots, f_k) = \frac{P(s \in S)\,\prod_{j=1}^{k} P(f_j \mid s \in S)}{\prod_{j=1}^{k} P(f_j)}$$
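A minimal sketch of the weighted score function, assuming the five per-feature scorers (defined on the following slides) are available as callables. Note that the negative-keyword score is subtracted.

```python
def overall_score(s, weights, scorers):
    """Score_Overall(s) = w1*f1 + w2*f2 - w3*f3 + w4*f4 + w5*f5.
    `scorers` holds the five feature functions in order (position,
    positive keyword, negative keyword, centrality, resemblance to
    the title); `weights` are the learned w_i."""
    w1, w2, w3, w4, w5 = weights
    f1, f2, f3, f4, f5 = (f(s) for f in scorers)
    return w1 * f1 + w2 * f2 - w3 * f3 + w4 * f4 + w5 * f5
```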
f₁: Position
• For a sentence s that comes from position P_iS_j (e.g., P₁S₁ indicates the first sentence of the first paragraph), the position score is defined as
$$Score_{f_1}(s) = P(s \in S \mid P_iS_j) \times \frac{\text{Average rank}(P_iS_j)}{R}$$
where R is a rank which implies the significance of each sentence position.
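One way this feature could be realized in code; the lookup tables mapping each position P_iS_j to its estimated in-summary probability and average rank are hypothetical corpus statistics, as is treating R as the number of ranks.

```python
def position_score(pos, p_in_summary, rank_table, R):
    """Score_f1(s) = P(s in S | PiSj) * average_rank(PiSj) / R.
    `pos` is a (paragraph, sentence) pair such as (1, 1) for P1S1;
    both tables are assumed to be estimated from the training corpus."""
    return p_in_summary[pos] * rank_table[pos] / R
```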
f₂: Positive Keyword
• For a sentence s, assume s contains Keyword₁, Keyword₂, ..., Keyword_n; the positive-keyword score is defined as
$$Score_{f_2}(s) = \frac{1}{length(s)} \sum_{i=1}^{n} tf_i \cdot P(s \in S \mid Keyword_i)$$
where tf_i is the occurrence frequency of Keyword_i in s.
f₃: Negative Keyword
• For a sentence s, assume s contains Keyword₁, Keyword₂, ..., Keyword_n; the negative-keyword score is defined as
$$Score_{f_3}(s) = \frac{1}{length(s)} \sum_{i=1}^{n} tf_i \cdot P(s \notin S \mid Keyword_i)$$
where tf_i is the occurrence frequency of Keyword_i in s.
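The two keyword features differ only in the conditional probability used. A minimal sketch of their shared form, assuming the per-keyword probabilities have been estimated from the training corpus:

```python
from collections import Counter

def keyword_score(sentence_words, p_keyword, length):
    """Shared form of Score_f2 / Score_f3:
    (1/length(s)) * sum_i tf_i * P(. | Keyword_i).
    `p_keyword` maps each keyword to P(s in S | kw) for the positive
    score, or to P(s not in S | kw) for the negative score."""
    tf = Counter(w for w in sentence_words if w in p_keyword)
    return sum(f * p_keyword[kw] for kw, f in tf.items()) / length
```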
f₄: Centrality
• For a sentence s, the score is defined as
$$Score_{f_4}(s) = \frac{|\text{Keywords in } s \,\cap\, \text{Keywords in other sentences}|}{|\text{Keywords in } s \,\cup\, \text{Keywords in other sentences}|}$$
f₅: Resemblance to the Title
• For a sentence s, the score is defined as
$$Score_{f_5}(s) = \frac{|\text{Keywords in } s \,\cap\, \text{Keywords in Title}|}{|\text{Keywords in } s \,\cup\, \text{Keywords in Title}|}$$
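Both f₄ and f₅ are keyword-overlap ratios of the same Jaccard form, so a single helper covers them; a minimal sketch:

```python
def jaccard(keywords_a, keywords_b):
    """|A ∩ B| / |A ∪ B| over keyword sets; used for both Centrality
    (s vs. the keywords of all other sentences) and Resemblance to
    the Title (s vs. the keywords of the title)."""
    a, b = set(keywords_a), set(keywords_b)
    return len(a & b) / len(a | b) if a | b else 0.0

# centrality  = jaccard(kw_of_s, kw_of_all_other_sentences)
# title_score = jaccard(kw_of_s, kw_of_title)
```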
Word Aggregation for f₂, f₃, f₄, and f₅
• Use word co-occurrence to reshape the word unit.
• Assume A, B, C, D, E are keywords and E is composed of B and C in order. If MI(B, C) > threshold, then replace B and C with E: the sequence ABCD becomes AED. (E.g., 個人 "person" and 電腦 "computer" merge into 個人電腦 "personal computer".)
$$MI(x, y) = \log \frac{P(x, y)}{P(x) \times P(y)} \qquad \text{[Maosong et al., 1998]}$$
where P(x) is the probability that x occurs in the corpus, and P(x, y) is the probability that x and y occur adjacently in the corpus.
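A minimal sketch of the merge step, assuming unigram and adjacent-bigram probabilities have already been estimated from the corpus; the greedy left-to-right scan is an illustrative choice.

```python
import math

def mutual_information(x, y, p_uni, p_bi):
    """MI(x, y) = log( P(x, y) / (P(x) * P(y)) )."""
    return math.log(p_bi[(x, y)] / (p_uni[x] * p_uni[y]))

def aggregate(tokens, p_uni, p_bi, threshold):
    """Greedily merge adjacent keyword pairs whose MI exceeds the
    threshold, e.g. [A, B, C, D] -> [A, BC, D]."""
    out, i = [], 0
    while i < len(tokens):
        pair = (tokens[i], tokens[i + 1]) if i + 1 < len(tokens) else None
        if pair and pair in p_bi and \
           mutual_information(*pair, p_uni, p_bi) > threshold:
            out.append(tokens[i] + tokens[i + 1])  # compound word E
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out
```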
Train the Score Function by the Genetic Algorithm
• Helps to find a suitable combination of feature weights.
• Regard (w₁, w₂, w₃, w₄, w₅) as a genome and perform the genetic algorithm (GA) to determine the value of each w_i.
• Fitness: the average F-measure obtained with the genome when applied to the training corpus.
• 100 generations, each with 1000 genomes.
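A minimal sketch of such a GA loop, assuming a `fitness` function that runs the summarizer with a candidate weight vector on the training corpus and returns the average F-measure. The elitism, one-point crossover, and mutation rate here are illustrative choices, not details from the paper.

```python
import random

def evolve(fitness, pop_size=1000, generations=100, n_weights=5):
    """Search for (w1..w5) maximizing average F-measure."""
    pop = [[random.random() for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        elite = scored[:pop_size // 10]           # keep the top 10%
        pop = list(elite)
        while len(pop) < pop_size:
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, n_weights)  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.1:             # occasional mutation
                child[random.randrange(n_weights)] = random.random()
            pop.append(child)
    return max(pop, key=fitness)
```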
Summary of Modified Corpus-based Approach
• Use a weighted score function to measure the importance of a sentence.
• Employ ranked positions to emphasize the significance of sentence positions.
• Train the score function by the genetic algorithm to find a suitable combination of feature weights.
LSA-based T.R.M. Approach
• Combine T.R.M. [Salton et al., 1997] with semantic representations derived by LSA to promote summarization to the semantic level.
[Figure: system architecture — Preprocessing (sentence identification; word segmentation & keyword frequency calculation) → Semantic Model Analysis (word-by-sentence matrix construction; singular value decomposition; dimension reduction; semantic matrix reconstruction) → Text Relationship Map Construction (semantic sentence/word representations; sentence relationship analysis; semantic sentence link calculation) → Sentence Selection (global bushy path construction; sentence selection) → Summary]
Semantic Representations
• Represent a document D as an M × N word-by-sentence matrix A (rows: keywords W₁ ... W_M, restricted to nouns and verbs; columns: sentences S₁ ... S_N) and apply SVD to A to derive the latent semantic structure of D.
$$a_{ij} = L_{ij} \cdot G_i, \qquad L_{ij} = \log\!\left(1 + \frac{c_{ij}}{n_j}\right), \qquad G_i = 1 - E_i$$
$$E_i = -\frac{1}{\log N} \sum_{j=1}^{N} f_{ij} \log f_{ij} \qquad \text{[Bellegarda et al., 1996]}$$
where c_ij is the frequency of W_i in S_j, n_j is the number of words in S_j, and E_i is the normalized entropy of W_i (with f_ij = c_ij / t_i, t_i being the total count of W_i in D, following Bellegarda et al.).
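A minimal sketch of the matrix construction and rank-k reduction with NumPy, under the definitions above; keyword filtering to nouns and verbs is omitted for brevity, and `vocab` is assumed to map each keyword to its row index.

```python
import numpy as np

def word_by_sentence_matrix(sentences, vocab):
    """Build A with a_ij = L_ij * G_i; `sentences` are token lists."""
    M, N = len(vocab), len(sentences)
    counts = np.zeros((M, N))
    for j, sent in enumerate(sentences):
        for w in sent:
            if w in vocab:
                counts[vocab[w], j] += 1
    n = np.array([len(s) for s in sentences], dtype=float)  # words per sentence
    L = np.log1p(counts / n)                                # local weight L_ij
    t = counts.sum(axis=1, keepdims=True)                   # total count t_i of W_i
    f = np.divide(counts, t, out=np.zeros_like(counts), where=t > 0)
    with np.errstate(divide="ignore", invalid="ignore"):
        ent = np.where(f > 0, f * np.log(f), 0.0).sum(axis=1)
    E = -ent / np.log(N)                                    # normalized entropy E_i
    G = 1 - E                                               # global weight G_i
    return L * G[:, None]

def reduced_sentence_vectors(A, k):
    """SVD A = U S V^T, keep the top-k singular values; the columns
    of S_k V_k^T represent the sentences in the latent space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return s[:k, None] * Vt[:k]  # k x N matrix of sentence vectors
```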