Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk
Sujay Kumar Jauhar, Yun-Nung (Vivian) Chen, Florian Metze
{sjauhar, yvchen, fmetze}@cs.cmu.edu
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
The 6th International Joint Conference on Natural Language Processing – Oct. 14-18, 2013
2 Outline Introduction Approach Experiments Conclusion
3 Outline Introduction Approach Experiments Conclusion O Motivation O Extractive Summarization
5 Motivation O Speech Summarization O Spoken documents are more difficult to browse than texts O Summaries are easy to browse, save time, and let users quickly get the key points O Prosodic Features O Speakers may use prosody to implicitly convey the importance of parts of the speech
6 Outline Introduction Approach Experiments Conclusion O Motivation O Extractive Summarization
7 Extractive Summarization (1/2) O Extractive Speech Summarization O Select the indicative utterances in a spoken document O Cascade the utterances to form a summary [Figure: utterances 1 through n of the document, with the selected ones concatenated into the extractive summary]
8 Extractive Summarization (2/2) O Selection of Indicative Utterances O Each utterance U in a spoken document d is given an importance score I(U, d) O Select the indicative utterances based on I(U, d) O The number of utterances selected as summary is decided by a predefined ratio O For an utterance $U = t_1 t_2 \cdots t_i \cdots t_n$, the importance score is $I(U, d) = \sum_{i=1}^{n} s(t_i, d)$, where $s(t_i, d)$ is a statistical measure of term $t_i$ in $d$ (e.g., TF-IDF)
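The slide leaves the term statistic abstract; purely as an illustration (not the paper's code), the sketch below computes s(t_i, d) as TF-IDF over a collection of tokenized documents and sums it into I(U, d). The tokenization, the log base, and the helper names are assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Per-document TF-IDF score for every term.
    documents: list of documents, each a list of utterances,
               each utterance a list of tokens."""
    doc_terms = [Counter(t for utt in doc for t in utt) for doc in documents]
    n_docs = len(documents)
    df = Counter()
    for terms in doc_terms:
        df.update(terms.keys())          # document frequency: one count per document
    scores = []
    for terms in doc_terms:
        total = sum(terms.values())
        scores.append({t: (c / total) * math.log(n_docs / df[t])
                       for t, c in terms.items()})
    return scores

def importance(utterance, term_scores):
    """I(U, d) = sum over terms t_i in U of the statistical measure s(t_i, d)."""
    return sum(term_scores.get(t, 0.0) for t in utterance)
```

Utterances would then be ranked by this importance score and the top fraction kept, up to the predefined summary ratio mentioned on the slide.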
9 Outline Introduction Approach Experiments Conclusion O Prosodic Feature Extraction O Graph Construction O Two-Layer Mutually Reinforced Random Walk
11 Prosodic Feature Extraction O For each pre-segmented audio file, we extract O number of syllables O number of pauses O duration time: speaking time including pauses O phonation time: speaking time excluding pauses O speaking rate: #syllables / duration time O articulation rate: #syllables / phonation time O fundamental frequency measured in Hz: avg, max, min O energy measured in Pa²/s O intensity measured in dB
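The slides do not say which toolkit produced these measurements; assuming the raw quantities (syllable count, pause intervals, pitch/energy/intensity tracks) are already available per utterance, a minimal sketch of the derived features could look like this (all argument names are hypothetical):

```python
def prosodic_features(n_syllables, pause_durations, total_duration,
                      pitch_values, energy_values, intensity_values):
    """Derive the per-utterance features listed on the slide from raw
    measurements (inputs are assumed to come from some acoustic front-end
    and to be non-empty, with total_duration > 0).

    pause_durations  : pause lengths in seconds
    total_duration   : duration time in seconds, including pauses
    pitch_values     : fundamental-frequency samples in Hz
    energy_values    : energy samples in Pa^2/s
    intensity_values : intensity samples in dB
    """
    phonation_time = total_duration - sum(pause_durations)
    return {
        "n_syllables": n_syllables,
        "n_pauses": len(pause_durations),
        "duration_time": total_duration,
        "phonation_time": phonation_time,
        "speaking_rate": n_syllables / total_duration,
        "articulation_rate": n_syllables / phonation_time if phonation_time else 0.0,
        "f0_avg": sum(pitch_values) / len(pitch_values),
        "f0_max": max(pitch_values),
        "f0_min": min(pitch_values),
        "energy_avg": sum(energy_values) / len(energy_values),
        "intensity_avg": sum(intensity_values) / len(intensity_values),
    }
```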
12 Outline Introduction Approach Experiments Conclusion O Prosodic Feature Extraction O Graph Construction O Two-Layer Mutually Reinforced Random Walk
13 Graph Construction (1/3) O Utterance-Layer O Each node is an utterance in the meeting document [Figure: utterance-layer nodes U1–U7]
14 Graph Construction (2/3) O Utterance-Layer O Each node is an utterance in the meeting document O Prosody-Layer O Each node is a prosodic feature [Figure: prosody-layer nodes P1–P6 above utterance-layer nodes U1–U7]
15 Graph Construction (3/3) O Utterance-Layer O Each node is an utterance in the meeting document O Prosody-Layer O Each node is a prosodic feature O Between-Layer Relation O The weight of an edge is the normalized value of the prosodic feature extracted from the utterance [Figure: between-layer edges connecting prosody nodes P1–P6 to utterance nodes U1–U7]
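The slide says the between-layer edge weight is the normalized prosodic value but does not spell out the normalization; the sketch below assumes a per-feature min-max scaling across the utterances of a document.

```python
import numpy as np

def between_layer_weights(feature_matrix):
    """Between-layer edge weights of the two-layer graph.

    feature_matrix : (n_utterances, n_features) raw prosodic values,
                     one row per utterance node, one column per prosody node.
    Returns W with W[i, j] = normalized value of prosodic feature j
    extracted from utterance i (min-max scaled per feature, an assumption).
    """
    X = np.asarray(feature_matrix, dtype=float)
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    ranges[ranges == 0] = 1.0   # constant features: avoid division by zero
    return (X - mins) / ranges
```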
16 Outline Introduction Approach Experiments Conclusion O Prosodic Feature Extraction O Graph Construction O Two-Layer Mutually Reinforced Random Walk
17 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: utterance scores at the (t+1)-th iteration [Figure: two-layer graph with prosody nodes P1–P6 above utterance nodes U1–U7]
18 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: original importance of utterances O Original importance O Utterance: equal weight
19 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: scores propagated from prosody nodes, weighted by prosodic values O Original importance O Utterance: equal weight
20 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: prosody scores at the (t+1)-th iteration O Original importance O Utterance: equal weight
21 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: original importance of prosodic features O Original importance O Utterance: equal weight O Prosody: equal weight
22 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: scores propagated from utterances, weighted by prosodic values O Original importance O Utterance: equal weight O Prosody: equal weight
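The equations themselves appear as images on the original slides and are not recoverable verbatim here; read from the term-by-term annotations above (interpolation weight α, uniform original importance vectors, and scores propagated across layers), the update presumably takes a form like the following, where $F_U$ and $F_P$ are the utterance and prosody score vectors and $L_{UP}$, $L_{PU}$ are the normalized between-layer weight matrices (a reconstruction, not a quotation of the paper):

```latex
\begin{align}
F_U^{(t+1)} &= (1-\alpha)\, F_U^{(0)} + \alpha\, L_{UP}\, F_P^{(t)} \\
F_P^{(t+1)} &= (1-\alpha)\, F_P^{(0)} + \alpha\, L_{PU}\, F_U^{(t+1)}
\end{align}
```

Whether the prosody update uses $F_U^{(t)}$ or the freshly computed $F_U^{(t+1)}$ is not visible on the slides; the mutually reinforced reading suggests the latter.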
23 Two-Layer Mutually Reinforced Random Walk (2/2) O Mathematical Formulation O Utterance node U gets a higher score when more important prosodic features have higher weights corresponding to utterance U
24 Two-Layer Mutually Reinforced Random Walk (2/2) O Mathematical Formulation O Utterance node U gets a higher score when more important prosodic features have higher weights corresponding to utterance U O Prosody node P gets a higher score when more important utterances have higher weights corresponding to prosodic feature P O The model thus learns important utterances and prosodic features in an unsupervised way
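A runnable sketch of this mutual reinforcement, under the same assumptions as the reconstructed equations above (uniform original importance, min-max-normalized between-layer weights, column normalization in each propagation direction, strictly positive weights); none of these details are confirmed by the slides:

```python
import numpy as np

def two_layer_random_walk(W, alpha=0.9, n_iter=100, tol=1e-8):
    """Mutually reinforced iteration between utterance and prosody scores.

    W     : (n_utterances, n_features) between-layer weights, assumed > 0
    alpha : interpolation weight between propagated and original scores
    Returns (utterance_scores, prosody_scores).
    """
    n_u, n_p = W.shape
    # Normalize each propagation direction so that scores remain distributions.
    L_up = W / W.sum(axis=0, keepdims=True)        # prosody -> utterance
    L_pu = (W / W.sum(axis=1, keepdims=True)).T    # utterance -> prosody
    f_u0 = np.full(n_u, 1.0 / n_u)                 # equal original importance (utterances)
    f_p0 = np.full(n_p, 1.0 / n_p)                 # equal original importance (prosody)
    f_u, f_p = f_u0.copy(), f_p0.copy()
    for _ in range(n_iter):
        f_u_new = (1 - alpha) * f_u0 + alpha * L_up.dot(f_p)
        f_p_new = (1 - alpha) * f_p0 + alpha * L_pu.dot(f_u_new)
        converged = np.abs(f_u_new - f_u).sum() < tol
        f_u, f_p = f_u_new, f_p_new
        if converged:
            break
    return f_u, f_p
```

Utterances would then be ranked by the converged utterance scores to select the summary, while the converged prosody scores are what the later Analysis slide ranks to find the most predictive features.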
25 Outline Introduction Approach Experiments Conclusion O Experimental Setup O Evaluation Metrics O Results O Analysis
27 Experimental Setup O CMU Speech Meeting Corpus O 10 meetings from 2006/04 – 2006/06 O #Speakers: 6 (total), 2-4 (each meeting) O WER = 44% O Reference Summaries O Manually labeled by two annotators with three "noteworthiness" levels (1–3) O Utterances with level 3 are extracted as reference summaries O Parameter Setting O α = 0.9 O Extractive summary ratio = 10%, 20%, 30%
28 Outline Introduction Approach Experiments Conclusion O Experimental Setup O Evaluation Metrics O Results O Analysis
29 Evaluation Metrics O ROUGE O ROUGE-1 O F-measure of matched unigrams between the extracted summary and the reference summary O ROUGE-L (Longest Common Subsequence) O F-measure of the matched LCS between the extracted summary and the reference summary O Average Relevance Score O Average noteworthiness score of the extracted utterances
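For reference, a simplified rendering of the ROUGE-1 F-measure (the official ROUGE toolkit adds stemming, stopword options, and multi-reference handling not shown here):

```python
from collections import Counter

def rouge_1_f(candidate_tokens, reference_tokens):
    """ROUGE-1 F-measure: harmonic mean of unigram precision and recall
    between the extracted summary and the reference summary."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())   # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```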
30 Outline Introduction Approach Experiments Conclusion O Experimental Setup O Evaluation Metrics O Results O Analysis
31 Baseline O Longest O the longest utterances based on #tokens O Begin O the utterances that appear at the beginning O Latent Topic Entropy (LTE) O Estimate the "focus" of an utterance O Lower topic entropy indicates a more topically informative utterance O TFIDF O Average TFIDF score of all words in the utterance
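A compact sketch of how these baselines could be scored, reusing the per-document TF-IDF dict from the earlier snippet; topic_dist is a hypothetical term-to-topic-distribution mapping from some topic model (e.g., PLSA) used only for LTE, and the paper's exact LTE formula may differ:

```python
import math

def baseline_scores(utterances, term_scores, topic_dist=None):
    """Scores for the four baselines (higher = more likely to be selected).

    utterances  : list of token lists, in document order
    term_scores : term -> TF-IDF dict for this document (see earlier sketch)
    topic_dist  : hypothetical term -> list of topic probabilities
    """
    n = len(utterances)
    scores = {
        # Longest: the longest utterances based on #tokens
        "Longest": [len(u) for u in utterances],
        # Begin: utterances appearing earlier score higher
        "Begin": [n - i for i in range(n)],
        # TFIDF: average TF-IDF score of the words in the utterance
        "TFIDF": [sum(term_scores.get(t, 0.0) for t in u) / max(len(u), 1)
                  for u in utterances],
    }
    if topic_dist is not None:
        def avg_topic_entropy(u):
            per_term = [sum(-p * math.log(p) for p in topic_dist.get(t, []) if p > 0)
                        for t in u]
            return sum(per_term) / max(len(u), 1)
        # LTE: lower topic entropy is more informative, so negate it for ranking
        scores["LTE"] = [-avg_topic_entropy(u) for u in utterances]
    return scores
```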
32 10% Results [Charts: ROUGE-1, ROUGE-L, and Avg. Relevance for Longest, Begin, LTE, TFIDF, and Proposed] For 10% summaries, Begin performs best and the proposed approach gives comparable results
33 10% & 20% Results [Charts: ROUGE-1, ROUGE-L, and Avg. Relevance for Longest, Begin, LTE, TFIDF, and Proposed] For 20% summaries, the proposed approach outperforms all of the baselines
34 10% & 20% & 30% Results [Charts: ROUGE-1, ROUGE-L, and Avg. Relevance for Longest, Begin, LTE, TFIDF, and Proposed] For 30% summaries, the proposed approach outperforms all of the baselines
35 Outline Introduction Approach Experiments Conclusion O Experimental Setup O Evaluation Metrics O Results O Analysis
36 Analysis O Based on the converged scores of the prosodic features O Predictive features O number of pauses O min pitch O avg pitch O intensity O Least predictive features O duration time O number of syllables O energy
37 Outline Introduction Approach Experiments Conclusion O Two-layer mutually reinforced random walk integrates prosodic knowledge into an unsupervised model for speech summarization O We present the first attempt at performing unsupervised speech summarization without using any lexical information O Compared with the lexically derived baselines, the proposed approach outperforms all of them in all but one scenario