Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk
Sujay Kumar Jauhar, Yun-Nung (Vivian) Chen, Florian Metze
{sjauhar, yvchen, fmetze}@cs.cmu.edu
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
The 6th International Joint Conference on Natural Language Processing – Oct. 14-18, 2013
2 Outline Introduction Approach Experiments Conclusion
3 Outline Introduction Approach Experiments Conclusion O Motivation O Extractive Summarization
5 Motivation O Speech Summarization O Spoken documents are more difficult to browse than texts O Summaries are easy to browse, save time, and let users quickly get the key points O Prosodic Features O Speakers may use prosody to implicitly convey the importance of parts of the speech
6 Outline Introduction Approach Experiments Conclusion O Motivation O Extractive Summarization
7 Extractive Summarization (1/2) O Extractive Speech Summarization O Select the indicative utterances in a spoken document O Cascade the utterances to form a summary [Figure: utterances 1 through n of the document, with the selected ones concatenated into the extractive summary]
8 Extractive Summarization (2/2) O Selection of Indicative Utterances O Each utterance U in a spoken document d is given an importance score I(U, d) O Select the indicative utterances based on I(U, d) O The number of utterances selected as summary is decided by a predefined ratio O For an utterance $U = t_1 t_2 \cdots t_i \cdots t_n$, the importance score is $I(U, d) = \sum_{i=1}^{n} s(t_i, d)$, where $s(t_i, d)$ is a statistical measure of term $t_i$ in $d$ (e.g., TF-IDF)
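The slide leaves the term statistic abstract; purely as an illustration (not the paper's code), the sketch below computes s(t_i, d) as TF-IDF over a collection of tokenized documents and sums it into I(U, d). The tokenization, the log base, and the helper names are assumptions.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Per-document TF-IDF score for every term.
    documents: list of documents, each a list of utterances,
               each utterance a list of tokens."""
    doc_terms = [Counter(t for utt in doc for t in utt) for doc in documents]
    n_docs = len(documents)
    df = Counter()
    for terms in doc_terms:
        df.update(terms.keys())          # document frequency: one count per document
    scores = []
    for terms in doc_terms:
        total = sum(terms.values())
        scores.append({t: (c / total) * math.log(n_docs / df[t])
                       for t, c in terms.items()})
    return scores

def importance(utterance, term_scores):
    """I(U, d) = sum over terms t_i in U of the statistical measure s(t_i, d)."""
    return sum(term_scores.get(t, 0.0) for t in utterance)
```

Utterances would then be ranked by this importance score and the top fraction kept, up to the predefined summary ratio mentioned on the slide.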
9 Outline Introduction Approach Experiments Conclusion O Prosodic Feature Extraction O Graph Construction O Two-Layer Mutually Reinforced Random Walk
11 Prosodic Feature Extraction O For each pre-segmented audio file, we extract O number of syllables O number of pauses O duration time: speaking time including pauses O phonation time: speaking time excluding pauses O speaking rate: #syllables / duration time O articulation rate: #syllables / phonation time O fundamental frequency measured in Hz: avg, max, min O energy measured in Pa²/s O intensity measured in dB
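The slides do not say which toolkit produced these measurements; assuming the raw quantities (syllable count, pause intervals, pitch/energy/intensity tracks) are already available per utterance, a minimal sketch of the derived features could look like this (all argument names are hypothetical):

```python
def prosodic_features(n_syllables, pause_durations, total_duration,
                      pitch_values, energy_values, intensity_values):
    """Derive the per-utterance features listed on the slide from raw
    measurements (inputs are assumed to come from some acoustic front-end
    and to be non-empty, with total_duration > 0).

    pause_durations  : pause lengths in seconds
    total_duration   : duration time in seconds, including pauses
    pitch_values     : fundamental-frequency samples in Hz
    energy_values    : energy samples in Pa^2/s
    intensity_values : intensity samples in dB
    """
    phonation_time = total_duration - sum(pause_durations)
    return {
        "n_syllables": n_syllables,
        "n_pauses": len(pause_durations),
        "duration_time": total_duration,
        "phonation_time": phonation_time,
        "speaking_rate": n_syllables / total_duration,
        "articulation_rate": n_syllables / phonation_time if phonation_time else 0.0,
        "f0_avg": sum(pitch_values) / len(pitch_values),
        "f0_max": max(pitch_values),
        "f0_min": min(pitch_values),
        "energy_avg": sum(energy_values) / len(energy_values),
        "intensity_avg": sum(intensity_values) / len(intensity_values),
    }
```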
12 Outline Introduction Approach Experiments Conclusion O Prosodic Feature Extraction O Graph Construction O Two-Layer Mutually Reinforced Random Walk
13 Graph Construction (1/3) O Utterance-Layer O Each node is an utterance in the meeting document [Figure: utterance-layer nodes U1–U7]
14 Graph Construction (2/3) O Utterance-Layer O Each node is an utterance in the meeting document O Prosody-Layer O Each node is a prosodic feature [Figure: prosody-layer nodes P1–P6 above utterance-layer nodes U1–U7]
15 Graph Construction (3/3) O Utterance-Layer O Each node is an utterance in the meeting document O Prosody-Layer O Each node is a prosodic feature O Between-Layer Relation O The weight of an edge is the normalized value of the prosodic feature extracted from the utterance [Figure: between-layer edges connecting prosody nodes P1–P6 to utterance nodes U1–U7]
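The slide says the between-layer edge weight is the normalized prosodic value but does not spell out the normalization; the sketch below assumes a per-feature min-max scaling across the utterances of a document.

```python
import numpy as np

def between_layer_weights(feature_matrix):
    """Between-layer edge weights of the two-layer graph.

    feature_matrix : (n_utterances, n_features) raw prosodic values,
                     one row per utterance node, one column per prosody node.
    Returns W with W[i, j] = normalized value of prosodic feature j
    extracted from utterance i (min-max scaled per feature, an assumption).
    """
    X = np.asarray(feature_matrix, dtype=float)
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    ranges[ranges == 0] = 1.0   # constant features: avoid division by zero
    return (X - mins) / ranges
```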
16 Outline Introduction Approach Experiments Conclusion O Prosodic Feature Extraction O Graph Construction O Two-Layer Mutually Reinforced Random Walk
17 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: utterance scores at the (t+1)-th iteration [Figure: two-layer graph with prosody nodes P1–P6 above utterance nodes U1–U7]
18 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: original importance of utterances O Original importance O Utterance: equal weight
19 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: scores propagated from prosody nodes, weighted by prosodic values O Original importance O Utterance: equal weight
20 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: prosody scores at the (t+1)-th iteration O Original importance O Utterance: equal weight
21 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: original importance of prosodic features O Original importance O Utterance: equal weight O Prosody: equal weight
22 Two-Layer Mutually Reinforced Random Walk (1/2) O Mathematical Formulation O Highlighted term: scores propagated from utterances, weighted by prosodic values O Original importance O Utterance: equal weight O Prosody: equal weight
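The equations themselves appear as images on the original slides and are not recoverable verbatim here; read from the term-by-term annotations above (interpolation weight α, uniform original importance vectors, and scores propagated across layers), the update presumably takes a form like the following, where $F_U$ and $F_P$ are the utterance and prosody score vectors and $L_{UP}$, $L_{PU}$ are the normalized between-layer weight matrices (a reconstruction, not a quotation of the paper):

```latex
\begin{align}
F_U^{(t+1)} &= (1-\alpha)\, F_U^{(0)} + \alpha\, L_{UP}\, F_P^{(t)} \\
F_P^{(t+1)} &= (1-\alpha)\, F_P^{(0)} + \alpha\, L_{PU}\, F_U^{(t+1)}
\end{align}
```

Whether the prosody update uses $F_U^{(t)}$ or the freshly computed $F_U^{(t+1)}$ is not visible on the slides; the mutually reinforced reading suggests the latter.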
23 Two-Layer Mutually Reinforced Random Walk (2/2) O Mathematical Formulation O Utterance node U gets a higher score when more important prosodic features have higher weights corresponding to utterance U
24 Two-Layer Mutually Reinforced Random Walk (2/2) O Mathematical Formulation O Utterance node U gets a higher score when more important prosodic features have higher weights corresponding to utterance U O Prosody node P gets a higher score when more important utterances have higher weights corresponding to prosodic feature P O The model thus learns important utterances and prosodic features in an unsupervised way
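A runnable sketch of this mutual reinforcement, under the same assumptions as the reconstructed equations above (uniform original importance, min-max-normalized between-layer weights, column normalization in each propagation direction, strictly positive weights); none of these details are confirmed by the slides:

```python
import numpy as np

def two_layer_random_walk(W, alpha=0.9, n_iter=100, tol=1e-8):
    """Mutually reinforced iteration between utterance and prosody scores.

    W     : (n_utterances, n_features) between-layer weights, assumed > 0
    alpha : interpolation weight between propagated and original scores
    Returns (utterance_scores, prosody_scores).
    """
    n_u, n_p = W.shape
    # Normalize each propagation direction so that scores remain distributions.
    L_up = W / W.sum(axis=0, keepdims=True)        # prosody -> utterance
    L_pu = (W / W.sum(axis=1, keepdims=True)).T    # utterance -> prosody
    f_u0 = np.full(n_u, 1.0 / n_u)                 # equal original importance (utterances)
    f_p0 = np.full(n_p, 1.0 / n_p)                 # equal original importance (prosody)
    f_u, f_p = f_u0.copy(), f_p0.copy()
    for _ in range(n_iter):
        f_u_new = (1 - alpha) * f_u0 + alpha * L_up.dot(f_p)
        f_p_new = (1 - alpha) * f_p0 + alpha * L_pu.dot(f_u_new)
        converged = np.abs(f_u_new - f_u).sum() < tol
        f_u, f_p = f_u_new, f_p_new
        if converged:
            break
    return f_u, f_p
```

Utterances would then be ranked by the converged utterance scores to select the summary, while the converged prosody scores are what the later Analysis slide ranks to find the most predictive features.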
25 Outline Introduction Approach Experiments Conclusion O Experimental Setup O Evaluation Metrics O Results O Analysis
27 Experimental Setup O CMU Speech Meeting Corpus O 10 meetings from 2006/04 – 2006/06 O #Speakers: 6 (total), 2-4 (each meeting) O WER = 44% O Reference Summaries O Manually labeled by two annotators with three "noteworthiness" levels (1–3) O Utterances with level 3 are extracted as reference summaries O Parameter Setting O α = 0.9 O Extractive summary ratio = 10%, 20%, 30%
28 Outline Introduction Approach Experiments Conclusion O Experimental Setup O Evaluation Metrics O Results O Analysis
29 Evaluation Metrics O ROUGE O ROUGE-1 O F-measure of matched unigrams between the extracted summary and the reference summary O ROUGE-L (Longest Common Subsequence) O F-measure of the matched LCS between the extracted summary and the reference summary O Average Relevance Score O Average noteworthiness score of the extracted utterances
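For reference, a simplified rendering of the ROUGE-1 F-measure (the official ROUGE toolkit adds stemming, stopword options, and multi-reference handling not shown here):

```python
from collections import Counter

def rouge_1_f(candidate_tokens, reference_tokens):
    """ROUGE-1 F-measure: harmonic mean of unigram precision and recall
    between the extracted summary and the reference summary."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum((cand & ref).values())   # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```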
30 Outline Introduction Approach Experiments Conclusion O Experimental Setup O Evaluation Metrics O Results O Analysis
31 Baseline O Longest O the longest utterances based on #tokens O Begin O the utterances that appear at the beginning O Latent Topic Entropy (LTE) O Estimate the "focus" of an utterance O Lower topic entropy indicates a more topically informative utterance O TFIDF O Average TFIDF score of all words in the utterance
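A compact sketch of how these baselines could be scored, reusing the per-document TF-IDF dict from the earlier snippet; topic_dist is a hypothetical term-to-topic-distribution mapping from some topic model (e.g., PLSA) used only for LTE, and the paper's exact LTE formula may differ:

```python
import math

def baseline_scores(utterances, term_scores, topic_dist=None):
    """Scores for the four baselines (higher = more likely to be selected).

    utterances  : list of token lists, in document order
    term_scores : term -> TF-IDF dict for this document (see earlier sketch)
    topic_dist  : hypothetical term -> list of topic probabilities
    """
    n = len(utterances)
    scores = {
        # Longest: the longest utterances based on #tokens
        "Longest": [len(u) for u in utterances],
        # Begin: utterances appearing earlier score higher
        "Begin": [n - i for i in range(n)],
        # TFIDF: average TF-IDF score of the words in the utterance
        "TFIDF": [sum(term_scores.get(t, 0.0) for t in u) / max(len(u), 1)
                  for u in utterances],
    }
    if topic_dist is not None:
        def avg_topic_entropy(u):
            per_term = [sum(-p * math.log(p) for p in topic_dist.get(t, []) if p > 0)
                        for t in u]
            return sum(per_term) / max(len(u), 1)
        # LTE: lower topic entropy is more informative, so negate it for ranking
        scores["LTE"] = [-avg_topic_entropy(u) for u in utterances]
    return scores
```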
32 10% Results [Charts: ROUGE-1, ROUGE-L, and Avg. Relevance for Longest, Begin, LTE, TFIDF, and Proposed] For 10% summaries, Begin performs best and the proposed approach gives comparable results
33 10% & 20% Results [Charts: ROUGE-1, ROUGE-L, and Avg. Relevance for Longest, Begin, LTE, TFIDF, and Proposed] For 20% summaries, the proposed approach outperforms all of the baselines
34 10% & 20% & 30% Results [Charts: ROUGE-1, ROUGE-L, and Avg. Relevance for Longest, Begin, LTE, TFIDF, and Proposed] For 30% summaries, the proposed approach outperforms all of the baselines
35 Outline Introduction Approach Experiments Conclusion O Experimental Setup O Evaluation Metrics O Results O Analysis
36 Analysis O Based on the converged scores of the prosodic features O Predictive features O number of pauses O min pitch O avg pitch O intensity O Least predictive features O duration time O number of syllables O energy
37 Outline Introduction Approach Experiments Conclusion O Two-layer mutually reinforced random walk integrates prosodic knowledge into an unsupervised model for speech summarization O We present the first attempt at performing unsupervised speech summarization without using any lexical information O Compared with the lexically derived baselines, the proposed approach outperforms all of them in all but one scenario