  1. Automatic Prosody Labeling Final Presentation Andrew Rosenberg ELEN 6820 - Speech and Audio Processing and Recognition 4/27/05

  2. Overview • Project Goal • ToBI standard for prosodic labeling • Previous Work • Method • Results • Conclusion

  3. Project Goal: • Automatic assignment of tones tier elements – Given the waveform, orthographic and break index tiers, predict a subset/simplification of elements in the tones tier. – Distinct experiments for determining each of pitch accents, phrase tones, and phrase boundary tones

  4. ToBI Annotation • Tones and Break Index (ToBI) labeling scheme consists of a speech waveform and 4 tiers: – Tones • Annotation of pitch accents and phrasal tones – Orthographic • Transcription of text – Break Index • Pauses between words, rated on a scale from 0-4. – Miscellaneous • Notes about the annotation (e.g., ambiguities, non-speech noise)
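The four tiers above can be sketched as a simple data structure. This is a hypothetical container for illustration only; the class and field names are not part of any ToBI toolkit.

```python
from dataclasses import dataclass, field

# Hypothetical representation of one ToBI-annotated utterance.
# Field names and tuple layouts are invented for this sketch.
@dataclass
class ToBIAnnotation:
    # tones tier: (time, label), e.g. a pitch accent "H*" or phrase tone "L-"
    tones: list = field(default_factory=list)
    # orthographic tier: (start_time, end_time, word)
    orthographic: list = field(default_factory=list)
    # break index tier: (time, index 0-4) marking juncture strength between words
    break_indices: list = field(default_factory=list)
    # miscellaneous tier: (time, note), e.g. ambiguities or non-speech noise
    misc: list = field(default_factory=list)

ann = ToBIAnnotation()
ann.orthographic.append((0.0, 0.41, "Marianna"))
ann.tones.append((0.25, "H*"))
ann.break_indices.append((0.41, 1))
```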

  5. ToBI Transcription Example

  6. ToBI Examples • Pitch Accents (made3.wav): – H*, L*, L+H* • Boundary Tones (money.wav): – L-H%, H-H%, L-L%, H-L%, (H-, L-)

  7. Previous Work • Ross: “Prediction of abstract prosodic labels for speech synthesis” 1996 – BU Radio News Corpus (~48 minutes) • Public news broadcasts spoken by 7 speakers – Uses decision tree output as input to an HMM for pitch accent identification; decision trees for phrase/boundary tone identification – Employs no acoustic features. • Narayanan: “An Automatic Prosody Recognizer using a Coupled Multi-Stream Acoustic Model and a Syntactic-Prosodic Language Model” 2005 – BU Radio News Corpus – Detects stressed syllables (collapsed ToBI labels) and all boundaries. – Uses CHMM on pitch, intensity and duration to track these “asynchronous” acoustic features, and a trigram POS/stress-boundary language model • Wightman: “Automatic Labeling of Prosodic Patterns” 1994 – Single-speaker subset of BNC and ambiguous sentence corpus (read speech). – Like Ross, uses decision tree output as input to an HMM – Uses many acoustic features

  8. Method • JRip – Classification rule learner – Better at working with nominal attributes – Easier-to-read output • Corpus – Boston Directions Corpus • 4 speakers • ~65 minutes of semi-spontaneous speech • Original Plan: – HMMs and SVMs • SVMs took a prohibitive amount of time to learn and performed worse. • HMM implementation problems, and not enough time to implement my own
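JRip (Weka's implementation of the RIPPER algorithm) learns an ordered list of if-then rules over the word-level features, ending in a default rule. The sketch below imitates the shape of such a rule list in plain Python; the feature names and thresholds are invented for illustration, not taken from the trained model.

```python
# Hand-written rules in the style of JRip output.
# Feature names and thresholds are hypothetical.
def predict_pitch_accent(word):
    # rule 1: high-pitched multi-syllable words get a high accent
    if word["zscore_max_f0"] >= 1.2 and word["num_syllables"] >= 2:
        return "H*"
    # rule 2: strongly falling pitch suggests a low accent
    if word["f0_slope"] <= -0.5:
        return "L*"
    # default rule: no accent
    return "none"

label = predict_pitch_accent(
    {"zscore_max_f0": 1.5, "num_syllables": 3, "f0_slope": 0.1}
)
```

Rule lists like this are easy to inspect, which is the readability advantage over SVM or HMM output noted above.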

  9. Method - Features • Min, max, mean, std.dev. F0 and Intensity • # Syllables, Duration, approx. vowel length, POS • F0 slope (weighted) • zscore of max F0 and intensity • Phrase-length F0, intensity and vowel length features • Phrase position
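The F0 statistics above can be sketched as follows. The function name and input format are hypothetical; a real system would read voiced-frame F0 values from a pitch tracker (e.g. Praat), and the plain least-squares slope here stands in for the weighted F0 slope on the slide.

```python
import statistics

def f0_features(f0):
    """Aggregate one word's voiced-frame F0 samples (Hz) into
    min, max, mean, stdev, and a simple least-squares slope."""
    n = len(f0)
    mean = statistics.fmean(f0)
    sd = statistics.pstdev(f0)
    # least-squares slope of F0 against frame index
    xbar = (n - 1) / 2
    slope = (sum((i - xbar) * (v - mean) for i, v in enumerate(f0))
             / sum((i - xbar) ** 2 for i in range(n)))
    return {"min": min(f0), "max": max(f0),
            "mean": mean, "stdev": sd, "slope": slope}

def zscore(x, phrase_values):
    """Normalize a word-level value (e.g. max F0) against the phrase."""
    return (x - statistics.fmean(phrase_values)) / statistics.pstdev(phrase_values)

feats = f0_features([180.0, 190.0, 210.0, 205.0, 195.0])
```

The same aggregation pattern applies to the intensity features, and `zscore` illustrates the phrase-level normalization used for max F0 and intensity.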

  10. Results - Tasks • Pitch Accent – Identification – Detection • Phrase Tone identification • Boundary Tone identification • Phrase/Boundary Tone – Identification – Detection

  11. Results - Pitch Accent Identification • Accuracy:

              Best    No Breaks   Base    Ross*
        Acc.  79.2%   78.0%       58.8%   80.2%

  • Relevant Features – # syllables, duration (previous 2), vowel length (prev, next 2), POS, max & stdev F0, slope F0, max & stdev intensity, zscore of F0, phrase-level zscore of F0 and intensity
  *Ross identifies a different subset of ToBI pitch accents

  12. Results - Pitch Accent Detection • Accuracy:

              Narayanan    Best       No Breaks   Ross         Wightman
        Acc.  85.7%        83.9%      82.5%       -            -
        T/F   83.2/12.4%   80.1/14%   -           79.5/13.2%   83/14%

  Baseline: 58.9%. On BNC, human agreement of 91%; in general, 86-88%.
  Identical relevant features as the identification task

  13. Results - Phrase Tone • Accuracy:

              Best    Base    No Break   Base
        Acc.  72.4%   57.9%   86.7%      77.4%

  • Relevant Features – Duration of next word; max, min, mean F0; linear slope F0; zscore of intensity; phrase zscores of F0 and intensity

  14. Results - Boundary Tone Identification • Accuracy:

              Best    Base    No Break   Base
        Acc.  73.2%   65.1%   91.3%      84.5%

  • Relevant Features – Quadratically weighted F0 slope

  15. Results - Phrase/Boundary Tone Identification • Accuracy:

              Best    Base    Ross    Base
        Acc.  54.7%   33.8%   66.9%   56.3%

  • Relevant Features – Duration of next two words; POS (current and 2 next); max, mean and slope (all weightings) of F0; mean intensity; phrase zscores of F0 and intensity; zscore of the difference between the max intensity of the current word and that of the phrase

  16. Results – Phrase/Boundary Tone Detection • Accuracy:

              Best        Narayanan    Wightman
        T/F   82.5/3.9%   80.9/16.0%   77/3%

  • Human agreement (in general): 95% • Best agreement: 93.0% over a 77% baseline • Relevant Features – Vowel length (current and next word) – POS of the next word

  17. Conclusion • Relatively low-tech acoustic features and ML algorithms can perform competitively with more complicated NLP approaches • Break index information was not as helpful as initially suspected. • Potential Improvements: – Sequential modeling (HMM) – Different features • More sophisticated pitch contour features • Content-based features (similar to Ross)
