Automatic Detection and Classification of Prosodic Events Thesis Proposal Andrew Rosenberg December 12, 2007
Introduction Intonation • “What” v. “How” • Dimensions of Prosodic Variation • Speaking Rate • Pitch Range • Voice Quality • Loudness • Accenting* • Phrasing* A. Rosenberg - Thesis Proposal - 12/12/07 2
Introduction Prosodic Events • Categorical Phenomena • Accenting • Acoustic excursion which makes a word “prominent” or “stand out” from its surroundings • Phrasing • “Perceived disjuncture” between words
Introduction Accenting • Directs the listener's attention to a concept • Contrast • Topic • Information Status • Example: Eileen is pro-English. • Expected accenting goes unnoticed • Unexpected accenting leads to unexpected meaning • A: Is Eileen pro-French? B: Eileen is pro-English.
Introduction Phrasing • Phrasing defines an acoustic unit • Physiologically necessary • Communicatively useful • Attachment Example: Anna will win Manny. • Phrase final tones indicate: • How phrases are composed • Example: I need some oregano (H-) and marjoram (H-) and some fresh basil (L-) okay? • Pragmatic and discourse effect • Example: Mariana made the marmalade.
Introduction Why Prosodic Events • Consensus • Understanding • Availability
Introduction Goals • Provide prosodic information to SLP systems • Develop novel techniques for classification and detection • Increase understanding of the acoustic and lexical influences on the use of prosodic events
Outline • Detection of Prosodic Events • Classification of Prosodic Events • Applications
Outline • Detection of Prosodic Events • Pitch Accent • Phrase Boundary • Integrated Prosodic Event Detection • Classification of Prosodic Events • Applications
Pitch Accent Detection • Recognition of Acoustic Excursion • Acoustic Correlates • Pitch • Energy* • Duration • Previous Approaches • [Wightman & Ostendorf 1994, Conkie et al. 1999, Sun 2002, Marsi et al. 2003, Gregory 2004, Ananthakrishnan et al. 2005, Tamburini 2006, Chaolei 2007, Levow 2008, inter alia]
Pitch Accent Detection Basic Assumptions • Unit of Analysis: Syllable vs. Word • Use of Lexical or Syntactic Information • Supervised vs. Unsupervised Learning
Pitch Accent Detection Experiments • Feature Representation • Pitch - min, max, stdev, mean, rms • Energy - min, max, stdev, mean, rms • Duration • Context Normalization of max and mean • Range and z-score normalization over nine static context windows • Speaker Normalization (z-score) • Naïve Bayes, J48, SVM, Boosting, Bagging, Dagging*
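The feature representation above can be illustrated with a minimal sketch. The function names (`aggregate`, `z_score`, `range_normalize`) are hypothetical, and the use of the population standard deviation is an assumption; the slide does not say how the statistics were computed or which nine context windows were used.

```python
import math
from statistics import mean, pstdev


def aggregate(values):
    """Summarize a pitch or energy contour with the five statistics on the slide."""
    return {
        "min": min(values),
        "max": max(values),
        "stdev": pstdev(values),  # assumption: population standard deviation
        "mean": mean(values),
        "rms": math.sqrt(sum(v * v for v in values) / len(values)),
    }


def z_score(value, reference):
    """Z-score of value against a reference set (a speaker's data or a context window)."""
    sd = pstdev(reference)
    return 0.0 if sd == 0 else (value - mean(reference)) / sd


def range_normalize(value, reference):
    """Position of value within the min-max range of the reference set."""
    lo, hi = min(reference), max(reference)
    return 0.0 if hi == lo else (value - lo) / (hi - lo)
```

In this sketch, context normalization would call `z_score` or `range_normalize` with the values from a window of surrounding words as `reference`, and speaker normalization would call `z_score` with all of a speaker's values.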
Pitch Accent Detection Results [Figure: results on BDC-spon, BDC-read, BU-RNC, and TDT-4 for Naïve Bayes, J48, Boosting, Bagging, and SVM, with a Human Agreement reference line; y-axis 75.0 to 90.0]
Pitch Accent Detection Spectral Analysis • Spectral Balance • [Sluijter & Van Heuven 1996, 1997, Fant 2000, Heldner 1999] [My name is Randy Keller]
Pitch Accent Detection Spectral Analysis • Examined the predictive power of 210 frequency regions [Rosenberg & Hirschberg 2006] etc. [My name is Randy Keller]
Pitch Accent Detection Spectral Analysis Findings • There is a significant difference in the predictive power of energy information across frequency regions (14.8%) • >99.9% of data points are correctly classified by at least one classifier • Majority voting leads to ~81.8% correct classification using only energy features • Worse than SVM, but better than J48 and Boosting
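The two aggregate measures on this slide, majority voting and "correctly classified by at least one classifier", can be sketched as follows. This is an illustration only: the names are hypothetical, predictions are represented as booleans (True = accented), and resolving ties to "not accented" is an assumption the slide does not address.

```python
def majority_vote(votes_per_classifier):
    """Combine per-classifier boolean predictions into one prediction per data point.

    votes_per_classifier[i][j] is classifier i's prediction for data point j.
    A point is labeled accented only if strictly more than half vote True
    (assumption: ties go to not accented).
    """
    n = len(votes_per_classifier)
    return [sum(column) * 2 > n for column in zip(*votes_per_classifier)]


def oracle_coverage(votes_per_classifier, gold):
    """Fraction of data points that at least one classifier labels correctly."""
    columns = zip(*votes_per_classifier)
    hits = [any(vote == truth for vote in column) for column, truth in zip(columns, gold)]
    return sum(hits) / len(gold)


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

With one row of votes per frequency-region classifier, `oracle_coverage` corresponds to the ">99.9%" style of figure and `accuracy(majority_vote(...), gold)` to the "~81.8%" style of figure; the sketch does not reproduce those numbers.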
Pitch Accent Detection Correcting Classifier • Can pitch and duration information be combined with these results to improve pitch accent detection accuracy? [Rosenberg & Hirschberg 2007] • For each of 210 energy-based classifiers, train a second pitch and duration based classifier to correct the predictions of the energy classifiers
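One way to read the correcting-classifier idea is sketched below for a single token. It assumes that each corrector outputs a judgment of whether its paired energy classifier's prediction is wrong, that a "wrong" judgment flips the prediction, and that the corrected predictions are combined by an unweighted majority vote. All three are assumptions made for illustration, and `corrected_vote` is a hypothetical name.

```python
def corrected_vote(energy_predictions, corrector_says_wrong):
    """Apply per-classifier corrections, then vote.

    energy_predictions[i]: bool, energy classifier i predicts "accented".
    corrector_says_wrong[i]: bool, corrector i (pitch and duration based)
        judges that prediction i is wrong.
    Returns (final_decision, corrected_predictions).
    """
    # Flip a prediction exactly when its corrector flags it as wrong.
    corrected = [
        prediction != wrong
        for prediction, wrong in zip(energy_predictions, corrector_says_wrong)
    ]
    # Assumption: unweighted majority vote, ties resolved to "not accented".
    decision = sum(corrected) * 2 > len(corrected)
    return decision, corrected
```

In a full system, `energy_predictions` would hold one entry per frequency-region classifier and the correctors would be trained classifiers rather than precomputed booleans.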
Pitch Accent Detection Correcting Classifier Diagram [Diagram: Filters, Energy Classifiers, Correctors, Aggregator (∑)]
Pitch Accent Detection Correcting Classifier Results [Figure: results on BDC-spon, BDC-read, and TDT-4 for Boosting, Bagging, SVM, Energy Voting, and Corrected Voting, with a Human Agreement reference line; y-axis 75.0 to 90.0]
Pitch Accent Detection Proposed Work • Define Word Boundaries using ASR Transcripts • Inclusion of Syntactic Features: 1. Extend the Feature Vector 2. Syntactic-Class-Dependent Modeling • Penn Treebank, Collapsed Classes, Function v. Content 3. Model Combination
Phrase Boundary Detection • “Perceived Disjuncture” • Intermediate v. Intonational phrases • Acoustic Features • Silence • Pre-boundary Lengthening* • The final syllable in a phrase has increased duration • Declination Line Reset • Pitch and intensity decrease over the duration of a phrase
Phrase Boundary Detection Experiments • Reuse the feature vector from pitch accent detection experiments • Include Pitch and Energy Reset Features • Classify word boundaries as intonational and intermediate phrase boundaries • Naïve Bayes, J48, SVM*
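The slides do not define the pitch and energy reset features. A minimal sketch of one plausible formulation is the difference between the mean of a contour just after a candidate word boundary and its mean just before. The function name, the window size, and the difference-of-means definition are all assumptions made for illustration.

```python
from statistics import mean


def reset_feature(contour, boundary_index, window=3):
    """Difference between mean contour value after and before a candidate boundary.

    contour: per-frame (or per-word) pitch or energy values.
    boundary_index: index of the first value after the candidate boundary.
    window: number of values on each side (hypothetical default).
    A large positive value suggests the declination line was reset.
    """
    before = contour[max(0, boundary_index - window):boundary_index]
    after = contour[boundary_index:boundary_index + window]
    if not before or not after:
        return 0.0  # no context on one side; treat as no reset
    return mean(after) - mean(before)
```

For example, a contour that declines to 170 Hz and then jumps to 220 Hz at the boundary yields a positive reset value, while a contour that keeps declining yields a negative one.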
Phrase Boundary Detection SVM Results - Full Intonational Phrases [Figure: per-corpus results for BDC-read, BDC-spon, BU-RNC, TDT-4, Communicator, IBM TTS, and Trains; y-axis 0 to 100; labels: Baseline Accuracy Difference]
Phrase Boundary Detection SVM Results - Full Intonational Phrases [Figure: F-Measure and Baseline* for BDC-read, BDC-spon, BU-RNC, TDT-4, Communicator, IBM TTS, and Trains; y-axis 0 to 1.0]
Phrase Boundary Detection Proposed Work • Inclusion of Lexical Features • Similar to pitch accent inclusion approaches • Pre-boundary lengthening • Requires syllable information • Forced alignment from manual word boundaries • ASR phone hypothesis
Integrated Prosodic Event Detection • Pitch accents can improve phrase boundary detection [Wang & Hirschberg 1992] • Hypothesis: Phrase boundaries can improve pitch accent detection • Accents “stand out” from context. • Phrase boundaries define acoustic context.
Integrated Prosodic Event Detection Proposed Approaches • Simultaneous Detection • 4-way classification {acc, non}x{phrase, non} • Preliminary results show improved performance on pitch accent and phrase boundary detection on some corpora • Iterative Detection • Detect pitch accents. Use these to detect phrase boundaries. Use these to detect accents. Repeat. • Classifier Fusion • Dynamic Bayesian Model
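The first two proposed approaches can be sketched as follows. The 4-way label set is the cross product given on the slide. The iterative loop is an illustration in which `detect_accents` and `detect_boundaries` are hypothetical callables standing in for trained detectors, and the stopping rule (stop when nothing changes, or after `max_rounds`) is an assumption, since the slide only says "Repeat".

```python
from itertools import product

# Simultaneous detection: one label per word from {acc, non} x {phrase, non}.
JOINT_LABELS = list(product(("acc", "non"), ("phrase", "non")))


def encode(accented, boundary):
    """Map two binary decisions to one of the four joint labels."""
    return ("acc" if accented else "non", "phrase" if boundary else "non")


def decode(label):
    """Recover (accented, boundary) booleans from a joint label."""
    return label[0] == "acc", label[1] == "phrase"


def iterative_detection(tokens, detect_accents, detect_boundaries, max_rounds=5):
    """Alternate accent and boundary detection until the output stops changing.

    detect_accents(tokens, boundaries) -> list of bool (boundaries is None at first)
    detect_boundaries(tokens, accents) -> list of bool
    """
    accents, boundaries = None, None
    for _ in range(max_rounds):
        new_accents = detect_accents(tokens, boundaries)
        new_boundaries = detect_boundaries(tokens, new_accents)
        if new_accents == accents and new_boundaries == boundaries:
            break  # assumed stopping rule: a fixed point was reached
        accents, boundaries = new_accents, new_boundaries
    return accents, boundaries
```

The joint encoding lets a single 4-way classifier make both decisions at once, while the loop lets two separate binary detectors condition on each other's latest output.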
Outline • Detection of Prosodic Events • Classification of Prosodic Events • Pitch Accent Type • Phrase-final Tone • Applications
Prosodic Event Categorization Accent Types and Phrase-final Tones • Intonation can be described by a sequence of low (L) and high (H) tones [Pierrehumbert 1980, Silverman 1992] • Accents: L*, H* • Complex tones: L+H*, L*+H, H+!H* • Intermediate phrase-final tones (phrase accents): L-, H- • Intonational phrase-final tones (phrase accent + boundary tone): • L-L%, L-H%, H-L%, H-H%
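The tone inventory above has a simple compositional structure: each intonational phrase-final tone is a phrase accent followed by a boundary tone. A minimal sketch of that structure follows. The constant and function names are hypothetical, and the boundary tones L% and H% are inferred from the four combined labels on the slide.

```python
from itertools import product

SIMPLE_ACCENTS = ("L*", "H*")
COMPLEX_ACCENTS = ("L+H*", "L*+H", "H+!H*")
PHRASE_ACCENTS = ("L-", "H-")   # intermediate phrase-final tones
BOUNDARY_TONES = ("L%", "H%")   # inferred from the combined labels

# Intonational phrase-final tones: phrase accent followed by boundary tone.
INTONATIONAL_FINAL_TONES = [
    phrase_accent + boundary_tone
    for phrase_accent, boundary_tone in product(PHRASE_ACCENTS, BOUNDARY_TONES)
]


def split_final_tone(label):
    """Split a label such as 'L-H%' into its phrase accent and boundary tone."""
    phrase_accent, boundary_tone = label[:2], label[2:]
    if phrase_accent not in PHRASE_ACCENTS or boundary_tone not in BOUNDARY_TONES:
        raise ValueError(f"not an intonational phrase-final tone: {label!r}")
    return phrase_accent, boundary_tone
```

A classifier for phrase-final tones could therefore predict either the four combined labels directly or the two components separately; which option the thesis pursues is not stated on this slide.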