Speaker Classification: Supervector Approach and Detection Task Christian Müller, DFKI

Speech as a Source for Non-Intrusive UM
Example utterance: "Now it's time to get to gate 38."
Figure: the speaker's speech acts as a sensor; speaker classification feeds a user model (information about the user), which is used by an adaptive speech dialog system.
• The user model can be filled by inference from sensors (not intrusive) or by explicit statement (intrusive).
• A: the system adapts its dialog behavior (e.g. detailed map with shops vs. arrows).
• B: the system provides recommendations (e.g. a different route to the gate).

Overview
• Speech as a source of information for non-intrusive user modeling
• Speech/signal processing topics, each with its take-away message:
  • GMM/SVM supervector approach for acoustic features: classification method for independent "bag of observations" features
  • Detection task and pseudo-NIST evaluation: valid application-independent evaluation procedure
  • Rank and polynomial rank normalization: feature space warping normalization
• Conclusions

Speaker Classification Systems
Figure: an audio segment is passed to the system, which classifies:
• Cognitive Load: Best Research Paper Award UM 2001
• Age and Gender: Voice Award 2007; Telekom live operation 2009
• Language: 14 languages + dialects (telephone quality); NIST evaluation 2007
• Identity: Project with BKA 2009; NIST* Evaluation 2008
• Acoustic Events: Project with VW 2008; Interspeech 2008

How can your features be modeled, assuming that they
• are multi-dimensional,
• represent repeating observations of the same kind, and
• can be assumed to be independent (a "bag" of observations)?
Proposing the GMM/SVM supervector approach, using frame-by-frame acoustic features as the example.

Hierarchical Feature Model
Figure: feature levels ordered from high-level features (learned characteristics) down to low-level features (physical characteristics):
• semantics
• dialog (e.g. A: b b a e b; B: d d e c)
• idiolect (e.g. <s> how shall I say this <c> <s> yeah I know...)
• phonetics (e.g. /S/ /oU/ /m/ /i:/ /D/ /&/ /m/ / / /n/ /i:/ ...)
• prosody
• spectrum

Modeling Acoustics and Prosodics
Figure: the feature hierarchy (semantics, dialog, idiolect, phonetics, prosody, spectrum) with the annotation "no ASR".

General Classification Scheme
Figure: processing pipeline with the following stages:
• Preprocessing, e.g. channel compensation (not addressed in this talk)
• Feature Extraction
• Classification, e.g. multilayer perceptron networks, support-vector machines
• Fusion
• Top-Down Knowledge

Generative Approach: Gaussian Mixture Model (GMM)
• Training: "emergency vehicle" audio → feature extraction → "emergency vehicle" model (probability density)
• Test: unknown frame of speech → feature extraction → "emergency vehicle" model → average likelihood over all frames for class "emergency vehicle"

Generative Approach: Gaussian Mixture Model (GMM)
• Test: unknown frame of speech → feature extraction → "emergency vehicle" model and background model → average log-likelihood ratio over all frames for class "emergency vehicle"
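The scoring step on this slide can be sketched as follows. This is a minimal NumPy sketch, assuming diagonal-covariance GMMs whose parameters are already trained; the function names and the toy parameters are illustrative and do not come from the talk.

```python
import numpy as np


def gmm_log_likelihood(frames, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D); weights: (K,); means: (K, D); variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, K)
    weighted = log_comp + np.log(weights)                              # (T, K)
    # Numerically stable log-sum-exp over the K mixture components.
    peak = weighted.max(axis=1, keepdims=True)
    return (peak + np.log(np.exp(weighted - peak).sum(axis=1, keepdims=True)))[:, 0]


def avg_llr(frames, target_model, background_model):
    """Average log-likelihood ratio (target vs. background) over all frames."""
    llr = (gmm_log_likelihood(frames, *target_model)
           - gmm_log_likelihood(frames, *background_model))
    return float(np.mean(llr))


# Toy single-component models (hypothetical values, for illustration only).
target = (np.array([1.0]), np.array([[2.0, 2.0]]), np.ones((1, 2)))
background = (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2)))
frames = np.full((5, 2), 2.0)   # five frames lying exactly on the target mean
print(avg_llr(frames, target, background))  # approximately 4.0
```

A positive average ratio means the frames fit the target model better than the background model; averaging over frames makes the score independent of the segment length.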

A Mixture of Gaussians
• Means, variances, and mixture weights are optimized in training.
• Figure: black line = mixture of 3 Gaussians.

Discriminative Method: Support Vector Machine (SVM)
Training: "em. vehic." (1) and "not em. vehic." (-1) → feature extraction → "em. vehic." model
• Features are transformed into a higher-dimensional space where the problem is linear.
• The discriminating hyperplane is learned using linear regression.
• Trade-off between training error and width of margin.
• The model is stored in the form of "support vectors" (data points on the margin).

Discriminative Method: Support Vector Machine (SVM)
Test: unknown segment → feature extraction → score (distance to hyperplane)
• Discriminative methods have been shown to be superior to generative methods for similar tasks.
• Feature vectors have to be of the same length (sensitive to variable segment lengths).
• Solutions:
  • feature statistics calculated over the entire utterance
  • a fixed portion of the segment
  • sequential kernels
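The first solution listed above (feature statistics over the entire utterance) can be illustrated with a short sketch. The choice of per-dimension mean and standard deviation is an assumption made here for illustration; the slides do not say which statistics are meant, and the function name is hypothetical.

```python
import numpy as np


def utterance_statistics(frames):
    """Map a (T, D) matrix of frame features to a fixed-length (2 * D,) vector.

    The vector holds the per-dimension mean followed by the per-dimension
    standard deviation, so its length does not depend on the frame count T.
    """
    frames = np.asarray(frames, dtype=float)
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


short_segment = np.array([[1.0, 2.0], [3.0, 4.0]])        # T = 2 frames
long_segment = np.random.default_rng(0).normal(size=(50, 2))  # T = 50 frames

# Both segments yield vectors of the same length and can feed the same SVM.
print(utterance_statistics(short_segment).shape)
print(utterance_statistics(long_segment).shape)
```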

GMM/SVM Supervector Approach
Figure: feature extraction → Gaussian means (MAP adapted)
• Combines the discriminative power of SVMs with the length independence of GMMs.
• Very successful on similar tasks such as speaker recognition.
• The GMM is trained using MAP adaptation.
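A supervector of MAP-adapted Gaussian means can be sketched as below. This sketch assumes the common relevance-MAP update of the means of a diagonal-covariance background GMM, with a relevance factor of 16; neither the exact update rule nor the relevance value is stated in the slides, and all names are illustrative.

```python
import numpy as np


def map_adapted_supervector(frames, weights, means, variances, relevance=16.0):
    """Adapt the means of a background GMM to one segment and stack them.

    frames: (T, D); weights: (K,); means: (K, D); variances: (K, D).
    Returns a vector of length K * D, whatever the number of frames T.
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_comp = (np.log(weights)
                - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)
                - 0.5 * np.sum(diff ** 2 / variances, axis=2))         # (T, K)
    # Component posteriors per frame, computed stably.
    log_comp -= log_comp.max(axis=1, keepdims=True)
    post = np.exp(log_comp)
    post /= post.sum(axis=1, keepdims=True)

    n_k = post.sum(axis=0)                                   # soft counts, (K,)
    expected = (post.T @ frames) / np.maximum(n_k, 1e-10)[:, None]    # (K, D)
    alpha = (n_k / (n_k + relevance))[:, None]               # adaptation weight
    adapted = alpha * expected + (1.0 - alpha) * means
    return adapted.reshape(-1)


# Toy two-component background model (hypothetical values).
ubm = (np.array([0.5, 0.5]),
       np.array([[0.0, 0.0], [10.0, 10.0]]),
       np.ones((2, 2)))
rng = np.random.default_rng(0)
sv_short = map_adapted_supervector(rng.normal(size=(20, 2)), *ubm)
sv_long = map_adapted_supervector(rng.normal(size=(300, 2)), *ubm)
print(sv_short.shape, sv_long.shape)  # both have shape (4,)
```

Because every segment maps to a vector of the same length, the supervectors can be used directly as SVM inputs regardless of segment duration.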

Evaluation Results
Source: Christian Müller, Joan-Isaac Biel, Edward Kim, and Daniel Rosario, "Speech-overlapped Acoustic Event Detection for Automotive Applications," in Proceedings of Interspeech 2008, Brisbane, Australia, 2008.

• How can you evaluate your multi-class models independently of the given application?
• How can you establish an appropriate evaluation procedure in order to obtain valid results?
Proposing the detection task and the "pseudo-NIST" evaluation procedure, using acoustic event detection and speaker age recognition as examples.

Background  With multi-class recognition problems, many test/analyzing methods are very application specific.  e.g. confusion matrices.  we want a method that allows results to be generalized across a large set of applications.  With home-grown databases, parameter tuning on the evaluation set often compromises the validity of the results/inferences.  we want a fair “one shot” evaluation. Christian M ü ller

The Detection Task
Figure: target "emergency vehicle?" → system → "yes, 1.324326"
• Given
  • a speech segment (s)
  • and an acoustic event to be detected (target event, E_T),
• the task is to decide whether E_T is present in s (yes or no).
• The system's output shall also contain a score indicating its confidence, with more positive scores indicating greater confidence.

Terminology  Segment class  e.g. segment event, segment age-class.  ground truth (not known).  Target  the hypothesized class.  Trial  a combination of segment and target. Christian M ü ller

Evaluation
Figure: the system is queried with the targets emergency vehicle, music, talking, laughing, phone, and no event; for each target it returns a yes/no decision and a score (e.g. yes 1.32432, no -0.3212).
• The system performance is evaluated by presenting it with a set of trials.
• Each test segment is used for multiple trials.
• The absence of all targets is explicitly included.
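The way each test segment yields several trials can be sketched in a few lines. The target names are taken from the slide; the segment identifiers and the function name are hypothetical.

```python
from itertools import product

# Targets from the slide; "no event" covers the absence of all other targets.
TARGETS = ["emergency vehicle", "music", "talking", "laughing", "phone", "no event"]


def build_trials(segment_ids, targets=TARGETS):
    """Pair every segment with every target: one trial per (segment, target)."""
    return list(product(segment_ids, targets))


trials = build_trials(["seg_001", "seg_002"])  # hypothetical segment IDs
print(len(trials))  # 12 trials: 2 segments x 6 targets
```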

Type of Errors
• MISS: segment "em. vehic.", target "em. vehic."?, system answers no.
• FALSE ALARM: segment "em. vehic.", target "phone"?, system answers yes.
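The two error types can be written down as a small function. This is an illustrative sketch; the function name and the boolean encoding of the system's yes/no answer are assumptions.

```python
def error_type(segment_class, target, decision):
    """Classify one trial outcome.

    segment_class: ground-truth class of the segment.
    target: hypothesized class of the trial.
    decision: True if the system answered "yes", False for "no".
    Returns "MISS", "FALSE ALARM", or None when the decision is correct.
    """
    is_target_trial = segment_class == target
    if is_target_trial and not decision:
        return "MISS"
    if not is_target_trial and decision:
        return "FALSE ALARM"
    return None


# The two cases shown on the slide.
print(error_type("em. vehic.", "em. vehic.", False))  # MISS
print(error_type("em. vehic.", "phone", True))        # FALSE ALARM
```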

Decision-Error Tradeoff
Figure: curve of misses against false alarms, with the "equal error rate" point marked.
• Selecting an operating point (decision threshold) along the dotted line trades misses off against false alarms.
• The optimal operating point is application dependent.
• Low false alarm rates are desirable for most applications.
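The tradeoff can be made concrete by sweeping the decision threshold over the observed scores. The sketch below uses a simple approximation of the equal error rate (the threshold where the miss and false alarm rates are closest, averaged); the scores are made up and the function names are illustrative.

```python
def miss_and_false_alarm_rates(target_scores, nontarget_scores, threshold):
    """Error rates when the system answers "yes" for scores >= threshold."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the operating point where both rates are closest."""
    candidates = sorted(set(target_scores) | set(nontarget_scores))
    rates = [miss_and_false_alarm_rates(target_scores, nontarget_scores, t)
             for t in candidates]
    p_miss, p_fa = min(rates, key=lambda r: abs(r[0] - r[1]))
    return (p_miss + p_fa) / 2.0


# Made-up scores: raising the threshold lowers false alarms but adds misses.
target_scores = [1.0, 3.0]
nontarget_scores = [0.0, 2.0]
for threshold in (0.0, 1.0, 2.0, 3.0):
    print(threshold, miss_and_false_alarm_rates(target_scores, nontarget_scores, threshold))
print(equal_error_rate(target_scores, nontarget_scores))  # 0.5
```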

Decision Cost Function
$$C(E_T, E_N) = C_{Miss} \cdot P_{Target} \cdot P_{Miss}(E_T) + C_{FA} \cdot (1 - P_{Target}) \cdot P_{FA}(E_T, E_N)$$
where $E_T$ and $E_N$ are the target and non-target events, and $C_{Miss}$, $C_{FA}$ and $P_{Target}$ are application model parameters. The application parameters for EER are $C_{Miss} = C_{FA} = 1$ and $P_{Target} = 0.5$.
• Weighted sum of misses and false alarms using variable costs and priors.
• Application model parameters are selected according to the application.
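The cost function translates directly into code. The sketch below uses the EER parameters from the slide as defaults; the function name, the example error rates, and the alternative parameter values are made up for illustration.

```python
def decision_cost(p_miss, p_fa, c_miss=1.0, c_fa=1.0, p_target=0.5):
    """Weighted sum of miss and false alarm probabilities.

    Defaults are the EER application parameters given on the slide:
    C_Miss = C_FA = 1 and P_Target = 0.5.
    """
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa


# With the EER parameters the cost is the mean of the two error rates.
print(decision_cost(p_miss=0.1, p_fa=0.1))

# A hypothetical application where false alarms are ten times as costly
# and targets are rare.
print(decision_cost(p_miss=0.1, p_fa=0.1, c_miss=1.0, c_fa=10.0, p_target=0.01))
```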

Example DET Plot
Figure: DET plot of miss probability against false alarm probability.
Source: Christian Müller, Joan-Isaac Biel, Edward Kim, and Daniel Rosario, "Speech-overlapped Acoustic Event Detection for Automotive Applications," in Proceedings of Interspeech 2008, Brisbane, Australia, 2008.
