
Speaker State: Elizabeth Shriberg, Speech Technology and Research Lab


  1. Speaker State. Elizabeth Shriberg, Speech Technology and Research Lab, SRI International, Menlo Park, CA. May 7-8, 2015, NSF Speech Science Workshop

  2. Overview
     • Umbrella term covering variations within an individual
       – Emotional
       – Cognitive (uncertainty, engagement)
       – Health (stress, fatigue, Parkinson’s…)
       – Mental health (depression, PTSD, MCI, mTBI)
       – Social, pragmatic (engagement, entrainment)
     • Synergy with some of the other talks here: Anton, Tom, Julia, Florian…
     • Standard approach
       – Annotate data → “gold standard”
       – Extract features from speech (words, acoustic, prosodic, discourse)
       – Machine learning to predict annotations
       – Range of metrics for evaluation
     • Funding: some govt, some commercial; but limited
     • Interest from industry, e.g. call centers, but largely ASR based and data is often proprietary
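
The "standard approach" on the Overview slide (annotate, extract features, learn to predict the annotations, evaluate with several metrics) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the two normalized features, the labels `aroused` and `neutral`, the toy data, and the nearest-centroid classifier are hypothetical and are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical toy data: (feature vector, annotated "gold standard" label).
# The two features stand in for normalized prosodic measures (assumption).
TRAIN = [
    ([0.9, 0.8], "aroused"),
    ([0.8, 0.9], "aroused"),
    ([0.2, 0.1], "neutral"),
    ([0.1, 0.3], "neutral"),
    ([0.3, 0.2], "neutral"),
]
TEST = [
    ([0.85, 0.7], "aroused"),
    ([0.15, 0.2], "neutral"),
    ([0.6, 0.6], "neutral"),
]


def fit_centroids(examples):
    """Learn one mean feature vector per annotated label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sqdist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def accuracy(gold, pred):
    """Fraction of test items whose prediction matches the annotation."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def unweighted_average_recall(gold, pred):
    """Mean of per-class recall, so that each class counts equally."""
    recalls = []
    for label in sorted(set(gold)):
        hits = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        total = sum(1 for g in gold if g == label)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


centroids = fit_centroids(TRAIN)
gold = [label for _, label in TEST]
pred = [predict(centroids, features) for features, _ in TEST]
acc = accuracy(gold, pred)
uar = unweighted_average_recall(gold, pred)
print(f"accuracy={acc:.3f} UAR={uar:.3f}")
```

On this toy data the two metrics disagree (accuracy is 2/3, unweighted average recall is 0.75), a small reminder of why the slide lists a range of metrics rather than one.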

  3. Impact for Speech Technology
     1. Detection of state from speech
        – For adaptation / action of system / filtering
        – For monitoring / filtering
        – Massively applicable, including for passive speech, especially with increases in mobile phone use and apps
        – Growing interest in industry in emotion, but speech content analysis is generally behind that of text and video
     2. Improvement of speech recognition (via modeling of context for better train/test data matching)

  4. Challenges
     • Major effects of speaker, context, semantics but almost no understanding of effects
     • Hundreds of papers/year, but we start over with each data set
     • Small data sets
     • Annotation issues: validity, reliability, unit of analysis
     • Common evaluations: have been great service to community but focus has been on large feature sets + deep learning → we’re adding layers, not understanding
     • Feature sets biased toward those available from ASR
     • Metrics and evaluation
     • Sensitive data sets can’t be shared

  5. Future Directions
     • Core pursuits
       – Understand how to decouple effects of speaker from state, and context
       – Go smaller, not bigger. What’s the minimum feature set and what can we learn from it?
       – Value generalization across data sets: learn what features and approaches transfer to new data
       – Explore robustness in real world data: studies often assume better audio than we can get in real applications
       – Understand role of lexical, visual, physiological information: increasingly available and need to understand where speech offers added value
     • Needs
       – Invest in longitudinal data with real-world spontaneous speech
       – Add spontaneous collection to studies in medical community
       – Community focus on annotations and meaningful metrics: working group support if no government evaluations
       – User studies that involve real-world end applications
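
One way to act on the "generalization across data sets" point is to hold out an entire corpus when evaluating, so that a model is always tested on data from a collection it never saw. The sketch below shows only the splitting step. The sample records, the corpus names `A`, `B`, `C`, and the helper `leave_one_corpus_out` are hypothetical and are not from the talk.

```python
def leave_one_corpus_out(samples):
    """Yield (held_out_corpus, train, test) with one whole corpus held out per split."""
    corpora = sorted({sample["corpus"] for sample in samples})
    for held_out in corpora:
        train = [s for s in samples if s["corpus"] != held_out]
        test = [s for s in samples if s["corpus"] == held_out]
        yield held_out, train, test


# Hypothetical records: only the corpus tag matters for the split.
SAMPLES = [
    {"corpus": "A", "speaker": "a1", "label": "stressed"},
    {"corpus": "A", "speaker": "a2", "label": "neutral"},
    {"corpus": "B", "speaker": "b1", "label": "stressed"},
    {"corpus": "B", "speaker": "b2", "label": "neutral"},
    {"corpus": "C", "speaker": "c1", "label": "neutral"},
]

splits = list(leave_one_corpus_out(SAMPLES))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

A feature set or model that scores well only when training and test data come from the same corpus would show up as a drop under this kind of split.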
