Session: Speaker State, Personality, Entrainment NSF Workshop May 8-9, 2015
My Lab Projects
• I work on speech generation and analysis, focusing on the role of prosody, with 7 PhD students and many MS and undergraduate students
  – Low Resource Languages (IARPA Babel): language ID, augmenting LMs from web data, TTS
  – Deceptive Speech Across Cultures (AFOSR): English and Chinese, personality factors
  – Code Switching (NSF): language mixing among bilinguals
  – Text-to-Scene Generation (WordsEye startup): demo
  – Tools for Endangered Languages: via WordsEye
  – Hedging behavior (DARPA DEFT): English, Spanish and Chinese
  – Emotional Text and Speech (DARPA Lorelei)
  – Interdisciplinary work with History and Latin American Studies
  – Spoken Dialogue Systems (DARPA BOLT)
  – Entrainment in Text and Speech (NSF, AFOSR): English, Chinese, Slovak, Spanish. Entrainment is the tendency of speakers to begin speaking and behaving like one another as they talk
Entrainment: Impact
• Entrainment occurs in speech pronunciation, acoustic and prosodic features, turn-taking behaviors, gesture, facial expression, posture, …
• Proof-of-concept systems that rely upon users entraining lexically to systems to aid ASR
• Proof-of-concept systems that entrain to users, showing improvement in task success, trust, likeability
  – CMU
  – Stony Brook
  – Columbia
• Interest from Apple and Nuance but no research
Challenges and Needs
• Needs:
  – More objective evidence of the social value of entrainment in speech technology, to convince industry
  – Improved techniques for identifying individual speech characteristics and entraining to them in system behavior in real time
  – Experiments with real, working systems
Needs
• There’s plenty of data: any conversational corpus can be examined for evidence of speech entrainment, but
• We need:
  – More comparisons of entrainment in different cultures in similar circumstances
  – Better ways to measure entrainment, especially multi-party (+2) entrainment
  – Better ways to assess and generate entrained speech
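To make the measurement question concrete, here is a minimal sketch of one simple way a dyadic (two-speaker) similarity score could be computed. It is an illustration under stated assumptions, not a method taken from these slides: the function names `proximity` and `partner_vs_others` are hypothetical, and the inputs are assumed to be per-turn values of a single feature (for example, mean pitch in Hz) for each speaker. It does not address the multi-party case.

```python
from statistics import mean


def proximity(speaker_a, speaker_b):
    """Negated absolute difference between two speakers' mean feature values.

    Values closer to 0 mean the two speakers are more similar on this feature.
    (Hypothetical helper, for illustration only.)
    """
    return -abs(mean(speaker_a) - mean(speaker_b))


def partner_vs_others(target, partner, others):
    """Similarity to the actual partner minus average similarity to non-partners.

    A positive result means the target speaker is closer to their
    conversational partner than to speakers they did not talk with.
    """
    partner_similarity = proximity(target, partner)
    baseline = mean([proximity(target, other) for other in others])
    return partner_similarity - baseline


# Made-up per-turn pitch values (Hz), purely illustrative.
target = [200, 210, 190]
partner = [205, 195, 215]
others = [[150, 160, 155], [250, 260, 240]]
score = partner_vs_others(target, partner, others)
```

A score above zero is only weak evidence on its own: it compares session-level means, so it cannot show whether speakers became more alike over the course of the conversation.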
Other Topics NSF Might Fund
• Text-to-Speech Synthesis: parametric approaches make it possible to make huge strides in modeling prosody and emotion
• Code-Switching: huge problem for all speech technologies (ASR, MT, syntactic and semantic tagging)