Speech Encoder / Importance of body language / Why data-driven? - PowerPoint Presentation

Feb 07, 2023

Speech Encoder
Importance of body language
Why data-driven?
Yoon et al. "Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots." In ICRA, 2019.
Cassell et al. "BEAT: the Behavior Expression Animation Toolkit." In SIGGRAPH, 2001.
✔ Scalability ✔ Adaptability ✔ Variability
Speech-driven gesture generation?
Related work  Hybrid between data-driven and rule-based approaches  Based on PGM with an additional hidden node for a constraint  Evaluate 3 hand gestures and 2 head motions.  Do smoothing afterwards Sadoughi et al. "Speech-driven animation with meaningful behaviors." Speech Communication 110. 2019 5
Related work  From speech to 3D motion  Deep-learning based approach  Applied a lot of smoothing as post-processing Hasegawa et al. "Evaluation of Speech-to-Gesture Generation Using Bi-Directional LSTM Network." In IVA’18. ACM. 2018. 6
Contributions
1. A novel speech-driven method for non-verbal behavior generation that can be applied to any embodiment.
2. An evaluation of the importance of the representation for both the motion and the speech.
General framework
Our baseline model: Hasegawa, Dai, Naoshi Kaneko, Shinichi Shirakawa, Hiroshi Sakuta, and Kazuhiko Sumi. "Evaluation of Speech-to-Gesture Generation Using Bi-Directional LSTM Network." In Proceedings of the 18th International Conference on Intelligent Virtual Agents, ACM, pp. 79-86, 2018.
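The baseline is a bidirectional LSTM that maps speech features to motion. The slide gives no architecture details, so the sketch below only illustrates the bidirectional idea in pure Python: one LSTM reads the speech frames forward, a second reads them backward, and their hidden states are concatenated per frame, so each output frame depends on both past and future speech. The weights are deterministic, untrained toy values, biases are omitted, and the names `LSTM` and `bidirectional` are hypothetical; a real model would also need an output layer that maps these states to joint coordinates.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]
GATES = ("input", "forget", "output", "candidate")


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def toy_matrix(rows: int, cols: int, offset: float, scale: float = 0.1) -> Matrix:
    # Deterministic, untrained weights so the example is reproducible.
    return [[scale * math.sin(offset + r * cols + c) for c in range(cols)]
            for r in range(rows)]


class LSTM:
    """Unidirectional LSTM with toy weights and no biases (illustration only)."""

    def __init__(self, input_size: int, hidden_size: int, offset: float = 0.0):
        self.hidden_size = hidden_size
        self.W = {g: toy_matrix(hidden_size, input_size, offset + k)
                  for k, g in enumerate(GATES)}
        self.U = {g: toy_matrix(hidden_size, hidden_size, offset + 10 + k)
                  for k, g in enumerate(GATES)}

    def _gate(self, name: str, x: Vector, h: Vector) -> Vector:
        # Pre-activation: input contribution plus recurrent contribution.
        return [a + b for a, b in zip(matvec(self.W[name], x), matvec(self.U[name], h))]

    def run(self, frames: Sequence[Vector]) -> List[Vector]:
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        states = []
        for x in frames:
            i = [sigmoid(v) for v in self._gate("input", x, h)]
            f = [sigmoid(v) for v in self._gate("forget", x, h)]
            o = [sigmoid(v) for v in self._gate("output", x, h)]
            cand = [math.tanh(v) for v in self._gate("candidate", x, h)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            states.append(h)
        return states


def bidirectional(fwd: LSTM, bwd: LSTM, frames: Sequence[Vector]) -> List[Vector]:
    """Concatenate forward states with time-realigned backward states."""
    forward_states = fwd.run(frames)
    backward_states = bwd.run(list(frames)[::-1])[::-1]
    return [hf + hb for hf, hb in zip(forward_states, backward_states)]


speech = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.4], [0.5, 0.5]]  # 4 frames, 2 features
features = bidirectional(LSTM(2, 3), LSTM(2, 3, offset=100.0), speech)
print(len(features), len(features[0]))  # 4 6
```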
Proposed method: Step 1
Proposed method: Step 2
Proposed method: Step 3
Proposed method
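The three step slides carry no text in this transcript. Assuming, as the "Speech Encoder" slide suggests, that the steps are (1) learn a compact motion representation with an autoencoder, (2) train a speech encoder to predict that representation from speech features, and (3) chain the speech encoder with the motion decoder, the pipeline can be sketched as below. This is a toy linear stand-in, not the authors' implementation: the fixed orthonormal `ENCODER` replaces a trained neural autoencoder, the speech encoder is a linear map fitted by stochastic gradient descent, and all names and data are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


# Step 1 (toy stand-in): a fixed linear "autoencoder" with orthonormal rows,
# mapping 4-D poses to 2-D codes. A real system would train this on motion data.
ENCODER: Matrix = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
DECODER: Matrix = transpose(ENCODER)


def encode_motion(pose: Vector) -> Vector:
    return matvec(ENCODER, pose)


def decode_motion(code: Vector) -> Vector:
    return matvec(DECODER, code)


# Step 2: fit a linear speech encoder so speech features predict motion codes.
def train_speech_encoder(speech: List[Vector], motion: List[Vector],
                         lr: float = 0.1, epochs: int = 500) -> Matrix:
    targets = [encode_motion(p) for p in motion]
    code_dim, feat_dim = len(targets[0]), len(speech[0])
    weights = [[0.0] * feat_dim for _ in range(code_dim)]
    for _ in range(epochs):
        for x, y in zip(speech, targets):
            err = [p - t for p, t in zip(matvec(weights, x), y)]
            for r in range(code_dim):
                for c in range(feat_dim):
                    weights[r][c] -= lr * err[r] * x[c]  # squared-error gradient step
    return weights


# Step 3: chain the speech encoder with the motion decoder.
def speech_to_pose(weights: Matrix, features: Vector) -> Vector:
    return decode_motion(matvec(weights, features))


# Hypothetical data: motion that lies exactly in the 2-D code space.
TRUE_MAP = [[1.0, 2.0], [0.5, -1.0]]
speech = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
motion = [decode_motion(matvec(TRUE_MAP, x)) for x in speech]
weights = train_speech_encoder(speech, motion)
pose = speech_to_pose(weights, [1.0, 1.0])
```

Because the speech encoder is trained against the motion codes rather than raw poses, it only has to predict a low-dimensional target; the decoder then maps that target back to a full pose.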
Experimental results
Dataset used
- Japanese language
- 171 min of speech and 3D motion
- Speech in mp3 format
- Motion in BVH format
Takeuchi et al. "Creating a gesture-speech dataset for speech-based automatic gesture generation." In HCII, 2017.
Dimensionality choice: the original dimensionality was 384.
Input feature analysis
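The transcript does not list which speech features were compared. As an illustration only, frame-level log energy is a simple prosodic feature that is often considered alongside spectral features such as MFCCs; the sketch computes it from Hann-windowed frames. The 25 ms frame and 10 ms hop at 16 kHz are common defaults assumed here, not values from the slides, and `frame_log_energy` is a hypothetical name.

```python
import math
from typing import List


def frame_log_energy(samples: List[float], frame_length: int, hop_length: int,
                     eps: float = 1e-10) -> List[float]:
    """Log energy of each Hann-windowed frame of a mono signal."""
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_length)
              for n in range(frame_length)]
    energies = []
    # Only full frames are used; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frame = samples[start:start + frame_length]
        energy = sum((w * x) ** 2 for w, x in zip(window, frame))
        energies.append(math.log(energy + eps))  # eps avoids log(0) on silence
    return energies


SAMPLE_RATE = 16_000
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(800)]
signal = [0.0] * 800 + tone  # 50 ms of silence, then 50 ms of a 440 Hz tone
energies = frame_log_energy(signal, frame_length=400, hop_length=160)  # 25 ms / 10 ms
```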
Histogram for the wrist joints
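The slide does not say which quantity is binned. Assuming it is per-frame wrist speed, a histogram can be built from 3D wrist positions by finite differences; comparing the histogram of generated motion with that of recorded motion would show whether the generated wrists move with a similar range of speeds. The function names, bin settings and positions below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def speeds(positions: Sequence[Point], frame_time: float) -> List[float]:
    """Per-frame speed from consecutive 3D positions (finite differences)."""
    return [math.dist(a, b) / frame_time for a, b in zip(positions, positions[1:])]


def histogram(values: Sequence[float], bin_width: float, num_bins: int) -> List[int]:
    """Count non-negative values into [k*w, (k+1)*w) bins; overflow goes to the last bin."""
    counts = [0] * num_bins
    for v in values:
        k = min(int(v // bin_width), num_bins - 1)
        counts[k] += 1
    return counts


wrist = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
wrist_speeds = speeds(wrist, frame_time=0.5)  # distance units per second
counts = histogram(wrist_speeds, bin_width=2.0, num_bins=3)
```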
User study measures: all were rated on a Likert scale from 1 to 7.
User study results: * 19 participants, each giving 10 videos x 9 questions x 2 conditions = 180 ratings.
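The rating count on the slide checks out: 10 x 9 x 2 = 180 ratings per participant, or 19 x 180 = 3,420 ratings in total. Below is a small sketch of how such 1-to-7 Likert ratings might be validated and averaged per condition and question; the `(condition, question, score)` record layout and the condition labels are assumptions, not taken from the study.

```python
from statistics import mean
from typing import Dict, List, Tuple

LIKERT_MIN, LIKERT_MAX = 1, 7

# Design stated on the slide.
VIDEOS, QUESTIONS, CONDITIONS, PARTICIPANTS = 10, 9, 2, 19
RATINGS_PER_PARTICIPANT = VIDEOS * QUESTIONS * CONDITIONS  # 180
TOTAL_RATINGS = PARTICIPANTS * RATINGS_PER_PARTICIPANT      # 3420

Rating = Tuple[str, int, int]  # (condition, question index, score); hypothetical layout


def mean_by_condition_and_question(ratings: List[Rating]) -> Dict[Tuple[str, int], float]:
    groups: Dict[Tuple[str, int], List[int]] = {}
    for condition, question, score in ratings:
        if not LIKERT_MIN <= score <= LIKERT_MAX:
            raise ValueError(f"score {score} is outside the 1-7 Likert range")
        groups.setdefault((condition, question), []).append(score)
    return {key: mean(scores) for key, scores in groups.items()}


sample = [("baseline", 0, 3), ("baseline", 0, 5), ("proposed", 0, 6)]
means = mean_by_condition_and_question(sample)
```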
Visual comparison: no smoothing was applied.
Conclusion
The team
Questions?
Related work  DNN + CRF = DCNF  Virtual character  Discrete set of motions Chung-Cheng Chiu, Louis-Philippe Morency, and Stacy Marsella. Predicting co-verbal gestures: a deep and temporal modeling approach. International Conference on Intelligent Virtual Agents. Springer, Cham, 2015. 27
Human-robot communication Speech Body language Speech Body language https://www.ald.softbankrobotics.com 28