
Computer Supported Human-Human Multilingual Communication


  1. Computer Supported Human-Human Multilingual Communication February 29, 2008 Alex Waibel International Center for Advanced Communication Technologies Carnegie Mellon University University of Karlsruhe http://www.interact.cs.cmu.edu

  2. Classical Human-Computer Interaction Computer Human

  3. Present Human-Computer Interaction

  4. Classical Human-Computer Interaction Computer Human

  5. New Roles for Humans and Computers Datasource Computer Human Human

  6. Human-Human Interaction

  7. Humans Interacting With Humans

  8. Human-Human Interaction Support • CHIL – Computer in the Human Interaction Loop – Rather than Humans in the Computer Loop – Explicit Computing Complemented by Implicit Support • Implicit Computing Services – Support Human-Human Interaction Implicitly – Increasingly Powerful Computing Services – Implicit Services Observe Context and Understanding – Reduction in Attention to Technological Artifact → Increased Productivity – Computer Learns from Human Activity Implicitly

  9. Project CHIL • Integrated Project (IP) in the 6th Framework Programme of the EC – One of three IPs in the first call Multimodal/Multilingual • International Consortium: – 15 Partners from 9 countries in Europe (12) and the US (3) • Budget – CHIL: 25 Million Euro Cost Volume for three Years • Other Projects: – Integrated Projects: AMI, TC-STAR – DARPA: CALO

  10. The CHIL Project Coordination: – Scientific Coordinator: Univ. Karlsruhe, Prof. A. Waibel, R. Stiefelhagen – Financial Coordinator: Fraunhofer IITB, Prof. Steusloff, K. Watson The CHIL Team: Universität Karlsruhe (TH)

  11. Examples of Human-Human Communication Problems Requiring Computer Support

  12. Phone Calls During Meetings

  13. Phone Calls During Meetings

  14. Memory Jog: …What was his name? …Where did I meet him? …What did we discuss last time?

  15. Language Support: …what is he saying? 你们的评估准则是什么 (“What are your evaluation criteria?”)

  16. Human-Robot Interaction: SFB 588 Humanoid Robots (Object, Situation)

  17. Interpreting Human Communication: “Why did Joe get angry at Bob about the budget?” Need Recognition and Understanding of Multimodal Cues • Verbal: – Speech • Words • Speakers • Emotion • Genre – Language – Summaries – Topic – Handwriting • Visual: – Identity – Gestures – Body-language – Track Face, Gaze, Pose – Facial Expressions – Focus of Attention. We need to understand the Who, What, Where, Why and How!

  18. Sensors in the CHIL Room: Microphone Array (64 channels) • Pan-Tilt-Zoom Camera • Camera (fixed) • Ceiling Mounted Fish-Eye Camera • Microphone Array for Source-Localization (4 channels) • Stereo-Camera • Screen

  19. Describing Human Activities

  20. Describing Human Activities

  21. Technologies/Functionalities: Who is this? What is he pointing to? What does he say? To whom does he speak? Where is he going to? What is his environment? Where is he?

  22. Technologies & Fusion • Who & Where? – Audio-Visual Person Tracking – Tracking Hands and Faces – AV Person Identification – Head Pose / Focus of Attention – Pointing Gestures – Audio Activity Detection • What? (Input) – Far-field Speech Recognition – Far-field Audio-Visual Speech Recognition – Acoustic Event Classification – Topical Segmentation • What? (Output) – Animated Social Agents – Steerable targeted Sound – Q&A Systems – Summarization • Why & How? – Classification of Activities – Emotion Recognition – Interaction & Context Modelling – Vision-based posture recognition

  23. Special New Challenges & Opportunities • Require: Performance, Robustness, Realism – Distant, Remote Microphones – Hands-Free, Always On → Segmentation – Sloppy Speech – Cross-Talk – Noise – Disfluencies, Prosody, Structuring Discourse – Communication by Other Modalities – Other Elements of Speech (Emotion, Direction, Scene Analysis) – Multimodal People ID – Free People Movement – Focus of Attention and Direction – Named Entities, OOVs – Adaptation and Evolution – Summarization • Now rapid Progress by Way of Competitive Evaluations

  24. Evaluation: International Effort • NIST and EC Programs Join Forces – RT-Meeting’06 – Rich Transcription • Emerges from established DARPA activity • MLMI Workshops, AMI/CHIL • Evaluated Verbal Content Extraction • Chair: Garofolo (NIST) – CLEAR’06, ’07.. – Classification of Locations, Events, Activities, Relationships • Emerging from European program efforts (CHIL, etc.) and US-Programs (VACE,..) • First Joint Workshop to be Held in Europe after Face & Gesture Reco WS, April 13 & 14, Southampton • Chair: Stiefelhagen (UKA)

  25. Technologies: Localization • Identification • Tracking & Gesture • Focus of Attention

  26. Fusion, Integration, PID

  27. Activity Analysis

  28. Hearing Personal Translations • Technology: Targeted Audio – Research under EC Project CHIL (Build Unobtrusive Computer Services) – Project Partner: DaimlerChrysler – Array of Ultrasound Speakers • Result: Narrow Sound Beam – Audible by one Individual Only – Others not Disturbed – Multiple Arrays Could Provide Multiple Languages – Steerable – Recognize/Track Individual Listener and Keep Language Beam on Target
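
The slide does not say how the targeted-audio beam is steered. As an illustration only, one common way to steer a beam electronically from a uniform linear array of emitters is to fire each element with a small time offset. The sketch below computes those offsets under that assumption; the function name, the element spacing, and the speed-of-sound constant are hypothetical choices, not details from the CHIL system.

```python
import math

# Assumed speed of sound in air at roughly 20 degrees Celsius, in metres per second.
SPEED_OF_SOUND = 343.0


def steering_delays(num_elements, spacing_m, angle_deg):
    """Return per-element firing delays in seconds for a uniform linear array.

    The delays steer the main lobe toward angle_deg, measured from broadside
    (0 degrees points straight ahead of the array). This is plain phased-array
    geometry, not a model of any specific ultrasound product.
    """
    # Extra travel time between neighbouring elements for the chosen direction.
    step = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    raw = [n * step for n in range(num_elements)]
    # Shift so that every delay is non-negative (hardware cannot fire "early").
    offset = min(raw)
    return [d - offset for d in raw]


if __name__ == "__main__":
    # Hypothetical array: 8 emitters, 1 cm apart, listener tracked at 30 degrees.
    for index, delay in enumerate(steering_delays(8, 0.01, 30.0)):
        print(f"element {index}: {delay * 1e6:.2f} microseconds")
```

A tracker that follows the listener would call such a function again whenever the estimated angle changes, which is one way to read "Keep Language Beam on Target".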

  29. Seeing Personal Translations • Technology: Heads-up Display Goggles – Create Translation Goggles – Run Real-Time Simultaneous Translation of Speech – Text is Projected into Field of View of Listener – Translations are Seen as Text Captions Under Speaker – Output: Spanish, German,…

  30. Silent Speech based on EMG Signals

  31. Human-Human Support Services – Connector • Connects people through the right device at the right moment – Meeting Browser • Create Corporate Memory of Events – Memory Jog • Unobtrusive service. Helps meeting attendees with information • Provides pertinent information at the right time (proactive/reactive) • Lecture Tracking and Memory – Relational Report • Informs the current speaker about interest/boredom of audience • Coaches Meetings to be More Effective – Socially Supportive Workspaces • Physically shared infrastructure aimed at fostering collaboration – Cross-Lingual Communication Services • Detect Language Need and Deliver Services Unobtrusively – … (and more)

  32. Multilingual Communication

  33. Motivation • Dilemma: – Living in the Global Village • Globalization, Global Markets • Increased Exchange and Communication • European Integration – Cultural Diversity: • Beauty, Identity, Language, Culture, Customs • Pride and Individualism – Challenge: • Providing Access to Global Markets and Opportunities while Maintaining Cultural Diversity • Can Technology Provide Solutions?

  34. The Grand Challenge • A World without Linguistic Borders • Dimensions of the Problem: – Overcoming Performance Limitations • Noise, Errors, Disfluencies – Expanding Domains and Scope • Hotel Reservation → Broadcast News, Lectures – Providing Suitable Access and Delivery • Mobile or Stationary Use • Modality → Speech, Image, • Natural Interaction → Human Factors/Devices – The Portability Problem • DARPA: 3 Languages • InterACT: 20 Languages • Speech and Language Companies: <40 Languages • Total World Languages: ~6,000

  35. Fieldable Domain-Limited Speech Translation Fieldable Systems: PDA Speech Translators – Tourism • Conferences • Business • Olympics – Humanitarian • Refugee Registration • First Responder • Healthcare – USA, Latino Population – Europe, Expansion – Third World – Government • Peace Keeping, Police

  36. Image Translation Pocket Translator of Foreign Signs (Mobile Technologies, LLC Pittsburgh)

  37. Missing Science. Problem 1: Domain Limitation. Current systems cannot handle: – TV/Radio Broadcast Translation – Translation of Lectures and Speeches – Parliamentary Speeches (UN, EU, ...) – Telephone Conversations – Meeting Translation 你们的评估准则是什么

  38. Language Support: …what is he saying? 你们的评估准则是什么 (“What are your evaluation criteria?”)

  39. Translation of Speeches

  40. Translation of Speeches • Technical Challenges: – Open Domain, Open Vocab, Open Speaking Style – No Sentence Markers/Boundaries – Too Complex to Program Rules – Reasonable Speaking Style, Prepared Speeches, Reasonable Acoustics • How it is Done: – Statistical Learning Algorithms – Learn Speech and Translation Mappings from Large Example Corpora
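
The slide names statistical learning from example corpora but not a specific model. As a hedged illustration of what "learning translation mappings" can mean, the sketch below trains IBM Model 1, a classic word-translation model, with expectation-maximization on a toy three-sentence corpus. The choice of IBM Model 1, the function name, and the corpus are assumptions for illustration, not the method used in the talk.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=20):
    """Estimate word translation probabilities t[(s, e)] = P(s | e) with EM.

    pairs is a list of (source_words, target_words) sentence pairs.
    No NULL word is used, to keep the sketch short.
    """
    source_vocab = {s for source, _ in pairs for s in source}
    uniform = 1.0 / len(source_vocab)
    # Start from a uniform distribution over source words.
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of s aligned to e
        total = defaultdict(float)  # expected count of e being aligned at all

        # E-step: distribute each source word over the target words of its pair.
        for source, target in pairs:
            for s in source:
                norm = sum(t[(s, e)] for e in target)
                for e in target:
                    fraction = t[(s, e)] / norm
                    count[(s, e)] += fraction
                    total[e] += fraction

        # M-step: renormalize expected counts into probabilities.
        for (s, e), c in count.items():
            t[(s, e)] = c / total[e]
    return t


if __name__ == "__main__":
    # Toy German-English corpus, chosen only for illustration.
    corpus = [
        ("das Haus".split(), "the house".split()),
        ("das Buch".split(), "the book".split()),
        ("ein Buch".split(), "a book".split()),
    ]
    table = train_ibm_model1(corpus)
    for pair in [("das", "the"), ("Haus", "house"), ("Buch", "book"), ("ein", "a")]:
        print(pair, round(table[pair], 3))
```

No dictionary or hand-written rule is supplied: the word pairings emerge only from co-occurrence across sentence pairs, which is the property the slide contrasts with "Too Complex to Program Rules".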

  41. Progress TC-STAR [Figure: BLEU score (axis 0 to 60) by year, 2004 to 2007, for EPPS S2E, CORTES S2E and EPPS E2S; panels: Speech Recognition [WER], Machine Translation [BLEU]]
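
The speech recognition panel on this slide is labelled in WER (word error rate). As a reminder of what that metric measures, here is a minimal sketch that computes WER as the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the reference length. It assumes plain whitespace tokenization and no text normalization, which official evaluation tools handle more carefully; the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower WER is better, while higher BLEU is better, so the two panels of the figure move in opposite directions when systems improve.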

  42. Human vs. Machine Performance

  43. Translation of Lectures
