Speech Processing 11-492/18-495 - PowerPoint PPT Presentation


  1. Speech Processing 11-492/18-495
     Spoken Dialog Systems
     Conversing with machines

  2. Spoken Dialog Systems
     - Not just ASR bolted onto TTS
     - Different styles of interaction
       - Question/response systems
       - Mixed initiative systems
       - “How May I Help You?” open questions
       - True conversational machine-human interaction

  3. SDS Overview
     - Introduction
     - Building simple dialog systems
     - VoiceXML
       - A language for writing systems
     - Beyond tree-based systems
     - Beyond spoken language
     - Non-task-oriented systems
     - Real-world deployment considerations

  4. SDS Applications
     - Information giving/request
       - Flights, buses, stocks and weather
       - Driving directions
       - Answer questions, news
     - Transactional
       - Reply to your email
       - Credit card and bank enquiries, product purchase
     - Maintenance
       - Technical support
       - Customer service

  5. SDS Applications (continued)
     - Entertainment
       - Game characters (NPC), toys, robots
     - Tutoring
       - Math, science
       - Language learning
     - Health care
       - Depression screening
       - Aphasia therapy

  6. Dialog Types
     - System initiative
       - Form-filling paradigm
       - Can switch language models at each turn
       - Can “know” what is likely to be said
     - Mixed initiative
       - Users can go where they like
       - System or user can lead the discussion
     - Classifying
       - Users can say what they like
       - But really only “N” operations possible
       - E.g. AT&T's “How may I help you?”

  7. System Initiative
     - Most common
     - Machine controls the call
     - Few choices in the dialog
     - Simple form filling:
       - What is your bank account number?
     - Advantages:
       - You know what users will say (sort of)
       - Hard for user to get confused
       - Hard for system to get confused
       - Easy to build
     - Disadvantages:
       - Limited flexibility in interaction
       - Fixed dialog structure
     - Most reliable, but many turns
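
The form-filling idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the course's implementation: the slot names, prompts, and regular expressions are hypothetical, and each regex stands in for the per-turn language model that a system-initiative design can switch to because it knows roughly what the user will say.

```python
import re

# Hypothetical form: each slot has a fixed prompt and its own tiny "grammar"
# (a regex standing in for a per-turn language model).
FORM = [
    ("account_number", "What is your bank account number?", re.compile(r"\d{8}")),
    ("amount", "How much would you like to transfer?", re.compile(r"\d+(?:\.\d{2})?")),
]

def run_form(answers):
    """Walk the slots in fixed order; re-prompt until the answer matches."""
    answers = iter(answers)
    filled, transcript = {}, []
    for slot, prompt, grammar in FORM:
        while slot not in filled:
            transcript.append(prompt)
            reply = next(answers)  # stands in for ASR output
            match = grammar.search(reply)
            if match:
                filled[slot] = match.group()
    return filled, transcript
```

Calling `run_form(["um what", "it is 12345678", "50.00 dollars"])` repeats the first prompt once before moving on, which shows the trade-off listed above: the fixed structure is reliable but costs extra turns.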

  8. System Initiative: Examples
     - Let’s Go Bus Information
       - 412 268 3526 (Anytime)
       - Provides bus information for Pittsburgh
     - Tell Me
       - Company getting others to build systems
       - Stocks, weather, entertainment

  9. Mixed Initiative
     - User or system takes initiative
     - More interesting dialogs
     - “Jump” through different parts of dialog state
     - Advantages
       - More realistic dialog
       - Can do more complex tasks
     - Disadvantages
       - Can get confusing
       - Can miss important parts
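
A minimal sketch of the mixed-initiative idea, assuming a toy travel task: the user may fill any slot in any turn, and the system only takes the lead when something is still missing. The slot names, city list, and prompts are hypothetical and chosen for illustration only.

```python
import re

# Hypothetical slot patterns for a toy travel task.
SLOT_PATTERNS = {
    "destination": re.compile(r"\bto (Boston|Pittsburgh|Miami)\b", re.IGNORECASE),
    "day": re.compile(r"\b(today|tomorrow)\b", re.IGNORECASE),
}
PROMPTS = {
    "destination": "Where would you like to go?",
    "day": "When would you like to travel?",
}

def update_state(state, utterance):
    """Fill every slot the utterance mentions, whatever the system last asked."""
    new_state = dict(state)
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            new_state[slot] = match.group(1).lower()
    return new_state

def next_prompt(state):
    """Ask for the first missing slot, or return None when nothing is missing."""
    for slot in SLOT_PATTERNS:
        if slot not in state:
            return PROMPTS[slot]
    return None
```

Because `update_state` accepts any slot at any time, a user who answers "tomorrow please" to the opening prompt jumps straight to the date, and the system comes back for the destination afterwards.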

  10. Classification Dialogs
      - Sort out from N things
      - User says “anything” and system directs them
      - Receptionist
        - I have a problem with my bill
        - What’s the area code for Miami
        - Did you know I can see the beach from here
      - Advantages
        - (Apparently) complex understanding
        - Solves a very common task
      - Disadvantages
        - Actually quite restrictive
        - Needs data to train from
        - Needs to be updated
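
As a rough illustration of sorting an open utterance into one of N routes, the sketch below counts word overlap with a handful of labeled examples. It is a stand-in for whatever trained classifier a deployed system would use; the training utterances, route names, stopword list, and the `operator` fallback are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical training utterances; a deployed router would learn from real call data.
TRAINING = [
    ("I have a problem with my bill", "billing"),
    ("there is a charge on my bill I do not recognize", "billing"),
    ("what's the area code for Miami", "directory"),
    ("I need the area code for Boston", "directory"),
]
STOPWORDS = {"i", "a", "the", "my", "for", "is", "on", "do", "not",
             "with", "have", "there", "need"}

def tokens(text):
    """Lowercase, drop question marks, and remove stopwords."""
    return [w for w in text.lower().replace("?", "").split() if w not in STOPWORDS]

def train(examples):
    """Count content words per route."""
    counts = defaultdict(Counter)
    for text, route_name in examples:
        counts[route_name].update(tokens(text))
    return counts

def route(counts, utterance, fallback="operator"):
    """Pick the route whose training words overlap the utterance most."""
    words = tokens(utterance)
    scores = {r: sum(c[w] for w in words) for r, c in counts.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else fallback
```

An utterance with no overlap, such as the beach remark on the slide, falls through to the fallback route. The sketch also makes the disadvantages concrete: the router knows only the N routes and the vocabulary it was trained on, so it needs data and periodic updates.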

  11. Beyond Telephones
      - Telematics
        - Voice communication in cars
        - GPS, music selection etc.
      - Web-based dialog systems
      - Robot Interaction
        - Robot-robot and robot-human interaction
      - Animated talking head
        - Non-player characters, web agents
      - Speech to Speech translation
      - CMU Dialport: integrating many dialog systems

  12. Team Talk
      - Using speech to control multiple robots
      - Robots have names and distinct voices
      - They report to each other and to you in voice

  13. Other SDS
      - Microsoft: Situated Interaction
        - Talking Head that follows you
      - CMU SV: Aidas
        - Restaurant recommendations in situ

  14. True conversation
      - Requires more than just speech
        - Non-verbal noises: laughing, er, um, etc.
        - Eye gaze
        - Proper timing (not waiting 500ms before speaker)
        - Back-channeling
        - Movement
        - Talking about nothing

  15. Roboreceptionist
      - Entrance to NSH
      - Keyboard (no ASR)
      - TTS, face, movement
      - Range finder to detect people
      - Significant background character
      - Mostly talks about nothing

  16. Personal Intelligent Systems
      - Example: Apple Siri, Google Now, Microsoft Cortana, Amazon Echo, etc.
      - Hub of all applications
      - Extendable
      - Personalization
      - Cross-Language
      - Cross-Cultural
      - Future: interface -> true companion

  17. Speech Processing 11-492/18-495
      Spoken Dialog Systems
      SDS components

  18. Spoken Dialog Systems
      - More than just ASR and TTS
        - Recognition
        - Language understanding
        - Manipulation of utterances
        - Generation of new information
        - Text generation
        - Synthesis

  19. SDS Architecture
      [Figure: architecture diagram. Component labels: ASR, Language Understanding, Dialog Manager, Language Generation, Synthesis, Non-Understanding, Error Handling Strategies]

  20. SDS Internals
      - Language Understanding
        - From words to structure
      - Dialog Manager
        - State of dialog (who is talking)
        - Direction of dialog (what next)
        - References, user profile etc.
        - Interaction with database/internet
      - Language Generation
        - From structure to words
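
The three internal components can be sketched as three small functions passing a frame between them. This is a minimal illustration under stated assumptions, not the architecture of any particular system: the `TIMETABLE` dictionary is a hypothetical stand-in for the database, and the slot names, dialog acts, and wording are invented for the example.

```python
# Hypothetical timetable standing in for the database the dialog manager queries.
TIMETABLE = {("boston", "tomorrow"): "9:15 am"}

def understand(words):
    """Language understanding: from words to structure (a flat frame)."""
    frame = {}
    for w in words.lower().split():
        w = w.strip(",.?!")
        if w in {"boston", "miami"}:
            frame["destination"] = w
        elif w in {"today", "tomorrow"}:
            frame["day"] = w
    return frame

def manage(state, frame):
    """Dialog manager: merge the frame into the state and decide what comes next."""
    state = {**state, **frame}
    for slot in ("destination", "day"):
        if slot not in state:
            return state, {"act": "request", "slot": slot}
    departure = TIMETABLE.get((state["destination"], state["day"]))
    if departure is None:
        return state, {"act": "no_result"}
    return state, {"act": "inform", "time": departure, **state}

def generate(action):
    """Language generation: from structure to words."""
    if action["act"] == "request":
        return f"What {action['slot']} would you like?"
    if action["act"] == "inform":
        return (f"The first departure to {action['destination'].title()} "
                f"{action['day']} is at {action['time']}.")
    return "Sorry, I could not find a matching departure."
```

Keeping the dialog state outside the functions means the manager can be tested turn by turn: feed it one frame, inspect the returned state and action, then feed it the next.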

  21. Language Understanding
      - Parsing of SPEECH not TEXT
        - Eh, I wanna go, wanna go to Boston tomorrow
        - If it's not too much trouble I’d be very grateful if one might be able to aid me in arranging my travel arrangements to Boston, Logan airport, at sometime tomorrow morning, thank you.
        - Boston, tomorrow
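
One simple way to cope with spoken input like the examples above is concept spotting: ignore fillers, restarts, and politeness, and pull out only the words that carry task content. The sketch below assumes small hypothetical vocabularies of cities and days; a real system would more likely use a robust grammar or a trained parser.

```python
import re

# Hypothetical vocabularies; a real system would use a grammar or trained parser.
CITIES = ("Boston", "Pittsburgh", "Miami")
DAYS = ("today", "tomorrow")

def spot_concepts(utterance):
    """Return (city, day), ignoring fillers, restarts, and politeness."""
    city = next((c for c in CITIES
                 if re.search(rf"\b{c}\b", utterance, re.IGNORECASE)), None)
    day = next((d for d in DAYS
                if re.search(rf"\b{d}\b", utterance, re.IGNORECASE)), None)
    return city, day
```

All three utterances on the slide, from the disfluent restart to the very polite request to the terse "Boston, tomorrow", reduce to the same pair under this approach.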
