Speech Processing 15-492/18-492: Spoken Dialog Systems, Conversing with machines

  1. Speech Processing 15-492/18-492: Spoken Dialog Systems. Conversing with machines.

  2. Spoken Dialog Systems
     - Not just ASR bolted onto TTS
     - Different styles of interaction
       - IVR/Tree question/response systems
       - Mixed initiative systems
       - “How May I Help You?” open questions
       - True conversational machine-human interaction
       - Strings of characters to words

  3. SDS Overview
     - Introduction
     - Building simple dialog systems
     - VoiceXML
       - A language for writing systems
     - Beyond tree-based systems
       - CMU’s Olympus systems
     - Real-world deployment considerations

  4. SDS Applications
     - Information giving
       - Flights, buses, stocks, weather
       - Driving directions
       - News
     - Information navigators
       - Read your mail
       - Search the web
       - Answer questions
     - Provide personalities
       - Game characters (NPC), toys, robots
     - Speech-to-speech translation
       - Cross-lingual interaction

  5. Dialog Types
     - System initiative
       - Form-filling paradigm
       - Can switch language models at each turn
       - Can “know” which is likely to be said
     - Mixed initiative
       - Users can go where they like
       - System or user can lead the discussion
     - Classifying:
       - Users can say what they like
       - But really only “N” operations possible
       - E.g. AT&T “How may I help you?”

  6. System Initiative
     - Most common
     - Machine controls the call
       - Few choices in the dialog
     - Simple form filling:
       - What is your bank account number
     - Advantages:
       - You know what users will say (sort of)
       - Hard for user to get confused
       - Hard for system to get confused
       - Easy to build
     - Disadvantages:
       - Limited flexibility in interaction
       - Fixed dialog structure
     - Most reliable, but many turns
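
The form-filling pattern on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the slot names, prompts, and regexes are all hypothetical, and each regex stands in for the per-turn grammar or language model that a real system would switch to when it asks a given question.

```python
import re

# Hypothetical form: each slot has a fixed prompt and its own tiny "grammar"
# (a regex standing in for the language model used at that turn).
FORM = [
    ("account_number", "What is your bank account number?", re.compile(r"\b\d{8}\b")),
    ("amount", "How much would you like to transfer?", re.compile(r"\d+(?:\.\d{2})?")),
]


def run_form(get_user_input, say=print):
    """Ask for each slot in a fixed order; re-prompt until the reply parses."""
    filled = {}
    for slot, prompt, grammar in FORM:
        while slot not in filled:
            say(prompt)
            reply = get_user_input()
            match = grammar.search(reply)
            if match:
                filled[slot] = match.group(0)
            else:
                say("Sorry, I did not understand.")
    return filled


if __name__ == "__main__":
    # Scripted replies stand in for ASR output so the sketch runs unattended.
    scripted = iter(["um, what?", "it is 12345678", "50.00"])
    print(run_form(lambda: next(scripted)))
```

The machine controls every turn, so the dialog is easy to build and hard to derail, but the user cannot answer out of order or supply two values at once, which is the "many turns" cost noted above.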

  7. System Initiative
     - Let’s Go Bus Information
       - 412 442 2000 (Evenings)
       - Provides bus information for Pittsburgh East End (61x 5[469]x)
     - Tell Me
       - Company getting others to build systems
       - Stocks, weather, entertainment
       - 1 800 555 8355

  8. Mixed Initiative
     - User or system takes initiative
     - More interesting dialogs
     - “jump” through different parts of dialog state
     - Advantages
       - More realistic dialog
       - Can do more complex tasks
     - Disadvantages
       - Can get confusing
       - Can miss important parts
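
One way to picture mixed initiative is a dialog state that the user may fill or overwrite in any order, with the system prompting only for whatever is still missing. The Python sketch below assumes a hypothetical bus-information task; the stop names, slot names, and regex patterns are invented for illustration and stand in for real ASR and parsing.

```python
import re

# Hypothetical stops and slot patterns; a real system would use an ASR
# language model and a parser rather than regexes.
STOPS = "oakland|downtown|airport"
SLOT_PATTERNS = {
    "origin": re.compile(rf"\bfrom (?:the )?({STOPS})\b"),
    "destination": re.compile(rf"\bto (?:the )?({STOPS})\b"),
    "time": re.compile(r"\bat (\d{1,2}(?::\d{2})?\s?(?:am|pm))\b"),
}
PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "time": "When do you want to leave?",
}


def update_state(state, utterance):
    """Fill or overwrite every slot mentioned in the utterance, in any order."""
    new_state = dict(state)
    text = utterance.lower()
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            new_state[slot] = match.group(1)
    return new_state


def next_prompt(state):
    """The system takes the initiative only for slots the user has not given."""
    for slot in SLOT_PATTERNS:
        if slot not in state:
            return PROMPTS[slot]
    return None  # all slots filled


if __name__ == "__main__":
    state = update_state({}, "I want to go to the airport at 5 pm")
    print(state, "->", next_prompt(state))
```

Because any utterance may touch any slot, the user can "jump" back and change the destination late in the dialog. The same freedom is why such systems can get confusing: nothing forces the user through every required step unless the system keeps checking for missing slots.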

  9. Vera

  10. Classification Dialogs
      - Sort out from N things
      - User says “anything” and system directs them
      - Receptionist
        - I have a problem with my bill
        - What’s the area code for Miami
        - Did you know I can see the beach from here
      - Advantages
        - (Apparently) complex understanding
        - Solves a very common task
      - Disadvantages
        - Actually quite restrictive
        - Needs data to train from
        - Needs to be updated
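
A classification dialog can be sketched as a text classifier that maps whatever the caller says onto one of N destinations. The Python example below uses a tiny bag-of-words naive Bayes model with add-one smoothing; this is an assumed technique chosen for brevity, not necessarily what any deployed system uses, and the training utterances and destination labels are made up.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (utterance, destination). A deployed system
# would need many real caller utterances per destination.
TRAINING = [
    ("i have a problem with my bill", "billing"),
    ("there is a charge on my bill i do not understand", "billing"),
    ("what is the area code for miami", "directory"),
    ("i need the number for a pizza place", "directory"),
    ("my phone line is dead", "repair"),
    ("there is no dial tone on my phone", "repair"),
]


def train(examples):
    """Count words per destination and how often each destination occurs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab


def classify(utterance, model):
    """Return the destination with the highest naive Bayes log score."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in utterance.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TRAINING)
    print(classify("I have a problem with my bill", model))
```

Note that an off-topic utterance such as "Did you know I can see the beach from here" is still forced into one of the N destinations, which illustrates why the approach is more restrictive than it appears and why it depends on training data that must be kept up to date.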

  11. Beyond Telephones
      - Telematics
        - Voice communication in cars
        - GPS, music selection, etc.
      - Robot Interaction
        - Robot-robot and robot-human interaction
      - Animated talking head
        - Non-player characters, web agents
      - Speech to Speech translation

  12. Team Talk
      - Using speech to control multiple robots
        - Robots have names and distinct voices
        - They report to each other and to you in voice

  13. USI
      - Having lots of different interfaces is confusing
        - Try to have general expectations and discover
      - Try for some level of standardization
        - (like programming applications: file menu)

  14. True conversation
      - Requires more than just speech
        - Non-verbal noises: laughing, er, um, etc.
        - Eye gaze
        - Proper timing (not waiting 500ms before speaking)
        - Back-channeling
        - Movement
        - Talking about nothing

  15. Roboreceptionist
      - Entrance to NSH
        - Keyboard (no ASR)
        - TTS, face, movement
        - Range finder to detect people
        - Significant background character
      - Mostly talks about nothing
