Speech Processing 15-492/18-492 Spoken Dialog Systems Conversing with machines
Spoken Dialog Systems
- Not just ASR bolted onto TTS
- Different styles of interaction
  - IVR/Tree question/response systems
  - Mixed initiative systems
  - “How May I Help You?” open questions
  - True conversational machine-human interaction
  - Strings of characters to words
SDS Overview
- Introduction
- Building simple dialog systems
- VoiceXML
  - A language for writing systems
- Beyond tree-based systems
  - CMU’s Olympus systems
- Real-world deployment considerations
SDS Applications
- Information giving
  - Flights, buses, stocks, weather
  - Driving directions
  - News
- Information navigators
  - Read your mail
  - Search the web
  - Answer questions
- Provide personalities
  - Game characters (NPC), toys, robots
- Speech-to-speech translation
  - Cross-lingual interaction
Dialog Types
- System initiative
  - Form-filling paradigm
  - Can switch language models at each turn
  - Can “know” what is likely to be said
- Mixed initiative
  - Users can go where they like
  - System or user can lead the discussion
- Classifying
  - Users can say what they like
  - But really only “N” operations possible
  - E.g. AT&T “How may I help you?”
System Initiative
- Most common
- Machine controls the call
  - Few choices in the dialog
- Simple form filling:
  - What is your bank account number?
- Advantages:
  - You know what users will say (sort of)
  - Hard for user to get confused
  - Hard for system to get confused
  - Easy to build
- Disadvantages:
  - Limited flexibility in interaction
  - Fixed dialog structure
- Most reliable, but many turns
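The form-filling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slot names, prompts, and digit-length validators are hypothetical, and typed strings stand in for ASR output. The system asks for one slot at a time in a fixed order and re-asks when the reply does not fit what it expects.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical form: (slot name, prompt, validator for the reply).
FORM: List[Tuple[str, str, Callable[[str], bool]]] = [
    ("account_number", "What is your bank account number?",
     lambda s: s.isdigit() and len(s) == 8),
    ("pin", "What is your PIN?",
     lambda s: s.isdigit() and len(s) == 4),
]


def run_form(replies: List[str], max_retries: int = 2) -> Tuple[Dict[str, str], int]:
    """Fill each slot in fixed order; return the filled slots and turns used."""
    pending = iter(replies)
    filled: Dict[str, str] = {}
    turns = 0
    for slot, prompt, is_valid in FORM:
        for _ in range(max_retries + 1):
            # A real system would speak `prompt` here and run ASR on the answer.
            reply = next(pending, None)
            if reply is None:
                return filled, turns  # no more input (caller hung up)
            turns += 1
            cleaned = reply.replace(" ", "")
            if is_valid(cleaned):
                filled[slot] = cleaned
                break
        else:
            return filled, turns  # give up on this slot after repeated failures
    return filled, turns
```

Because the system always knows which slot it is asking about, it can apply a narrow validator (or, in a real recognizer, a narrow language model) at each turn. The cost is visible in the turn count: every slot takes at least one turn, and every misunderstanding adds another.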
System Initiative
- Let’s Go Bus Information
  - 412 442 2000 (Evenings)
  - Provides bus information for Pittsburgh East End (61x 5[469]x)
- Tell Me
  - Company getting others to build systems
  - Stocks, weather, entertainment
  - 1 800 555 8355
Mixed Initiative
- User or system takes initiative
  - More interesting dialogs
  - “jump” through different parts of dialog state
- Advantages
  - More realistic dialog
  - Can do more complex tasks
- Disadvantages
  - Can get confusing
  - Can miss important parts
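A minimal Python sketch of the mixed-initiative idea, assuming a hypothetical bus-schedule task: the user may supply any slots in any order (or revise one later), and the system only takes the initiative to ask for whatever is still missing. The stop list, slot names, prompts, and regular expressions are all illustrative assumptions, not part of any real system.

```python
import re
from typing import Dict, Optional

# Hypothetical stop names; a real system would load these from its schedule data.
STOPS = ["oakland", "downtown", "squirrel hill"]
_STOP_ALT = "|".join(re.escape(s) for s in STOPS)

SLOT_PATTERNS = {
    "origin": re.compile(rf"\bfrom ({_STOP_ALT})\b"),
    "destination": re.compile(rf"\bto ({_STOP_ALT})\b"),
    "time": re.compile(r"\bat (\d{1,2}(?::\d{2})? ?(?:am|pm))\b"),
}

PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where do you want to go?",
    "time": "When do you want to leave?",
}


def update_state(state: Dict[str, str], utterance: str) -> Dict[str, str]:
    """Return a new state with every slot mentioned in the utterance filled in."""
    text = utterance.lower()
    new_state = dict(state)
    for slot, pattern in SLOT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            new_state[slot] = match.group(1)  # later mentions overwrite earlier ones
    return new_state


def next_prompt(state: Dict[str, str]) -> Optional[str]:
    """System initiative: ask for the first missing slot, or None when complete."""
    for slot in SLOT_PATTERNS:
        if slot not in state:
            return PROMPTS[slot]
    return None
```

For example, "I want to go from Oakland to downtown" fills two slots in one turn, so the system only has to ask for the time. The same mechanism lets the user jump back ("actually from Squirrel Hill"), which is also where the confusion noted above can creep in: the system must track what has changed.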
Vera
Classification Dialogs
- Sort out from N things
  - User says “anything” and system directs them
  - Receptionist
    - I have a problem with my bill
    - What’s the area code for Miami
    - Did you know I can see the beach from here
- Advantages
  - (Apparently) complex understanding
  - Solves a very common task
- Disadvantages
  - Actually quite restrictive
  - Needs data to train from
  - Needs to be updated
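The routing step can be illustrated with a toy Python classifier. This is a deliberately simple stand-in, not how a deployed system works: real call routers use statistical classifiers trained on many transcribed calls. Here the training utterances, the destination labels, and the `operator` fallback are all assumed for illustration. Each destination is scored by how many of its distinctive training words appear in the utterance, and low-scoring utterances fall through to a human.

```python
import re
from typing import Dict, List, Set

# Hypothetical training data: a few example utterances per destination.
TRAINING: Dict[str, List[str]] = {
    "billing": [
        "I have a problem with my bill",
        "There is a wrong charge on my bill",
        "I have a question about my payment",
    ],
    "directory": [
        "What's the area code for Miami",
        "What is the area code for Boston",
        "I need a phone number",
    ],
}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def train(examples: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """For each label, keep only the words that no other label uses."""
    vocab = {label: {w for u in utts for w in tokenize(u)}
             for label, utts in examples.items()}
    model: Dict[str, Set[str]] = {}
    for label, words in vocab.items():
        others = set().union(*(v for l, v in vocab.items() if l != label))
        model[label] = words - others
    return model


def route(utterance: str, model: Dict[str, Set[str]],
          min_score: int = 2, fallback: str = "operator") -> str:
    """Pick the label with the most distinctive-word hits, else fall back."""
    words = set(tokenize(utterance))
    scores = {label: len(words & cues) for label, cues in model.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else fallback
```

With this data, the bill complaint routes to `billing`, the area-code question routes to `directory`, and the remark about the beach falls below the threshold and goes to `operator`. The sketch also shows the disadvantages listed above: the router can only ever pick one of N destinations, it knows nothing beyond its training examples, and it has to be retrained when the services change.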
Beyond Telephones
- Telematics
  - Voice communication in cars
  - GPS, music selection, etc.
- Robot Interaction
  - Robot-robot and robot-human interaction
- Animated talking head
  - Non-player characters, web agents
- Speech to Speech translation
Team Talk
- Using speech to control multiple robots
  - Robots have names and distinct voices
  - They report to each other and to you in voice
USI
- Lots of different interfaces are confusing
  - Try to have general expectations and discover
- Try for some level of standardization
  - (like programming applications: file menu)
True conversation
- Requires more than just speech
  - Non-verbal noises: laughing, er, um, etc.
  - Eye gaze
  - Proper timing (not waiting 500ms before speaking)
  - Back-channeling
  - Movement
  - Talking about nothing
Roboreceptionist
- Entrance to NSH
  - Keyboard (no ASR)
  - TTS, face, movement
  - Range finder to detect people
  - Significant background character
- Mostly talks about nothing