Speech Processing 15-492/18-492 Spoken Dialog Systems Tree based



  1. Speech Processing 15-492/18-492: Spoken Dialog Systems
     - Tree based dialogs
     - VoiceXML

  2. State-based Dialogs
     - Simple state-based dialog systems
       - Get Name
       - Get Account number
       - Get Pin
       - Present balance
       - Go back to start or exit
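
The five states on this slide can be sketched as a small state machine. This is a minimal illustration, not the course's implementation: the state names, the `TRANSITIONS` table, and the `"start"`/`"exit"` answers are assumptions made for the example.

```python
# Hypothetical state names taken from the slide's bullet list.
TRANSITIONS = {
    "get_name": "get_account_number",
    "get_account_number": "get_pin",
    "get_pin": "present_balance",
    "present_balance": "restart_or_exit",
}


def next_state(state, user_input=None):
    """Return the state that follows `state`, or None when the call ends."""
    if state == "restart_or_exit":
        # Assumption: the caller says "start" to loop back, anything else exits.
        return "get_name" if user_input == "start" else None
    return TRANSITIONS[state]


def walk(restart_answers=()):
    """Visit states in order; `restart_answers` feeds the final question."""
    answers = iter(restart_answers)
    state = "get_name"
    visited = []
    while state is not None:
        visited.append(state)
        answer = next(answers, "exit") if state == "restart_or_exit" else None
        state = next_state(state, answer)
    return visited


print(walk())
```

Calling `walk(["start"])` runs through the five states twice, which shows how quickly the number of turns grows when a caller loops back.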

  3. State-based Dialogs
     - Get Name:
       - What is your name?
         - ASR Name
         - May be correct (in the database)
         - May be unknown (not in database)
         - May not be name (What do I say?/Help/Repeat)
         - Should you echo the recognized name?
         - Confirmation (or not)

  4. State-based dialog
     - Get name
       - Check in database
       - Ask again if not
       - Deal with help
     - Get account number
       - Check in database (with name)
     - Confirm account number and name
       - For security
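
The Get Name handling described above (check the database, ask again if the name is unknown, deal with help requests) might look roughly like this. The `DATABASE` contents, the `HELP_WORDS` set, and the prompt wording are all invented for illustration; the example also chooses to echo the recognized name, which the slides leave as an open design question.

```python
# Hypothetical customer database: name -> account number.
DATABASE = {"alice": "1234", "bob": "5678"}

# Assumed non-name utterances, based on "What do I say?/Help/Repeat".
HELP_WORDS = {"help", "repeat", "what do i say"}


def handle_get_name(recognized, database=DATABASE):
    """Classify one ASR result for the Get Name state.

    Returns (next_state, prompt).
    """
    text = recognized.strip().lower().rstrip("?.!")
    if text in HELP_WORDS:
        # Not a name at all: stay in this state and give guidance.
        return "get_name", "Please say your name, for example Alice."
    if text in database:
        # Known name: echo it and move on to the account number.
        return "get_account_number", f"Thanks, {text.title()}. What is your account number?"
    # Unknown name: ask again.
    return "get_name", "Sorry, I did not find that name. What is your name?"


print(handle_get_name("Alice"))
print(handle_get_name("What do I say?"))
```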

  5. State-based Interaction
     - Trees can get very large
       - User can get lost easily
     - You want to minimize the number of turns
       - Faster throughput means more calls
       - Faster throughput means happier customer

  6. The level of help
     - First time users *need* a successful call
       - Otherwise, they won't call back
       - Having very helpful prompts is good
         - At start; gets annoying quickly
     - Designing prompts is a craft
       - What should you say that is understood
       - How much should you tailor it to the user

  7. VoiceXML
     - A W3C standard for voice browsing
       - XML based “programming” language for speech
       - Output synthesized (and recorded) speech
       - Recognition of speech and DTMF
       - Recording of spoken input
       - Telephony features

  8. VoiceXML
     - ASR
       - From Grammars (JSGF)
       - From tri-grams
       - From “Domain Managers”
         - Credit card numbers
         - City, State

  9. VoiceXML
     - TTS
       - <ssml> markup
       - Choice of voice
       - Choice of language
       - Choice of how to pronounce things
       - Specify breaks, timing, emphasis

  10. Structure

          <vxml version="1.0">
            <meta name="author" content="John Doe"/>
            <var name="hi" expr="'Hello World!'"/>
            <form>
              <block>
                <value expr="hi"/>
                <goto next="#say_goodbye"/>
              </block>
            </form>
            <form id="say_goodbye">
              <block>
                Goodbye!
              </block>
            </form>
          </vxml>
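
One way to see the structure of the document above is to parse it with a generic XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; it is not a VoiceXML interpreter, only a check that every local `<goto>` target names a `<form>` in the same document. The variable names are chosen for the example.

```python
import xml.etree.ElementTree as ET

# The slide's example document, reproduced as a string.
VXML = """<vxml version="1.0">
  <meta name="author" content="John Doe"/>
  <var name="hi" expr="'Hello World!'"/>
  <form>
    <block>
      <value expr="hi"/>
      <goto next="#say_goodbye"/>
    </block>
  </form>
  <form id="say_goodbye">
    <block>Goodbye!</block>
  </form>
</vxml>"""

root = ET.fromstring(VXML)

# Collect the id of every form (None for the anonymous first form).
form_ids = [form.get("id") for form in root.iter("form")]

# Collect every goto target.
goto_targets = [goto.get("next") for goto in root.iter("goto")]

# A local target "#name" should match a form id in this document.
dangling = [t for t in goto_targets if t.startswith("#") and t[1:] not in form_ids]

print(form_ids, goto_targets, dangling)
```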

  11. Basic Tags
      - <form id="xxxx">
        - <goto next="#xxx">
      - <field> gather info from user through speech or DTMF
      - <record> record user data
      - <subdialog> performs some sub dialog

  12. <field> tag

          <form id="getBusNumber">
            <field name="BusNumber">
              <prompt>Which bus line do you want?</prompt>
              <grammar src="grams/bus.gram"/>
              <help>Please say your desired bus number, e.g. 61C</help>
            </field>
          </form>

  13. Flow of Control
      - Goto

          <goto next="#GetBusNumber"/>
          <goto next="Trains.vxml"/>

      - If

          <if cond="BusNumber == '501'">
            <prompt>Sorry that bus no longer runs</prompt>
          <elseif cond="BusNumber == '56U'"/>
            <prompt>Sorry it'll be a long wait</prompt>
          <else/>
            <prompt>One will be along shortly</prompt>
          </if>
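
For readers more used to a general-purpose language, the conditional on this slide behaves like an ordinary if/else-if/else chain. The Python function below is only an equivalent sketch; `bus_prompt` is a name made up for the example, and the prompt strings are the ones from the slide.

```python
def bus_prompt(bus_number):
    """Mirror the <if>/<elseif>/<else> branches from the slide."""
    if bus_number == "501":
        return "Sorry that bus no longer runs"
    elif bus_number == "56U":
        return "Sorry it'll be a long wait"
    else:
        return "One will be along shortly"


print(bus_prompt("61C"))
```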

  14. Variables

          <var name="var1" expr="'hello'"/>
          <prompt>I just wanted to say <value expr="var1"/></prompt>
          <assign name="var1" expr="'goodbye'"/>

  15. Recognition Grammars
      - Speech Recognition Grammar Specification (SRGS)
      - Augmented BNF

          $order = I would like a $drink
          $drink = coke | pepsi | mountain_dew
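
To make the two rules above concrete, the sketch below encodes them as a Python dictionary and enumerates every sentence they generate. It is not an SRGS parser: the `GRAMMAR` layout and the `expand` and `accepts` helpers are assumptions for illustration, and the approach only works for grammars without recursion.

```python
from itertools import product

# Each rule maps to a list of alternatives; each alternative is a token list.
GRAMMAR = {
    "$order": [["I", "would", "like", "a", "$drink"]],
    "$drink": [["coke"], ["pepsi"], ["mountain_dew"]],
}


def expand(symbol, grammar=GRAMMAR):
    """Return every word sequence `symbol` can produce (no recursion support)."""
    if symbol not in grammar:
        return [[symbol]]  # a plain word expands to itself
    results = []
    for alternative in grammar[symbol]:
        parts = [expand(token, grammar) for token in alternative]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


def accepts(utterance, grammar=GRAMMAR):
    """True if the whitespace-split utterance is generated by $order."""
    return utterance.split() in expand("$order", grammar)


for sentence in expand("$order"):
    print(" ".join(sentence))
```

A recognizer constrained by such a grammar can only return one of these sentences, which is why small grammars tend to be more accurate than open-ended tri-gram models for narrow prompts.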

  16. VoiceXML Browsers
      - Compatibility
        - Not as compatible as one would like
        - <objects> can be different (but useful)
          - City, State recognizers
        - ECMAscript (Javascript)

  17. Beyond VoiceXML (in VoiceXML)
      - Mixing html/cgi scripts in VoiceXML
        - Use php to generate VoiceXML files
        - Use urls (with ?...) to calculate/get data
          - http://weather.com?zip=“15213”
        - Use urls to get waveforms
          - http://tts.com?text=“Hello World”
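
The slide mentions php for generating VoiceXML; the same idea can be sketched in Python. The helper names `data_url` and `vxml_page` are made up for this example, the parameter names `zip` and `text` come from the slide's illustrative URLs, and nothing here assumes those services actually accept such queries. No network request is made.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape


def data_url(base, **params):
    """Build a URL with a query string, percent-encoding the values."""
    return f"{base}?{urlencode(params)}"


def vxml_page(prompt_text):
    """Generate a one-form VoiceXML document that speaks `prompt_text`."""
    return (
        '<vxml version="1.0">\n'
        "  <form>\n"
        f"    <block>{escape(prompt_text)}</block>\n"
        "  </form>\n"
        "</vxml>"
    )


print(data_url("http://weather.com", zip="15213"))
print(data_url("http://tts.com", text="Hello World"))
print(vxml_page("Rain & wind expected today"))
```

Escaping the prompt text matters because dynamically fetched data may contain characters such as `&` or `<` that would otherwise produce a malformed document.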

  18. VoiceXML future
      - N-gram grammar Markup Language
        - Many browsers have their own extensions
      - Pronunciation Lexicon Markup Language
        - A way to add new items to the lexicon
        - Hard to find good standards
      - Call Control Markup Language
        - For management and logging of calls

  19. Microsoft SALT
      - SALT tags
        - Listen, DTMF, prompt, bind, grammar (plus ssml)
      - Designed for desktop, not just phone
      - Designed to be shared documents
        - Viewing (HTML) and Speech (SALT)

  20. Available Systems
      - Nuance
      - Be-vocal
      - Tell Me
        - Tell-me studio
      - OpenVXI/publicvoicexml.org
      - Many others

  21. SDS Architecture
