Speech Processing 15-492/18-492
Spoken Dialog Systems
Tree-based dialogs, VoiceXML
State-based Dialogs
- Simple state-based dialog systems
  - Get Name
  - Get Account number
  - Get PIN
  - Present balance
  - Go back to start or exit
State-based Dialogs
- Get Name:
  - What is your name?
  - ASR Name
    - May be correct (in the database)
    - May be unknown (not in database)
    - May not be a name (What do I say?/Help/Repeat)
    - Should you echo the recognized name?
    - Confirmation (or not)
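The three Get Name outcomes above can be sketched as a small classifier. This is a minimal illustration, not part of the original slides: the name list, the command list, and the function name `classify_name_result` are all assumptions made for the example.

```python
# Hypothetical data: these names and commands are illustrative assumptions.
KNOWN_NAMES = {"alice smith", "bob jones"}
COMMANDS = {"help", "repeat", "what do i say"}

def classify_name_result(hypothesis: str) -> str:
    """Sort an ASR hypothesis for the Get Name state into one of three cases."""
    text = hypothesis.strip().lower().rstrip("?")
    if text in COMMANDS:
        return "command"   # not a name: the caller asked for help or a repeat
    if text in KNOWN_NAMES:
        return "known"     # in the database: confirm it or move on
    return "unknown"       # not in the database: ask again
```

A real system would also use the recognizer's confidence score when deciding whether to echo and confirm the name; that is left out here.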
State-based dialog
- Get name
  - Check in database
  - Ask again if not
  - Deal with help
- Get account number
  - Check in database (with name)
- Confirm account number and name
  - For security
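The flow above can be sketched as an explicit state machine. This is a rough sketch under stated assumptions: the `ACCOUNTS` table, the prompt wording, and the function `run_dialog` are hypothetical, and a list of scripted answers stands in for real ASR results.

```python
# Hypothetical account records keyed by caller name (illustrative only).
ACCOUNTS = {"alice": {"account": "1234", "balance": "52 dollars"}}

def run_dialog(answers):
    """Walk the states Get Name -> Get Account -> Confirm -> Present Balance.

    `answers` stands in for ASR results, one per turn.
    Returns the list of prompts the system would speak.
    """
    answers = iter(answers)
    prompts = []
    state, name = "get_name", None
    while state != "done":
        if state == "get_name":
            prompts.append("What is your name?")
            reply = next(answers).lower()
            if reply == "help":
                prompts.append("Please say the name on your account.")
            elif reply in ACCOUNTS:
                name, state = reply, "get_account"
            # any other reply: stay in get_name and ask again
        elif state == "get_account":
            prompts.append("What is your account number?")
            reply = next(answers)
            if reply == ACCOUNTS[name]["account"]:  # checked together with the name
                state = "confirm"
        elif state == "confirm":
            prompts.append(f"{name}, account {ACCOUNTS[name]['account']}, is that right?")
            state = "present_balance" if next(answers).lower() == "yes" else "get_name"
        elif state == "present_balance":
            prompts.append(f"Your balance is {ACCOUNTS[name]['balance']}.")
            state = "done"
    return prompts
```

For example, `run_dialog(["help", "bob", "alice", "1234", "yes"])` asks for the name three times before moving on. The sketch raises `StopIteration` if the scripted answers run out, which a real system would replace with a timeout or hang-up handler.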
State-based Interaction
- Trees can get very large
  - User can get lost easily
- You want to minimize the number of turns
  - Faster throughput means more calls
  - Faster throughput means happier customers
The level of help
- First-time users *need* a successful call
  - Otherwise, they won't call back
  - Having very helpful prompts is good
    - At the start, but they get annoying quickly
- Designing prompts is a craft
  - What should you say so that it is understood?
  - How much should you tailor it to the user?
VoiceXML
- A W3C standard for voice browsing
  - XML-based "programming" language for speech
  - Output synthesized (and recorded) speech
  - Recognition of speech and DTMF
  - Recording of spoken input
  - Telephony features
VoiceXML
- ASR
  - From grammars (JSGF)
  - From tri-grams
  - From "Domain Managers"
    - Credit card numbers
    - City, State
VoiceXML
- TTS
  - <ssml> markup
  - Choice of voice
  - Choice of language
  - Choice of how to pronounce things
  - Specify breaks, timing, emphasis
Structure

    <vxml version="1.0">
      <meta name="author" content="John Doe"/>
      <var name="hi" expr="'Hello World!'"/>
      <form>
        <block>
          <value expr="hi"/>
          <goto next="#say_goodbye"/>
        </block>
      </form>
      <form id="say_goodbye">
        <block>
          Goodbye!
        </block>
      </form>
    </vxml>
Basic Tags
- <form id="xxxx">
  - <goto next="#xxx">
- <field> gather info from user through speech or DTMF
- <record> record user data
- <subdialog> performs some sub dialog
<field> tag

    <form id="getBusNumber">
      <field name="BusNumber">
        <prompt>Which bus line do you want?</prompt>
        <grammar src="grams/bus.gram"/>
        <help>Please say your desired bus number, e.g. 61C</help>
      </field>
    </form>
Flow of Control
- Goto

    <goto next="#GetBusNumber"/>
    <goto next="Trains.vxml"/>

- If

    <if cond="BusNumber == '501'">
      <prompt>Sorry that bus no longer runs</prompt>
    <elseif cond="BusNumber == '56U'"/>
      <prompt>Sorry it'll be a long wait</prompt>
    <else/>
      <prompt>One will be along shortly</prompt>
    </if>
Variables

    <var name="var1" expr="'hello'"/>
    <prompt>I just wanted to say <value expr="var1"/></prompt>
    <assign name="var1" expr="'goodbye'"/>
Recognition Grammars
- Speech Recognition Grammar Specification (SRGS)
- Augmented BNF

    $order = I would like a $drink
    $drink = coke | pepsi | mountain_dew
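To see what such a grammar accepts, the two rules above can be expanded into every sentence they cover. This is a toy sketch, not an SRGS parser: the dictionary encoding of the rules and the `expand` function are assumptions made for illustration.

```python
import itertools

# The two rules from the slide: rule name -> list of alternatives,
# each alternative a list of tokens. "$name" tokens refer to other rules.
GRAMMAR = {
    "$order": [["I", "would", "like", "a", "$drink"]],
    "$drink": [["coke"], ["pepsi"], ["mountain_dew"]],
}

def expand(symbol):
    """Return every word sequence the grammar accepts for `symbol`."""
    if symbol not in GRAMMAR:  # a plain word expands to itself
        return [[symbol]]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Expand each token, then combine the choices for every position.
        parts = [expand(token) for token in alternative]
        for combo in itertools.product(*parts):
            results.append([word for part in combo for word in part])
    return results

sentences = [" ".join(words) for words in expand("$order")]
```

Here `sentences` holds three strings, one per drink. Full enumeration only works for small, non-recursive grammars like this one.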
VoiceXML Browsers
- Compatibility
  - Not as compatible as one would like
  - <objects> can be different (but useful)
    - City, State recognizers
  - ECMAScript (JavaScript)
Beyond VoiceXML (in VoiceXML)
- Mixing HTML/CGI scripts in VoiceXML
  - Use PHP to generate VoiceXML files
  - Use URLs (with ?...) to calculate/get data
    - http://weather.com?zip=“15213”
  - Use URLs to get waveforms
    - http://tts.com?text=“Hello World”
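The same ideas can be sketched in Python rather than PHP: building a query URL and generating a VoiceXML page from a script. The helper names `build_url` and `make_vxml` are hypothetical, and the two addresses are the slide's own illustrative examples, not documented services.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

def build_url(base, **params):
    """Append URL-encoded query parameters to a base address."""
    return f"{base}?{urlencode(params)}"

def make_vxml(message):
    """Build a one-form VoiceXML document that speaks `message`."""
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    block.text = message
    return ET.tostring(vxml, encoding="unicode")

weather_url = build_url("http://weather.com", zip="15213")
tts_url = build_url("http://tts.com", text="Hello World")
page = make_vxml("Goodbye!")
```

Note that `urlencode` escapes the space in "Hello World" as `+`, so parameter values are passed without the surrounding quotes shown on the slide.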
VoiceXML future
- N-gram Grammar Markup Language
  - Many browsers have their own extensions
- Pronunciation Lexicon Markup Language
  - A way to add new items to the lexicon
  - Hard to find good standards
- Call Control Markup Language
  - For management and logging of calls
Microsoft SALT
- SALT tags
  - Listen, DTMF, prompt, bind, grammar (plus SSML)
- Designed for desktop, not just phone
- Designed to be shared documents
  - Viewing (HTML) and Speech (SALT)
Available Systems
- Nuance
- Be-vocal
- Tell Me
  - Tell-me studio
- OpenVXI/publicvoicexml.org
- Many others
SDS Architecture