Internet Engineering: VoiceXML
Ali Kamandi
Sharif University of Technology, Fall 2007
kamandi@ce.sharif.edu
What Is VoiceXML?
• VoiceXML, or VXML, is a markup language like HTML.
• The difference:
  • HTML is rendered by your Web browser, which formats content and user-input forms on a screen;
  • VoiceXML is rendered by a voice browser, which reads content aloud over the telephone and listens for the user's spoken or keypad responses.
User Interaction
• Your application can speak to the user via:
  • synthesized speech, or
  • prerecorded audio files.
• Your software can receive input from the user via:
  • speech, or
  • the touch tones (DTMF) from the telephone keypad.
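As an illustration of the two input channels, here is a minimal sketch of a yes/no question, assuming a VoiceXML 2.0 platform. The form id, field name, and wording are hypothetical. It relies on the built-in boolean field type, which accepts a spoken "yes" or "no" and also keypad input, where 1 means yes and 2 means no; the prompts themselves are synthesized speech.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="confirm_order">
    <!-- built-in boolean type: the caller may say yes or no,
         or press 1 (yes) or 2 (no) on the keypad -->
    <field name="confirmed" type="boolean">
      <!-- synthesized speech: read aloud by the TTS engine -->
      <prompt>Shall I place your order? Say yes or no, or press 1 or 2.</prompt>
      <filled>
        <if cond="confirmed">
          <prompt>Your order has been placed.</prompt>
        <else/>
          <prompt>Your order has been cancelled.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Using a built-in type saves writing a grammar for very common answers; the platform supplies both the speech and the DTMF grammar.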
How to Make Your Content Telephone-Accessible
• The do-it-yourself approach: rent telephone lines and run commercial speech-recognition and text-to-speech (TTS) software on your own servers.
• The VoiceXML approach: let a gateway handle the telephony, recognition, and TTS for you. There are free VoiceXML gateways, such as:
  • BeVocal (http://www.bevocal.com),
  • Voxeo (http://www.voxeo.com), and
  • VoiceGenie (http://www.voicegenie.com).
VoiceXML
• These gateways fetch VoiceXML pages from your Web server and read them to your user. If your application needs input from the user, the gateway recognizes the incoming response and passes it to your server as ordinary HTTP request parameters, just as a Web browser submits a form, so your server-side software can process it.
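The round trip to your server can be sketched with the VoiceXML 2.0 <submit> element. This is a minimal example, not a reference implementation: the URL http://www.example.com/weather.cgi, the form id, and the field name zip are hypothetical, and the server script is assumed to answer with another VoiceXML page, which the gateway then reads to the caller.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="zip_lookup">
    <!-- built-in digits type, restricted to exactly five digits;
         the caller may speak them or key them in -->
    <field name="zip" type="digits?length=5">
      <prompt>Please say or key in your five-digit zip code.</prompt>
      <filled>
        <!-- hypothetical server script: the gateway requests
             weather.cgi?zip=... and plays the VoiceXML page it returns -->
        <submit next="http://www.example.com/weather.cgi"
                method="get" namelist="zip"/>
      </filled>
    </field>
  </form>
</vxml>
```

From the server's point of view this is an ordinary GET request with a zip parameter, which is why existing Web back ends can be reused.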
VoiceXML Basics
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>Hello, World</prompt>
    </block>
  </form>
</vxml>
VoiceXML Basics (2)
• Inside the <vxml> element is a <form>, which can be either interactive (requesting input from the user) or purely informational.
• You can have as many forms as you want within a VoiceXML document.
• A <block> is a container for executable content, that is, the tags that make your application do something, such as <prompt>, <audio>, <goto>,…
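A minimal sketch of a document with two forms, one informational and one interactive, assuming VoiceXML 2.0. The form ids, prompts, and menu words are hypothetical. The <goto next="#menu"/> in the first form's <block> transfers control to the second form, and the <option> elements give the field a small list of accepted words without a separate grammar.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- informational form: plays a message, then jumps to the menu -->
  <form id="welcome">
    <block>
      <prompt>Welcome to the campus information line.</prompt>
      <goto next="#menu"/>
    </block>
  </form>

  <!-- interactive form: waits for the caller to choose -->
  <form id="menu">
    <field name="choice">
      <prompt>Say news or weather.</prompt>
      <option value="news">news</option>
      <option value="weather">weather</option>
      <filled>
        <prompt>You chose <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Without the <goto>, the interpreter would not continue to the next form on its own: when a form finishes without an explicit transition, the document ends.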
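VoiceXML Basics (3)
• <prompt>text</prompt> reads the text with a TTS converter, whereas
• <audio src="wav_file_URL"/> plays a prerecorded .wav audio file.
• In VoiceXML 2.0, <audio> requires a src (or expr) attribute; text placed inside it, as in <audio src="wav_file_URL">text</audio>, is read by TTS only if the file cannot be played.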
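The difference between synthesized and prerecorded output can be sketched in one block, assuming VoiceXML 2.0. The URL http://www.example.com/announcement.wav and the wording are hypothetical; the text inside <audio> acts only as a fallback.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="announcement">
    <block>
      <!-- synthesized by the TTS engine -->
      <prompt>Here is today's announcement.</prompt>
      <!-- prerecorded file (hypothetical URL); the enclosed text is read
           by TTS only if the file cannot be fetched or played -->
      <prompt>
        <audio src="http://www.example.com/announcement.wav">
          Sorry, today's announcement is not available.
        </audio>
      </prompt>
    </block>
  </form>
</vxml>
```

Supplying fallback text is a cheap safeguard: the caller hears something sensible even if the audio server is down.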
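More VoiceXML
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="animal_questionnaire">
    <field name="favorite_animal">
      <prompt>Which do you like better, dogs or cats?</prompt>
      <!-- inline grammar in the XML form of W3C SRGS;
           "dog" or "dogs" yields "dogs", "cat" or "cats" yields "cats" -->
      <grammar mode="voice" version="1.0" root="animal" tag-format="semantics/1.0">
        <rule id="animal" scope="public">
          <one-of>
            <item><one-of><item>dog</item><item>dogs</item></one-of><tag>out = "dogs";</tag></item>
            <item><one-of><item>cat</item><item>cats</item></one-of><tag>out = "cats";</tag></item>
          </one-of>
        </rule>
      </grammar>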
More VoiceXML (2)
      <!-- if the user gave a valid response, the filled block is executed -->
      <filled>
        <if cond="favorite_animal == 'dogs'">
          <goto next="#popular_dog_facts"/>
        <else/>
          <goto expr="'psychological_evaluation.cgi?affliction=' + favorite_animal"/>
        </if>
      </filled>
More VoiceXML (3)
      <!-- if the user responded but the response didn't match the grammar, the nomatch block is executed -->
      <nomatch>
        I’m sorry, I didn’t understand what you said.
        <reprompt/>
      </nomatch>
More VoiceXML (4)
      <!-- if there is no response within the timeout period, the noinput block is executed -->
      <noinput>
        I’m sorry, I didn’t hear you.
        <reprompt/>
      </noinput>
    </field>
  </form>
  <!-- additional forms, such as popular_dog_facts, go here -->
</vxml>
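The <nomatch> and <noinput> handlers in the questionnaire treat every failure the same way. VoiceXML 2.0 also lets a handler carry a count attribute: the interpreter picks the handler with the highest count that does not exceed the number of times the event has occurred in that field, which makes escalating help easy. The sketch below is a variant of the questionnaire that uses <option> elements instead of a grammar; the help wording and the goodbye form are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="animal_questionnaire">
    <field name="favorite_animal">
      <prompt>Which do you like better, dogs or cats?</prompt>
      <option value="dogs">dogs</option>
      <option value="cats">cats</option>
      <!-- first unrecognized answer: short apology, then ask again -->
      <nomatch count="1">I'm sorry, I didn't understand what you said. <reprompt/></nomatch>
      <!-- second and later unrecognized answers: more explicit help -->
      <nomatch count="2">Please say just one word: dogs, or cats. <reprompt/></nomatch>
      <!-- first and second silences: ask again -->
      <noinput count="1">I'm sorry, I didn't hear you. <reprompt/></noinput>
      <!-- third and later silences: leave the questionnaire -->
      <noinput count="3"><goto next="#goodbye"/></noinput>
      <filled>
        <prompt>You said <value expr="favorite_animal"/>.</prompt>
      </filled>
    </field>
  </form>
  <form id="goodbye">
    <block>
      <prompt>Let's try again another time. Goodbye.</prompt>
    </block>
  </form>
</vxml>
```

Giving up after a few silences avoids trapping a caller who has walked away in an endless reprompt loop.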
Barge-in
• Normally the user does not have to wait for a prompt to finish before speaking. Instead, he can “barge in” and speak a response at any time, which interrupts the prompt.
• To make the user hear a prompt in full, for example an advertisement, set bargein="false":
<prompt bargein="false">
  <audio src="advertisement.wav"/>
</prompt>
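A minimal sketch combining a non-interruptible prompt with an ordinary one, assuming VoiceXML 2.0. Only advertisement.wav comes from the slide; the fallback text, field name, and wording are hypothetical. Both prompts play in order when the field is visited, but the caller can cut in only during the second.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="sponsored_offer">
    <field name="wants_more" type="boolean">
      <!-- bargein="false": input spoken or keyed during this prompt is ignored -->
      <prompt bargein="false">
        <audio src="advertisement.wav">This call is brought to you by our sponsor.</audio>
      </prompt>
      <!-- default behavior: the caller may interrupt this question -->
      <prompt>Would you like to hear more offers? Say yes or no.</prompt>
      <filled>
        <if cond="wants_more">
          <prompt>Here they come.</prompt>
        <else/>
          <prompt>Okay. Goodbye.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Disabling barge-in only where it matters keeps the rest of the dialog fast for experienced callers.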
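Speech Timeouts
• If a user does not speak within the timeout period after hearing a prompt, the interpreter throws a noinput event and executes the <noinput> event handler, if there is one.
• The timeout property sets this period; in VoiceXML 2.0 the value needs a time unit, such as s (seconds) or ms (milliseconds):
<property name="timeout" value="10s"/>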
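A sketch of setting the timeout at two levels, assuming VoiceXML 2.0: a document-wide default through <property>, and a per-prompt override through the prompt's timeout attribute. The form id, field name, and durations are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- document-wide default: wait up to 10 seconds for the caller -->
  <property name="timeout" value="10s"/>
  <form id="pin_entry">
    <field name="pin" type="digits?length=4">
      <!-- this prompt overrides the default and waits only 5 seconds -->
      <prompt timeout="5s">Please key in your four-digit PIN.</prompt>
      <noinput>I didn't hear anything. <reprompt/></noinput>
      <filled>
        <prompt>Thank you.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

A property declared in an inner element (a form or field) would override the document-level value for that scope only.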
Grammar Format
• VoiceXML 1.0 did not specify a grammar format, so each VoiceXML platform implemented grammars as it chose.
• In VoiceXML 2.0, each platform is required to implement the XML form of the W3C Speech Recognition Grammar Specification (SRGS); support for the ABNF form of SRGS is recommended but not required.
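An SRGS grammar can also live in its own file. The sketch below is a standalone keypad (DTMF) grammar in the XML form of SRGS; the file name pets_dtmf.grxml and the key assignments are hypothetical, and the tags assume the W3C Semantic Interpretation format "semantics/1.0". A VoiceXML field could load it with a <grammar> element whose src points to the file and whose type is application/srgs+xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- pets_dtmf.grxml (hypothetical file name): a keypad grammar in the XML
     form of SRGS. A VoiceXML field could reference it with
     <grammar src="pets_dtmf.grxml" type="application/srgs+xml"/> -->
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         mode="dtmf" root="pet" tag-format="semantics/1.0">
  <rule id="pet" scope="public">
    <one-of>
      <!-- key 1 means dogs, key 2 means cats -->
      <item>1 <tag>out = "dogs";</tag></item>
      <item>2 <tag>out = "cats";</tag></item>
    </one-of>
  </rule>
</grammar>
```

Because the same SRGS format covers both voice and DTMF grammars, one field can accept either input channel by listing one grammar of each mode.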
Mobile versus Voice Applications
Mobile Browser | VoiceXML
Requires browser-enabled telephones | Can be used with any phone
User input with uncomfortable keypads | Speech or keypad input
Works well in noisy environments | Hard to use in noisy environments
Mobile versus Voice Applications
Mobile Browser | VoiceXML
You need to develop versions of your software for a variety of mobile gateways | You only need to develop one version of your software
Works well for displaying long lists of information | Works poorly for giving the user long lists of information
User can enter arbitrary information | Users can only say predefined phrases
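Syntax in HTML & VoiceXML
• Compared to HTML, VoiceXML is much stricter about syntax.
• Web browsers tolerate HTML errors such as:
  • writing attribute values without quotes;
  • omitting the end tag of a container element.
• A VoiceXML document must be well-formed XML: every attribute value quoted and every element closed (or written as an empty tag, such as <reprompt/>). A voice browser will not run a document that breaks these rules.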
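To make the contrast concrete, here is a small well-formed VoiceXML 2.0 document, with two HTML-style shortcuts that a voice browser would reject shown only inside a comment. The form ids and prompts are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <!-- HTML-style shortcuts that a voice browser rejects:
         <goto next=#menu>     (unquoted attribute value, no closing slash)
         <prompt>Hello         (missing end tag)
       The well-formed equivalents are used below. -->
  <form id="start">
    <block>
      <prompt>Hello</prompt>
      <goto next="#menu"/>
    </block>
  </form>
  <form id="menu">
    <block>
      <prompt>This is the menu.</prompt>
    </block>
  </form>
</vxml>
```

Running documents through any XML well-formedness checker before deploying them catches this class of error early, since a caller cannot see an error page the way a Web user can.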
Beyond VoiceXML: Conversational Speech
• You: Will it rain tomorrow in Boston?
• JUPITER: To my knowledge, the forecast calls for no rain tomorrow in Boston.
• You: What about Detroit?
• JUPITER: To my knowledge, the forecast calls for no rain tomorrow in Detroit.
• JUPITER assumed that you were still interested in rain when you asked about Detroit: the context carried over from the Boston question.
References
• Eve Andersson, Philip Greenspun, and Andrew Grumet, Software Engineering for Internet Applications, Chapter 10. The MIT Press, Cambridge, Massachusetts, and London, England, 2006.