Internet Engineering: VoiceXML
Ali Kamandi
Sharif University of Technology
Fall 2007
kamandi@ce.sharif.edu
What Is VoiceXML?
VoiceXML, or VXML, is a markup language
like HTML.
The difference:
HTML is rendered by your Web browser to format
content and user-input forms;
VoiceXML is rendered by a voice browser.
User Interaction
Your application can speak to the user via synthesized speech or prerecorded audio files.
Your software can receive input from the user via speech or the tones from their telephone keypad.
How to Make Your Content Telephone-Accessible
One option: rent a telephone line and run commercial voice recognition software and text-to-speech (TTS) conversion software.
The VoiceXML revolution: there are free VoiceXML gateways, such as BeVocal (http://www.bevocal.com), Voxeo (http://www.voxeo.com), and VoiceGenie (http://www.voicegenie.com).
VoiceXML
These gateways take VoiceXML pages from
your Web server and read them to your user. If your application needs input from the user, the gateway will interpret the incoming response and pass that response to your server in a way that your software can understand.
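As a rough sketch of how a response reaches your server, the fragment below collects one value and hands it back with <submit>. The URL, the script name weather.cgi, and the field name zip_code are made up for illustration, and the builtin "digits" field type is assumed to be supported by the gateway. The elements used here are explained in the slides that follow.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="lookup">
    <!-- hypothetical field; type="digits" relies on a builtin grammar -->
    <field name="zip_code" type="digits">
      <prompt>Please say or key in your zip code.</prompt>
      <filled>
        <!-- the gateway sends zip_code to the server as an ordinary
             HTTP request parameter; the URL is a placeholder -->
        <submit next="http://www.example.com/weather.cgi"
                namelist="zip_code" method="get"/>
      </filled>
    </field>
  </form>
</vxml>
```

From the server's point of view this looks like any other HTTP GET with a query parameter, so the script can answer with another VoiceXML page.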
VoiceXML Basics
<?xml version="1.0"?>
<vxml version="2.0">
  <form>
    <block>
      <audio>Hello, World</audio>
    </block>
  </form>
</vxml>
VoiceXML Basics (2)
Within the <vxml> element is a <form>, which can be either interactive (requesting input from the user) or informational.
You can have as many forms as you want within a VoiceXML document.
A <block> is a container for your executables, meaning all the tags that make your application do something, such as <audio>, <goto>, …
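A minimal sketch of a document with two informational forms, where the first form's <block> speaks and then jumps to the second. The form ids and the spoken text are invented for illustration; the sketch puts TTS text in <prompt> and declares the VoiceXML 2.0 namespace on <vxml>.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- first form: speaks, then transfers control to the second form -->
  <form id="welcome">
    <block>
      <prompt>Welcome to the demo.</prompt>
      <goto next="#goodbye"/>
    </block>
  </form>

  <!-- second form: reached only through the goto above -->
  <form id="goodbye">
    <block>
      <prompt>Goodbye.</prompt>
    </block>
  </form>
</vxml>
```

Without the <goto>, the interpreter would finish the first form and exit rather than fall through to the next one.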
VoiceXML Basics (3)
<audio>text</audio> will read the text with a TTS converter,
whereas
<audio src="wav_file_URL"/> will play a prerecorded .wav audio file.
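The two forms can also be combined. In VoiceXML 2.0, text placed inside an <audio> element that has a src is fallback content, spoken by TTS if the file cannot be fetched or played. A small sketch, with welcome.wav as a placeholder file name:

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <!-- plays the recording; the enclosed text is used only
             if welcome.wav (a placeholder) is unavailable -->
        <audio src="welcome.wav">Welcome to our service.</audio>
      </prompt>
    </block>
  </form>
</vxml>
```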
More VoiceXML
<?xml version="1.0"?>
<vxml version="2.0">
<form id="animal_questionnaire">
  <field name="favorite_animal">
    <prompt>
      <audio>Which do you like better, dogs or cats?</audio>
    </prompt>
    <grammar>
      <![CDATA[
        [
          [dog dogs] {<option "dogs">}
          [cat cats] {<option "cats">}
        ]
      ]]>
    </grammar>
More VoiceXML (2)
    <!-- if the user gave a valid response, the filled block is executed. -->
    <filled>
      <if cond="favorite_animal == 'dogs'">
        <goto next="#popular_dog_facts"/>
      <else/>
        <goto expr="'psychological_evaluation.cgi?affliction=' + favorite_animal"/>
      </if>
    </filled>
More VoiceXML (3)
    <!-- if the user responded but it didn’t match the grammar, the nomatch block is executed -->
    <nomatch>
      I’m sorry, I didn’t understand what you said.
      <reprompt/>
    </nomatch>
More VoiceXML (4)
    <!-- if there is no response for a few seconds, the noinput block is executed -->
    <noinput>
      I’m sorry, I didn’t hear you.
      <reprompt/>
    </noinput>
  </field>
</form>
<!-- additional forms can go here -->
</vxml>
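Since input can also come from the telephone keypad, here is a hedged variant of the same question that accepts either speech or key presses. It uses the standard VoiceXML 2.0 <option> element instead of an inline grammar; the key assignments (1 and 2) and the prompt wording are chosen only for illustration.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="animal_questionnaire">
    <field name="favorite_animal">
      <prompt>
        Which do you like better, dogs or cats?
        You can also press 1 for dogs or 2 for cats.
      </prompt>
      <!-- each option matches the spoken word or the given key,
           and stores its value in favorite_animal -->
      <option dtmf="1" value="dogs">dogs</option>
      <option dtmf="2" value="cats">cats</option>
    </field>
  </form>
</vxml>
```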
Barge-in
Normally the user does not have to wait for the prompt to finish before speaking. Instead, they can “barge in” and speak a response at any time. Setting bargein="false" on a prompt turns this off, so the whole prompt is played:
<prompt bargein="false">
  <audio src="advertisement.wav"/>
</prompt>
Speech Timeouts
If a user does not speak after hearing a
prompt, the interpreter will generate a timeout and execute the <noinput> event handler, if there is one.
<property name="timeout" value="10s"/>
Grammar Format
In VoiceXML 1.0, the W3C did not specify the grammar format, allowing each VoiceXML platform to implement grammars as it chose.
In VoiceXML 2.0, each platform is required to implement the XML format of the W3C’s Speech Recognition Grammar Specification (SRGS).
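For comparison with the platform-specific inline grammar in the dogs-or-cats example, here is a sketch of roughly the same grammar in the W3C XML grammar format. The rule id "animal" is invented, and the <tag> contents assume the platform supports the semantics/1.0 tag format; support for semantic tags varies between platforms.

```xml
<?xml version="1.0"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice"
         root="animal" tag-format="semantics/1.0">
  <rule id="animal" scope="public">
    <one-of>
      <!-- "dog" or "dogs" both yield the value "dogs" -->
      <item>
        <one-of><item>dog</item><item>dogs</item></one-of>
        <tag>out = "dogs";</tag>
      </item>
      <!-- "cat" or "cats" both yield the value "cats" -->
      <item>
        <one-of><item>cat</item><item>cats</item></one-of>
        <tag>out = "cats";</tag>
      </item>
    </one-of>
  </rule>
</grammar>
```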
Mobile versus Voice Applications
VoiceXML                           | Mobile Browser
-----------------------------------+--------------------------------------
Can be used with any phone         | Requires browser-enabled telephones
Speech or keypad input             | User input with uncomfortable keypads
Hard to use in a noisy environment | Works well in a noisy environment
Mobile versus Voice Applications
VoiceXML                                                  | Mobile Browser
----------------------------------------------------------+-------------------------------------------------------------------------------
Works poorly for giving the user long lists of information | Works well for displaying long lists of information
Users can only say predefined phrases                      | User can enter arbitrary information
You only need to develop one version of your software      | You need to develop versions of your software for a variety of mobile gateways
Syntax in HTML & VoiceXML
Compared to HTML, VoiceXML is much
stricter about using correct syntax.
In HTML, you can often get away with sloppy syntax, such as writing attribute values without quotes or omitting the ending tag of a container.
In VoiceXML, you must use proper syntax in
all documents.
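A small illustration of the difference, with invented prompt text. The commented-out line shows the kind of markup an HTML browser might tolerate; it is not well-formed XML, so an XML parser would reject a document containing it.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <!-- Not well-formed (unquoted attribute value, missing end tag):
             <prompt bargein=false>Hello
      -->
      <!-- Well-formed: quoted attribute value and a matching end tag -->
      <prompt bargein="false">Hello</prompt>
    </block>
  </form>
</vxml>
```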
Beyond VoiceXML: Conversational Speech
You: Will it rain tomorrow in Boston?
JUPITER: To my knowledge, the forecast calls for no rain tomorrow in Boston.
You: What about Detroit?
JUPITER: To my knowledge, the forecast calls for no rain tomorrow in Detroit.
JUPITER assumed that you were still interested in rain when asking about Detroit; the context carried over from the Boston question.
References
Chapter 10:
Software Engineering for Internet Applications