
Homework 3: Dialog



  1. Homework 3: Dialog
     • Part 1
       • Call TellMe and get two sets of driving directions
       • Call CMU’s Let’s Go
       • Call Amtrak
     • Part 2
       • Build your own pizza ordering systems
       • Register with Tell Me Studio
       • Use VoiceXML to build a system
     • Results are due 17th November 3:30pm

  2. Speech Processing 15-492/18-492
     Spoken Dialog Systems
     Beyond VoiceXML: the Olympus Spoken Dialog Framework

  3. Spoken Dialog - VoiceXML
     • Write (several) vxml “pages” and resources
       • Your dialog application control
       • Provide grammar for understanding
       • Define what your system says
     • Generally just use provided ASR/TTS
     • Great for basic form-filling applications
       • What if your application can’t be made into a form-filling one?
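As an illustration of what "form-filling" means on the slide above, here is a minimal Python sketch (not VoiceXML itself). The dialog is just a list of slots, and the system prompts for each unfilled slot in turn. The function name `run_form`, the slot names, and the pizza prompts are hypothetical, chosen only to echo the homework; a list of canned replies stands in for ASR output.

```python
def run_form(slots, answers):
    """Fill each slot in order, re-prompting until it has a value.

    slots:   list of (slot_name, prompt) pairs (hypothetical form definition)
    answers: iterator of user replies, standing in for recognizer output
    """
    form = {}
    transcript = []  # every prompt the system "spoke"
    for name, prompt in slots:
        value = ""
        while not value:  # an empty reply triggers a re-prompt
            transcript.append(prompt)
            value = next(answers).strip()
        form[name] = value
    return form, transcript


# Hypothetical two-slot form for a pizza order.
PIZZA_FORM = [
    ("size", "What size pizza would you like?"),
    ("topping", "Which topping?"),
]

form, transcript = run_form(PIZZA_FORM, iter(["large", "", "mushroom"]))
print(form)
```

The control flow is fixed by the order of the slots, which is why this style fits simple applications well and struggles when the conversation cannot be reduced to filling in a form.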

  4. Olympus Spoken Dialog Framework
     • A general dialog system architecture
     • Modular, open source framework
       • Provides components needed to build SDS
         • ASR/TTS, Language Understanding/Generation, Dialog Management, etc.
       • Can replace components with other options
         • e.g., use a different ASR engine
     • Tied together via Galaxy message-passing communication infrastructure
     • http://wiki.speech.cs.cmu.edu/olympus
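The idea of replaceable components connected by message passing can be sketched in a few lines of Python. This is a toy illustration only and does not use or imitate the actual Galaxy API: the `Hub` class, the message type names, and the two stand-in "recognizers" are all assumptions made for the example.

```python
class Hub:
    """Toy hub: routes each message to the component registered for its type."""

    def __init__(self):
        self.handlers = {}

    def register(self, message_type, handler):
        # Registering again under the same type swaps in a new component.
        self.handlers[message_type] = handler

    def send(self, message_type, payload):
        return self.handlers[message_type](payload)


def toy_asr(audio):
    """Stand-in recognizer: the 'audio' is already text, just lowercase it."""
    return audio.lower()


def other_asr(audio):
    """A different stand-in recognizer that also expands the contraction 's."""
    return audio.lower().replace("'s", " is")


def toy_parser(text):
    """Stand-in language understanding: split into words."""
    return {"words": text.split()}


def run_turn(hub, audio):
    # Components never call each other directly; everything goes via the hub.
    text = hub.send("recognize", audio)
    return hub.send("parse", text)


hub = Hub()
hub.register("recognize", toy_asr)
hub.register("parse", toy_parser)
first = run_turn(hub, "WHEN'S THE NEXT BUS")

hub.register("recognize", other_asr)  # replace only the ASR component
second = run_turn(hub, "WHEN'S THE NEXT BUS")
print(first, second)
```

Because the parser only sees messages from the hub, swapping the recognizer requires no change to any other component.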

  5. Example Olympus Systems
     • Let’s Go! (bus information)
     • TeamTalk (robot interaction)
       • http://wiki.speech.cs.cmu.edu/teamtalk/
     • Vera
       • http://www.speech.cs.cmu.edu/~awb/vera.wmv
     • Many others

  6. Organization of Olympus Systems
     • Core components
       • Generic, useful in multiple different systems
     • Application components
       • System-specific, useful for a single application

  7. Olympus Core Directory Structure
     [Figure: directory tree with the following labels]
     • Source code for all system-independent Galaxy servers
     • Binaries
     • Scripts to compile Olympus
     • Generic system configuration includes
     • External dependencies
     • System-independent resources (ASR and VAD acoustic models)
     • Tools and scripts for LM training, log mining…

  8. System Directory Structure
     [Figure: directory tree with the following labels]
     • Source code for system-specific Galaxy servers
     • System-specific binaries
     • System configurations
     • System documentation
     • System-specific resources (grammars, language models, …)
     • Dialog logs

  9. Typical Pipeline Architecture
     [Figure: pipeline diagram; box labels not recoverable]

  10. Pipeline Architecture in Olympus
      [Figure: pipeline diagram. Legible labels: Phone / Desktop, Recog. Engine (SPHINX), Knowledge Source, Backend, Synth. Engine (SAPI/FLITE)]

  11. The Olympus Architecture
      [Figure: annotated architecture diagram. Legible labels: Phone / Desktop, Recog. Engine (SPHINX), Knowledge Source, Backend, Synth. Engine (SAPI/FLITE)]
      Annotations on the diagram (the module each one belongs to is not recoverable):
      • Fast and small
      • Interface between real world and dialog manager
      • Slot-filling templates
      • Acoustic/Language models
      • Allows for random variations
      • Manages timing/turn-taking
      • Suitable for channel/domain
      • Controls dialog
      • Allows multiple recognition engines
      • Grammar based
      • Plan-based
      • Interface to external engines (SAPI, Swift, Flite)
      • Robust parser
      • Does playback

  12. Olympus Architecture Modules
      [Figure: architecture diagram. Legible labels: Phone / Desktop, Recog. Engine (SPHINX), Knowledge Source, Backend, Synth. Engine (SAPI/FLITE)]

  13. Grammar
      • Used for two things:
        • Parsing
        • ASR language model if one isn’t available
      • The Phoenix Parser
        • Context-Free Grammar
        • Robust parser

  14. Phoenix Parser / Grammar
      • CFG Grammar
        • Manually-generated domain-specific grammar rules
        • Reusable, generic sub-grammars
          • [Yes], [No], [Number], [DateTime], [Help], [Repeat], [Suspend], etc…
      • Parses all incoming hypotheses and passes all parses along…

      Grammar:

          [room_size_spec]
              ([rss_large])
              ([rss_small])
              ([rss_larger])
              ([rss_smaller])
              ([rss_smallest])
              ([rss_largest])
          ;
          [rss_large]
              (large)
              (big)
              (huge)
          ;
          [rss_larger]
              (*the larger)
              (*the bigger)
              (too small)
          ;
          [rss_largest]
              (*the largest)
              (*the biggest)
          ;
          [rss_small]
              (small)
              (little)
          ;

      Example input: DO YOU HAVE SOMETHING A BIT LARGER?

      Resulting parse:

          [NeedRoom] ( [_i_want] (DO YOU HAVE SOMETHING) )
          [RoomSizeSpec] ( [room_size_spec] ( [rss_larger] (LARGER)))

  15. Example Phoenix Grammar

          [Place]
              (carnegie mellon university)
              (downtown)
              (robinson towne center)
              (the airport)
              (south hills junction)
              (mount oliver)
              (the south side)
              (oakland)
              (bloomfield)
              (polish hill)
              (the strip district)
              (the north side)
          ;

          [NextBus]
              (*WHEN_IS *the next *BUS)
              (*WHEN_IS *the BUS after that *BUS)
          WHEN_IS
              (when is)
              (when's)
          BUS
              (bus)
              (one)
          ;
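To make the [NextBus] rules above concrete, the following Python sketch expands each pattern into the word sequences it allows and then looks for any of them anywhere in the utterance. It assumes that a leading `*` marks an optional element and that WHEN_IS and BUS are macros local to the rule. It is a simplified illustration, not the Phoenix parser: real Phoenix handles nested nets and builds parse trees, and the names `MACROS`, `expand`, and `matches` are made up for this example.

```python
from itertools import product

# Macros and patterns copied from the [NextBus] rule on the slide.
MACROS = {
    "WHEN_IS": [["when", "is"], ["when's"]],
    "BUS": [["bus"], ["one"]],
}
NEXT_BUS = [
    "*WHEN_IS *the next *BUS",
    "*WHEN_IS *the BUS after that *BUS",
]


def expand(pattern):
    """Return every word sequence the pattern can produce."""
    choices = []
    for token in pattern.split():
        optional = token.startswith("*")
        name = token.lstrip("*")
        # A macro expands to its alternatives; a plain word stands for itself.
        alternatives = MACROS.get(name, [[name]])
        if optional:
            alternatives = alternatives + [[]]  # optional: may be skipped
        choices.append(alternatives)
    return [[w for part in combo for w in part] for combo in product(*choices)]


def matches(patterns, utterance):
    """True if any expansion occurs as a contiguous run of words.

    Words before and after the run are ignored, which mimics the
    'robust parser' behaviour of skipping material it cannot parse.
    """
    words = utterance.lower().split()
    for pattern in patterns:
        for seq in expand(pattern):
            n = len(seq)
            if any(words[i:i + n] == seq for i in range(len(words) - n + 1)):
                return True
    return False


print(matches(NEXT_BUS, "when's the next one"))
print(matches(NEXT_BUS, "um the bus after that please"))
print(matches(NEXT_BUS, "where is downtown"))
```

Enumerating every expansion is only practical for small rules like this one; it is used here because it keeps the meaning of the optional markers easy to see.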

  16. Confidence Annotation - Helios
      • Builds accurate confidence scores using features from 3 sources of knowledge:
        • Speech recognition
        • Language understanding
        • Dialog management
      • Selects hypothesis with maximum confidence score
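The selection step can be sketched as follows. The slide does not say how the features are combined, so this example assumes a logistic model with one feature per knowledge source. The weights, bias, feature values, and function names are all hypothetical and are not taken from Helios.

```python
import math

# Hypothetical weights, one per knowledge source, plus a bias term.
WEIGHTS = {"asr": 2.0, "parse": 1.5, "dialog": 1.0}
BIAS = -2.0


def confidence(features):
    """Combine the three feature values into a score between 0 and 1."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))  # logistic function


def select_hypothesis(hypotheses):
    """Return the hypothesis with the highest confidence score."""
    return max(hypotheses, key=lambda h: confidence(h["features"]))


# Two made-up recognition hypotheses with made-up feature values.
hyps = [
    {"text": "when is the next bus",
     "features": {"asr": 0.90, "parse": 1.0, "dialog": 1.0}},
    {"text": "when is the next boss",
     "features": {"asr": 0.95, "parse": 0.4, "dialog": 0.0}},
]

best = select_hypothesis(hyps)
print(best["text"])
```

In this made-up case the second hypothesis has the higher recognition feature, but the first one wins once the parse and dialog features are taken into account, which is the point of drawing on all three sources.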
