
Speech Processing 11-492/18-492



  1. Speech Processing 11-492/18-492: Spoken Dialog Systems, SDS components

  2. Spoken Dialog Systems
     - More than just ASR and TTS
     - Recognition
     - Language understanding
     - Manipulation of utterances
     - Generation of new information
     - Text generation
     - Synthesis

  3. SDS Architecture
     - Diagram components: ASR, Language Understanding, Dialog Manager, Language Generation, Synthesis
     - Additional diagram labels: Non Understanding, Error Handling Strategies

  4. SDS Internals
     - Language Understanding
       - From words to structure
     - Dialog Manager
       - State of dialog (who is talking)
       - Direction of dialog (what next)
       - References, user profile, etc.
       - Interaction with database/internet
     - Language Generation
       - From structure to words

  5. Language Understanding
     - Parsing of SPEECH not TEXT
     - "Eh, I wanna go, wanna go to Boston tomorrow"
     - "If it's not too much trouble I'd be very grateful if one might be able to aid me in arranging my travel arrangements to Boston, Logan airport, at sometime tomorrow morning, thank you."
     - "Boston, tomorrow"

  6. Parsing: Output structure
     - "I wanna go to Boston, tomorrow"
       - Destination: BOS
       - Departure: 20081028, AM
       - Airline: unspecified
       - Special: unspecified
     - Convert speech to structure
     - Sufficient for further processing/query
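
The conversion from a noisy utterance to a frame like the one on this slide can be sketched with simple keyword spotting. This is a minimal illustration and not the course's parser: the `CITY_CODES` table, the `parse_utterance` helper, and the fixed reference date are all assumptions made for the example, and the time of day is only filled in when the utterance mentions "morning".

```python
import re
from datetime import date, timedelta

# Hypothetical lookup table: city name -> airport code.
CITY_CODES = {"boston": "BOS", "pittsburgh": "PIT"}

def parse_utterance(text, today):
    """Fill a flight frame from a transcribed utterance by keyword spotting."""
    frame = {"Destination": "unspecified", "Departure": "unspecified",
             "Airline": "unspecified", "Special": "unspecified"}
    words = re.findall(r"[a-z']+", text.lower())
    # Destination: first known city name that appears anywhere in the utterance.
    for city, code in CITY_CODES.items():
        if city in words:
            frame["Destination"] = code
            break
    # Departure: resolve "tomorrow" relative to the given reference date.
    if "tomorrow" in words:
        day = (today + timedelta(days=1)).strftime("%Y%m%d")
        frame["Departure"] = day + (", AM" if "morning" in words else "")
    return frame

# Reference date assumed so that "tomorrow" resolves to the date on the slide.
today = date(2008, 10, 27)
print(parse_utterance("Eh, I wanna go, wanna go to Boston tomorrow", today))
print(parse_utterance("Boston, tomorrow", today))
```

Disfluencies and politeness phrases are simply ignored here, which is why the short and the long utterances end up with the same destination slot.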

  7. Interaction Example
     - User: "find a cheap eating place for taiwanese food"
     - Intelligent Agent: "Cheap Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there."

  8. SDS Process
     - User: "find a cheap eating place for taiwanese food"
     - Diagram (dependency parse): relations AMOD, NN, PREP_FOR; slots seeking, target, price, food
     - Intelligent Agent

  9. SDS Process
     - User: "find a cheap eating place for taiwanese food"
     - Diagram (dependency parse): relations AMOD, NN, PREP_FOR; slots seeking, target, price, food
     - Ontology Induction (semantic slot)
     - Intelligent Agent: Organized Domain Knowledge

  10. SDS Process
      - User: "find a cheap eating place for taiwanese food"
      - Diagram (dependency parse): relations AMOD, NN, PREP_FOR; slots seeking, target, price, food
      - Ontology Induction (semantic slot)
      - Structure Learning (inter-slot relation)
      - Intelligent Agent: Organized Domain Knowledge

  11. SDS Process
      - User: "find a cheap eating place for taiwanese food"
      - Diagram (dependency parse): relations AMOD, NN, PREP_FOR; slots seeking, target, price, food
      - Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese"

  12. SDS Process
      - User: "find a cheap eating place for taiwanese food"
      - Diagram (dependency parse): relations AMOD, NN, PREP_FOR; slots seeking, target, price, food
      - Semantic Decoding
      - Intelligent Agent: seeking="find", target="eating place", price="cheap", food="taiwanese"
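
The semantic decoding step can be imitated with a few hand-written rules over dependency relations. In this sketch the edges are typed in by hand rather than produced by a parser, the `dobj` relation is added by assumption (the slide only shows AMOD, NN and PREP_FOR), and `PRICE_WORDS` and `decode` are hypothetical names. The slides induce the slots automatically, which is not attempted here.

```python
# Hand-written dependency edges as (relation, head, dependent); not parser output.
EDGES = [
    ("dobj", "find", "place"),
    ("amod", "place", "cheap"),
    ("nn", "place", "eating"),
    ("prep_for", "place", "food"),
    ("amod", "food", "taiwanese"),
]

PRICE_WORDS = {"cheap", "expensive"}  # assumed tiny lexicon for the price slot

def decode(root, edges):
    """Map dependency edges to slot-value pairs with hand-written rules."""
    slots = {"seeking": root}
    # The direct object of the root verb is the thing being sought.
    target = next(dep for rel, head, dep in edges
                  if rel == "dobj" and head == root)
    # Noun-compound modifiers are prepended to the target ("eating place").
    compound = [dep for rel, head, dep in edges
                if rel == "nn" and head == target]
    slots["target"] = " ".join(compound + [target])
    for rel, head, dep in edges:
        if rel == "amod" and head == target and dep in PRICE_WORDS:
            slots["price"] = dep
        elif rel == "amod" and head != target:
            # A modifier of another noun fills a slot named after that noun.
            slots[head] = dep
    return slots

print(decode("find", EDGES))
```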

  13. Automatic Slot Induction (Chen et al., ASRU'13)
      - Example utterance: "can i have a cheap restaurant"
      - Frames evoked as slot candidates: expensiveness, capability, locale by use
      - Figure labels: Domain, General

  14. Parsing vs Language Model
      - Language Model
        - Model what actually gets said
      - Parsing
        - Extract the information you want
      - Models *can* be shared
        - Only accept things in the grammar
        - Can be overly limiting

  15. Neural Networks for SLU
      - RNN for Slot Filling (Mesnil et al. 2013)
      - Step 1: word embedding
      - Step 2: short-term dependencies capturing
      - Step 3: long-term dependencies capturing
      - Step 4: different types of neural architecture
      - http://deeplearning.net/tutorial/rnnslu.html#rnnslu
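
Step 2 (short-term dependencies) is commonly handled by presenting each word to the network together with its immediate neighbours. The sketch below builds such context windows in plain Python. The function name `context_window`, the padding index `-1`, the toy vocabulary, and the slot tag `B-dest` are assumptions for illustration and are not taken from the tutorial's code.

```python
def context_window(indexes, win):
    """Return one window of `win` word indexes centred on each position."""
    assert win % 2 == 1 and win >= 1, "window size must be odd"
    pad = [-1] * (win // 2)           # -1 marks positions outside the sentence
    padded = pad + list(indexes) + pad
    return [padded[i:i + win] for i in range(len(indexes))]

# Toy vocabulary and one sentence with per-word slot labels (IOB style).
vocab = {"flights": 0, "to": 1, "boston": 2}
sentence = ["flights", "to", "boston"]
labels = ["O", "O", "B-dest"]         # hypothetical tag set

indexes = [vocab[w] for w in sentence]
windows = context_window(indexes, 3)
for word, window, label in zip(sentence, windows, labels):
    print(word, window, label)
```

Each window would then be mapped to embeddings (step 1) and fed to the recurrent layer, which is responsible for the longer-range context (step 3).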

  16. Interactive Learning for SLU
      - LUIS: Interactive machine learning for language understanding (Williams et al. 2016)
      - Advantages:
        - Non-experts can add knowledge through feature engineering
        - Active learning reduces the heavy labeling effort
      - https://www.luis.ai/

  17. Dialog Manager
      - Maintain state
        - Where are we in the dialog
      - Whose turn is it
        - Waiting for speaker
        - Waiting for database query (stall user)
      - Deal with barge-in
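
The turn-tracking part of this state can be sketched as a small finite-state machine. The state names, event names, and the `step` function below are all hypothetical, chosen only to mirror the bullets on this slide. Barge-in is modelled as user speech arriving while the system is still talking.

```python
from enum import Enum

class Turn(Enum):
    SYSTEM_SPEAKING = "system speaking"
    WAITING_FOR_USER = "waiting for speaker"
    WAITING_FOR_DB = "waiting for database query"

# (current state, event) -> next state; event names are assumptions.
TRANSITIONS = {
    (Turn.SYSTEM_SPEAKING, "prompt_done"): Turn.WAITING_FOR_USER,
    (Turn.SYSTEM_SPEAKING, "user_speech"): Turn.WAITING_FOR_USER,  # barge-in
    (Turn.WAITING_FOR_USER, "user_done"): Turn.WAITING_FOR_DB,
    (Turn.WAITING_FOR_DB, "db_result"): Turn.SYSTEM_SPEAKING,
}

def step(state, event):
    """Advance the turn state; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = Turn.SYSTEM_SPEAKING
for event in ["user_speech", "user_done", "db_result"]:
    state = step(state, event)
    print(event, "->", state.value)
```

A real system would also play a stalling prompt while in the database state; that is left out to keep the sketch short.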

  18. Frame Based Dialog Manager
      - Used for transaction dialog
      - Generalizes finite-state approach by allowing multiple paths to acquire info
      - Central data structure is frame with slots
        - DM is monitoring frame, filling in slots
      - Frame:
        - Set of information needed
        - Context for utterance interpretation
        - Context for dialogue progress
      - Allows mixed initiative
      - Allows over-answering
      - Also called form-based (MIT); often called "slot-filling"
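
A frame with slots, and a manager that monitors it, can be sketched in a few lines. The slot names, the `update_frame` and `next_prompt` helpers, and the bus-domain frame are assumptions for illustration; the prompt wording is borrowed from the Let's Go example dialog later in this deck. The input to `update_frame` is assumed to be the already-parsed slot values for one user turn.

```python
REQUIRED_SLOTS = ["origin", "destination", "time"]  # assumed bus-domain frame

PROMPTS = {
    "origin": "Where do you leave from?",
    "destination": "Where are you going?",
    "time": "When are you going to take the bus?",
}

def update_frame(frame, parsed):
    """Copy every recognised slot into the frame, whatever was asked."""
    for slot, value in parsed.items():
        if slot in REQUIRED_SLOTS:
            frame[slot] = value
    return frame

def next_prompt(frame):
    """Ask for the first slot that is still empty, or None when done."""
    for slot in REQUIRED_SLOTS:
        if slot not in frame:
            return PROMPTS[slot]
    return None

frame = {}
print(next_prompt(frame))  # Where do you leave from?
# Over-answering: the user supplies two slots in a single turn.
update_frame(frame, {"origin": "CMU", "destination": "Downtown"})
print(next_prompt(frame))  # When are you going to take the bus?
```

Because the next question is derived from the frame and not from a fixed sequence of states, the user can fill slots in any order, which is what allows mixed initiative.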

  19. Problems with Frames
      - Not easily applicable to complex tasks
        - May not be a single frame
        - Dynamic construction of information
        - User access to "product"

  20. Agenda + Frame
      - Product:
        - Hierarchical composition of frames
      - Process:
        - Agenda
        - Generalization of stack
        - Ordered list of topics
        - List of handlers
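
One way to picture an agenda as a generalization of a stack is an ordered list of topic handlers in which any entry, not only the top one, may claim the user's input. The `Agenda` class, the two handlers, and the promotion rule below are hypothetical and are not meant to reproduce any particular system's implementation.

```python
import re

def words(utterance):
    """Lower-cased word list, ignoring punctuation."""
    return re.findall(r"[a-z]+", utterance.lower())

def handle_destination(utterance):
    return "Downtown" if "downtown" in words(utterance) else None

def handle_time(utterance):
    return "now" if "now" in words(utterance) else None

class Agenda:
    """Ordered list of (topic, handler) pairs, highest priority first."""

    def __init__(self):
        self.handlers = []

    def push(self, topic, handler):
        self.handlers.insert(0, (topic, handler))

    def dispatch(self, utterance):
        """Offer the input to each handler in order; the first to accept wins."""
        for i, (topic, handler) in enumerate(self.handlers):
            result = handler(utterance)
            if result is not None:
                # Unlike a strict stack, a lower item can respond; promote it.
                self.handlers.insert(0, self.handlers.pop(i))
                return topic, result
        return None, None

agenda = Agenda()
agenda.push("time", handle_time)
agenda.push("destination", handle_destination)  # current topic on top
print(agenda.dispatch("Right now, please"))     # ('time', 'now')
print([topic for topic, _ in agenda.handlers])  # ['time', 'destination']
```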

  21. Statistical Approaches to DM
      - Allow for dialog complexity beyond human mind
      - Find optimal decision for non-trivial design problems
      - Life-long learning

  22. Decisions
      - Difficult design decisions over the course of interaction
        - When to ask open / directive questions?
        - When to confirm?
        - When to barge-in / wait?
        - Which type of feedback to provide? (e.g., intelligent tutoring system)
      - Sample efficient policy search
        - Policy space is too huge to search with traditional ways of SDS development

  23. System-Initiative VS Mixed-Initiative
      - System-initiative:
        - S1: Welcome to CMU Let's Go. Where do you leave from?
        - U1: CMU
        - S2: From CMU, did I get that right?
        - U2: Yes.
        - S3: Where are you going?
        - U3: Downtown.
        - S4: To Downtown, did I get that right?
        - U4: Yes.
      - Mixed-initiative:
        - S1: Welcome to CMU Let's Go. How may I help you?
        - U1: I'd like to go from CMU to Downtown.
        - S2: From CMU to Downtown, did I get that right?
        - U2: Yes.
        - S3: When are you going to take the bus?
        - U3: Now
        - S4: You want the next bus, is that right?
        - U4: Yes.
