Deep Learning for Dialogue Systems

Deep Learning for Dialogue Systems, GTC 2018

Deep Learning for Dialogue Systems, GTC 2018. Prof. Yun-Nung (Vivian) Chen, Mar 28th, 2018. HTTP://VIVIANCHEN.IDV.TW. Best Poster Award @ GTC 2017. Thanks NVIDIA!!! Future Life: Intelligent Assistant.


  1. Deep Learning for Dialogue Systems, GTC 2018. Prof. Yun-Nung (Vivian) Chen 陳縕儂, Mar 28th, 2018. HTTP://VIVIANCHEN.IDV.TW

  2. Best Poster Award @ GTC 2017. Thanks NVIDIA!!!

  3. Future Life: Intelligent Assistant

  4. Introduction & Background

  5. Language Empowering Intelligent Assistant
     - Apple Siri (2011)
     - Google Now (2012)
     - Microsoft Cortana (2014)
     - Amazon Alexa/Echo (2014)
     - Facebook M & Bot (2015)
     - Google Home (2016)
     - Google Assistant (2016)
     - Apple HomePod (2017)

  6. Why We Need? ("Hey Assistant")
     - Get things done, e.g. set up alarm/reminder, take note
     - Easy access to structured data, services and apps, e.g. find docs/photos/restaurants
     - Assist your daily schedule and routine, e.g. commute alerts to/from work
     - Be more productive in managing your work and personal life

  7. Why Natural Language?
     - Global Digital Statistics (2017 January):
       - Total Population: 7.48B
       - Internet Users: 3.77B
       - Active Social Media Users: 2.79B
       - Unique Mobile Users: 4.92B
       - Active Mobile Social Users: 2.55B
     - The more natural and convenient input of devices evolves towards speech.

  8. Dialogue System
     - Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interactions.
     - Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).
     - Examples: JARVIS (Iron Man's Personal Assistant), Baymax (Personal Healthcare Companion)
     - Good dialogue systems assist users to access information conveniently and finish tasks efficiently.

  9. App → Bot
     - A bot is responsible for a "single" domain, similar to an app.
     - Users can initiate dialogues instead of following the GUI design.

  10. Task-Oriented Dialogue System (Young, 2000) http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
     - Speech Recognition: speech signal → hypothesis "are there any action movies to see this weekend" (or text input: "Are there any action movies to see this weekend?")
     - Language Understanding (LU): domain identification, user intent detection, slot filling → semantic frame: request_movie, genre=action, date=this weekend
     - Dialogue Management (DM): dialogue state tracking (DST), dialogue policy → system action/policy: request_location; connected to backend action / knowledge providers
     - Natural Language Generation (NLG): → text response "Where are you located?"

  11. Interaction Example
     - User: "find a good eating place for taiwanese food"
     - Intelligent Agent: "Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there."
     - Q: How does a dialogue system process this request?

  12. Task-Oriented Dialogue System (Young, 2000) (pipeline diagram repeated from slide 10)

  13. 1. Domain Identification: Requires Predefined Domain Ontology
     - User: "find a good eating place for taiwanese food"
     - Organized domain knowledge (databases): Movie DB, Restaurant DB, Taxi DB
     - For the intelligent agent this is a classification problem.

  14. 2. Intent Detection: Requires Predefined Schema
     - User: "find a good eating place for taiwanese food"
     - Candidate intents over the Restaurant DB: FIND_RESTAURANT, FIND_PRICE, FIND_TYPE, ...
     - For the intelligent agent this is a classification problem.

  15. 3. Slot Filling: Requires Predefined Schema
     - User: "find a good eating place for taiwanese food"
     - IOB tags, one per word: O O B-rating O O O B-type O
     - Restaurant DB (Restaurant / Rating / Type): Rest 1 / good / Taiwanese; Rest 2 / bad / Thai; ...
     - Semantic frame: FIND_RESTAURANT, rating="good", type="taiwanese"
     - Resulting query: SELECT restaurant { rest.rating="good", rest.type="taiwanese" }
     - For the intelligent agent this is a sequence labeling problem.
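
As a rough illustration of the slot-filling slide, the sketch below turns a word sequence and its IOB tags into a semantic frame. The function name `iob_to_frame` and the frame layout (a dict with `intent` and `slots`) are assumptions made for this example, not something defined in the talk.

```python
from typing import Dict, List


def iob_to_frame(tokens: List[str], tags: List[str], intent: str) -> Dict[str, object]:
    """Collect IOB-tagged tokens into slot values and attach the intent."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    slots: Dict[str, str] = {}
    current = None  # slot name of the span currently being read, if any
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = tag[2:]          # a B- tag opens a new slot span
            slots[current] = token
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current] += " " + token  # an I- tag extends the open span
        else:
            current = None             # "O" or an inconsistent I- tag ends the span
    return {"intent": intent, "slots": slots}


tokens = "find a good eating place for taiwanese food".split()
tags = ["O", "O", "B-rating", "O", "O", "O", "B-type", "O"]
frame = iob_to_frame(tokens, tags, "FIND_RESTAURANT")
print(frame)
# {'intent': 'FIND_RESTAURANT', 'slots': {'rating': 'good', 'type': 'taiwanese'}}
```

In a real system the tags would come from a trained sequence labeler; here they are copied from the slide.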

  16. Task-Oriented Dialogue System (Young, 2000) (pipeline diagram repeated from slide 10)

  17. Elements of Dialogue Management (figure from Gašić)

  18. State Tracking: Requires Hand-Crafted States
     - User, turn 1: "find a good eating place for taiwanese food"
     - User, turn 2: "i want it near to my office"
     - Candidate states for the intelligent agent, named by the slots filled so far: NULL; location; rating; type; rating, type; loc, type; loc, rating; all

  19. State Tracking: Requires Hand-Crafted States (same content as slide 18)

  20. State Tracking: Handling Errors and Confidence
     - User: "find a good eating place for taixxxx food"
     - Competing LU hypotheses, each uncertain (?):
       - FIND_RESTAURANT, rating="good", type="taiwanese"
       - FIND_RESTAURANT, rating="good", type="thai"
       - FIND_RESTAURANT, rating="good"
     - Candidate states for the intelligent agent: NULL; location; rating; type; rating, type; loc, type; loc, rating; all
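
One simple way to picture state tracking under uncertainty is to accumulate a confidence score per slot value across turns. The sketch below does only that. The class `BeliefTracker`, its method names, and all confidence numbers are invented for illustration; the talk does not specify a tracking algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One LU hypothesis: (slot values, confidence). The scores below are made up.
Hypothesis = Tuple[Dict[str, str], float]


class BeliefTracker:
    """Keeps an accumulated score for every (slot, value) pair seen so far."""

    def __init__(self) -> None:
        self.scores: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))

    def update(self, nbest: List[Hypothesis]) -> None:
        """Add the confidence of each hypothesis to the values it proposes."""
        for slots, confidence in nbest:
            for slot, value in slots.items():
                self.scores[slot][value] += confidence

    def belief(self, slot: str) -> Dict[str, float]:
        """Normalise the accumulated scores of one slot into a distribution."""
        values = self.scores.get(slot, {})
        total = sum(values.values())
        return {v: s / total for v, s in values.items()} if total else {}

    def filled_slots(self) -> List[str]:
        """Names of slots with at least one candidate value."""
        return sorted(self.scores)


tracker = BeliefTracker()
# Turn 1: the garbled word "taixxxx" yields competing hypotheses.
tracker.update([
    ({"rating": "good", "type": "taiwanese"}, 0.5),
    ({"rating": "good", "type": "thai"}, 0.3),
    ({"rating": "good"}, 0.2),
])
# Turn 2: "i want it near to my office"
tracker.update([({"location": "near my office"}, 0.9)])
print(tracker.filled_slots())
print(tracker.belief("type"))
```

After the second turn all three slots have candidates, which corresponds to the "all" state on the slide, while the belief over `type` still splits between "taiwanese" and "thai".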

  21. Elements of Dialogue Management (figure from Gašić)

  22. Dialogue Policy for Agent Action
     - Inform(location="Taipei 101") → "The nearest one is at Taipei 101"
     - Request(location) → "Where is your home?"
     - Confirm(type="taiwanese") → "Did you want Taiwanese food?"
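
The three action types above can be tied to the dialogue state with a small hand-written rule set. This is only a sketch: `REQUIRED_SLOTS`, `CONFIRM_THRESHOLD`, and the hard-coded Inform result are hypothetical, and the talk does not prescribe these rules.

```python
from typing import Dict, Optional, Tuple

Action = Tuple[str, Optional[str], Optional[str]]  # (act, slot, value)

REQUIRED_SLOTS = ("type", "location")  # hypothetical task definition
CONFIRM_THRESHOLD = 0.7                # hypothetical confidence cut-off


def choose_action(state: Dict[str, Tuple[str, float]]) -> Action:
    """Pick the next system action; state maps slot -> (best value, confidence)."""
    # 1. Ask for any required slot that is still missing.
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return ("Request", slot, None)
    # 2. Confirm a slot whose best value is not trusted enough.
    for slot in REQUIRED_SLOTS:
        value, confidence = state[slot]
        if confidence < CONFIRM_THRESHOLD:
            return ("Confirm", slot, value)
    # 3. Everything is known: answer. The value is a stand-in for a backend lookup.
    return ("Inform", "location", "Taipei 101")


print(choose_action({"type": ("taiwanese", 0.5)}))
# ('Request', 'location', None)
print(choose_action({"type": ("taiwanese", 0.5), "location": ("my office", 0.9)}))
# ('Confirm', 'type', 'taiwanese')
```

Missing slots take priority over low-confidence ones here; that ordering is a design choice of the sketch, and a learned policy could trade these off differently.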

  23. Task-Oriented Dialogue System (Young, 2000) (pipeline diagram repeated from slide 10)

  24. Output / Natural Language Generation
     - Goal: generate natural language or GUI given the selected dialogue action for interactions
     - Inform(location="Taipei 101") → "The nearest one is at Taipei 101" vs. a GUI alternative (image not recovered)
     - Request(location) → "Where is your home?" vs. a GUI alternative (image not recovered)
     - Confirm(type="taiwanese") → "Did you want Taiwanese food?" vs. a GUI alternative (image not recovered)
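
For the text side of generation, the simplest baseline is a template lookup keyed by dialogue act and slot. The sketch below reproduces the three example sentences from the slide. The `TEMPLATES` table and the `generate` function are assumptions for illustration, and title-casing the value is a crude surface-realisation shortcut.

```python
from typing import Dict, Optional

# Templates keyed by "Act:slot"; the wording is copied from the slide.
TEMPLATES: Dict[str, str] = {
    "Inform:location": "The nearest one is at {value}",
    "Request:location": "Where is your home?",
    "Confirm:type": "Did you want {value} food?",
}


def generate(act: str, slot: str, value: Optional[str] = None) -> str:
    """Render a dialogue action as text using a fixed template."""
    template = TEMPLATES.get(f"{act}:{slot}")
    if template is None:
        raise KeyError(f"no template for {act}({slot})")
    # Title-case the value so that "taiwanese" is rendered as "Taiwanese".
    return template.format(value=(value or "").title())


print(generate("Inform", "location", "Taipei 101"))  # The nearest one is at Taipei 101
print(generate("Request", "location"))               # Where is your home?
print(generate("Confirm", "type", "taiwanese"))      # Did you want Taiwanese food?
```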

  25. Deep Learning for Dialogue Systems

  26. Machine Learning ≈ Looking for a Function
     - Speech Recognition: f(speech signal) = "你好 (Hello)"
     - Image Recognition: f(image) = cat
     - Go Playing: f(board position) = 5-5 (next move)
     - Chat Bot: f("Where is GTC?") = "The address is …"

  27. A Single Neuron
     - Weighted sum of the inputs plus a bias: $z = w_1 x_1 + w_2 x_2 + \dots + w_N x_N + b$
     - Activation function: $y = \sigma(z)$, with the sigmoid function $\sigma(z) = \frac{1}{1 + e^{-z}}$
     - $w$, $b$ are the parameters of this neuron.
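
The neuron on this slide is a weighted sum followed by a sigmoid, which fits in a few lines of plain Python. The function names and the example weights are chosen for illustration only.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Weighted sum of the inputs plus the bias, passed through the sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# z = 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0) = 0.5
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```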

  28. A Single Neuron, $f: R^N \to R^M$
     - Inputs $x_1, x_2, \dots, x_N$ with weights $w_1, w_2, \dots, w_N$ and bias $b$ produce $z$, then the output $y$.
     - $y \ge 0.5$: is "2"; $y < 0.5$: not "2"
     - A single neuron can only handle binary classification.

  29. A Layer of Neurons, $f: R^N \to R^M$
     - Handwriting digit classification: inputs $x_1, x_2, \dots, x_N$
     - $y_1$: "1" or not; $y_2$: "2" or not; $y_3$: "3" or not; ... (10 neurons / 10 classes)
     - Which one is max?
     - A layer of neurons can handle multiple possible outputs, and the result depends on the max one.
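
A layer is several neurons reading the same input, and classification picks the neuron with the largest output. The sketch below uses three neurons and made-up weights instead of the ten digit classes on the slide; the names `layer` and `predict` are illustrative.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer(x: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """One output per neuron; weights[j] holds the weights of neuron j."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def predict(x: Sequence[float], weights: Sequence[Sequence[float]],
            biases: Sequence[float]) -> int:
    """Index of the neuron with the largest output ("which one is max?")."""
    outputs = layer(x, weights, biases)
    return max(range(len(outputs)), key=outputs.__getitem__)


weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # made-up values
biases = [0.0, 0.0, 0.0]
print(predict([0.2, 2.0], weights, biases))  # 1, since neuron 1 sees z = 2.0
```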

  30. Deep Neural Networks (DNN), $f: R^N \to R^M$
     - Fully connected feedforward network
     - Input vector $x_1, x_2, \dots, x_N$ → Layer 1 → Layer 2 → ... → Layer L → output vector $y_1, y_2, \dots, y_M$
     - Deep NN: multiple hidden layers
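
Stacking layers gives the fully connected feedforward network: the output of one layer is the input of the next. The sketch below assumes sigmoid activations in every layer and uses made-up weights; the talk does not fix either choice.

```python
import math
from typing import List, Sequence, Tuple

# A layer is (weights, biases); weights[j] holds the weights of neuron j.
Layer = Tuple[List[List[float]], List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Feed the input vector through each layer in turn."""
    activation = list(x)
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


# 2 inputs -> 3 hidden units -> 1 output, with made-up parameters.
network: List[Layer] = [
    ([[0.5, -0.5], [1.0, 1.0], [-1.0, 0.5]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.0]),
]
output = forward([1.0, 2.0], network)
print(len(output), output)
```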

  31. Recurrent Neural Network (RNN)
     - Activation function: tanh, ReLU
     - The network is unrolled over time (figure not reproduced).
     - RNN can learn accumulated sequential information (time-series).
     - http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
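
The slide's equation did not survive extraction, so the sketch below assumes the standard simple (Elman-style) recurrence, $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b)$, to show how the hidden state carries information forward. The parameter names `w_xh`, `w_hh`, and `b` and all numbers are illustrative.

```python
import math
from typing import List, Sequence


def rnn_step(x: Sequence[float], h_prev: Sequence[float],
             w_xh: Sequence[Sequence[float]], w_hh: Sequence[Sequence[float]],
             b: Sequence[float]) -> List[float]:
    """One time step: combine the current input with the previous hidden state."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(w_xh[j], x))
            + sum(w * hi for w, hi in zip(w_hh[j], h_prev))
            + b[j]
        )
        for j in range(len(b))
    ]


def run_rnn(inputs: Sequence[Sequence[float]], w_xh, w_hh, b) -> List[List[float]]:
    """Run over a sequence from a zero initial state and return every hidden state."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b)
        states.append(h)
    return states


# One hidden unit; the second input is zero, yet the state stays non-zero
# because it still carries the first input.
states = run_rnn([[0.5], [0.0]], w_xh=[[1.0]], w_hh=[[1.0]], b=[0.0])
print(states)
```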

  32. Deep Learning for LU
     - IOB Sequence Labeling for Slot Filling: (a) LSTM, (b) LSTM-LA, (c) bLSTM (network diagrams not reproduced)
     - Intent Classification: (d) Intent LSTM (network diagram not reproduced)
