Deep Learning for Dialogue Systems
GTC 2018
Prof. Yun-Nung (Vivian) Chen 陳縕儂
Mar 28th, 2018
HTTP://VIVIANCHEN.IDV.TW
Best Poster Award @ GTC 2017 Thanks NVIDIA!!!
Future Life – Intelligent Assistant
Introduction & Background
Language Empowering Intelligent Assistant
• Microsoft Cortana (2014)
• Google Now (2012)
• Apple Siri (2011)
• Google Assistant (2016)
• Apple HomePod (2017)
• Amazon Alexa/Echo (2014)
• Facebook M & Bot (2015)
• Google Home (2016)
Why Do We Need Them?
• Get things done, e.g. set up an alarm or reminder, take a note
• Easy access to structured data, services, and apps, e.g. find docs, photos, or restaurants
• Assist your daily schedule and routine, e.g. commute alerts to/from work
• Be more productive in managing your work and personal life
“Hey Assistant”
Why Natural Language?
Global Digital Statistics (January 2017)
  Total Population             7.48B
  Internet Users               3.77B
  Active Social Media Users    2.79B
  Unique Mobile Users          4.92B
  Active Mobile Social Users   2.55B
Device input is evolving toward speech, the more natural and convenient modality.
Dialogue System
• Spoken dialogue systems are intelligent agents that help users finish tasks more efficiently via spoken interaction.
• Spoken dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).
• Examples: JARVIS, Iron Man’s personal assistant; Baymax, a personal healthcare companion.
Good dialogue systems help users access information conveniently and finish tasks efficiently.
App → Bot
• A bot is responsible for a “single” domain, similar to an app.
• Users can initiate dialogues instead of following the GUI design.
Task-Oriented Dialogue System (Young, 2000)
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
Pipeline (figure):
• Speech Recognition: speech signal → hypothesis “are there any action movies to see this weekend” (alternatively, text input: “Are there any action movies to see this weekend?”)
• Language Understanding (LU): domain identification, user intent detection, slot filling → semantic frame: request_movie, genre=action, date=this weekend
• Dialogue Management (DM): dialogue state tracking (DST) and dialogue policy, backed by backend action / knowledge providers → system action/policy: request_location
• Natural Language Generation (NLG): text response “Where are you located?”
Interaction Example
User: find a good eating place for taiwanese food
Intelligent Agent: Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.
Q: How does a dialogue system process this request?
1. Domain Identification: Requires Predefined Domain Ontology
User: find a good eating place for taiwanese food
Organized domain knowledge (databases): Movie DB, Restaurant DB, Taxi DB
The intelligent agent must pick the matching domain. Classification!
2. Intent Detection: Requires Predefined Schema
User: find a good eating place for taiwanese food
Candidate intents in the Restaurant DB domain: FIND_RESTAURANT, FIND_PRICE, FIND_TYPE, …
The intelligent agent must pick the matching intent. Classification!
3. Slot Filling: Requires Predefined Schema
User: find a good eating place for taiwanese food
Tags: O    O B-rating O   O     O   B-type    O
Restaurant DB:
  Restaurant   Rating   Type
  Rest 1       good     Taiwanese
  Rest 2       bad      Thai
  :            :        :
Semantic frame: FIND_RESTAURANT, rating=“good”, type=“taiwanese”
Query: SELECT restaurant { rest.rating=“good” rest.type=“taiwanese” }
Sequence Labeling
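The step from IOB tags to a semantic frame can be sketched in a few lines of Python. This is only an illustration of the idea on the slide: the function name `iob_to_frame` and the dictionary layout of the frame are assumptions, and the tags are taken as given rather than predicted by a model.

```python
def iob_to_frame(tokens, tags, intent):
    """Collect IOB-tagged tokens into slot values under a given intent.

    Hypothetical helper: the frame layout is an assumption for illustration.
    If a slot name appears in more than one span, the last span wins.
    """
    slots = {}
    current = None  # slot name of the span currently being built, if any
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = tag[2:]
            slots[current] = token          # start a new span
        elif tag.startswith("I-") and current == tag[2:]:
            slots[current] += " " + token   # continue the open span
        else:
            current = None                  # "O" (or a stray "I-") closes the span
    return {"intent": intent, "slots": slots}


tokens = "find a good eating place for taiwanese food".split()
tags = ["O", "O", "B-rating", "O", "O", "O", "B-type", "O"]
frame = iob_to_frame(tokens, tags, "FIND_RESTAURANT")
print(frame)
```

In a real system the tags would come from a sequence labeler and the intent from a classifier; here both are hard-coded from the slide's example.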
Elements of Dialogue Management
(Figure from Gašić)
State Tracking: Requires Hand-Crafted States
User: find a good eating place for taiwanese food
User: i want it near to my office
States (figure): NULL; location; rating; type; loc, rating; rating, type; loc, type; all
State Tracking: Handling Errors and Confidence
User: find a good eating place for taixxxx food
Competing hypotheses:
• FIND_RESTAURANT, rating=“good”, type=“taiwanese” ?
• FIND_RESTAURANT, rating=“good”, type=“thai” ?
• FIND_RESTAURANT, rating=“good” ?
States (figure): NULL; location; rating; type; loc, rating; rating, type; loc, type; all
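One simple way to keep track of such competing hypotheses is to spread confidence over slot values instead of committing to a single frame. The sketch below is an assumption-laden illustration, not the method from the talk: the function `update_belief` is hypothetical, and the confidence scores 0.5, 0.3, and 0.2 are made up for the example.

```python
from collections import defaultdict


def update_belief(nbest):
    """Turn an N-best list of (slots, confidence) into per-slot value weights.

    Hypothetical helper: confidences are normalized to sum to 1 and then
    added up per slot value, so a slot missing from a hypothesis simply
    receives no mass from it.
    """
    belief = defaultdict(lambda: defaultdict(float))
    total = sum(conf for _, conf in nbest)
    for slots, conf in nbest:
        for name, value in slots.items():
            belief[name][value] += conf / total
    return {name: dict(values) for name, values in belief.items()}


# Made-up confidences for the three hypotheses of the "taixxxx" utterance.
nbest = [
    ({"rating": "good", "type": "taiwanese"}, 0.5),
    ({"rating": "good", "type": "thai"}, 0.3),
    ({"rating": "good"}, 0.2),
]
belief = update_belief(nbest)
print(belief)
```

Under these assumed scores the tracker is sure about the rating but split on the type, which is exactly the situation where the agent may want to ask for confirmation.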
Dialogue Policy for Agent Action
• Inform(location=“Taipei 101”): “The nearest one is at Taipei 101”
• Request(location): “Where is your home?”
• Confirm(type=“taiwanese”): “Did you want Taiwanese food?”
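A hand-written rule policy gives a feel for how an agent might choose among these three action types. Everything in this sketch is an assumption for illustration: the function `choose_action`, the list of required slots, the 0.7 confidence threshold, and the `db_result` argument standing in for a backend lookup. A learned policy would replace these rules.

```python
def choose_action(belief, db_result, required=("type", "location"), threshold=0.7):
    """Pick a system action from a belief state with simple hand-written rules.

    Hypothetical rules, checked slot by slot in the order given by `required`:
    request a slot with no value, confirm a slot whose best value is below
    `threshold`, and inform the backend result once every slot is settled.
    """
    for slot in required:
        values = belief.get(slot)
        if not values:
            return ("Request", slot, None)
        best, prob = max(values.items(), key=lambda kv: kv[1])
        if prob < threshold:
            return ("Confirm", slot, best)
    return ("Inform", "location", db_result)


uncertain = {"type": {"taiwanese": 0.5, "thai": 0.3}}
missing = {"type": {"taiwanese": 0.9}}
complete = {"type": {"taiwanese": 0.9}, "location": {"office": 0.8}}

print(choose_action(uncertain, "Taipei 101"))  # low confidence on type
print(choose_action(missing, "Taipei 101"))    # location not yet known
print(choose_action(complete, "Taipei 101"))   # all slots settled
```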
Output / Natural Language Generation
Goal: generate natural language or GUI for the selected dialogue action.
• Inform(location=“Taipei 101”): “The nearest one is at Taipei 101” vs. GUI (figure)
• Request(location): “Where is your home?” vs. GUI (figure)
• Confirm(type=“taiwanese”): “Did you want Taiwanese food?” vs. GUI (figure)
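The simplest form of generation is a template lookup keyed by the dialogue action. The sketch below reproduces the three example responses; the `TEMPLATES` table and the `generate` function are hypothetical names, and template filling is a stand-in here for whatever generator the system actually uses.

```python
# Hypothetical template table: (dialogue act, slot) -> response pattern.
TEMPLATES = {
    ("Inform", "location"): "The nearest one is at {value}",
    ("Request", "location"): "Where is your home?",
    ("Confirm", "type"): "Did you want {value} food?",
}


def generate(act, slot, value=None):
    """Render a dialogue action as text by filling its template."""
    template = TEMPLATES[(act, slot)]
    if slot == "type" and value:
        value = value.capitalize()  # cuisine names are capitalized in the output
    return template.format(value=value)


print(generate("Inform", "location", "Taipei 101"))
print(generate("Request", "location"))
print(generate("Confirm", "type", "taiwanese"))
```

Templates are predictable but rigid, which is the usual motivation for learning the generator instead.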
Deep Learning for Dialogue Systems
Machine Learning ≈ Looking for a Function
• Speech Recognition: f(speech signal) = “你好 (Hello)”
• Image Recognition: f(image) = cat
• Go Playing: f(board) = 5-5 (next move)
• Chat Bot: f(“Where is GTC?”) = “The address is …”
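A Single Neuron
Inputs $x_1, x_2, \dots, x_N$ are weighted by $w_1, w_2, \dots, w_N$ and summed with the bias $b$:
$z = w_1 x_1 + w_2 x_2 + \dots + w_N x_N + b$
The activation function turns $z$ into the output $y$. With the sigmoid function:
$y = \sigma(z) = \frac{1}{1 + e^{-z}}$
$w$, $b$ are the parameters of this neuron.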
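The neuron above is a weighted sum followed by a sigmoid, which takes only a few lines of plain Python. The function names and the example weights are chosen for illustration.

```python
import math


def sigmoid(z):
    """Sigmoid activation: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b):
    """Single neuron: weighted sum of the inputs plus bias, then sigmoid."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)


# Illustrative values: the weighted sum is 0.5 - 0.5 + 0 = 0, so y = sigmoid(0).
y = neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
print(y)
```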
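A Single Neuron
$f: \mathbb{R}^N \to \mathbb{R}^M$
Inputs $x_1, x_2, \dots, x_N$ with weights $w_1, w_2, \dots, w_N$ and bias $b$ produce $z$, then the output $y$.
Example: is the input a “2”?
• $y \ge 0.5$: is “2”
• $y < 0.5$: not “2”
A single neuron can only handle binary classification.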
A Layer of Neurons
$f: \mathbb{R}^N \to \mathbb{R}^M$
Handwritten digit classification: inputs $x_1, x_2, \dots, x_N$; outputs $y_1$ (“1” or not), $y_2$ (“2” or not), $y_3$ (“3” or not), …
10 neurons / 10 classes. Which one is max?
A layer of neurons can handle multiple possible outputs, and the result is the class whose output is the largest.
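Picking the class with the largest output can be sketched by running several sigmoid neurons over the same input and taking the argmax. The three-neuron weight matrix below is a toy stand-in for the ten digit neurons, and all names are illustrative.

```python
import math


def layer(x, W, b):
    """One layer: each row of W (with its bias) is a sigmoid neuron over x."""
    outputs = []
    for weights, bias in zip(W, b):
        z = sum(wi * xi for wi, xi in zip(weights, x)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs


def predict(x, W, b):
    """Return the index of the neuron with the largest output."""
    ys = layer(x, W, b)
    return max(range(len(ys)), key=lambda i: ys[i])


# Toy example: 3 neurons, each looking at one input dimension.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
b = [0.0, 0.0, 0.0]
x = [0.1, 2.0, -1.0]
winner = predict(x, W, b)
print(layer(x, W, b), winner)
```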
Deep Neural Networks (DNN)
$f: \mathbb{R}^N \to \mathbb{R}^M$
Fully connected feedforward network: input vector $x_1, x_2, \dots, x_N$ → Layer 1 → Layer 2 → … → Layer L → output vector $y_1, y_2, \dots, y_M$
Deep NN: multiple hidden layers
Recurrent Neural Network (RNN)
(Figure: network unrolled over time; activation function: tanh, ReLU)
RNN can learn accumulated sequential information (time-series)
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
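How an RNN accumulates information over time can be shown with the standard recurrence $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b)$. The slide does not spell out this formula, so treat the formulation, the names `rnn_step` and `run_rnn`, and the toy weights as assumptions for illustration.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b):
    """One time step: new hidden state from the input and the previous state."""
    h = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        z += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(z))
    return h


def run_rnn(xs, h0, W_xh, W_hh, b):
    """Run the recurrence over a sequence and return every hidden state."""
    h, states = h0, []
    for x in xs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states


# Toy 1-dimensional network: a single input at t=0, then silence at t=1.
states = run_rnn(xs=[[1.0], [0.0]], h0=[0.0],
                 W_xh=[[1.0]], W_hh=[[1.0]], b=[0.0])
print(states)
```

The second hidden state is nonzero even though the second input is zero, because the first input is carried forward through the recurrent weight.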
Deep Learning for LU
IOB Sequence Labeling for Slot Filling (figure): (a) LSTM, (b) LSTM-LA, (c) bLSTM
Intent Classification (figure): (d) Intent LSTM