How Contexts Matter: Understanding in Dialogues
Yun-Nung (Vivian) Chen
§ Word-Level Contexts in Sentences
§ Learning from Prior Knowledge – Knowledge-Guided Structural Attention Networks (K-SAN) [Chen et al., '16]
§ Learning from Observations – Modularizing Unsupervised Sense Embeddings (MUSE) [Lee & Chen, '17]
§ Sentence-Level Contexts in Dialogues
§ Investigation of Understanding Impact – Reinforcement Learning Based Neural Dialogue System [Li et al., '17]
§ Conclusion
§ Dialogue systems are intelligent agents that help users finish tasks more efficiently via conversational interactions.
§ Dialogue systems are being incorporated into various devices (smartphones, smart TVs, in-car navigation systems, etc.).
JARVIS – Iron Man's personal assistant; Baymax – personal healthcare companion
§ Word-level context
§ Prior knowledge such as linguistic syntax: show me the flights from seattle to san francisco
§ Collocated words: Smartphone companies including apple, blackberry, and sony will be invited.
Contexts provide informative cues for better understanding.
§ Sentence-level context
request_movie (genre=action, date=this weekend) – (browsing action movie reviews…)
User: Find me a good one this weekend
System: London Has Fallen is currently the number 1 action movie in America
How misunderstanding influences the dialogue system performance
Knowledge-Guided Structural Attention Networks (K-SAN)
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, "Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks," arXiv preprint arXiv:1609.00777, 2016.
Sentence s: show me the flights from seattle to san francisco
§ Syntax (dependency tree) – knowledge-guided substructures x_i:
1. show me
2. show flights the
3. show flights from seattle
4. show flights to san francisco
§ Semantics (AMR graph) – knowledge-guided substructures x_i:
1. show you
2. show flight seattle
3. show flight san francisco
4. show I
AMR of the sentence:
(s / show
  :ARG0 (y / you)
  :ARG1 (f / flight
    :source (c / city :name (d / name :op1 Seattle))
    :destination (c2 / city :name (s2 / name :op1 San :op2 Francisco)))
  :ARG2 (i / I)
  :mode imperative)
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, "Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks," arXiv preprint arXiv:1609.00777, 2016.
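As a concrete illustration of how such substructures can be obtained, the sketch below extracts root-to-leaf paths from a dependency parse with spaCy. The parser choice and the path-based definition of a substructure are assumptions made here for illustration, not the procedure prescribed by the paper.

```python
# Minimal sketch: derive knowledge-guided substructures from a dependency
# parse by collecting root-to-leaf paths (an assumed, simplified definition).
# Requires the spaCy model: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def dependency_substructures(sentence):
    """Return root-to-leaf paths of the dependency tree as word lists."""
    doc = nlp(sentence)
    root = [tok for tok in doc if tok.head == tok][0]   # the sentence root

    paths = []
    def dfs(token, path):
        path = path + [token.text]
        children = list(token.children)
        if not children:            # leaf reached: record one substructure
            paths.append(path)
        for child in children:
            dfs(child, path)
    dfs(root, [])
    return paths

for sub in dependency_substructures(
        "show me the flights from seattle to san francisco"):
    print(" ".join(sub))
```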
K-SAN architecture: the input sentence s ("show me the flights from seattle to san francisco") is encoded by a CNN/RNN sentence encoder, while a knowledge encoding module (CNN) encodes each knowledge-guided substructure x_i into a vector m_i. The inner product between the sentence encoding and each m_i gives a knowledge attention distribution p_i; the weighted sum of the encoded knowledge representations is combined with the RNN tagger to produce the slot tagging sequence y.
The model will pay more attention to important substructures that may be crucial for slot tagging.
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, "Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks," arXiv preprint arXiv:1609.00777, 2016.
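A minimal numpy sketch of the attention step described above: each encoded substructure is scored against the sentence representation by an inner product, the scores are normalized into an attention distribution, and the encoded knowledge vectors are combined by a weighted sum. The dimensions and random vectors are illustrative assumptions, not the paper's parameters.

```python
# Knowledge attention over encoded substructures (toy dimensions).
import numpy as np

def knowledge_attention(u, M):
    """u: (d,) sentence encoding; M: (k, d) encoded substructure vectors m_i."""
    scores = M @ u                       # inner product per substructure
    p = np.exp(scores - scores.max())
    p /= p.sum()                         # knowledge attention distribution p_i
    h_kg = p @ M                         # weighted sum of knowledge representations
    return p, h_kg

rng = np.random.default_rng(0)
u = rng.normal(size=64)        # sentence representation from the CNN/RNN encoder
M = rng.normal(size=(4, 64))   # one vector per knowledge-guided substructure
p, h_kg = knowledge_attention(u, M)
print(p)   # higher weight = substructure judged more relevant for slot tagging
```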
§ Darker blocks and lines correspond to higher attention weights.
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, "Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks," arXiv preprint arXiv:1609.00777, 2016.
§ Darker blocks and lines correspond to higher attention weights.
K-SAN learns similar attention to salient substructures with less training data.
Y.-N. Chen, D. Hakkani-Tur, G. Tur, A. Celikyilmaz, J. Gao, and L. Deng, "Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks," arXiv preprint arXiv:1609.00777, 2016.
Modularizing Unsupervised Sense Embeddings (MUSE)
G.-H. Lee and Y.-N. Chen, "MUSE: Modularizing Unsupervised Sense Embeddings," in EMNLP, 2017.
§ Word embeddings are trained on a corpus in an unsupervised manner.
Finally I chose Google instead of Apple.
Can you buy me a bag of apples, oranges, and bananas?
§ The same embedding is used for different senses in NLP tasks, e.g. NLU, POS tagging.
Words with different senses should correspond to different embeddings.
G.-H. Lee and Y.-N. Chen, "MUSE: Modularizing Unsupervised Sense Embeddings," in EMNLP, 2017.
§ Input: unannotated text corpus
§ Two key mechanisms
§ Sense selection given a text context
§ Sense representation to embed statistical characteristics of sense identity
Example: for "Smartphone companies including apple, blackberry, and sony will be invited.", sense selection picks a sense of apple (apple-1 vs. apple-2), and each sense has its own embedding.
G.-H. Lee and Y.-N. Chen, "MUSE: Modularizing Unsupervised Sense Embeddings," in EMNLP, 2017.
Corpus: { Smartphone companies including apple, blackberry, and sony will be invited. }
1. Sample a collocation: a target word (e.g. apple) and a collocated word (e.g. including), each with its surrounding context.
2. Sense Selection Module: given the context, select a sense for the target word and for the collocated word.
3. Sense Representation Module: learn the selected sense embeddings with a skip-gram approximation and negative sampling.
§ Sense selection – policy-based or value-based
§ Sense representation learning – skip-gram approximation
The collocated likelihood serves as a reward signal to optimize the sense selection module.
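The sketch below illustrates one simplified MUSE-style update under toy assumptions: a fixed number of senses per word, greedy value-based sense selection from an averaged context embedding, and a skip-gram objective with negative sampling whose positive likelihood doubles as the reward fed back to the selection module. Dimensions, learning rate, and the sense-selection scorer are illustrative, not the paper's configuration.

```python
# Toy MUSE-style update: value-based sense selection + skip-gram representation.
import numpy as np

rng = np.random.default_rng(0)
V, S, D, LR = 1000, 3, 50, 0.025           # vocab size, senses/word, dim, learning rate
word_emb  = rng.normal(0, 0.1, (V, D))     # word-level embeddings for context averaging
sense_in  = rng.normal(0, 0.1, (V, S, D))  # input sense embeddings
sense_out = rng.normal(0, 0.1, (V, S, D))  # output (collocation) sense embeddings
q_select  = rng.normal(0, 0.1, (V, S, D))  # linear sense-selection scorer (assumed form)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def select_sense(word, context_words):
    """Greedy (value-based) sense selection from the averaged context embedding."""
    ctx = word_emb[context_words].mean(axis=0)
    return int(np.argmax(q_select[word] @ ctx))

def train_step(target, collocate, context_words, n_neg=5):
    zt = select_sense(target, context_words)      # sense of the target word
    zc = select_sense(collocate, context_words)   # sense of the collocated word
    v = sense_in[target, zt]

    # Skip-gram with negative sampling over the selected sense vectors
    # (negative samples use sense 0 for brevity).
    pairs = [(collocate, zc, 1.0)] + [(int(n), 0, 0.0)
                                      for n in rng.integers(0, V, size=n_neg)]
    grad_v = np.zeros(D)
    for w, z, label in pairs:
        u = sense_out[w, z]
        g = sigmoid(v @ u) - label
        grad_v += g * u
        sense_out[w, z] -= LR * g * v
    sense_in[target, zt] -= LR * grad_v

    # Collocated likelihood acts as the reward for the sense-selection module.
    reward = np.log(sigmoid(v @ sense_out[collocate, zc]) + 1e-9)
    return reward   # would be used to update q_select (policy- or value-based)

print(train_step(target=5, collocate=7, context_words=[1, 2, 3, 8]))
```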
§ Dataset: SCWS for multi-sense embedding evaluation
Example pair: "He borrowed the money from banks ." vs. "I live near to a river ." – correlation = ?

Approach                   MaxSimC   AvgSimC
Huang et al., 2012           26.1      65.7
Neelakantan et al., 2014     60.1      69.3
Tian et al., 2014            63.6      65.4
Li & Jurafsky, 2015          66.6      66.8
Bartunov et al., 2016        53.8      61.2
Qiu et al., 2016             64.9      66.1
MUSE-Policy                  66.1      67.4
MUSE-Greedy                  66.3      68.3
MUSE-ε-Greedy                67.4      68.6
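For reference, the two metrics in the table can be computed as below, assuming a model that returns, for a word in its context, a probability distribution over its senses plus one vector per sense; the reported numbers are correlations between these scores and human similarity ratings. Function and variable names here are illustrative.

```python
# MaxSimC / AvgSimC for one SCWS word pair (toy sketch).
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def max_sim_c(p1, vecs1, p2, vecs2):
    """Similarity of the single most likely sense in each context."""
    return cos(vecs1[int(np.argmax(p1))], vecs2[int(np.argmax(p2))])

def avg_sim_c(p1, vecs1, p2, vecs2):
    """Expected similarity over all sense pairs, weighted by both contexts."""
    return sum(p1[i] * p2[j] * cos(vecs1[i], vecs2[j])
               for i in range(len(p1)) for j in range(len(p2)))
```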
Nearest neighbors for different senses of "tie":
Context 1: … braves finish the season in tie with the los angeles dodgers …
k-NN: scoreless otl shootout 6-6 3-3 7-7 0-0
Context 2: … his later years proudly wore tie with the chinese characters for …
k-NN: pants trousers shirt juventus hingis blazer socks anfield
Nearest neighbors for different senses of "blackberry":
Context 1: … of the mulberry or the blackberry and minos sent him to …
k-NN: cranberries maple vaccinium apricot apple
Context 2: … of the large number of blackberry users in the us federal …
k-NN: smartphones sap microsoft ipv6 smartphone
Nearest neighbors for different senses of "head":
Context 1: … shells and/or high explosive squash head and/or anti-tank …
k-NN: venter thorax neck spear millimeters thorax fusiform
Context 2: … head was shaven to prevent head lice serious threat back then …
k-NN: shaved chest
Context 3: … appoint john pope republican as head of the new army of …
k-NN: thatcher loki multi-party appoints mao luther unicameral beria appointed
MUSE learns sense embeddings in an unsupervised way and achieves the first purely sense-level representation learning system with linear-time sense selection.
RL-Based Neural Dialogue Systems
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, "End-to-End Task-Completion Neural Dialogue Systems," in IJCNLP, 2017.
§ Dialogue management is framed as a reinforcement learning task.
§ The agent learns to select actions to maximize the expected reward.
Agent-environment loop: the agent observes the dialogue state, takes an action, and receives a reward from the environment.
Reward: +30 if booking the right ticket, -30 if failing, -1 otherwise (per turn).
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, "End-to-End Task-Completion Neural Dialogue Systems," in IJCNLP, 2017.
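A minimal sketch of the reward schedule stated above, with the episode outcome passed in explicitly; the constants follow the slide.

```python
# Per-turn reward for the task-completion dialogue agent.
def turn_reward(done: bool, success: bool) -> int:
    if done:
        return 30 if success else -30   # right ticket booked vs. failed dialogue
    return -1                            # small per-turn penalty to keep dialogues short
```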
§ Dialogue management is framed as a reinforcement learning task.
§ The agent learns to select actions to maximize the expected reward.
Environment: a user simulator (user agenda modeling + natural language generation) produces the text input, e.g. "Are there any action movies to see this weekend?"
Agent: the neural dialogue system (language understanding + dialogue management) outputs a dialogue policy action, e.g. request_location.
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, "End-to-End Task-Completion Neural Dialogue Systems," in IJCNLP, 2017.
§ LU, DST (neural dialogue system), and NLG (user simulation) are trained in a supervised way.
§ Dialogue policy learning is trained end-to-end.
System components: the LU tags intents and slots in user utterances, the DST maintains the dialogue state against the knowledge database, the dialogue policy selects the next system action, and the user simulator (user goal, user agenda model, NLG) generates the user responses.
X. Li, Y.-N. Chen, L. Li, J. Gao, and A. Celikyilmaz, "End-to-End Task-Completion Neural Dialogue Systems," in IJCNLP, 2017.
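As an illustration of the dialogue-policy component, the sketch below shows a small DQN over dialogue-state features that picks the next system action (e.g. request_location). It is a generic sketch, not the authors' implementation; layer sizes, epsilon, and the feature dimensionality are assumptions.

```python
# Toy DQN dialogue policy: map dialogue-state features to Q-values over actions.
import random
import torch
import torch.nn as nn

class DialoguePolicy(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))         # one Q-value per dialogue act

    def forward(self, state):
        return self.net(state)

    def select_action(self, state, epsilon: float = 0.1) -> int:
        if random.random() < epsilon:             # epsilon-greedy exploration
            return random.randrange(self.net[-1].out_features)
        with torch.no_grad():
            return int(self.forward(state).argmax())

policy = DialoguePolicy(state_dim=120, n_actions=30)
state = torch.zeros(120)                          # dialogue-state tracker features
print(policy.select_action(state))                # index of the chosen system action
```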