What’s so Hard about Natural Language Understanding?
Alan Ritter
Computer Science and Engineering, The Ohio State University

Collaborators: Jiwei Li, Dan Jurafsky (Stanford); Bill Dolan, Michel Galley, Jianfeng Gao (MSR); Colin Cherry (Google); Jeniya Tabassum, Alexander Konovalov, Wei Xu (Ohio State); Brendan O’Connor (UMass)
Q: Why are we so good at Speech and MT (but bad at NLU)?
A: People naturally translate and transcribe.

Q: Where can we find large, end-to-end datasets for NLU?
• Web-scale Conversations?
• Web-scale Structured Data?
Data-Driven Conversation
• Twitter: ~500 million public SMS-style conversations per month
• Goal: Learn conversational agents directly from massive volumes of data.
Noisy Channel Model [Ritter, Cherry, Dolan EMNLP 2011]
Input: Who wants to come over for dinner tomorrow?
Output: Yum ! I want to be there tomorrow !
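A minimal sketch of the noisy-channel idea, borrowed from phrase-based MT: pick the response r maximizing log p(input | r) + log p(r). The scorer callables (`channel_logp`, `lm_logp`) and the toy stubs below are hypothetical stand-ins, not the paper's actual models.

```python
def noisy_channel_response(input_msg, candidates, channel_logp, lm_logp):
    """Noisy-channel selection: argmax_r  log p(input | r) + log p(r).
    channel_logp(input, r) plays the translation-model role;
    lm_logp(r) plays the response language-model role."""
    return max(candidates, key=lambda r: channel_logp(input_msg, r) + lm_logp(r))

# Toy usage with stub scorers (length-match channel, length-penalty LM):
best = noisy_channel_response(
    "Who wants to come over for dinner tomorrow?",
    ["Yum ! I want to be there tomorrow !", "ok"],
    channel_logp=lambda s, r: -abs(len(s.split()) - len(r.split())),
    lm_logp=lambda r: -0.1 * len(r.split()),
)
print(best)  # "Yum ! I want to be there tomorrow !"
```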
Neural Conversation
[Sordoni et al. 2015] [Xu et al. 2016] [Wen et al. 2016] [Li et al. 2016] [Kannan et al. 2016] [Serban et al. 2016]
How old are you?
i 'm 16 .
16 ?   ← Bad Action
i don 't know what you 're talking about
you don 't know what you 're saying
i don 't know what you 're talking about …   ← Outcome: a repetitive loop
Slide Credit: Jiwei Li
Deep Reinforcement Learning [Li, Monroe, Ritter, Galley, Gao, Jurafsky EMNLP 2016]
[Figure: seq2seq encoder-decoder. Encoding: the input "how old are you EOS" is read into a State. Decoding: the model emits an Action, the response "I'm 16 . EOS".]
Learning: Policy Gradient (REINFORCE Algorithm; Williams, 1992)
[Figure: the same encoder-decoder; the decoder policy that maps the state encoding to an action is what we want to learn.]
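To make the policy gradient concrete, here is a minimal sketch of one REINFORCE step, assuming PyTorch and a seq2seq policy that has already sampled a response and kept its per-token log-probabilities; the reward itself (in the paper, heuristic conversation-quality signals) is supplied from outside.

```python
import torch

def reinforce_step(log_probs, reward, baseline, optimizer):
    """One REINFORCE update (Williams, 1992).
    log_probs: per-token log pi(a_t | state, a_<t) for the sampled
    response (a tensor that requires grad); reward, baseline: floats.
    Responses scoring above the baseline become more probable."""
    loss = -(reward - baseline) * log_probs.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```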
Q: Rewards?
A: Turing Test → Adversarial Learning (Goodfellow et al., 2014)
Adversarial Learning for Neural Dialogue [Li, Monroe, Shi, Jean, Ritter, Jurafsky EMNLP 2016]
(Alternate between training the Generator and the Discriminator)
• Sample a human response from real-world conversations, or generate a response with the Response Generator.
• The Discriminator classifies each response: Real or Fake?
• The Generator is trained with the REINFORCE Algorithm (Williams, 1992), using the Discriminator's judgment as the reward.
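A minimal sketch of one alternating update is below. `generator.sample` and the `discriminator(context, response)` call signature are assumed interfaces, not the paper's exact code: the discriminator is trained to separate human from generated responses, and its "looks human" probability serves as the REINFORCE reward for the generator.

```python
import torch
import torch.nn.functional as F

def adversarial_step(generator, discriminator, g_opt, d_opt,
                     context, human_response):
    # --- Discriminator step: real responses -> 1, generated -> 0 ---
    fake_ids, log_probs = generator.sample(context)  # assumed interface
    d_real = discriminator(context, human_response)
    d_fake = discriminator(context, fake_ids)
    d_loss = (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # --- Generator step: REINFORCE, with the discriminator's
    # probability of "human" as the reward ---
    reward = discriminator(context, fake_ids).detach()
    g_loss = -(reward * log_probs.sum())
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```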
Adversarial Learning Improves Response Generation (vs. vanilla generation model)

Human Evaluator:
  Adversarial Win: 62%   Adversarial Lose: 18%   Tie: 20%

Machine Evaluator (Adversarial Success: how often can you fool a machine? [Bowman et al. 2016]):
  Adversarial Learning: 8.0%   Standard Seq2Seq model: 4.9%

Slide Credit: Jiwei Li
Q: Why are we so good at Speech and MT (but bad at NLU)?
A: People naturally translate and transcribe.

Q: Where can we find large, end-to-end datasets for NLU?
• Web-scale Conversations? → Generates fluent open-domain replies, but is it really Natural Language Understanding?
• Web-scale Structured Data?
Learning from Distant Supervision [Mintz et al. 2009]
1) Named Entity Recognition. Challenge: highly ambiguous labels [Ritter et al. EMNLP 2011]
2) Relation Extraction. Challenge: missing data [Ritter et al. TACL 2013]
3) Time Normalization. Challenge: diversity in noisy text [Tabassum, Ritter, Xu EMNLP 2016]
4) Event Extraction. Challenge: lack of negative examples [Ritter et al. WWW 2015] [Konovalov et al. WWW 2017]

Objective with label regularization:
$$O(\theta) = \underbrace{\sum_{i=1}^{N} \log p_\theta(y_i \mid x_i)}_{\text{Log likelihood}} \;-\; \underbrace{\lambda_U \, D\big(\tilde{p} \,\|\, \hat{p}^{\,\text{unlabeled}}_{\theta}\big)}_{\text{Label regularization}}$$
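A sketch of that objective as a training loss, under stated assumptions: distantly labeled examples contribute a standard negative log-likelihood, while unlabeled examples contribute a KL term D(p̃ ‖ p̂) pulling the model's average predicted label distribution p̂ toward a prior p̃ (e.g., the expected fraction of positive events). Tensor shapes and the exact form of p̃ are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def label_regularized_loss(logits_labeled, labels, logits_unlabeled,
                           p_tilde, lam_u=1.0):
    """Minimize -O(theta) = NLL + lambda_U * D(p_tilde || p_hat)."""
    # Log-likelihood term over distantly labeled examples.
    nll = F.cross_entropy(logits_labeled, labels, reduction="sum")
    # p_hat: the model's average label distribution on the unlabeled pool.
    p_hat = torch.softmax(logits_unlabeled, dim=-1).mean(dim=0)
    # Label regularization: KL(p_tilde || p_hat).
    kl = torch.sum(p_tilde * (torch.log(p_tilde) - torch.log(p_hat)))
    return nll + lam_u * kl
```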
Time Normalization [Tabassum, Ritter, Xu EMNLP 2016]
Distant Supervision (no human labels or rules!)
State-of-the-art time resolvers { TempEX, HeidelTime, SUTime, UWTime } map time expressions to calendar dates, e.g., 1 Jan 2016.
Distant Supervision Assumption
Mercury Transit: May 9, 2016
[Figure: a timeline spanning 8 May, 9 May, and 10 May; tweets mentioning the Mercury Transit on each of those days are aligned to the known event date.]
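The assumption can be turned directly into training data, as in the hedged sketch below: for a tweet known to discuss the event, any time expression that resolves (relative to the tweet's post date) to the event's known date is taken as a positive example. The token list and resolution rules are illustrative, not the paper's grammar.

```python
from datetime import date, timedelta

# Illustrative resolution rules for a few relative time expressions.
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}

def distant_labels(tweet_tokens, tweet_date, event_date):
    """Label time expressions that resolve to the known event date,
    with no human annotation (the distant supervision assumption)."""
    labels = []
    for tok in tweet_tokens:
        offset = RELATIVE_DAYS.get(tok.lower())
        if offset is not None and tweet_date + timedelta(days=offset) == event_date:
            labels.append((tok, event_date))  # positive training example
    return labels

# A tweet posted on 8 May 2016 mentioning the transit "tomorrow":
print(distant_labels(["Mercury", "transit", "tomorrow", "!"],
                     date(2016, 5, 8), date(2016, 5, 9)))
# -> [('tomorrow', datetime.date(2016, 5, 9))]
```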
Multiple Instance Learning Tagger
[Event Database entry: (Mercury, 5/9/2016)]
Words: w_1 w_2 w_3 … w_n
Word-Level Tags (latent): z_1 z_2 z_3 … z_n, scored by a local classifier exp(θ · f(w_i, z_i))
Sentence-Level Tags: t_1 t_2 t_3 t_4, with domains {1, …, 31} (day of month), {Mon, …, Sun} (day of week), {1, …, 12} (month), and {Past, Present, Future} (tense)
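A small sketch of the multiple-instance idea, under assumptions: each word carries a latent tag z_i scored by the slide's local classifier exp(θ · f(w_i, z_i)), and a sentence-level tag is asserted if at least one word supports it (the MIL "OR" aggregation). The feature function `feats` and the hard-max inference are simplifications of the paper's model.

```python
import math

def local_score(theta, features):
    """Local classifier from the slide: exp(theta . f(w_i, z_i)).
    theta maps feature names to weights; features is f(w_i, z_i)."""
    return math.exp(sum(theta.get(f, 0.0) for f in features))

def mil_tag_sentence(theta, words, tag_set, feats):
    """Assign each word its best latent tag, then aggregate:
    a sentence-level tag holds if some word expresses it."""
    word_tags = [max(tag_set, key=lambda z: local_score(theta, feats(w, z)))
                 for w in words]
    sentence_tags = set(word_tags) - {"O"}  # "O" = no time tag
    return word_tags, sentence_tags
```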