Pretraining Sentiment Classifiers with Unlabeled Dialog Data
Jul. 18, 2018
Toru Shimizu*1, Hayato Kobayashi*1,*2, Nobuyuki Shimizu*1
*1 Yahoo Japan Corporation, *2 RIKEN AIP
Background
• The amount of labeled training data
  – You need at least ~100k training records to surpass classical approaches (Hu+ 2014, Wu+ 2014)
  – [Table: sizes of large-scale labeled datasets for document classification]
Related work
• Semi-supervised approaches
  – Language model
  – [Diagram: an LSTM-RNN language model is pretrained on unlabeled text, then transferred to an LSTM-RNN sentiment classifier that outputs labels such as "positive"]
Related work
• Semi-supervised approaches
  – Sequence autoencoder (Dai and Le 2015)
  – [Diagram: an LSTM-RNN encoder-decoder is pretrained to reconstruct unlabeled text, then the encoder is transferred to an LSTM-RNN sentiment classifier]
Contributions
• Pretraining strategy with unlabeled dialog data
  – Pretrain an encoder-decoder model for sentiment classifiers
• Outperforms other semi-supervised methods
  – Language model
  – Sequence autoencoder
  – Distant supervision with emoji and emoticons
• Case study based on...
  – Costly labeled sentiment dataset of 99.5K items
  – Large-scale unlabeled dialog dataset of 22.3M utterance-response pairs
Key idea
• Emotional conversations in a dialog dataset
  – [Example: an emotional utterance-response exchange from the dialog data]
• Implicitly learn sentiment-handling capabilities through learning a dialog model
Proposed method: pretraining and fine-tuning
• Datasets
  – Large-scale dialog corpus: a large number of unlabeled utterance-response tweet pairs
  – Labeled dataset: a moderate number of tweets with sentiment labels
• Pretraining: an LSTM-RNN encoder-decoder is trained on the utterance-response pairs; its encoder is then transferred
• Fine-tuning: the transferred LSTM-RNN encoder plus an output layer is trained on the labeled tweets to predict labels such as "positive" (see the sketch below)
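To make the pretraining objectives concrete, here is a minimal sketch of the (source, target) pairs each method trains on: the language-model and sequence-autoencoder baselines operate on single tweets, while the proposed dialog pretraining pairs an utterance with its observed response. This is an illustrative Python sketch with hypothetical function names, not the authors' code.

```python
# Illustrative sketch of the (source, target) pairs each pretraining
# objective sees; function names are hypothetical.

def lm_pairs(tweets):
    # Language model (Lang): predict the next token within one tweet.
    for tokens in tweets:
        yield tokens[:-1], tokens[1:]

def seqae_pairs(tweets):
    # Sequence autoencoder (SeqAE, Dai and Le 2015):
    # encode a tweet, then decode (reconstruct) the same tweet.
    for tokens in tweets:
        yield tokens, tokens

def dial_pairs(dialog_pairs):
    # Proposed (Dial): encode an utterance tweet,
    # decode its observed response tweet.
    for utterance, response in dialog_pairs:
        yield utterance, response
```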
Datasets
• Dialog data
  – 22.3M pairs of an utterance tweet and its response tweet, extracted from Twitter Firehose data
  – Split: training 22,300,000 / validation 10,000 / test 50,000 / total 22,360,000
• Sentiment data
  – Positive: 15.0%, Negative: 18.6%, Neutral: 66.4%
  – Split: training 80,591 / validation 4,000 / test 15,000 / total 99,591
Network architecture: dialog model
• Dialog model
  – One-layer LSTM-RNN encoder-decoder
  – Embedding layer: 4,000 tokens, 256 elements
  – LSTM: 1,024 elements
  – Representation produced by the encoder: 1,024 elements
  – Decoder's readout layer: 256 elements
  – Decoder's output layer: 4,000 tokens
  – The encoder and decoder LSTMs share their parameters
  – [Diagram: utterance → encoder LSTM-RNN → representation → decoder LSTM-RNN → output distribution]
[Figure: dialog model architecture. Encoder RNN: token ID x_t → embedding layer φ_enc → recurrent layer h_t^enc. Decoder RNN: token ID u_t → embedding layer φ_dec → recurrent layer h_t^dec → readout layer ψ_dec → output layer o_t → token ID y_t]
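The following PyTorch sketch mirrors the hyperparameters listed above (4,000-token vocabulary, 256-dimensional embeddings, 1,024-unit LSTM, 256-unit readout, parameter sharing between the encoder and decoder LSTMs). The actual implementation was Theano-based, so the class names, the tanh readout activation, and other details here are assumptions for illustration only.

```python
import torch
import torch.nn as nn

VOCAB = 4000   # token vocabulary size
EMB   = 256    # embedding dimension
HID   = 1024   # LSTM state size (encoder representation)
READ  = 256    # decoder readout size

class DialogEncoder(nn.Module):
    """Encoder RNN: token IDs x_t -> embedding phi_enc -> LSTM h_t^enc."""
    def __init__(self, lstm=None):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, EMB)
        # The encoder and decoder share one LSTM's parameters,
        # so an existing LSTM module can be passed in here.
        self.lstm = lstm if lstm is not None else nn.LSTM(EMB, HID, batch_first=True)

    def forward(self, x):                     # x: (batch, src_len)
        _, (h, c) = self.lstm(self.embed(x))
        return h, c                           # 1,024-dim representation

class DialogModel(nn.Module):
    """Encoder-decoder dialog model predicting a response from an utterance."""
    def __init__(self):
        super().__init__()
        shared_lstm = nn.LSTM(EMB, HID, batch_first=True)
        self.encoder = DialogEncoder(shared_lstm)
        self.dec_embed = nn.Embedding(VOCAB, EMB)
        self.dec_lstm = shared_lstm           # parameter sharing with the encoder
        self.readout = nn.Linear(HID, READ)   # decoder readout psi_dec
        self.output = nn.Linear(READ, VOCAB)  # distribution over 4,000 tokens

    def forward(self, utterance, response_in):
        state = self.encoder(utterance)
        dec_out, _ = self.dec_lstm(self.dec_embed(response_in), state)
        return self.output(torch.tanh(self.readout(dec_out)))  # logits for y_t
```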
Network architecture: classification model
• Classification model
  – The architecture of the encoder RNN part is identical to that of the dialog model
  – A fully-connected layer κ and a softmax function produce a probability distribution over sentiment classes
  – [Diagram: encoder RNN → output layer κ]
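A matching sketch for the classifier: the same encoder followed by the fully-connected layer κ over the three sentiment classes (softmax is applied inside the loss). It reuses `DialogEncoder` and `HID` from the previous sketch and is, again, an illustration rather than the authors' code.

```python
import torch.nn as nn

class SentimentClassifier(nn.Module):
    """Encoder RNN (same architecture as the dialog model's encoder)
    followed by a fully-connected output layer kappa over sentiment classes."""
    def __init__(self, num_classes=3):       # positive / negative / neutral
        super().__init__()
        self.encoder = DialogEncoder()        # defined in the previous sketch
        self.kappa = nn.Linear(HID, num_classes)

    def forward(self, x):
        h, _ = self.encoder(x)                # final 1,024-dim representation
        return self.kappa(h.squeeze(0))       # class logits; softmax in the loss
```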
Experimental setup: pretraining
• Model pretraining with the dialog data
  – MLE training objective
  – 1 GPU (7 TFLOPS)
  – 5 epochs = 15.9 days
  – Batch size: 64
  – Optimizer: AdaDelta
  – Gradient clipping applied
  – Validation cost evaluated 10 times per epoch; the best model is kept
  – Theano-based implementation
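A minimal pretraining loop consistent with these settings (MLE objective, batch size 64, AdaDelta, gradient clipping). The clipping threshold, the padding token ID, and the data-loader interface are assumptions, since the slide does not specify them.

```python
import torch
import torch.nn as nn

model = DialogModel()                              # from the earlier sketch
optimizer = torch.optim.Adadelta(model.parameters())
criterion = nn.CrossEntropyLoss(ignore_index=0)    # assume 0 = padding token

def pretrain_epoch(loader):
    """One epoch over batches of (utterance, response_in, response_out), size 64."""
    model.train()
    for utterance, response_in, response_out in loader:
        optimizer.zero_grad()
        logits = model(utterance, response_in)     # (batch, len, VOCAB)
        loss = criterion(logits.reshape(-1, VOCAB), response_out.reshape(-1))
        loss.backward()
        # Gradient clipping; the threshold value here is an assumption.
        nn.utils.clip_grad_norm_(model.parameters(), 5.0)
        optimizer.step()
    # Validation cost would be checked 10 times per epoch and the best
    # checkpoint kept, as described on the slide.
```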
Experimental setup: classifier training
• Classifier training with the sentiment data
  – 5 different data sizes for each method: 5k, 10k, 20k, 40k, 80k (all)
  – 5 runs for each method/data size with varying random seeds
  – Results evaluated by the average F-measure
  – Training duration adjusted so that the cost reliably converges
    • Pretrained models converge very quickly; models trained from scratch converge slowly
  – The other settings are the same as in pretraining (see the fine-tuning sketch below)
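Fine-tuning could then look like the following: copy the pretrained encoder weights into the classifier and continue training on the labeled sentiment data with the same optimizer and clipping as in pretraining. Everything beyond what the slides state (loader interface, clipping value) is illustrative.

```python
classifier = SentimentClassifier()
# Transfer the pretrained encoder (embedding + LSTM) into the classifier.
classifier.encoder.load_state_dict(model.encoder.state_dict())

clf_optimizer = torch.optim.Adadelta(classifier.parameters())
clf_criterion = nn.CrossEntropyLoss()

def finetune_epoch(loader):
    """One epoch over labeled tweets (5k-80k examples, depending on the run)."""
    classifier.train()
    for tweet, label in loader:
        clf_optimizer.zero_grad()
        loss = clf_criterion(classifier(tweet), label)
        loss.backward()
        nn.utils.clip_grad_norm_(classifier.parameters(), 5.0)
        clf_optimizer.step()
```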
Compared methods: Dial (proposed)
• The proposed method: Dial
  – [Diagram: LSTM-RNN encoder-decoder pretrained on utterance-response pairs; the encoder is transferred to an LSTM-RNN sentiment classifier]
Compared methods: Default
• Default
  – No pretraining
  – Trained directly on the sentiment data
  – [Diagram: LSTM-RNN sentiment classifier trained from scratch]
Compared methods: Lang
• Lang
  – Pretrain an LSTM-RNN as a language model
  – [Diagram: LSTM-RNN language model pretrained on unlabeled tweets; transferred to an LSTM-RNN sentiment classifier]
Compared methods: SeqAE
• SeqAE
  – Pretrain an LSTM-RNN as a sequence autoencoder (Dai and Le 2015)
  – [Diagram: LSTM-RNN encoder-decoder pretrained to reconstruct unlabeled tweets; the encoder is transferred to an LSTM-RNN sentiment classifier]
Compared methods: emoji and emoticon-based distant supervision
• Emoji and emoticon-based distant supervision
  – Prepare large-scale datasets using emoticons or emoji as pseudo labels (Go+ 2009)
  – Positive examples include ❤, ◠‿◠, and o(^-^)o
  – Negative examples include (TДT) and orz
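For reference, this kind of distant supervision (Go+ 2009) amounts to simple pseudo-labeling: tweets containing a positive marker are labeled positive, tweets containing a negative marker are labeled negative, and the markers themselves are removed from the text. The marker lists below are abbreviated placeholders, not the full sets used in the experiments.

```python
# Abbreviated marker lists; the experiments use much larger emoji/emoticon sets.
POSITIVE = ["❤", "◠‿◠", "o(^-^)o"]
NEGATIVE = ["(TДT)", "orz"]

def pseudo_label(tweet):
    """Return ('positive'|'negative', cleaned_tweet), or None if no marker is found."""
    for label, markers in (("positive", POSITIVE), ("negative", NEGATIVE)):
        if any(m in tweet for m in markers):
            cleaned = tweet
            for m in markers:
                cleaned = cleaned.replace(m, "")   # strip the pseudo-label markers
            return label, cleaned.strip()
    return None
```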