Understanding Hidden Memories of Recurrent Neural Networks. Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, Huamin Qu. THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY
What is a Recurrent Neural Network?
Introduction: What is a Recurrent Neural Network (RNN)? A deep learning model used for Machine Translation, Speech Recognition, Language Modeling, … [Figure: a vanilla RNN cell with input x(t), hidden state h(t) (tanh activation), and output y(t).]
Introduction: What is a Recurrent Neural Network (RNN)? A vanilla RNN takes an input x^(t) and updates its hidden state h^(t-1) using: h^(t) = tanh(V h^(t-1) + W x^(t)). [Figure: a vanilla RNN cell, and a 2-layer RNN unrolled over an input sequence x^(1)…x^(4) with two stacked hidden states h_1, h_2 and outputs y^(1)…y^(4).]
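As a concrete reference, here is a minimal NumPy sketch of this update rule, assuming toy dimensions and randomly initialized weights; the names rnn_step, run_rnn, W, and V are illustrative and not taken from the RNNVis code or the paper's trained models.

```python
import numpy as np

def rnn_step(x_t, h_prev, W, V):
    """One step of a vanilla RNN cell: h^(t) = tanh(V h^(t-1) + W x^(t))."""
    return np.tanh(V @ h_prev + W @ x_t)

def run_rnn(inputs, W, V, h0=None):
    """Unroll the cell over a sequence and return all hidden states."""
    h = np.zeros(V.shape[0]) if h0 is None else h0
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W, V)
        states.append(h)
    return np.stack(states)

# Toy dimensions: 8-dimensional inputs, 16 hidden units, a sequence of 5 steps.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(16, 8))
V = rng.normal(scale=0.1, size=(16, 16))
xs = rng.normal(size=(5, 8))
hs = run_rnn(xs, W, V)   # shape (5, 16): one hidden state per step
```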
What has the RNN learned from data? [Figure: the RNN as a black box between input and output.]
Motivation: What has the RNN learned from data? A. Map the value of a single hidden unit onto the data (Karpathy A. et al., 2015). Example: a unit sensitive to position in a line. Many more units have no clear meaning.
Motivation: What has the RNN learned from data? B. Matrix plots (Li J. et al., 2016). Each column represents the value of the hidden state vector when the model reads an input word. Scalability is the issue: Machine Translation uses 4 layers with 1000 units/layer (Sutskever I. et al., 2014); Language Modeling uses 2 layers with 1500 units/layer (Zaremba et al., 2015).
Our Solution - RNNVis
Our Solution: Explaining individual hidden units; Bi-graph and co-clustering; Sequence evaluation
Solution: Explaining an individual hidden unit using its most salient words. How do we define salient? The model's response to a word x at step t is the update of the hidden state, Δh^(t) = (Δh_1^(t), …, Δh_n^(t)). A larger |Δh_i^(t)| implies that the word x is more salient to unit i. Since Δh_i^(t) can vary given the same word x, we use the expectation E[Δh_i | x^(t) = x], which can be estimated by running the model on the dataset and taking the mean.
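A hedged sketch of this estimation, assuming a vanilla RNN cell and a word-embedding lookup (the corpus, embed, and weight names are made up for illustration): run the model over the corpus, record Δh^(t) at every word occurrence, and average per word.

```python
from collections import defaultdict
import numpy as np

def expected_response(corpus, embed, W, V):
    """Estimate E[Δh | x = w]: the average hidden-state update per word."""
    n_hidden = V.shape[0]
    sums = defaultdict(lambda: np.zeros(n_hidden))
    counts = defaultdict(int)
    for sentence in corpus:
        h = np.zeros(n_hidden)
        for word in sentence:
            h_new = np.tanh(V @ h + W @ embed[word])   # one vanilla RNN step
            sums[word] += h_new - h                    # accumulate the response Δh^(t)
            counts[word] += 1
            h = h_new
    return {w: sums[w] / counts[w] for w in sums}      # word -> estimated E[Δh | x = w]

# Toy usage with random weights and a three-word vocabulary.
rng = np.random.default_rng(0)
W, V = rng.normal(scale=0.1, size=(16, 8)), rng.normal(scale=0.1, size=(16, 16))
embed = {w: rng.normal(size=8) for w in ("he", "she", "by")}
corpus = [["he", "by", "she"], ["she", "he"]]
resp = expected_response(corpus, embed, W, V)          # word -> 16-dim mean response
```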
Solution: Explaining an individual hidden unit using its most salient words. [Figure: top 4 positive/negative salient words of unit #36 in an RNN (GRU) trained on Yelp review data; bands show the 25%-75% and 9%-91% ranges of the response.]
Solution: Explaining an individual hidden unit using its most salient words. [Figure: distribution of the model's response given the word "he" (mean, 25%-75% and 9%-91% ranges), with units reordered according to the mean; highly responsive hidden units stand out. An LSTM with 600 units.]
Solution: Explaining an individual hidden unit using its most salient words. Investigating one unit/word at a time… Problem: too much user burden! Solution: an overview for easier exploration.
Solution: Explaining individual hidden units; Bi-graph and co-clustering; Sequence evaluation
Solution: Bi-graph Formulation. [Figure: bipartite graph linking hidden units to words such as "he", "she", "by", "can", "may".]
Solution: Co-clustering. Algorithm: spectral co-clustering (Dhillon I. S., 2001). [Figure: the hidden units and the words of the bipartite graph are co-clustered.]
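A minimal sketch of this step, assuming scikit-learn's SpectralCoclustering implementation of Dhillon's (2001) algorithm and a stand-in word-by-unit response matrix R; using the absolute expected responses as non-negative edge weights is an assumption for illustration, not necessarily the paper's exact weighting.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

# R[i, j]: expected response of hidden unit j to word i (random stand-in data here).
rng = np.random.default_rng(0)
R = rng.normal(size=(200, 64))                 # 200 words x 64 hidden units

model = SpectralCoclustering(n_clusters=5, random_state=0)
model.fit(np.abs(R) + 1e-12)                   # edge weights must be non-negative

word_clusters = model.row_labels_              # cluster id for each word
unit_clusters = model.column_labels_           # cluster id for each hidden unit
```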
Solution: Co-clustering – Edge Aggregation. Edges between a word cluster and a hidden-unit cluster are aggregated: color encodes the sign of the average edge weight, width encodes its magnitude. [Figure: aggregated bipartite graph between hidden-unit clusters and word clusters.]
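A short sketch of how such an aggregation could be computed from the signed response matrix and the co-cluster labels; the function name and the random stand-in data are illustrative assumptions.

```python
import numpy as np

def aggregate_edges(R, word_clusters, unit_clusters, n_clusters):
    """Mean signed edge weight between each word cluster and each unit cluster."""
    agg = np.zeros((n_clusters, n_clusters))
    for wc in range(n_clusters):
        for uc in range(n_clusters):
            block = R[word_clusters == wc][:, unit_clusters == uc]
            agg[wc, uc] = block.mean() if block.size else 0.0
    return agg   # sign -> edge color, magnitude -> edge width

# Toy usage with random responses and random cluster assignments.
rng = np.random.default_rng(0)
R = rng.normal(size=(200, 64))
word_clusters = rng.integers(0, 5, size=200)
unit_clusters = rng.integers(0, 5, size=64)
edges = aggregate_edges(R, word_clusters, unit_clusters, n_clusters=5)
```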
Solution: Co-clustering – Visualization. [Figure: layout of hidden-unit clusters and word clusters linked by aggregated edges.]
Solution: Co-clustering – Visualization. Color: each unit's salience to the selected word. Hidden-unit clusters are rendered as memory chips; word clusters are rendered as word clouds.
Solution: Explaining individual hidden units; Bi-graph and co-clustering; Sequence evaluation
Solution: Glyph design for evaluating sentences. Each glyph summarizes the dynamics of hidden-unit clusters when reading a word; each bar represents the average magnitude of the values in a hidden-unit cluster. [Glyph legend: current value; increased value; decreased value; the ratio of preserved value; update towards positive; update towards negative; more positive value preserved; more negative value preserved.]
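The exact definitions behind each glyph element are in the paper; the sketch below shows one plausible way to compute per-cluster summary statistics for a single reading step, and the specific formulas (e.g., the preserved ratio) are assumptions for illustration rather than the paper's formulation.

```python
import numpy as np

def glyph_stats(h_prev, h_curr, unit_clusters, n_clusters):
    """Per-cluster summaries of one reading step (illustrative definitions)."""
    stats = []
    for c in range(n_clusters):
        prev, curr = h_prev[unit_clusters == c], h_curr[unit_clusters == c]
        delta = curr - prev
        same_sign = np.sign(prev) == np.sign(curr)
        preserved = np.minimum(np.abs(prev), np.abs(curr)) * same_sign
        stats.append({
            "current": np.abs(curr).mean(),                          # bar length
            "update_pos": delta[delta > 0].sum() / max(curr.size, 1),
            "update_neg": delta[delta < 0].sum() / max(curr.size, 1),
            "preserved_ratio": preserved.sum() / (np.abs(prev).sum() + 1e-12),
        })
    return stats

# Toy usage: 64 hidden units in 5 clusters, before/after reading one word.
rng = np.random.default_rng(0)
h_prev, h_curr = rng.normal(size=64), rng.normal(size=64)
unit_clusters = rng.integers(0, 5, size=64)
stats = glyph_stats(h_prev, h_curr, unit_clusters, n_clusters=5)
```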
Case Studies: How do RNNs handle sentiment? The language of Shakespeare.
Case Study – Sentiment Analysis: Each unit has two sides. Single-layer GRU with 50 hidden units (cells), trained on Yelp review data.
Case Study – Sentiment Analysis: RNNs can learn to handle context. Single-layer GRU with 50 hidden units (cells), trained on Yelp review data. Sentence A: "I love the food, though the staff is not helpful." Sentence B: "The staff is not helpful, though I love the food." [Figure: glyph sequences for sentences A and B, showing updates towards positive and negative sentiment.]
Case Study – Sentiment Analysis: Clues for the problem. Single-layer GRU with 50 hidden units (cells), trained on Yelp review data. Problem: the data is not evenly sampled.
Case Study – Sentiment Analysis: A visual indicator of performance. Single-layer GRUs with 50 hidden units (cells), trained on Yelp review data. Balanced dataset: test accuracy 91.9%; unbalanced dataset: test accuracy 88.6%.
Case Studies: How do RNNs handle sentiment? The language of Shakespeare.
Case Study – Language Modeling: The language of Shakespeare – a mixture of the old and the new.
Discussion & Future Work • Clustering: the quality of co-clustering? Interactive clustering? • Glyph-based sentence visualization: scalability? • Text data: how about speech data? • RNN models: more advanced RNN-based models, such as attention models?
Thank you! Contact: Yao Ming, ymingaa@connect.ust.hk Page: www.myaooo.com/rnnvis Code: www.github.com/myaooo/rnnvis
Technical Details: Explaining individual hidden units – Decomposition. The output of an RNN at step t is typically a probability distribution: p(y_t = j) = softmax(U h^(t))_j = exp(u_jᵀ h^(t)) / Σ_k exp(u_kᵀ h^(t)), j = 1, 2, …, N, where U = [u_1, …, u_N]ᵀ is the output projection matrix. The numerator of p(y_t = j) can be decomposed as: exp(u_jᵀ h^(t)) = exp(u_jᵀ Σ_{τ=1}^{t} (h^(τ) − h^(τ−1))) = Π_{τ=1}^{t} exp(u_jᵀ Δh^(τ)). Here exp(u_jᵀ Δh^(τ)) is the multiplicative contribution of the input word x^(τ), and the update of the hidden state Δh^(τ) can be regarded as the model's response to x^(τ).
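A self-contained numerical check of this decomposition, assuming a vanilla RNN cell, random toy weights, and h^(0) = 0 (so that h^(t) is exactly the sum of its updates):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, T = 8, 16, 5
W = rng.normal(scale=0.1, size=(n_hidden, n_in))
V = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
xs = rng.normal(size=(T, n_in))

# Run a vanilla RNN from h^(0) = 0 and record the per-step updates Δh^(τ).
h, deltas = np.zeros(n_hidden), []
for x_t in xs:
    h_new = np.tanh(V @ h + W @ x_t)
    deltas.append(h_new - h)
    h = h_new
deltas = np.stack(deltas)

# exp(u_j · h^(T)) equals the product of per-step contributions exp(u_j · Δh^(τ)).
u_j = rng.normal(size=n_hidden)        # one row of the output projection matrix U
lhs = np.exp(u_j @ h)
rhs = np.exp(deltas @ u_j).prod()
assert np.isclose(lhs, rhs)
```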
Evaluation: Expert Interview. Procedure: (1) show a tutorial video, (2) explore the tool, (3) compare two models, (4) answer questions, (5) finish a survey.
Challenges: What are the challenges? 1. The complexity of the model • Machine Translation: 4-layer LSTMs, 1000 units/layer (Sutskever I. et al., 2014) • Language Modeling: 2-layer LSTMs, 650 or 1500 units/layer (Zaremba et al., 2015). 2. The complexity of the hidden memory • Semantic information is distributed across the hidden states of an RNN. 3. The complexity of the data • Patterns in sequential data such as text are difficult to analyze and interpret.
Other Findings: Comparing LSTMs and vanilla RNNs. Left (A-C): co-cluster visualization of the last layer of an RNN. Right (D-F): visualization of the cell states of the last layer of an LSTM. Bottom (G-H): the two models' responses to the same word "offer".
Contribution • A visual technique for understanding what RNNs have learned. • A visual analytics (VA) tool that reveals the hidden dynamics of a trained RNN. • Interesting findings with RNN models.