Understanding Hidden Memories of Recurrent Neural Networks


  1. Understanding Hidden Memories of Recurrent Neural Networks Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen Li, Yuanzhe Chen, Yangqiu Song, Huamin Qu. THE HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY

  2. What is a Recurrent Neural Network?

  3. Introduction What is a Recurrent Neural Network (RNN)? A deep learning model used for: Machine Translation, Speech Recognition, Language Modeling, … (Diagram: a vanilla RNN cell with input x(t), hidden state h(t), and tanh activation.)

  4. Introduction What is a Recurrent Neural Network (RNN)? A vanilla RNN takes an input $x^{(t)}$ and updates its hidden state using: $h^{(t)} = \tanh(W h^{(t-1)} + V x^{(t)})$, where $W$ and $V$ are learned weight matrices. (Diagrams: a vanilla RNN cell, and a 2-layer RNN unrolled over a 4-word input sequence with its hidden states and outputs.)
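As a minimal illustration of this update rule (a sketch, not the authors' code; the toy sizes and random weights are assumptions for illustration only):

```python
# One step of the vanilla RNN update h(t) = tanh(W h(t-1) + V x(t)).
import numpy as np

def rnn_step(h_prev, x, W, V):
    """Apply a single vanilla-RNN transition to the hidden state."""
    return np.tanh(W @ h_prev + V @ x)

rng = np.random.default_rng(0)
n, m = 8, 5                             # hidden size, input size (toy values)
W = rng.normal(scale=0.1, size=(n, n))  # recurrent weights
V = rng.normal(scale=0.1, size=(n, m))  # input weights

h = np.zeros(n)                         # h(0) = 0
for x in rng.normal(size=(4, m)):       # a length-4 input sequence
    h = rnn_step(h, x, W, V)
```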

  5. What has the RNN learned from data? (Diagram: the RNN as a black box between its input and output.)

  6. Motivation What has the RNN learned from data? A. Map the value of a single hidden unit over the data (Karpathy A. et al., 2015). Example: a unit sensitive to position in a line. However, many more units have no clear meaning.

  7. Motivation What has the RNN learned from data? B. Matrix plots (Li J. et al., 2016): each column represents the value of the hidden state vector when the model reads an input word. Problem: scalability! Machine Translation: 4-layer, 1000 units/layer (Sutskever I. et al., 2014). Language Modeling: 2-layer, 1500 units/layer (Zaremba et al., 2015).

  8. Our Solution - RNNVis

  9. Our Solution Explaining individual hidden units Bi-graph and co-clustering Sequence evaluation

  10. Solution Explaining an individual hidden unit using its most salient words. How to define salient? The model's response to a word $x$ at step $t$ is the update of the hidden state, $\Delta h_j^{(t)}$, $j = 1, \dots, n$. A larger $|\Delta h_j^{(t)}|$ implies that the word $x$ is more salient to unit $j$. Since $\Delta h_j^{(t)}$ can vary given the same word $x$, we use the expectation $E[\Delta h_j^{(t)} \mid x^{(t)} = x]$, which can be estimated by running the model on the dataset and taking the mean.
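A sketch of how this estimate could be computed; `run_rnn` is a hypothetical helper (not part of RNNVis) that returns the hidden states h(0), …, h(T) for one tokenized sentence:

```python
from collections import defaultdict
import numpy as np

def expected_response(sentences, run_rnn, n_units):
    """Estimate E[Δh(t) | x(t) = w] by averaging the hidden-state updates
    observed at every occurrence of each word w in the dataset."""
    sums = defaultdict(lambda: np.zeros(n_units))
    counts = defaultdict(int)
    for words in sentences:
        hs = run_rnn(words)               # hs[t] is h(t); hs[0] is h(0)
        for t, w in enumerate(words):
            sums[w] += hs[t + 1] - hs[t]  # Δh: the response to word w
            counts[w] += 1
    return {w: sums[w] / counts[w] for w in sums}
```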

  11. Solution Explaining an individual hidden unit using its most salient words. (Figure: top 4 positive/negative salient words of unit #36 in an RNN (GRU) trained on Yelp review data; boxes show the 25%-75% and 9%-91% ranges of the response.)

  12. Solution Explaining an individual hidden unit using its most salient words. (Figure: distribution of the model's response given the word "he" for an LSTM with 600 units; units are reordered according to the mean, highlighting the highly responsive hidden units; boxes show the mean, 25%-75%, and 9%-91% ranges.)

  13. Solution Explaining an individual hidden unit using its most salient words. Investigating one unit/word at a time… Problem: too much user burden! Solution: an overview for easier exploration.

  14. Solution Explaining individual hidden units Bi-graph and co-clustering Sequence evaluation

  15. Solution Bi-graph Formulation (Figure: a bipartite graph linking hidden units to words such as "he", "she", "by", "can", "may".)

  16. Solution Bi-graph Formulation, continued. (Figure: the same bipartite graph with weighted edges between hidden units and words; see the sketch below.)
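One plausible way to materialize the bi-graph, under the assumption that the edge weight between word w and unit j is the expected response estimated in the previous step:

```python
import numpy as np

def response_matrix(responses):
    """Stack {word: E[Δh | word]} into a |words| x |units| edge-weight
    matrix A; A[w, j] is the weight of the edge (word w, hidden unit j)."""
    words = sorted(responses)
    A = np.stack([responses[w] for w in words])
    return words, A
```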

  17. Solution Co-clustering Hidden Units and Words. Algorithm: spectral co-clustering (Dhillon I. S., 2001).
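The deck cites Dhillon's spectral co-clustering, and scikit-learn ships an implementation of that algorithm. A sketch (the cluster count is an assumed hyperparameter, and since the algorithm expects a non-negative matrix, absolute edge weights are used here):

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering

def cocluster(A, n_clusters=5, seed=0):
    """Jointly cluster words (rows) and hidden units (columns) of the bi-graph."""
    model = SpectralCoclustering(n_clusters=n_clusters, random_state=seed)
    model.fit(np.abs(A) + 1e-12)  # small epsilon avoids all-zero rows/columns
    return model.row_labels_, model.column_labels_
```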

  18. Solution Co-clustering – Edge Aggregation. Color: sign of the average edge weight. Width: magnitude of the average edge weight.
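A sketch of this edge aggregation: the signed mean over each (word cluster, unit cluster) block of the bi-graph drives the edge color, and its magnitude the edge width.

```python
import numpy as np

def aggregate_edges(A, word_labels, unit_labels):
    """Average edge weight between every word-cluster / unit-cluster pair."""
    n_wc, n_uc = word_labels.max() + 1, unit_labels.max() + 1
    agg = np.zeros((n_wc, n_uc))
    for wc in range(n_wc):
        for uc in range(n_uc):
            agg[wc, uc] = A[word_labels == wc][:, unit_labels == uc].mean()
    return agg  # sign -> edge color, |value| -> edge width
```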

  19. Solution Co-clustering - Visualization (Figure: the co-clustered bi-graph of hidden units and words.)

  20. Solution Co-clustering - Visualization. Color: each unit's salience to the selected word. Hidden unit clusters are shown as memory chips; word clusters are shown as word clouds.

  21. Solution Explaining individual hidden units Bi-graph and co-clustering Sequence evaluation

  22. Solution Glyph design for evaluating sentences. Each glyph summarizes the dynamics of hidden unit clusters when reading a word; each bar represents the average scale of the values in a hidden unit cluster. (Glyph legend: current value; increased value / update towards positive; decreased value / update towards negative; the ratio of preserved value, showing whether more positive or more negative value is preserved.) A sketch of these per-word statistics follows below.
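A sketch of how the glyph statistics for one word could be computed, under the assumption that each bar shows the average magnitude of the hidden values in one unit cluster and that the update is split into its positive and negative parts:

```python
import numpy as np

def glyph_stats(h_prev, h_curr, unit_labels):
    """Per-cluster summaries of the hidden state before/after reading a word."""
    stats = []
    for uc in range(unit_labels.max() + 1):
        cur = h_curr[unit_labels == uc]
        delta = cur - h_prev[unit_labels == uc]
        stats.append({
            "current": np.abs(cur).mean(),               # bar height
            "towards_positive": delta[delta > 0].sum(),  # update towards positive
            "towards_negative": delta[delta < 0].sum(),  # update towards negative
        })
    return stats
```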

  23. Case Studies How do RNNs handle sentiments? The language of Shakespeare

  24. Case Study – Sentiment Analysis Each unit has two sides. Single-layer GRU with 50 hidden units (cells), trained on Yelp review data.

  25. Case Study – Sentiment Analysis RNNs can learn to handle context. Single-layer GRU with 50 hidden units (cells), trained on Yelp review data. Sentence A: "I love the food, though the staff is not helpful." Sentence B: "The staff is not helpful, though I love the food." (Figure: glyph sequences for sentences A and B, annotated with updates towards positive and towards negative.)

  26. Case Study – Sentiment Analysis Clues for the problem. Single-layer GRU with 50 hidden units (cells), trained on Yelp review data. Problem: the data is not evenly sampled.

  27. Case Study – Sentiment Analysis Visual indicator of the performance. Single-layer GRUs with 50 hidden units (cells), trained on Yelp review data. Balanced dataset: accuracy (test) 91.9%. Unbalanced dataset: accuracy (test) 88.6%.

  28. Case Studies How do RNNs handle sentiments? The language of Shakespeare

  29. Case Study – Language Modeling The language of Shakespeare – A mixture of the old and the new

  30. Case Study – Language Modeling The language of Shakespeare – A mixture of the old and the new

  31. Discussion & Future Work • Clustering. The quality of co-clustering? Interactive clustering? • Glyph-based sentence visualization. Scalability? • Text data. How about speech data? • RNN models. More advanced RNN-based models like attention models?

  32. Thank you! Contact: Yao Ming, ymingaa@connect.ust.hk Page: www.myaooo.com/rnnvis Code: www.github.com/myaooo/rnnvis

  33. Technical Details Explaining individual hidden units - Decomposition. The output of an RNN at step $t$ is typically a probability distribution: $q_i = \mathrm{softmax}(V h^{(t)})_i = \exp(v_i^\top h^{(t)}) / \sum_k \exp(v_k^\top h^{(t)})$, $i = 1, 2, \dots, n$, where $V = [v_1, \dots, v_n]^\top$ is the output projection matrix. Assuming $h^{(0)} = 0$, the numerator of $q_i$ can be decomposed as: $\exp(v_i^\top h^{(t)}) = \exp\big(v_i^\top \sum_{\tau=1}^{t} (h^{(\tau)} - h^{(\tau-1)})\big) = \prod_{\tau=1}^{t} \exp(v_i^\top \Delta h^{(\tau)})$. Here $\exp(v_i^\top \Delta h^{(t)})$ is the multiplicative contribution of the input word $x^{(t)}$, and the update of the hidden state $\Delta h^{(t)}$ can be regarded as the model's response to $x^{(t)}$.
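A quick numerical check of this decomposition on random toy vectors (assuming h(0) = 0): the softmax numerator exp(v_i^T h(t)) equals the product of the per-step contributions exp(v_i^T Δh(τ)).

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 6, 8
hs = np.vstack([np.zeros(n), rng.normal(size=(T, n))])  # h(0), ..., h(T)
v_i = rng.normal(size=n)                                # one row of V

direct = np.exp(v_i @ hs[-1])                           # exp(v_i^T h(T))
per_step = [np.exp(v_i @ (hs[t] - hs[t - 1])) for t in range(1, T + 1)]
assert np.isclose(direct, np.prod(per_step))            # product of contributions
```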

  34. Evaluation Expert Interview: 1. Show a tutorial video; 2. Explore the tool; 3. Compare two models; 4. Answer questions; 5. Finish a survey.

  35. Challenges What are the challenges? 1. The complexity of the model • Machine Translation: 4-layer LSTMs, 1000 units/layer (Sutskever I. et al., 2014) • Language Modeling: 2-layer LSTMs, 650 or 1500 units/layer (Zaremba et al., 2015) 2. The complexity of the hidden memory • Semantic information is distributed across the hidden states of an RNN. 3. The complexity of the data • Patterns in sequential data like texts are difficult to analyze and interpret.

  36. Other Findings Comparing LSTMs and vanilla RNNs. Left (A-C): co-cluster visualization of the last layer of an RNN. Right (D-F): visualization of the cell states of the last layer of an LSTM. Bottom (G-H): the two models' responses to the same word "offer".

  37. Contribution • A visual technique for understanding what RNNs have learned. • A visual analytics (VA) tool that reveals the hidden dynamics of a trained RNN. • Interesting findings with RNN models.
