Investigating Relational Recurrent Neural Networks with Variable Length Memory Pointer


  1. Investigating Relational Recurrent Neural Networks with Variable Length Memory Pointer
  Mahtab Ahmed and Robert E. Mercer
  Department of Computer Science, University of Western Ontario, London, ON, Canada

  2. Introduction
  • Memory-based neural networks can remember information longer while modelling temporal data.
  • Encode a Relational Memory Core (RMC) as the cell state inside an LSTM cell.
  • Use standard multi-head self-attention.
  • Use a variable length memory pointer.
  • Evaluate on four different tasks: state of the art on one of them, on par with the best on the other three.

  3. Standard LSTM
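  For reference, the standard LSTM cell update in the usual notation (generic symbols; the slide's own figure may label them differently):

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```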

  4. The model: Fixed Length Memory Pointer
  • Take the memory and a random input at time t, and apply multi-head self-attention to create a weighted version, N.
  • Add a residual connection.
  • Apply a Layer-Normalization block on top of N.
  • Maintain separate versions of the mean and variance projection matrices (a sketch of this step follows).
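  A minimal sketch of this attention step, assuming PyTorch. The shapes (batch 32, 8 memory slots, d_model = 128), the head count, and the query/key choice (the memory attends over itself concatenated with the input, as in the original RMC) are illustrative assumptions, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

d_model, heads = 128, 4
attn = nn.MultiheadAttention(d_model, heads, batch_first=True)
norm = nn.LayerNorm(d_model)            # this block keeps its own mean/variance parameters

memory = torch.randn(32, 8, d_model)    # (batch, memory slots, features)
x_t = torch.randn(32, 1, d_model)       # random input at time t

# Query with the memory; key/value over [memory; input], so the memory
# attends over itself and the new input (RMC-style update).
kv = torch.cat([memory, x_t], dim=1)
attended, _ = attn(memory, kv, kv)

# Residual connection, then Layer-Normalization on top -> N on the slide.
n = norm(memory + attended)
```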

  5. The model: Fixed Length Memory Pointer (contd.)
  • n non-linear projections of h_t are applied, each followed by a residual connection, with f = ReLU and h_t = N.
  • The resultant tensor Y (of shape 2 × b × d) is split on the cardinal dimension to extract the memory.
  • The LSTM's candidate cell state is changed to this extracted memory.
  • y_t is replaced with the projected input (= X y_t) in all of the LSTM equations (see the sketch below).
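  A minimal sketch of the projection-and-split step, again assuming PyTorch. Using n = 2 projections, reading the memory off the first slice of Y, and the variable names (projs, gate_stream) are assumptions inferred from the 2 × b × d shape given on the slide:

```python
import torch
import torch.nn as nn

b, d = 32, 128
h_t = torch.randn(b, d)        # h_t = N, the output of the attention block

# n = 2 non-linear projections with f = ReLU, each with a residual connection.
projs = nn.ModuleList([nn.Sequential(nn.Linear(d, d), nn.ReLU())
                       for _ in range(2)])
y = torch.stack([p(h_t) + h_t for p in projs])   # Y, shape (2, b, d)

# Split Y on the cardinal (first) dimension to extract the memory;
# this memory replaces the LSTM's candidate cell state (hedged reading).
memory, gate_stream = y[0], y[1]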

  6. Variable Length Memory Pointer
  • Share W across all time steps.
  • Apply all the steps as before.
  • For Layer-Normalization, maintain just one version of the mean and variance projection matrices.
  • The memory is still at the cardinal dimension.
  • Rather than looking at everything that came before, track a fixed window of words (n-grams), mimicking the behavior of a convolution kernel (see the sketch below).
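  A minimal sketch of the windowed pointer, assuming PyTorch; the function name window_memory and all shapes are illustrative:

```python
import torch

def window_memory(hidden_states, t, window):
    # Keep only the last `window` hidden states before step t, rather than
    # everything seen so far -- a convolution-kernel-like view over n-grams.
    start = max(0, t - window)
    return hidden_states[:, start:t, :]

h = torch.randn(32, 20, 128)              # (batch, time, features)
mem = window_memory(h, t=10, window=4)    # (32, 4, 128): steps 6..9
```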

  7. Model Architecture
  Architecture diagram, unrolled over three time steps; per step, bottom to top: Linear Projection → Multi-Head Attention → Layer-Normalization → Non-Linear Projection → Layer-Normalization → LSTM equations (assembled in the sketch below).
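  Assembling the two sketches above, a hedged sketch of one per-time-step stack from the diagram. The class name RelationalBlock and the head count are invented, and the real model additionally modifies the LSTM equations as described on slide 5:

```python
import torch
import torch.nn as nn

class RelationalBlock(nn.Module):
    """One per-time-step stack, bottom to top: Linear Projection ->
    Multi-Head Attention -> Layer-Normalization -> Non-Linear Projection ->
    Layer-Normalization. Its output feeds the (modified) LSTM equations."""
    def __init__(self, d, heads=4):
        super().__init__()
        self.in_proj = nn.Linear(d, d)                       # X y_t, the projected input
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d)
        self.ffn = nn.Sequential(nn.Linear(d, d), nn.ReLU()) # non-linear projection
        self.norm2 = nn.LayerNorm(d)

    def forward(self, memory, y_t):
        x = self.in_proj(y_t).unsqueeze(1)      # (b, 1, d)
        kv = torch.cat([memory, x], dim=1)
        attended, _ = self.attn(memory, kv, kv)
        n = self.norm1(memory + attended)       # residual + Layer-Norm
        return self.norm2(n + self.ffn(n))      # non-linear proj + residual

block = RelationalBlock(d=128)
enhanced = block(torch.randn(32, 8, 128), torch.randn(32, 128))  # (32, 8, 128)
```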

  8. Sentence Pair Modelling
  Diagram: the left and right sentences are each mapped to word representations and passed through an encoder to produce a sentence representation; the two representations are combined (⊕) and fed to a classifier over the classes (a sketch of the combination follows).
  InferSent: https://arxiv.org/abs/1705.02364
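  A minimal sketch of the ⊕ combination, assuming the InferSent recipe the slide cites (concatenate u, v, |u − v|, and u ⊙ v); whether the paper uses exactly this feature set is an assumption:

```python
import torch
import torch.nn as nn

d, n_classes = 128, 3
u = torch.randn(32, d)   # left-sentence representation from the encoder
v = torch.randn(32, d)   # right-sentence representation from the encoder

# InferSent-style combination at the circled plus, then a classifier.
features = torch.cat([u, v, (u - v).abs(), u * v], dim=1)   # (32, 4d)
classifier = nn.Linear(4 * d, n_classes)
logits = classifier(features)
```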

  9. Hyperparameters
  We tried a range of values for each hyperparameter; the ones that worked for us are bold-faced in the hyperparameter table.

  10. Experimental Results
  Results table; models marked with † are the ones that we implemented.

  11. Attention Visualization
  Heatmaps of attention weights as the memory pointer advances word by word through two example sentences: "He also worked in the Virginia attorney general's office." and "Before that he held various posts in Virginia, including deputy attorney general."

  12. Conclusion
  • Extend the classical RMC with a variable length memory pointer.
  • Use a non-local context to compute an enhanced memory.
  • Design a sentence pair modelling architecture.
  • Evaluate on four different tasks: on-par performance on most of them and the best performance on one.
  • The attention maps make the shifting of attention easy to interpret.
  • The memory pointer length does not follow a uniform pattern across all datasets.

  13. Thank you
