How recurrent networks implement contextual processing in sentiment analysis

Niru Maheswaranathan and David Sussillo, Google Research. ICML 2020. @niru_m


  1. How recurrent networks implement contextual processing in sentiment analysis Niru Maheswaranathan and David Sussillo Google Research ICML 2020 @niru_m

  2. Sentiment classification using RNNs

  3. Sentiment classification using RNNs “That restaurant is amazing! I love it!” ➞ positive “I cannot stand that place. Terrible food.” ➞ negative

  4. Sentiment classification using RNNs “That restaurant is amazing! I love it!” ➞ positive “I cannot stand that place. Terrible food.” ➞ negative RNNs solve the task, but it’s hard to understand how they do it
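
As a rough illustration of the setup on the slides above, the sketch below runs a vanilla tanh RNN over a tokenized sentence and reads out a scalar logit after each token; the sign of the final logit gives the predicted class. The vocabulary, layer sizes, and random untrained weights are all assumptions made for this example and are not taken from the talk, so the prediction itself carries no meaning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; a real model would use a learned tokenizer.
vocab = {"that": 0, "restaurant": 1, "is": 2, "amazing": 3, "terrible": 4}
n_hidden, n_embed = 8, 4

# Randomly initialized (untrained) parameters, for illustration only.
E = rng.normal(size=(len(vocab), n_embed))                     # token embeddings
W = rng.normal(size=(n_hidden, n_hidden)) / np.sqrt(n_hidden)  # recurrent weights
U = rng.normal(size=(n_hidden, n_embed)) / np.sqrt(n_embed)    # input weights
b = np.zeros(n_hidden)                                         # bias
w_out = rng.normal(size=n_hidden)                              # linear readout


def rnn_step(h, x):
    """One vanilla RNN update: h_t = tanh(W h_{t-1} + U x_t + b)."""
    return np.tanh(W @ h + U @ x + b)


def logit_trajectory(tokens):
    """Return the readout logit after each token of the sequence."""
    h = np.zeros(n_hidden)
    logits = []
    for tok in tokens:
        h = rnn_step(h, E[vocab[tok]])
        logits.append(float(w_out @ h))
    return np.array(logits)


logits = logit_trajectory(["that", "restaurant", "is", "amazing"])
prediction = "positive" if logits[-1] > 0 else "negative"
```

Plotting `logits` against the token index gives the kind of prediction-versus-time curve used later in the deck to probe a trained network.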

  6. Understanding RNN dynamics through linearization

  7. Understanding RNN dynamics through linearization [Figure: three example dynamics drawn in state-space axes n1, n2, n3: Saddle Point, Oscillations, Line Attractor]
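
The idea behind linearization can be made concrete with a short sketch. Near a fixed point h* of the update h_t = F(h_{t-1}), a small perturbation evolves approximately as Δh_t ≈ J Δh_{t-1}, where J is the Jacobian of F at h*, and the eigenvalues of J indicate the local behavior. The example assumes a vanilla RNN F(h) = tanh(W h) with no input or bias, so that h* = 0 is an exact fixed point; the three weight matrices and the `classify` rule are hypothetical choices meant only to reproduce the three pictures named on the slide.

```python
import numpy as np


def recurrent_jacobian(W, h_star):
    """Jacobian dF/dh of F(h) = tanh(W h), evaluated at h_star."""
    pre = W @ h_star
    return np.diag(1.0 - np.tanh(pre) ** 2) @ W


def classify(J, tol=1e-6):
    """Label the linearized dynamics from the eigenvalues of J (simplified rule)."""
    lam = np.linalg.eigvals(J)
    mags = np.abs(lam)
    if np.any(np.abs(lam.imag) > tol):
        return "oscillations"      # complex eigenvalues rotate the state
    if np.any(mags > 1 + tol) and np.any(mags < 1 - tol):
        return "saddle point"      # one direction grows, another decays
    if np.sum(np.abs(mags - 1) <= tol) == 1 and np.all(mags <= 1 + tol):
        return "line attractor"    # one marginally stable direction, rest decay
    return "other"


theta = 0.3
examples = {
    "saddle point": np.array([[1.5, 0.0], [0.0, 0.5]]),
    "oscillations": 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                                    [np.sin(theta), np.cos(theta)]]),
    "line attractor": np.array([[1.0, 0.0], [0.0, 0.5]]),
}

h_star = np.zeros(2)  # tanh(W @ 0) = 0, so the origin is a fixed point
labels = {name: classify(recurrent_jacobian(W, h_star))
          for name, W in examples.items()}
```

In a trained network the fixed points are not known in advance and are typically found numerically, after which the same eigenvalue analysis applies.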

  9. Line attractor dynamics in trained RNNs

  10. Line attractor dynamics in trained RNNs Maheswaranathan*, Williams* et al, NeurIPS 2019

  14. Line attractor dynamics in trained RNNs Approximate line attractor dynamics explain most of the RNN’s performance Maheswaranathan*, Williams* et al, NeurIPS 2019

  16. Line attractor dynamics in trained RNNs Approximate line attractor dynamics explain most of the RNN’s performance [Figure: line attractor axis running from -1 to +1]
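
A minimal sketch of why a line attractor is useful for this task, assuming linearized dynamics Δh_t = J Δh_{t-1} + u_t with one eigenvalue equal to 1 (the attractor direction) and one smaller than 1 (an off-attractor direction). The per-token valence values are hypothetical. Inputs along the marginally stable direction are accumulated exactly, while inputs along the decaying direction are forgotten.

```python
import numpy as np

# Linearized dynamics in a basis aligned with the eigenvectors of J:
# eigenvalue 1.0 is the line attractor, eigenvalue 0.5 decays quickly.
J = np.diag([1.0, 0.5])

# Hypothetical per-token valence (push along the line attractor).
valences = [0.0, 0.0, 0.0, 1.0, 0.0, 0.5, 0.0]

h = np.zeros(2)
for v in valences:
    # Every token also kicks the state off the attractor by 1.0.
    h = J @ h + np.array([v, 1.0])

position_on_line = h[0]  # running sum of valences, never forgotten
off_line = h[1]          # stays bounded because old kicks decay away
```

Here `position_on_line` equals the sum of the valences, which is the quantity a linear readout would need in order to classify sentiment by accumulated evidence.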

  17. A remaining puzzle…

  18. A remaining puzzle… [Figure: model prediction (logit) versus time (t) for probe sentences. Baseline: "This movie is awesome. I like it."]

  19. A remaining puzzle… [Figure: as above, adding Negation: "This movie is not awesome. I don't like it."]

  20. A remaining puzzle… [Figure: as above, adding Intensifier: "This movie is extremely awesome. I definitely like it."]

  21. Contextual processing in RNNs [Figure: model prediction (logit) versus time (t) for the same three probe sentences: baseline, negation, intensifier]

  22. Contextual processing in RNNs Contributions of our work

  23. Contextual processing in RNNs Contributions of our work: Data-driven method to identify contextual inputs; Analysis of the strength and timing of modifier effects; Experiments that demonstrate the identified mechanisms are necessary and sufficient for RNN performance

  26. Identifying contextual processing Use the change in input sensitivity as a measure of contextual processing

  27. Identifying contextual processing Use the change in input sensitivity as a measure of contextual processing [Figure: histogram of token counts versus the change in input Jacobian, ||ΔJ^inp||_F, on a log axis from 10^-4 to 10^0, with a label marking modifier tokens]

  28. Identifying contextual processing Allows us to identify modifier inputs [Figure: same histogram as on slide 27]
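
One way to compute such a measure is sketched below, assuming a vanilla RNN F(h, x) = tanh(W h + U x) with random weights rather than the trained networks from the talk. The input Jacobian ∂F/∂x is evaluated at a baseline state and again at the state reached after a candidate token, and the Frobenius norm of the difference is the score. The function names, the choice of baseline state, and the reference input are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_embed = 8, 4

# Random, untrained weights (illustration only).
W = rng.normal(size=(n_hidden, n_hidden)) / np.sqrt(n_hidden)
U = rng.normal(size=(n_hidden, n_embed)) / np.sqrt(n_embed)


def step(h, x):
    """Vanilla RNN update F(h, x) = tanh(W h + U x)."""
    return np.tanh(W @ h + U @ x)


def input_jacobian(h, x):
    """Analytic dF/dx at (h, x); shape (n_hidden, n_embed)."""
    f = step(h, x)
    return (1.0 - f ** 2)[:, None] * U


def delta_input_jacobian_norm(h_base, x_token, x_ref):
    """Frobenius norm of the change in input Jacobian caused by one token."""
    h_after = step(h_base, x_token)
    dJ = input_jacobian(h_after, x_ref) - input_jacobian(h_base, x_ref)
    return np.linalg.norm(dJ, ord="fro")


h_base = np.zeros(n_hidden)             # assumed baseline state
x_ref = np.zeros(n_embed)               # assumed reference input
tokens = rng.normal(size=(5, n_embed))  # hypothetical token embeddings
scores = [delta_input_jacobian_norm(h_base, x, x_ref) for x in tokens]
```

Tokens with large scores change how the network responds to subsequent inputs, which is the property the slides use to single out modifier words.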

  29. Modifier subspace [Figure: scatter of modifier component #1 versus modifier component #2]

  30. Modifier subspace [Figure: as above, with points colored by change in input Jacobian (||ΔJ^inp||_F), color scale 0.00 to 0.30]

  31. Modifier subspace [Figure: as above, with "extremely" and "not" labeled]

  32. Modifier subspace [Figure: as above, with further labeled tokens including "very", "but", "zero", and "never"]

  33. Modifier dynamics [Figure: modifier component #1 versus principal component #1]

  34. Modifier dynamics [Figure: as above, adding the curve labeled "not"]

  36. Modifier dynamics [Figure: as above, adding the curve labeled "extremely"]

  37. Modifier dynamics [Figure: (a) modifier component #1 versus principal component #1, with curves for "not" and "extremely"; (b) distance from line attractor versus time (t) for "not" and "extremely", each annotated with a timescale in tokens]
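
If the distance from the line attractor decays roughly exponentially after a modifier, d(t) ≈ d0 · exp(-t/τ), the timescale τ can be estimated with a linear fit to log d(t). The sketch below does this on synthetic data; the 2.5-token timescale is an illustrative value, not a number read from the slide, and the exponential form is itself an assumption.

```python
import numpy as np


def fit_decay_timescale(distance):
    """Fit d(t) = d0 * exp(-t / tau) by least squares on log d(t); return tau."""
    t = np.arange(len(distance))
    slope, _ = np.polyfit(t, np.log(distance), deg=1)
    return -1.0 / slope


# Synthetic decay curve with a known timescale (illustrative value only).
t = np.arange(10)
distance = 3.0 * np.exp(-t / 2.5)
tau = fit_decay_timescale(distance)
```

For purely linear dynamics the same quantity follows from an eigenvalue λ of the recurrent Jacobian as τ = -1 / ln|λ|, so a fitted timescale can be compared against the linearized prediction.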
