How recurrent networks implement contextual processing in sentiment analysis Niru Maheswaranathan and David Sussillo Google Research ICML 2020 @niru_m
Sentiment classification using RNNs
“That restaurant is amazing! I love it!” ➞ positive
“I cannot stand that place. Terrible food.” ➞ negative
RNNs solve the task, but it’s hard to understand how they do it.
Understanding RNN dynamics through linearization
[Figure: three examples of linearized dynamics, labeled Saddle Point, Oscillations, and Line Attractor, drawn on axes n1, n2, n3]
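As a rough illustration of the linearization idea, the sketch below computes the recurrent Jacobian of a vanilla tanh RNN at a fixed point and assigns one of the three labels from the slide based on its eigenvalues. The tanh update, the function names, and the eigenvalue thresholds are assumptions made for illustration; they are not taken from the talk.

```python
import numpy as np

def recurrent_jacobian(h, x, W, U, b):
    """dF/dh for the assumed update F(h, x) = tanh(W h + U x + b)."""
    gain = 1.0 - np.tanh(W @ h + U @ x + b) ** 2  # derivative of tanh
    return gain[:, None] * W

def classify(J_rec, tol=1e-6):
    """Coarse label for the discrete-time linear system dh_t = J_rec dh_{t-1}."""
    eig = np.linalg.eigvals(J_rec)
    mag = np.abs(eig)
    if np.any(mag > 1 + tol) and np.any(mag < 1 - tol):
        return "saddle point"      # some directions grow, others shrink
    if np.any(np.abs(eig.imag) > tol):
        return "oscillations"      # complex eigenvalues rotate the state
    if np.sum(np.abs(mag - 1) <= tol) == 1 and np.all(mag <= 1 + tol):
        return "line attractor"    # one marginal direction, the rest decay
    return "other"

# With zero bias and zero input, h = 0 is a fixed point and J_rec equals W.
h0, x0, b0, U0 = np.zeros(2), np.zeros(1), np.zeros(2), np.zeros((2, 1))
theta = 0.3
examples = {
    "saddle": np.diag([1.5, 0.5]),
    "rotation": 0.9 * np.array([[np.cos(theta), -np.sin(theta)],
                                [np.sin(theta),  np.cos(theta)]]),
    "line": np.diag([1.0, 0.5]),
}
labels = {name: classify(recurrent_jacobian(h0, x0, W, U0, b0))
          for name, W in examples.items()}
print(labels)
```

The three hand-built weight matrices are toy cases chosen so that each label appears once. A trained network would first need its fixed points located numerically, and its eigenvalues would only approximately match these idealized patterns.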
Line attractor dynamics in trained RNNs
Approximate line attractor dynamics explain most of the RNN’s performance.
[Figure: line attractor running from -1 to +1]
Maheswaranathan*, Williams* et al., NeurIPS 2019
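To make the line attractor picture concrete, here is a minimal toy sketch of a two-dimensional linear system in which one direction has eigenvalue 1 (so evidence pushed along it is retained) and the other decays. The per-token valences, the matrices, and the readout are hypothetical and are not the trained network from the talk.

```python
import numpy as np

def run_toy_integrator(valences, decay=0.5):
    """Linear system with one marginal direction (eigenvalue 1) and one decaying direction."""
    A = np.diag([1.0, decay])      # dimension 0 plays the role of the line attractor
    B = np.array([1.0, 1.0])       # every token pushes both dimensions equally
    h = np.zeros(2)
    states = []
    for v in valences:
        h = A @ h + B * v
        states.append(h.copy())
    return np.array(states)

# Hypothetical per-token valences: +1 positive word, -1 negative word, 0 neutral.
states = run_toy_integrator([0.0, 1.0, 0.0, 1.0, 0.0, 0.0])
logit = states[-1, 0]              # read out the position along the line
print(logit, states[-1, 1])
```

Along the marginal direction the state simply accumulates the valences, while the off-line component fades once inputs stop. A system like this has no way to let one word change the effect of the next.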
A remaining puzzle…
[Figure: model prediction (logit) vs. time (t) for three probe sentences]
Baseline: "This movie is awesome. I like it."
Negation: "This movie is not awesome. I don't like it."
Intensifier: "This movie is extremely awesome. I definitely like it."
Contextual processing in RNNs
Contributions of our work:
Data-driven method to identify contextual inputs
Analysis of the strength and timing of modifier effects
Experiments that demonstrate the identified mechanisms are necessary and sufficient for RNN performance
Identifying contextual processing
Use the change in input sensitivity as a measure of contextual processing. This allows us to identify modifier inputs.
[Figure: histogram of token counts vs. change in input Jacobian, ||ΔJ^inp||_F, on a log scale from 10^-4 to 10^0, with a region labeled "Modifier tokens"]
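One way to read "change in input sensitivity" is as the Frobenius norm of the difference between the input Jacobian before and after a token is processed. The sketch below assumes a vanilla tanh RNN, evaluates the Jacobian at zero input, and uses a reference state `h_ref`; these choices, the random weights, and the helper names are illustrative assumptions, not the talk's actual setup.

```python
import numpy as np

def input_jacobian(h, W, U, b):
    """dF/dx at state h and zero input, for the assumed F(h, x) = tanh(W h + U x + b)."""
    gain = 1.0 - np.tanh(W @ h + b) ** 2
    return gain[:, None] * U

def delta_jinp_norm(token_vec, h_ref, W, U, b):
    """Frobenius norm of the change in input Jacobian caused by one token."""
    h_after = np.tanh(W @ h_ref + U @ token_vec + b)   # state after reading the token
    dJ = input_jacobian(h_after, W, U, b) - input_jacobian(h_ref, W, U, b)
    return np.linalg.norm(dJ, ord="fro")

# Random weights stand in for a trained network (hypothetical values).
rng = np.random.default_rng(0)
n_hidden, n_embed = 8, 4
W = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
U = rng.normal(scale=0.5, size=(n_hidden, n_embed))
b = np.zeros(n_hidden)
h_ref = np.zeros(n_hidden)   # with zero bias, h = 0 is a fixed point for zero input
tokens = {"null": np.zeros(n_embed), "word": rng.normal(size=n_embed)}
scores = {name: delta_jinp_norm(vec, h_ref, W, U, b) for name, vec in tokens.items()}
print(scores)
```

Ranking a vocabulary by this score and inspecting the largest values is one plausible way to surface candidate modifier tokens. A token that leaves the state unchanged scores exactly zero.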
Modifier subspace
[Figure: tokens plotted by modifier component #1 (vertical axis) vs. modifier component #2 (horizontal axis), colored by change in input Jacobian, ||ΔJ^inp||_F, from 0.00 to 0.30. Labeled tokens include extremely, very, but, zero, never, and not.]
Modifier dynamics
[Figure, panel (a): state trajectories after "not" and "extremely", plotted as modifier component #1 vs. principal component #1]
[Figure, panel (b): distance from line attractor vs. time (t) after "not" and "extremely", annotated with time constants τ ≈ 2.… tokens and τ ≈ 1.… tokens]
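A time constant in tokens, like those annotated on the distance-from-line-attractor plot, can be estimated from a decay curve with a log-linear fit. The sketch below assumes the decay is roughly exponential and uses synthetic data; the function name and the values 3.0 and 2.0 are arbitrary illustrations, not numbers from the talk.

```python
import numpy as np

def fit_time_constant(distances):
    """Estimate (tau, d0) in d(t) ~ d0 * exp(-t / tau) by least squares on log d(t)."""
    d = np.asarray(distances, dtype=float)
    t = np.arange(d.size)                        # time measured in tokens
    slope, intercept = np.polyfit(t, np.log(d), deg=1)
    return -1.0 / slope, float(np.exp(intercept))

# Synthetic decay curve with a known time constant (hypothetical data).
t = np.arange(11)
synthetic = 3.0 * np.exp(-t / 2.0)
tau, d0 = fit_time_constant(synthetic)
print(round(tau, 3), round(d0, 3))
```

The log transform requires strictly positive distances, so measured curves that reach zero or are noisy near the line attractor would need truncation or a nonlinear fit instead.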