Low-Dimensional Dynamics of Encoding and Learning in Recurrent Neural Networks
Stefan Horoi 1,2, Victor Geadah 1,2, Guy Wolf 1,2,* and Guillaume Lajoie 1,2,*
* Equal senior authorship contributions
1 Department of Mathematics and Statistics, Université de Montréal
2 Mila – Quebec Artificial Intelligence Institute
33rd Canadian Conference on Artificial Intelligence – 12-15 May 2020
Understanding RNNs as dynamical systems
• Intra-layer (recurrent) connections preserve information across timesteps, making RNNs well suited to the analysis of sequential data.
• RNNs can be analysed as nonlinear dynamical systems (Sussillo, 2012; Poole, 2016; Pascanu, 2012).
• To do so, we need to analyse the network's internal representations.
Geometry of internal representations
• In RNNs, the internal dynamics are linked to the geometry of the internal representations (Sussillo, 2012; Marquez, 2018).
• In DNNs, the geometry of internal representations has been linked to classification accuracy (Cohen, 2020).
• Further relations between the geometry of internal states and performance have not been established in RNNs.
• We analyse the geometry of internal states and the network dynamics in parallel.
Experimental setup
Task: Sequential MNIST classification (each 28×28 image is presented as a sequence of 28 lines of 28 pixels).
Model:
• Input layer: 28 neurons
• Recurrent layer: 200 neurons, tanh activation
• Output layer: 10 neurons, linear activation
Weight matrices connect the input layer to the recurrent layer, the recurrent layer to itself, and the recurrent layer to the output layer.
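Since the implementation is in PyTorch, a minimal sketch of this architecture might look as follows (class and variable names are illustrative, not taken from the authors' code):

```python
import torch
import torch.nn as nn

class SequentialMNISTRNN(nn.Module):
    """Vanilla RNN: 28 inputs -> 200 tanh recurrent units -> 10 linear outputs."""
    def __init__(self, input_size=28, hidden_size=200, num_classes=10):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, nonlinearity="tanh", batch_first=True)
        self.readout = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, 28 lines, 28 pixels); the image is read one line per timestep.
        hidden_states, _ = self.rnn(x)              # (batch, seq_len, hidden_size)
        return self.readout(hidden_states[:, -1])   # classify from the final hidden state
```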
Training and implementation
• Implementation: PyTorch
• Optimizer: Adam
• Loss function: cross-entropy
• Number of epochs: 30
• Trained on the standard MNIST training dataset
• Performance: 93%
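A corresponding training-loop sketch, reusing the model class above (batch size and learning rate are assumed; only the optimizer, loss, and epoch count come from the slide):

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = SequentialMNISTRNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # learning rate assumed
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(30):
    for images, labels in loader:
        sequences = images.squeeze(1)   # (batch, 1, 28, 28) -> (batch, 28, 28)
        optimizer.zero_grad()
        loss = criterion(model(sequences), labels)
        loss.backward()
        optimizer.step()
```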
Experimental validation datasets
Each validation image is modified by varying the number of lines n shown (0 ≤ n ≤ 28), yielding four versions: original, hidden, cut, and extended.
• Ten networks were trained on the original training dataset and were tested on the modified versions of the validation datasets.
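A sketch of how such variants could be built from a single 28×28 image tensor. The slides do not spell out the construction, so the following assumes that "cut" truncates the sequence after n lines, "hidden" blanks the remaining lines while keeping the full 28-step length, and "extended" appends blank lines:

```python
import torch

def make_variants(image: torch.Tensor, n: int, extra_lines: int = 14):
    """image: (28, 28) tensor of pixel rows; returns assumed hidden/cut/extended variants."""
    cut = image[:n]                                   # only the first n lines, sequence length n
    hidden = image.clone()
    hidden[n:] = 0.0                                  # remaining lines blanked, length stays 28
    extended = torch.cat([image, torch.zeros(extra_lines, 28)])  # blank lines appended
    return hidden, cut, extended
```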
Effect of sequence length on classification accuracy (1)
[Figure: classification accuracy as a function of sequence length, for cut images and extended images]
Effect of sequence length on classification accuracy (2)
[Figure: classification accuracy as a function of sequence length, for cut images and hidden images]
Regardless of the amount of information given, the sequence length is the main factor determining classification accuracy.
Development of a task-relevant structure
Class (digit) coloured clustering of internal representations at three different timesteps using t-SNE:
[Figure, three panels: "Some structure begins to form", "Digits are in separate clusters", "Clusters begin to degrade"]
Task-relevant structure develops as soon as information is provided.
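A minimal sketch of such a visualization with scikit-learn's t-SNE (the original analysis pipeline is not specified in the slides; hidden_states is assumed to be the (batch, seq_len, hidden_size) tensor returned by the recurrent layer):

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_hidden_state_tsne(hidden_states, labels, timestep):
    """Embed hidden states at a given timestep in 2D and colour them by digit class."""
    states = hidden_states[:, timestep].detach().numpy()  # hidden states at the chosen timestep
    embedded = TSNE(n_components=2).fit_transform(states)
    plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab10", s=5)
    plt.title(f"t-SNE of hidden states at timestep {timestep}")
    plt.show()
```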
Classifying the RNNs as dynamical systems
The autonomous dynamical system associated with our network is the recurrent update with the input removed:
    h(t+1) = tanh(W_hh · h(t) + b)
The spectrum of Lyapunov exponents can be used to classify the geometry of the system's attractors (Crisanti, 2018; Marquez, 2018). The maximal Lyapunov exponent (MLE) λ0 is obtained from the largest (in norm) eigenvalue of the Jacobian of h(t):
• λ0 > 0: exponential volume expansion – initially close trajectories will diverge with time
• λ0 = 0: volume preservation – trajectories start showing periodicity (observed after training)
• λ0 < 0: exponential volume compression – all trajectories converge to fixed points (observed before training)
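A sketch of how the MLE of this autonomous system could be estimated numerically, using the standard approach of propagating a tangent vector through the Jacobians along a trajectory and averaging its log growth (a generic procedure, not necessarily the exact one used in the paper):

```python
import torch

def estimate_mle(W_hh, b_h, h0, num_steps=1000):
    """Estimate the maximal Lyapunov exponent of h(t+1) = tanh(W_hh h(t) + b_h)."""
    h = h0.clone()
    v = torch.randn_like(h)
    v = v / v.norm()                     # random initial tangent vector
    log_growth = 0.0
    for _ in range(num_steps):
        h = torch.tanh(W_hh @ h + b_h)
        jacobian = torch.diag(1.0 - h ** 2) @ W_hh   # d h(t+1) / d h(t)
        v = jacobian @ v
        log_growth = log_growth + torch.log(v.norm())
        v = v / v.norm()                 # renormalize to avoid under/overflow
    return (log_growth / num_steps).item()
```

For a trained PyTorch nn.RNN, W_hh corresponds to rnn.weight_hh_l0 and, with the input held at zero, b_h to the sum rnn.bias_ih_l0 + rnn.bias_hh_l0.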
Formation of limit cycles
[Figure, three panels: "Digits are in separate clusters", "Clusters degrade into two limit cycles", "Trajectories settle into two limit cycles"]
Conclusion
• Different perspectives on how information is processed in standard RNNs;
• Information is kept as multi-dimensional clusters corresponding to different classes;
• The information is not interpretable by the final layer unless the sequence length of the input is the same as for the training images;
• As the dynamics settle, the internal representations are compressed into a non-trivial attractor;
• The attractor is composed of two limit cycles whose intrinsic dimension is far smaller than that of the ambient state space.
Future work
Short term
• We hypothesize that information about digit classes might be encoded as phases on the limit cycles (preliminary results);
Long term
• All recurrent models (RNNs, LSTMs, GRUs, etc.) are defined by their dynamics;
• We expect the same analytical framework to be effective in less artificial scenarios;
• Extend the analysis to tasks with intrinsic sequential structure, such as human activity recognition (HAR) or natural language processing (NLP).
Acknowledgments
We would like to thank Aude Forcione-Lambert and Giancarlo Kerg for useful discussions. The work was partially funded by the following organisations: