Exploiting Randomness in Neural Networks Daniele Di Sarli Mauriana - PowerPoint PPT Presentation

Exploiting Randomness in Neural Networks
Daniele Di Sarli
Mauriana Pesaresi seminars, 2020

Recurrent Neural Network

Training with Backpropagation Through Time: $\text{error} = (\text{output} - \text{expected})^2$, with gradient $\partial\,\text{error} / \partial W$
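A minimal sketch of what Backpropagation Through Time computes, assuming a hypothetical scalar RNN with one hidden unit ($h_t = \tanh(w\,h_{t-1} + w_{in}\,x_t)$), a readout $\text{output} = w_{out}\,h_T$, and the squared error only at the last step. The names `w`, `w_in`, `w_out`, and the parameter values are illustrative assumptions; the gradient is checked against a finite difference.

```python
import numpy as np

def forward(w, w_in, xs, h0=0.0):
    """Unroll a scalar RNN h_t = tanh(w * h_{t-1} + w_in * x_t); returns [h_0, ..., h_T]."""
    hs = [h0]
    for x in xs:
        hs.append(np.tanh(w * hs[-1] + w_in * x))
    return hs

def error(w, w_in, w_out, xs, expected):
    """Squared error of the (hypothetical) readout output = w_out * h_T."""
    output = w_out * forward(w, w_in, xs)[-1]
    return (output - expected) ** 2

def bptt_grad_w(w, w_in, w_out, xs, expected):
    """d error / d w, accumulated backwards over the unrolled time steps."""
    hs = forward(w, w_in, xs)
    output = w_out * hs[-1]
    delta = 2.0 * (output - expected) * w_out  # d error / d h_T
    grad_w = 0.0
    for t in range(len(xs), 0, -1):
        da = delta * (1.0 - hs[t] ** 2)  # through tanh: d h_t / d a_t = 1 - h_t^2
        grad_w += da * hs[t - 1]         # a_t = w * h_{t-1} + w_in * x_t
        delta = da * w                   # pass the error signal back to h_{t-1}
    return grad_w

xs = [3, 2, 1.5, 0.75, 1, -2.3, 4]
w, w_in, w_out, expected = 0.5, 0.1, 1.2, 0.3
g = bptt_grad_w(w, w_in, w_out, xs, expected)

# Sanity check against a central finite difference
eps = 1e-6
g_num = (error(w + eps, w_in, w_out, xs, expected)
         - error(w - eps, w_in, w_out, xs, expected)) / (2 * eps)
print(g, g_num)
```

The backward loop is the reason BPTT gets expensive: every weight update needs a pass over the whole unrolled sequence.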

Prediction, patterns, interactions
Example input sequence: …, 3, 2, 1.5, 0.75, 1, -2.3, 4, …
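For a prediction task, a sequence like the one above is usually turned into (input, target) pairs where the target is the next value. A minimal sketch, assuming one-step-ahead prediction and using only the values visible on the slide (the rest of the sequence is elided); `one_step_pairs` is a hypothetical helper.

```python
def one_step_pairs(seq):
    """Pair each value x_t with the value that follows it, x_{t+1}."""
    return list(zip(seq[:-1], seq[1:]))

seq = [3, 2, 1.5, 0.75, 1, -2.3, 4]
pairs = one_step_pairs(seq)
print(pairs[:3])  # [(3, 2), (2, 1.5), (1.5, 0.75)]
```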

Echo State Network
Reservoir → Readout
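The reservoir/readout split can be written, in one common formulation, as below. This is an assumption: the slide shows no equations, and variants add leak rates, biases, or output feedback. $W_{\mathrm{in}}$ and $W$ are fixed random matrices, only $W_{\mathrm{out}}$ is trained, $X$ stacks the collected states as columns, $Y$ the matching targets, and $\lambda \ge 0$ is a ridge term ($\lambda = 0$ gives ordinary least squares).

```latex
\mathbf{x}(t) = \tanh\!\big(W_{\mathrm{in}}\,\mathbf{u}(t) + W\,\mathbf{x}(t-1)\big) \\
\mathbf{y}(t) = W_{\mathrm{out}}\,\mathbf{x}(t) \\
W_{\mathrm{out}} = Y X^{\top}\big(X X^{\top} + \lambda I\big)^{-1}
```

Because the readout is linear, training reduces to a single linear solve instead of gradient descent through time.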


Cover’s theorem
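Cover's theorem states that a pattern-classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional one, which is the rationale for pairing a random reservoir with a linear readout. A minimal sketch on XOR, assuming a static random `tanh` projection as a stand-in for a reservoir (the 20 hidden features and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR: the four corners of the unit square with ±1 labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)

def fit_linear(features, targets):
    """Least-squares linear readout with a bias column; returns its predictions."""
    A = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return A @ coef

# 1) Linear readout on the raw 2-D inputs: the best fit predicts 0 everywhere
linear_pred = fit_linear(X, y)

# 2) Fixed (untrained) random nonlinear projection to 20 dimensions, same readout
n_hidden = 20
W = rng.normal(size=(2, n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)
random_pred = fit_linear(H, y)

print(np.round(linear_pred, 3))
print(np.sign(random_pred))  # matches y
```

Only the readout is fitted in both cases; the random projection alone is what makes the classes linearly separable.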

Echo State Property
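Informally, the Echo State Property asks that the reservoir state end up depending only on the input history, not on the initial state. One sufficient condition is a contractive state update: with `tanh` (which is 1-Lipschitz), scaling $W$ so that its largest singular value is below 1 makes two runs on the same input converge. The sketch below assumes 50 units, a 0.9 scaling factor, and uniform random weights. The frequently used rescaling by spectral radius is a heuristic rather than a guarantee.

```python
import numpy as np

rng = np.random.default_rng(42)
n_units, n_steps = 50, 200

# Random recurrent and input weights
W = rng.uniform(-1, 1, size=(n_units, n_units))
# Largest singular value 0.9 < 1: the update is a contraction in the Euclidean norm
W *= 0.9 / np.linalg.norm(W, 2)
W_in = rng.uniform(-1, 1, size=n_units)

u = rng.uniform(-1, 1, size=n_steps)  # one shared input sequence

def run(x0):
    """Drive the reservoir with u starting from state x0; return the final state."""
    x = x0.copy()
    for t in range(n_steps):
        x = np.tanh(W @ x + W_in * u[t])
    return x

xa = run(rng.uniform(-1, 1, size=n_units))
xb = run(rng.uniform(-1, 1, size=n_units))
print(np.linalg.norm(xa - xb))  # tiny: the initial state has been "forgotten"
```

The distance shrinks by at least a factor of 0.9 per step, so after 200 steps it is bounded by about $0.9^{200}$ times the initial distance.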

Echo State Network starter pack
1. Randomly initialize the weights (sparse).
2. Rescale the weights to guarantee contractivity of the state transition function (=> ESP).
3. Feed data, collect states.
4. Compute optimal linear regression parameters.
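The four steps can be sketched end to end as follows. Everything here is an illustrative assumption: a one-step-ahead sine-wave task, 100 units, 10% connection density, a 100-step washout, and ridge regression (with a tiny `ridge` term) as the "optimal linear regression". Contractivity is enforced by scaling the largest singular value to 0.9.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical task: one-step-ahead prediction of a sine wave
T = 1000
signal = np.sin(0.2 * np.arange(T + 1))
inputs, targets = signal[:-1], signal[1:]

n_units, density, washout, ridge = 100, 0.1, 100, 1e-6

# 1. Randomly initialize the weights (sparse): keep ~10% of the connections
W = rng.uniform(-1, 1, size=(n_units, n_units))
W *= rng.random((n_units, n_units)) < density
W_in = rng.uniform(-0.5, 0.5, size=n_units)

# 2. Rescale for contractivity: largest singular value 0.9 < 1
W *= 0.9 / np.linalg.norm(W, 2)

# 3. Feed data, collect states
states = np.zeros((T, n_units))
x = np.zeros(n_units)
for t in range(T):
    x = np.tanh(W @ x + W_in * inputs[t])
    states[t] = x

# Discard the initial transient and append a bias column
X = np.hstack([states[washout:], np.ones((T - washout, 1))])
Y = targets[washout:]

# 4. Linear readout by ridge regression (only W_out is trained)
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

mse = np.mean((X @ W_out - Y) ** 2)
print(f"training MSE: {mse:.2e}")
```

Training is a single linear solve over the collected states; the reservoir weights are never updated.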

« RC […] provides explanations of why biological brains can carry out accurate computations with an “inaccurate” and noisy physical substrate » (Lukoševičius & Jaeger)

In the primary visual cortex, «computations are performed by complex dynamical systems while information about results of these computations is read out by simple linear classifiers.» (Nikolić et al.)

My work

Natural Language Processing: LSTM, GRU, Transformer, BERT
[Chart: CO2 emissions (lbs, axis 0 to 700,000) of an average car over one lifetime, including fuel, vs. a Transformer trained with neural architecture search]
From Strubell, E., Ganesh, A., McCallum, A.: Energy and Policy Considerations for Deep Learning in NLP. Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Text Classification pipeline (RNN)
‘My input sentence’ → input sequence (word embeddings) → RNN → sentence embedding (-0.76, 0.35, … -0.02) → linear classifier
TRAINING

Text Classification pipeline (ESN)
‘My input sentence’ → input sequence (word embeddings) → ESN → sentence embedding (-0.76, 0.35, … -0.02) → linear classifier
TRAINING
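A minimal sketch of the ESN pipeline, under loudly stated assumptions: the four labelled sentences are hypothetical toy data, random vectors stand in for pre-trained word embeddings, the sentence embedding is the final reservoir state, and the linear classifier is a ridge-regression readout on one-hot targets. Only the classifier is fitted; the reservoir stays untrained.

```python
import numpy as np

rng = np.random.default_rng(1)
emb_dim, n_units = 16, 200

# Hypothetical labelled sentences (0 = person question, 1 = place question)
sentences = [
    ("who invented the camera", 0),
    ("who was the first astronaut", 0),
    ("where is the tallest building", 1),
    ("where is the capital", 1),
]
vocab = sorted({w for s, _ in sentences for w in s.split()})
# Stand-in for pre-trained word embeddings: one random vector per word (assumption)
embeddings = {w: rng.normal(size=emb_dim) for w in vocab}

# Untrained reservoir, rescaled for contractivity
W = rng.uniform(-1, 1, size=(n_units, n_units))
W *= 0.9 / np.linalg.norm(W, 2)
W_in = rng.uniform(-0.1, 0.1, size=(n_units, emb_dim))

def sentence_embedding(sentence):
    """Final reservoir state after reading the word embeddings in order."""
    x = np.zeros(n_units)
    for word in sentence.split():
        x = np.tanh(W @ x + W_in @ embeddings[word])
    return x

X = np.stack([sentence_embedding(s) for s, _ in sentences])
labels = np.array([c for _, c in sentences])
Y = np.eye(2)[labels]  # one-hot targets

# Training touches only the linear classifier (ridge regression)
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_units), X.T @ Y)

pred = np.argmax(X @ W_out, axis=1)
print("training accuracy:", np.mean(pred == labels))
```

With four sentences and 200-dimensional embeddings the training set is trivially fitted; the sketch only shows where training happens, not how well the approach generalizes.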

Question Classification
“What was the name of the first Russian astronaut to do a spacewalk?” → HUMAN
“What's the tallest building in New York City?” → LOCATION
… also ABBREVIATION, ENTITY, DESCRIPTION, and NUMERIC VALUE

Improvements are needed
• Bidirectional (e.g. “What's the tallest building in New York City?”)
• Attention
• Multi-ring
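Two of the listed improvements can be sketched on top of a plain reservoir. Assumptions throughout: the bidirectional variant reuses the same random reservoir on the reversed sequence and concatenates the time-aligned states (a separate reservoir per direction is equally plausible), and attention is a softmax-weighted average of the states driven by a `query` vector that would normally be learned but is random here. The "multi-ring" variant is not illustrated, since the slide does not describe it.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, n_units = 8, 50

W = rng.uniform(-1, 1, size=(n_units, n_units))
W *= 0.9 / np.linalg.norm(W, 2)
W_in = rng.uniform(-0.5, 0.5, size=(n_units, in_dim))

def reservoir_states(seq):
    """All reservoir states for a sequence of input vectors (one row per step)."""
    x = np.zeros(n_units)
    states = []
    for v in seq:
        x = np.tanh(W @ x + W_in @ v)
        states.append(x)
    return np.array(states)

def bidirectional_states(seq):
    """Concatenate forward states with backward states, re-aligned in time."""
    fwd = reservoir_states(seq)
    bwd = reservoir_states(seq[::-1])[::-1]
    return np.hstack([fwd, bwd])

def attention_pool(states, query):
    """Softmax-weighted average of the states; `query` would normally be trained."""
    scores = states @ query
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ states, weights

seq = rng.normal(size=(6, in_dim))    # e.g. six word embeddings
H = bidirectional_states(seq)         # shape (6, 2 * n_units)
query = rng.normal(size=2 * n_units)  # hypothetical, untrained query vector
embedding, weights = attention_pool(H, query)
print(H.shape, embedding.shape, weights.round(3))
```

In the bidirectional states, the first time step already carries a summary of the whole sentence read backwards, which is what lets a question's final words inform its first ones.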

Results
[Chart: accuracy (%, axis 88 to 100) of the compared models, grouped as “200M+ params, heavy transfer learning” and “ours, < 1.6M params”]

Results
[Chart: training time (axis 0 to 600) and accuracy (%, axis 88 to 100) for Bi-GRU, Bi-ESN, an ensemble variant, and Bi-ESN-Att, annotated “7.5 min” and “6 sec”; models grouped as “200M+ params, heavy transfer learning” and “ours, < 1.6M params”]

How old was the youngest president of the United States?
When was Ulysses S. Grant born?
Who invented the instant Polaroid camera?
What is nepotism?
Where is the Mason/Dixon line?
What is the capital of Zimbabwe?
What are Canada's two territories?

Wrap up
• A path towards efficient and effective ML models must be taken.
• A deeper understanding and exploitation of the architectural properties of RNN models can help towards that goal.
• The analysis is preliminary, but work-in-progress results are encouraging.

References
1. Di Sarli, D., Gallicchio, C., & Micheli, A. (2019, November). Question Classification with Untrained Recurrent Embeddings. In International Conference of the Italian Association for Artificial Intelligence.
2. Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science.
3. Lukoševičius, M., & Jaeger, H. (2009). Reservoir computing approaches to recurrent neural network training. Computer Science Review.
4. Nikolić, D., Haeusler, S., Singer, W., & Maass, W. (2007). Temporal dynamics of information content carried by neurons in the primary visual cortex. In Advances in Neural Information Processing Systems.
