RNNs for Timeseries Analysis
www.bgoncalves.com
github.com/bmtgoncalves/RNN
Disclaimer
The views and opinions expressed in this article are those of the author and do not necessarily reflect the official policy or position of my employer. The examples provided with this tutorial were chosen for their didactic value and are not meant to be representative of my day-to-day work.
How the Brain “Works” (Cartoon version)
• Each neuron receives input from other neurons
• 10^11 neurons, each with 10^4 weights
• Weights can be positive or negative
• Weights adapt during the learning process: “neurons that fire together wire together” (Hebb)
• Different areas perform different functions using the same structure (Modularity)
How the Brain “Works” (Cartoon version)
[Diagram: Inputs → f(Inputs) → Output]
Optimization Problem
• (Machine) Learning can be thought of as an optimization problem.
• Optimization problems have 3 distinct pieces:
• The constraints: the neural network
• The function to optimize: the prediction error
• The optimization algorithm: gradient descent
Artificial Neuron
[Diagram: inputs x_1 … x_N, plus a constant bias input of 1, are multiplied by the weights w_1j … w_Nj (and the bias weight w_0j), summed into z_j = w^T x, and passed through the activation function φ(z) to produce the output a_j.]
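A minimal NumPy sketch of this diagram; the toy numbers and the use of tanh here are assumptions for illustration, and the activation functions themselves are defined on the next slides.

```python
import numpy as np

def neuron_output(x, w, w0, phi):
    # Single artificial neuron: weighted sum of the inputs plus the bias
    # weight w0 (whose input is fixed at 1), passed through the activation phi.
    z = w @ x + w0        # z_j = w^T x (plus bias)
    return phi(z)         # a_j = phi(z_j)

# Hypothetical toy values, just for illustration.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.1, 0.4, -0.3])
print(neuron_output(x, w, 0.2, np.tanh))
```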
Activation Function - Sigmoid
http://github.com/bmtgoncalves/Neural-Networks
φ(z) = 1 / (1 + e^(−z))
• Non-linear function
• Differentiable
• Non-decreasing
• Computes new sets of features
• Each layer builds up a more abstract representation of the data
• Perhaps the most common choice
Activation Function - tanh
http://github.com/bmtgoncalves/Neural-Networks
φ(z) = (e^z − e^(−z)) / (e^z + e^(−z))
• Non-linear function
• Differentiable
• Non-decreasing
• Computes new sets of features
• Each layer builds up a more abstract representation of the data
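A quick sketch of both activation functions in NumPy (np.tanh already exists; the sigmoid is written out to mirror the formula above, and the test values are arbitrary):

```python
import numpy as np

def sigmoid(z):
    # phi(z) = 1 / (1 + e^(-z)); squashes any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # phi(z) = (e^z - e^(-z)) / (e^z + e^(-z)); squashes z into (-1, 1)
    return np.tanh(z)

z = np.linspace(-5, 5, 11)
print(sigmoid(z))
print(tanh(z))
```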
Forward Propagation
• The output of a perceptron is determined by a sequence of steps:
• obtain the inputs
• multiply the inputs by the respective weights
• calculate the output using the activation function: a_j = φ(w_j^T x)
• To create a multi-layer perceptron, you can simply use the output of one layer as the input to the next one: a_k = φ(w_k^T a), as in the sketch below.
• But how can we propagate back the errors and update the weights?
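A minimal forward-propagation sketch for a two-layer perceptron in NumPy; the layer sizes, random weights, and the use of the sigmoid in both layers are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    a_hidden = sigmoid(W1 @ x + b1)         # hidden layer: a_j = phi(w_j^T x)
    a_output = sigmoid(W2 @ a_hidden + b2)  # next layer reuses the previous output
    return a_hidden, a_output

rng = np.random.default_rng(0)
x = rng.normal(size=3)                        # toy input
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
print(forward(x, W1, b1, W2, b2)[1])
```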
Backward Propagation of Errors (BackProp)
• BackProp operates in two phases:
• Forward propagate the inputs and calculate the deltas (errors) at each layer
• Update the weights using these deltas
• The error at the output is a weighted average of the differences between the predicted output and the observed one.
• For inner layers there is no “real output”!
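A sketch of one BackProp update for the two-layer network above, using the quadratic error and sigmoid activations; the learning rate and loss choice are assumptions. Since the inner layer has no “real output”, its deltas are built from the weighted deltas of the layer above it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, b1, W2, b2, alpha=0.1):
    # Forward phase: compute every layer's output.
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)

    # Backward phase: deltas at the output come from the prediction error,
    # deltas at the hidden layer come from the weighted deltas above it.
    delta2 = (a2 - y) * a2 * (1 - a2)
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)

    # Gradient-descent update of every weight and bias.
    W2 -= alpha * np.outer(delta2, a1)
    b2 -= alpha * delta2
    W1 -= alpha * np.outer(delta1, x)
    b1 -= alpha * delta1
    return W1, b1, W2, b2
```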
Loss Functions
• For learning to occur, we must quantify how far off we are from the desired output. There are two common ways of doing this:
• Quadratic error function:
E = 1/N Σ_n |y_n − a_n|²
• Cross entropy:
J = −1/N Σ_n [ y_nᵀ log(a_n) + (1 − y_n)ᵀ log(1 − a_n) ]
• The cross entropy is complementary to the sigmoid activation in the output layer and improves its stability.
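Both loss functions in NumPy; the clipping constant and the toy targets/outputs are assumptions to keep the example self-contained:

```python
import numpy as np

def quadratic_error(y, a):
    # E = 1/N * sum_n |y_n - a_n|^2
    return np.mean(np.sum((y - a) ** 2, axis=1))

def cross_entropy(y, a, eps=1e-12):
    # J = -1/N * sum_n [ y_n^T log(a_n) + (1 - y_n)^T log(1 - a_n) ]
    a = np.clip(a, eps, 1 - eps)   # avoid log(0)
    return -np.mean(np.sum(y * np.log(a) + (1 - y) * np.log(1 - a), axis=1))

y = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy targets
a = np.array([[0.9, 0.2], [0.3, 0.8]])   # toy network outputs
print(quadratic_error(y, a), cross_entropy(y, a))
```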
Gradient Descent
• Find the gradient for each training batch
• Take a step downhill along the direction of the gradient:
θ_mn ← θ_mn − α ∂H/∂θ_mn
• where α is the step size.
• Repeat until “convergence”.
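A bare-bones gradient-descent loop on a toy function H(θ) = |θ|², just to make the update rule concrete; the function, step size, and stopping threshold are all assumptions:

```python
import numpy as np

alpha = 0.1                        # step size
theta = np.array([3.0, -2.0])      # hypothetical parameters

for step in range(1000):
    grad = 2 * theta               # dH/dtheta for H(theta) = |theta|^2
    theta = theta - alpha * grad   # step downhill along the gradient
    if np.linalg.norm(grad) < 1e-6:   # "convergence"
        break

print(theta)
```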
Feed Forward Networks
Information flows in one direction, from the input x_t to the output h_t:
h_t = f(x_t)
Recurrent Neural Network (RNN)
The previous output h_{t−1} is fed back in alongside the input, so information flows through time:
h_t = f(x_t, h_{t−1})
Recurrent Neural Network (RNN)
• Each output depends (implicitly) on all previous outputs.
• Input sequences generate output sequences (seq2seq).
[Diagram: the network unrolled in time, with hidden states h_{t−1}, h_t, h_{t+1} connected in sequence and fed by inputs x_{t−1}, x_t, x_{t+1}]
Recurrent Neural Network (RNN)
h_t = tanh(W h_{t−1} + U x_t)
In practice, the two inputs h_{t−1} and x_t can be concatenated and multiplied by a single weight matrix, as in the sketch below.
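A minimal NumPy sketch of this recurrence, unrolled over a short sequence (the sizes and random weights are assumptions). The second function shows the concatenated form, which is equivalent to using the single stacked matrix [W U]:

```python
import numpy as np

def rnn_step(x_t, h_prev, W, U):
    # h_t = tanh(W h_{t-1} + U x_t)
    return np.tanh(W @ h_prev + U @ x_t)

def rnn_step_concat(x_t, h_prev, WU):
    # Same computation with both inputs concatenated and one weight matrix.
    return np.tanh(WU @ np.concatenate([h_prev, x_t]))

rng = np.random.default_rng(1)
n_hidden, n_in = 4, 3
W = rng.normal(size=(n_hidden, n_hidden))
U = rng.normal(size=(n_hidden, n_in))
h = np.zeros(n_hidden)                       # initial hidden state
for x_t in rng.normal(size=(5, n_in)):       # unroll over a length-5 sequence
    h = rnn_step(x_t, h, W, U)
print(h)
```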
Timeseries
• Temporal sequence of data points
• Consecutive points are strongly correlated
• Common in statistics, signal processing, econometrics, mathematical finance, earthquake prediction, etc.
• Numeric (real or discrete) or symbolic data
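Because consecutive points are correlated, a timeseries is usually fed to an RNN as overlapping windows of past values paired with the next value. A sketch of that framing (the window length and the sine toy signal are assumptions):

```python
import numpy as np

def make_windows(series, window):
    # Slice a 1-D timeseries into (past window, next value) pairs,
    # turning forecasting into a supervised learning problem.
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X), np.array(y)

series = np.sin(0.1 * np.arange(200))   # toy signal
X, y = make_windows(series, window=10)
print(X.shape, y.shape)                 # (190, 10) (190,)
```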
Long-Short Term Memory (LSTM)
• What if we want to keep explicit information about previous states (memory)?
• How much information is kept can be controlled through gates.
• LSTMs were first introduced in 1997 by Hochreiter and Schmidhuber.
[Diagram: the LSTM unrolled in time, with a cell state c_t carried alongside the hidden state h_t]
Long-Short Term Memory (LSTM)
(In the diagram: + is element-wise addition, × is element-wise multiplication, 1− is one minus the input.)
g = tanh(W_g h_{t−1} + U_g x_t)
i = σ(W_i h_{t−1} + U_i x_t)
f = σ(W_f h_{t−1} + U_f x_t)
o = σ(W_o h_{t−1} + U_o x_t)
c_t = (c_{t−1} ⊗ f) + (g ⊗ i)
h_t = tanh(c_t) ⊗ o
• Forget gate f: how much of the previous state should be kept?
• Input gate i: how much of the candidate values g (built from the previous output and the current input) should be remembered?
• Output gate o: how much of the current state should contribute to the output?
• All gates use the same inputs and activation functions, but different weights.
• State: update the current state, c_t = (c_{t−1} ⊗ f) + (g ⊗ i).
• Output: combine all available information, h_t = tanh(c_t) ⊗ o.
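A NumPy sketch of one LSTM step implementing exactly these equations (bias terms are omitted, as on the slide; the parameter dictionary, sizes, and random weights are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P):
    g = np.tanh(P['Wg'] @ h_prev + P['Ug'] @ x_t)  # candidate values
    i = sigmoid(P['Wi'] @ h_prev + P['Ui'] @ x_t)  # input gate
    f = sigmoid(P['Wf'] @ h_prev + P['Uf'] @ x_t)  # forget gate
    o = sigmoid(P['Wo'] @ h_prev + P['Uo'] @ x_t)  # output gate
    c_t = c_prev * f + g * i                       # state update
    h_t = np.tanh(c_t) * o                         # output
    return h_t, c_t

rng = np.random.default_rng(2)
n_hidden, n_in = 4, 3
P = {'W' + k: rng.normal(size=(n_hidden, n_hidden)) for k in 'gifo'}
P.update({'U' + k: rng.normal(size=(n_hidden, n_in)) for k in 'gifo'})
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):     # run the cell over a short sequence
    h, c = lstm_step(x_t, h, c, P)
print(h)
```

In practice one would use an off-the-shelf implementation such as keras.layers.LSTM rather than writing the cell by hand; the sketch is only meant to make the gate equations concrete.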