Sequential Data with Neural Networks
Greg Mori - CMPT 419/726
Readings: Goodfellow, Bengio, and Courville, Deep Learning textbook, Ch. 10

Outline
• Recurrent Neural Networks
• Long Short-Term Memory
• Temporal Convolutional Networks
• Examples

Recurrent Neural Networks
• Sequential input / output
• Many inputs, many outputs: x_{1:T} → y_{1:S}
  • cf. object tracking, speech recognition with HMMs; on-line/batch processing
• One input, many outputs: x → y_{1:S}
  • e.g. image captioning
• Many inputs, one output: x_{1:T} → y
  • e.g. video classification

Hidden State
• Basic idea: maintain a state h_t
• State at time t depends on input x_t and previous state h_{t-1}
• It's a neural network, so the relation is a non-linear function of these inputs and some parameters W:
    h_t = f(h_{t-1}, x_t; W)
• Parameters W and function f(·) are reused at all time steps
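A minimal sketch of this recurrence in NumPy, assuming a tanh non-linearity (the function name rnn_step, the weight shapes, and the toy sequence are illustrative, not from the slides):

    import numpy as np

    def rnn_step(h_prev, x_t, W_hh, W_xh, b_h):
        """One vanilla RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t + b_h)."""
        return np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)

    # Toy dimensions: hidden size 4, input size 3.
    rng = np.random.default_rng(0)
    W_hh = rng.normal(scale=0.1, size=(4, 4))
    W_xh = rng.normal(scale=0.1, size=(4, 3))
    b_h = np.zeros(4)

    h = np.zeros(4)                    # initial state h_0
    xs = rng.normal(size=(5, 3))       # a length-5 input sequence x_{1:5}
    for x_t in xs:
        h = rnn_step(h, x_t, W_hh, W_xh, b_h)   # same W and f(·) reused at every step

The key point is in the loop: the same parameters and the same update function are applied at every time step, so the model size does not grow with sequence length.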
Outputs
• Output y_t also depends on the hidden state:
    y_t = f(h_t; W_y)
• Again, parameters/function reused across time

Gradients
• Basic RNN not very effective
• Need many time steps / complex model for challenging tasks
• Gradients in learning are a problem
  • Too large: can be handled with gradient clipping (truncate gradient magnitude)
  • Too small: can be handled with network structures / gating functions (LSTM, GRU)

Long Short-Term Memory
[Figure: a basic RNN unit compared with an LSTM unit, showing the input, output, forget, and input modulation gates acting on the cell state c_t; figure from Donahue et al., CVPR 2015]
• Hochreiter and Schmidhuber, Neural Computation 1997
• Gating functions i(·), f(·), o(·) reduce vanishing gradients

    i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i)      (1)
    f_t = σ(W_xf x_t + W_hf h_{t-1} + b_f)      (2)
    o_t = σ(W_xo x_t + W_ho h_{t-1} + b_o)      (3)
    g_t = tanh(W_xc x_t + W_hc h_{t-1} + b_c)   (4)
    c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t             (5)
    h_t = o_t ⊙ tanh(c_t)                        (6)

• See Graves, Liwicki, Fernandez, Bertolami, Bunke, and Schmidhuber, TPAMI 2009
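A minimal sketch of equations (1)-(6) in NumPy (the function name lstm_step, the weight dictionary, and the toy dimensions are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x_t, h_prev, c_prev, W, b):
        """One LSTM step following equations (1)-(6).
        W is a dict of weight matrices, b a dict of bias vectors."""
        i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + b["i"])   # input gate      (1)
        f_t = sigmoid(W["xf"] @ x_t + W["hf"] @ h_prev + b["f"])   # forget gate     (2)
        o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + b["o"])   # output gate     (3)
        g_t = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + b["c"])   # input modulation (4)
        c_t = f_t * c_prev + i_t * g_t                              # cell state      (5)
        h_t = o_t * np.tanh(c_t)                                    # hidden state    (6)
        return h_t, c_t

    # Toy dimensions: hidden size 4, input size 3.
    rng = np.random.default_rng(0)
    W = {k: rng.normal(scale=0.1, size=(4, 3 if k[0] == "x" else 4))
         for k in ["xi", "hi", "xf", "hf", "xo", "ho", "xc", "hc"]}
    b = {k: np.zeros(4) for k in ["i", "f", "o", "c"]}

    h, c = np.zeros(4), np.zeros(4)
    for x_t in rng.normal(size=(5, 3)):      # a length-5 input sequence
        h, c = lstm_step(x_t, h, c, W, b)

Note that the cell-state update in (5) is additive: gradients can flow through c_t across many steps without being repeatedly multiplied by a weight matrix, which is what mitigates the vanishing-gradient problem of the basic RNN.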
Convolutions to Aggregate over Time
[Figure: a stack of dilated causal convolutions over inputs x_0, ..., x_T producing outputs ŷ_0, ..., ŷ_T, with dilation factors d = 1, 2, 4; from Bai, Kolter, and Koltun, arXiv 2018]
• Control history by d (dilation, holes in the filter) and k (width of the filter)
• Causal convolution: only use elements from the past
• Bai, Kolter, Koltun, arXiv 2018

Residual (skip) Connections
[Figure: a TCN residual block of dilated causal convolution, weight normalization, ReLU, and dropout, with an identity map (or 1x1 convolution) skip connection; Figure 1 of Bai, Kolter, and Koltun]
• Include residual connections to allow long-range modeling and gradient flow

Example: Image Captioning
• Karpathy and Fei-Fei, CVPR 2015

Example: Video Description
[Figure: an LSTM encoder-decoder for video description; video frames are consumed in the encoding stage, then the decoding stage emits "A man is talking <EOS>"]
• S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, K. Saenko, ICCV 2015
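A minimal sketch of a dilated causal 1-D convolution and a residual block in the spirit of the TCN above, in plain NumPy (the function names, the left-padding scheme, and the toy filters are illustrative, not the authors' reference code, and weight normalization/dropout are omitted):

    import numpy as np

    def dilated_causal_conv1d(x, w, d):
        """Causal 1-D convolution: y[t] depends only on x[t], x[t-d], ..., x[t-(k-1)d].
        x: (T,) input sequence, w: (k,) filter, d: dilation factor."""
        k = len(w)
        pad = (k - 1) * d
        x_padded = np.concatenate([np.zeros(pad), x])   # left-pad so no future element leaks in
        y = np.zeros_like(x)
        for t in range(len(x)):
            taps = x_padded[t + pad - np.arange(k) * d]  # taps at t, t-d, ..., t-(k-1)d
            y[t] = taps @ w
        return y

    def residual_block(x, w1, w2, d):
        """Two dilated causal convolutions with a ReLU, plus an identity skip connection."""
        h = np.maximum(dilated_causal_conv1d(x, w1, d), 0.0)   # conv -> ReLU
        h = dilated_causal_conv1d(h, w2, d)
        return x + h                                            # residual (skip) connection

    x = np.sin(np.linspace(0, 6, 32))        # toy input sequence
    w1 = np.array([0.5, 0.3, 0.2])           # k = 3
    w2 = np.array([0.2, 0.2, 0.2])
    y = residual_block(x, w1, w2, d=2)       # receptive field grows with k and d

Stacking such blocks with increasing dilation (d = 1, 2, 4, ...) lets the receptive field grow exponentially with depth, while the skip connections keep gradients flowing through deep stacks.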
Example: Machine Translation
• Wu et al., Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, arXiv 2016

Conclusion
• Readings: http://www.deeplearningbook.org/contents/rnn.html
• Recurrent neural networks can model sequential inputs/outputs
• Input includes state (output) from the previous time step
• Different structures:
  • RNN with multiple inputs/outputs
  • Gated recurrent unit (GRU)
  • Long short-term memory (LSTM)
• Error gradients are back-propagated across the entire sequence