Recurrent Neural Networks
Greg Mori - CMPT 419/726
Goodfellow, Bengio, and Courville: Deep Learning textbook, Ch. 10
Sequential Data with Neural Networks
• Sequential input / output
• Many inputs, many outputs: x_{1:T} → y_{1:S}
  • c.f. object tracking, speech recognition with HMMs; on-line/batch processing
• One input, many outputs: x → y_{1:S}
  • e.g. image captioning
• Many inputs, one output: x_{1:T} → y
  • e.g. video classification
Outline
• Recurrent Neural Networks
• Long Short-Term Memory
• Temporal Convolutional Networks
• Examples
Hidden State
• Basic idea: maintain a state h_t
• The state at time t depends on the input x_t and the previous state h_{t-1}
• It's a neural network, so the relation is a non-linear function of these inputs and some parameters W: h_t = f(h_{t-1}, x_t; W)
• The parameters W and the function f(·) are reused at all time steps
Outputs
• The output y_t also depends on the hidden state: y_t = f(h_t; W_y)
• Again, the parameters/function are reused across time (a minimal sketch of one step follows below)
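A minimal numpy sketch of one step of this recurrence and its unrolling over a sequence; the weight names (W_xh, W_hh, W_hy), the tanh nonlinearity, and the linear output are illustrative assumptions, since the slides only specify h_t = f(h_{t-1}, x_t; W) and y_t = f(h_t; W_y).

```python
import numpy as np

def rnn_step(x_t, h_prev, params):
    """One step of a vanilla RNN: update the hidden state and emit an output.

    Assumed parameterization (tanh hidden update, linear output); the slides
    only state h_t = f(h_{t-1}, x_t; W) and y_t = f(h_t; W_y).
    """
    W_xh, W_hh, b_h = params["W_xh"], params["W_hh"], params["b_h"]
    W_hy, b_y = params["W_hy"], params["b_y"]
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # new hidden state
    y_t = W_hy @ h_t + b_y                           # output from the hidden state
    return h_t, y_t

def rnn_forward(xs, h0, params):
    """Unroll the RNN: the same parameters are reused at every time step."""
    h, ys = h0, []
    for x_t in xs:           # xs: sequence of input vectors x_1, ..., x_T
        h, y = rnn_step(x_t, h, params)
        ys.append(y)
    return h, ys
```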
Outline: Long Short-Term Memory
Gradients
• The basic RNN is not very effective
• Challenging tasks need many time steps / a complex model
• Gradients are a problem during learning:
  • Too large: can be handled with gradient clipping (truncating the gradient magnitude; see the sketch below)
  • Too small: can be handled with network structures / gating functions (LSTM, GRU)
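A minimal sketch of gradient clipping by global norm, assuming the gradients are held as a list of numpy arrays; the threshold of 5.0 is an arbitrary illustrative value, not one given in the slides.

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-8)  # small epsilon for numerical safety
        grads = [g * scale for g in grads]
    return grads
```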
Long Short-Term Memory
[Figure: an RNN unit next to an LSTM unit with input, forget, output, and input-modulation gates; from Donahue et al., CVPR 2015]
• Hochreiter and Schmidhuber, Neural Computation 1997
• Gating functions g(·), f(·), o(·) reduce vanishing gradients
Long Short-Term Memory
[Figure: the LSTM unit from the previous slide]
i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i)    (1)
f_t = σ(W_xf x_t + W_hf h_{t-1} + b_f)    (2)
o_t = σ(W_xo x_t + W_ho h_{t-1} + b_o)    (3)
g_t = tanh(W_xc x_t + W_hc h_{t-1} + b_c)    (4)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t    (5)
h_t = o_t ⊙ tanh(c_t)    (6)
• See Graves, Liwicki, Fernandez, Bertolami, Bunke, and Schmidhuber, TPAMI 2009
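A direct numpy sketch of equations (1)-(6); the weight-dictionary layout and the sigmoid helper are the only additions beyond what the slide states.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following equations (1)-(6); p is a dict of weights and biases."""
    i_t = sigmoid(p["W_xi"] @ x_t + p["W_hi"] @ h_prev + p["b_i"])   # input gate (1)
    f_t = sigmoid(p["W_xf"] @ x_t + p["W_hf"] @ h_prev + p["b_f"])   # forget gate (2)
    o_t = sigmoid(p["W_xo"] @ x_t + p["W_ho"] @ h_prev + p["b_o"])   # output gate (3)
    g_t = np.tanh(p["W_xc"] @ x_t + p["W_hc"] @ h_prev + p["b_c"])   # input modulation (4)
    c_t = f_t * c_prev + i_t * g_t          # cell state; * is the elementwise product ⊙ (5)
    h_t = o_t * np.tanh(c_t)                # hidden state (6)
    return h_t, c_t
```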
Outline: Temporal Convolutional Networks
Convolutions to Aggregate over Time
[Figure 1(a) from Bai, Kolter, Koltun: a dilated causal convolution stack with dilations d = 1, 2, 4, mapping inputs x_0, ..., x_T to outputs ŷ_0, ..., ŷ_T]
• Control the history by d (dilation, holes in the filter) and k (width of the filter)
• Causal convolution: only use elements from the past (a sketch follows below)
• Bai, Kolter, Koltun, arXiv 2018
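A minimal numpy sketch of a 1-D dilated causal convolution on a single-channel sequence; the left zero-padding and the scalar-channel setup are simplifying assumptions relative to the multi-channel layers in the paper.

```python
import numpy as np

def dilated_causal_conv1d(x, w, d=1):
    """Causal convolution: output[t] depends only on x[t], x[t-d], ..., x[t-(k-1)d].

    x: 1-D input sequence, w: filter of width k, d: dilation.
    """
    w = np.asarray(w, dtype=float)
    k = len(w)
    pad = (k - 1) * d                        # left-pad so no future inputs are used
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    y = np.zeros(len(x))
    for t in range(len(x)):
        # taps at positions t, t-d, ..., t-(k-1)d of the original sequence
        taps = xp[t + pad - d * np.arange(k)]
        y[t] = np.dot(w, taps)
    return y
```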
Residual (Skip) Connections
[Figure 1(b, c) from Bai, Kolter, Koltun: a TCN residual block with filter width k and dilation d, consisting of two repetitions of dilated causal conv, WeightNorm, ReLU, and Dropout, added to an identity map (or a 1x1 convolution) on the block input]
• Include residual connections to allow long-range modeling and gradient flow (a simplified sketch follows below)
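A simplified sketch of such a residual block, reusing the dilated_causal_conv1d sketch above; weight normalization and dropout are omitted, and the single-channel identity skip is an assumption (the paper uses a 1x1 convolution when channel counts differ).

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def tcn_residual_block(x, w1, w2, d):
    """Residual block: two dilated causal convs with ReLU, plus a skip connection.

    With a single channel the skip is the identity map; in the multi-channel
    case a 1x1 convolution matches the dimensions.
    """
    out = relu(dilated_causal_conv1d(x, w1, d))
    out = relu(dilated_causal_conv1d(out, w2, d))
    return relu(out + np.asarray(x, dtype=float))  # residual addition keeps gradients flowing
```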
Outline: Examples
Example: Image Captioning
• Karpathy and Fei-Fei, CVPR 2015
Example: Video Description
[Figure: an encoder-decoder LSTM over time; the encoding stage reads the video frames, then the decoding stage, started with <BOS>, emits "A man is talking <EOS>", with <pad> tokens filling the unused positions]
• S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, K. Saenko, ICCV 2015
Example: Machine Translation
• Wu et al., Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, arXiv 2016
Conclusion
• Readings: http://www.deeplearningbook.org/contents/rnn.html
• Recurrent neural networks can model sequential inputs/outputs
• The input includes the state (output) from the previous time step
• Different structures:
  • RNN with multiple inputs/outputs
  • Gated recurrent unit (GRU)
  • Long short-term memory (LSTM)
• Error gradients are back-propagated across the entire sequence