Outline

Gated Feedback Recurrent Neural Networks. arXiv1502.
  Introduction: RNN & Gated RNN
  Gated Feedback Recurrent Neural Networks (GF-RNN)
  Experiments: Character-level Language Modeling & Python Program Evaluation

ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv1505.
  Introduction
  ReNet: 4 RNNs that sweep over lower-layer features in 4 directions
  Experiments: MNIST & CIFAR-10 & Street View House Numbers

LU Yangyang, luyy11@pku.edu.cn
May 2015 @ KERE Seminar
Authors

• Gated Feedback Recurrent Neural Networks
  • arXiv.org. 9 Feb 2015 - 18 Feb 2015.
  • Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio (University of Montreal)
• ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks
  • arXiv.org. 3 May 2015.
  • Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio (University of Montreal)
Outline

Gated Feedback Recurrent Neural Networks. arXiv1502.
  Introduction: RNN & Gated RNN
  Gated Feedback Recurrent Neural Networks (GF-RNN)
  Experiments: Character-level Language Modeling & Python Program Evaluation

ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv1505.
Recurrent Neural Networks (RNN) for Sequence Modeling

• Can process a sequence of arbitrary length
• Recursively applies a transition function to its internal hidden state for each symbol of the input sequence
• Can theoretically capture any long-term dependency in an input sequence
• In practice it is difficult to train an RNN to actually do so

$h_t = f(x_t, h_{t-1}) = \phi(W x_t + U h_{t-1})$
$p(x_1, x_2, \dots, x_T) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_T \mid x_1, \dots, x_{T-1})$
$p(x_{t+1} \mid x_1, \dots, x_t) = g(h_t)$

Figure: A single-layer RNN
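A minimal NumPy sketch of the transition function above. The tanh nonlinearity for $\phi$, the weight shapes, and the toy sequence are illustrative assumptions, not details from the paper:

import numpy as np

def rnn_step(x_t, h_prev, W, U, b):
    """One vanilla RNN step: h_t = tanh(W x_t + U h_{t-1} + b)."""
    return np.tanh(W @ x_t + U @ h_prev + b)

# Toy dimensions: input size 10, hidden size 20.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(20, 10))
U = rng.normal(scale=0.1, size=(20, 20))
b = np.zeros(20)

h = np.zeros(20)
for x_t in rng.normal(size=(5, 10)):  # a length-5 toy input sequence
    h = rnn_step(x_t, h, W, U, b)     # h_t summarizes x_1, ..., x_t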
Gated Recurrent Neural Networks¹

An LSTM unit:
$h_t^j = o_t^j \tanh(c_t^j)$
$c_t^j = f_t^j c_{t-1}^j + i_t^j \tilde{c}_t^j$
$\tilde{c}_t^j = \tanh(W_c x_t + U_c h_{t-1})^j$
$f_t^j = \sigma(W_f x_t + U_f h_{t-1} + V_f c_{t-1})^j$
$i_t^j = \sigma(W_i x_t + U_i h_{t-1} + V_i c_{t-1})^j$
$o_t^j = \sigma(W_o x_t + U_o h_{t-1} + V_o c_t)^j$

A GRU unit:
$h_t^j = (1 - z_t^j) h_{t-1}^j + z_t^j \tilde{h}_t^j$
$z_t^j = \sigma(W_z x_t + U_z h_{t-1})^j$
$\tilde{h}_t^j = \tanh(W x_t + U (r_t \odot h_{t-1}))^j$
$r_t^j = \sigma(W_r x_t + U_r h_{t-1})^j$

¹ Chung, J., et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv'14.
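A minimal NumPy sketch of a single GRU step following the equations above. The per-gate weight matrices and their shapes are assumptions made purely for illustration:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Uz, Wr, Ur, W, U):
    """One GRU step: update gate z, reset gate r, candidate state, interpolation."""
    z = sigmoid(Wz @ x_t + Uz @ h_prev)            # update gate
    r = sigmoid(Wr @ x_t + Ur @ h_prev)            # reset gate
    h_tilde = np.tanh(W @ x_t + U @ (r * h_prev))  # candidate activation
    return (1.0 - z) * h_prev + z * h_tilde        # new hidden state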
Gated Recurrent Neural Networks

Modifying the RNN architecture:
• Using a gated activation function:
  - the long short-term memory unit (LSTM): a memory cell, an input gate, a forget gate, and an output gate
  - the gated recurrent unit (GRU): a reset gate and an update gate
• Can contain both fast-changing and slow-changing components
  - stacking multiple levels of recurrent layers
  - partitioning and grouping hidden units to allow feedback information at multiple timescales
• Has achieved promising results in both classification and generation tasks

⇒ Gated-feedback RNN (GF-RNN): learning multiple adaptive timescales
GF-RNN: Overview

Figure: A clockwork RNN

• A sequence often consists of both slow-moving and fast-moving components.
  - slow-moving: long-term dependencies
  - fast-moving: short-term dependencies
• El Hihi & Bengio (1995): an RNN can capture these dependencies of different timescales more easily and efficiently when the hidden units of the RNN are explicitly partitioned into groups that correspond to different timescales.
• The clockwork RNN (CW-RNN) (Koutnik et al., 2014): the $i$-th module is updated only when $t \bmod 2^{i-1} = 0$ (see the schedule sketch below)

⇒ Generalize the CW-RNN by allowing the model to adaptively adjust the connectivity pattern between the hidden layers at consecutive time-steps
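A tiny sketch of the fixed CW-RNN update schedule implied by the rule above (the number of modules and the time range are arbitrary toy choices):

# Which modules are active at each time-step under t mod 2^(i-1) == 0.
n_modules = 4
for t in range(1, 9):
    active = [i for i in range(1, n_modules + 1) if t % (2 ** (i - 1)) == 0]
    print(f"t={t}: update modules {active}")
# Module 1 updates every step, module 2 every 2 steps, module 3 every 4, module 4 every 8.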
GF-RNN: Overview (cont.)

• Partition the hidden units into multiple modules: each module corresponds to a different layer in a stack of recurrent layers
• Compared to the CW-RNN: no explicit rate is set for each module; the modules are hierarchically stacked, so they operate at different timescales
• Each module is fully connected to all the other modules across the stack and to itself.
• The global reset gate: gates the recurrent connection between two modules based on the current input and the previous states of the hidden layers
GF-RNN: The Global Reset Gate

• $h_t^i$: the hidden state of the $i$-th layer at time-step $t$
• $w_g^{i \to j}$, $u_g^{i \to j}$: weight vectors for the input and for the hidden states of all layers at time-step $t-1$
• $g^{i \to j}$: controls the signal from the $i$-th layer $h_{t-1}^i$ to the $j$-th layer $h_t^j$, based on the input and the previous hidden states

Information flows:
• stacked RNN & GF-RNN: lower layers → upper layers
• GF-RNN only: lower layers ← upper layers (finer timescale ← coarser timescale)

A gated-feedback RNN: a fully-connected recurrent transition with global reset gates
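For concreteness, the gate and the gated tanh transition roughly take the following form (reproduced here from memory of the arXiv paper, so treat it as a sketch rather than an exact transcription; $L$ is the number of layers, $h_{t-1}^{*}$ the concatenation of all layers' hidden states at $t-1$, and $h_t^{j-1} = x_t$ for the first layer):

g^{i \to j} = \sigma\left( w_g^{i \to j} h_t^{j-1} + u_g^{i \to j} h_{t-1}^{*} \right)

h_t^j = \tanh\left( W^{j-1 \to j} h_t^{j-1} + \sum_{i=1}^{L} g^{i \to j}\, U^{i \to j} h_{t-1}^{i} \right)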
GF-RNN: Practical Implementation with Different Units

• tanh units
• LSTM & GRU units: the global reset gates are used only when computing the new state (the candidate memory content / candidate activation); the gate equations themselves are unchanged:

LSTM:
$h_t^j = o_t^j \tanh(c_t^j)$
$c_t^j = f_t^j c_{t-1}^j + i_t^j \tilde{c}_t^j$
$f_t^j = \sigma(W_f x_t + U_f h_{t-1} + V_f c_{t-1})^j$
$i_t^j = \sigma(W_i x_t + U_i h_{t-1} + V_i c_{t-1})^j$
$o_t^j = \sigma(W_o x_t + U_o h_{t-1} + V_o c_t)^j$

GRU:
$h_t^j = (1 - z_t^j) h_{t-1}^j + z_t^j \tilde{h}_t^j$
$z_t^j = \sigma(W_z x_t + U_z h_{t-1})^j$
$r_t^j = \sigma(W_r x_t + U_r h_{t-1})^j$
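A sketch of how the global reset gates enter the candidate states, as I recall the paper's formulation (the exact placement of the GRU reset gate relative to the gated sum should be checked against the arXiv version):

\text{LSTM:}\quad \tilde{c}_t^j = \tanh\left( W_c^{j-1 \to j} h_t^{j-1} + \sum_{i=1}^{L} g^{i \to j}\, U_c^{i \to j} h_{t-1}^{i} \right)

\text{GRU:}\quad \tilde{h}_t^j = \tanh\left( W^{j-1 \to j} h_t^{j-1} + r_t \odot \sum_{i=1}^{L} g^{i \to j}\, U^{i \to j} h_{t-1}^{i} \right)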
Experiment Tasks

Both tasks are representative examples of discrete sequence modeling.
Objective function: minimize the negative log-likelihood of the training sequences

• Character-level language modeling:
  • Data: English Wikipedia, 100 MB of characters
  • Contents: Latin alphabets, non-Latin alphabets, XML markups and special characters
  • Vocabulary: 205 characters (with one token for unknown characters)
  • Train/CV/Test split: 90 MB / 5 MB / 5 MB
  • Performance measure: the average number of bits-per-character (BPC), $E[-\log_2 P(x_{t+1} \mid h_t)]$
• Python program evaluation:
  • Goal: to generate or predict the correct return value of a given Python script
  • Input: Python scripts (including addition, multiplication, subtraction, for-loops, variable assignment, logical comparison and if-else statements)
  • Output: the predicted value of the given Python script
  • Input/Output vocabularies: 41/31 symbols
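A minimal sketch of the BPC measure on a toy sequence; the probabilities below are made-up values standing in for the model's predicted probability of each observed next character:

import numpy as np

# Probability the model assigned to each actually-observed next character (toy values).
p_next = np.array([0.5, 0.25, 0.9, 0.1])
bpc = np.mean(-np.log2(p_next))  # average bits-per-character; lower is better
print(bpc)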
Examples for Python Program Evaluation²

² Zaremba, Wojciech and Sutskever, Ilya. Learning to execute. arXiv preprint arXiv:1410.4615, 2014.
Experiments: Character-level Language Modeling

• The sizes of the models
• Tuning parameters: RMSProp and momentum
• Test set BPC of models trained on the Hutter dataset for 100 epochs
Experiments: Character-level Language Modeling (cont.)

Text generation based on character-level language modeling:
• Given the seed at the left-most column (bold-faced), the models predict the next 200-300 characters.
• Tabs, spaces and new-line characters are also generated by the models.
Experiments: Python Program Evaluation

Using an RNN encoder-decoder approach:
• Python script → ENCODER (50 time-steps) → $h_t$ → DECODER → character-level result
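A schematic, self-contained sketch of the encoder-decoder flow with a plain tanh RNN. Everything here is a toy assumption (random untrained weights, one-hot symbols, greedy decoding, a fixed output length); only the 41/31 vocabulary sizes come from the slide:

import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, n_hid = 41, 31, 20          # input/output vocabularies (slide), toy hidden size

def step(x, h, W, U):
    return np.tanh(W @ x + U @ h)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def one_hot(i, n):
    v = np.zeros(n); v[i] = 1.0; return v

# Random toy parameters; the paper trains these end-to-end.
W_enc, U_enc = rng.normal(scale=0.1, size=(n_hid, n_in)), rng.normal(scale=0.1, size=(n_hid, n_hid))
W_dec, U_dec = rng.normal(scale=0.1, size=(n_hid, n_out)), rng.normal(scale=0.1, size=(n_hid, n_hid))
V_out = rng.normal(scale=0.1, size=(n_out, n_hid))

# Encoder: consume a (toy, random) 50-symbol script; keep only the final hidden state.
script = rng.integers(0, n_in, size=50)
h = np.zeros(n_hid)
for sym in script:
    h = step(one_hot(sym, n_in), h, W_enc, U_enc)

# Decoder: greedily emit output symbols conditioned on the encoder state.
y_prev, out = np.zeros(n_out), []
for _ in range(10):
    h = step(y_prev, h, W_dec, U_dec)
    idx = int(np.argmax(softmax(V_out @ h)))
    out.append(idx)
    y_prev = one_hot(idx, n_out)
print(out)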
Outline

Gated Feedback Recurrent Neural Networks. arXiv1502.

ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv1505.
  Introduction
  ReNet: 4 RNNs that sweep over lower-layer features in 4 directions
  Experiments: MNIST & CIFAR-10 & Street View House Numbers
Introduction

Object recognition:
• Convolutional Neural Networks (CNN): LeNet-5
  - based on a local context window
• Recurrent Neural Networks:
  • Graves and Schmidhuber (2009): a multi-dimensional RNN
  • ReNet: purely uni-dimensional RNNs that replace each convolutional layer (convolution + pooling) in the CNN
    ⇒ 4 RNNs that sweep over lower-layer features in 4 directions: ↑, ↓, ←, →
    - each feature activation is computed at a specific location but with respect to the whole image
A One-layer ReNet

• The input image: $x \in \mathbb{R}^{w \times h \times c}$ (width, height, feature dimensionality)
• Given a patch size $w_p \times h_p$: split the input image $x$ into a set of $I \times J$ (non-overlapping) patches $X = \{x_{ij}\}$, $x_{ij} \in \mathbb{R}^{w_p \times h_p \times c}$

1. Sweep the image vertically with 2 RNNs (↑, ↓): each RNN takes as input one (flattened) patch at a time and updates its hidden state, working along each column $j$ of the split input image $X$.
2. Concatenate the intermediate hidden states $z^F_{i,j}$, $z^R_{i,j}$ at each location $(i, j)$ to get a composite feature map $V = \{z_{i,j}\}_{i=1,\dots,I}^{j=1,\dots,J}$, $z_{ij} \in \mathbb{R}^{1 \times h_p \times 2d}$ ($d$: the number of recurrent units).
3. Sweep $V$ horizontally with 2 RNNs (←, →) in a similar manner. The resulting feature map $H = \{z'_{i,j}\}$, $z'_{ij} \in \mathbb{R}^{1 \times 1 \times 2d}$, gives the features of the original image patch $x_{i,j}$ in the context of the whole image.

The deep ReNet: stack multiple $\phi$'s ($\phi$: the function from $X$ to $H$)
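A minimal NumPy sketch of the two sweeps using plain tanh RNNs. This is a simplification under stated assumptions: the real model uses gated units and learned parameters, I use a (height, width, channels) image layout for convenience, and every location of each intermediate map is simply a 2d-dimensional vector (the paper's exact intermediate shapes may be written differently):

import numpy as np

rng = np.random.default_rng(0)

def rnn_sweep(seq, W, U):
    """Plain tanh RNN over a sequence of vectors; returns all hidden states."""
    h = np.zeros(U.shape[0])
    states = []
    for x in seq:
        h = np.tanh(W @ x + U @ h)
        states.append(h)
    return np.stack(states)

def renet_layer(img, hp, wp, d):
    """One ReNet layer on img of shape (H, W, C); returns an (I, J, 2d) feature map."""
    H, W, C = img.shape
    I, J = H // hp, W // wp
    # Non-overlapping patches, flattened: patches[i, j] has length hp*wp*C.
    patches = img.reshape(I, hp, J, wp, C).transpose(0, 2, 1, 3, 4).reshape(I, J, -1)

    def params(n_in):
        return rng.normal(scale=0.1, size=(d, n_in)), rng.normal(scale=0.1, size=(d, d))

    (Wd, Ud), (Wu, Uu) = params(patches.shape[-1]), params(patches.shape[-1])
    # Vertical sweeps (down and up) along each column j; concatenate the two states.
    V = np.zeros((I, J, 2 * d))
    for j in range(J):
        down = rnn_sweep(patches[:, j], Wd, Ud)
        up = rnn_sweep(patches[::-1, j], Wu, Uu)[::-1]
        V[:, j] = np.concatenate([down, up], axis=1)

    (Wr, Ur), (Wl, Ul) = params(2 * d), params(2 * d)
    # Horizontal sweeps (right and left) along each row i of the composite map V.
    Hmap = np.zeros((I, J, 2 * d))
    for i in range(I):
        right = rnn_sweep(V[i], Wr, Ur)
        left = rnn_sweep(V[i, ::-1], Wl, Ul)[::-1]
        Hmap[i] = np.concatenate([right, left], axis=1)
    return Hmap

# Toy usage: a 28x28 single-channel image, 2x2 patches, 16 recurrent units per RNN.
features = renet_layer(rng.normal(size=(28, 28, 1)), hp=2, wp=2, d=16)
print(features.shape)  # (14, 14, 32)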