ImageNet
• 1.2 million high-resolution images from the ImageNet LSVRC-2010 contest
• 1000 different classes (softmax output layer)
• NN configuration
  • the network contains 60 million parameters and 650,000 neurons
  • 5 convolutional layers, some of which are followed by max-pooling layers
  • 3 fully-connected layers
Krizhevsky, A., Sutskever, I. and Hinton, G. E. "ImageNet Classification with Deep Convolutional Neural Networks," NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.
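To make the architecture concrete, here is a minimal PyTorch sketch of an AlexNet-style network with 5 convolutional layers, interleaved max-pooling, and 3 fully-connected layers ending in a 1000-way classifier. The exact layer sizes follow the common single-GPU variant rather than the original two-GPU split, and the dropout value is an assumption, not taken from the slide.

import torch
import torch.nn as nn

alexnet_like = nn.Sequential(
    # 5 convolutional layers, some followed by max-pooling
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    # 3 fully-connected layers ending in a 1000-way output
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(4096, 1000),            # softmax is applied inside the loss function
)

logits = alexnet_like(torch.randn(1, 3, 224, 224))   # one 224x224x3 input image
print(logits.shape)                                  # -> torch.Size([1, 1000])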
ImageNet
[Figure 3 of the paper: 96 convolutional kernels of size 11×11×3 learned by the first convolutional layer on the 224×224×3 input images. The top 48 kernels were learned on GPU 1, the bottom 48 on GPU 2; see Section 6.1 of the paper for details.]
Krizhevsky, A., Sutskever, I. and Hinton, G. E. "ImageNet Classification with Deep Convolutional Neural Networks," NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.
ImageNet
[Left figure: Eight ILSVRC-2010 test images and the five labels considered most probable by our model. The correct label is written under each image, and the probability assigned to the correct label is also shown with a red bar (if it happens to be in the top 5).]
[Right figure: Five ILSVRC-2010 test images in the first column. The remaining columns show the six training images that produce feature vectors in the last hidden layer with the smallest Euclidean distance from the feature vector for the test image.]
Krizhevsky, A., Sutskever, I. and Hinton, G. E. "ImageNet Classification with Deep Convolutional Neural Networks," NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.
CNN for Automatic Speech Recognition
• Convolution over frequencies
• Convolution over time
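As a sketch of what "convolution over frequencies and over time" means in practice, the snippet below applies a single 2-D convolution to a log-mel spectrogram. The input shape (40 mel bins × 100 frames), the kernel size, and the frequency-only pooling are illustrative assumptions, not parameters from a specific system.

import torch
import torch.nn as nn

# Hypothetical input: batch of 8 spectrograms, 1 channel, 40 frequency bins, 100 time frames.
spectrogram = torch.randn(8, 1, 40, 100)

# Kernel of size (9, 5): 9 frequency bins x 5 time frames, so the filter
# convolves over frequencies and over time simultaneously.
conv = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=(9, 5), padding=(4, 2))
features = torch.relu(conv(spectrogram))              # -> (8, 32, 40, 100)

# Pooling along the frequency axis only, a common choice in CNN front ends for ASR.
pooled = nn.MaxPool2d(kernel_size=(2, 1))(features)   # -> (8, 32, 20, 100)
print(pooled.shape)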
CNN Recap
• Neural network with specialized connectivity structure
• Feed-forward:
  - Convolve input
  - Non-linearity (rectified linear)
  - Pooling (local max)
• Supervised training: train convolutional filters by back-propagating error
[Figure: a stack of layers, from bottom to top: input image → convolution (learned) → non-linearity → pooling → feature maps]
• Convolution over time
• Adding memory to the classical MLP network → recurrent neural network
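A minimal NumPy sketch of the feed-forward steps listed above (convolve, rectify, pool) on a single-channel image. The image, the filter values, and all sizes are arbitrary placeholders chosen only to show the pipeline.

import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping local max pooling."""
    H, W = x.shape
    H, W = H - H % size, W - W % size            # crop to a multiple of the pool size
    x = x[:H, :W].reshape(H // size, size, W // size, size)
    return x.max(axis=(1, 3))

image = np.random.rand(28, 28)                   # toy single-channel input
kernel = np.random.randn(5, 5)                   # a (learned) filter, random here
feature_map = max_pool(relu(conv2d(image, kernel)))
print(feature_map.shape)                         # (12, 12)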
Recurrent Neural Networks (RNNs)
• Recurrent networks introduce cycles and a notion of time.
[Figure: an RNN cell with input x_t, hidden state h_t fed back through a one-step delay as h_{t-1}, and output y_t]
• They are designed to process sequences of data x_1, …, x_n and can produce sequences of outputs y_1, …, y_m.
Elman Nets (1990) – Simple Recurrent Neural Networks
• Elman nets are feed-forward networks with partial recurrence
• Unlike feed-forward nets, Elman nets have a memory, or sense of time
• They can also be viewed as a "Markovian" NN
(Vanilla) Recurrent Neural Network
The state consists of a single "hidden" vector h, updated at every time step:
  h_t = tanh(W_hh h_{t-1} + W_xh x_t)
  y_t = W_hy h_t
[Figure: the same RNN cell as before, with input x_t, previous state h_{t-1} (one-step delay), new state h_t and output y_t]
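A minimal NumPy sketch of one vanilla-RNN step implementing the two update equations above. The sizes (hidden = 64, input = 32, 10 output classes) and the random weights are placeholders, not trained parameters.

import numpy as np

hidden_size, input_size = 64, 32
W_hh = np.random.randn(hidden_size, hidden_size) * 0.01   # recurrent weights
W_xh = np.random.randn(hidden_size, input_size) * 0.01    # input weights
W_hy = np.random.randn(10, hidden_size) * 0.01            # readout to 10 classes

def rnn_step(h_prev, x_t):
    """h_t = tanh(W_hh h_{t-1} + W_xh x_t); y_t = W_hy h_t."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(hidden_size)
h, y = rnn_step(h, np.random.randn(input_size))
print(h.shape, y.shape)    # (64,) (10,)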
Unrolling RNNs
• RNNs can be unrolled across multiple time steps.
[Figure: the recurrent cell on the left; on the right, the unrolled chain h_0 → h_1 → h_2 with inputs x_0, x_1, x_2 and outputs y_0, y_1, y_2]
• This produces a DAG which supports backpropagation.
• But its size depends on the input sequence length.
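The sketch below shows what unrolling means in code: the same weights are reused at every time step, so the unrolled graph grows with the sequence while the parameter count stays fixed. Sizes and data are again arbitrary placeholders.

import numpy as np

hidden_size, input_size = 16, 8
W_hh = np.random.randn(hidden_size, hidden_size) * 0.01
W_xh = np.random.randn(hidden_size, input_size) * 0.01

def unroll(inputs):
    """Run the recurrence over a whole sequence and return all hidden states."""
    h = np.zeros(hidden_size)
    states = []
    for x_t in inputs:                        # one iteration per time step
        h = np.tanh(W_hh @ h + W_xh @ x_t)    # same W_hh, W_xh at every step
        states.append(h)
    return states

sequence = [np.random.randn(input_size) for _ in range(5)]   # length-5 toy input
print(len(unroll(sequence)))   # 5: the unrolled depth follows the sequence length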
Learning time sequences
• Recurrent networks have one or more feedback loops
• There are many tasks that require learning a temporal sequence of events – speech, video, text, market data
• These problems can be broken into 3 distinct types of tasks:
  1. Sequence recognition: produce a particular output pattern when a specific input sequence is seen. Applications: speech recognition
  2. Sequence reproduction: generate the rest of a sequence when the network sees only part of the sequence. Applications: time-series prediction (stock market, sun spots, etc.)
  3. Temporal association: produce a particular output sequence in response to a specific input sequence. Applications: speech generation
RNN structure
• Often layers are stacked vertically (deep RNNs).
[Figure: two unrolled RNN layers. The lower layer maps the inputs x_0, x_1, x_2 to hidden states h_00, h_01, h_02 and outputs y_00, y_01, y_02, which serve as inputs to the upper layer with hidden states h_10, h_11, h_12 and outputs y_10, y_11, y_12. Within each level the same parameters are shared across time; moving up the stack gives higher-level features (abstraction axis), moving right follows time.]
RNN structure
• Backprop still works: it is called Backpropagation Through Time (BPTT).
• Forward pass: activations flow through the unrolled network, from the inputs x_0, x_1, x_2 upwards through the hidden states and along time.
[Figure: the same two-layer unrolled RNN, with arrows showing activations propagating forward]
RNN structure
• Backward pass: gradients flow back through the same unrolled graph, from the outputs down through the hidden states and backwards in time, so every time step contributes to the weight updates (a small code sketch of BPTT follows below).
[Figure: the same two-layer unrolled RNN, with arrows showing gradients propagating backwards]
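With a framework that provides automatic differentiation, backpropagation through time reduces to unrolling the forward loop and calling backward on a loss summed over the time steps. A minimal PyTorch sketch, with arbitrary sizes and random data standing in for a real task:

import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)   # vanilla (tanh) RNN
readout = nn.Linear(16, 4)                                     # 4 output classes per step
criterion = nn.CrossEntropyLoss()

# Toy data: 2 sequences of 10 time steps with 8 features, plus a target class per step.
x = torch.randn(2, 10, 8)
targets = torch.randint(0, 4, (2, 10))

hidden_states, _ = rnn(x)                   # forward pass through the unrolled network
logits = readout(hidden_states)             # (2, 10, 4)
loss = criterion(logits.reshape(-1, 4), targets.reshape(-1))

loss.backward()                             # backpropagation through time:
                                            # gradients flow back across all 10 steps
print(rnn.weight_hh_l0.grad.shape)          # (16, 16): gradient w.r.t. recurrent weights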
The memory problem with RNNs
• An RNN models the context of the signal through its hidden state.
• When very long contexts are needed, plain RNNs become unable to learn them: gradients propagated back through many time steps tend to vanish (or explode), so long-range dependencies are hard to capture.
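A small NumPy illustration (with arbitrary random weights) of why this happens: the backpropagated gradient is repeatedly multiplied by the recurrent Jacobian, so its norm typically shrinks (or blows up) roughly exponentially in the number of time steps.

import numpy as np

np.random.seed(0)
hidden_size, steps = 32, 50
W_hh = np.random.randn(hidden_size, hidden_size) * 0.05    # small recurrent weights

h = np.zeros(hidden_size)
hs = []
for _ in range(steps):                        # forward pass with a constant input
    h = np.tanh(W_hh @ h + 0.1)
    hs.append(h)

grad = np.ones(hidden_size)                   # gradient arriving at the last step
for t in reversed(range(steps)):              # backpropagation through time
    grad = W_hh.T @ (grad * (1.0 - hs[t] ** 2))   # chain rule through tanh and W_hh
    if t % 10 == 0:
        print(f"step {t:2d}: gradient norm = {np.linalg.norm(grad):.2e}")
# The printed norms shrink rapidly: early time steps receive almost no learning signal.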
Standard RNNs to LSTM
[Figure: a standard RNN cell compared side by side with an LSTM cell]
LSTM illustrated: input and forming new memory
• An LSTM cell takes the following inputs (all vectors):
  • the current input x_t
  • the previous hidden output h_{t-1}
  • the past memory (cell state) c_{t-1}
[Figure: the LSTM cell, highlighting the cell state, the forget gate, the input gate and the new candidate memory]
LSTM illustrated: output
• The output of the cell is formed by applying the output gate to the (squashed) cell state.
[Figure: overall picture of the LSTM cell with all gates]
LSTM Equations
• i = σ(x_t U^i + h_{t-1} W^i)
• f = σ(x_t U^f + h_{t-1} W^f)
• o = σ(x_t U^o + h_{t-1} W^o)
• g = tanh(x_t U^g + h_{t-1} W^g)
• c_t = c_{t-1} ∘ f + g ∘ i
• h_t = tanh(c_t) ∘ o
• y_t = softmax(V h_t)
(∘ denotes element-wise multiplication.)
• i: input gate, how much of the new information will be let through to the memory cell
• f: forget gate, responsible for how much information should be thrown away from the memory cell
• o: output gate, how much of the information will be exposed to the next time step
• g: candidate memory, the self-recurrent term equal to the standard RNN update
• c_t: internal memory of the memory cell
• h_t: hidden state
• y_t: final output
[Figure: LSTM memory cell]
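A minimal NumPy sketch of one LSTM step that follows the equations above (without the final softmax readout). The sizes and the random weights are placeholders.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

input_size, hidden_size = 8, 16
rng = np.random.default_rng(0)
# One (U, W) pair per gate / candidate: i, f, o, g.
U = {k: rng.standard_normal((input_size, hidden_size)) * 0.1 for k in "ifog"}
W = {k: rng.standard_normal((hidden_size, hidden_size)) * 0.1 for k in "ifog"}

def lstm_step(x_t, h_prev, c_prev):
    i = sigmoid(x_t @ U["i"] + h_prev @ W["i"])   # input gate
    f = sigmoid(x_t @ U["f"] + h_prev @ W["f"])   # forget gate
    o = sigmoid(x_t @ U["o"] + h_prev @ W["o"])   # output gate
    g = np.tanh(x_t @ U["g"] + h_prev @ W["g"])   # candidate memory
    c_t = c_prev * f + g * i                      # new cell state
    h_t = np.tanh(c_t) * o                        # new hidden state
    return h_t, c_t

h, c = np.zeros(hidden_size), np.zeros(hidden_size)
h, c = lstm_step(rng.standard_normal(input_size), h, c)
print(h.shape, c.shape)    # (16,) (16,)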
LSTM output synchronization
(NLP) Applications of RNNs • Section overview – Language Model – Sentiment analysis / text classification – Machine translation and conversation modeling – Sentence skip-thought vectors
RNN for Language Modeling
Sentiment analysis / text classification
• A quick example, to illustrate the idea.
• Given a collection of texts and their labels, predict labels for unseen texts.
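A minimal PyTorch sketch of such a text classifier: embed the tokens, run an LSTM over the sequence, and classify from the final hidden state. The vocabulary size, dimensions, and the two-class (positive/negative) setup are illustrative assumptions.

import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """Embed tokens, run an LSTM over the sequence, classify from the last hidden state."""
    def __init__(self, vocab_size=10000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):               # token_ids: (batch, seq_len) word indices
        embedded = self.embed(token_ids)
        _, (h_last, _) = self.lstm(embedded)    # h_last: (1, batch, hidden_dim)
        return self.classifier(h_last[-1])      # logits over the labels

model = SentimentLSTM()
fake_batch = torch.randint(0, 10000, (4, 20))   # 4 "sentences" of 20 token ids
print(model(fake_batch).shape)                  # (4, 2): positive/negative logits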
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, Kate Saenko. North American Chapter of the Association for Computational Linguistics (NAACL), Denver, Colorado, June 2015.
Composing music with RNN http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/
CNN-LSTM-DNN for speech recognition
• Ensembles of RNN/LSTM, DNN, and convolutional nets (CNNs) give large gains (state of the art).
• T. Sainath, O. Vinyals, A. Senior, H. Sak. "Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks," ICASSP 2015.