Machine Learning for Computational Linguistics
Recurrent neural networks (RNNs)

Çağrı Çöltekin
University of Tübingen, Seminar für Sprachwissenschaft
July 5, 2016
Neural networks: a quick summary

Feed-forward networks

h = f(W^{(1)} x)
y = g(W^{(2)} h) = g(W^{(2)} f(W^{(1)} x))

[Figure: a two-layer network with inputs x_1, x_2, hidden units h_1, h_2, outputs y_1, y_2, bias units, and weight matrices W^{(1)} and W^{(2)}.]

• f() and g() are non-linear functions, such as the logistic sigmoid, tanh, or ReLU
• weights are updated using backpropagation
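To make the equations concrete, here is a minimal sketch of the forward pass, assuming a 2-2-2 network with tanh hidden units and sigmoid outputs and ignoring the bias terms shown in the figure; the sizes, non-linearities, and random weights are illustrative choices, not taken from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 2))   # input -> hidden weights W^(1)
W2 = rng.normal(size=(2, 2))   # hidden -> output weights W^(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])      # input vector (x_1, x_2)
h = np.tanh(W1 @ x)            # h = f(W^(1) x)
y = sigmoid(W2 @ h)            # y = g(W^(2) h) = g(W^(2) f(W^(1) x))
print(h, y)
```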
Dense (word) representations

• Dense vector representations are useful for many ML methods, particularly for neural networks
• Unlike sparse (one-of-K / one-hot) representations, dense representations capture similarities/differences between words, as well as relations between them (see the sketch below)
• General-purpose word vectors can be trained with unlabeled data
• They can also be trained for the task at hand
• Two methods to obtain (general-purpose) dense representations:
  – predicting the local environment (word2vec, GloVe)
  – global statistics over the complete data (e.g., SVD)
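As an informal illustration of the second point, the tiny vectors below are made up by hand (not trained with word2vec, GloVe, or SVD); they only show that the cosine similarity between one-hot vectors of distinct words is always zero, while dense vectors can encode similarity.

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# one-of-K (one-hot) representations: distinct words are always orthogonal
cat_1hot, dog_1hot = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
print(cos(cat_1hot, dog_1hot))      # 0.0 -- no notion of similarity

# dense representations: similar words can get similar vectors
cat_dense, dog_dense = np.array([0.8, 0.1, 0.3]), np.array([0.7, 0.2, 0.35])
print(cos(cat_dense, dog_dense))    # close to 1.0
```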
Deep feed-forward networks

[Figure: a deep network with inputs x_1 ... x_m and several hidden layers.]

• Deep neural networks (> 2 hidden layers) have recently been successful in many tasks
• They are particularly useful in problems where layers/hierarchies of features are useful
• Training deep networks with backpropagation may result in vanishing or exploding gradients (illustrated below)
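A rough numerical illustration of the vanishing-gradient point (not from the slides): backpropagation multiplies per-layer derivatives, and the sigmoid derivative is at most 0.25, so with weights around 1 the product shrinks quickly with depth.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

grad = 1.0
for layer in range(20):          # 20 stacked sigmoid layers, weights assumed to be 1
    z = rng.normal()             # some pre-activation value
    s = sigmoid(z)
    grad *= s * (1.0 - s)        # chain rule: multiply the local derivatives
print(grad)                      # tiny: the error signal vanishes before reaching early layers
```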
Convolutional networks

[Figure: a 1-D convolution maps inputs x_1 ... x_5 to feature units h_1 ... h_5; a pooling layer then reduces these to h'_1, h'_2, h'_3.]

• Convolution transforms the input by replacing each input unit with a weighted sum of its neighbors (see the sketch below)
• Typically it is followed by pooling
• CNNs are useful for detecting local features with some amount of location invariance
• Sparse connectivity makes CNNs computationally efficient
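A minimal sketch of the two operations on a toy 1-D input; the kernel size, the lack of padding, and the global max pooling are illustrative choices rather than the exact setup in the figure.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # input units x_1 .. x_5
w = np.array([0.25, 0.5, 0.25])           # shared convolution weights (kernel size 3)

# convolution: each feature is a weighted sum of an input unit and its neighbours
h = np.array([w @ x[i:i + 3] for i in range(len(x) - 2)])
print(h)            # the feature map

# pooling: here a single (global) max over the feature map
pooled = h.max()
print(pooled)
```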
CNNs for NLP

[Figure: word vectors for the input "not really worth seeing" are passed through convolution and pooling layers, producing feature maps; the resulting features are fed to a classifier.]
Recurrent neural networks: motivation

• Feed-forward networks
  – can only learn associations
  – do not have memory of earlier inputs: they cannot handle sequences
• Recurrent neural networks are the NN solution for sequence learning
• This is achieved by recurrent loops in the network
Recurrent neural networks

[Figure: a network with inputs x_1 ... x_4, hidden units h_1 ... h_4 with recurrent (loop) connections, and output y.]

• Recurrent neural networks are similar to standard feed-forward networks
• But they include loops that use the previous output (of the hidden layers) as well as the input
• Forward calculation is straightforward; learning becomes somewhat tricky
A simple version: SRNs (Elman, 1990)

[Figure: input units and context units feed the hidden units, which feed the output units; the context units hold a copy of the previous hidden state.]

• The network keeps the previous hidden state in a set of context units
• The rest is just like a feed-forward network (a minimal sketch follows below)
• Training is simple, but SRNs cannot learn long-distance dependencies
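A minimal sketch of one SRN step, assuming tanh hidden units and a softmax output; the dimensions, non-linearities, and random weights are assumptions for illustration. The context is simply the copied previous hidden state, exactly as the slide describes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 5, 2
W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))    # input   -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))   # context -> hidden (recurrent)
W_hy = rng.normal(scale=0.1, size=(n_out, n_hid))   # hidden  -> output

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def srn_step(x, h_prev):
    """One time step: combine the current input with the copied previous hidden state."""
    h = np.tanh(W_xh @ x + W_hh @ h_prev)
    y = softmax(W_hy @ h)
    return h, y

h = np.zeros(n_hid)                       # initial (empty) context
for x in rng.normal(size=(4, n_in)):      # a toy sequence of 4 input vectors
    h, y = srn_step(x, h)
    print(y)
```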
Processing sequences with RNNs

[Figure: the words of "not really worth seeing" are fed to the network one at a time, updating the hidden units h_1 ... h_4 and the output y at each step.]

• RNNs process sequences one unit at a time
• The earlier input affects the output through the recurrent links
Learning in recurrent networks

[Figure: input x connects to the hidden layer h^(1) with weights W_0; the hidden layer connects to the output y^(1) with weights W_1, and to itself through recurrent weights.]

• We need to learn three sets of weights: W_0 (input to hidden), W_1 (hidden to output), and the recurrent hidden-to-hidden weights
• Backpropagation in RNNs is at first not that obvious: it is not immediately clear how errors should be backpropagated through the recurrent links
Backpropagation through time (BPTT)

Unrolling a recurrent network

[Figure: the recurrent network unrolled over time, with inputs x(0), x(1), ..., x(t-1), x(t), hidden states h(0), ..., h(t), and outputs y(0), ..., y(t).]

Note: the corresponding weights at every time step (drawn in the same color in the figure) are shared.
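A minimal BPTT sketch for a simple RNN with tanh hidden units, a linear output, and squared error; the dimensions, loss, and random data are assumptions for illustration, since the slide only shows the unrolled graph. The key point is that the shared weight matrices accumulate gradient contributions from every unrolled time step.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 2, 4, 1, 5
W_x = rng.normal(scale=0.1, size=(n_hid, n_in))
W_h = rng.normal(scale=0.1, size=(n_hid, n_hid))
W_y = rng.normal(scale=0.1, size=(n_out, n_hid))

xs = rng.normal(size=(T, n_in))
targets = rng.normal(size=(T, n_out))

# forward pass: unroll the network over the T time steps
hs = [np.zeros(n_hid)]
ys = []
for t in range(T):
    hs.append(np.tanh(W_x @ xs[t] + W_h @ hs[-1]))
    ys.append(W_y @ hs[-1])

# backward pass: errors flow back through the unrolled copies;
# the same (shared) weight matrices collect gradients from every step
dW_x, dW_h, dW_y = np.zeros_like(W_x), np.zeros_like(W_h), np.zeros_like(W_y)
dh_next = np.zeros(n_hid)
for t in reversed(range(T)):
    dy = ys[t] - targets[t]                      # d(squared error)/dy at step t
    dW_y += np.outer(dy, hs[t + 1])
    dh = W_y.T @ dy + dh_next                    # from the output and from step t+1
    dz = (1.0 - hs[t + 1] ** 2) * dh             # through the tanh
    dW_x += np.outer(dz, xs[t])
    dW_h += np.outer(dz, hs[t])
    dh_next = W_h.T @ dz
print(dW_x.shape, dW_h.shape, dW_y.shape)
```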
RNN architectures: many-to-many (e.g., POS tagging)

[Figure: the unrolled network produces an output y(i) for every input x(i), i = 0 ... t.]
RNN architectures: many-to-one (e.g., document classification)

[Figure: the unrolled network reads the whole input sequence x(0) ... x(t) and produces a single output y(t) at the end.]
RNN architectures: many-to-many with a delay (e.g., machine translation)

[Figure: the network first reads the input sequence x(0) ... x(t); the outputs y(t-1), y(t), ... are produced only after (part of) the input has been consumed. A sketch contrasting these architectures follows below.]
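The sketch below (an assumed setup, not taken from the slides) shows how the same recurrent core supports different architectures: keep the state at every step for many-to-many tasks such as tagging, or keep only the final state for many-to-one tasks such as document classification.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W_x = rng.normal(scale=0.1, size=(n_hid, n_in))
W_h = rng.normal(scale=0.1, size=(n_hid, n_hid))

def run_rnn(xs):
    """Return the hidden state at every time step."""
    h, states = np.zeros(n_hid), []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return np.stack(states)

xs = rng.normal(size=(6, n_in))          # a toy sequence of 6 inputs
states = run_rnn(xs)

per_token_features = states              # many-to-many: one prediction per input (POS tagging)
document_features = states[-1]           # many-to-one: predict once from the last state
print(per_token_features.shape, document_features.shape)
```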
Bidirectional RNNs

[Figure: two chains of hidden states over inputs x(t-1), x(t), x(t+1): forward states running left to right and backward states running right to left; both contribute to the outputs at each position.]
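A minimal bidirectional sketch under the same toy assumptions as above: one RNN is run left to right, a second one right to left, and their hidden states are concatenated at each position.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
Wf_x, Wf_h = rng.normal(scale=0.1, size=(n_hid, n_in)), rng.normal(scale=0.1, size=(n_hid, n_hid))
Wb_x, Wb_h = rng.normal(scale=0.1, size=(n_hid, n_in)), rng.normal(scale=0.1, size=(n_hid, n_hid))

def run(xs, W_x, W_h):
    """Run a simple RNN over the sequence and return all hidden states."""
    h, states = np.zeros(n_hid), []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
        states.append(h)
    return np.stack(states)

xs = rng.normal(size=(5, n_in))
forward_states = run(xs, Wf_x, Wf_h)                 # left to right
backward_states = run(xs[::-1], Wb_x, Wb_h)[::-1]    # right to left, realigned
combined = np.concatenate([forward_states, backward_states], axis=1)
print(combined.shape)                                # (5, 2 * n_hid)
```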
A short digression: language models

• Language models are useful in many NLP tasks
• A language model defines a probability distribution over sequences of words
• An n-gram model assigns probabilities to a sequence of words:

  P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-1}, \dots, w_{i-(n-1)})

• Conditional probabilities are estimated from an (unlabeled) corpus (a toy bigram example follows below)
• Larger n-grams require lots of memory, and their probabilities cannot be estimated reliably due to sparsity
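The toy bigram (n = 2) example below instantiates the formula with maximum-likelihood estimates from a made-up corpus; both the corpus and the test sentence are invented for illustration.

```python
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the mat .".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_bigram(w, prev):
    """P(w | prev) estimated by relative frequency (no smoothing)."""
    return bigrams[(prev, w)] / unigrams[prev]

sentence = "the cat sat on the mat".split()
p = 1.0
for prev, w in zip(sentence, sentence[1:]):
    p *= p_bigram(w, prev)
print(p)    # approximate probability of the sentence under the bigram model
```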
RNNs as language models

• RNNs can function as language models
• We can train RNNs on unlabeled data for this purpose
• During training, the task of the RNN is to predict the next word
• Depending on the network configuration, an RNN can learn dependencies at a longer distance than an n-gram model
• The resulting system can generate sequences (a minimal sketch follows below)
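A minimal sketch of using an SRN-style network to generate a sequence word by word; the vocabulary, dimensions, and (untrained) random weights are made up, so the sampled output is meaningless, but the loop shows the predict-then-sample cycle.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["not", "really", "worth", "seeing", "<eos>"]
V, n_hid = len(vocab), 8
E = rng.normal(scale=0.1, size=(V, n_hid))          # input word embeddings
W_h = rng.normal(scale=0.1, size=(n_hid, n_hid))    # recurrent weights
W_y = rng.normal(scale=0.1, size=(V, n_hid))        # hidden -> vocabulary

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h, w = np.zeros(n_hid), 0                           # start from the first word
generated = [vocab[w]]
for _ in range(10):
    h = np.tanh(E[w] + W_h @ h)                     # E[w] plays the role of W_x x_t
    probs = softmax(W_y @ h)                        # distribution over the next word
    w = rng.choice(V, p=probs)                      # sample the next word
    generated.append(vocab[w])
    if vocab[w] == "<eos>":
        break
print(" ".join(generated))
```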