Traitement automatique des langues : Fondements et applications. Cours 11 : Neural networks (2). Tim Van de Cruys & Philippe Muller, 2016–2017
Introduction Machine learning for NLP • Standard approach: linear model trained over high-dimensional but very sparse feature vectors • Recently: non-linear neural networks over dense input vectors
Neural Network Architectures Feed-forward neural networks • Best known, standard neural network approach • Fully connected layers • Can be used as drop-in replacement for typical NLP classifiers
Convolutional neural network Introduction • Type of feed-forward neural network • Certain layers are not fully connected but locally connected (convolutional layers, pooling layers) • The same local cues may appear in different places in the input (cf. vision)
Convolutional neural network Intuition
Convolutional neural network Architecture
Convolutional neural network Encoding sentences How to represent a variable number of features, e.g. the words in a sentence or document? • Continuous Bag of Words (CBOW): sum the embedding vectors of the corresponding features • no ordering info ("not good quite bad" = "not bad quite good") • Convolutional layer • 'Sliding window' approach that takes local structure into account • Combine the individual windows to create a vector of fixed size
Continuous bag of words Variable number of features • A feed-forward network assumes fixed-dimensional input • How to represent a variable number of features, e.g. the words in a sentence or document? • Continuous Bag of Words (CBOW): sum the embedding vectors of the corresponding features
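A minimal sketch of the CBOW encoding, assuming a hypothetical toy vocabulary and a randomly initialised embedding matrix E (all names are illustrative):

import numpy as np

# Hypothetical toy vocabulary and embedding matrix (one row per word id).
vocab = {"not": 0, "good": 1, "quite": 2, "bad": 3}
E = np.random.randn(len(vocab), 50)   # 50-dimensional word embeddings

def cbow_encode(words):
    # Sum the embedding vectors of the words; the result has a fixed
    # dimension regardless of sentence length.
    return np.sum([E[vocab[w]] for w in words], axis=0)

# Word order is lost: both sentences get exactly the same representation.
v1 = cbow_encode("not good quite bad".split())
v2 = cbow_encode("not bad quite good".split())
assert np.allclose(v1, v2)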
Convolutional neural network Convolutional layer for NLP • Goal: identify indicative local features (n-grams) in a large structure and combine them into a fixed-size vector • Convolution: apply a filter to each window (linear transformation + non-linear activation) • Pooling: combine the window vectors by taking their element-wise maximum (max pooling)
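A minimal sketch of a convolutional layer with max pooling over n-gram windows, assuming NumPy arrays; the filter matrix W_f, bias b and window size are illustrative parameters:

import numpy as np

def conv_max_pool(embeddings, W_f, b, window_size=3):
    # embeddings: (sentence_length, emb_dim) matrix of word vectors
    # W_f: (window_size * emb_dim, n_filters) filter weights, b: (n_filters,)
    n, d = embeddings.shape
    window_vectors = []
    for i in range(n - window_size + 1):
        # Convolution: concatenate the embeddings in the window and apply
        # the filter (linear transformation + non-linear activation).
        x = embeddings[i:i + window_size].reshape(-1)
        window_vectors.append(np.tanh(x @ W_f + b))
    # Pooling: element-wise maximum over all windows gives a fixed-size vector.
    return np.max(window_vectors, axis=0)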
Convolutional neural networks Architecture for NLP
Neural Network Architectures Recurrent (+ recursive) neural networks • Handle structured data of arbitrary size • Recurrent networks for sequences • Recursive networks for trees
Recurrent neural network Introduction • CBOW: no ordering, no structure • CNN: an improvement, but mostly captures local patterns • RNN: represent arbitrarily sized structured input as fixed-size vectors, while paying attention to its structural properties (e.g. word order)
Recurrent neural network Model • x_1: input layer (current word) • a_1: hidden layer at the current timestep • a_0: hidden layer at the previous timestep • U, W and V: weight matrices • f(·): element-wise activation function (sigmoid) • g(·): softmax function to ensure a probability distribution. Equations: a_1 = f(U x_1 + W a_0) (1); y_1 = g(V a_1) (2)
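A minimal sketch of equations (1) and (2) for a single time step, assuming NumPy arrays, a sigmoid for f and a softmax for g as above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def rnn_step(x_1, a_0, U, W, V):
    a_1 = sigmoid(U @ x_1 + W @ a_0)   # eq. (1): new hidden state
    y_1 = softmax(V @ a_1)             # eq. (2): output distribution
    return a_1, y_1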
Recurrent neural network Graphical representation
Recurrent neural network Training • Consider the recurrent neural network as a very deep neural network with parameters shared across time steps • Backpropagation through time • What kind of supervision? • Acceptor: loss based on the final state • Transducer: an output for each input (e.g. language modeling) • Encoder-decoder: one RNN encodes the sequence into a vector representation, another RNN decodes it into a sequence (e.g. machine translation)
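A sketch of transducer-style unrolling, reusing the hypothetical rnn_step from the previous sketch; training backpropagates through this unrolled computation (backpropagation through time):

def rnn_transduce(xs, U, W, V, a_init):
    # Transducer: produce one output per input (e.g. language modeling).
    a, outputs = a_init, []
    for x in xs:
        a, y = rnn_step(x, a, U, W, V)
        outputs.append(y)
    # An acceptor would use only the final state a; an encoder would pass a
    # on to a second (decoder) RNN.
    return outputs, a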
Recurrent neural network Training: graphical representation
Recurrent neural network Multi-layer RNN • multiple layers of RNNs • input of next layer is output of RNN layer below it • Empirically shown to work better
Recurrent neural network Bi-directional RNN • Feed the input sequence both forward and backward to two different RNNs • The representation is the concatenation of the forward and backward states (A & A') • Each position thus represents both its history and its future
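A sketch of the bi-directional encoding, assuming two hypothetical step functions step_fwd and step_bwd of the form a_new = step(x, a):

import numpy as np

def birnn_encode(xs, step_fwd, step_bwd, a0_fwd, a0_bwd):
    # Forward RNN over the sequence.
    fwd, a = [], a0_fwd
    for x in xs:
        a = step_fwd(x, a)
        fwd.append(a)
    # Backward RNN over the reversed sequence.
    bwd, a = [], a0_bwd
    for x in reversed(xs):
        a = step_bwd(x, a)
        bwd.append(a)
    bwd.reverse()
    # Each position is the concatenation of its forward (history) and
    # backward (future) states.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]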
Concrete RNN architectures Simple RNN
Concrete RNN architectures LSTM • Long short-term memory networks • In practice, simple RNNs are only able to remember a narrow context (vanishing gradients) • LSTM: a more complex architecture able to capture long-term dependencies
Concrete RNN architectures LSTM
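A sketch of one step of a standard LSTM cell (input, forget and output gates plus a memory cell); the parameter names in the dictionary p are illustrative, not the notation used on the slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    i = sigmoid(p["W_i"] @ x + p["U_i"] @ h_prev + p["b_i"])        # input gate
    f = sigmoid(p["W_f"] @ x + p["U_f"] @ h_prev + p["b_f"])        # forget gate
    o = sigmoid(p["W_o"] @ x + p["U_o"] @ h_prev + p["b_o"])        # output gate
    c_tilde = np.tanh(p["W_c"] @ x + p["U_c"] @ h_prev + p["b_c"])  # candidate cell
    c = f * c_prev + i * c_tilde   # memory cell keeps long-term information
    h = o * np.tanh(c)             # hidden state exposed to the next time step
    return h, c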
Concrete RNN architectures GRU • LSTM: effective, but complex and computationally expensive • GRU (gated recurrent unit): a cheaper alternative that works well in practice
Concrete RNN architectures GRU • reset gate (r): how much information from the previous hidden state is included (reset with the current information?) • update gate (z): controls updates to the hidden state (how much does the hidden state need to be updated with the current information?)
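A sketch of one GRU step with the reset gate r and update gate z described above; parameter names are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, p):
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)                # reset gate
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)                # update gate
    h_tilde = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev))    # candidate state
    # The update gate interpolates between the old state and the candidate.
    return (1.0 - z) * h_prev + z * h_tilde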
Recursive neural networks Introduction • Generalization of RNNs from sequences to (binary) trees • Linear transformation + non-linear activation function applied recursively throughout a tree • Useful for parsing
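A sketch of recursive composition over a binary tree, assuming leaves are words looked up in an embedding matrix E and internal nodes are pairs (left, right); W, b, E and vocab are illustrative parameters:

import numpy as np

def recnn_encode(tree, W, b, E, vocab):
    if isinstance(tree, str):
        # Leaf: return the word's embedding vector.
        return E[vocab[tree]]
    left, right = tree
    v_left = recnn_encode(left, W, b, E, vocab)
    v_right = recnn_encode(right, W, b, E, vocab)
    # Internal node: linear transformation of the concatenated children,
    # followed by a non-linear activation.
    return np.tanh(W @ np.concatenate([v_left, v_right]) + b)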
Application Image to caption generation
Application Neural machine translation
Application Neural dialogue generation (chatbot)
Software • TensorFlow • Python, C++ • http://www.tensorflow.org • Theano • Python • http://deeplearning.net/software/theano/ • Keras • Theano/TensorFlow-based modular deep learning library • Lasagne • Theano-based deep learning library