Traitement automatique des langues : Fondements et applications

  1. Traitement automatique des langues : Fondements et applications. Cours 11 : Neural networks (2). Tim Van de Cruys & Philippe Muller, 2016–2017

  2. Introduction Machine learning for NLP • Standard approach: linear model trained over high-dimensional but very sparse feature vectors • Recently: non-linear neural networks over dense input vectors
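
A minimal illustrative contrast of the two input styles mentioned above; all sizes and indices are made up, not taken from the course:

```python
import numpy as np

# Illustrative contrast (all sizes made up): a sparse, high-dimensional
# feature vector as used by a linear model vs. a dense, low-dimensional
# input vector as used by a neural network.
np.random.seed(0)
vocab_size, emb_dim = 100_000, 100
sparse = np.zeros(vocab_size)          # bag-of-features / one-hot style input
sparse[[12, 4053, 77812]] = 1.0        # only a handful of active features
dense = np.random.randn(emb_dim)       # dense embedding-based input
print(int(sparse.sum()), dense.shape)  # 3 (100,)
```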

  3. Neural Network Architectures Feed-forward neural networks • Best known, standard neural network approach • Fully connected layers • Can be used as drop-in replacement for typical NLP classifiers
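
A minimal NumPy sketch of such a feed-forward classifier over a dense input vector; the layer sizes, random weights and activation choices are illustrative assumptions, not the course's own setup:

```python
import numpy as np

# Minimal sketch of a one-hidden-layer, fully connected classifier over a
# dense input vector (illustrative sizes and random weights).
np.random.seed(0)
d_in, d_hid, n_classes = 50, 32, 3
W1, b1 = 0.1 * np.random.randn(d_hid, d_in), np.zeros(d_hid)
W2, b2 = 0.1 * np.random.randn(n_classes, d_hid), np.zeros(n_classes)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(x):
    h = np.tanh(W1 @ x + b1)           # fully connected layer + non-linearity
    return softmax(W2 @ h + b2)        # probability over the output classes

x = np.random.randn(d_in)              # e.g. a concatenation of embeddings
print(predict(x))
```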

  4. Convolutional neural network Introduction • Type of feed-forward neural network • Certain layers are not fully connected but locally connected (convolutional layers, pooling layers) • The same local cues can appear in different places in the input (cf. vision)

  5. Convolutional neural network Intuition

  6. Convolutional neural network Intuition

  7. Convolutional neural network Intuition

  8. Convolutional neural network Architecture

  9. Convolutional neural network Encoding sentences How to represent a variable number of features, e.g. the words in a sentence or document? • Continuous Bag of Words (CBOW): sum the embedding vectors of the corresponding features • No ordering info ("not good quite bad" = "not bad quite good") • Convolutional layer • 'Sliding window' approach that takes local structure into account • Combine individual windows to create a vector of fixed size

  10. Continuous bag of words Variable number of features • A feed-forward network assumes fixed-dimensional input • How to represent a variable number of features, e.g. the words in a sentence or document? • Continuous Bag of Words (CBOW): sum the embedding vectors of the corresponding features
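
A small sketch of the CBOW encoding described above, with a made-up vocabulary and embedding matrix; it also shows the loss of ordering information mentioned on the previous slide:

```python
import numpy as np

# Minimal sketch (illustrative, not from the slides): CBOW encoding of a
# sentence as the sum of its word embedding vectors.
np.random.seed(0)
vocab = {"not": 0, "good": 1, "quite": 2, "bad": 3}
E = np.random.randn(len(vocab), 4)          # one 4-dim embedding per word

def cbow(words):
    return sum(E[vocab[w]] for w in words)  # order is lost in the sum

# The two sentences from the previous slide get the same representation:
print(np.allclose(cbow("not good quite bad".split()),
                  cbow("not bad quite good".split())))   # True
```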

  11. Convolutional neural network Convolutional layer for NLP • Goal: identify indicative local features (n-grams) in a large structure and combine them into a fixed-size vector • Convolution: apply a filter to each window (linear transformation + non-linear activation) • Pooling: combine by taking the maximum
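
A rough NumPy sketch of the convolution-plus-max-pooling idea on this slide; the sentence length, window size and filters are arbitrary assumptions:

```python
import numpy as np

# Illustrative sketch: a 1-d convolution over word embeddings followed by
# max pooling, yielding a fixed-size sentence vector.
np.random.seed(0)
sent = np.random.randn(7, 4)                # 7 words, 4-dim embeddings
k, n_filters = 3, 5                         # window size, number of filters
W = np.random.randn(n_filters, k * 4)       # one filter per row
b = np.zeros(n_filters)

windows = [sent[i:i + k].ravel() for i in range(len(sent) - k + 1)]
conv = np.tanh(np.stack(windows) @ W.T + b) # filter responses per window
pooled = conv.max(axis=0)                   # max pooling over positions
print(pooled.shape)                         # (5,) regardless of sentence length
```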

  12. Convolutional neural networks Architecture for NLP

  13. Neural Network Architectures Recurrent (+ recursive) neural networks • Handle structured data of arbitrary sizes • Recurrent networks for sequences • Recursive networks for trees

  14. Recurrent neural network Introduction • CBOW: no ordering, no structure • CNN: an improvement, but captures mostly local patterns • RNN: represent arbitrarily sized structured input as a fixed-size vector, while paying attention to its structural properties

  15. Recurrent neural network Model • x_1: input layer (current word) • a_1: hidden layer of the current timestep • a_0: hidden layer of the previous timestep • U, W and V: weight matrices • f(·): element-wise activation function (sigmoid) • g(·): softmax function to ensure a probability distribution • a_1 = f(U x_1 + W a_0) (1) • y_1 = g(V a_1) (2)
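
A direct NumPy transcription of equations (1) and (2); the dimensions and random weights are illustrative:

```python
import numpy as np

# Minimal sketch of equations (1) and (2): one recurrent step.
np.random.seed(0)
d_in, d_hid, n_out = 4, 3, 2
U = np.random.randn(d_hid, d_in)
W = np.random.randn(d_hid, d_hid)
V = np.random.randn(n_out, d_hid)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

a0 = np.zeros(d_hid)                        # previous hidden state
x1 = np.random.randn(d_in)                  # current input (word vector)
a1 = sigmoid(U @ x1 + W @ a0)               # equation (1)
y1 = softmax(V @ a1)                        # equation (2)
print(a1, y1)
```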

  16. Recurrent neural network Graphical representation

  17. Recurrent neural network Training • Consider the recurrent neural network as a very deep neural network with parameters shared across the computation • Backpropagation through time • What kind of supervision? • Acceptor: based on the final state • Transducer: an output for each input (e.g. language modeling) • Encoder-decoder: one RNN encodes the sequence into a vector representation, another RNN decodes it into a sequence (e.g. machine translation)
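
An illustrative sketch of the unrolled computation that backpropagation through time runs over; an acceptor would use only the final state, a transducer the output at every timestep (all weights and sizes are made up):

```python
import numpy as np

# Illustrative sketch: unrolling a simple RNN over a sequence of inputs.
np.random.seed(0)
d_in, d_hid, n_out = 4, 3, 2
U, W, V = (np.random.randn(d_hid, d_in),
           np.random.randn(d_hid, d_hid),
           np.random.randn(n_out, d_hid))

def step(x, a):
    return 1.0 / (1.0 + np.exp(-(U @ x + W @ a)))   # sigmoid activation

xs = [np.random.randn(d_in) for _ in range(5)]      # a 5-word "sentence"
a = np.zeros(d_hid)
outputs = []
for x in xs:                                         # unrolled computation;
    a = step(x, a)                                   # gradients flow back
    outputs.append(V @ a)                            # through this loop (BPTT)
# acceptor: supervise `a`; transducer: supervise every entry of `outputs`
print(len(outputs), a.shape)
```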

  18. Recurrent neural network Training: graphical representation

  19. Recurrent neural network Multi-layer RNN • Multiple layers of RNNs • The input of the next layer is the output of the RNN layer below it • Empirically shown to work better
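
A sketch of a stacked RNN in Keras (one of the libraries listed on the software slide); this assumes a standard Keras installation and arbitrary layer sizes, and is not the course's own code:

```python
# Sketch only: a two-layer (stacked) RNN in Keras; all sizes are arbitrary.
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense

model = Sequential([
    Embedding(10000, 100),                 # word indices -> dense vectors
    SimpleRNN(64, return_sequences=True),  # layer 1 emits its full sequence
    SimpleRNN(64),                         # layer 2 reads layer 1's outputs
    Dense(2, activation='softmax'),        # e.g. a sentence-level label
])
```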

  20. Recurrent neural network Bi-directional RNN • Feed the input sequence both forward and backward to two different RNNs • The representation is the concatenation of the forward and backward states (A & A') • Represents both history and future
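
A NumPy sketch of the bi-directional idea: run one RNN left-to-right and another right-to-left, then concatenate the aligned states (weights and sizes are made up):

```python
import numpy as np

# Illustrative sketch: the bi-directional state for one position is the
# concatenation of a forward state (history) and a backward state (future).
np.random.seed(0)
d = 3
Uf, Wf = np.random.randn(d, d), np.random.randn(d, d)
Ub, Wb = np.random.randn(d, d), np.random.randn(d, d)
xs = [np.random.randn(d) for _ in range(5)]

def run(seq, U, W):
    a, states = np.zeros(d), []
    for x in seq:
        a = np.tanh(U @ x + W @ a)
        states.append(a)
    return states

fwd = run(xs, Uf, Wf)                      # left-to-right over the input
bwd = run(xs[::-1], Ub, Wb)[::-1]          # right-to-left, then realigned
concat = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
print(concat[0].shape)                     # (6,) = forward + backward
```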

  21. Concrete RNN architectures Simple RNN

  22. Concrete RNN architectures LSTM • Long short-term memory (LSTM) networks • In practice, simple RNNs are only able to remember a narrow context (vanishing gradients) • LSTM: a more complex architecture able to capture long-term dependencies
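
A minimal sketch of a single LSTM step, written with one common formulation of the gates (input, forget, output, candidate); biases are omitted and all weights are made up, so it may differ in detail from the diagrams on the following slides:

```python
import numpy as np

# Minimal sketch of one LSTM step (one common formulation, biases omitted).
np.random.seed(0)
d_in, d_hid = 4, 3
Wi, Ui = np.random.randn(d_hid, d_in), np.random.randn(d_hid, d_hid)
Wf, Uf = np.random.randn(d_hid, d_in), np.random.randn(d_hid, d_hid)
Wo, Uo = np.random.randn(d_hid, d_in), np.random.randn(d_hid, d_hid)
Wg, Ug = np.random.randn(d_hid, d_in), np.random.randn(d_hid, d_hid)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    i = sigmoid(Wi @ x + Ui @ h)            # input gate
    f = sigmoid(Wf @ x + Uf @ h)            # forget gate: keep long-term info
    o = sigmoid(Wo @ x + Uo @ h)            # output gate
    g = np.tanh(Wg @ x + Ug @ h)            # candidate cell content
    c_new = f * c + i * g                   # memory cell update
    h_new = o * np.tanh(c_new)              # new hidden state
    return h_new, c_new

h, c = np.zeros(d_hid), np.zeros(d_hid)
h, c = lstm_step(np.random.randn(d_in), h, c)
print(h, c)
```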

  23. Concrete RNN architectures LSTM

  24. Concrete RNN architectures LSTM

  25. Concrete RNN architectures LSTM

  26. Concrete RNN architectures LSTM

  27. Concrete RNN architectures LSTM

  28. Concrete RNN architectures LSTM

  29. Concrete RNN architectures GRU • LSTM: effective, but complex, computationally expensive • GRU: cheaper alternative that works well in practice

  30. Concrete RNN architectures GRU • Reset gate (r): how much information from the previous hidden state is included (or reset in favour of the current information) • Update gate (z): controls updates to the hidden state (how much does the hidden state need to be updated with the current information?)
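
A minimal sketch of one GRU step with the reset gate r and update gate z described above; biases are omitted, the sign convention for the update interpolation varies between presentations, and all weights are made up:

```python
import numpy as np

# Minimal sketch of one GRU step (biases omitted, illustrative weights).
np.random.seed(0)
d_in, d_hid = 4, 3
Wr, Ur = np.random.randn(d_hid, d_in), np.random.randn(d_hid, d_hid)
Wz, Uz = np.random.randn(d_hid, d_in), np.random.randn(d_hid, d_hid)
Wh, Uh = np.random.randn(d_hid, d_in), np.random.randn(d_hid, d_hid)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h):
    r = sigmoid(Wr @ x + Ur @ h)            # reset gate: how much past to use
    z = sigmoid(Wz @ x + Uz @ h)            # update gate: how much to update
    h_cand = np.tanh(Wh @ x + Uh @ (r * h)) # candidate from the reset past
    return (1 - z) * h + z * h_cand         # interpolate old and new state

h = np.zeros(d_hid)
h = gru_step(np.random.randn(d_in), h)
print(h)
```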

  31. Recursive neural networks Introduction • Generalization of RNNs from sequences to (binary) trees • Linear transformation + non-linear activation function applied recursively throughout a tree • Useful for parsing
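
An illustrative sketch of the recursive composition: the same linear transformation plus non-linearity is applied bottom-up at every node of a binary tree; the tiny tree and embeddings are made up:

```python
import numpy as np

# Illustrative sketch of a recursive neural network over a binary tree.
np.random.seed(0)
d = 4
W = np.random.randn(d, 2 * d)               # combines two child vectors
emb = {w: np.random.randn(d) for w in ["the", "cat", "sleeps"]}

def compose(tree):
    if isinstance(tree, str):                # leaf: word embedding
        return emb[tree]
    left, right = tree
    children = np.concatenate([compose(left), compose(right)])
    return np.tanh(W @ children)             # parent representation

# ((the cat) sleeps)
print(compose((("the", "cat"), "sleeps")))
```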

  32. Application Image to caption generation

  33. Application Image to caption generation

  34. Application Neural machine translation

  35. Application Neural machine translation

  36. Application Neural dialogue generation (chatbot)

  37. Software
  • TensorFlow (Python, C++): http://www.tensorflow.org
  • Theano (Python): http://deeplearning.net/software/theano/
  • Keras: Theano/TensorFlow-based modular deep learning library
  • Lasagne: Theano-based deep learning library
