

  1. Convolutional Neural Networks for Language CS 6956: Deep Learning for NLP

  2. Features from text
  Example: Sentiment classification. The goal: Is the sentiment of a sentence positive, negative, or neutral?
  "The film is fun and is host to some truly excellent sequences"
  Approach: Train a multiclass classifier. What features?

  3. Features from text
  Example: Sentiment classification. The goal: Is the sentiment of a sentence positive, negative, or neutral?
  "The film is fun and is host to some truly excellent sequences"
  Approach: Train a multiclass classifier. What features?
  Some words and n-grams are informative, while others are not.

  4. Features from text
  Example: Sentiment classification. The goal: Is the sentiment of a sentence positive, negative, or neutral?
  "The film is fun and is host to some truly excellent sequences"
  Approach: Train a multiclass classifier. What features?
  Some words and n-grams are informative, while others are not.
  We need to:
  1. Identify informative local information
  2. Aggregate it into a fixed-size vector representation

  5. Convolutional Neural Networks
  Designed to:
  1. Identify local predictors in a larger input
  2. Pool them together to create a feature representation
  3. And possibly repeat this in a hierarchical fashion
  In the NLP context, this helps identify predictive n-grams for a task.
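For text, the identify-then-pool recipe above can be sketched in a few lines of Python. This is a minimal sketch assuming toy 2-dimensional word vectors and a single hand-set bigram filter; all embeddings and weights here are illustrative, not learned:

```python
# Sketch of the CNN recipe for text: score every n-gram with a filter
# (step 1: identify local predictors), then max-pool the scores into a
# fixed-size feature (step 2). Toy 2-d embeddings, one bigram filter.
import numpy as np

embeddings = {                       # illustrative 2-d word vectors
    "the": np.array([0.1, 0.1]),
    "film": np.array([0.2, 0.4]),
    "fun": np.array([0.9, 0.1]),
    "excellent": np.array([0.8, 0.3]),
}
sentence = ["the", "film", "fun", "excellent"]

w = np.array([1.0, 0.5, 1.0, 0.5])   # one filter over bigrams (window of 2)

# Step 1: a score for every bigram in the sentence (local predictors)
scores = [
    w @ np.concatenate([embeddings[a], embeddings[b]])
    for a, b in zip(sentence, sentence[1:])
]

# Step 2: max-pooling aggregates the scores into a fixed-size feature,
# no matter how long the sentence is
feature = max(scores)
print(feature)
```

In a real CNN the filter weights would be learned, and many filters of several widths would each contribute one pooled value to a fixed-size feature vector.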

  6. Overview
  • Convolutional Neural Networks: A brief history
  • The two operations in a CNN
    – Convolution
    – Pooling
  • Convolution + Pooling as a building block
  • CNNs in NLP
  • Recurrent networks vs Convolutional networks

  7. Overview
  • Convolutional Neural Networks: A brief history
  • The two operations in a CNN
    – Convolution
    – Pooling
  • Convolution + Pooling as a building block
  • CNNs in NLP
  • Recurrent networks vs Convolutional networks

  8. Convolutional Neural Networks: Brief history
  First arose in the context of vision
  • Hubel and Wiesel, 1950s/60s: The mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field
  • Fukushima 1980, Neocognitron: Directly inspired by Hubel and Wiesel
    – Key idea: locality of features in the visual cortex is important; integrate them locally and propagate them to further layers
    – Two operations: a convolutional layer that reacts to specific patterns and a down-sampling layer that aggregates information
  • LeCun, 1989-today, Convolutional Neural Network: A supervised version
    – Related to convolution kernels in computer vision
    – Very successful on handwriting recognition and other computer vision tasks
  • Has become better over recent years with more data and computation
    – Krizhevsky et al. 2012: Object detection with ImageNet
    – The de facto feature extractor for computer vision

  9. Convolutional Neural Networks: Brief history
  First arose in the context of vision
  • Hubel and Wiesel, 1950s/60s: The mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field
  (Torsten Wiesel and David H. Hubel, Nobel Prize in Physiology or Medicine, 1981)

  10. Convolutional Neural Networks: Brief history
  First arose in the context of vision
  • Hubel and Wiesel, 1950s/60s: The mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field
  • Fukushima 1980, Neocognitron: Directly inspired by Hubel and Wiesel
    – Key idea: locality of features in the visual cortex is important; integrate them locally and propagate them to further layers
    – Two operations: 1. a convolutional layer that reacts to specific patterns, and 2. a down-sampling layer that aggregates information

  11. Convolutional Neural Networks: Brief history
  First arose in the context of vision
  • Hubel and Wiesel, 1950s/60s: The mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field
  • Fukushima 1980, Neocognitron: Directly inspired by Hubel and Wiesel
    – Key idea: locality of features in the visual cortex is important; integrate them locally and propagate them to further layers
    – Two operations: a convolutional layer that reacts to specific patterns and a down-sampling layer that aggregates information
  • LeCun, 1989-today, Convolutional Neural Network: A supervised version
    – Related to convolution kernels in computer vision
    – Success with handwriting recognition and other computer vision tasks

  12. Convolutional Neural Networks: Brief history
  First arose in the context of vision
  • Hubel and Wiesel, 1950s/60s: The mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field
  • Fukushima 1980, Neocognitron: Directly inspired by Hubel and Wiesel
    – Key idea: locality of features in the visual cortex is important; integrate them locally and propagate them to further layers
    – Two operations: a convolutional layer that reacts to specific patterns and a down-sampling layer that aggregates information
  • LeCun, 1989-today, Convolutional Neural Network: A supervised version
    – Related to convolution kernels in computer vision
    – Success with handwriting recognition and other computer vision tasks
  • Has become better over recent years with more data and computation
    – Krizhevsky et al. 2012: Object detection with ImageNet
    – The de facto feature extractor for computer vision

  13. Convolutional Neural Networks: Brief history
  • Introduced to NLP by Collobert et al., 2011
    – Used as a feature extraction system for semantic role labeling
  • Since then, several other applications such as sentiment analysis, question classification, etc.
    – Kalchbrenner et al. 2014, Kim 2014

  14. CNN terminology
  Shows its computer vision and signal processing origins
  • Filter
    – A function that transforms an input matrix/vector into a scalar feature
    – A filter is a learned feature detector
  • Channel
    – In computer vision, color images have red, blue, and green channels
    – In general, a channel represents a medium that captures information about an input independently of other channels
      • For example, different kinds of word embeddings could be different channels
      • Channels could themselves be produced by previous convolutional layers
  • Receptive field
    – The region of the input that a filter currently focuses on

  15. CNN terminology
  Shows its computer vision and signal processing origins
  • Filter
    – A function that transforms an input matrix/vector into a scalar feature
    – A filter is a learned feature detector (also called a feature map)
  • Channel
    – In computer vision, color images have red, blue, and green channels
    – In general, a channel represents a medium that captures information about an input independently of other channels
      • For example, different kinds of word embeddings could be different channels
      • Channels could themselves be produced by previous convolutional layers
  • Receptive field
    – The region of the input that a filter currently focuses on

  16. CNN terminology
  Shows its computer vision and signal processing origins
  • Filter
    – A function that transforms an input matrix/vector into a scalar feature
    – A filter is a learned feature detector (also called a feature map)
  • Channel
    – In computer vision, color images have red, blue, and green channels
    – In general, a channel represents a medium that captures information about an input independently of other channels
      • For example, different kinds of word embeddings could be different channels
      • Channels could themselves be produced by previous convolutional layers
  • Receptive field
    – The region of the input that a filter currently focuses on

  17. CNN terminology
  Shows its computer vision and signal processing origins
  • Filter
    – A function that transforms an input matrix/vector into a scalar feature
    – A filter is a learned feature detector (also called a feature map)
  • Channel
    – In computer vision, color images have red, blue, and green channels
    – In general, a channel represents a “view of the input” that captures information about an input independently of other channels
      • For example, different kinds of word embeddings could be different channels
      • Channels could themselves be produced by previous convolutional layers
  • Receptive field
    – The region of the input that a filter currently focuses on

  18. Overview
  • Convolutional Neural Networks: A brief history
  • The two operations in a CNN
    – Convolution
    – Pooling
  • Convolution + Pooling as a building block
  • CNNs in NLP
  • Recurrent networks vs Convolutional networks

  19. What is a convolution?
  Let’s see this using an example with vectors. We will generalize this to matrices and beyond, but the general idea remains the same.

  20. What is a convolution? An example using vectors
  A vector 𝐲 = [2, 3, 1, 3, 2, 1]

  21. What is a convolution? An example using vectors
  A vector 𝐲 = [2, 3, 1, 3, 2, 1]
  Filter 𝐠 of size 𝑛: 𝐠 = [1, 2, 1]
  Here, the filter size is 3

  22. What is a convolution? An example using vectors
  A vector 𝐲 = [2, 3, 1, 3, 2, 1]
  Filter 𝐠 of size 𝑛: 𝐠 = [1, 2, 1]
  The output is also a vector: output_𝑖 = Σ_𝑗 𝑔_𝑗 ⋅ 𝑦_(𝑖+𝑛−𝑗)

  23. What is a convolution? An example using vectors
  A vector 𝐲 = [2, 3, 1, 3, 2, 1]
  Filter 𝐠 of size 𝑛: 𝐠 = [1, 2, 1]
  The output is also a vector: output_𝑖 = Σ_𝑗 𝑔_𝑗 ⋅ 𝑦_(𝑖+𝑛−𝑗)
  The filter moves across the vector. At each position, the output is the dot product of the filter with a slice of the vector of that size.
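The sliding dot product described on the slide can be checked on the toy vectors in a few lines of NumPy. Note that `np.convolve` implements the textbook convolution, which reverses the filter before sliding; that changes nothing here because [1, 2, 1] is symmetric:

```python
# 1-D convolution of the slide's example: y = [2, 3, 1, 3, 2, 1], g = [1, 2, 1].
# At each valid position, the filter is dotted with a length-3 slice of y.
import numpy as np

y = np.array([2, 3, 1, 3, 2, 1])
g = np.array([1, 2, 1])
n = len(g)

# Direct computation: slide the filter across y (valid positions only)
out = np.array([g @ y[i:i + n] for i in range(len(y) - n + 1)])
print(out)  # [9 8 9 8]

# np.convolve agrees here because the symmetric filter equals its reverse
assert np.array_equal(out, np.convolve(y, g, mode="valid"))
```

The output has length 6 − 3 + 1 = 4: one entry per position where the filter fits entirely inside the vector (often called a "narrow" convolution in NLP).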
