Convolutional Neural Networks for Language
CS 6956: Deep Learning for NLP
Features from text

Example: Sentiment classification
• The goal: Is the sentiment of a sentence positive, negative, or neutral?
  "The film is fun and is host to some truly excellent sequences"
• Approach: Train a multiclass classifier
• What features? Some words and ngrams are informative, while some are not
• We need to:
  1. Identify informative local information
  2. Aggregate it into a fixed size vector representation
Convolutional Neural Networks

Designed to:
1. Identify local predictors in a larger input
2. Pool them together to create a feature representation
3. And possibly repeat this in a hierarchical fashion

In the NLP context, this helps identify predictive ngrams for a task (a sketch of this pipeline is shown below).
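To make the "identify local predictors, then pool" idea concrete, here is a minimal sketch of a text CNN in PyTorch. It is not from the slides: the layer sizes, names (vocab_size, emb_dim, num_filters), and the single convolution + max-pool design are illustrative assumptions rather than a prescribed architecture.

```python
# Minimal sketch (assumed, not from the slides): a 1D CNN that scores ngram
# windows and pools them into a fixed-size feature vector for classification.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, num_filters=50,
                 ngram_size=3, num_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # 1. Identify local predictors: each filter scores every ngram window
        self.conv = nn.Conv1d(in_channels=emb_dim, out_channels=num_filters,
                              kernel_size=ngram_size)
        # 3. Classify from the pooled, fixed-size feature vector
        self.classify = nn.Linear(num_filters, num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
        h = torch.relu(self.conv(x))               # (batch, num_filters, seq_len - 2)
        # 2. Pool over positions: keep each filter's strongest ngram match
        pooled = h.max(dim=2).values               # (batch, num_filters)
        return self.classify(pooled)               # (batch, num_classes)

# Example: a batch of 2 sentences of length 8 (arbitrary token ids)
logits = TextCNN()(torch.randint(0, 10000, (2, 8)))
print(logits.shape)  # torch.Size([2, 3])
```

The max over positions is what turns a variable-length sentence into a fixed-size vector; the rest of the deck unpacks the convolution and pooling operations that this sketch uses.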
Overview
• Convolutional Neural Networks: A brief history
• The two operations in a CNN
  – Convolution
  – Pooling
• Convolution + Pooling as a building block
• CNNs in NLP
• Recurrent networks vs Convolutional networks
Convolutional Neural Networks: Brief history

First arose in the context of vision
• Hubel and Wiesel, 1950s/60s: The mammalian visual cortex contains neurons that respond to small regions and specific patterns in the visual field
• Fukushima 1980, Neocognitron: Directly inspired by Hubel and Wiesel
  – Key idea: locality of features in the visual cortex is important; integrate them locally and propagate them to further layers
  – Two operations: a convolutional layer that reacts to specific patterns, and a down-sampling layer that aggregates information
• LeCun, 1989–today, Convolutional Neural Network: A supervised version
  – Related to convolution kernels in computer vision
  – Very successful on handwriting recognition and other computer vision tasks
• Has become better over recent years with more data and computation
  – Krizhevsky et al. 2012: Object recognition on ImageNet
  – The de facto feature extractor for computer vision
Hubel and Wiesel were awarded the Nobel Prize in Physiology or Medicine in 1981 for this work on the visual system.
Convolutional Neural Networks: Brief history

• Introduced to NLP by Collobert et al., 2011
  – Used as a feature extraction system for semantic role labeling
• Since then, several other applications such as sentiment analysis, question classification, etc.
  – Kalchbrenner et al. 2014, Kim 2014
CNN terminology

The terminology shows the computer vision and signal processing origins of CNNs.
• Filter
  – A function that transforms an input matrix/vector into a scalar feature
  – A filter is a learned feature detector (its output over all positions is also called a feature map)
• Channel
  – In computer vision, color images have red, blue and green channels
  – In general, a channel represents a "view of the input" that captures information about the input independent of other channels
  – For example, different kinds of word embeddings could be different channels
  – Channels could themselves be produced by previous convolutional layers
• Receptive field
  – The region of the input that a filter currently focuses on
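As a rough illustration (mine, not from the slides), these terms map directly onto the arguments of a standard 1D convolution layer such as PyTorch's nn.Conv1d; the specific sizes below are arbitrary.

```python
import torch
import torch.nn as nn

# Assumed illustration: terminology -> Conv1d arguments
# - in_channels : number of input channels (e.g., one per embedding type)
# - out_channels: number of filters (learned feature detectors)
# - kernel_size : width of each filter's receptive field over the input
conv = nn.Conv1d(in_channels=2, out_channels=4, kernel_size=3)

x = torch.randn(1, 2, 10)   # (batch, channels, sequence length)
out = conv(x)
print(out.shape)            # torch.Size([1, 4, 8]): one feature map per filter,
                            # with one value per placement of the receptive field
```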
Overview
• Convolutional Neural Networks: A brief history
• The two operations in a CNN
  – Convolution
  – Pooling
• Convolution + Pooling as a building block
• CNNs in NLP
• Recurrent networks vs Convolutional networks
What is a convolution?

Let's see this using an example with vectors. We will generalize this to matrices and beyond, but the general idea remains the same.
What is a convolution? An example using vectors

A vector 𝐲: [2 3 1 3 2 1]

A filter 𝐠 of size $n$: [1 2 1]   (here, the filter size is $n = 3$)

The output is also a vector:

$$\text{output}_i = \sum_{j=1}^{n} g_j \cdot y_{i+j-1}$$

The filter moves across the vector. At each position, the output is the dot product of the filter with a slice of the vector of that size.
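A small NumPy check of this example (my own sketch, not from the slides): it slides the filter over the vector and takes the dot product at each position, exactly as the formula above describes.

```python
import numpy as np

y = np.array([2, 3, 1, 3, 2, 1])   # the input vector from the slide
g = np.array([1, 2, 1])            # the filter, size n = 3
n = len(g)

# output_i = sum_j g_j * y_{i+j-1}: dot product of the filter with each slice
output = np.array([g @ y[i:i + n] for i in range(len(y) - n + 1)])
print(output)   # [9 8 9 8]

# Because this filter is symmetric, flipping it makes no difference, so
# NumPy's (flipped-kernel) convolution gives the same answer here:
print(np.convolve(y, g, mode="valid"))   # [9 8 9 8]
```

Note that the output has length 6 − 3 + 1 = 4: one value per position where the filter fits entirely inside the vector.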