SLIDE 1

CS 6956: Deep Learning for NLP

Recurrent Neural Networks

SLIDE 2

Overview

1. Modeling sequences
2. Recurrent neural networks: An abstraction
3. Usage patterns for RNNs
4. Bidirectional RNNs
5. A concrete example: The Elman RNN
6. The vanishing gradient problem
7. Long short-term memory units

SLIDE 4

What can we do with such an abstraction?

1. The encoder: Convert a sequence into a feature vector for subsequent classification
2. A generator: Produce a sequence using an initial state
3. A transducer: Convert a sequence into another sequence
4. A conditioned generator (or an encoder-decoder): Combine 1 and 2
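
A minimal sketch of the abstraction these four patterns build on, assuming the usual view of an RNN as a state-update function R (new state from previous state and current input) and an output function O read off each state; the function names and the plain-Python style are illustrative, not part of the slides.

```python
# Sketch only: R maps (previous state, current input) -> new state,
# O maps a state -> an output; both are assumed, not defined in the slides.
def run_rnn(R, O, h0, inputs):
    """Apply the recurrence over an input sequence; return per-step outputs and the final state."""
    h, outputs = h0, []
    for x in inputs:
        h = R(h, x)           # state update at this position
        outputs.append(O(h))  # output read off the current state
    return outputs, h
```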

SLIDES 5-8

1. An Encoder

Convert a sequence into a feature vector for subsequent classification

[Figure, built up across slides 5-8: the RNN reads the input "I like cake" starting from an initial state; its final state is fed to a neural network whose prediction is used to compute a loss]

Example: Encode a sentence or a phrase into a feature vector for a classification task such as sentiment classification
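
As a rough illustration of the encoder pattern, the sketch below uses PyTorch's nn.RNN; the vocabulary size, the token ids standing in for "I like cake", and the two-class sentiment setup are all hypothetical.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim, num_classes = 10_000, 50, 100, 2

embed = nn.Embedding(vocab_size, emb_dim)
rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
classifier = nn.Linear(hidden_dim, num_classes)   # the "neural network" on top of the encoder

tokens = torch.tensor([[2, 5, 7]])                # batch of one sentence, standing in for "I like cake"
_, h_n = rnn(embed(tokens))                       # final state = feature vector for the whole sequence
logits = classifier(h_n.squeeze(0))               # e.g. sentiment classes
loss = nn.functional.cross_entropy(logits, torch.tensor([1]))
```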

SLIDES 9-12

2. A Generator

Produce a sequence using an initial state

[Figure, built up across slides 9-12: starting from an initial state, the RNN emits the output sequence "I like cake" one step at a time, and a loss is computed on those outputs]

Maybe the previous output becomes the current input.

Examples: Text generation tasks
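
A rough sketch of the generator pattern with greedy decoding, where the previous output token is fed back as the current input; the start-of-sequence id, vocabulary size, and fixed output length of 3 are hypothetical.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 10_000, 50, 100
embed = nn.Embedding(vocab_size, emb_dim)
cell = nn.RNNCell(emb_dim, hidden_dim)
out = nn.Linear(hidden_dim, vocab_size)

h = torch.zeros(1, hidden_dim)        # the initial state
token = torch.tensor([0])             # hypothetical start-of-sequence id
generated = []
for _ in range(3):                    # produce, e.g., "I like cake"
    h = cell(embed(token), h)         # advance the state
    token = out(h).argmax(dim=-1)     # previous output becomes the current input
    generated.append(token.item())
```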

SLIDES 13-14

3. A Transducer

Convert a sequence into another sequence

[Figure, built up across slides 13-14: the RNN reads "I like cake" from an initial state and emits an output at every position, here the part-of-speech tags "Pronoun Verb Noun"; a loss is computed on the per-position outputs]
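
A rough sketch of the transducer pattern as a part-of-speech tagger: one prediction per input position, with the loss computed over all positions. The token ids and the three-tag tagset are hypothetical.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim, num_tags = 10_000, 50, 100, 3
embed = nn.Embedding(vocab_size, emb_dim)
rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
tagger = nn.Linear(hidden_dim, num_tags)

tokens = torch.tensor([[2, 5, 7]])               # standing in for "I like cake"
outputs, _ = rnn(embed(tokens))                  # one hidden state per input position
logits = tagger(outputs)                         # one tag prediction per position
gold = torch.tensor([[0, 1, 2]])                 # Pronoun, Verb, Noun
loss = nn.functional.cross_entropy(logits.view(-1, num_tags), gold.view(-1))
```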

SLIDES 15-17

4. Conditioned generator

Or an encoder-decoder: First encode a sequence, then generate another one

[Figure, built up across slides 15-17: the encoder first reads "I like cake" from an initial state; the decoder then produces a different sequence, here the Marathi translation "मला केक आवडतो" ("I like cake")]

Example: A building block for neural machine translation
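
A rough sketch of the conditioned-generator pattern: the encoder's final state initializes the decoder, which then generates the target sequence. Vocabulary sizes, token ids, greedy decoding, and the fixed output length are hypothetical simplifications of a real translation system.

```python
import torch
import torch.nn as nn

emb_dim, hidden_dim, src_vocab, tgt_vocab = 50, 100, 10_000, 12_000
src_embed = nn.Embedding(src_vocab, emb_dim)
tgt_embed = nn.Embedding(tgt_vocab, emb_dim)
encoder = nn.RNN(emb_dim, hidden_dim, batch_first=True)
decoder = nn.RNNCell(emb_dim, hidden_dim)
out = nn.Linear(hidden_dim, tgt_vocab)

src = torch.tensor([[2, 5, 7]])              # standing in for "I like cake"
_, h_n = encoder(src_embed(src))             # 1. encode the source sequence
h = h_n.squeeze(0)                           # encoder's final state conditions the decoder
token = torch.tensor([0])                    # hypothetical start-of-sequence id
translation = []
for _ in range(3):                           # 2. generate the target sequence
    h = decoder(tgt_embed(token), h)
    token = out(h).argmax(dim=-1)            # previous output becomes the next input
    translation.append(token.item())
```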

SLIDE 18

Stacking RNNs

• A commonly seen usage pattern
• An RNN takes an input sequence and produces an output sequence
• The input to an RNN can itself be the output of an RNN: stacked RNNs, also called deep RNNs
• Two or more layers often seem to improve prediction performance
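
A rough sketch of stacking, assuming PyTorch: the per-step output sequence of one RNN is the input sequence of the next; nn.RNN's num_layers argument builds the same structure directly.

```python
import torch
import torch.nn as nn

emb_dim, hidden_dim = 50, 100
layer1 = nn.RNN(emb_dim, hidden_dim, batch_first=True)
layer2 = nn.RNN(hidden_dim, hidden_dim, batch_first=True)

x = torch.randn(1, 3, emb_dim)          # an embedded 3-token input sequence
out1, _ = layer1(x)                     # output sequence of the first RNN...
out2, _ = layer2(out1)                  # ...is the input sequence of the second

# Equivalent stacked (deep) RNN in one module, with two layers:
stacked = nn.RNN(emb_dim, hidden_dim, num_layers=2, batch_first=True)
```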