

  1. CS6501: Deep Learning for Visual Recognition Recurrent Neural Networks (RNNs)

  2. Today’s Class • Recurrent Neural Network Cell • Recurrent Neural Networks (RNNs) • Bi-Directional Recurrent Neural Networks (Bi-RNNs) • Multiple-layer / Stacked / Deep Bi-Directional Recurrent Neural Networks • LSTMs and GRUs • Applications in Vision: Caption Generation.

  3. Recurrent Neural Network Cell [Diagram: an RNN cell takes the previous hidden state $h_{t-1}$ and the input $x_t$.]

  4. Recurrent Neural Network Cell $h_t = \tanh(W_{hh} h_{t-1} + W_{hx} x_t)$ [Diagram: the cell takes $h_{t-1}$ and $x_t$ and produces the new hidden state $h_t$.]

  5. Recurrent Neural Network Cell $h_t = \tanh(W_{hh} h_{t-1} + W_{hx} x_t)$, $y_t = \mathrm{softmax}(W_{hy} h_t)$ [Diagram: the cell additionally produces an output $y_t$ from the hidden state $h_t$.]
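The two cell equations above can be written out in a few lines of PyTorch. This is a minimal sketch with illustrative sizes (a 5-dimensional hidden state and a 5-word vocabulary); the weight names mirror the slide notation and are random stand-ins, not trained parameters:

```python
import torch

# One step of the RNN cell from the slides (assumed sizes: hidden size 5, vocab size 5).
hidden_size, vocab_size = 5, 5
W_hh = torch.randn(hidden_size, hidden_size)   # hidden-to-hidden weights
W_hx = torch.randn(hidden_size, vocab_size)    # input-to-hidden weights
W_hy = torch.randn(vocab_size, hidden_size)    # hidden-to-output weights

h_prev = torch.zeros(hidden_size)              # h_{t-1}, e.g. the initial state
x_t = torch.tensor([0., 0., 1., 0., 0.])       # one-hot input, e.g. the letter 'c'

h_t = torch.tanh(W_hh @ h_prev + W_hx @ x_t)   # new hidden state
y_t = torch.softmax(W_hy @ h_t, dim=0)         # output distribution over the vocabulary
```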

  6. Recurrent Neural Network Cell (worked example): $x_t = [0, 0, 1, 0, 0]$, $h_{t-1} = [0, 0, 0, 0, 0]$, $h_t = \tanh(W_{hh} h_{t-1} + W_{hx} x_t) = [0.1, 0.2, 0, -0.3, -0.1]$, $y_t = \mathrm{softmax}(W_{hy} h_t) = [0.1, 0.05, 0.05, 0.1, 0.7]$

  7. Recurrent Neural Network Cell (worked example, continued): with the vocabulary {a, b, c, d, e}, the one-hot input $x_t = [0, 0, 1, 0, 0]$ encodes the letter 'c', and the output $y_t = [0.1, 0.05, 0.05, 0.1, 0.7]$ assigns the highest probability, 0.7, to the letter 'e'.

  8. Recurrent Neural Network Cell (recap): $h_t = \tanh(W_{hh} h_{t-1} + W_{hx} x_t)$, $y_t = \mathrm{softmax}(W_{hy} h_t)$

  9. Recurrent Neural Network Cell (recap, output omitted): $h_t = \tanh(W_{hh} h_{t-1} + W_{hx} x_t)$

  10. Recurrent Neural Network Cell (recap): $h_t = \tanh(W_{hh} h_{t-1} + W_{hx} x_t)$

  11. (Unrolled) Recurrent Neural Network [Diagram: the same RNN cell is applied at every time step, producing $h_1, h_2, h_3$ from the initial state $h_0$ and the inputs $x_1, x_2, x_3$.]
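A plain-Python sketch of this unrolling, using the cell equation from the earlier slides; the function name and sizes are illustrative assumptions:

```python
import torch

# Unroll one RNN cell over a sequence: the same weights W_hh and W_hx are
# reused at every time step, only the hidden state h changes.
def rnn_forward(xs, h0, W_hh, W_hx):
    h, hs = h0, []
    for x_t in xs:                               # xs holds the inputs x_1 ... x_T
        h = torch.tanh(W_hh @ h + W_hx @ x_t)    # same cell applied at each step
        hs.append(h)
    return hs                                    # the hidden states h_1 ... h_T

# e.g. a length-3 sequence of 5-dimensional inputs with a 5-dimensional hidden state
hidden = rnn_forward([torch.randn(5) for _ in range(3)],
                     torch.zeros(5), torch.randn(5, 5), torch.randn(5, 5))
```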

  12. How can it be used? – e.g. Tagging a Text Sequence (one-to-one sequence mapping problems). [Diagram: the words "my", "car", "works" are fed as $x_1, x_2, x_3$; the hidden states $h_1, h_2, h_3$ produce the outputs $y_1, y_2, y_3$ = <<possessive>>, <<noun>>, <<verb>>.]

  13. How can it be used? – e.g. Tagging a Text Sequence (one-to-one sequence mapping problems). Training examples don’t need to be the same length! A tagging sketch follows the table below.
      input                              → output
      my car works                       → <<possessive>> <<noun>> <<verb>>
      my dog ate the assignment          → <<possessive>> <<noun>> <<verb>> <<pronoun>> <<noun>>
      my mother saved the day            → <<possessive>> <<noun>> <<verb>> <<pronoun>> <<noun>>
      the smart kid solved the problem   → <<pronoun>> <<qualifier>> <<noun>> <<verb>> <<pronoun>> <<noun>>
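As noted above, a one-to-one tagger can be sketched in PyTorch by keeping every per-step hidden state and mapping each one to a tag. The sizes (1000 words, 20 tags) anticipate the next slides; the module names are illustrative assumptions:

```python
import torch
import torch.nn as nn

# One-to-one tagging sketch: the RNN produces a hidden state for every word,
# and a linear layer maps each hidden state to a tag distribution.
# Assumed sizes: 1000-word vocabulary, 20 tags, 64-d embeddings, 128-d hidden state.
vocab_size, embed_dim, hidden_size, num_tags = 1000, 64, 128, 20
embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)
tagger = nn.Linear(hidden_size, num_tags)

words = torch.randint(0, vocab_size, (1, 3))   # token ids for e.g. "my car works"
hs, _ = rnn(embed(words))                      # hs: 1 x 3 x 128, one h_t per word
tag_logits = tagger(hs)                        # 1 x 3 x 20, one tag prediction per word
```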

  14. How can it be used? – e.g. Tagging a Text Sequence (one-to-one sequence mapping problems). Training examples don’t need to be the same length!
      L(my car works) = 3                        L(<<possessive>> <<noun>> <<verb>>) = 3
      L(my dog ate the assignment) = 5           L(<<possessive>> <<noun>> <<verb>> <<pronoun>> <<noun>>) = 5
      L(my mother saved the day) = 5             L(<<possessive>> <<noun>> <<verb>> <<pronoun>> <<noun>>) = 5
      L(the smart kid solved the problem) = 6    L(<<pronoun>> <<qualifier>> <<noun>> <<verb>> <<pronoun>> <<noun>>) = 6

  15. How can it be used? – e.g. Tagging a Text Sequence (one-to-one sequence mapping problems). Training examples don’t need to be the same length! If we assume a vocabulary of 1000 possible words and 20 possible output tags:
      input T: 1000 x 3   → output T: 20 x 3
      input T: 1000 x 5   → output T: 20 x 5
      input T: 1000 x 5   → output T: 20 x 5
      input T: 1000 x 6   → output T: 20 x 6

  16. How can it be used? – e.g. Tagging a Text Sequence (one-to-one sequence mapping problems). With the input tensors of shape 1000 x L and output tensors of shape 20 x L from the previous slide: how do we create batches if inputs and outputs have different shapes?

  17. How can it be used? – e.g. Tagging a Text Sequence (one-to-one sequence mapping problems). How do we create batches if inputs and outputs have different shapes? Solution 1: forget about batches, just process examples one by one.

  18. How can it be used? – e.g. Tagging a Text Sequence (one-to-one sequence mapping problems). How do we create batches if inputs and outputs have different shapes? Solution 2: zero padding. The four input tensors above can be stacked into a single tensor T: 4 x 1000 x 6.
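A minimal sketch of zero padding with torch.nn.utils.rnn.pad_sequence, assuming 1000-dimensional word vectors; note that pad_sequence puts the time axis before the feature axis, giving 4 x 6 x 1000 rather than the 4 x 1000 x 6 layout written on the slide:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Four variable-length sequences of 1000-dimensional word vectors (lengths 3, 5, 5, 6),
# standing in for the four tagging examples above.
seqs = [torch.randn(3, 1000),
        torch.randn(5, 1000),
        torch.randn(5, 1000),
        torch.randn(6, 1000)]

# Shorter sequences are padded with zeros up to the longest length (6).
batch = pad_sequence(seqs, batch_first=True)   # shape: 4 x 6 x 1000
```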

  19. How can it be used? – e.g. Tagging a Text Sequence (one-to-one sequence mapping problems). How do we create batches if inputs and outputs have different shapes? Solution 3 (advanced): dynamic batching or auto-batching. https://dynet.readthedocs.io/en/latest/tutorials_notebooks/Autobatching.html

  20. How can it be used? – e.g. Tagging a Text Sequence (one-to-one sequence mapping problems). Solution 4: PyTorch stacking, padding, and sorting combination.

  21. How can it be used? – e.g. Tagging a Text Sequence (one-to-one sequence mapping problems). Solution 4 (continued): PyTorch stacking, padding, and sorting combination.
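A sketch of this pad / pack workflow in PyTorch, assuming 1000-dimensional inputs and a hidden size of 64; with enforce_sorted=False, PyTorch handles the sorting by length internally:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Pad the variable-length sequences, pack them, run the RNN, then unpack the outputs.
lengths = torch.tensor([3, 5, 5, 6])
seqs = [torch.randn(int(n), 1000) for n in lengths]     # 1000-d vectors per word

padded = pad_sequence(seqs, batch_first=True)           # 4 x 6 x 1000
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)     # sorting handled for us

rnn = nn.RNN(input_size=1000, hidden_size=64, batch_first=True)
packed_out, h_n = rnn(packed)                           # padding steps are skipped
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)   # 4 x 6 x 64
```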

  22. PyTorch RNN
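For reference, a minimal sketch of torch.nn.RNN showing the tensor shapes involved; all sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

# torch.nn.RNN with batch_first=True expects input of shape (batch, time, features).
rnn = nn.RNN(input_size=1000, hidden_size=64, num_layers=1, batch_first=True)

x = torch.randn(4, 6, 1000)      # 4 sequences, 6 time steps, 1000-d inputs
h0 = torch.zeros(1, 4, 64)       # initial hidden state: (num_layers, batch, hidden)

output, h_n = rnn(x, h0)         # output: 4 x 6 x 64, the hidden state at every step
                                 # h_n:    1 x 4 x 64, the final hidden state
```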

  23. How can it be used? – e.g. Scoring the Sentiment of a Text Sequence (many-to-one, sequence-to-score problems). [Diagram: the words "the", "cat", "likes", ..., <<EOS>> are fed as $x_1, \ldots, x_T$; the final hidden state $h_T$ predicts a positive / negative sentiment rating.]

  24. How can it be used? – e.g. Sentiment Scoring (many-to-one mapping problems). Input training examples don’t need to be the same length! In this case the outputs can all be the same size (a single label). A classifier sketch follows the table below.
      input                                 → output
      this restaurant has good food         → Positive
      this restaurant is bad                → Negative
      this restaurant is the worst          → Negative
      this restaurant is well recommended   → Positive
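A sketch of such a many-to-one classifier in PyTorch: the RNN runs over the whole sentence and only its final hidden state feeds the sentiment prediction. Class and parameter names are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Many-to-one sentiment sketch: classify from the final hidden state only.
# Assumed sizes: 1000-word vocabulary, 64-d embeddings, 128-d hidden state, 2 classes.
class SentimentRNN(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_size=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):               # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        _, h_n = self.rnn(x)                    # h_n: (1, batch, hidden), last step only
        return self.classifier(h_n.squeeze(0))  # logits: (batch, num_classes)

model = SentimentRNN()
logits = model(torch.randint(0, 1000, (4, 6)))  # 4 padded sentences of 6 tokens
```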

  25. How can it be used? – e.g. Text Generation. Auto-regressive model: sequence-to-sequence during training, auto-regressive during testing. DURING TRAINING [Diagram: the inputs <START>, "The", "world", "is", "not", "enough" are fed as $x_1, \ldots, x_6$; the hidden states $h_1, \ldots, h_6$ produce the target outputs $y_1, \ldots, y_6$ = "The", "world", "is", "not", "enough", <END>.]

  26. How can it be used? – e.g. Text Generation (auto-regressive models). Input training examples don’t need to be the same length! In this case each output is the same length as its input: the input is the sentence prefixed with <START> and the output is the same sentence followed by <END>. A training sketch follows the table below.
      input                                         → output
      <START> this restaurant has good food         → this restaurant has good food <END>
      <START> this restaurant is bad                → this restaurant is bad <END>
      <START> this restaurant is the worst          → this restaurant is the worst <END>
      <START> this restaurant is well recommended   → this restaurant is well recommended <END>
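A sketch of this training setup (teacher forcing): the token sequence is shifted by one position so the model is always asked to predict the next ground-truth token. The sizes and the random token ids are stand-in assumptions:

```python
import torch
import torch.nn as nn

# Teacher forcing sketch: input is <START> + sentence, target is sentence + <END>,
# i.e. the same token sequence shifted by one position.
vocab_size, embed_dim, hidden_size = 1000, 64, 128
embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, vocab_size)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (1, 7))    # ids standing in for <START> the world is not enough <END>
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens up to t

out, _ = rnn(embed(inputs))                      # out: 1 x 6 x 128
logits = head(out)                               # 1 x 6 x 1000
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
```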

  27. How can it be used? – e.g. Text Generation. Auto-regressive model: sequence-to-sequence during training, auto-regressive during testing. DURING TESTING [Diagram, step 1: only <START> is fed as the first input $x_1$, together with the initial hidden state $h_0$.]

  28. How can it be used? – e.g. Text Generation. Auto-regressive model: sequence-to-sequence during training, auto-regressive during testing. DURING TESTING [Diagram, step 2: the RNN produces $h_1$ and the first output $y_1$ = "The".]

  29. How can it be used? – e.g. Text Generation. Auto-regressive model: sequence-to-sequence during training, auto-regressive during testing. DURING TESTING [Diagram, step 3: the predicted token "The" is fed back as the next input $x_2$.]

  30. How can it be used? – e.g. Text Generation. Auto-regressive model: sequence-to-sequence during training, auto-regressive during testing. DURING TESTING [Diagram, step 4: the second RNN step produces $h_2$ and the output $y_2$ = "world".]

  31. How can it be used? – e.g. Text Generation. Auto-regressive model: sequence-to-sequence during training, auto-regressive during testing. DURING TESTING [Diagram: repeating this feed-back loop generates "The", "world", "is", "not", "enough" and finally <END>.]
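A sketch of this test-time loop as greedy decoding in PyTorch; the module sizes and the START_ID / END_ID token ids are illustrative assumptions, not values from the slides:

```python
import torch
import torch.nn as nn

# Auto-regressive decoding sketch: start from <START>, pick the most likely next
# token, feed it back in, and stop at <END> or a length limit.
vocab_size, embed_dim, hidden_size = 1000, 64, 128
embed = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, vocab_size)
START_ID, END_ID, MAX_LEN = 1, 2, 20            # assumed special token ids

generated = []
token = torch.tensor([[START_ID]])
h = None                                        # initial hidden state (zeros)
for _ in range(MAX_LEN):
    out, h = rnn(embed(token), h)               # one RNN step, carrying h forward
    token = head(out[:, -1]).argmax(dim=-1, keepdim=True)   # greedy choice of y_t
    if token.item() == END_ID:
        break
    generated.append(token.item())
```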

  32. Character-level Models [Diagram: the characters 'c', 'a', 't' are fed as $x_1, x_2, x_3$; the outputs $y_1, y_2, y_3$ are 'a', 't', <<space>>, i.e. at each step the model predicts the next character.]
