Day 4, Lecture 6: Attention Models
Attention Models: Motivation
Image: H x W x 3. The whole input volume is used to predict the output ("bird")... despite the fact that not all pixels are equally important.
Attention Models: Motivation
Attention models can relieve the computational burden. Helpful when processing big images!
Encoder & Decoder
From the previous lecture: the whole input sentence is used to produce the translation.
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Attention Models
Bahdanau et al. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015
Kyunghyun Cho, “Introduction to Neural Machine Translation with GPUs” (2015)
Attention Models
Idea: focus on different parts of the input as you make/refine predictions over time.
E.g. image captioning: “A bird flying over a body of water”
LSTM Decoder
CNN features (dimension D) → LSTM → LSTM → LSTM → ... producing “A bird flying ... <EOS>”
The LSTM decoder “sees” the input only at the beginning!
Attention for Image Captioning
A CNN turns the image (H x W x 3) into a grid of features (L x D). At each step, the hidden state h_{t-1} predicts attention weights a_t over the L locations; the weighted combination of features gives a context vector z_t (dimension D). The context vector z_t and the previously predicted word y_t are fed to the LSTM, which updates the hidden state and predicts the next word y_{t+1}.
Slide Credit: CS231n
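The attention step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the model from the paper: the dot-product scoring between features and hidden state is a simplifying assumption (Xu et al. use a small learned network to produce the scores), and all shapes are toy-sized.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_step(features, h):
    """One soft-attention step for captioning.

    features: (L, D) grid of CNN features; h: (D,) decoder hidden state.
    Returns the attention weights a (L,) and the context vector z (D,).
    """
    scores = features @ h   # (L,) relevance of each location (simplified scoring)
    a = softmax(scores)     # distribution over the L grid locations
    z = a @ features        # (D,) weighted combination of features
    return a, z

L, D = 4, 8
rng = np.random.default_rng(0)
features = rng.standard_normal((L, D))
h = rng.standard_normal(D)
a, z = attention_step(features, h)
assert np.isclose(a.sum(), 1.0) and z.shape == (D,)
```

At decoding time this step runs once per word: z is concatenated with (or fed alongside) the previous word embedding into the LSTM, and the new hidden state produces the next set of attention weights.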
Attention for Image Captioning
Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015
Soft Attention
Soft attention: summarize ALL locations. The CNN gives a grid of features a, b, c, d (each D-dimensional) from the image (H x W x 3); the RNN gives a distribution over grid locations p_a, p_b, p_c, p_d with p_a + p_b + p_c + p_d = 1. The context vector (D-dimensional) is
z = p_a·a + p_b·b + p_c·c + p_d·d
This is a differentiable function, and the derivative dz/dp is nice, so we can train with gradient descent.
● Still uses the whole input!
● Constrained to a fixed grid.
Xu et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. ICML 2015
Slide Credit: CS231n
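Why the derivative is “nice” can be checked directly: z is linear in the weights, so dz/dp_a is simply the feature vector a. A toy NumPy check (the 2x2 grid and the specific p values are made up for illustration):

```python
import numpy as np

# 2x2 grid of D-dimensional features a, b, c, d, stacked as rows
D = 3
rng = np.random.default_rng(1)
feats = rng.standard_normal((4, D))

def context(p):
    """z = p_a*a + p_b*b + p_c*c + p_d*d (soft-attention summary)."""
    return p @ feats

p = np.array([0.1, 0.2, 0.3, 0.4])  # distribution over locations, sums to 1
z = context(p)

# Finite-difference check: dz/dp_a equals the feature vector a,
# so gradients flow and the model trains with plain backprop.
eps = 1e-6
dp = np.zeros(4); dp[0] = eps
numeric = (context(p + dp) - context(p)) / eps
assert np.allclose(numeric, feats[0], atol=1e-4)
```

Contrast this with hard attention on the next slide, where the crop operation has no useful gradient at all.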
Hard Attention
Hard attention: sample a subset of the input. Predict box coordinates (xc, yc, w, h), then crop and rescale the input image (H x W x 3) to X x Y x 3.
Not a differentiable function! The gradient is 0 almost everywhere and undefined at the jumps (like a step function), so we can’t train with backprop :( and need reinforcement learning instead.
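The zero-gradient problem can be seen directly in a toy NumPy sketch (the rounding to pixel indices below stands in for the discrete selection any crop must make): nudging the predicted box coordinate leaves the crop bit-for-bit unchanged, so backprop gets no signal.

```python
import numpy as np

def hard_crop(image, xc, yc, w, h):
    """Hard attention: crop a box; coordinates get rounded to whole pixels."""
    x0 = int(round(xc - w / 2))
    y0 = int(round(yc - h / 2))
    return image[y0:y0 + h, x0:x0 + w]

rng = np.random.default_rng(2)
image = rng.standard_normal((16, 16))

crop1 = hard_crop(image, 8.0, 8.0, 4, 4)
crop2 = hard_crop(image, 8.1, 8.0, 4, 4)  # nudge xc slightly

# The output did not change at all -> d(output)/d(xc) is 0 almost
# everywhere (and undefined where the crop jumps by a whole pixel).
assert np.array_equal(crop1, crop2)
```

This is exactly why hard attention is trained with reinforcement learning (score-function / REINFORCE-style estimators) rather than backprop.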
Hard Attention
Classify images by attending to arbitrary regions of the input; generate images by attending to arbitrary regions of the output.
Gregor et al. DRAW: A Recurrent Neural Network For Image Generation. ICML 2015
Hard Attention
Read text, generate handwriting using an RNN that attends to different arbitrary regions over time (REAL vs. GENERATED samples).
Graves. Generating Sequences with Recurrent Neural Networks. arXiv 2013
Spatial Transformer Networks
Hard attention (predict box coordinates (xc, yc, w, h), crop and rescale the input image H x W x 3 to X x Y x 3) is not a differentiable function, so we can’t train with backprop :(
Spatial Transformer Networks make the cropping differentiable, so we can train with backprop :)
Jaderberg et al. Spatial Transformer Networks. NIPS 2015
Spatial Transformer Networks
Idea: a function mapping output pixel coordinates (xt, yt) to input pixel coordinates (xs, ys). The network attends to the input by predicting the parameters θ of this mapping. Can we make this function differentiable? Repeat for all pixels in the output to get a sampling grid, then use bilinear interpolation to compute the output (the cropped and rescaled image, X x Y x 3).
Jaderberg et al. Spatial Transformer Networks. NIPS 2015
Slide Credit: CS231n
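A minimal NumPy sketch of the grid-plus-bilinear-sampling idea, assuming an affine parameterisation of θ (a 2x3 matrix, one of the choices in the paper); the explicit loops are for clarity, not efficiency, and there is no boundary handling beyond clamping:

```python
import numpy as np

def bilinear_sample(img, xs, ys):
    """Interpolate img at continuous input coordinates (xs, ys)."""
    H, W = img.shape
    x0, y0 = int(np.floor(xs)), int(np.floor(ys))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)  # clamp at the border
    wx, wy = xs - x0, ys - y0
    return ((1 - wx) * (1 - wy) * img[y0, x0] + wx * (1 - wy) * img[y0, x1] +
            (1 - wx) * wy * img[y1, x0] + wx * wy * img[y1, x1])

def spatial_transform(img, theta, out_h, out_w):
    """Map each output pixel (xt, yt) to input coords (xs, ys) = theta @ [xt, yt, 1],
    then bilinearly interpolate. Smooth in theta, so gradients flow."""
    out = np.zeros((out_h, out_w))
    for yt in range(out_h):
        for xt in range(out_w):
            xs, ys = theta @ np.array([xt, yt, 1.0])
            out[yt, xt] = bilinear_sample(img, xs, ys)
    return out

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
out = spatial_transform(img, identity, 8, 8)
assert np.allclose(out, img)  # identity theta reproduces the input
```

Because the output is a smooth function of the sampling coordinates, and the coordinates are a smooth function of θ, the whole module is differentiable and trains end-to-end with backprop, unlike the hard crop.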
Spatial Transformer Networks
Insert spatial transformers into a classification network and it learns to attend to and transform the input. A differentiable module: easy to incorporate in any network, anywhere!
Jaderberg et al. Spatial Transformer Networks. NIPS 2015
Spatial Transformer Networks
Fine-grained classification
Jaderberg et al. Spatial Transformer Networks. NIPS 2015
Visual Attention
Visual Question Answering: Zhu et al. Visual7w: Grounded Question Answering in Images. arXiv 2016
Visual Attention
Action recognition in videos: Sharma et al. Action Recognition Using Visual Attention. arXiv 2016
Salient object detection: Kuen et al. Recurrent Attentional Networks for Saliency Detection. CVPR 2016
Other Examples
Attention to scale for semantic segmentation: Chen et al. Attention to Scale: Scale-aware Semantic Image Segmentation. CVPR 2016
Semantic attention for image captioning: You et al. Image Captioning with Semantic Attention. CVPR 2016
Resources
● CS231n Lecture @ Stanford [slides][video]
● More on Reinforcement Learning
● Soft vs. Hard attention
● Handwriting generation demo
● Spatial Transformer Networks - Slides & Video by Victor Campos
● Attention implementations:
○ Seq2seq in Keras
○ DRAW & Spatial Transformers in Keras
○ DRAW in Lasagne
○ DRAW in Tensorflow