AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification
Xiaofang Wang, Xuehan Xiong, Maxim Neumann, AJ Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani, Wei Hua
Convolutional networks are dominant
• C3D [ICCV 2015] • I3D [CVPR 2017] • S3D [ECCV 2018] • SlowFast [ICCV 2019]
What’s missing from convolution?
• Where to focus in images/videos: the same convolutional kernel is applied at every position.
• Long-range dependencies: these are modeled only by large receptive fields.
Photo credit: [Convolution arithmetic] [Receptive field arithmetic]
Attention is complementary to convolution
• Map-based Attention (CBAM [ECCV 2018]): where to focus. Learn a pointwise weighting factor for each position.
• Dot-product Attention (Attention Is All You Need [NeurIPS 2017]): long-range dependencies. Compute pairwise similarity between all the positions.
Challenge: many design choices need to be determined to apply attention to videos
• What is the right dimension to apply attention to videos? Three dimensions in video data: spatial, temporal, or spatiotemporal.
• How to compose multiple attention operations? Sequential, parallel, or others?
[Figure: sequential composition (Spatial Attention followed by Temporal Attention) vs. parallel composition (Spatial and Temporal Attention side by side)]
Proposal: automatically search for attention cells in a data-driven manner
• Novel attention cell search space
• Efficient differentiable search method
[Figure: a searched attention cell (Op1, Op2, Op3 feeding a Combine node) and the search supergraph of spatial/temporal attention nodes with a Sink Node]
Attention Cell Search Space
Attention cell:
• Composed of multiple attention operations
• Input shape == output shape; can be inserted anywhere in existing backbones
Search space:
• Cell-level search space: connectivity between the operations within the cell
• Operation-level search space: choices to instantiate an individual attention operation
Cell-Level Search Space: select the input to each operation
• Input to the 1st operation is fixed to the input of the cell
• Input to the k-th operation is a weighted sum of selected feature maps from the cell input and the outputs of previous operations
• Combine node: concatenate channels + CONV
[Figure: Op1, Op2, Op3 feeding a Combine node; the cell input at the bottom, the cell output at the top]
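A minimal sketch of this cell-level wiring in PyTorch (an assumption; the slide gives no implementation, so the edge parameterization, normalization, and all names here are illustrative):

```python
import torch
import torch.nn as nn

class AttentionCell(nn.Module):
    """Cell-level wiring sketch: op 1 reads the cell input; op k reads a
    weighted sum of the cell input and the outputs of ops 1..k-1; the
    Combine node concatenates all op outputs on channels and projects back
    to the input width, so output shape == input shape."""

    def __init__(self, channels, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)  # each op maps (B,C,T,H,W) -> (B,C,T,H,W)
        # one learnable weight per (operation, candidate input) edge (assumed)
        self.edge_w = nn.ParameterList(
            [nn.Parameter(torch.ones(k + 1)) for k in range(len(ops))])
        self.combine = nn.Conv3d(channels * len(ops), channels, kernel_size=1)

    def forward(self, x):
        feats = [x]  # candidate inputs so far: cell input + earlier op outputs
        for op, w in zip(self.ops, self.edge_w):
            alpha = torch.softmax(w, dim=0)            # normalize edge weights
            feats.append(op(sum(a * f for a, f in zip(alpha, feats))))
        # Combine node: concatenate channels of all op outputs + CONV
        return self.combine(torch.cat(feats[1:], dim=1))
```

For op 1 the softmax runs over a single edge, so its input reduces to the cell input, matching the fixed connection on the slide.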
Operation-Level Search Space
• Attention Dimension: 1. Spatial 2. Temporal 3. Spatiotemporal
• Attention Operation Type: Map-based Attention, Dot-product Attention
Map-based Attention and Dot-product Attention (assume attention dimension = temporal)
• Map-based Attention: where to focus. Learn a pointwise weighting factor for each position.
• Dot-product Attention: long-range dependencies. Compute pairwise similarity between all the positions.
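The slide describes both operation types only at a high level. Below is one plausible PyTorch instantiation with the attention dimension set to temporal, as the slide assumes; the projections (`score`, `q_proj`, `k_proj`, `v_proj`) and the choice of sigmoid (one of the searched activation functions) are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def map_based_temporal(x, score):
    """Map-based attention along time: learn one weighting factor per frame
    ("where to focus") and rescale the features.
    x: (B, C, T, H, W); score: e.g. nn.Linear(C, 1) on frame descriptors."""
    ctx = x.mean(dim=(3, 4)).transpose(1, 2)      # (B, T, C) frame descriptors
    w = torch.sigmoid(score(ctx))                 # (B, T, 1) per-frame weights
    return x * w.transpose(1, 2).unsqueeze(-1).unsqueeze(-1)  # broadcast

def dot_product_temporal(x, q_proj, k_proj, v_proj):
    """Dot-product attention along time: each spatial location attends over
    all T frames at that location (pairwise similarity between positions)."""
    b, c, t, h, w = x.shape
    seq = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, t, c)  # fold space into batch
    q, k, v = q_proj(seq), k_proj(seq), v_proj(seq)          # e.g. nn.Linear(C, C)
    attn = F.softmax(q @ k.transpose(1, 2) / c ** 0.5, dim=-1)  # (B*H*W, T, T)
    out = (attn @ v).reshape(b, h, w, t, c).permute(0, 4, 3, 1, 2)
    return x + out                                           # residual connection
```

Swapping the temporal axis for the spatial one (or attending over all T·H·W positions at once) gives the spatial and spatiotemporal variants of the same two operations.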
Search Space Summary
• Attention Dimension: Spatial, Temporal, Spatiotemporal
• Attention Operation Type: Map-based attention, Dot-product attention
• Activation Function: None, ReLU, Softmax, Sigmoid
• Connectivity between Operations: input to each operation
Insert Attention Cells into Backbone Networks
Since a cell's output shape matches its input shape, cells can be inserted between the convolutional layers of an existing backbone.
[Figure: Convolutional Layers, then Attention Cell, then Convolutional Layers]
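Because the cell is shape-preserving, inserting it is a purely mechanical edit to the backbone. A hedged sketch (all names are illustrative; the slide does not prescribe an API):

```python
import torch.nn as nn

def insert_cells(backbone_stages, make_cell, positions):
    """Interleave attention cells with backbone stages. Since a cell's
    output shape equals its input shape, it can be dropped in anywhere
    without changing the rest of the network."""
    layers = []
    for i, stage in enumerate(backbone_stages):
        layers.append(stage)
        if i in positions:               # e.g. after 5 chosen stages
            layers.append(make_cell())   # shape-preserving attention cell
    return nn.Sequential(*layers)
```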
Differentiable Formulation of the Search Space
• Search algorithm: differentiable architecture search
• Search cost: equal to the cost of training one network
[Figure: search supergraph. Legend: solid connection (no weights); level connection weights; sink connection weights; spatial/temporal map-based and dot-product attention nodes]
Supergraph and Connection Weights
• Supergraph: L levels; each level contains N nodes
• Node: an attention operation of a pre-defined attention dimension and type
[Figure: supergraph from Input to Sink Node. Legend: solid connection (no weights); level connection weights; sink connection weights; map-based and dot-product attention nodes]
Differentiable Search
• Jointly train the network weights and the connection weights with gradient descent
[Figure: the supergraph inserted between the convolutional layers of the backbone]
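A sketch of the continuous relaxation behind this (hypothetical PyTorch; the exact normalization of the connection weights is an assumption). Each node at level l > 0 reads a weighted sum of the previous level's outputs via the level connection weights, and the sink node mixes every node's output via the sink connection weights:

```python
import torch
import torch.nn as nn

class Supergraph(nn.Module):
    """Relaxed supergraph sketch: num_levels levels, each holding the same
    set of candidate attention nodes. Level 0 reads the input directly
    (solid connections); later levels mix the previous level via level
    connection weights; the sink node mixes all nodes via sink weights."""

    def __init__(self, make_level_ops, num_levels):
        super().__init__()
        self.levels = nn.ModuleList(
            nn.ModuleList(make_level_ops()) for _ in range(num_levels))
        n = len(self.levels[0])
        self.level_w = nn.Parameter(torch.zeros(num_levels, n, n))
        self.sink_w = nn.Parameter(torch.zeros(num_levels * n))

    def forward(self, x):
        outs, prev = [], None
        for l, ops in enumerate(self.levels):
            if l == 0:
                inputs = [x] * len(ops)                # solid connections
            else:
                alpha = torch.softmax(self.level_w[l], dim=-1)
                inputs = [sum(a * p for a, p in zip(alpha[j], prev))
                          for j in range(len(ops))]
            prev = [op(inp) for op, inp in zip(ops, inputs)]
            outs.extend(prev)
        beta = torch.softmax(self.sink_w, dim=0)       # sink connection weights
        return sum(b * o for b, o in zip(beta, outs))  # sink node output
```

Since `model.parameters()` already includes `level_w` and `sink_w`, a single optimizer over all parameters updates the architecture and the network weights in the same backward pass, which is why the search costs roughly as much as training one network.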
Attention Cell Design Derivation
How to derive the attention cell design from the learned connection weights?
[Figure: supergraph with the legend as above]
Attention Cell Design Derivation (step 1)
Choose the top k (e.g., 3) nodes based on the sink connection weights.
Attention Cell Design Derivation (step 2)
Recursively choose the top m (e.g., 2) predecessors of each selected node based on the level connection weights, until we reach the first level.
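The two-step derivation is easy to express in code. A sketch under the weight conventions implied by the slides (names and the exact edge-weight layout are assumptions):

```python
import torch

def derive_cell(level_w, sink_w, top_nodes=3, top_preds=2):
    """Keep the top `top_nodes` nodes by sink connection weight, then
    recursively keep the top `top_preds` predecessors of every kept node
    by level connection weight until the first level is reached.
    level_w: (L, N, N), where level_w[l, j, i] weights the edge from
    node i at level l-1 to node j at level l; sink_w: (L, N).
    Returns the kept (level, node) pairs."""
    L, N = sink_w.shape
    flat = sink_w.flatten().topk(top_nodes).indices
    kept = {(int(i) // N, int(i) % N) for i in flat}
    frontier = list(kept)
    while frontier:
        level, node = frontier.pop()
        if level == 0:
            continue                 # first level: reads the cell input
        for p in level_w[level, node].topk(top_preds).indices:
            key = (level - 1, int(p))
            if key not in kept:
                kept.add(key)
                frontier.append(key)
    return sorted(kept)
```

The defaults mirror the slides' examples (top 3 nodes, top 2 predecessors per node).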
Attention Cell Design Derivation (result)
[Figure: the derived attention cell, composed of the selected spatial/temporal map-based and dot-product attention operations feeding a Combine node]
Experimental Setup
• Backbones: Inception-based I3D [CVPR 2017] and S3D [ECCV 2018]; insert 5 attention cells into each
• Datasets: Kinetics-600 and Moments in Time (MiT)
Comparison with Non-local Blocks
Generalization across Modalities: RGB to optical flow
Generalization across Backbones
Generalization across Datasets
Comparison with State-of-the-art
Contributions
• Extend NAS beyond discovering convolutional cells to attention cells
• A search space for spatiotemporal attention cells
• A differentiable formulation of the search space
• State-of-the-art performance; outperforms non-local blocks
• Strong generalization across modalities, backbones, and datasets
• More analysis and visualizations of attention cells are available in the paper