Beyond Short Snippets: Deep Networks for Video Classification Joe Yue-Hei Ng, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, George Toderici Presented by Özge Yalçınkaya
Introduction ✤ Many attempts have been made to apply CNNs to action recognition ✤ Video frames are treated as still images, with a CNN producing per-frame descriptions ✤ Per-frame predictions are averaged at the video level (a minimal sketch of this baseline follows) ✤ However, averaging discards temporal ordering, so complete action information is missing
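As a hedged illustration of this frame-averaging baseline (not the authors' code), the sketch below assumes a generic `cnn` module that maps a batch of frames to class logits, classifies each frame independently, and averages the softmax scores over time:

```python
import torch
import torch.nn.functional as F

def video_prediction(cnn, frames):
    """Frame-averaging baseline: frames is a (T, 3, H, W) tensor of T frames."""
    logits = cnn(frames)              # (T, num_classes) per-frame class logits
    probs = F.softmax(logits, dim=1)  # per-frame class probabilities
    return probs.mean(dim=0)          # average over time -> (num_classes,)
```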
Introduction ✤ For accurate video classification, learning a global description of the video's temporal information is important ✤ Using an increasing number of frames improves classification ✤ Moreover, optical flow images may provide additional motion information
Introduction ✤ Two approaches are introduced: ➡ Feature Pooling ➡ LSTM ✤ State-of-the-art performance on Sports-1M and UCF-101 ✤ AlexNet and GoogLeNet are used as the base CNNs
Approach: Feature Pooling Architectures ✤ Conv Pooling: ➡ Performs max-pooling over the final convolutional-layer features across all frames (blue) ➡ Feeds the pooled features to FC layers (yellow), as sketched below
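A minimal PyTorch sketch of Conv Pooling, assuming per-frame features from the final CNN layer are already flattened to `feat_dim`; 4096 and the 487 Sports-1M classes are placeholder sizes:

```python
import torch
import torch.nn as nn

class ConvPooling(nn.Module):
    """Max-pool the final CNN-layer features across time, then classify."""
    def __init__(self, feat_dim=4096, num_classes=487):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, frame_feats):         # (T, feat_dim), one row per frame
        pooled, _ = frame_feats.max(dim=0)  # element-wise max over all frames
        return self.fc(pooled)              # video-level class scores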
Approach: Feature Pooling Architectures ✤ Late Pooling: ➡ Performs max-pooling (blue) after two FC layers (yellow), as sketched below ➡ Compared to Conv Pooling, it directly combines high-level information across frames
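The same sketch reordered for Late Pooling: the FC stack runs per frame first and the max-pool comes last (sizes again are assumptions):

```python
import torch
import torch.nn as nn

class LatePooling(nn.Module):
    """Run two FC layers on each frame, then max-pool across time."""
    def __init__(self, feat_dim=4096, num_classes=487):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feat_dim, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
        )
        self.classifier = nn.Linear(4096, num_classes)

    def forward(self, frame_feats):       # (T, feat_dim)
        per_frame = self.fc(frame_feats)  # high-level features per frame
        pooled, _ = per_frame.max(dim=0)  # max-pool after the FC layers
        return self.classifier(pooled)
```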
Approach: Feature Pooling Architectures ✤ Slow Pooling: ➡ First, max-pooling (blue) is applied over 10-frame windows on top of the CNN features (like a size-10 temporal filter) ➡ Each pooled window is followed by an FC layer (yellow) ➡ A single max-pooling layer then combines the FC outputs ➡ Groups local temporal features before combining high-level information (sketch below)
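A sketch of Slow Pooling, under the assumption of non-overlapping 10-frame windows (the window stride is not specified on the slide):

```python
import torch
import torch.nn as nn

class SlowPooling(nn.Module):
    """Max-pool 10-frame windows, apply an FC layer per window, then
    combine the window outputs with a final max-pool."""
    def __init__(self, feat_dim=4096, hidden=4096, num_classes=487, window=10):
        super().__init__()
        self.window = window
        self.fc = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, frame_feats):  # (T, feat_dim), T divisible by window
        T, D = frame_feats.shape
        groups = frame_feats.view(T // self.window, self.window, D)
        local, _ = groups.max(dim=1)  # local max-pool per 10-frame window
        h = self.fc(local)            # FC layer on each pooled window
        video, _ = h.max(dim=0)       # single max-pool combines the outputs
        return self.classifier(video)
```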
Approach: Feature Pooling Architectures ✤ Local Pooling: ➡ Combines frame-level features locally, as in Slow Pooling (blue) ➡ A softmax (orange) layer is connected to all FC (yellow) layers for the final prediction (sketch below)
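A sketch of Local Pooling with assumed sizes: frames are pooled in local windows as above, but the final classifier is connected to the concatenation of all the window-level FC outputs:

```python
import torch
import torch.nn as nn

class LocalPooling(nn.Module):
    """Pool frames locally, run an FC layer per window, and connect one
    classifier to all window outputs at once."""
    def __init__(self, feat_dim=4096, hidden=4096, num_classes=487,
                 window=10, num_windows=3):
        super().__init__()
        self.window = window
        self.fc = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        # the softmax classifier sees every window's FC output, concatenated
        self.classifier = nn.Linear(hidden * num_windows, num_classes)

    def forward(self, frame_feats):  # (window * num_windows, feat_dim)
        D = frame_feats.shape[1]
        groups = frame_feats.view(-1, self.window, D)
        local, _ = groups.max(dim=1)         # local max-pool (blue)
        h = self.fc(local)                   # (num_windows, hidden)
        return self.classifier(h.flatten())  # final prediction (orange)
```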
Approach: Feature Pooling Architectures ✤ Time-Domain Convolution: ➡ An extra time-domain convolutional layer (green) ➡ Max-pooling across frames in the temporal domain (blue) ➡ Captures local relationships between neighboring frames (sketch below)
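A sketch of Time-Domain Convolution with an assumed kernel size and channel count: a 1-D convolution over the time axis models local inter-frame structure before max-pooling across the remaining frames:

```python
import torch
import torch.nn as nn

class TimeDomainConv(nn.Module):
    """Temporal 1-D convolution (green), then max-pool across frames (blue)."""
    def __init__(self, feat_dim=4096, channels=256, num_classes=487, kernel=3):
        super().__init__()
        self.temporal_conv = nn.Conv1d(feat_dim, channels, kernel_size=kernel)
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, frame_feats):             # (T, feat_dim)
        x = frame_feats.t().unsqueeze(0)        # (1, feat_dim, T) for Conv1d
        x = torch.relu(self.temporal_conv(x))   # local relations between frames
        pooled, _ = x.max(dim=2)                # max-pool over the temporal axis
        return self.classifier(pooled.squeeze(0))
```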
Approach: Feature Pooling Architectures ✤ GoogLeNet Conv Pooling: ➡ Max-pooling across frames is applied inside the network ➡ The pooled layer is then connected to the softmax layer ➡ The model can be enhanced by adding FC layers before the softmax (sketch below)
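A sketch of the GoogLeNet variant, assuming the 1024-dimensional features GoogLeNet produces after its average-pooling layer; the optional FC head stands in for the "enhancement" the slide mentions:

```python
import torch
import torch.nn as nn

class GoogLeNetConvPooling(nn.Module):
    """Max-pool GoogLeNet frame features across time inside the network and
    connect the result to the classifier, optionally via extra FC layers."""
    def __init__(self, feat_dim=1024, num_classes=487, fc_hidden=None):
        super().__init__()
        if fc_hidden is None:
            self.head = nn.Linear(feat_dim, num_classes)  # straight to softmax
        else:
            self.head = nn.Sequential(                    # enhanced with FC layers
                nn.Linear(feat_dim, fc_hidden), nn.ReLU(),
                nn.Linear(fc_hidden, num_classes),
            )

    def forward(self, frame_feats):         # (T, feat_dim) GoogLeNet features
        pooled, _ = frame_feats.max(dim=0)  # in-network max-pool across frames
        return self.head(pooled)            # softmax is applied by the loss
```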
Approach: LSTM Architecture
Approach: LSTM Architecture The LSTM takes the CNN's output at each video frame as input. A softmax layer predicts the class at every time step, as in the sketch below.
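A minimal sketch of the recurrent model; the LSTM depth and hidden size are placeholders, and `frame_feats` stands in for the per-frame CNN outputs:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """An LSTM consumes one CNN feature vector per frame; a linear layer
    (followed by softmax) scores the classes at every time step."""
    def __init__(self, feat_dim=4096, hidden=512, num_classes=487, layers=5):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=layers)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, frame_feats):                       # (T, feat_dim)
        outputs, _ = self.lstm(frame_feats.unsqueeze(1))  # (T, 1, hidden)
        return self.classifier(outputs.squeeze(1))        # (T, num_classes)
```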
Implementation Details ✤ Experiments are done with both AlexNet and GoogLeNet ✤ Parameters are initialized from a model pre-trained on ImageNet and fine-tuned on Sports-1M ✤ Single-frame networks are expanded to 30-frame and 120-frame models ✤ Optical flow images are used as an additional input (a hedged example follows)
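A hedged example of producing optical flow images offline with OpenCV's Farneback method; the algorithm choice and parameters are illustrative assumptions, not the authors' exact pipeline:

```python
import cv2
import numpy as np

def flow_image(prev_bgr, next_bgr):
    """Compute dense optical flow between two frames and rescale it so the
    2-channel (dx, dy) field can be fed to a CNN like an image."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    next_gray = cv2.cvtColor(next_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)  # (H, W, 2)
    return cv2.normalize(flow, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
```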
Results: Sports-1M ✤ 1 million YouTube sports videos annotated with 487 classes ✤ 1000–3000 videos per class ✤ Optical flow quality varies widely between videos ✤ The first 5 minutes of each video are sampled to obtain 300 frames (1 frame per second)
Results: Sports-1M [tables: feature-pooling architecture comparisons; CNN network comparisons]
Results: Sports-1M [tables: effect of the number of frames used with GoogLeNet; effect of optical flow]
Results: Sports-1M [table: comparison with the work of Karpathy et al.]
Results: UCF-101 ✤ 13,320 videos with 101 activity classes ✤ More constrained camera movement; a hand-curated dataset [table: UCF-101 accuracy for different numbers of frames]
Results: UCF-101 [table: state-of-the-art UCF-101 results]
Conclusion and Future Work ✤ Two video-classification methods, feature pooling and LSTM, that aggregate frame-level CNN outputs into video-level predictions are presented ✤ Using optical flow is beneficial ✤ State-of-the-art results are obtained on two benchmark datasets ✤ Learning should take place over the entire video rather than short clips