Video Analytics - Xavier Giró-i-Nieto

Day 4 Lecture 4 Video Analytics Xavier Giró-i-Nieto

Motivation

Outline 1. Scene Classification 2. Object Detection & Tracking

Scene Classification (Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014

Scene Classification Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Scene Classification Previous lectures Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Scene Classification Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

Scene Classification: DeepVideo: Architectures (Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014

Scene Classification: DeepVideo: Features Unsupervised learning [Le et al. '11] Supervised learning [Karpathy et al. '14] (Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014

Scene Classification: DeepVideo: Multires (Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014

Scene Classification: DeepVideo: Results (Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014

Scene Classification Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Scene Classification: C3D Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Scene Classification: C3D: Spatial Dimensions K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition” ICLR 2015.

Scene Classification: C3D: Temporal dimension 3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets. (Figure labels: 2D ConvNets, temporal depth.) Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Scene Classification: C3D: Temporal dimension A homogeneous architecture with small 3 × 3 × 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Scene Classification: C3D: Temporal dimension No gain when varying the temporal depth across layers. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
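To make the 3 × 3 × 3 kernel discussion concrete, the following sketch computes the output size and parameter count of a single 3D convolution layer. It is shape arithmetic only, not the C3D implementation. The input size (16 frames of 112 × 112 pixels), the padding of 1, and the 3-to-64 channel count are illustrative assumptions, and both function names are hypothetical.

```python
def conv3d_output_shape(in_shape, kernel=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1)):
    """Return the (frames, height, width) produced by a 3D convolution.

    Each axis follows the usual rule: (n + 2 * pad - kernel) // stride + 1.
    """
    return tuple(
        (n + 2 * p - k) // s + 1
        for n, k, s, p in zip(in_shape, kernel, stride, padding)
    )


def conv3d_param_count(in_channels, out_channels, kernel=(3, 3, 3)):
    """Count the weights plus one bias per output channel."""
    depth, height, width = kernel
    return in_channels * out_channels * depth * height * width + out_channels


# Assumed input: a clip of 16 frames, 112 x 112 pixels each.
clip_shape = (16, 112, 112)

# A padded 3 x 3 x 3 kernel keeps the temporal axis, so later layers still see motion.
preserved = conv3d_output_shape(clip_shape)

# A kernel spanning all 16 frames with no temporal padding collapses time to a
# single step, which is what happens when frames are treated as input channels.
collapsed = conv3d_output_shape(clip_shape, kernel=(16, 3, 3), padding=(0, 1, 1))

params = conv3d_param_count(in_channels=3, out_channels=64)
```

Under these assumptions the temporal axis survives the 3 × 3 × 3 layer, whereas the frame-spanning kernel leaves one temporal step after the first layer.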

Scene Classification: C3D: Network Architecture Feature vector Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Scene Classification: C3D: Feature Vector The video sequence is split into 16-frame clips with an 8-frame overlap between consecutive clips. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Scene Classification: C3D: Feature Vector Each 16-frame clip yields a 4096-dim descriptor; the clip descriptors are averaged and L2-normalized to form a 4096-dim video descriptor. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
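The clip-splitting and pooling steps on the two slides above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the descriptors are ordinary lists of floats standing in for the 4096-dim network activations, and trailing frames that do not fill a whole clip are simply dropped (an assumption).

```python
import math


def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame indices of overlapping clips; end is exclusive."""
    stride = clip_len - overlap
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]


def video_descriptor(clip_descriptors):
    """Average the per-clip descriptors, then L2-normalize the mean vector."""
    count = len(clip_descriptors)
    dim = len(clip_descriptors[0])
    mean = [sum(d[i] for d in clip_descriptors) / count for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    # Guard against an all-zero mean so the division is always defined.
    return [v / norm for v in mean] if norm > 0 else mean


# A 40-frame video gives four clips that each share 8 frames with the next one.
clips = split_into_clips(40)

# Toy 2-dim descriptors stand in for the 4096-dim clip features.
descriptor = video_descriptor([[3.0, 0.0], [3.0, 8.0]])
```

Averaging gives a fixed-length descriptor regardless of how many clips a video contains, which is what lets a simple linear classifier be trained on top of it.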

Scene Classification: C3D: Visualization Based on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Scene Classification: C3D: Visualization C3D features with a simple linear classifier outperformed state-of-the-art methods on 4 different benchmarks and were comparable with state-of-the-art methods on 2 other benchmarks. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Scene Classification: C3D: Software Implementation by Michael Gygli (GitHub) Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

Classification: Image & Optical Flow CNN + LSTM Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. "Beyond short snippets: Deep networks for video classification." CVPR 2015

(Scene Classification: Image &) Optical Flow Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015

(Scene Classification: Image &) Optical Flow Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic dataset is generated and then augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color). Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015
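The photometric part of the augmentation listed above (additive Gaussian noise and changes in brightness, contrast and gamma) can be sketched as follows. This is an illustrative sketch, not the FlowNet pipeline: the function name and all parameter ranges are assumptions, color changes are left out, and pixels are a flat list of grayscale values in [0, 1].

```python
import random


def photometric_augment(pixels, rng, noise_sigma=0.02, brightness=0.1,
                        contrast=(0.8, 1.2), gamma=(0.7, 1.5)):
    """Apply random contrast, brightness, gamma and Gaussian noise to values in [0, 1].

    All ranges are hypothetical defaults chosen for illustration.
    """
    c = rng.uniform(*contrast)                # one contrast factor per image
    b = rng.uniform(-brightness, brightness)  # one brightness shift per image
    g = rng.uniform(*gamma)                   # one gamma exponent per image
    out = []
    for p in pixels:
        v = (p - 0.5) * c + 0.5 + b           # contrast around mid-gray, then brightness
        v = min(max(v, 0.0), 1.0) ** g        # clamp first so the power stays real-valued
        v += rng.gauss(0.0, noise_sigma)      # additive Gaussian noise, drawn per pixel
        out.append(min(max(v, 0.0), 1.0))     # keep the result a valid intensity
    return out


# A seeded generator makes the augmentation reproducible.
augmented = photometric_augment([0.0, 0.25, 0.5, 1.0], random.Random(0))
```

Photometric changes leave the ground-truth flow untouched. The geometric transformations (translation, rotation, scaling) are not covered here, because they would also require transforming the flow field consistently with the images.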

Scene Classification & Detection: CNN + RNN, example output “Biking” (Slide credit: Alberto Montes)

Classification & Detection: Proposals + C3D (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

Classification & Detection: Proposals + C3D (1) Binary classification: Action or No Action (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

Classification & Detection: Proposals + C3D (2) One-vs-all Action classification (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

Classification & Detection: Proposals + C3D (3) Refinement with temporal-aware loss function (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

Classification & Detection: Proposals + C3D Post-processing (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]
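The slide does not spell out the post-processing step. A common choice for temporal detection, assumed here for illustration, is non-maximum suppression over scored segments using temporal intersection-over-union. The function names, the (start, end, score) layout and the 0.5 threshold are all hypothetical.

```python
def temporal_iou(a, b):
    """Intersection-over-union of two segments given as (start, end, ...)."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def temporal_nms(segments, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring segments, drop heavily overlapping ones.

    segments is a list of (start, end, score) tuples.
    """
    kept = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        # Keep a segment only if it does not overlap too much with any kept one.
        if all(temporal_iou(seg, k) <= iou_threshold for k in kept):
            kept.append(seg)
    return kept


# Two near-duplicate detections of one action plus a separate, later action.
detections = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
final = temporal_nms(detections)
```

In this toy input the second segment overlaps the first with an IoU of 9/11, so it is suppressed and the two distinct actions remain.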

Classification & Detection: Proposals + C3D (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

Classification & Detection: Image + RNN + Reinforce Yeung, Serena, Olga Russakovsky, Greg Mori, and Li Fei-Fei. "End-to-end Learning of Action Detection from Frame Glimpses in Videos." CVPR 2016

Scene Classification & Detection: C3D + LSTM Montes A. “Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks”. BSc thesis submitted to ETSETB (2016) [code available in Keras]

Outline 1. Scene Classification 2. Object Detection & Tracking

Objects: ImageNet Video [ILSVRC 2015 Slides and videos]

Objects: ImageNet Video: T-CNN Object Detection Object Tracking (Slides by Andrea Ferri): Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, and Wanli Ouyang, “Object Detection From Video Tubelets With Convolutional Neural Networks”, CVPR 2016 [code]
