S3D: Single Shot multi-Span Detector via Fully 3D Convolutional Network

Da Zhang¹, Xiyang Dai², Xin Wang¹, and Yuan-Fang Wang¹ (dazhang@cs.ucsb.edu)
¹ UC Santa Barbara, ² University of Maryland

Task: Temporal Activity Detection
Input: untrimmed videos
1. Localization: when do activities start/end?
2. Classification: what are the activities?
Detection results (example): Pole Vault [228.1 - 236.6s], Pole Vault [242.0 - 247.7s]

Related Works
Conventional two-stage approach: Proposal + Classification
Temporal proposal: sliding window, DAP, etc.
Activity classifier: two-stream, C3D, etc.
Example output: Pole Vault [228.1 - 236.6s], Pole Vault [242.0 - 247.7s]
S-CNN (CVPR 2016), CDC (CVPR 2017), TSN (ICCV 2017), R-C3D (ICCV 2017), SSN (ICCV 2017)

Related Works
Current limitations of the two-stage pipeline (temporal proposal followed by activity classifier): ineffective and inefficient.

Motivation
Can we do better? Single-shot, end-to-end.
Introducing a novel Single Shot multi-Span Detector (S3D).

Motivation: Quick Summary
- Directly encode the entire input video with Conv3D kernels
- Multi-scale default spans associated with temporal feature maps
- End-to-end trainable, with single forward-pass inference

S3D: Input Video
Our model takes the whole video stream as input (L frames).
Figure labels: L, 112.

S3D: Base Feature Layers
C3D up to Conv5b
We apply the standard C3D network to extract spatio-temporal features.
Figure labels: L and 112 at the input, L/8 and 7 at the output.
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In ICCV, 2015.

S3D: Auxiliary Feature Layers
C3D up to Conv5b, followed by auxiliary feature layers
Temporal lengths: L/8, L/16, L/32, L/64, L/128, L/256
We produce a sequence of feature maps that progressively decrease in temporal dimension.
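The halving pattern above can be checked with a few lines of Python. This is only an arithmetic sketch, not the authors' code: the function name `temporal_lengths` and the example value L = 256 are assumptions chosen for illustration.

```python
def temporal_lengths(num_frames, first_stride=8, num_maps=6):
    """Temporal length of each feature map: L/8, L/16, ..., L/256.

    Assumes the number of frames is divisible by every stride,
    which is a simplification made for this sketch.
    """
    lengths = []
    stride = first_stride
    for _ in range(num_maps):
        if num_frames % stride != 0:
            raise ValueError(f"{num_frames} frames not divisible by {stride}")
        lengths.append(num_frames // stride)
        stride *= 2  # each auxiliary layer halves the temporal dimension
    return lengths


# Hypothetical input length of 256 frames.
print(temporal_lengths(256))  # [32, 16, 8, 4, 2, 1]
```

With 256 input frames, the last feature map has a single temporal cell that covers the whole video.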

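S3D: Multi-scale Default Spans
Figure: temporal feature layers at two resolutions, one dividing [0, T] into 4 cells (ticks at 0, T/4, T/2, 3T/4, T) and one into 8 cells (ticks at 0, T/8, T/4, ..., 7T/8, T).
Multi-scale default spans are associated with each temporal feature map.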
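One way to picture default spans is to place several spans of different widths at the center of every temporal cell. The sketch below does this in normalized time, where the video spans [0, 1]. The four spans per cell follow the factor 4 in the predictor shape 3x1x1x(4x(K+1+2)); the concrete scale values, the function name `default_spans`, and the feature-map lengths are assumptions for illustration only.

```python
def default_spans(num_cells, scales=(0.5, 1.0, 1.5, 2.0)):
    """Return (center, width) default spans for one temporal feature map.

    Time is normalized so the whole video is [0, 1]. The scale values
    are hypothetical; the slides only indicate four scales per location.
    """
    cell = 1.0 / num_cells
    spans = []
    for i in range(num_cells):
        center = (i + 0.5) * cell  # middle of the i-th temporal cell
        for scale in scales:
            spans.append((center, scale * cell))
    return spans


# Feature maps with 32, 16, 8, 4, 2 and 1 temporal cells (assumed lengths).
all_spans = [span for n in (32, 16, 8, 4, 2, 1) for span in default_spans(n)]
print(len(all_spans))  # 252
```

Under these assumptions the total is (32 + 16 + 8 + 4 + 2 + 1) x 4 = 252, which agrees with the "252 Temporal Spans per Video" label in the architecture overview.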

S3D: Multi-scale Default Spans
Figure: temporal feature layers at two resolutions, one dividing [0, T] into 4 cells (ticks at 0, T/4, T/2, 3T/4, T) and one into 8 cells (ticks at 0, T/8, T/4, ..., 7T/8, T).
Loc: Δ(center, width)
Conf: (c_1, c_2, …, c_K, c_bg)
Localization and classification results are predicted at each default span.

S3D: Convolutional Predictors
Temporal feature layers -> 3D max pool -> Conv3D: 3x1x1x(4x(K+1+2))
On top of each feature map, we apply a Conv3D filter to produce the results.

S3D: Convolutional Predictors
Temporal feature layers -> 3D max pool -> Conv3D: 3x1x1x(4x(K+1+2))
- 3x1x1: kernel size
- 4: number of scales
- K+1: classes + BG, i.e. Conf: (c_1, c_2, …, c_K, c_bg)
- 2: localization offsets, i.e. Loc: Δ(center, width)
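The output depth 4x(K+1+2) can be unpacked as follows. This is a plain-Python sketch under an assumed channel ordering (all values for scale 0 first, confidences before offsets); the slides do not state the actual ordering, and `split_prediction` is a hypothetical helper name.

```python
def split_prediction(values, num_classes, num_scales=4):
    """Split one temporal location's predictor output into per-scale parts.

    Assumed layout (not confirmed by the slides): for each scale,
    K + 1 confidences followed by 2 localization offsets.
    """
    per_span = num_classes + 1 + 2
    if len(values) != num_scales * per_span:
        raise ValueError("unexpected number of output channels")
    parts = []
    for s in range(num_scales):
        chunk = values[s * per_span:(s + 1) * per_span]
        conf = chunk[:num_classes + 1]   # K classes + background
        loc = chunk[num_classes + 1:]    # two localization offsets
        parts.append((conf, loc))
    return parts


K = 20  # THUMOS'14 has 20 activity classes
channels = 4 * (K + 1 + 2)
print(channels)  # 92

parts = split_prediction(list(range(channels)), K)
print(len(parts), len(parts[0][0]), len(parts[0][1]))  # 4 21 2
```

For K = 20, each temporal location therefore carries 92 output channels: 4 default spans, each with 21 confidences and 2 offsets.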

Single Shot multi-Span Detector
Figure (architecture overview): input video (256 frames, 112 x 112) -> base feature layers (C3D up to Conv5b layer; 7 x 7 spatially, temporal length 32) -> auxiliary temporal feature layers Conv6 to Conv10 (temporal lengths 16, 8, 4, 2, 1) -> 3D max pool and Conv3D: 3x1x1x(4x(K+1+2)) on each feature map -> 252 temporal spans per video -> temporal NMS -> temporal activity detections (activity A, activity B).
Training of S3D: Smooth L1, Softmax Cross Entropy, Sigmoid Cross Entropy
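The temporal NMS step in the pipeline can be illustrated with a standard greedy non-maximum suppression over 1D intervals. This is a generic sketch rather than the authors' implementation: the IoU threshold of 0.5, the function names, and the scores in the example are assumptions; only the two Pole Vault intervals come from the slides.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end, ...) intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(detections, iou_threshold=0.5):
    """Greedy NMS over (start, end, score) detections of one class.

    Keeps the highest-scoring detection and drops any later one whose
    IoU with an already kept detection exceeds the threshold.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        if all(temporal_iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Scores are made up for illustration; the middle span is a near-duplicate.
dets = [(228.1, 236.6, 0.9), (228.5, 236.0, 0.6), (242.0, 247.7, 0.8)]
kept = temporal_nms(dets)
print(kept)  # [(228.1, 236.6, 0.9), (242.0, 247.7, 0.8)]
```

The near-duplicate span overlaps the top-scoring one with an IoU of about 0.88 and is suppressed, while the non-overlapping span is kept.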

Quantitative Results
Evaluation: mean Average Precision (mAP) over 20 activities on THUMOS’14.
Speed: 1271 FPS on a single GTX 1080 Ti GPU.

Qualitative Results THUMOS’14 segment: Pole Vault

Qualitative Results THUMOS’14 segment: Javelin Throw

Qualitative Results THUMOS’14 segment: Shotput

Qualitative Results THUMOS’14 segment: Clean and Jerk

Conclusions
Introduced S3D:
- A novel single-shot, end-to-end model for Temporal Activity Detection.
- Simple: completely based on Conv3D kernels.
- Strong: state-of-the-art performance on the THUMOS’14 benchmark.
- Speed: operates at 1271 FPS on a single GeForce GTX 1080 Ti GPU.
TensorFlow code coming soon at https://github.com/dazhang-cv/S3D

Thank you!
