University of Amsterdam’s Deep Net for Video Event Detection
Pascal Mettes, Spencer Cappallo, Dennis Koelma, Cees G. M. Snoek
University of Amsterdam
Summary: Top performance for example-based event detection tasks.
This talk: pipeline overview. Train videos → sample frames → deep network trained on an organized ImageNet hierarchy → extract features → pool frames into a video representation → train an SVM.
First part: learning the frame representation.
Starting point: Google’s Inception network [Szegedy et al. CVPR 2015].
- Very deep network with inception modules.
- Trained with the standard ImageNet setup: 1.2 million images from 1,000 classes.
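For context, frame-level class scores could be extracted today with a pretrained Inception-style network roughly as sketched below. This uses torchvision’s GoogLeNet as a stand-in; it is not the authors’ setup, and the frame filename is a hypothetical placeholder. The talk’s own network is retrained on a reorganized ImageNet hierarchy, which this sketch does not reproduce.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Stand-in sketch: 1,000-class ImageNet scores for a single video frame from a
# pretrained GoogLeNet. Illustrative only, not the authors' trained model.
model = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

frame = Image.open("frame_0001.jpg").convert("RGB")   # hypothetical frame file
with torch.no_grad():
    scores = torch.softmax(model(preprocess(frame).unsqueeze(0)), dim=1)
print(scores.shape)   # torch.Size([1, 1000])
```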
Observation: Not all 1,000 classes are equally relevant for event detection, and only 8% of the complete ImageNet hierarchy is used.
- The full ImageNet hierarchy contains 14 million images from 21,841 classes.
- We leverage the complete ImageNet hierarchy for training.
Problems with the complete hierarchy:
- Imbalance in the image distribution: ‘Yorkshire terrier’ has 3,047 examples, while 296 classes have only 1 example.
- Over-specific classes for event detection: ‘siderocyte’ and ‘gametophyte’ are unlikely to be relevant for event detection.
Four proposals for reorganizing ImageNet:
- Proposal 1: Roll up all classes with only one child (figure example: mamba, black mamba, green mamba).
- Proposal 2: Bind all subtrees with fewer than 3,000 examples (figure example: balloon, hot air, zeppelin, trial).
- Proposal 3: Promote all classes with fewer than 200 examples (figure example: dining table, triclinium).
- Proposal 4: Sample from classes with more than 2,000 examples (figure example: sauce).
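The four proposals can be pictured as simple passes over the WordNet-style class tree. The sketch below is a minimal illustration under assumed data structures (the Node class, per-class image lists, and the thresholds as used above); it is not the authors’ implementation.

```python
# Hypothetical sketch of the four reorganization passes over an ImageNet-style
# class tree. Node, image lists, and helper names are illustrative assumptions.

class Node:
    def __init__(self, name, images=None, children=None):
        self.name = name
        self.images = images or []      # images labeled directly at this class
        self.children = children or []

def subtree_images(node):
    """All images in the subtree rooted at this node."""
    imgs = list(node.images)
    for c in node.children:
        imgs.extend(subtree_images(c))
    return imgs

def roll_up(node):
    """Proposal 1: merge an only child into its parent."""
    node.children = [roll_up(c) for c in node.children]
    if len(node.children) == 1:
        only = node.children[0]
        node.images.extend(only.images)
        node.children = only.children
    return node

def bind(node, min_examples=3000):
    """Proposal 2: collapse small subtrees into a single class."""
    if len(subtree_images(node)) < min_examples:
        node.images = subtree_images(node)
        node.children = []
    else:
        node.children = [bind(c, min_examples) for c in node.children]
    return node

def promote(node, min_examples=200):
    """Proposal 3: move images of tiny leaf classes up to the parent."""
    kept = []
    for c in node.children:
        promote(c, min_examples)
        if len(c.images) < min_examples and not c.children:
            node.images.extend(c.images)   # small class disappears into parent
        else:
            kept.append(c)
    node.children = kept
    return node

def sample(node, max_examples=2000):
    """Proposal 4: cap over-represented classes."""
    node.images = node.images[:max_examples]
    for c in node.children:
        sample(c, max_examples)
    return node
```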
Advantages of our proposal:
1. All images in the ImageNet hierarchy are used.
2. Over-specific and small classes are merged with their parents.
3. Compact semantic frame representations (12,988 classes).
This talk: pipeline overview (continued). Next part: pooling frames into a video representation.
Pooling: main idea. An event video is an interplay of sub-events (e.g., a birthday party). We aim to pool over individual sub-events rather than average over all frames.
Algorithm overview [Mettes et al. ICMR 2015]: find the most discriminative fragments from the training videos, then encode a video using a score for each discriminative fragment.
- Step 1: Propose fragments from the training videos.
- Step 2: Select the most discriminative fragments.
- Step 3: Encode each video with a score for every selected fragment.
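A rough sketch of the propose/select/encode idea follows, under simplifying assumptions (fixed-length fragment windows, a stand-in discriminativeness criterion, and max-pooling for encoding). It is not the ICMR 2015 implementation; all function and argument names are illustrative.

```python
import numpy as np

def propose_fragments(frame_feats, length=10, stride=5):
    """Step 1: propose fragments as mean-pooled windows of frame features.
    frame_feats: (num_frames, dim) array."""
    fragments = []
    for start in range(0, max(1, len(frame_feats) - length + 1), stride):
        fragments.append(frame_feats[start:start + length].mean(axis=0))
    return np.array(fragments)

def select_fragments(fragments, video_labels, top_k=50):
    """Step 2: stand-in selection criterion: rank each training fragment by the
    gap between its mean similarity to fragments from positive videos and to
    fragments from negative videos. video_labels holds, per fragment, the
    event label of its source video."""
    pos = fragments[video_labels == 1]
    neg = fragments[video_labels == 0]
    score = (fragments @ pos.T).mean(axis=1) - (fragments @ neg.T).mean(axis=1)
    order = np.argsort(-score)
    return fragments[order[:top_k]]

def encode_video(frame_feats, selected, length=10, stride=5):
    """Step 3: encode a video as its best match to each selected fragment."""
    props = propose_fragments(frame_feats, length, stride)
    sims = props @ selected.T            # (num_proposals, num_selected)
    return sims.max(axis=0)              # max-pool over this video's proposals
```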
Experiments: evaluating the full pipeline (sampled frames → deep network on the organized ImageNet hierarchy → features → pooling → SVM).
Experiment 1: AlexNet vs. GoogleNet. GoogleNet outperforms AlexNet.
Experiment 2: 1,000 vs. all ImageNet classes. Using all ImageNet classes helps.
Experiment 3: Our ImageNet reorganization. We do better than directly using all classes, with a feature vector half the size.
Experiment 4: 100 Examples. The same conclusions hold in the 100-example setting.
Experiment 5: Average pooling vs. Bag-of-Fragments (MED 2014, 100 Examples):

Method             AlexNet [ICMR results]   GoogleNet [new results]
Averaging          0.232                    0.351
Bag-of-Fragments   0.276                    0.317
Combination        0.373                    0.381

Bag-of-Fragments is both competitive with and complementary to average pooling.
TRECVID 2015, 10 Examples. Fusion of:
- Deep net with average pooling.
- Motion (MBH with Fisher vectors).
- Audio (MFCC with Fisher vectors).
Results: our fusion yields the top result, and the ‘deep net only’ run is already near the top.
TRECVID 2015, 100 Examples. Fusion of:
- Deep net with average pooling.
- Deep net with Bag-of-Fragments.
- Motion (MBH with Fisher vectors).
- Audio (MFCC with Fisher vectors).
Results: our fusion yields the top result, and the ‘deep net only’ run takes second place.
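As a rough illustration of this kind of late fusion, the sketch below combines per-modality event scores with a weighted average. The modality names, weights, and random stand-in scores are hypothetical and do not reflect the submitted system’s configuration.

```python
import numpy as np

def late_fusion(scores_by_modality, weights=None):
    """Combine per-modality scores (dict: modality name -> (num_videos,) array)
    into a single fused score per test video."""
    names = sorted(scores_by_modality)
    if weights is None:
        weights = {m: 1.0 / len(names) for m in names}   # uniform weights
    return sum(weights[m] * np.asarray(scores_by_modality[m]) for m in names)

# Example with random stand-in scores for three modalities and five videos.
rng = np.random.default_rng(0)
scores = {
    "deepnet_avg": rng.random(5),
    "motion_mbh_fv": rng.random(5),
    "audio_mfcc_fv": rng.random(5),
}
print(late_fusion(scores))
```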
Conclusions:
- Training on an organized ImageNet hierarchy helps event detection.
- Bag-of-Fragments yields a complementary video representation.
Contact information: Pascal Mettes
- email: P.S.M.Mettes@uva.nl
- address: Science Park 904, Amsterdam