Segments, Residuals and Embeddings for Few-Example Video Event Detection
Dennis Koelma and Cees Snoek, University of Amsterdam, The Netherlands
Pipeline 10Ex 2016
[Diagram: videos are sampled at 2 frames/sec; an Inception CNN trained on the ImageNet Shuffle provides the visual features. Five modalities each feed a 10Ex SVM: M1 average-pooled pool5 features, M2 average-pooled class probabilities, M3 Fisher vectors on dense trajectories, M4 Fisher vectors on MFCCs, M5 the VideoStory embedding.]
Pipeline 10Ex 2017
[Diagram: videos are sampled at 2 frames/sec; ResNet and ResNeXt trained on the ImageNet Shuffle provide the visual features. Five modalities each feed a 10Ex SVM: M1 difference-coded pool5 features, M2 average-pooled features over sliding windows, M3 Fisher vectors on dense trajectories, M4 Fisher vectors on MFCCs, M5 the VideoStory embedding.]
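To make the per-modality 10Ex recipe above concrete, here is a minimal sketch, assuming average-pooled frame features, scikit-learn's LinearSVC and late fusion by averaging decision values; the function names and the exact fusion rule are illustrative, not the submission code.

```python
# Minimal sketch of the per-modality 10Ex recipe: average-pool frame-level CNN
# features into one video descriptor, train a linear SVM per modality on the
# 10 positive examples plus background negatives, and late-fuse the scores.
# Feature extraction itself (pool5 at 2 frames/sec) is assumed to be done elsewhere.
import numpy as np
from sklearn.svm import LinearSVC

def video_descriptor(frame_features):
    """Average-pool an (n_frames, dim) array of frame features into one vector."""
    return frame_features.mean(axis=0)

def train_modality_svm(pos_videos, neg_videos):
    """Train one linear SVM for a single modality."""
    X = np.stack([video_descriptor(v) for v in pos_videos + neg_videos])
    y = np.array([1] * len(pos_videos) + [0] * len(neg_videos))
    return LinearSVC(C=1.0).fit(X, y)

def late_fusion_score(svms, video_features_per_modality):
    """Average the decision values of the per-modality SVMs for one test video."""
    scores = [svm.decision_function(video_descriptor(f)[None, :])[0]
              for svm, f in zip(svms, video_features_per_modality)]
    return float(np.mean(scores))
```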
CNN Features from 22k ImageNet classes
- Use as many classes as possible, but many classes are irrelevant
- Find a balance between the level of abstraction of a class and the number of images in a class
- Example imbalance: Siderocyte (296 images) vs. Gametophyte (4 images); some classes have only 1 image
CNN training on a selection out of the 22k ImageNet classes
• Idea
  • Increase the level of abstraction of the classes
  • Incorporate classes with fewer than 200 samples
• Heuristics: Roll, Bind, Promote, Subsample (see the sketch below)
  • N < 200: Promote
  • N < 3000: Bind
  • N > 2000: Subsample
• Result: 12,988 classes, 13.6M images
Reference: The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection, Pascal Mettes, Dennis Koelma and Cees Snoek, International Conference on Multimedia Retrieval, 2016
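A minimal sketch of how the count-based heuristics could be applied, assuming a simple dictionary-based class tree; the Bind/Roll merging of siblings is omitted and the thresholds are taken as read off the slide, so this is an illustration rather than the exact procedure of the ICMR 2016 paper.

```python
# Sketch of the Shuffle heuristics: classes with fewer than 200 images are
# promoted (their images move up to the WordNet parent), and classes with more
# than 2000 images are subsampled. Bind/Roll (merging siblings until a parent
# reaches a workable size) is left out for brevity.
PROMOTE_BELOW = 200     # merge small classes into their parent
SUBSAMPLE_ABOVE = 2000  # cap very large classes

def shuffle_classes(tree):
    """tree: dict mapping class name to {'images': [...], 'parent': str or None}."""
    # Promote: move images of small classes up to their parent and drop the class.
    for name, node in list(tree.items()):
        if len(node['images']) < PROMOTE_BELOW and node['parent'] in tree:
            tree[node['parent']]['images'].extend(node['images'])
            del tree[name]
    # Subsample: cap classes that have more images than needed.
    for node in tree.values():
        if len(node['images']) > SUBSAMPLE_ABOVE:
            node['images'] = node['images'][:SUBSAMPLE_ABOVE]
    return tree
```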
Feature Difference Coding
• K-means clustering (k = 5) on the last fully connected layer before the probability layers (called flatten)
• Fisher-like encoding, but sigma is based on the distance of the points assigned to a cluster to its center
[Chart: MAP on the 2014 Test Set for flatten-avg vs. flatten-dc, per ResNet, ResNeXt and Fusion]
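A rough sketch of the difference-coding idea described above, assuming scikit-learn's KMeans and a per-cluster sigma equal to the mean distance of assigned points to their center; the exact normalization used in the submission may differ.

```python
# Cluster frame-level 'flatten' features with k-means (k = 5), estimate a
# per-cluster sigma from the distances of assigned points to their center, and
# encode a video by the sigma-normalized residuals of its frames to each
# center (Fisher-vector-like, but without GMM posteriors).
import numpy as np
from sklearn.cluster import KMeans

def fit_codebook(all_frame_features, k=5):
    km = KMeans(n_clusters=k, n_init=10).fit(all_frame_features)
    centers, labels = km.cluster_centers_, km.labels_
    # sigma per cluster: mean distance of its assigned points to the center
    sigmas = np.array([
        np.linalg.norm(all_frame_features[labels == c] - centers[c], axis=1).mean()
        for c in range(k)
    ])
    return centers, sigmas

def encode_video(frame_features, centers, sigmas):
    # hard-assign each frame to its nearest center
    dists = np.linalg.norm(frame_features[:, None, :] - centers[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)
    parts = []
    for c in range(len(centers)):
        assigned = frame_features[labels == c]
        if len(assigned) == 0:
            parts.append(np.zeros_like(centers[c]))
        else:
            parts.append(((assigned - centers[c]) / sigmas[c]).mean(axis=0))
    return np.concatenate(parts)  # k * feature_dim video descriptor
```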
VideoStory: Embed the story of a video
[Diagram: video features x_i and description terms y_i (e.g. stunt, bike, motorcycle) are linked through an embedding s_i via the mappings W and A]
Joint optimization of W and A to preserve
• Descriptiveness: preserve the video descriptions: L(A,S)
• Predictability: recognize terms from the video content: L(S,W)
Reference: VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events, Amirhossein Habibian, Thomas Mensink and Cees Snoek, Proceedings of the ACM International Conference on Multimedia, 2014
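The joint objective sketched above can be written roughly as follows; this is a hedged reconstruction from the slide and the cited paper, with regularizers and the exact matrix orientation conventions omitted (Y holds the term annotations, X the video features, S the embedding):

```latex
\min_{A,\,S,\,W}\;
\underbrace{\big\lVert Y - A\,S \big\rVert_F^2}_{\text{descriptiveness } L(A,S)}
\;+\;
\underbrace{\big\lVert S - W\,X \big\rVert_F^2}_{\text{predictability } L(S,W)}
```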
VideoStory Embedding as a Feature
[Chart: MAP on the 2014 Test Set for flatten-avg vs. the VideoStory embedding, per ResNet, ResNeXt and Fusion]
VideoStory for 0Ex
[Diagram: the event query "Attempting a bike trick" is mapped to query terms (attempt 1.0, bike 1.0, trick 1.0); video features x_i are embedded into s_i via W and decoded into term scores via A (e.g. bike 0.45, man 0.30); videos are ranked by cosine similarity between the two term vectors]
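A minimal sketch of this 0Ex ranking, using a row-per-video convention; the variable names, shapes and the binary query vector are illustrative assumptions, not the exact query processing of the submission.

```python
# Turn the event query text into a binary term vector, map video features into
# term space through the learned W and A, and rank videos by cosine similarity
# to the query.
import numpy as np

def rank_videos_zero_shot(query_terms, vocabulary, X, W, A):
    """query_terms: list of words; X: (n_videos, d); W: (d, e); A: (n_terms, e)."""
    q = np.array([1.0 if w in query_terms else 0.0 for w in vocabulary])
    term_scores = X @ W @ A.T                      # predicted term scores per video
    sims = (term_scores @ q) / (
        np.linalg.norm(term_scores, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)                       # best-matching videos first
```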
Finding Segments to Expand Training Material
[Diagram: a window slides over Example1; segments whose cosine similarity to the full example is high become additional training examples Example1_1, Example1_2, Example1_3]
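A minimal sketch of this segment expansion, assuming average-pooled window descriptors; the window length, stride and similarity threshold are placeholder values, not the settings used in the submission.

```python
# Slide a fixed-length window over the frame features of a positive example,
# average-pool each window, and keep the windows whose cosine similarity to
# the full-video descriptor is high enough to serve as extra positives.
import numpy as np

def expand_example(frame_features, window=50, stride=25, min_sim=0.8):
    full = frame_features.mean(axis=0)
    extra = []
    for start in range(0, max(1, len(frame_features) - window + 1), stride):
        seg = frame_features[start:start + window].mean(axis=0)
        sim = seg @ full / (np.linalg.norm(seg) * np.linalg.norm(full) + 1e-12)
        if sim >= min_sim:
            extra.append(seg)   # e.g. Example1_1, Example1_2, ...
    return extra
```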
Window-based Features
[Chart: MAP on the 2014 Test Set for flatten-avg vs. flatten-window, per ResNet, ResNeXt and Fusion]
Result: Individual Modalities on the 2014 Test Set
• DC is best (overfit?)
• VS > flatten
• window > avg
• ResNet < ResNeXt < Fusion
[Chart: MAP per modality (flatten-avg, softmax, trajectories, mfcc, video story, flatten-dc, flatten-window) for ResNet, ResNeXt and Fusion]
Fusion of Visual Modalities on the 2014 Test Set (ResNet + ResNeXt)
[Chart: MAP for VS, DC, Win, DC-VS, DC-Win and VS-DC-Win]
Fusion on the 2014 Test Set
[Chart: MAP comparing last year's multimedia fusion (AVG2-DT-MFCC-VS), the single new feature DC, the visual fusion VS-DC-Win, the multimedia fusion VS-DC-Win-DT-MFCC, and VS-DC-Win-DT-MFCC-AVG2, for ResNeXt and ResNet+ResNeXt]
Computational Efficiency
[Chart: feature extraction time, classification time and MAP for p-visualFusionTwoCNN, c-mmFusionTwoCNN, c-visualFusionOneCNN, c-mmFusionOneCNN and c-visualSingle]
Our MED Submissions on Test 2014
[Chart: PS and AH scores for p-visualFusionTwoCNN, c-mmFusionTwoCNN, c-visualFusionOneCNN, c-mmFusionOneCNN and c-visualSingle]
All MED Submissions
[Charts: PS and AH scores per submission, including MediaMill, TokyoTech, ITI-CERTH and INF]
Conclusions
• Visual features are still improving
• Fusion still works, but other modalities need work
• 0Ex helps to get more out of your examples
Thank You