Query Understanding is Key for Zero-Example Video Search
Dennis Koelma and Cees Snoek
University of Amsterdam, The Netherlands
Pipeline
• Input: selected query terms and video frames, sampled at 2 frames/sec with window averaging.
• 0Ex M1 (VideoStory): ResNet frame features are embedded with VideoStory and flattened into a video-level term vector over the VS vocabulary; query terms are mapped to their closest VS-vocabulary terms via word2vec; videos are scored by cosine similarity. (A minimal scoring sketch follows below.)
• 0Ex M2 (Concepts): ResNeXt trained on the ImageNet Shuffle yields softmaxed concept scores; query terms are mapped via word2vec to the top-N closest concepts, scores are percentile-filtered, and videos are scored by dot similarity.
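A minimal sketch of the two scoring branches, assuming precomputed inputs: a gensim word2vec model `w2v`, the VS vocabulary `vs_vocab`, a video's predicted VideoStory term vector `video_term_vector`, concept names `concepts`, and the video's softmaxed concept scores `concept_probs`. Function names, the percentile value, and the exact weighting are illustrative, not the MediaMill implementation.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity, with an epsilon guarding against zero vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def m1_videostory_score(query_terms, w2v, vs_vocab, video_term_vector):
    """M1: replace each query term by its closest VS-vocabulary term in
    word2vec space, flatten into a bag-of-terms vector, and score the
    video by cosine similarity with its predicted term vector."""
    index = {t: i for i, t in enumerate(vs_vocab)}
    in_model = [t for t in vs_vocab if t in w2v]
    q = np.zeros(len(vs_vocab))
    for term in query_terms:
        if term not in w2v or not in_model:
            continue
        closest = max(in_model, key=lambda t: w2v.similarity(term, t))
        q[index[closest]] += 1.0
    return cosine(q, video_term_vector)

def m2_concept_score(query_terms, w2v, concepts, concept_probs,
                     top_n=5, pct=95):
    """M2: pick the top-N concepts closest to the query terms in word2vec
    space, zero out concept scores below a percentile threshold, and
    score the video by dot product."""
    sims = np.zeros(len(concepts))
    for i, name in enumerate(concepts):
        pairs = [w2v.similarity(qt, w)
                 for qt in query_terms if qt in w2v
                 for w in name.split() if w in w2v]
        sims[i] = np.mean(pairs) if pairs else 0.0
    weights = np.zeros(len(concepts))
    top = np.argsort(-sims)[:top_n]
    weights[top] = sims[top]
    # percentile filter: suppress weak concept responses per video
    thresh = np.percentile(concept_probs, pct)
    filtered = np.where(concept_probs >= thresh, concept_probs, 0.0)
    return float(weights @ filtered)
```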
22k ImageNet Classes
• Use as many classes as possible
• Irrelevant classes
• Find a balance between the level of abstraction of classes and the number of images in a class
• Example imbalance: Siderocyte (296 images) vs. Gametophyte (3 images); there are classes with only 1 image
CNN Training on a Selection out of 22k ImageNet Classes
• Idea
  • Increase the level of abstraction of classes
  • Incorporate classes with fewer than 200 samples
• Heuristics (a sketch follows below)
  • Roll
  • Promote: N < 200
  • Bind: N < 3000
  • Subsample: N > 2000
• Result
  • 12,988 classes
  • 13.6M images

The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection, Pascal Mettes, Dennis Koelma, and Cees Snoek, International Conference on Multimedia Retrieval, 2016
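A minimal sketch of how the four heuristics could be applied bottom-up over a WordNet-style class tree. The `Node` class and the order of operations are assumptions made here for illustration; the authoritative procedure is the one described in the ICMR 2016 paper.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    count: int = 0                      # images directly in this class
    children: list = field(default_factory=list)

PROMOTE_MIN = 200     # N < 200  : Promote (merge into parent)
BIND_MIN = 3000       # N < 3000 : Bind (group small siblings)
SUBSAMPLE_MAX = 2000  # N > 2000 : Subsample (cap images per class)

def shuffle(node: Node) -> None:
    for child in node.children:
        shuffle(child)                  # Roll: process the tree bottom-up
    keep = []
    for child in node.children:
        if child.count < PROMOTE_MIN:   # Promote: too few images,
            node.count += child.count   # fold them into the parent class
        else:
            keep.append(child)
    node.children = keep
    small = [c for c in node.children if c.count < BIND_MIN]
    if len(small) > 1:                  # Bind: merge small sibling classes
        merged = Node("+".join(c.name for c in small),
                      sum(c.count for c in small))
        node.children = [c for c in node.children if c.count >= BIND_MIN]
        node.children.append(merged)
    for child in node.children:         # Subsample: balance large classes
        child.count = min(child.count, SUBSAMPLE_MAX)
```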
Concept Bank
• Two networks: ResNet and ResNeXt
• Three datasets (subsets of ImageNet)
  • Roll, Bind (3000), Promote (200), Subsample: 13k classes, training with 1000 images/class
  • Roll, Bind (7000), Promote (1250), Subsample: 4k classes, training with 1706 images/class
  • Top 4000 classes by breadth-first search with >1200 images: training with 1324 images/class
Video Story: Embed the Story of a Video
[Figure: example video with story terms stunt, bike, motorcycle; visual features x_i are mapped by W into the embedding s_i, which A maps to the description terms y_i.]
• Joint optimization of W and A to preserve
  • Descriptiveness: preserve video descriptions: L(A, S)
  • Predictability: recognize terms from video content: L(S, W)

VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events, Amirhossein Habibian, Thomas Mensink, and Cees Snoek, Proceedings of the ACM International Conference on Multimedia, 2014
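A plausible written-out form of the joint objective, reconstructed from the slide's L(A,S) and L(S,W) notation under the assumption of the least-squares losses used in the VideoStory paper (regularization omitted); X, Y, and S stack the x_i, y_i, and s_i as rows:

```latex
\min_{A,\,S,\,W}\;
\underbrace{\lVert Y - S A \rVert_F^2}_{\text{descriptiveness } L(A,S)}
\;+\;
\underbrace{\lVert S - X W \rVert_F^2}_{\text{predictability } L(S,W)}
```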
Video Story Training Sets
• VideoStory46k (www.mediamill.nl): 45,826 videos from YouTube, based on 2013 MED research set terms
• FCVID (Fudan-Columbia Video Dataset): 87,609 videos
• EventNet: 88,542 videos
• Merged (VideoStory46k, FCVID, EventNet)
• Video Story dictionary: terms that occur more than 10 times in the dataset
  • Merged: 6,440 terms
• Using a vocabulary of stemmed terms that occur more than 100 times in a Wikipedia dump
  • With stemming: respect the Video Story dictionary
  • 267,836 terms
  • Use word2vec to expand them per video (sketch below)
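A hedged sketch of the dictionary construction and per-video word2vec expansion, using gensim. The occurrence thresholds follow the slide; the `topn` value and the exact restriction step are illustrative assumptions.

```python
from collections import Counter
from gensim.models import KeyedVectors

# e.g. w2v = KeyedVectors.load_word2vec_format(
#          "GoogleNews-vectors-negative300.bin", binary=True)

def build_dictionary(video_terms, min_count=10):
    """VS dictionary: keep terms occurring more than min_count times
    across all videos in the merged dataset."""
    counts = Counter(t for terms in video_terms for t in terms)
    return {t for t, c in counts.items() if c > min_count}

def expand_terms(terms, w2v, wiki_vocab, topn=5):
    """Expand one video's terms with their word2vec neighbours,
    restricted to the stemmed Wikipedia-derived vocabulary."""
    expanded = set(terms)
    for t in terms:
        if t in w2v:
            for neighbour, _sim in w2v.most_similar(t, topn=topn):
                if neighbour in wiki_vocab:
                    expanded.add(neighbour)
    return expanded
```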
Query Terms
• Experiments show it is important to select the right terms, instead of just taking the average of the terms in word2vec space
• Part-of-speech tagging
  • <noun1>, <verb>, <noun2>
  • <subject>, <predicate>, <remainder>
• Query plan (see the sketch after this list)
  A. Use nouns, verbs, and adjectives in <subject>, unless it concerns a person (noun1 = "person", "man", "woman", "child", ...)
  B. Use nouns in <remainder>, unless it concerns a person or the noun is a setting ("indoors", "outdoors", ...)
  C. Use <predicate>
  D. Use all nouns in the sentence, unless the noun is a person or a setting
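A hedged sketch of the query plan using spaCy's part-of-speech and dependency tags. The `PERSON_WORDS` and `SETTING_WORDS` sets merely extend the slide's examples, and the subject/remainder split is approximated with the `nsubj` relation rather than the full phrase-structure split the slide implies.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Illustrative extensions of the slide's examples, not exhaustive lists.
PERSON_WORDS = {"person", "man", "woman", "child", "people", "boy", "girl"}
SETTING_WORDS = {"indoors", "outdoors", "inside", "outside"}
SKIP = PERSON_WORDS | SETTING_WORDS

def select_query_terms(query):
    """Approximate the slide's query plan A-D with spaCy tags."""
    doc = nlp(query)
    subject = [t for t in doc if t.dep_ in ("nsubj", "nsubjpass")]
    # A. nouns, verbs, and adjectives in the subject, unless a person
    terms = [t.lemma_ for t in subject
             if t.pos_ in ("NOUN", "VERB", "ADJ") and t.lemma_ not in SKIP]
    # B. nouns in the remainder, unless a person or a setting
    terms += [t.lemma_ for t in doc
              if t.pos_ == "NOUN" and t not in subject
              and t.lemma_ not in SKIP]
    # C. fall back to the predicate (the root verb)
    if not terms:
        terms = [t.lemma_ for t in doc
                 if t.dep_ == "ROOT" and t.pos_ == "VERB"]
    # D. last resort: all nouns that are not persons or settings
    if not terms:
        terms = [t.lemma_ for t in doc
                 if t.pos_ == "NOUN" and t.lemma_ not in SKIP]
    return terms

# Expected, model permitting: "A person playing drums indoors" -> ["drum"]
```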
The Effect of Parsing on 2016 Topics
• MIAP using only the ResNet feature
[Bar chart: MIAP per vocabulary (EventNet, Merged, top4000, rbps13k), comparing avg and parse term selection.]
(Greedy) Oracle on 2016 Topics
• Fuse the top (max 5) words/concepts with the highest MIAP
• MIAP using only the ResNet feature
[Bar chart: MIAP per vocabulary (EventNet, Merged, top4000, rbps13k) for avg, parse, and oracle term selection.]
Query Examples: The Good
• Query: "A person playing drums indoors"
• VideoStory terms avg: person, plai, drum, indoor
• VideoStory terms parse: drum
• VideoStory terms oracle: beat, drum, snare, vibe
[Bar chart: MIAP for Merged, rbps13k, and bng under avg, parse, and oracle term selection.]
Query Examples: The Ambiguous
• Query: "A person playing drums indoors"
• Concepts top5 avg: guitarist, guitar player; outdoor game; drum, drumfish; sitar player; brake drum, drum
• Concepts top5 parse: drum, drumfish; brake drum, drum; barrel, drum; snare drum, snare, side drum; drum, membranophone, tympan
• Concepts top5 oracle: percussionist; cymbal; drummer; drum, membranophone, tympan; snare drum, snare, side drum
[Bar chart: MIAP for Merged and rbps13k under avg, parse, and oracle concept selection.]
Query Examples: The Bad
• Query: "A person sitting down with a laptop visible"
• VideoStory terms avg: person, sit, laptop
• VideoStory terms parse: laptop
• VideoStory terms oracle: monitor, aspir, acer, alienwar, vaio, asus; laptop only appears at rank 7
[Bar chart: MIAP for Merged and rbps13k under avg, parse, and oracle term selection.]
Query Examples: The Difficult
• Query: "A person wearing a helmet"
• Concept top5 parse: helmet (a protective headgear made of hard material to resist blows); helmet (armor plate that protects the head); pith hat, pith helmet, sun helmet, topee, topi; batting helmet; crash helmet
• Concept top5 oracle: hockey skate; hockey stick; ice hockey, hockey, hockey game; field hockey, hockey; rink, skating rink
[Bar chart: MIAP for Merged and rbps13k under avg, parse, and oracle concept selection.]
Query Examples: The Impossible
• Query: "A crowd demonstrating in a city street at night"
• Parsing "fails", and averaging wouldn't have helped either
• VideoStory oracle: vega, squar, gang, times, occupi
• Concept oracle: vigil light, vigil candle; motorcycle cop, motorcycle policeman, speed cop; rider; minibike, motorbike; freewheel
[Bar chart: MIAP for Merged and rbps13k under avg, parse, and oracle selection.]
Results: 5 Modalities x 2 Features
• VideoStory: ResNeXt is better than ResNet
• Concepts: ResNet is better than ResNeXt (overfit?)
• VideoStory is better than Concepts
[Bar chart: MIAP for EventNet, Merged, top4000, rbps4k, and rbps13k with ResNet, ResNeXt, and ResNet+ResNeXt features.]
Final Fusion
• Concept fusion is slightly better than VideoStory fusion
• The two are often complementary, with big differences between them on many topics
• Taking the top 2/4 for concepts is slightly better than the top 3/5 (a fusion sketch follows below)
[Bar chart: MIAP for ResNet, ResNeXt, and ResNet+ResNeXt after final fusion.]
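A hedged sketch of what score-level fusion over the modalities could look like; min-max normalization, averaging, and the top-k modality selection are assumptions standing in for the slide's "top 2/4" and "top 3/5" fusion, which the slide does not specify in detail.

```python
import numpy as np

def fuse(score_lists, top_k=None):
    """Late fusion over modalities: min-max normalise each modality's
    scores across all videos, optionally keep only the top_k modalities
    (here: those with the strongest peak response), then average."""
    mats = []
    for scores in score_lists:
        s = np.asarray(scores, dtype=float)
        mats.append((s - s.min()) / (s.max() - s.min() + 1e-12))
    m = np.stack(mats)                      # shape: (modalities, videos)
    if top_k is not None and top_k < len(m):
        order = np.argsort(-m.max(axis=1))  # illustrative stand-in for
        m = m[order[:top_k]]                # the slide's "top 2/4" choice
    return m.mean(axis=0)                   # fused score per video
```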
Our AVS Submission
[Bar chart: MIAP on the 2016 and 2017 topics for Fusion top24, Fusion top35, VideoStory, and Concepts.]
All Fully Automatic AVS Submissions
[Bar chart: MIAP of all fully automatic AVS submissions.]
All Automatic and Interactive AVS Submissions
[Bar chart: MIAP of all automatic and interactive AVS submissions, ordered by score; the top runs are M_D_Waseda_Meisei.17_1, M_D_Waseda_Meisei.17_3, F_D_MediaMill.17_1, and F_D_MediaMill.17_2.]
Conclusions
• Query parsing is important
• VideoStory and Concepts are good, but will not "solve" AVS
Thank You