VIREO-TNO @ TRECVID 2014: Zero-Shot Event Detection and Recounting. Speaker: Maaike de Boer (TNO). Yi-Jie Lu¹, Hao Zhang¹, Chong-Wah Ngo¹, Maaike de Boer², John Schavemaker², Klamer Schutte², Wessel Kraaij². ¹VIREO Group, City University of Hong Kong, Hong Kong. ²Netherlands Organization for Applied Scientific Research (TNO), Netherlands.
Outline: 0-Shot System (System Overview, Findings); MER System (System Workflow, Results)
Semantic Query Generation (SQG) – Given an event query, SQG translates the query description into a representation of semantic concepts, each with a relevance score, drawn from the concept bank (Research Collection, ImageNet, TRECVID SIN, UCF101, HMDB51). Example semantic query for the event query "Attempting a Bike Trick": < Objects > Bike 0.60, Motorcycle 0.60, Mountain bike 0.60; < Actions > Bike trick 1.00, Riding bike 0.62, Flipping bike 0.61; < Scenes > Parking lot 0.01.
Concept Bank – Research Collection (497 concepts) – ImageNet ILSVRC’12 (1000 concepts) – TRECVID SIN’14 (346 concepts) [Figure: the concept bank, drawing on the Research Collection, ImageNet, TRECVID SIN, UCF101, and HMDB51.]
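Event Search – Videos are ranked by combining the semantic query (SQ) with the concept responses: each video receives the score $s = \sum_i q_i c_i$, where $q_i$ is the relevance score of concept $i$ in the semantic query and $c_i$ is the response of the detector for concept $i$ on the video. [Figure: the example semantic query (Bike 0.60, Motorcycle 0.60, Mountain bike 0.60, Bike trick 1.00, Riding bike 0.62, Flipping bike 0.61, Parking lot 0.01) combined with the concept responses to produce the video ranking.]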
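A minimal Python sketch of the ranking score s = Σ_i q_i c_i, assuming each video already has one response value per concept (how frame-level detector outputs are pooled into a video-level response is not specified on the slide). The function names and the toy response values are hypothetical; only the relevance scores come from the bike-trick example.

```python
from typing import Dict, List, Tuple


def event_score(semantic_query: Dict[str, float],
                concept_responses: Dict[str, float]) -> float:
    """s = sum_i q_i * c_i; concepts without a response contribute 0."""
    return sum(q * concept_responses.get(concept, 0.0)
               for concept, q in semantic_query.items())


def rank_videos(semantic_query: Dict[str, float],
                videos: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (video id, score) pairs sorted by descending score."""
    scored = [(vid, event_score(semantic_query, resp))
              for vid, resp in videos.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Relevance scores q_i from the "Attempting a Bike Trick" example
    sq = {"bike": 0.60, "bike trick": 1.00, "riding bike": 0.62,
          "flipping bike": 0.61, "parking lot": 0.01}
    # Toy video-level concept responses c_i (made-up values)
    videos = {
        "video_a": {"bike": 0.9, "bike trick": 0.8, "parking lot": 0.2},
        "video_b": {"bike": 0.3, "parking lot": 0.9},
    }
    for vid, s in rank_videos(sq, videos):
        print(vid, round(s, 3))
```

Because the score is a plain weighted sum, a concept with a near-zero relevance score (such as "parking lot" at 0.01) barely affects the ranking even when its detector fires strongly.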
SQG Experiments – Exact matching vs. WordNet/ConceptNet matching – How many concepts should be used to represent an event? – Further improving the weighting: TF-IDF and term specificity
Exact matching vs. WordNet matching [Figure: per-event Average Precision (y-axis, 0 to 0.5) by Event ID (x-axis) for WordNet matching, Exact Matching, and EM-TOP (exact matching that retains only the top few concepts); figure annotation: 7%.]
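Exact matching can be sketched as follows, assuming a concept is selected when its full name occurs as a contiguous word sequence in the event query text, after lowercasing and splitting on non-alphanumeric characters. The tokenization rule, the function names, and the sample query text are illustrative assumptions, not the system's documented behavior.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_match(event_text: str, concept_bank: List[str]) -> List[str]:
    """Return the concepts whose full name occurs as a contiguous word run in the text."""
    words = tokenize(event_text)
    matched = []
    for concept in concept_bank:
        name = tokenize(concept)
        n = len(name)
        # Slide an n-word window over the query words and compare with the concept name
        if n and any(words[i:i + n] == name for i in range(len(words) - n + 1)):
            matched.append(concept)
    return matched


if __name__ == "__main__":
    bank = ["Bike", "Bike trick", "Parking lot", "Potter wheel", "Dog show"]
    print(exact_match("Attempting a bike trick in a parking lot", bank))
```

Unlike WordNet or ConceptNet expansion, this never pulls in concepts that only share a related word (for example "Potter wheel" for a bike event), which is consistent with the precision advantage reported for exact matching.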
Number of concepts used to represent an event – The best MAP is reached by retaining only the top 8 concepts [Figure: MAP over all events (y-axis, 0 to 0.08) vs. top-k concepts (x-axis, k = 1 to 26).]
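Retaining only the top concepts is a simple truncation of the semantic query by relevance score. The sketch below defaults to k = 8, the value that gave the best MAP in this experiment; the function name is hypothetical, and the scores are those of the bike-trick example.

```python
from typing import Dict


def top_k_concepts(semantic_query: Dict[str, float], k: int = 8) -> Dict[str, float]:
    """Keep only the k concepts with the highest relevance scores.

    sorted() is stable (also with reverse=True), so concepts with equal
    scores keep their original order.
    """
    ranked = sorted(semantic_query.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])


if __name__ == "__main__":
    sq = {"bike": 0.60, "motorcycle": 0.60, "mountain bike": 0.60,
          "bike trick": 1.00, "riding bike": 0.62, "flipping bike": 0.61,
          "parking lot": 0.01}
    print(list(top_k_concepts(sq, k=4)))
```

With k = 4 the three 0.60 ties are resolved by their original order, so only "bike" survives; in practice, how ties at the cutoff are handled can change which concepts enter the query.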
Insights – Event 21: Attempting a bike trick [Figure: Average Precision vs. top-k concepts (k = 1 to 26), annotated with the concepts Paddle wheel, Trick, Wheel, Person riding, Jumping, Car wheel, and Potter wheel.]
Insights – Event 31: Beekeeping [Figure: Average Precision vs. top-k concepts (k = 1 to 26), annotated with the concepts Bee house (ImageNet), Cutting (research collection), Cutting down tree (research collection), Bee (ImageNet), and Honeycomb (ImageNet).]
Insights – Event 23: Dog show [Figure: Average Precision vs. top-k concepts (k = 1 to 26), annotated with the concepts Dog show (research collection) and Brush dog (research collection).]
Improvements by TF-IDF and word specificity
Method                                        MAP (on MED14-Test)
Exact Matching Only                           0.0306
Exact Matching + TF                           0.0420
Exact Matching + TF-IDF                       0.0495
Exact Matching + TF-IDF + Word Specificity    0.0502
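The slides do not spell out how TF-IDF is applied. One plausible reading, assumed here, treats each event description as a document: a concept term's weight for an event is its frequency in that description times log(N / df), where df is the number of event descriptions containing the term, so a term occurring in every event (such as "person") is down-weighted to zero. Word specificity is not modeled, multi-word concepts are ignored, and the function name and toy token lists are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_weights(event_tokens: List[str],
                  all_events: List[List[str]],
                  concepts: List[str]) -> Dict[str, float]:
    """Hypothetical TF-IDF weighting of single-word concept names for one event.

    tf  = number of times the concept term occurs in this event's description
    idf = log(N / df), where df is the number of event descriptions containing it
    """
    n_events = len(all_events)
    tf = Counter(event_tokens)
    weights: Dict[str, float] = {}
    for concept in concepts:
        df = sum(1 for doc in all_events if concept in doc)
        idf = math.log(n_events / df) if df else 0.0
        weights[concept] = tf[concept] * idf
    return weights


if __name__ == "__main__":
    # Toy, made-up event descriptions (already tokenized)
    events = [
        ["bike", "trick", "bike", "person"],
        ["dog", "show", "person"],
        ["bee", "honeycomb", "person"],
    ]
    print(tfidf_weights(events[0], events, ["bike", "trick", "person"]))
```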
Findings 1. Exact matching performs better than matching with WordNet and/or ConceptNet. 2. Performance improves further when only the top few exactly matched concepts are retained. 3. Adding both TF-IDF and word specificity increases performance.
Why would ontology-based mapping not work? A sample query from TRECVID 2009
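Why would ontology-based mapping not work? [Figure: for the event Dog Show, the query term "dog" is connected through the ontology to ImageNet and SIN concepts such as red wolf, kit fox, cat, horse, mammal, carnivore, and animal.]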
Why would ConceptNet mapping not work? [Figure: ConceptNet concepts linked to the event Tailgating: desires, driver, tailgating, car engine, food, bus, helmet, parking lot, team uniform, portable shelter.]
Findings – It is difficult to harness ontology-based mapping while constraining the mapping by event context.
In the Ad-Hoc event “Extinguishing a Fire” – Key concepts are missing from the concept bank: fire extinguisher, firefighter
Findings – It is reasonable to scale up the number of concepts, which increases the chance of exact matching.
MED14-Eval-Full Results – PS 000Ex (pre-specified events, zero examples): automatic semantic query generation and search; fusion of the 0-Shot and OCR systems; achieves a MAP of 5.2 – AH 000Ex (ad-hoc events, zero examples): the same system as in PS 000Ex; achieves a MAP of 2.6; performance drops due to the lack of key concepts
MER System – In algorithm design, we aim to optimize: – Concept-to-event relevancy: first, we require that candidate shots are relevant to the event; second, we perform concept-to-shot alignment. – Evidence diversity: in concept-to-shot alignment, we recount each shot with a unique concept that differs from those of the other shots. – Viewing time of evidential shots: we select only the three most confident shots as key evidence, and each shot lasts about 5 seconds.
Key Evidence Localization – Extract keyframes uniformly
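Uniform keyframe extraction amounts to sampling frames at a fixed time step. The sketch below only computes the sampling timestamps; the 2-second interval and the function name are assumptions, since the slides do not give the actual sampling rate, and decoding the frames themselves is left out.

```python
from typing import List


def uniform_keyframe_times(duration_s: float, interval_s: float = 2.0) -> List[float]:
    """Timestamps (in seconds) of keyframes sampled every `interval_s` seconds.

    The 2-second default is an assumed value for illustration only.
    """
    if duration_s <= 0 or interval_s <= 0:
        return []
    count = int(duration_s // interval_s) + 1
    return [i * interval_s for i in range(count) if i * interval_s < duration_s]


if __name__ == "__main__":
    print(uniform_keyframe_times(7.0))  # [0.0, 2.0, 4.0, 6.0]
```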
Key Evidence Localization – Apply the concept detectors from the concept bank (Research Collection, ImageNet, TRECVID SIN, UCF101, HMDB51) to the keyframes to obtain concept responses
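Key Evidence Localization – Choose the keyframes that are most relevant to the event – All concepts in the semantic query are taken into account by calculating the weighted sum $s = \sum_i w_i r_i$, where $w_i$ is the weight of concept $i$ in the semantic query and $r_i$ is its response on the keyframe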
Key Evidence Localization – Expand keyframes to shots
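The slides do not say how a keyframe is expanded into a shot. One simple option, assumed here, is to center a fixed window of about 5 seconds (the typical evidence length given for the MER system) on the keyframe and shift it so it stays inside the video; a real system might instead snap to detected shot boundaries. The function name and parameters are hypothetical.

```python
from typing import Tuple


def expand_to_shot(keyframe_t: float, video_duration: float,
                   shot_len: float = 5.0) -> Tuple[float, float]:
    """Return (start, end) of a window of about `shot_len` seconds around a keyframe.

    The window is centred on the keyframe and shifted to stay inside the video;
    it is shorter than `shot_len` only when the whole video is shorter.
    """
    start = max(0.0, keyframe_t - shot_len / 2.0)
    end = min(video_duration, start + shot_len)
    start = max(0.0, end - shot_len)  # pull the start back if the end was clipped
    return start, end


if __name__ == "__main__":
    print(expand_to_shot(10.0, 60.0))  # (7.5, 12.5)
    print(expand_to_shot(59.0, 60.0))  # (55.0, 60.0)
```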
Key Evidence Localization – The top 3 shots are selected as key evidence
Key Evidence Localization – The remaining shots are non-key evidence
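Put together, the localization steps score each candidate shot with the weighted sum s = Σ_i w_i r_i over the semantic-query concepts, keep the three highest-scoring shots as key evidence, and mark the rest as non-key. The sketch assumes each shot is summarized by one concept-response dictionary (for example, the responses of its keyframe); the function name and the toy values are hypothetical.

```python
from typing import Dict, List, Tuple


def split_evidence(weights: Dict[str, float],
                   shot_responses: List[Dict[str, float]],
                   n_key: int = 3) -> Tuple[List[int], List[int]]:
    """Return (key shot indices, non-key shot indices), most relevant first."""
    # Event relevance of each shot: s = sum_i w_i * r_i over semantic-query concepts
    scores = [sum(w * resp.get(c, 0.0) for c, w in weights.items())
              for resp in shot_responses]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n_key], order[n_key:]


if __name__ == "__main__":
    weights = {"bike": 0.60, "bike trick": 1.00}
    shots = [
        {"bike": 0.2},                      # s = 0.12
        {"bike trick": 0.9},                # s = 0.90
        {"bike": 0.8, "bike trick": 0.5},   # s = 0.98
        {},                                 # s = 0.00
        {"bike": 0.5},                      # s = 0.30
    ]
    key, non_key = split_evidence(weights, shots)
    print("key:", key, "non-key:", non_key)
```

Ranking shots by event relevance before any concept is assigned reflects the stated priority: event relevancy first, concepts second.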
Concept-to-Shot Alignment [Figure: the semantic query for "Attempting a Bike Trick" (Objects: Bike, Motorcycle, Mountain bike; Actions: Bike trick, Riding bike, Flipping bike; Scenes: Parking lot) aligned to key and non-key evidence shots, whose labels include Riding bike, Bike trick, and Bike.] – The top concept in each key evidence shot is selected as its representative concept. *We choose a unique concept for each shot.
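A greedy reading of this alignment rule, assumed here rather than taken from the system: visit the shots from most to least confident and give each one its highest-responding semantic-query concept that no earlier shot has used. Without the uniqueness constraint, the first two shots in the toy example would both be labeled "bike trick"; with it, the second falls back to "riding bike". The function name and values are hypothetical.

```python
from typing import Dict, List, Optional


def align_concepts(shots: List[Dict[str, float]],
                   query_concepts: List[str]) -> List[Optional[str]]:
    """Greedy concept-to-shot alignment with one unique concept per shot.

    `shots` must be ordered from most to least confident. Ties are broken by
    the order of `query_concepts`; a shot gets None once every concept is used.
    """
    used = set()
    labels: List[Optional[str]] = []
    for responses in shots:
        candidates = [c for c in query_concepts if c not in used]
        if not candidates:
            labels.append(None)
            continue
        # max() returns the first maximal element, so ties follow query order
        best = max(candidates, key=lambda c: responses.get(c, 0.0))
        used.add(best)
        labels.append(best)
    return labels


if __name__ == "__main__":
    concepts = ["bike", "bike trick", "riding bike"]
    shots = [
        {"bike trick": 0.9, "bike": 0.7},
        {"bike trick": 0.8, "riding bike": 0.6},
        {"bike trick": 0.7, "bike": 0.5},
    ]
    print(align_concepts(shots, concepts))
```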
MER14 Results – Percentage of "strongly agree" judgments [Figure: bar charts (y-axis 0% to 30%); (a) Evidence quality, bars in the order Team2, VIREO, Team4, Team3, Team6, Team1, Team5; (b) Event query quality, bars in the order VIREO, Team1, Team2, Team3, Team4, Team6, Team5.]
MER14 Results – Percentage of "agree" and "strongly agree" judgments combined [Figure: bar charts; (a) Evidence quality, bars in the order Team2, VIREO, Team4, Team1, Team6, Team5, Team3; (b) Event query quality, bars in the order Team1, Team2, Team3, VIREO, Team4, Team5, Team6.]
Summary 0-Shot System – Simple exact matching performs best – The quality of the concepts selected to represent an event is more important than their quantity – How to harness ontology-based mapping remains an open problem
Summary MER System – In key evidence localization, we emphasize event relevancy first, then the hot concepts – We recommend three shots as key evidence, each about 5 seconds long
Thanks!