  1. VIREO-TNO @ TRECVID 2014: Zero-Shot Event Detection and Recounting. Speaker: Maaike de Boer (TNO). Authors: Yi-Jie Lu 1, Hao Zhang 1, Chong-Wah Ngo 1 (1 VIREO Group, City University of Hong Kong, Hong Kong); Maaike de Boer 2, John Schavemaker 2, Klamer Schutte 2, Wessel Kraaij 2 (2 Netherlands Organization for Applied Scientific Research (TNO), Netherlands)

  2. Outline  0-Shot System – System Overview – Findings  MER System – System Workflow – Results

  3.  Semantic Query Generation (SQG) – Given an event query, SQG translates the query description into a representation of semantic concepts drawn from the concept bank (UCF101, Research Collection, ImageNet, HMDB51, TRECVID SIN). Example semantic query for the event "Attempting a Bike Trick", with relevant concepts and relevance scores: <Objects> Bike 0.60, Motorcycle 0.60, Mountain bike 0.60; <Actions> Bike trick 1.00, Riding bike 0.62, Flipping bike 0.61; <Scenes> Parking lot 0.01
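The exact-matching flavor of SQG (the variant the later slides find strongest) can be sketched roughly as follows. The function name and the toy concept bank are hypothetical, and the real system additionally assigns graded relevance scores to matched concepts:

```python
def semantic_query_generation(event_query, concept_bank):
    """Exact matching: keep only concepts whose name literally
    appears in the event-query text."""
    text = event_query.lower()
    return [concept for concept in concept_bank if concept.lower() in text]

# Toy concept bank standing in for Research Collection / ImageNet / SIN entries.
concept_bank = ["bike", "bike trick", "parking lot", "honeycomb"]
query = semantic_query_generation("Attempting a bike trick", concept_bank)
```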

  4.  Concept Bank – Research collection (497 concepts) – ImageNet ILSVRC’12 (1000 concepts) – SIN’14 (346 concepts) (diagram: the concept bank is assembled from UCF101, Research Collection, ImageNet, HMDB51, and TRECVID SIN)

  5.  Event Search – Ranking according to the semantic query and the concept responses: each video is scored as s = Σ_i q_i c_i, where q_i is the relevance score of concept i in the semantic query and c_i is the video's response for that concept (example semantic query: <Objects> Bike 0.60, Motorcycle 0.60, Mountain bike 0.60; <Actions> Bike trick 1.00, Riding bike 0.62, Flipping bike 0.61; <Scenes> Parking lot 0.01)
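Reading the slide's ranking rule as s = Σ_i q_i c_i (semantic-query weight q_i times the video's concept response c_i), a minimal scoring-and-ranking sketch, with hypothetical names and toy scores, might look like:

```python
def event_search_score(semantic_query, concept_responses):
    """s = sum_i q_i * c_i: semantic-query relevance q_i times the
    video's concept-detector response c_i (missing concepts count 0)."""
    return sum(q * concept_responses.get(concept, 0.0)
               for concept, q in semantic_query.items())

semantic_query = {"bike": 0.60, "bike trick": 1.00, "riding bike": 0.62}
videos = {
    "vid_a": {"bike": 0.9, "bike trick": 0.8},   # strong event evidence
    "vid_b": {"parking lot": 0.7, "bike": 0.1},  # weak event evidence
}
ranking = sorted(videos,
                 key=lambda v: event_search_score(semantic_query, videos[v]),
                 reverse=True)
```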

  6. Outline  0-Shot System – System Overview – Findings  MER System – System Workflow – Results

  7.  SQG Experiments – Exact matching vs. WordNet/ConceptNet matching – How many concepts are used to represent an event? – To further improve the weighting:  TF-IDF  Term specificity

  8.  Exact matching vs. WordNet matching (chart: average precision per event ID for WordNet, ExactMatching, and EM-TOP; exact matching beats WordNet matching by about 7%, and the callout notes that EM-TOP retains only the top few exactly matched concepts)

  9.  Amount of concepts used to represent an event (chart: MAP over the top k concepts, k = 1..26; the best MAP is hit by retaining only the top 8 concepts)
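Retaining only the top-k concepts of the semantic query (the deck finds k = 8 works best) is straightforward; a sketch with hypothetical names:

```python
def retain_top_k(semantic_query, k=8):
    """Keep only the k concepts with the highest relevance scores."""
    ranked = sorted(semantic_query.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

semantic_query = {"bike": 0.60, "bike trick": 1.00, "riding bike": 0.62,
                  "flipping bike": 0.61, "parking lot": 0.01}
top3 = retain_top_k(semantic_query, k=3)
```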

  10. Insights (chart: average precision for Event 21, "Attempting a bike trick", over the top k concepts, k = 1..26; concept labels annotated along the curve: Paddle wheel, Trick, Wheel, Person riding, Jumping, Car wheel, Potter wheel)

  11. Insights (chart: average precision for Event 31, "Beekeeping", over the top k concepts, k = 1..26; annotated concepts: Bee house (ImageNet), Cutting (research collection), Cutting down tree (research collection), Bee (ImageNet), Honeycomb (ImageNet))

  12. Insights (chart: average precision for Event 23, "Dog show", over the top k concepts, k = 1..26; annotated concepts: Dog show (research collection), Brush dog (research collection))

  13.  Improvements by TF-IDF and word specificity, MAP on MED14-Test: Exact Matching Only 0.0306; Exact Matching + TF 0.0420; Exact Matching + TFIDF 0.0495; Exact Matching + TFIDF + Word Specificity 0.0502
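The table's weighting refinements could be combined along these lines. The exact formula and the specificity measure the authors use are not given on the slide, so this is only an illustrative TF-IDF-style sketch with hypothetical inputs:

```python
import math

def concept_weight(tf, df, n_docs, specificity=1.0):
    """Illustrative weight: term frequency (tf) times inverse document
    frequency, scaled by a term-specificity factor. All inputs are
    hypothetical stand-ins for the authors' actual statistics."""
    idf = math.log(n_docs / (1 + df))
    return tf * idf * specificity

# A rare term should outweigh an equally frequent but common one.
rare = concept_weight(tf=2, df=1, n_docs=100)
common = concept_weight(tf=2, df=80, n_docs=100)
```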

  14. Findings 1. Exact matching performs better than matching with WordNet and/or ConceptNet 2. Performance is even better by only retaining the top few exactly matched concepts 3. Adding both TF-IDF and Word Specificity increases performance

  15.  Why would ontology-based mapping not work? A sample query in TRECVID 2009

  16.  Why would ontology-based mapping not work? (diagram: the term "dog" maps through the ontology to hypernyms such as mammal, carnivore, animal and to neighboring ImageNet/SIN concepts such as red wolf, kit fox, cat, horse, while the intended target is the event Dog Show)

  17.  Why would ConceptNet mapping not work? (diagram: ConceptNet relations for "Tailgating", e.g. "driver desires tailgating", pull in concepts such as car engine, food, bus, helmet, parking lot, team uniform, portable shelter)

  18. Findings  It is difficult to – harness ontology-based mapping while keeping the mapping constrained by the event context

  19.  In the Ad-Hoc event “Extinguishing a Fire” – Key concepts are missing:  Fire extinguisher  Firefighter

  20. Findings  It is reasonable to – Scale up the number of concepts, thus increasing the chance of exact matching

  21. MED14-Eval-Full Results  PS 000Ex – Automatic semantic query generation and search – Fusion of the 0-Shot and OCR systems – Achieves a MAP of 5.2  AH 000Ex – Same system as in PS 000Ex – Achieves a MAP of 2.6 – Performance drops due to the lack of key concepts

  22. Outline  0-Shot System – System Overview – Findings  MER System – System Workflow – Results

  23. MER System  In algorithm design, we aim to optimize – Concept-to-event relevancy – Evidence diversity – Viewing time of evidential shots

  26. MER System  In algorithm design, we aim to optimize – Concept-to-event relevancy  First, we require that candidate shots are relevant to the event  Second, we do concept-to-shot alignment – Evidence diversity  In concept-to-shot alignment, we recount each shot with a unique concept, different from the other shots – Viewing time of evidential shots  Select only the three most confident shots as key evidence  Each shot is about 5 seconds long

  27. Outline  0-Shot System – System Overview – Findings  MER System – System Workflow – Results

  28.  Key Evidence Localization Extract keyframes uniformly

  29.  Key Evidence Localization Apply the concept detectors of the concept bank (UCF101, Research Collection, ImageNet, HMDB51, TRECVID SIN) to obtain concept responses

  30.  Key Evidence Localization Choose keyframes that are most relevant to this event • All concepts in the semantic query are taken into account by calculating the weighted sum s = Σ_i w_i r_i, where w_i is the concept weight and r_i the concept response

  31.  Key Evidence Localization Expand keyframes to shots

  32.  Key Evidence Localization The top 3 shots are selected as key evidence

  33.  Key Evidence Localization The rest are non-key evidence
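Putting slides 28 through 33 together, key evidence localization amounts to scoring each keyframe by the weighted sum of its concept responses and keeping the top three. A sketch with hypothetical names and toy responses:

```python
def localize_key_evidence(keyframe_responses, query_weights, top_n=3):
    """Score each keyframe by s = sum_i w_i * r_i over the semantic-query
    concepts, then return the top_n keyframes as key evidence."""
    scores = {
        kf: sum(w * responses.get(c, 0.0) for c, w in query_weights.items())
        for kf, responses in keyframe_responses.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

query_weights = {"bike trick": 1.00, "bike": 0.60}
keyframes = {
    "kf1": {"bike trick": 0.9},               # score 0.90
    "kf2": {"bike": 0.9},                     # score 0.54
    "kf3": {"parking lot": 0.9},              # score 0.00
    "kf4": {"bike trick": 0.5, "bike": 0.5},  # score 0.80
}
key_evidence = localize_key_evidence(keyframes, query_weights, top_n=3)
```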

  34.  Concept-to-Shot Alignment (diagram: the semantic query <Objects> Bike, Motorcycle, Mountain bike; <Actions> Bike trick, Riding bike, Flipping bike; <Scenes> Parking lot is aligned to key and non-key shots labeled Riding bike, Bike trick, and Bike) The top concept in each key evidence shot is selected as the representative concept * We choose a unique concept for each shot
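The uniqueness constraint on the alignment can be satisfied greedily: walk the shots in confidence order and give each its best concept not yet claimed. The slide does not detail the authors' actual assignment procedure, so this is one plausible sketch:

```python
def align_concepts_to_shots(shot_scores):
    """Greedy concept-to-shot alignment: each shot (visited in ranked
    order) takes its highest-scoring concept that no earlier shot has
    already claimed, so every shot is recounted with a unique concept."""
    used, alignment = set(), {}
    for shot, scores in shot_scores.items():
        for concept, _ in sorted(scores.items(), key=lambda kv: kv[1],
                                 reverse=True):
            if concept not in used:
                alignment[shot] = concept
                used.add(concept)
                break
    return alignment

# Shots listed in confidence order; both prefer "bike trick".
shots = {
    "shot1": {"bike trick": 0.9, "riding bike": 0.5},
    "shot2": {"bike trick": 0.8, "riding bike": 0.7},
}
alignment = align_concepts_to_shots(shots)
```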

  35. MER14 Results (bar charts: the percentage of "strongly agree" responses per team; (a) evidence quality, in order Team2, VIREO, Team4, Team3, Team6, Team1, Team5; (b) event query quality, in order VIREO, Team1, Team2, Team3, Team4, Team6, Team5)

  36. MER14 Results (bar charts: the percentage of "agree" plus "strongly agree" responses per team; (a) evidence quality, in order Team2, VIREO, Team4, Team1, Team6, Team5, Team3; (b) event query quality, in order Team1, Team2, Team3, VIREO, Team4, Team5, Team6)

  37. Summary  0-Shot System – Simple exact matching performs the best – The quality of the concepts selected to represent an event matters more than their quantity – How to harness ontology-based mapping remains an open problem

  38. Summary  MER System – In key evidence localization, we emphasize event relevancy first, then the hot concepts – We recommend three shots as key evidence, each about 5 seconds long

  39. Thanks!
