Learning TRECVID'08 High-level Features from YouTube


  1. Learning TRECVID'08 High-level Features from YouTube
     Adrian Ulges*, Markus Koch, Christian Schulze, Thomas M. Breuel
     Image Understanding and Pattern Recognition, DFKI & TU Kaiserslautern, Germany
     2008/07/07

  2. Outline
     ◮ Motivation
     ◮ Online Video Concept Detection
     ◮ TRECVID'08 Experiments
     ◮ More Experiments
     ◮ Discussion

  3. Concept Detection
     Detection of generic semantic concepts in video
     ◮ objects ("US flag"), locations ("desert"), events ("interview")
     ◮ main application: video search

  4. Concept Detection
     Key issue: training data acquisition
     ◮ training sets must be large-scale and annotated


  5. Training Data: State of the Art
     ◮ high-quality manual annotations
     ◮ TRECVID [Smeaton06], MediaMill [Snoek06], LSCOM [Naphade06], ...
     ◮ detectors exist for 100s of concepts
     Limitations
     ◮ need to scale up further (1,000s of concepts [Hauptmann07])
     ◮ annotations are bound to a dataset
     ◮ annotations are static

  6. Outline
     ◮ Motivation
     ◮ Online Video Concept Detection
     ◮ TRECVID'08 Experiments
     ◮ More Experiments
     ◮ Discussion

  7. Online Video Concept Detection
     Idea: use online video as training data
     ◮ tags provided by users are used as annotations
     ◮ video taggers can learn autonomously


  8. Online Video Concept Detection
     Benefits
     ◮ scalability: can scale up to 1,000s of concepts
     ◮ flexibility: the web community keeps content up to date
     Problems
     ◮ web video is a mixture of domains with varying production styles (TV news, home video, music clips, ...)
     ◮ annotations are coarse and weak
     ◮ (for benchmarking) potential mismatch between TRECVID and YouTube concepts
     [Figure: sample frames from YouTube, filtered YouTube, and TRECVID]

  9. The Ultimate Question
     How well do concept detectors trained on YouTube work?

  10. Key Idea
      ◮ use a standard concept detection approach (visual words + SVM)
      ◮ train it on YouTube and on a standard dataset (TRECVID-devel)
      ◮ benchmark both detectors
      Experiments
      1. participation in TRECVID'08
      2. further experiments: TV05, TV07, YouTube

  11. Outline
      ◮ Motivation
      ◮ Online Video Concept Detection
      ◮ TRECVID'08 Experiments
      ◮ More Experiments
      ◮ Discussion

  12. Approach
      ◮ keyframe extraction
        ◮ adaptive clustering [Borth08]
      ◮ features: bag of visual words
        ◮ dense sampling over several scales (ca. 3,600 features/frame)
        ◮ SIFT descriptors
        ◮ 2,000-means clustering to build the codebook
      ◮ classifier: SVMs
        ◮ χ² kernel
        ◮ cross-validation for γ and C, maximizing average precision
        ◮ roughly balanced training sets (downsample the negative class)
      ◮ fusion over keyframes
        ◮ simple averaging
      (a code sketch of this pipeline follows below)
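The pipeline on this slide can be sketched compactly with OpenCV and scikit-learn. This is a minimal illustration, not the authors' implementation: the grid step, patch sizes, gamma, and C are placeholder values (the slide tunes γ and C by cross-validation), and all_train_desc, X_train, and y_train are assumed to be prepared elsewhere from extracted keyframes.

```python
# Illustrative sketch: dense SIFT -> 2,000-word codebook -> chi-squared-kernel
# SVM -> score fusion by averaging over keyframes. Placeholder parameters and
# data containers are assumptions, not values from the paper.
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

N_WORDS = 2000  # codebook size from the slide

def dense_sift(gray):
    """Dense SIFT on a regular grid over several scales (~3,600 descriptors/frame)."""
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), float(size))
                 for size in (8, 16, 24)              # several scales (assumed)
                 for y in range(0, gray.shape[0], 8)  # dense grid (assumed step)
                 for x in range(0, gray.shape[1], 8)]
    _, descriptors = sift.compute(gray, keypoints)
    return descriptors

def bovw_histogram(descriptors, codebook):
    """Quantize descriptors against the codebook; return an L1-normalized histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=N_WORDS).astype(float)
    return hist / max(hist.sum(), 1.0)

# Codebook: k-means over descriptors pooled from the training keyframes.
codebook = MiniBatchKMeans(n_clusters=N_WORDS, n_init=3).fit(all_train_desc)

# Chi-squared-kernel SVM on per-keyframe histograms; gamma and C would be
# chosen by cross-validation maximizing average precision, as on the slide.
K_train = chi2_kernel(X_train, X_train, gamma=0.5)
clf = SVC(kernel="precomputed", C=1.0, probability=True).fit(K_train, y_train)

def score_video(keyframe_hists):
    """Fuse per-keyframe SVM outputs by simple averaging."""
    K = chi2_kernel(keyframe_hists, X_train, gamma=0.5)
    return clf.predict_proba(K)[:, 1].mean()
```

One detector of this form is trained per concept; the averaged per-keyframe probability is the video-level score used for ranking.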

  13. Datasets
      ◮ Test
        ◮ standard TV'08 test data
      ◮ Training 1: TV'08
        ◮ standard TV'08 training data
      ◮ Training 2: YouTube (a collection sketch follows below)
        ◮ downloaded using the YouTube API
        ◮ 100 videos per concept, each of up to 3 min. length
        ◮ two refinements:
          1. by category: mountain → mountain[travel&places]
          2. manually: mountain[travel&places] → mountain+panorama[travel&places]
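The collection step might look like the following sketch. The original work used the YouTube API of 2008, which has long been retired; search_videos and download below are invented stand-ins for whatever client is available, not real library calls.

```python
# Hypothetical sketch of the YouTube collection step: for each concept tag,
# query a video search API, keep up to 100 hits no longer than 3 minutes,
# and optionally restrict the query by category.
MAX_VIDEOS = 100   # 100 videos per concept (slide)
MAX_SECONDS = 180  # up to 3 min. length (slide)

def collect_concept(tag, category=None):
    videos = []
    for hit in search_videos(query=tag, category=category):  # hypothetical client
        if hit.duration_seconds <= MAX_SECONDS:
            videos.append(download(hit.url))                 # hypothetical helper
        if len(videos) == MAX_VIDEOS:
            break
    return videos

# Refinement 1, by category:     collect_concept("mountain", category="travel&places")
# Refinement 2, manual keywords: collect_concept("mountain panorama", category="travel&places")
```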

  14. YouTube Dataset: Quality
      [Figure: sample keyframes, TRECVID vs. YouTube, for the concepts "mountain" and "cityscape"]

  15. YouTube Dataset: Quality (cont'd)
      [Figure: sample keyframes, TRECVID vs. YouTube, for the concepts "singing" and "telephone"]

  16. Results 1
      [Figure: top detections of the YouTube-based detector for "mountain", "cityscape", "singing", and "telephone"]

  17. Results 2
      [Figure: inferred average precision (0.00-0.16) per concept for the runs A_IUPR-TV-M, A_IUPR-TV-MF, A_IUPR-TV-S, A_IUPR-TV-SF, c_IUPR-YOUTUBE-M, and c_IUPR-YOUTUBE-S, over the concepts Classroom, Bridge, Em._Vehicle, Dog, Kitchen, Airpl._flying, Two people, Bus, Driver, Cityscape, Harbor, Telephone, Street, Demonstr._Or_Pr., Hand, Mountain, Nighttime, Boat_Ship, Flower, and Singing]
      ◮ infMAP for TRECVID runs: 5.3-6.3%
      ◮ infMAP for YouTube runs: 2.1-2.2%
      ◮ performance depends strongly on the concept
      (a background sketch of the measure follows below)
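As background on the measure, the sketch below computes plain average precision and MAP under the assumption of complete relevance judgments. TRECVID'08 actually reports inferred AP (the Yilmaz-Aslam estimator), which approximates this quantity from a sampled subset of judgments, so the exhaustive version here is only the quantity being estimated.

```python
# Background sketch: average precision over a ranked shot list, and MAP over
# concepts, assuming complete relevance judgments (unlike TRECVID's inferred AP).
import numpy as np

def average_precision(ranked_relevance):
    """ranked_relevance: 1/0 relevance labels in ranked order."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(per_concept):
    """per_concept: dict mapping concept name -> ranked 1/0 relevance list."""
    return float(np.mean([average_precision(r) for r in per_concept.values()]))
```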

  18. Results 3
      If annotations on the target domain are given, specialized detectors outperform YouTube-based ones in terms of MAP.
      Influence of duplicates?
      [Figure: concept "Dog": "dog" keyframes in the TRECVID training set vs. detected "dogs" in the TRECVID test set]
      ◮ specialized detectors make use of duplicates in the dataset
      ◮ the YouTube-based tagger cannot do this

  19. Outline
      ◮ Motivation
      ◮ Online Video Concept Detection
      ◮ TRECVID'08 Experiments
      ◮ More Experiments
      ◮ Discussion

  20. Idea
      Goal: compare YouTube-based detectors with standard ones on a third target domain where no annotations are given!
      ◮ approach / concepts: see the last experiments
      ◮ datasets:
        1. TV05: TRECVID'05 video data with LSCOM annotations
        2. TV07: TRECVID'07 video data with TRECVID'08 annotations
        3. YouTube: see the last experiment
      Setup (a sketch of this loop follows below)
      ◮ split each dataset for training and testing
      ◮ train on all datasets → 3 detectors
      ◮ test each detector on all 3 datasets
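The setup reduces to a small loop. make_train_test_split, train_detector, and evaluate_map below are assumed wrappers around the pipeline and the metric sketched earlier, not functions from the paper.

```python
# Sketch of the cross-domain experiment: one detector per training dataset,
# each evaluated on every dataset's test split, giving the 3x3 MAP matrix
# on the next slide. The three helper functions are assumed wrappers.
datasets = ["TV05", "TV07", "YOUTUBE"]
splits = {name: make_train_test_split(name) for name in datasets}  # assumed helper

map_matrix = {}
for source in datasets:                    # training domain
    detector = train_detector(splits[source].train)
    for target in datasets:                # testing domain
        map_matrix[source, target] = evaluate_map(detector, splits[target].test)
```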

  21. Results 1
      MAP [%], training (rows) vs. testing (columns):

                   TV05    TV07    YOUTUBE
      TV05        18.40    3.82      14.68
      TV07         3.32    9.65      16.49
      YOUTUBE      2.83    3.51      31.33

      ◮ specialized detectors always perform best (also for YouTube)!
      ◮ all detectors generalize poorly!
      ◮ in-depth analysis: duplicates in all datasets

  22. Results 2
      MAP [%], training (rows) vs. testing (columns):

                   TV05    TV07    YOUTUBE
      TV05        18.40    3.82      14.68
      TV07         3.32    9.65      16.49
      YOUTUBE      2.83    3.51      31.33

      ◮ the relative performance loss for the YouTube-based detector is moderate (11.4%)

  23. Results 3
      Enhancing standard training sets with YouTube material
      ◮ join two datasets, test on the third one (a sketch follows below)
      [Figure: two bar charts of MAP [%]: tagging performance on TV07 for training on YOUTUBE, TV05, and YOUTUBE+TV05; tagging performance on TV05 for training on YOUTUBE, TV07, and YOUTUBE+TV07]
      ◮ combining training sets with YouTube material slightly increases generalization performance (11.7%)
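The joint-training variant only changes the training pool; continuing with the assumed helpers from the previous sketch:

```python
# Sketch of one joint-training run: pool YouTube with one standard training
# set and evaluate on the held-out third domain (assumed helpers as above).
joint_train = splits["YOUTUBE"].train + splits["TV05"].train
detector = train_detector(joint_train)
map_on_tv07 = evaluate_map(detector, splits["TV07"].test)
```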

  24. Outline
      ◮ Motivation
      ◮ Online Video Concept Detection
      ◮ TRECVID'08 Experiments
      ◮ More Experiments
      ◮ Discussion


  25. Conclusions
      YouTube helps on domains with no training annotations when ...
      ◮ ... replacing standard datasets (11.4% performance loss, but autonomous training)
      ◮ ... complementing standard datasets (11.7% increase in generalization capability)
      ◮ more: [TRECVID notebook paper], adrian.ulges@dfki.de
      Issues
      ◮ scaling to 1,000 tags?
      ◮ adapting YouTube-based detectors to other target domains?

  26. Fine
      Thanks for your attention!
      (Thanks also to Marcel Worring and Alexander Hauptmann for helpful discussions.)

  27. References
      ◮ [Smeaton06]: A. Smeaton, P. Over, W. Kraaij. Evaluation Campaigns and TRECVID. MIR 2006.
      ◮ [Snoek06]: C. Snoek, M. Worring, J. van Gemert, J. Geusebroek, A. Smeulders. The Challenge Problem for Automated Detection of 101 Semantic Concepts in Multimedia. ACM Multimedia 2006.
      ◮ [Naphade06]: M. Naphade, J. Smith, J. Tesic, S. Chang, W. Hsu, L. Kennedy, A. Hauptmann, J. Curtis. Large-Scale Concept Ontology for Multimedia. IEEE Multimedia, 2006.
      ◮ [Hauptmann07]: A. Hauptmann, R. Yan, W. Lin. How Many High-Level Concepts Will Fill the Semantic Gap in News Video Retrieval? CIVR 2007.
      ◮ [Ulges08]: A. Ulges, C. Schulze, D. Keysers, T. Breuel. A System that Learns to Tag Videos by Watching YouTube. ICVS, Santorini, 2008.
      ◮ Images taken from YouTube and the TRECVID datasets.
