Learning TRECVID'08 High-level Features from YouTube


  1. Learning TRECVID'08 High-level Features from YouTube
     Adrian Ulges*, Markus Koch, Christian Schulze, Thomas M. Breuel
     Image Understanding and Pattern Recognition, DFKI & TU Kaiserslautern, Germany
     2008/07/07

  2. Outline
     ◮ Motivation
     ◮ Online Video Concept Detection
     ◮ TRECVID'08 Experiments
     ◮ More Experiments
     ◮ Discussion

  3. Concept Detection
     Detection of generic semantic concepts in video
     ◮ objects ("US flag"), locations ("desert"), events ("interview")
     ◮ main application: video search

  4. Concept Detection
     Key issue: training data acquisition
     ◮ training sets must be large-scale and annotated


  5. Training Data: State of the Art
     ◮ high-quality manual annotations
     ◮ TRECVID [Smeaton06], MediaMill [Snoek06], LSCOM [Naphade06], ...
     ◮ detectors exist for 100s of concepts
     Limitations
     ◮ need to scale up further (1,000s of concepts [Hauptmann07])
     ◮ annotations are bound to a dataset
     ◮ annotations are static

  6. Outline
     ◮ Motivation
     ◮ Online Video Concept Detection
     ◮ TRECVID'08 Experiments
     ◮ More Experiments
     ◮ Discussion

  7. Online Video Concept Detection
     Idea: use online video as training data
     ◮ tags provided by users are used as annotations
     ◮ video taggers can learn autonomously


  8. Online Video Concept Detection
     Benefits
     ◮ scalability: can scale up to 1,000s of concepts
     ◮ flexibility: the web community keeps content up to date
     Problems
     ◮ web video is a mixture of domains with varying production styles (TV news, home video, music clips, ...)
     ◮ annotations are coarse and weak
     ◮ (for benchmarking) potential mismatch between TRECVID and YouTube concepts
     [Figure: sample frames from YouTube, filtered YouTube, and TRECVID]

  9. The Ultimate Question
     How well do concept detectors trained on YouTube work?

  10. Key Idea
      ◮ use a standard concept detection approach (visual words + SVM)
      ◮ train it on YouTube and on a standard dataset (TRECVID-devel)
      ◮ benchmark both detectors
      Experiments
      1. participation in TRECVID'08
      2. further experiments: TV05, TV07, YouTube

  11. Outline
      ◮ Motivation
      ◮ Online Video Concept Detection
      ◮ TRECVID'08 Experiments
      ◮ More Experiments
      ◮ Discussion

  12. Approach
      ◮ keyframe extraction
        ◮ adaptive clustering [Borth08]
      ◮ features: bag of visual words
        ◮ dense sampling over several scales (ca. 3,600 features/frame)
        ◮ SIFT descriptors
        ◮ 2,000-means clustering to build the codebook
      ◮ classifier: SVMs
        ◮ χ² kernel
        ◮ cross-validation for γ and C, maximizing average precision
        ◮ roughly balanced training sets (downsample the negative class)
      ◮ fusion over keyframes
        ◮ simple averaging
      (a code sketch of this pipeline follows below)
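The pipeline on this slide can be sketched compactly with OpenCV and scikit-learn. This is a minimal illustration, not the authors' implementation: the grid step, patch sizes, gamma, and C are placeholder values (the slide tunes γ and C by cross-validation), and all_train_desc, X_train, and y_train are assumed to be prepared elsewhere from extracted keyframes.

```python
# Illustrative sketch: dense SIFT -> 2,000-word codebook -> chi-squared-kernel
# SVM -> score fusion by averaging over keyframes. Placeholder parameters and
# data containers are assumptions, not values from the paper.
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

N_WORDS = 2000  # codebook size from the slide

def dense_sift(gray):
    """Dense SIFT on a regular grid over several scales (~3,600 descriptors/frame)."""
    sift = cv2.SIFT_create()
    keypoints = [cv2.KeyPoint(float(x), float(y), float(size))
                 for size in (8, 16, 24)              # several scales (assumed)
                 for y in range(0, gray.shape[0], 8)  # dense grid (assumed step)
                 for x in range(0, gray.shape[1], 8)]
    _, descriptors = sift.compute(gray, keypoints)
    return descriptors

def bovw_histogram(descriptors, codebook):
    """Quantize descriptors against the codebook; return an L1-normalized histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=N_WORDS).astype(float)
    return hist / max(hist.sum(), 1.0)

# Codebook: k-means over descriptors pooled from the training keyframes.
codebook = MiniBatchKMeans(n_clusters=N_WORDS, n_init=3).fit(all_train_desc)

# Chi-squared-kernel SVM on per-keyframe histograms; gamma and C would be
# chosen by cross-validation maximizing average precision, as on the slide.
K_train = chi2_kernel(X_train, X_train, gamma=0.5)
clf = SVC(kernel="precomputed", C=1.0, probability=True).fit(K_train, y_train)

def score_video(keyframe_hists):
    """Fuse per-keyframe SVM outputs by simple averaging."""
    K = chi2_kernel(keyframe_hists, X_train, gamma=0.5)
    return clf.predict_proba(K)[:, 1].mean()
```

One detector of this form is trained per concept; the averaged per-keyframe probability is the video-level score used for ranking.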

  13. Datasets
      ◮ Test
        ◮ standard TV'08 test data
      ◮ Training 1: TV'08
        ◮ standard TV'08 training data
      ◮ Training 2: YouTube (a collection sketch follows below)
        ◮ downloaded using the YouTube API
        ◮ 100 videos per concept, each of up to 3 min. length
        ◮ two refinements:
          1. by category: mountain → mountain[travel&places]
          2. manually: mountain[travel&places] → mountain+panorama[travel&places]
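The collection step might look like the following sketch. The original work used the YouTube API of 2008, which has long been retired; search_videos and download below are invented stand-ins for whatever client is available, not real library calls.

```python
# Hypothetical sketch of the YouTube collection step: for each concept tag,
# query a video search API, keep up to 100 hits no longer than 3 minutes,
# and optionally restrict the query by category.
MAX_VIDEOS = 100   # 100 videos per concept (slide)
MAX_SECONDS = 180  # up to 3 min. length (slide)

def collect_concept(tag, category=None):
    videos = []
    for hit in search_videos(query=tag, category=category):  # hypothetical client
        if hit.duration_seconds <= MAX_SECONDS:
            videos.append(download(hit.url))                 # hypothetical helper
        if len(videos) == MAX_VIDEOS:
            break
    return videos

# Refinement 1, by category:     collect_concept("mountain", category="travel&places")
# Refinement 2, manual keywords: collect_concept("mountain panorama", category="travel&places")
```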

  14. YouTube Dataset: Quality
      [Figure: sample keyframes, TRECVID vs. YouTube, for the concepts "mountain" and "cityscape"]

  15. YouTube Dataset: Quality (cont'd)
      [Figure: sample keyframes, TRECVID vs. YouTube, for the concepts "singing" and "telephone"]

  16. Results 1
      [Figure: top detections of the YouTube-based detector for "mountain", "cityscape", "singing", and "telephone"]

  17. Results 2
      [Figure: inferred average precision (0.00-0.16) per concept for the runs A_IUPR-TV-M, A_IUPR-TV-MF, A_IUPR-TV-S, A_IUPR-TV-SF, c_IUPR-YOUTUBE-M, and c_IUPR-YOUTUBE-S, over the concepts Classroom, Bridge, Em._Vehicle, Dog, Kitchen, Airpl._flying, Two people, Bus, Driver, Cityscape, Harbor, Telephone, Street, Demonstr._Or_Pr., Hand, Mountain, Nighttime, Boat_Ship, Flower, and Singing]
      ◮ infMAP for TRECVID runs: 5.3-6.3%
      ◮ infMAP for YouTube runs: 2.1-2.2%
      ◮ performance depends strongly on the concept
      (a background sketch of the measure follows below)
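As background on the measure, the sketch below computes plain average precision and MAP under the assumption of complete relevance judgments. TRECVID'08 actually reports inferred AP (the Yilmaz-Aslam estimator), which approximates this quantity from a sampled subset of judgments, so the exhaustive version here is only the quantity being estimated.

```python
# Background sketch: average precision over a ranked shot list, and MAP over
# concepts, assuming complete relevance judgments (unlike TRECVID's inferred AP).
import numpy as np

def average_precision(ranked_relevance):
    """ranked_relevance: 1/0 relevance labels in ranked order."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(per_concept):
    """per_concept: dict mapping concept name -> ranked 1/0 relevance list."""
    return float(np.mean([average_precision(r) for r in per_concept.values()]))
```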

  18. Results 3
      If annotations on the target domain are given, specialized detectors outperform YouTube-based ones in terms of MAP.
      Influence of duplicates?
      [Figure: concept "Dog": "dog" keyframes in the TRECVID training set vs. detected "dogs" in the TRECVID test set]
      ◮ specialized detectors make use of duplicates in the dataset
      ◮ the YouTube-based tagger cannot do this

  19. Outline
      ◮ Motivation
      ◮ Online Video Concept Detection
      ◮ TRECVID'08 Experiments
      ◮ More Experiments
      ◮ Discussion

  20. Idea
      Goal: compare YouTube-based detectors with standard ones on a third target domain where no annotations are given!
      ◮ approach / concepts: see the last experiments
      ◮ datasets:
        1. TV05: TRECVID'05 video data with LSCOM annotations
        2. TV07: TRECVID'07 video data with TRECVID'08 annotations
        3. YouTube: see the last experiment
      Setup (a sketch of this loop follows below)
      ◮ split each dataset for training and testing
      ◮ train on all datasets → 3 detectors
      ◮ test each detector on all 3 datasets
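The setup reduces to a small loop. make_train_test_split, train_detector, and evaluate_map below are assumed wrappers around the pipeline and the metric sketched earlier, not functions from the paper.

```python
# Sketch of the cross-domain experiment: one detector per training dataset,
# each evaluated on every dataset's test split, giving the 3x3 MAP matrix
# on the next slide. The three helper functions are assumed wrappers.
datasets = ["TV05", "TV07", "YOUTUBE"]
splits = {name: make_train_test_split(name) for name in datasets}  # assumed helper

map_matrix = {}
for source in datasets:                    # training domain
    detector = train_detector(splits[source].train)
    for target in datasets:                # testing domain
        map_matrix[source, target] = evaluate_map(detector, splits[target].test)
```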

  21. Results 1
      MAP [%], training (rows) vs. testing (columns):

                   TV05    TV07    YOUTUBE
      TV05        18.40    3.82      14.68
      TV07         3.32    9.65      16.49
      YOUTUBE      2.83    3.51      31.33

      ◮ specialized detectors always perform best (also for YouTube)!
      ◮ all detectors generalize poorly!
      ◮ in-depth analysis: duplicates in all datasets

  22. Results 2
      MAP [%], training (rows) vs. testing (columns):

                   TV05    TV07    YOUTUBE
      TV05        18.40    3.82      14.68
      TV07         3.32    9.65      16.49
      YOUTUBE      2.83    3.51      31.33

      ◮ the relative performance loss for the YouTube-based detector is moderate (11.4%)

  23. Results 3
      Enhancing standard training sets with YouTube material
      ◮ join two datasets, test on the third one (a sketch follows below)
      [Figure: two bar charts of MAP [%]: tagging performance on TV07 for training on YOUTUBE, TV05, and YOUTUBE+TV05; tagging performance on TV05 for training on YOUTUBE, TV07, and YOUTUBE+TV07]
      ◮ combining training sets with YouTube material slightly increases generalization performance (11.7%)
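The joint-training variant only changes the training pool; continuing with the assumed helpers from the previous sketch:

```python
# Sketch of one joint-training run: pool YouTube with one standard training
# set and evaluate on the held-out third domain (assumed helpers as above).
joint_train = splits["YOUTUBE"].train + splits["TV05"].train
detector = train_detector(joint_train)
map_on_tv07 = evaluate_map(detector, splits["TV07"].test)
```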

  24. Outline
      ◮ Motivation
      ◮ Online Video Concept Detection
      ◮ TRECVID'08 Experiments
      ◮ More Experiments
      ◮ Discussion


  25. Conclusions
      YouTube helps on domains with no training annotations when ...
      ◮ ... replacing standard datasets (11.4% performance loss, but autonomous training)
      ◮ ... complementing standard datasets (11.7% increase in generalization capability)
      ◮ more: [TRECVID notebook paper], adrian.ulges@dfki.de
      Issues
      ◮ scaling to 1,000 tags?
      ◮ adapting YouTube-based detectors to other target domains?

  26. Fine
      Thanks for your attention!
      (Thanks also to Marcel Worring and Alexander Hauptmann for helpful discussions.)

  27. References
      ◮ [Smeaton06]: A. Smeaton, P. Over, W. Kraaij. Evaluation Campaigns and TRECVID. MIR 2006.
      ◮ [Snoek06]: C. Snoek, M. Worring, J. van Gemert, J. Geusebroek, A. Smeulders. The Challenge Problem for Automated Detection of 101 Semantic Concepts in Multimedia. ACM Multimedia 2006.
      ◮ [Naphade06]: M. Naphade, J. Smith, J. Tesic, S. Chang, W. Hsu, L. Kennedy, A. Hauptmann, J. Curtis. Large-Scale Concept Ontology for Multimedia. IEEE Multimedia, 2006.
      ◮ [Hauptmann07]: A. Hauptmann, R. Yan, W. Lin. How Many High-Level Concepts Will Fill the Semantic Gap in News Video Retrieval? CIVR 2007.
      ◮ [Ulges08]: A. Ulges, C. Schulze, D. Keysers, T. Breuel. A System that Learns to Tag Videos by Watching YouTube. ICVS, Santorini, 2008.
      ◮ Images taken from YouTube and the TRECVID datasets.
