Columbia University TRECVID 2005 Search Task




  1. Columbia University TRECVID 2005 Search Task
     TRECVID 2005 Workshop, Nov. 14 2005
     Shih-Fu Chang, Winston Hsu, Lyndon Kennedy, Akira Yanagawa, Eric Zavesky, Dong-Qing Zhang
     Digital Video and Multimedia Lab, Columbia University
     http://www.ee.columbia.edu/dvmm

  2. Columbia Video Search System Overview (http://www.ee.columbia.edu/cuvidsearch)
     • User-level search objects: automatic/manual search and interactive search
       – Query topic class mining
       – Cue-X reranking
       – Interactive activity log mining
     • Multi-modal search tools: text search, concept search, image search, story browsing, near-duplicate matching
       – Combined text-concept search
       – Story-based browsing
       – Near-duplicate browsing
     • Content exploitation (from video, speech, and text)
       – Multi-modal feature extraction
       – Automatic story segmentation (text, video, prosody)
       – Semantic concept detection
       – Near-duplicate detection

  3. Cue-X Information-theoretic Framework
     • Semantic label Y for semantic clustering, e.g., search relevance, a concept such as "demonstration", a story boundary, or a topic such as "Arafat"
     • Cue-X clusters are discovered automatically from low-level features via the Information Bottleneck principle and Kernel Density Estimation (KDE)
     • Each cluster carries a conditional probability of relevance to the semantic label
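The Information Bottleneck principle looks for clusters C of the low-level features that keep as much information as possible about the semantic label Y. The minimal sketch below computes only the two quantities named on the slide for an already-given hard clustering: the cluster conditional probability P(Y=1 | c) and the mutual information I(C;Y). It does not perform the IB optimization or the KDE step, and the function name `cluster_label_stats` is hypothetical.

```python
import math
from collections import Counter


def cluster_label_stats(clusters, labels):
    """Return P(Y=1 | c) per cluster and the mutual information I(C;Y) in bits.

    clusters: cluster id per sample (a hypothetical hard clustering)
    labels:   0/1 semantic label Y per sample
    """
    n = len(clusters)
    joint = Counter(zip(clusters, labels))  # counts of (c, y)
    c_count = Counter(clusters)             # marginal counts of c
    y_count = Counter(labels)               # marginal counts of y

    # cluster conditional probability of relevance to the label
    p_rel = {c: joint[(c, 1)] / c_count[c] for c in c_count}

    # I(C;Y) = sum over (c, y) of p(c,y) * log2( p(c,y) / (p(c) p(y)) )
    mi = 0.0
    for (c, y), k in joint.items():
        p_cy = k / n
        mi += p_cy * math.log2(p_cy / ((c_count[c] / n) * (y_count[y] / n)))
    return p_rel, mi
```

A clustering that separates relevant from irrelevant samples perfectly yields 1 bit for a balanced binary label; a clustering unrelated to the label yields 0 bits, which is what the IB objective tries to avoid.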

  4. News Story Segmentation in TRECVID 2005
     • The Cue-X framework is effectively applied to discover salient features and achieve accurate story segmentation
       – Focus on visual and audio (prosody) features only
       – No a priori manual selection of features
       – High accuracy across multilingual data sources
     • TRECVID 2005
       – Dataset: 277 videos in 3 languages (ARB, CHN, and ENG), 7 channels, 10+ different programs, poor or missing ASR/MT transcripts
       – Accuracy on the validation set with Cue-X features + prosody features (no text features!): ARB 0.87, CHN 0.84, and ENG 0.52 (F1 measure)
       – Results donated to the whole TRECVID 2005 community; story boundary results are available for download at http://www.ee.columbia.edu/dvmm/downloads/cuex_story.htm
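The F1 figures above compare detected story boundaries against reference boundaries. A minimal sketch of such a score, assuming boundaries are timestamps in seconds and a detection counts as correct if it falls within a tolerance window of an unmatched reference boundary; the 5-second default and the greedy one-to-one matching are assumptions, not the official TRECVID scoring code, and `boundary_f1` is a hypothetical name.

```python
def boundary_f1(predicted, reference, tolerance=5.0):
    """F1 for story boundaries (seconds) with a +/- tolerance window.

    Each reference boundary can be matched by at most one predicted boundary.
    """
    ref = sorted(reference)
    used = [False] * len(ref)
    hits = 0
    for p in sorted(predicted):
        # greedily take the closest unmatched reference boundary inside the window
        best, best_dist = None, None
        for i, r in enumerate(ref):
            d = abs(p - r)
            if not used[i] and d <= tolerance and (best_dist is None or d < best_dist):
                best, best_dist = i, d
        if best is not None:
            used[best] = True
            hits += 1
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, `boundary_f1([10, 52, 100], [12, 50, 80])` matches two of three boundaries on each side, giving precision = recall = F1 = 2/3.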

  5. Enhancing Interactive Search Using Story Boundaries
     (Figure: one story spanning several shots, retrieved for the query "Find shots of Pope John Paul second".)
     Story transcript: "in other news pope john paul the second will get his first look at the shroud of turin today that's the piece of linen many believe was the burial cloth of jesus the round is on public display for the first time in twenty years it has already drawn up million visitors the pope's visit to northwest italy has also included beatification services for three people the vatican says john paul is now the longest serving pope this century he has surpassed pope pious the twelfth who served for nineteen years seven months and seven days"
     (Chart: relative contributions from different search tools.)
     • Stories define an intuitive unit with coherent semantics
     • Story boundaries are effectively detected by Cue-X using audio-visual features
     • Improves text search by more than 100% in TRECVID 2005 automatic search
     • Major contributor to the good performance of interactive video search
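One way to read the slide: a shot is scored by the text of the whole story it belongs to, so shots whose own transcript never mentions the query words can still be retrieved. The sketch below assumes shot-to-story assignments from story segmentation and uses a plain term count as a stand-in for a real text retrieval model; `story_text_scores` and its inputs are hypothetical.

```python
def story_text_scores(shot_story, story_text, query_terms):
    """Score each shot by the text of the story that contains it.

    shot_story:  dict shot_id -> story_id (from story segmentation)
    story_text:  dict story_id -> transcript string (ASR/MT)
    query_terms: list of lowercase keywords
    """
    story_score = {}
    for story_id, text in story_text.items():
        tokens = text.lower().split()
        # simple term-frequency match, standing in for a proper text retrieval model
        story_score[story_id] = sum(tokens.count(t) for t in query_terms)
    # every shot inherits the score of its whole story, not just its own words
    return {shot: story_score.get(story, 0) for shot, story in shot_story.items()}
```

With the story transcript above, every shot in the Pope story gets the same score, including shots whose own words are only about the shroud of Turin.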

  6. Enhancing Semantic Concept Detection Performance Using Local Features and Spatial Context
     • Traditional global or block-based features (e.g., color moments)
       – Difficult to achieve robustness against background clutter
       – Difficult to model object appearance variations
     • Enhanced part-based model (parts and part relations)
       – Eliminates background clutter
       – Models part appearance more accurately
       – Models part relations more accurately
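For contrast with the part-based model, here is a minimal sketch of a block-based color-moment feature of the kind the slide calls "traditional": the image is split into a grid, and each block contributes the mean, standard deviation, and cube root of the third central moment per channel. The grid size, the color space of the input tuples, and the name `grid_color_moments` are assumptions; `grid` must not exceed the image height or width.

```python
import math


def grid_color_moments(image, grid=2):
    """Block-based color moments: mean, std, and cube-root third moment per channel per block.

    image: list of rows, each row a list of 3-channel pixel tuples
    Returns a flat feature list of length grid * grid * 3 channels * 3 moments.
    """
    h, w = len(image), len(image[0])
    features = []
    for bi in range(grid):
        for bj in range(grid):
            # collect the pixels of block (bi, bj)
            block = [image[y][x]
                     for y in range(bi * h // grid, (bi + 1) * h // grid)
                     for x in range(bj * w // grid, (bj + 1) * w // grid)]
            n = len(block)
            for ch in range(3):
                vals = [px[ch] for px in block]
                mean = sum(vals) / n
                std = math.sqrt(sum((v - mean) ** 2 for v in vals) / n)
                third = sum((v - mean) ** 3 for v in vals) / n
                skew = math.copysign(abs(third) ** (1 / 3), third)  # real cube root
                features.extend([mean, std, skew])
    return features
```

Because each block is summarized only by global statistics, an object and the clutter behind it are mixed into the same numbers, which is the weakness the part-based model addresses.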

  7. Extracting Graphical Representations of Visual Content and Learning Statistical Models of Content Classes
     • Graph representation of visual content: in individual images, salient points and high-entropy regions become the nodes of an Attributed Relational Graph (ARG); node attributes include size, color, and texture, and edges encode spatial relations
     • Statistical graph representation of a model: machine learning over a collection of training images captures the statistics of attributes and relations in a Random Attributed Relational Graph (R-ARG)
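The ARG itself is a simple data structure. A minimal sketch, assuming nodes are salient points or regions carrying the attributes named on the slide (size, color, texture) and that an edge is added between nodes closer than a fixed radius, storing their relative offset as the spatial relation. The radius rule and the class names `Node` and `ARG` are hypothetical; learning the R-ARG from many such graphs is not shown.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    x: float        # position of the salient point or region center
    y: float
    size: float     # node attributes named on the slide: size, color, texture
    color: tuple
    texture: tuple


@dataclass
class ARG:
    nodes: list = field(default_factory=list)
    edges: dict = field(default_factory=dict)  # (i, j) -> spatial relation attributes

    def add_node(self, node):
        self.nodes.append(node)
        return len(self.nodes) - 1

    def connect_spatial(self, radius):
        """Add an edge between every pair of nodes closer than `radius`,
        storing their relative offset and distance as the relation."""
        for i, a in enumerate(self.nodes):
            for j in range(i + 1, len(self.nodes)):
                b = self.nodes[j]
                dx, dy = b.x - a.x, b.y - a.y
                dist = math.hypot(dx, dy)
                if dist < radius:
                    self.edges[(i, j)] = {"dx": dx, "dy": dy, "dist": dist}
```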

  8. Parts-based Detector Performance in TRECVID 2005
     (Charts: average performance over all concepts, and performance on the spatio-dominant concept "US Flag"; bars labeled Baseline, Parts-based, and Adding fixed feature.)
     • The parts-based detector consistently improves over the baseline by more than 10% for all concepts
     • It performs best for spatio-dominant concepts such as "US flag"
     • It complements the discriminative SVM classifiers that use fixed features nicely
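The slide does not say how the parts-based detector and the fixed-feature SVM are combined. One common option is linear late fusion of normalized scores, sketched below under that assumption: each detector's scores over the same shots are z-normalized, then averaged with a weight. The weight of 0.5 and the names `zscore` and `fuse` are hypothetical.

```python
def zscore(scores):
    """Normalize detector scores to zero mean and unit variance."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    std = var ** 0.5 or 1.0  # guard against constant scores (std == 0)
    return [(s - mean) / std for s in scores]


def fuse(parts_scores, baseline_scores, weight=0.5):
    """Linear late fusion of parts-based and fixed-feature SVM scores for the same shots."""
    a, b = zscore(parts_scores), zscore(baseline_scores)
    return [weight * x + (1 - weight) * y for x, y in zip(a, b)]
```

Normalizing first matters because raw SVM margins and parts-based likelihoods live on different scales; without it, one detector would dominate the sum.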

  9. Search Components: Detecting Image Near-Duplicates (IND)
     • Method: parts-based stochastic Attributed Relational Graph learning; the stochastic graph models the physics of scene transformation (scene change, camera change, digitization), and IND is decided by measuring a likelihood ratio
     • Near-duplicates occur frequently in multi-channel broadcast
     • But they are difficult to detect due to diverse variations
     • Problem complexity: similarity matching < IND detection < object recognition
     • Duplicate detection is the single most effective tool in our interactive search
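The decision rule on the slide is a likelihood-ratio test: compare how well a pair of images is explained by the "near-duplicate" transformation model versus the "non-duplicate" model. The sketch below swaps the stochastic graph models for two 1-D Gaussians over a scalar similarity feature, purely to show the test itself; the Gaussian models, their parameters, the zero threshold, and the function names are all assumptions.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def is_near_duplicate(similarity, dup_model, nondup_model, threshold=0.0):
    """Likelihood-ratio test on a scalar similarity feature.

    dup_model / nondup_model: (mean, std) of the feature for IND and non-IND pairs,
    e.g. estimated from labeled training pairs (hypothetical parameters).
    Returns (decision, log likelihood ratio).
    """
    llr = gaussian_logpdf(similarity, *dup_model) - gaussian_logpdf(similarity, *nondup_model)
    return llr > threshold, llr
```

Raising the threshold trades recall for precision, which is the usual knob when the same test is applied to every shot pair in a multi-channel collection.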

  10. Concept Search
     • Query: map text queries to high-level feature detection
       – Query text, e.g., "Find shots of a road with one or more cars", is part-of-speech tagged to extract keywords ("road car")
       – Keywords are mapped to concepts by WordNet Resnik semantic similarity against human-defined keywords from the concept metadata (concept names and definitions), measuring the semantic distance between query and concept
       – Result: a query vector in the 39-dimensional concept space, e.g., (1.0) road, (1.0) car, (0.6) boat, (0.2) sports, (0.1) fire, (0.0) person, ...
     • Documents: subshots
       – Concept models (simple SVM on grid color moments and Gabor texture) give a confidence for each concept in the same 39-dimensional space, e.g., (0.9) road, (0.9) car, (0.3) sports, (0.2) boat, (0.1) fire, (0.1) person, ...
     • Measure Euclidean distance between the query and the subshot documents
     • Use detection reliability: expected AP for each concept
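The ranking step can be sketched directly from the example vectors above, restricted to six of the 39 concepts. The slide says detection reliability (expected AP per concept) is used but not how; the sketch assumes it weights each dimension of the Euclidean distance. The second subshot and the function names are hypothetical.

```python
import math

CONCEPTS = ["road", "car", "boat", "sports", "fire", "person"]  # 6 of the 39 dimensions


def weighted_distance(query_vec, doc_vec, reliability=None):
    """Euclidean distance in concept space, optionally weighting each dimension
    by detector reliability (e.g. expected AP of that concept's detector)."""
    if reliability is None:
        reliability = [1.0] * len(query_vec)
    return math.sqrt(sum(w * (q - d) ** 2
                         for q, d, w in zip(query_vec, doc_vec, reliability)))


def rank_subshots(query_vec, docs, reliability=None):
    """Return subshot ids sorted from closest to farthest from the query."""
    return sorted(docs, key=lambda sid: weighted_distance(query_vec, docs[sid], reliability))


query = [1.0, 1.0, 0.6, 0.2, 0.1, 0.0]   # query vector from the slide
docs = {
    "road_scene": [0.9, 0.9, 0.2, 0.3, 0.1, 0.1],    # subshot vector from the slide
    "harbor_scene": [0.1, 0.0, 0.9, 0.1, 0.0, 0.8],  # hypothetical second subshot
}
print(rank_subshots(query, docs))
```

The slide's road subshot lies about 0.45 from the query, against 1.6 for the hypothetical harbor subshot, so it ranks first.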

  11. Concept Search
     • Automatic: can help for queries with related concepts (AP by method)
         Query                                          Story Text   CBIR   Concept   Fused
         "Find shots of boats."                         .169         .002   .115      .195
         "Find shots of a road with one or more cars."  .053         .009   .090      .095
     • Manual / Interactive: manual keyword selection allows more relationships to be found
         Query text → Concepts
         "Find shots of an office setting, i.e., one or more desks/tables and one or more computers and one or more people" → Office
         "Find shots of a graphic map of Iraq, location of Baghdad marked - not a weather map" → Map
         "Find shots of one or more people entering or leaving a building" → Person, Building, Urban
         "Find shots of people with banners or signs" → March or protest
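The figures in the table are average precision (AP) values. A minimal sketch of non-interpolated AP as commonly used in TREC-style evaluation: sum the precision at each rank where a relevant shot appears and divide by the total number of relevant shots. The optional `depth` cutoff is a parameter here, not a claim about the official evaluation depth, and `average_precision` is a hypothetical name.

```python
def average_precision(ranked_ids, relevant_ids, depth=None):
    """Non-interpolated average precision of a ranked list.

    Sums precision at each rank holding a relevant item and divides by the
    total number of relevant items. `depth` optionally truncates the list.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    ranked = ranked_ids[:depth] if depth else ranked_ids
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)
```

For a ranking `["a", "b", "c", "d"]` with relevant shots `a` and `c`, AP = (1/1 + 2/3) / 2 ≈ 0.833. Mean average precision (MAP) is the mean of AP over all query topics.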

  12. Cue-X Reranking by Pseudo-Labeling
     • Learn the recurrent relevant and irrelevant low-level patterns from the estimated pseudo-labels
     • Reorder shots by the smoothed cluster relevance
     Pipeline for the example query "AL clinic bombing":
       – Text search (OKAPI text query; Yahoo and Google also shown as text sources)
       – Pseudo-labels (+/-) form the random variable Y, estimated from rough search results (e.g., text search scores only), user feedback, etc.
       – Cue-X clustering of the low-level features X
       – Rank clusters by relevance probability
       – Rank shots within each cluster by feature density
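The pipeline above can be sketched end to end under several assumptions: pseudo-positives and pseudo-negatives are the top and bottom of the text-search ranking; cluster ids are given as input (the actual cue-X clusters come from the Information Bottleneck step, which is not reproduced); cluster relevance is smoothed with additive (Laplace-style) smoothing; and within-cluster density is a Gaussian kernel density with a fixed bandwidth. All function names and parameters are hypothetical.

```python
import math
from collections import defaultdict


def pseudo_labels(text_scores, top_k, bottom_k):
    """Highest text scores become pseudo-positives (1), lowest become
    pseudo-negatives (0); the rest stay unlabeled."""
    order = sorted(text_scores, key=text_scores.get, reverse=True)
    labels = {s: 1 for s in order[:top_k]}
    labels.update({s: 0 for s in order[len(order) - bottom_k:]})
    return labels


def kde_density(x, points, bandwidth):
    """Unnormalized Gaussian kernel density of feature vector x among a cluster's members."""
    return sum(math.exp(-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * bandwidth ** 2))
               for p in points) / len(points)


def cue_x_rerank(features, clusters, labels, alpha=1.0, bandwidth=1.0):
    """features: shot -> feature vector; clusters: shot -> cluster id;
    labels: shot -> pseudo-label (0/1) for the labeled subset."""
    members = defaultdict(list)
    for shot, c in clusters.items():
        members[c].append(shot)
    # smoothed P(Y=1 | cluster): unlabeled shots do not count either way
    relevance = {}
    for c, shots in members.items():
        pos = sum(1 for s in shots if labels.get(s) == 1)
        neg = sum(1 for s in shots if labels.get(s) == 0)
        relevance[c] = (pos + alpha) / (pos + neg + 2 * alpha)

    def key(shot):
        c = clusters[shot]
        pts = [features[s] for s in members[c]]
        # primary: cluster relevance; secondary: density within the cluster
        return (relevance[c], kde_density(features[shot], pts, bandwidth))

    return sorted(clusters, key=key, reverse=True)
```

The design choice mirrors the slide: relevance is estimated per cluster rather than per shot, so a shot with no text evidence can move up if it shares low-level patterns with pseudo-positives.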

  13. Effect of Cue-X Reranking in Video Search
     • Improvement over story-based text search (TRECVID 2005 automatic search)
       – 17% in MAP; 46% in soccer (171), 36% in helicopter (158), 32% in Blair (153), 28% in Abbas (154), etc.
       – No external search examples provided; they are discovered automatically
     (Examples: topic soccer (171), text search "goal soccer match", reranked results 46% ↑; topic Blair (153), text search "tony blair", reranked results 32% ↑.)

  14. Automatic Discovery of Multimodal Query Classes
     (Figure: queries such as Find Person A, B, C, Find Event D, E, and Find Object F, G grouped into manually defined query classes versus automatic joint semantics-performance grouping; axes are query semantics and search performance; key: video, text, audio.)
     • Distinct query classes use customized fusion strategies
     • How to automatically discover query classes?
     • When and how does each modality help for each query?
     • Existing methods: define query classes using human knowledge
     • New method: discover query classes by grouping queries according to both the performance and the semantics of searches
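Grouping queries by both semantics and performance can be sketched as clustering in a joint feature space. The sketch assumes each query is described by a semantic vector (here a toy person/object indicator) concatenated with per-modality search performance (here text AP and image AP), weighted by a hypothetical `beta`, and clustered with plain k-means seeded with the first k queries. The feature choices, `beta`, and the toy numbers are illustrative assumptions, not the authors' setup.

```python
def query_features(semantic_vec, performance_vec, beta=0.5):
    """Joint representation: semantics scaled by beta, per-modality performance by 1 - beta."""
    return [beta * s for s in semantic_vec] + [(1 - beta) * p for p in performance_vec]


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points seed the centroids (deterministic for the example)."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        assign = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        # update step: centroid = mean of its members (keep old centroid if empty)
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


queries = [
    query_features([1, 0], [0.6, 0.1]),    # Find Person A: text search works well
    query_features([1, 0], [0.5, 0.05]),   # Find Person B
    query_features([0, 1], [0.05, 0.4]),   # Find Object F: image search works well
    query_features([0, 1], [0.1, 0.5]),    # Find Object G
]
print(kmeans(queries, 2))
```

Setting `beta` to 1 reduces this to purely semantic grouping (the "human knowledge" style), while 0 groups queries only by which modality helped them.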

  15. Auto. Discovered Query Clusters
     (Figure: discovered clusters labeled Named persons, Named person-X, Named objects, Sports, Google expansion, and Generic scenes.)
     • Learned over a large query topic pool
     • Text search: named persons
     • Image search: named objects, sports, and generic scene classes
     • Automated term expansion: a Google expansion class for cats, birds, and airport terminals
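Once a query is assigned to one of the discovered classes, a class-specific fusion strategy can be applied. A minimal sketch, assuming per-class linear weights over text, image, and concept search scores that are already normalized to [0, 1]. The class keys loosely follow the clusters above (the Google expansion class is not modeled), and the weights in `CLASS_WEIGHTS` are invented for illustration, not values from the slides.

```python
# Hypothetical per-class fusion weights over (text, image, concept) search scores.
CLASS_WEIGHTS = {
    "named_person": (0.8, 0.1, 0.1),
    "named_object": (0.3, 0.5, 0.2),
    "sports": (0.2, 0.4, 0.4),
    "generic_scene": (0.1, 0.5, 0.4),
}


def fuse_for_class(query_class, text, image, concept):
    """Weighted sum of per-shot modality scores; missing scores count as 0."""
    wt, wi, wc = CLASS_WEIGHTS[query_class]
    shots = set(text) | set(image) | set(concept)
    fused = {s: wt * text.get(s, 0.0) + wi * image.get(s, 0.0) + wc * concept.get(s, 0.0)
             for s in shots}
    return sorted(fused, key=fused.get, reverse=True)
```

With identical per-modality scores, a named-person query favors the shot found by text search, while a generic-scene query favors the shot found by image search.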
