Parts-based Concept Detectors (Dong-Qing Zhang, Shih-Fu Chang, et al.)


  1. TRECVID 2005 Workshop, Columbia University
     High-Level Feature Detection: Parts-based Concept Detectors
     Dong-Qing Zhang, Shih-Fu Chang, Winston Hsu, Lexin Xie, Eric Zavesky
     Digital Video and Multimedia Lab, Columbia University
     (In collaboration with IBM Research in ARDA VACE II Project)
     S.F. Chang, Columbia U.

  2. Data source and design principle
     • Multi-lingual multi-channel video data
         • 277 videos, 3 languages (ARB, CHN, and ENG)
         • 7 channels, 10+ different programs
         • Poor or missing ASR/MT transcripts
     • A very broad concept space over diverse content
         • object, site, people, program, etc.
         • TV05 (10), LSCOM-Lite (39), LSCOM (449)
     • Concept detection in such a huge space is challenging
         • Need a principled approach
         • Take advantage of the extremely valuable annotation set
         • Data-driven learning based approach offers potential for scalability

  3. Insights from Samples: Object - flag
     • Unique object appearance and structure
     • Some even fool the annotator
     • Variations in scale, view, appearance, number
     • Noisy labels
     • Sometimes contextual, spatial cues are helpful for detection
         • Speaker, stage, sky, crowd

  4. Site/location
     • Again, visual appearance and spatial structures are very useful

  5. Activity/Event
     • Visual appearances capture the after-effects of some events (smoke, fire)
         • Sufficient cues for detecting occurrences of events
     • Other events (e.g., people running) need object tracking and recognition

  6. Motivation for Spatio-Appearance Models
     • Many visual concepts are characterized by
         • unique spatial structures and visual appearances of the objects and sites
         • joint occurrences of accompanying entities with spatial constraints
     • This motivates deeper analysis of spatio-appearance models

  7. Spatio-Features: How to sample local features?
     Traditional block-based features:
     • visual appearances of fixed blocks + block locations
     • suitable for concepts with fixed spatial patterns
     (Diagram: color moments of each block, fed to a Support Vector Machine (SVM))
     Adaptive sampling (object parts), part-based model:
     • Model appearance at salient points
     • Model part relations
     • Robust against occlusion, background, location change
     (Diagram: parts and part relations)
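
The block-based features on slide 7 can be illustrated with a minimal sketch. The slides do not specify the grid size, color space, or exact moments, so the choices below (mean, standard deviation, and signed cube root of the third central moment per RGB channel) and the function names `color_moments` and `grid_color_moments` are assumptions made for illustration, not the authors' implementation.

```python
import math

def color_moments(pixels):
    """Return [mean, std, skew] for each of the 3 channels of a list of (r, g, b) pixels."""
    n = len(pixels)
    feats = []
    for c in range(3):
        vals = [p[c] for p in pixels]
        mean = sum(vals) / n
        var = sum((v - mean) ** 2 for v in vals) / n
        third = sum((v - mean) ** 3 for v in vals) / n
        # Signed cube root keeps the skew term in the same units as the pixel values.
        skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
        feats.extend([mean, math.sqrt(var), skew])
    return feats

def grid_color_moments(image, rows, cols):
    """Split image (list of rows of (r, g, b) tuples) into a fixed rows x cols grid.

    The block order in the output vector encodes the block location, which is
    why this kind of feature suits concepts with a fixed spatial layout.
    """
    h, w = len(image), len(image[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.extend(color_moments(block))
    return feats
```

The resulting fixed-length vector (9 values per block) is the kind of input a standard SVM classifier would consume; the classifier itself is left out of the sketch.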

  8. • Parts-based object detection paradigm is also related to the Human Vision System (HVS) [Rybak et al. 98]
     (Diagram: image → pre-attentive stage: eye movement and fixation to get retinal images in local regions → attentive stage: group retinal images into object → object)

  9. Our TRECVID 2005 Objectives
     • Explore the potential strengths of parts-based models in
         • detecting spatio-dominant concepts
         • fusing with traditional fixed features models
         • detecting other interesting patterns such as Near-Duplicates in broadcast news

  10. How do we extract and represent parts?
     • Part detection: interest points, segmented regions, maximum entropy regions
     • Feature extraction within local parts: Gabor filter, PCA projection, color histogram, moments, …
     • Part-based representation: bag, or structural graph (Attributed Relational Graph)

  11. Representation and Learning
     Graph representation of visual content: individual images → Attributed Relational Graph (ARG)
     • Salient points, high entropy regions
     • Attributes: size; color; texture
     • Relations: spatial relation
     Statistical graph representation of model: collection of training images → Random Attributed Relational Graph (R-ARG)
     • Machine learning: statistics of attributes and relations
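
As a rough illustration of the ARG on slide 11, the sketch below stores parts as nodes with size, color, and texture attributes and stores a spatial relation on each edge. The class names `Part` and `ARG`, the helper `build_arg`, the fully connected edge set, and the use of a coordinate offset as the spatial relation are all assumptions for illustration; the slides do not give the exact attribute encoding.

```python
from dataclasses import dataclass, field

@dataclass
class Part:
    """One detected part (e.g. a salient point or high entropy region)."""
    x: float
    y: float
    size: float
    color: tuple    # hypothetical color descriptor
    texture: tuple  # hypothetical texture descriptor

@dataclass
class ARG:
    """Attributed Relational Graph: nodes carry attributes, edges carry relations."""
    nodes: list = field(default_factory=list)
    edges: dict = field(default_factory=dict)  # (i, j) -> (dx, dy) spatial offset

def build_arg(parts):
    """Build a fully connected ARG whose edge attribute is the offset between parts."""
    graph = ARG(nodes=list(parts))
    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            graph.edges[(i, j)] = (parts[j].x - parts[i].x, parts[j].y - parts[i].y)
    return graph
```

A Random ARG would replace each fixed attribute with a distribution estimated from the training images; that step is not shown here.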

  12. Learning Object Model
     (Diagram: patch image cluster …; re-estimate matching probability)
     • Challenge: finding the correspondence of parts and computing matching probability are NP-complete
     • Solution: apply and develop advanced machine learning techniques
         • Loopy Belief Propagation (LBP), and Gibbs Sampling plus Belief Optimization (GS+BO)
     (demo)

  13. Role of RARG Model: Explain object generation process
     • Generative process: from object model to image
     (Diagram: object model (Random ARG) → sampling node occurrence and node/edge features → object instance (ARG) → sampling background pdf and adding background parts → part-based representation of image (ARG))
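
The generative process on slide 13 can be sketched in a few lines. This is a toy version under stated assumptions: each model node has an occurrence probability and a one-dimensional Gaussian feature, background parts are drawn uniformly from [0, 1], and edge features are omitted. The function name `sample_image_arg` and the dictionary keys are hypothetical.

```python
import random

def sample_image_arg(model_nodes, n_background, rng):
    """Sample the parts of one image from a toy Random ARG.

    model_nodes: list of dicts with keys "p_occur", "mean", "std" (assumed layout).
    n_background: number of background parts to add.
    rng: a random.Random instance.
    """
    parts = []
    # Step 1: sample node occurrence, then the node feature for each occurring node.
    for idx, node in enumerate(model_nodes):
        if rng.random() < node["p_occur"]:
            feature = rng.gauss(node["mean"], node["std"])
            parts.append({"source": idx, "feature": feature})
    # Step 2: sample the background pdf and add background parts.
    for _ in range(n_background):
        parts.append({"source": None, "feature": rng.uniform(0.0, 1.0)})
    return parts
```

Calling `sample_image_arg(model, 3, random.Random(0))` yields a list in which object parts record the model node they came from and background parts record `None`.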

  14. Object Detection by Random ARG
     Binary detection problem: contain or not contain an object? (H = 1 vs. H = 0)
     Likelihood ratio test, with O the input ARG and \lambda a decision threshold:
         $\frac{P(O \mid H=1)}{P(O \mid H=0)} > \lambda$
     Object likelihood:
         $P(O \mid H=1) = \sum_{X} P(X \mid H=1)\, P(O \mid X, H=1)$
     Correspondence X, between the Random ARG for the object model and the ARG for the image, is modeled by an Association Graph
     • Probabilities computed by MRF
     • Likelihood ratio can be computed by variational methods (LBP, MC)
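
To make the sum over correspondences on slide 14 concrete, here is a brute-force toy that enumerates every correspondence X. It is only a sketch under simplifying assumptions that are not in the slides: one-dimensional Gaussian part appearances, a Gaussian background density, a uniform prior over correspondences, every model part matched to a distinct image part, and no edge features. The names `gaussian_pdf` and `likelihood_ratio` are hypothetical. Under these assumptions the background terms of unmatched parts cancel, so the ratio reduces to the average over X of the per-part ratios.

```python
import itertools
import math

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def likelihood_ratio(model_parts, image_feats, bg_mean, bg_std):
    """Return P(O | H=1) / P(O | H=0) by enumerating all correspondences.

    model_parts: list of (mean, std) per model part (assumed 1-D appearance).
    image_feats: list of 1-D features, one per image part.
    """
    n, m = len(model_parts), len(image_feats)
    if m < n:
        raise ValueError("need at least as many image parts as model parts")
    # Each X assigns model part i to a distinct image part x[i].
    assignments = list(itertools.permutations(range(m), n))
    total = 0.0
    for x in assignments:
        term = 1.0
        for i, u in enumerate(x):
            mean, std = model_parts[i]
            term *= gaussian_pdf(image_feats[u], mean, std)
            term /= gaussian_pdf(image_feats[u], bg_mean, bg_std)
        total += term
    return total / len(assignments)  # uniform prior over X
```

The number of assignments grows as m!/(m-n)!, which is why exhaustive enumeration is only feasible for toy sizes and approximate inference is needed in practice.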

  15. Extension to Multi-view Object Detection
     Challenge of multi-view object/scene detection
     • Objects under different views have different structures
     • Part appearances are more diverse
     Structure variation could be handled by the Random ARG model (each view covered by a sub-graph)
     Shared parts are visible from different views

  16. Adding Discriminative Model for Multi-view Concept Detection
     Previous: part appearance modeling by Gaussian distribution
     Now: part appearance modeling by Support Vector Machine
     • Use SVM plus non-linear kernels to model diverse part appearance in multiple views
     • Principle similar to boosting

  17. Evaluation in TRECVID 2005

  18. Parts-based detector performance in TRECVID 2005
     • Parts-based detector consistently improves by more than 10% for all concepts
     • It performs best for spatio-dominant concepts such as "US flag"
     • It complements nicely with the discriminant classifiers using fixed features
     (Charts: avg. performance over all concepts; spatio-dominant concepts: "US Flag". Bars: Baseline SVM, Parts-based, Adding fixed feature)

  19. • Adding text or changing fusion models does not help
     (Chart: relative contributions. Bars: Baseline SVM, Add parts-based)

  20. Other Applications of Parts-Based Model: Detecting Image Near Duplicates (IND)
     Many Near-Duplicates in TRECVID 05
     • Near duplicates occur frequently in multi-channel broadcast
     • But difficult to detect due to diverse variations
     • Problem complexity: similarity matching < IND detection < object recognition
     Parts-based Stochastic Attributed Relational Graph learning
     • Stochastic graph models the physics of scene transformation (scene change, camera change, digitization)
     • Measure IND likelihood ratio
     TRECVID 05 Interactive Search
     • Duplicate detection is the single most effective tool in our Interactive Search

  21. Near Duplicate Benchmark Set (available for download at Columbia Web Site)

  22. Examples of Near Duplicate Search in TRECVID 05

  23. Application: Concept Search
     • Map text queries to concept detection
     • Use human-defined keywords from concept definitions
     • Measure semantic distance between query and concept
     • Use detection confidence and reliability for subshot documents
     Query side:
         Query text: "Find shots of a road with one or more cars"
         → Part-of-speech tags, keywords: "road car"
         → Map to concepts: WordNet Resnik semantic similarity against concept metadata (names and definitions)
         → Concept space, 39 dimensions: (1.0) road, (0.1) fire, (0.2) sports, (1.0) car, …, (0.6) boat, (0.0) person
     Document side:
         Subshots
         → Concept models: simple SVM, grid color moments, Gabor texture
         → Confidence for each concept in concept space, 39 dimensions: (0.9) road, (0.1) fire, (0.3) sports, (0.9) car, …, (0.2) boat, (0.1) person
         Model reliability: expected AP for each concept
     Matching: Euclidean distance between query and documents
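
The matching step on slide 23 can be sketched as a nearest-neighbour ranking in concept space. The sketch assumes the query has already been mapped to concept weights (the WordNet step is not reproduced) and uses a plain Euclidean distance; how the per-concept reliability enters the score is not stated in the slides, so it is left out. The names `euclidean_distance` and `rank_subshots` and the example values are illustrative only.

```python
import math

def euclidean_distance(query, doc, concepts):
    """Distance between two sparse concept vectors over the given concept list."""
    return math.sqrt(sum((query.get(c, 0.0) - doc.get(c, 0.0)) ** 2 for c in concepts))

def rank_subshots(query, subshots):
    """Return subshot names ordered from closest to farthest from the query.

    query: dict mapping concept name -> query weight.
    subshots: dict mapping subshot name -> dict of concept name -> detector confidence.
    """
    concepts = sorted(set(query).union(*[set(vec) for vec in subshots.values()]))
    scored = [(euclidean_distance(query, vec, concepts), name)
              for name, vec in subshots.items()]
    return [name for _, name in sorted(scored)]
```

For example, with a query weighted toward "road" and "car", a subshot whose detectors fire strongly on those two concepts ranks ahead of one dominated by "fire".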

  24. Concept Search
     Automatic: help queries with related concepts

         Method        AP, "Find shots of boats."    AP, "Find shots of a road with one or more cars."
         Story Text    .169                          .053
         CBIR          .002                          .009
         Concept       .115                          .090
         Fused         .195                          .095

     Manual / Interactive: manual keyword selection allows more relationships to be found.

         Query text                                                              Concepts
         "Find shots of an office setting, i.e., one or more desks/tables
          and one or more computers and one or more people"                      Office
         "Find shots of a graphic map of Iraq, location of Bagdhad marked
          - not a weather map"                                                   Map
         "Find shots of one or more people entering or leaving a building"       Person, Building, Urban
         "Find shots of people with banners or signs"                            March or protest
