Coloring Visual Codebooks for Concept Detection in Video
Koen van de Sande, Cees Snoek, Jan van Gemert, Jasper Uijlings, Jan-Mark Geusebroek, Theo Gevers, Arnold Smeulders
University of Amsterdam, MediaMill
TRECVID workshop, 18-11-2008

Introduction
- Concept detection: machine learning based on image descriptors only
- In a real-world video:
  - Large variations in viewing and lighting conditions
  - As a result, image description is complicated
- How do changes in viewpoint and illumination conditions affect concept detection?

Viewpoint Changes
(Lowe IJCV 2004; Mikolajczyk IJCV 2005; Zhang IJCV 2007; Marszalek VOC 2007)
- Orientation and scale of the object change
- Salient point methods robustly detect regions
- INRIA-LEAR (VOC 2007 winner): two sampling methods are preferred for concept detection accuracy:
  - Harris-Laplace salient points
  - Dense sampling
[Figure panels: Dense sampling; Harris-Laplace]

Concept Detection Stages
Spatio-temporal sampling -> Visual feature extraction -> Codebook transform -> Kernel-based learning

Spatio-Temporal Sampling
(Lazebnik CVPR 2006; Marszalek VOC 2007)
- Spatial pyramid:
  - 1x1: whole image
  - 2x2: image quarters
  - 1x3: horizontal bars
- Temporal analysis of up to 5 frames per shot
[Figure labels: Harris-Laplace; Dense sampling; multi-frame; Spatial pyramid]

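Note: the three pyramid layouts can be sketched as region slices over a frame. This is a minimal illustration, not the authors' code; the function name `pyramid_regions` and the frame size are hypothetical, and "1x3" is read as three stacked horizontal bars, as the slide describes.

```python
import numpy as np

def pyramid_regions(height, width):
    """Return (layout, row_slice, col_slice) for the 1x1, 2x2 and 1x3 layouts."""
    # (rows, cols) per layout; "1x3" is taken to mean three horizontal bars.
    layouts = {"1x1": (1, 1), "2x2": (2, 2), "1x3": (3, 1)}
    regions = []
    for name, (n_rows, n_cols) in layouts.items():
        row_edges = np.linspace(0, height, n_rows + 1).astype(int)
        col_edges = np.linspace(0, width, n_cols + 1).astype(int)
        for r in range(n_rows):
            for c in range(n_cols):
                rows = slice(int(row_edges[r]), int(row_edges[r + 1]))
                cols = slice(int(col_edges[c]), int(col_edges[c + 1]))
                regions.append((name, rows, cols))
    return regions

# Hypothetical frame size, for illustration only.
frame = np.ones((240, 320))
regions = pyramid_regions(*frame.shape)
# 1 + 4 + 3 = 8 regions; descriptors falling in each region get their own histogram.
```

Each region would then receive its own codebook histogram, which is why the pyramid multiplies the feature vector length.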
Illumination Changes
(van de Weijer PAMI 2006; Bosch CIVR 2007; Burghouts CVIU 2008; van de Sande CVPR 2008)
Concept detection suffers from unstable region description
- SIFT descriptor:
  - Best known
  - State-of-the-art performance
  - Intensity-based descriptor: no color
- Proposed color descriptors: HueSIFT, HSV-SIFT, OpponentSIFT, C-SIFT, rgSIFT
  - Increase discriminative power
  - Increase illumination invariance
- Research questions:
  - What are the properties of these color descriptors?
  - How do they perform?
  - See the evaluation in our CVPR 2008 paper

Example: light color change
- Transformed color SIFT descriptor is invariant

Invariance properties: Diagonal model
(Von Kries 1970; Finlayson ICIP 2005)
- Lambertian reflectance model
- Corresponds to the diagonal-offset model of illumination change
[Equation labels: Unknown illuminant; Illuminant parameters; Canonical illuminant]
- Unified framework for modeling:
  - Shadows
  - Shading
  - Light color changes
  - Highlights
  - Scattering

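Note: the equation on this slide did not survive extraction. The diagonal-offset model it names is commonly written as below; the symbols (a, b, c for the per-channel scaling, o_1..o_3 for the offsets, superscripts u and c for the unknown and canonical illuminant) are notation assumed here, not taken from the slide.

```latex
\begin{pmatrix} R^{u} \\ G^{u} \\ B^{u} \end{pmatrix}
=
\begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}
\begin{pmatrix} R^{c} \\ G^{c} \\ B^{c} \end{pmatrix}
+
\begin{pmatrix} o_{1} \\ o_{2} \\ o_{3} \end{pmatrix}
```

Under this notation, a = b = c with zero offsets is a light intensity change, equal offsets with a = b = c = 1 is a light intensity shift, and unequal a, b, c corresponds to a light color change.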
Color Descriptor Taxonomy
(van de Sande CVPR 2008)
Invariance properties of the descriptors used:

                        Light      Light      Light intensity  Light   Light color
                        intensity  intensity  change and       color   change and
Descriptor              change     shift      shift            change  shift
SIFT                    +          +          +                +       +
OpponentSIFT            +/-        +          +/-              +/-     +/-
C-SIFT                  +          +          +                +/-     +/-
rgSIFT                  +          +          +                +/-     +/-
Transformed color SIFT  +          +          +                +       +

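Note: a small numerical check of why a per-channel normalization cancels a diagonal-offset illumination change. It assumes, following the cited CVPR 2008 paper, that the transformed color model subtracts each channel's mean and divides by its standard deviation; only that normalization step is shown, not the SIFT computation on top of it. The scale and offset values are made up.

```python
import numpy as np

def transformed_color(image):
    """Normalize each color channel to zero mean and unit standard deviation."""
    pixels = image.reshape(-1, image.shape[-1]).astype(float)
    normalized = (pixels - pixels.mean(axis=0)) / pixels.std(axis=0)
    return normalized.reshape(image.shape)

rng = np.random.default_rng(0)
canonical = rng.uniform(0.0, 1.0, size=(8, 8, 3))   # toy RGB patch

# Hypothetical light color change and shift: per-channel scale plus offset.
scale = np.array([1.4, 0.9, 0.6])
offset = np.array([0.05, 0.10, 0.02])
unknown = canonical * scale + offset

# The raw patches differ, but the normalized patches coincide.
invariant = np.allclose(transformed_color(canonical), transformed_color(unknown))
```

The offset is removed by the mean subtraction and the positive scale by the division, which is consistent with the "+" entries in every column for transformed color SIFT.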
Invariant Visual Descriptors
- Color SIFT:
  - Intensity-based SIFT
  - OpponentSIFT
  - C-SIFT
  - rgSIFT
  - Transformed color SIFT
- Add color, but also keep intensity information

Visual descriptors  MAP on TV2007test
Intensity SIFT      0.144
5x Color SIFT       0.155  (relative +8%)

- TV2007test results:
  - Trained on TRECVID2007 development set
  - Evaluated on TRECVID2007 test set
  - TRECVID2007 development + test = 2008 development

Concept Detection Stages
Spatio-temporal sampling -> Visual feature extraction -> Codebook model -> Kernel-based learning

Visual Codebook Model
[Figure labels: Dense+OpponentSIFT; Cluster; Assign; Feature vector (length 4000)]
- Codebook consists of codewords
- Constructed with k-means clustering on descriptors
- We use 4,000 codewords per codebook

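Note: the cluster-then-assign idea can be sketched with a plain NumPy k-means. This is an illustrative toy, not the MediaMill implementation: the function names, the random stand-in descriptors, and the 10-codeword codebook (instead of 4,000) are all assumptions made to keep the example small.

```python
import numpy as np

def nearest_codeword(descriptors, codewords):
    """Index of the closest codeword (squared Euclidean) for each descriptor."""
    diffs = descriptors[:, None, :] - codewords[None, :, :]
    return (diffs ** 2).sum(axis=2).argmin(axis=1)

def build_codebook(descriptors, n_codewords, n_iter=20, seed=0):
    """Plain k-means: codewords are the cluster centers of the descriptors."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=n_codewords, replace=False)
    codewords = descriptors[start].astype(float)
    for _ in range(n_iter):
        nearest = nearest_codeword(descriptors, codewords)
        for k in range(n_codewords):
            members = descriptors[nearest == k]
            if len(members) > 0:          # keep the old center if a cluster empties
                codewords[k] = members.mean(axis=0)
    return codewords

def hard_histogram(descriptors, codewords):
    """Feature vector: normalized count of descriptors per nearest codeword."""
    nearest = nearest_codeword(descriptors, codewords)
    counts = np.bincount(nearest, minlength=len(codewords))
    return counts / counts.sum()

rng = np.random.default_rng(1)
toy_descriptors = rng.normal(size=(500, 16))     # stand-in for SIFT-like vectors
codewords = build_codebook(toy_descriptors, n_codewords=10)
histogram = hard_histogram(toy_descriptors[:100], codewords)  # one "frame"
```

The histogram has one bin per codeword, so a 4,000-codeword codebook yields the length-4,000 feature vector shown on the slide.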
Codebook Assignment
(van Gemert ECCV 2008)
- Soft assignment using Gaussian kernel
[Figure panels: Hard assignment; Soft assignment (● = codeword)]

Assignment  MAP on TV2007test
Hard        0.155
Soft        0.166  (relative +7%)

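Note: a minimal sketch of soft assignment with a Gaussian kernel. The slide does not say which variant is used; this version normalizes the kernel weights per descriptor, which is an assumption, and the function name and `sigma` value are hypothetical.

```python
import numpy as np

def soft_histogram(descriptors, codewords, sigma):
    """Spread each descriptor over all codewords with Gaussian weights."""
    diffs = descriptors[:, None, :] - codewords[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=2)
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Assumed variant: each descriptor distributes a total weight of 1.
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.mean(axis=0)

codewords_2d = np.array([[0.0, 0.0], [1.0, 0.0]])

# A descriptor exactly between two codewords: hard assignment must pick one,
# soft assignment shares it equally.
midpoint = np.array([[0.5, 0.0]])
shared = soft_histogram(midpoint, codewords_2d, sigma=0.5)

# A descriptor closer to the first codeword still contributes to the second.
off_center = np.array([[0.4, 0.0]])
leaning = soft_histogram(off_center, codewords_2d, sigma=0.5)
```

Smaller values of `sigma` push the result toward hard assignment; larger values flatten the histogram.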
Codebook Library

Codebook  Sampling method  Descriptor    Construction  Assignment
#1        Dense            OpponentSIFT  K-means       Soft
#2        Harris-Laplace   SIFT          Radius-based  Soft
#3        Dense            rgSIFT        K-means       Hard
…         Dense            C-SIFT        K-means       Hard

- A single codebook depends on:
  - Sampling method
  - Descriptor
  - Codebook construction method
  - Codebook assignment
- A codebook library is a configuration of several codebooks

Codebook Library (cont'd)
For a frame:
- Each codebook in the library has a feature vector of length 4,000
- Final feature vector is the concatenation (4 codebooks ~ length 16,000)
- Spatial pyramid adds more dimensions:
  - 1x1: 4,000
  - 2x2: 16,000
  - 1x3: 12,000
- Feature vector length easily >100,000…

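Note: the feature vector arithmetic, worked out. The sketch assumes every codebook is computed for all three pyramid layouts and that all resulting histograms are concatenated; the slide implies this but does not state it. The names are illustrative.

```python
CODEWORDS_PER_CODEBOOK = 4000
PYRAMID_CELLS = {"1x1": 1, "2x2": 4, "1x3": 3}   # regions per layout

def feature_length(n_codebooks,
                   codewords=CODEWORDS_PER_CODEBOOK,
                   cells=PYRAMID_CELLS):
    """Total length when each codebook is computed for every pyramid region."""
    return n_codebooks * codewords * sum(cells.values())

per_layout = {name: n * CODEWORDS_PER_CODEBOOK for name, n in PYRAMID_CELLS.items()}
# per_layout == {"1x1": 4000, "2x2": 16000, "1x3": 12000}

one_codebook = feature_length(1)     # 4,000 * (1 + 4 + 3) = 32,000
four_codebooks = feature_length(4)   # 4 * 32,000 = 128,000
```

Under these assumptions a four-codebook library already gives 128,000 dimensions, in line with the ">100,000" remark.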
SVM kernel trick: precompute kernel
- SVM learning does not need the feature vectors themselves
- SVM learning needs only the distance between two vectors x and y:

  K(x, y) = e^(-γ · dist(x, y))

- Very large decrease in computation time:
  - Precompute the SVM kernel matrix
  - Long vectors possible: only need 2 in memory at once
  - Parameter optimization re-uses the precomputed matrix

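Note: a minimal sketch of the precomputation. The slide does not name the distance function; the chi-square distance used here is an assumption (it is a common choice for histograms), and the function names, toy data and γ values are hypothetical.

```python
import numpy as np

def chi2_distance(x, y, eps=1e-12):
    """Chi-square distance between two non-negative histograms (assumed choice)."""
    return 0.5 * np.sum((x - y) ** 2 / (x + y + eps))

def precompute_distances(features):
    """Pairwise distance matrix; each entry touches only two vectors at a time."""
    n = len(features)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = chi2_distance(features[i], features[j])
    return dist

def kernel_from_distances(dist, gamma):
    """K = exp(-gamma * dist), computed from the stored distances."""
    return np.exp(-gamma * dist)

rng = np.random.default_rng(0)
features = rng.random((6, 50))
features /= features.sum(axis=1, keepdims=True)   # toy normalized histograms

dist = precompute_distances(features)              # the expensive step, done once
kernels = {gamma: kernel_from_distances(dist, gamma) for gamma in (0.5, 1.0, 2.0)}
```

Trying another γ costs only an element-wise exponential over the stored matrix. Such a kernel matrix can be handed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.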
Impact of annotations
Ours = common annotation effort + ICT-CAS + verifying positives

Codebook library  Ours* (type B)  Common ann. effort* (type A)
3x Color SIFT     0.152           0.152
5x Color SIFT     0.155           0.155
*MiAP on TV2008test

Add a digit…

Codebook library  Ours*   Common ann. effort*
3x Color SIFT     0.1516  0.1521
5x Color SIFT     0.1548  0.1549

On average, the extra annotations didn't help

Concept Detection Stages
Spatio-temporal sampling -> Visual feature extraction -> Codebook model -> Kernel-based learning