Concept Detection: Convergence to Local Features and Opportunities Beyond
Shih-Fu Chang 1, Junfeng He 1, Yu-Gang Jiang 1,2, Elie El Khoury 3, Chong-Wah Ngo 2, Akira Yanagawa 1, Eric Zavesky 1
1 DVMM Lab, Columbia University
2 City University of Hong Kong
3 IRIT, Toulouse, France
TRECVID 2008 workshop, NIST
Overview: 5 components & 6 runs
[Diagram: the five components (Local Feature, Global Feature, CU-VIREO374 374-d feature with SVM, Web Images, Face & Audio Filtering), the classifiers, and the runs (1 to 6) they feed into]
Overview: overall performance
[Figure: mean average precision of the 161 TRECVID 2008 Type-A submissions]
– Local feature alone already achieves near-top performance
– Every other component contributes incrementally to the final detection
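The y-axis of the plot above is mean average precision (MAP). As a reminder of what that number measures, here is a minimal sketch of the textbook, non-interpolated definition. The function names and the `total_positives` parameter are illustrative assumptions; the official TRECVID scoring may use a sampled estimate of this quantity rather than the exact formula below.

```python
def average_precision(ranked_relevance, total_positives=None):
    """Average precision of one ranked list of shots.

    ranked_relevance[i] is True when the shot at rank i+1 is a true positive.
    total_positives is the number of positives in the collection; if omitted,
    the number of positives found in the list is used instead (an assumption
    made here to keep the example self-contained).
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    denominator = total_positives if total_positives is not None else hits
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(rankings_by_concept):
    """Mean of the per-concept average precisions."""
    aps = [average_precision(r) for r in rankings_by_concept.values()]
    return sum(aps) / len(aps)


# Hypothetical ranked lists for two concepts.
toy_rankings = {
    "Bus": [True, False, True],
    "Flower": [False, True],
}
toy_map = mean_average_precision(toy_rankings)
```

For the toy lists, "Bus" scores (1/1 + 2/3) / 2 and "Flower" scores 1/2, so the mean is 2/3.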
Overview: per-concept performance
[Figure: per-concept performance of CU_2_run4+face&audio, CU_4_run5+cu-vireo374, CU_5_local_global, and CU_6_local_only, plotted with MEAN and MAX]
Outline
[Diagram: the five components (Local Feature, Global Feature, CU-VIREO374, Web Images, Face & Audio Filtering), the classifiers, and the six runs]
Bag-of-Visual-Words (BoW)
Representation Choices of BoW
• Word weighting scheme
– How to weight the importance of a word to an image?
• Spatial information
– Are the spatial locations of keypoints useful?
Weighting Scheme
• Traditional…
– Binary, term frequency (TF), inverse document frequency (IDF)…
• Our method: soft weighting
– Assign a keypoint to multiple visual words
– Weights are determined by keypoint-to-word similarity
Details in: Jiang et al. CIVR 2007.
Image from http://www.cs.joensuu.fi/pages/franti/vq/lkm15.gif
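The soft-weighting idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide states only that a keypoint is assigned to multiple visual words with similarity-based weights, so the similarity function `1 / (1 + distance)`, the `1 / 2**rank` discount, and the default `top_n=4` are assumptions chosen for the example. See the cited CIVR 2007 paper for the exact definition.

```python
import math


def soft_weight_histogram(descriptors, vocabulary, top_n=4):
    """Soft-weighted bag-of-visual-words histogram.

    descriptors: keypoint descriptors, each a sequence of floats.
    vocabulary:  visual-word centers, same dimensionality as the descriptors.
    Each keypoint votes for its top_n nearest words instead of only one.
    """
    histogram = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        # Euclidean distance from this keypoint to every visual word.
        distances = [math.dist(descriptor, word) for word in vocabulary]
        nearest = sorted(range(len(vocabulary)), key=distances.__getitem__)[:top_n]
        for rank, word_index in enumerate(nearest):
            similarity = 1.0 / (1.0 + distances[word_index])  # assumed similarity
            histogram[word_index] += similarity / (2 ** rank)  # assumed discount
    return histogram


# One keypoint sitting exactly on word 0, one unit away from word 1.
toy_vocabulary = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
toy_histogram = soft_weight_histogram([(0.0, 0.0)], toy_vocabulary, top_n=2)
```

With `top_n=1` the function degenerates to a similarity-weighted hard assignment, which makes the contrast with binary and TF weighting easy to see.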
Vocabulary Size & Weighting Scheme
[Figure: mean average precision on TRECVID 2006 test data for Binary, TF, TF-IDF, and Soft weighting at vocabulary sizes 500, 1,000, 5,000, and 10,000]
– Soft weighting
• Improves over TF by 10%-20%
• Assesses the importance of a keypoint more accurately
Spatial Information
• Partition image into equal-sized regions
• Concatenate BoW features from the regions
– Poor generalizability
F = (f11, f12, f13, f21, f22, f23, f31, f32, f33)
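The partition-and-concatenate step above can be sketched as below. This is an illustrative sketch under stated assumptions: keypoints are given as hypothetical `(x, y, word_id)` triples that have already been quantized, and each region uses plain term-frequency counts for brevity rather than soft weighting.

```python
def spatial_bow(keypoints, width, height, vocab_size, grid=(3, 3)):
    """Concatenate one BoW histogram per grid region, in row-major order.

    keypoints: iterable of (x, y, word_id) with 0 <= x < width, 0 <= y < height.
    Returns a list of length rows * cols * vocab_size, i.e. F = (f11, ..., f33)
    for the default 3x3 grid.
    """
    rows, cols = grid
    feature = [0.0] * (rows * cols * vocab_size)
    for x, y, word_id in keypoints:
        # Map the keypoint location to its region; clamp to stay in range.
        row = min(int(y * rows / height), rows - 1)
        col = min(int(x * cols / width), cols - 1)
        region = row * cols + col
        feature[region * vocab_size + word_id] += 1.0
    return feature


# Three keypoints in a 100x100 image with a two-word vocabulary.
toy_keypoints = [(5, 5, 0), (95, 95, 1), (50, 50, 1)]
toy_feature = spatial_bow(toy_keypoints, width=100, height=100, vocab_size=2)
```

The feature length grows with the number of regions (18 here, versus 2 for a 1×1 layout), and the same object appearing in a different region activates different dimensions, which is one way to read the "poor generalizability" remark.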
Spatial Information
[Figure: mean average precision on TRECVID 2006 test data (soft weighting) for 1×1, 2×2, 3×3, and 4×4 regions at vocabulary sizes 500, 1000, 5000, and 10000]
– Spatial information does not help much for concept detection
• 2×2 is a good choice
• 3×3 and 4×4 may cause a mismatch problem
Local Feature Representation Framework
[Diagram: local feature representation framework]
K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, L. Van Gool, "A comparison of affine region detectors", IJCV, vol. 65, pp. 43-72, 2005.
Internal Results – Local Features
• Over TRECVID 2008 Test Data
[Figure: MAP for 1x1 (3 detectors), 1x1 (2 detectors), 2x2, 1x3, and Run6: Fusion; chart annotations: "Similar!", "13%", and 0.157]
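The chart above includes a fused run built from the individual local-feature classifiers. The slides do not state which fusion rule was used, so the following is only a sketch that assumes simple average (late) fusion of per-shot scores; the function name and the toy scores are hypothetical.

```python
def average_fusion(score_lists):
    """Late fusion by averaging per-shot scores across classifiers.

    score_lists: one list of scores per classifier, all covering the same
    shots in the same order. Averaging is an assumption made for this
    sketch; the slides do not specify the fusion rule.
    """
    count = len(score_lists)
    return [sum(shot_scores) / count for shot_scores in zip(*score_lists)]


# Hypothetical scores from three spatial layouts for two shots.
fused = average_fusion([
    [0.2, 0.8],  # e.g. a 1x1 layout
    [0.4, 0.6],  # e.g. a 2x2 layout
    [0.6, 0.4],  # e.g. a 1x3 layout
])
# Shots ordered by descending fused score.
ranking = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

Averaging assumes the classifiers' scores are on comparable scales; if they are not, some normalization would be needed first.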
Failure Cases - I
[Images: example misses]
• Flower
– Small visual area
– Coloration/texture too similar to background scene
• Possible Solutions
– Color descriptor
– Class-specific visual words
Failure Cases - II
[Images: example misses]
• Boat_Ship, Airplane_flying
– Learning biased by background scene
– Difficulty from occlusion
• Possible Solution
– Feature selection
Summary – Local Features
• BoW with good representation choices achieved very impressive performance
• Soft weighting is very effective
• Multiple spatial layouts are useful
• Multiple detectors do not help much
• Room for future improvement
– Class-specific visual words, feature selection, color descriptor, etc.
Global Features
• Grid-based Color Moments (225-d)
• Wavelet texture (81-d)
[Figure: per-concept performance of A_CU_6_local_only_6 and A_CU_5_local_global_5]
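One way to arrive at a 225-dimensional grid-based color moment feature is 5 × 5 grid cells × 3 color channels × 3 moments (mean, standard deviation, and the cube root of the third central moment). The slide gives only the dimension, so the grid size and the choice of moments in this sketch are assumptions that happen to match 225; the color space is left to the caller.

```python
import math


def grid_color_moments(image, grid=5):
    """Grid-based color moments.

    image: list of rows, each row a list of 3-channel pixel tuples; it must
    be at least grid x grid pixels so that every cell is non-empty.
    Returns grid * grid * 3 * 3 values (225 for the assumed 5x5 grid).
    """
    height, width = len(image), len(image[0])
    feature = []
    for grid_row in range(grid):
        for grid_col in range(grid):
            r0, r1 = grid_row * height // grid, (grid_row + 1) * height // grid
            c0, c1 = grid_col * width // grid, (grid_col + 1) * width // grid
            cell = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            for channel in range(3):
                values = [pixel[channel] for pixel in cell]
                n = len(values)
                mean = sum(values) / n
                variance = sum((v - mean) ** 2 for v in values) / n
                third = sum((v - mean) ** 3 for v in values) / n
                # Signed cube root keeps the third moment in pixel units.
                skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
                feature.extend([mean, math.sqrt(variance), skew])
    return feature


# A constant 10x10 toy image: every cell has zero spread.
toy_image = [[(1.0, 2.0, 3.0)] * 10 for _ in range(10)]
toy_moments = grid_color_moments(toy_image)
```

For the constant toy image, each cell contributes its channel means followed by zeros for the two spread-related moments.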
CU-VIREO374
• Fusion of Columbia374 and VIREO374

Detector set    Feature                                   Dimension
Columbia374     Grid-based color moment (LUV)             225
Columbia374     Gabor texture                             48
Columbia374     Edge direction histogram                  73
VIREO374        Bag-of-visual-words (soft weighting)      500
VIREO374        Grid-based color moment (Lab)             225
VIREO374        Grid-based wavelet texture                81

[Figure: performance of CU-VIREO374, VIREO374, and Columbia374 over TRECVID 2006 test data]
Scores on the TRECVID2008 corpora: http://www.ee.columbia.edu/ln/dvmm/CU-VIREO374/
Yu-Gang Jiang, Akira Yanagawa, Shih-Fu Chang, Chong-Wah Ngo, "Fusing Columbia374 and VIREO-374 for Large Scale Semantic Concept Detection", Columbia University ADVENT Technical Report #223-2008-1, Aug. 2008.
Concept Fusion Using CU-VIREO374
• Train an SVM for each concept
– Using CU-VIREO374 scores as features
[Figure: mean average precision on TRECVID 2008 test data for CU-VIREO374, Run5: Local+global, and Run4: run5+CU-VIREO374; chart annotation: 2.2%]
– Performance improvement is merely 2%
• Need a better concept fusion model!
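The per-concept fusion step above, where the detector scores themselves become the input features of a second-stage SVM, can be sketched as follows. The slides do not give the kernel or training settings, so this sketch substitutes a linear SVM trained by plain subgradient descent on the regularized hinge loss, and it uses 3-dimensional toy score vectors in place of the 374-dimensional ones. All names and hyperparameters are illustrative assumptions.

```python
def train_linear_svm(features, labels, epochs=200, learning_rate=0.1, reg=0.01):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.

    features: list of score vectors (one per shot); labels: +1 or -1.
    A stand-in for whatever SVM the authors used, which is not specified.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            # Shrink the weights (gradient of the regularization term).
            weights = [w * (1.0 - learning_rate * reg) for w in weights]
            if margin < 1.0:
                # Hinge loss is active: step toward the correct side.
                weights = [w + learning_rate * y * v for w, v in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias


def decision_value(weights, bias, x):
    """Signed score; larger means more likely to contain the target concept."""
    return sum(w * v for w, v in zip(weights, x)) + bias


# Toy detector-score vectors for one target concept (hypothetical values).
toy_scores = [[0.9, 0.1, 0.8], [0.8, 0.2, 0.7], [0.1, 0.9, 0.2], [0.2, 0.8, 0.1]]
toy_labels = [1, 1, -1, -1]
svm_weights, svm_bias = train_linear_svm(toy_scores, toy_labels)
```

In this toy setup the learned weights are positive for detectors that co-occur with the target concept and negative for the one that does not, which is the kind of inter-concept relationship such a fusion model is meant to exploit.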
Exploring External Images from Web
• Problem
– Sparsity of positive data

Concept Name              # Positive shots
Classroom                 224
Bridge                    158
Emergency_Vehicle         88
Dog                       122
Kitchen                   250
Airplane_flying           72
Two_people                3630
Bus                       87
Driver                    258
Cityscape                 288
Harbor                    195
Telephone                 184
Street                    1551
Demonstration/Protest     134
Hand                      1515
Mountain                  239
Nighttime                 424
Boat_Ship                 437
Flower                    582
Singing                   366

Total # of shots in TV'08 Dev: 36,262
Challenging Issues
• How to make use of the large amount of "noisily labeled" web images for concept detection?
– Issue 1: filter the false positive samples
[Images: good and bad Flickr images]
– Issue 2: overcome the cross-domain problem
[Images: Flickr images versus TRECVID keyframes]
Preliminary Results
• Web image set: 18,000 from Flickr
– Issue 1: filter the false positive samples
• Graph-based semi-supervised learning
– Issue 2: overcome the cross-domain problem
• Weighted SVM
• Results
[Figure: per-concept results for A_CU_Run-4 and C_CU_Run-3 (Bug Free)]
– MAP: no difference
– "Bus": improved by 50%
• Open Problem!
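The slides name graph-based semi-supervised learning as the filter for false-positive web images but give no details. The sketch below shows one common form of the idea, label propagation over a Gaussian affinity graph, and should be read as an assumed stand-in rather than the authors' method. The seed convention (+1 for trusted positives, -1 for trusted negatives, unlabeled web images omitted) and all parameter values are hypothetical.

```python
import math


def propagate_labels(points, seeds, sigma=1.0, alpha=0.8, iterations=50):
    """Label propagation on a fully connected Gaussian affinity graph.

    points: feature vectors for all images (trusted and web images together).
    seeds:  dict mapping index -> +1.0 or -1.0 for trusted labels.
    Returns one score per image; a web image with a low score would be
    treated as a likely false positive and filtered out.
    """
    n = len(points)
    # Gaussian affinities with a zero diagonal.
    affinity = [
        [0.0 if i == j else
         math.exp(-math.dist(points[i], points[j]) ** 2 / (2.0 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]
    # Row-normalize so each node averages over its neighbors.
    transition = [[a / sum(row) for a in row] for row in affinity]
    initial = [seeds.get(i, 0.0) for i in range(n)]
    scores = list(initial)
    for _ in range(iterations):
        scores = [
            alpha * sum(transition[i][j] * scores[j] for j in range(n))
            + (1.0 - alpha) * initial[i]
            for i in range(n)
        ]
    return scores


# Images 0 and 2 are trusted seeds; images 1 and 3 are unlabeled web images.
toy_points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.5, 5.0)]
toy_scores = propagate_labels(toy_points, seeds={0: 1.0, 2: -1.0})
kept_web_images = [i for i in (1, 3) if toy_scores[i] > 0.0]
```

Here the web image near the positive seed is kept and the one near the negative seed is dropped. The cross-domain issue is a separate step; a weighted SVM would then, for example, give the retained web images a smaller per-sample weight than in-domain training shots.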
Face Detection and Tracking
• Face Detection (OpenCV Toolbox)
• Tracking based on face location and skin color
[Diagram: forward and backward tracking of Character 1 between detected face points Pt1 and Pt2, from start frame to end frame]