[Figure: frames along a time axis, labeled Baby, Cake, People]
A very sparse semantic vector for each frame:
- Emphasizes the primary object
- Vector sparser
- Overlooks small regional objects
- Vocabulary larger
Max pooling: only around 40% of regional information is left.
Discriminatory power of deep features consistently improves.
- Framework-1
- Framework-2
Frame -> selective search -> candidate objects
On average, each frame has 20 candidate object regions.
Observations
- Possible reasons:
- Alternative method:
VLAD
Spatial & temporal features clustering
Candidate objects (selective search) -> regional objects -> deep features -> K-means & VLAD
Zhongwen Xu, Yi Yang, Alexander G. Hauptmann (CVPR’15)
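
The K-means & VLAD step can be illustrated with a minimal sketch. It assumes hard assignment of each regional deep feature to its nearest K-means center, summed residuals per center, and a final L2 normalization. The slide does not state these details, and any extra steps (PCA, power normalization) are left out. The function names `kmeans` and `vlad` are hypothetical, and the toy 2-D vectors stand in for real deep features.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns k centers (assumed codebook learner)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            buckets[j].append(p)
        for j, bucket in enumerate(buckets):
            if bucket:  # keep the old center when a cluster is empty
                centers[j] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centers

def vlad(descriptors, centers):
    """Sum residuals (descriptor minus nearest center) per center,
    concatenate them, and L2-normalize the result."""
    k, d = len(centers), len(centers[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        j = min(range(k), key=lambda c: sq_dist(x, centers[c]))
        for i in range(d):
            residuals[j][i] += x[i] - centers[j][i]
    flat = [v for row in residuals for v in row]
    norm = sum(v * v for v in flat) ** 0.5
    return [v / norm for v in flat] if norm > 0 else flat

# Toy usage: three "regional object" features encoded against two centers.
toy_centers = [[0.0, 0.0], [10.0, 10.0]]
toy_regions = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
video_vector = vlad(toy_regions, toy_centers)  # length k * d = 4
```

The encoded vector has a fixed length of k times d regardless of how many regions a video contributes, which is what lets features from many frames be clustered and pooled together.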
Spatial Pyramid Pooling
Deep feature extraction -> feature map (7 x 7) -> spatial pyramid pooling -> 50 descriptors -> VLAD
Max pool filters: 7 x 7, 6 x 6, 5 x 5, 2 x 2
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun (ECCV’14)
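
The figure of 50 descriptors is consistent with sliding each max-pool filter over the 7 x 7 feature map at stride 1: the 7 x 7, 6 x 6, 5 x 5 and 2 x 2 filters give 1 + 4 + 9 + 36 = 50 positions. The sketch below illustrates that reading. Stride 1 is an assumption inferred from the count rather than stated on the slide, and the function name `spp_descriptors` and the toy 3-channel map are hypothetical.

```python
def spp_descriptors(fmap, filters=(7, 6, 5, 2), stride=1):
    """Max-pool a feature map with several filter sizes.

    fmap: list of C channels, each an H x W list of lists.
    Returns one C-dimensional descriptor per pooling window.
    """
    h, w = len(fmap[0]), len(fmap[0][0])
    descriptors = []
    for f in filters:
        for r in range(0, h - f + 1, stride):
            for c in range(0, w - f + 1, stride):
                # One descriptor: the channel-wise maximum inside the window.
                descriptors.append([
                    max(ch[i][j]
                        for i in range(r, r + f)
                        for j in range(c, c + f))
                    for ch in fmap
                ])
    return descriptors

# Toy 3-channel 7 x 7 map; real deep feature maps have many more channels.
toy_map = [[[ch * 100 + i * 7 + j for j in range(7)] for i in range(7)]
           for ch in range(3)]
descs = spp_descriptors(toy_map)
print(len(descs))  # 50 pooled descriptors, each with 3 channels
```

Each of the resulting descriptors can then be fed to the VLAD encoding step in the same way as any other local feature.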
VLAD
[Figure: CNN-VLAD vs. Object-VLAD under PS-10Ex and PS-100Ex, on MED14-Test (mAP%), MED16-EvalSub (MinfAP200%), and MED16-EvalFull (MinfAP200%)]
[Figure: PS-10Ex comparison of Concept-Bank_N2, Object-VLAD, and Visual-System (Concept-Bank_N2 + Object-VLAD), on MED14-Test (mAP%), MED16-EvalSub (MinfAP200%), and MED16-EvalFull (MinfAP200%)]
[Figure: comparison of VIREO with Team2 to Team11; panels: MED16-EvalFull-As-ProgressSubset (MinfAP200%), MED16-EvalFull (MinfAP200%), MED16-EvalSub (MinfAP200%), MED16-EvalFull (MinfAP200%)]
[Figure: comparison of VIREO with Team2 to Team9; panels: MED16-EvalFull (MinfAP200%), MED16-EvalSub (MinfAP200%)]