1 TRECVID 2016 Concept Localization: Overview
George Awad
National Institute of Standards and Technology; Dakota Consulting, Inc.
2 TRECVID 2016
• Goal
  • Make concept detection more precise in time and space than the current shot-level evaluation.
  • Encourage the design of context-independent concepts to increase their reusability.
• Task setup
  • For each of the 10 new test concepts, NIST provided a set of ≈ 1000 shots.
  • Any shot may or may not contain the target concept.
• Task
  • For each I-frame within the shot that contains the target, return the (x,y) coordinates of the upper-left (UL) and lower-right (LR) vertices of a bounding rectangle containing all of the target concept and as little more as possible.
  • Systems were allowed to submit more than one bounding box per I-frame, but only the box with the maximum F-score was scored for each I-frame (see the sketch below).
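The scoring rule in the last bullet can be illustrated with a short sketch (Python, not the official NIST scoring code): each submitted box for an I-frame is compared against the assessor's ground-truth box by pixel overlap, and only the candidate with the highest F-score counts for that I-frame. The (x_ul, y_ul, x_lr, y_lr) coordinate convention and the helper names below are assumptions made for illustration.

# Illustrative sketch, not the official scorer: pick the best of several
# candidate boxes submitted for one I-frame. Boxes are assumed to be
# (x_ul, y_ul, x_lr, y_lr) pixel coordinates.

def box_area(b):
    x1, y1, x2, y2 = b
    return max(0, x2 - x1) * max(0, y2 - y1)

def intersection_area(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return max(0, x2 - x1) * max(0, y2 - y1)

def pixel_prf(submitted, truth):
    # Pixel precision/recall/F-score of one submitted box vs. the ground-truth box.
    overlap = intersection_area(submitted, truth)
    p = overlap / box_area(submitted) if box_area(submitted) else 0.0
    r = overlap / box_area(truth) if box_area(truth) else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def best_box_score(candidate_boxes, truth_box):
    # Of several boxes submitted for one I-frame, only the max-F box is scored.
    return max((pixel_prf(b, truth_box) for b in candidate_boxes),
               key=lambda prf: prf[2])

# Example: two candidates against a ground-truth box of (10, 10, 110, 210);
# the second, better-aligned box is the one that gets scored.
print(best_box_score([(0, 0, 50, 50), (20, 20, 120, 220)], (10, 10, 110, 210)))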
3 TRECVID 2016 10 new evaluated concepts
• Non-action concepts: Animal, Boy, Baby, Skier, Explosion_fire
• New action concepts: Bicycling, Dancing, Instrumental_musician, Running, Sitting_down
4 TRECVID 2016 NIST evaluation framework
• Testing data
  • IACC.2.A-C (600 h, used from 2013 to 2015 in the Semantic Indexing task).
  • About 1000 shots per concept were sampled from the ground truth (true positive (TP) clips per concept: max = 300, avg = 178, min = 12); see the sketch below.
  • A total of 9 587 shots and 2 205 140 I-frames were distributed to systems.
  • Human assessors were given all the I-frames (55 789 images in total) of all TP shots to create the ground truth, drawing a bounding box around the concept wherever it appears.
  • Human assessors had to watch the source video clips of the images to verify the concepts.
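As a rough illustration of how a per-concept test sample with a cap on true positives could be assembled (the slide does not specify NIST's exact sampling procedure, so the function and parameter names here are assumptions):

import random

def sample_shots_for_concept(positive_shots, negative_shots,
                             target_size=1000, max_positives=300, seed=0):
    # Illustrative only: cap the TP shots for a concept and fill the remainder
    # of the ~1000-shot sample with negative shots.
    rng = random.Random(seed)
    pos = rng.sample(positive_shots, min(max_positives, len(positive_shots)))
    neg = rng.sample(negative_shots, min(target_size - len(pos), len(negative_shots)))
    sample = pos + neg
    rng.shuffle(sample)
    return sample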
5 TRECVID 2016 Evaluation metrics
• Temporal localization: precision, recall, and F-score based on the judged I-frames.
• Spatial localization: precision, recall, and F-score based on the located pixels representing the concept.
• Precision, recall, and F-score for temporal and spatial localization are averaged across all I-frames for each concept and for each run (see the sketch below).
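A minimal sketch of the temporal side of these metrics, assuming the returned and ground-truth I-frames of a concept are available as sets of I-frame ids (the spatial side reuses the pixel-overlap idea sketched after slide 2); all names are illustrative, not the official evaluation code.

def temporal_prf(returned_iframes, true_iframes):
    # Temporal precision/recall/F-score over the judged I-frames.
    returned, truth = set(returned_iframes), set(true_iframes)
    hits = len(returned & truth)
    p = hits / len(returned) if returned else 0.0
    r = hits / len(truth) if truth else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def mean_prf(scores):
    # Average (precision, recall, F) tuples, e.g. across all I-frames or shots
    # of one concept, and then across concepts for a run.
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))

# Example: a run returns 3 I-frames of which 2 are true targets, out of 4
# target I-frames in the ground truth.
print(temporal_prf([101, 105, 190], [101, 105, 130, 160]))  # ≈ (0.67, 0.50, 0.57)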
6 TRECVID 2016 Participants (Finishers: 3 out of 21) • 3 teams submitted 11 runs • TokyoTech (4 runs) • Tokyo Institute of Technology • NII_Hitachi_UIT (3 runs) • National Institute of Informatics; Hitachi, Ltd; University of Information Technology • UTS_CMU_D2DCRC (4 runs) • University of Technology, Sydney; Carnegie Mellon University; D2DCRC
7 TRECVID 2016 Temporal localization results by run (sorted by F-score)
[Bar chart: mean I-frame F-score, precision, and recall per run across all concepts; y-axis 0 to 1]
8 TRECVID 2016 Temporal localization results from previous years
[Charts: mean per-run scores across all concepts for 2013, 2014, and 2015 (runs from CCNY, insightdcu, MediaMill_Qualcomm, PicSOM, TokyoTech, and Trimps)]
• 2016 (mainly action concepts) >> 2013 & 2014 (mainly object concepts)
• In 2014, ONLY TP shots were given to systems to localize.
9 TRECVID 2016 Spatial localization results by run (sorted by F-score)
[Bar chart: mean pixel F-score, precision, and recall per run across all concepts; y-axis 0 to 1]
• Harder than temporal localization.
10 TRECVID 2016 Spatial localization results from previous years
[Charts: mean per-run scores across all concepts for 2013, 2014, and 2015 (runs from CCNY, insightdcu, MediaMill_Qualcomm, PicSOM, TokyoTech, and Trimps)]
• 2016 (actions) ~ 2014 (objects)
• 2016 (actions) > 2013 (objects)
• In 2014, ONLY TP shots were given to systems to localize.
11 TRECVID 2016 Results per concept, top 10 runs
[Charts: F-score per concept for the top 10 runs and the median, for temporal and spatial localization]
• Most concepts perform better in temporal than in spatial localization.
• A lot of resemblance between the same concepts in the temporal and spatial plots.
12 TRECVID 2016 Results per concept across all runs
[Scatter plots: mean precision vs. mean recall per concept for temporal and spatial localization; points such as baby, Instrumental_musician, and bicycling labeled]
• Temporal: many systems submitted a lot of non-target I-frames, while few found a good balance.
• Spatial: submitted bounding boxes approximate the size of the ground-truth boxes and overlap with them; many systems are good at finding the real box sizes.
13 TRECVID 2016 General observations
• Consistent observations over the last 4 years:
  ✓ Temporal localization is easier than spatial localization.
  ✓ Systems report approximate ground-truth box sizes.
• Performance on the action/dynamic concepts is higher than on the object concepts tested in 2013-2014.
• Assessment of action/dynamic concepts proved challenging for the human assessors in many cases.
• Lower finishing percentage of teams compared to sign-ups.
14 TRECVID 2016 Next team talks • TokyoTech • UTS_CMU_D2DCRC