PKU@TRECVID2009: Single-Actor and Pair-Activity Event Detection in Surveillance Video

General Coach: Wen Gao (a), Xihong Wu (b), Tiejun Huang (a)
Executive Coach: Yonghong Tian (a), Yaowei Wang (a), Lei Qing (a)
Member: Zhipeng Hu (a)*, Guangnan Ye (b)*, Guochen Jia (a), Xibin Chen (b), Qiong Hu (c), Kaihua Jiang (b)

(a) National Engineering Laboratory for Video Technology, Peking University
(b) Speech and Hearing Research Center, Peking University
(c) Key Lab of Intel. Inf. Proc., Institute of Computing Technology, Chinese Academy of Sciences
Outline
  Overview
    Introduction of TRECVID-ED Tasks
    Summary of TRECVID-ED 2008
    Our Results in TRECVID-ED 2009
  Our Solution in the eSur System
    Background Modeling
    Detection and Tracking
    Event Classification
    Post-processing
  Illustrative Results
  Summary
Overview of TRECVID-ED Tasks
Task: To develop an automatic system to detect observable events in surveillance video
Ten Events: PeopleMeet, PeopleSplitUp, Embrace, ElevatorNoEntry, PersonRuns, CellToEar, ObjectPut, TakePicture, Pointing, OpposingFlow
Challenges:
  - Cluttered scenes
  - Illumination variations
  - Occlusion
  - Different camera views
  - No clear event definition
The Best Results of 2008

SITEID         Event            #Ref   #Sys  #CorDet    #FA  #Miss  Act. DCR
IFP-UIUC-NEC   CellToEar         349     15        1     14    348     0.999
Intuvision     ElevatorNoEntry     0      8        0      8      0        NA
DCU            Embrace           401  36193       91   5091    310     1.271
IFP-UIUC-NEC   ObjectPut        1944     83        6     77   1938     1.004
Intuvision     OpposingFlow       12     31        9     12      3     0.251
SJTU           PeopleMeet       1182  25033      270   5779    912     1.337
CMU            PeopleSplitUp     671  42415      185  42230    486     4.856
MCG-ICT-CAS    PersonRuns        314    662       23    639    291     0.989
SJTU           Pointing         2316   1005       35    970   2281     1.080
Intuvision     TakePicture        23     10        0     10     23     1.000

Note:
  - There is much room for improvement.
  - The OpposingFlow event has good detection performance.
  - The ElevatorNoEntry and TakePicture events have zero correct detections.
Approaches in 2008
PeopleMeet (SJTU): Camshift guided particle filter + HMM
  - Combine head-top detector and human detector
  - Camshift guided particle filter to obtain trajectory
  - HMM models to detect hidden states defined by trajectory features
PeopleSplitUp (CMU): Key points + SVM
  - Cluster interest points into visual keywords
  - SVM classifiers to detect activities
  - Event segmentation was done in a multi-resolution framework, where all activity durations found in training were tried
Embrace (DCU): Pedestrian tracking in 3D space
  - Detect and track pedestrians to infer the 3D location
  - Calculate the probability of a person taking part in Embrace events
PersonRuns (ICT): Data correlation + trajectory features
  - Train full-body and head-shoulder detectors using standard Haar-like features
  - Adopt the data correlation method with the visual features to track objects
  - Event detection by trajectory length, location of trajectory points and speed
ElevatorNoEntry (INTUVISION): Pedestrian detection + histogram matching
  - Haar object pedestrian detection
  - Histogram matching to find a person not entering an elevator
……
References:
  P. Yarlagadda et al., Intuvision Event Detection System for TRECVID 2008
  P. Wilkins et al., Dublin City University at TRECVID 2008
  X. Yang et al., Shanghai Jiao Tong University participation in high-level feature extraction, automatic search and surveillance event detection at TRECVID 2008
  J.B. Guo et al., TRECVID 2008 Event Detection by MCG-ICT-CAS
  A. Hauptmann et al., Informedia @ TRECVID2008: Exploring New Frontiers
Our Results in TRECVID-ED2009 (1)

p-eSur_1
Event            #Ref  #Sys  #CorDet  #FA  #Miss  Act. DCR
PeopleMeet        449   125        7  118    442     1.023
PeopleSplitUp     187   198        7  191    180     1.025
Embrace           175    80        1   79    174     1.020
ElevatorNoEntry     3     4        2    2      1     0.334

p-eSur_2
Event            #Ref  #Sys  #CorDet  #FA  #Miss  Act. DCR
PeopleMeet        449   210       15  195    434     1.030
PeopleSplitUp     187   881       14  867    173     1.209
Embrace           175   164        3  161    172     1.036
PersonRuns        107   356        5  351    102     1.068

p-eSur_3
Event            #Ref  #Sys  #CorDet  #FA  #Miss  Act. DCR
PeopleMeet        449   210       15  195    434     1.030
PeopleSplitUp     187   881       14  867    173     1.209
Embrace           175   164        3  161    172     1.036
ElevatorNoEntry     3     0        0    0      3     1.000
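The count columns in these tables are tied together: every system output is either a correct detection or a false alarm, and every reference event is either detected or missed. The sketch below checks those identities for the p-eSur_1 PeopleMeet row and computes its miss probability. The `dcr` helper is an assumption and not taken from the slides: it uses the form P_miss + beta * R_FA with default cost constants that are assumed here, and `hours` (the duration of the evaluated video) is a hypothetical parameter the reader must supply.

```python
def check_counts(n_ref, n_sys, n_cordet, n_fa, n_miss):
    """True when #Sys = #CorDet + #FA and #Ref = #CorDet + #Miss."""
    return n_sys == n_cordet + n_fa and n_ref == n_cordet + n_miss


def miss_probability(n_miss, n_ref):
    """Fraction of reference events that the system missed."""
    return n_miss / n_ref


def dcr(n_miss, n_ref, n_fa, hours, cost_miss=10.0, cost_fa=1.0, r_target=20.0):
    """Detection cost rate: P_miss + beta * (false alarms per hour).

    ASSUMPTION: the formula and the default constants are not given on the
    slides; check them against the evaluation plan before relying on them.
    """
    beta = cost_fa / (cost_miss * r_target)
    return miss_probability(n_miss, n_ref) + beta * (n_fa / hours)


# p-eSur_1, PeopleMeet row from the table above.
row = dict(n_ref=449, n_sys=125, n_cordet=7, n_fa=118, n_miss=442)
print(check_counts(**row))
print(round(miss_probability(row["n_miss"], row["n_ref"]), 4))
```

With a miss probability this close to 1, the Act. DCR values just above 1.0 are dominated by misses rather than by false alarms.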
Our Results in TRECVID-ED2009 (2)
Compared with the best results in TRECVID-ED 2008
(Imp. = Our Best minus Best 2008; a negative value is an improvement, since a lower DCR is better.)

Directly on the reported results in terms of Act. DCR
Event            Our Best  Best 2008    Imp.
PeopleMeet          1.023      1.337  -0.314
PeopleSplitUp       1.025      4.856  -3.831
Embrace             1.020      1.271  -0.251
ElevatorNoEntry     0.334        N/A       -
PersonRuns          1.068      0.989  +0.079
Note: Our results are evaluated on the ED 2009 data by the 2009 DCR metric, while the 2008 best results are evaluated on the ED 2008 data by the 2008 DCR metric.

On the TRECVID-ED 2008 data in terms of 2008 Act. DCR
Event            Our Best  Best 2008    Imp.
PeopleMeet          1.245      1.337  -0.092
PeopleSplitUp       1.976      4.856  -2.880
Embrace             1.208      1.271  -0.063
ElevatorNoEntry     0.130        N/A       -
PersonRuns          1.249      0.989  +0.260
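The Imp. column is a plain signed difference, which can be recomputed from the other two columns. A minimal sketch using the figures for the TRECVID-ED 2008 data; the function and variable names are illustrative only:

```python
def improvement(ours, best_2008):
    """Signed Act. DCR difference; negative means our DCR is lower (better)."""
    if best_2008 is None:  # no 2008 reference value (N/A)
        return None
    return round(ours - best_2008, 3)


# (Our Best, Best 2008) on the TRECVID-ED 2008 data, copied from the slide.
on_2008_data = {
    "PeopleMeet": (1.245, 1.337),
    "PeopleSplitUp": (1.976, 4.856),
    "Embrace": (1.208, 1.271),
    "ElevatorNoEntry": (0.130, None),
    "PersonRuns": (1.249, 0.989),
}

for event, (ours, best) in on_2008_data.items():
    imp = improvement(ours, best)
    print(event, "-" if imp is None else f"{imp:+.3f}")
```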
What Is Improved?
What?
  1. Effectively reduced detection false alarms
  2. Comparable detection accuracy, and much better results for ElevatorNoEntry
Why?
  1. Adaptive background modeling
  2. Effective human detection and tracking
  3. Ensemble of one-vs.-all SVM and automata-based classifiers
  4. Effective event merging and post-processing
Our Solution: Treatments for Different Event Categories
Pair-activity event: one person interacts with another person
Single-actor event: no interaction with other people

Retrospective event detection
  Pair-activity events: PeopleMeet, Embrace, PeopleSplitUp
  Single-actor events: PersonRuns, ElevatorNoEntry
Our eSur Framework for TRECVID-ED
[Framework diagram. Components: Camera Classification, Feature Extraction, Body Detection, Background Subtraction, Head-Shoulder Detection, Object Tracking, Feature Extraction, One-vs-All SVM, Automata, Events Merging, Post-Processing]
Our Solution (1): Background Modeling
Mixture of Gaussian (MoG):
  - To accurately extract the foreground while effectively decreasing detection false alarms
Block-wise PCA Model:
  - To identify which camera the video belongs to
  - Also used in the ElevatorNoEntry event detection
  - "block": segment each frame into blocks
  - "wise": adaptively select the principal component for background reconstruction
MoG
Key Idea
  - Randomly select 1000 frames from each camera
  - Manually label the foreground objects
  - Use the EM algorithm to estimate the model
Results of Background Reconstruction
  [Figure panels: Cam1 Background, Cam2 Background, Cam3 Background, Cam5 Background]
Disadvantage: computationally expensive
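The EM step can be illustrated on a single pixel. The sketch below fits a one-dimensional Gaussian mixture to the intensities that one pixel takes across the sampled frames, assuming the manually labelled foreground has already been masked out so that only background samples remain. The number of components, the quantile initialisation, and the 2.5-sigma matching rule in `is_foreground` are assumptions for illustration, not details from the slides.

```python
import math
import random


def fit_gmm_1d(samples, k=2, iters=50):
    """EM for a 1-D Gaussian mixture over one pixel's background intensities."""
    xs = sorted(float(v) for v in samples)
    n = len(xs)
    # Deterministic initialisation: spread the means over the sample quantiles.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    mean_all = sum(xs) / n
    var_all = sum((v - mean_all) ** 2 for v in xs) / n + 1e-6
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for v in xs:
            dens = [w * math.exp(-0.5 * (v - m) ** 2 / s) / math.sqrt(2 * math.pi * s)
                    for w, m, s in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, xs)) / nj
            variances[j] = sum(r[j] * (v - means[j]) ** 2
                               for r, v in zip(resp, xs)) / nj + 1e-6
    return weights, means, variances


def is_foreground(value, means, variances, n_sigma=2.5):
    """Foreground if the value matches no background component (assumed rule)."""
    return all(abs(value - m) > n_sigma * math.sqrt(s)
               for m, s in zip(means, variances))


# Synthetic pixel whose background alternates between two intensity levels.
rng = random.Random(0)
pixel = [rng.gauss(50, 5) for _ in range(300)] + [rng.gauss(200, 5) for _ in range(300)]
weights, means, variances = fit_gmm_1d(pixel)
print(sorted(round(m) for m in means))
```

Running this per pixel over full frames is what makes the approach slow, which matches the disadvantage noted on the slide.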
Block-wise PCA
General PCA
  - Models a whole frame
  - Problems: high spatio-temporal computational complexity; high miss ratio (especially for static objects)
Block-wise PCA
  - Segment a frame into blocks, and model each block separately
  - Lower spatio-temporal computational complexity
  - Adaptively select the principal component by the MMSE to the mean background
  - Lower miss ratio and less block effect

$$B = \arg\min_{B_i} \lVert I - B_i \rVert^2, \qquad B_i = \varphi_i \varphi_i^{T} I$$

where $I$ is the trained mean background, $\varphi_i$ is the $i$-th principal component, and $B_i$ is the $i$-th reconstructed background.
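The selection rule on this slide can be sketched for a single block as follows. The sketch assumes the candidate principal components have already been computed by PCA and are unit-norm vectors over the flattened block; it only shows the MMSE selection step. Function names and the toy data are illustrative.

```python
def reconstruct(component, mean_block):
    """B_i = phi_i phi_i^T I for a unit-norm component phi_i."""
    coeff = sum(p * x for p, x in zip(component, mean_block))
    return [coeff * p for p in component]


def select_component(components, mean_block):
    """Index and squared error of the component that best rebuilds the mean block."""
    best_index, best_error = None, float("inf")
    for i, phi in enumerate(components):
        recon = reconstruct(phi, mean_block)
        error = sum((x - b) ** 2 for x, b in zip(mean_block, recon))
        if error < best_error:
            best_index, best_error = i, error
    return best_index, best_error


# Toy block with three pixels and two candidate components.
mean_block = [3.0, 4.0, 0.0]
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(select_component(components, mean_block))
```

Because each block chooses its own component, neighbouring blocks are not forced to share one global component.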
Comparative Results
Blocking vs. No Blocking

Method                            No-blocking  Blocking
Training time (for 300 frames)    361.332s     150.406s

(Blocking is about 2.4x faster.)
* Experiment platform: Intel Xeon E5410 2.33GHz, 8G
[Figure panels: Result with no blocking, Result with blocking]

Block PCA vs. Block-wise PCA
[Figure panels: original image, Block-wise PCA, Block PCA]
Our Solution (2): Detection and Tracking
Detection:
  - Histogram of oriented gradients (HOG) for both whole body and head-shoulder
Tracking:
  - Online boosting
  - Forward and backward tracking
  - Combining color similarity to reduce drift
HOG Detector
  - Fusion of head-shoulder and body detection
  - Adjust the detector search scales
Detection Results
Tracking Process
[Figure: frames 1, 2, 3, 4, … shown for Forward Tracking, Backward Tracking, and Combined Result. Legend: Expected Target, Detection Result, Canceled, Expected Path, Final Path]
State Machine of Tracking
  D: A detection exists
  ND: No detection result
  P: Online boosting prediction result
  NH: Not human, drifting happens
  H: No drifting
  S: Online boosting and detection results are similar
  U: Online boosting and detection results are not similar
[State diagram with nodes Start, Head-shoulder and Body Detection, Prediction, and End; transitions labeled D, ND, P, S, U, H, NH]
Detection and Tracking Results
[Figure panels: Detection Results, Tracking Results]
Drift Reduction by Color Similarity
Problem: Drifting
Solution: Combine color similarity to refine tracking results
[Video panels: Tracking Result without Color Similarity Comparison, Tracking Result with Color Similarity Comparison]
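One way such a color check could work is to compare a color histogram of the tracked patch with a histogram of the target template and reject the tracker output when they differ too much. The slides do not say which similarity measure or threshold is used, so the Bhattacharyya coefficient, the 8-bin quantisation, and the 0.7 threshold below are all assumptions.

```python
import math


def color_histogram(pixels, bins=8):
    """Normalised joint RGB histogram; pixels are (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins  # bins must divide 256 evenly
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def bhattacharyya(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalised histograms."""
    return sum(math.sqrt(a * b) for a, b in zip(h1, h2))


def accept_track(template_pixels, candidate_pixels, threshold=0.7):
    """Keep the tracker output only if its colors still resemble the target."""
    sim = bhattacharyya(color_histogram(template_pixels),
                        color_histogram(candidate_pixels))
    return sim >= threshold


template = [(200, 30, 30)] * 50    # reddish target patch
on_target = [(205, 25, 28)] * 50   # slightly different shade, same bins
drifted = [(20, 20, 200)] * 50     # tracker has slid onto a blue region
print(accept_track(template, on_target), accept_track(template, drifted))
```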
Our Solution (3): Events Detection - Pair-activity Event
Analysis using key frames
Key frames: frames that characterize the occurrence of an event
  - "PeopleMeet" and "Embrace": at the end of the event
  - "PeopleSplitUp": at the beginning of the event
[Figure panels: PeopleMeet, Embrace, PeopleSplitUp]
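The key-frame rule above reduces to picking one end of the event interval. A minimal sketch, assuming an event is represented by hypothetical `start_frame` and `end_frame` indices:

```python
KEY_FRAME_AT_END = {"PeopleMeet", "Embrace"}
KEY_FRAME_AT_START = {"PeopleSplitUp"}


def key_frame(event, start_frame, end_frame):
    """Return the frame index that characterizes a pair-activity event."""
    if event in KEY_FRAME_AT_END:
        return end_frame      # the two people are together once the event ends
    if event in KEY_FRAME_AT_START:
        return start_frame    # the two people are together when the event begins
    raise ValueError(f"not a pair-activity event: {event}")


print(key_frame("PeopleMeet", 100, 180), key_frame("PeopleSplitUp", 100, 180))
```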