PKU@TRECVID2009: Single-Actor and Pair-Activity Event Detection in Surveillance Video


  1. PKU@TRECVID2009: Single-Actor and Pair-Activity Event Detection in Surveillance Video
     General Coach: Wen Gao (a), Xihong Wu (b), Tiejun Huang (a)
     Executive Coach: Yonghong Tian (a), Yaowei Wang (a), Lei Qing (a)
     Members: Zhipeng Hu (a*), Guangnan Ye (b*), Guochen Jia (a), Xibin Chen (b), Qiong Hu (c), Kaihua Jiang (b)
     (a) National Engineering Laboratory for Video Technology, Peking University
     (b) Speech and Hearing Research Center, Peking University
     (c) Key Lab of Intel. Inf. Proc., Institute of Computing Technology, Chinese Academy of Sciences

  2. Outline
     - Overview
       - Introduction of TRECVID-ED Tasks
       - Summary of TRECVID-ED 2008
       - Our Results in TRECVID-ED 2009
     - Our Solution in the eSur System
       - Background Modeling
       - Detection and Tracking
       - Event Classification
       - Post-processing
     - Illustrative Results
     - Summary

  3. Overview of TRECVID-ED Tasks
     Task:
     - Develop an automatic system to detect observable events in surveillance video
     Ten events:
     - PeopleMeet, PeopleSplitUp, Embrace, ElevatorNoEntry, PersonRuns, CellToEar, ObjectPut, TakePicture, Pointing, OpposingFlow
     Challenges:
     - Cluttered scenes
     - Illumination variations
     - Occlusion
     - Different camera views
     - No clear event definition

  4. The Best Results of 2008

     SITEID        Event            #Ref   #Sys   #CorDet  #FA    #Miss  Act.DCR
     IFP-UIUC-NEC  CellToEar        349    15     1        14     348    0.999
     Intuvision    ElevatorNoEntry  0      8      0        8      0      NA
     DCU           Embrace          401    36193  91       5091   310    1.271
     IFP-UIUC-NEC  ObjectPut        1944   83     6        77     1938   1.004
     Intuvision    OpposingFlow     12     31     9        12     3      0.251
     SJTU          PeopleMeet       1182   25033  270      5779   912    1.337
     CMU           PeopleSplitUp    671    42415  185      42230  486    4.856
     MCG-ICT-CAS   PersonRuns       314    662    23       639    291    0.989
     SJTU          Pointing         2316   1005   35       970    2281   1.080
     Intuvision    TakePicture      23     10     0        10     23     1.000

     Note:
     - There is still much room for improvement.
     - The OpposingFlow event has good detection performance.
     - The ElevatorNoEntry and TakePicture events have zero correct detections (CorDets).
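
Act.DCR in these tables is the actual detection cost rate used in the TRECVID surveillance event detection evaluation, combining the miss probability with a weighted false-alarm rate. As a rough illustration only, a minimal Python sketch is shown below; the cost constants (C_Miss = 10, C_FA = 1, R_Target = 20 events/hour) are assumed values, so the official evaluation plan should be consulted for the exact parameters.

```python
def act_dcr(n_ref, n_cordet, n_fa, hours,
            c_miss=10.0, c_fa=1.0, r_target=20.0):
    """Rough sketch of the TRECVID SED detection cost rate.

    n_ref    -- number of reference (ground-truth) event instances
    n_cordet -- number of correct detections
    n_fa     -- number of false alarms
    hours    -- duration of the evaluated video in hours
    The cost constants are assumptions, not the official values.
    """
    p_miss = (n_ref - n_cordet) / n_ref if n_ref > 0 else 0.0
    r_fa = n_fa / hours                       # false alarms per hour
    # DCR = miss probability + weighted false-alarm rate
    return p_miss + (c_fa / (c_miss * r_target)) * r_fa

# Made-up numbers, not taken from the evaluation data:
print(act_dcr(n_ref=100, n_cordet=40, n_fa=30, hours=5.0))  # 0.63
```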

  5. Approaches in 2008
     - PeopleMeet (SJTU): Camshift-guided particle filter + HMM
       - Combine a head-top detector and a human detector
       - Camshift-guided particle filter to obtain trajectories
       - HMM models to detect hidden states defined by trajectory features
     - PeopleSplitUp (CMU): Key points + SVM
       - Cluster interest points into visual keywords
       - SVM classifiers to detect activities
       - Event segmentation was done in a multi-resolution framework, where all activity durations found in training were tried
     - Embrace (DCU): Pedestrian tracking in 3D space
       - Detect and track pedestrians to infer their 3D locations
       - Calculate the probability of a person taking part in an Embrace event
     - PersonRuns (ICT): Data correlation + trajectory features
       - Train full-body and head-shoulder detectors using standard Haar-like features
       - Adopt the data correlation method with visual features to track objects
       - Event detection by trajectory length, location of trajectory points, and speed
     - ElevatorNoEntry (IntuVision): Pedestrian detection + histogram matching
       - Haar-based pedestrian detection
       - Histogram matching to find persons not entering an elevator
     - ...

     References:
     - X. Yang, et al., Shanghai Jiao Tong University participation in high-level feature extraction, automatic search and surveillance event detection at TRECVID 2008
     - A. Hauptmann, et al., Informedia @ TRECVID2008: Exploring New Frontiers
     - P. Wilkins, et al., Dublin City University at TRECVID 2008
     - J.B. Guo, et al., TRECVID 2008 Event Detection by MCG-ICT-CAS
     - P. Yarlagadda, et al., IntuVision Event Detection System for TRECVID 2008

  6. Our Results in TRECVID-ED2009 (1)

     p-eSur_1
     Event            #Ref  #Sys  #CorDet  #FA  #Miss  Act.DCR
     PeopleMeet       449   125   7        118  442    1.023
     PeopleSplitUp    187   198   7        191  180    1.025
     Embrace          175   80    1        79   174    1.020
     ElevatorNoEntry  3     4     2        2    1      0.334

     p-eSur_2
     Event            #Ref  #Sys  #CorDet  #FA  #Miss  Act.DCR
     PeopleMeet       449   210   15       195  434    1.030
     PeopleSplitUp    187   881   14       867  173    1.209
     Embrace          175   164   3        161  172    1.036
     PersonRuns       107   356   5        351  102    1.068

     p-eSur_3
     Event            #Ref  #Sys  #CorDet  #FA  #Miss  Act.DCR
     PeopleMeet       449   210   15       195  434    1.030
     PeopleSplitUp    187   881   14       867  173    1.209
     Embrace          175   164   3        161  172    1.036
     ElevatorNoEntry  3     0     0        0    3      1.000

  7. Our Results in TRECVID-ED2009 (2)

     Compared with the best results in TRECVID-ED 2008, directly on the reported results in terms of Act.DCR:

     Event            Our Best  Best 2008  Imp.
     PeopleMeet       1.023     1.337      -0.314
     PeopleSplitUp    1.025     4.856      -3.831
     Embrace          1.020     1.271      -0.251
     ElevatorNoEntry  0.334     N/A        -
     PersonRuns       1.068     0.989      +0.079

     Note: Our results are evaluated on the ED 2009 data with the 2009 DCR metric, while the 2008 best results are evaluated on the ED 2008 data with the 2008 DCR metric.

     On the TRECVID-ED 2008 data, in terms of the 2008 Act.DCR:

     Event            Our Best  Best 2008  Imp.
     PeopleMeet       1.245     1.337      -0.092
     PeopleSplitUp    1.976     4.856      -2.880
     Embrace          1.208     1.271      -0.063
     ElevatorNoEntry  0.130     N/A        -
     PersonRuns       1.249     0.989      +0.260

  8. What Was Improved?
     What?
     1. Effectively reduced the false alarms of detection
     2. Obtained comparable detection accuracy, and much better results for ElevatorNoEntry
     Why?
     1. Adaptive background modeling
     2. Effective human detection and tracking
     3. Ensemble of one-vs.-all SVM and automata-based classifiers
     4. Effective event merging and post-processing

  9. Our Solution: Treatments for Different Event Categories
     - Pair-activity event: one person interacts with another person
     - Single-actor event: no interaction with other people
     Retrospective event detection covers both categories:
     - Pair-activity events: PeopleMeet, PeopleSplitUp, Embrace
     - Single-actor events: PersonRuns, ElevatorNoEntry

  10. Our eSur Framework for TRECVID-ED
      [Framework diagram] Components: background subtraction, camera classification, body detection, head-shoulder detection, object tracking, feature extraction, one-vs-all SVM, automata, events merging, and post-processing.

  11. Our Solution (1): Background Modeling
      - Mixture of Gaussians (MoG): to accurately extract the foreground while effectively decreasing detection false alarms
      - Block-wise PCA model: to identify which camera the video belongs to; also used in ElevatorNoEntry event detection
        - "Block": segment each frame into blocks
        - "Wise": adaptively select the principal component for background reconstruction

  12. MoG
      Key idea:
      - Randomly select 1000 frames from each camera
      - Manually label the foreground objects
      - Use the EM algorithm to estimate the model
      Results of background reconstruction:
      [Reconstructed backgrounds for Cam1, Cam2, Cam3, and Cam5]
      Disadvantage: computationally time-consuming
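
The slide's MoG background model is estimated offline with EM on sampled, manually labeled frames. As a rough stand-in that conveys the foreground-extraction step (not the authors' trained model), OpenCV's built-in Gaussian-mixture background subtractor can be used; the video path below is a placeholder.

```python
import cv2

# Stand-in for the MoG background model: OpenCV's Gaussian-mixture
# subtractor, which learns the mixture online rather than via offline EM
# on manually labeled frames as described on the slide.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500,
                                                varThreshold=16,
                                                detectShadows=True)

cap = cv2.VideoCapture("camera1.avi")        # placeholder path, not the TRECVID data
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)        # per-pixel foreground mask
    fg_mask = cv2.medianBlur(fg_mask, 5)     # suppress isolated noise pixels
    # fg_mask can then be passed to the body / head-shoulder detectors
cap.release()
```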

  13. Block-wise PCA
      General PCA:
      - Models a whole frame
      - Problems: high spatio-temporal computational complexity; high miss ratio (especially for static objects)
      Block-wise PCA:
      - Segment a frame into blocks and model each block separately
        - Lower spatio-temporal computational complexity
      - Adaptively select the principal component by MMSE with respect to the mean background
        - Lower miss ratio and less blocking effect
      The selected component minimizes the reconstruction error of the mean background:
        $\phi_i^{*} = \arg\min_{\phi_i} \lVert I - B_i \rVert^{2}$, with $B_i = \phi_i \phi_i^{T} I$,
      where $I$ is the trained mean background, $\phi_i$ is the i-th principal component, and $B_i$ is the i-th reconstructed background.
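
A minimal NumPy sketch of this block-wise selection is shown below, assuming grayscale training frames with dimensions divisible by the block size; the block size and the number of candidate components are arbitrary choices, not the authors' settings.

```python
import numpy as np

def blockwise_pca_background(frames, block=16, n_components=5):
    """Hedged sketch of block-wise PCA background reconstruction.

    frames -- array of shape (T, H, W) of grayscale training frames,
              with H and W assumed divisible by `block` for simplicity.
    For each block, the principal component whose rank-1 projection best
    matches the mean background (minimum MSE) is selected, loosely
    following  B_i = phi_i phi_i^T I  from the slide.
    """
    T, H, W = frames.shape
    background = np.zeros((H, W))
    for y in range(0, H, block):
        for x in range(0, W, block):
            patch = frames[:, y:y + block, x:x + block].reshape(T, -1)
            mean_bg = patch.mean(axis=0)        # trained mean background of this block
            _, _, vt = np.linalg.svd(patch - mean_bg, full_matrices=False)
            best, best_err = None, np.inf
            for phi in vt[:n_components]:       # candidate principal components
                recon = phi * (phi @ mean_bg)   # phi phi^T I
                err = np.mean((mean_bg - recon) ** 2)
                if err < best_err:
                    best, best_err = recon, err
            background[y:y + block, x:x + block] = best.reshape(block, block)
    return background

# Toy usage with synthetic frames (50 frames of 64x64 values):
bg = blockwise_pca_background(np.random.rand(50, 64, 64))
```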

  14. Comparative Results
      Blocking vs. no blocking:

      Method                          No-blocking  Blocking
      Training time (for 300 frames)  361.332 s    150.406 s

      * Experiment platform: Intel Xeon E5410, 2.33 GHz, 8 GB memory
      [Foreground extraction results without and with blocking]

      Block PCA vs. block-wise PCA:
      [Original image vs. block-wise PCA vs. block PCA reconstruction]

  15. Our Solution (2): Detection and Tracking
      - Detection: histogram of oriented gradients (HOG) for both the whole body and the head-shoulder region
      - Tracking: online boosting
        - Forward and backward tracking
        - Combining color similarity to reduce drift

  16. HOG Detector
      - Fusion of head-shoulder and body detection
      - Adjust the detector's search scales
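
For the whole-body part, OpenCV's bundled HOG pedestrian detector gives a feel for the detection step; this is a generic stand-in, not the detectors the team trained (their system also fuses a head-shoulder HOG detector, which OpenCV does not ship), and the image path is a placeholder.

```python
import cv2

# Full-body HOG detector with OpenCV's default pedestrian SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("frame_000123.jpg")   # placeholder frame, not TRECVID data
boxes, weights = hog.detectMultiScale(
    image,
    winStride=(8, 8),
    padding=(8, 8),
    scale=1.05,   # a smaller scale step means a denser (slower) multi-scale search
)

for (x, y, w, h), score in zip(boxes, weights):
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", image)
```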

  17. Detection Results
      [Example detection results on surveillance frames]

  18. Tracking Process
      [Figure: forward and backward tracking over consecutive frames and the combined result. Legend: expected target, detection result, canceled, expected path, final path.]
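
The exact combination rule in the figure is not recoverable from the transcript. Below is a rough sketch of the forward/backward idea using OpenCV's online-boosting tracker (requires the opencv-contrib build, where the tracker lives under cv2.legacy in recent releases), with a deliberately simplified merge rule that is not the authors' method.

```python
import cv2

def track_segment(frames, init_box):
    """Run an online-boosting tracker over a list of frames.

    frames   -- list of BGR images
    init_box -- (x, y, w, h) box from the detection stage
    """
    tracker = cv2.legacy.TrackerBoosting_create()
    tracker.init(frames[0], init_box)
    boxes = [init_box]
    for frame in frames[1:]:
        ok, box = tracker.update(frame)
        boxes.append(box if ok else None)    # None marks a lost target
    return boxes

def combine_tracks(frames, first_det, last_det):
    """Forward pass from an early detection, backward pass from a later one.

    Backward tracking is simply the same tracker run on the reversed frame
    order; the merge below keeps the forward box and falls back to the
    backward box when the forward pass lost the target (a simplification).
    """
    fwd = track_segment(frames, first_det)
    bwd = track_segment(frames[::-1], last_det)[::-1]
    return [f if f is not None else b for f, b in zip(fwd, bwd)]
```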

  19. State Machine of Tracking
      Legend:
      - D: a detection exists
      - ND: no detection result
      - P: online boosting prediction result
      - NH: not human (drifting has happened)
      - H: no drifting
      - S: the online boosting and detection results are similar
      - U: the online boosting and detection results are not similar
      [State diagram: Start -> head-shoulder and body detection (D / ND) -> online boosting prediction (P) -> similarity check (S / U, H / NH) -> End]
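
The transitions themselves live in the diagram, which did not survive the transcript, so the following Python sketch encodes one plausible reading of the legend; the transition structure is an assumption, not the authors' exact state machine.

```python
from enum import Enum, auto

class TrackState(Enum):
    START = auto()
    DETECT = auto()     # head-shoulder and body detection
    PREDICT = auto()    # online boosting prediction
    TRACKING = auto()   # prediction and detection agree, no drifting
    END = auto()

def step(state, detected=None, similar=None, is_human=None):
    """One plausible transition function reconstructed from the legend."""
    if state is TrackState.START:
        return TrackState.DETECT
    if state is TrackState.DETECT:
        # D: a detection exists -> start prediction; ND: no detection -> end
        return TrackState.PREDICT if detected else TrackState.END
    if state is TrackState.PREDICT:
        # S: prediction and detection are similar -> keep tracking (H)
        # U + NH: dissimilar and not human -> drifting, terminate the track
        if similar:
            return TrackState.TRACKING
        return TrackState.DETECT if is_human else TrackState.END
    if state is TrackState.TRACKING:
        return TrackState.DETECT        # re-verify against detections next frame
    return TrackState.END

# Tiny usage example:
s = TrackState.START
s = step(s)                                  # -> DETECT
s = step(s, detected=True)                   # -> PREDICT
s = step(s, similar=False, is_human=False)   # drifting -> END
```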

  20. Detection and Tracking Results
      [Figures: detection results and tracking results]

  21. Drift Reduction by Color Similarity
      - Problem: drifting
      - Solution: combine color similarity to refine the tracking results
      [Video comparison: tracking result without vs. with the color-similarity check]
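
One simple way to realize such a check (a hedged sketch, not necessarily the authors' measure) is to compare HSV color histograms of the previously confirmed target patch and the current prediction; the bin counts and the 0.5 threshold in the usage note are arbitrary.

```python
import cv2

def color_similarity(frame_a, box_a, frame_b, box_b):
    """Compare two tracked patches by HSV color-histogram correlation.

    Returns a value close to 1 when the patches look alike; a low value
    suggests the online-boosting prediction has drifted off the target.
    Boxes are (x, y, w, h) tuples in pixel coordinates.
    """
    def hsv_hist(frame, box):
        x, y, w, h = box
        patch = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([patch], [0, 1], None, [30, 32],
                            [0, 180, 0, 256])        # hue and saturation only
        return cv2.normalize(hist, hist).flatten()

    return cv2.compareHist(hsv_hist(frame_a, box_a),
                           hsv_hist(frame_b, box_b),
                           cv2.HISTCMP_CORREL)

# Usage: if color_similarity(prev_frame, prev_box, frame, predicted_box) < 0.5,
# treat the prediction as drift and fall back to the detector output.
```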

  22. Our Solution (3): Event Detection - Pair-activity
      - Event analysis using key frames
        - Key frames: frames that characterize an event happening
      - "PeopleMeet" and "Embrace": key frame at the end of the event
      - "PeopleSplitUp": key frame at the beginning of the event
      [Example key frames for PeopleMeet, Embrace, and PeopleSplitUp]
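
As a hedged illustration of how such key frames might be located from two tracked people (a simplified distance-based rule, not the authors' classifier), the sketch below picks the first frame where the pair comes together for PeopleMeet/Embrace and the first frame where they separate again for PeopleSplitUp; the distance threshold is an arbitrary value.

```python
import numpy as np

def pair_key_frames(track_a, track_b, near=50.0):
    """Pick candidate key frames for pair-activity events.

    track_a, track_b -- arrays of shape (T, 2) with per-frame centroids of
    two tracked people.  PeopleMeet/Embrace key frame: first frame where
    the two are close together (near the end of the meeting event);
    PeopleSplitUp key frame: first frame where they move apart again
    (near the beginning of the split event).
    """
    dist = np.linalg.norm(track_a - track_b, axis=1)
    close = dist < near

    meet_key = int(np.argmax(close)) if close.any() else None

    split_key = None
    was_close = False
    for t in range(1, len(dist)):
        if close[t - 1]:
            was_close = True
        if was_close and not close[t]:
            split_key = t
            break
    return meet_key, split_key

# Toy usage: two people approach, stay together, then split.
a = np.column_stack([np.linspace(0, 100, 60), np.full(60, 50.0)])
b_x = np.concatenate([np.linspace(200, 110, 30),
                      np.full(15, 110.0),
                      np.linspace(110, 220, 15)])
b = np.column_stack([b_x, np.full(60, 50.0)])
print(pair_key_frames(a, b))
```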
