PKU@TRECVID2009: Single-Actor and Pair-Activity Event Detection in - PowerPoint PPT Presentation

PKU@TRECVID2009: Single-Actor and Pair-Activity Event Detection in Surveillance Video
General Coach: Wen Gao (a), Xihong Wu (b), Tiejun Huang (a)
Executive Coach: Yonghong Tian (a), Yaowei Wang (a), Lei Qing (a)
Member: Zhipeng Hu (a*), Guangnan Ye (b*), Guochen Jia (a), Xibin Chen (b), Qiong Hu (c), Kaihua Jiang (b)
(a) National Engineering Laboratory for Video Technology, Peking University
(b) Speech and Hearing Research Center, Peking University
(c) Key Lab of Intel. Inf. Proc., Institute of Computing Technology, Chinese Academy of Sciences

Outline
- Overview
  - Introduction of TRECVID-ED Tasks
  - Summary of TRECVID-ED 2008
  - Our Results in TRECVID-ED 2009
- Our Solution in the eSur System
  - Background Modeling
  - Detection and Tracking
  - Event Classification
  - Post-processing
- Illustrative Results
- Summary

Overview of TRECVID-ED Tasks
- Task
  - To develop an automatic system to detect observable events in surveillance video
- Ten Events
  - PeopleMeet
  - PeopleSplitUp
  - Embrace
  - ElevatorNoEntry
  - PersonRuns
  - CellToEar
  - ObjectPut
  - TakePicture
  - Pointing
  - OpposingFlow
- Challenges
  - Cluttered scenes
  - Illumination variations
  - Occlusion
  - Different camera views
  - No clear event definition

The Best Results of 2008

Site ID        Event            #Ref  #Sys   #CorDet  #FA    #Miss  Act. DCR
IFP-UIUC-NEC   CellToEar        349   15     1        14     348    0.999
Intuvision     ElevatorNoEntry  0     8      0        8      0      NA
DCU            Embrace          401   36193  91       5091   310    1.271
IFP-UIUC-NEC   ObjectPut        1944  83     6        77     1938   1.004
Intuvision     OpposingFlow     12    31     9        12     3      0.251
SJTU           PeopleMeet       1182  25033  270      5779   912    1.337
CMU            PeopleSplitUp    671   42415  185      42230  486    4.856
MCG-ICT-CAS    PersonRuns       314   662    23       639    291    0.989
SJTU           Pointing         2316  1005   35       970    2281   1.080
Intuvision     TakePicture      23    10     0        10     23     1.000

Note:
- There is much room for improvement.
- The OpposingFlow event has good detection performance.
- The ElevatorNoEntry and TakePicture events have zero correct detections (CorDet).
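As a reading aid for the table, the miss component of the scores can be recomputed from the counts. Act. DCR is a cost (lower is better) that adds a weighted false-alarm rate to the miss probability. The slides do not give the weighting constants or the video duration, so the sketch below only computes the miss probability #Miss / #Ref. The function name `miss_rate` is illustrative, not part of any TRECVID tool.

```python
# Rows copied from the 2008 table: (event, #Ref, #CorDet, #Miss).
ROWS = [
    ("CellToEar", 349, 1, 348),
    ("OpposingFlow", 12, 9, 3),
    ("PersonRuns", 314, 23, 291),
]


def miss_rate(n_ref, n_miss):
    """Fraction of reference events the system missed (None if no references)."""
    if n_ref == 0:
        return None
    return n_miss / n_ref


for event, n_ref, n_cordet, n_miss in ROWS:
    # Sanity check: every reference event is either detected or missed.
    assert n_cordet + n_miss == n_ref
    print(f"{event:14s} miss rate = {miss_rate(n_ref, n_miss):.3f}")
```

For OpposingFlow the miss rate is 3/12 = 0.25, which accounts for almost all of its reported Act. DCR of 0.251.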

Approaches in 2008
- PeopleMeet (SJTU): Camshift-guided particle filter + HMM
  - Combine a head-top detector and a human detector
  - Camshift-guided particle filter to obtain trajectories
  - HMMs to detect hidden states defined by trajectory features
- PeopleSplitUp (CMU): Key points + SVM
  - Cluster interest points into visual keywords
  - SVM classifiers to detect activities
  - Event segmentation was done in a multi-resolution framework, where all activity durations found in training were tried.
- Embrace (DCU): Pedestrian tracking in 3D space
  - Detect and track pedestrians to infer their 3D location
  - Calculate the probability of a person taking part in Embrace events
- PersonRuns (ICT): Data correlation + trajectory features
  - Train full-body and head-shoulder detectors using standard Haar-like features
  - Adopt the data correlation method with the visual features to track objects
  - Event detection by trajectory length, location of trajectory points, and speed
- ElevatorNoEntry (INTUVISION): Pedestrian detection + histogram matching
  - Haar object pedestrian detection
  - Histogram matching to find a person not entering an elevator
- ……

References
- X. Yang, et al., Shanghai Jiao Tong University participation in high-level feature extraction, automatic search and surveillance event detection at TRECVID 2008
- A. Hauptmann, et al., Informedia @ TRECVID2008: Exploring New Frontiers
- P. Wilkins, et al., Dublin City University at TRECVID 2008
- J.B. Guo, et al., TRECVID 2008 Event Detection by MCG-ICT-CAS
- P. Yarlagadda, et al., Intuvision Event Detection System for TRECVID 2008

Our Results in TRECVID-ED2009 (1)

p-eSur_1
Event            #Ref  #Sys  #CorDet  #FA  #Miss  Act. DCR
PeopleMeet       449   125   7        118  442    1.023
PeopleSplitUp    187   198   7        191  180    1.025
Embrace          175   80    1        79   174    1.020
ElevatorNoEntry  3     4     2        2    1      0.334

p-eSur_2
Event            #Ref  #Sys  #CorDet  #FA  #Miss  Act. DCR
PeopleMeet       449   210   15       195  434    1.030
PeopleSplitUp    187   881   14       867  173    1.209
Embrace          175   164   3        161  172    1.036
PersonRuns       107   356   5        351  102    1.068

p-eSur_3
Event            #Ref  #Sys  #CorDet  #FA  #Miss  Act. DCR
PeopleMeet       449   210   15       195  434    1.030
PeopleSplitUp    187   881   14       867  173    1.209
Embrace          175   164   3        161  172    1.036
ElevatorNoEntry  3     0     0        0    3      1.000

Our Results in TRECVID-ED2009 (2)
- Compared with the best results in TRECVID-ED 2008
- Directly on the reported results in terms of Act. DCR

  Event            Our Best  Best 2008  Imp.
  PeopleMeet       1.023     1.337      -0.314
  PeopleSplitUp    1.025     4.856      -3.831
  Embrace          1.020     1.271      -0.251
  ElevatorNoEntry  0.334     N/A        -
  PersonRuns       1.068     0.989      +0.079

  Note: Our results are evaluated on the ED 2009 data by the 2009 DCR metric, while the 2008 best results are evaluated on the ED 2008 data by the 2008 DCR metric.

- On the TRECVID-ED 2008 data in terms of 2008 Act. DCR

  Event            Our Best  Best 2008  Imp.
  PeopleMeet       1.245     1.337      -0.092
  PeopleSplitUp    1.976     4.856      -2.880
  Embrace          1.208     1.271      -0.063
  ElevatorNoEntry  0.130     N/A        -
  PersonRuns       1.249     0.989      +0.260
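The "Imp." column is the difference between our best Act. DCR and the best 2008 Act. DCR. Since DCR is a cost, a negative value is an improvement. The short check below recomputes the column for the first comparison; the helper name `improvement` is illustrative.

```python
# (event, our best Act. DCR, best 2008 Act. DCR) from the first comparison.
COMPARISON = [
    ("PeopleMeet", 1.023, 1.337),
    ("PeopleSplitUp", 1.025, 4.856),
    ("Embrace", 1.020, 1.271),
    ("ElevatorNoEntry", 0.334, None),  # no valid 2008 score (N/A)
    ("PersonRuns", 1.068, 0.989),
]


def improvement(ours, best_2008):
    """Our DCR minus the 2008 best; negative means a lower (better) cost."""
    if best_2008 is None:
        return None
    return round(ours - best_2008, 3)


imp = {event: improvement(ours, best) for event, ours, best in COMPARISON}
for event, delta in imp.items():
    shown = "-" if delta is None else format(delta, "+.3f")
    print(f"{event:16s} {shown}")
```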

What Is Improved?
- What?
  1. Effectively reduced detection false alarms
  2. Comparable detection accuracy, and much better results for ElevatorNoEntry
- Why?
  1. Adaptive background modeling
  2. Effective human detection and tracking
  3. Ensemble of one-vs.-all SVM and automata-based classifiers
  4. Effective event merging and post-processing

Our Solution: Treatments for Different Event Categories
- Pair-activity event: one person interacts with another person
- Single-actor event: no interaction with other people

Retrospective event detection
- Pair-activity events: PeopleMeet, PeopleSplitUp, Embrace
- Single-actor events: PersonRuns, ElevatorNoEntry

Our eSur Framework for TRECVID-ED
[Figure: eSur framework block diagram. Blocks shown: Camera Classification, Background Subtraction, Body Detection, Head-Shoulder Detection, Object Tracking, Feature Extraction, One-vs-All SVM, Automata, Events Merging, Post-Processing.]

Our Solution (1): Background Modeling
- Mixture of Gaussians (MoG):
  - To accurately extract the foreground while effectively decreasing detection false alarms
- Block-wise PCA model:
  - To identify which camera the video belongs to
  - Also used in ElevatorNoEntry event detection
  - "block": segment each frame into blocks
  - "wise": adaptively select the principal component for background reconstruction

MoG
- Key idea
  - Randomly select 1000 frames from each camera
  - Manually label the foreground objects
  - Use the EM algorithm to estimate the model
- Results of background reconstruction
  [Figure: reconstructed backgrounds for Cam1, Cam2, Cam3, and Cam5]
- Disadvantage: computationally time-consuming
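To make the role of the estimated mixture concrete, here is a minimal sketch of how a per-pixel MoG model can be used to label a pixel as background. It assumes grayscale values and that the mixture parameters (weight, mean, variance) have already been estimated, for example by EM. The matching rule (within `k` standard deviations of a sufficiently heavy component) and the parameter values are common conventions chosen for illustration, not details taken from the slides.

```python
import math


def is_background(value, components, k=2.5, min_weight=0.2):
    """Return True if `value` matches a sufficiently heavy Gaussian component.

    components: list of (weight, mean, variance) tuples for one pixel.
    k and min_weight are illustrative thresholds (assumptions).
    """
    for weight, mean, variance in components:
        if weight >= min_weight and abs(value - mean) <= k * math.sqrt(variance):
            return True
    return False


# Hypothetical model for a single pixel: two background modes, one rare mode.
pixel_model = [(0.70, 120.0, 16.0), (0.25, 60.0, 25.0), (0.05, 200.0, 100.0)]

print(is_background(118, pixel_model))  # near the dominant mode
print(is_background(200, pixel_model))  # only matches the low-weight mode
```

Pixels that match no heavy component are treated as foreground, which is what the later detection and tracking stages consume.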

Block-wise PCA
- General PCA
  - Models a whole frame
  - Problems
    - High spatio-temporal computational complexity
    - High miss ratio (especially for static objects)
- Block-wise PCA
  - Segment a frame into blocks, and model each block separately
    - Lower spatio-temporal computational complexity
  - Adaptively select the principal component by the MMSE to the mean background
    - Lower miss ratio and less block effect

$$B = \arg\min_{B_i} \lVert I - B_i \rVert^2, \qquad B_i = \varphi_i \varphi_i^{T} I$$

where $I$ is the trained mean background, $\varphi_i$ is the $i$-th principal component, and $B_i$ is the $i$-th reconstructed background.
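The MMSE selection rule on this slide can be illustrated on a single block. The sketch below reads the rule as: reconstruct the mean background block from each candidate component alone, then keep the reconstruction with the smallest squared error. It assumes the components are already computed and have unit length. The function names and the toy 4-pixel block are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reconstruct(mean_block, component):
    """Rank-one reconstruction phi * (phi^T I) for a unit-length component."""
    scale = dot(component, mean_block)
    return [scale * c for c in component]


def select_component(mean_block, components):
    """Return (index, reconstruction) with minimum squared error to mean_block."""
    best = None
    for i, phi in enumerate(components):
        b_i = reconstruct(mean_block, phi)
        err = sum((x - y) ** 2 for x, y in zip(mean_block, b_i))
        if best is None or err < best[0]:
            best = (err, i, b_i)
    return best[1], best[2]


# Toy block flattened to a vector; the two components are orthonormal.
mean_block = [10.0, 10.0, 12.0, 12.0]
components = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, -0.5, -0.5],
]
index, background = select_component(mean_block, components)
print(index, background)
```

In a full system this selection would run once per block, which is what makes the choice adaptive across the frame.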

Comparative Results
- Blocking vs. no blocking

  Method                          No-blocking  Blocking
  Training time (for 300 frames)  361.332s     150.406s

  * Experiment platform: Intel Xeon E5410 2.33GHz, 8G
  [Figure: result with no blocking; result with blocking]
- Block PCA vs. block-wise PCA
  [Figure: original image; block-wise PCA; block PCA]

Our Solution (2): Detection and Tracking
- Detection: histogram of oriented gradients (HOG) for both whole body and head-shoulder
- Tracking: online boosting
  - Forward and backward tracking
  - Combining color similarity to reduce drift

HOG Detector
- Fusion of head-shoulder and body detection
- Adjust the detector's search scales

Detection Results
[Figure: example detection results]

Tracking Process
[Figure: forward tracking and backward tracking over frames 1, 2, 3, 4, ..., and their combined result. Legend: expected target, detection result, canceled, expected path, final path.]

State Machine of Tracking
- D: detection exists
- ND: no detection results
- P: online boosting prediction result
- NH: not human, drifting happens
- H: no drifting
- S: online boosting and detection results are similar
- U: online boosting and detection results are not similar
[Figure: state diagram with the labels Start, Head-shoulder and Body Detection, Prediction, and End; transitions are marked D, ND, P, H, NH, S, and U.]

Detection and Tracking Results
[Figure: detection results; tracking results]

Drift Reduction by Color Similarity
- Problem: drifting
- Solution: combine color similarity to refine tracking results
[Video: tracking result without color similarity comparison; tracking result with color similarity comparison]
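One way to picture the color-similarity check is to compare a color histogram of the tracked patch against a reference histogram of the target and reject the tracker output when they differ too much. The slides do not say which similarity measure is used. The sketch below assumes a coarse joint RGB histogram with histogram intersection, and the threshold of 0.5 is an arbitrary illustrative value.

```python
def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram; pixels are (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = len(pixels)
    return [h / total for h in hist] if total else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def accept_track(reference_pixels, candidate_pixels, threshold=0.5):
    """Return (accepted, similarity); a low similarity suggests drift."""
    sim = histogram_intersection(
        color_histogram(reference_pixels), color_histogram(candidate_pixels)
    )
    return sim >= threshold, sim


# Hypothetical patches: a mostly red target, the same target in a later frame,
# and a mostly green patch that the tracker drifted onto.
target = [(200, 30, 30)] * 8 + [(20, 20, 20)] * 2
same_person = [(210, 40, 35)] * 7 + [(25, 25, 25)] * 3
drifted = [(30, 200, 30)] * 10

print(accept_track(target, same_person))
print(accept_track(target, drifted))
```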

Our Solution (3): Event Detection - Pair-activity
- Event analysis using key frames
- Key frames: frames that characterize the occurrence of an event
  - "PeopleMeet" and "Embrace": at the end of the event
  - "PeopleSplitUp": at the beginning of the event
[Figure: key-frame examples for PeopleMeet, Embrace, and PeopleSplitUp]
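The key-frame idea can be sketched with two tracked trajectories. The slides do not describe how key frames are located. The sketch below assumes a simple distance threshold between the two people: the PeopleMeet key frame is the frame where they first become close (the end of the approach), and the PeopleSplitUp key frame is the last close frame before they move apart (the beginning of the separation). The function names, the `near` threshold, and the toy tracks are all hypothetical.

```python
import math


def distances(track_a, track_b):
    """Per-frame Euclidean distance between two equally long (x, y) tracks."""
    return [math.dist(p, q) for p, q in zip(track_a, track_b)]


def meet_key_frame(dist, near=1.0):
    """Index of the first frame where the pair becomes close, else None."""
    for t in range(1, len(dist)):
        if dist[t - 1] > near and dist[t] <= near:
            return t
    return None


def split_key_frame(dist, near=1.0):
    """Index of the last close frame before the pair moves apart, else None."""
    for t in range(len(dist) - 1):
        if dist[t] <= near and dist[t + 1] > near:
            return t
    return None


# Two people walking toward each other, then the same tracks reversed in time.
track_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
track_b = [(6.0, 0.0), (5.0, 0.0), (4.0, 0.0), (3.5, 0.0)]

approach = distances(track_a, track_b)
separate = distances(track_a[::-1], track_b[::-1])

print(meet_key_frame(approach), split_key_frame(separate))
```

In the pipeline described on these slides, features around such key frames would then be passed to the event classifiers.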
