AT&T Research at TRECVID 2013: Surveillance Event Detection


  1. AT&T Research at TRECVID 2013: Surveillance Event Detection Xiaodong Yang †*, Zhu Liu ‡, Eric Zavesky ‡, David Gibbon ‡, Behzad Shahraray ‡ † City College of New York, CUNY ‡ AT&T Labs - Research *This work was carried out while the author was a research intern at AT&T Labs - Research.

  2. Team Members: Xiaodong Yang, Zhu Liu, Eric Zavesky, David Gibbon, Behzad Shahraray

  3. Outline  System Overview  Low-Level Features  Video Representation  CascadeSVMs  Human Interactions  Performance Evaluation  Conclusion

  4. Outline  System Overview  Low-Level Features  Video Representation  CascadeSVMs  Human Interactions  Performance Evaluation  Conclusion

  5. System Overview

  6. Outline  System Overview  Low-Level Features  Video Representation  CascadeSVMs  Human Interactions  Performance Evaluation  Conclusion

  7. System Overview

  8. Low-Level Feature Extraction  STIP-HOG/HOF  MoSIFT  ActionHOG  Dense Trajectories (DT)  Trajectory  HOG  HOF  Motion Boundary Histogram (MBH)

  9. Low-Level Feature Extraction  STIP  3D Harris corner detector  HOG-HOF descriptor I. Laptev. On Space-Time Interest Points. IJCV, 2005.
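
For intuition, here is a minimal sketch (not the authors' code; in practice Laptev's released STIP binary is typically used) of the 3D Harris response behind this detector, on a grayscale video volume; thresholding and non-maximum suppression are omitted:

```python
# Sketch of a Laptev-style space-time interest-point response:
# a 3D Harris criterion on a video volume of shape (T, H, W).
import numpy as np
from scipy.ndimage import gaussian_filter

def stip_response(video, sigma=2.0, tau=1.5, k=0.005):
    v = gaussian_filter(video.astype(np.float64), (tau, sigma, sigma))
    # Spatiotemporal gradients along (t, y, x).
    Lt, Ly, Lx = np.gradient(v)
    # Entries of the 3x3 second-moment matrix, smoothed locally.
    s = (2 * tau, 2 * sigma, 2 * sigma)
    Mxx = gaussian_filter(Lx * Lx, s); Myy = gaussian_filter(Ly * Ly, s)
    Mtt = gaussian_filter(Lt * Lt, s); Mxy = gaussian_filter(Lx * Ly, s)
    Mxt = gaussian_filter(Lx * Lt, s); Myt = gaussian_filter(Ly * Lt, s)
    det = (Mxx * (Myy * Mtt - Myt ** 2)
           - Mxy * (Mxy * Mtt - Myt * Mxt)
           + Mxt * (Mxy * Myt - Myy * Mxt))
    trace = Mxx + Myy + Mtt
    # Laptev's criterion H = det(M) - k * trace(M)^3; local maxima of H
    # are the interest points, described with HOG/HOF patches.
    return det - k * trace ** 3
```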

  10. Low-Level Feature Extraction  MoSIFT  SIFT detector + motion  SIFT descriptor  image gradient  optical flow M. Chen and A. Hauptmann. MoSIFT: Recognizing Human Actions in Surveillance Videos. CMU-CS-09-161, 2009.
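
A hedged sketch of the MoSIFT detection idea, with OpenCV's SIFT and Farneback flow standing in for CMU's implementation; `min_flow` is an assumed threshold:

```python
# MoSIFT idea: keep SIFT keypoints only where optical flow indicates
# sufficient motion (this is an illustration, not CMU's code).
import cv2
import numpy as np

def mosift_keypoints(prev_gray, curr_gray, min_flow=1.0):
    sift = cv2.SIFT_create()
    kps = sift.detect(curr_gray, None)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    # A MoSIFT point must be both a distinctive appearance feature (SIFT)
    # and a moving one; the full descriptor then stacks the 128-D SIFT
    # gradient histogram with a 128-D histogram of the local flow field.
    return [kp for kp in kps
            if mag[int(kp.pt[1]), int(kp.pt[0])] > min_flow]
```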

  11. Low-Level Feature Extraction  ActionHOG  SURF detector + motion  HOG  image gradient  motion history image  optical flow X. Yang, C. Yi, L. Cao, and Y. Tian. MediaCCNY at TRECVID 2012: Surveillance Event Detection. NIST TRECVID Workshop, 2012.

  12. Low-Level Feature Extraction  Dense Trajectories  dense sampling + tracking  Trajectory  HOG  HOF  MBH H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action Recognition by Dense Trajectories. CVPR, 2011.
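
An illustrative, much-simplified sketch of the dense-trajectory tracking step (the paper additionally uses median-filtered flow, multiple spatial scales, and pruning heuristics):

```python
# Simplified dense-trajectory tracking: sample points on a dense grid
# and advect them with Farneback optical flow; HOG/HOF/MBH descriptors
# are then pooled along each track.
import cv2
import numpy as np

def track_dense_points(frames, step=5, track_len=15):
    h, w = frames[0].shape
    ys, xs = np.mgrid[step//2:h:step, step//2:w:step]
    tracks = [[(float(x), float(y))] for x, y in zip(xs.ravel(), ys.ravel())]
    for prev, curr in zip(frames, frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        for tr in tracks:
            x, y = tr[-1]
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < h and 0 <= xi < w and len(tr) < track_len:
                dx, dy = flow[yi, xi]
                tr.append((x + float(dx), y + float(dy)))
    # Static tracks carry no motion information and are pruned in the paper.
    return [tr for tr in tracks if len(tr) > 1]
```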

  13. Outline  System Overview  Low-Level Features  Video Representation  CascadeSVMs  Human Interactions  Performance Evaluation  Conclusion

  14. System Overview

  15. Video Representation  Fisher Vector  low-level features  GMM  gradient w.r.t. mean  gradient w.r.t. variance F. Perronnin, J. Sanchez, and T. Mensink. Improving the Fisher Kernel for Large-Scale Image Classification. ECCV, 2010.
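
A compact Fisher-vector encoder sketching the two gradient terms named on this slide; it assumes a diagonal-covariance GMM and omits the power and L2 normalization of Perronnin et al.:

```python
# Minimal Fisher-vector encoder: gradients of the GMM log-likelihood
# w.r.t. each component's mean and variance.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(X, gmm):
    # X: (N, D) low-level descriptors; gmm: diagonal-covariance GMM.
    N, D = X.shape
    q = gmm.predict_proba(X)                      # (N, K) soft assignments
    pi, mu, var = gmm.weights_, gmm.means_, gmm.covariances_
    d = (X[:, None, :] - mu[None, :, :]) / np.sqrt(var)[None, :, :]
    g_mu = np.einsum('nk,nkd->kd', q, d) / (N * np.sqrt(pi)[:, None])
    g_var = np.einsum('nk,nkd->kd', q, d**2 - 1) / (N * np.sqrt(2*pi)[:, None])
    return np.concatenate([g_mu.ravel(), g_var.ravel()])  # length 2*K*D
```

The GMM itself would be fit on a subsample of low-level descriptors, e.g. `GaussianMixture(n_components=128, covariance_type='diag').fit(sample)` for the GMM-128 setting on the next slide.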

  16. Video Representation  Fisher Vector  concatenation of the gradients w.r.t. the GMM means and variances  dimension of 2KD per feature with GMM-128 (K = 128 components, D = feature dimension)

  Feature:   STIP   MoSIFT   ActionHOG   DT-HOG   DT-HOF   DT-MBH   DT-Traj
  Feat-Dim:  162    256      216         96       108      192      30
  FV-Dim:    330K   520K     440K        200K     220K     400K     60K
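
As a quick sanity check, the FV-Dim row matches 2KD concatenated over 8 spatial-pyramid cells; the cell count of 8 is inferred from the numbers (the pyramid itself appears on the next slide), not stated here:

```python
# Worked check of the FV-Dim row, assuming an 8-cell spatial pyramid:
# dim = 2 * K * D * cells.
K, cells = 128, 8
for name, D in [('STIP', 162), ('MoSIFT', 256), ('ActionHOG', 216),
                ('DT-HOG', 96), ('DT-HOF', 108), ('DT-MBH', 192),
                ('DT-Traj', 30)]:
    print(name, 2 * K * D * cells)   # STIP -> 331776 ~ 330K, etc.
```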

  17. Video Representation  Spatial Pyramids S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR, 2006.
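
A sketch of spatial-pyramid pooling with Fisher vectors; the 1x1 + 2x2 + 1x3 grid (8 cells) is an assumption consistent with the dimensions above, and `fisher_vector` refers to the encoder sketched earlier:

```python
# Spatial-pyramid Fisher vectors: encode the descriptors falling in each
# grid cell separately, then concatenate the per-cell encodings.
import numpy as np

def spatial_pyramid_fv(X, positions, frame_size, gmm,
                       grids=((1, 1), (2, 2), (1, 3))):
    # X: (N, D) descriptors; positions: (N, 2) as (x, y) pixel coordinates.
    h, w = frame_size
    fvs = []
    for gy, gx in grids:
        cy = np.minimum((positions[:, 1] * gy / h).astype(int), gy - 1)
        cx = np.minimum((positions[:, 0] * gx / w).astype(int), gx - 1)
        for i in range(gy):
            for j in range(gx):
                cell = X[(cy == i) & (cx == j)]
                # fisher_vector: the encoder sketched two slides back.
                fvs.append(fisher_vector(cell, gmm) if len(cell)
                           else np.zeros(2 * gmm.n_components * X.shape[1]))
    return np.concatenate(fvs)
```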

  18. Outline  System Overview  Low-Level Features  Video Representation  CascadeSVMs  Human Interactions  Performance Evaluation  Conclusion

  19. System Overview

  20. CascadeSVMs  Imbalanced Data

  21. CascadeSVMs  Imbalanced Data [Chart: percentage of positive samples per event; y-axis from 0 to 4.5%]

  22. CascadeSVMs [Diagram: Sample → Model-1 → Model-2 → Model-3 → … → Model-C; a positive prediction passes the sample on to the next model, a negative prediction rejects it] X. Yang, C. Yi, L. Cao, and Y. Tian. MediaCCNY at TRECVID 2012: Surveillance Event Detection. NIST TRECVID Workshop, 2012.
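
A hedged sketch of the cascade as depicted: each stage is trained on all positives plus the negatives that earlier stages failed to reject, so later stages focus on hard negatives; a test sample counts as positive only if every stage accepts it. The stage count and SVM settings are placeholders:

```python
# Cascade of linear SVMs for heavily imbalanced data.
import numpy as np
from sklearn.svm import LinearSVC

def train_cascade(X, y, n_stages=3):
    stages = []
    pos, neg = X[y == 1], X[y == 0]
    for _ in range(n_stages):
        Xs = np.vstack([pos, neg])
        ys = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
        clf = LinearSVC(C=1.0).fit(Xs, ys)
        stages.append(clf)
        # Keep only negatives the current stage misclassifies as positive.
        neg = neg[clf.predict(neg) == 1]
        if len(neg) == 0:
            break
    return stages

def predict_cascade(stages, X):
    keep = np.ones(len(X), dtype=bool)
    for clf in stages:
        # A negative prediction at any stage rejects the sample.
        keep[keep] &= (clf.predict(X[keep]) == 1)
    return keep.astype(int)
```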

  23. CascadeSVMs  Feature Fusion
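
The two fusion schemes compared in the evaluation later reduce to a few lines; the uniform weights and the score convention here are assumptions:

```python
# Early fusion: concatenate per-feature Fisher vectors before the SVM.
# Late fusion: combine per-feature classifier scores after the SVMs.
import numpy as np

def early_fusion(fvs):
    # fvs: list of per-feature FVs for one clip, e.g. [STIP, DT-Traj, DT-MBH].
    return np.concatenate(fvs)

def late_fusion(scores, weights=None):
    # scores: per-feature classifier scores for one clip.
    scores = np.asarray(scores, dtype=float)
    w = np.ones_like(scores) if weights is None else np.asarray(weights)
    return float(np.dot(w, scores) / w.sum())
```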

  24. Outline  System Overview  Low-Level Features  Video Representation  CascadeSVMs  Human Interactions  Performance Evaluation  Conclusion

  25. System Overview

  26. Human Interactions  High Throughput UI

  27. Human Interactions  Triage UI

  28. Outline  System Overview  Low-Level Features  Video Representation  CascadeSVMs  Human Interactions  Performance Evaluation  Conclusion

  29. Performance Evaluation  Experimental Setup  PersonRuns  Fisher Vector  CascadeSVMs  40 hours of video for training  10 hours of video for testing

  30. Performance Evaluation  Number of Gaussian Components  STIP

  31. Performance Evaluation  Comparisons of Low-Level Features  STIP  MoSIFT  ActionHOG  DT-Trajectory  DT-HOG  DT-HOF  DT-MBH

  32. Performance Evaluation  How a Larger Training Set Helps  40 vs. 90 hours of training video

  33. Performance Evaluation  Feature Fusion  90 hours of training video  STIP, DT-Trajectory, DT-MBH  Early Fusion  Late Fusion  Early + Late Fusion

  34. Performance Evaluation  Formal Evaluation  Comparative Results

  35. Outline  System Overview  Low-Level Features  Video Representation  CascadeSVMs  Human Interactions  Performance Evaluation  Conclusion

  36. Conclusion  Best ADCR

  37. Conclusion  Best ADCR [Chart: best ADCR per event, grouped by event type: Single Person, Multiple People, Multiple People, Multiple People, Person + Object, Single Person, Person + Object]

  38. Conclusion  Multiple Features  fusion scheme  ranking and selection  event-specific investigation  Fisher Vector  accuracy and computation  Human Interaction  collaborative mode  cross-event mode  static gesture detection
