Crowd Counting and Behavior Modeling with Convolutional Neural Networks



  1. Crowd Counting and Behavior Modeling with Convolutional Neural Networks. Hongsheng Li 李鴻升 (1 Dept. of Electronic Engineering, 2 Multimedia Laboratory), The Chinese University of Hong Kong

  2. Typical Surveillance Scenario

  3. Background Subtraction: [Stauffer and Grimson 1999] [Elgammal et al. 2000] [Zivkovic 2004] [Kim et al. 2005] [Sheikh and Shah 2005]

  4. Crowd Tracking: [Lucas and Kanade 1981] [Shi and Tomasi 1994] [Wang et al. 2011]

  5. Crowd Motion Analysis: [Ali and Shah 2007] [Amer and Todorovic 2011] [Chang et al. 2011] [Loy et al. 2012] [Pellegrini et al. 2009] [Zhou et al. 2013]

  6. Contents
     • Crowd Counting
       – Cross-scene crowd count and density estimation with deep CNN [Zhang et al. CVPR’15]
       – Crossing-line crowd counting with two-phase deep CNN [Zhao et al. ECCV’16]
     • Crowd Behavior Modeling
       – Multi-person walking path prediction [Yi et al. ECCV’16]

  7. Contents
     • Crowd Counting
       – Cross-scene crowd count and density estimation with deep CNN [Zhang et al. CVPR’15]
       – Crossing-line crowd counting with two-phase deep CNN [Zhao et al. ECCV’16]
     • Crowd Behavior Modeling
       – Multi-person walking path prediction [Yi et al. ECCV’16]

  8. Cross-scene Crowd Counting
     • Problem definition
       – Counting the people in the Region-Of-Interest (ROI, denoted as the blue region)

  9. CNN for Crowd Counting
     • Training strategy
       – Patch-based training
       – Alternating training with crowd count and crowd density objectives

  10. Create Ground Truth Patches (Cont’d)
      • Estimating the perspective map of a scene
        – Each scene needs 2-4 annotations of person height
        – Each pixel stores how many meters that pixel represents
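One way to turn a handful of height annotations into a dense perspective map is a linear fit over image rows. The sketch below is only illustrative: the 1.75 m reference height, the assumption that scale varies linearly with the image row, and the names `estimate_perspective_map` and `ASSUMED_PERSON_HEIGHT_M` are not taken from the slides.

```python
import numpy as np

ASSUMED_PERSON_HEIGHT_M = 1.75  # assumption: reference adult height, not from the slides


def estimate_perspective_map(annotations, image_height, image_width):
    """Fit a per-row scale from a few person-height annotations.

    annotations: list of (row, person_height_in_pixels) pairs.
    Returns an (image_height, image_width) array of meters per pixel.
    """
    rows = np.array([a[0] for a in annotations], dtype=float)
    pixel_heights = np.array([a[1] for a in annotations], dtype=float)
    pixels_per_meter = pixel_heights / ASSUMED_PERSON_HEIGHT_M

    # Least-squares line: pixels_per_meter ~ slope * row + intercept.
    slope, intercept = np.polyfit(rows, pixels_per_meter, deg=1)
    per_row = slope * np.arange(image_height) + intercept
    per_row = np.clip(per_row, 1e-3, None)  # guard against non-positive scale

    meters_per_pixel = 1.0 / per_row
    # Every pixel in a row shares the same scale.
    return np.tile(meters_per_pixel[:, None], (1, image_width))
```

With two annotations, such as a person 35 pixels tall at row 100 and one 105 pixels tall at row 300, the fit passes exactly through both points; the 2-4 annotations per scene mentioned on the slide would give a least-squares line instead.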

  11. Create Ground Truth Patches (Cont’d)
      • Convolve the head annotation map with a person-shape kernel
        – The person-shape kernel should sum to 1
        – Crop 3x3 meter patches
        – Normalize patches to the same size (72x72)
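The reason the kernel must sum to 1 is that the density map then integrates to the number of annotated people. The sketch below illustrates this with a plain 2D Gaussian swapped in for the person-shape kernel; the kernel size, sigma, and function names are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(size=15, sigma=3.0):
    """2D Gaussian stand-in for the person-shape kernel (size must be odd)."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()  # normalize so each person contributes exactly 1


def density_map(head_points, height, width, kernel):
    """Place one kernel per annotated head at (row, col) and sum them up."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.zeros((height + 2 * pad, width + 2 * pad))
    for r, c in head_points:
        # Kernel centered on (r + pad, c + pad) in padded coordinates.
        padded[r:r + k, c:c + k] += kernel
    # Cropping removes the padding; mass of heads near the border is partly lost.
    return padded[pad:pad + height, pad:pad + width]
```

For heads far enough from the border, `density_map(...).sum()` equals the number of annotated heads, which is what makes a pixel-level density target consistent with a patch-level count target.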

  12. Alternating Training Strategy
      • Train each step until convergence
        – Train with pixel-level density maps and L2 loss
        – Train with crowd counts of patches
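The two objectives can be written as two small loss functions that a training loop would switch between. This is a minimal sketch: the slide only states an L2 loss for the density step, so the squared-error form of the count loss, the 0.5 scaling, and the function names are assumptions.

```python
import numpy as np


def density_loss(pred_density, gt_density):
    """L2 loss on pixel-level density maps, averaged over a batch of shape (N, H, W)."""
    squared = (pred_density - gt_density) ** 2
    return 0.5 * float(np.mean(np.sum(squared, axis=(1, 2))))


def count_loss(pred_count, gt_count):
    """Squared error on per-patch crowd counts (assumed form), batch of shape (N,)."""
    return 0.5 * float(np.mean((pred_count - gt_count) ** 2))


def alternating_schedule(num_rounds):
    """Order of objectives: each step would be trained until convergence before switching."""
    return ["density", "count"] * num_rounds
```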

  13. Finetuning on Unseen Scenes
      • Training on all training scenes
      • For an unseen scene, the trained model might not be suitable for direct deployment
      • Finetune the pre-trained model on training patches similar to the test patches

  14. Training Patch Retrieval
      • Candidate training scene retrieval
        – Given a target scene, retrieve training scenes with a similar perspective map (i.e., scenes with similar viewing angles)
        – The top 20 perspective-map-similar training scenes are kept
      [Figure: top-20 training scenes retrieved for Test Scene 1 and Test Scene 2]
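Scene retrieval of this kind amounts to a nearest-neighbor ranking over perspective maps. The sketch below assumes all maps have been resized to a common shape and uses Euclidean distance as the similarity measure; the slides do not name the measure, so both are assumptions, as is the function name.

```python
import numpy as np


def retrieve_similar_scenes(target_pmap, training_pmaps, top_k=20):
    """Return indices of the top_k training scenes closest to the target perspective map."""
    distances = [np.linalg.norm(target_pmap - pmap) for pmap in training_pmaps]
    order = np.argsort(distances)  # ascending: most similar first
    return order[:top_k].tolist()
```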

  15. Training Patch Retrieval (Cont’d)
      • Candidate training patch retrieval
        – Estimate the target scene density using the pre-trained model
        – Retrieve training patches to match the distribution of the target scene according to its density histogram
      [Figure: target scene density distribution vs. training patches density distribution]
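Matching the density histogram can be sketched as per-bin sampling: bin the target scene's estimated patch counts, then draw training patches from each bin in the same proportions. The bin edges, the per-bin quota rule, and the function name below are illustrative assumptions rather than the authors' procedure.

```python
import numpy as np


def retrieve_patches(target_counts, train_counts, num_select, bin_edges, seed=0):
    """Pick training patch indices whose count histogram follows the target's."""
    rng = np.random.default_rng(seed)

    # Share of target-scene patches falling into each density bin.
    target_hist, _ = np.histogram(target_counts, bins=bin_edges)
    target_prob = target_hist / max(target_hist.sum(), 1)

    # Bin index of every training patch (out-of-range values fall into the outer bins).
    train_bins = np.digitize(train_counts, bin_edges[1:-1])

    selected = []
    for b, prob in enumerate(target_prob):
        candidates = np.flatnonzero(train_bins == b)
        quota = min(int(round(prob * num_select)), len(candidates))
        if quota > 0:
            picked = rng.choice(candidates, size=quota, replace=False)
            selected.extend(picked.tolist())
    return selected
```

If a bin has fewer candidates than its quota, this sketch simply returns fewer patches; a real pipeline would need to decide how to handle such shortfalls.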

  16. Datasets
      • UCSD [Chan et al. CVPR’08]
      • UCF_CC_50 [Idrees et al. CVPR’13]
      • WorldExpo’10 dataset (with SJTU)
        – Train & validation: 1,127 one-minute video clips of 103 scenes
        – Test: 5 one-hour video clips from 5 scenes

      | Dataset   | # frames     | # scenes | Resolution | FPS   | # people per frame | # total annotations |
      |-----------|--------------|----------|------------|-------|--------------------|---------------------|
      | UCSD      | 2,000        | 1        | 158 x 238  | 10    | 11-46              | 49,885              |
      | UCF_CC_50 | 50           | 50       | Various    | image | 94-4543            | 63,974              |
      | WorldExpo | 4.44 million | 108      | 576 x 720  | 25    | 1-253              | 199,923             |

  17. Results

  18. Results: WorldExpo’10
      • Metric: mean absolute error

      | Method                 | Scene 1 | Scene 2 | Scene 3 | Scene 4 | Scene 5 | Average |
      |------------------------|---------|---------|---------|---------|---------|---------|
      | LBP+RR                 | 13.6    | 58.9    | 37.1    | 21.8    | 23.4    | 31.0    |
      | Fiaschi et al. ICPR’12 | 2.2     | 87.3    | 22.2    | 16.4    | 5.4     | 26.7    |
      | Chen et al. BMVC’12    | 2.1     | 55.9    | 9.6     | 11.3    | 3.4     | 16.5    |
      | Crowd CNN              | 2.0     | 29.5    | 9.7     | 9.3     | 3.1     | 10.7    |
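For reference, mean absolute error over a set of frames is the average of the absolute differences between predicted and true counts. The helper below is a minimal sketch with a hypothetical function name; the second snippet only checks that the Average column for Crowd CNN is the mean of its five per-scene values.

```python
import numpy as np


def mean_absolute_error(predicted_counts, true_counts):
    """Average absolute difference between predicted and ground-truth counts."""
    predicted = np.asarray(predicted_counts, dtype=float)
    truth = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(predicted - truth)))


# Per-scene MAE values for Crowd CNN, copied from the table above.
crowd_cnn_per_scene = [2.0, 29.5, 9.7, 9.3, 3.1]
crowd_cnn_average = round(float(np.mean(crowd_cnn_per_scene)), 1)  # 10.7
```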

  19. Results: UCSD & UCF_CC_50

  20. Contents
      • Crowd Counting
        – Cross-scene crowd count and density estimation with deep CNN [Zhang et al. CVPR’15]
        – Crossing-line crowd counting with two-phase deep CNN [Zhao et al. ECCV’16]
      • Crowd Behavior Modeling
        – Multi-person walking path prediction [Yi et al. ECCV’16]

  21. Crossing-line Crowd Counting
      • Problem definition
        – Count people crossing a Line-of-Interest (LOI) in both directions
        – Has practical needs in intelligent surveillance

  22. Temporal Slicing
      • Existing LOI counting methods mostly use temporal slices

  23. CNN with Pixel-level Supervision
      • CNN trained with pixel-level supervision maps
        – Instantaneous crowd counting map, which can be decomposed into
          – Crowd density map
          – Crowd velocity map

  24. Definition of Crowd Counting Map
      • At a single time step, the map stores at each location how many persons have passed that location along the x and y directions
      • Crossing-line counts can be calculated by projecting the values onto the normal direction of the LOI

  25. Definition of Crowd Counting Map (Cont’d)
      • The crowd counting map can be decomposed as the multiplication of the crowd density map and the crowd velocity map
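Written out with hypothetical symbols (none of them appear in the slide text), where C_t is the counting map, ρ_t the density map, and v_t the velocity map at pixel p and time t, the decomposition would read:

```latex
\mathbf{C}_t(p) = \rho_t(p)\,\mathbf{v}_t(p),
\qquad
\mathbf{v}_t(p) = \bigl(v_{x,t}(p),\; v_{y,t}(p)\bigr)
```

Under this reading the product is taken per pixel, so the counting map has an x and a y component at every location, matching the definition on the previous slide.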

  26. Two-phase Strategy
      • Two-phase strategy
        – Phase I: train with density and velocity supervision
        – Phase II: train with counting supervision

  27. Supervision Maps
      [Figure: ground-truth (GT) counting, velocity, and density maps alongside the estimated counting, velocity, and density maps]

  28. From Instantaneous Counts to LOI Counts
      • Project the x- and y-directional counting values on the LOI onto its normal direction
      • Integrating over all the projected values gives the instantaneous LOI counts in the two directions at time t
      • For a period of time T, integrate the instantaneous counts to obtain the final crossing-line counts within T
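The projection and integration steps can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the LOI is given as a list of pixel coordinates, the two directions are separated by the sign of the projected value, and all function names are hypothetical.

```python
import numpy as np


def counting_map(density, velocity):
    """Per-pixel product of a density map (H, W) and a velocity map (H, W, 2)."""
    return density[..., None] * velocity


def instantaneous_loi_counts(cmap, loi_pixels, normal):
    """Project counting values at the LOI pixels onto the LOI normal and sum them.

    cmap: (H, W, 2) counting map with (x, y) components.
    loi_pixels: list of (row, col) positions on the line.
    normal: (x, y) normal vector of the line.
    Returns (count along the normal, count against the normal).
    """
    normal = np.asarray(normal, dtype=float)
    normal = normal / np.linalg.norm(normal)
    rows, cols = zip(*loi_pixels)
    projected = cmap[list(rows), list(cols)] @ normal  # one value per LOI pixel
    forward = projected[projected > 0].sum()
    backward = -projected[projected < 0].sum()
    return forward, backward


def loi_counts_over_period(cmaps, loi_pixels, normal):
    """Accumulate instantaneous counts over the frames of a time period."""
    totals = np.zeros(2)
    for cmap in cmaps:
        totals += instantaneous_loi_counts(cmap, loi_pixels, normal)
    return totals
```

A horizontal line with normal `(0, 1)` would then report people moving in the positive y direction as the first count and people moving in the negative y direction as the second.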

  29. LOI Counting Dataset
      • A new LOI counting dataset
      • Evaluation metric: Mean Windowed Relative Absolute Errors

  30. LOI Counting Dataset
      • A new LOI counting dataset

  31. Results
      • Baselines:
        – Phase I: no Phase II training; the estimated velocity map and density map are directly multiplied
        – Direct-A: CNN without elementwise multiplication, trained directly with Phase II supervision
        – Direct-B: CNN with elementwise multiplication, trained directly with Phase II supervision
        – Two-separate: two separate CNNs for velocity and density

  32. Results
      [Video at 2X speed: downward and upward counts]

  33. Contents
      • Crowd Counting
        – Cross-scene crowd count and density estimation with deep CNN [Zhang et al. CVPR’15]
        – Crossing-line crowd counting with two-phase deep CNN [Zhao et al. ECCV’16]
      • Crowd Behavior Modeling
        – Multi-person walking path prediction [Yi et al. ECCV’16]

  34. Problem Definition
      • Previous five frames as input
      [Figure legend: BLUE: input locations. GREEN: GT future locations. RED: current locations.]

  35. Problem Definition
      • Need to predict the future five frames
      [Figure legend: BLUE: input locations. GREEN: GT future locations. RED: current locations.]
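The input/output setup can be made concrete by slicing one pedestrian's track into a five-frame input window and a five-frame target window. The sketch below assumes the current location counts as the last of the five input frames, which the slides do not state; the function name and array layout are also hypothetical.

```python
import numpy as np


def split_trajectory(track, t, num_past=5, num_future=5):
    """Split a (T, 2) array of per-frame (x, y) locations around the current frame t.

    Returns (past, future): past holds num_past frames ending at t (assumption:
    the current location is included), future holds the next num_future frames.
    """
    if t - num_past + 1 < 0 or t + num_future >= len(track):
        raise ValueError("not enough frames around t for the requested windows")
    past = track[t - num_past + 1:t + 1]
    future = track[t + 1:t + 1 + num_future]
    return past, future
```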

  36. Main Difficulties
      • How to solve the problem with a deep neural network?
      • How to encode pedestrian walking paths as the input of a deep network?
      • How to jointly model the behaviors of all pedestrians in the scene?
