Analyzing and Predicting Human Activities in Video (Greg Mori)

  1. Analyzing and Predicting Human Activities in Video. Greg Mori: Professor, School of Computing Science, Simon Fraser University; Research Director, Borealis AI Vancouver.

  2. What does activity recognition involve?

  3. Detection: are there people?

  4. Objects and scenes: where are they? (Figure labels: indoor scene, long term care facility, walker, chair, floor.)

  5. Action recognition: what are they doing? (Figure labels: run, stand, fall, squat.)

  6. Intention/social role: why are they doing this? (Figure labels: get help, watch, comfort.)

  7. Group activity recognition: what is the overall situation? (Figure label: help the fallen person.)

  8. These are inter-related problems: model structures. (Figure labels: indoor scene, long term care facility, walker, chair, floor; run, stand, fall, squat; get help, watch, comfort; help the fallen person.)

  9. Desiderata for Activity Recognition Models: group structure, label structure, temporal structure. (Figure labels: help the fallen person; long term care facility, walker, indoor scene, chair, floor; time.) Cited work: Ibrahim et al., CVPR 16; Yeung et al., CVPR 16; Hu et al., CVPR 16; Mehrasa et al., SLOAN 18; Yeung et al., IJCV 17; Deng et al., CVPR 16; Khodabandeh et al., arXiv 17; He et al., WACV 18; Nauata et al., CVPRW 17; Lan et al., CVPR 12; Chen et al., ICCVW 17; Deng et al., CVPR 17; Zhong et al., 2018.

  10. Task: action detection. Input: a video spanning t = 0 to t = T. Output: the actions detected in it (figure examples: Running, Talking). Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

  11. Dominant paradigm: dense processing of the whole video, from t = 0 to t = T. Standard in THUMOS challenge action detection entries. Approaches: sliding windows and action proposals. Cited work: Oneata et al. 2014; Wang et al. 2014; Gkioxari and Malik 2015; Yu et al. 2015; Escorcia et al. 2016; Yuan et al. 2015; Peng and Schmid 2016; He et al. 2018.
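
To make "dense processing" concrete, here is a minimal sketch of sliding-window candidate generation. The window lengths, the stride, and the function names are illustrative assumptions and are not taken from any of the cited THUMOS entries.

```python
def sliding_windows(num_frames, window_lengths, stride):
    """Enumerate candidate segments [start, end) over frames 0..num_frames-1."""
    windows = []
    for length in window_lengths:
        # Slide a window of this length across the video in fixed steps.
        for start in range(0, num_frames - length + 1, stride):
            windows.append((start, start + length))
    return windows


def frames_touched(windows):
    """Count the distinct frames covered by at least one window."""
    covered = set()
    for start, end in windows:
        covered.update(range(start, end))
    return len(covered)


# Hypothetical settings: a 100-frame video, two window lengths, stride 8.
windows = sliding_windows(num_frames=100, window_lengths=[16, 32], stride=8)
print(len(windows), "candidate windows covering", frames_touched(windows), "frames")
```

Even in this toy setting nearly every frame is processed at least once, which is the cost the glimpse-based model in the following slides tries to avoid.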

  12. Efficiently detecting actions.

  13. Our model for efficient action detection. Input: video (t = 0 to t = T). Result: detected actions.

  14. Our model for efficient action detection. Input: video (t = 0 to t = T). Result: detected actions.

  15. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information). Result: detected actions.

  16. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Result: detected actions.

  17. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Outputs: detection instance hypothesis [start, end]. Result: detected actions.

  18. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Outputs: detection instance hypothesis [start, end]; emission indicator (x). Result: detected actions.

  19. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Outputs: detection instance hypothesis [start, end]; emission indicator (x); next frame to glimpse. Result: detected actions.

  20. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Outputs: detection instance hypothesis [start, end]; emission indicator (x); next frame to glimpse. Result: detected actions.

  21. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Outputs: detection instance hypothesis [start, end]; emission indicator (x); next frame to glimpse. Result: detected actions.

  22. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Outputs (two steps shown, one emission indicator x): detection instance hypothesis [start, end]; emission indicator; next frame to glimpse. Result: detected actions.

  23. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Outputs (two steps shown, two emission indicators x): detection instance hypothesis [start, end]; emission indicator; next frame to glimpse. Result: detected actions.

  24. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Outputs (two steps shown, two emission indicators x): detection instance hypothesis [start, end]; emission indicator; next frame to glimpse. Result: detected actions.

  25. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Outputs (two steps shown, two emission indicators x): detection instance hypothesis [start, end]; emission indicator; next frame to glimpse. Result: detected actions.

  26. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Outputs (three steps shown, two emission indicators x): detection instance hypothesis [start, end]; emission indicator; next frame to glimpse. Result: detected actions.

  27. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information). Outputs (three steps shown, two emission indicators x): detection instance hypothesis [start, end]; emission indicator; next frame to glimpse. Result: detected actions, with one instance [ ] shown.

  28. Our model for efficient action detection. Pipeline: video (t = 0 to t = T); convolutional neural network (frame information); recurrent neural network (time information), continuing (…). Outputs (three steps shown, two emission indicators x): detection instance hypothesis [start, end]; emission indicator; next frame to glimpse. Result: detected actions, with one instance [ ] shown.
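
The build-up across slides 13 to 28 can be read as a loop: glimpse one frame, encode it (the CNN), update a recurrent state (the RNN), then produce the three outputs. The sketch below mirrors only that control flow. `toy_model`, its update rule, the 0.5 threshold, and the fixed jump of 7 frames are hypothetical stand-ins for the learned networks, not the authors' implementation.

```python
def toy_model(state, feature, frame_idx, num_frames):
    """Hypothetical stand-in for the RNN step; returns the three outputs."""
    state = 0.5 * state + 0.5 * feature              # toy recurrent update
    hypothesis = (max(0, frame_idx - 2),             # detection instance
                  min(num_frames - 1, frame_idx + 2))  # [start, end]
    emit = state >= 0.5                              # emission indicator
    next_idx = (frame_idx + 7) % num_frames          # next frame to glimpse
    return state, hypothesis, emit, next_idx


def run_glimpses(video, model, num_glimpses, start_idx=0):
    """Observe only `num_glimpses` frames and collect the emitted detections."""
    state, frame_idx = 0.0, start_idx
    detections, visited = [], []
    for _ in range(num_glimpses):
        visited.append(frame_idx)
        feature = video[frame_idx]                   # stand-in for CNN features
        state, hypothesis, emit, frame_idx = model(
            state, feature, frame_idx, len(video))
        if emit:                                     # keep only emitted hypotheses
            detections.append(hypothesis)
    return detections, visited


# Toy "video": one scalar per frame, with an action on frames 8 to 12.
video = [1.0 if 8 <= i <= 12 else 0.0 for i in range(20)]
detections, visited = run_glimpses(video, toy_model, num_glimpses=6)
print("glimpsed frames:", visited)
print("detections:", detections)
```

The point of the design is that the number of processed frames is set by `num_glimpses`, not by the video length, and that the next glimpse may move forward or backward in time.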

  29. Training the detection instance output. Training data: a positive video and a negative video (each t = 0 to t = T), with ground-truth instances g_1, g_2. Detections: d_1, d_2, d_3, d_4, with labels y_1 = 1, y_2 = 1, y_3 = 2, y_4 = 0. Reward for detection: cross-entropy classification loss and L2 distance localization loss.
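
A minimal sketch of a loss with the two terms named on slide 29, assuming each detection carries a confidence plus a (start, end) location, and that a label y = 0 means "matched to no ground-truth instance". The tuple layout, the `loc_weight` parameter, and the example numbers are assumptions for illustration only.

```python
import math


def detection_loss(detections, ground_truths, labels, loc_weight=1.0):
    """Cross-entropy on confidence plus L2 distance on matched locations.

    detections:    list of (confidence, start, end)
    ground_truths: list of (start, end); g_1 is ground_truths[0]
    labels:        y_n per detection; 0 = unmatched, m = matched to g_m
    """
    total = 0.0
    for (conf, start, end), y in zip(detections, labels):
        target = 1.0 if y > 0 else 0.0
        conf = min(max(conf, 1e-7), 1.0 - 1e-7)      # avoid log(0)
        # Classification term: binary cross-entropy on the confidence.
        total -= target * math.log(conf) + (1.0 - target) * math.log(1.0 - conf)
        if y > 0:
            # Localization term: L2 distance to the matched ground truth.
            g_start, g_end = ground_truths[y - 1]
            total += loc_weight * math.hypot(start - g_start, end - g_end)
    return total


# Hypothetical values laid out like the slide: g_1, g_2 and d_1..d_4.
ground_truths = [(0.10, 0.30), (0.60, 0.80)]
detections = [(0.9, 0.10, 0.30), (0.8, 0.15, 0.30),
              (0.7, 0.60, 0.80), (0.2, 0.40, 0.50)]
labels = [1, 1, 2, 0]
print(detection_loss(detections, ground_truths, labels))
```

Unmatched detections (y = 0) contribute only the classification term, so the model is pushed to give them low confidence without any location target.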

  30. Training the non-differentiable outputs. Training data: two ground-truth instances (t = 0 to t = T). Detections: three instances (t = 0 to t = T).

  31. Training the non-differentiable outputs. Training data: two ground-truth instances (t = 0 to t = T). Detections: d_1, d_2, d_3 (t = 0 to t = T). Model's action sequence a consists of (1) whether to predict a detection (⍉) and (2) where to look next: Frame 1, go to frame 8; Frame 8, go to frame 6; Frame 6, go to frame 15; Frame 15.
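
The emission indicator and the next-glimpse location are discrete choices, so a loss cannot be backpropagated through them directly. The cited CVPR 2016 paper trains them with REINFORCE; the sketch below shows a score-function (REINFORCE) gradient estimate for a single Bernoulli "emit or not" decision. The scalar logit `theta`, the reward function, and the baseline are illustrative assumptions and do not reproduce the paper's reward design.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_gradient(theta, reward_fn, num_samples, rng, baseline=0.0):
    """Monte Carlo estimate of d E[reward] / d theta for a Bernoulli policy.

    The policy emits (action = 1) with probability sigmoid(theta). For this
    policy, d log pi(action) / d theta = action - sigmoid(theta).
    """
    p = sigmoid(theta)
    grad = 0.0
    for _ in range(num_samples):
        action = 1 if rng.random() < p else 0        # sample the discrete choice
        reward = reward_fn(action)
        grad += (reward - baseline) * (action - p)   # score-function term
    return grad / num_samples


# Toy reward: emitting is rewarded, not emitting is not (an assumption).
rng = random.Random(0)
grad = reinforce_gradient(0.0, lambda a: float(a), 1000, rng)
print("estimated gradient:", grad)
```

For this toy reward the exact gradient is p(1 - p) = 0.25 at `theta = 0`, so the estimate should be positive and close to that value: the update raises the probability of the rewarded action.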
