
Learning to Detect Activity in Untrimmed Video (Prof. Bernard Ghanem)



  1. Learning to Detect Activity in Untrimmed Video Prof. Bernard Ghanem

  2. An image is worth a thousand words; a video is worth a million words. Image: “a tiger attacking a person on a grass field”. Video: “the tiger is being playful”. Source: YouTube

  3. Fun facts about video
     • 55% of people watch videos online every day [1]
     • 45% of people watch more than an hour of Facebook or YouTube videos a week [2]
     • By 2017, online video will account for 74% of all online traffic [3]
     • Almost 50% of internet users look for videos related to a product or service before visiting a store [4]
     • 85% of Facebook video is watched without sound [5]
     Sources: [1] MWP Statistics, 2015; [2] HubSpot, 2016; [3] KPCB, 2016; [4] Google, 2016; [5] DIGIDAY, 2016

  4. Problem: Detecting Human Activities in Video. Input: [video frame sequence]

  5. Problem: Detecting Human Activities in Video. Input: [video frame sequence]. Output: Class: Pole Vault; Bounds: (23.1s, 25.2s)
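
The output on this slide (an activity class plus temporal bounds) can be pictured as a small record. The sketch below is only illustrative: the `Detection` type and its field names are assumptions made here, not something taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected activity instance in an untrimmed video (hypothetical type)."""
    label: str      # activity class, e.g. "Pole Vault"
    start_s: float  # start of the activity, in seconds
    end_s: float    # end of the activity, in seconds

    def __post_init__(self) -> None:
        # Reject bounds that run backwards in time.
        if self.end_s < self.start_s:
            raise ValueError("end_s must not precede start_s")

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


# The example output shown on the slide.
pole_vault = Detection(label="Pole Vault", start_s=23.1, end_s=25.2)
print(pole_vault.label, round(pole_vault.duration_s, 1))
```

A detector for untrimmed video would return a list of such records per video, since one video may contain several activity instances.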

  6. Why Activity Detection?

  7. Bernard Ghanem

  8. Challenges of Detecting Human Activities
     1. Not enough large-scale training data
     2. Large number of activities
     3. Real-time processing is not enough

  9. 1. Not enough large-scale training data
     1st Version (R1.1):
     • ~200 classes
     • ~850 hours
     • class hierarchy
     ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding [CVPR 2015]

  10. 1. Not enough large-scale training data
      At CVPR 2017 (July 26, afternoon): http://activity-net.org/challenges/2017
      Sponsored by: [logos]
      ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding [CVPR 2015]

  11. Classical Activity Detection Pipeline. [Diagram: video frames → Basketball Dunk Classifier, …, Volleyball Spiking Classifier]

  12. Classical Activity Detection Pipeline. [Diagram: video frames → Basketball Dunk Classifier, …, Volleyball Spiking Classifier]

  13. Using proposals is important. [Diagram: video frames → Action Proposal → Basketball Dunk Classifier, Volleyball Spiking Classifier]

  14. What have we done?
      • Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos [CVPR 2016]: proposals are represented as sparse combinations of STIPs (10 FPS on a single CPU core)
      • DAPs: Deep Action Proposals for Action Understanding [ECCV 2016]: multi-scale (sparse) proposals are output by an LSTM in one pass (130 FPS on a single GPU)
      • SST: Single-Stream Temporal Action Proposals [CVPR 2017]: multi-scale (dense) proposals are scored by a GRU in one pass + streaming (300 FPS on a single GPU)

  15. SST: Single Stream Temporal Action Proposals. [Figure: Untrimmed Input Video → Temporal Action Proposals (SST) → Localized Action Detections (classifier). Along the time axis, marked in steps of δ, the input video passes through a visual encoder ϕ and a sequence encoder; at each time step t the output c_t covers k proposals, and the maximum proposal size per output is k · δ.]
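
The figure's labels (k proposals per time step, maximum proposal size k · δ) suggest how candidate segments could be enumerated at each step. The following is a minimal sketch under assumptions that are not stated on the slide: every candidate ends at the current step, candidate sizes are δ, 2δ, …, kδ, and times are measured in frames. The function names are hypothetical, and the confidences would come from the trained model rather than a hand-written list.

```python
from typing import List, Tuple

Proposal = Tuple[int, int, float]  # (start, end, confidence), times in frames


def proposals_at_step(t: int, k: int, delta: int) -> List[Tuple[int, int]]:
    """Candidate segments ending at step t (t = 1, 2, ...), sizes delta..k*delta."""
    end = t * delta
    segments = []
    for j in range(1, k + 1):
        start = end - j * delta
        if start >= 0:  # skip candidates that would begin before the video
            segments.append((start, end))
    return segments


def select_proposals(confidences: List[float], t: int, k: int, delta: int,
                     threshold: float) -> List[Proposal]:
    """Keep the candidates at step t whose confidence reaches the threshold.

    confidences[j - 1] is the score of the candidate of size j * delta.
    """
    if len(confidences) != k:
        raise ValueError("expected one confidence per proposal size")
    kept = []
    for j, score in enumerate(confidences, start=1):
        start = t * delta - j * delta
        if start >= 0 and score >= threshold:
            kept.append((start, t * delta, score))
    return kept


# Toy scores for step t = 3 with k = 4 sizes and delta = 16 frames.
kept = select_proposals([0.1, 0.8, 0.6, 0.9], t=3, k=4, delta=16, threshold=0.5)
print(kept)
```

Because every step only looks backwards, this enumeration can run on a stream: nothing at step t depends on frames after t.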

  16. SS-TAD: Single Stream Temporal Action Detection. [Figure, panels (a), (b), (c): Untrimmed Video Input; Proposals; Frame-level Classifiers; Classifiers; Merging/Smoothing; SS-TAD; Action Detections]
      End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos [BMVC 2017]: multi-scale (dense) detections are scored in one pass + streaming (700 FPS on a TitanX GPU)

  17. SS-TAD: Single Stream Temporal Action Detection. [Video demo. Key: Detection, Ground-truth, plotted over Time. Actions are played at 1x speed; background video is sped up.]

  18. 2. Large number of activities
      • Applying activity detectors to a large number of activity classes is expensive.
      • Can we do better than linear computational growth with the number of activity classes?

  19. Activity-Object and Activity-Scene Relations. References: SCC: Semantic Context Cascade for Efficient Action Detection [CVPR 2017]; DAPs: Deep Action Proposals for Action Understanding [ECCV 2016]

  20. Typical Activity Detection Pipeline. [Diagram: Video Sequence → Action Proposals (Stage 1) → Action Proposals → Action Classifiers (Stage 2); a “Reject” label marks discarded candidates.] References: SCC: Semantic Context Cascade for Efficient Action Detection [CVPR 2017]; DAPs: Deep Action Proposals for Action Understanding [ECCV 2016]
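
The two-stage pipeline on this slide can be sketched as a generic cascade: a cheap first stage rejects most segments, so the per-class classifiers of the second stage run on fewer candidates. This is a toy illustration, not the SCC method itself; `run_cascade`, the `keep` filter, and the toy classifiers are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s)


def run_cascade(
    segments: List[Segment],
    keep: Callable[[Segment], bool],
    classifiers: Dict[str, Callable[[Segment], float]],
    threshold: float,
) -> Tuple[List[Tuple[Segment, str, float]], int]:
    """Return (detections, number of classifier evaluations)."""
    detections = []
    evaluations = 0
    for seg in segments:
        if not keep(seg):  # Stage 1: reject the segment without classifying it
            continue
        for label, classify in classifiers.items():  # Stage 2: one call per class
            evaluations += 1
            score = classify(seg)
            if score >= threshold:
                detections.append((seg, label, score))
    return detections, evaluations


# Toy data: four candidate segments, two hand-written "classifiers".
segments = [(0.0, 10.0), (10.0, 20.0), (20.0, 30.0), (30.0, 40.0)]
classifiers = {
    "basketball dunk": lambda seg: 0.9 if seg == (20.0, 30.0) else 0.1,
    "volleyball spiking": lambda seg: 0.2,
}

with_stage1 = run_cascade(segments, lambda seg: seg[0] >= 20.0, classifiers, 0.5)
without_stage1 = run_cascade(segments, lambda seg: True, classifiers, 0.5)
print(with_stage1[1], without_stage1[1])
```

In this toy run both variants find the same detection, but the filtered one makes half as many classifier calls. The second stage still costs one call per class per surviving segment, which is the linear growth the talk asks about.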

  21. SCC: Semantic Context Cascade. Reference: SCC: Semantic Context Cascade for Efficient Action Detection [CVPR 2017]

  22. SCC: Semantic Context Cascade. Reference: SCC: Semantic Context Cascade for Efficient Action Detection [CVPR 2017]

  23. SCC: Semantic Context Cascade. Reference: SCC: Semantic Context Cascade for Efficient Action Detection [CVPR 2017]

  24. 3. Real-time processing is not enough
      • In the past, real-time processing was a “good-to-have”, i.e. 1 min video → 1 min processing
      • But not anymore!
      • We need to stay ahead of the increasing video upload rate. How?
        - hardware acceleration (GPUs)
        - more efficient implementation
        - do we need to visit every frame?
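
The “1 min video → 1 min processing” baseline can be turned into simple arithmetic that relates a method's throughput in frames per second to how far ahead of real time it runs. The sketch below assumes a video frame rate of 30 FPS, which is an illustrative value not given in the talk, and assumes every frame is processed; the function names are hypothetical.

```python
def realtime_factor(processing_fps: float, video_fps: float = 30.0) -> float:
    """How many seconds of video are processed per second of compute."""
    if processing_fps <= 0 or video_fps <= 0:
        raise ValueError("frame rates must be positive")
    return processing_fps / video_fps


def processing_seconds(video_seconds: float, processing_fps: float,
                       video_fps: float = 30.0) -> float:
    """Compute time needed to process every frame of a video."""
    return video_seconds / realtime_factor(processing_fps, video_fps)


# Real time: processing exactly as fast as the video plays.
print(processing_seconds(60.0, processing_fps=30.0))
# A method running ten times faster than the assumed frame rate.
print(processing_seconds(60.0, processing_fps=300.0))
```

Skipping frames, the third option on the slide, changes the picture: the compute time then scales with the number of frames visited rather than with the video length.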

  25. Do we have to visit every frame?
      • Log how a human annotator moves the time slider instead of throwing that information away
      • Can we learn from how humans move the slider to localize activities?
      [Figure: Search History]
      Action Search: Learning to Search for Human Activities in Untrimmed Videos [arXiv 2017][To be submitted to CVPR2018]

  26. [Figure; axis label: 𝑢] Action Search: Learning to Search for Human Activities in Untrimmed Videos [arXiv 2017][To be submitted to CVPR2018]

  27. [Figure: Action Search model, unrolled over search steps …, 𝑗−3, 𝑗−2, 𝑗−1, 𝑗, 𝑗+1, … along the axis 𝑢, with the Target Activity marked. Components: 3D ConvNet, LSTM. Legend: 𝒀: Visual Observation; 𝒘: Feature Vector; 𝒊: LSTM State; 𝑔(𝒀): Temporal Location.] Action Search: Learning to Search for Human Activities in Untrimmed Videos [arXiv 2017][To be submitted to CVPR2018]
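
The idea of searching a video by jumping between temporal locations, rather than visiting every frame, can be sketched as a loop around a policy that proposes the next location. Everything below is an assumption made for illustration: in the talk the policy is a learned model (a 3D ConvNet feeding an LSTM), whereas here a hand-written fixed-stride policy stands in for it, and `action_search`, `contains_target`, and `fixed_stride_policy` are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

# A policy maps (state, current location) to (new state, next location).
Policy = Callable[[object, int], Tuple[object, int]]


def action_search(
    contains_target: Callable[[int], bool],
    policy: Policy,
    start: int,
    num_frames: int,
    max_steps: int,
) -> Tuple[Optional[int], List[int]]:
    """Visit frames chosen by `policy` until the target activity is observed."""
    state: object = None
    location = start
    visited: List[int] = []
    for _ in range(max_steps):
        location = min(max(location, 0), num_frames - 1)  # stay inside the video
        visited.append(location)
        if contains_target(location):
            return location, visited
        state, location = policy(state, location)
    return None, visited  # search budget exhausted


def fixed_stride_policy(state: object, location: int) -> Tuple[object, int]:
    """Toy stand-in for a learned policy: always jump 90 frames forward."""
    return state, location + 90


# Toy video of 1000 frames with the target activity in frames 700..740.
found, visited = action_search(
    lambda frame: 700 <= frame <= 740, fixed_stride_policy,
    start=0, num_frames=1000, max_steps=20,
)
print(found, len(visited))
```

The toy run reaches the activity after observing 9 of the 1000 frames. A learned policy would replace the fixed stride with jumps conditioned on what has been observed so far, which is what the carried `state` is for.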

  28. Action Search or Action Spotting. Examples: Activity: “shot put”; Activity: “basketball dunk”; Activity: “shot put”. Action Search: Learning to Search for Human Activities in Untrimmed Videos [arXiv 2017][To be submitted to CVPR2018]

  29. Sponsors

  30. Prof. Bernard Ghanem, bernard.ghanem@kaust.edu.sa, ivul.kaust.edu.sa. Example activities: baseball throw, dunk, shoveling, washing dishes, pole vault, dancing
