Learning to Detect Activity in Untrimmed Video Prof. Bernard Ghanem

An image is worth a thousand words. A video is worth a million words.
Image: “a tiger attacking a person on a grass field”
Video: “the tiger is being playful”
Source: YouTube

Fun facts about video
• 55% of people watch videos online every day (1)
• 45% of people watch more than an hour of Facebook or YouTube videos a week (2)
• By 2017, online video will account for 74% of all online traffic (3)
• Almost 50% of internet users look for videos related to a product or service before visiting a store (4)
• 85% of Facebook video is watched without sound (5)
Sources: 1) MWP Statistics, 2015; 2) HubSpot, 2016; 3) KPCB, 2016; 4) Google, 2016; 5) DIGIDAY, 2016

Problem: Detecting Human Activities in Video
Input: video (frame sequence)
Output: Class: Pole Vault; Bounds: (23.1s, 25.2s)
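The output described above is a class label plus a temporal segment. A minimal Python sketch of that representation follows. The `Detection` type and the `temporal_iou` helper are illustrative names that do not come from the slides. Temporal intersection-over-union is included as one common way to compare a predicted segment with a ground-truth one, assuming bounds are given in seconds. The ground-truth segment below is a made-up value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected activity: a class label plus start/end bounds in seconds."""
    label: str
    start: float
    end: float


def temporal_iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two temporal segments (0.0 if disjoint)."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


# The example from the slide: a pole vault between 23.1 s and 25.2 s.
predicted = Detection("Pole Vault", 23.1, 25.2)
# Hypothetical ground-truth segment, invented for illustration only.
ground_truth = Detection("Pole Vault", 23.0, 25.0)
overlap = temporal_iou(predicted, ground_truth)
```

A detection would then typically be counted as correct when its label matches and `overlap` exceeds a chosen threshold; the threshold itself is a design choice not specified here.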

Why Activity Detection?

Challenges of Detecting Human Activities
1. Not enough large-scale training data
2. Large number of activities
3. Real-time processing is not enough

1. Not enough large-scale training data
1st Version (R1.1):
• ~200 classes
• ~850 hours
• class hierarchy
ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding [CVPR 2015]

1. Not enough large-scale training data
At CVPR 2017 (July 26, afternoon): http://activity-net.org/challenges/2017
Sponsored by: [sponsor logos]
ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding [CVPR 2015]

Classical Activity Detection Pipeline
[Figure: the video is fed to one classifier per activity: Basketball Dunk Classifier, ..., Volleyball Spiking Classifier]

Using proposals is important
[Figure: an Action Proposal stage sits between the video and the classifiers (Basketball Dunk Classifier, Volleyball Spiking Classifier)]
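One way to see why proposals matter is to count classifier evaluations. The sketch below assumes, hypothetically, that a pipeline without proposals scores every candidate window with every classifier, while a proposal-based pipeline scores only the segments the proposal stage keeps. All numbers (candidate windows, proposals kept, number of classes) are invented for illustration, and `classifier_calls` is an illustrative helper name.

```python
def classifier_calls(num_segments: int, num_classes: int) -> int:
    """Each segment is scored once by every activity classifier."""
    return num_segments * num_classes


# Hypothetical numbers, chosen only to illustrate the scaling.
num_classes = 200           # assumed number of activity classes
candidate_windows = 10_000  # windows an exhaustive scan would visit
proposals_kept = 100        # segments a proposal stage passes on

exhaustive = classifier_calls(candidate_windows, num_classes)
with_proposals = classifier_calls(proposals_kept, num_classes)
reduction = exhaustive / with_proposals
```

Under these assumed numbers the saving equals the ratio of candidate windows to kept proposals, which is why a fast proposal stage with high recall is worth its own cost.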

What have we done?
• Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos [CVPR 2016]: proposals are represented as sparse combinations of STIPs (10 FPS on a single CPU core)
• DAPs: Deep Action Proposals for Action Understanding [ECCV 2016]: multi-scale (sparse) proposals are output by an LSTM in one pass (130 FPS on a single GPU)
• SST: Single-Stream Temporal Action Proposals [CVPR 2017]: multi-scale (dense) proposals are scored by a GRU in one pass + streaming (300 FPS on a single GPU)

SST: Single Stream Temporal Action Proposals
[Figure: Untrimmed Input Video → Temporal Action Proposals → (classifier) → Localized Action Detections. Inside SST: the input video is divided into steps of length δ along the Time axis; a Visual Encoder ϕ is applied at each step, followed by a Seq. Encoder; at time step t the output c_t covers k proposals, with maximum proposal size k · δ per output.]
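The "k proposals per time step, maximum size k · δ" labelling can be illustrated with a short sketch. It assumes that the k candidates at step t all end at the current step and span δ, 2δ, ..., k · δ frames. That reading is consistent with the figure labels but is not spelled out on the slide. The function names, the threshold, and the toy scores are hypothetical.

```python
from typing import List, Tuple


def proposals_at_step(t: int, k: int, delta: int) -> List[Tuple[int, int]]:
    """Candidate (start, end) frame intervals scored at time step t.

    Assumption: every candidate ends at the current step and candidate j
    spans j * delta frames, so the longest one spans k * delta frames.
    Candidates that would start before frame 0 are dropped.
    """
    end = t * delta
    candidates = [(end - j * delta, end) for j in range(1, k + 1)]
    return [(start, stop) for start, stop in candidates if start >= 0]


def pick_proposals(scores: List[float], candidates: List[Tuple[int, int]],
                   threshold: float) -> List[Tuple[int, int]]:
    """Keep candidates whose confidence (one score each) passes the threshold."""
    return [c for s, c in zip(scores, candidates) if s >= threshold]


# Toy values: step 4, k = 3 scales, delta = 16 frames per step.
candidates = proposals_at_step(t=4, k=3, delta=16)
# Made-up confidences standing in for the model output at this step.
kept = pick_proposals([0.2, 0.9, 0.6], candidates, threshold=0.5)
```

Because every step emits scores for all k scales at once, a single forward pass over the video covers all candidate lengths, which fits the "one pass" description in the talk.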

SS-TAD: Single Stream Temporal Action Detection
[Figure, panels (a), (b), (c); labels: Untrimmed Video Input, Proposals, Classifiers, Frame-level Classifiers, Merging/Smoothing, SS-TAD, Action Detections]
End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos [BMVC 2017]: multi-scale (dense) detections are scored in one pass + streaming (700 FPS on a TitanX GPU)

SS-TAD: Single Stream Temporal Action Detection
[Video demo. Key: Detection, Ground-truth. Axis: Time. Actions are played at 1x speed; background video is sped up.]

2. Large number of activities
• Applying activity detectors to a large number of activity classes is expensive.
• Can we do better than linear computational growth in the number of activity classes?

Activity-Object and Activity-Scene Relations
SCC: Semantic Context Cascade for Efficient Action Detection [CVPR 2017]
DAPs: Deep Action Proposals for Action Understanding [ECCV 2016]

Typical Activity Detection Pipeline
Video Sequence → Action Proposals (Stage 1) → Action Classifiers (Stage 2), with a Reject branch
SCC: Semantic Context Cascade for Efficient Action Detection [CVPR 2017]
DAPs: Deep Action Proposals for Action Understanding [ECCV 2016]
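The two-stage pipeline with a reject branch can be sketched as a cascade in which a cheap first stage discards segments before the more expensive classifiers run. The scoring functions, the threshold, and the segment data below are hypothetical stand-ins. The sketch does not reproduce the SCC method; it shows only the generic reject-early control flow suggested by the diagram.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def cascade(segments: List[Segment],
            proposal_score: Callable[[Segment], float],
            classifiers: Dict[str, Callable[[Segment], float]],
            reject_below: float) -> List[Tuple[Segment, str, float]]:
    """Stage 1 rejects low-scoring segments; stage 2 classifies the rest."""
    detections = []
    for seg in segments:
        if proposal_score(seg) < reject_below:
            continue  # rejected: no classifier is run on this segment
        # Run every classifier on the surviving segment and keep the best label.
        label, score = max(((name, clf(seg)) for name, clf in classifiers.items()),
                           key=lambda pair: pair[1])
        detections.append((seg, label, score))
    return detections


# Toy stand-ins: stage 1 simply scores a segment by its duration.
toy_segments = [(0.0, 0.5), (10.0, 14.0), (23.1, 25.2)]
toy_classifiers = {
    "Basketball Dunk": lambda seg: 0.9 if seg[0] < 20 else 0.1,
    "Pole Vault": lambda seg: 0.8 if seg[0] >= 20 else 0.2,
}
results = cascade(toy_segments, lambda seg: seg[1] - seg[0], toy_classifiers,
                  reject_below=1.0)
```

In this sketch stage 2 still runs every classifier on each surviving segment, so cost remains linear in the number of classes; reducing that factor would need an additional mechanism that the sketch does not model.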

SCC: Semantic Context Cascade
SCC: Semantic Context Cascade for Efficient Action Detection [CVPR 2017]

3. Real-time processing is not enough
• In the past, real-time processing was a “good-to-have”, i.e. 1 min of video → 1 min of processing
• But not anymore!
• We need to stay ahead of the increasing video upload rate. How?
  • hardware acceleration (GPUs)
  • more efficient implementation
  • do we need to visit every frame?
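The real-time criterion above (1 min of video processed in 1 min) can be made concrete with a little arithmetic. The sketch below assumes a 30 FPS source video, which the slides do not state, and uses the throughputs quoted in the talk for each method. `processing_seconds` is an illustrative helper name.

```python
def processing_seconds(video_seconds: float, video_fps: float,
                       throughput_fps: float) -> float:
    """Time needed to process a video at a given frames-per-second throughput."""
    total_frames = video_seconds * video_fps
    return total_frames / throughput_fps


VIDEO_FPS = 30.0  # assumed source frame rate; not given in the slides

# Throughputs quoted in the talk, in processed frames per second.
for method, fps in [("Fast Temporal Activity Proposals", 10), ("DAPs", 130),
                    ("SST", 300), ("SS-TAD", 700)]:
    seconds = processing_seconds(60.0, VIDEO_FPS, fps)
    print(f"{method}: {seconds:.1f} s per minute of video "
          f"({60.0 / seconds:.1f}x real time)")
```

A method is faster than real time exactly when its throughput exceeds the source frame rate, so the conclusion for the slowest method depends on the assumed 30 FPS.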

Do we have to visit every frame?
• Log how the human annotator moves the time slider instead of throwing that history away
• Can we learn from how humans move the slider to localize activities?
[Figure: Search History]
Action Search: Learning to Search for Human Activities in Untrimmed Videos [arXiv 2017] [To be submitted to CVPR 2018]

[Figure: at each step j, a 3D ConvNet maps the visual observation Y_j to a feature vector w_j, which is fed to an LSTM with state i_j. The observations Y_{j-3}, ..., Y_{j+1} are taken at temporal locations g(Y_{j-3}), ..., g(Y_{j+1}) on the time axis u; the Target Activity is marked on the same axis.]
Legend: Y: Visual Observation; w: Feature Vector; i: LSTM State; g(Y): Temporal Location
Action Search: Learning to Search for Human Activities in Untrimmed Videos [arXiv 2017] [To be submitted to CVPR 2018]
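The idea of jumping between temporal locations instead of scanning every frame can be sketched as a loop in which a model looks at one location and proposes the next. In the figure that role is played by a 3D ConvNet feeding an LSTM. The sketch below replaces it with a hand-written toy policy (`toy_policy`, a hypothetical stand-in that moves halfway toward a known target) purely so that the loop is runnable. The function names, the stopping rule, and the numbers are all assumptions.

```python
from typing import Callable, List


def action_search(start: float, next_location: Callable[[float], float],
                  is_target: Callable[[float], bool], max_steps: int) -> List[float]:
    """Visit temporal locations one at a time until the target is hit.

    Returns the search history (every location visited, in order).
    """
    history = [start]
    current = start
    for _ in range(max_steps):
        if is_target(current):
            break
        current = next_location(current)  # the model's proposed next location
        history.append(current)
    return history


# Toy setup: the activity occupies 23.1 s to 25.2 s of the video.
TARGET_START, TARGET_END = 23.1, 25.2


def toy_policy(t: float) -> float:
    """Stand-in for the learned model: jump halfway toward the target centre."""
    centre = (TARGET_START + TARGET_END) / 2
    return t + (centre - t) / 2


history = action_search(
    start=0.0,
    next_location=toy_policy,
    is_target=lambda t: TARGET_START <= t <= TARGET_END,
    max_steps=20,
)
```

The length of `history` is the number of locations observed, which is the quantity such a search aims to keep far below the total number of frames. A learned model would not have access to the target and would have to infer the next location from what it observes.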

Action Search or Action Spotting
[Video examples. Activity: “shot put”; Activity: “basketball dunk”; Activity: “shot put”]
Action Search: Learning to Search for Human Activities in Untrimmed Videos [arXiv 2017] [To be submitted to CVPR 2018]

SPONSORS

Prof. Bernard Ghanem
bernard.ghanem@kaust.edu.sa
ivul.kaust.edu.sa
[Example activities shown: baseball throw, dunk, shoveling, washing dishes, pole vault, dancing]
