End-to-end Learning of Action Detection from Frame Glimpses in Videos
BIL722 - Advanced Topics in Computer Vision
Ezgi Pekşen Soysal, Hacettepe University

Task: what is the person doing?
Input: video from t = 0 to t = T
Output: action labels over time (e.g., Running, Talking)
Goals: Accuracy, Efficiency, Interpretability
Olga Russakovsky: The human side of computer vision

Efficient video processing
Video from t = 0 to t = T, with actions such as Running and Talking
“Knowing the output or the final state… there is no need to explicitly store many previous states”
“Time may be represented in several ways… The intervals between ‘pulses’ need not be equal.”
[N. I. Badler. “Temporal Scene Analysis…” 1975]
Dominant paradigm: sliding windows from t = 0 to t = T
Used in all THUMOS challenge action detection entries [OneVerSch 2014] … [WanQiaTan 2014] [KarSeiBim 2014] [YuaPeiNiMouKas 2015] …
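
To make the sliding-window paradigm concrete, here is a minimal sketch that enumerates the candidate intervals such a detector would have to score. The window lengths, the half-window stride, and the 100-frame video are illustrative assumptions, not values taken from the slides or from any THUMOS entry.

```python
def sliding_windows(num_frames, window_lengths, stride_fraction=0.5):
    """Yield (start, end) frame intervals, end exclusive, for every window length.

    All parameters are illustrative assumptions; real systems choose their own
    scales and strides.
    """
    for length in window_lengths:
        stride = max(1, int(length * stride_fraction))
        for start in range(0, num_frames - length + 1, stride):
            yield (start, start + length)


# Two temporal scales on a 100-frame video.
windows = list(sliding_windows(num_frames=100, window_lengths=[20, 50]))
print(len(windows), windows[0], windows[-1])
```

Every window is scored regardless of content, so the cost grows with the number of scales and the video length. This is the cost the glimpse-based model tries to avoid.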

Our model for efficient action detection [YeuRusMorFei CVPR’16]
Input: a frame from a video spanning t = 0 to t = T
Frame model:
• Convolutional neural network (frame information)
• Recurrent neural network (time information)
Output at each step:
• Optional output: Detection instance [start, end]
• Output: Next frame to glimpse
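
The observe, optionally emit, then jump loop described on this slide can be sketched as follows. This is only a control-flow illustration: `toy_step` is a hypothetical stand-in for the CNN + RNN, and the normalized [0, 1] location convention, the function names, and the fixed glimpse budget are assumptions rather than details given in the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StepOutput:
    detection: Optional[Tuple[float, float]]  # (start, end), or None if nothing is emitted
    next_location: float                      # assumed normalized position in [0, 1]


def run_glimpse_agent(num_frames, step_fn, max_glimpses, start_location=0.0):
    """Observe one frame per step, optionally emit a detection, then jump."""
    state = None
    location = start_location
    visited, detections = [], []
    for _ in range(max_glimpses):
        frame_index = min(num_frames - 1, max(0, round(location * (num_frames - 1))))
        visited.append(frame_index)
        state, out = step_fn(frame_index, state)
        if out.detection is not None:
            detections.append(out.detection)
        location = out.next_location
    return visited, detections


def toy_step(frame_index, state):
    """Hypothetical model stub: counts glimpses and moves ahead by a quarter of the video."""
    count = 0 if state is None else state
    detection = (0.4, 0.6) if count == 2 else None
    return count + 1, StepOutput(detection, min(1.0, (count + 1) * 0.25))


visited, detections = run_glimpse_agent(num_frames=101, step_fn=toy_step, max_glimpses=4)
print(visited, detections)
```

Because the next location is an output of the model rather than a fixed stride, the agent can skip ahead or move backward, and the number of frames processed is bounded by the glimpse budget instead of the video length.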

Training the detection instance output [YeuRusMorFei CVPR’16]
Training data: a positive video and a negative video (each from t = 0 to t = T), with ground-truth instances g_1 and g_2
Detections: d_1, d_2, d_3, d_4
Ground-truth assignment for each detection: y_1 = 1, y_2 = 1, y_3 = 2, y_4 = 0
Reward for detection:
• classification loss: cross-entropy
• localization loss: L2 distance
Aside:
• effective video annotation [YeuRusJinAndMorFei UnderReview] [LiuRusDenBerFei ImageNetChallenge ’15]
• weakly supervised detection [YeuRamRusMorFei InPreparation]
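
One way to combine the two terms named on this slide is sketched below. It assumes that y_n = 0 means the detection is unmatched, that the classification target is 1 for matched detections and 0 otherwise, and that the localization term applies only to matched detections. The weight `loc_weight`, the tuple layout, and the function name are hypothetical, so treat this as a reading of the slide rather than the authors' exact loss.

```python
import math


def detection_loss(detections, ground_truth, assignments, loc_weight=1.0):
    """Cross-entropy on confidence plus L2 distance on [start, end].

    detections:   list of (confidence, start, end)
    ground_truth: list of (start, end)
    assignments:  y_n per detection; 1-based index into ground_truth, 0 = unmatched (assumed)
    """
    eps = 1e-12  # guards log(0)
    cls_loss, loc_loss = 0.0, 0.0
    for (conf, start, end), y in zip(detections, assignments):
        target = 1.0 if y > 0 else 0.0
        cls_loss -= target * math.log(conf + eps) + (1.0 - target) * math.log(1.0 - conf + eps)
        if y > 0:
            g_start, g_end = ground_truth[y - 1]
            loc_loss += math.hypot(start - g_start, end - g_end)
    return cls_loss + loc_weight * loc_loss


# A confident detection that is 0.5 away from its ground truth, plus an unmatched one.
loss = detection_loss(
    detections=[(0.9, 0.0, 0.0), (0.5, 0.7, 0.9)],
    ground_truth=[(0.3, 0.4)],
    assignments=[1, 0],
)
print(loss)
```

Both terms are differentiable in the confidence and the interval endpoints, which is why this output can be trained with ordinary backpropagation.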

Training the non-differentiable outputs [YeuRusMorFei CVPR’16]
Training data and detections d_1, d_2, d_3 on a video from t = 0 to t = T; the detections are labeled good, bad, bad
Model’s action sequence a:
(1) whether to predict a detection
(2) where to look next
Frames glimpsed: Frame 1, Frame 8, Frame 6, Frame 15 (go to frame 8, go to frame 6, go to frame 15)
Train a policy for actions (1) and (2) using REINFORCE [Williams 1992]
Reward for an action sequence, objective, gradient, and Monte-Carlo approximation: [equations not preserved in the slide text]
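
In its standard form, REINFORCE maximizes the expected reward J(θ) = E[R(a)] over action sequences a sampled from the policy, using the identity ∇J(θ) = E[R(a) ∇ log p_θ(a)] and replacing the expectation with an average over K sampled sequences. The sketch below shows that Monte-Carlo estimate for a single-step softmax policy. The single-step simplification, the optional baseline, and all names are assumptions for illustration; the slide's own notation is not recoverable here.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, sampled_actions, rewards, baseline=0.0):
    """Monte-Carlo estimate of d E[reward] / d logits for a softmax policy.

    For a softmax policy, d log p(a) / d logit_i = 1[i == a] - p_i.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for action, reward in zip(sampled_actions, rewards):
        for i, p in enumerate(probs):
            indicator = 1.0 if i == action else 0.0
            grad[i] += (reward - baseline) * (indicator - p)
    return [g / len(sampled_actions) for g in grad]


# Two samples from a uniform two-action policy: action 0 was rewarded, action 1 was not.
grad = reinforce_gradient([0.0, 0.0], sampled_actions=[0, 1], rewards=[1.0, 0.0])
print(grad)
```

The estimate raises the logit of the rewarded action and lowers the other, without requiring the reward or the sampling step to be differentiable.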

Accuracy, Efficiency, Interpretability [YeuRusMorFei CVPR’16]

✓ Accuracy: Detection AP at IOU 0.5

Dataset              State-of-the-art   Our result
THUMOS 2014          14.4               17.1
ActivityNet sports   33.2               36.7
ActivityNet work     31.1               39.9

✓ Efficiency: glimpse only 2% of video frames

Sampling       Detection AP at IOU 0.5
Uniform        9.3
Our glimpses   17.1

✓ Interpretability: example for Javelin throw showing the ground truth, the detections, and the glimpsed frames
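
For readers unfamiliar with the metric, "AP at IOU 0.5" counts a detection as correct when its temporal overlap with a ground-truth instance is at least 0.5. A minimal sketch of that overlap test follows, assuming the usual intersection-over-union definition on [start, end] intervals. The function names are hypothetical, and the full AP computation (ranking by confidence, handling duplicate detections) is omitted.

```python
def temporal_iou(pred, truth):
    """Intersection over union of two (start, end) intervals."""
    p_start, p_end = pred
    t_start, t_end = truth
    intersection = max(0.0, min(p_end, t_end) - max(p_start, t_start))
    union = (p_end - p_start) + (t_end - t_start) - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(pred, truth, threshold=0.5):
    """A detection counts as correct when its IoU reaches the threshold."""
    return temporal_iou(pred, truth) >= threshold


print(temporal_iou((0, 10), (5, 15)))      # overlap of 5 over a union of 15
print(is_true_positive((0, 10), (2, 10)))  # overlap of 8 over a union of 10
```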
