UCF Yogesh S Rawat, Aayush Rana, Praveen Tirupattur, and Mubarak Shah Center for Research in Computer Vision University of Central Florida
Contents
• Activity Detection in Untrimmed Videos
  • AD Task
• Activity Object Detection in Untrimmed Videos
  • AOD Task
Activity Detection (AD) in Untrimmed Videos
Action Analysis in Video
• Given: untrimmed videos
  • containing multiple actors and actions
  • multiple action labels per actor
  • varying action lengths
  • unbalanced dataset (few samples for some classes)
• We want to:
  • localize all actions
  • classify each action
Key Points
• Bottom-up foreground/background segmentation
• Detect action tubes from long untrimmed videos
• Classify each tube instance individually
• Activity tube generation
Overview of Architecture
[Figure: pipeline diagram. Untrimmed input video (8 x 448 x 800 x 3 clips) -> foreground/background segmentation network -> tube extraction (8 x 112 x 112 x 3 tubes) -> action classifier -> stitched long tubes]
• Divide the video into smaller clips
• Send one clip at a time as input
• Perform foreground segmentation
• Find connected components
• Classify each tube (resized to 112 x 112)
• Stitch classified tubes into individual actions
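The first pipeline step, dividing the untrimmed video into fixed-length clips, can be sketched as below. `split_into_clips` is a hypothetical helper, not from the slides; the 8-frame clip length and 448 x 800 input resolution are taken from the diagram, and dropping the last partial clip is a simplifying assumption.

```python
import numpy as np

def split_into_clips(video, clip_len=8):
    """Split a (T, H, W, C) video array into non-overlapping clips
    of clip_len frames; a trailing partial clip is dropped."""
    n_clips = video.shape[0] // clip_len
    return [video[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]

# A 20-frame dummy video at the slide's 448 x 800 resolution.
video = np.zeros((20, 448, 800, 3), dtype=np.uint8)
clips = split_into_clips(video)
print(len(clips), clips[0].shape)  # 2 clips of shape (8, 448, 800, 3)
```

Each clip is then fed to the segmentation network one at a time, which keeps memory bounded regardless of video length.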
Foreground/Background Segmentation Network
[Figure: encoder-decoder network. Encoder blocks feed decoder blocks through skip connections, producing the localization output]
Tube Extraction
[Figure: the input video (448 x 800 x 3) is multiplied by the localization mask (448 x 800 x 1) to segment the foreground; connected components yield output tubes (112 x 112 x 3)]
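The connected-components step that turns a binary localization mask into separate tube candidates can be sketched with a plain BFS flood fill. This is a generic 4-connected labeller for illustration, not the implementation from the slides, which may use a library routine or 3D (spatio-temporal) connectivity.

```python
from collections import deque

def connected_components(mask):
    """Label 4-connected foreground regions in a binary mask (list of
    lists of 0/1). Returns {component_id: [(row, col), ...]}."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    comps, next_id = {}, 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # BFS from this unvisited foreground pixel.
                q, pixels = deque([(r, c)]), []
                seen[r][c] = True
                while q:
                    y, x = q.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                comps[next_id] = pixels
                next_id += 1
    return comps

mask = [[1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 1]]
comps = connected_components(mask)
print(len(comps))  # 2 separate foreground regions
```

Each labelled region would then be cropped from the original frame and resized to 112 x 112 before classification.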
Action Classification
[Figure: output tubes (8 x 112 x 112 x 3) pass through a classification block (ResNet 3D) to extracted features and per-class scores, e.g. Transport HeavyCarry: 0.69, Walking: 0.81, Vehicle moving: 0.86, Standing: 0.73, Talking: 0.65, Interacts: 0.77]
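Since the slide shows several action scores per tube (e.g. Walking: 0.81 and Talking: 0.65 together), the output is multi-label, which suggests an independent sigmoid score per class rather than a softmax. A minimal sketch of that decision rule follows; the logit values, class names, and 0.5 threshold are assumptions for illustration, not from the slides.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, classes, threshold=0.5):
    """Multi-label prediction: an independent sigmoid per class,
    keeping every class whose score clears the threshold."""
    scores = {c: sigmoid(z) for c, z in zip(classes, logits)}
    return {c: round(s, 2) for c, s in scores.items() if s >= threshold}

classes = ["walking", "talking", "carrying"]
preds = predict_labels([1.4, 0.6, -0.8], classes)
print(preds)  # → {'walking': 0.8, 'talking': 0.65}
```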
Tube Stitching
[Figure: per-clip tubes 1 ... T are stitched into a long tube; example labels: Vehicle Stopping, Vehicle Turning Left, Vehicle Turning Right]
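The slides do not specify the stitching rule, but a common choice is to greedily link a track's last box to the spatially overlapping box in the next clip. The sketch below uses IoU with an assumed 0.3 threshold; the linking criterion is a plausible reading, not the authors' stated method.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def stitch(tubes_per_clip, min_iou=0.3):
    """Greedily link per-clip tube boxes into long tubes: extend each
    track with the best-overlapping box in the next clip, and start a
    new track for any box left unmatched."""
    tracks = [[box] for box in tubes_per_clip[0]]
    for boxes in tubes_per_clip[1:]:
        unmatched = list(boxes)
        for track in tracks:
            best = max(unmatched, key=lambda b: iou(track[-1], b), default=None)
            if best is not None and iou(track[-1], best) >= min_iou:
                track.append(best)
                unmatched.remove(best)
        tracks.extend([b] for b in unmatched)
    return tracks

clips = [[(0, 0, 10, 10), (50, 50, 60, 60)],
         [(1, 1, 11, 11), (80, 80, 90, 90)]]
tracks = stitch(clips)
print(len(tracks))  # 3 tracks: one linked pair, two singletons
```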
Final Output (Example-1)
Final Output (Example-2)
Final Output (Example-3)
NIST Evaluation on Validation Set

Metric                  Value
Mean-p_miss @ 0.01 rfa  0.9066
Mean-p_miss @ 0.03 rfa  0.8478
Mean-p_miss @ 0.1 rfa   0.6973
Mean-p_miss @ 0.15 rfa  0.6608
Mean-p_miss @ 0.2 rfa   0.6279
Mean-p_miss @ 1 rfa     0.4633
N-mide                  0.2045
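The p_miss@rfa metric reports the probability of missing a ground-truth activity at an operating point where the system's false-alarm rate (false alarms per minute of video) stays within a target. The sketch below is a simplified reading of that idea, not NIST's official ActEV scorer (which also matches detections to activity instances temporally); the detection list, counts, and duration are illustrative assumptions.

```python
def p_miss_at_rfa(detections, n_ground_truth, duration_minutes, target_rfa):
    """detections: list of (score, is_true_positive). Sweep the score
    threshold from high to low and report the miss probability at the
    lowest threshold whose false-alarm rate still fits target_rfa."""
    dets = sorted(detections, key=lambda d: -d[0])
    tp = fp = 0
    best_p_miss = 1.0
    for score, is_tp in dets:
        if is_tp:
            tp += 1
        else:
            fp += 1
        if fp / duration_minutes <= target_rfa:
            best_p_miss = 1 - tp / n_ground_truth
    return best_p_miss

# 3 ground-truth activities in 10 minutes; allow 0.1 false alarms/min.
dets = [(0.9, True), (0.8, False), (0.7, True), (0.4, False)]
pm = p_miss_at_rfa(dets, n_ground_truth=3, duration_minutes=10, target_rfa=0.1)
print(round(pm, 3))  # 0.333: two of three activities found within budget
```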
Issues
• Imbalanced dataset: extremely few samples for some classes
• Similar activities confused by the classifier
• Activities far from the camera: very small, hard to localize
Contents
• Activity Detection in Untrimmed Videos
  • AD Task
• Activity Object Detection in Untrimmed Videos
  • AOD Task
Activity Detection based on Actor-Object Interaction
Actor-Object Interaction in Videos
• Given an untrimmed video, localize
  • all actors present
  • all objects interacted with
• Classify activities based on the actor-object interaction
Challenges
• Multiple actor-object instances in a single clip
• Multiple actors and objects
• Same actor-object combination in multiple classes (e.g., opening door vs. closing door)
• Same actor-object instance with multiple labels (e.g., exiting and closing door)
Approaches
• Region proposals
  • Based on bounding-box proposals: T-CNN [1], Mask R-CNN [2]
• Bottom-up approach
  • Regression over the full space
• Encoder-decoder
  • Unified semantic segmentation: ST-CNN [3], SegNet [4]
  • Issue with multiple activity instances
  • Needs connected components and post-processing

[1] Hou et al. "Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos." In IEEE International Conference on Computer Vision (ICCV), 2017.
[2] He et al. "Mask R-CNN." In IEEE International Conference on Computer Vision (ICCV), pp. 2980-2988, 2017.
[3] Hou et al. "An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos." arXiv preprint arXiv:1712.01111, 2017.
[4] Badrinarayanan et al. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." arXiv preprint arXiv:1511.00561, 2015.
Motivation
• End-to-end training framework
  • Completely remove region proposals and ToI/RoI pooling
  • Use actor-object attention instead
• Multiple tasks: foreground/background, objects, actions
• Model convergence using multiple losses
• Joint actor-object action classification
Action Classification in Videos
[Figure: two examples. One clip with object "vehicle" and action "vehicle turning left"; another with object "person" and actions "activity talking" (red) and "activity carrying" (green)]
Overview of Proposed Architecture
[Figure: input video -> video encoder (C3D or I3D backbone) -> decoder with skip connections -> foreground/background segmentation and object classification outputs]
• Take an 8-frame video clip
• Generate a foreground/background segmentation mask
• Generate an object segmentation mask for each object type
• Use the fg/bg segmentation for feature attention
• Classify the action using actor-object information
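The "feature attention" step above amounts to weighting the encoder's feature map by the predicted foreground mask so background responses are suppressed. A minimal numpy sketch of that broadcast multiply follows; the (C, H, W) layout and the tiny toy tensors are assumptions for illustration.

```python
import numpy as np

def mask_attention(features, mask):
    """Apply a foreground/background mask as spatial attention:
    broadcast-multiply (C, H, W) features by an (H, W) mask in [0, 1]."""
    return features * mask[None, :, :]

feat = np.ones((4, 2, 2))          # 4 channels over a 2 x 2 grid
mask = np.array([[1.0, 0.0],
                 [0.5, 1.0]])      # soft foreground confidence
out = mask_attention(feat, mask)
print(out[0])  # background positions are zeroed, soft ones attenuated
```

In the real network the mask would first be resized to the feature map's spatial resolution; that step is omitted here.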
End-to-end Network for Video Action Segmentation
[Figure: encoder block -> decoder block with skip connections and an object classification branch]
• Encode video features (Conv3D)
• Decode features (Deconv3D) with skip connections
• Segment foreground/background
• Segment each object class
Quantitative Results
• DIVA data subset
  • Smaller clips focused on the activity (128 x 192 resolution)
  • 64 training videos, 55 validation videos
  • 19 action classes (DIVA 1B set)
  • 2 object classes (person and vehicle)
• Action-object localization IoU: 0.64
• Classification F1 score (19 classes): 0.46
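The localization IoU reported above is the standard intersection-over-union between predicted and ground-truth segmentation masks. A short sketch of that metric on binary masks, with toy inputs for illustration:

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two binary masks: |pred AND gt| / |pred OR gt|.
    Returns 1.0 when both masks are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

a = np.array([[1, 1],
              [0, 0]])
b = np.array([[1, 0],
              [1, 0]])
score = mask_iou(a, b)
print(round(score, 3))  # 1 overlapping pixel / 3 covered pixels ≈ 0.333
```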
Qualitative Results
[Figure: input, foreground/background segmentation (only moving objects, 3 people), object segmentation, and action classification: talking (red), carrying (green)]
Qualitative Results
[Figure: input, foreground/background segmentation (only moving objects), object segmentation (vehicle: green, person: red), and action classification: vehicle turning left]
NIST Evaluation on Validation Set: Activity Detection

Metric                  Value
mean-p_miss @ 0.01 rfa  0.954337382386
mean-p_miss @ 0.03 rfa  0.925133046316
mean-p_miss @ 0.1 rfa   0.784522064048
mean-p_miss @ 0.15 rfa  0.757087143515
mean-p_miss @ 0.2 rfa   0.739966420528
mean-p_miss @ 1 rfa     0.605960537865
NIST Evaluation on Validation Set: Object Detection

Metric                               Value
mean-mean-object-p_miss @ 0.033 rfa  0.7397920634
mean-mean-object-p_miss @ 0.1 rfa    0.673425676293
mean-mean-object-p_miss @ 0.2 rfa    0.624957826044
mean-mean-object-p_miss @ 0.5 rfa    0.538296977439
Thank you!