Localization using Faster R-CNN and Multi-Frame Fusion

Ryosuke Yamamoto, Nakamasa Inoue, Koichi Shinoda
Tokyo Institute of Technology

Outline
・ Motivation: detect an action concept, “SittingDown”
・ Our method: Faster R-CNN + LSTM + Re-scoring
・ Annotation: frame-wise annotation for SittingDown, key-frame annotation for other concepts
・ Results: 2nd among 3 teams, best result for SittingDown
[Chart: F-score (axis 0 to 0.5); series: iframe_fscore, mean_pixel_fscore]

Motivation
・ The localization task focuses not only on static objects but also on action concepts
・ We focus on SittingDown, one of the action concepts
・ How to distinguish between Sitting and SittingDown?
  → Dynamic information is important for precise detection
[Figure: example frames labeled Sitting and SittingDown]

Our Method
・ Faster R-CNN (Ren 2015)
  - Efficient object localization
・ LSTM (Donahue 2015)
  - Precise action localization
  - Applied to SittingDown
・ Re-scoring (Yamamoto 2015)
  - Multi-Frame Score Fusion
  - Multi-Shot Score Boosting
[Diagram labels: Faster R-CNN, Fusion, LSTM, Boost, Prediction, repeated along a time sequence]

Faster R-CNN (Ren 2015)
Efficient end-to-end object localization
1. Generate region proposals by a network
2. Predict scores for each region by using CNN features
Example CNNs:
- ZF Net (Zeiler 2014) ← we use
- VGG-16 (Simonyan 2014)
- GoogLeNet (Szegedy 2015)
- ResNet (He 2016)
[Diagram labels: CNN, region proposals, ROI pooling, DNN]
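Step 2 relies on ROI pooling, which turns each proposed region of the CNN feature map into a fixed-size grid before scoring. The following is a minimal sketch on a single-channel feature map in pure Python. The function name `roi_max_pool`, the 2 x 2 output size, and the toy feature map are illustrative assumptions, not details taken from the slides.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of a 2-D feature map into an out_h x out_w grid.

    roi = (x1, y1, x2, y2) in feature-map cells, with x2 and y2 exclusive.
    The region is assumed to span at least out_h x out_w cells.
    """
    x1, y1, x2, y2 = roi
    height, width = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Bin edges: floor for the start, ceil for the end, so bins may overlap.
        r0 = y1 + (i * height) // out_h
        r1 = y1 + ceil_div((i + 1) * height, out_h)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * width) // out_w
            c1 = x1 + ceil_div((j + 1) * width, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy 4 x 4 feature map whose values grow from top-left to bottom-right.
feature_map = [[4 * r + c for c in range(4)] for r in range(4)]
whole = roi_max_pool(feature_map, (0, 0, 4, 4))
corner = roi_max_pool(feature_map, (1, 1, 4, 4))
```

Whatever the size of the proposal, the output grid has the same shape, which is what lets one scoring network handle every region.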

Long Short-Term Memory (LSTM)
An LSTM layer is introduced to Faster R-CNN
- Memorizes long- and short-term information
- Applied only to SittingDown
[Diagram labels: Faster R-CNN, LSTM, Prediction, repeated along a time sequence]
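To make the recurrence concrete, here is a minimal sketch of a standard LSTM cell stepped over the per-frame features of one region. It is the textbook formulation, not necessarily the exact layer used in the system. The toy sizes (8 input features, 4 hidden units), the random weights, and the names `lstm_step`, `W`, and `b` are assumptions for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, W, b):
    """One LSTM step; W maps the concatenation [x, h] to the four gates."""
    n = len(h)
    z = x + h  # list concatenation of input and previous hidden state
    pre = [sum(w * v for w, v in zip(row, z)) + bias
           for row, bias in zip(W, b)]
    i = [sigmoid(p) for p in pre[0:n]]              # input gate
    f = [sigmoid(p) for p in pre[n:2 * n]]          # forget gate
    o = [sigmoid(p) for p in pre[2 * n:3 * n]]      # output gate
    g = [math.tanh(p) for p in pre[3 * n:4 * n]]    # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


random.seed(0)
feat_dim, hidden = 8, 4  # toy sizes, far smaller than a real CNN feature
W = [[random.uniform(-0.1, 0.1) for _ in range(feat_dim + hidden)]
     for _ in range(4 * hidden)]
b = [0.0] * (4 * hidden)

# Five frames of made-up features; the state carries over from frame to frame.
frames = [[random.random() for _ in range(feat_dim)] for _ in range(5)]
h, c = [0.0] * hidden, [0.0] * hidden
states = []
for x in frames:
    h, c = lstm_step(x, h, c, W, b)
    states.append(h)
```

The cell state `c` is what lets a prediction at one frame depend on earlier frames, which is the dynamic information needed to separate Sitting from SittingDown.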

Multi-Frame and Multi-Shot (Yamamoto 2015)
Multi-Frame Score Fusion
・ Average pooling of scores over 5 frames in a shot
Multi-Shot Score Boosting
・ Add adjacent shot scores
[Diagram labels: Average; Key-frame (I-frame)]
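The two re-scoring steps can be sketched in a few lines. This is a simplified illustration that assumes one score per frame for a given concept; how regions are matched across frames is not described on the slide. The `weight` parameter and the example scores are hypothetical.

```python
from statistics import mean


def fuse_frames(frame_scores):
    """Multi-frame score fusion: average the per-frame scores of one shot."""
    return mean(frame_scores)


def boost_shots(shot_scores, weight=1.0):
    """Multi-shot score boosting: add the scores of the adjacent shots.

    `weight` is a hypothetical knob; the slide gives no weighting.
    """
    boosted = []
    for i, score in enumerate(shot_scores):
        prev_score = shot_scores[i - 1] if i > 0 else 0.0
        next_score = shot_scores[i + 1] if i + 1 < len(shot_scores) else 0.0
        boosted.append(score + weight * (prev_score + next_score))
    return boosted


# Three consecutive shots, five frames each (made-up scores).
shots = [
    [0.2, 0.4, 0.6, 0.4, 0.4],
    [0.8, 0.8, 0.8, 0.8, 0.8],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
fused = [fuse_frames(s) for s in shots]
boosted = boost_shots(fused)
```

Averaging smooths out a single noisy frame, while boosting raises the score of a shot whose neighbors also score well for the same concept.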

Key-Frame Annotations
Bounding-box annotation on the representative key-frame for each shot labeled as positive in collaborative annotation

Concept          # frames   # boxes
Animal             11,545     9,155
Bicycling             599     1,355
Boy                 1,848     2,492
Dancing             2,118     5,199
ExplosionFire       2,483     2,402
Inst.Musician       4,923     7,229
Running               945     1,394
SittingDown             -         -
Baby                  898       895
Skier                 320       521

I-Frame Annotations for SittingDown
I-frame annotation for SittingDown to train the LSTM
・ Annotation results
  - # shots = 92
  - # frames = 481
  - # bounding boxes = 515
* We found SittingDown in only 92 of the 3K shots labeled as positive in collaborative annotation

Results

ID   Method                                            RunID
1*   Faster R-CNN + Multi-Frame Score Fusion           fusion
2*   1 + Multi-Shot Score Boosting                     boost
3*   1 + LSTM (4096 units) for SittingDown             fusion.lstm
4*   2 + LSTM (4096 units) for SittingDown             boost.lstm
5    2 + LSTM (64 units) for SittingDown (post exp.)

[Chart: TokyoTech runs; F-score (axis 0 to 0.5); series: iframe_fscore, mean_pixel_fscore]
・ 2nd among 3 teams
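The chart reports F-scores at the I-frame and pixel level. The slides do not define these metrics, so the following is only a sketch of one plausible pixel-level F-score for a single predicted box against a single ground-truth box, assuming precision and recall are measured over overlapping pixels. The function names and the example boxes are illustrative.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if it is empty."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def pixel_fscore(pred, truth):
    """Harmonic mean of pixel precision and pixel recall for two boxes."""
    inter = box_area((
        max(pred[0], truth[0]), max(pred[1], truth[1]),
        min(pred[2], truth[2]), min(pred[3], truth[3]),
    ))
    if inter == 0:
        return 0.0
    precision = inter / box_area(pred)    # share of predicted pixels that are correct
    recall = inter / box_area(truth)      # share of ground-truth pixels that are found
    return 2 * precision * recall / (precision + recall)


half_overlap = pixel_fscore((0, 0, 10, 10), (5, 0, 15, 10))
perfect = pixel_fscore((2, 2, 8, 8), (2, 2, 8, 8))
disjoint = pixel_fscore((0, 0, 4, 4), (6, 6, 9, 9))
```

Under this reading, a box that finds the right frame but is placed loosely still loses pixel-level score, which would explain a pixel F-score that is lower than the I-frame F-score.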

Results for SittingDown
・ Best result for SittingDown with run #2
・ LSTM with 4096 units (run #4) did not work
  → LSTM with 64 units (run #5) avoided over-fitting and worked in a post-submission experiment

ID   Method                    I-Frame F-score   Pixel F-score
2*   Fusion + Boosting                    0.63            0.22
4*   2 + LSTM (4096 units)                0.00            0.00
5    2 + LSTM (64 units)                 11.96            4.51
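A rough parameter count gives one possible reading of the over-fitting remark. The sketch below counts the weights and biases of a standard four-gate LSTM layer at both sizes. The input dimension of 4096 is an assumption, since the slides do not state what feeds the LSTM. The figure of 515 bounding boxes comes from the SittingDown annotation slide.

```python
def lstm_param_count(input_dim, hidden_units):
    """Weights and biases of one standard LSTM layer (four gates)."""
    per_gate = hidden_units * (input_dim + hidden_units) + hidden_units
    return 4 * per_gate


INPUT_DIM = 4096        # assumption: not stated on the slides
TRAINING_BOXES = 515    # SittingDown bounding boxes available for training

large = lstm_param_count(INPUT_DIM, 4096)
small = lstm_param_count(INPUT_DIM, 64)

print(f"4096 units: {large:,} parameters, {large // TRAINING_BOXES:,} per box")
print(f"  64 units: {small:,} parameters, {small // TRAINING_BOXES:,} per box")
```

Under these assumptions the 4096-unit layer has over a hundred times more parameters than the 64-unit layer, with the same 515 annotated boxes to fit them.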

SittingDown
Re-trained network with LSTM (64 units)
[Figure: panels: good cases, bad cases; captions: sitting down, moving but not sitting down, moving around a chair; legend: system output, ground truth]

Animal, Good Results
[Figure: columns: Faster R-CNN, Score Fusion, Score Boosting; examples: cat (no movement), dog (walking); legend: system output, ground truth]

Animal, Bad Results
[Figure: columns: Faster R-CNN, Score Fusion, Score Boosting; examples: many animals, bird (flying fast); legend: system output, ground truth]

Others
[Figure: columns: Faster R-CNN, Score Fusion, Score Boosting; examples: Bicycling, Boy; legend: system output, ground truth]

Others
[Figure: columns: Faster R-CNN, Score Fusion, Score Boosting; examples: Dancing, ExplosionFire; legend: system output, ground truth]

Others
[Figure: columns: Faster R-CNN, Score Fusion, Score Boosting; examples: InstrumentalMusician, Running; legend: system output, ground truth]

Others
[Figure: columns: Faster R-CNN, Score Fusion, Score Boosting; examples: Baby, Skier; legend: system output, ground truth]

Conclusion & Future Work
・ We proposed a localization system
  - Faster R-CNN + LSTM + Re-scoring
・ Manual annotation
  - 31K bounding boxes
・ Results
  - 2nd among 3 teams, best result for SittingDown
  - LSTM with 64 units was effective for SittingDown
・ Future work
  - Find a better way to localize actions
