Learning Realistic Human Actions from Movies

Learning Realistic Human Actions from Movies
I. Laptev, M. Marszałek, C. Schmid and B. Rozenfeld. CVPR 2008.
Presented by: Islam Beltagy, Girish Malkarnenkar
Experiment presentation for CS 395T
9th November 2012

• Realistic variation of human actions
• Many classes and many examples per class
Problems:
• Typically only a few class-samples per movie
• Manual annotation is very time consuming
Slide from: link

• Scripts available for >500 movies (no time synchronization)
  www.dailyscript.com, www.movie-page.com, www.weeklyscript.com …
• Subtitles (with time info.) are available for most movies
• Can transfer time to scripts by text alignment

subtitles:
…
1172
01:20:17,240 --> 01:20:20,437
Why weren't you honest with me?
Why'd you keep your marriage a secret?

1173
01:20:20,640 --> 01:20:23,598
It wasn't my secret, Richard.
Victor wanted it that way.

1174
01:20:23,800 --> 01:20:26,189
Not even our closest friends
knew about our marriage.
…

movie script:
…
RICK
Why weren't you honest with me? Why did you keep your marriage a secret?

01:20:17
Rick sits down with Ilsa.
01:20:23

ILSA
Oh, it wasn't my secret, Richard. Victor wanted it that way. Not even our closest friends knew about our marriage.
…

Slide from: link
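The time transfer described on this slide can be sketched as a word-level alignment between subtitle text and script dialogue. The sketch below is only an illustration: it uses Python's `difflib.SequenceMatcher` as a stand-in aligner, which is not necessarily the method used in the paper, and the function names `words` and `transfer_times` are hypothetical. Times are the slide's subtitle timestamps converted to seconds.

```python
from difflib import SequenceMatcher


def words(text):
    """Lowercase a line and split it into alphanumeric words."""
    return "".join(ch if ch.isalnum() else " " for ch in text.lower()).split()


def transfer_times(subtitles, script_lines):
    """Give each script line the time span of the subtitles it matches.

    subtitles: list of (start_sec, end_sec, text) tuples.
    script_lines: list of dialogue strings from the script.
    Returns one (start_sec, end_sec) tuple, or None, per script line.
    """
    # Flatten both sides into word lists, remembering which item owns each word.
    sub_words, sub_owner = [], []
    for k, (_, _, text) in enumerate(subtitles):
        for w in words(text):
            sub_words.append(w)
            sub_owner.append(k)
    scr_words, scr_owner = [], []
    for k, line in enumerate(script_lines):
        for w in words(line):
            scr_words.append(w)
            scr_owner.append(k)

    # Collect, for every script line, the subtitles whose words it matched.
    hits = [set() for _ in script_lines]
    matcher = SequenceMatcher(None, scr_words, sub_words, autojunk=False)
    for block in matcher.get_matching_blocks():
        for off in range(block.size):
            hits[scr_owner[block.a + off]].add(sub_owner[block.b + off])

    spans = []
    for matched in hits:
        if not matched:
            spans.append(None)  # no subtitle matched: leave the line untimed
        else:
            spans.append((min(subtitles[k][0] for k in matched),
                          max(subtitles[k][1] for k in matched)))
    return spans


subtitles = [
    (4817.240, 4820.437,
     "Why weren't you honest with me? Why'd you keep your marriage a secret?"),
    (4820.640, 4823.598,
     "It wasn't my secret, Richard. Victor wanted it that way."),
    (4823.800, 4826.189,
     "Not even our closest friends knew about our marriage."),
]
script = [
    "Why weren't you honest with me? Why did you keep your marriage a secret?",
    "Oh, it wasn't my secret, Richard. Victor wanted it that way. "
    "Not even our closest friends knew about our marriage.",
]
spans = transfer_times(subtitles, script)
```

A scene description such as "Rick sits down with Ilsa." has no subtitle of its own, so under this sketch it would be timed from the spans of the dialogue lines around it.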

• Annotate action samples in text
• Do automatic script-to-video alignment
• Check the correspondence of actions in scripts and movies

Example of a "visual false positive":
A black car pulls up, two army officers get out.

a: quality of subtitle-script matching
Slide from: link

Bag of space-time features + multi-channel SVM
[Schuldt'04, Niebles'06, Zhang'07]

Collection of space-time patches → HOG & HOF patch descriptors → Visual vocabulary → Histogram of visual words → Multi-channel SVM classifier
Slide from: link
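The "histogram of visual words" step of this pipeline can be illustrated with a small NumPy sketch. It assumes the vocabulary (cluster centres) is already available and uses nearest-centre assignment under Euclidean distance; the function name `bof_histogram` and the toy 2-D descriptors are made up for the example, and real HOG/HOF descriptors are much longer.

```python
import numpy as np


def bof_histogram(descriptors, vocabulary):
    """L1-normalised histogram of nearest visual words.

    descriptors: (n, d) array, one row per space-time patch descriptor.
    vocabulary:  (k, d) array, one row per visual word (cluster centre).
    """
    # Squared Euclidean distance between every descriptor and every word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the closest word per descriptor
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)


# Toy example: 3 visual words, 4 descriptors.
vocabulary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8], [4.0, 6.0]])
hist = bof_histogram(descriptors, vocabulary)
```

One such histogram per video (and per channel) is what the SVM classifier consumes.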

• Space-time corner detector [Laptev, IJCV 2005]
• Dense scale sampling (no explicit scale selection)
Slide from: link

Multi-scale space-time patches from corner detector
• Histogram of oriented spatial grad. (HOG): 3x3x2x4 bins HOG descriptor
• Histogram of optical flow (HOF): 3x3x2x5 bins HOF descriptor
• Public code available at www.irisa.fr/vista/actions
Slide from: link
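The "3x3x2x4 bins" and "3x3x2x5 bins" labels give the descriptor lengths directly, reading them as 3x3 spatial cells, 2 temporal cells and 4 (HOG) or 5 (HOF) histogram bins per cell. The short sketch below just does that arithmetic; the helper name `descriptor_length` is hypothetical, and treating HoG+HoF as a plain concatenation is an assumption.

```python
def descriptor_length(nx, ny, nt, bins):
    """Number of values in a descriptor made of nx*ny*nt cells with `bins` bins each."""
    return nx * ny * nt * bins


hog_len = descriptor_length(3, 3, 2, 4)  # 3x3x2 cells, 4 gradient-orientation bins
hof_len = descriptor_length(3, 3, 2, 5)  # 3x3x2 cells, 5 optical-flow bins
combined_len = hog_len + hof_len         # assumes HoG+HoF is a concatenation
```

So each patch yields a 72-value HOG vector and a 90-value HOF vector, or 162 values when the two are concatenated.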

We use global spatio-temporal grids
• In the spatial domain:
  • 1x1 (standard BoF)
  • 2x2, o2x2 (50% overlap)
  • h3x1 (horizontal), v1x3 (vertical)
  • 3x3
• In the temporal domain:
  • t1 (standard BoF), t2, t3
Figure: Examples of a few spatio-temporal grids
Slide from: link
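A grid turns one bag-of-features histogram into several, one per cell, concatenated into a single vector. The sketch below shows this for non-overlapping grids only (the overlapping o2x2 variant is left out). It assumes feature positions are given as (x, y, t) normalised to [0, 1) over the video volume; the function name `grid_histogram` and the toy inputs are hypothetical.

```python
import numpy as np


def grid_histogram(word_ids, positions, vocab_size, nx=1, ny=1, nt=1):
    """Concatenate one visual-word histogram per grid cell.

    word_ids:  (n,) visual-word index of each feature.
    positions: (n, 3) feature positions (x, y, t), normalised to [0, 1).
    nx, ny, nt: number of cells along x, y and time.
    """
    positions = np.asarray(positions, dtype=float)
    # Cell index along each axis; clip so a coordinate of exactly 1.0 stays inside.
    ix = np.minimum((positions[:, 0] * nx).astype(int), nx - 1)
    iy = np.minimum((positions[:, 1] * ny).astype(int), ny - 1)
    it = np.minimum((positions[:, 2] * nt).astype(int), nt - 1)
    cell = (it * ny + iy) * nx + ix
    flat = cell * vocab_size + np.asarray(word_ids)
    hist = np.bincount(flat, minlength=nx * ny * nt * vocab_size).astype(float)
    return hist / max(hist.sum(), 1.0)


word_ids = [0, 1, 1, 1]
positions = [(0.1, 0.1, 0.1), (0.9, 0.1, 0.2), (0.2, 0.8, 0.7), (0.6, 0.6, 0.9)]

h_1x1_t1 = grid_histogram(word_ids, positions, vocab_size=2)              # standard BoF
h_1x1_t2 = grid_histogram(word_ids, positions, vocab_size=2, nt=2)        # two temporal halves
h_2x2_t1 = grid_histogram(word_ids, positions, vocab_size=2, nx=2, ny=2)  # four spatial cells
```

With 1x1 t1 the result is the ordinary bag-of-features histogram; finer grids keep coarse information about where and when each visual word occurred.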

We use SVMs with a multi-channel chi-square kernel for classification
• Channel c is a combination of a detector, descriptor and a grid
• D_c(H_i, H_j) is the chi-square distance between histograms
• A_c is the mean value of the distances between all training samples
• The best set of channels C for a given training set is found based on a greedy approach
Slide from: link
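The kernel formula itself did not survive on this slide. In the paper it is K(H_i, H_j) = exp(-Σ_{c∈C} D_c(H_i, H_j) / A_c), with D_c the chi-square distance defined with a factor of 1/2. The NumPy sketch below computes that kernel matrix for a set of training histograms. The function names and toy channels are hypothetical, and taking A_c as the mean over the full distance matrix (zero diagonal included) is one possible reading of "mean value of the distances".

```python
import numpy as np


def chi2_distance(h1, h2):
    """Chi-square distance: 0.5 * sum((h1 - h2)^2 / (h1 + h2)) over non-empty bins."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    denom = h1 + h2
    mask = denom > 0  # skip bins that are empty in both histograms
    return 0.5 * np.sum((h1 - h2)[mask] ** 2 / denom[mask])


def multichannel_kernel(channels):
    """Kernel matrix K = exp(-sum_c D_c / A_c) over all channels.

    channels: dict mapping a channel name to an (n_samples, n_bins) array.
    """
    total = None
    for hists in channels.values():
        n = len(hists)
        D = np.array([[chi2_distance(hists[i], hists[j]) for j in range(n)]
                      for i in range(n)])
        A = D.mean()  # assumption: mean over the whole matrix, diagonal included
        scaled = D / A if A > 0 else D
        total = scaled if total is None else total + scaled
    return np.exp(-total)


# Toy example: 3 training videos, 2 channels with 2-bin histograms.
channels = {
    "hog_1x1_t1": np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]),
    "hof_1x1_t1": np.array([[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]),
}
K = multichannel_kernel(channels)
```

A matrix like `K` can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Greedy channel selection would then add or remove one channel at a time and keep the change only if validation accuracy improves.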

STIP in Action!
• Link to a 2-min video showing the authors' CVPR 2008 paper results [notice the subtitle dialogue and human action/screenplay information]

Examples of STIP detections
• AnswerPhone
• GetOutCar
• HugPerson
• Kiss
• SitDown

For the Hollywood Dataset, STIPs are calculated only for specified start & end frames from the annotations file & not for the whole clip, unlike the KTH action clips…

Experimental Dataset 1: KTH Actions
• 6 classes of 100 clips each [64 training & 36 testing]
• Same size/split as used in the CVPR 2008 paper
Link

KTH Dataset examples
• Boxing
• Hand-Clapping
• Hand-Waving
• Jogging
• Running
• Walking

KTH Training & Testing split are based on making sure that the same person (actor) doesn't appear in both training & testing!

Between which 2 categories do you expect the most confusion in a 6 way multi-classification task?

Experimental Dataset 2: Hollywood
• Selected a subset of the dataset used in the paper
• 4 classes with 18 videos each [9 training & 9 testing]

Hollywood Dataset examples
• GetOutCar
• HandShake
• Kiss
• Stand-Up

Hollywood Training & Testing split are based on making sure that clips from the same movie don't appear in both training & testing!

Between which 2 categories do you expect the most confusion in a 4 way multi-classification task?
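Both splits are group-disjoint: clips are divided by actor (KTH) or by movie (Hollywood) rather than at random, so the classifier is never tested on a person or film it has already seen. A minimal sketch of such a split follows. The function name, the clip naming and the choice of held-out actors are hypothetical, and the layout of 25 actors x 4 scenarios per class is an assumption that happens to reproduce the 64/36 numbers.

```python
def group_disjoint_split(samples, test_groups):
    """Split samples so that no group (actor or movie) appears on both sides.

    samples: list of dicts with at least a "group" key.
    test_groups: iterable of group ids reserved for testing.
    """
    test_groups = set(test_groups)
    train = [s for s in samples if s["group"] not in test_groups]
    test = [s for s in samples if s["group"] in test_groups]
    return train, test


# Hypothetical KTH-like layout for one class: 25 actors x 4 scenarios = 100 clips.
samples = [
    {"clip": f"boxing_p{actor:02d}_s{scenario}", "label": "Boxing", "group": actor}
    for actor in range(1, 26)
    for scenario in range(1, 5)
]
held_out_actors = range(17, 26)  # 9 actors, an arbitrary choice for the example
train, test = group_disjoint_split(samples, held_out_actors)
```

For the Hollywood subset the same function applies with the movie title as the `group` value.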

Experiment 1: HoG & HoF
• Goal: See the effect of HoG, HoF and HoG+HoF on KTH & Hollywood
• Did a simple bag of features approach over the full video
• 100k features randomly sampled from the total of ~300k (HoG | HoF | HoG+HoF) descriptors to form 4000 clusters
• Used kchi2 kernel for SVM based multi-classification (one against one)
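The vocabulary step (sample a subset of descriptors, then cluster them) can be sketched as below. The slide does not name the clustering algorithm, so plain Lloyd's k-means is an assumption here, and the function name `build_vocabulary` is hypothetical. The toy sizes stand in for the 100k samples and 4000 clusters of the experiment: this dense implementation builds a full sample-by-centre distance matrix and would need batching at that scale.

```python
import numpy as np


def build_vocabulary(descriptors, n_sample, n_clusters, n_iter=20, seed=0):
    """Randomly subsample descriptors and cluster them with Lloyd's k-means.

    Returns an (n_clusters, d) array of cluster centres (the visual words).
    """
    rng = np.random.default_rng(seed)
    n_sample = min(n_sample, len(descriptors))
    sample = descriptors[rng.choice(len(descriptors), size=n_sample, replace=False)]
    # Initialise centres from distinct sampled descriptors.
    centers = sample[rng.choice(n_sample, size=n_clusters, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((sample[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for k in range(n_clusters):
            members = sample[assign == k]
            if len(members):  # keep the old centre if a cluster becomes empty
                centers[k] = members.mean(axis=0)
    return centers


# Toy stand-in: 300 random 4-D "descriptors", 100 sampled, 4 clusters.
toy_descriptors = np.random.default_rng(1).random((300, 4))
vocab = build_vocabulary(toy_descriptors, n_sample=100, n_clusters=4)
```

Every video's descriptors are then assigned to their nearest centre to form its histogram.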

Classification Accuracy

Dataset (classes*tests per class)   HoG            HoF            HoG+HoF
KTH (6*36=216)                      69.44% (150)   81.94% (177)   79.17% (171)
Hollywood (4*9=36)                  44.44% (16)    30.56% (11)    33.33% (12)
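The percentages follow from the counts in parentheses, read here as the number of correctly classified test clips out of classes x tests per class. The sketch below redoes that arithmetic and also computes the chance level for a balanced test set (1 / number of classes), which is added for reference and is not on the slide.

```python
# Correctly classified test clips per descriptor, taken from the table above.
counts = {
    "KTH": {"classes": 6, "tests_per_class": 36,
            "HoG": 150, "HoF": 177, "HoG+HoF": 171},
    "Hollywood": {"classes": 4, "tests_per_class": 9,
                  "HoG": 16, "HoF": 11, "HoG+HoF": 12},
}


def accuracy_percent(correct, total):
    """Accuracy as a percentage rounded to two decimals."""
    return round(100.0 * correct / total, 2)


accuracies = {}
chance = {}
for dataset, row in counts.items():
    total = row["classes"] * row["tests_per_class"]
    accuracies[dataset] = {
        name: accuracy_percent(row[name], total)
        for name in ("HoG", "HoF", "HoG+HoF")
    }
    chance[dataset] = accuracy_percent(1, row["classes"])  # balanced test set
```

On Hollywood the HoF result (30.56%) sits only a few points above the 25% chance level, and with 36 test clips a single clip changes the accuracy by about 2.8 points.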

Discussion: KTH v/s Hollywood…
• Reason behind higher multi-classification accuracy achieved on KTH (~82%) than on Hollywood (~44%)?
• KTH is "easier" than Hollywood: homogeneous background + choreographed actions
• Hollywood dataset: variability in scale/viewpoint/background

Discussion: HOG v/s HOF

Data        HOG     HOF
KTH         69.44   81.94
Hollywood   44.44   30.56

• Similar to the results obtained in the paper
• HoG performs better for Hollywood perhaps because HoG captures context & image content better than HoF and these play an important role in realistic settings
• Simple actions (like in KTH) can be well represented by their motion only (i.e. HoF)

Discussion: HoG+HoF
• Combining HoG and HoF didn't help a lot over either.
• I used a simple 1x1x1 BoF approach for binning (just a single channel)
• Paper explores better combinations based on various binning/spatio-temporal grids & combines the best channels using a greedy approach and a multi channel SVM
