ENS/INRIA CVML Summer School, 45 rue d’Ulm, Paris, July 26, 2013
Modeling and visual recognition of human actions
Ivan Laptev (ivan.laptev@inria.fr), WILLOW, INRIA/ENS/CNRS, Paris
Objects: cars, glasses, people, etc.
Actions: drinking, running, door exit, car enter, etc.
Scene categories: indoors, outdoors, street scene, etc.
Geometry: street, wall, field, stair, etc.
(mutual constraints between these cues)
Human Actions: Why do we care?
Why video analysis? Data:
• TV channels recorded since the 60’s
• >34K hours of video uploaded every day
• ~30M surveillance cameras in the US => ~700K video hours/day
Why video analysis? Applications:
• First appearance of N. Sarkozy on TV
• Sociology research: influence of character smoking in movies
• Education: How do I make a pizza?
• Predicting crowd behavior
• Where is my cat?
• Motion capture and animation
• Counting people
Why human actions? How many person-pixels are in the video? Movies: ~35%, TV: ~34%, YouTube: ~40%
How many person-pixels in our daily life? Wearable camera data (Microsoft SenseCam dataset): ~4%
Why do we prefer to watch other people? Why do we watch TV, movies, … at all? Why do we read books? “… books teach us new patterns of behavior …” — Olga Slavnikova, Russian journalist and writer
Why is action recognition difficult?
Challenges:
• Large variations in appearance: occlusions, non-rigid motion, viewpoint changes, clothing, …
• Manual collection of training samples is prohibitive: many action classes, rare occurrence
• Action vocabulary is not well-defined (e.g., actions “Hugging”, “Open”)
How to recognize actions?
Activities characterized by a pose Slide credit: A. Zisserman
Activities characterized by a pose: examples from the VOC action recognition challenge
Human pose estimation (1990-2000)
• Finding People by Sampling. Ioffe & Forsyth, ICCV 1999
• Pictorial Structure Models for Object Recognition. Felzenszwalb & Huttenlocher, 2000
• Learning to Parse Pictures of People. Ronfard, Schmid & Triggs, ECCV 2002
Human pose estimation
• Y. Yang and D. Ramanan. Articulated pose estimation with flexible mixtures-of-parts. CVPR 2011. Extension of the LSVM model of Felzenszwalb et al.
• Y. Wang, D. Tran and Z. Liao. Learning Hierarchical Poselets for Human Parsing. CVPR 2011. Builds on the Poselets idea of Bourdev et al.
• S. Johnson and M. Everingham. Learning Effective Human Pose Estimation from Inaccurate Annotation. CVPR 2011. Learns from lots of noisy annotations.
• B. Sapp, D. Weiss and B. Taskar. Parsing Human Motion with Stretchable Models. CVPR 2011. Explores temporal continuity.
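Pictorial structure models, which several of the papers above build on, score each part by local appearance plus pairwise “spring” terms between connected parts; on a tree, the best pose is found exactly by dynamic programming. A minimal sketch under stated assumptions (toy random appearance maps, a hypothetical 3-part skeleton, brute-force maximization instead of the generalized distance transform used in practice):

```python
import numpy as np

def best_pose(parts, edges, unary, spring=1.0):
    """Exact MAP inference for a tree-structured pictorial structure.
    parts  : part names, parts[0] is the root, children have higher indices
    edges  : (parent_index, child_index) pairs forming a tree
    unary  : dict part -> (H, W) appearance score map
    spring : weight of the squared-distance deformation penalty
    """
    H, W = next(iter(unary.values())).shape
    ys, xs = np.mgrid[0:H, 0:W]
    msgs = {p: unary[p].copy() for p in parts}
    children = {i: [c for (p, c) in edges if p == i] for i in range(len(parts))}

    # Pass messages from leaves to root (children processed before parents).
    for i in reversed(range(len(parts))):
        for c in children[i]:
            child = msgs[parts[c]]
            best = np.empty((H, W))
            for y in range(H):          # brute force; real systems use
                for x in range(W):      # the generalized distance transform
                    pen = spring * ((ys - y) ** 2 + (xs - x) ** 2)
                    best[y, x] = (child - pen).max()
            msgs[parts[i]] += best
    root = msgs[parts[0]]
    return np.unravel_index(root.argmax(), root.shape), root.max()

# Toy example: 3 hypothetical parts on a 20x20 grid, random appearance.
rng = np.random.default_rng(0)
parts = ["torso", "head", "arm"]
unary = {p: rng.random((20, 20)) for p in parts}
loc, score = best_pose(parts, [(0, 1), (0, 2)], unary, spring=0.05)
print("root location:", loc, "score:", round(score, 3))
```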
Human pose estimation J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman and A. Blake. Real-Time Human Pose Recognition in Parts from Single Depth Images. (Best paper award at CVPR 2011)
Pose estimation is still a hard problem. Issues: • occlusions • clothing and pose variations
Appearance methods: Shape
Idea: summarize the motion in a video in a Motion History Image (MHI) [A.F. Bobick and J.W. Davis, PAMI 2001]
L. Gorelick, M. Blank, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. 2007
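An MHI stacks recent frame differences so that newer motion is brighter and older motion fades out. A minimal numpy sketch in the Bobick & Davis style; the frame-differencing threshold and the decay duration tau are illustrative assumptions:

```python
import numpy as np

def motion_history(frames, tau=20, thresh=15):
    """Motion History Image: pixels hit by recent motion are bright,
    older motion decays linearly to zero."""
    mhi = np.zeros(frames[0].shape, dtype=np.float32)
    prev = frames[0].astype(np.int16)
    for frame in frames[1:]:
        cur = frame.astype(np.int16)
        moving = np.abs(cur - prev) > thresh   # crude change detection
        mhi = np.where(moving, tau, np.maximum(mhi - 1, 0))
        prev = cur
    return mhi / tau                            # normalize to [0, 1]

# Toy usage: a bright square moving right across a dark background.
frames = []
for t in range(10):
    f = np.zeros((64, 64), dtype=np.uint8)
    f[20:30, 5 + 4 * t: 15 + 4 * t] = 255
    frames.append(f)
print(motion_history(frames).max())  # 1.0 at the most recent motion
```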
Appearance methods: Shape
Pros:
+ Simple and fast
+ Works in controlled settings
Cons:
- Prone to errors of background subtraction (variations in light, shadows, clothing; what is the background here?)
- Does not capture interior structure and motion (the silhouette tells little about actions)
Appearance methods: Motion
Learning Parameterized Models of Image Motion. M.J. Black, Y. Yacoob, A.D. Jepson and D.J. Fleet, 1997
Recognizing action at a distance. A.A. Efros, A.C. Berg, G. Mori, and J. Malik, 2003: optical flow is split into blurred, half-wave rectified channels $F_x^+, F_x^-, F_y^+, F_y^-$
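The Efros et al. channels rectify each flow component into positive and negative parts and blur them, which makes the descriptor robust to small tracking jitter. A minimal sketch, assuming the flow field (fx, fy) has already been computed by some optical flow routine:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def motion_channels(fx, fy, sigma=2.0):
    """Blurred, half-wave rectified flow channels F_x+, F_x-, F_y+, F_y-."""
    channels = [np.maximum(fx, 0), np.maximum(-fx, 0),
                np.maximum(fy, 0), np.maximum(-fy, 0)]
    return [gaussian_filter(c, sigma) for c in channels]

# Toy flow: uniform rightward motion -> only the F_x+ channel responds.
fx = np.ones((32, 32)); fy = np.zeros((32, 32))
fxp, fxm, fyp, fym = motion_channels(fx, fy)
print(fxp.mean(), fxm.mean())  # ~1.0 and 0.0
```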
Action recognition with local features
Local space-time features [Laptev 2005]
+ No segmentation needed
+ No object detection/tracking needed
- Loss of global structure
Local approach: Bag of Visual Words (example categories: Airplanes, Motorbikes, Faces, Wild Cats, Leaves, People, Bikes)
Space-Time Interest Points: Detection
What neighborhoods to consider? Distinctive neighborhoods with high image variation in space and time => look at the distribution of the gradient.
Definitions:
Original image sequence: $f(x,y,t)$
Space-time Gaussian with covariance $\Sigma$: $g(x,y,t;\Sigma)$
Gaussian derivative of $f$: $L_\xi = \partial_\xi (g * f)$
Space-time gradient: $\nabla L = (L_x, L_y, L_t)^T$
Second-moment matrix: $\mu = g * \left(\nabla L \, (\nabla L)^T\right)$
[Laptev 2005]
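A minimal numpy sketch of these quantities: space-time gradients of a smoothed sequence, the locally averaged second-moment matrix $\mu$, and an interest operator of the Harris3D form $H = \det\mu - k\,\mathrm{trace}^3\mu$. The smoothing scales and k are illustrative, not the paper's exact settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris3d(f, sigma=1.5, tau=1.5, s=2.0, k=0.005):
    """Space-time interest operator on a (T, H, W) sequence f.
    sigma/tau: spatial/temporal derivation scales; s: integration scale."""
    L = gaussian_filter(f.astype(np.float32), (tau, sigma, sigma))
    Lt, Ly, Lx = np.gradient(L)                 # space-time gradient of L
    mu = {}                                     # entries of the 3x3 matrix mu
    for a, b, da, db in [("x","x",Lx,Lx), ("x","y",Lx,Ly), ("x","t",Lx,Lt),
                         ("y","y",Ly,Ly), ("y","t",Ly,Lt), ("t","t",Lt,Lt)]:
        mu[a+b] = gaussian_filter(da * db, (s*tau, s*sigma, s*sigma))
    det = (mu["xx"]*(mu["yy"]*mu["tt"] - mu["yt"]**2)
           - mu["xy"]*(mu["xy"]*mu["tt"] - mu["yt"]*mu["xt"])
           + mu["xt"]*(mu["xy"]*mu["yt"] - mu["yy"]*mu["xt"]))
    trace = mu["xx"] + mu["yy"] + mu["tt"]
    return det - k * trace**3                   # large at space-time corners

# Toy sequence: a bright dot that appears mid-sequence (a space-time event).
f = np.zeros((9, 32, 32)); f[4, 16, 16] = 1.0
H = harris3d(f)
print(np.unravel_index(H.argmax(), H.shape))    # near (4, 16, 16)
```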
Local features: Proof of concept Finds similar events in pairs of video sequences
Bag-of-Features action recognition [Laptev, Marszałek, Schmid, Rozenfeld 2008]:
extraction of local space-time patches => feature description => feature quantization by k-means clustering (k=4000) => occurrence histogram of visual words => non-linear SVM with a χ² kernel
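The pipeline maps each video to a fixed-length histogram over a learned vocabulary and classifies with a χ² kernel SVM. A minimal scikit-learn sketch with random stand-in descriptors and a small vocabulary instead of k = 4000:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
labels = np.array([0] * 10 + [1] * 10)
# Stand-ins for local HOG/HOF descriptors: one (n_features, 72) array per
# video, shifted by the class label so the toy classes are separable.
videos = [rng.random((100, 72)) + 0.3 * y for y in labels]

# 1) Vocabulary: k-means over all training descriptors.
vocab = KMeans(n_clusters=20, n_init=5, random_state=0)
vocab.fit(np.vstack(videos))

# 2) Each video -> normalized histogram of visual-word occurrences.
def bof_histogram(desc):
    words = vocab.predict(desc)
    h = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return h / h.sum()

X = np.array([bof_histogram(v) for v in videos])

# 3) Non-linear SVM with a precomputed exponential chi-square kernel.
K = chi2_kernel(X, gamma=0.5)
clf = SVC(kernel="precomputed").fit(K, labels)
print("train accuracy:", clf.score(K, labels))
```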
Action classification results: KTH dataset and Hollywood-2 dataset (AnswerPhone, GetOutCar, HandShake, StandUp, DriveCar, Kiss) [Laptev, Marszałek, Schmid, Rozenfeld 2008]
Action classification Test episodes from movies “The Graduate”, “It’s a Wonderful Life”, “Indiana Jones and the Last Crusade”
Evaluation of local feature detectors and descriptors
Four types of detectors:
• Harris3D [Laptev 2003]
• Cuboids [Dollar et al. 2005]
• Hessian [Willems et al. 2008]
• Regular dense sampling
Four types of descriptors:
• HOG/HOF [Laptev et al. 2008]
• Cuboids [Dollar et al. 2005]
• HOG3D [Kläser et al. 2008]
• Extended SURF [Willems et al. 2008]
Three human action datasets:
• KTH actions [Schuldt et al. 2004]
• UCF Sports [Rodriguez et al. 2008]
• Hollywood-2 [Marszałek et al. 2009]
Space-time feature detectors Harris3D Hessian Cuboids Dense
Results on KTH Actions (6 action classes, 4 scenarios, staged)
Average accuracy scores:
Descriptor   Harris3D  Cuboids  Hessian  Dense
HOG3D        89.0%     90.0%    84.6%    85.3%
HOG/HOF      91.8%     88.7%    88.7%    86.1%
HOG          80.9%     82.3%    77.7%    79.0%
HOF          92.1%     88.2%    88.6%    88.0%
Cuboids      -         89.1%    -        -
E-SURF       -         -        81.4%    -
• Best results for sparse Harris3D + HOF
• Dense features perform relatively poorly compared to sparse features
[Wang, Ullah, Kläser, Laptev, Schmid, 2009]
Results on UCF Sports (10 action classes, videos from TV broadcasts: Diving, Kicking, Walking, Skateboarding, High-Bar-Swinging, Golf-Swinging, …)
Average precision scores:
Descriptor   Harris3D  Cuboids  Hessian  Dense
HOG3D        79.7%     82.9%    79.0%    85.6%
HOG/HOF      78.1%     77.7%    79.3%    81.6%
HOG          71.4%     72.7%    66.0%    77.4%
HOF          75.4%     76.7%    75.3%    82.6%
Cuboids      -         76.6%    -        -
E-SURF       -         -        77.3%    -
• Best results for dense + HOG3D
[Wang, Ullah, Kläser, Laptev, Schmid, 2009]
Results on Hollywood-2 (12 action classes collected from 69 movies: AnswerPhone, GetOutCar, Kiss, HandShake, StandUp, DriveCar, …)
Average precision scores:
Descriptor   Harris3D  Cuboids  Hessian  Dense
HOG3D        43.7%     45.7%    41.3%    45.3%
HOG/HOF      45.2%     46.2%    46.0%    47.4%
HOG          32.8%     39.4%    36.2%    39.4%
HOF          43.3%     42.9%    43.0%    45.5%
Cuboids      -         45.0%    -        -
E-SURF       -         -        38.2%    -
• Best results for dense + HOG/HOF
[Wang, Ullah, Kläser, Laptev, Schmid, 2009]
Other recent local representations
• L. Yeffet and L. Wolf. Local Trinary Patterns for Human Action Recognition. ICCV 2009
• P. Matikainen, R. Sukthankar and M. Hebert. Trajectons: Action Recognition Through the Motion Analysis of Tracked Features. ICCV VOEC Workshop 2009
• H. Wang, A. Kläser, C. Schmid, C.-L. Liu. Action Recognition by Dense Trajectories. CVPR 2011
• J. Liu, B. Kuipers, S. Savarese. Recognizing Human Actions by Attributes. CVPR 2011
Dense trajectory descriptors [Wang et al. CVPR’11]
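Dense trajectories sample points on a regular grid and carry each point forward through the optical-flow field, collecting descriptors (HOG/HOF/MBH) along the way. A minimal tracking sketch, assuming per-frame flow fields are already given; the median filtering of the flow and the descriptor extraction from the original method are omitted:

```python
import numpy as np

def track_dense(flows, step=8, max_len=15):
    """Track a dense grid of points through a list of (H, W, 2) flow fields;
    returns trajectories of shape (n_points, n_frames+1, 2) as (x, y)."""
    H, W, _ = flows[0].shape
    ys, xs = np.mgrid[step//2:H:step, step//2:W:step]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    traj = [pts.copy()]
    for flow in flows[:max_len]:
        xi = np.clip(pts[:, 0].round().astype(int), 0, W - 1)
        yi = np.clip(pts[:, 1].round().astype(int), 0, H - 1)
        pts = pts + flow[yi, xi]        # advect each point by the local flow
        traj.append(pts.copy())
    return np.stack(traj, axis=1)

# Toy flow: uniform motion of (+1, 0) pixels per frame over 5 frames.
flows = [np.stack([np.ones((48, 48)), np.zeros((48, 48))], axis=-1)] * 5
T = track_dense(flows)
print(T.shape, T[0, 0], "->", T[0, -1])   # each point drifts 5 px right
```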
Dense trajectory descriptors [Wang et al. CVPR’11]: computational cost
Highly-efficient video descriptors Optical flow from MPEG video compression
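Video codecs already store block-level motion vectors, so an approximate flow field comes almost for free at decode time. A minimal sketch, assuming the per-macroblock vectors have already been exported from the decoder as a (H/block, W/block, 2) array; nearest-neighbor upsampling turns them into a dense per-pixel flow:

```python
import numpy as np

def mv_to_flow(mv, block=16):
    """Nearest-neighbor upsampling of per-macroblock motion vectors
    (shape (H/block, W/block, 2)) to a dense per-pixel flow field."""
    return np.repeat(np.repeat(mv, block, axis=0), block, axis=1)

# Toy 2x2 macroblock grid -> 32x32 per-pixel flow.
mv = np.array([[[1, 0], [0, 1]],
               [[-1, 0], [0, -1]]], dtype=np.float32)
flow = mv_to_flow(mv)
print(flow.shape)        # (32, 32, 2)
```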
Highly-efficient video descriptors: evaluation on Hollywood-2 and UCF50, compared against [Wang et al.’11] [Kantorov & Laptev, 2013]
Beyond BOF: Temporal structure
• Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification. J.C. Niebles, C.-W. Chen and L. Fei-Fei, ECCV 2010
• Learning Latent Temporal Structure for Complex Event Detection. K. Tang, L. Fei-Fei and D. Koller, CVPR 2012
Beyond BOF: Social roles
• T. Yu, S.-N. Lim, K. Patwardhan, and N. Krahnstoever. Monitoring, recognizing and discovering social networks. CVPR 2009
• L. Ding and A. Yilmaz. Learning relations among movie characters: A social network perspective. ECCV 2010
• V. Ramanathan, B. Yao, and L. Fei-Fei. Social Role Discovery in Human Events. CVPR 2013