Localisation and Recognition of Human Actions
Ioannis Patras
School of Electronic Engineering and Computer Science, Queen Mary University of London
In collaboration with A. Oikonomopoulos and M. Pantic (Imperial College London) and I. Kotsia and Guo Weiwei (Queen Mary University of London)
Related research in QMUL
URL: www.eecs.qmul.ac.uk/~ioannisp/
• Static Analysis: Scene analysis (Izquierdo, Diplaros); object detection / semantic segmentation
• Dynamic Vision: Motion analysis (Lagendijk, Hendriks, Hancock); motion estimation / segmentation; object tracking
• Looking at / sensing people:
  • Facial (Expression) Analysis (Pantic, Koelstra, Rudovic): head tracking / facial feature tracking; facial expression recognition
  • Action / Gesture Recognition (Kotsia, Guo, Kumar, Pantic): spatio-temporal representations for action recognition; pose estimation
  • Brain Computer Interfaces
Looking at / sensing people
• Facial (Expression) Analysis: head tracking / facial feature tracking; facial expression recognition
• Action / Gesture Recognition: action recognition and localisation; pose estimation; tensor-based space-time analysis
• Brain Computer Interfaces
Localisation of Human Actions
Oikonomopoulos, Patras, Pantic, IEEE Transactions on Image Processing, Mar. 2011.
• Goal: recognise categories of actions and localise them in terms of their bounding box (space + time)
• Challenges: occlusions, clutter, variations, ...
• Hypothesis: the analysis can be restricted to a set of spatiotemporally 'interesting' (salient) events
Information-theoretic spatial saliency
T. Kadir and M. Brady, IJCV, Nov. 2001.
• Proposal: use signal unpredictability as an indicator of saliency
• Spatial saliency: unpredictability within a single frame, measured by the entropy H_D of the local intensity distribution (example regions with H_D = 3.866 and H_D = 7.201)
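A minimal sketch (not the authors' implementation) of this entropy-based saliency for one circular patch, assuming a greyscale uint8 image; the function name, bin count and radius are illustrative choices:

```python
# Entropy of the grey-level distribution inside a circular patch (Kadir-Brady style).
import numpy as np

def patch_entropy(image, cx, cy, radius, n_bins=16):
    """Shannon entropy H_D (in bits) of the intensity histogram inside a circle."""
    h, w = image.shape
    ys, xs = np.ogrid[:h, :w]
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    counts, _ = np.histogram(image[mask], bins=n_bins, range=(0, 255))
    p = counts / counts.sum()
    p = p[p > 0]                               # 0 * log 0 := 0
    return -np.sum(p * np.log2(p))

# A flat patch is predictable (entropy ~0); a noisy one approaches log2(16) = 4 bits.
flat = np.full((100, 100), 128, dtype=np.uint8)
noisy = np.random.randint(0, 256, (100, 100)).astype(np.uint8)
print(patch_entropy(flat, 50, 50, 20), patch_entropy(noisy, 50, 50, 20))
```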
Towards scale invariance
• The entropy maxima over scale reveal the spatial scale(s) of a salient region
• [Plot: entropy vs. scale (circle radius), with local maxima at radii 29 and 59]
• Detected salient points in a single frame
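A small follow-on sketch of the scale-selection idea, reusing the hypothetical patch_entropy() helper from the previous block; the simple three-point peak test is an illustrative choice:

```python
# Sweep the circle radius and keep local maxima of the entropy-vs-scale curve
# as candidate characteristic scales of the region around (cx, cy).
import numpy as np

def salient_scales(image, cx, cy, radii):
    """Return (peak_radii, entropy_curve) for a range of candidate radii."""
    h = np.array([patch_entropy(image, cx, cy, r) for r in radii])
    peaks = [radii[i] for i in range(1, len(radii) - 1)
             if h[i] > h[i - 1] and h[i] > h[i + 1]]
    return peaks, h

# Usage: peaks, curve = salient_scales(img, x, y, list(range(4, 80, 2)))
```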
Spatial and spatiotemporal saliency
Oikonomopoulos, Patras, Pantic, IEEE Transactions on SMC, Part B, 2006.
• Spatiotemporal saliency: driven by signal unpredictability in a spatiotemporal volume (cylinder / sphere)
• Entropy ('height') of the local descriptor distribution $p(q, s, v)$ at position $v$ and scale $s$:
  $H_D(s, v) = -\int p(q, s, v) \log p(q, s, v)\, dq$
• Inter-scale unpredictability ('peakness'), up to a scale-dependent normalisation:
  $W_D(s, v) = s \int \left| \frac{\partial p(q, s, v)}{\partial s} \right| dq$
• Saliency: $Y_D(s, v) = W_D(s, v)\, H_D(s, v)$, evaluated at the scales where the entropy peaks
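A rough sketch of the measure above on a greyscale video volume (T x H x W). The cylindrical neighbourhood, raw intensity as the descriptor q, and the finite-difference estimate of the inter-scale term are all assumptions made for illustration:

```python
import numpy as np

def cylinder_hist(video, t, cy, cx, radius, half_len, n_bins=16):
    """Normalised histogram p(q) of intensities in a spatiotemporal cylinder."""
    T, H, W = video.shape
    ys, xs = np.ogrid[:H, :W]
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    t0, t1 = max(0, t - half_len), min(T, t + half_len + 1)
    counts, _ = np.histogram(video[t0:t1, mask], bins=n_bins, range=(0, 255))
    return counts / max(counts.sum(), 1)

def spatiotemporal_saliency(video, t, cy, cx, radii, half_len):
    """Y_D(s) = H_D(s) * W_D(s) over a set of spatial scales s (radii)."""
    hists = [cylinder_hist(video, t, cy, cx, r, half_len) for r in radii]
    Y = np.zeros(len(radii))
    for i in range(1, len(radii) - 1):
        p = hists[i][hists[i] > 0]
        H = -np.sum(p * np.log2(p))                              # entropy ('height')
        dp = np.abs(hists[i + 1] - hists[i - 1]) / (radii[i + 1] - radii[i - 1])
        W = radii[i] * dp.sum()                                  # inter-scale term ('peakness')
        Y[i] = H * W
    return Y   # candidate salient scales are the peaks of Y (or of H, then weighted)
```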
Descriptor extraction and codebook creation
• Input sequence → optical flow → spatiotemporal salient point detection (after median subtraction of the flow)
• Descriptors: optical flow + spatial gradient, binned into histograms and concatenated
• Feature ensembles (O. Boiman & M. Irani, ICCV'05) and feature selection
• Ensemble codewords c_1, c_2, ..., c_N form a (class-specific) codebook
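A hedged sketch of the codebook step, assuming the ensemble descriptors (concatenated optical-flow and gradient histograms) have already been extracted and stacked row-wise; k-means is used here as a stand-in clustering method:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, n_codewords=200, seed=0):
    """Cluster (n_samples, n_dims) descriptors; the centres become codewords c_1..c_N."""
    km = KMeans(n_clusters=n_codewords, n_init=10, random_state=seed)
    km.fit(descriptors)
    return km.cluster_centers_, km            # codebook + model for later assignment

# At run time, a new ensemble descriptor d is assigned to the nearest codeword:
# codeword_id = km.predict(d.reshape(1, -1))[0]
```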
Class-dependent spatio-temporal probabilistic voting
• Parameters stored for each ensemble e_d in the training set:
  X: average spatial position of the ensemble with respect to the subject centre and lower bound
  T: distance in frames of the activated ensemble from the start/end of the action
  S: average spatiotemporal scale of the ensemble
• Localisation model learned for each codeword/cluster c_i:
  $p(c_i \mid e_d) = \frac{w_i \, p(e_d \mid c_i)}{p(e_d)}$
• At the current frame t, each activated ensemble casts spatial votes $p(x \mid c_i, e_d)$ for the subject centre, and temporal votes for the start/end of the action (at distances t and T − t from the current frame)
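A Hough-style voting sketch in the spirit of the class-specific model above; the data layout (matched ensembles with a weight, plus a per-codeword table of learned centre offsets) is an assumption, not the paper's exact representation:

```python
import numpy as np

def vote_for_centre(matches, codeword_offsets, volume_shape):
    """
    matches: list of (codeword_id, x, y, t, weight) for activated ensembles.
    codeword_offsets: dict codeword_id -> list of (dx, dy, dt) learned offsets
                      to the subject centre / action start.
    Returns an (H, W, T) accumulator of weighted votes.
    """
    H, W, T = volume_shape
    acc = np.zeros(volume_shape)
    for cid, x, y, t, weight in matches:
        offsets = codeword_offsets.get(cid, [])
        for dx, dy, dt in offsets:
            vx, vy, vt = int(round(x + dx)), int(round(y + dy)), int(round(t + dt))
            if 0 <= vx < W and 0 <= vy < H and 0 <= vt < T:
                acc[vy, vx, vt] += weight / len(offsets)     # spread the ensemble's vote
    return acc

# The modes of `acc` (found e.g. with mean-shift) give candidate centres and extents.
```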
Discriminative learning
• Higher weights for codewords whose localisation pdfs have low entropy:
  $w_i = \exp\left( \int p(x \mid c_i) \log p(x \mid c_i)\, dx \right) = \exp\!\big(-H(p(x \mid c_i))\big)$
• The class dictionary comprises discriminative codewords
• AdaBoost on the codeword similarities
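A tiny numeric sketch of this weighting on a discretised localisation distribution; the 4-bin histograms are made-up examples:

```python
import numpy as np

def localisation_weight(p):
    """w_i = exp(sum_x p(x|c_i) log p(x|c_i)) = exp(-H(p)): peaked pdfs get weight near 1."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]
    return np.exp(np.sum(nz * np.log(nz)))

print(localisation_weight([0.97, 0.01, 0.01, 0.01]))   # peaked  -> ~0.85
print(localisation_weight([0.25, 0.25, 0.25, 0.25]))   # uniform -> 0.25
```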
Discriminative learning
• Higher weights for pdfs with low temporal localisation entropy
Spatio-temporal probabilistic voting
• Extension to the space-time domain of the 'Implicit Shape Model' (Leibe et al., ECCV'04)
Hypothesis verification with Relevance Vector Machine classification
• The mean-shift mode responses $F = (f_1, \ldots, f_i, \ldots)$ are used as features in RVM-based classification
• RBF kernel on the response vectors: $K(F, F') = e^{-D(F, F')^2 / C^2}$, with $D$ a distance between response vectors and $C$ a scaling constant
• Two-class classification problem (one-vs-all): $c_l(F; \mathbf{w}^l) = w^l_0 + \sum_{j} w^l_j K(F, F_j)$
• Select the class $l$ that maximises the posterior probability $p(l \mid F) = \dfrac{1}{1 + e^{-c_l(F; \mathbf{w}^l)}}$
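A sketch of the verification stage under the formulas above; the trained per-class weights, relevance vectors and kernel width are assumed given (training an RVM is not reproduced here):

```python
import numpy as np

def rbf_kernel(F, F_prime, C=1.0):
    """K(F, F') = exp(-D(F, F')^2 / C^2), with D the Euclidean distance."""
    d2 = np.sum((np.asarray(F, float) - np.asarray(F_prime, float)) ** 2)
    return np.exp(-d2 / C ** 2)

def class_posterior(F, relevance_vectors, weights, w0, C=1.0):
    """p(l | F) = sigmoid(c_l(F; w)) with c_l(F; w) = w_0 + sum_j w_j K(F, F_j)."""
    c = w0 + sum(w * rbf_kernel(F, Fj, C) for w, Fj in zip(weights, relevance_vectors))
    return 1.0 / (1.0 + np.exp(-c))

def classify(F, models):
    """models: dict class_label -> (relevance_vectors, weights, w0). One-vs-all decision."""
    return max(models, key=lambda l: class_posterior(F, *models[l]))
```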
Localisation of single actions
Localisation accuracy (KTH)
Localisation accuracy (KTH)
[SS-PE] Shechtman, E., Irani, M.: Matching local self-similarities across images and videos. CVPR 2007.
Action recognition
• HOHA dataset: average 37%
• KTH dataset: average 88%
Localisation under artificial occlusions (KTH)
Localisation under clutter (KTH)
Conclusions
• Voting schemes based on local descriptors are robust to occlusions
• Good localisation and recognition accuracy
• Relies on training annotation in terms of action localisation
• More suitable for gestures than for less 'structured' actions
Support Tensor Learning
I. Kotsia and I. Patras, "Support Tucker Machines," CVPR 2011 (Thursday afternoon).
I. Kotsia and I. Patras, "Relative Margin Support Tensor Machines for gait and action recognition," CIVR 2010.
Motivation
• Vector-based methods ignore the space (time) structure of the visual data:
  $\min_{\mathbf{w}, b, \boldsymbol{\xi}} \; \tfrac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{j=1}^{N}\xi_j
  \quad \text{s.t.} \quad y_j\big(\mathbf{w}^T\phi(\mathbf{g}_j) + b\big) \ge 1 - \xi_j, \;\; \xi_j \ge 0, \;\; j = 1, \ldots, N$
• Large dimensionality of $\mathbf{w}$ in the case of linear SVMs on vectorised inputs $\mathbf{g}_j$
Tensor Machines
• Variants of linear SVMs, where the constraints are imposed on a separating tensorplane:
  $\min_{\mathcal{W}, b, \boldsymbol{\xi}} \; f(\mathcal{W}) + C\sum_{j=1}^{N}\xi_j
  \quad \text{s.t.} \quad y_j\big(\langle \mathcal{X}_j, \mathcal{W}\rangle + b\big) \ge 1 - \xi_j, \;\; \xi_j \ge 0, \;\; j = 1, \ldots, N,$
  where $f(\mathcal{W})$ is a regularisation term, e.g. $f(\mathcal{W}) = \langle \mathcal{W}, \mathcal{W}\rangle$
• Support Tensor Machines [16]: D. Tao et al., KIS, 13(1):1-42, 2007
• Support Tucker Machines: I. Kotsia, I. Patras, CVPR 2011
• Σ/Σw Support Tucker Machines: I. Kotsia, I. Patras, CVPR 2011
• Smaller dimensionality, structural constraints
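A short sketch of the key quantity in the constraints, the inner product plus bias, when the weight tensor W is kept in Tucker form W = G x1 A x2 B x3 C (the Support Tucker Machine case). The ranks and sizes are illustrative assumptions; the point is how few parameters the factorised W needs compared with a full weight vector:

```python
import numpy as np

I1, I2, I3 = 32, 32, 20            # input video clip: height x width x frames
r1, r2, r3 = 4, 4, 3               # Tucker ranks (assumed, for illustration)

G = np.random.randn(r1, r2, r3)    # core tensor
A = np.random.randn(I1, r1)        # mode-1 factor matrix
B = np.random.randn(I2, r2)        # mode-2 factor matrix
Cm = np.random.randn(I3, r3)       # mode-3 factor matrix
b = 0.1

X = np.random.randn(I1, I2, I3)    # one (aligned) input clip

# <X, W> with W = G x1 A x2 B x3 C:  sum_ijk X_ijk * sum_abc A_ia B_jb C_kc G_abc
score = np.einsum('ijk,ia,jb,kc,abc->', X, A, B, Cm, G) + b

n_params = G.size + A.size + B.size + Cm.size
print(score, 'Tucker parameters:', n_params, 'vs full w:', I1 * I2 * I3)
```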
Supervised learning
• Non-convex optimisation problem w.r.t. A, B, C and the core tensor G
• But: convex w.r.t. A, or B, or C, or G alone
• Block coordinate optimisation, e.g. optimisation w.r.t. G keeping A, B, C fixed
• Each step can be reduced to a vector-based, SVM-like constrained optimisation problem, e.g.
  $\min_{\mathbf{G}_{(1)}, b, \boldsymbol{\xi}} \; \tfrac{1}{2}\big(\tilde{A}\,\mathrm{vec}(\mathbf{G}_{(1)})\big)^T\big(\tilde{A}\,\mathrm{vec}(\mathbf{G}_{(1)})\big) + C\sum_{i=1}^{M}\xi_i$
  $\text{s.t.} \quad y_i\big[\big(\tilde{A}\,\mathrm{vec}(\mathbf{G}_{(1)})\big)^T \mathrm{vec}(\mathbf{X}_{(1),i}) + b\big] \ge 1 - \xi_i, \;\; \xi_i \ge 0,$
  where $\tilde{A}$ collects the fixed mode matrices A, B, C and $\mathbf{G}_{(1)}$, $\mathbf{X}_{(1),i}$ denote mode-1 unfoldings
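A sketch of one such block-coordinate step, under the assumption that the mode matrices A, B, C are fixed and column-orthonormal, in which case solving for the core G reduces to an ordinary linear SVM on the projected inputs X_i x1 A^T x2 B^T x3 C^T; an off-the-shelf solver stands in for the paper's optimiser:

```python
import numpy as np
from sklearn.svm import LinearSVC

def project(X, A, B, Cm):
    """Z = X x1 A^T x2 B^T x3 C^T, flattened consistently with the reshape of G below."""
    return np.einsum('ijk,ia,jb,kc->abc', X, A, B, Cm).ravel()

def solve_core(videos, labels, A, B, Cm, C_svm=1.0):
    """One inner step of the alternation: fit the core G and bias b on projected features."""
    Z = np.stack([project(X, A, B, Cm) for X in videos])
    svm = LinearSVC(C=C_svm).fit(Z, labels)
    G = svm.coef_.reshape(A.shape[1], B.shape[1], Cm.shape[1])
    return G, svm.intercept_[0]

# The full algorithm alternates analogous SVM-like solves over A, B, C and G
# until the objective stops decreasing.
```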
Gait Recognition (USF dataset)

Probe Set | Sota (five methods) | SVMs (w vector) | STMs [16] (W tensor) | STMs (W tensor) | RMSTMs (W tensor) | StuMs (W tensor) | Σw-StuMs (W tensor)
A         | 100/100             | 80/97           | 92/100               | 99/100          | 100/100           | 99/100          | 100/100
B         | 89/90               | 79/93           | 81/90                | 85/93           | 89/97             | 85/93           | 87/95
C         | 83/88               | 68/85           | 73/88                | 79/93           | 83/95             | 79/90           | 81/91
D         | 39/55               | 30/54           | 47/67                | 53/72           | 56/75             | 53/71           | 55/74
E         | 33/55               | 23/46           | 48/79                | 62/88           | 65/91             | 63/86           | 65/90
F         | 30/46               | 24/49           | 29/49                | 41/71           | 44/74             | 42/63           | 44/66
G         | 29/48               | 12/37           | 31/71                | 50/88           | 53/90             | 52/87           | 54/90
Average   | -                   | 45/62           | 57/68                | 67/86           | 70/89             | 68/84           | 69/87

• Significant improvements in comparison to the state of the art
KTH recognition
• Input features: dense oriented gradients (at each pixel)
• Results comparable to the state of the art, using very simple features
[7] T.K. Kim and R. Cipolla, "Canonical correlation analysis of video volume tensors for action categorization and detection," IEEE PAMI, vol. 31, no. 8, pp. 1415-1428, August 2009.
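A sketch of the kind of "very simple" per-pixel feature referred to here: the frame gradient routed into a few orientation bins. The bin count and the use of np.gradient are illustrative assumptions:

```python
import numpy as np

def dense_oriented_gradients(frame, n_orient=8):
    """Return an (H, W, n_orient) map: gradient magnitude placed in its orientation bin."""
    gy, gx = np.gradient(frame.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)                  # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_orient).astype(int), n_orient - 1)
    out = np.zeros(frame.shape + (n_orient,))
    for o in range(n_orient):
        out[..., o] = np.where(bins == o, mag, 0.0)
    return out

# Stacking such maps over the frames of a clip gives the 3rd-order input tensors X_i.
```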
Conclusions
• Tensors exploit the topology of the data better than vectors
• The proposed algorithms (STuMs and Σ/Σw-STuMs) consistently outperform previous approaches, producing state-of-the-art results
Limitations:
• Good alignment of the input data is required
• More suitable for gestures than for less 'structured' actions