
N-mode Analysis (Tensor Framework), Behrouz Saghafi



  1. N-mode Analysis (Tensor Framework) Behrouz Saghafi

  2. N-mode Analysis (Tensor Framework)
     - Drawback of 1-mode analysis (e.g. PCA): it captures the variance along just a single factor.
     - Our training set contains changes in more than one factor: people, action, viewpoint, etc.
     - This motivates analysis in multiple modes.

  3. N-mode Analysis (Related work)
     - Ding and Ye [1] extend the common matrix SVD to 2D-SVD.
     - 2D-LDA has been introduced [2].
     - Vasilescu and Terzopoulos [3-5] proposed using N-mode SVD on the data tensor to decompose it into multiple factors. They applied it to face recognition and to the synthesis and recognition of human motion signatures and actions.

     [1] C. Ding and J. Ye, "Two-dimensional Singular Value Decomposition (2DSVD) for 2D Maps and Images," in SIAM Int'l Conf. Data Mining, 2005.
     [2] K. Inoue and K. Urahama, "Non-Iterative Two-Dimensional Linear Discriminant Analysis," in ICPR, 2006.
     [3] M. A. O. Vasilescu and D. Terzopoulos, "Multilinear Image Analysis for Facial Recognition," in ICPR, 2002.
     [4] M. A. O. Vasilescu and D. Terzopoulos, "Multilinear Analysis of Image Ensembles: TensorFaces," in ECCV, 2002.
     [5] M. A. O. Vasilescu, "Human Motion Signatures: Analysis, Synthesis, Recognition," in ICPR (3), 2002, pp. 456-460.

  4. Drawbacks of Vasilescu's method for action recognition
     - It uses point trajectories as features for representing actions, whereas silhouettes are more informative cues. Point trajectories require accurate and expensive tracking methods, while silhouettes can be approximated from edge maps and so are extracted more efficiently.
     - Its data tensor comprises three modes: actions, people, and joint angles. We separate the frame and pixel modes because they contain different types of information (without appreciably increasing the computational cost).
     - For action recognition, they assumed the person to be known.
     - They used a very small motion capture database comprising three simple actions: walk, ascend stairs, and descend stairs. No numerical evaluation of the results is provided.

  5. Tensors
     - Tensor: extends the concepts of vectors and matrices to higher orders, $\mathcal{A} \in \mathbb{R}^{I_1 \times \dots \times I_n \times \dots \times I_N}$.
     - The order of the tensor is $N$.
     - An element of $\mathcal{A}$ is denoted $a_{i_1 \dots i_n \dots i_N}$, where $1 \le i_n \le I_n$.
     - Mode-$n$ vectors of tensor $\mathcal{A}$: the $I_n$-dimensional vectors obtained by varying index $i_n$ while keeping the other indices fixed, or equivalently the column vectors of the matrix $\mathbf{A}_{(n)}$ that results from flattening the tensor.

  6. Flattening a tensor
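
As a rough illustration of flattening, the sketch below unfolds a small tensor with NumPy. The helper names `unfold` and `fold` are assumptions made for this example, modes are numbered from 0 rather than 1, and the column ordering is whatever NumPy's `reshape` produces, which may differ from the ordering drawn on the slide.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` along `mode`: rows index that mode, columns index all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor whose full shape is `shape`."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape([shape[mode]] + rest), 0, mode)

A = np.arange(24).reshape(2, 3, 4)  # a small order-3 tensor
A1 = unfold(A, 1)                   # 3 x 8 matrix; each column is one mode vector of A
```

Each column of `A1` is obtained by varying the index of the chosen mode while the other indices stay fixed, which matches the definition of mode vectors on the Tensors slide.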

  7. Product of a tensor by a matrix: $\mathcal{B} = \mathcal{A} \times_n \mathbf{M} \iff \mathbf{B}_{(n)} = \mathbf{M}\mathbf{A}_{(n)}$
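
The identity on this slide can be checked numerically. The following is a minimal NumPy sketch, assuming 0-based mode numbering and the hypothetical helpers `unfold`, `fold`, and `mode_product`; it computes the product through the flattened form and compares it with a direct index contraction.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` along `mode` (rows index that mode)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor whose full shape is `shape`."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape([shape[mode]] + rest), 0, mode)

def mode_product(tensor, matrix, mode):
    """Mode product: multiply the flattened tensor by `matrix`, then fold back."""
    shape = list(tensor.shape)
    shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, shape)

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4))
M = rng.standard_normal((5, 3))
B = mode_product(A, M, 1)                     # shape (2, 5, 4)
direct = np.einsum("ijk,lj->ilk", A, M)       # same product written as an index contraction
```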

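  8. N-mode SVD (HOSVD)
     - SVD: $\mathbf{D} = \mathbf{U}_1 \boldsymbol{\Sigma} \mathbf{U}_2^T$
     - SVD (in terms of mode-$n$ products): $\mathbf{D} = \boldsymbol{\Sigma} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2$
     - N-mode SVD: $\mathcal{D} = \mathcal{Z} \times_1 \mathbf{U}_1 \times_2 \mathbf{U}_2 \dots \times_n \mathbf{U}_n \dots \times_N \mathbf{U}_N$, where $\mathcal{Z}$ is the core tensor and $\mathbf{U}_1, \dots, \mathbf{U}_N$ are the mode matrices.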
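
One common way to compute this decomposition is to take the left singular vectors of each flattened matrix as the mode matrix, then project the data tensor onto them to obtain the core tensor. The sketch below follows that recipe in NumPy; the function names `hosvd` and `reconstruct` are assumptions, modes are 0-based, and the slides do not say whether the author's implementation truncates the mode matrices.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` along `mode` (rows index that mode)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply every mode vector of `tensor` along `mode` by `matrix`."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def hosvd(D):
    """Return the core tensor Z and the list of mode matrices U (no truncation)."""
    U = [np.linalg.svd(unfold(D, n), full_matrices=False)[0] for n in range(D.ndim)]
    Z = D
    for n, Un in enumerate(U):
        Z = mode_product(Z, Un.T, n)   # project mode n onto its orthonormal basis
    return Z, U

def reconstruct(Z, U):
    """Rebuild the data tensor as Z multiplied by each mode matrix in turn."""
    D = Z
    for n, Un in enumerate(U):
        D = mode_product(D, Un, n)
    return D

rng = np.random.default_rng(1)
D = rng.standard_normal((4, 3, 5))
Z, U = hosvd(D)
D_rebuilt = reconstruct(Z, U)
```

Without truncation the reconstruction is exact up to floating-point error; dropping trailing columns of a mode matrix would give a lower-dimensional approximation in that mode.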

  9. N-mode Action Video Analysis: form tensor $\mathcal{D}$ from the image ensembles
     - Scenario 1 (3 modes): Mode 1: pixels; Mode 2: actions; Mode 3: people
     - Scenario 2 (4 modes): Mode 1: pixels; Mode 2: frames; Mode 3: actions; Mode 4: people

  10. N-mode Action Video Analysis: for the 3-mode scenario

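  11. N-mode Action Video Analysis
     - N-mode SVD: $\mathcal{D} = \mathcal{Z} \times_1 \mathbf{U}_{\text{pixels}} \times_2 \mathbf{U}_{\text{frames}} \times_3 \mathbf{U}_{\text{actions}} \times_4 \mathbf{U}_{\text{people}}$
     - $\mathcal{D} = \mathcal{B} \times_3 \mathbf{U}_{\text{actions}}$, where $\mathcal{D}$ is the data tensor, $\mathcal{B}$ is the basis tensor, and $\mathbf{U}_{\text{actions}}$ is the action-space embedded matrix.
     - Basis tensor computation: $\mathcal{B} = \mathcal{Z} \times_1 \mathbf{U}_{\text{pixels}} \times_2 \mathbf{U}_{\text{frames}} \times_4 \mathbf{U}_{\text{people}} = \mathcal{D} \times_3 \mathbf{U}_{\text{actions}}^T$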
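
The two expressions for the basis tensor agree because the action mode matrix has orthonormal columns. The sketch below checks this on random data standing in for a real image ensemble; the tensor sizes and variable names are assumptions, and axes 0 to 3 play the roles of modes 1 to 4 (pixels, frames, actions, people).

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` along `mode` (rows index that mode)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply every mode vector of `tensor` along `mode` by `matrix`."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

rng = np.random.default_rng(0)
D = rng.standard_normal((6, 4, 3, 2))   # toy sizes: pixels, frames, actions, people

# Mode matrices from the SVD of each flattened matrix, then the core tensor Z.
U = [np.linalg.svd(unfold(D, n), full_matrices=False)[0] for n in range(4)]
Z = D
for n in range(4):
    Z = mode_product(Z, U[n].T, n)

# Route 1: multiply the core by every mode matrix except the action one (axis 2).
B_from_core = Z
for n in (0, 1, 3):
    B_from_core = mode_product(B_from_core, U[n], n)

# Route 2: project the data tensor onto the transposed action mode matrix.
B_from_data = mode_product(D, U[2].T, 2)
```

The second route needs only one SVD (of the action-mode flattening), which is convenient when the core tensor itself is not required.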

  12. Linear projections
     - Index into the basis tensor for a particular $t$ and $p$: $\mathcal{B}_{t,p}$
     - Flatten along the action mode: $\mathbf{B}_{t,p(\text{actions})}$
     - For training frames, $\mathbf{x}_{t,a,p} = \mathbf{B}_{t,p(\text{actions})}^{T}\,\mathbf{y}_a$, therefore $\mathbf{y}_a = \mathbf{B}_{t,p(\text{actions})}^{-T}\,\mathbf{x}_{t,a,p}$.
     - Given an unknown frame $\mathbf{x}$, project it into a set of candidate embedded vectors, one for every $t$ and $p$: $\mathbf{y}_{t,p} = \mathbf{B}_{t,p(\text{actions})}^{-T}\,\mathbf{x}$.
     - Compare each $\mathbf{y}_{t,p}$ against the learned vectors $\mathbf{y}_a$ to find the action class in a nearest-neighbor framework.
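
A minimal sketch of this projection-and-matching step follows, under several assumptions: the data tensor has axes (pixels, frames, actions, people), the learned vectors are taken to be the rows of the action mode matrix, the inverse transpose is computed as a pseudo-inverse, and Euclidean distance is used for the nearest-neighbor comparison. The function names `train` and `classify` are hypothetical, and random data stands in for real silhouettes.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten `tensor` along `mode` (rows index that mode)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply every mode vector of `tensor` along `mode` by `matrix`."""
    return np.moveaxis(np.tensordot(matrix, tensor, axes=(1, mode)), 0, mode)

def train(D):
    """Return the basis tensor B and the learned action vectors (one per row)."""
    U_actions = np.linalg.svd(unfold(D, 2), full_matrices=False)[0]
    B = mode_product(D, U_actions.T, 2)   # project the action mode of D
    return B, U_actions

def classify(x, B, Y):
    """Return the action whose learned vector is nearest to any candidate projection of x."""
    best_dist, best_action = np.inf, -1
    for t in range(B.shape[1]):
        for p in range(B.shape[3]):
            Bt = B[:, t, :, p]                     # (pixels, actions): transposed flattened slice
            y = np.linalg.pinv(Bt) @ x             # candidate embedded vector for this t and p
            dists = np.linalg.norm(Y - y, axis=1)  # distance to every learned action vector
            a = int(np.argmin(dists))
            if dists[a] < best_dist:
                best_dist, best_action = dists[a], a
    return best_action

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 4, 3, 2))   # toy sizes: pixels, frames, actions, people
B, Y = train(D)
predicted = classify(D[:, 1, 2, 0], B, Y)  # a training frame of action index 2
```

On training frames the correct candidate reproduces the learned action vector exactly, so the toy example recovers the true label; for unseen frames the match is only approximate, which is why every candidate is compared.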

  13. Experimental Results (Data Sets): Weizmann Database
     - A widely used database of reasonable size.
     - Contains ten action classes performed by nine different human subjects. Actions include bending (bend), jumping jack (jack), jumping-forward-on-two-legs (jump), jumping-in-place-on-two-legs (pjump), running (run), galloping sideways (side), skipping (skip), walking (walk), waving-one-hand (wave1), and waving-two-hands (wave2).

  14. Experimental Results (Data Sets) Weizmann Database:

  15. Experimental Results (Preprocessing)
     - We use the silhouettes provided.
     - All the silhouettes are centered and normalized to the same dimensions ($64 \times 48$).
     - We find the sequence periods using [1], which uses the absolute correlation between frames.
     - In the 3-mode scenario, we use the max period to select equal-length subsequences.
     - In the 4-mode scenario, we warp all the sequences to the same temporal duration using bicubic interpolation.

     [1] R. Cutler and L. Davis, "Robust Real-Time Periodic Motion Detection, Analysis, and Applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 781-796, 2000.
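
To illustrate the temporal warping used in the 4-mode scenario, the sketch below resamples a silhouette sequence to a fixed number of frames. It deliberately substitutes plain linear interpolation along time for the bicubic interpolation named on the slide, to stay within NumPy; the function name `warp_time`, the frame counts, and the reading of 64 x 48 as height by width are all assumptions.

```python
import numpy as np

def warp_time(frames, target_len):
    """Resample a (T, H, W) sequence to (target_len, H, W) by linear interpolation in time."""
    T = frames.shape[0]
    src = np.linspace(0, T - 1, target_len)   # fractional source position of each output frame
    lo = np.floor(src).astype(int)            # frame just before each position
    hi = np.minimum(lo + 1, T - 1)            # frame just after, clamped at the last frame
    w = (src - lo)[:, None, None]             # blend weight toward the later frame
    return (1 - w) * frames[lo] + w * frames[hi]

rng = np.random.default_rng(0)
sequence = rng.random((7, 64, 48))            # stand-in for one period of silhouettes
warped = warp_time(sequence, 10)              # every sequence now has 10 frames
```

After warping, all sequences share the same length, so the frames can serve as a separate mode of the data tensor.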

  16. Experimental Results
     - 1-mode (PCA): 77.78%
     - 3-mode: 80.25%
     - 4-mode: 85.19%
     - Niebles et al. (CVPR 2007): 72.8%
