Efficient 2D Viewpoint Combination for Human Action Recognition
Multi-view Action Recognition
• Video captures only a 2-dimensional projection, while actions truly occur in 3-dimensional world space.
• The subject may be occluded by an object or by itself (self-occlusion).
• Researchers have used multiple cameras to obtain a 3D representation of the subject (a visual hull).
Drawbacks of Visual Hulls
• A sufficient number of views is required to build a reliable visual hull.
• In carving a visual hull, some information is lost: the visual hull is only an approximation of the true 3D model.
Proposed method (1)
• We propose to extract features from each viewpoint separately and combine them efficiently, so that useful information is reinforced and redundant features are attenuated.
• We extract local features from each view; these are easy to extract and do not require segmentation.
• As opposed to Peng and Qian, who used HMMs, we use a simple bag-of-words (BOW) model, which is orderless, much easier to train and test, and usable with classifiers such as SVM.
• Instead of extracting many heterogeneous features, we focus on computing different models using different codebooks and kernel functions and combining them efficiently.
Proposed method (2)
• Multi-class recognition is done using a 1-vs-1 scheme instead of 1-vs-all, for higher precision and the ability to add a category without retraining the whole system.
• We model the same video with different histograms obtained from two local features and two vocabularies.
• Distances between histograms are measured using the HIK (Histogram Intersection Kernel) as well as an RBF (Radial Basis Function) kernel with the chi-square distance.
• We use an efficient interleaved optimization strategy to learn the optimal weights for the multiple kernels. The learned weights score each kernel by its ability to discriminate between two given categories.
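The 1-vs-1 scheme trains one binary classifier per pair of classes and predicts by majority vote. A minimal sketch of that voting step, with toy 1D decision functions standing in for the actual per-pair SVMs (class names, centers, and `pairwise_clf` are illustrative, not from the paper):

```python
from itertools import combinations

def one_vs_one_predict(x, classes, pairwise_clf):
    """Majority-vote prediction from binary 1-vs-1 classifiers.

    pairwise_clf maps a class pair (a, b) to a decision function
    returning >= 0 for class a and < 0 for class b.
    """
    votes = dict.fromkeys(classes, 0)
    for a, b in combinations(classes, 2):
        winner = a if pairwise_clf[(a, b)](x) >= 0 else b
        votes[winner] += 1
    return max(votes, key=votes.get)

# Toy example: 1D points, each class centered at its index.
classes = ["walk", "wave", "kick"]
centers = {c: float(i) for i, c in enumerate(classes)}
pairwise_clf = {
    (a, b): (lambda x, a=a, b=b: abs(x - centers[b]) - abs(x - centers[a]))
    for a, b in combinations(classes, 2)
}

print(one_vs_one_predict(0.1, classes, pairwise_clf))  # walk
```

Note the property the slide relies on: adding a new category only requires training the binary classifiers that involve that category; the existing pairwise classifiers are untouched.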
Some viewpoints are more discriminative for certain pairs of actions than for others.
Feature Types
• Separable Linear Filters: apply a Gaussian filter in the spatial domain and a quadrature pair of Gabor filters in the temporal dimension. Proposed by Dollar et al.
• Space-Time Corner Detector: an extension of the Harris corner detector, proposed by Laptev and Lindeberg.
Codebook sizes
• After extracting features, we use them to obtain a codebook for each view by applying k-means with Euclidean distance.
• We use two codebooks, one of size V and the other of size 2V.
• According to Gehler and Nowozin, adding any kernel, even an uninformative, non-discriminative one, to kernel-weight optimization methods will not reduce classification performance. In particular, when the added feature (kernel) is discriminative, classification performance increases.
• Using two different vocabulary sizes lets us model the actions at two different scales of detail.
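The codebook step above can be sketched as follows. This is a minimal, self-contained k-means plus bag-of-words quantization in NumPy (the descriptor dimensions, data, and V=8 are made up for illustration; a real pipeline would use the extracted spatio-temporal features):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means with Euclidean distance (illustrative, not production)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest codeword
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # guard against empty clusters
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def bow_histogram(desc, centers):
    """Quantize one video's descriptors against a codebook; L1-normalize."""
    d = np.linalg.norm(desc[:, None, :] - centers[None, :, :], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)
    return hist / hist.sum()

# Two codebooks of sizes V and 2V model the same video at two scales of detail.
V = 8
rng = np.random.default_rng(1)
features = rng.normal(size=(200, 16))   # stand-in for local descriptors
book_small = kmeans(features, V)
book_large = kmeans(features, 2 * V)
video = rng.normal(size=(50, 16))
h1, h2 = bow_histogram(video, book_small), bow_histogram(video, book_large)
print(h1.shape, h2.shape)  # (8,) (16,)
```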
Kernel Types
• Histogram Intersection Kernel (HIK)
• Radial Basis Function (RBF) kernel with chi-square distance
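The two kernels can be written down directly. A small sketch, assuming L1-normalized histograms; note that conventions for the chi-square distance vary by a constant factor, so the 1/2 here is one common choice rather than necessarily the paper's:

```python
import numpy as np

def hik(h1, h2):
    """Histogram Intersection Kernel: sum of element-wise minima."""
    return np.minimum(h1, h2).sum()

def chi2_rbf(h1, h2, gamma=1.0, eps=1e-10):
    """RBF kernel using the chi-square distance between histograms."""
    chi2 = 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
    return np.exp(-gamma * chi2)

a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.5, 0.3])
print(round(hik(a, b), 6))   # 0.7
print(chi2_rbf(a, a))        # 1.0 (identical histograms)
```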
Learning an efficient combination of kernels (1)
• The HIK and RBF kernels computed from the different histograms need to be combined efficiently to obtain an optimized final kernel.
• The final kernel is used with an SVM to classify the actions. The binary SVM classifier takes the form $f(x) = \operatorname{sign}\big( \sum_i \alpha_i y_i \sum_m \beta_m k_m(x_i, x) + b \big)$.
• $\beta_m$ is the kernel weight, which changes (scales) the influence of the kernel space associated with $k_m$ and, consequently, of the corresponding histogram space.
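The combination itself is a weighted sum of precomputed Gram matrices. A minimal sketch with toy random-feature kernels (the weights `beta` are placeholders for values an MKL solver would learn); the point it illustrates is that a non-negative combination of valid (PSD) kernels remains a valid SVM kernel:

```python
import numpy as np

def combine_kernels(kernels, beta):
    """Weighted sum of precomputed Gram matrices: K = sum_m beta_m * K_m."""
    K = np.zeros_like(kernels[0])
    for b, Km in zip(beta, kernels):
        K += b * Km
    return K

# Toy Gram matrices: each X @ X.T is symmetric positive semi-definite.
rng = np.random.default_rng(0)
kernels = []
for _ in range(3):
    X = rng.normal(size=(5, 3))
    kernels.append(X @ X.T)

beta = np.array([0.5, 0.3, 0.2])  # non-negative weights (would be learned)
K = combine_kernels(kernels, beta)

# Symmetric and PSD, hence usable as a precomputed SVM kernel.
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() >= -1e-9)  # True True
```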
Learning an efficient combination of kernels (2)
• We use a 1-vs-1 classification scheme and choose different weights for each binary classifier, since some of the histograms (feature spaces) may be discriminative in differentiating one pair of classes but uninformative for another pair.
• For every combination of local feature and codebook size, we incorporate only one instance of the HIK kernel and four instances of the RBF kernel with different bandwidths.
• Experimental results show that sparse methods do not perform much better than baseline methods using average weights, so we use a non-sparse, general ℓp-norm multiple kernel learning algorithm in which no feature is removed: all features participate, with different contributions. The value of p is selected empirically. Newton descent is used for optimization because it converges faster than cutting-plane methods.
MKL (sparse and non-sparse)
• The ℓp-norm refers to the norm used by the regularizer of the learning objective: ℓ1 regularization yields sparse kernel weights, while p > 1 yields non-sparse weights.
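For reference, one common formulation of ℓp-norm MKL (not necessarily the exact objective used in this work) constrains the kernel weights β through their ℓp-norm:

```latex
\min_{\beta \ge 0,\ \|\beta\|_p \le 1}\;
\min_{w,\,b}\;
\frac{1}{2}\sum_{m=1}^{M}\frac{\|w_m\|^2}{\beta_m}
+ C\sum_{i=1}^{n}\ell\!\left(y_i\Big(\sum_{m=1}^{M}\langle w_m,\phi_m(x_i)\rangle + b\Big)\right),
\qquad
\|\beta\|_p = \Big(\sum_{m=1}^{M}\beta_m^{\,p}\Big)^{1/p}.
```

With p = 1 the constraint promotes sparse weights (some kernels dropped); with p > 1 all kernels keep a nonzero contribution, matching the non-sparse behavior described above.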
IXMAS dataset: 11 actions, 10 subjects, 5 views.
Views in IXMAS dataset
Confusion matrix for the best result achieved on IXMAS (recognition accuracy: 95.8%)
Accuracy for each view (camera) of IXMAS
Best accuracy for combination of views in IXMAS
Performance of each feature type
Performance of using more codebooks
Performance of each kernel type and combination of them
Comparison of different fusion methods
Comparison of Recognition Accuracy on the IXMAS dataset (methods grouped as multi-view, visual hull, and single view)