Learning Semantic Visual Codebook for Action Recognition by Embedding into Concept Space
Behrouz Saghafi
Using Spatio-temporal Features
• Action recognition based on silhouettes or optical flow encounters difficulties with non-uniform backgrounds, severe camera jitter, and noise.
• Local spatio-temporal features are fast and easy to extract, and reliable.
Bag of Words model
• The raw features are clustered based on their appearance rather than their semantic relations.
• By utilizing the semantics, the recognition accuracy will improve.
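The classic appearance-based BoW pipeline can be sketched as below. All shapes (descriptor dimension, vocabulary size, counts) are hypothetical placeholders, not values from the slides.

```python
# Minimal sketch of the classic appearance-only BoW pipeline.
# Descriptor dimension (64), vocabulary size (20), and data are hypothetical.
import numpy as np
from scipy.cluster.vq import kmeans2, vq

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 64))   # local spatio-temporal descriptors
K = 20                                     # vocabulary size

# Cluster descriptors purely by appearance to form the visual vocabulary.
codebook, _ = kmeans2(descriptors, K, seed=0)

# Represent one video as a normalized histogram of visual-word occurrences.
video_desc = rng.normal(size=(80, 64))     # descriptors from one query video
words, _ = vq(video_desc, codebook)        # assign each descriptor to a word
hist = np.bincount(words, minlength=K).astype(float)
hist /= hist.sum()                         # BoW histogram fed to a classifier
```

Because clustering here uses only descriptor appearance, two visually different words with the same semantics land in different bins — the limitation the proposed method addresses.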
Incorporating Semantics into BoW model (Related work)
Generative methods
• Build a model for each category and fit the query to one of the models in an unsupervised framework, e.g. probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA).
• Their unsupervised nature limits their performance.
• The number of topics must equal the number of categories, which limits their efficiency.
Discriminative methods
• Try to construct a semantic vocabulary and use it with a classifier.
• Liu and Shah (CVPR 2008): maximize the mutual information between visual words and videos. > The formed clusters do not necessarily represent topics or synonymous words.
• Liu et al. (CVPR 2009): use Diffusion Maps (DM) to construct a semantic visual vocabulary. > Measuring semantic distance by connectivity is not appropriate in the presence of polysemy.
Embedding into Concept Space (Proposed)
We propose a framework for constructing a semantic visual vocabulary by computing a rich semantic space (concept space). The concept space is computed by latent semantic models or Canonical Correlation Analysis. The visual words are embedded into the concept space to form meaningful clusters representing semantic topics; consequently, the resulting histograms are more discriminative.
• As opposed to generative methods, which do not use category labels, our method trains a classifier on the training histograms.
• The number of topics can exceed the number of categories, unlike in the unsupervised framework, which allows a more detailed analysis.
• By using pLSA to construct the concept space, the problem of polysemy is handled.
Overview of the proposed framework
• Constructing the semantic visual vocabulary
• Training steps of the proposed method
Latent Semantic Analysis (LSA) (1)
• Latent Semantic Analysis (LSA), originally used in text-mining applications, factorizes the word-video co-occurrence matrix into linear subspaces of words and videos: A = U Σ Vᵀ, where A is the N × M word-video co-occurrence matrix (N words, M videos), its rows are word vectors, and its columns are video vectors.
• The word vectors reveal the semantic relations of words, since semantically synonymous words occur in similar documents.
Latent Semantic Analysis (LSA) (2)
• The word vectors are sparse, so their correlation may not be representative of their semantic relations. Therefore, we need to find a reduced-dimensional space.
• Rank-L optimal representation: A ≈ U_L Σ_L V_Lᵀ, where A is N × M (words × videos), U_L is N × L (words × topics), Σ_L is L × L (topics × topics), and V_Lᵀ is L × M (topics × videos).
• The correlation of words based on word vectors: A Aᵀ ≈ (U_L Σ_L)(U_L Σ_L)ᵀ, so the rows of U_L Σ_L are a good representation of the rows of A (the words), in the sense that they approximate the correlation between words.
Embedding into concept space using LSA
• Row i of the N × L matrix U_L Σ_L is the representation of word i in the L-dimensional concept space.
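The LSA embedding can be sketched in a few lines of numpy via truncated SVD. The co-occurrence matrix and all dimensions here are hypothetical toy values.

```python
# Sketch: embedding visual words into the LSA concept space via truncated SVD.
# The word-video co-occurrence matrix A and the sizes N, M, L are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
N, M, L = 100, 40, 10                         # words, videos, topics
A = rng.poisson(1.0, size=(N, M)).astype(float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
word_embed = U[:, :L] * s[:L]                 # rows = words in L-dim concept space

# (word_embed @ word_embed.T) is the best rank-L approximation of A @ A.T,
# so distances between rows approximate the semantic correlation of words.
```

Clustering the rows of `word_embed` (instead of raw descriptors) then yields the semantic visual vocabulary.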
Probabilistic Latent Semantic Analysis (pLSA) (1)
• Graphical model: d → z → w, where w is the observed word and z the latent topic.
• Parameters: the topic distributions per document P(z|d) and the word distributions per topic P(w|z).
Probabilistic Latent Semantic Analysis (pLSA) (2)
• The co-occurrence counts n(d, w) are known; P(z|d) and P(w|z) are unknown.
• Likelihood: L = Σ_d Σ_w n(d, w) log Σ_z P(w|z) P(z|d).
Probabilistic Latent Semantic Analysis (pLSA) (3)
• Maximum likelihood by EM:
E-step: P(z|d, w) = P(w|z) P(z|d) / Σ_z′ P(w|z′) P(z′|d)
M-step: P(w|z) ∝ Σ_d n(d, w) P(z|d, w);  P(z|d) ∝ Σ_w n(d, w) P(z|d, w)
Embedding into concept space using pLSA
• Each word i is mapped to an L-dimensional vector in the concept space, with one coordinate per latent topic.
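A minimal numpy sketch of pLSA training by EM on a toy count matrix. The data and sizes are hypothetical, and the final embedding choice — representing word i by P(z|w_i), assuming a uniform document prior — is one natural option assumed here, not necessarily the exact formula from the slides.

```python
# Sketch: pLSA via EM on a hypothetical word-video count matrix n,
# then embedding each word as its topic posterior P(z|w).
import numpy as np

rng = np.random.default_rng(2)
N, M, L = 50, 20, 5                                  # words, videos, topics
n = rng.poisson(1.0, size=(N, M)).astype(float)

p_w_z = rng.random((N, L)); p_w_z /= p_w_z.sum(0)    # P(w|z), columns sum to 1
p_z_d = rng.random((L, M)); p_z_d /= p_z_d.sum(0)    # P(z|d), columns sum to 1

for _ in range(50):
    # E-step: P(z|d,w) ∝ P(w|z) P(z|d), normalized over topics
    post = p_w_z[:, :, None] * p_z_d[None, :, :]     # shape N x L x M
    post /= post.sum(1, keepdims=True) + 1e-12
    # M-step: re-estimate parameters from expected counts
    nz = n[:, None, :] * post
    p_w_z = nz.sum(2); p_w_z /= p_w_z.sum(0)
    p_z_d = nz.sum(0); p_z_d /= p_z_d.sum(0)

# Assumed embedding: word i -> P(z|w_i), via Bayes' rule with uniform P(d).
p_z = p_z_d.sum(1) / M                               # P(z)
word_embed = p_w_z * p_z                             # ∝ P(z|w), shape N x L
word_embed /= word_embed.sum(1, keepdims=True) + 1e-12
```

Because a polysemous word can place mass on several topics, this embedding keeps its multiple senses, unlike a hard appearance cluster.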
Using LSA vs pLSA (1)
• pLSA can handle polysemy.
– Polysemes are words which have more than one meaning.
Using LSA vs pLSA (2)
• LSA can perform faster:
Mean training time (having the initial vocabulary): LSA 62 sec, pLSA 4261 sec
Mean testing time (having learned the concept space): LSA 0.54 sec, pLSA 0.71 sec
Canonical Correlation Analysis (CCA) (1)
• Given a pair of vector sets, CCA finds a direction for each set such that the projections of the vectors onto these directions have maximal correlation.
Canonical Correlation Analysis (CCA) (2)
• The canonical directions w_x, w_y maximize ρ = corr(w_xᵀ x, w_yᵀ y) = (w_xᵀ C_xy w_y) / sqrt((w_xᵀ C_xx w_x)(w_yᵀ C_yy w_y)), where C_xx, C_yy are the within-set covariances and C_xy the cross-covariance.
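The standard CCA solution can be sketched with whitening plus an SVD. The two views below are hypothetical toy data sharing a latent signal, not the features from the slides.

```python
# Generic CCA sketch (hypothetical two-view toy data): find one direction per
# view whose projections are maximally correlated, via whitening + SVD.
import numpy as np

rng = np.random.default_rng(3)
n = 200
z = rng.normal(size=(n, 1))                      # shared latent signal
X = np.hstack([z + 0.1 * rng.normal(size=(n, 1)),
               rng.normal(size=(n, 2))])         # view 1, 3-dim
Y = np.hstack([z + 0.1 * rng.normal(size=(n, 1)),
               rng.normal(size=(n, 1))])         # view 2, 2-dim

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Cxx = Xc.T @ Xc / n + 1e-6 * np.eye(3)           # regularized covariances
Cyy = Yc.T @ Yc / n + 1e-6 * np.eye(2)
Cxy = Xc.T @ Yc / n

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

# Singular values of the whitened cross-covariance = canonical correlations.
W = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
U, s, Vt = np.linalg.svd(W)
wx = inv_sqrt(Cxx) @ U[:, 0]                     # first canonical direction (view 1)
wy = inv_sqrt(Cyy) @ Vt[0]                       # first canonical direction (view 2)
rho = s[0]                                       # top canonical correlation
```

Projecting onto further singular vectors gives the remaining canonical directions, which together span the concept space.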
Embedding into concept space using CCA
• The raw feature representation is noisy; in the CCA-projected semantic representation, the noise covariance is reduced.
Constructing the semantic visual vocabulary using CCA
Local Feature Extractor
Performance of the proposed method (Latent Semantic Space) on the KTH dataset with different numbers of topics (pLSA vs. LSA)
Comparison of results with the classic framework for different sizes of vocabulary (KTH)
Confusion matrix for the best result achieved using Latent Semantic Space (KTH)
• Best recognition accuracy: 93.94% by pLSA with L = 50, Kf = 400.
Effect of changing the vocabulary size (CCA Space)
Confusion matrix for the best result on the KTH dataset using CCA
• Best recognition accuracy: 93.39% with Kf = 700.
Comparison with reported results on KTH dataset