EE613 Machine Learning for Engineers SUBSPACE CLUSTERING Sylvain Calinon Robot Learning & Interaction Group Idiap Research Institute Oct. 25, 2017 1
SUBSPACE CLUSTERING (Wed, Oct. 25) HIDDEN MARKOV MODELS (Wed, Nov. 1) LINEAR REGRESSION (Thu, Nov. 9) GAUSSIAN MIXTURE REGRESSION (Wed, Dec. 13) GAUSSIAN PROCESS REGRESSION (Wed, Dec. 20) Time series analysis and synthesis, Multivariate data processing 2
Outline • High-dimensional data clustering (HDDC) Matlab code: demo_HDDC01.m • Mixture of factor analyzers (MFA) Matlab code: demo_MFA01.m • Mixture of probabilistic principal component analyzers (MPPCA) Matlab code: demo_MPPCA01.m • GMM with semi-tied covariance matrices Matlab code: demo_semitiedGMM01.m 3
Introduction K clusters N datapoints D dimensions (original space) d dimensions (latent space) Subspace clustering aims at clustering data while reducing the dimension of each cluster (cluster-dependent subspace). Treating the two problems separately (clustering, then subspace projection) can be inefficient and can yield poor local optima, especially for high-dimensional datapoints. 4
Example of application: Whole-body motion Image: Dominici et al. (2010), J NEUROPHYSIOL About 90% of the variance in walking motion can be explained by 2 principal components. Each type of periodic motion (e.g., walking, running) can be characterized by a different subspace. This requires clustering of the complete motion into different locomotion phases, and extraction of coordination patterns for each cluster. 5
Curse of dimensionality in GMM encoding K clusters N datapoints D dimensions (original space) d dimensions (latent space) Image: datasciencecentral.com 6
Curse of dimensionality Some characteristics of high-dimensional spaces can ease the classification of data. Indeed, having different groups living in different subspaces may be a useful property for discriminating the groups. Subspace clustering exploits the phenomenon that high-dimensional spaces are mostly empty to ease the discrimination between groups of points. Curse of dimensionality or… blessing of dimensionality? 7
Curse of dimensionality N datapoints, D dimensions (original space), d dimensions (latent space) Bouveyron and Brunet (2014, COMPUT STAT DATA AN) reviewed various ways of handling high-dimensional data in clustering problems: 1. Since D is too large w.r.t. N, a global dimensionality reduction should be applied as a pre-processing step to reduce D (see the sketch below). 2. Since D is too large w.r.t. N, the solution space contains many poor local optima. The solution space should be smoothed by introducing ridge or lasso regularization in the estimation of the covariances (avoiding numerical problems and singular solutions when inverting the covariances). A simple form of regularization can be applied after the maximization step of each EM loop. 3. Since D is too large w.r.t. N, the model is probably over-parametrized, and a more parsimonious model should be used (thus estimating fewer parameters). 8
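As an illustration of strategy 1 above, a minimal Matlab sketch of global dimensionality reduction by PCA as a pre-processing step (a generic sketch with placeholder data and illustrative variable names, not part of the demo scripts):

```matlab
% Global PCA pre-processing: project D-dimensional data onto the d
% leading principal components before clustering.
D = 50; N = 200; d = 5;                % illustrative sizes
X = randn(D, N);                       % placeholder data, one datapoint per column
Xc = X - repmat(mean(X,2), 1, N);      % center the data
[V, E] = eig(cov(Xc'));                % eigendecomposition of the D x D covariance
[~, idx] = sort(diag(E), 'descend');   % order eigenvectors by decreasing eigenvalue
A = V(:, idx(1:d));                    % D x d projection matrix
Z = A' * Xc;                           % d x N data fed to the clustering algorithm
```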
Gaussian Mixture Model (GMM) K Gaussians N datapoints of dimension D Equidensity contour of one standard deviation 9
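As a reminder of the model underlying these slides, a minimal Matlab sketch that evaluates the GMM density p(x) = sum_k pi_k N(x | mu_k, Sigma_k) at a set of datapoints (the parameter values below are placeholders, not the ones used in the figures):

```matlab
% Evaluate a K-component GMM density at N datapoints (columns of X).
D = 2; K = 2; N = 100;
X = randn(D, N);                              % placeholder datapoints
Prior = [0.6, 0.4];                           % mixing coefficients pi_k
Mu = [0 3; 0 -1];                             % D x K centers
Sigma = cat(3, eye(D), [2 0.5; 0.5 1]);       % D x D x K covariances
L = zeros(K, N);
for k = 1:K
    Xc = X - repmat(Mu(:,k), 1, N);           % centered datapoints
    L(k,:) = Prior(k) * exp(-0.5 * sum((Sigma(:,:,k) \ Xc) .* Xc, 1)) ...
             / sqrt((2*pi)^D * det(Sigma(:,:,k)));
end
p = sum(L, 1);                                % GMM density at each datapoint
```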
Covariance structures in GMM 10
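The classical covariance structures trade flexibility against the number of free parameters; a minimal Matlab sketch, assuming the usual full/diagonal/isotropic family and maximum-likelihood estimates for a single component on placeholder data:

```matlab
% Classical covariance structures for one Gaussian component, estimated
% from placeholder datapoints (columns of X): full, diagonal, isotropic.
D = 3; N = 200;
X = randn(D, N);
Sigma_full = cov(X');                           % D*(D+1)/2 free parameters
Sigma_diag = diag(diag(Sigma_full));            % D free parameters
Sigma_iso  = mean(diag(Sigma_full)) * eye(D);   % 1 free parameter
```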
Multivariate normal distribution - Stochastic sampling 11
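A minimal Matlab sketch of stochastic sampling from a multivariate normal distribution via the Cholesky factor of the covariance (Mu and Sigma below are illustrative values):

```matlab
% Draw N samples from N(Mu, Sigma) using the Cholesky factor of Sigma.
D = 3; N = 500;
Mu = zeros(D, 1);                        % illustrative mean
Sigma = eye(D) + 0.5 * ones(D);          % illustrative covariance (must be SPD)
A = chol(Sigma, 'lower');                % Sigma = A * A'
X = repmat(Mu, 1, N) + A * randn(D, N);  % each column is one sample
```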
Expectation-maximization (EM) 12
Expectation-maximization (EM) M-step Converge? Stop Initial guess E-step 13
EM for GMM 14
EM for GMM 15
EM for GMM 16
EM for GMM 17
EM for GMM: Resulting procedure K Gaussians N datapoints These results can be intuitively interpreted in terms of normalized counts. EM provides a systematic approach to derive such a procedure. Weighted averages take into account the responsibility of each datapoint in each cluster. 18
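A compact Matlab sketch of the resulting procedure (E-step responsibilities, then M-step normalized counts and weighted averages); it uses placeholder data and a naive random initialization, and is only an illustration of the update rules, not the demo code:

```matlab
% EM for a GMM: E-step responsibilities, M-step normalized counts and
% responsibility-weighted averages (placeholder data, naive initialization).
D = 2; K = 2; N = 200; nbIter = 50;
X = [randn(D, N/2), randn(D, N/2) + 3];          % placeholder data (D x N)
Prior = ones(1, K) / K;                          % initial mixing coefficients
idx = randperm(N);
Mu = X(:, idx(1:K));                             % K datapoints as initial centers
Sigma = repmat(cov(X'), [1 1 K]);                % shared initial covariances
for it = 1:nbIter
    % E-step: responsibility of component k for datapoint n
    L = zeros(K, N);
    for k = 1:K
        Xc = X - repmat(Mu(:,k), 1, N);
        L(k,:) = Prior(k) * exp(-0.5 * sum((Sigma(:,:,k) \ Xc) .* Xc, 1)) ...
                 / sqrt((2*pi)^D * det(Sigma(:,:,k)));
    end
    Gamma = L ./ repmat(sum(L,1), K, 1);
    % M-step: normalized counts and weighted averages
    for k = 1:K
        Nk = sum(Gamma(k,:));
        Prior(k) = Nk / N;
        Mu(:,k) = X * Gamma(k,:)' / Nk;
        Xc = X - repmat(Mu(:,k), 1, N);
        Sigma(:,:,k) = (Xc .* repmat(Gamma(k,:), D, 1)) * Xc' / Nk + 1e-6*eye(D);
    end
end
```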
EM for GMM 19
EM for GMM: Local optima issue 20
Local optima in EM EM improves the likelihood at each iteration, but it can get trapped in poor local optima of the solution space. Parameter initialization is important! Log-likelihood Unknown solution space Parameter space 21
Parameter estimation in GMM… in 1893 54 pages! Proposed solution: a moment-based approach requiring the solution of a polynomial of degree 9… …which does not mean that moment-based approaches are old-fashioned! They are popular again today, with new developments related to spectral decompositions.
High-dimensional data clustering (HDDC) Matlab code: demo_HDDC01.m [C. Bouveyron and C. Brunet. Model-based clustering of high-dimensional data: A review. Computational Statistics and Data Analysis, 71:52 – 78, March 2014] 23
Curse of dimensionality Bouveyron and Brunet (2014, COMPUT STAT DATA AN) reviewed various ways of viewing and coping with the problem of high-dimensional data in clustering problems: 1. Since D is too large w.r.t. N, a global dimensionality reduction should be applied as a pre-processing step to reduce D. 2. Since D is too large w.r.t. N, the solution space contains many poor local optima; the solution space should be smoothed by introducing ridge or lasso regularization in the estimation of the covariances (avoiding numerical problems and singular solutions when inverting the covariances). A simple form of regularization can be applied after the maximization step of each EM loop. 3. Since D is too large w.r.t. N, the model is probably over-parametrized, and a more parsimonious model should be used (thus estimating fewer parameters). 24
Regularization of the GMM parameters The introduction of a regularization term can change the shape of the solution space Log-likelihood Unknown solution space Parameter space 25
Regularization of the GMM parameters Regularization with minimal admissible eigenvalue: Tikhonov regularization with diagonal isotropic covariance:
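A minimal Matlab sketch of these two regularization schemes, applied to one covariance matrix after the M-step; lambda_min and rho are user-chosen constants with illustrative values here:

```matlab
% Two simple regularizations of a covariance matrix after the M-step.
Sigma = cov(randn(100, 5));                 % placeholder covariance estimate
% 1) Regularization with minimal admissible eigenvalue: floor the
%    eigenvalues of Sigma at lambda_min and reconstruct the matrix.
lambda_min = 1e-2;                          % illustrative threshold
[V, E] = eig(Sigma);
Sigma_reg1 = V * diag(max(diag(E), lambda_min)) * V';
% 2) Tikhonov regularization with a diagonal isotropic term: add a
%    scaled identity matrix to the covariance.
rho = 1e-2;                                 % illustrative regularization weight
Sigma_reg2 = Sigma + rho * eye(size(Sigma,1));
```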
High-dimensional data clustering (HDDC) 27
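The core idea of the HDDC covariance model can be pictured for one cluster as follows: keep the d leading eigenvalues of the cluster covariance and replace the remaining D-d eigenvalues by their average, which acts as a common noise term outside the cluster-specific subspace. A minimal Matlab sketch with a placeholder covariance and an illustrative intrinsic dimension d:

```matlab
% HDDC-style reconstruction of one cluster covariance: keep the d leading
% eigenvalues a_1..a_d and replace the D-d remaining eigenvalues by their
% average b (common noise variance outside the cluster subspace).
Sigma = cov(randn(200, 6));                 % placeholder D x D cluster covariance
D = size(Sigma, 1); d = 2;                  % illustrative intrinsic dimension
[Q, E] = eig(Sigma);
[lambda, idx] = sort(diag(E), 'descend');
Q = Q(:, idx);
a = lambda(1:d);                            % variances inside the cluster subspace
b = mean(lambda(d+1:end));                  % common variance outside the subspace
Sigma_hddc = Q * diag([a; b * ones(D-d, 1)]) * Q';
```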
Mixture of factor analyzers (MFA) Matlab code: demo_MFA01.m [P. D. McNicholas and T. B. Murphy. Parsimonious Gaussian mixture models. Statistics and Computing, 18(3):285 – 296, September 2008] 28
Mixture of factor analyzers (MFA) 29
Mixture of factor analyzers (MFA) 30
Mixture of factor analyzers (MFA): graphical model 31
Mixture of factor analyzers (MFA) 32
Mixture of factor analyzers (MFA) 33
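A minimal Matlab sketch of the generative view of one factor analyzer component, x = mu + Lambda*z + eps with z ~ N(0, I_d) and eps ~ N(0, Psi), Psi diagonal, so that the marginal covariance is Sigma = Lambda*Lambda' + Psi (all values below are illustrative):

```matlab
% Generative view of one factor analyzer component:
% x = mu + Lambda*z + eps, with z ~ N(0, I_d), eps ~ N(0, Psi), Psi diagonal,
% so that the marginal covariance of x is Sigma = Lambda*Lambda' + Psi.
D = 5; d = 2; N = 1000;                       % illustrative dimensions
mu = zeros(D, 1);
Lambda = randn(D, d);                         % factor loading matrix (D x d)
Psi = diag(0.1 * ones(D, 1));                 % diagonal noise covariance
Z = randn(d, N);                              % latent factors
Eps = repmat(sqrt(diag(Psi)), 1, N) .* randn(D, N);   % diagonal Gaussian noise
X = repmat(mu, 1, N) + Lambda * Z + Eps;      % samples in the original space
Sigma = Lambda * Lambda' + Psi;               % marginal covariance of x
```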
Estimation of parameters in MFA 34
Alternating Expectation Conditional Maximization (AECM) 35
AECM for MFA (UUU model in McNicholas and Murphy, 2008) covariance as in GMM 36
AECM for MFA (UUU model in McNicholas and Murphy, 2008) Same as standard GMM 37 covariance as in GMM
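For one component, the loading and noise updates can be sketched from the responsibility-weighted sample covariance S using the classical factor-analysis EM expressions; note that this is a generic sketch, and that the AECM cycles of McNicholas and Murphy (2008) organize these computations differently:

```matlab
% Loading and noise update for one MFA component, given the
% responsibility-weighted sample covariance S of that component (D x D),
% written with the classical factor-analysis EM expressions.
S = cov(randn(500, 4));                       % placeholder weighted covariance
D = size(S, 1); d = 2;
Lambda = randn(D, d); Psi = eye(D);           % current estimates
Beta = Lambda' / (Lambda * Lambda' + Psi);                  % d x D
Lambda_new = (S * Beta') / (eye(d) - Beta*Lambda + Beta*S*Beta');
Psi_new = diag(diag(S - Lambda_new * Beta * S));            % keep diagonal only
```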
Mixture of probabilistic PCA (MPPCA) Matlab code: demo_MPPCA01.m [M. E. Tipping and C. M. Bishop. Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443 – 482, 1999] 38
Mixture of probabilistic PCA (MPPCA) covariance as in GMM 39
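In MPPCA, each component covariance has the form Sigma_k = W_k*W_k' + sigma_k^2*I, and Tipping and Bishop (1999) give a closed-form solution in terms of the eigendecomposition of the (responsibility-weighted) sample covariance. A minimal Matlab sketch for one component on a placeholder covariance:

```matlab
% PPCA solution for one component, given its (responsibility-weighted)
% sample covariance S: Sigma = W*W' + sigma2*I, with W built from the d
% leading eigenvectors and sigma2 the average of the discarded eigenvalues.
S = cov(randn(300, 5));                        % placeholder D x D covariance
D = size(S, 1); d = 2;
[U, E] = eig(S);
[lambda, idx] = sort(diag(E), 'descend');
U = U(:, idx);
sigma2 = mean(lambda(d+1:end));                % noise variance estimate
W = U(:, 1:d) * diag(sqrt(lambda(1:d) - sigma2));
Sigma_mppca = W * W' + sigma2 * eye(D);
```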
A taxonomy of parsimonious GMMs (the data dimension is denoted D in the slides of this lecture) [C. Bouveyron and C. Brunet. Model-based clustering of high-dimensional data: A review. Computational Statistics and Data Analysis, 71:52–78, March 2014] 40
GMM with semi-tied covariance matrices Matlab code: demo_semitiedGMM01.m [M. J. F. Gales. Semi-tied covariance matrices for hidden Markov models. IEEE Trans. on Speech and Audio Processing, 7(3):272 – 281, 1999] 41
Sharing of parameters in mixture models 42
GMM with semi-tied covariance matrices (shared transform H) 43
GMM with semi-tied covariance matrices 44
GMM with semi-tied covariance matrices 45
GMM with semi-tied covariance matrices covariance as in GMM 46
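In the semi-tied model, all components share a common non-diagonal transform H and keep their own diagonal part, Sigma_k = H * SigmaDiag_k * H'. A minimal Matlab sketch of how such covariances are composed (the iterative row-by-row estimation of H from Gales (1999) is not shown):

```matlab
% Semi-tied covariance structure: all components share a common
% non-diagonal transform H and keep their own diagonal part, so that
% Sigma_k = H * SigmaDiag_k * H'. Only the composition is shown here;
% estimating H requires the iterative update of Gales (1999).
D = 3; K = 2;
H = orth(randn(D));                            % illustrative shared transform
Sigma = zeros(D, D, K);
for k = 1:K
    SigmaDiag = diag(rand(D, 1) + 0.1);        % component-specific diagonal part
    Sigma(:,:,k) = H * SigmaDiag * H';
end
```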
Summary of relevant covariance structures (with shared transform H) 47
Main references
Parsimonious GMM: C. Bouveyron and C. Brunet. Model-based clustering of high-dimensional data: A review. Computational Statistics and Data Analysis, 71:52–78, March 2014. P. D. McNicholas and T. B. Murphy. Parsimonious Gaussian mixture models. Statistics and Computing, 18(3):285–296, September 2008.
MFA: G. J. McLachlan, D. Peel, and R. W. Bean. Modelling high-dimensional data by mixtures of factor analyzers. Computational Statistics and Data Analysis, 41(3-4):379–388, 2003. G. E. Hinton, P. Dayan, and M. Revow. Modeling the manifolds of images of handwritten digits. IEEE Trans. on Neural Networks, 8(1):65–74, 1997.
MPPCA: M. E. Tipping and C. M. Bishop. Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2):443–482, 1999.
GMM with semi-tied covariances: M. J. F. Gales. Semi-tied covariance matrices for hidden Markov models. IEEE Trans. on Speech and Audio Processing, 7(3):272–281, 1999. 48
General textbooks 49