C. Biernacki. Going further in cluster analysis and classification: Bi-clustering and co-clustering. Summer School on Clustering, Data Analysis and Visualization of Complex Data, May 2018, Catania, Italy. HAL Id: hal-01810380, https://hal.inria.fr/hal-01810380. Submitted on 7 Jun 2018.
Going further in cluster analysis and classification: Bi-clustering and co-clustering

C. Biernacki

Summer School on Clustering, Data Analysis and Visualization of Complex Data
May 21-25, 2018, University of Catania, Italy
Outline

1 HD clustering
2 Modeling
3 Estimating
4 Selecting
5 BlockCluster in MASSICCC
6 To go further
Motivation

High-dimensional (HD) data sets are now frequent:
- Marketing: d ~ 10^2
- Microarray gene expression: d ~ 10^2 to 10^4
- SNP data: d ~ 10^6
- Curves: depends on the discretization, but can be very high
- Text mining, ...

Clustering has to be applied to HD datasets for the same reasons as to lower-dimensional datasets:
- Data summary
- Data exploration
- Preprocessing for more flexibility in a forthcoming prediction step

But clustering is even more important in HD, since visualization in the HD setting can be hazardous...
Today: exponential growth of dimension

S. Alelyani, J. Tang and H. Liu (2013). Feature Selection for Clustering: A Review. Data Clustering: Algorithms and Applications, 29
HD data: definition (1/2)

An attempt in the non-parametric case: a dataset x = (x_1, ..., x_n), each observation described by d variables, is HD when n = o(e^d).

Justifications:
- To approximate within error ε a (Lipschitz) function of d variables, about (1/ε)^d evaluations on a grid are required [Bellman, 61]
- To approximate a Gaussian density with fixed Gaussian kernels up to a relative error of about 10% [Silverman, 86], the required sample size obeys
  log10 n(d) ≈ 0.6 (d − 0.25)
  For instance, n(10) ≈ 7·10^5
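As a quick sanity check on Silverman's rule of thumb, here is a minimal sketch (plain Python, no dependencies) that tabulates the sample size it prescribes:

```python
# Sample size suggested by Silverman's rule of thumb for kernel density
# estimation of a Gaussian: log10 n(d) ~ 0.6 (d - 0.25).
for d in (1, 2, 5, 10, 20):
    n = 10 ** (0.6 * (d - 0.25))
    print(f"d = {d:2d}  ->  n ~ {n:.3g}")
# d = 10 gives n ~ 7e+05, matching the figure quoted above.
```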
HD data: definition (2/2)

An attempt in the parametric case: a dataset x = (x_1, ..., x_n), each observation described by d variables, together with a model m having ν parameters, is HD when n = o(g(ν)) for a given function g.

Justification: consider a heteroscedastic Gaussian mixture with true parameter θ* and K* components, and let θ̂ denote the Gaussian MLE with K* components. Then g is linear, by the following result [Michel, 08]: there exist constants κ, A and C such that

  E_x[Hellinger²(p_{θ*}, p_{θ̂_{K*}})] ≤ C κ (ν/n) [ 2A ln d + 1 − ln( 1 ∧ (ν/n) A ln d ) ]

But ν can be high, since ν ~ d²/2 for full covariance matrices, combined with potentially large constants.
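To see why ν grows like d²/2, here is a short sketch counting the free parameters of a heteroscedastic d-variate Gaussian mixture with K components (K − 1 proportions, K mean vectors, K full covariance matrices); the function name is ours, introduced only for illustration:

```python
def n_params(K: int, d: int) -> int:
    """Free parameters of a heteroscedastic d-variate Gaussian mixture:
    (K - 1) proportions, K mean vectors, K full covariance matrices."""
    return (K - 1) + K * d + K * d * (d + 1) // 2

for d in (10, 100, 1000):
    print(f"d = {d:4d}  ->  nu = {n_params(2, d):,}")  # grows like K * d^2 / 2
```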
HD density estimation: curse

A two-component d-variate Gaussian mixture:
  π_1 = π_2 = 1/2,  X_1 | z_11 = 1 ~ N_d(0, I),  X_1 | z_12 = 1 ~ N_d(1, I)

Components are more and more separated as d grows: ‖µ_2 − µ_1‖_I = √d...

[Figure: left, a sample from the two components in the (x1, x2) plane; right, Kullback-Leibler divergence of the estimated density, increasing with d]

...but density estimation quality decreases with d.
HD clustering: blessing (1/2)

The same two-component d-variate Gaussian mixture:
  π_1 = π_2 = 1/2,  X_1 | z_11 = 1 ~ N_d(0, I),  X_1 | z_12 = 1 ~ N_d(1, I)

Each variable provides an equal, own amount of separation information.

The theoretical error decreases as d grows: err_theo = Φ(−√d / 2)...

[Figure: left, a sample in the (x1, x2) plane; right, empirical and theoretical error rates, both decreasing with d]

...and the empirical error rate also decreases with d!
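A minimal simulation reproducing this blessing effect (assuming NumPy and SciPy are available). The decision rule used here, assign to the second component when the coordinate sum exceeds d/2, is the Bayes rule for this equal-prior, identity-covariance setting; the slide's empirical curve additionally involves estimating the mixture, but it shows the same trend:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 20000
for d in (1, 2, 5, 10):
    z = rng.integers(0, 2, n)                      # true component labels
    x = rng.standard_normal((n, d)) + z[:, None]   # N_d(0, I) or N_d(1, I)
    z_hat = (x.sum(axis=1) > d / 2).astype(int)    # Bayes rule for equal priors
    emp, theo = (z_hat != z).mean(), norm.cdf(-np.sqrt(d) / 2)
    print(f"d = {d:2d}  empirical = {emp:.3f}  theoretical = {theo:.3f}")
```

Both columns shrink together as d grows, matching err_theo = Φ(−√d / 2).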
HD clustering: blessing (2/2)

[Figure: data projected on the first two FDA axes, for d = 2, d = 20, d = 200 and d = 400]
HD clustering: curse (1/2)

Many variables provide no separation information. Same parameter setting except:
  X_1 | z_12 = 1 ~ N_d((1, 0, ..., 0)′, I)

Groups are not more separated as d grows: ‖µ_2 − µ_1‖_I = 1...

[Figure: left, a sample in the (x1, x2) plane; right, theoretical error constant and empirical error increasing with d]

...thus the theoretical error is constant (= Φ(−1/2)) and the empirical error increases with d (a simulation sketch of this mechanism follows after the next slide).
HD clustering: curse (2/2)

Many variables provide redundant separation information. Same parameter setting except:
  X_1^j = X_1^1 + N_1(0, 1)  (j = 2, ..., d)

Groups are not more separated as d grows: ‖µ_2 − µ_1‖_Σ = 1...

[Figure: left, a sample in the (x1, x2) plane; right, theoretical error constant and empirical error increasing, more slowly than in the previous case, with d]

...thus err_theo is constant (= Φ(−1/2)) and the empirical error increases (less) with d.
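Both curse settings rest on the same mechanism: the best achievable error stays at Φ(−1/2) while estimation noise accumulates across the uninformative (or redundant) variables. Here is a hedged sketch of the noise-variable case, using estimated centroids in place of a full mixture fit for brevity (the slide's curves come from fitting the mixture, which behaves similarly; no guard against degenerate draws):

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test = 50, 20000

def sample(n, d):
    z = rng.integers(0, 2, n)
    x = rng.standard_normal((n, d))
    x[z == 1, 0] += 1.0            # only the first variable carries separation
    return x, z

for d in (1, 5, 20, 100):
    x_tr, z_tr = sample(n_train, d)
    x_te, z_te = sample(n_test, d)
    m0, m1 = x_tr[z_tr == 0].mean(0), x_tr[z_tr == 1].mean(0)  # noisy centroids
    z_hat = np.linalg.norm(x_te - m1, axis=1) < np.linalg.norm(x_te - m0, axis=1)
    print(f"d = {d:3d}  empirical error = {(z_hat != z_te).mean():.3f}")
# The error climbs toward 0.5 with d while the Bayes error stays at Phi(-1/2) ~ 0.31.
```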
The trade-off bias/variance

The fundamental statistical principle: always minimize an error err between the truth (z) and the estimate (ẑ).

Gap between the true partition (z) and the model-based partitions (Z_p):
  z* = argmin_{z̃ ∈ Z_p} ∆(z, z̃)

Estimation ẑ of z* in Z_p: by any relevant method (bias, consistency, efficiency...).

Fundamental decomposition of the observed error err(z, ẑ):
  err(z, ẑ) = err(z, z*) + [ err(z, ẑ) − err(z, z*) ]
            = bias + variance
            = error of approximation + error of estimation
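A toy numerical illustration of this decomposition, with the misclassification rate (up to label switching) playing the role of err; the three partitions below are made up purely for illustration:

```python
import numpy as np

def err(a, b):
    """Misclassification rate between two 2-class partitions, up to label switching."""
    a, b = np.asarray(a), np.asarray(b)
    return min((a != b).mean(), (a != 1 - b).mean())

z      = np.array([0, 0, 0, 1, 1, 1, 1, 0])   # true partition
z_star = np.array([0, 0, 0, 1, 1, 1, 1, 1])   # best partition reachable in the model
z_hat  = np.array([0, 1, 0, 1, 1, 0, 1, 1])   # estimated partition
bias, total = err(z, z_star), err(z, z_hat)
print(f"bias = {bias:.3f}  variance = {total - bias:.3f}  total = {total:.3f}")
```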
Bias/variance in HD: reduce variance, accept bias

A two-component d-variate Gaussian mixture with intra-dependency:
  π_1 = π_2 = 1/2,  X_1 | z_11 = 1 ~ N_d(0, Σ),  X_1 | z_12 = 1 ~ N_d(1, Σ)

Each variable provides an equal, own amount of separation information.

- The theoretical error decreases as d grows: err_theo = Φ(−‖µ_2 − µ_1‖_{Σ^{-1}} / 2)
- The empirical error rate with the (true) intra-correlated model gets worse with d
- The empirical error rate with the (false) intra-independent model gets better with d!

[Figure: left, empirical (correlated and independent models) and theoretical error rates vs d; right, a sample in the (x1, x2) plane]
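A hedged sketch of this variance/bias swap (not the slide's exact experiment): with few observations and a truly equicorrelated Σ, the discriminant rule built on the misspecified diagonal (independence) covariance estimate typically beats the one built on the full estimate; exact numbers vary with the seed:

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, d = 40, 20000, 20
Sigma = 0.5 * np.eye(d) + 0.5 * np.ones((d, d))   # equicorrelated, unit variances
L = np.linalg.cholesky(Sigma)

def sample(n):
    z = rng.integers(0, 2, n)
    x = rng.standard_normal((n, d)) @ L.T + z[:, None]   # means 0 and 1
    return x, z

x_tr, z_tr = sample(n_train)
x_te, z_te = sample(n_test)
m0, m1 = x_tr[z_tr == 0].mean(0), x_tr[z_tr == 1].mean(0)
resid = np.vstack([x_tr[z_tr == 0] - m0, x_tr[z_tr == 1] - m1])
S_full = np.cov(resid, rowvar=False)              # true (correlated) model: many params
S_diag = np.diag(np.diag(S_full))                 # false independence model: few params
for name, S in (("full", S_full), ("diagonal", S_diag)):
    w = np.linalg.solve(S, m1 - m0)               # estimated discriminant direction
    z_hat = (x_te - (m0 + m1) / 2) @ w > 0
    print(f"{name:8s}  empirical error = {(z_hat != z_te).mean():.3f}")
```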
Some alternatives for reducing variance

- Dimension reduction in a non-canonical space (typically PCA-like)
- Dimension reduction in the canonical space (variable selection)
- Model parsimony in the initial HD space (constraints on model parameters)

But which kind of parsimony? Remember that clustering is a way of dealing with large n: why not reuse this idea for large d?

Co-clustering: it makes row clustering parsimonious through variable clustering (a toy sketch of the idea follows).
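To convey the idea only, here is a naive alternating scheme in the spirit of double k-means. This is not the latent block model that BlockCluster actually fits (that is presented in the next sections), and all names here are ours:

```python
import numpy as np

def naive_cocluster(x, K, L, n_iter=20, seed=0):
    """Alternate k-means-style updates on rows and columns: each row is
    summarized by its L column-block means (and symmetrically), so row
    clustering works in L dimensions instead of d. No guard against empty
    clusters: a sketch, not production code."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    z = rng.integers(0, K, n)          # row cluster labels
    w = rng.integers(0, L, d)          # column cluster labels
    for _ in range(n_iter):
        rows = np.stack([x[:, w == l].mean(1) for l in range(L)], axis=1)  # (n, L)
        rc = np.stack([rows[z == k].mean(0) for k in range(K)])            # (K, L)
        z = ((rows[:, None, :] - rc[None]) ** 2).sum(-1).argmin(1)
        cols = np.stack([x[z == k].mean(0) for k in range(K)], axis=1)     # (d, K)
        cc = np.stack([cols[w == l].mean(0) for l in range(L)])            # (L, K)
        w = ((cols[:, None, :] - cc[None]) ** 2).sum(-1).argmin(1)
    return z, w

# Usage on a toy matrix with a planted 2 x 2 block structure:
x = np.random.default_rng(2).standard_normal((100, 60))
x[:50, :30] += 2.0
z, w = naive_cocluster(x, K=2, L=2)
```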
From clustering to co-clustering [Govaert, 2011]
Bi-clustering

A generalization of co-clustering: look for submatrices of x which are homogeneous. We do not consider bi-clustering here.