
Unifying Data Units and Models in (Co-)Clustering, by C. Biernacki



  1. Unifying Data Units and Models in (Co-)Clustering
     C. Biernacki, joint work with A. Lourme
     24th meeting of the Société Francophone de Classification, 28-30 June 2017, Lyon, France
     Sections: Introduction, Units in model-based clustering, Units in model-based co-clustering, Conclusion

  2. Quiz!
     $y = \beta x^2 + e$
     - Is it a linear regression on the covariate $x^2$?
     - Is it a quadratic regression on the covariate $x$?
     Both!
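
The two readings of the quiz can be checked numerically. The sketch below is only an illustration on simulated data: the value of `beta_true`, the noise level and the sample size are assumptions, not taken from the slides. It fits the same coefficient once as a linear regression on the transformed covariate $x^2$ and once through the closed-form least-squares solution of the quadratic model in $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
beta_true = 1.5  # hypothetical value, chosen only for the illustration
y = beta_true * x**2 + rng.normal(scale=0.1, size=x.size)

# Reading 1: linear regression (no intercept) on the covariate w = x^2.
w = x**2
beta_linear = np.linalg.lstsq(w[:, None], y, rcond=None)[0][0]

# Reading 2: quadratic regression on x, solved in closed form:
# argmin_b sum_i (y_i - b * x_i^2)^2 = sum(x^2 * y) / sum(x^4).
beta_quadratic = np.sum(x**2 * y) / np.sum(x**4)

print(beta_linear, beta_quadratic)
```

Both estimates coincide: the "model" did not change, only the unit in which the covariate is expressed.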

  3. Take-home message
     Units are entirely interrelated with models.
     This part:
     - Be aware that the interpretation of (“classical”) models is unit dependent
     - Models should even be revisited as a couple units × “classical” models
     - Opportunity for a cheap, wide and meaningful enlargement of “classical” model families
     - Focus on model-based (co-)clustering, but with a larger potential impact

  4. Outline
     1 Introduction
     2 Units in model-based clustering
       - Scale units and parsimonious Gaussians
       - Non scale units and Gaussians
       - Class conditional units and Gaussians
       - Units and Poissons
     3 Units in model-based co-clustering
       - Model for different kinds of data
       - Units and Bernoulli
       - Units and multinomial
     4 Conclusion
       - Summary
       - Units and other distributions

  5. General (model-based) statistical framework
     Data:
     - Whole data set composed of $n$ objects, described by $d$ variables: $x = (x_1, \dots, x_n)$ with $x_i = (x_{i1}, \dots, x_{id}) \in \mathcal{X}$
     - Each $x_i$ value is provided with a unit id
     - We write “id” since units are often user defined (a kind of canonical unit)
     Model:
     - A pdf¹ family, indexed by $m \in \mathcal{M}$²: $p_m = \{\cdot \in \mathcal{X} \mapsto p(\cdot; \theta) : \theta \in \Theta_m\}$
     - With $p(\cdot; \theta)$ a (parametric) pdf and $\Theta_m$ the space in which this parameter lives
     Target:
     - $\widehat{\text{target}} = f(x, p_m)$
     The unit id is hidden everywhere and could have consequences on the target estimation!
     ¹ probability density function
     ² Often, the index $m$ is confounded with the distribution family itself as a shortcut

  6. Changing the data units
     Principle of a data unit transformation $u$:
     $u : \mathcal{X} = \mathcal{X}^{\mathrm{id}} \to \mathcal{X}^{u}$, $x = x^{\mathrm{id}} = \mathrm{id}(x) \mapsto x^{u} = u(x)$
     - $u$ is a bijective mapping, so that the whole information of the data set is preserved
     - We denote by $u^{-1}$ the reciprocal of $u$, so $u^{-1} \circ u = \mathrm{id}$
     - Thus, id is only a particular unit $u$
     - Often a meaningful restriction³ on $u$: it proceeds row by row (individuals) and column by column (variables):
       $u(x) = (u(x_1), \dots, u(x_n))$ with $u(x_i) = (u_1(x_{i1}), \dots, u_d(x_{id}))$
     - Advantage: the variable definition is respected, only its unit is transformed
     - $u(x_i)$ means $u$ applied to the data set restricted to the single individual $i$
     - $u_j$ is the specific (bijective) unit transformation associated with variable $j$
     ³ This restriction can be relaxed, for instance to include the linear transformations involved in PCA (principal component analysis), but then the variable definition is no longer respected.
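
A variable-by-variable unit change of this kind is easy to sketch in code. The example below is a minimal illustration, not the authors' implementation: the data values, the helper `apply_unit` and the two chosen maps (minutes to seconds, and a log unit) are all assumptions made for the example. Each variable $j$ gets its own bijection $u_j$, stored together with its reciprocal so that $u^{-1} \circ u = \mathrm{id}$ can be verified.

```python
import numpy as np

# Hypothetical data set: n = 3 individuals, d = 2 positive variables.
x_id = np.array([[3.6, 79.0],
                 [1.8, 54.0],
                 [3.3, 74.0]])

# One bijective map u_j per variable, stored with its reciprocal.
units = [
    (lambda v: 60.0 * v, lambda v: v / 60.0),  # u_1: minutes -> seconds (scale unit)
    (np.log, np.exp),                          # u_2: log unit (non-scale unit)
]

def apply_unit(x, maps):
    """Apply one map per column, so that every row x_i is sent to u(x_i)."""
    return np.column_stack([f(x[:, j]) for j, f in enumerate(maps)])

x_u = apply_unit(x_id, [forward for forward, _ in units])
x_back = apply_unit(x_u, [inverse for _, inverse in units])

print(np.allclose(x_back, x_id))  # u^{-1} o u = id
```

Storing each map with its inverse makes the bijectivity requirement explicit: a transformation without a reciprocal (for example squaring a signed variable) would lose information and is excluded.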

  7. Revisiting units as a modelling component
     - Explicitly exhibiting the “canonical” unit id in the model:
       $p_m = \{\cdot \in \mathcal{X} \mapsto p(\cdot; \theta) : \theta \in \Theta_m\} = \{\cdot \in \mathcal{X}^{\mathrm{id}} \mapsto p(\cdot; \theta) : \theta \in \Theta_m\} = p_m^{\mathrm{id}}$
     - Thus the variable space and the probability measure are embedded
     - As in standard probability theory: a couple (variable space, probability measure)!
     - Changing id into $u$, while preserving $m$, is expected to produce a new model:
       $p_m^{u} = \{\cdot \in \mathcal{X}^{u} \mapsto p(\cdot; \theta) : \theta \in \Theta_m\}$
     A model should be systematically defined by a couple $(u, m)$, denoted by $p_m^{u}$

  8. Interpretation and identifiability of $p_m^{u}$
     - Standard probability theory (again): there exists a measure $u^{-1}(m)$ such that⁴
       $u^{-1}(m) \in \{m' \in \mathcal{M} : p_{m'}^{\mathrm{id}} = p_m^{u}\}$
     - There exist two alternative interpretations of strictly the same model:
       - $p_m^{u}$: data measured with unit $u$ arise from measure $m$
       - $p_{u^{-1}(m)}^{\mathrm{id}}$: data measured with unit id arise from measure $u^{-1}(m)$
     - Two points of view:
       - Statistician: the model $p_m^{u}$ is not identifiable over the couple $(m, u)$
       - Practitioner: freedom to choose the interpretation which is the most meaningful for them
     ⁴ This set is usually restricted to a single element

  9. Opportunity for designing new models
     - Great opportunity to easily build numerous new meaningful models $p_m^{u}$!
     - Just combine a standard model family $\{m\}$ with a standard unit family $\{u\}$
     - The new family can be huge! Combinatorial problems can occur...
     - Some model stability can exist in some (specific) cases: $m = u^{-1}(m)$

  10. Model selection
      - As with any model, it is possible to choose between $p_{m_1}^{u_1}$ and $p_{m_2}^{u_2}$
      - However, caution is needed when using likelihood-based model selection criteria (such as BIC)
      - It is prohibited to compare $m_1$ in unit $u_1$ with $m_2$ in unit $u_2$
      - But the comparison is allowed after transforming both into the identical unit id
      - Thus compare their equivalent expressions: $p_{u_1^{-1}(m_1)}^{\mathrm{id}}$ and $p_{u_2^{-1}(m_2)}^{\mathrm{id}}$
      - Example: for absolutely continuous $x$ and differentiable $u$, the density transformed into id is
        $p_{u^{-1}(m)}^{\mathrm{id}} = \{\cdot \in \mathcal{X}^{\mathrm{id}} \mapsto p(u(\cdot); \theta) \times |J_u(\cdot)| : \theta \in \Theta_m\}$
        with $J_u(\cdot)$ the Jacobian associated with the transformation $u$
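
The Jacobian correction can be made concrete on a one-dimensional case. The sketch below is an illustration under stated assumptions: simulated positive data, a single Gaussian as model $m$, the unit $u = \log$ (so $|J_u(x)| = 1/x$), and a BIC written in the "smaller is better" convention $-2 \log L + \nu \log n$. The helper names are hypothetical.

```python
import numpy as np

def gaussian_max_loglik(v):
    """Maximised Gaussian log-likelihood of a 1-d sample (ML mean and variance)."""
    n = v.size
    var_ml = v.var()  # ML estimate, divides by n
    return -0.5 * n * (np.log(2.0 * np.pi * var_ml) + 1.0)

def bic(loglik, n_params, n):
    """BIC, 'smaller is better' convention (an assumption of this sketch)."""
    return -2.0 * loglik + n_params * np.log(n)

rng = np.random.default_rng(1)
x = np.exp(rng.normal(loc=1.0, scale=0.8, size=500))  # simulated positive data

# Couple (id, Gaussian): the likelihood is already expressed in unit id.
loglik_id = gaussian_max_loglik(x)

# Couple (log, Gaussian): likelihood in unit u = log, NOT comparable as is.
loglik_u = gaussian_max_loglik(np.log(x))

# Back to unit id: add the log-Jacobian, log|du/dx| = -log(x), for each point.
loglik_u_in_id = loglik_u + np.sum(-np.log(x))

bic_id = bic(loglik_id, 2, x.size)
bic_u_in_id = bic(loglik_u_in_id, 2, x.size)
print(bic_id, bic_u_in_id)
```

Only `bic_id` and `bic_u_in_id` may be compared; using `loglik_u` directly would compare densities defined on two different spaces.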

  11. Focus on the clustering target
      A current challenge is to enlarge the model collection... and units could contribute to it!
      Model: mixture model $m$ with parameter $\theta = \{\pi_k, \alpha_k\}_{k=1}^{g}$
        $p_m(\cdot; \theta) = \sum_{k=1}^{g} \pi_k \, p(\cdot; \alpha_k)$
      - $g$ is the number of clusters
      - Clusters correspond to a hidden partition $z = (z_1, \dots, z_n)$, where $z_i \in \{1, \dots, g\}$
      - $\pi_k = p(Z = k)$ and $p(\cdot; \alpha_k) = p(X = \cdot \mid Z = k)$
      Target: estimate $z$ (and often $g$)
      - Estimate $\hat{\theta}_m$ by maximum likelihood (typically)
      - Estimate $z$ by the MAP principle: $\hat{z}_i = \arg\max_{k \in \{1, \dots, g\}} p(Z_i = k \mid X_i = x_i; \hat{\theta}_m)$
      - Estimate $g$ typically by the BIC or ICL criteria (maximum likelihood based criteria)
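
The MAP step can be sketched for a univariate Gaussian mixture. In the example below the "fitted" parameters and the data points are hypothetical values chosen for illustration (no estimation is performed); it only shows how the posterior probabilities $p(Z_i = k \mid X_i = x_i; \hat{\theta}_m)$ follow from Bayes' rule and how $\hat{z}_i$ is read off.

```python
import numpy as np

def gaussian_pdf(x, mean, sd):
    """Univariate Gaussian density, vectorised over x, mean and sd."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Hypothetical fitted parameters for g = 2 components (not estimated here).
pi_hat = np.array([0.4, 0.6])
mean_hat = np.array([0.0, 5.0])
sd_hat = np.array([1.0, 1.5])

x = np.array([-0.3, 0.8, 2.4, 4.9, 6.1])

# Joint terms pi_k * p(x_i; alpha_k): one row per individual, one column per cluster.
joint = pi_hat * gaussian_pdf(x[:, None], mean_hat, sd_hat)

mixture_density = joint.sum(axis=1)           # p_m(x_i; theta)
posterior = joint / mixture_density[:, None]  # p(Z_i = k | X_i = x_i; theta)
z_hat = posterior.argmax(axis=1) + 1          # MAP labels in {1, ..., g}

print(z_hat)
```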


  13. 14 spectral models on $\Sigma_k$
      - $\mathcal{X} = \mathbb{R}^d$
      - $d$-variate Gaussian model $m$: $p_m(\cdot; \alpha_k) = \mathcal{N}_d(\mu_k, \Sigma_k)$
      - [Celeux & Govaert, 1995]⁵ propose the following eigen decomposition:
        $\Sigma_k = \lambda_k \cdot D_k \cdot \Lambda_k \cdot D_k'$
        with $\lambda_k$ the volume, $D_k$ the orientation and $\Lambda_k$ the shape
      [Figure: a bivariate Gaussian component in the ($x_1$, $x_2$) plane, annotated with $\mu_k$, $\lambda_k$, $a_k$ and $\alpha_k$, next to the surface of its density f(x) over (x1, x2).]
      ⁵ Celeux, G., and Govaert, G. Gaussian parsimonious clustering models. Pattern Recognition, 28(5), 781–793 (1995).
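
The decomposition into volume, orientation and shape can be computed from any covariance matrix. The sketch below assumes the usual normalisation of the cited paper, $\lambda_k = |\Sigma_k|^{1/d}$ and $|\Lambda_k| = 1$ with eigenvalues in decreasing order, which the slide itself does not spell out; the matrix `sigma_k` and the function name are hypothetical.

```python
import numpy as np

def spectral_decomposition(sigma):
    """Return (lam, D, Lam) with sigma = lam * D @ Lam @ D.T and det(Lam) = 1."""
    eigenvalues, eigenvectors = np.linalg.eigh(sigma)
    order = np.argsort(eigenvalues)[::-1]        # decreasing eigenvalues
    eigenvalues, D = eigenvalues[order], eigenvectors[:, order]
    d = sigma.shape[0]
    lam = np.prod(eigenvalues) ** (1.0 / d)      # volume: det(sigma)^(1/d)
    Lam = np.diag(eigenvalues / lam)             # shape: normalised eigenvalues
    return lam, D, Lam

# Hypothetical covariance matrix of one cluster.
sigma_k = np.array([[4.0, 1.5],
                    [1.5, 1.0]])

lam_k, D_k, Lam_k = spectral_decomposition(sigma_k)
sigma_rebuilt = lam_k * D_k @ Lam_k @ D_k.T

print(lam_k)
print(np.allclose(sigma_rebuilt, sigma_k))
```

The parsimonious families are then obtained by constraining each of the three terms to be equal across clusters or free.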

  14. Scale unit invariance
      - Consider the scale unit transformation $u(x) = Dx$, with diagonal $D \in \mathbb{R}^{d \times d}$
      - A very common transformation: standard units (mm, cm), standardized units
      - [Biernacki & Lourme, 2014] listed the models where invariance holds (8 among 14)
      - The general model is invariant: $[\lambda_k D_k \Lambda_k D_k'] = u^{-1}([\lambda_k D_k \Lambda_k D_k'])$
      - An example of a non-invariant model: $[\lambda_k D \Lambda_k D'] \neq u^{-1}([\lambda_k D \Lambda_k D'])$
      - Do not forget to compare all models $m' = u^{-1}(m)$ in unit id for BIC / ICL validity
      - Use the Rmixmod package
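
Why a common-orientation family is not scale invariant can be seen numerically. The sketch below is an illustration with hypothetical matrices: two cluster covariances share the orientation $D$ but have free volumes and shapes, then the first variable is rescaled (the scaling matrix is called `S` here to avoid the clash with the orientation matrix). It uses the fact that two symmetric matrices share an orthonormal eigenbasis if and only if they commute.

```python
import numpy as np

def share_orientation(a, b):
    """Symmetric matrices have a common orthonormal eigenbasis iff they commute."""
    return np.allclose(a @ b, b @ a)

theta = np.pi / 6
D = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # common orientation

# Two clusters with the same D, free volume and free shape (det of shape = 1).
sigma_1 = 1.0 * D @ np.diag([3.0, 1.0 / 3.0]) @ D.T
sigma_2 = 2.0 * D @ np.diag([1.5, 1.0 / 1.5]) @ D.T

# Scale unit change u(x) = S x, e.g. first variable from minutes to seconds.
S = np.diag([60.0, 1.0])
sigma_1_u = S @ sigma_1 @ S.T
sigma_2_u = S @ sigma_2 @ S.T

same_before = share_orientation(sigma_1, sigma_2)
same_after = share_orientation(sigma_1_u, sigma_2_u)
print(same_before, same_after)
```

After the unit change the two covariances no longer have a common orientation, so the transformed model leaves the family. The general model, where every term is free, contains any pair of covariance matrices and is therefore unaffected.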

  15. MASSICCC platform for the MIXMOD software: https://massiccc.lille.inria.fr/

  16. Illustration on the Old Faithful geyser data set
      - All models have free proportions ($\pi_k$)
      - All ICL values are expressed in the initial unit id = min × min
      - We observe the effect of the unit on the ICL ranking for some models
      - Cheap opportunity to enlarge the model family!

      | family       | $u_{\text{scale1}}$ = (sec, min): $m$ | ICL^id | $u_{\text{scale2}}$ = (stand, stand): $m$ | ICL^id | id = (min, min): $m$ | ICL^id |
      |--------------|------------------------------------|--------|----------------------------------------|--------|------------------------------------|--------|
      | All mod.     | $[\lambda_k D \Lambda_k D']$       | 1160.3 | $[\lambda_k D \Lambda_k D']$           | 1158.7 | $[\lambda_k D_k \Lambda D_k']$     | 1160.3 |
      | General mod. | $[\lambda_k D_k \Lambda_k D_k']$   | 1161.4 | $[\lambda_k D_k \Lambda_k D_k']$       | 1161.4 | $[\lambda_k D_k \Lambda_k D_k']$   | 1161.4 |

