Overview · The degeneracy problem · Avoiding degeneracy · The label switching problem · Conclusion

Pitfalls in Mixtures from the Clustering Angle
C. Biernacki (with G. Castellan, S. Chrétien, B. Guedj, V. Vandewalle)
Working Group on Model-Based Clustering, Summer Session, Paris, July 17-23, 2016
Take home message

Computational estimates $\tilde{\theta}$ are the intertwined result of five factors:
1. An initial practitioner target $t$
2. A data set $x$
3. A theoretical model $m$
4. A theoretical estimate $\hat{\theta}$
5. An estimation algorithm $A$

$\tilde{\theta} = f(t, x, m, \hat{\theta}, A)$

This talk
- The pitfalls in mixtures considered here are degeneracy and label switching
- Their consequences on $\tilde{\theta}$ can be disastrous
- Solutions are often sought in $m$ or $\hat{\theta}$
- We also explore solutions through $t$ and $A$
- Focus target $t$: clustering
- Focus algorithms $A$: EM, SEM, Gibbs
Outline

1. Overview
2. The degeneracy problem
   - Individual data
   - Binned data
   - Missing data
3. Avoiding degeneracy
   - Adding a minimal clustering information
   - Strategy 1: a data-driven lower bound on variances
   - Strategy 2: an approximate EMgood algorithm
4. The label switching problem
   - The problem
   - Existing solutions
   - Proposed solution (in progress)
5. Conclusion
Unbounded likelihood

$d$-variate $g$-component Gaussian mixture with $\theta = (\{\pi_k\}, \{\mu_k\}, \{\Sigma_k\})$:

$p(x; \theta) = \sum_{k=1}^{g} \pi_k \underbrace{\frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left(-\tfrac{1}{2}(x - \mu_k)' \Sigma_k^{-1} (x - \mu_k)\right)}_{p(x; \mu_k, \Sigma_k)}$

Sampling: $x = (x_1, \ldots, x_n)$ i.i.d. $\sim p(\cdot\,; \theta)$
Likelihood: $\ell(\theta; x) = p(x; \theta)$

If a center coincides with a particular data point ($\mu_2 = x_i$), then $\lim_{|\Sigma_2| \to 0} \ell(\theta; x) = +\infty$

[Kiefer and Wolfowitz, 1956] [Day, 1969]
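As a quick numerical check of the unboundedness (a minimal sketch; the sample, weights and centers below are illustrative, not from the talk), place one component's center exactly on a data point and shrink its standard deviation:

```python
import math

def mixture_loglik(x, pis, mus, sigmas):
    """Log-likelihood of a univariate Gaussian mixture (d = 1)."""
    ll = 0.0
    for xi in x:
        dens = sum(pi * math.exp(-0.5 * ((xi - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))
                   for pi, mu, s in zip(pis, mus, sigmas))
        ll += math.log(dens)
    return ll

x = [0.3, 1.2, 2.1, 5.0, 5.5, 6.2]   # toy sample
mus = [1.0, x[0]]                     # second center placed exactly on a data point
for s2 in (1.0, 0.1, 0.01, 0.001):
    print(s2, mixture_loglik(x, [0.5, 0.5], mus, [1.0, s2]))
# the printed log-likelihood grows without bound as s2 -> 0
```

The gain at the matched point behaves like $-\log s_2$ and dominates the bounded losses at the other observations, which is exactly the degeneracy mechanism above.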
EM behaviour: illustration

[Figure: fitted mixture densities over the data at EM iterations 1, 2, 50, 77, 78, 79, 80, 81 and 82, showing one component collapsing onto a single data point]

- degeneracy may occur even when starting from large variances
- convergence can be slow when far from the degenerate limit
- convergence is extremely fast near degeneracy
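This collapse can be reproduced with a plain EM iteration for a two-component univariate mixture (a sketch; the data and starting values are illustrative). Starting with one center on a data point and a small variance, that component's variance shrinks at every step:

```python
import math

def em_step(x, pis, mus, vars_):
    """One EM iteration for a univariate Gaussian mixture."""
    # E-step: posterior probabilities p_ik
    resp = []
    for xi in x:
        dens = [pi * math.exp(-0.5 * (xi - mu) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for pi, mu, v in zip(pis, mus, vars_)]
        tot = sum(dens)
        resp.append([d / tot for d in dens])
    # M-step: weighted proportions, means and variances
    n, new = len(x), []
    for k in range(len(pis)):
        nk = sum(r[k] for r in resp)
        mu = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
        v = sum(r[k] * (xi - mu) ** 2 for r, xi in zip(resp, x)) / nk
        new.append((nk / n, mu, v))
    return tuple(list(c) for c in zip(*new))

x = [0.3, 1.2, 2.1, 5.0, 5.5, 6.2]
pis, mus, vars_ = [0.5, 0.5], [4.0, 0.3], [4.0, 0.05]  # component 2 starts near degeneracy
for _ in range(2):
    pis, mus, vars_ = em_step(x, pis, mus, vars_)
    print(vars_[1])  # shrinks toward 0 at each iteration
```

A handful of iterations is enough here: the contraction accelerates dramatically near the degenerate limit, matching the "extremely fast near degeneracy" behaviour above.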
EM behaviour: results

Let $u_0 = \left(1 - p_{i_0 k_0}, \{p_{i k_0}\}_{i \neq i_0}\right)$, where $p_{ik}$ denotes the posterior probability of component $k$ for observation $i$. Then

degeneracy of component $k_0$ at $x_{i_0}$ $\Leftrightarrow$ $\|u_0\| \to 0$

[Biernacki and Chrétien, 2003] [Ingrassia and Rocci, 2009]

Proposition 1: Existence of a basin of attraction
There exists $\epsilon > 0$ such that if $\|u_0\| \leq \epsilon$ then $\|u_0^+\| = o(\|u_0\|)$ with probability 1.

Proposition 2: Speed towards degeneracy is exponential
There exist $\epsilon > 0$, $\alpha > 0$ and $\beta > 0$ such that if $\|u_0\| \leq \epsilon$ then, with probability 1,

$|\Sigma_{k_0}^+| \leq \frac{\alpha}{|\Sigma_{k_0}|} \exp\left(-\frac{\beta}{|\Sigma_{k_0}|}\right).$
Consequences of the EM study

When EM is close to degeneracy, the EM mapping is contracting and reaches numerical tolerance extremely quickly.
⇓
Simply restarting EM when numerical tolerance is reached (the pragmatic behaviour of EM practitioners) is now somewhat justified.
⇓
However, the numerical tolerance then acts as an arbitrary lower bound for $|\Sigma_k|$ . . .
Binned data

A binned partition of $\mathbb{R}$ in $H$ intervals $\Omega_1, \ldots, \Omega_H$: $\Omega_h = ]\alpha_h, \beta_h[$
Individuals $x_i$ are unknown; only the interval where $x_i$ lies is known
The hypothesis of a Gaussian mixture on the $x_i$'s is unchanged
The log-likelihood is written

$\ell(\theta) = \sum_{h=1}^{H} m_h \ln \sum_{k=1}^{K} \pi_k \underbrace{\int_{\Omega_h} f_k(x)\,dx}_{a_{kh}}$

where $m_h = \#\Omega_h$ is the number of individuals in bin $h$ and $\sum_k \pi_k a_{kh} = p(X \in \Omega_h)$.

Question: does degeneracy still exist, since $\ell(\theta) \leq 0$?
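Under the Gaussian hypothesis each $a_{kh}$ is a difference of normal CDFs, so $\ell(\theta)$ is cheap to evaluate. A minimal sketch (bin edges, counts and parameters below are illustrative):

```python
import math

def norm_cdf(x, mu, sigma):
    """Normal CDF via the standard-library error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def binned_loglik(edges, counts, pis, mus, sigmas):
    """ell(theta) = sum_h m_h * ln( sum_k pi_k * a_kh ), a_kh = P_k(X in bin h)."""
    ll = 0.0
    for a, b, m in zip(edges[:-1], edges[1:], counts):
        p = sum(pi * (norm_cdf(b, mu, s) - norm_cdf(a, mu, s))
                for pi, mu, s in zip(pis, mus, sigmas))
        ll += m * math.log(p)
    return ll

edges = [-1.0, 0.0, 1.0, 2.0, 3.0]   # H = 4 bins
counts = [2, 5, 4, 1]                 # m_h
print(binned_loglik(edges, counts, [0.6, 0.4], [0.5, 2.0], [1.0, 0.7]))
# each bin probability is <= 1, hence the result is <= 0
```

Since every bin probability is at most 1, each term $m_h \ln p(X \in \Omega_h)$ is non-positive, which is why $\ell(\theta) \leq 0$ here, unlike the individual-data case.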
Degeneracy may still happen!

Proposition 3
Let:
- $\{\epsilon_b\}_{b \in \mathbb{N}}$ be a sequence with $\epsilon_b > 0$ and $\epsilon_b \to 0$ as $b \to \infty$
- $\Omega^b$ be bins $\{\Omega_h^b, h = 1, \ldots, H_b\}$ such that if $\beta_h^b - \alpha_h^b \geq \epsilon_b$ then $m_h^b = 0$
- $\Omega_{h_0}^b$ be a non-empty interval and $k_0 \in \{1, \ldots, K\}$ a component
- $\hat{\theta}^b$ be the unique consistent root of the ML associated to $(\Omega_h^b, m_h^b)$
- $\ell_{\mathrm{deg}}^b(\theta)$ be the limit of $\ell^b(\theta)$ when $\mu_{k_0} \in \Omega_{h_0}$ and $\Sigma_{k_0} \to 0$

Then there exists $B \in \mathbb{N}$ such that for all $b > B$ we have $\ell_{\mathrm{deg}}^b(\hat{\theta}^b) \geq \ell^b(\hat{\theta}^b)$.

Sketch of proof
First, we show that, for all $\theta$, there exists $B_\theta \in \mathbb{N}$ such that for all $b > B_\theta$ we have $\ell_{\mathrm{deg}}^b(\theta) \geq \ell^b(\theta)$. We then conclude by noting that $B = \sup_\theta B_\theta$.
Meaning

If the width of the non-empty bins is "small enough", then the global maximum of the likelihood is attained at a degenerate solution.

[Figure: histograms with fitted mixture densities. Left, bar width 1: degenerate mixture (L = −12.69) vs non-degenerate mixture (L = −11.44). Right, bar width 0.2: degenerate mixture (L = −20.9) vs non-degenerate mixture (L = −21.11). With bins narrow enough, the degenerate mixture attains the higher likelihood.]
EM behaviour in a degeneracy neighbourhood?

Reminder
component $k_0$ degenerates inside $\Omega_{h_0}$ $\Leftrightarrow$ $\mu_{k_0} \in \Omega_{h_0}$ and $\Sigma_{k_0} \to 0$

Notations
- $\Omega_{h'_0}$: the bin closest to the center $\mu_{k_0}$ (to the left or right of $\Omega_{h_0}$)
- $\gamma$: the border of $\Omega_{h_0}$ closest to $\mu_{k_0}$ (either $\alpha_{h_0}$ or $\beta_{h_0}$)
- $\eta = |\gamma - \mu_{k_0}|$: the distance between the center and the closest border
- $\sigma = \mathrm{sign}(\gamma - \mu_{k_0})$ and $u = \Sigma_{k_0} f_{k_0}(\gamma)$
- $R_h = (\pi_{k_0} + A_{k_0 h_0}) / A_{k_0 h}$ with $A_{k_0 h} = \sum_{k \neq k_0} \pi_k a_{kh}$
Possibility to be attracted around degeneracy

Proposition 4
There exists $\epsilon > 0$ such that, if
- $0 < \Sigma_{k_0} < \epsilon$
- $\eta \in (\delta, \Delta - \sqrt{\Sigma_{k_0}})$ with $0 < \delta < \Delta < (\beta_{h_0} - \alpha_{h_0})/2$
- $1 - \frac{m_{h'_0}}{m_{h_0} R_{h'_0}} > 0$

then

$0 < \Sigma_{k_0}^+ < \Sigma_{k_0} \left[ 1 - \underbrace{\left(1 - \frac{m_{h'_0}}{m_{h_0} R_{h'_0}}\right)}_{\rho} \frac{\delta\, e^{-\Delta^2/(2\Sigma_{k_0})}}{\sqrt{2\pi \Sigma_{k_0}}} \right]$

and

$\eta^+ \in \left(\delta, \Delta - \sqrt{\Sigma_{k_0}^+}\right).$
Sketch of proof

It relies on Taylor expansions around $\Sigma_{k_0} = 0$ with $\mu_{k_0} \in \Omega_{h_0}$:

$\mu_{k_0}^+ = \mu_{k_0} - \sigma \rho u + o(u)$ and $\Sigma_{k_0}^+ = \Sigma_{k_0} - \eta \rho u + o(u).$

The inequality on $\Sigma_{k_0}^+$ then follows easily. For the second expression, we obtain in the same manner (for $\Sigma_{k_0}$ "small enough")

$\delta < |\gamma - \mu_{k_0}^+| < \Delta - \sqrt{\Sigma_{k_0}^+}.$

Thus $|\gamma - \mu_{k_0}^+| < \Delta < (\beta_{h_0} - \alpha_{h_0})/2$, and so $\gamma^+ = \gamma$ (the closest border is kept unchanged). Since $\eta^+ = |\gamma^+ - \mu_{k_0}^+|$, the conclusion follows.
Attraction or repulsion?

Around a degenerate solution, EM moves closer to or further from it depending on the sign of $\rho$, which itself depends on the sample size of the "closest" bin.

Attraction: $\rho > 0$
From the proposition, if $\Sigma_{k_0}$ is "close enough" to 0 and $\mu_{k_0} \in \Omega_{h_0}$, then
$0 < \Sigma_{k_0}^+ < \Sigma_{k_0} [1 - \rho \times |c(\theta)|]$ (with $c(\theta)$ the positive factor from Proposition 4), so $\Sigma_{k_0}$ decreases and $\mu_{k_0}^+ \in \Omega_{h_0}$.

Repulsion: $\rho < 0$
Taylor: $\Sigma_{k_0}^+ = \Sigma_{k_0} - \eta \rho u + o(u)$ $\Rightarrow$ $\Sigma_{k_0}$ increases.
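The two regimes can be visualised by iterating the first-order recursion $\Sigma_{k_0}^+ \approx \Sigma_{k_0} - \eta \rho u$ with $u = \Sigma_{k_0} f_{k_0}(\gamma)$ from the sketch of proof. This is only a sketch: $\eta$ and $\rho$ are held fixed for illustration (in the talk $\rho$ depends on the bin counts and evolves with $\theta$), and the expansion is only valid for small $\Sigma_{k_0}$.

```python
import math

def next_sigma(sigma, eta, rho):
    # u = Sigma * f(gamma): f is the component's Gaussian density evaluated at
    # the closest bin border, located at distance eta from the center
    u = sigma * math.exp(-eta ** 2 / (2.0 * sigma)) / math.sqrt(2.0 * math.pi * sigma)
    return sigma - eta * rho * u

def trajectory(sigma0, eta, rho, steps=20):
    traj = [sigma0]
    for _ in range(steps):
        traj.append(next_sigma(traj[-1], eta, rho))
    return traj

attract = trajectory(0.05, eta=0.3, rho=0.5)   # rho > 0: Sigma shrinks (attraction)
repel = trajectory(0.05, eta=0.3, rho=-0.5)    # rho < 0: Sigma grows (repulsion)
```

Since $u > 0$ always, the sign of the step is exactly the sign of $-\rho$: the same recursion contracts toward degeneracy or escapes from it depending only on $\rho$.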