
Flexible Mixture Modeling and Model-Based Clustering in R
Bettina Grün, September 2017


  1. Estimation
  The log-likelihood for N independent observations is given by
  \log L(\Theta) = \ell(\Theta) = \sum_{n=1}^{N} \log \Big( \sum_{k=1}^{K} \pi_k f_k(y_n \mid x_n, \vartheta_k) \Big).
  Maximum-likelihood (ML) estimation:
  - Direct optimization of the likelihood (mostly in simpler cases).
  - Expectation-maximization (EM) algorithm for more complicated models (Dempster, Laird and Rubin, 1977).
  - EM followed by direct optimization for an estimate of the Hessian.
  - ...
  Bayesian estimation:
  - Gibbs sampling (Diebolt and Robert, 1994), a Markov chain Monte Carlo algorithm. Applicable when the joint posterior distribution is not known explicitly, but the conditional posterior distributions of each variable / subsets of variables are known.

  2. EM algorithm
  General method for ML estimation in models with unobserved latent variables.
  - The complete-data log-likelihood contains the observed and the unobserved / missing data and is easier to maximize.
  - Iterates between the E-step, which computes the expectation of the complete-data log-likelihood, and the M-step, where the expected complete-data log-likelihood is maximized.

  3. Missing data
  The component label vectors z_n = (z_{nk})_{k=1,...,K} are treated as missing data. It holds that z_{nk} ∈ {0, 1} and \sum_{k=1}^{K} z_{nk} = 1 for all n = 1, ..., N.
  The complete-data log-likelihood is given by
  \log L_c(\Theta) = \sum_{k=1}^{K} \sum_{n=1}^{N} z_{nk} [\log \pi_k + \log f_k(y_n \mid x_n, \vartheta_k)].

  4. EM algorithm: E-step
  Given the current parameter estimates Θ^{(i)}, replace the missing data z_{nk} by the estimated a-posteriori probabilities
  \hat{z}^{(i)}_{nk} = P(k \mid y_n, x_n, \Theta^{(i)}) = \frac{\pi^{(i)}_k f_k(y_n \mid x_n, \vartheta^{(i)}_k)}{\sum_{u=1}^{K} \pi^{(i)}_u f_u(y_n \mid x_n, \vartheta^{(i)}_u)}.
  The conditional expectation of log L_c(Θ) at the i-th step is given by
  Q(\Theta; \Theta^{(i)}) = E_{\Theta^{(i)}}[\log L_c(\Theta) \mid y, x] = \sum_{k=1}^{K} \sum_{n=1}^{N} \hat{z}^{(i)}_{nk} [\log \pi_k + \log f_k(y_n \mid x_n, \vartheta_k)].

  5. EM algorithm: M-step
  The next parameter estimate is given by
  \Theta^{(i+1)} = \arg\max_{\Theta} Q(\Theta; \Theta^{(i)}).
  The estimates for the component sizes are given by
  \pi^{(i+1)}_k = \frac{1}{N} \sum_{n=1}^{N} \hat{z}^{(i)}_{nk}.
  The component-specific parameter estimates are determined by
  \vartheta^{(i+1)}_k = \arg\max_{\vartheta_k} \sum_{n=1}^{N} \hat{z}^{(i)}_{nk} \log f_k(y_n \mid x_n, \vartheta_k).
  ⇒ Weighted ML estimation of the component-specific model.

  6. Mixtures of two univ. normal distributions: Example
  Finite mixture of two univariate normal distributions:
  h(y \mid \Theta) = \pi f_N(y \mid \mu_1, \sigma_1^2) + (1 - \pi) f_N(y \mid \mu_2, \sigma_2^2),
  where f_N(· | μ, σ²) is the density of the univariate normal distribution with mean μ and variance σ².

  7. Mixtures of two univ. normal distributions: Example / 2
  1. Initialize the parameters μ_1^{(0)}, (σ_1^2)^{(0)}, μ_2^{(0)}, (σ_2^2)^{(0)} and π^{(0)}. Set i = 0.
  2. E-step: Determine the a-posteriori probabilities. The contributions to the likelihood for each observation and component are
  \tilde{z}^{(i)}_{n1} = \pi^{(i)} f_N(y_n \mid \mu_1^{(i)}, (\sigma_1^2)^{(i)}),
  \tilde{z}^{(i)}_{n2} = (1 - \pi^{(i)}) f_N(y_n \mid \mu_2^{(i)}, (\sigma_2^2)^{(i)}).
  The a-posteriori probability is given by
  \hat{z}^{(i)}_{n1} = \frac{\tilde{z}^{(i)}_{n1}}{\tilde{z}^{(i)}_{n1} + \tilde{z}^{(i)}_{n2}}.

  8. Mixtures of two univ. normal distributions: Example / 3
  3. M-step:
  \pi^{(i+1)} = \frac{1}{N} \sum_{n=1}^{N} \hat{z}^{(i)}_{n1},
  \mu_1^{(i+1)} = \frac{1}{N \pi^{(i+1)}} \sum_{n=1}^{N} \hat{z}^{(i)}_{n1} y_n, \qquad \mu_2^{(i+1)} = \frac{1}{N (1 - \pi^{(i+1)})} \sum_{n=1}^{N} (1 - \hat{z}^{(i)}_{n1}) y_n,
  (\sigma_1^2)^{(i+1)} = \frac{1}{N \pi^{(i+1)}} \sum_{n=1}^{N} \hat{z}^{(i)}_{n1} (y_n - \mu_1^{(i+1)})^2,
  (\sigma_2^2)^{(i+1)} = \frac{1}{N (1 - \pi^{(i+1)})} \sum_{n=1}^{N} (1 - \hat{z}^{(i)}_{n1}) (y_n - \mu_2^{(i+1)})^2.
  4. Increase i by 1. Iterate between E- and M-step until convergence.
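  The E- and M-steps of slides 4 to 8 fit in a few lines of base R. This is a minimal sketch; the function name em_norm2, the median-split initialization and the tolerance are illustrative choices, not from the slides:

  em_norm2 <- function(y, max_iter = 500, tol = 1e-8) {
    N <- length(y)
    ## crude initialization: split the data at the median
    pi1 <- 0.5
    mu <- c(mean(y[y <= median(y)]), mean(y[y > median(y)]))
    sigma2 <- rep(var(y), 2)
    ll_old <- -Inf
    for (i in seq_len(max_iter)) {
      ## likelihood contributions per observation and component
      d1 <- pi1 * dnorm(y, mu[1], sqrt(sigma2[1]))
      d2 <- (1 - pi1) * dnorm(y, mu[2], sqrt(sigma2[2]))
      ll <- sum(log(d1 + d2))       # observed-data log-likelihood
      if (ll - ll_old < tol) break  # EM increases ll in every iteration
      ll_old <- ll
      ## E-step: a-posteriori probabilities for component 1
      z1 <- d1 / (d1 + d2)
      ## M-step: weighted ML estimates
      pi1 <- mean(z1)
      mu[1] <- sum(z1 * y) / sum(z1)
      mu[2] <- sum((1 - z1) * y) / sum(1 - z1)
      sigma2[1] <- sum(z1 * (y - mu[1])^2) / sum(z1)
      sigma2[2] <- sum((1 - z1) * (y - mu[2])^2) / sum(1 - z1)
    }
    list(pi = c(pi1, 1 - pi1), mu = mu, sigma2 = sigma2, logLik = ll)
  }

  ## e.g. em_norm2(faithful$waiting) gives estimates close to the
  ## two-component solution shown later in this deck.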

  9. EM algorithm
  Advantages:
  - The likelihood is increased in each step → the EM algorithm converges for bounded likelihoods.
  - Relatively easy to implement: different mixture models require only different M-steps. Weighted ML estimation of the component-specific model is sometimes already available in standard software.
  Disadvantages:
  - Standard errors have to be determined separately, as the information matrix is not required during the algorithm.
  - Convergence only to a local optimum.
  - Slow convergence.

  10. Why does EM work?
  Given the definition of conditional probabilities it holds that
  P(z \mid y, \Theta') = \frac{P(z, y \mid \Theta')}{P(y \mid \Theta')}.
  This can be transformed to
  P(y \mid \Theta') = \frac{P(z, y \mid \Theta')}{P(z \mid y, \Theta')}.
  For the log-likelihoods it thus holds that
  \ell(\Theta'; y) = \ell_c(\Theta'; y, z) - \ell_1(\Theta'; z \mid y).

  11. Why does EM work? / 2
  Taking the expectation with respect to the conditional density of z given y and Θ yields
  \ell(\Theta'; y) = E[\ell_c(\Theta'; y, z) \mid y, \Theta] - E[\ell_1(\Theta'; z \mid y) \mid y, \Theta] = Q(\Theta', \Theta) - R(\Theta', \Theta).
  R(Θ', Θ) is the expected value of the log-likelihood of a density with parameter Θ', taken with respect to the same density with parameter value Θ. By Jensen's inequality, R(Θ', Θ) is maximal for Θ' = Θ.
  If Θ' maximizes Q(Θ', Θ), it holds that
  \ell(\Theta'; y) - \ell(\Theta; y) = (Q(\Theta', \Theta) - Q(\Theta, \Theta)) - (R(\Theta', \Theta) - R(\Theta, \Theta)) \ge 0.

  12. EM as a Max-Max method
  The EM algorithm can also be interpreted as a joint maximization method. Consider the following function:
  F(\Theta', \tilde{P}) = E_{\tilde{P}}[\ell_c(\Theta'; y, z)] - E_{\tilde{P}}[\log \tilde{P}(z)].
  For fixed Θ', the function F with respect to \tilde{P}(z) is maximized by \tilde{P}(z) = P[z \mid y, \Theta'].
  In the M-step, F is maximized with respect to Θ' keeping \tilde{P} fixed.
  The observed log-likelihood and F(Θ', \tilde{P}) have the same value for \tilde{P}(z) = P[z \mid y, \Theta'].

  13. EM as a Max-Max method / 2
  [Figure: contour plot of F over the model parameter and the latent data parameter, illustrating EM as alternating coordinate-wise maximization.]

  14. EM algorithm: Variants
  Classification EM (CEM): assigns each observation to the component with the maximum a-posteriori probability.
  - In general faster convergence than EM.
  - Converges to the classification likelihood.
  Stochastic EM (SEM): assigns each observation to one component by drawing from the multinomial distribution induced by the a-posteriori probabilities.
  - Does not converge to the "closest" local optimum given the initial values.
  - If started with too many components, some components will eventually become empty and will be eliminated.
  Both variants are available in flexmix via its classify control; see the sketch below.
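  A minimal sketch, assuming a data frame df with a numeric column y (the classify control of flexmix is described later in this deck):

  library("flexmix")
  ## classify = "CEM" (or "hard") gives hard assignments in each E-step,
  ## classify = "SEM" (or "random") draws the assignments at random.
  fit_cem <- flexmix(y ~ 1, data = df, k = 2,
                     control = list(classify = "CEM"))
  fit_sem <- flexmix(y ~ 1, data = df, k = 2,
                     control = list(classify = "SEM"))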

  15. Bayesian estimation
  Determine the posterior density using Bayes' theorem:
  p(\Theta \mid Y, X) \propto h(Y \mid X, \Theta) p(\Theta),
  where p(Θ) is the prior and Y = (y_n)_n and X = (x_n)_n.
  Standard prior distributions:
  - Proper priors: improper priors give improper posteriors.
  - Independent priors for the component weights and the component-specific parameters.
  - Conjugate priors for the complete likelihood: a Dirichlet distribution D(e_{0,1}, ..., e_{0,K}) for the component weights, which is the conjugate prior for the multinomial distribution. Priors on the component-specific parameters depend on the underlying distribution family.
  - Invariant priors, e.g., the parameter of the Dirichlet prior is constant over all components: e_{0,k} ≡ e_0.

  16. Estimation: Gibbs sampling
  Starting with Z^{(0)} = (z^{(0)}_n)_{n=1,...,N}, repeat the following steps for i = 1, ..., I_0, ..., I + I_0.
  1. Parameter simulation conditional on the classification Z^{(i-1)}:
     a. Sample π_1, ..., π_K from D((\sum_{n=1}^{N} z^{(i-1)}_{nk} + e_{0,k})_{k=1,...,K}).
     b. Sample the component-specific parameters from the complete-data posterior p(ϑ_1, ..., ϑ_K | Z^{(i-1)}, Y).
     Store the actual values of all parameters Θ^{(i)} = (π^{(i)}_k, ϑ^{(i)}_k)_{k=1,...,K}.
  2. Classification of each observation (y_n, x_n) conditional on knowing Θ^{(i)}: sample z^{(i)}_n from the multinomial distribution with parameter equal to the posterior probabilities.
  After discarding the burn-in draws, the draws I_0 + 1, ..., I + I_0 can be used to approximate all quantities of interest (after resolving label switching).

  17. Example: Gaussian distribution
  Assume an independence prior
  p(\mu_k, \Sigma_k^{-1}) \sim f_N(\mu_k; b_0, B_0) \, f_W(\Sigma_k^{-1}; c_0, C_0).
  1. Parameter simulation conditional on the classification Z^{(i-1)}:
     a. Sample π^{(i)}_1, ..., π^{(i)}_K from D((\sum_{n=1}^{N} z^{(i-1)}_{nk} + e_{0,k})_{k=1,...,K}).
     b. Sample (Σ_k^{-1})^{(i)} in each group k from a Wishart W(c_k(Z^{(i-1)}), C_k(Z^{(i-1)})) distribution.
     c. Sample μ^{(i)}_k in each group k from a N(b_k(Z^{(i-1)}), B_k(Z^{(i-1)})) distribution.
  2. Classification of each observation y_n conditional on knowing Θ^{(i)}:
  P(z^{(i)}_{nk} = 1 \mid y_n, \Theta^{(i)}) \propto \pi_k f_N(y_n; \mu_k, \Sigma_k).
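  For univariate data the Wishart prior on the precision matrix reduces to a Gamma prior on σ^{-2}, and the sampler fits in a short base R function. A sketch only; the function name and the default prior constants (b0, B0, c0, C0, e0) are illustrative choices, not from the slides:

  gibbs_norm2 <- function(y, I = 2000, I0 = 500, b0 = mean(y),
                          B0 = var(y), c0 = 2.5, C0 = var(y), e0 = 4) {
    N <- length(y); K <- 2
    z <- sample(seq_len(K), N, replace = TRUE)  # random initial classification
    mu <- rep(mean(y), K); sigma2 <- rep(var(y), K)
    draws <- matrix(NA_real_, I + I0, 3 * K)
    for (i in seq_len(I + I0)) {
      Nk <- tabulate(z, K)
      ## (1a) weights: Dirichlet draw via normalized Gamma variates
      g <- rgamma(K, shape = Nk + e0)
      p_k <- g / sum(g)
      for (k in seq_len(K)) {
        yk <- y[z == k]
        ## (1b) precision from its Gamma full conditional
        sigma2[k] <- 1 / rgamma(1, shape = c0 + Nk[k] / 2,
                                rate = C0 + sum((yk - mu[k])^2) / 2)
        ## (1c) mean from its normal full conditional
        Bk <- 1 / (1 / B0 + Nk[k] / sigma2[k])
        mu[k] <- rnorm(1, Bk * (b0 / B0 + sum(yk) / sigma2[k]), sqrt(Bk))
      }
      ## (2) classification from the posterior probabilities
      prob <- sapply(seq_len(K), function(k)
        p_k[k] * dnorm(y, mu[k], sqrt(sigma2[k])))
      z <- apply(prob, 1, function(pr) sample.int(K, 1, prob = pr))
      draws[i, ] <- c(p_k, mu, sigma2)
    }
    draws[-seq_len(I0), ]  # discard the burn-in draws
  }

  Label switching still has to be resolved before summarizing the draws componentwise.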

  18. Estimation: Gibbs sampling
  Advantages:
  - Relatively easy to implement: different mixture models differ only in the parameter simulation step.
  - Parameter simulation conditional on the classification is sometimes already available.
  Disadvantages:
  - Might fail to escape the attraction area of one mode → not all posterior modes are visited.

  19. Selecting the number of components
  - A-priori known.
  - Information criteria, e.g., AIC, BIC, ICL:
    AIC = -2 \log(L) + 2p,
    BIC = -2 \log(L) + \log(N) p,
    ICL = -2 \log(L) + \log(N) p + 2 \, ent,
    where p is the number of estimated parameters and ent denotes the mean entropy of the a-posteriori probabilities.
  - Likelihood ratio test statistic in an ML framework: comparison of nested models ⇒ regularity conditions are not fulfilled.
  - Bayes factors in a Bayesian framework.
  - Sampling schemes with a varying number of components in a Bayesian framework: reversible-jump MCMC; inclusion of birth-and-death processes.
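  The criteria are simple to compute by hand from a fitted model. As a check, the rounded values reported for the flexmix cheating example later in this deck (log-likelihood -440, p = 9, N = 319) reproduce the AIC and BIC shown there:

  loglik <- -440; p <- 9; N <- 319
  c(AIC = -2 * loglik + 2 * p,       # ~ 898
    BIC = -2 * loglik + log(N) * p)  # ~ 932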

  20. Initialization
  Construct a suitable parameter vector Θ^{(0)}:
  - Random.
  - Other estimation methods, e.g., moment estimators.
  Classify observations / assign a-posteriori probabilities to each observation:
  - Random.
  - Cluster analysis results, e.g., hierarchical clustering, k-means.
  Use short runs of EM, CEM or SEM with different initializations (Biernacki et al., 2003).
  Use different subsets of the complete data (Wehrens et al., 2004).

  21. Resampling for model diagnostics
  - Testing for the number of components: for example, the likelihood-ratio test using the parametric bootstrap (McLachlan, 1987).
  - Standard errors for the parameter estimates: for example, using the parametric bootstrap with initialization in the solution (Basford et al., 1997).
  - Identifiability: for example, by testing for unimodality of the component-specific parameter estimates using either the empirical or the parametric bootstrap with random initialization.
  - Stability of induced partitions: for example, by comparing the results based on the Rand index corrected for chance (Hubert & Arabie, 1985) using either the empirical or the parametric bootstrap with random initialization.
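  Some of these tools are available in mclust (used later in this deck); a sketch, assuming a recent mclust version, for the Old Faithful waiting times:

  library("mclust")
  ## Bootstrap LRT for the number of components (McLachlan, 1987)
  lrt <- mclustBootstrapLRT(faithful$waiting, modelName = "V", maxG = 3)
  ## Bootstrap standard errors for the parameter estimates
  fit <- Mclust(faithful$waiting, G = 2, modelNames = "V")
  bs <- MclustBootstrap(fit, nboot = 199, type = "bs")
  summary(bs, what = "se")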

  22. Identifiability
  Three kinds of identifiability issues:
  - Label switching.
  - Overfitting: leads to empty components or components with equal parameters.
  - Generic unidentifiability.

  23. Label switching
  The posterior distribution is invariant under a permutation of the components with the same component-specific model.
  ⇒ Determine a unique labeling for component-specific inference:
  - Impose a suitable ordering constraint, e.g., π_s < π_t for all s, t ∈ {1, ..., K} with s < t.
  - Minimize the distance to the maximum-a-posteriori (MAP) estimate.
  - Fix the component membership for some observations.
  - Relabeling algorithms.

  24. Mixtures of distributions

  25. Model definition
  A finite mixture of distributions is given by
  H(y \mid \Theta) = \sum_{k=1}^{K} \pi_k F_k(y \mid \vartheta_k),
  where
  \sum_{k=1}^{K} \pi_k = 1 \quad \text{and} \quad \pi_k > 0 \;\; \forall k.
  In the case of mixtures of distributions, sometimes one of the components is assumed to be from a different parametric family, e.g., to model noise.

  26. Distributions in the components
  We assume that we have p-dimensional data with
  - Metric variables. ⇒ Mixtures of multivariate normal distributions.
  - Binary / categorical variables. ⇒ Latent class analysis: assumes that variables within each component are independent.
  - Both variable types. ⇒ Mixed-mode data: assuming again that the variable types are independent within each component allows the two cases to be treated separately (Hunt and Jorgensen, 1988).

  27. Generic identifiability
  - Identifiable: (multivariate) normal, Gamma, exponential, Cauchy and Poisson distributions in the components.
  - Not identifiable: continuous or discrete uniform distributions in the components.
  - Identifiable under certain conditions: mixtures of binomial and multinomial distributions are identifiable if N ≥ 2K − 1, where N is the parameter representing the number of trials.
  See for example Everitt & Hand (1981), Titterington et al. (1985), McLachlan & Peel (2000).

  28. Mixtures of normal distributions

  29. Univariate normal distributions
  The density of a finite mixture of univariate normal distributions is given by
  h(y \mid \Theta) = \sum_{k=1}^{K} \pi_k f_N(y \mid \mu_k, \sigma_k^2).
  The likelihood is unbounded for degenerate solutions with σ_k² = 0.
  ⇒ Add a constraint that
  1. σ_k² > ε for all k, or
  2. π_k > ε for all k (to achieve in general a similar effect).

  30. Univariate normal distributions / 2
  [Figure: histogram of faithful$waiting on the density scale, waiting times roughly 40 to 100 minutes.]

  31. Univariate normal distributions / 3
  [Figure: the same histogram of faithful$waiting on the density scale.]

  32. Univariate normal distributions / 4
  The estimated model has the following parameters:

            Comp.1   Comp.2
  pi          0.64     0.36
  mu         80.10    54.63
  sigma       5.87     5.89
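  The deck does not show the fitting call at this point; one way to reproduce estimates like these (a sketch) is with mclust, where model "V" allows component-specific variances:

  library("mclust")
  fit <- Mclust(faithful$waiting, G = 2, modelNames = "V")
  fit$parameters$pro                      # component sizes pi
  fit$parameters$mean                     # component means mu
  sqrt(fit$parameters$variance$sigmasq)   # component standard deviations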

  33. Multivariate normal distributions
  For p-dimensional data the number of parameters to estimate is
  K - 1 + K \Big( p + \frac{p(p+1)}{2} \Big)
  for unrestricted variance-covariance matrices.
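  For instance, in R (K = 3 components in p = 4 dimensions as an illustrative choice):

  K <- 3; p <- 4
  (K - 1) + K * (p + p * (p + 1) / 2)  # 44 free parameters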

  34. Multivariate normal distributions / 2
  More parsimonious model specifications are:
  - Restrict the variance-covariance matrices to be the same over components.
  - Specify the structure of the variance-covariance matrix using its decomposition into volume, shape and orientation (Fraley & Raftery, 2002):
  \Sigma_k = \lambda_k D_k^\top \mathrm{diag}(a_k) D_k.
  For these three characteristics, restrictions can be imposed separately to be the same over components. In addition, the orientation can be assumed to be equal to the identity matrix and the shape to be spherical.

  35. Multivariate normal distributions / 3
  Specify a factor model for the variance-covariance matrix.
  ⇒ Mixtures of "factor analyzers" (McLachlan & Peel, 2000).
  ⇒ Analogous restrictions are possible to obtain "parsimonious Gaussian mixture models" (McNicholas & Murphy, 2008).

  36. Multivariate normal distributions / 4
  [Figure: scatter plot of the two-dimensional example data.]

  37. Multivariate normal distributions / 5
  [Figure: the same scatter plot with numbered component labels of a fitted mixture.]

  38. Multivariate normal distributions / 6
  [Figure: the same scatter plot with numbered component labels of another fitted mixture.]

  39. Mixtures of normal distributions: Example
  Example from Fraley & Raftery (2002), data from package spatstat.
  - Estimation of a density for the occurrence of maple trees.
  - 2-dimensional coordinates for 514 observed trees.

  40. Mixtures of normal distributions: Example / 2
  [Figure: scatter plot of the maple tree locations (x, y) in the unit square.]

  41. Mixtures of normal distributions: Example / 3
  Mixtures with 1 to 9 components are fitted with all different restrictions on the variance-covariance matrices.
  The best model has 7 components and spherical variance-covariance matrices with different volume across components.

  42. Mixtures of normal distributions: Example / 4
  [Figure: contour plots of the fitted density over the maple tree locations.]

  43. Package mclust

  44. Package mclust
  - Written by Chris Fraley, Adrian Raftery, Luca Scrucca, Thomas Brendan Murphy.
  - In particular suitable for mixtures of uni- and multivariate normal distributions.
  - Uses model-based hierarchical clustering for initialization.
  - Variance-covariance matrices can be specified via volume, shape and orientation.
  - Function Mclust() returns the best model according to the BIC.

  45. Package mclust: Model specification
  Univariate:
  - "E": identical variance
  - "V": different variances
  Multivariate: three-letter abbreviation for
  - volume: equal (E) / variable (V)
  - shape: identity (I) / equal (E) / variable (V)
  - orientation: identity (I) / equal (E) / variable (V)
  - "EII": spherical, same volume
  - "VII": spherical, different volume
  - "EEI": diagonal matrix, same volume & shape
  - "VEI": diagonal matrix, different volume, same shape
  - "EVI": diagonal matrix, same volume, different shape
  - "VVI": diagonal matrix, different volume & shape

  46. Package mclust: Model specification / 2
  - "EEE": ellipsoidal, same volume, shape & orientation
  - "EVE": ellipsoidal, same volume & orientation
  - "VEE": ellipsoidal, same shape & orientation
  - "VVE": ellipsoidal, same orientation
  - "EEV": ellipsoidal, same volume & shape
  - "VEV": ellipsoidal, same shape
  - "EVV": ellipsoidal, same volume
  - "VVV": ellipsoidal, different volume, shape and orientation
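  The model names are passed to Mclust() via its modelNames argument; a sketch (the data object X is illustrative) that restricts the search to the diagonal models listed above:

  library("mclust")
  fit <- Mclust(X, G = 1:9,
                modelNames = c("EII", "VII", "EEI", "VEI", "EVI", "VVI"))
  summary(fit)  # best model and number of components according to BIC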

  47. Package mclust: Model specification / 3
  [Figure: example ellipses for the covariance structures EII, VII, EEI, VEI, EVI and VVI.]

  48. Package mclust: Model specification / 4
  [Figure: example ellipses for the covariance structures EEE, EVE, VEE, VVE, EEV, VEV, EVV and VVV.]

  49. Package mclust: Function Mclust()
  Mclust() has the following arguments:
  - data: A numeric vector, a numeric matrix or a data frame of observations. Categorical variables are not allowed. If a matrix or a data frame is supplied, the rows correspond to observations and the columns to variables.
  - G: An integer vector indicating the numbers of components for which the model should be fitted. The default is G = 1:9.
  - modelNames: A character vector specifying the models (with different restrictions on the variance-covariance matrices) to fit.
  - prior: The default is to use no prior. This argument allows specifying conditionally conjugate priors on the means and the variance-covariance matrices of the components using priorControl().

  50. Package mclust: Function Mclust() / 2
  - control: A list of control parameters for the EM algorithm. The defaults are obtained by calling emControl().
  - initialization: A list containing one or several elements of: (1) "hcPairs", providing the results of the hierarchical clustering, and (2) "subset", a logical or numeric vector specifying a subset.
  - warn: A logical value indicating if warnings (e.g., with respect to singularities) are issued. The default is to suppress warnings.
  - ...: Catches unused arguments.
  Returns an object of class 'Mclust'.

  51. Package mclust: Further methods
  The fitted object of class 'Mclust' can be plotted using the associated plot method. The following plots can be generated:
  - Comparison of the BIC values for all estimated models.
  - Scatter plot indicating the classification for 2-dimensional data.
  - 2-dimensional projection of the data with classification.
  - 2-dimensional projection of the data with uncertainty of classification.
  - Estimated density for 1- and 2-dimensional data.

  52. Package mclust: Maple trees
  > data("lansing", package = "spatstat")
  > maples <- subset(spatstat::as.data.frame.ppp(lansing),
  +                  marks == "maple", c("x", "y"))
  > library("mclust")
  > maplesMix <- Mclust(maples)
  > maplesMix
  'Mclust' model object: best model: spherical, varying volume (VII) with 7 components
  > par(mfrow = c(2, 2))
  > plot(maplesMix, what = "BIC")
  > plot(maplesMix, what = "classification")
  > plot(maplesMix, what = "uncertainty")
  > plot(maplesMix, what = "density")

  53. Package mclust: Maple trees / 2
  [Figure, four panels: BIC values for all covariance models and 1 to 9 components; classification; classification uncertainty; log-density contour plot.]

  54. Latent class analysis

  55. Latent class analysis
  - Used for multivariate binary and categorical data.
  - Often only local identifiability is ensured.
  - Observed variables are correlated because of the unobserved group structure. ⇒ Within each component, variables are assumed to be independent and the multivariate density is given by the product of the univariate densities.

  56. Latent class analysis / 2
  In general the model can be written as
  h(y \mid \Theta) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{p} f_{kj}(y_j \mid \vartheta_{kj}).
  In latent class analysis the densities f_{kj}() for all k and j are from the same univariate parametric distribution family. For binary variables the Bernoulli distribution is used:
  f(y \mid \theta) = \theta^y (1 - \theta)^{1-y}.

  57. Latent class analysis: Example
  Example from Dayton (1999), data from package poLCA.
  - Data from a survey among 319 students about cheating. The students were asked about their behavior during their bachelor studies.
  - The following four questions are used, which ask about
    1. lying to avoid taking an exam,
    2. lying to avoid handing a term paper in on time,
    3. purchasing a term paper to hand in as their own or obtaining a copy of an exam prior to taking the exam,
    4. copying answers during an exam from someone sitting near to them.

  58. Latent class analysis: Example / 2
  The relative frequencies of agreement with the four questions are: 0.11, 0.12, 0.07, 0.21.
  First it is assessed if the answers to the different questions are independent given the marginal distributions. The density is given by
  h(y \mid \Theta) = \prod_{j=1}^{4} \theta_j^{y_j} (1 - \theta_j)^{1-y_j},
  where θ_j is the probability that question j is agreed to.
  A χ²-goodness-of-fit test is conducted: χ² = 136.3, p-value < 2e-16.
  This simple model does not fit the data well. ⇒ Extension to mixture models.
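  A sketch of this goodness-of-fit test in base R; the 0/1 recoding and the degrees of freedom 16 - 1 - 4 = 11 are our reading, not spelled out on the slide:

  data("cheating", package = "poLCA")
  Y <- as.matrix(cheating[, 1:4]) - 1  # poLCA codes the answers as 1/2
  N <- nrow(Y)
  theta <- colMeans(Y)                 # marginal agreement rates
  ## all 16 response patterns and their probabilities under independence
  pat <- as.matrix(expand.grid(rep(list(0:1), 4)))
  p_ind <- apply(pat, 1, function(y) prod(theta^y * (1 - theta)^(1 - y)))
  ## observed pattern counts in the same ordering
  key <- apply(pat, 1, paste, collapse = "")
  obs <- table(factor(apply(Y, 1, paste, collapse = ""), levels = key))
  chisq <- sum((obs - N * p_ind)^2 / (N * p_ind))
  pchisq(chisq, df = 16 - 1 - 4, lower.tail = FALSE)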

  59. Latent class analysis: Example / 3
  The finite mixture model is given by
  h(y \mid \Theta) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{4} \theta_{kj}^{y_j} (1 - \theta_{kj})^{1-y_j},
  where θ_{kj} is the probability of agreeing to question j for a respondent from component k.
  This model is fitted with 2 components. The χ²-goodness-of-fit test gives χ² = 8.3, p-value = 0.22. The null hypothesis that the data stem from this model cannot be rejected.

  60. Latent class analysis: Example / 4
  The size of the first component is 0.16. The estimated parameters for both components are:

                    Comp.1  Comp.2
  center.LIEEXAM      0.58    0.02
  center.LIEPAPER     0.59    0.03
  center.FRAUD        0.22    0.04
  center.COPYEXAM     0.38    0.18

  61. Package flexmix

  62. Software: flexmix (Leisch, 2004; Grün & Leisch, 2008a)
  - The function flexmix() provides the E-step and all data handling.
  - The M-step is supplied by the user, similar to glm() families.
  - Multiple independent responses from different families.
  - Currently bindings to several GLM families exist (Gaussian, Poisson, Gamma, Binomial).
  - Weighted, hard (CEM) and random (SEM) classification.
  - Components with prior probability below a user-specified threshold are automatically removed during the iterations.

  63. flexmix design
  Primary goal is extensibility: ideal for trying out new mixture models.
  No replacement of specialized mixtures like Mclust(), but a complement.
  - Usage of S4 classes and methods.
  - Formula-based interface.
  - Multivariate responses as a combination of univariate families: assumption of independence (given x); each response may have its own model formula, i.e., a different set of regressors.
  - Multivariate families: if the family handles a multivariate response directly, then arbitrary multivariate response distributions are possible.

  64. Function flexmix()
  flexmix() takes the following arguments:
  - formula: A symbolic description of the model, which is fitted. The general form is y~x|g, where y is the dependent variable, x the independent variables and g an optional categorical grouping variable for repeated observations.
  - data: An optional data frame used for evaluating the formula.
  - k: Number of components (not necessary if cluster is specified).
  - cluster: Either a matrix with k columns which contains the initial a-posteriori probabilities, or a categorical variable or integer vector indicating the component memberships of observations.
  - model: Object of class 'FLXM' or a list of these objects.
  - concomitant: Object of class 'FLXP'.
  - control: Object of class 'FLXcontrol' or a named list.

  65. Function flexmix() / 2
  Returns an object of class 'flexmix'.
  Repeated calls of flexmix() with
  - stepFlexmix(): using random initialization.
  - initFlexmix(): combining short with long runs of EM.

  66. Controlling the EM algorithm
  'FLXcontrol' for setting the specifications of the EM algorithm:
  - iter.max: maximum number of iterations.
  - minprior: minimum component sizes.
  - verbose: If larger than 0, then flexmix() provides status information every verbose iterations of the EM algorithm.
  - classify: One of "auto", "weighted", "CEM" (or "hard"), "SEM" (or "random").
  For convenience flexmix() also allows specification of the control argument through a named list, where the names are partially matched, e.g.,
  flexmix(..., control = list(class = "r"))

  67. Variants of mixture models
  Component-specific models: FLXMxxx()
  - Model-based clustering: FLXMCxxx(), e.g., FLXMCmvnorm(), FLXMCmvbinary(), FLXMCmvpois(), ...
  - Clusterwise regression: FLXMRxxx(), e.g., FLXMRglm(), FLXMRglmfix(), FLXMRziglm(), ...
  Concomitant variable models: FLXPxxx(), e.g., FLXPconstant(), FLXPmultinom()

  68. Methods for 'flexmix' objects
  - show(), summary(): some information on the fitted model.
  - plot(): rootogram of posterior probabilities.
  - refit(): refits an estimated mixture model to obtain the variance-covariance matrix.
  - logLik(), BIC(), ...: obtain the log-likelihood and model fit criteria.
  - parameters(), prior(): obtain component-specific model parameters and prior class probabilities / component weights.
  - posterior(), clusters(): obtain a-posteriori probabilities and assignments to the maximum a-posteriori probability.
  - fitted(), predict(): fitted and predicted (component-specific) values.

  69. Package flexmix: Cheating
  Data preparation:
  > data("cheating", package = "poLCA")
  > X <- as.data.frame(table(cheating[, 1:4]))
  > X <- subset(X, Freq > 0)
  > X[, 1:4] <- sapply(X[, 1:4], as.integer) - 1
  > head(X)
    LIEEXAM LIEPAPER FRAUD COPYEXAM Freq
  1       0        0     0        0  207
  2       1        0     0        0   10
  3       0        1     0        0   13
  4       1        1     0        0   11
  5       0        0     1        0    7
  6       1        0     1        0    1

  70. Package flexmix: Cheating / 2
  Fit the model with flexmix:
  > set.seed(201107)
  > FORMULA <-
  +   cbind(LIEEXAM, LIEPAPER, FRAUD, COPYEXAM) ~ 1
  > CONTROL <-
  +   list(tolerance = 10^-12, iter.max = 500)
  > cheatMix <-
  +   stepFlexmix(FORMULA, k = 2, weights = ~ Freq,
  +               model = FLXMCmvbinary(), data = X,
  +               control = CONTROL, nrep = 3)
  2 : * * *

  71. Package flexmix: Cheating / 3
  > cheatMix

  Call:
  stepFlexmix(FORMULA, weights = ~Freq, model = FLXMCmvbinary(),
      data = X, control = CONTROL, k = 2, nrep = 3)

  Cluster sizes:
    1   2
   54 265

  convergence after 314 iterations

  72. Package flexmix: Cheating / 4
  > summary(cheatMix)

  Call:
  stepFlexmix(FORMULA, weights = ~Freq, model = FLXMCmvbinary(),
      data = X, control = CONTROL, k = 2, nrep = 3)

         prior size post>0 ratio
  Comp.1 0.161   54    319 0.169
  Comp.2 0.839  265    319 0.831

  'log Lik.' -440 (df=9)
  AIC: 898.1   BIC: 931.9

  > plot(cheatMix)
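  The accessor methods listed earlier recover the quantities reported on the latent class analysis slides; for example:

  > prior(cheatMix)                 # component weights (about 0.16 / 0.84)
  > round(parameters(cheatMix), 2)  # agreement probabilities per component
  > table(clusters(cheatMix))       # hard assignments of the response patterns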

  73. Package flexmix: Cheating / 5
  [Figure: rootogram of the posterior probabilities > 1e-04 for Comp. 1 and Comp. 2.]

  74. Mixtures of regression models

  75. Mixtures of regression models
  Also known as clusterwise regression.
  - Regression models are fitted for each component. ⇒ Weighted ML estimation of linear and generalized linear models is required.
  - Heterogeneity between observations with respect to their regression parameters.
  - Random effects can be estimated in a semi-parametric way.
  Possible models:
  - Mixtures of linear regression models
  - Mixtures of generalized linear regression models
  - Mixtures of linear mixed regression models
  - ...

  76. Mixtures of regression models / 2
  The density of a mixture of regression models is given by
  h(y \mid x, w, \Theta) = \sum_{k=1}^{K} \pi_k f_k(y \mid x, \vartheta_k),
  where
  \sum_{k=1}^{K} \pi_k = 1 \quad \text{and} \quad \pi_k > 0 \;\; \forall k.
  In the EM algorithm the weighted log-likelihoods of the regression models need to be maximized in the M-step.

  77. Generic identifiability
  Influencing factors:
  - Component distribution (see mixtures of distributions).
  - Model matrix.
  - Repeated observations for each individual / partly labeled observations.

  78. Identifiability: Model matrix
  If only one observation per individual is available, full rank of the model matrix is not sufficient for identifiability.
  - Linear models: Mixtures of linear models with normally distributed errors are identifiable if the number of components K is smaller than the minimum number of (feasible) hyperplanes which are necessary to cover all covariate points (Hennig, 2000).
  - Generalized linear models: Analogous condition for the linear predictor, with additional constraints depending on the distribution of the dependent variable (in particular for binomial and multinomial logit models, see Grün & Leisch, 2008b).
  ⇒ Coverage condition.

  79. Identifiability: Repetitions / labels
  At least one of the hyperplanes in the coverage condition needs to cover all repeated / labeled observations where component membership is fixed.
  Violation of the coverage condition leads to intra-component label switching: if the labels are fixed in one covariate point according to some ordering constraint, then labels may switch in other covariate points for different parameterizations of the model.

  80. Illustration: Binomial logit models
  We consider a mixture density
  h(y \mid x, \Theta) = \pi_1 f_B(y \mid N, (1, x)\beta_1) + (1 - \pi_1) f_B(y \mid N, (1, x)\beta_2)
  with binomial logit models in the components and parameters
  \pi_1 = 0.5, \quad \beta_1 = (-2, 4)', \quad \beta_2 = (2, -4)'.
  Even if N ≥ 3, the mixture is not identifiable if there are only 2 covariate points available. The second solution is then given by
  \pi_1^{(2)} = 0.5, \quad \beta_1^{(2)} = (-2, 0)', \quad \beta_2^{(2)} = (2, 0)'.

  81. Illustration: Binomial logit models / 2
  Simulation design:
  - Number of repetitions N ∈ {1, 10} in the same covariate point.
  - Number of covariate points: #x ∈ {2, 5}, equidistantly spread across [0, 1].
  - 100 samples with 100 observations are drawn from the model for each combination of N and #x.
  - Finite mixtures with 2 components are fitted to each sample.
  - Solutions after imposing an ordering constraint on the intercept are reported.

  82. Illustration: Binomial logit models / 3
  [Figure: lattice panels of the fitted relative frequencies against x for the four combinations of N ∈ {1, 10} and #x ∈ {2, 5}.]

  83. Illustration: Binomial logit models / 4
  [Figure: 5% and 95% quantiles of the estimates of π, the intercept and the coefficient of x for the four combinations of N and #x.]

  84. Mixtures of linear regressions: Example
  [Figure: scatter plot of CO2 against GNP.]
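  A clusterwise linear regression for data like this can be fitted with flexmix; a minimal sketch, where the data frame name CO2data and its columns GNP and CO2 follow the plot and are otherwise illustrative assumptions:

  library("flexmix")
  set.seed(2017)
  ## two-component mixture of Gaussian linear regressions of CO2 on GNP
  co2Mix <- stepFlexmix(CO2 ~ GNP, data = CO2data, k = 2,
                        model = FLXMRglm(family = "gaussian"), nrep = 3)
  parameters(co2Mix)  # per-component intercept, slope and sigma
  clusters(co2Mix)    # assignment of each observation to a component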
