

SLIDE 1

Finite mixture models

Dr. Jarad Niemi

STAT 615 - Iowa State University

November 28, 2017


SLIDE 2

Multinomial distribution

Categorical distribution

Let $Z \sim \text{Cat}(H, p)$ represent a categorical distribution with $P(Z = h) = p_h$ for $h = 1, \ldots, H$ and $\sum_{h=1}^H p_h = 1$.

Example: discrete choice model. Suppose we have a set of $H$ categories and we label these $1, \ldots, H$. Independently, consumers each choose one of the $H$ categories with the same choice probabilities. Then a reasonable model is

$$Z_i \stackrel{ind}{\sim} \text{Cat}(H, p).$$
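Simulating a categorical draw is just a weighted sample of labels; a minimal sketch in base R, where the values of H and p below are hypothetical:

# draw n categorical variables Z_i ~ Cat(H, p)
H <- 3
p <- c(0.2, 0.3, 0.5)   # hypothetical choice probabilities
n <- 1000
Z <- sample(1:H, size = n, replace = TRUE, prob = p)
table(Z) / n            # empirical frequencies should be close to p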


SLIDE 3

Multinomial distribution

Multinomial distribution

If we count the number of times the consumers chose each category, i.e. $Y_h = \sum_{i=1}^n I(Z_i = h)$, then the result is the multinomial distribution, i.e. $Y \sim \text{Mult}(n, p)$. The multinomial distribution has probability mass function

$$p(y; n, p) = \frac{n!}{y_1! \cdots y_H!} \prod_{h=1}^H p_h^{y_h}$$

which has $E[Y_i] = np_i$, $V[Y_i] = np_i(1 - p_i)$, and $\text{Cov}[Y_i, Y_j] = -np_ip_j$ for $i \neq j$. A special case is $H = 2$, which is the binomial distribution.
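Base R supports the multinomial directly; a small sketch reusing the hypothetical p from the previous slide:

# counts over H = 3 categories from n = 100 consumers
y <- drop(rmultinom(1, size = 100, prob = c(0.2, 0.3, 0.5)))
y
# pmf evaluated at the observed counts
dmultinom(y, size = 100, prob = c(0.2, 0.3, 0.5))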


SLIDE 4

Dirichlet distribution

Dirichlet distribution

The Dirichlet distribution (named after Peter Gustav Lejeune Dirichlet), i.e. $P \sim \text{Dir}(a)$, is a probability distribution for a probability vector of length $H$. The probability density function for the Dirichlet distribution is

$$p(p; a) = \frac{\Gamma(a_1 + \cdots + a_H)}{\Gamma(a_1) \cdots \Gamma(a_H)} \prod_{h=1}^H p_h^{a_h - 1}$$

where $p_h \ge 0$, $\sum_{h=1}^H p_h = 1$, and $a_h > 0$.

Letting $a_0 = \sum_{h=1}^H a_h$, some moments are

$$E[p_h] = \frac{a_h}{a_0}, \quad V[p_h] = \frac{a_h(a_0 - a_h)}{a_0^2(a_0 + 1)}, \quad \text{Cov}(p_h, p_k) = -\frac{a_h a_k}{a_0^2(a_0 + 1)}, \quad \text{and} \quad \text{mode}(p_h) = \frac{a_h - 1}{a_0 - H} \ \text{for} \ a_h > 1.$$

A special case is $H = 2$, which is the beta distribution.
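These moments are easy to verify by simulation; a sketch assuming the gtools package, whose rdirichlet(n, alpha) draws Dirichlet vectors (the values of a are hypothetical):

library("gtools")
a  <- c(2, 3, 5)
a0 <- sum(a)
P  <- rdirichlet(1e5, a)                             # one probability vector per row
colMeans(P);      a / a0                             # compare to E[p_h]
apply(P, 2, var); a * (a0 - a) / (a0^2 * (a0 + 1))   # compare to V[p_h]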


SLIDE 5

Dirichlet distribution Conjugate prior for multinomial distribution

Conjugate prior for multinomial distribution

The Dirichlet distribution is the natural conjugate prior for the multinomial distribution. If

$$Y \sim \text{Mult}(n, \pi) \quad \text{and} \quad \pi \sim \text{Dir}(a), \quad \text{then} \quad \pi|y \sim \text{Dir}(a + y).$$

Some possible default priors are $a = 1$, which is the uniform density over $\pi$; $a = 1/2$, which is Jeffreys prior for the multinomial; and $a = 0$, an improper prior that is uniform on $\log(\pi_h)$. In the last case, the resulting posterior is proper if $y_h > 0$ for all $h$.
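Conjugacy makes posterior simulation a one-liner; a sketch with hypothetical counts, again leaning on gtools:

library("gtools")
y    <- c(12, 30, 58)            # hypothetical multinomial counts
a    <- rep(1, 3)                # uniform prior over pi
post <- rdirichlet(1e4, a + y)   # draws from pi | y ~ Dir(a + y)
colMeans(post)                   # close to (a + y) / sum(a + y)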


SLIDE 6

Finite mixtures

Complicated distributions

[Figure: matrix of pairwise scatterplots and marginal histograms of posterior samples for the parameters φ, α, δ, γ, and ψ.]

SLIDE 7

Finite mixtures

Finite mixtures

Let’s focus on modeling the univariate distribution for φ

[Figure: density estimate of the posterior samples of φ (x-axis: phi, y-axis: density).]

SLIDE 8

Finite mixtures Finite mixture

Finite mixture

A model for the marginal distribution for $Y_i = \phi_i$ is

$$Y_i \stackrel{ind}{\sim} \sum_{h=1}^H \pi_h N(\mu_h, \sigma_h^2)$$

where $\sum_{h=1}^H \pi_h = 1$.

Alternatively, we can introduce a latent variable $\zeta_i = h$ if observation $i$ came from group $h$. Then

$$Y_i | \zeta_i = z \stackrel{ind}{\sim} N(\mu_z, \sigma_z^2), \qquad \zeta_i \stackrel{ind}{\sim} \text{Cat}(H, \pi)$$

where $\zeta \sim \text{Cat}(H, \pi)$ is a categorical random variable with $P(\zeta = h) = \pi_h$ for $h = 1, \ldots, H$ and $\pi = (\pi_1, \ldots, \pi_H)$.
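The latent-variable construction doubles as a recipe for simulating from the mixture; a minimal sketch with hypothetical component parameters:

# simulate from a 3-component normal mixture via the latent zeta_i
n     <- 1000
pi_h  <- c(0.2, 0.5, 0.3)   # hypothetical weights
mu    <- c(-2, 0, 5)        # hypothetical means
sigma <- c(0.5, 1, 1)       # hypothetical standard deviations
zeta  <- sample(1:3, n, replace = TRUE, prob = pi_h)
y     <- rnorm(n, mean = mu[zeta], sd = sigma[zeta])
hist(y, breaks = 50, freq = FALSE)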


SLIDE 9

Finite mixtures Finite mixture

A possible prior

Let's assume

$$\pi \sim \text{Dir}(a), \qquad \mu_h|\sigma_h^2 \stackrel{ind}{\sim} N(m_h, v_h^2\sigma_h^2), \qquad \sigma_h^2 \stackrel{ind}{\sim} IG(c_h, d_h)$$

and $\pi$ is independent of $\mu = (\mu_1, \ldots, \mu_H)$ and $\sigma^2 = (\sigma_1^2, \ldots, \sigma_H^2)$.

Commonly, we have $m_h = m$, $v_h = v$, $c_h = c$, and $d_h = d$. If the data have been standardized (scaled and centered), a reasonable default prior is $m = 0$, $v = 1$, $c = 2$, $d = 4$ (BDA3, p. 535) and $a_h = 1/H$ (BDA3, p. 536).


SLIDE 10

Finite mixtures Finite mixture

MCMC

A Gibbs sampler with stationary distribution

$$p(\pi, \mu, \sigma^2, \zeta|y) \propto p(y|\zeta, \mu, \sigma^2)\,p(\zeta|\pi)\,p(\mu|\sigma^2)\,p(\sigma^2)\,p(\pi)$$

has the following steps:

1. For $i = 1, \ldots, n$, sample $\zeta_i$ from its full conditional (the $\zeta_i$ are conditionally independent across $i$):
$$P(\zeta_i = h|\ldots) \propto \pi_h N(y_i; \mu_h, \sigma_h^2)$$

2. Jointly sample $\pi$ and $(\mu, \sigma^2)$, as they are conditionally independent.

a. Sample $\pi \sim \text{Dir}(a + Z)$ where $Z = (Z_1, \ldots, Z_H)$ and $Z_h = \sum_{i=1}^n I(\zeta_i = h)$.

b. For $h = 1, \ldots, H$, sample $(\mu_h, \sigma_h^2)$ from their full conditional (these are conditionally independent across $h$):

$$\sigma_h^2 \stackrel{ind}{\sim} IG(c_h', d_h'), \qquad \mu_h|\sigma_h^2 \stackrel{ind}{\sim} N(m_h', v_h'^2\sigma_h^2)$$

where

$$v_h'^2 = (1/v_h^2 + Z_h)^{-1}, \qquad m_h' = v_h'^2(m_h/v_h^2 + Z_h\bar{y}_h), \qquad c_h' = c_h + Z_h/2,$$
$$d_h' = d_h + \frac{1}{2}\left[\sum_{i:\zeta_i=h}(y_i - \bar{y}_h)^2 + \frac{Z_h}{1 + Z_h v_h^2}(\bar{y}_h - m_h)^2\right], \qquad \bar{y}_h = \frac{1}{Z_h}\sum_{i:\zeta_i=h} y_i.$$
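For concreteness, here is a sketch of this sampler in plain R; gibbs_mixture is a hypothetical name, the defaults follow the prior on the previous slide, and no ordering constraint is imposed (so label switching, discussed shortly, can still occur). This is an illustration, not the course's reference implementation:

gibbs_mixture <- function(y, H = 3, n_iter = 2000,
                          m = 0, v = 1, c = 2, d = 4, a = rep(1/H, H)) {
  n    <- length(y)
  pi_h <- rep(1/H, H)
  mu   <- as.numeric(quantile(y, probs = (1:H)/(H + 1)))  # crude spread-out start
  sig2 <- rep(var(y), H)
  keep <- list(mu   = matrix(NA, n_iter, H),
               sig2 = matrix(NA, n_iter, H),
               pi   = matrix(NA, n_iter, H))
  for (t in 1:n_iter) {
    # step 1: full conditional for each zeta_i
    w    <- sapply(1:H, function(h) pi_h[h] * dnorm(y, mu[h], sqrt(sig2[h])))
    zeta <- apply(w, 1, function(p) sample(1:H, 1, prob = p))
    Z    <- tabulate(zeta, nbins = H)
    # step 2a: pi ~ Dir(a + Z), via normalized gamma draws
    g    <- rgamma(H, shape = a + Z, rate = 1)
    pi_h <- g / sum(g)
    # step 2b: (mu_h, sigma_h^2) from their full conditionals
    for (h in 1:H) {
      yh   <- y[zeta == h]
      ybar <- if (Z[h] > 0) mean(yh) else 0
      v2p  <- 1 / (1/v^2 + Z[h])
      mp   <- v2p * (m/v^2 + Z[h] * ybar)
      cp   <- c + Z[h]/2
      dp   <- d + 0.5 * (sum((yh - ybar)^2) +
                           Z[h] / (1 + Z[h] * v^2) * (ybar - m)^2)
      sig2[h] <- 1 / rgamma(1, shape = cp, rate = dp)  # IG draw via inverse gamma
      mu[h]   <- rnorm(1, mp, sqrt(v2p * sig2[h]))
    }
    keep$mu[t, ] <- mu; keep$sig2[t, ] <- sig2; keep$pi[t, ] <- pi_h
  }
  keep
}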


SLIDE 11

Finite mixtures JAGS

library("rjags")

jags_model = "
model {
  for (i in 1:n) {
    y[i]    ~ dnorm(mu[zeta[i]], tau[zeta[i]])  # dnorm is parameterized by precision tau
    zeta[i] ~ dcat(pi[])
  }
  for (i in 1:H) {
    mu[i]    ~ dnorm(0, 1e-5)
    tau[i]   ~ dgamma(1, 1)
    sigma[i] <- 1/sqrt(tau[i])
  }
  pi ~ ddirich(a)
}"

tmp = hat[sample(nrow(hat), 1000),]
dat = list(n = nrow(tmp), H = 3, y = tmp$phi, a = rep(1, 3))

jm = jags.model(textConnection(jags_model), data = dat, n.chains = 3)
r  = coda.samples(jm, c('mu','sigma','pi'), 1e3)

SLIDE 12

Finite mixtures JAGS

Convergence diagnostics

gelman.diag(r)
## Error in chol.default(W): the leading minor of order 6 is not positive definite

gelman.diag(r, multivariate = FALSE)
## Potential scale reduction factors:
##
##          Point est. Upper C.I.
## mu[1]         21.76      55.54
## mu[2]         41.55      90.41
## mu[3]         91.54     222.09
## pi[1]         11.79      29.52
## pi[2]          3.46       6.36
## pi[3]         18.63      40.16
## sigma[1]      13.70      35.60
## sigma[2]       4.26       8.87
## sigma[3]      13.93      37.09

SLIDE 13

Finite mixtures JAGS

Convergence diagnostics (2)

plot(r, density=FALSE)

[Figure: trace plots for mu[1..3], pi[1..3], and sigma[1..3]; the three chains do not overlap, consistent with the large scale reduction factors.]

SLIDE 14

Finite mixtures JAGS

Prior distributions

The parameters of the model are unidentified due to label switching, i.e.

$$Y_i \stackrel{ind}{\sim} \sum_{h=1}^H \pi_h N(\mu_h, \sigma_h^2) \stackrel{d}{=} \sum_{h'=1}^H \pi_{h'} N(\mu_{h'}, \sigma_{h'}^2)$$

for any permutation $h'$ of the labels. One way to resolve this issue is to enforce identifiability in the prior. For example, in one dimension, we can order the component means: $\mu_1 < \mu_2 < \cdots < \mu_H$.

To ensure the posterior is proper,
  • maintain a proper prior for $\pi$, and
  • ensure a proper prior for ratios of the variances (perhaps by ensuring the prior is proper for the variances themselves).


SLIDE 15

Finite mixtures JAGS

Two conditionally conjugate prior options

Option 1:

$$\text{Dir}(\pi; a)\, I(\mu_1 < \cdots < \mu_H) \prod_{h=1}^H N(\mu_h; m_h, v_h^2)\, IG(\sigma_h^2; c_h, d_h)$$

Option 2:

$$\text{Dir}(\pi; a)\, I(\mu_1 < \cdots < \mu_H) \prod_{h=1}^H N(\mu_h; m_h, v_h^2\sigma_h^2)\, IG(\sigma_h^2; c_h, d_h)$$


SLIDE 16

Finite mixtures JAGS

library("rjags")

jags_model = "
model {
  for (i in 1:n) {
    y[i]    ~ dnorm(mu[zeta[i]], tau[zeta[i]])
    zeta[i] ~ dcat(pi[])
  }
  for (i in 1:H) {
    mu0[i]   ~ dnorm(0, 1e-5)
    tau[i]   ~ dgamma(1, 1)
    sigma[i] <- 1/sqrt(tau[i])
  }
  mu[1:H] <- sort(mu0)  # order the means to enforce identifiability
  pi ~ ddirich(a)
}"

jm = jags.model(textConnection(jags_model), data = dat, n.chains = 3)
r  = coda.samples(jm, c('mu','sigma','pi'), 1e3)

SLIDE 17

Finite mixtures JAGS

Convergence diagnostics

gelman.diag(r)
## Error in chol.default(W): the leading minor of order 6 is not positive definite

gelman.diag(r, multivariate = FALSE)
## Potential scale reduction factors:
##
##          Point est. Upper C.I.
## mu[1]          1.00       1.00
## mu[2]          1.01       1.04
## mu[3]          1.00       1.01
## pi[1]          1.00       1.00
## pi[2]          1.01       1.03
## pi[3]          1.01       1.02
## sigma[1]       1.00       1.00
## sigma[2]       1.01       1.03
## sigma[3]       1.01       1.02

SLIDE 18

Finite mixtures JAGS

Convergence diagnostics (2)

plot(r, density=FALSE)

[Figure: trace plots for mu[1..3], pi[1..3], and sigma[1..3] after imposing the ordering constraint; the chains now mix well, consistent with the scale reduction factors near 1.]

SLIDE 19

Finite mixtures JAGS

Posterior on data density

[Figure: posterior estimate of the data density (x-axis: x, y-axis: density).]
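One sketch of how such an overlay can be produced from the fit above: plug the posterior means into the mixture density (averaging the density over posterior draws would be more faithful, but this is simpler):

post  <- summary(r)$statistics[, "Mean"]   # posterior means from the coda object
mu    <- post[grep("^mu",    names(post))]
sigma <- post[grep("^sigma", names(post))]
pi_h  <- post[grep("^pi",    names(post))]
hist(dat$y, breaks = 50, freq = FALSE)
curve(rowSums(sapply(1:3, function(h) pi_h[h] * dnorm(x, mu[h], sigma[h]))),
      add = TRUE, lwd = 2)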

SLIDE 20

Finite mixtures Clustering

Group membership

Group membership can be obtained using the $\zeta_i$, e.g.

$$P(\text{gene } i \text{ in cluster } h) = P(\zeta_i = h|y) \approx \frac{1}{M}\sum_{m=1}^M I\left(\zeta_i^{(m)} = h\right).$$

##    parameter    p1          p2         p3
## 1    zeta[1] 0.000 0.002666667 0.99733333
## 2   zeta[10] 0.000 0.090333333 0.90966667
## 3  zeta[100] 0.000 0.206000000 0.79400000
## 4 zeta[1000] 0.000 0.959666667 0.04033333
## 5  zeta[101] 0.000 1.000000000 0.00000000
## 6  zeta[102] 0.929 0.071000000 0.00000000
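A sketch of how such a table can be computed, assuming the latent labels zeta are also monitored in coda.samples (they were not monitored in the runs above, so this is illustrative):

rz    <- coda.samples(jm, 'zeta', 1e3)  # monitor the latent group labels
draws <- as.matrix(rz)                  # (iterations x chains) by n matrix of labels
probs <- t(apply(draws, 2, function(z) tabulate(z, nbins = 3) / length(z)))
head(probs)                             # row i approximates P(zeta[i] = h | y)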

SLIDE 21

Finite mixtures Clustering

Clustering

Genes can then be clustered by assigning each gene to a group based on its posterior probabilities of group membership, i.e. for gene $i$, we assign the group according to $\text{argmax}_h P(\zeta_i = h|y)$, as in the one-liner below. Unfortunately, clustering is extremely sensitive to the parametric model chosen, e.g. the normal in this example, and the clustering could change dramatically with a different choice, e.g. a $t$ distribution.
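Given the matrix of membership probabilities from the previous slide, the hard assignment is one line:

cluster <- apply(probs, 1, which.max)   # argmax_h P(zeta[i] = h | y) per gene
table(cluster)                          # resulting cluster sizes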


SLIDE 22

Finite mixtures Choosing H

Choosing H

When using finite mixture models, one of the key choices is $H$, the number of clusters. A fully Bayesian approach would place a prior on $H$, e.g. a Poisson or truncated Poisson, and then use reversible-jump MCMC to estimate it. A more pragmatic approach is to start with a small $H$ and then determine whether some feature of the data is not being adequately addressed, e.g. via posterior predictive p-values. An empirical Bayes approach finds the MLE (or MAP)

$$\hat{H} = \text{argmax}_H\, p(y|H), \qquad p(y|H) = \int p(y|\pi, \mu, \sigma^2, H)\, p(\pi, \mu, \sigma^2|H)\, d\pi\, d\mu\, d\sigma^2$$

and then conditions on $\hat{H}$ in the analysis. Typically this MLE (or MAP) is found via the EM algorithm.
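For a quick off-the-shelf alternative (not covered in these slides), the mclust package fits normal mixtures by EM over a range of H and selects among them by BIC; a sketch assuming mclust is installed:

library("mclust")
fit <- Mclust(dat$y, G = 1:6, modelNames = "V")  # "V" = component-specific variances
fit$G                                            # selected number of components
summary(fit)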


SLIDE 23

Finite mixtures Multivariate density estimation

Multivariate density estimation

[Figure: marginal histograms of φ and ψ and their joint scatterplot (ψ versus φ).]

SLIDE 24

Finite mixtures Finite mixture

Finite mixture

A model for the joint distribution for $Y_i = (\phi_i, \psi_i)^\top$ is

$$Y_i \stackrel{ind}{\sim} \sum_{h=1}^H \pi_h N(\mu_h, \Sigma_h)$$

where $\sum_{h=1}^H \pi_h = 1$.

Alternatively, we can introduce a latent variable $\zeta_i = h$ if observation $i$ came from group $h$. Then

$$Y_i | \zeta_i = z \stackrel{ind}{\sim} N(\mu_z, \Sigma_z), \qquad \zeta_i \stackrel{ind}{\sim} \text{Cat}(H, \pi)$$

where $\zeta \sim \text{Cat}(H, \pi)$ is a categorical random variable with $P(\zeta = h) = \pi_h$ for $h = 1, \ldots, H$ and $\pi = (\pi_1, \ldots, \pi_H)$.


SLIDE 25

Finite mixtures Finite mixture

A possible prior

Let's assume

$$\pi \sim \text{Dir}(a), \qquad \mu_h|\Sigma_h \stackrel{ind}{\sim} N_p(m_h, v_h^2\Sigma_h), \qquad \Sigma_h \stackrel{ind}{\sim} IW(D_h, c_h)$$

where $c_h > p - 1$ is the degrees of freedom and $D_h$ is the scale matrix. The mean of this distribution is $D_h/(c_h - p - 1)$ for $c_h > p + 1$.
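A quick simulation check of that mean, assuming the MCMCpack package (its riwish(v, S) draws one inverse-Wishart matrix with degrees of freedom v and scale S); the values of p, c_h, and D_h are hypothetical:

library("MCMCpack")
p <- 2; c_h <- 5; D_h <- diag(p)
draws <- replicate(1e4, riwish(c_h, D_h))   # p x p x 10000 array of draws
apply(draws, c(1, 2), mean)                 # compare to D_h / (c_h - p - 1) = 0.5 * diag(p)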


SLIDE 26

Finite mixtures Finite mixture

MCMC

A Gibbs sampler with stationary distribution

$$p(\pi, \mu, \Sigma, \zeta|y) \propto p(y|\zeta, \mu, \Sigma)\,p(\zeta|\pi)\,p(\mu|\Sigma)\,p(\Sigma)\,p(\pi)$$

has the following steps:

1. For $i = 1, \ldots, n$, sample $\zeta_i$ from its full conditional:
$$P(\zeta_i = h|\ldots) \propto \pi_h N(y_i; \mu_h, \Sigma_h)$$

2. Jointly sample $\pi$ and $(\mu, \Sigma)$, as they are conditionally independent.

a. Sample $\pi \sim \text{Dir}(a + Z)$ where $Z = (Z_1, \ldots, Z_H)$ and $Z_h = \sum_{i=1}^n I(\zeta_i = h)$.

b. For $h = 1, \ldots, H$, sample $(\mu_h, \Sigma_h)$ from their full conditional:

$$\Sigma_h \stackrel{ind}{\sim} IW(D_h', c_h'), \qquad \mu_h|\Sigma_h \stackrel{ind}{\sim} N(m_h', v_h'^2\Sigma_h)$$

where

$$v_h'^2 = (1/v_h^2 + Z_h)^{-1}, \qquad m_h' = v_h'^2(m_h/v_h^2 + Z_h\bar{y}_h), \qquad c_h' = c_h + Z_h,$$
$$D_h' = D_h + \sum_{i:\zeta_i=h}(y_i - \bar{y}_h)(y_i - \bar{y}_h)^\top + \frac{Z_h}{1 + Z_h v_h^2}(\bar{y}_h - m_h)(\bar{y}_h - m_h)^\top, \qquad \bar{y}_h = \frac{1}{Z_h}\sum_{i:\zeta_i=h} y_i.$$


SLIDE 27

Finite mixtures Finite mixture

library("rjags")

joint_mixture_model = "
model {
  for (i in 1:n) {
    y[i,1:p] ~ dmnorm(mu[,zeta[i]], Tau[,,zeta[i]])  # Tau is a precision matrix
    zeta[i]  ~ dcat(pi[])
  }
  for (h in 1:H) {
    mu[1:p,h]        ~ dmnorm(mu0, Tau[,,h])
    Tau[1:p,1:p,h]   ~ dwish(D[,], c)
    Sigma[1:p,1:p,h] <- inverse(Tau[,,h])
  }
  pi ~ ddirich(a)
}"

tmp = hat[sample(nrow(hat), 1000),]
dat = list(n = nrow(tmp), y = tmp[,c('phi','psi')], p = 2, H = 10)
dat$a   = rep(1/dat$H, dat$H)
dat$D   = diag(1, dat$p)
dat$c   = dat$p + 1
dat$mu0 = c(3, 0)

jm = jags.model(textConnection(joint_mixture_model), data = dat, n.chains = 3)
r  = coda.samples(jm, c('pi','mu','Sigma'), 1e3)