Finite mixture models Dr. Jarad Niemi STAT 615 - Iowa State University November 28, 2017 Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 1 / 27
Multinomial distribution Categorical distribution Let Z ∼ Cat ( H, p ) represent a categorical distribution with P ( Z = h ) = p h for h = 1 , . . . , H and � H h =1 p h = 1 . Example: discrete choice model Suppose we have a set of H categories and we label these 1 , . . . , H . Independently, consumers choose one of the H categories with the same ind probability. Then a reasonable model is Z i ∼ Cat ( H, p ) . Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 2 / 27
Multinomial distribution Multinomial distribution If we count the number of times the consumer chose each category, i.e. n � Y h = I( Z i = h ) , i =1 then the result is the multinomial distribution, i.e. Y ∼ Mult ( n, p ) . The multinomial distribution has probability mass function H n ! � p y h p ( y ; n, p ) = h y 1 ! · · · y H ! h =1 which has E [ Y i ] = np i , V [ Y i ] = np i (1 − p i ) , and Cov [ Y i , Y j ] = − np i p j for ( i � = j ) . A special case is H = 2 which is the binomial distribution. Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 3 / 27
Dirichlet distribution Dirichlet distribution The Dirichlet distribution (named after Peter Gustav Lejeune Dirichlet), i.e. P ∼ Dir ( a ) , is a probability distribution for a probability vector of length H . The probability density function for the Dirichlet distribution is H p ( P ; a ) = Γ( a 1 + · · · + a H ) p a h − 1 � h Γ( a 1 ) · · · Γ( a H ) h =1 where p h ≥ 0 , � H h =1 p h = 1 , and a h > 0 . Letting a 0 = � H h =1 a h , then some moments are E [ p h ] = a h a 0 , V [ p h ] = a h ( a 0 − a h ) 0 ( a 0 +1) , a 2 a h a k Cov ( p h , p k ) = − 0 ( a 0 +1) , and a 2 mode ( p h ) = a h − 1 a 0 − H for a h > 1 . A special case is H = 2 which is the beta distribution. Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 4 / 27
Dirichlet distribution Conjugate prior for multinomial distribution Conjugate prior for multinomial distribution The Dirichlet distribution is the natural conjugate prior for the multinomial distribution. If Y ∼ Mult ( n, π ) and π ∼ Dir ( a ) then π | y ∼ Dir ( a + y ) . Some possible default priors are a = 1 which is the uniform density over π , a = 1 / 2 is Jeffreys prior for the multinomial, and a = 0 , an improper prior that is uniform on log( π h ) . The resulting posterior is proper if y h > 0 for all h . Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 5 / 27
Finite mixtures Complicated distributions 10.0 10.0 10.0 10.0 3000 7.5 7.5 7.5 7.5 count 5.0 5.0 2000 5.0 5.0 φ φ φ φ 2.5 2.5 2.5 2.5 1000 0.0 0.0 0.0 0.0 0 −2.5 −2.5 −2.5 −2.5 0 4 8 −1 0 1 −3 −2 −1 0 1 −1 0 1 2 3 φ −2 −1 0 1 2 3 γ ψ α δ 20000 3 3 3 2 2 2 15000 1 1 count 1 α α α 10000 0 0 0 5000 −1 −1 −1 −2 −2 −2 0 −1 0 1 −3 −2 −1 0 1 −2 0 2 −1 0 1 2 3 γ ψ α δ 20000 3 3 15000 2 2 count 10000 δ 1 δ 1 0 0 5000 −1 −1 0 −1 0 1 −3 −2 −1 0 1 −1 0 1 2 3 γ ψ δ 20000 1 15000 count γ 0 10000 5000 −1 0 −1 0 1 −3 −2 −1 0 1 γ ψ 6000 4000 count 2000 0 −3 −2 −1 0 1 ψ Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 6 / 27
Finite mixtures Finite mixtures Let’s focus on modeling the univariate distribution for φ 0.20 0.15 density 0.10 0.05 0.00 0 4 8 phi Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 7 / 27
Finite mixtures Finite mixture Finite mixture A model for the marginal distribution for Y i = φ i is H ind � π h N ( µ h , σ 2 Y i ∼ h ) h =1 where � H h =1 π h = 1 . Alternatively, we can introduce a latent variable ζ i = h if observation i came from group h . Then ind ∼ N ( µ z , σ 2 Y i | ζ i = z z ) ind ζ i ∼ Cat ( H, π ) where ζ ∼ Cat ( H, π ) is a categorical random variable with P ( ζ = h ) = π h for h = 1 , . . . , H and π = ( π 1 , . . . , π H ) . Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 8 / 27
Finite mixtures Finite mixture A possible prior Let’s assume π ∼ Dir ( a ) ind µ h | σ 2 ∼ N ( m h , v 2 h σ 2 h ) h ind σ 2 ∼ IG ( c h , d h ) h and π is independent of µ = ( µ 1 , . . . , µ H ) and σ 2 = ( σ 2 1 , . . . , σ 2 H ) . Commonly, we have m h = m , v h = v , c h = c , and d h = d . If the data have been standardized (scaled and centered), a reasonable default prior is m = 0 , v = 1 , c = 2 , d = 4 , (BDA3 pg 535) and a is 1 /H (BDA3 pg 536). Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 9 / 27
Finite mixtures Finite mixture MCMC The steps of a Gibbs sampler with stationary distribution p ( π, µ, σ 2 , ζ | y ) ∝ p ( y | ζ, µ, σ 2 ) p ( ζ | π ) p ( µ | σ 2 ) p ( σ 2 ) p ( π ) has steps 1. For i = 1 , . . . , n , sample ζ i from its full conditional (as the ζ are conditionally independent across i ): P ( ζ i = h | . . . ) ∝ π h N ( y i ; µ h , σ 2 h ) Jointly sample π and µ, σ 2 as they are conditionally independent. 2. Sample π ∼ Dir ( a + Z ) where n = ( Z 1 , . . . , Z H ) and Z h = � n a. i =1 I( ζ i = h ) . For h = 1 , . . . , H , sample µ h , σ 2 b. h from their full conditional (as these are conditionally independent across h ): ind ind σ 2 ∼ IG ( c ′ h , d ′ µ h | σ 2 ∼ N ( m ′ h , v ′ 2 h σ 2 h ) h ) h h where v ′ 2 = (1 /v 2 h + Z h ) − 1 h m ′ = v ′ 2 h ( m h /v 2 h + Z h y h ) h c ′ = c d + Z h / 2 h � � � � i : ζi = h ( y i − y h ) 2 + Zh d ′ = d h + 1 ( y h − m h ) 2 � h 2 1+ Zh/v 2 h 1 � y h = i : ζi = h y i Zh Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 10 / 27
Finite mixtures JAGS library("rjags") jags_model = " model { for (i in 1:n) { y[i] ~ dnorm(mu[zeta[i]], tau[zeta[i]]) zeta[i] ~ dcat(pi[]) } for (i in 1:H) { mu[i] ~ dnorm(0,1e-5) tau[i] ~ dgamma(1,1) sigma[i] <- 1/sqrt(tau[i]) } pi ~ ddirich(a) } " tmp = hat[sample(nrow(hat), 1000),] dat = list(n=nrow(tmp), H=3, y=tmp$phi, a=rep(1,3)) jm = jags.model(textConnection(jags_model), data = dat, n.chains = 3) r = coda.samples(jm, c('mu','sigma','pi'), 1e3) Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 11 / 27
Finite mixtures JAGS Convergence diagnostics gelman.diag(r) ## Error in chol.default(W): the leading minor of order 6 is not positive definite gelman.diag(r, multivariate=FALSE) ## Potential scale reduction factors: ## ## Point est. Upper C.I. ## mu[1] 21.76 55.54 ## mu[2] 41.55 90.41 ## mu[3] 91.54 222.09 ## pi[1] 11.79 29.52 ## pi[2] 3.46 6.36 ## pi[3] 18.63 40.16 ## sigma[1] 13.70 35.60 ## sigma[2] 4.26 8.87 ## sigma[3] 13.93 37.09 Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 12 / 27
Finite mixtures JAGS Convergence diagnostics (2) plot(r, density=FALSE) Trace of mu[1] Trace of mu[2] Trace of mu[3] 6 6 1 3 2 −2 −2 0 0 200 400 600 800 0 200 400 600 800 0 200 400 600 800 Iterations Iterations Iterations Trace of pi[1] Trace of pi[2] Trace of pi[3] 0.4 0.4 0.4 0.1 0.1 0.1 0 200 400 600 800 0 200 400 600 800 0 200 400 600 800 Iterations Iterations Iterations Trace of sigma[1] Trace of sigma[2] Trace of sigma[3] 2.5 2.0 2.0 1.0 0.0 0.0 0 200 400 600 800 0 200 400 600 800 0 200 400 600 800 Iterations Iterations Iterations Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 13 / 27
Finite mixtures JAGS Prior distributions The parameters of the model are unidentified due to label-switching, i.e. H H ind h ) d � π h N ( µ h , σ 2 � π h ′ N ( µ h ′ , σ 2 Y i ∼ = h ′ ) h ′ =1 h =1 for some permutation h ′ . One way to resolve this issue is to enforce identifiability in the prior. For example, in one-dimension, we can order the component means: µ 1 < µ 2 < · · · < µ H . To ensure the posterior is proper Maintain proper prior for π Ensure proper prior for ratios of variances (perhaps by ensuring prior is proper for variances themselves) Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 14 / 27
Finite mixtures JAGS Two conditionally conjugate prior options Option 1: H � N ( µ h ; m h , v 2 ) IG ( σ 2 Dir ( π ; a )I( µ 1 < · · · < µ H ) h ; c h , d h ) h h =1 Option 2: 1 � N ( µ h ; m h , v 2 h σ 2 h ) IG ( σ 2 Dir ( π ; a )I( µ 1 < · · · < µ H ) h ; c h , d h ) h =0 Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 15 / 27
Finite mixtures JAGS library("rjags") jags_model = " model { for (i in 1:n) { y[i] ~ dnorm(mu[zeta[i]], tau[zeta[i]]) zeta[i] ~ dcat(pi[]) } for (i in 1:H) { mu0[i] ~ dnorm(0,1e-5) tau[i] ~ dgamma(1,1) sigma[i] <- 1/sqrt(tau[i]) } mu[1:H] <- sort(mu0) pi ~ ddirich(a) } " jm = jags.model(textConnection(jags_model), data = dat, n.chains = 3) r = coda.samples(jm, c('mu','sigma','pi'), 1e3) Jarad Niemi (STAT615@ISU) Finite mixture models November 28, 2017 16 / 27
Recommend
More recommend