Minimum Message Length Inference and Mixture Modelling of Inverse Gaussian Distributions
Daniel F. Schmidt and Enes Makalic
Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology, School of Population Health, The University of Melbourne
25th Australasian Joint Conference on Artificial Intelligence (AI'2012)
Content
1. Mixture Modelling
   Problem Description
   MML Mixture Models
2. MML Inverse Gaussian Distributions
   Inverse Gaussian Distributions
   MML Inference of Inverse Gaussians
3. Example
Problem Description
We have n items, each with q associated attributes, formed into a matrix
$$ Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,q} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,q} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n,1} & y_{n,2} & \cdots & y_{n,q} \end{pmatrix} $$
Group together, or "cluster", similar items
A form of unsupervised learning
Sometimes called intrinsic classification
⇒ Class labels are learned from the data
Mixture Modelling (1)
Models data as a mixture of probability distributions
$$ p(y_{i,j}; \Phi) = \sum_{k=1}^{K} \alpha_k \, p(y_{i,j}; \theta_{k,j}) $$
where
K is the number of classes
α = (α_1, ..., α_K) are the mixing (population) weights
θ_{k,j} are the parameters of the distributions
Φ = {K, α, θ_{1,1}, ..., θ_{K,q}} denotes the complete mixture model
Has an explicit probabilistic form ⇒ allows for statistical interpretation
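As a concrete illustration (not from the paper), here is a minimal Python sketch of this density for a single attribute; the Gaussian component family and all parameter values are placeholder choices:

```python
import numpy as np
from scipy.stats import norm

# Illustrative two-class mixture over a single attribute.
alpha = np.array([0.3, 0.7])          # mixing weights alpha_k (sum to 1)
theta = [(0.0, 1.0), (4.0, 0.5)]      # placeholder (mean, sd) per class

def mixture_density(y, alpha, theta):
    """p(y; Phi) = sum_k alpha_k * p(y; theta_k) for one attribute."""
    return sum(a * norm.pdf(y, loc=m, scale=s) for a, (m, s) in zip(alpha, theta))

print(mixture_density(1.5, alpha, theta))
```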
Mixture Modelling (2)
How is this related to clustering?
Each class is a cluster
Class-specific probability distributions over each attribute
e.g., normal, inverse Gaussian, Poisson, etc.
Mixing weight is the prevalence of the class in the population
Measure of similarity of item to class
$$ p_k(y_i) = \prod_{j=1}^{q} p(y_{i,j}; \theta_{k,j}) $$
⇒ probability of the item's attributes under the class distributions
Mixture Modelling (3)
Membership of items to classes is soft
$$ r_{i,k} = \frac{\alpha_k \, p_k(y_i)}{\sum_{l=1}^{K} \alpha_l \, p_l(y_i)} $$
Posterior probability of belonging to class k
α_k is the a priori probability that an item belongs to class k
p_k(y_i) is the probability of data item y_i under class k
⇒ Assign to the class with the highest posterior probability
Total number of samples in a class is then
$$ n_k = \sum_{i=1}^{n} r_{i,k} $$
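A small Python sketch of this soft-assignment step (an illustrative implementation with variable names of my choosing; the per-class attribute log-densities are passed in as functions, and the computation is done in log space for numerical stability):

```python
import numpy as np

def responsibilities(Y, alpha, class_logpdfs):
    """Soft memberships r[i, k] = alpha_k p_k(y_i) / sum_l alpha_l p_l(y_i).

    Y              : (n, q) data matrix
    alpha          : (K,) mixing weights
    class_logpdfs  : list of K functions; each maps Y to an (n, q) array of
                     log p(y_{i,j}; theta_{k,j}) values for its class
    """
    n, K = Y.shape[0], len(alpha)
    log_r = np.empty((n, K))
    for k in range(K):
        # log alpha_k + sum_j log p(y_{i,j}; theta_{k,j})  (attribute independence)
        log_r[:, k] = np.log(alpha[k]) + class_logpdfs[k](Y).sum(axis=1)
    log_r -= np.logaddexp.reduce(log_r, axis=1, keepdims=True)  # normalise rows
    r = np.exp(log_r)
    n_k = r.sum(axis=0)          # effective number of samples per class
    return r, n_k
```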
MML Mixture Models (1)
Minimum Message Length goodness-of-fit criterion
Popular criterion for mixture modelling
Based on the idea of compression
Message length of the data is our yardstick; it comprises:
1. Length of the codeword needed to state the model Φ
   Number of classes: I(K)
   Relative abundances: I(α)
   Parameters for each distribution in each class: I(θ_{k,j})
2. Length of the codeword needed to state the data, given the model: I(Y | Φ)
MML Mixture Models (2)
Total message length:
$$ I(Y, \Phi) = I(K) + I(\alpha) + \sum_{k=1}^{K} \sum_{j=1}^{q} I(\theta_{k,j}) + I(Y \,|\, \Phi) $$
⇒ balances model complexity against model fit
Estimate Φ by minimising the message length
$\hat{\alpha}$ and $\hat{\theta}_{k,j}$ found by expectation-maximisation
Find $\hat{K}$ by splitting/merging classes
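A hedged sketch of the bookkeeping this criterion implies; the component code-length functions and the EM fitting routine are placeholders here, since their exact forms are derived in the paper:

```python
def total_message_length(Y, model, I_K, I_alpha, I_theta, neg_log_lik):
    """I(Y, Phi) = I(K) + I(alpha) + sum_{k,j} I(theta_{k,j}) + I(Y | Phi).

    The component code lengths are passed in as functions because their
    exact forms (priors, Fisher information terms) are derived in the paper.
    """
    K, alpha, theta = model["K"], model["alpha"], model["theta"]   # theta[k][j]
    length = I_K(K) + I_alpha(alpha)
    length += sum(I_theta(theta[k][j])
                  for k in range(K) for j in range(len(theta[k])))
    return length + neg_log_lik(Y, model)                          # I(Y | Phi)

def select_K(Y, candidate_Ks, fit_em, message_length):
    """Fit a mixture for each candidate K (e.g. by EM with split/merge moves)
    and keep the one whose total message length is smallest."""
    best_len, best_model = float("inf"), None
    for K in candidate_Ks:
        model = fit_em(Y, K)                     # placeholder EM routine
        length = message_length(Y, model)
        if length < best_len:
            best_len, best_model = length, model
    return best_model
```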
Content
1. Mixture Modelling
   Problem Description
   MML Mixture Models
2. MML Inverse Gaussian Distributions
   Inverse Gaussian Distributions
   MML Inference of Inverse Gaussians
3. Example
Inverse Gaussian Distributions (1)
Distribution for positive, continuous data
We say Y_i ∼ IG(μ, λ) if the p.d.f. for Y_i = y_i is
$$ p(y_i; \mu, \lambda) = \left( \frac{1}{2 \pi \lambda y_i^3} \right)^{1/2} \exp\left( -\frac{(y_i - \mu)^2}{2 \mu^2 \lambda y_i} \right), $$
where
μ > 0 is the mean parameter
λ > 0 is the inverse-shape parameter
Suitable for positively skewed data
Derive the message length formula for use in mixture modelling
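A small Python sketch of this density, written directly from the formula above rather than via scipy.stats.invgauss (whose parametrisation differs):

```python
import numpy as np

def ig_pdf(y, mu, lam):
    """Inverse Gaussian density p(y; mu, lambda) with mean mu and
    inverse-shape lambda, as parametrised above; requires y, mu, lam > 0."""
    y = np.asarray(y, dtype=float)
    return (2.0 * np.pi * lam * y**3) ** -0.5 * \
           np.exp(-(y - mu) ** 2 / (2.0 * mu**2 * lam * y))

print(ig_pdf([0.5, 1.0, 2.0], mu=1.0, lam=1.0))
```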
Inverse Gaussian Distributions (2)
[Figure: example inverse Gaussian densities p(y; μ, λ) over y ∈ [0, 3], for (μ = 1, λ = 1), (μ = 1, λ = 3) and (μ = 3, λ = 1).]
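The figure is straightforward to reproduce; a matplotlib sketch using the parameter settings listed in the legend (the density function is repeated here so the snippet stands alone):

```python
import numpy as np
import matplotlib.pyplot as plt

def ig_pdf(y, mu, lam):
    # Same inverse Gaussian density as in the previous sketch.
    return (2.0 * np.pi * lam * y**3) ** -0.5 * \
           np.exp(-(y - mu) ** 2 / (2.0 * mu**2 * lam * y))

y = np.linspace(1e-3, 3.0, 500)
for mu, lam in [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0)]:
    plt.plot(y, ig_pdf(y, mu, lam), label=f"mu={mu:g}, lambda={lam:g}")

plt.xlabel("y")
plt.ylabel("p(y; mu, lambda)")
plt.legend()
plt.show()
```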
MML Inference of Inverse Gaussians (1)
Use the Wallace–Freeman approximation
Bayesian; we chose uninformative priors
$$ \pi(\mu, \lambda) \propto \frac{1}{\lambda \mu^{3/2}} $$
Message length component for use in mixture models
$$ I(\theta_{k,j}) = \log n_k - \frac{1}{2} \log \hat{\lambda}_{k,j} + \log\left( \frac{\sqrt{2}\, a_j}{b_j} \right) $$
where
$\hat{\lambda}_{k,j}$ is the MML estimate of λ for class k and variable j
n_k is the number of samples in class k
a_j, b_j are hyper-parameters
Details may be found in the paper
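For completeness, a sketch that simply transcribes the message length component above; the formula was reconstructed from the slide, so treat its exact constants as an assumption, and the hyper-parameters a_j, b_j are left to the caller:

```python
import numpy as np

def ig_assertion_length(n_k, lambda_hat, a_j, b_j):
    """I(theta_{k,j}) as written above (reconstructed, so treat the exact
    constants as an assumption); a_j, b_j are the paper's hyper-parameters."""
    return np.log(n_k) - 0.5 * np.log(lambda_hat) + np.log(np.sqrt(2.0) * a_j / b_j)
```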
MML Inference of Inverse Gaussians (2)
Let y = (y_1, ..., y_n) be data from an inverse Gaussian
Define sufficient statistics
$$ S_1 = \sum_{i=1}^{n} y_i, \qquad S_2 = \sum_{i=1}^{n} \frac{1}{y_i} $$
Compare maximum likelihood estimates
$$ \hat{\mu}_{ML} = \frac{S_1}{n}, \qquad \hat{\lambda}_{ML} = \frac{S_1 S_2 - n^2}{n S_1} $$
to minimum message length estimates
$$ \hat{\mu}_{87} = \frac{S_1}{n}, \qquad \hat{\lambda}_{87} = \frac{S_1 S_2 - n^2}{(n-1) S_1} $$
The MML estimates:
1. Are unbiased
2. Strictly dominate the ML estimates in terms of KL risk
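These closed-form estimators are simple to compute from the two sufficient statistics; a short Python sketch (function and variable names are mine):

```python
import numpy as np

def ig_estimates(y):
    """Return (mu_hat, lambda_ML, lambda_87) from S1 = sum y_i, S2 = sum 1/y_i."""
    y = np.asarray(y, dtype=float)
    n = y.size
    S1, S2 = y.sum(), (1.0 / y).sum()
    mu_hat = S1 / n                                   # same under ML and MML
    lambda_ml = (S1 * S2 - n**2) / (n * S1)           # maximum likelihood
    lambda_87 = (S1 * S2 - n**2) / ((n - 1) * S1)     # Wallace-Freeman (MML87)
    return mu_hat, lambda_ml, lambda_87
```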
Content
1. Mixture Modelling
   Problem Description
   MML Mixture Models
2. MML Inverse Gaussian Distributions
   Inverse Gaussian Distributions
   MML Inference of Inverse Gaussians
3. Example