Minimum Message Length Inference and Mixture Modelling of Inverse Gaussian Distributions


  1. Minimum Message Length Inference and Mixture Modelling of Inverse Gaussian Distributions. Daniel F. Schmidt and Enes Makalic, Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology, School of Population Health, University of Melbourne. 25th Australasian Joint Conference on Artificial Intelligence (AI’2012).

  2. Content: 1. Mixture Modelling (Problem Description; MML Mixture Models). 2. MML Inverse Gaussian Distributions (Inverse Gaussian Distributions; MML Inference of Inverse Gaussians). 3. Example.

  3. Problem Description. We have n items, each with q associated attributes, formed into a matrix
$$\mathbf{Y} = \begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_n \end{pmatrix} = \begin{pmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,q} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,q} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n,1} & y_{n,2} & \cdots & y_{n,q} \end{pmatrix}.$$
We group together, or “cluster”, similar items. This is a form of unsupervised learning, sometimes called intrinsic classification: the class labels are learned from the data.

  4. Mixture Modelling (1). Models the data as a mixture of probability distributions,
$$p(y_{i,j}; \Phi) = \sum_{k=1}^{K} \alpha_k \, p(y_{i,j}; \theta_{k,j}),$$
where K is the number of classes, $\alpha = (\alpha_1, \ldots, \alpha_K)$ are the mixing (population) weights, $\theta_{k,j}$ are the parameters of the distributions, and $\Phi = \{K, \alpha, \theta_{1,1}, \ldots, \theta_{K,q}\}$ denotes the complete mixture model. The mixture has an explicit probabilistic form, which allows for statistical interpretation.
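
To make the notation concrete, here is a minimal Python sketch of evaluating this mixture density. The component densities and parameter values are illustrative placeholders, not the paper's setup:

```python
import numpy as np

def mixture_pdf(y, alphas, component_pdfs, params):
    """Evaluate p(y; Phi) = sum_k alpha_k * p(y; theta_k).

    y              : scalar or numpy array of observations
    alphas         : length-K array of mixing weights (sums to 1)
    component_pdfs : length-K list of density functions
    params         : length-K list of parameter tuples, one per class
    """
    return sum(a * pdf(y, *theta)
               for a, pdf, theta in zip(alphas, component_pdfs, params))

# Example with two hypothetical normal components.
def normal_pdf(y, mu, sigma):
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

alphas = np.array([0.7, 0.3])
pdfs = [normal_pdf, normal_pdf]
params = [(0.0, 1.0), (3.0, 0.5)]
print(mixture_pdf(1.0, alphas, pdfs, params))
```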

  5. Mixture Modelling (2). How is this related to clustering? Each class is a cluster, with class-specific probability distributions over each attribute (e.g., normal, inverse Gaussian, Poisson, etc.), and the mixing weight is the prevalence of the class in the population. The measure of similarity of an item to a class is
$$p_k(\mathbf{y}_i) = \prod_{j=1}^{q} p(y_{i,j}; \theta_{k,j}),$$
the probability of the item’s attributes under the class distributions.
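
In practice the product over attributes is computed as a sum of log-densities to avoid numerical underflow. A minimal sketch, with per-attribute density functions supplied by the caller (an assumption for illustration, not the paper's code):

```python
import numpy as np

def class_log_likelihood(y_i, pdfs, thetas):
    """log p_k(y_i) = sum_j log p(y_ij; theta_kj) for one class k.

    y_i    : length-q array, the attributes of one item
    pdfs   : length-q list of per-attribute density functions
    thetas : length-q list of parameter tuples for this class
    """
    return sum(np.log(pdf(y, *theta))
               for y, pdf, theta in zip(y_i, pdfs, thetas))
```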

  6. Mixture Modelling (3). Membership of items to classes is soft:
$$r_{i,k} = \frac{\alpha_k \, p_k(\mathbf{y}_i)}{\sum_{l=1}^{K} \alpha_l \, p_l(\mathbf{y}_i)}$$
is the posterior probability of item i belonging to class k, where $\alpha_k$ is the a priori probability that an item belongs to class k, and $p_k(\mathbf{y}_i)$ is the probability of data item $\mathbf{y}_i$ under class k. Items are assigned to the class with the highest posterior probability. The total number of samples in a class is then
$$n_k = \sum_{i=1}^{n} r_{i,k}.$$
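
A minimal numpy sketch of these two quantities (this is the E-step of the expectation-maximisation algorithm used later); the function and argument names are illustrative:

```python
import numpy as np

def responsibilities(log_pk, alphas):
    """Posterior memberships r[i, k] and soft class counts n_k.

    log_pk : (n, K) array, log p_k(y_i) for every item/class pair
    alphas : length-K array of mixing weights
    """
    log_joint = log_pk + np.log(alphas)              # log alpha_k p_k(y_i)
    # Normalise in log-space for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    r = np.exp(log_joint)
    r /= r.sum(axis=1, keepdims=True)                # rows sum to one
    n_k = r.sum(axis=0)                              # soft sample counts
    assignments = r.argmax(axis=1)                   # hard class assignment
    return r, n_k, assignments
```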

  7. MML Mixture Models (1). Minimum Message Length is a goodness-of-fit criterion, popular for mixture modelling and based on the idea of compression: the message length of the data is our yardstick. The message comprises: (1) the length of the codeword needed to state the model Φ, namely the number of classes I(K), the relative abundances I(α), and the parameters for each distribution in each class I(θ_{k,j}); and (2) the length of the codeword needed to state the data given the model, I(Y | Φ).

  8. MML Mixture Models (2). The total message length is
$$I(\mathbf{Y}, \Phi) = I(K) + I(\alpha) + \sum_{k=1}^{K} \sum_{j=1}^{q} I(\theta_{k,j}) + I(\mathbf{Y} \mid \Phi),$$
which balances model complexity against model fit. We estimate Φ by minimising the message length: $\hat\alpha$ and $\hat\theta_{k,j}$ are found by expectation-maximisation, and $\hat{K}$ is found by splitting/merging classes.
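
The criterion itself is just a sum of codelengths. In the sketch below, the specific codelength functions for K, α, and θ are left as caller-supplied placeholders, since their forms are defined in the paper rather than on the slides:

```python
def total_message_length(K, alphas, thetas, data_neg_log_lik,
                         codelen_K, codelen_alpha, codelen_theta):
    """I(Y, Phi) = I(K) + I(alpha) + sum_{k,j} I(theta_kj) + I(Y | Phi).

    thetas           : K lists of q parameter tuples (one per class/attribute)
    data_neg_log_lik : precomputed I(Y | Phi), the negative log-likelihood
                       of the data under the fitted mixture
    codelen_*        : codelength functions; placeholders standing in for
                       the forms defined in the paper
    """
    I_model = codelen_K(K) + codelen_alpha(alphas)
    I_model += sum(codelen_theta(t) for row in thetas for t in row)
    return I_model + data_neg_log_lik
```

Model selection then amounts to fitting candidate mixtures (e.g., after splitting or merging classes) and keeping the candidate with the smallest I(Y, Φ).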

  9. Content: 1. Mixture Modelling (Problem Description; MML Mixture Models). 2. MML Inverse Gaussian Distributions (Inverse Gaussian Distributions; MML Inference of Inverse Gaussians). 3. Example.

  10. Inverse Gaussian Distributions (1). A distribution for positive, continuous data. We say $Y_i \sim \mathrm{IG}(\mu, \lambda)$ if the p.d.f. for $Y_i = y_i$ is
$$p(y_i; \mu, \lambda) = \left(\frac{1}{2\pi\lambda y_i^3}\right)^{1/2} \exp\left(-\frac{(y_i - \mu)^2}{2\mu^2 \lambda y_i}\right),$$
where μ > 0 is the mean parameter and λ > 0 is the inverse-shape parameter. It is suitable for positively skewed data. We derive the message length formula for use in mixture modelling.
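
A direct numpy transcription of this density, in the slide's inverse-shape parametrisation (note that SciPy's invgauss and numpy's Wald sampler use different parametrisations, so the density is written out explicitly):

```python
import numpy as np

def inverse_gaussian_pdf(y, mu, lam):
    """p(y; mu, lambda) with mean mu > 0 and inverse-shape lambda > 0.

    Matches the slide's parametrisation: lam is the reciprocal of the
    usual IG shape parameter, so larger lam means more spread.
    """
    y = np.asarray(y, dtype=float)
    norm = (2.0 * np.pi * lam * y ** 3) ** -0.5
    return norm * np.exp(-((y - mu) ** 2) / (2.0 * mu ** 2 * lam * y))
```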

  11. Inverse Gaussian Distributions (2). [Figure: example inverse Gaussian densities p(y; μ, λ) plotted for y in (0, 3), for the parameter settings (μ = 1, λ = 1), (μ = 1, λ = 3), and (μ = 3, λ = 1).]
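
The figure can be reproduced with a few lines of matplotlib, reusing the inverse_gaussian_pdf function sketched after slide 10 (parameter settings and axis ranges are taken from the slide):

```python
import numpy as np
import matplotlib.pyplot as plt

y = np.linspace(0.01, 3.0, 500)   # start just above zero to avoid division by zero
for mu, lam in [(1, 1), (1, 3), (3, 1)]:
    plt.plot(y, inverse_gaussian_pdf(y, mu, lam),
             label=rf"$\mu={mu}, \lambda={lam}$")
plt.xlabel("y")
plt.ylabel(r"$p(y; \mu, \lambda)$")
plt.legend()
plt.show()
```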

  12. MML Inference of Inverse Gaussians (1). We use the Wallace–Freeman approximation. This is Bayesian; we chose the uninformative prior
$$\pi(\mu, \lambda) \propto \frac{1}{\lambda \mu^{3/2}}.$$
The message length component for use in mixture models is
$$I(\theta_{k,j}) = \log n_k - \frac{1}{2} \log \hat\lambda_{k,j} + \log\left(\frac{\sqrt{2}\, a_j^2}{b_j}\right),$$
where $\hat\lambda_{k,j}$ is the MML estimate of λ for class k and variable j, $n_k$ is the number of samples in class k, and $a_j$, $b_j$ are hyper-parameters. Details may be found in the paper.
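
A direct transcription of this codelength component into Python. The constant term involving a_j and b_j is reconstructed here from garbled slide text, so treat its exact form as an assumption and check it against the paper before use:

```python
import numpy as np

def ig_param_codelength(n_k, lam_hat, a_j, b_j):
    """I(theta_kj) for one inverse Gaussian class/attribute pair.

    n_k      : (soft) number of samples in class k
    lam_hat  : MML estimate of the inverse-shape parameter
    a_j, b_j : prior hyper-parameters for attribute j (assumed form)
    """
    return (np.log(n_k)
            - 0.5 * np.log(lam_hat)
            + np.log(np.sqrt(2.0) * a_j ** 2 / b_j))
```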

  13. MML Inference of Inverse Gaussians (2). Let $\mathbf{y} = (y_1, \ldots, y_n)$ be data from an inverse Gaussian. Define the sufficient statistics
$$S_1 = \sum_{i=1}^{n} y_i, \qquad S_2 = \sum_{i=1}^{n} \frac{1}{y_i}.$$
Compare the maximum likelihood estimates
$$\hat\mu_{\mathrm{ML}} = \frac{S_1}{n}, \qquad \hat\lambda_{\mathrm{ML}} = \frac{S_1 S_2 - n^2}{n S_1}$$
to the minimum message length estimates
$$\hat\mu_{87} = \frac{S_1}{n}, \qquad \hat\lambda_{87} = \frac{S_1 S_2 - n^2}{(n-1) S_1}.$$
The MML estimates: (1) are unbiased; and (2) strictly dominate the ML estimates in terms of KL risk.
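
Both estimator pairs take only a few lines of numpy; the simulation at the end is an illustrative check, not from the paper:

```python
import numpy as np

def ig_estimates(y):
    """ML and MML (Wallace-Freeman '87) estimates for IG(mu, lambda)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    s1, s2 = y.sum(), (1.0 / y).sum()
    mu_hat = s1 / n                               # same for ML and MML
    lam_ml = (s1 * s2 - n ** 2) / (n * s1)
    lam_87 = (s1 * s2 - n ** 2) / ((n - 1) * s1)
    return mu_hat, lam_ml, lam_87

# Illustrative check: draw IG samples via numpy's Wald sampler, whose
# scale argument is the usual shape parameter, i.e. 1/lambda here.
rng = np.random.default_rng(0)
mu, lam = 2.0, 0.5
y = rng.wald(mean=mu, scale=1.0 / lam, size=20)
print(ig_estimates(y))   # lam_87 is slightly larger than lam_ml
```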

  14. Content: 1. Mixture Modelling (Problem Description; MML Mixture Models). 2. MML Inverse Gaussian Distributions (Inverse Gaussian Distributions; MML Inference of Inverse Gaussians). 3. Example.
