Minimum Message Length Inference and Mixture Modelling of Inverse Gaussian Distributions
Daniel F. Schmidt and Enes Makalic
Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology, School of Population Health, The University of Melbourne
25th Australasian Joint Conference on Artificial Intelligence (AI'2012)
Content
1. Mixture Modelling
   Problem Description
   MML Mixture Models
2. MML Inverse Gaussian Distributions
   Inverse Gaussian Distributions
   MML Inference of Inverse Gaussians
3. Example
Problem Description
We have n items, each with q associated attributes, formed into a matrix
$$ Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,q} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,q} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n,1} & y_{n,2} & \cdots & y_{n,q} \end{pmatrix} $$
Group together, or "cluster", similar items
A form of unsupervised learning
Sometimes called intrinsic classification
⇒ Class labels are learned from the data
Mixture Modelling (1)
Models data as a mixture of probability distributions
$$ p(y_{i,j}; \Phi) = \sum_{k=1}^{K} \alpha_k \, p(y_{i,j}; \theta_{k,j}) $$
where
K is the number of classes
α = (α_1, ..., α_K) are the mixing (population) weights
θ_{k,j} are the parameters of the distributions
Φ = {K, α, θ_{1,1}, ..., θ_{K,q}} denotes the complete mixture model
Has an explicit probabilistic form ⇒ allows for statistical interpretation
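As a concrete illustration (not from the paper), here is a minimal Python sketch of this density for a single attribute; the Gaussian component family and all parameter values are placeholder choices:

```python
import numpy as np
from scipy.stats import norm

# Illustrative two-class mixture over a single attribute.
alpha = np.array([0.3, 0.7])          # mixing weights alpha_k (sum to 1)
theta = [(0.0, 1.0), (4.0, 0.5)]      # placeholder (mean, sd) per class

def mixture_density(y, alpha, theta):
    """p(y; Phi) = sum_k alpha_k * p(y; theta_k) for one attribute."""
    return sum(a * norm.pdf(y, loc=m, scale=s) for a, (m, s) in zip(alpha, theta))

print(mixture_density(1.5, alpha, theta))
```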
Mixture Modelling (2)
How is this related to clustering?
Each class is a cluster
Class-specific probability distributions over each attribute
e.g., normal, inverse Gaussian, Poisson, etc.
Mixing weight is the prevalence of the class in the population
Measure of similarity of item to class
$$ p_k(y_i) = \prod_{j=1}^{q} p(y_{i,j}; \theta_{k,j}) $$
⇒ probability of the item's attributes under the class distributions
Mixture Modelling (3)
Membership of items to classes is soft
$$ r_{i,k} = \frac{\alpha_k \, p_k(y_i)}{\sum_{l=1}^{K} \alpha_l \, p_l(y_i)} $$
Posterior probability of belonging to class k
α_k is the a priori probability that an item belongs to class k
p_k(y_i) is the probability of data item y_i under class k
⇒ Assign to the class with the highest posterior probability
Total number of samples in a class is then
$$ n_k = \sum_{i=1}^{n} r_{i,k} $$
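A small Python sketch of this soft-assignment step (an illustrative implementation with variable names of my choosing; the per-class attribute log-densities are passed in as functions, and the computation is done in log space for numerical stability):

```python
import numpy as np

def responsibilities(Y, alpha, class_logpdfs):
    """Soft memberships r[i, k] = alpha_k p_k(y_i) / sum_l alpha_l p_l(y_i).

    Y              : (n, q) data matrix
    alpha          : (K,) mixing weights
    class_logpdfs  : list of K functions; each maps Y to an (n, q) array of
                     log p(y_{i,j}; theta_{k,j}) values for its class
    """
    n, K = Y.shape[0], len(alpha)
    log_r = np.empty((n, K))
    for k in range(K):
        # log alpha_k + sum_j log p(y_{i,j}; theta_{k,j})  (attribute independence)
        log_r[:, k] = np.log(alpha[k]) + class_logpdfs[k](Y).sum(axis=1)
    log_r -= np.logaddexp.reduce(log_r, axis=1, keepdims=True)  # normalise rows
    r = np.exp(log_r)
    n_k = r.sum(axis=0)          # effective number of samples per class
    return r, n_k
```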
MML Mixture Models (1)
Minimum Message Length goodness-of-fit criterion
Popular criterion for mixture modelling
Based on the idea of compression
Message length of the data is our yardstick; it comprises:
1. Length of the codeword needed to state the model Φ
   Number of classes: I(K)
   Relative abundances: I(α)
   Parameters for each distribution in each class: I(θ_{k,j})
2. Length of the codeword needed to state the data, given the model: I(Y | Φ)
MML Mixture Models (2)
Total message length:
$$ I(Y, \Phi) = I(K) + I(\alpha) + \sum_{k=1}^{K} \sum_{j=1}^{q} I(\theta_{k,j}) + I(Y \,|\, \Phi) $$
⇒ balances model complexity against model fit
Estimate Φ by minimising the message length
$\hat{\alpha}$ and $\hat{\theta}_{k,j}$ found by expectation-maximisation
Find $\hat{K}$ by splitting/merging classes
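A hedged sketch of the bookkeeping this criterion implies; the component code-length functions and the EM fitting routine are placeholders here, since their exact forms are derived in the paper:

```python
def total_message_length(Y, model, I_K, I_alpha, I_theta, neg_log_lik):
    """I(Y, Phi) = I(K) + I(alpha) + sum_{k,j} I(theta_{k,j}) + I(Y | Phi).

    The component code lengths are passed in as functions because their
    exact forms (priors, Fisher information terms) are derived in the paper.
    """
    K, alpha, theta = model["K"], model["alpha"], model["theta"]   # theta[k][j]
    length = I_K(K) + I_alpha(alpha)
    length += sum(I_theta(theta[k][j])
                  for k in range(K) for j in range(len(theta[k])))
    return length + neg_log_lik(Y, model)                          # I(Y | Phi)

def select_K(Y, candidate_Ks, fit_em, message_length):
    """Fit a mixture for each candidate K (e.g. by EM with split/merge moves)
    and keep the one whose total message length is smallest."""
    best_len, best_model = float("inf"), None
    for K in candidate_Ks:
        model = fit_em(Y, K)                     # placeholder EM routine
        length = message_length(Y, model)
        if length < best_len:
            best_len, best_model = length, model
    return best_model
```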
Content
1. Mixture Modelling
   Problem Description
   MML Mixture Models
2. MML Inverse Gaussian Distributions
   Inverse Gaussian Distributions
   MML Inference of Inverse Gaussians
3. Example
Inverse Gaussian Distributions (1)
Distribution for positive, continuous data
We say Y_i ∼ IG(μ, λ) if the p.d.f. for Y_i = y_i is
$$ p(y_i; \mu, \lambda) = \left( \frac{1}{2 \pi \lambda y_i^3} \right)^{1/2} \exp\left( -\frac{(y_i - \mu)^2}{2 \mu^2 \lambda y_i} \right), $$
where
μ > 0 is the mean parameter
λ > 0 is the inverse-shape parameter
Suitable for positively skewed data
Derive the message length formula for use in mixture modelling
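A small Python sketch of this density, written directly from the formula above rather than via scipy.stats.invgauss (whose parametrisation differs):

```python
import numpy as np

def ig_pdf(y, mu, lam):
    """Inverse Gaussian density p(y; mu, lambda) with mean mu and
    inverse-shape lambda, as parametrised above; requires y, mu, lam > 0."""
    y = np.asarray(y, dtype=float)
    return (2.0 * np.pi * lam * y**3) ** -0.5 * \
           np.exp(-(y - mu) ** 2 / (2.0 * mu**2 * lam * y))

print(ig_pdf([0.5, 1.0, 2.0], mu=1.0, lam=1.0))
```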
Inverse Gaussian Distributions (2)
[Figure: example inverse Gaussian densities p(y; μ, λ) over y ∈ [0, 3], for (μ = 1, λ = 1), (μ = 1, λ = 3) and (μ = 3, λ = 1).]
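The figure is straightforward to reproduce; a matplotlib sketch using the parameter settings listed in the legend (the density function is repeated here so the snippet stands alone):

```python
import numpy as np
import matplotlib.pyplot as plt

def ig_pdf(y, mu, lam):
    # Same inverse Gaussian density as in the previous sketch.
    return (2.0 * np.pi * lam * y**3) ** -0.5 * \
           np.exp(-(y - mu) ** 2 / (2.0 * mu**2 * lam * y))

y = np.linspace(1e-3, 3.0, 500)
for mu, lam in [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0)]:
    plt.plot(y, ig_pdf(y, mu, lam), label=f"mu={mu:g}, lambda={lam:g}")

plt.xlabel("y")
plt.ylabel("p(y; mu, lambda)")
plt.legend()
plt.show()
```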
MML Inference of Inverse Gaussians (1)
Use the Wallace–Freeman approximation
Bayesian; we chose uninformative priors
$$ \pi(\mu, \lambda) \propto \frac{1}{\lambda \mu^{3/2}} $$
Message length component for use in mixture models
$$ I(\theta_{k,j}) = \log n_k - \frac{1}{2} \log \hat{\lambda}_{k,j} + \log\left( \frac{\sqrt{2}\, a_j}{b_j} \right) $$
where
$\hat{\lambda}_{k,j}$ is the MML estimate of λ for class k and variable j
n_k is the number of samples in class k
a_j, b_j are hyper-parameters
Details may be found in the paper
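For completeness, a sketch that simply transcribes the message length component above; the formula was reconstructed from the slide, so treat its exact constants as an assumption, and the hyper-parameters a_j, b_j are left to the caller:

```python
import numpy as np

def ig_assertion_length(n_k, lambda_hat, a_j, b_j):
    """I(theta_{k,j}) as written above (reconstructed, so treat the exact
    constants as an assumption); a_j, b_j are the paper's hyper-parameters."""
    return np.log(n_k) - 0.5 * np.log(lambda_hat) + np.log(np.sqrt(2.0) * a_j / b_j)
```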
MML Inference of Inverse Gaussians (2)
Let y = (y_1, ..., y_n) be data from an inverse Gaussian
Define sufficient statistics
$$ S_1 = \sum_{i=1}^{n} y_i, \qquad S_2 = \sum_{i=1}^{n} \frac{1}{y_i} $$
Compare maximum likelihood estimates
$$ \hat{\mu}_{ML} = \frac{S_1}{n}, \qquad \hat{\lambda}_{ML} = \frac{S_1 S_2 - n^2}{n S_1} $$
to minimum message length estimates
$$ \hat{\mu}_{87} = \frac{S_1}{n}, \qquad \hat{\lambda}_{87} = \frac{S_1 S_2 - n^2}{(n-1) S_1} $$
The MML estimates:
1. Are unbiased
2. Strictly dominate the ML estimates in terms of KL risk
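These closed-form estimators are simple to compute from the two sufficient statistics; a short Python sketch (function and variable names are mine):

```python
import numpy as np

def ig_estimates(y):
    """Return (mu_hat, lambda_ML, lambda_87) from S1 = sum y_i, S2 = sum 1/y_i."""
    y = np.asarray(y, dtype=float)
    n = y.size
    S1, S2 = y.sum(), (1.0 / y).sum()
    mu_hat = S1 / n                                   # same under ML and MML
    lambda_ml = (S1 * S2 - n**2) / (n * S1)           # maximum likelihood
    lambda_87 = (S1 * S2 - n**2) / ((n - 1) * S1)     # Wallace-Freeman (MML87)
    return mu_hat, lambda_ml, lambda_87
```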
Content
1. Mixture Modelling
   Problem Description
   MML Mixture Models
2. MML Inverse Gaussian Distributions
   Inverse Gaussian Distributions
   MML Inference of Inverse Gaussians
3. Example