
Estimating Gaussian Mixture Models from Data with Missing Features - PowerPoint PPT Presentation



  1. Estimating Gaussian Mixture Models from Data with Missing Features by Daniel McMichael CSSIP

  2. Missing Data In classification we frequently seek to classify objects using vectors of measured features. Sometimes these features are missing:

     x = [ 0.5  0.7  0.4  X  0.2  X  0.8 ]^T,  where X marks a missing feature.
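
     A common way to hold such a vector in code (this sketch is not from the presentation; x is just the example above) is to mark missing entries with NaN and derive a boolean mask:

        import numpy as np

        # Feature vector with missing entries marked as NaN
        x = np.array([0.5, 0.7, 0.4, np.nan, 0.2, np.nan, 0.8])

        observed = ~np.isnan(x)    # mask of the features that were measured
        x_obs = x[observed]        # the measured values themselves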

  3. Gaussian Mixture Models (GMMs) A probability density model (a weighted sum of Gaussians):

     p(y \mid \{\pi_i, \mu_i, \Sigma_i\}) = \sum_{i=1}^{n} \frac{\pi_i}{\sqrt{|2\pi\Sigma_i|}} \exp\{ -\tfrac{1}{2} (y - \mu_i)^T \Sigma_i^{-1} (y - \mu_i) \}   (1)

     Qualities of GMMs:
     • Can model any density (given enough components)
     • Can be applied to classification
     • Widely used
     • “Easy” to analyse
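
     As a concrete illustration of (1) (not part of the original slides; the parameter values are made up), a minimal numpy/scipy sketch of the mixture density:

        import numpy as np
        from scipy.stats import multivariate_normal

        # Toy 2-component GMM in 2 dimensions (illustrative parameters only)
        pis    = np.array([0.3, 0.7])                        # mixing weights
        mus    = [np.zeros(2), np.array([2.0, 1.0])]         # component means
        sigmas = [np.eye(2), np.array([[1.0, 0.3],
                                       [0.3, 0.5]])]         # component covariances

        def gmm_density(y):
            # Equation (1): weighted sum of Gaussian densities
            return sum(pi * multivariate_normal.pdf(y, mean=mu, cov=S)
                       for pi, mu, S in zip(pis, mus, sigmas))

        print(gmm_density(np.array([1.0, 0.5])))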

  4. Heteroscedastic GMMs (HGMMs) Conventionally, GMMs are homoscedastic: all data are modelled with the same Gaussian distributions:

     p(y_j \mid \pi, \mu, \Sigma) = \sum_{i=1}^{n} \pi_i \, p(y_j \mid \mu_i, \Sigma_i)   (2)

     Introduce a heteroscedastic variant, where the response to each datum is different:

     p(y_j \mid \bar{y}_j, M_j, \pi, \mu, \Sigma) = \sum_{i=1}^{n} \pi_i \, p(y_j \mid \bar{y}_j + M_j \mu_i, \; M_j \Sigma_i M_j^T + \Sigma_{y_j})   (3)
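
     Reading (3) literally, with \bar{y}_j as a per-datum offset and \Sigma_{y_j} as a per-datum noise covariance (both are interpretations of the reconstructed formula, not statements from the slides), a hedged sketch of the per-datum mixture density:

        import numpy as np
        from scipy.stats import multivariate_normal

        def hgmm_density(y_j, y_bar_j, M_j, Sigma_yj, pis, mus, sigmas):
            # Equation (3): datum j sees the mixture through its own offset
            # y_bar_j, gain matrix M_j and noise covariance Sigma_yj.
            total = 0.0
            for pi, mu, S in zip(pis, mus, sigmas):
                mean = y_bar_j + M_j @ mu
                cov = M_j @ S @ M_j.T + Sigma_yj
                total += pi * multivariate_normal.pdf(y_j, mean=mean, cov=cov)
            return total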

  5. Uses for HGMMs
     • estimation of GMM parameters from data with missing features;
     • estimation and prediction of indirectly observed mixture processes;
     • modelling heteroscedastic data.

     Need only a simplified HGMM:

     p(y_j \mid \Theta) = \sum_{i=1}^{n} \pi_i \, p(y_j \mid M_j \mu_i, \; M_j \Sigma_i M_j^T)   (4)

     The gain matrices of the N data, \{M_j\}_{j=1}^{N}, contain only 1s and 0s, and are formed by deleting from an identity matrix the rows corresponding to the missing features of each datum. This is the marginal distribution of the remaining features.
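
     A small sketch, not from the slides, of how such a gain matrix can be built from a missing-feature mask and how it yields the marginal mean and covariance used in (4):

        import numpy as np

        def gain_matrix(missing_mask):
            # Delete from the identity matrix the rows of the missing
            # features, leaving a 0/1 matrix that selects the observed ones.
            I = np.eye(len(missing_mask))
            return I[~np.asarray(missing_mask)]

        # Example: features 4 and 6 of a 7-dimensional datum are missing
        missing = np.array([0, 0, 0, 1, 0, 1, 0], dtype=bool)
        M_j = gain_matrix(missing)            # shape (5, 7)

        mu, Sigma = np.zeros(7), np.eye(7)
        mu_marg    = M_j @ mu                 # marginal mean of observed features
        Sigma_marg = M_j @ Sigma @ M_j.T      # marginal covariance, as in (4)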

  6. The EM Algorithm [Refer: Dempster, Laird and Rubin, 1977] Aim: to maximise a likelihood or posterior over the parameters \Theta, whilst integrating out the nuisance parameters Z. To maximise the likelihood p(Y \mid \Theta, Z), start with the guess \Theta = \Theta^*:

     E-step:  Q(\Theta \mid \Theta^*) = \int p(Y \mid \Theta, Z) \, p(Z \mid Y, \Theta^*) \, dZ

     M-step:  \Theta^* = \arg\max_{\Theta} Q(\Theta \mid \Theta^*)
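
     A generic EM skeleton, just to fix the shape of the two steps (this is not specific to the HGMM; e_step and m_step are placeholder callables):

        def em(theta0, e_step, m_step, n_iters=100):
            # Alternate the E-step and M-step, starting from the guess theta0.
            theta = theta0
            for _ in range(n_iters):
                q = e_step(theta)      # expectations under p(Z | Y, theta*)
                theta = m_step(q)      # theta* <- argmax_theta Q(theta | theta*)
            return theta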

  7. The E-step for HGMMs Assume conditionally independent data Y = \{y_j\}_{j=1}^{N}, and group the heteroscedastic parameters \{M_j\}_{j=1}^{N} together into a set M. The E-step is the calculation of P(i \mid y_j, \Theta^*, M) for all the pairs of data and HGMM components:

     P(i \mid y_j, \Theta^*, M) = \frac{ p(y_j \mid i, \Theta^*, M) \, P(i \mid \Theta^*, M) }{ \sum_{l=1}^{n} p(y_j \mid l, \Theta^*, M) \, P(l \mid \Theta^*, M) }

     i.e.

     P(i \mid y_j, \Theta^*, M) = \frac{ \pi_i^* \, p(y_j \mid i, \Theta^*, M) }{ \sum_{l=1}^{n} \pi_l^* \, p(y_j \mid l, \Theta^*, M) }
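
     A hedged sketch of this E-step for a single datum, assuming the simplified marginal model (4) supplies p(y_j | i, \Theta^*, M):

        import numpy as np
        from scipy.stats import multivariate_normal

        def responsibilities(y_j, M_j, pis, mus, sigmas):
            # P(i | y_j, theta*, M): prior weight times marginal likelihood,
            # normalised over the n components.
            like = np.array([multivariate_normal.pdf(y_j,
                                                     mean=M_j @ mu,
                                                     cov=M_j @ S @ M_j.T)
                             for mu, S in zip(mus, sigmas)])
            w = np.asarray(pis) * like
            return w / w.sum()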

  8. The M-step for HGMMs Maximise Q(\Theta \mid \Theta^*) with respect to \Theta:

     \pi_i = \frac{1}{N} \sum_{j=1}^{N} P(i \mid y_j, \Theta^*)

     \mu_i = \Big[ \sum_{j=1}^{N} P(i \mid y_j, \Theta^*) \, H_j M_j \Big]^{-1} \sum_{j=1}^{N} P(i \mid y_j, \Theta^*) \, H_j y_j

     Iterate to find \Sigma_i:

     \Sigma_i \leftarrow \Sigma_i + \Delta\Sigma_i, \qquad
     \Delta\Sigma_i = \frac{\alpha}{2} \sum_{j=1}^{N} P(i \mid y_j, \Theta^*) \, H_j \big[ (y_j - M_j \mu_i)(y_j - M_j \mu_i)^T H_j^T - I \big] \Sigma_i

     If \alpha < 2 then \Sigma_i will never become non-positive definite, for all i.
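
     A sketch of the mixing-weight and mean updates only. The slide does not define H_j in the recoverable text, so the choice H_j = M_j^T (M_j \Sigma_i M_j^T)^{-1} below is an assumption, and the iterative covariance update is not sketched:

        import numpy as np

        def m_step_pi_mu(Y, Ms, R, sigmas):
            # Y: list of observed data vectors y_j; Ms: their gain matrices M_j
            # R[j, i] = P(i | y_j, theta*, M) from the E-step
            N, n = R.shape
            d = Ms[0].shape[1]
            pis = R.sum(axis=0) / N               # pi_i = (1/N) sum_j P(i | y_j)
            mus = []
            for i in range(n):
                A = np.zeros((d, d))
                b = np.zeros(d)
                for j, (y_j, M_j) in enumerate(zip(Y, Ms)):
                    # ASSUMED form of H_j (not given in the extracted slide)
                    H_j = M_j.T @ np.linalg.inv(M_j @ sigmas[i] @ M_j.T)
                    A += R[j, i] * (H_j @ M_j)
                    b += R[j, i] * (H_j @ y_j)
                mus.append(np.linalg.solve(A, b))  # mu_i from the slide's formula
            return pis, mus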

  9. Results Figure 1 (left): each feature missing 10% of the time. Figure 2 (right): each feature missing 60% of the time.

  10. Conclusions
     • Method for ML or MAP estimation of GMMs for data with missing features.
     • An EM algorithm: fast.
     • Able to withstand very high levels of missing data.
     • Other applications.
