  1. Augmented Likelihood Estimators for Mixture Models
     Markus Haas, Jochen Krause, Marc S. Paolella
     Swiss Banking Institute, University of Zurich

  2. What is mixture degeneracy?
     • the mixtures under study are finite convex combinations of 1 ≤ k < ∞ (single-component) probability density functions:
       $$f_{\mathrm{MIX}}(\varepsilon;\theta) = \sum_{i=1}^{k} \omega_i f_i(\varepsilon;\theta_i)$$
     • the mixture likelihood function is unbounded
     • it attains infinite likelihood values (singularities)
     • mixture components degenerate to Dirac's delta function (see slide 14)
     • maximum-likelihood estimation yields degenerate estimates
     • the set of local optima includes the singularities
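The singularity mechanism is easy to reproduce numerically. Below is a minimal sketch (my own, not the authors' code) for a two-component normal mixture: centring one component on a single observation and letting its scale tend to zero drives the mixture log-likelihood to +∞.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
eps = rng.normal(size=100)  # any sample will do

def mix_loglik(eps, mu, sigma, omega):
    # log-likelihood of a two-component normal mixture
    dens = (omega[0] * norm.pdf(eps, mu[0], sigma[0])
            + omega[1] * norm.pdf(eps, mu[1], sigma[1]))
    return np.sum(np.log(dens))

for s in (1.0, 1e-3, 1e-6, 1e-9):
    # centre component 1 on a single observation and let its sigma -> 0
    print(s, mix_loglik(eps, mu=(eps[0], 0.0), sigma=(s, 1.0), omega=(0.5, 0.5)))
# the log-likelihood grows without bound as s -> 0: a singularity
```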

  3. Why does degeneracy matter for mixture estimation?
     [Figure: mixture of two (e.g., normal) densities and an exemplary m.l.e., N = 100.
      True mixture: µ = (−1.00, 1.00), σ = (2.00, 1.00), ω = (0.60, 0.40).
      M.l. estimate: µ = (8.00, −0.09), σ = (5.8e−11, 2.40), ω = (0.02, 0.98).]

  4. Selected literature on mixture estimation (full references on slides 15–16)
     – first occurrence of mixture estimation (method of moments): K. Pearson (1894)
     – unboundedness of the likelihood function, e.g. J. Kiefer and J. Wolfowitz (1956); N. E. Day (1969)
     – expectation-maximization concepts for mixture estimation, e.g. V. Hasselblad (1966); R. A. Redner and H. F. Walker (1984)
     – constrained maximum-likelihood approach, e.g. R. J. Hathaway (1985)
     – penalized maximum-likelihood approach, e.g. J. D. Hamilton (1991); G. Ciuperca et al. (2003); K. Tanaka (2009)
     – semi-parametric smoothed maximum-likelihood approach, e.g. B. Seo and B. G. Lindsay (2010)

  5. What is the contribution?
     ◮ Fast, consistent, and general estimation of mixture models
       • fast: as fast as maximum-likelihood estimation (MLE)
       • consistent: if the true mixture is non-degenerate
       • general: likelihood-based, with neither constraints nor penalties
     ◮ Augmented Likelihood Estimation (ALE)
       • a shrinkage-like solution to the mixture degeneracy problem
       • the approach copes with all kinds of local optima, not only singularities

  6. A simple solution using the idea of shrinkage
     augmented likelihood estimator:
       $$\hat{\theta}_{\mathrm{ALE}} = \arg\max_{\theta}\, \tilde{\ell}(\theta;\varepsilon)$$
     augmented likelihood function:
       $$\tilde{\ell}(\theta;\varepsilon)
          = \ell(\theta;\varepsilon) + \tau \sum_{i=1}^{k} \bar{\ell}_i(\theta_i;\varepsilon)
          = \sum_{t=1}^{T} \log\sum_{i=1}^{k} \omega_i f_i(\varepsilon_t;\theta_i)
            + \tau \underbrace{\sum_{i=1}^{k} \frac{1}{T}\sum_{t=1}^{T} \log f_i(\varepsilon_t;\theta_i)}_{\mathrm{CLF}}$$
     ◮ number of component likelihood functions (CLF): k ∈ ℕ
     ◮ shrinkage constant: τ ∈ ℝ₊
     ◮ log of the geometric average of the ith likelihood function: $\bar{\ell}_i \in \mathbb{R}$
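As a concrete reading of the definition above, here is a minimal sketch of the augmented log-likelihood for a k-component normal mixture. The function and variable names are my own, and numpy/scipy are assumed; mu, sigma, and omega are length-k arrays.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def augmented_loglik(eps, mu, sigma, omega, tau=1.0):
    """tilde-ell(theta; eps) = ell(theta; eps) + tau * sum_i bar-ell_i(theta_i; eps)."""
    # T x k matrix of component log-densities log f_i(eps_t; theta_i)
    comp = norm.logpdf(eps[:, None], loc=mu[None, :], scale=sigma[None, :])
    # ordinary mixture log-likelihood: sum_t log sum_i omega_i f_i(eps_t; theta_i)
    ell = logsumexp(comp + np.log(omega)[None, :], axis=1).sum()
    # CLF: sum_i (1/T) sum_t log f_i(eps_t; theta_i)
    clf = comp.mean(axis=0).sum()
    return ell + tau * clf
```

Note that calling this with tau=0.0 recovers the plain MLE objective, in line with the ALE → MLE property on slide 8.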

  7. A simple solution using the idea of shrinkage (continued)
     With the estimator and augmented likelihood function as on slide 6:
     ◮ the CLF penalizes small component likelihoods
     ◮ the CLF rewards high component likelihoods
     ◮ the CLF identifies the ALE

  8. A simple solution using the idea of shrinkage (continued)
     Again with the augmented likelihood function of slide 6:
     ◮ the ALE is consistent as T → ∞
     ◮ ALE → MLE if τ → 0 or if k = 1
     ◮ separate component estimates as τ → ∞

  9. How does the ALE work?
     • assume all components of the true underlying data-generating mixture process are non-degenerate
     • the likelihood product is zero for degenerate components
     • individual mixture components are not prone to degeneracy
     • prevent degeneracy by shrinkage
     • shrink the overall mixture likelihood function towards the component likelihood functions via the shrinkage term
       $$\tau \cdot \mathrm{CLF} = \tau \sum_{i=1}^{k} \bar{\ell}_i(\theta_i;\varepsilon)$$
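A quick numerical check of the second bullet (mine, not from the slides): for a component that degenerates on a single observation, the geometric-average term $\bar{\ell}_i$ collapses to −∞, so the shrinkage term offsets the singularity in the mixture likelihood.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
eps = rng.normal(size=100)
for s in (1.0, 1e-3, 1e-6):
    # bar-ell_i for a component centred on one observation with scale s
    bar_ell = norm.logpdf(eps, loc=eps[0], scale=s).mean()
    print(s, bar_ell)  # -> -infinity as s -> 0, vetoing the degenerate fit
```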

  10. A short comparison, mixture of normals
      Penalized maximum-likelihood estimation, Ciuperca et al. (2003), inverse-gamma (IG) penalty (IG p.d.f. on slide 17):
        $$\ell_{\mathrm{IG}}(\theta;\varepsilon) = \sum_{t=1}^{T} \log f_{\mathrm{MixN}}(\varepsilon_t;\theta) + \sum_{i=1}^{k} \log f_{\mathrm{IG}}(\sigma_i;\, 0.4,\, 0.4)$$
      Augmented likelihood estimator, τ = 1:
        $$\ell_{\mathrm{ALE}}(\theta;\varepsilon) = \sum_{t=1}^{T} \log f_{\mathrm{MixN}}(\varepsilon_t;\theta) + \sum_{i=1}^{k} \frac{1}{T}\sum_{t=1}^{T} \log f_i(\varepsilon_t;\theta_i)$$
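The two objectives can be written side by side. The sketch below is my own rendering for a mixture of normals, assuming scipy's inverse-gamma parameterization (shape a, scale) for the IG(0.4, 0.4) penalty; mu, sigma, omega are length-k arrays.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm, invgamma

def mixn_loglik(eps, mu, sigma, omega):
    # sum_t log sum_i omega_i f_i(eps_t; theta_i) for a normal mixture
    comp = norm.logpdf(eps[:, None], loc=mu, scale=sigma)
    return logsumexp(comp + np.log(omega), axis=1).sum()

def obj_ig(eps, mu, sigma, omega, alpha=0.4, beta=0.4):
    # penalized likelihood: mixture log-lik + IG(0.4, 0.4) log-density of each sigma_i
    return mixn_loglik(eps, mu, sigma, omega) + invgamma.logpdf(sigma, a=alpha, scale=beta).sum()

def obj_ale(eps, mu, sigma, omega):
    # ALE with tau = 1: mixture log-lik + geometric-average component log-liks
    comp = norm.logpdf(eps[:, None], loc=mu, scale=sigma)
    return mixn_loglik(eps, mu, sigma, omega) + comp.mean(axis=0).sum()
```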

  11. 100 estimations, 500 simulated observations, random starts
      [Figure: estimation results; see slide 18 for the simulation details]

  12. Conclusion & Further Research
      What is the contribution of ALE?
      + a solution to the mixture degeneracy problem
      + very simple implementation
      + no prior information required, except for the shrinkage constant(s)
      + purely based on likelihood values
      + applicable to mixtures of mixtures
      + gives consistent estimators
      + directly extendable to multivariate mixtures (e.g., for classification)
      + computationally feasible for out-of-sample exercises
      • further research: the trade-off between potential shrinkage bias and the number of local optima, as well as small-sample properties

  13. Augmented Likelihood Estimators for Mixture Models
      Thank you for your attention!

  14. What is a delta function?
      [Figure: approximation of Dirac's delta, a probability density function with point support; density values on the order of 10^9]

  15. Bibliography I
      • K. Pearson (1894), "Contributions to the Mathematical Theory of Evolution"
      • J. Kiefer and J. Wolfowitz (1956), "Consistency of the Maximum Likelihood Estimator in the Presence of Infinitely Many Incidental Parameters"
      • V. Hasselblad (1966), "Estimation of Parameters for a Mixture of Normal Distributions"
      • N. E. Day (1969), "Estimating the Components of a Mixture of Normal Distributions"
      • R. A. Redner and H. F. Walker (1984), "Mixture Densities, Maximum Likelihood and the EM Algorithm"
      • R. J. Hathaway (1985), "A Constrained Formulation of Maximum-Likelihood Estimation for Normal Mixture Distributions"

  16. Bibliography II
      • J. D. Hamilton (1991), "A Quasi-Bayesian Approach to Estimating Parameters for Mixtures of Normal Distributions"
      • G. Ciuperca, A. Ridolfi and J. Idier (2003), "Penalized Maximum Likelihood Estimator for Normal Mixtures"
      • K. Tanaka (2009), "Strong Consistency of the Maximum Likelihood Estimator for Finite Mixtures of Location-Scale Distributions When Penalty is Imposed on the Ratios of the Scale Parameters"
      • B. Seo and B. G. Lindsay (2010), "A Computational Strategy for Doubly Smoothed MLE Exemplified in the Normal Mixture Model"

  17. Inverse Gamma Probability Density Function
      [Figure: inverse-gamma p.d.f. as used in Ciuperca et al. (2003); α = 0.4, β = 0.4]

  18. Simulation Study - Details (a sketch of the estimation loop follows the list)
      • number of simulations: 100
      • initial starting values: uniformly drawn from hand-selected intervals
      • hybrid optimization algorithm: BFGS, downhill simplex, etc.
      • maximal tolerance: 10^{-8}
      • maximal number of function evaluations: 100,000
      • estimated mixture components: sorted in increasing order of σ_i
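A sketch of how such a multi-start, hybrid optimization loop might look. The search intervals below are placeholders for the hand-selected ones, BFGS followed by Nelder-Mead stands in for the unspecified hybrid algorithm, and all names are my own; a real implementation would also reparameterize to enforce σ_i > 0 and the simplex constraint on ω.

```python
import numpy as np
from scipy.optimize import minimize

def estimate(neg_objective, k, rng, n_starts=10):
    """Minimize a flat-parameter negative objective from random starts."""
    best = None
    for _ in range(n_starts):
        # starting values drawn uniformly from (assumed) intervals
        x0 = np.concatenate([rng.uniform(-3.0, 3.0, k),   # mu_i
                             rng.uniform(0.5, 2.0, k),    # sigma_i
                             np.full(k, 1.0 / k)])        # omega_i
        # hybrid stage 1: gradient-based BFGS
        res = minimize(neg_objective, x0, method="BFGS",
                       options={"gtol": 1e-8, "maxiter": 100_000})
        # hybrid stage 2: downhill simplex polish
        res = minimize(neg_objective, res.x, method="Nelder-Mead",
                       options={"xatol": 1e-8, "fatol": 1e-8, "maxfev": 100_000})
        if best is None or res.fun < best.fun:
            best = res
    mu, sigma, omega = np.split(best.x, 3)
    order = np.argsort(sigma)  # sort components in increasing order of sigma_i
    return mu[order], sigma[order], omega[order]
```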

  19. Simulation Study - the true mixture density
      [Figure: mixture of three normals and its mixture components]
      $$\theta_{\mathrm{true}} = (\mu, \sigma, \omega) = (2.5,\, 0.0,\, -2.1;\;\; 0.9,\, 1.0,\, 1.25;\;\; 0.35,\, 0.4,\, 0.25)$$
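Simulating from this true mixture is straightforward; the generator below (my own, with an arbitrary seed) draws the 500 observations used per replication.

```python
import numpy as np

mu    = np.array([ 2.5, 0.0, -2.1])
sigma = np.array([ 0.9, 1.0, 1.25])
omega = np.array([0.35, 0.4, 0.25])

rng = np.random.default_rng(1)
z = rng.choice(3, size=500, p=omega)   # latent component labels
eps = rng.normal(mu[z], sigma[z])      # 500 simulated observations
```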

  20. Variance-weighted extension
      An extended augmented likelihood estimator:
        $$\ell_{\mathrm{ALE}}(\theta;\varepsilon) = \sum_{t=1}^{T} \log f_{\mathrm{MIX}}(\varepsilon_t;\theta)
          + \sum_{i=1}^{k} \frac{1}{T}\sum_{t=1}^{T} \log f_i(\varepsilon_t;\theta_i)
          - \sum_{i=1}^{k} \log\!\left(1 + \frac{1}{T}\sum_{t=1}^{T} \Big(\log f_i(\varepsilon_t;\theta_i) - \frac{1}{T}\sum_{s=1}^{T} \log f_i(\varepsilon_s;\theta_i)\Big)^{2}\right)$$
      This specific ALE not only enforces a meaningful (high) explanatory power for all observations; it also enforces a meaningful (small) variance of the explanatory power.
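Under my reading of the formula above (the extra term penalizes the sample variance of each component's log-likelihood contributions), a hedged sketch for a normal mixture; names and parameterization are assumptions, not the authors' code.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

def extended_ale(eps, mu, sigma, omega):
    comp = norm.logpdf(eps[:, None], loc=mu, scale=sigma)  # log f_i(eps_t; theta_i)
    ell = logsumexp(comp + np.log(omega), axis=1).sum()    # mixture log-likelihood
    clf = comp.mean(axis=0).sum()                          # geometric-average term
    var_pen = np.log1p(comp.var(axis=0)).sum()             # log(1 + Var_t[log f_i])
    return ell + clf - var_pen
```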
