

  1. Fast and Stable Maximum Likelihood Estimation for Incomplete Multinomial Models
     Chenyang Zhang, Guosheng Yin
     Department of Statistics and Actuarial Science, The University of Hong Kong (HKU SAAS)
     ICML 2019, June 13, 2019

  2. What is an Incomplete Multinomial Model?
     A toy example: an incomplete contingency table.

     Cell probabilities:
              Young   Middle   Senior
     Female   p1      p2       p3
     Male     p4      p5       p6

     Sample 1 (fully classified):
              Young   Middle   Senior
     Female   21      24       18
     Male     20      25       12

     Sample 2 (age group unobserved, row totals only):
     Female   18
     Male     22

     Sample 3 (gender unobserved, column totals only):
     Young    Middle   Senior
     10       20       10

     Sample 4 (truncated to the Young column):
     Female   53
     Male     47

  3. What is an Incomplete Multinomial Model? (Cont'd)
     Multinomial model: the sample space Ω is partitioned into K disjoint subspaces.
     Incomplete cases:
       (a) a subset of categories rather than a unique category is reported (partial classification);
       (b) the set of possible outcomes contains only part of all categories (truncated outcomes).

     Likelihood:
       L(p | a, b, Δ) ∝ ˜L = ∏_{k=1}^{K} p_k^{a_k} ∏_{j=1}^{q} (δ_j^⊺ p)^{b_j}.

     p = (p_1, ..., p_K)^⊺: parameters of the incomplete multinomial model.
     a = (a_1, ..., a_K)^⊺: counts of fully classified observations.
     b = (b_1, ..., b_q)^⊺: counts of incomplete observations; positive terms correspond to
       partial classification, negative terms to truncated outcomes.
     Δ = {Δ_kj}_{K×q} = [δ_1, ..., δ_q]: indicator matrix.
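This likelihood maps directly onto array operations. Below is a minimal Python sketch (mine, not from the talk; the function name and array conventions are assumptions):

```python
import numpy as np

def incomplete_multinomial_loglik(p, a, b, Delta):
    """log L(p | a, b, Delta) up to an additive constant (illustrative sketch).

    p     : (K,) cell probabilities, summing to 1
    a     : (K,) counts of fully classified observations
    b     : (q,) counts of incomplete observations
            (positive: partial classification; negative: truncated outcomes)
    Delta : (K, q) 0/1 indicator matrix whose columns are the delta_j
    """
    # a'.log(p) covers the fully classified part; each incomplete term j
    # contributes b_j * log(delta_j' p)
    return a @ np.log(p) + b @ np.log(Delta.T @ p)
```

With no incomplete observations (q = 0), the second term vanishes and this reduces to the ordinary multinomial log-likelihood.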

  4. What is an Incomplete Multinomial Model? (Cont'd)
     Likelihood for the toy example:
       L(p) ∝ p_1^21 p_2^24 p_3^18 p_4^20 p_5^25 p_6^12
              × (p_1 + p_2 + p_3)^18 (p_4 + p_5 + p_6)^22
              × (p_1 + p_4)^10 (p_2 + p_5)^20 (p_3 + p_6)^10
              × (p_1 / (p_1 + p_4))^53 (p_4 / (p_1 + p_4))^47.

     Equivalent (a, b, Δ) representation:
       a^⊺ = (21 + 53, 24, 18, 20 + 47, 25, 12),
       b^⊺ = (18, 22, 10 − 53 − 47, 20, 10),

              p1  p2  p3  p4  p5  p6
       Δ^⊺ = [ 1   1   1   0   0   0 ]   j = 1
             [ 0   0   0   1   1   1 ]   j = 2
             [ 1   0   0   1   0   0 ]   j = 3
             [ 0   1   0   0   1   0 ]   j = 4
             [ 0   0   1   0   0   1 ]   j = 5
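As an illustration (my own encoding, not code from the slides), the toy example's counts pack into plain NumPy arrays:

```python
import numpy as np

# Fully classified counts for p1..p6; the truncated Young-only sample
# contributes 53 and 47 to the entries for p1 and p4
a = np.array([21 + 53, 24, 18, 20 + 47, 25, 12], dtype=float)

# Incomplete counts: row totals (18, 22), column totals (10, 20, 10);
# the (p1 + p4) term collects 10 - 53 - 47 = -90 from the truncated sample
b = np.array([18.0, 22.0, 10.0 - 53.0 - 47.0, 20.0, 10.0])

# Delta[k, j] = 1 if cell k appears in incomplete term j
Delta = np.array([
    # j:  1  2  3  4  5
        [1, 0, 1, 0, 0],  # p1
        [1, 0, 0, 1, 0],  # p2
        [1, 0, 0, 0, 1],  # p3
        [0, 1, 1, 0, 0],  # p4
        [0, 1, 0, 1, 0],  # p5
        [0, 1, 0, 0, 1],  # p6
], dtype=float)
```

Column j of Δ is exactly the support of the j-th incomplete term, so Δ^⊺ p recovers the merged-cell probabilities (p1+p2+p3, p4+p5+p6, ...).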

  5. Optimality condition
     Let s = Σ_{k=1}^{K} a_k + Σ_{j=1}^{q} b_j, and let
     Q+ = {j | b_j > 0, j = 1, ..., q} and Q− = {j | b_j < 0, j = 1, ..., q}
     be the sets of indices of positive and negative elements of b, respectively.

       ℓ(p | a, b, Δ) = Σ_{k=1}^{K} a_k log p_k + Σ_{j=1}^{q} b_j log(δ_j^⊺ p) − s (Σ_{k=1}^{K} p_k − 1).

     Optimality condition: ∇ℓ(p) = 0, i.e.
       ∂ℓ/∂p_k = a_k / p_k + Σ_{j∈Q+} |b_j| Δ_kj / (δ_j^⊺ p) − Σ_{j∈Q−} |b_j| Δ_kj / (δ_j^⊺ p) − s = 0,
     which is equivalent to
       a_k + ( Σ_{j∈Q+} |b_j| Δ_kj / (δ_j^⊺ p) − Σ_{j∈Q−} |b_j| Δ_kj / (δ_j^⊺ p) − s ) p_k = 0.
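The gradient formula is easy to sanity-check by finite differences. This is my own sketch with made-up counts (the Q+/Q− split is the same sum written with signed b_j):

```python
import numpy as np

def grad(p, a, b, Delta, s):
    # d ell / d p_k = a_k / p_k + sum_j b_j * Delta[k, j] / (delta_j' p) - s
    return a / p + Delta @ (b / (Delta.T @ p)) - s

def ell(p, a, b, Delta, s):
    # Log-likelihood with the Lagrange penalty -s * (sum_k p_k - 1)
    return a @ np.log(p) + b @ np.log(Delta.T @ p) - s * (p.sum() - 1.0)

# Made-up example: one partial-classification term (b > 0),
# one truncation term (b < 0)
a = np.array([3.0, 5.0, 2.0])
b = np.array([4.0, -1.0])
Delta = np.array([[1.0, 1.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])
s = a.sum() + b.sum()
p = np.array([0.3, 0.5, 0.2])

# Central finite differences of ell, coordinate by coordinate
eps = 1e-6
numeric = np.array([
    (ell(p + eps * np.eye(3)[k], a, b, Delta, s)
     - ell(p - eps * np.eye(3)[k], a, b, Delta, s)) / (2 * eps)
    for k in range(3)
])
```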

  6. Stable Weaver Algorithm

     Algorithm 1: Stable Weaver Algorithm
       Input: observations (a, b, Δ)
       Initialize: p^(0) = (1/K, ..., 1/K)^⊺, s = 1^⊺ a + 1^⊺ b
       repeat
         τ = b / (Δ^⊺ p^(t))                              (element-wise division)
         τ+ = max(τ, 0), τ− = min(τ, 0)
         p^(t+1) = (a + (Δ τ+) ∘ p^(t)) / (s 1 − Δ τ−)    (∘: element-wise product)
         p^(t+1) = p^(t+1) / sum(p^(t+1))
       until convergence

     The original weaver algorithm updates the parameter by p = a / (s 1 − Δ τ).
     The Bayesian weaver is time-consuming due to its inner–outer iteration structure,
     and selecting its thickening parameter is difficult.
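The update above maps line for line onto NumPy. The following is my own sketch of the iteration (function name, tolerance, and stopping rule are illustrative choices):

```python
import numpy as np

def stable_weaver(a, b, Delta, tol=1e-10, max_iter=100_000):
    """Stable weaver iteration for the incomplete multinomial MLE (sketch).

    a     : (K,) counts of fully classified observations
    b     : (q,) signed counts of incomplete observations
    Delta : (K, q) 0/1 indicator matrix with columns delta_j
    """
    K = len(a)
    p = np.full(K, 1.0 / K)              # p^(0) = (1/K, ..., 1/K)
    s = a.sum() + b.sum()                # s = 1'a + 1'b
    for _ in range(max_iter):
        tau = b / (Delta.T @ p)          # element-wise division
        tau_pos = np.maximum(tau, 0.0)
        tau_neg = np.minimum(tau, 0.0)
        # p^(t+1) = (a + (Delta tau+) o p^(t)) / (s 1 - Delta tau-)
        p_new = (a + (Delta @ tau_pos) * p) / (s - Delta @ tau_neg)
        p_new /= p_new.sum()             # renormalize onto the simplex
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p
```

With q = 0 (no incomplete observations) the update collapses to p = a / s, the closed-form multinomial MLE, which is a convenient correctness check.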

  7. Applications
     - Contingency tables with merged and truncated cells.
     - Polytomous response data with underlying categories, e.g., phenotype expression
       of blood types.
     - Interval-censored time-to-event data with truncation in survival analysis.
     - The model includes several well-known ranking models as special cases, such as the
       Bradley–Terry and Plackett–Luce models and their variants.

  8. Results on Real Datasets

                                   NASCAR                  HKJC1416
     Algorithm              (w/o ties)  (w/ ties)   (w/o ties)  (w/ ties)
     Stable      Iteration         22        459        40.4K      27.2K
     Weaver      Time (s)      < 0.01       0.03        38.46      86.40
     Bayesian    Iteration       128K       263K        > 1M       > 1M
     Weaver      Time (s)       25.27      50.12       > 5000     > 5000
     MM          Iteration         22          –        40.4K          –
                 Time (s)      < 0.01          –       375.79          –
     Trust       Iteration       1937       5048         636†       649†
     Region*     Time (s)       74.31     125.68      1139.14    1835.37
     ILSR        Iteration         12          –         4056          –
                 Time (s)        0.06          –      1166.97          –
     Self        Iteration      36798      11282           –‡          –
     Consistency Time (s)       11.61       2.08            –          –

     * The number of iterations for the trust region constrained algorithm refers to the
       number of objective function evaluations.
     † We use the approximated Hessian matrix when fitting the trust region constrained
       algorithm to the HKJC1416 data because its calculation is too time-consuming.
     ‡ For the HKJC1416 data, the self-consistency approach converges to a wrong solution.

  9. Results on Real Datasets (Cont'd)
     [Figure 1: Convergence plot (log-likelihood vs. running time) of the stable weaver
     algorithm compared with the Bayesian weaver and trust region methods on the dataset
     HKJC9916, for (a) t ∈ [0, 100] and (b) t ∈ [100, 36000] (s).]

  10. Thanks for listening.
