Estimation and Model Selection in Dirichlet Regression


  1. Estimation and Model Selection in Dirichlet Regression
André Camargo¹, Julio Michael Stern¹, Marcelo de Souza Lauretto²
¹ Institute of Mathematics and Statistics, ² School of Arts, Sciences and Humanities, University of São Paulo
Conference on Inductive Statistics

  2. Introduction
◮ Compositional data: vectors whose components are the proportions or percentages of some whole.
◮ Sample space: S^D, the (D − 1)-dimensional simplex:

S^D = \{ z = (z_1, z_2, \dots, z_D) : z_j > 0 \;\forall j, \; \textstyle\sum_{j=1}^{D} z_j = 1 \}.

◮ Many applications, e.g.:
  ◮ Market share analysis
  ◮ Election forecasts
  ◮ Soil composition analysis
  ◮ Household expenses composition
◮ Aitchison (1986) developed a methodology for compositional data analysis based on logistic normal distributions.
◮ Here we focus on Dirichlet regression.

  3. Dirichlet Regression
◮ Let X = [x_{1•}; x_{2•}; ...; x_{n•}] and Y = [y_{1•}; y_{2•}; ...; y_{n•}] be a sample of observations, where y_{i•} ∈ S^D and x_{i•} ∈ R^C, i = 1, 2, ..., n.
◮ The goal is to build a regression predictor for y_{i•} as a function of x_{i•}.
◮ We assume that y_{i•} ∼ D(α_1(x_{i•}), ..., α_D(x_{i•})), where each α_j(x_{i•}) is a positive function of x_{i•}.
◮ In this work: α_j(x_{i•}) = x_{i,1}β_{1,j} + x_{i,2}β_{2,j} + ... + x_{i,C}β_{C,j} = x_{i•}β_{•j}.
◮ Parameters to be estimated: β = (β_{k,j}, k = 1...C, j = 1...D), subject to the constraint α(x_{i•}) > 0.
◮ Model selection can be done by testing β_{k,j} = 0 for some pairs (k, j) ∈ {1...C} × {1...D}.

  4. Matrix notation:

X = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,C} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,C} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,C} \end{bmatrix} \qquad
Y = \begin{bmatrix} y_{1,1} & y_{1,2} & \cdots & y_{1,D} \\ y_{2,1} & y_{2,2} & \cdots & y_{2,D} \\ \vdots & \vdots & \ddots & \vdots \\ y_{n,1} & y_{n,2} & \cdots & y_{n,D} \end{bmatrix}

\beta = \begin{bmatrix} \beta_{1,1} & \beta_{1,2} & \cdots & \beta_{1,D} \\ \beta_{2,1} & \beta_{2,2} & \cdots & \beta_{2,D} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{C,1} & \beta_{C,2} & \cdots & \beta_{C,D} \end{bmatrix} \qquad \alpha = X\beta
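As a concrete sanity check of this notation, a minimal numpy sketch (all numeric values invented for illustration):

```python
import numpy as np

# Toy sizes: n = 3 observations, C = 2 covariates, D = 3 composition components.
X = np.array([[1.0, 0.5],
              [1.0, 1.5],
              [1.0, 2.5]])              # n x C covariate matrix
beta = np.array([[1.0, 2.0, 1.5],
                 [0.2, -0.1, 0.4]])     # C x D coefficient matrix
alpha = X @ beta                        # n x D Dirichlet parameters, one row per case
assert np.all(alpha > 0)                # the feasibility constraint alpha(x_i.) > 0
```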

  5. Case study
◮ Arctic Lake Sediments dataset (Coakley & Rust, 1968): compositions of sand, silt and clay (y) for 39 sediment samples at different water depths (x).
◮ Interest in submodels of the complete second-order polynomial model on x:

\alpha_j(x) = \beta_{1,j} + \beta_{2,j} x + \beta_{3,j} x^2, \quad j = 1 \dots 3.
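In code, the full second-order design matrix looks like this (a sketch; the depth values are invented, not the actual Arctic Lake measurements):

```python
import numpy as np

depth = np.array([10.4, 11.7, 12.8, 16.8])   # hypothetical water depths
X = np.column_stack([np.ones_like(depth), depth, depth ** 2])  # columns: 1, x, x^2
# Submodels set some beta_{k,j} = 0, i.e. drop terms from alpha_j(x).
```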

  6. Case study [figure]

  7. Parameter Estimation
◮ Likelihood function: given that y_{1•}, ..., y_{n•} are c.i.i.d. given β,

L(\beta \mid X, Y) = \prod_{i=1}^{n} \Gamma(\Lambda(x_{i\bullet})) \prod_{j=1}^{D} \frac{y_{ij}^{\,\alpha_j(x_{i\bullet}) - 1}}{\Gamma(\alpha_j(x_{i\bullet}))},

where \Lambda(x_{i\bullet}) = \sum_{j=1}^{D} \alpha_j(x_{i\bullet}).
◮ Gradient:

\frac{\partial \log L}{\partial \beta_{k,j}} = \sum_{i=1}^{n} x_{i,k} \left[ \Gamma'(\Lambda(x_{i\bullet})) - \Gamma'(\alpha_j(x_{i\bullet})) + \log y_{i,j} \right],

where Γ′ denotes the digamma function, Γ′(u) = ∂ log Γ(u)/∂u.
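A direct transcription of the log-likelihood and its gradient, as a Python sketch assuming X (n × C), Y (n × D) and beta (C × D) are numpy arrays with all alpha entries positive:

```python
import numpy as np
from scipy.special import gammaln, digamma

def dirichlet_reg_loglik(beta, X, Y):
    """log L(beta | X, Y) for the Dirichlet regression model alpha = X beta."""
    alpha = X @ beta                         # n x D
    lam = alpha.sum(axis=1)                  # Lambda(x_i.) = sum_j alpha_ij
    return np.sum(gammaln(lam) - gammaln(alpha).sum(axis=1)
                  + ((alpha - 1.0) * np.log(Y)).sum(axis=1))

def dirichlet_reg_grad(beta, X, Y):
    """d log L / d beta_{k,j} = sum_i x_{i,k} [psi(Lambda_i) - psi(alpha_ij) + log y_ij]."""
    alpha = X @ beta
    lam = alpha.sum(axis=1, keepdims=True)   # n x 1, broadcasts over columns
    inner = digamma(lam) - digamma(alpha) + np.log(Y)   # n x D
    return X.T @ inner                       # C x D gradient matrix
```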

  8. ◮ Fitting Dirichlet distributions with constant parameters is straightforward via standard numerical methods.
◮ The difficulty arises when we extend the estimation to Dirichlet regression.
◮ Starting values and regularization policies must be chosen carefully to ensure convergence of the optimization.
◮ Hijazi and Jernigan (2009) proposed a method for choosing starting values for the coefficients, based on:
  ◮ drawing resamples of the original data;
  ◮ fitting the resamples by least squares.

  9. Hijazi and Jernigan's Method
◮ The method (sketched in code below):
1. Draw r resamples with replacement from X and Y, each of size m (m < n).
2. For each resample l: fit a Dirichlet model with constant parameters, and compute the mean of the corresponding covariates. This yields matrices A (r × D) and W (r × C), whose rows a_{l•} and w_{l•} contain, respectively, the ML estimates and the covariate means of resample l.
3. Fit by least squares the D models A_{i,j} = α_j(w_{i•}) = Σ_{k=1}^{C} w_{i,k} β_{k,j}.
4. Use the fitted coefficients β̂_{k,j} as starting values.
◮ Drawback: the method does not guarantee that the starting values β̂_{k,j} yield positive values for α_j(x_{i•}).
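A sketch of the procedure. For brevity, the per-resample Dirichlet fit below uses a simple moment-matching estimator as a stand-in for the ML fit named in step 2; the choice of r and m is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_dirichlet_moments(Z):
    """Moment-style constant-Dirichlet fit (stand-in for the ML fit):
    alpha_j = m_j * s, with the precision s from the first component's variance."""
    m = Z.mean(axis=0)
    s = m[0] * (1.0 - m[0]) / Z[:, 0].var() - 1.0
    return m * max(s, 1e-6)

def hijazi_jernigan_start(X, Y, r=50, m=None):
    n, C = X.shape
    D = Y.shape[1]
    m = m or n // 2                          # resample size m < n
    A = np.empty((r, D))                     # per-resample Dirichlet estimates
    W = np.empty((r, C))                     # per-resample covariate means
    for l in range(r):
        idx = rng.choice(n, size=m, replace=True)
        A[l] = fit_dirichlet_moments(Y[idx])
        W[l] = X[idx].mean(axis=0)
    # Least squares for all D columns at once: W @ beta0 ~ A
    beta0, *_ = np.linalg.lstsq(W, A, rcond=None)
    return beta0                             # C x D; may still yield alpha <= 0!
```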

  10. Our Proposal
◮ We propose a regularization approach anchored by the constant (covariate-free) Dirichlet model.
◮ We extend the initial model to include the constant (intercept) terms as artificial variables, in case they are not present.
◮ Finally, we solve a sequence of optimization problems that drive the artificial variables back to zero.

  11. ◮ Algorithm:
1. Include a constant vector 1 as the first column of X, in case it is not present in the original model.
2. Define a boolean matrix M indicating the non-zero parameters of the original model, namely:

M_{k,j} = \begin{cases} 1 & \text{if } \beta_{k,j} \text{ is a model parameter;} \\ 0 & \text{if } \beta_{k,j} = 0. \end{cases}

3. Fit Y by a Dirichlet distribution with constant parameters (via ML). Notice that this corresponds to the solution β⁰ of a basic model whose boolean matrix M⁰ is:

M^0_{k,j} = \begin{cases} 1 & \text{if } k = 1; \\ 0 & \text{if } k \neq 1. \end{cases}

Moreover, this solution is a feasible point for the (possibly extended) model including the intercept.

  12. ◮ (cont.)
4. Build a supermodel joining all variables present either in the anchor or in the original model, namely:

M^*_{k,j} = \max(M^0_{k,j}, M_{k,j}), \quad k = 1 \dots C, \; j = 1 \dots D.

5. Solve the sequence of optimization problems

\max_\beta \; g(\beta \mid X, Y) = -K \sum_{j=1}^{D} b_j \beta_{1,j}^2 + \log L(\beta \mid X, Y),

where the boolean vector b indicates which of the β_{1,j} are "artificial" variables:

b_j = 1 - M_{1,j} = \begin{cases} 1 & \text{if } M_{1,j} = 0; \\ 0 & \text{otherwise.} \end{cases}

◮ The term −K Σ_j b_j β_{1,j}² penalizes the artificial variables.
◮ Repeating step 5 with a sequence of increasing scalars K_t drives the artificial variables to zero, converging to the optimal solution (best fit) of the original model (see the sketch below).
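A sketch of the whole anchored procedure, reusing dirichlet_reg_loglik and fit_dirichlet_moments from the earlier sketches. The inner optimizer (Nelder–Mead), the infinite penalty for infeasible points, and the schedule Ks are illustrative choices, not prescribed by the slides:

```python
import numpy as np
from scipy.optimize import minimize

def fit_anchored(X, Y, M, Ks=(1.0, 10.0, 100.0, 1000.0)):
    """Anchored fit. X must already have the constant 1 as its first column;
    M is the boolean C x D structural matrix of the original model."""
    n, C = X.shape
    D = Y.shape[1]
    Mstar = M.copy()
    Mstar[0, :] = 1                       # step 4: supermodel forces intercepts in
    b = 1 - M[0, :]                       # b_j = 1 where the intercept is artificial

    beta = np.zeros((C, D))               # step 3: anchor = constant Dirichlet fit,
    beta[0, :] = fit_dirichlet_moments(Y)  # a feasible starting point

    free = Mstar.astype(bool)             # parameters allowed to move
    for K in Ks:                          # step 5: increasing penalties K_t
        def neg_g(theta, K=K):
            B = np.zeros((C, D))
            B[free] = theta
            if np.any(X @ B <= 0):        # keep alpha_j(x_i.) > 0 feasible
                return np.inf
            return K * np.sum(b * B[0, :] ** 2) - dirichlet_reg_loglik(B, X, Y)
        beta[free] = minimize(neg_g, beta[free], method="Nelder-Mead").x
    beta[0, :] = beta[0, :] * M[0, :]     # snap artificial intercepts to exactly zero
    return beta
```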

  13. Prediction using Dirichlet Regression
◮ Having obtained the estimate β̂, the expected composition y given the covariate vector x is the mean of the distribution D(α̂(x)):

\hat{y} = \left( \frac{\hat{\alpha}_1(x)}{\hat{\Lambda}(x)}, \frac{\hat{\alpha}_2(x)}{\hat{\Lambda}(x)}, \dots, \frac{\hat{\alpha}_D(x)}{\hat{\Lambda}(x)} \right),

where \hat{\Lambda}(x) = \sum_{j=1}^{D} \hat{\alpha}_j(x).
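Given the fitted coefficients, the prediction step is a one-liner (a sketch; beta_hat is the C × D estimate and x a length-C covariate vector):

```python
import numpy as np

def predict_composition(beta_hat, x):
    """Mean of Dirichlet(alpha_hat(x)): y_hat_j = alpha_hat_j(x) / Lambda_hat(x)."""
    alpha = np.asarray(x) @ beta_hat     # length-D vector; must be positive
    return alpha / alpha.sum()

# e.g. for the second-order model of the case study:
#   predict_composition(beta_hat, [1.0, d, d**2])  for a water depth d
```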

  14. Results – Parameter Estimation Procedures
◮ Random subsamples of the Arctic Lake dataset, n ∈ {20, 27}.
◮ We fit each subsample with an incomplete polynomial model described by a random structural matrix M(q), with entries M(q)_{k,j} ∼ Ber(p) and fill-in probability p ∈ {0.33, 0.5, 0.66}, generated as in the sketch below.
◮ Performance measures:
  1. Failure rate;
  2. Computational processing time.
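The random structural matrices can be drawn as follows (a trivial sketch; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_model_matrix(C, D, p):
    """Random structural matrix: M_{k,j} ~ Bernoulli(p), 1 = parameter present."""
    return (rng.random((C, D)) < p).astype(int)

M = random_model_matrix(C=3, D=3, p=0.5)   # one model draw at fill-in probability 0.5
```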

  15. [Figure: two panels comparing Hijazi's method and our method over model-matrix completeness Pr(m_{jk} = 1) ∈ {0.33, 0.5, 0.66} — failure rates (%, left) and processing time (log₂ seconds, right).]

  16. Full Bayesian Significance Test (FBST)
◮ FBST: proposed by Pereira & Stern (1999); a review in Pereira et al. (2008).
◮ Notation and assumptions:
  ◮ Parameter space: Θ ⊆ R^n
  ◮ Hypothesis H: θ ∈ Θ_H, where H ≡ Θ_H = {θ ∈ Θ | g(θ) ≤ 0 ∧ h(θ) = 0}, with dim(H) < dim(Θ)
  ◮ f_x(θ) denotes the posterior probability density function

  17. ◮ Computation of the evidence measure used in the FBST:
1. Optimization step: find the maximum (supremum) of the posterior under the hypothesis:

\theta^* = \arg\sup_{\theta \in H} f_x(\theta), \qquad f^* = f_x(\theta^*)

2. Integration step: integrate the posterior density over the tangential set T = \{\theta \in \Theta : f_x(\theta) > f^*\}:

\bar{Ev}(H) = \Pr(\theta \in T \mid x) = \int_T f_x(\theta)\, d\theta

◮ \bar{Ev}(H) "large" ⇒ T "heavy" ⇒ the hypothesis lies in a region of "low" posterior density ⇒ "strong" evidence against H.
◮ \bar{Ev}(H): evidence against H; Ev(H) = 1 − \bar{Ev}(H): evidence in favor of H. (A toy numerical illustration follows.)
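The two steps can be illustrated on a toy posterior — a bivariate normal with hypothesis H: θ₁ = 0. This is an illustrative stand-in, not the Dirichlet regression posterior, and the integration step uses plain Monte Carlo over posterior draws:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(2)

# Toy posterior f_x and hypothesis H: theta_1 = 0.
post = multivariate_normal(mean=[1.0, 0.0], cov=np.eye(2))

# 1. Optimization step: sup of f_x on H = {theta : theta_1 = 0}.
#    For this toy posterior the profile over theta_2 peaks at theta_2 = 0.
theta_star = np.array([0.0, 0.0])
f_star = post.pdf(theta_star)

# 2. Integration step: Pr(theta in T | x), T = {theta : f_x(theta) > f_star},
#    estimated by Monte Carlo.
draws = post.rvs(size=100_000, random_state=rng)
ev_against = np.mean(post.pdf(draws) > f_star)   # \bar{Ev}(H)
ev_for = 1.0 - ev_against                        # Ev(H)
print(f"evidence against H: {ev_against:.3f}, in favor: {ev_for:.3f}")
```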
