Ratemaking application of Bayesian LASSO with conjugate hyperprior




1. Ratemaking application of Bayesian LASSO with conjugate hyperprior
Himchan Jeong and Emiliano A. Valdez, University of Connecticut
Actuarial Science Seminar, Department of Mathematics, University of Illinois at Urbana-Champaign
26 October 2018

2. Outline of talk
- Introduction
  - Regularization or penalized least squares
  - Bayesian LASSO
- Bayesian LASSO with conjugate hyperprior
  - LAAD penalty
  - Comparing the different penalty functions
  - Optimization routine
- Model calibration
  - The two-part model
  - Data
  - Estimation results: frequency
  - Validation: frequency
  - Estimation results: average severity
  - Validation: average severity
- Conclusion

3. Introduction: Regularization or penalized least squares

Regularization, or least squares with an $L_q$ penalty:

  $\tilde{\beta} = \operatorname{argmin}_{\beta} \left\{ \|Y - X\beta\|^2 + \lambda \|\beta\|_q \right\}$,

where $\lambda$ is the regularization (penalty) parameter and $\|\beta\|_q = \sum_{j=1}^{p} |\beta_j|^q$. Special cases include:
- LASSO (Least Absolute Shrinkage and Selection Operator): $q = 1$
- Ridge regression: $q = 2$

The interpretation is that unreasonable values of $\beta$ are penalized. The LASSO problem can equivalently be written in constrained form:

  $\min_{\beta} \|Y - X\beta\|^2$ subject to $\sum_{j=1}^{p} |\beta_j| = \|\beta\|_1 \le t$.

See Tibshirani (1996).
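[Editorial note, not part of the original slides: when the design is orthonormal ($X'X = I$), the LASSO minimizer of the objective above has a closed form, coordinate-wise soft-thresholding of the OLS coefficients. A minimal R sketch; the function name is mine:]

soft_threshold <- function(z, lambda) {
  # Closed-form LASSO solution per coordinate when X'X = I:
  # minimizes (z - beta)^2 + lambda * |beta| for each OLS coordinate z.
  sign(z) * pmax(abs(z) - lambda / 2, 0)
}
soft_threshold(c(-3, -0.4, 0.1, 2.5), lambda = 1)   # -2.5  0.0  0.0  2.0

Small coefficients collapse to exactly zero, which is the variable-selection property that motivates the LASSO.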

4. Introduction: Regularization or penalized least squares

A motivation for regularization: correlated predictors. Let y be a response variable with potential predictors x1, x2, and x3, and consider the case when the predictors are highly correlated.

> x1 <- rnorm(50); x2 <- rnorm(50, mean = x1, sd = 0.05); x3 <- rnorm(50, mean = -x1, sd = 0.02)
> y <- rnorm(50, mean = -2 + x1 + x2 - 2*x3); x <- data.frame(x1, x2, x3); x <- as.matrix(x)
> # correlation matrix (lower triangle shown)
        x1      x2  x3
x1       1
x2  0.9984       1
x3 -0.9997 -0.9982   1

Fitting the least squares regression:

> coef(lm(y ~ x1 + x2 + x3))
(Intercept)          x1          x2          x3
 -2.3347410 -16.5839237   0.2353327 -19.9617757

Fitting ridge regression and LASSO:

> library(glmnet)
> lm.ridge <- glmnet(x, y, alpha = 0, lambda = 0.1, standardize = FALSE); t(coef(lm.ridge))
1 x 4 sparse Matrix of class "dgCMatrix"
   (Intercept)       x1       x2        x3
s0   -2.359547 1.114166 1.104729 -1.356508
> lm.lasso <- glmnet(x, y, alpha = 1, lambda = 0.1, standardize = FALSE); t(coef(lm.lasso))
1 x 4 sparse Matrix of class "dgCMatrix"
   (Intercept) x1 x2        x3
s0   -2.381575  .  . -3.496807

Under this near-collinearity the least squares coefficients are wildly unstable, while ridge regression shrinks them toward stable values and the LASSO sets two of them exactly to zero.
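[Editorial note: the slide fixes lambda = 0.1 purely for illustration. A common way to choose lambda in practice is cross-validation; a short sketch using glmnet's built-in cv.glmnet on the same simulated x and y:]

library(glmnet)
cv.fit <- cv.glmnet(x, y, alpha = 1)   # 10-fold CV over a path of lambda values (LASSO)
cv.fit$lambda.min                      # lambda minimizing the CV error
coef(cv.fit, s = "lambda.min")         # coefficients at that lambda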

5. Introduction: Bayesian LASSO

Bayesian interpretation of LASSO (naive). Park and Casella (2008) demonstrated that we may interpret LASSO in a Bayesian framework as follows:

  $Y \mid \beta \sim N(X\beta, \sigma^2 I_n)$, $\quad \beta_i \mid \lambda \sim \text{Laplace}(0, 2/\lambda)$, so that $p(\beta_i \mid \lambda) = \frac{\lambda}{4} e^{-\lambda |\beta_i|}$.

According to this specification, we may write out the likelihood for $\beta$ as

  $L(\beta \mid Y, X, \lambda) \propto \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - X_i\beta)^2 - \lambda \|\beta\|_1 \right)$

and the log-likelihood as

  $\ell(\beta \mid Y, X, \lambda) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - X_i\beta)^2 - \lambda \|\beta\|_1 + \text{Constant}$.
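[Editorial sketch, not from the slides: a quick numerical check of this equivalence. Up to a positive scalar, the negative log-likelihood above is exactly the slide-3 LASSO objective with penalty parameter $2\lambda\sigma^2$, so both have the same minimizer. All data and parameter values below are arbitrary:]

set.seed(1)
n <- 20; p <- 3; sigma2 <- 1; lambda <- 2
X <- matrix(rnorm(n * p), n, p)
Y <- rnorm(n)
neg_loglik <- function(beta)            # minus the log-likelihood on this slide
  sum((Y - X %*% beta)^2) / (2 * sigma2) + lambda * sum(abs(beta))
lasso_obj <- function(beta)             # slide-3 objective with penalty 2*lambda*sigma2
  sum((Y - X %*% beta)^2) + 2 * lambda * sigma2 * sum(abs(beta))
b <- rnorm(p)
neg_loglik(b) * 2 * sigma2 - lasso_obj(b)   # 0 for any b: identical up to scale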

6. Bayesian LASSO with conjugate hyperprior

The choice of the optimal $\lambda$ is critical in penalized regression. Here, let us assume that

  $Y \mid \beta \sim N(X\beta, \sigma^2 I_n)$, $\quad \beta_j \mid \lambda_j \sim \text{Laplace}(0, 2/\lambda_j)$, $\quad \lambda_j \mid r \overset{\text{i.i.d.}}{\sim} \text{Gamma}(r/\sigma^2 - 1,\, 1)$.

In other words, the 'hyperprior' of each $\lambda_j$ follows a gamma distribution, with density $p(\lambda_j \mid r) = \lambda_j^{(r/\sigma^2) - 2} e^{-\lambda_j} / \Gamma(r/\sigma^2 - 1)$. We then have

  $L(\beta, \lambda_1, \ldots, \lambda_p \mid Y, X, r) \propto \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - X_i\beta)^2 \right) \times \prod_{j=1}^{p} \exp\left( -\lambda_j [\,|\beta_j| + 1\,] \right) \lambda_j^{r/\sigma^2 - 1}$.
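[Editorial sketch: the conjugacy is visible in the joint likelihood, where each $\lambda_j$ enters only through the gamma kernel $\lambda_j^{r/\sigma^2 - 1} e^{-\lambda_j(|\beta_j| + 1)}$, which integrates in closed form. A small numerical sanity check in R; the values of $a = r/\sigma^2$ and $\beta_j$ are arbitrary:]

a <- 3; beta_j <- 1.7                   # assumed values: a = r / sigma^2
kernel <- function(l) l^(a - 1) * exp(-l * (abs(beta_j) + 1))
integrate(kernel, 0, Inf)$value         # numerical marginal over lambda_j
gamma(a) / (1 + abs(beta_j))^a          # closed form: Gamma(a) / (1 + |beta_j|)^a

Taking the log of the closed form produces the $-(r/\sigma^2)\log(1 + |\beta_j|)$ term that appears on the next slide.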

7. Bayesian LASSO with conjugate hyperprior: LAAD penalty

Log adjusted absolute deviation (LAAD) penalty. Integrating out the $\lambda_j$ and taking the log of the likelihood, we get

  $\ell(\beta \mid Y, X, r) = -\frac{1}{2\sigma^2} \left[ \sum_{i=1}^{n} (y_i - X_i\beta)^2 + 2r \sum_{j=1}^{p} \log(1 + |\beta_j|) \right] + \text{Const}$.

Therefore, we have a new formulation for our penalized least squares problem. This gives rise to what we call the LAAD penalty function:

  $\|\beta\|_L = \sum_{j=1}^{p} \log(1 + |\beta_j|)$, so that $\hat{\beta} = \operatorname{argmin}_{\beta} \left\{ \|y - X\beta\|^2 + 2r \|\beta\|_L \right\}$.
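[Editorial sketch: in R the penalty and the resulting objective are direct transcriptions of the formulas above; the helper names are mine:]

laad <- function(beta) sum(log(1 + abs(beta)))   # ||beta||_L
laad_objective <- function(beta, y, X, r)
  sum((y - X %*% beta)^2) + 2 * r * laad(beta)

Note that $\log(1 + |\beta_j|)$ is concave in $|\beta_j|$, so the objective is non-convex and generic convex solvers do not apply directly; the next slides derive a closed-form coordinate-wise solution in the orthonormal case.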

8. Bayesian LASSO with conjugate hyperprior: LAAD penalty

Analytic solution for the univariate case. To understand the characteristics of the new penalty, consider the simple case where $X'X = I$; in other words, the design matrix is orthonormal, so it is enough to solve, coordinate by coordinate,

  $\hat{\theta}_j = \operatorname{argmin}_{\theta_j} \frac{1}{2}(z_j - \theta_j)^2 + r \log(1 + |\theta_j|)$.

Setting $\ell(\theta \mid r, z) = \frac{1}{2}(z - \theta)^2 + r \log(1 + |\theta|)$, we can show that the minimizer is given by

  $\hat{\theta} = \theta^* \, \mathbb{1}\{ |z| \ge z^*(r) \wedge r \}$

(here $a \wedge b$ denotes $\min(a, b)$), where $z^*(r)$ is the unique solution of

  $\Delta(z \mid r) = \frac{1}{2}(\theta^*)^2 - \theta^* z + r \log(1 + |\theta^*|) = 0$

and

  $\theta^* = \frac{1}{2}\left( z - \operatorname{sgn}(z) + \operatorname{sgn}(z)\sqrt{(|z| - 1)^2 + 4(|z| - r)} \right)$.

Note that $\hat{\theta}$ converges to $z$ as $|z|$ tends to $\infty$.
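[Editorial sketch: a minimal R implementation of this thresholding rule, transcribing the case analysis proved on the following slides; for clarity it decides between $\theta^*$ and $0$ by comparing $\ell$ directly rather than via $z^*(r)$:]

laad_threshold <- function(z, r) {
  disc <- (abs(z) - 1)^2 + 4 * (abs(z) - r)
  if (disc < 0) return(0)                    # Case (3): theta_star is not real
  theta_star <- (z - sign(z) + sign(z) * sqrt(disc)) / 2
  if (sign(z) * theta_star < 0) return(0)    # Case (2): root on the wrong side of 0
  ell <- function(th) 0.5 * (z - th)^2 + r * log(1 + abs(th))
  if (ell(theta_star) < ell(0)) theta_star else 0  # Cases (1)/(4): sign of Delta(z|r)
}
laad_threshold(3, 1)     # |z| large relative to r: kept, close to z (about 2.73)
laad_threshold(1.2, 2)   # weak signal: thresholded exactly to 0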

9. Bayesian LASSO with conjugate hyperprior: LAAD penalty

Sketch of the proof. We have $\hat{\theta} \times z \ge 0$, so we start from the case where $z$ is a nonnegative number, and we have the following:

  $\ell'(\theta \mid r, z) = (\theta - z) + \frac{r}{1 + \theta}$, $\qquad \ell''(\theta \mid r, z) = 1 - \frac{r}{(1 + \theta)^2}$,

  $\ell'(\theta^*) = 0 \iff \theta^* = \frac{z - 1}{2} + \frac{\sqrt{(z - 1)^2 + 4z - 4r}}{2}$.

Case (1): $z \ge r$ $\Rightarrow$ $\ell''(\theta^* \mid r, z) > 0$, so $\theta^*$ is a local minimum. Moreover, $\ell'(0 \mid r, z) \le 0$ implies $\theta^*$ is the global minimum.

Case (2): $z < r$, $z < 1$ $\Rightarrow$ $\theta^* < 0$, so $\ell'(\theta \mid r, z) > 0$ for all $\theta \ge 0$. Therefore $\ell(\theta \mid r, z)$ is strictly increasing and $\hat{\theta} = 0$.

Case (3): $r \ge \left(\frac{z+1}{2}\right)^2$ $\Rightarrow$ $\theta^* \notin \mathbb{R}$. Moreover, $\left(\frac{z+1}{2}\right)^2 \ge z$, so $\ell'(0 \mid r, z) = r - z \ge 0$ and $\ell'(\theta \mid r, z) > 0$ for all $\theta > 0$. Therefore $\hat{\theta} = 0$.
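[Editorial sketch: the case analysis can be verified numerically by checking the closed form against brute-force grid minimization of $\ell(\theta \mid r, z)$, using the laad_threshold helper sketched above:]

check <- function(z, r) {
  grid <- seq(-10, 10, by = 1e-4)
  brute <- grid[which.min(0.5 * (z - grid)^2 + r * log(1 + abs(grid)))]
  c(closed_form = laad_threshold(z, r), brute_force = brute)
}
check(5, 2); check(1.5, 3); check(-4, 1.5)   # each pair agrees to grid precision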

10. Bayesian LASSO with conjugate hyperprior: LAAD penalty

[Contour map of $\hat{\theta}$ over the $(r, z)$ plane, partitioned into the regions $\hat{\theta} = \theta^*$ and $\hat{\theta} = 0$.]

Figure 1: Distribution of the optimizer for the three cases

11. Bayesian LASSO with conjugate hyperprior: LAAD penalty (continued)

Case (4): $1 \le z < r < \left(\frac{z+1}{2}\right)^2$. First, we show that $\ell''(\theta^* \mid r, z) > 0$, so that $\theta^*$ is a local minimum of $\ell(\theta \mid r, z)$ and $\hat{\theta}$ would be either $\theta^*$ or $0$. In this case, we compute $\Delta(z \mid r) = \ell(\theta^* \mid r, z) - \ell(0 \mid r, z)$ and

  $\hat{\theta} = \begin{cases} \theta^*, & \text{if } \Delta(z \mid r) < 0 \\ 0, & \text{if } \Delta(z \mid r) > 0 \end{cases}$,

  $\Delta'(z \mid r) = \frac{\partial \theta^*}{\partial z}\left[ \theta^* - z + \frac{r}{1 + \theta^*} \right] - \theta^* = -\theta^* < 0$

(the bracketed term vanishes because $\ell'(\theta^*) = 0$). Thus $\Delta(z \mid r)$ is strictly decreasing in $z$, and $\Delta(z \mid r) = 0$ has a unique solution because $\Delta(z \mid r) < 0$ (so $\hat{\theta} = \theta^*$) if $z = r$, and $\Delta(z \mid r) > 0$ (so $\hat{\theta} = 0$) if $z = 2\sqrt{r} - 1$.
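[Editorial sketch: since $\Delta(\cdot \mid r)$ is strictly decreasing and changes sign on $(2\sqrt{r} - 1, r)$, the threshold $z^*(r)$ can be computed by one-dimensional root finding; a minimal R sketch for $z > 0$:]

z_star <- function(r) {
  Delta <- function(z) {                 # Delta(z | r) = ell(theta_star) - ell(0)
    th <- (z - 1 + sqrt((z - 1)^2 + 4 * (z - r))) / 2
    0.5 * (z - th)^2 + r * log(1 + th) - 0.5 * z^2
  }
  uniroot(Delta, lower = 2 * sqrt(r) - 1, upper = r)$root
}
z_star(4)   # lies between 2*sqrt(4) - 1 = 3 and r = 4 (about 3.2)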

12. Bayesian LASSO with conjugate hyperprior: LAAD penalty (continued)

[Contour map of $\hat{\theta}$ over the $(r, z)$ plane, now including case (4): the curve $z^*(r)$ separates the region $\hat{\theta} = \theta^*$ from $\hat{\theta} = 0$.]

Figure 2: Distribution of the optimizer for all the cases

13. Bayesian LASSO with conjugate hyperprior: Comparing the different penalty functions

Estimate behavior. [Three panels plotting beta_hat against beta: L2 penalty (ridge), L1 penalty (LASSO), and the LAAD penalty.]

Figure 3: Estimate behavior for different penalties
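[Editorial sketch: the qualitative shapes in Figure 3 can be reproduced with the helper functions sketched earlier, assuming an orthonormal design; the tuning values lambda = 2 and r = 2 are arbitrary:]

z <- seq(-6, 6, by = 0.01)
ridge_hat <- z / (1 + 2)                        # ridge: proportional shrinkage z/(1+lambda)
lasso_hat <- soft_threshold(z, lambda = 2)      # LASSO: soft-thresholding
laad_hat  <- sapply(z, laad_threshold, r = 2)   # LAAD: thresholds, then nearly unbiased
matplot(z, cbind(ridge_hat, lasso_hat, laad_hat), type = "l",
        lty = 1:3, col = 1, xlab = "beta", ylab = "beta_hat")

Unlike the LASSO, whose estimates stay biased by lambda/2 for large |z|, the LAAD curve rejoins the 45-degree line, consistent with $\hat{\theta} \to z$ as $|z| \to \infty$.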

14. Bayesian LASSO with conjugate hyperprior: Comparing the different penalty functions

Penalty regions. [Three panels showing the constraint regions in two dimensions: L2 penalty (ridge), L1 penalty (LASSO), and the LAAD penalty.]

Figure 4: Penalty regions for different penalties
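[Editorial sketch: the LAAD region can be drawn from its definition alone; a two-dimensional R sketch, where the level t = log(1 + 5) is chosen so the axis intercepts sit at plus or minus 5:]

b <- seq(-10, 10, length.out = 400)
P <- outer(b, b, function(b1, b2) log(1 + abs(b1)) + log(1 + abs(b2)))
contour(b, b, P, levels = log(1 + 5), xlab = "beta1", ylab = "beta2",
        drawlabels = FALSE)   # boundary of { beta : ||beta||_L <= t }

The region is non-convex, pinched even harder toward the axes than the L1 diamond, which is what drives the stronger sparsity of the LAAD estimates.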
