Ratemaking application of Bayesian LASSO with conjugate hyperprior

Himchan Jeong and Emiliano A. Valdez
University of Connecticut

Actuarial Science Seminar
Department of Mathematics
University of Illinois at Urbana-Champaign

26 October 2018
Outline of talk

Introduction
  Regularization or penalized least squares
  Bayesian LASSO
Bayesian LASSO with conjugate hyperprior
  LAAD penalty
  Comparing the different penalty functions
  Optimization routine
Model calibration
  The two-part model
  Data
  Estimation results: frequency
  Validation: frequency
  Estimation results: average severity
  Validation: average severity
Conclusion
Introduction: Regularization or penalized least squares

Regularization or least squares penalty

The L_q penalty function:

    \tilde{\beta} = \operatorname*{argmin}_{\beta} \left\{ \| Y - X\beta \|^2 + \lambda \|\beta\|_q \right\},

where λ is the regularization or penalty parameter and \|\beta\|_q = \sum_{j=1}^p |\beta_j|^q. Special cases include:

  LASSO (Least Absolute Shrinkage and Selection Operator): q = 1
  Ridge regression: q = 2

The interpretation is to penalize unreasonable values of β. The LASSO optimization problem:

    \min_{\beta} \| Y - X\beta \|^2 \quad \text{subject to} \quad \sum_{j=1}^p |\beta_j| = \|\beta\|_1 \le t.

See Tibshirani (1996).
Introduction: Regularization or penalized least squares

A motivation for regularization: correlated predictors

Let y be a response variable with potential predictors x1, x2, and x3, and consider the case when the predictors are highly correlated.

> x1 <- rnorm(50); x2 <- rnorm(50, mean = x1, sd = 0.05); x3 <- rnorm(50, mean = -x1, sd = 0.02)
> y <- rnorm(50, mean = -2 + x1 + x2 - 2*x3)
> x <- as.matrix(data.frame(x1, x2, x3))
> round(cor(x), 4)   # the predictors are almost perfectly correlated
        x1      x2      x3
x1  1.0000  0.9984 -0.9997
x2  0.9984  1.0000 -0.9982
x3 -0.9997 -0.9982  1.0000

Fitting the least squares regression gives wildly unstable coefficients:

> coef(lm(y ~ x1 + x2 + x3))
(Intercept)          x1          x2          x3
 -2.3347410 -16.5839237   0.2353327 -19.9617757

Fitting ridge regression and lasso:

> library(glmnet)
> lm.ridge <- glmnet(x, y, alpha = 0, lambda = 0.1, standardize = FALSE); t(coef(lm.ridge))
1 x 4 sparse Matrix of class "dgCMatrix"
   (Intercept)       x1       x2        x3
s0   -2.359547 1.114166 1.104729 -1.356508
> lm.lasso <- glmnet(x, y, alpha = 1, lambda = 0.1, standardize = FALSE); t(coef(lm.lasso))
1 x 4 sparse Matrix of class "dgCMatrix"
   (Intercept) x1 x2        x3
s0   -2.381575  .  . -3.496807
Introduction: Bayesian LASSO

Bayesian interpretation of LASSO (naive)

Park and Casella (2008) demonstrated that we may interpret LASSO in a Bayesian framework as follows:

    Y | \beta \sim N(X\beta, \sigma^2 I_n), \qquad \beta_i | \lambda \sim \text{Laplace}(0, 2/\lambda) \;\text{ so that }\; p(\beta_i | \lambda) = \frac{\lambda}{4} e^{-\lambda |\beta_i|}.

According to this specification, we may write the likelihood for β as

    L(\beta | Y, X, \lambda) \propto \exp\left( -\frac{\sum_{i=1}^n (y_i - X_i\beta)^2}{2\sigma^2} - \lambda \|\beta\|_1 \right)

and the log-likelihood as

    \ell(\beta | Y, X, \lambda) = -\frac{\sum_{i=1}^n (y_i - X_i\beta)^2}{2\sigma^2} - \lambda \|\beta\|_1 + \text{Constant},

so maximizing the posterior over β is exactly the LASSO problem.
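To make the equivalence concrete, here is a minimal R sketch (ours, not from the talk) that treats the lasso estimate as a posterior mode: it minimizes the negative log-posterior above with optim, reusing x and y from the simulated example and taking σ² = 1 for illustration. Note that glmnet's λ lives on a different scale (it divides the loss by 2n and standardizes y internally), so the fits agree qualitatively rather than digit for digit, and Nelder-Mead returns near-zeros rather than exact zeros since the L1 term is nondifferentiable.

# Minimal sketch (ours): the lasso fit as a MAP estimate under i.i.d. Laplace
# priors on the slopes. Assumes x (50 x 3 matrix) and y from the earlier
# simulation; sigma^2 = 1 and lambda = 5 are illustrative choices.
lambda <- 5
neg_log_post <- function(b) {
  rss <- sum((y - b[1] - x %*% b[2:4])^2)   # Gaussian log-likelihood part
  rss / 2 + lambda * sum(abs(b[2:4]))       # Laplace prior gives the L1 penalty on slopes
}
map_fit <- optim(rep(0, 4), neg_log_post)   # Nelder-Mead; fine for 4 parameters
round(map_fit$par, 3)                        # shrunk slopes, some driven (near) zero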
Bayesian LASSO with conjugate hyperprior

Choice of the optimal λ is critical in penalized regression. Here, let us assume that

    Y | \beta \sim N(X\beta, \sigma^2 I_n), \qquad \lambda_j | r \overset{\text{i.i.d.}}{\sim} \text{Gamma}(r/\sigma^2 - 1, 1), \qquad \beta_j | \lambda_j \sim \text{Laplace}(0, 2/\lambda_j).

In other words, the 'hyperprior' of each λ_j follows a gamma distribution, so that

    p(\lambda_j | r) = \lambda_j^{r/\sigma^2 - 2} e^{-\lambda_j} / \Gamma(r/\sigma^2 - 1),

and then we have

    L(\beta, \lambda_1, \ldots, \lambda_p | Y, X, r) \propto \exp\left( -\frac{\sum_{i=1}^n (y_i - X_i\beta)^2}{2\sigma^2} \right) \times \prod_{j=1}^p \exp\left( -\lambda_j \left[ |\beta_j| + 1 \right] \right) \lambda_j^{r/\sigma^2 - 1}.
Bayesian LASSO with conjugate hyperprior: LAAD penalty

Log adjusted absolute deviation (LAAD) penalty

Integrating out the λ_j and taking the log of the likelihood, we get

    \ell(\beta | Y, X, r) = -\frac{1}{2\sigma^2} \left[ \sum_{i=1}^n (y_i - X_i\beta)^2 + 2r \sum_{j=1}^p \log(1 + |\beta_j|) \right] + \text{Const}.

Therefore, we have a new formulation for our penalized least squares problem. This gives rise to what we call the LAAD penalty function

    \|\beta\|_L = \sum_{j=1}^p \log(1 + |\beta_j|),

so that

    \hat{\beta} = \operatorname*{argmin}_{\beta} \; \| y - X\beta \|^2 + 2r \|\beta\|_L.
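As a quick numerical sanity check (ours, not from the talk), the marginalization can be verified in R: integrating λ^{a-1} e^{-λ(1+|β|)} over λ > 0 gives Γ(a)/(1+|β|)^a, where a stands for r/σ², so the log marginal prior is -a log(1+|β|) up to a constant.

# Verify that integrating out lambda_j produces the log(1 + |beta_j|) penalty.
a <- 3   # plays the role of r/sigma^2 (arbitrary illustrative value)
log_marginal <- function(beta) {
  log(integrate(function(l) l^(a - 1) * exp(-l * (1 + abs(beta))), 0, Inf)$value)
}
betas <- c(0, 0.5, 1, 2, 5)
cbind(numeric     = sapply(betas, log_marginal),
      closed_form = lgamma(a) - a * log(1 + abs(betas)))  # the two columns match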
Bayesian LASSO with conjugate hyperprior: LAAD penalty

Analytic solution for the univariate case

To understand the characteristics of the new penalty, consider the simple example when X'X = I; in other words, the design matrix is orthonormal, so that it is enough to solve the following:

    \hat{\theta}_j = \operatorname*{argmin}_{\theta_j} \; \frac{1}{2}(z_j - \theta_j)^2 + r \log(1 + |\theta_j|).

By setting \ell(\theta | r, z) = \frac{1}{2}(z - \theta)^2 + r \log(1 + |\theta|), we can show that the minimizer is given by

    \hat{\theta} = \theta^* \, \mathbb{1}\{ |z| \ge z^*(r) \wedge r \},

where ∧ denotes the minimum, z^*(r) is the unique solution of

    \Delta(z | r) = \frac{1}{2}(\theta^*)^2 - \theta^* z + r \log(1 + |\theta^*|) = 0,

and

    \theta^* = \frac{1}{2}\left( z + \operatorname{sgn}(z)\left[ \sqrt{(|z| - 1)^2 + 4|z| - 4r} - 1 \right] \right).

Note that \hat{\theta} converges to z as |z| tends to ∞.
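The closed form translates directly into code. Below is a small R sketch (ours) of the resulting thresholding operator: theta_star implements θ*, delta implements Δ(z|r), and z*(r) is found with uniroot; for r ≤ 1 the interval where Δ changes sign is empty and the threshold is simply r.

# Sketch (ours) of the univariate LAAD thresholding operator.
theta_star <- function(z, r) {
  d <- (abs(z) - 1)^2 + 4 * abs(z) - 4 * r     # equals (|z| + 1)^2 - 4r
  (z + sign(z) * (sqrt(d) - 1)) / 2
}
delta <- function(z, r) {                       # Delta(z | r) evaluated at theta*
  th <- theta_star(z, r)
  0.5 * th^2 - th * z + r * log(1 + abs(th))
}
laad_threshold <- function(z, r) {
  # threshold is z*(r) from Delta(z | r) = 0 when r > 1; otherwise simply r
  zstar <- if (r > 1) uniroot(delta, c(2 * sqrt(r) - 1, r), r = r)$root else r
  if (abs(z) >= min(zstar, r)) theta_star(z, r) else 0
}
laad_threshold(5, 2)    # large |z|: estimate close to z itself
laad_threshold(0.5, 2)  # small |z|: estimate shrunk exactly to zero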
Bayesian LASSO with conjugate hyperprior: LAAD penalty

Sketch of the proof

We have \hat{\theta} \times z \ge 0, so we start from the case where z is a nonnegative number, and we have the following:

    \ell'(\theta | r, z) = (\theta - z) + \frac{r}{1 + \theta}, \qquad \ell''(\theta | r, z) = 1 - \frac{r}{(1 + \theta)^2},

    \ell'(\theta^*) = 0 \iff \theta^* = \frac{z - 1}{2} + \frac{\sqrt{(z - 1)^2 + 4z - 4r}}{2}.

Case (1): z ≥ r ⇒ ℓ''(θ*|r,z) > 0, so that θ* is a local minimum. Moreover, ℓ'(0|r,z) ≤ 0 implies θ* is the global minimum.

Case (2): z < r, z < 1 ⇒ θ* < 0, so that ℓ'(θ|r,z) > 0 for all θ ≥ 0. Therefore ℓ(θ|r,z) is strictly increasing and \hat{\theta} = 0.

Case (3): r ≥ ((z+1)/2)² ⇒ in this case, θ* ∉ ℝ. Moreover, ((z+1)/2)² ≥ z, so ℓ'(0|r,z) = r − z ≥ 0 and ℓ'(θ|r,z) > 0 for all θ > 0. Therefore \hat{\theta} = 0.
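The case analysis can be sanity-checked numerically (our sketch, not from the talk): brute-force minimize ℓ(θ|r,z) on a grid and confirm each case's conclusion; the (z, r) pairs below are arbitrary picks landing in cases (1) through (3).

# Numeric check of cases (1)-(3): grid-minimize l(theta | r, z).
ell <- function(theta, r, z) 0.5 * (z - theta)^2 + r * log(1 + abs(theta))
grid_min <- function(r, z) {
  th <- seq(-10, 10, by = 1e-3)
  th[which.min(ell(th, r, z))]
}
grid_min(r = 2, z = 5)    # case (1): z >= r, minimizer near theta* = 4.646 > 0
grid_min(r = 2, z = 0.5)  # case (2): z < r and z < 1, minimizer 0
grid_min(r = 9, z = 3)    # case (3): r >= ((z+1)/2)^2, minimizer 0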
Bayesian LASSO with conjugate hyperprior: LAAD penalty

Contour map of \hat{\theta}

[Figure 1: Regions of the (r, z) plane where the optimizer is \hat{\theta} = \theta^* versus \hat{\theta} = 0, for the three cases; both axes run from 0 to 50.]
Bayesian LASSO with conjugate hyperprior: LAAD penalty, continued

Case (4): 1 ≤ z < r < ((z+1)/2)² ⇒ First, we show that ℓ''(θ*|r,z) > 0, so that θ* is a local minimum of ℓ(θ|r,z) and \hat{\theta} would be either θ* or 0. In this case, we compute ∆(z|r) = ℓ(θ*|r,z) − ℓ(0|r,z), with

    \hat{\theta} = \begin{cases} \theta^*, & \text{if } \Delta(z|r) < 0, \\ 0, & \text{if } \Delta(z|r) > 0, \end{cases}

and

    \Delta'(z|r) = \left( \theta^* - z + \frac{r}{1 + \theta^*} \right) \frac{\partial \theta^*}{\partial z} - \theta^* = -\theta^* < 0.

Thus ∆(z|r) is strictly decreasing with respect to z, and ∆(z|r) = 0 has a unique solution because ∆(z|r) < 0 (so \hat{\theta} = \theta^*) if z = r, while ∆(z|r) > 0 (so \hat{\theta} = 0) if z = 2√r − 1.
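A quick numerical illustration (ours) of the endpoint signs that pin down the unique root z*(r) in the interval (2√r − 1, r), reusing delta() from the sketch above; r = 10 is an arbitrary choice.

# Endpoint signs of Delta(z | r) on (2*sqrt(r) - 1, r).
r <- 10
delta(2 * sqrt(r) - 1, r)                          # > 0 at the left endpoint
delta(r, r)                                        # < 0 at the right endpoint
uniroot(delta, c(2 * sqrt(r) - 1, r), r = r)$root  # the unique threshold z*(r)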
Bayesian LASSO with conjugate hyperprior: LAAD penalty, continued

[Figure 2: Regions of the (r, z) plane where the optimizer is \hat{\theta} = \theta^* versus \hat{\theta} = 0, now covering all four cases; both axes run from 0 to 50.]
Bayesian LASSO with conjugate hyperprior: Comparing the different penalty functions

Estimate behavior

[Figure 3: Estimate behavior for the different penalties: \hat{\beta} plotted against \beta (each from −6 to 6) for the L2 (ridge), L1 (LASSO), and LAAD penalties.]
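A figure like this can be reproduced with a short R sketch (ours): under an orthonormal design each estimator has a univariate closed form, so we plot the three rules over a grid of z values; lambda = 1 and r = 1 are illustrative scalings, and laad_threshold() is reused from the earlier sketch.

# Sketch (ours) comparing the three univariate estimate-behavior curves.
z <- seq(-6, 6, by = 0.01)
lambda <- 1; r <- 1
ridge <- z / (1 + lambda)                    # minimizer of (z - t)^2/2 + lambda*t^2/2
lasso <- sign(z) * pmax(abs(z) - lambda, 0)  # soft-thresholding
laad  <- sapply(z, laad_threshold, r = r)    # LAAD thresholding rule
matplot(z, cbind(ridge, lasso, laad), type = "l", lty = 1,
        xlab = "beta", ylab = "beta_hat")
legend("topleft", c("Ridge", "LASSO", "LAAD"), col = 1:3, lty = 1)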
Bayesian LASSO with conjugate hyperprior: Comparing the different penalty functions

Penalty regions

[Figure 4: Penalty (constraint) regions in the coefficient plane, each axis from −10 to 10, for the L2 (ridge), L1 (LASSO), and LAAD penalties.]