Regression: Applied Bayesian Statistics (Dr. Earvin Balderama)

  1. Regression
     Applied Bayesian Statistics
     Dr. Earvin Balderama, Department of Mathematics & Statistics, Loyola University Chicago
     October 24, 2017
     Last edited October 24, 2017 by <ebalderama@luc.edu>

  2. Bayesian linear regression
     Suppose the data are Y_i ~ Normal(µ, σ²), independently. If we want to model Y_i in terms of covariate information, what we are really saying is that µ varies across the i = 1, ..., n observations (instead of remaining constant).
     The (multiple) linear regression model is
         Y_i ~ Normal(µ_i, σ²), independently, with µ_i = β_0 + X_i1 β_1 + ... + X_ip β_p.
     Bayesian and classical linear regression are similar if n ≫ p and the priors are uninformative. However, the results can differ for challenging problems, and the interpretation is always different.
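     As a quick illustration (not from the original slide), the model above can be simulated in R; the values of n, p, β, and σ below are arbitrary choices.

       # Minimal sketch: simulate data from the linear regression model (illustrative values)
       set.seed(1)
       n <- 100; p <- 2
       X <- matrix(rnorm(n * p), n, p)        # covariate matrix (n x p)
       beta <- c(1, 2, -1)                    # (beta_0, beta_1, beta_2), hypothetical values
       sigma <- 0.5
       mu <- beta[1] + X %*% beta[-1]         # mu_i = beta_0 + X_i1*beta_1 + X_i2*beta_2
       Y <- rnorm(n, mean = mu, sd = sigma)   # Y_i ~ Normal(mu_i, sigma^2)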

  3. Ordinary least squares
     The least squares estimate of β = (β_0, β_1, ..., β_p)^T is
         β_OLS = argmin_β Σ_{i=1}^n (Y_i − µ_i)²,   where µ_i = β_0 + X_i1 β_1 + ... + X_ip β_p.
     If the errors are Gaussian, then the likelihood is
         ∏_{i=1}^n f(Y_i | β, σ²) ∝ ∏_{i=1}^n exp[−(Y_i − µ_i)² / (2σ²)] = exp[−Σ_{i=1}^n (Y_i − µ_i)² / (2σ²)].
     Therefore, if the errors are Gaussian, β_OLS is also the MLE.
     Note: β_OLS is unbiased even if the errors are non-Gaussian.
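     A small sketch of the argmin characterization (not from the slide): minimizing the sum of squared errors numerically recovers the same estimates as lm(). The simulated data and starting values are illustrative only.

       # Sketch: the least squares criterion minimized numerically agrees with lm()
       set.seed(1)
       n <- 100
       X1 <- rnorm(n); X2 <- rnorm(n)
       Y <- 1 + 2 * X1 - X2 + rnorm(n, sd = 0.5)
       sse <- function(b) sum((Y - (b[1] + b[2] * X1 + b[3] * X2))^2)  # sum of squared errors
       optim(c(0, 0, 0), sse)$par    # numerical argmin of the SSE
       coef(lm(Y ~ X1 + X2))         # closed-form OLS (also the MLE under Gaussian errors)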

  4. Ordinary least squares
     In matrix notation, let Y = (Y_1, ..., Y_n)^T be the response vector and X be the n × (p + 1) design matrix of covariates (including a column of ones for the intercept). Then the mean of Y is Xβ, and the least squares solution is
         β_OLS = argmin_β (Y − Xβ)^T (Y − Xβ) = (X^T X)^{-1} X^T Y.
     If the errors are Gaussian, then the sampling distribution is
         β_OLS ~ Normal(β, σ² (X^T X)^{-1}).
     Note: If σ² is estimated, then the sampling distribution is multivariate t.
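     A minimal sketch of the matrix-form solution on simulated data; the closed-form estimate should agree with lm().

       # Sketch: (X^T X)^{-1} X^T Y computed directly, compared with lm()
       set.seed(1)
       n <- 100; p <- 2
       X <- cbind(1, matrix(rnorm(n * p), n, p))     # design matrix with intercept column
       Y <- drop(X %*% c(1, 2, -1) + rnorm(n, sd = 0.5))
       beta_ols <- solve(t(X) %*% X, t(X) %*% Y)     # (X^T X)^{-1} X^T Y
       cbind(beta_ols, coef(lm(Y ~ X[, -1])))        # the two columns should match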

  5. Bayesian linear regression
     The likelihood is
         Y_i ~ Normal(β_0 + X_i1 β_1 + ... + X_ip β_p, σ²), independently.
     We will need to set priors for β_0, β_1, ..., β_p, and σ².
     Note: For the purpose of setting priors, and for better interpretability of the coefficient estimates later on, it is helpful to standardize both the response and each covariate to have mean 0 and variance 1 (see the sketch below).
     Many priors for β have been considered:
       1. Improper priors
       2. Gaussian priors
       3. Double-exponential priors
       4. ...
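     One possible way to do the standardization in R; the data frame here is simulated purely for illustration.

       # Sketch: standardize the response and each covariate to mean 0, variance 1
       set.seed(1)
       dat <- data.frame(Y  = rnorm(50, mean = 10, sd = 3),
                         X1 = rnorm(50, mean = 5),
                         X2 = runif(50, 0, 100))
       dat_std <- as.data.frame(scale(dat))     # center and scale every column
       round(colMeans(dat_std), 10)             # all (numerically) 0
       apply(dat_std, 2, sd)                    # all 1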

  6. Improper priors
     The flat prior is f(β) = 1. (This is also the Jeffreys prior.)
     Note: This prior is improper, but the posterior is proper under the same conditions required by least squares.
     With σ² known, the posterior is
         β | Y ~ Normal(β_OLS, σ² (X^T X)^{-1}).
     Therefore, shouldn't the results be similar to least squares? How do they differ?
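     A sketch of sampling from this posterior when σ² is treated as known, using MASS::mvrnorm; the data, σ, and number of draws are illustrative assumptions.

       # Sketch: draws from Normal(beta_OLS, sigma^2 (X^T X)^{-1}) under the flat prior
       library(MASS)
       set.seed(1)
       n <- 100
       X <- cbind(1, rnorm(n), rnorm(n))
       sigma <- 0.5                                    # treated as known here
       Y <- drop(X %*% c(1, 2, -1) + rnorm(n, sd = sigma))
       XtX_inv <- solve(t(X) %*% X)
       beta_ols <- drop(XtX_inv %*% t(X) %*% Y)
       post_draws <- mvrnorm(5000, mu = beta_ols, Sigma = sigma^2 * XtX_inv)
       colMeans(post_draws)                            # close to beta_ols, as expected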

  7. Improper priors
     Because we rarely know σ, we also set a prior for the error variance σ², typically σ² ~ InverseGamma(a, b) with a and b set to something small, say a = b = 0.01.
     The posterior for β then follows a multivariate t distribution centered at β_OLS.
     The Jeffreys prior is f(β, σ²) = 1/σ², which is the limit as a, b → 0.
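     A sketch of a simple Gibbs sampler for (β, σ²) under the flat prior on β and an InverseGamma(a, b) prior on σ²; the data, a = b = 0.01, and iteration counts are illustrative.

       # Sketch: Gibbs sampler alternating the two full conditionals
       library(MASS)
       set.seed(1)
       n <- 100
       X <- cbind(1, rnorm(n), rnorm(n))
       Y <- drop(X %*% c(1, 2, -1) + rnorm(n, sd = 0.5))
       a <- b <- 0.01
       XtX_inv <- solve(t(X) %*% X)
       beta_ols <- drop(XtX_inv %*% t(X) %*% Y)
       iters <- 5000
       beta_draws <- matrix(NA, iters, ncol(X)); sig2_draws <- numeric(iters)
       sig2_cur <- 1
       for (s in 1:iters) {
         # beta | sigma^2, Y  ~  Normal(beta_OLS, sigma^2 (X^T X)^{-1})
         beta_cur <- mvrnorm(1, mu = beta_ols, Sigma = sig2_cur * XtX_inv)
         # sigma^2 | beta, Y  ~  InverseGamma(a + n/2, b + SSE/2)
         sse <- sum((Y - X %*% beta_cur)^2)
         sig2_cur <- 1 / rgamma(1, a + n / 2, b + sse / 2)
         beta_draws[s, ] <- beta_cur; sig2_draws[s] <- sig2_cur
       }
       colMeans(beta_draws[-(1:1000), ])   # posterior means after burn-in
       mean(sig2_draws[-(1:1000)])         # posterior mean of sigma^2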

  8. Multivariate normal prior
     Another common prior is Zellner's g-prior,
         β ~ Normal(0, (σ²/g) (X^T X)^{-1}).
     Note: This prior is proper assuming X is full rank.
     The posterior mean is
         [1 / (1 + g)] β_OLS.
     Note: g controls the amount of shrinkage. g = 1/n is common, and is called the unit information prior.
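     A small sketch of the shrinkage implied by the g-prior posterior mean; the data and the choice g = 1/n are illustrative.

       # Sketch: posterior mean under the g-prior is the OLS estimate shrunk by 1/(1 + g)
       set.seed(1)
       n <- 100
       X <- cbind(1, rnorm(n), rnorm(n))
       Y <- drop(X %*% c(1, 2, -1) + rnorm(n, sd = 0.5))
       beta_ols <- drop(solve(t(X) %*% X, t(X) %*% Y))
       g <- 1 / n                       # unit information prior
       beta_ols / (1 + g)               # posterior mean under the g-prior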

  9. Univariate Gaussian priors
     If there are many covariates, or if the covariates are collinear, then β_OLS is unstable. Independent priors can counteract collinearity:
         β_j ~ Normal(0, σ²/g), independently.
     The posterior mode is
         argmin_β [ Σ_{i=1}^n (Y_i − µ_i)² + g Σ_{j=1}^p β_j² ].
     Note: In classical statistics, this is known as the ridge regression solution and is used to stabilize the least squares solution.
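     A sketch of the ridge (posterior mode) solution in closed form, compared with OLS on data that include a nearly collinear covariate; the data and the value of g are illustrative.

       # Sketch: ridge solution (X^T X + g I)^{-1} X^T Y on standardized data, no intercept
       set.seed(1)
       n <- 100
       X <- matrix(rnorm(n * 3), n, 3)
       X <- scale(cbind(X, X[, 1] + rnorm(n, sd = 0.01)))   # 4th column nearly collinear with 1st
       Y <- drop(scale(X[, 1] - X[, 2] + rnorm(n)))
       g <- 5
       beta_ridge <- solve(t(X) %*% X + g * diag(ncol(X)), t(X) %*% Y)
       beta_ols   <- solve(t(X) %*% X, t(X) %*% Y)
       cbind(beta_ridge, beta_ols)      # ridge estimates are shrunk toward 0 and more stable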

  10. BLASSO
     An increasingly popular prior is the double-exponential or Bayesian LASSO prior. The prior is β_j ~ DE(τ), with PDF
         f(β) ∝ exp(−|β|/τ).
     Note: Basically, the squared term in the Gaussian prior is replaced with an absolute value.
     The shape of the PDF is more peaked at 0. This favors settings where there are many β_j near zero and a few large β_j; that is, p is large but most of the covariates are noise.

  11. BLASSO
     [Figure: Gaussian and BLASSO (double-exponential) prior densities plotted over β from −3 to 3, prior density from 0.0 to 1.0; the BLASSO prior is more sharply peaked at 0.]
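     A short sketch that reproduces a comparison like the one in the figure; the double-exponential scale τ is chosen (as an assumption) so that both priors have variance 1.

       # Sketch: Normal(0, 1) prior versus a double-exponential prior with the same variance
       curve(dnorm(x), from = -3, to = 3, ylim = c(0, 1),
             ylab = "Prior", xlab = expression(beta), lty = 1)
       tau <- 1 / sqrt(2)                               # DE(tau) has variance 2 * tau^2 = 1
       curve(exp(-abs(x) / tau) / (2 * tau), add = TRUE, lty = 2)
       legend("topright", c("Gaussian", "BLASSO"), lty = 1:2)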

  12. BLASSO
     The posterior mode is
         argmin_β [ Σ_{i=1}^n (Y_i − µ_i)² + g Σ_{j=1}^p |β_j| ].
     Note: In classical statistics, this is known as the LASSO solution. It is used to add stability by shrinking estimates towards 0, and also by setting some coefficients exactly to 0. Covariates with coefficients set to 0 can be removed from the analysis. Therefore, LASSO performs variable selection and estimation simultaneously!
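     For comparison with the classical LASSO, a sketch using the glmnet package (not mentioned on the slides); the data, with mostly zero true coefficients, are simulated for illustration.

       # Sketch: classical LASSO via glmnet (alpha = 1 selects the lasso penalty)
       library(glmnet)
       set.seed(1)
       n <- 100; p <- 20
       X <- matrix(rnorm(n * p), n, p)
       beta_true <- c(3, -2, 1.5, rep(0, p - 3))        # mostly noise covariates
       Y <- drop(X %*% beta_true + rnorm(n))
       fit <- cv.glmnet(X, Y, alpha = 1)                # penalty chosen by cross-validation
       coef(fit, s = "lambda.min")                      # many coefficients set exactly to 0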

  13. Logistic regression
     In logistic regression, we have a binary response Y_i ∈ {0, 1}, and
         logit[P(Y_i = 1)] = β_0 + β_1 X_i1 + ... + β_p X_ip,
         P(Y_i = 1) = expit(β_0 + β_1 X_i1 + ... + β_p X_ip) ∈ [0, 1].
     The logit link is the log-odds: logit(x) = log[x / (1 − x)].
     The expit transformation is its inverse: expit(x) = e^x / (1 + e^x).
     β_j represents the change in the log-odds of Y_i = 1 corresponding to a one-unit increase in covariate j.
     All of the priors discussed apply. Computationally, the full conditional distributions are no longer conjugate, so we must use Metropolis sampling (see the sketch below).
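     A sketch of a random-walk Metropolis sampler for the logistic regression coefficients under flat priors; the data, proposal standard deviation, and iteration count are illustrative choices.

       # Sketch: random-walk Metropolis for (beta_0, beta_1) in a logistic regression
       set.seed(1)
       n <- 200
       X <- cbind(1, rnorm(n))
       expit <- function(x) 1 / (1 + exp(-x))
       Y <- rbinom(n, 1, expit(X %*% c(-0.5, 1)))
       log_post <- function(b) sum(dbinom(Y, 1, expit(X %*% b), log = TRUE))  # flat prior
       iters <- 10000
       draws <- matrix(NA, iters, 2)
       b_cur <- c(0, 0); lp_cur <- log_post(b_cur)
       for (s in 1:iters) {
         b_prop <- b_cur + rnorm(2, sd = 0.2)            # random-walk proposal
         lp_prop <- log_post(b_prop)
         if (log(runif(1)) < lp_prop - lp_cur) { b_cur <- b_prop; lp_cur <- lp_prop }
         draws[s, ] <- b_cur
       }
       colMeans(draws[-(1:2000), ])   # compare with coef(glm(Y ~ X[, 2], family = binomial))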

  14. Bayesian regression packages in R
     The dlaplace function in the rmutil package gives the density of the double-exponential (Laplace) distribution. Of course, there are also rlaplace, plaplace, etc.
     The BLR function in the BLR package is probably the most common for Bayesian linear regression. It also works well for BLASSO, and is super fast.
     The MCMClogit function in the MCMCpack package performs Metropolis sampling efficiently for logistic regression. The MCMCpack package also includes functions for several other regression methods, e.g., MCMCpoisson and MCMCprobit for Poisson and probit regression, respectively.
     Another option is to code your own MCMC sampler in R.
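     A minimal usage sketch for MCMClogit on simulated data; the argument names are written from memory, so check the MCMCpack documentation before relying on them.

       # Sketch: Bayesian logistic regression with MCMCpack's MCMClogit
       library(MCMCpack)
       set.seed(1)
       n <- 200
       x <- rnorm(n)
       y <- rbinom(n, 1, 1 / (1 + exp(-(-0.5 + x))))
       fit <- MCMClogit(y ~ x, burnin = 1000, mcmc = 10000)   # Metropolis sampling
       summary(fit)    # posterior summaries of the intercept and slope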
