Mixture of g Priors for Bayesian Variable Selection
  1. Mixture of g Priors for Bayesian Variable Selection
  Feng Liang, Rui Paulo et al.
  Presented by Sheng Zhang, Department of Statistics, University of Wisconsin-Madison, April 30, 2010

  2. Outline
  1. Introduction
  2. Zellner's g priors
  3. Mixture of g priors
  4. Consistency
  5. Discussion

  3. Outline (repeated to open Section 1, Introduction)

  4. Basic Setup
  Consider $Y \sim N(\mu, I_n/\phi)$, where $Y = (y_1, \ldots, y_n)^T$, $\mu = (\mu_1, \ldots, \mu_n)^T$, $I_n$ is the $n \times n$ identity matrix, and $\phi$ is the precision parameter.
  Potential centered predictors $X_1, \ldots, X_p$; only the case $n \ge p + 2$ is considered.
  Index the model space by the $p \times 1$ vector $\gamma$, with $\gamma_j = 1$ if $X_j$ is included and $\gamma_j = 0$ if $X_j$ is excluded (see the enumeration sketch below).
  Under model $M_\gamma$: $\mu = 1_n \alpha + X_\gamma \beta_\gamma$.
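To make the indexing concrete, here is a minimal sketch (my own illustration, not from the paper or the talk) that enumerates the $2^p$ model space by the indicator vector $\gamma$:

```python
# Minimal sketch: enumerate the 2^p model space by the indicator vector gamma.
# All names below are illustrative; nothing here comes from the paper itself.
import itertools
import numpy as np

def enumerate_models(X):
    """Yield (gamma, X_gamma) for every subset of the p columns of X."""
    n, p = X.shape
    for gamma in itertools.product([0, 1], repeat=p):
        included = [j for j in range(p) if gamma[j] == 1]
        yield np.array(gamma), X[:, included]   # X_gamma keeps only included predictors

# Toy example with p = 3 centered predictors:
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
X -= X.mean(axis=0)                             # center, as the setup assumes
for gamma, X_g in enumerate_models(X):
    print(gamma, "-> X_gamma shape", X_g.shape)
```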

  5. Key Idea of Bayesian Variable Selection
  Put priors on the unknowns $\theta_\gamma = (\alpha, \beta_\gamma, \phi) \in \Theta_\gamma$.
  Update prior probabilities of models $p(M_\gamma)$ to
  $$p(M_\gamma \mid Y) = \frac{p(M_\gamma)\, p(Y \mid M_\gamma)}{\sum_{\gamma'} p(M_{\gamma'})\, p(Y \mid M_{\gamma'})},$$
  where $p(Y \mid M_\gamma) = \int_{\Theta_\gamma} p(Y \mid \theta_\gamma, M_\gamma)\, p(\theta_\gamma \mid M_\gamma)\, d\theta_\gamma$, and $p(M_\gamma)$ could be the uniform $1/2^p$.
  Choose the model with the greatest $p(M_\gamma \mid Y)$.

  6. The Goal of the Paper
  $Y \mid \alpha, \beta_\gamma, \phi, M_\gamma \sim N(1_n \alpha + X_\gamma \beta_\gamma, I_n/\phi)$
  $p(\alpha, \phi \mid M_\gamma) \propto 1/\phi$
  $\beta_\gamma \mid \phi, M_\gamma \sim N\bigl(0, \tfrac{g}{\phi}(X_\gamma^T X_\gamma)^{-1}\bigr)$ (Zellner's g prior)
  Several previous works involve choices or calibrations of $g$; $g$ acts as a dimensionality penalty.
  The goal of the paper is to propose a new family of priors on $g$, the hyper-g family, that guarantees:
  - robustness to mis-specification of $g$
  - closed-form marginal likelihoods
  - computational efficiency
  - desirable consistency properties in model selection

  7. Outline (repeated to open Section 2, Zellner's g priors)

  8. Null-Based Bayes Factors (1)
  The Bayes factor comparing each $M_\gamma$ to a base model $M_b$ is
  $$BF[M_\gamma : M_b] = \frac{p(Y \mid M_\gamma)}{p(Y \mid M_b)}.$$
  To compare two models $M_\gamma$ and $M_{\gamma'}$,
  $$BF[M_\gamma : M_{\gamma'}] = \frac{BF[M_\gamma : M_b]}{BF[M_{\gamma'} : M_b]}.$$
  The posterior probability can be written as
  $$p(M_\gamma \mid Y) = \frac{p(M_\gamma)\, BF[M_\gamma : M_b]}{\sum_{\gamma'} p(M_{\gamma'})\, BF[M_{\gamma'} : M_b]},$$
  as illustrated in the sketch below.
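A small sketch (function name my own, not from the paper) of how the last identity turns null-based Bayes factors into posterior model probabilities, working on the log scale for numerical stability:

```python
import numpy as np

def posterior_from_null_bfs(log_bf, log_prior=None):
    """p(M_k | Y) from log BF[M_k : M_b]; uniform p(M_k) when log_prior is None."""
    log_bf = np.asarray(log_bf, dtype=float)
    if log_prior is None:
        log_prior = np.zeros_like(log_bf)       # uniform prior, e.g. 1/2^p
    w = log_prior + log_bf
    w -= w.max()                                # guard against overflow
    probs = np.exp(w)
    return probs / probs.sum()

log_bf = np.log([1.0, 8.0, 2.0])                # hypothetical null-based BFs
print(posterior_from_null_bfs(log_bf))
# Pairwise comparison through the base model:
print("BF[M_2 : M_3] =", np.exp(log_bf[1] - log_bf[2]))   # = 8/2 = 4
```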

  9. Null-Based Bayes Factors (2)
  Take the base model to be the null model, $M_b = M_N$: test $H_0: \beta_\gamma = 0$ vs. $H_1: \beta_\gamma \ne 0$.
  Recall $p(\alpha, \phi \mid M_\gamma) \propto 1/\phi$ and $\beta_\gamma \mid \phi, M_\gamma \sim N\bigl(0, \tfrac{g}{\phi}(X_\gamma^T X_\gamma)^{-1}\bigr)$.
  Closed form of the marginal likelihood:
  $$p(Y \mid M_\gamma, g) = \frac{\Gamma((n-1)/2)}{\pi^{(n-1)/2}\sqrt{n}}\,\|Y - \bar{Y}\|^{-(n-1)}\,(1+g)^{(n-1-p_\gamma)/2}\,[1 + g(1 - R^2_\gamma)]^{-(n-1)/2}.$$
  The null model $p(Y \mid M_N)$ corresponds to $R^2_\gamma = 0$ and $p_\gamma = 0$, so
  $$BF[M_\gamma : M_N] = (1+g)^{(n-1-p_\gamma)/2}\,[1 + g(1 - R^2_\gamma)]^{-(n-1)/2}.$$
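The closed-form Bayes factor is simple to compute directly; a sketch (on the log scale, so that large $n$ does not overflow; names are my own):

```python
import numpy as np

def log_bf_g_prior(n, p_gamma, r2, g):
    """log BF[M_gamma : M_N] under Zellner's g prior with fixed g."""
    return 0.5 * (n - 1 - p_gamma) * np.log1p(g) \
         - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

# e.g. n = 100, p_gamma = 3, R^2 = 0.4, unit information prior g = n:
print(log_bf_g_prior(100, 3, 0.4, g=100))
```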

  10. Paradoxes of fixed g Priors – Bartlett's Paradox
  When $g \to \infty$ while $n$ and $p_\gamma$ are fixed:
  $$BF[M_\gamma : M_N] = (1+g)^{(n-1-p_\gamma)/2}\,[1 + g(1 - R^2_\gamma)]^{-(n-1)/2} \to 0.$$
  That is, regardless of the information in the data, the Bayes factor always favors the null model. This is due to the large spread of the prior induced by the noninformative choice of $g$.

  11. Paradoxes of fixed g Priors – Information Paradox
  Suppose $\|\hat{\beta}_\gamma\|^2 \to \infty$, so that $R^2_\gamma \to 1$, while $n$ and $p_\gamma$ are fixed.
  One would expect $BF[M_\gamma : M_N] \to \infty$. However, as $R^2_\gamma \to 1$,
  $$BF[M_\gamma : M_N] = (1+g)^{(n-1-p_\gamma)/2}\,[1 + g(1 - R^2_\gamma)]^{-(n-1)/2} \to (1+g)^{(n-1-p_\gamma)/2},$$
  which is a constant! (This is checked numerically below.)
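A quick numerical check of the paradox, a sketch reusing the closed-form log Bayes factor from the previous snippet:

```python
import numpy as np

def log_bf_g_prior(n, p_gamma, r2, g):
    return 0.5 * (n - 1 - p_gamma) * np.log1p(g) \
         - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

n, p_gamma, g = 50, 3, 50.0
for r2 in [0.9, 0.99, 0.999, 0.9999]:
    print(f"R^2 = {r2}: log BF = {log_bf_g_prior(n, p_gamma, r2, g):.2f}")
# The log BF saturates at (n - 1 - p_gamma)/2 * log(1 + g) instead of diverging:
print("plateau:", 0.5 * (n - 1 - p_gamma) * np.log1p(g))
```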

  12. Choices of g
  Unit information prior: $g = n$ (BF behaves like BIC).
  Risk inflation criterion: $g = p^2$ (minimax perspective).
  Benchmark prior: $g = \max(n, p^2)$ (BRIC).
  Local empirical Bayes: the MLE of $p(Y \mid M_\gamma, g)$ under the nonnegativity constraint (see the sketch after this slide),
  $$\hat{g}^{EBL}_\gamma = \max(F_\gamma - 1, 0), \quad \text{where } F_\gamma = \frac{R^2_\gamma / p_\gamma}{(1 - R^2_\gamma)/(n - 1 - p_\gamma)}.$$
  Global empirical Bayes:
  $$\hat{g}^{EBG} = \arg\max_{g > 0} \sum_\gamma p(M_\gamma)\, \frac{(1+g)^{(n-1-p_\gamma)/2}}{[1 + g(1 - R^2_\gamma)]^{(n-1)/2}}.$$
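The local EB estimate is a one-liner; a sketch (function name my own):

```python
def g_local_eb(n, p_gamma, r2):
    """ghat_EBL = max(F_gamma - 1, 0), with F_gamma the usual F statistic."""
    f_gamma = (r2 / p_gamma) / ((1.0 - r2) / (n - 1 - p_gamma))
    return max(f_gamma - 1.0, 0.0)

# The estimate grows without bound as R^2 -> 1, which is what lets the EB
# approaches escape the information paradox (Theorem 1 on the next slide):
for r2 in [0.2, 0.5, 0.9, 0.999]:
    print(r2, g_local_eb(n=100, p_gamma=3, r2=r2))
```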

  13. Choices of g and the Information Paradox
  For fixed $n$ and $p$, the unit information prior, the risk inflation criterion, and the benchmark prior do not resolve the information paradox. The two EB approaches do have the desirable behavior.
  Theorem 1: In the setting of the information paradox with fixed $n$, $p < n$, and $R^2_\gamma \to 1$, for both the global and local EB estimates of $g$,
  $$BF[M_\gamma : M_N] = (1+g)^{(n-1-p_\gamma)/2}\,[1 + g(1 - R^2_\gamma)]^{-(n-1)/2} \to \infty.$$
  Proof: by direct checking.

  14. Outline (repeated to open Section 3, Mixture of g priors)

  15. Desirable $\pi(g)$
  Let $g \sim \pi(g)$. The Bayes factor becomes
  $$BF[M_\gamma : M_N] = \int_0^\infty (1+g)^{(n-1-p_\gamma)/2}\,[1 + g(1 - R^2_\gamma)]^{-(n-1)/2}\, \pi(g)\, dg.$$
  The posterior mean of $\mu$ under $M_\gamma \ne M_N$:
  $$E[\mu \mid M_\gamma, Y] = 1_n \hat{\alpha} + E\!\left[\frac{g}{1+g} \,\Big|\, M_\gamma, Y\right] X_\gamma \hat{\beta}_\gamma,$$
  where $\hat{\alpha}$ and $\hat{\beta}_\gamma$ are the least squares estimates of $\alpha$ and $\beta_\gamma$, and $E[g/(1+g)]$ acts as a shrinkage factor.
  The optimal Bayes estimate of $\mu$ under squared error loss:
  $$E[\mu \mid Y] = 1_n \hat{\alpha} + \sum_{\gamma:\, M_\gamma \ne M_N} p(M_\gamma \mid Y)\, E\!\left[\frac{g}{1+g} \,\Big|\, M_\gamma, Y\right] X_\gamma \hat{\beta}_\gamma.$$
  So $g$ appears everywhere: in the BF, the posterior mean, and prediction. We want priors leading to tractable computation of these quantities (see the quadrature sketch below), with consistent model selection and risk properties.
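For a generic proper $\pi(g)$, the Bayes factor above is a one-dimensional integral and can be handled by standard quadrature; a sketch with my own function names, adequate for moderate $n$ where the integrand stays within double precision:

```python
import numpy as np
from scipy import integrate

def mixture_log_integrand(g, n, p_gamma, r2):
    """log of (1+g)^{(n-1-p)/2} [1 + g(1-R^2)]^{-(n-1)/2}."""
    return 0.5 * (n - 1 - p_gamma) * np.log1p(g) \
         - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def mixture_bf(n, p_gamma, r2, pi_g):
    """BF[M_gamma : M_N] = integral over g of the g-prior BF times pi(g)."""
    def integrand(g):
        return np.exp(mixture_log_integrand(g, n, p_gamma, r2)) * pi_g(g)
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return val
```

The next slide's Zellner-Siow prior is a concrete choice of `pi_g`.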

  16. Zellner-Siow Cauchy Priors
  Jeffreys (1961) rejected normal priors essentially for reasons related to Bayes factor paradoxes; the Cauchy prior is the simplest prior satisfying the basic consistency requirement for hypothesis testing.
  The Zellner-Siow priors can be represented as a mixture of g priors with $g \sim$ Inv-Gamma$(1/2, n/2)$:
  $$\pi(g) = \frac{(n/2)^{1/2}}{\Gamma(1/2)}\, g^{-3/2}\, e^{-n/(2g)}.$$
  The corresponding integrals are approximated by the Laplace approximation; as the model dimensionality increases, the accuracy of the approximation decreases.
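Plugging this Inv-Gamma$(1/2, n/2)$ density into the Bayes factor integral gives the Zellner-Siow Bayes factor without any Laplace approximation, at the price of one numerical integral per model. This is my own illustration, not the paper's implementation:

```python
import numpy as np
from scipy import integrate

def zs_bayes_factor(n, p_gamma, r2):
    """Null-based BF under the Zellner-Siow prior, by direct quadrature."""
    def integrand(g):
        log_bf = 0.5 * (n - 1 - p_gamma) * np.log1p(g) \
               - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))
        # log of the Inv-Gamma(1/2, n/2) density from the slide; Gamma(1/2) = sqrt(pi):
        log_pi = 0.5 * np.log(n / 2.0) - 0.5 * np.log(np.pi) \
               - 1.5 * np.log(g) - n / (2.0 * g)
        return np.exp(log_bf + log_pi)
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return val

print(zs_bayes_factor(n=50, p_gamma=3, r2=0.5))
```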

  17. Hyper-g Priors (1)
  $$\pi(g) = \frac{a-2}{2}\,(1+g)^{-a/2}, \quad g > 0.$$
  Only the case $a > 2$, where $\pi(g)$ is a proper prior, is considered.
  This prior implies the shrinkage factor $\frac{g}{1+g} \sim$ Beta$(1, \tfrac{a}{2} - 1)$.
  Values $a > 4$ put more mass on shrinkage values near 0, which is undesirable, hence only $2 < a \le 4$ is considered.
  When $a = 4$, $\frac{g}{1+g}$ has a uniform distribution; when $a = 3$, most of the mass is near 1.
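Since $g/(1+g) \sim$ Beta$(1, a/2 - 1)$, the effect of $a$ can be inspected directly; a small sketch using scipy's Beta distribution:

```python
from scipy import stats

for a in [3.0, 4.0]:
    shrink = stats.beta(1.0, a / 2.0 - 1.0)     # distribution of g/(1+g)
    pdf_vals = [round(shrink.pdf(x), 3) for x in (0.1, 0.5, 0.9)]
    print(f"a = {a}: pdf of g/(1+g) at (0.1, 0.5, 0.9) =", pdf_vals)
# a = 4 gives Beta(1, 1), a uniform shrinkage factor; a = 3 gives Beta(1, 1/2),
# whose density rises toward 1, i.e. little shrinkage most of the time.
```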

  18. Hyper-g Priors (2)
  Main advantage of the hyper-g prior: it leads to a closed-form posterior distribution of $g$ in terms of the Gaussian hypergeometric function.
  The posterior distribution of $g$:
  $$p(g \mid Y, M_\gamma) = \frac{(p_\gamma + a - 2)/2}{{}_2F_1\bigl(\tfrac{n-1}{2},\, 1;\, \tfrac{p_\gamma + a}{2};\, R^2_\gamma\bigr)}\,(1+g)^{(n-1-p_\gamma-a)/2}\,[1 + (1 - R^2_\gamma)g]^{-(n-1)/2}.$$
  ${}_2F_1(a, b; c; z)$ is convergent for real $|z| < 1$ with $c > b > 0$, and for $z = \pm 1$ only if $c > a + b$ and $b > 0$.
  When evaluating the Gaussian hypergeometric function, numerical overflow is problematic for moderate to large $n$ and large $R^2_\gamma$ (see the sketch below).
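To see the overflow concretely, here is a small experiment of my own (exact behavior depends on the scipy version) comparing scipy's double-precision `hyp2f1` with an arbitrary-precision evaluation, assuming the mpmath package is available:

```python
from scipy.special import hyp2f1   # double precision
import mpmath                      # arbitrary-precision fallback

p_gamma, a, r2 = 3, 3.0, 0.99
for n in [50, 500, 5000]:
    print(n, hyp2f1((n - 1) / 2.0, 1.0, (p_gamma + a) / 2.0, r2))
# For n = 5000 the true value exceeds the double-precision range (about 1e308),
# so the float64 result is no longer usable; mpmath can still evaluate it:
print(mpmath.hyp2f1((5000 - 1) / 2.0, 1.0, (3 + 3.0) / 2.0, 0.99))
```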

  19. Hyper-g Priors (3)
  The Gaussian hypergeometric function appears in many quantities of interest:
  $$BF[M_\gamma : M_N] = \frac{a - 2}{p_\gamma + a - 2}\; {}_2F_1\!\left(\frac{n-1}{2},\, 1;\, \frac{p_\gamma + a}{2};\, R^2_\gamma\right)$$
  $$E[g \mid M_\gamma, Y] = \frac{2}{p_\gamma + a - 4}\; \frac{{}_2F_1\bigl(\tfrac{n-1}{2},\, 2;\, \tfrac{p_\gamma + a}{2};\, R^2_\gamma\bigr)}{{}_2F_1\bigl(\tfrac{n-1}{2},\, 1;\, \tfrac{p_\gamma + a}{2};\, R^2_\gamma\bigr)}$$
  $$E\!\left[\frac{g}{1+g} \,\Big|\, M_\gamma, Y\right] = \frac{2}{p_\gamma + a}\; \frac{{}_2F_1\bigl(\tfrac{n-1}{2},\, 2;\, \tfrac{p_\gamma + a}{2} + 1;\, R^2_\gamma\bigr)}{{}_2F_1\bigl(\tfrac{n-1}{2},\, 1;\, \tfrac{p_\gamma + a}{2};\, R^2_\gamma\bigr)}$$
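All three quantities reduce to ratios of hypergeometric values, so a direct implementation is a few lines; a sketch with my own function name, valid for moderate $n$ (see the overflow caveat above) and requiring $p_\gamma + a > 4$ for the posterior mean of $g$:

```python
from scipy.special import hyp2f1

def hyper_g_summaries(n, p_gamma, a, r2):
    f1 = hyp2f1((n - 1) / 2.0, 1.0, (p_gamma + a) / 2.0, r2)
    f2 = hyp2f1((n - 1) / 2.0, 2.0, (p_gamma + a) / 2.0, r2)
    f3 = hyp2f1((n - 1) / 2.0, 2.0, (p_gamma + a) / 2.0 + 1.0, r2)
    bf = (a - 2.0) / (p_gamma + a - 2.0) * f1   # BF[M_gamma : M_N]
    e_g = 2.0 / (p_gamma + a - 4.0) * f2 / f1   # E[g | M_gamma, Y]
    e_shrink = 2.0 / (p_gamma + a) * f3 / f1    # E[g/(1+g) | M_gamma, Y]
    return bf, e_g, e_shrink

print(hyper_g_summaries(n=100, p_gamma=3, a=3.0, r2=0.5))
```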

  20. Outline (repeated to open Section 4, Consistency)

  21. Overview
  Three aspects of consistency are considered:
  1) the "information paradox", where $R^2_\gamma \to 1$;
  2) the asymptotic consistency of model posterior probabilities as $n \to \infty$;
  3) the asymptotic consistency for prediction.
  All three are studied under the assumption that the data are generated by a true model in the candidate set.
