Binary probit regression with I-priors
Haziq Jamil
Supervisors: Dr. Wicher Bergsma & Prof. Irini Moustaki
Social Statistics (Year 3), London School of Economics and Political Science
8 May 2017, PhD Presentation Event
http://phd3.haziqj.ml
Outline
1 Introduction
    I-priors
    PhD Roadmap
2 Probit models with I-priors
    The latent variable motivation
    Using I-priors
    Estimation (and challenges)
3 Variational inference
    Introduction
    Mean-field factorisation
    Variational I-prior probit
4 Examples
    Cardiac arrhythmia data set
    Meta-analysis of smoking cessation
5 Summary
Haziq Jamil - http://haziqj.ml · I-prior probit · 8 May 2017
The regression model
• For $i = 1, \dots, n$, consider the regression model
$$y_i = f(x_i) + \epsilon_i, \qquad (\epsilon_1, \dots, \epsilon_n) \sim \mathrm{N}(0, \Psi^{-1}) \tag{1}$$
where $f \in \mathcal{F}$, $y_i \in \mathbb{R}$, and $x_i = (x_{i1}, \dots, x_{ip}) \in \mathcal{X}$.
[Figure: scatter plot of y against x]
I-priors
• Let $\mathcal{F}$ be a reproducing kernel Hilbert space (RKHS) with reproducing kernel $h_\lambda : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. An I-prior on $f$ is
$$\big(f(x_1), \dots, f(x_n)\big)^\top \sim \mathrm{N}\big(f_0, \mathcal{I}(f)\big),$$
with $f_0$ a prior mean, and $\mathcal{I}$ the Fisher information for $f$, given by
$$\mathcal{I}\big(f(x), f(x')\big) = \sum_{k=1}^n \sum_{l=1}^n \psi_{kl}\, h_\lambda(x, x_k)\, h_\lambda(x', x_l),$$
where $\psi_{kl}$ is the $(k, l)$ entry of $\Psi$.
• The I-prior regression model for $i = 1, \dots, n$ becomes
$$y_i = f_0(x_i) + \sum_{k=1}^n h_\lambda(x_i, x_k)\, w_k + \epsilon_i \tag{2}$$
$$(w_1, \dots, w_n) \sim \mathrm{N}(0, \Psi), \qquad (\epsilon_1, \dots, \epsilon_n) \sim \mathrm{N}(0, \Psi^{-1}).$$
W. Bergsma (2017). "Regression with I-priors". Manuscript in preparation.
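A minimal NumPy sketch of the Fisher information above, assuming the scaled canonical (linear) kernel $h_\lambda(x, x') = \lambda x x'$ purely for illustration. It checks that the double sum equals the corresponding entry of $H \Psi H$, where $H$ is the kernel (Gram) matrix at the data points; $H \Psi H$ is also the covariance of $f_0 + Hw$ when $w \sim \mathrm{N}(0, \Psi)$, which is why model (2) reproduces the I-prior. The function names are hypothetical and not part of the iprior package.

```python
import numpy as np


def linear_kernel(a, b, lam=1.0):
    """Scaled canonical (linear) kernel h_lambda(x, x') = lam * x * x' (illustrative choice)."""
    return lam * a * b


def kernel_matrix(xs, kernel):
    """Gram matrix H with H[k, l] = h(x_k, x_l)."""
    return np.array([[kernel(a, b) for b in xs] for a in xs])


def fisher_entry(x, xp, xs, Psi, kernel):
    """I(f(x), f(x')) = sum_k sum_l psi_kl h(x, x_k) h(x', x_l), written as the double sum."""
    n = len(xs)
    return sum(Psi[k, l] * kernel(x, xs[k]) * kernel(xp, xs[l])
               for k in range(n) for l in range(n))


def sample_iprior(xs, f0, kernel, Psi, rng):
    """One draw of (f(x_1), ..., f(x_n)) = f0 + H w with w ~ N(0, Psi)."""
    H = kernel_matrix(xs, kernel)
    w = rng.multivariate_normal(np.zeros(len(xs)), Psi)
    return f0 + H @ w


xs = np.array([0.5, 1.0, 2.0])
Psi = np.diag([1.0, 2.0, 0.5])        # error precision matrix (diagonal here)
H = kernel_matrix(xs, linear_kernel)
I = H @ Psi @ H                       # Fisher information at the data points
print(np.isclose(I[0, 2], fisher_entry(xs[0], xs[2], xs, Psi, linear_kernel)))
```

The matrix form $H \Psi H$ is what one would use in practice; the explicit double sum is only there to mirror the formula term by term.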
I-priors (cont.)
• Of interest is the posterior regression function, characterised by the distribution
$$p(f \mid y) = \frac{p(y \mid f)\, p(f)}{\int p(y \mid f)\, p(f)\, \mathrm{d}f},$$
and also the posterior predictive distribution for new data points $x_{\text{new}}$,
$$p(y_{\text{new}} \mid y) = \int p(y_{\text{new}} \mid y, f_{\text{new}})\, p(f_{\text{new}} \mid y)\, \mathrm{d}f_{\text{new}},$$
with $f_{\text{new}} = f(x_{\text{new}})$.
• Estimation is by the EM algorithm or by direct maximisation of the marginal likelihood $\log p(y)$.
• Complete Bayesian estimation is also possible.
HJ (2017a). iprior: Linear Regression using I-Priors. R package version 0.6.4: CRAN.
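Under model (2) with $\Psi = \psi I_n$ (an assumption made here for simplicity), $y$ is marginally $\mathrm{N}(f_0, \psi H_\lambda^2 + \psi^{-1} I_n)$, so $\log p(y)$ has a closed form. The sketch below evaluates it and runs a crude grid search over $(\lambda, \psi)$ as a stand-in for direct maximisation. It assumes a kernel of the form $H_\lambda = \lambda H_1$ and a known constant prior mean $f_0$; the helper names are hypothetical, not the iprior API, and a real fit would use a numerical optimiser or the EM algorithm rather than a grid.

```python
import numpy as np


def log_marginal(y, H1, lam, psi, f0=0.0):
    """log p(y) for the I-prior model with Psi = psi * I (assumed iid errors).

    Marginally y ~ N(f0, V) with V = psi * H^2 + I / psi, where H = lam * H1.
    """
    n = len(y)
    H = lam * H1
    V = psi * H @ H + np.eye(n) / psi
    r = y - f0
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(V, r))


def fit_by_grid(y, H1, lams, psis):
    """Crude direct maximisation of log p(y) over a (lambda, psi) grid."""
    return max(
        ((log_marginal(y, H1, lam, psi), lam, psi) for lam in lams for psi in psis),
        key=lambda t: t[0],
    )  # (max log-likelihood, lambda_hat, psi_hat)


rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
H1 = np.outer(x, x)                                  # canonical (linear) kernel, lambda = 1
y = 1.5 * x + rng.normal(scale=0.3, size=x.size)     # toy data, not from the slides
grid = np.linspace(0.1, 3.0, 30)
ll, lam_hat, psi_hat = fit_by_grid(y, H1, grid, grid)
```

Using `slogdet` and `solve` instead of forming $V^{-1}$ explicitly keeps the evaluation numerically stable.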
Fractional Brownian motion (FBM) RKHS
[Figure: I-prior regression of y against x using the FBM RKHS; successive panels show prior draws, the posterior, and the posterior compared with the true function]
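Prior draws of the kind shown in the FBM panels can be sketched by sampling $f = f_0 + Hw$ with $w \sim \mathrm{N}(0, \psi I_n)$ and $H$ built from a fractional Brownian motion kernel. This assumes the standard (uncentred) fBm covariance $h_\lambda(x, x') = \tfrac{\lambda}{2}\big(|x|^{2\gamma} + |x'|^{2\gamma} - |x - x'|^{2\gamma}\big)$ with Hurst index $\gamma \in (0, 1)$; the kernel behind the slides (and in the iprior package) may be centred differently, so treat this as illustrative. The function names are hypothetical.

```python
import numpy as np


def fbm_kernel(xs, ys, gamma=0.5, lam=1.0):
    """Matrix of lam/2 * (|x|^{2g} + |x'|^{2g} - |x - x'|^{2g}): uncentred fBm covariance (assumed form)."""
    X = np.asarray(xs, dtype=float)[:, None]
    Y = np.asarray(ys, dtype=float)[None, :]
    p = 2.0 * gamma
    return lam * 0.5 * (np.abs(X) ** p + np.abs(Y) ** p - np.abs(X - Y) ** p)


def iprior_prior_draws(xs, n_draws, gamma=0.5, lam=1.0, psi=1.0, f0=0.0, seed=0):
    """Draws of f = f0 + H w with w ~ N(0, psi I), evaluated at the points xs."""
    rng = np.random.default_rng(seed)
    H = fbm_kernel(xs, xs, gamma, lam)
    W = rng.normal(scale=np.sqrt(psi), size=(len(xs), n_draws))
    return f0 + H @ W  # shape (n, n_draws): one column per prior draw


xs = np.linspace(-1.0, 1.0, 50)
draws = iprior_prior_draws(xs, n_draws=5)
```

With $\gamma = 1/2$ and non-negative inputs the kernel reduces to $\min(x, x')$, the ordinary Brownian motion covariance, which is a convenient sanity check.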
Posterior predictive distribution
[Figure: data and fitted regression of y against x with a 95% credible interval; second panel: posterior predictive check comparing the density of the observed y with replications]
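For the Gaussian model (2) with $\Psi = \psi I_n$ (again an assumption), the posterior of $w$ is Gaussian with precision $A = \psi H^2 + \psi^{-1} I_n$ and mean $\psi A^{-1} H (y - f_0)$. Since $f_{\text{new}} = f_0 + \sum_k h_\lambda(x_{\text{new}}, x_k) w_k$, the predictive $p(y_{\text{new}} \mid y)$ is normal, and a 95% interval follows directly. A minimal sketch with a hypothetical helper and toy data:

```python
import numpy as np


def posterior_predictive(xs, y, x_new, kernel, psi, f0=0.0):
    """Mean and 95% interval of p(y_new | y) under model (2) with Psi = psi * I.

    kernel(a, b) must return the matrix with entries h(a_i, b_j).
    """
    n = len(xs)
    H = kernel(xs, xs)
    A = psi * H @ H + np.eye(n) / psi            # posterior precision of w
    m = psi * np.linalg.solve(A, H @ (y - f0))   # posterior mean of w
    Hn = kernel(x_new, xs)                       # row i: h(x_new_i, x_k), k = 1..n
    mean = f0 + Hn @ m
    var_f = np.einsum("ij,ji->i", Hn, np.linalg.solve(A, Hn.T))  # Var(f_new | y)
    sd = np.sqrt(var_f + 1.0 / psi)              # add the error variance 1/psi
    return mean, mean - 1.96 * sd, mean + 1.96 * sd


def linear(a, b):
    """Canonical (linear) kernel with lambda = 1, for illustration."""
    return np.outer(a, b)


xs = np.linspace(-1.0, 1.0, 6)
y = 0.8 * xs + np.array([0.1, -0.2, 0.05, 0.0, -0.1, 0.15])  # toy data
mean, lower, upper = posterior_predictive(xs, y, np.array([0.0, 0.5]), linear, psi=2.0)
```

The same mean can be obtained by conditioning the joint Gaussian of $(f_{\text{new}}, y)$ directly; working through $w$ simply reuses the structure of model (2).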
PhD Roadmap
[Figure: roadmap diagram]
• RKHSs: canonical (linear), FBM, Pearson.
• I-priors: a unified methodology for additive models, multilevel models, and models with functional covariates, with good performance in cases with multicollinearity.
• Advantages: minimal assumptions; straightforward inference; competitive performance.
• Bayesian variable selection using I-priors in the canonical RKHS (e.g. selecting among X1, X2, X3, X4, X5: ✓ ✓ ✗ ✗ ✓).
• R/iprior. Estimation: direct maximisation; EM algorithm; MCMC (Gibbs/HMC).
• Binary probit models with I-priors: extension to binary responses; estimation using variational inference; inference and fitted classification probabilities.
2 Probit models with I-priors
The latent variable motivation
• Consider binary responses $y_1, \dots, y_n$ together with their corresponding covariates $x_1, \dots, x_n$.
• For $i = 1, \dots, n$, model the responses as $y_i \sim \mathrm{Bern}(p_i)$.
• Assume that there exist continuous underlying latent variables $y^*_1, \dots, y^*_n$ such that
$$y_i = \begin{cases} 1 & \text{if } y^*_i \geq 0 \\ 0 & \text{if } y^*_i < 0. \end{cases}$$
• Model these continuous latent variables according to
$$y^*_i = f(x_i) + \epsilon_i,$$
where $(\epsilon_1, \dots, \epsilon_n) \sim \mathrm{N}(0, \Psi^{-1})$ and $f \in \mathcal{F}$ (some RKHS).
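The latent-variable construction can be simulated directly: draw $\epsilon_i \sim \mathrm{N}(0, \psi^{-1})$ (independent errors are assumed here), form $y^*_i = f(x_i) + \epsilon_i$, and threshold at zero. With many replicates at a fixed value of $f(x_i)$, the proportion of ones should approach $\Phi(\psi^{1/2} f(x_i))$. A sketch with hypothetical helper names:

```python
import math
import numpy as np


def simulate_probit(f_values, psi=1.0, rng=None):
    """Latent-variable probit: y* = f(x) + eps, eps ~ N(0, 1/psi) iid, y = 1{y* >= 0}."""
    rng = np.random.default_rng() if rng is None else rng
    f_values = np.asarray(f_values, dtype=float)
    eps = rng.normal(scale=1.0 / math.sqrt(psi), size=f_values.shape)
    y_star = f_values + eps
    return (y_star >= 0).astype(int), y_star


def std_normal_cdf(z):
    """Phi(z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# 100,000 replicates at f(x) = 0.3: the share of ones should be close to Phi(0.3).
y, _ = simulate_probit(np.full(100_000, 0.3), rng=np.random.default_rng(42))
print(y.mean(), std_normal_cdf(0.3))
```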
Using I-priors
• Assume an I-prior on $f$. Then
$$f(x_i) = \underbrace{f_0(x_i)}_{\alpha} + \sum_{k=1}^n h_\lambda(x_i, x_k)\, w_k, \qquad (w_1, \dots, w_n) \sim \mathrm{N}(0, \Psi).$$
• For now, consider iid errors, $\Psi = \psi I_n$. In this case,
$$p_i = \mathrm{P}[y_i = 1] = \mathrm{P}[y^*_i \geq 0] = \mathrm{P}[\epsilon_i \leq f(x_i)] = \Phi\Big(\psi^{1/2}\big(\alpha + \textstyle\sum_{k=1}^n h_\lambda(x_i, x_k)\, w_k\big)\Big),$$
where $\Phi$ is the CDF of a standard normal distribution (the third equality uses the symmetry of $\epsilon_i$ about zero).
• Using an arbitrary threshold $\tau$ or error precision $\psi$ adds no generality: $\tau$ can be absorbed into the intercept $\alpha$, and since $y_i$ depends on $y^*_i$ only through its sign, the scale of $y^*_i$ is not identified. Thus, set $\psi = 1$.
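The probability formula, and the reason $\psi$ can be fixed, can be checked numerically. The sketch assumes independent errors and a given kernel matrix $H$ (the canonical linear kernel, chosen only for illustration) and uses arbitrary illustrative weights $w$. It verifies that a threshold $\tau$ merely shifts the intercept, and that rescaling $(\alpha, w)$ by $c$ while dividing $\psi$ by $c^2$ leaves every $p_i$ unchanged, so the binary likelihood cannot tell these parameter values apart. The helper name is hypothetical.

```python
import math
import numpy as np


def probit_probs(alpha, H, w, psi=1.0, tau=0.0):
    """p_i = P[y*_i >= tau] = Phi(psi^{1/2} (alpha - tau + sum_k H[i, k] w_k))."""
    eta = math.sqrt(psi) * (alpha - tau + H @ w)
    return np.array([0.5 * (1.0 + math.erf(e / math.sqrt(2.0))) for e in eta])


xs = np.array([-1.0, 0.0, 1.0, 2.0])
H = np.outer(xs, xs)                  # canonical (linear) kernel, lambda = 1
w = np.array([0.3, -0.1, 0.2, 0.05])  # arbitrary illustrative weights
p = probit_probs(0.5, H, w)

# A threshold tau = 0.2 is equivalent to threshold 0 with intercept 0.5 - 0.2.
same_tau = np.allclose(probit_probs(0.5, H, w, tau=0.2), probit_probs(0.3, H, w))
# Scaling (alpha, w) by c = 3 and psi by 1/c^2 leaves p unchanged.
c = 3.0
same_scale = np.allclose(probit_probs(c * 0.5, H, c * w, psi=1.0 / c**2), p)
print(same_tau, same_scale)
```

Only the likelihood side is checked here; in the I-prior model $\psi$ also enters the prior on $w$, which is why fixing $\psi = 1$ is stated as a modelling convention rather than derived from this check alone.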