Projection Predictive Model Selection for Gaussian Processes
Juho Piironen, Aki Vehtari
Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Finland
juho.piironen@aalto.fi, aki.vehtari@aalto.fi

Contents
- Introduction
- Automatic relevance determination (ARD)
- Projection predictive method
- Examples
- Summary

Introduction
- Model a target y with several input variables x
- Only some of the inputs x are relevant
- Bayesian approach: use a relevant prior and integrate over all uncertainties
- Radford Neal won the NIPS 2003 feature selection competition using Bayesian methods with all the features (500 to 100 000)
- Sometimes we want to select a minimal subset of x with good predictive performance
  - improved model interpretability
  - reduced measurement costs in the future
  - reduced prediction time

Gaussian process (GP) regression
- GP prior
  $$ f(\mathbf{x}) \sim \mathrm{GP}\big(0, k(\mathbf{x}, \mathbf{x}')\big) $$
- Observation model
  $$ \mathbf{y} \mid \mathbf{f} \sim \mathrm{N}\big(\mathbf{y} \mid \mathbf{f}, \sigma^2 I\big) $$
- Predictive distribution
  $$ \mathbf{f}_* \mid \mathbf{y} \sim \mathrm{N}\big(\mathbf{f}_* \mid \boldsymbol{\mu}_*, \Sigma_*\big), $$
  $$ \boldsymbol{\mu}_* = K_* (K + \sigma^2 I)^{-1} \mathbf{y}, $$
  $$ \Sigma_* = K_{**} - K_* (K + \sigma^2 I)^{-1} K_*^{\mathsf{T}}. $$
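
The predictive equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `gp_predict` and its argument names are assumptions made here, and the covariance matrices K, K_* and K_** are taken as precomputed inputs. A Cholesky factorization replaces the explicit inverse, which is a common numerical choice rather than something stated on the slide.

```python
import numpy as np

def gp_predict(K, K_star, K_star_star, y, noise_var):
    """Posterior mean and covariance of f_* given y for a zero-mean GP.

    K            : (n, n) covariance between training inputs
    K_star       : (m, n) covariance between test and training inputs
    K_star_star  : (m, m) covariance between test inputs
    noise_var    : observation noise variance sigma^2
    (All names are illustrative, not from the slides.)
    """
    n = K.shape[0]
    # Cholesky factor of (K + sigma^2 I); avoids forming an explicit inverse.
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    # alpha = (K + sigma^2 I)^{-1} y
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu_star = K_star @ alpha
    # V = L^{-1} K_*^T, so V^T V = K_* (K + sigma^2 I)^{-1} K_*^T
    V = np.linalg.solve(L, K_star.T)
    Sigma_star = K_star_star - V.T @ V
    return mu_star, Sigma_star
```

The function returns the mean vector and covariance matrix of the latent function at the test inputs; adding `noise_var` to the diagonal of the covariance would give the predictive distribution for new observations instead.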

"Automatic relevance determination"
- Squared exponential (SE) or exponentiated quadratic covariance function
  $$ k_{\mathrm{SE}}(\mathbf{x}, \mathbf{x}') = \sigma_f^2 \exp\left( -\frac{1}{2} \sum_{j=1}^{D} \frac{(x_j - x'_j)^2}{\ell_j^2} \right). $$
- Use of a separate length-scale $\ell_j$ for each input is referred to as automatic relevance determination (ARD)
- Idea: optimizing the marginal likelihood will yield large values of $\ell_j$ for irrelevant inputs
- Problem: a large length-scale may simply mean linearity w.r.t. the input (not irrelevance)
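
As a rough sketch of the covariance function above, the following NumPy function evaluates the SE kernel with one length-scale per input. The name `k_se_ard` and its parameter names are assumptions chosen for illustration.

```python
import numpy as np

def k_se_ard(X1, X2, signal_var, lengthscales):
    """Squared exponential covariance with a separate length-scale per input.

    X1, X2       : arrays of shape (n1, D) and (n2, D)
    signal_var   : sigma_f^2
    lengthscales : array of shape (D,), one ell_j per input dimension
    (Names are illustrative, not from the slides.)
    """
    # Scale each input dimension by its own length-scale.
    X1s = X1 / lengthscales
    X2s = X2 / lengthscales
    # Sum over j of (x_j - x'_j)^2 / ell_j^2 for every pair of points.
    sq_dist = ((X1s[:, None, :] - X2s[None, :, :]) ** 2).sum(axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist)
```

With a very large `lengthscales[j]`, differences in input j contribute almost nothing to the exponent, which is the mechanism behind reading a large length-scale as low relevance.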

Toy example
$$ f(\mathbf{x}) = f_1(x_1) + \dots + f_8(x_8), \qquad y \sim \mathrm{N}(f, 0.3^2), \qquad \mathrm{Var}(f_j) = 1 \text{ for all } j $$
⇒ All inputs equally relevant
[Figure: eight panels showing f_1(x_1), ..., f_8(x_8) for inputs between -1 and 1. Below them, a plot of the true relevance and the optimized ARD-values, ARD(j) = 1/ℓ_j, against the input index 1 to 8 (averaged over 100 data realizations, n = 200).]

How about estimating the predictive performance?
- Cross-validation gives an (almost) unbiased estimate of the predictive performance
- Fast LOO-CV approximations in Vehtari, Mononen, Tolvanen, Sivula, and Winther (2017). Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models. JMLR 17(103):1-38.
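
For GP regression with a Gaussian observation model and fixed hyperparameters, the leave-one-out predictive distributions have a well-known closed form in terms of the inverse of K + σ²I. The sketch below uses that identity; it is not one of the approximations from the cited paper, and the function name `gp_loo_log_pred` is an assumption made for illustration.

```python
import numpy as np

def gp_loo_log_pred(K, y, noise_var):
    """Closed-form LOO log predictive densities for GP regression.

    Assumes a zero-mean GP, Gaussian noise, and fixed hyperparameters.
    Returns an array with log p(y_i | y_{-i}) for each observation i.
    """
    n = K.shape[0]
    Ky_inv = np.linalg.inv(K + noise_var * np.eye(n))
    diag = np.diag(Ky_inv)
    alpha = Ky_inv @ y
    # LOO predictive mean and variance for each held-out observation.
    loo_mean = y - alpha / diag
    loo_var = 1.0 / diag
    return (-0.5 * np.log(2.0 * np.pi * loo_var)
            - 0.5 * (y - loo_mean) ** 2 / loo_var)
```

Summing the returned values gives a LOO-CV estimate of the log predictive performance without refitting the model n times.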
- But...

Selection induced bias in variable selection
- Even if the model performance estimate is unbiased (like LOO-CV), if it is also noisy (like LOO-CV), using it for model selection introduces additional fitting to the data
- The performance of the selection process itself can be assessed using two-level cross-validation, but that does not help in choosing better models
- The problem is bigger when there is a large number of models, as in covariate selection
- Juho Piironen and Aki Vehtari (2017). Comparison of Bayesian predictive methods for model selection. Statistics and Computing, 27(3):711-735. doi:10.1007/s11222-016-9649-y. arXiv preprint arXiv:1503.08650.
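
The effect described above can be illustrated with a small simulation. This is a hypothetical toy setup, not an experiment from the cited paper: every candidate model has the same true performance (zero), each gets an unbiased but noisy estimate, and the model with the best estimate is selected. The function name and parameters are assumptions.

```python
import numpy as np

def selection_bias_demo(n_models, noise_sd, n_repeats=2000, seed=0):
    """Compare a single unbiased estimate with the estimate of the selected model.

    All models share the same true performance (0.0), so any positive
    value for the selected model is pure selection-induced bias.
    """
    rng = np.random.default_rng(seed)
    true_perf = 0.0
    # Unbiased, noisy performance estimates: one row per repetition.
    est = true_perf + noise_sd * rng.standard_normal((n_repeats, n_models))
    # Average estimate for one fixed model (no selection).
    single = est[:, 0].mean()
    # Average estimate for whichever model looked best in each repetition.
    selected = est.max(axis=1).mean()
    return single, selected
```

In this setup the estimate for a fixed model averages out near the true value, while the estimate for the selected model is optimistic, and the optimism grows with the number of candidate models.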

Selection induced bias in variable selection
[Figure: three panels for n = 20, n = 50 and n = 100; the horizontal axis runs from 0 to 50.]

Selection induced bias in variable selection
[Figure from Piironen & Vehtari (2017): grid of panels with columns n = 100, n = 200 and n = 400 and rows CV-10, WAIC, DIC, MPP, BMA-ref and BMA-proj; the horizontal axis runs from 0 to 100.]