
High-Dimensional Variable Selection in Nonlinear Models that Controls the False Discovery Rate

Lucas Janson, Department of Statistics, Harvard University

CMSA Big Data Conference, August 18, 2017

Collaborators: Emmanuel Candès


Knockoffs (Barber and Candès, 2015)

y and X_j are n × 1 column vectors of data: n draws from the random variables Y and X_j, respectively; the design matrix is X := [X_1 ⋯ X_p].

(1) Construct knockoffs: the knockoffs X̃_j (with X̃ := [X̃_1 ⋯ X̃_p]) must satisfy

    [X \tilde{X}]^\top [X \tilde{X}] = \begin{pmatrix} X^\top X & X^\top X - \mathrm{diag}\{s\} \\ X^\top X - \mathrm{diag}\{s\} & X^\top X \end{pmatrix}

(2) Compute knockoff statistics W_j:
- Sufficiency: W_j is a function of [X X̃]^⊤[X X̃] and [X X̃]^⊤ y only.
- Antisymmetry: swapping the values of X_j and X̃_j flips the sign of W_j.

(3) Find the knockoff threshold: order the variables by decreasing |W_j| and proceed down the list, selecting only variables with positive W_j, up to the last time #{negative W_j} / #{positive W_j} ≤ q.

Comments:
- Finite-sample FDR control, and leverages sparsity for power.
- Requires the data to follow a low-dimensional (n ≥ p) Gaussian linear model.
- Canonical approach: condition on X and rely heavily on the model for y.

Generalizing the Knockoffs Procedure

(1) Construct knockoffs: artificial versions ("knockoffs") of each variable that act as controls for assessing the importance of the original variables.

(2) Compute knockoff statistics: a scalar statistic W_j for each variable measures how much more important the variable appears than its knockoff; positive W_j means the original appears more important, with strength measured by the magnitude.

(3) Find the knockoff threshold (same as before): order the variables by decreasing |W_j| and proceed down the list, selecting only variables with positive W_j, up to the last time #{negative W_j} / #{positive W_j} ≤ q.

Coin-flipping property: the key to knockoffs is that steps (1) and (2) are done specifically to ensure that, conditional on |W_1|, …, |W_p|, the signs of the unimportant/null W_j are independently ±1 with probability 1/2.

New Interpretation of Knockoffs

[Figure slide; no text content to recover.]

Knockoffs Without a Model for Y (Candès et al., 2016)

Instead of modeling y and conditioning on X, condition on y and model X (this shifts the burden of knowledge from y onto X).

Explicitly, the rows X_i = (X_{i,1}, …, X_{i,p}) of X are i.i.d. draws from G, where G can be arbitrary but is assumed known.

As compared to the original knockoffs, this removes:
- the restriction on dimension;
- the linear model requirement for Y | X_1, …, X_p;
- the "sufficiency" constraint on W_j.

Notes:
- The rows of X must be i.i.d., not the columns (covariates).
- Nothing about y's distribution is assumed or need be known.
- Robust to overfitting X's distribution in preliminary experiments.

Robustness

[Figure: two panels, FDR and Power, each plotted against the Relative Frobenius Norm Error of the covariance estimate used to construct the knockoffs; legend: Exact Cov, Graph. Lasso, 50% Emp. Cov, 62.5% Emp. Cov, 75% Emp. Cov, 87.5% Emp. Cov, 100% Emp. Cov.]

Figure caption: Covariates are AR(1) with autocorrelation coefficient 0.3; n = 800, p = 1500, and the target FDR is 10%. Y comes from a binomial linear model with logit link function and 50 nonzero coefficients.

Shifting the Burden of Knowledge

When is it appropriate?
1. Subjects are sampled from a population, and
2a. the X_j are highly structured, well-studied, or well-understood, OR
2b. a large set of unsupervised X data (without Y's) is available.

For instance, many genome-wide association studies satisfy all the conditions:
1. Subjects are sampled from a population (oversampling cases is still valid).
2a. Strong spatial structure: linkage disequilibrium models, e.g., Markov chains, are well-studied and work well.
2b. Other studies have collected the same or similar SNP arrays on different subjects.

The New Knockoffs Procedure

(1) Construct knockoffs: exchangeability,

    [X_1 \cdots X_j \cdots X_p\ \tilde{X}_1 \cdots \tilde{X}_j \cdots \tilde{X}_p] \overset{D}{=} [X_1 \cdots \tilde{X}_j \cdots X_p\ \tilde{X}_1 \cdots X_j \cdots \tilde{X}_p]

(2) Compute knockoff statistics: a variable importance measure Z and an antisymmetric function f_j : R² → R, i.e., f_j(z_1, z_2) = −f_j(z_2, z_1); then W_j = f_j(Z_j, Z̃_j), where Z_j and Z̃_j are the variable importances of X_j and X̃_j, respectively.

(3) Find the knockoff threshold (same as before): order the variables by decreasing |W_j| and proceed down the list, selecting only variables with positive W_j, up to the last time #{negative W_j} / #{positive W_j} ≤ q.

Step (1): Construct Knockoffs

Knockoff Construction

Proof that valid knockoff variables can be generated for any X distribution.

If (X_1, …, X_p) is multivariate Gaussian, exchangeability reduces to matching first and second moments when X_j and X̃_j are swapped. For Cov(X_1, …, X_p) = Σ:

    \mathrm{Cov}(X_1, \ldots, X_p, \tilde{X}_1, \ldots, \tilde{X}_p) = \begin{pmatrix} \Sigma & \Sigma - \mathrm{diag}\{s\} \\ \Sigma - \mathrm{diag}\{s\} & \Sigma \end{pmatrix}

For non-Gaussian X, this still gives second-order-correct approximate knockoffs.

- Linear algebra and semidefinite programming are used to find a good s.
- Recently: constructions for Markov chains and HMMs (Sesia et al., 2017).
- Constructions are also possible for grouped variables (Dai and Barber, 2016).
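Since the slide states the Gaussian construction only as a covariance condition, here is a minimal sketch of a sampler for exact Gaussian model-X knockoffs. It assumes X has i.i.d. rows ~ N(μ, Σ) with μ and Σ known, and that s has already been chosen (e.g., by an equicorrelated or SDP construction) so that the conditional covariance below is positive semidefinite; the function name and interface are illustrative, not from the talk.

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, s, seed=None):
    """Sample knockoffs from the Gaussian conditional distribution
    X_tilde | X ~ N( X - (X - mu) @ inv(Sigma) @ diag(s),
                     2*diag(s) - diag(s) @ inv(Sigma) @ diag(s) ),
    which makes [X X_tilde] jointly Gaussian with covariance
    [[Sigma, Sigma - diag(s)], [Sigma - diag(s), Sigma]]."""
    rng = np.random.default_rng(seed)
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)      # inv(Sigma) @ diag(s)
    cond_mean = X - (X - mu) @ Sigma_inv_D       # row-wise conditional mean
    cond_cov = 2.0 * D - D @ Sigma_inv_D         # same covariance for every row
    cond_cov = 0.5 * (cond_cov + cond_cov.T)     # symmetrize numerically
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(len(s)))
    return cond_mean + rng.standard_normal(X.shape) @ L.T
```

The conditional mean and covariance follow from the standard Gaussian conditioning formulas applied to the joint covariance displayed above.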

Step (2): Compute Knockoff Statistics

Strategy for Choosing Knockoff Statistics

Recall that W_j is an antisymmetric function f_j of Z_j and Z̃_j (the variable importances of X_j and X̃_j, respectively): W_j = f_j(Z_j, Z̃_j) = −f_j(Z̃_j, Z_j).

For example: Z is the magnitude of the fitted coefficient β from a lasso regression of y on [X X̃], and f_j(z_1, z_2) = z_1 − z_2.

This gives the Lasso Coefficient Difference (LCD) statistic: W_j = |β_j| − |β̃_j|.
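As a concrete sketch of the LCD statistic (implemented here with scikit-learn, which is an assumption of this sketch rather than something the talk specifies), regress y on the augmented design and difference the coefficient magnitudes:

```python
import numpy as np
from sklearn.linear_model import LassoCV

def lcd_statistics(X, X_tilde, y):
    """W_j = |beta_j| - |beta_tilde_j| from an l1-penalized regression of y
    on [X X_tilde], with the penalty chosen by cross-validation on the
    augmented design (one of the adaptive choices discussed later)."""
    p = X.shape[1]
    XX = np.hstack([X, X_tilde])                  # n x 2p augmented design
    beta = LassoCV(cv=5).fit(XX, y).coef_
    return np.abs(beta[:p]) - np.abs(beta[p:])    # one W_j per original variable
```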

Exchangeability Endows Coin-Flipping

Recall the exchangeability property: for any j,

    [X_1 \cdots X_j \cdots X_p\ \tilde{X}_1 \cdots \tilde{X}_j \cdots \tilde{X}_p] \overset{D}{=} [X_1 \cdots \tilde{X}_j \cdots X_p\ \tilde{X}_1 \cdots X_j \cdots \tilde{X}_p]

Coin-flipping property for W_j: for any unimportant variable j,

    \begin{aligned}
    (Z_j, \tilde{Z}_j) &:= \Big( Z_j\big(y, [\cdots X_j \cdots \tilde{X}_j \cdots]\big),\ \tilde{Z}_j\big(y, [\cdots X_j \cdots \tilde{X}_j \cdots]\big) \Big) \\
    &\overset{D}{=} \Big( Z_j\big(y, [\cdots \tilde{X}_j \cdots X_j \cdots]\big),\ \tilde{Z}_j\big(y, [\cdots \tilde{X}_j \cdots X_j \cdots]\big) \Big) \quad \text{(exchangeability; $j$ is null)} \\
    &= \Big( \tilde{Z}_j\big(y, [\cdots X_j \cdots \tilde{X}_j \cdots]\big),\ Z_j\big(y, [\cdots X_j \cdots \tilde{X}_j \cdots]\big) \Big) \quad \text{(swapping the columns swaps the two importances)} \\
    &= (\tilde{Z}_j, Z_j)
    \end{aligned}

and therefore

    W_j = f_j(Z_j, \tilde{Z}_j) \overset{D}{=} f_j(\tilde{Z}_j, Z_j) = -f_j(Z_j, \tilde{Z}_j) = -W_j.
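To see the coin-flipping property numerically, here is a small simulation sketch under assumed settings (the setup is illustrative, not from the slides): when Σ = I, independent N(0, 1) copies are valid model-X knockoffs (take every s_j = 1), and the sign of the LCD statistic for a null variable behaves like a fair coin.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
signs = []
for _ in range(500):
    n = 200
    X = rng.standard_normal((n, 2))
    X_tilde = rng.standard_normal((n, 2))        # exact knockoffs when Sigma = I
    y = 2.0 * X[:, 0] + rng.standard_normal(n)   # X_1 matters, X_2 is null
    beta = Lasso(alpha=0.05).fit(np.hstack([X, X_tilde]), y).coef_
    W2 = abs(beta[1]) - abs(beta[3])             # LCD statistic for the null X_2
    if W2 != 0.0:                                # condition on a nonzero W_2
        signs.append(np.sign(W2))
print("P(W_2 > 0 | W_2 != 0) is about", np.mean(np.array(signs) > 0))  # near 1/2
```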

Adaptivity and Prior Information in W_j

Recall the LCD: W_j = |β_j| − |β̃_j|, where β_j, β̃_j come from an ℓ1-penalized regression.

Adaptivity:
- Cross-validation (on [X X̃]) to choose the penalty parameter in the LCD.
- Higher-level adaptivity: CV to choose the best-fitting model for inference. E.g., fit a random forest and an ℓ1-penalized regression, and derive feature importances from whichever has lower CV error; FDR control remains strict (see the sketch after this list).
- Can even let the analyst look at a (masked version of the) data to choose the Z function.

Prior information:
- Bayesian approach: choose a prior and model, and Z_j could be the posterior probability that X_j contributes to the model.
- Still strict FDR control, even if the prior is wrong or the MCMC has not converged.
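The higher-level adaptivity bullet can be sketched as follows (a scikit-learn-based illustration; the function and its details are assumptions of this sketch, not the talk's implementation): fit two models on the augmented design, keep whichever cross-validates better, and build W from its importances. Because the choice treats each pair (X_j, X̃_j) symmetrically, the coin-flipping property, and hence FDR control, is preserved.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

def adaptive_statistics(X, X_tilde, y):
    p = X.shape[1]
    XX = np.hstack([X, X_tilde])
    candidates = {
        "lasso": LassoCV(cv=5),
        "forest": RandomForestRegressor(n_estimators=200, random_state=0),
    }
    # Choose the model with the lower cross-validated MSE on [X X_tilde].
    errs = {name: -cross_val_score(m, XX, y, cv=5,
                                   scoring="neg_mean_squared_error").mean()
            for name, m in candidates.items()}
    best = min(errs, key=errs.get)
    fit = candidates[best].fit(XX, y)
    Z = np.abs(fit.coef_) if best == "lasso" else fit.feature_importances_
    return Z[:p] - Z[p:]        # antisymmetric choice f(z1, z2) = z1 - z2
```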

Step (3): Find the Knockoff Threshold

Find the Knockoff Threshold

Example with p = 10 and q = 20% = 1/5:

[Figure: the ten W_j are placed on a number line and then ranked by |W_j|. Scanning down the ranked list, the running counts of (negative, positive) W_j are (0,1), (0,2), (0,3), (1,3), (1,4), (1,5), (2,5), (3,5), (3,6), (3,7). The threshold τ̂ is set at the last point where #{negative W_j} / #{positive W_j} ≤ q, which here is (1,5), a ratio of exactly 1/5 = 20%.]

The five positive W_j above τ̂ are selected: S = {1, 4, 5, 6, 7}.
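A minimal sketch of this threshold search (the offset = 1 variant, usually called knockoff+, is the one with exact FDR control; offset = 0 matches the ratio shown on the slide):

```python
import numpy as np

def knockoff_select(W, q=0.2, offset=0):
    """Scan candidate thresholds t over the nonzero |W_j|, smallest first,
    and stop at the first t where (#{W_j <= -t} + offset) / #{W_j >= t}
    is at most q; select the variables with W_j >= t."""
    for t in np.sort(np.abs(W[W != 0.0])):
        if (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return np.flatnonzero(W >= t)       # indices of selected variables
    return np.array([], dtype=int)              # nothing selected
```

Because the candidate thresholds are scanned in increasing order, the first t that satisfies the condition is the smallest such t, which matches the "last time" the running ratio is ≤ q when scanning down by decreasing |W_j|, exactly as in the worked example above.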

Intuition for FDR Control

    \begin{aligned}
    \mathrm{FDR} &= \mathbb{E}\left[ \frac{\#\{\text{null } X_j \text{ selected}\}}{\#\{\text{total } X_j \text{ selected}\}} \right]
    = \mathbb{E}\left[ \frac{\#\{\text{null positive } W_j : |W_j| > \hat{\tau}\}}{\#\{\text{positive } W_j : |W_j| > \hat{\tau}\}} \right] \\
    &\approx \mathbb{E}\left[ \frac{\#\{\text{null negative } W_j : |W_j| > \hat{\tau}\}}{\#\{\text{positive } W_j : |W_j| > \hat{\tau}\}} \right]
    \leq \mathbb{E}\left[ \frac{\#\{\text{negative } W_j : |W_j| > \hat{\tau}\}}{\#\{\text{positive } W_j : |W_j| > \hat{\tau}\}} \right] \leq q
    \end{aligned}

The approximation uses the coin-flipping property (null W_j are equally likely to be positive or negative given their magnitudes), and the final inequality holds by the definition of τ̂.
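A quick Monte Carlo illustration of this bound, with synthetic W's that satisfy the coin-flipping property by construction (the signal and null magnitudes here are arbitrary assumptions of the sketch):

```python
import numpy as np

def knockoff_select(W, q=0.2, offset=1):        # knockoff+ variant, as sketched above
    for t in np.sort(np.abs(W[W != 0.0])):
        if (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)

rng = np.random.default_rng(1)
fdps = []
for _ in range(2000):
    W_signal = np.abs(rng.normal(3.0, 1.0, size=20))   # 20 non-nulls: large, positive
    W_null = rng.choice([-1.0, 1.0], size=80) * np.abs(rng.normal(0.0, 1.0, size=80))
    W = np.concatenate([W_signal, W_null])             # nulls are indices 20..99
    sel = knockoff_select(W, q=0.2)
    fdps.append(np.mean(sel >= 20) if len(sel) else 0.0)
print("empirical FDR:", np.mean(fdps))                 # stays at or below about 0.2
```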

GWAS Application

Genetic Analysis of Crohn's Disease

- 2007 case-control study by the WTCCC; n ≈ 5,000, p ≈ 375,000; preprocessing mirrored the original analysis.
- Strong spatial structure: second-order knockoffs generated using a genetic covariance estimate (Wen and Stephens, 2010).
