  1. What's Happening in Selective Inference III? Emmanuel Candès, Stanford University. The 2017 Wald Lectures, Joint Statistical Meetings, Baltimore, August 2017

  2. Lecture 3: Special dedication. Maryam Mirzakhani, 1977–2017: "Life is not supposed to be easy"

  3. Knockoffs: Power Analysis. Joint with A. Weinstein and R. Barber.

  4. Knockoffs: a wrapper around a black box. Can we analyze power?

  5. Case study: y = Xβ + ǫ, with X_ij iid ∼ N(0, 1/n), ǫ_i iid ∼ N(0, 1), and β_j iid ∼ Π = (1 − ε) δ_0 + ε Π⋆

  6. Case study (continued). Feature importance: Z_j = sup{λ : β̂_j(λ) ≠ 0}, the largest penalty at which feature j is active on the lasso path

  7. Case study (continued). Theoretical calculations can be carried out in the limit n, p → ∞ with n/p → δ, thanks to the powerful Approximate Message Passing (AMP) theory of Bayati and Montanari ('12) (see also Su, Bogdan & C., '15)
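  To make the feature importance concrete, here is a minimal R sketch (my own, not the lecture's code) that computes Z_j from the lasso path using the glmnet package, approximating the supremum on glmnet's λ grid. The particular settings (n = p = 500 so that δ = 1, sparsity ε = 0.2, Π⋆ = δ_50) are borrowed from the figures that follow; the full knockoff filter, which contrasts Z_j with the statistic of a knockoff copy, is not included.

      # Minimal sketch: lasso entry times Z_j = sup{lambda : betahat_j(lambda) != 0},
      # approximated on glmnet's lambda grid. Assumes the glmnet package.
      library(glmnet)
      set.seed(1)
      n <- 500; p <- 500; eps <- 0.2
      X    <- matrix(rnorm(n * p, sd = 1 / sqrt(n)), n, p)   # X_ij ~ N(0, 1/n)
      beta <- 50 * rbinom(p, 1, eps)                         # beta_j ~ (1 - eps) delta_0 + eps delta_50
      y    <- drop(X %*% beta + rnorm(n))                    # noise ~ N(0, 1)

      fit <- glmnet(X, y, intercept = FALSE, standardize = FALSE)
      B   <- as.matrix(fit$beta)                             # p x nlambda coefficient paths
      Z   <- apply(B != 0, 1, function(nz)                   # largest lambda at which feature j is active
                   if (any(nz)) max(fit$lambda[nz]) else 0)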

  8. [Figure: FDP versus TDP for the oracle procedure; Π⋆ = 0.7 N(0,1) + 0.3 N(2,1), δ = 1, ε = 0.2, σ = 0.5]

  9. [Figure: FDP versus TDP for the oracle and knockoff procedures; same setting as above]

  10. [Figure: FDP versus TDP for the oracle and knockoff procedures, with the knockoff operating points at q = 0.05, 0.1, 0.3 marked; same setting as above]

  11. [Figure: four panels of FDP versus TDP for the oracle and knockoff procedures under different signal distributions: Π⋆ = δ_50, Π⋆ = 0.7 N(0,1) + 0.3 N(2,1), Π⋆ = 0.5 δ_0.1 + 0.5 δ_50, and an exponential Π⋆; knockoff operating points marked at several target levels q]

  12. [Figure: TDP of the knockoff procedure versus TDP of the oracle at several target levels q; Π⋆ = δ_50 (left) and Π⋆ = Exp(1) (right)]

  13. Consequence of the new scientific paradigm: collect data first ⇒ ask questions later.
      Textbook practice: (1) select hypotheses/model/question; (2) collect data; (3) perform inference.
      Modern practice: (1) collect data; (2) select hypotheses/model/questions; (3) perform inference.

  14. Consequence of the new scientific paradigm (continued). 2017 Wald Lectures:
      Explain how I and others are responding
      Explain various facets of the selective inference problem
      Contribute to enhanced statistical reasoning

  15. Model selection in practice

      > model = lm(y ~ . , data = X)
      > model.AIC = stepAIC(model, direction = "both")
      > summary(model.AIC)

      Call:
      lm(formula = y ~ V1 + V2 + V5 + V7 + V8 + V9 + V10, data = X)

      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   0.1034     0.1575   0.656   0.5239
      V1            0.4716     0.1665   2.832   0.0151 *
      V2            0.3437     0.1351   2.544   0.0258 *
      V5            0.7157     0.3147   2.274   0.0421 *
      V7            0.3336     0.2027   1.646   0.1257
      V8           -0.4358     0.1789  -2.436   0.0314 *
      V9            0.4989     0.1503   3.321   0.0061 **
      V10           0.4120     0.2425   1.699   0.1151
      ---
      Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

      Residual standard error: 0.6636 on 12 degrees of freedom
      Multiple R-squared: 0.8073, Adjusted R-squared: 0.6949
      F-statistic: 7.181 on 7 and 12 DF, p-value: 0.001629

  16. Model selection in practice (same stepAIC output as above, stamped with the banner "Inference likely distorted!"): the reported standard errors and p-values do not account for the data-driven selection of the model.

  17. Example from A. Buja: y = β_0 x_0 + Σ_{j=1}^{10} β_j x_j + z, with n = 250 and noise entries iid ∼ N(0, 1).
      Interested in a CI for β_0. Select a model, always including x_0, via BIC.

  18. Example from A. Buja (continued). Coverage is 83.5% < 95%; for p = 30, coverage can be as low as 39%.
      [Figure: Marginal distribution of the post-selection t-statistics, nominal versus actual density]
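  A hypothetical reimplementation of this experiment in R. The slide does not give the design, the true coefficients, or the exact search, so the choices below are illustrative assumptions: β_0 = 1, all other β_j = 0, nuisance predictors correlated with x_0, and stepwise search via step() with a BIC penalty k = log(n) that never drops x_0. It will not reproduce the 83.5% figure exactly.

      # Sketch: coverage of the naive 95% CI for beta_0 after BIC model selection.
      set.seed(2)
      n <- 250; p <- 10; beta0 <- 1; nrep <- 1000
      covered <- logical(nrep)
      for (r in 1:nrep) {
        x0 <- rnorm(n)
        Xn <- sqrt(0.75) * matrix(rnorm(n * p), n, p) + 0.5 * x0  # corr(x0, x_j) = 0.5 (assumption)
        colnames(Xn) <- paste0("x", 1:p)
        d    <- data.frame(y = beta0 * x0 + rnorm(n), x0 = x0, Xn)
        full <- lm(y ~ ., data = d)
        sel  <- step(full, scope = list(lower = ~ x0, upper = formula(full)),
                     direction = "both", k = log(n), trace = 0)   # BIC-penalized stepwise, x0 kept
        ci <- confint(sel, "x0", level = 0.95)                    # naive CI, ignoring the selection
        covered[r] <- ci[1] <= beta0 && beta0 <= ci[2]
      }
      mean(covered)   # empirical coverage of the nominal 95% interval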

  19. Recall Sorić's warning from Lecture 1: "In a large number of 95% confidence intervals, 95% of them contain the population parameter [...] but it would be wrong to imagine that the same rule also applies to a large number of 95% interesting confidence intervals"
      Simulation: θ_i iid ∼ N(0, 0.04), i = 1, 2, ..., 20; sample z_i iid ∼ N(θ_i, 1); construct level-90% marginal CIs; select the intervals that do not cover 0

  20. Recall Sorić's warning from Lecture 1 (continued). Through simulations, P_θ(θ_i ∈ CI_i(α) | i ∈ S) ≈ 0.043
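  A quick R sketch of this simulation (my own reading of the setup, not the lecture's code). The slide reports ≈ 0.043, and the exact value depends on details the slide leaves implicit, such as the level used for selection, so the number produced here may differ.

      # Conditional coverage of 90% marginal CIs among the "interesting" ones,
      # i.e. those whose interval excludes 0.
      set.seed(3)
      nrep <- 50000; m <- 20; alpha <- 0.10
      halfw <- qnorm(1 - alpha / 2)                        # half-width of the CI z_i +/- halfw
      theta <- matrix(rnorm(nrep * m, sd = 0.2), nrep, m)  # theta_i ~ N(0, 0.04)
      z     <- theta + matrix(rnorm(nrep * m), nrep, m)    # z_i ~ N(theta_i, 1)
      selected <- abs(z) > halfw                           # CI does not cover 0
      covered  <- abs(z - theta) < halfw                   # CI covers theta_i
      mean(covered[selected])                              # estimated P(theta_i in CI_i | selected)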

  21. Geography of error rates
      A. Simultaneous over all possible selection rules (Bonferroni)
      B. Simultaneous over the selected
      C. On the average over the selected (FDR/FCR)
      D. Conditional on the selected

  22. Geography of error rates (continued). Wald Lecture III:
      Present vignettes for each territory
      Not exhaustive (I would also have liked to discuss the work of Goeman and Solari ('11) on multiple testing for exploratory research)
      Works I learned about early and that inspired my thinking

  23. C. On the average over the selected (FDR/FCR): False Coverage Rate, Benjamini & Yekutieli ('05)

  24. Conditional coverage I: y_i iid ∼ N(μ, 1), i = 1, ..., 200. Select when the 95% CI does not cover 0. Conditional coverage can be low and depends on the unknown parameter.

  25. Conditional coverage II: same setup, with Bonferroni selection and Bonferroni-adjusted CIs. Better, but still no conditional coverage!
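  A minimal R sketch of the first of these two experiments (my own reading: each y_i gives its own 95% interval y_i ± 1.96 for μ, and only intervals excluding 0 are kept), showing how the conditional coverage varies with the unknown μ.

      # Conditional coverage of the naive 95% CI, given that it excludes 0,
      # as a function of the unknown mean mu.
      set.seed(4)
      cond_coverage <- function(mu, n = 200, alpha = 0.05, nrep = 5000) {
        crit <- qnorm(1 - alpha / 2)
        y <- matrix(rnorm(nrep * n, mean = mu), nrep, n)   # y_i ~ N(mu, 1)
        selected <- abs(y) > crit                          # CI y_i +/- crit excludes 0
        covered  <- abs(y - mu) < crit                     # CI contains mu
        mean(covered[selected])                            # coverage among the selected
      }
      sapply(c(0.5, 1, 2, 3), cond_coverage)               # low for small mu, close to 0.95 for large mu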

  26. Conditional coverage. Worthy goal: select a set S of parameters such that P_θ(θ_i ∈ CI_i(α) | i ∈ S) ≥ 1 − α. This cannot in general be achieved: it is similar to why pFDR = E(FDP | R > 0) cannot be controlled; e.g. under the global null, conditional on making a rejection, pFDR = 1. We have to settle for a bit less!

  27. False coverage rate. Definition: the false coverage rate (FCR) is
      FCR = E[ V_CI / (R_CI ∨ 1) ],
      where R_CI is the number of selected parameters (CIs constructed) and V_CI is the number of CIs not covering their parameter.

  28. False coverage rate (continued). Similar to the FDR: it controls a type I error over the selected.

  29. False coverage rate (continued). Without selection, i.e. |S| = n, the marginal CIs control the FCR, since
      FCR = E[ (1/n) Σ_{i=1}^n 1{θ_i ∉ CI_i(α)} ] ≤ α.

  30. False coverage rate (continued). With selection, marginal CIs will not generally control the FCR.
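  A small R simulation sketch of this last point (my own, not from the lecture), reusing the earlier setup θ_i ∼ N(0, 0.04), z_i ∼ N(θ_i, 1): it estimates the FCR of the naive marginal 90% CIs when only the intervals excluding 0 are reported, alongside the FCR-adjusted intervals of Benjamini & Yekutieli ('05), which widen each selected CI to marginal level 1 − Rq/m (see their paper for the exact conditions under which this controls the FCR).

      # FCR = E[ V_CI / (R_CI v 1) ] estimated by simulation, naive vs. BY-adjusted CIs.
      set.seed(5)
      q <- 0.10; m <- 20; nrep <- 20000
      fcr_naive <- fcr_adj <- numeric(nrep)
      for (r in 1:nrep) {
        theta <- rnorm(m, sd = 0.2)
        z     <- rnorm(m, mean = theta)
        crit  <- qnorm(1 - q / 2)                     # naive 90% CI: z_i +/- crit
        S     <- which(abs(z) > crit)                 # report only CIs that exclude 0
        R     <- length(S)
        if (R == 0) next                              # no selection: FCR contribution is 0
        crit_adj <- qnorm(1 - (R * q / m) / 2)        # BY: selected CIs at level 1 - R q / m
        fcr_naive[r] <- mean(abs(z[S] - theta[S]) > crit)      # V / R for the naive CIs
        fcr_adj[r]   <- mean(abs(z[S] - theta[S]) > crit_adj)  # V / R for the adjusted CIs
      }
      c(naive = mean(fcr_naive), adjusted = mean(fcr_adj))     # naive FCR far exceeds q = 0.10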
