What's Happening in Selective Inference III?
Emmanuel Candès, Stanford University
The 2017 Wald Lectures, Joint Statistical Meetings, Baltimore, August 2017
Lecture 3: Special dedication
Maryam Mirzakhani (1977–2017): "Life is not supposed to be easy"
Knockoffs: Power Analysis Joint with A. Weinstein and R. Barber
Knockoffs: wrapper around a black box. Can we analyze power?
Case study

    y = Xβ + ε,   X_ij ∼ iid N(0, 1/n),   ε_i ∼ iid N(0, 1),   β_j ∼ iid Π = (1 − ε) δ_0 + ε Π*

Feature importance: Z_j = sup{ λ : β̂_j(λ) ≠ 0 }, the value of λ at which feature j enters the Lasso path

Theoretical calculations can be carried out in the regime n, p → ∞ with n/p → δ, thanks to the powerful Approximate Message Passing (AMP) theory of Bayati and Montanari ('12) (see also Su, Bogdan & C., '15)
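To make the model concrete, here is a minimal R sketch (my own code, not the authors'): it simulates the case study above and computes the Lasso entry points Z_j from the glmnet regularization path. The choices Π* = N(2, 1) and σ = 0.5 are taken from the figure that follows; the knockoff construction itself is omitted.

## Simulate the case study and compute Z_j = sup{ lambda : betahat_j(lambda) != 0 }
## Assumes the glmnet package is available.
library(glmnet)

set.seed(1)
n <- 1000; p <- 1000                                     # delta = n/p = 1
eps <- 0.2                                               # P(beta_j != 0)
X <- matrix(rnorm(n * p, sd = 1 / sqrt(n)), n, p)
nonzero <- rbinom(p, 1, eps) == 1
beta <- ifelse(nonzero, rnorm(p, mean = 2, sd = 1), 0)   # Pi* = N(2, 1)
y <- X %*% beta + rnorm(n, sd = 0.5)                     # sigma = 0.5 as in the figure

fit <- glmnet(X, y, intercept = FALSE, standardize = FALSE)
active <- as.matrix(fit$beta) != 0                       # p x nlambda indicator matrix
# Z_j: largest lambda at which feature j has a nonzero Lasso coefficient
Z <- apply(active, 1, function(a) if (any(a)) max(fit$lambda[a]) else 0)

# Signals tend to enter the path earlier (larger Z_j) than nulls
boxplot(Z ~ nonzero, names = c("null", "signal"), ylab = "Z_j")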
Figure: FDP vs. TDP trade-off for Π* = 0.7 N(0,1) + 0.3 N(2,1) with δ = 1, ε = 0.2, σ = 0.5; oracle and knockoff curves are shown, with knockoff operating points marked at q = 0.05, 0.1, 0.3.
Figure: FDP vs. TDP curves for the oracle and knockoff procedures under four signal distributions: Π* = δ_50, Π* = 0.7 N(0,1) + 0.3 N(2,1), Π* = 0.5 δ_0.1 + 0.5 δ_50, and Π* = Exp(λ = 0.2); knockoff operating points marked at q = 0.01, 0.05, 0.1, 0.3.
Figure: TDP of the knockoff procedure vs. TDP of the oracle for Π* = δ_50 (left) and Π* = exp(1) (right), with knockoff operating points marked at several values of q.
Consequence of new scientific paradigm
Collect data first ⇒ Ask questions later

Textbook practice                              Modern practice
(1) Select hypotheses/model/questions          (1) Collect data
(2) Collect data                               (2) Select hypotheses/model/questions
(3) Perform inference                          (3) Perform inference

2017 Wald Lectures
Explain how I and others are responding
Explain various facets of the selective inference problem
Contribute to enhanced statistical reasoning
Model selection in practice

> model = lm(y ~ . , data = X)
> model.AIC = stepAIC(model, direction = "both")
> summary(model.AIC)

Call:
lm(formula = y ~ V1 + V2 + V5 + V7 + V8 + V9 + V10, data = X)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.1034     0.1575   0.656   0.5239
V1            0.4716     0.1665   2.832   0.0151 *
V2            0.3437     0.1351   2.544   0.0258 *
V5            0.7157     0.3147   2.274   0.0421 *
V7            0.3336     0.2027   1.646   0.1257
V8           -0.4358     0.1789  -2.436   0.0314 *
V9            0.4989     0.1503   3.321   0.0061 **
V10           0.4120     0.2425   1.699   0.1151
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6636 on 12 degrees of freedom
Multiple R-squared: 0.8073, Adjusted R-squared: 0.6949
F-statistic: 7.181 on 7 and 12 DF, p-value: 0.001629

Inference likely distorted!
Example from A. Buja

    y = β_0 x_0 + Σ_{j=1}^{10} β_j x_j + z,   n = 250,   z_i ∼ iid N(0, 1)

Interested in a CI for β_0
Select a model, always including x_0, via BIC

Coverage of the nominal 95% CI for β_0 is 83.5% < 95%
For p = 30, coverage can be as low as 39%

Figure: Marginal distribution of post-selection t-statistics (nominal vs. actual distribution)
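A rough simulation in the spirit of this example, written for illustration only; the slide does not specify the design, so the equicorrelated predictors and the small nonzero nuisance coefficients below are my own assumptions.

## Post-selection coverage of a naive CI for beta_0 after BIC stepwise selection.
## Design choices (correlation, coefficient sizes) are illustrative assumptions.
set.seed(2)
n <- 250; p <- 10; rho <- 0.5
beta0 <- 0; beta_nuis <- rep(0.15, p)        # borderline nuisance signals
n_rep <- 500
covered <- logical(n_rep)

for (r in 1:n_rep) {
  g <- rnorm(n)                              # shared factor -> equicorrelated x's
  X <- sqrt(rho) * g + sqrt(1 - rho) * matrix(rnorm(n * (p + 1)), n, p + 1)
  colnames(X) <- paste0("x", 0:p)
  y <- drop(beta0 * X[, 1] + X[, -1] %*% beta_nuis + rnorm(n))
  dat <- data.frame(y = y, X)

  # BIC stepwise selection (k = log n), always keeping x0 in the model
  full <- lm(y ~ ., data = dat)
  sel  <- step(full, scope = list(lower = ~ x0), k = log(n), trace = 0)

  # nominal 95% CI for beta_0 computed as if the selected model were fixed
  ci <- confint(sel, "x0", level = 0.95)
  covered[r] <- ci[1] <= beta0 && beta0 <= ci[2]
}
mean(covered)   # typically below the nominal 0.95 when selection is ignored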
Recall Sorić's warning from Lecture 1

"In a large number of 95% confidence intervals, 95% of them contain the population parameter [...] but it would be wrong to imagine that the same rule also applies to a large number of 95% interesting confidence intervals"

Sample θ_i ∼ iid N(0, 0.04), i = 1, 2, ..., 20, and z_i ∼ N(θ_i, 1)
Construct level 90% marginal CIs
Select the intervals that do not cover 0

Through simulations, P_θ(θ_i ∈ CI_i(α) | i ∈ S) ≈ 0.043
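A short, vectorized R sketch of this simulation (my own code); it reproduces a conditional coverage of roughly 4%.

## Conditional coverage of 90% marginal CIs among the "interesting" (selected) ones
set.seed(3)
n_rep <- 20000; m <- 20; alpha <- 0.10
z_alpha <- qnorm(1 - alpha / 2)

theta <- matrix(rnorm(n_rep * m, sd = 0.2), n_rep, m)   # theta_i ~ N(0, 0.04)
z     <- theta + matrix(rnorm(n_rep * m), n_rep, m)     # z_i ~ N(theta_i, 1)
lo <- z - z_alpha; hi <- z + z_alpha                    # 90% marginal CIs
selected <- lo > 0 | hi < 0                             # CIs that do not cover 0
covers   <- lo <= theta & theta <= hi                   # CI covers its parameter

mean(covers[selected])   # conditional coverage, roughly 0.04 (nominal 0.90)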
Geography of error rates
A  Simultaneous over all possible selection rules (Bonferroni)
B  Simultaneous over the selected
C  On the average over the selected (FDR/FCR)
D  Conditional on the selected

Wald Lecture III
Present vignettes for each territory
Not exhaustive (I would also have liked to discuss the work of Goeman and Solari ('11) on multiple testing for exploratory research)
Works I learned about early and that inspired my thinking
C  On the average over the selected (FDR/FCR)

False Coverage Rate
Benjamini & Yekutieli ('05)
Conditional coverage I
y_i ∼ iid N(μ, 1), i = 1, ..., 200
Select when the 95% CI does not cover 0
Conditional coverage can be low and depends on the unknown parameter
Conditional coverage II
y_i ∼ iid N(μ, 1), i = 1, ..., 200
Bonferroni-selected and Bonferroni-adjusted CIs
Better, but still no conditional coverage!
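A rough sketch of both experiments (my code; the grid of μ values and the number of repetitions are arbitrary choices): it estimates P(μ ∈ CI | CI selected) for marginal 95% CIs selected when they exclude 0, and for Bonferroni-selected, Bonferroni-adjusted CIs.

## Conditional coverage as a function of the unknown mu, for two schemes
set.seed(4)
m <- 200; n_rep <- 2000
mus <- c(0.5, 1, 2, 3, 4)

cond_cov <- function(mu, z_sel, z_ci) {
  sel_cov <- sel_tot <- 0
  for (r in 1:n_rep) {
    y <- rnorm(m, mean = mu)
    sel <- abs(y) > z_sel                    # selection rule
    cov <- abs(y - mu) <= z_ci               # CI covers mu
    sel_tot <- sel_tot + sum(sel)
    sel_cov <- sel_cov + sum(cov & sel)
  }
  sel_cov / sel_tot
}

# (i) marginal: select and build the CI with the same 95% cutoff
marg <- sapply(mus, cond_cov, z_sel = qnorm(0.975), z_ci = qnorm(0.975))
# (ii) Bonferroni: select at level 0.05/m and adjust CIs to level 1 - 0.05/m
bonf <- sapply(mus, cond_cov, z_sel = qnorm(1 - 0.025 / m),
               z_ci = qnorm(1 - 0.025 / m))
rbind(mu = mus, marginal = marg, bonferroni = bonf)
# conditional coverage depends strongly on mu and can fall far below 0.95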
Conditional coverage

Worthy goal: select a set S of parameters and guarantee

    P_θ(θ_i ∈ CI_i(α) | i ∈ S) ≥ 1 − α

This cannot in general be achieved: it is similar to why pFDR = E(FDP | R > 0) cannot be controlled; e.g. under the global null, conditional on making at least one rejection, pFDR = 1

Have to settle for a bit less!
False coverage rate

Definition: the false coverage rate (FCR) is

    FCR = E[ V_CI / (R_CI ∨ 1) ]

where R_CI is the number of selected parameters (CIs constructed) and V_CI is the number of constructed CIs that fail to cover their parameter.

Similar to the FDR: controls a type I error rate over the selected

Without selection, i.e. |S| = n, the marginal CIs control the FCR since

    FCR = E[ (1/n) Σ_{i=1}^{n} 1(θ_i ∉ CI_i(α)) ] ≤ α

With selection, marginal CIs will not generally control the FCR
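A simulation sketch contrasting the two situations (my own setup, not from the lecture); the selection-adjusted intervals follow the Benjamini & Yekutieli recipe of building each of the R selected CIs at level 1 − qR/n.

## FCR of ordinary marginal CIs after selection vs. BY selection-adjusted CIs.
## Selection rule and parameter values below are illustrative choices.
set.seed(5)
n <- 100; q <- 0.10; n_rep <- 2000
theta <- c(rep(3, 10), rep(0, 90))          # a few strong signals, many nulls

fcr_marg <- fcr_by <- numeric(n_rep)
for (r in 1:n_rep) {
  z <- rnorm(n, mean = theta)
  sel <- abs(z) > qnorm(0.975)              # select "significant" parameters
  R <- sum(sel)
  if (R == 0) { fcr_marg[r] <- fcr_by[r] <- 0; next }

  # ordinary marginal 90% CIs for the selected parameters
  w_marg <- qnorm(1 - q / 2)
  fcr_marg[r] <- sum(abs(z[sel] - theta[sel]) > w_marg) / R

  # BY-adjusted CIs at level 1 - q*R/n for the selected parameters
  w_by <- qnorm(1 - (q * R / n) / 2)
  fcr_by[r] <- sum(abs(z[sel] - theta[sel]) > w_by) / R
}
c(marginal = mean(fcr_marg), BY_adjusted = mean(fcr_by))
# the marginal CIs badly violate the target q; the BY-adjusted CIs stay <= q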