

  1. From selective inference to adaptive data analysis
     Xiaoying Tian Harris, December 9, 2016

  2. Acknowledgement
     My advisor:
     ◮ Jonathan Taylor
     Other coauthors:
     ◮ Snigdha Panigrahi
     ◮ Jelena Markovic
     ◮ Nan Bi


  3. Model selection
     ◮ Observe data (y, X), X ∈ R^{n×p}, y ∈ R^n
     ◮ Candidate models, e.g.
       model = lm(y ~ X1 + X2 + X3 + X4)
       model = lm(y ~ X1 + X2 + X4)
       model = lm(y ~ X1 + X3 + X4)
     ◮ Inference after model selection:
       1. Use the data to select a set of variables E
       2. Normal z-tests to get p-values
     ◮ Problem: inflated significance
       1. Normal z-tests need adjustment
       2. Selection is biased towards "significance"

  4. Inflated Significance
     Setup:
     ◮ X ∈ R^{100×200} has i.i.d. normal entries
     ◮ y = Xβ + ε, ε ∼ N(0, I)
     ◮ β = (5, ..., 5, 0, ..., 0), with the first 10 entries equal to 5
     ◮ LASSO, nonzero coefficient set E
     ◮ z-tests, null p-values for i ∈ E, i ∉ {1, ..., 10}
     [Figure: histogram of null p-values after selection; x-axis: p-values, y-axis: frequencies.]

  5. Inflated Significance
     Same setup as above, now showing the selective p-values for the null variables i ∈ E, i ∉ {1, ..., 10}.
     [Figure: histogram of selective p-values after selection; x-axis: p-values from 0 to 1, y-axis: frequencies.]
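A minimal simulation sketch of this setup (the LASSO penalty level, the number of replications, and the use of sklearn are illustrative choices, not taken from the slides); it shows how naive z-tests on the LASSO-selected null variables produce too many small p-values:

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Lasso

# Setup from the slide: X is 100 x 200 with i.i.d. N(0, 1) entries,
# y = X beta + eps with eps ~ N(0, I), and beta has 10 entries equal to 5.
rng = np.random.default_rng(0)
n, p, s = 100, 200, 10
naive_null_pvals = []

for _ in range(200):
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:s] = 5.0
    y = X @ beta + rng.standard_normal(n)

    # Select variables with the LASSO (the penalty level is an arbitrary choice).
    E = np.flatnonzero(Lasso(alpha=0.2, fit_intercept=False).fit(X, y).coef_)
    if E.size == 0:
        continue

    # Naive z-tests: refit OLS on the selected columns and ignore the selection step.
    XE = X[:, E]
    cov = np.linalg.inv(XE.T @ XE)            # sigma^2 = 1 is known here
    beta_hat = cov @ XE.T @ y
    z = beta_hat / np.sqrt(np.diag(cov))
    pvals = 2 * norm.sf(np.abs(z))

    # Keep only p-values for truly null selected variables (indices >= 10).
    naive_null_pvals.extend(pvals[E >= s])

# For a valid test these would be roughly Uniform(0, 1); here they pile up near 0.
print("fraction of naive null p-values below 0.05:",
      np.mean(np.array(naive_null_pvals) < 0.05))
```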

  6. Selective inference: features and caveat
     ◮ Specific to particular selection procedures
     ◮ Exact post-selection test
     ◮ More powerful test

  7. Selective inference: popping the hood
     Consider selection for "big effects":
     ◮ X_1, ..., X_n i.i.d. ∼ N(0, 1), X̄ = (1/n) Σ_{i=1}^n X_i
     ◮ Select for "big effects": X̄ > 1
     ◮ Observation: X̄_obs = 1.1, with n = 5
     ◮ Normal z-test vs. selective test for H_0: µ = 0
     [Figure: the original distribution of X̄ vs. the conditional distribution after selection.]

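A small worked version of this example (a one-sided test is assumed here for illustration): the naive z-test treats X̄ as N(0, 1/n) under the null, while the selective test uses the same normal truncated at the selection threshold.

```python
import numpy as np
from scipy.stats import norm

# From the slide: X_1, ..., X_n i.i.d. with mean mu, n = 5, report X_bar only
# if X_bar > 1; observed X_bar = 1.1.  Test H0: mu = 0.
n, threshold, x_obs = 5, 1.0, 1.1
scale = 1.0 / np.sqrt(n)          # sd of X_bar under H0

# Naive z-test: ignores that we only look at X_bar because it exceeded 1.
p_naive = norm.sf(x_obs, loc=0.0, scale=scale)

# Selective test: reference distribution is N(0, 1/n) truncated at 1,
# i.e. P(X_bar > x_obs | X_bar > 1) under H0.
p_selective = norm.sf(x_obs, scale=scale) / norm.sf(threshold, scale=scale)

print(f"naive p-value:     {p_naive:.4f}")      # ~0.007, looks "significant"
print(f"selective p-value: {p_selective:.4f}")  # ~0.55, not significant
```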

  8. Selective inference: in a nutshell
     ◮ Selection, e.g. X̄ > 1
     ◮ Change of the reference measure:
       the conditional distribution, e.g. N(µ, 1/n) truncated at 1
     ◮ Target of inference may depend on the outcome of selection
     ◮ Example: selection by LASSO


  9. What is the "selected" model?
     Suppose a set of variables E is suggested by the data for further investigation.
     ◮ Selected model by Fithian et al. (2014): M_E = {N(X_E β_E, σ_E^2 I), β_E ∈ R^{|E|}, σ_E^2 > 0}. Target is β_E.
     ◮ Full model by Lee et al. (2016), Berk et al. (2013): M = {N(µ, σ^2 I), µ ∈ R^n}. Target is β_E(µ) = X_E^† µ.
     ◮ Nonparametric model: M = {F^{⊗n} : (X, Y) ∼ F}. Target is β_E(F) = E_F[X_E^T X_E]^{-1} E_F[X_E · Y].
     A tool for valid inference after exploratory data analysis.
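A toy numeric sketch of the last two targets when the truth is known (the sizes, the data-generating model, and the selected set here are all hypothetical): the full-model target is the population least-squares coefficient of µ on the selected columns, and the nonparametric target can be estimated by plugging sample moments into the formula above.

```python
import numpy as np

# Toy illustration: the full-model target beta_E(mu) = X_E^dagger mu and a
# plug-in estimate of the nonparametric target E[X_E^T X_E]^{-1} E[X_E . Y].
rng = np.random.default_rng(4)
n, p = 200, 6
X = rng.standard_normal((n, p))
mu = 2.0 * X[:, 0] + 1.0 * X[:, 1]            # true mean uses variables 0 and 1
y = mu + rng.standard_normal(n)

E = [0, 2]                                     # a selected set that misses variable 1
XE = X[:, E]

beta_E_full = np.linalg.pinv(XE) @ mu          # full-model target X_E^dagger mu
beta_E_plugin = np.linalg.solve(XE.T @ XE, XE.T @ y)   # sample-moment plug-in

# Neither quantity is the "true" beta restricted to E; both are defined by the
# selected set, which is what makes the target depend on the selection.
print(beta_E_full, beta_E_plugin)
```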

  10. Selective inference on a DAG
      [Diagram: DAG linking the data (X, y), the randomization ω, the selection data (X*, y*), the selected set E, and the target Ē.]
      ◮ Incorporate randomness through ω, e.g.
        1. (X*, y*) = (X, y)
        2. (X*, y*) = (X_1, y_1)
        3. (X*, y*) = (X, y + ω)
      ◮ Reference measure: conditioning on E, the yellow node in the diagram
      ◮ Target of inference can be Ē
        1. Not E, but depends on the data through E
        2. "Liberating" the target of inference from selection
        3. Ē can incorporate knowledge from previous literature
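A short sketch contrasting the three choices of selection data listed above (the data-generating model, penalty level, and randomization scale are illustrative, not from the slides):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 100, 50
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] + rng.standard_normal(n)

def selected_set(X_sel, y_sel, lam=0.15):
    """Active set of a LASSO fit on the given selection data (X*, y*)."""
    return np.flatnonzero(Lasso(alpha=lam, fit_intercept=False).fit(X_sel, y_sel).coef_)

# 1. (X*, y*) = (X, y): select on the data itself (no randomization).
E_full = selected_set(X, y)

# 2. (X*, y*) = (X_1, y_1): data splitting, select on the first half only.
E_split = selected_set(X[: n // 2], y[: n // 2])

# 3. (X*, y*) = (X, y + omega): add Gaussian randomization to the response.
omega = rng.normal(scale=0.5, size=n)
E_rand = selected_set(X, y + omega)

print(E_full, E_split, E_rand)
```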

  11. From selective inference to adaptive data analysis
      Denote the data by S.
      [Diagram: DAG with nodes S, ω, E, and Ē.]

  12. From selective inference to adaptive data analysis
      Denote the data by S.
      [Diagram: DAG with nodes S, ω_1, ω_2, E_1, E_2, and Ē: two rounds of selection.]

  13. Reference measure after selection
      ◮ Given any point null F_0, use the conditional distribution F*_0 as the reference measure,
        dF*_0/dF_0 (S) = ℓ_F(S).
      ◮ ℓ_F is called the selective likelihood ratio. It depends on the selection algorithm and the randomization distribution ω ∼ G.
      ◮ Tests of the form H_0: θ(F) = θ_0 can be reduced to testing point nulls, e.g.
        ◮ Score test
        ◮ Conditioning in exponential families

  14. Computing the reference measure after selection
      ◮ The selection map Q̂ results from an optimization problem,
        β̂(S, ω) = argmin_β ℓ(S; β) + P(β) + ω^T β,
        and E is the active set of β̂.
      ◮ Selection region A(S) = {ω : Q̂(S, ω) = E}, with ω ∼ G, so
        dF*_0/dF_0 (S) = ∫_{A(S)} dG(ω).
      ◮ The set {ω : Q̂(S, ω) = E} is difficult to describe directly.
      ◮ Instead, let ẑ(S, ω) be the subgradient of the penalty at the optimum; the selection event becomes {(β̂_E, ẑ_{−E}) ∈ B}, where B depends only on the penalty P.
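A sketch of this reparametrization for the randomized LASSO with squared-error loss (the proximal-gradient solver, sizes, and penalty level are illustrative choices, not the package's actual implementation): solve the randomized problem once, read off the active coefficients β̂_E and the inactive subgradient ẑ_{−E}, and reconstruct ω from the stationarity (KKT) condition.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def randomized_lasso(X, y, omega, lam, n_iter=3000):
    """Proximal gradient descent on 0.5*||y - X b||^2 + lam*||b||_1 + omega^T b."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2      # 1 / Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) + omega
        b = soft_threshold(b - step * grad, step * lam)
    return b

def psi(X, y, E, beta_E, z_minus_E, lam):
    """Reconstruct omega from (S, beta_E, z_{-E}) via the KKT condition
    omega = X^T (y - X beta_hat) - lam * z_hat.
    For the l1 penalty, B fixes sign(beta_E) and requires |z_{-E}| <= 1."""
    p = X.shape[1]
    beta = np.zeros(p)
    beta[E] = beta_E
    z = np.empty(p)
    z[E] = np.sign(beta_E)
    inactive = np.setdiff1d(np.arange(p), E)
    z[inactive] = z_minus_E
    return X.T @ (y - X @ beta) - lam * z

rng = np.random.default_rng(2)
n, p, lam = 50, 10, 5.0
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([4.0, -3.0, 2.0]) + rng.standard_normal(n)
omega = rng.normal(scale=1.0, size=p)

beta_hat = randomized_lasso(X, y, omega, lam)
E = np.flatnonzero(np.abs(beta_hat) > 1e-8)
inactive = np.setdiff1d(np.arange(p), E)
z_hat = (X.T @ (y - X @ beta_hat) - omega) / lam     # full subgradient from KKT

omega_rec = psi(X, y, E, beta_hat[E], z_hat[inactive], lam)
print("omega recovered:", np.allclose(omega_rec, omega, atol=1e-4))
```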

  15. Monte-Carlo sampler for the conditional distribution
      Suppose F_0 has density f_0 and G has density g. Then
        dF*_0/dF_0 (S) = ∫_B g(ψ(S, β̂_E, ẑ_{−E})) dβ̂_E dẑ_{−E},
      where ω = ψ(S, β̂_E, ẑ_{−E}).
      ◮ The reparametrization map ψ is easy to compute, Harris et al. (2016).
      ◮ In simulation, we jointly sample (S, β̂_E, ẑ_{−E}) from the density
        f_0(S) g(ψ(S, β̂_E, ẑ_{−E})) 1_B.
      Samples of S can be used as the reference measure for selective inference.
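As a conceptual check (not the sampler described here), the conditional reference measure can also be approximated by naive rejection sampling in the earlier toy example: draw under the point null, keep only the draws that pass selection, and use the survivors as the reference distribution. For LASSO-type selection the acceptance probability is negligible, which is why the slides sample the reparametrized density instead.

```python
import numpy as np
from scipy.stats import norm

# Rejection-sampling illustration with the earlier toy example:
# X_bar is reported only when X_bar > 1, n = 5, observed X_bar = 1.1.
rng = np.random.default_rng(3)
n, threshold, x_obs = 5, 1.0, 1.1

draws = rng.normal(loc=0.0, scale=1.0 / np.sqrt(n), size=2_000_000)
reference = draws[draws > threshold]          # samples from the truncated null

p_mc = np.mean(reference > x_obs)             # Monte-Carlo selective p-value
p_exact = norm.sf(x_obs * np.sqrt(n)) / norm.sf(threshold * np.sqrt(n))
print(f"Monte-Carlo: {p_mc:.3f}, exact truncated normal: {p_exact:.3f}")
```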


  16. Interactive Data Analysis
      Easily generalizable in a sequential/interactive fashion: with two rounds of selection, jointly sample (S, β̂_{E_1}, ẑ_{−E_1}, β̂_{E_2}, ẑ_{−E_2}) from
        f_0(S) g(ψ_1(S, β̂_{E_1}, ẑ_{−E_1})) 1_{B_1} · g(ψ_2(S, β̂_{E_1}, β̂_{E_2}, ẑ_{−E_2})) 1_{B_2}.
      ◮ Flexible framework: any selection procedure resulting from a "Loss + Penalty" convex problem.
      ◮ Examples such as the Lasso, logistic Lasso, marginal screening, forward stepwise, graphical Lasso, and group Lasso are considered in Harris et al. (2016).
      ◮ Many more are possible.

  17. Summary
      ◮ Selective inference on a DAG
      ◮ Selection: more than one shot
      ◮ Feasible implementation of the selective tests:
        https://github.com/selective-inference/Python-software
      Thank you!

  18. References
      Berk, R., Brown, L., Buja, A., Zhang, K. & Zhao, L. (2013), 'Valid post-selection inference', The Annals of Statistics 41(2), 802–837. URL: http://projecteuclid.org/euclid.aos/1369836961
      Fithian, W., Sun, D. & Taylor, J. (2014), 'Optimal Inference After Model Selection', arXiv preprint arXiv:1410.2597. URL: http://arxiv.org/abs/1410.2597
      Harris, X. T., Panigrahi, S., Markovic, J., Bi, N. & Taylor, J. (2016), 'Selective sampling after solving a convex problem', arXiv preprint arXiv:1609.05609.
      Lee, J. D., Sun, D. L., Sun, Y. & Taylor, J. E. (2016), 'Exact post-selection inference with the lasso', The Annals of Statistics 44(3), 907–927. URL: http://projecteuclid.org/euclid.aos/1460381681
