
Sailing Through Data: Discoveries and Mirages, Emmanuel Candès - PowerPoint PPT Presentation



  1. Sailing Through Data: Discoveries and Mirages. Emmanuel Candès, Stanford University. 2018 Machine Learning Summer School, Buenos Aires, June 2018.

  2. Robustness

  3.-9. Robustness (one figure, built up over seven slides). [Figure: two panels, FDR and Power, each plotted against the relative Frobenius norm error of the covariance estimate; the legend grows slide by slide: Exact Cov, Graph. Lasso, 50% Emp. Cov, 62.5% Emp. Cov, 75% Emp. Cov, 87.5% Emp. Cov, 100% Emp. Cov.] Caption: Covariates are AR(1) with autocorrelation coefficient 0.3; $n = 800$, $p = 1500$, and the target FDR is 10%. $Y \mid X$ follows a logistic model with 50 nonzero entries.
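The caption above pins down the design but not every detail (the signal amplitude and coefficient scaling are not stated on the slides). A minimal sketch of that setup in Python, with the unstated choices marked as assumptions:

```python
import numpy as np

def simulate_ar1_logistic(n=800, p=1500, rho=0.3, k=50, amp=8.0, seed=0):
    """AR(1) covariates with autocorrelation rho; Y | X from a logistic
    model with k nonzero coefficients. The amplitude `amp` and the
    1/sqrt(n) scaling are assumptions, not taken from the slides."""
    rng = np.random.default_rng(seed)
    X = np.empty((n, p))
    X[:, 0] = rng.standard_normal(n)
    for j in range(1, p):   # stationary AR(1) across features, unit variance
        X[:, j] = rho * X[:, j - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    beta = np.zeros(p)
    support = rng.choice(p, size=k, replace=False)
    beta[support] = rng.choice([-1.0, 1.0], size=k) * amp / np.sqrt(n)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))
    return X, y, support
```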

  10. Simulations with synthetic Markov chain. Markov chain covariates with 5 hidden states; binomial response. [Figure: Power and FDP against signal amplitude, over 100 repetitions (true $F_X$).] $n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; statistics $Z_j = |\hat{\beta}_j(\hat{\lambda}_{\mathrm{CV}})|$, $W_j = Z_j - \tilde{Z}_j$.

  11. Robustness. Markov chain covariates with 5 hidden states; binomial response. [Figure: Power and FDP against signal amplitude, over 100 repetitions (estimated $F_X$).] $n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; statistics $Z_j = |\hat{\beta}_j(\hat{\lambda}_{\mathrm{CV}})|$, $W_j = Z_j - \tilde{Z}_j$.

  12. Simulations with synthetic HMM. HMM covariates with latent "clockwise" Markov chain; binomial response. [Figure: Power and FDP against signal amplitude, over 100 repetitions (true $F_X$).] $n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; statistics $Z_j = |\hat{\beta}_j(\hat{\lambda}_{\mathrm{CV}})|$, $W_j = Z_j - \tilde{Z}_j$.

  13. Robustness. HMM covariates with latent "clockwise" Markov chain; binomial response. [Figure: Power and FDP against signal amplitude, over 100 repetitions (estimated $F_X$).] $n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; statistics $Z_j = |\hat{\beta}_j(\hat{\lambda}_{\mathrm{CV}})|$, $W_j = Z_j - \tilde{Z}_j$.

  14. Out-of-sample parameter estimation. Inhomogeneous Markov chain covariates with 5 hidden states; binomial response. [Figure: Power and FDP against the number of unsupervised observations, over 100 repetitions ($F_X$ estimated from an independent dataset).] $n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; statistics $Z_j = |\hat{\beta}_j(\hat{\lambda}_{\mathrm{CV}})|$, $W_j = Z_j - \tilde{Z}_j$.
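Slides 10-14 all use the statistic $W_j = Z_j - \tilde{Z}_j$ with $Z_j = |\hat{\beta}_j(\hat{\lambda}_{\mathrm{CV}})|$. A minimal sketch of computing it for a binomial response with scikit-learn; the cross-validation grid and solver settings are my assumptions, not the slides':

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

def lasso_cv_statistic(X, X_tilde, y):
    """W_j = Z_j - Z~_j, where Z_j = |beta_hat_j(lambda_CV)| comes from an
    L1-penalized logistic regression fit on the augmented design [X, X~]."""
    p = X.shape[1]
    XX = np.hstack([X, X_tilde])
    fit = LogisticRegressionCV(penalty="l1", solver="saga", Cs=20,
                               cv=5, max_iter=5000).fit(XX, y)
    Z = np.abs(fit.coef_.ravel())   # first p entries: originals, last p: knockoffs
    return Z[:p] - Z[p:]
```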

  15.-19. Model-X knockoff variables (robust version), built up over five slides. Setting: i.i.d. samples from $P_{XY}$; the distribution $P_X$ of $X$ is only 'approximately' known; the distribution $P_{Y|X}$ of $Y \mid X$ is completely unknown. Knockoffs are constructed with respect to a user input $Q_X$ (Barber, C. and Samworth, '18): originals $X = (X_1, \ldots, X_p)$, knockoffs $\tilde{X} = (\tilde{X}_1, \ldots, \tilde{X}_p)$.
  (1) Pairwise exchangeability wrt $Q_X$: if $X \sim Q_X$, then $(X, \tilde{X})_{\mathrm{swap}(S)} \stackrel{d}{=} (X, \tilde{X})$; e.g. $(X_1, X_2, X_3, \tilde{X}_1, \tilde{X}_2, \tilde{X}_3)_{\mathrm{swap}(\{2,3\})} \stackrel{d}{=} (X_1, \tilde{X}_2, \tilde{X}_3, \tilde{X}_1, X_2, X_3)$.
  (2) Ignore $Y$ when constructing knockoffs: $\tilde{X} \perp\!\!\!\perp Y \mid X$.
  Only the conditionals $Q(X_j \mid X_{-j})$ are required, and these do not have to be compatible.
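The slides define knockoffs abstractly through exchangeability. As one concrete instance, here is a minimal sketch of the classical Gaussian construction, assuming the user input is $Q_X = N(\mu, \Sigma)$ and using the equicorrelated choice of $\mathrm{diag}(s)$; both choices are illustrative, not something the slides specify:

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, seed=0):
    """Sample knockoffs X~ for X ~ N(mu, Sigma): the joint (X, X~) has
    covariance [[Sigma, Sigma - D], [Sigma - D, Sigma]] with D = diag(s),
    which makes (X, X~) pairwise exchangeable under Q_X."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    sd = np.sqrt(np.diag(Sigma))
    corr = Sigma / np.outer(sd, sd)
    lam_min = max(np.linalg.eigvalsh(corr).min(), 0.0)
    s = min(2.0 * lam_min, 1.0) * sd**2     # equicorrelated s, covariance scale
    D = np.diag(s)
    Sinv_D = np.linalg.solve(Sigma, D)      # Sigma^{-1} D
    cond_mean = mu + (X - mu) @ (np.eye(p) - Sinv_D)
    cond_cov = 2 * D - D @ Sinv_D           # 2D - D Sigma^{-1} D
    cond_cov = (cond_cov + cond_cov.T) / 2  # symmetrize for numerical stability
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))
    return cond_mean + rng.standard_normal((n, p)) @ L.T
```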

  20.-21. FDR control. Select $\hat{S} = \{ j : W_j \geq \tau \}$, where
  $\tau = \min \{ t : \widehat{\mathrm{FDP}}(t) \leq q \}$ and $\widehat{\mathrm{FDP}}(t) = \dfrac{1 + |\{ j : W_j \leq -t \}|}{1 \vee |\{ j : W_j \geq t \}|}$.
  Theorem (Barber and C. ('15)): If the user input $Q_X$ is correct ($Q_X = P_X$), then knockoff+ satisfies
  $\mathbb{E} \left[ \dfrac{\#\,\text{false positives}}{\#\,\text{selections}} \right] \leq q.$
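The threshold above is directly computable from the $W_j$ alone. A minimal sketch of the knockoff+ selection rule (the "+" is the 1 in the numerator of $\widehat{\mathrm{FDP}}$):

```python
import numpy as np

def knockoff_plus_select(W, q=0.1):
    """S_hat = {j : W_j >= tau}, tau = min{t : FDP_hat(t) <= q}, with
    FDP_hat(t) = (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t})."""
    for t in np.sort(np.unique(np.abs(W[W != 0]))):   # candidate thresholds
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)                    # no feasible threshold
```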

  22. Robustness of knockoffs? Does exchangeability hold approximately when $Q_X \neq P_X$? [Figure: the signs of the $W_j$, ordered by $|W|$, drawn as a row of + and − coin flips.] If $P_X = Q_X$, the coins are unbiased and independent. Problem: if $P_X \neq Q_X$, the coins may be (slightly) biased and (slightly) dependent.

  23. KL divergence condition. The KL condition:
  $\widehat{\mathrm{KL}}_j := \sum_i \log \dfrac{P_j(X_{ij} \mid X_{i,-j}) \, Q_j(\tilde{X}_{ij} \mid X_{i,-j})}{Q_j(X_{ij} \mid X_{i,-j}) \, P_j(\tilde{X}_{ij} \mid X_{i,-j})} \leq \epsilon.$
  $\mathbb{E}[\widehat{\mathrm{KL}}_j]$ equals the KL divergence between the distributions of $(X_j, \tilde{X}_j, X_{-j}, \tilde{X}_{-j})$ and $(\tilde{X}_j, X_j, X_{-j}, \tilde{X}_{-j})$.
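A sketch of evaluating $\widehat{\mathrm{KL}}_j$ from data, assuming (hypothetically) that one can evaluate the conditional log-densities of both the true $P$ and the user input $Q$; the two callables below are stand-ins for that, not an API from the slides:

```python
import numpy as np

def kl_hat_j(j, X, X_tilde, log_p, log_q):
    """KL_hat_j = sum_i log[ P_j(X_ij | X_i,-j) Q_j(X~_ij | X_i,-j)
                           / Q_j(X_ij | X_i,-j) P_j(X~_ij | X_i,-j) ].
    log_p(j, v, X_mj) and log_q(j, v, X_mj) are hypothetical callables
    returning per-row conditional log-densities."""
    X_mj = np.delete(X, j, axis=1)     # the conditioning variables X_{i,-j}
    return float(np.sum(
        log_p(j, X[:, j], X_mj) + log_q(j, X_tilde[:, j], X_mj)
        - log_q(j, X[:, j], X_mj) - log_p(j, X_tilde[:, j], X_mj)))
```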

  24.-25. From KL condition to FDR control. Theorem (Barber, C. and Samworth (2018)): For any $\epsilon \geq 0$,
  $\mathbb{E} \left[ \dfrac{\#\,\{\text{false positives } j \text{ with } \widehat{\mathrm{KL}}_j \leq \epsilon \}}{\#\,\text{selections}} \right] \leq q \exp(\epsilon).$
  Corollary:
  $\mathrm{FDR} \leq \min_{\epsilon \geq 0} \left\{ q \exp(\epsilon) + \mathbb{P} \left( \max_{\text{null } j} \widehat{\mathrm{KL}}_j > \epsilon \right) \right\}.$
  This is information-theoretically optimal.
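For a sense of scale (illustrative numbers, not from the slides): with target $q = 0.1$ and $\epsilon = 0.1$, the corollary gives $\mathrm{FDR} \leq 0.1 \cdot e^{0.1} + \mathbb{P}(\max_{\text{null } j} \widehat{\mathrm{KL}}_j > 0.1) \approx 0.11 + \mathbb{P}(\cdot)$, so a small uniform error in the conditionals inflates the FDR target only mildly.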

  26. New directions

  27. ML-inspired knockoffs. Joint with S. Bates, Y. Romano, M. Sesia and J. Zhou. Knockoffs for graphical models; knockoffs via restricted Boltzmann machines; knockoffs via variational auto-encoders? knockoffs via generative adversarial networks?

  28. Improving power? Joint with Z. Ren and M. Sesia
