Sailing Through Data: Discoveries and Mirages
Emmanuel Candès, Stanford University
2018 Machine Learning Summer School, Buenos Aires, June 2018
Robustness
[Figure: FDR (left) and power (right) vs. relative Frobenius norm error of the covariance estimate used to build the knockoffs. Methods compared: exact covariance, graphical lasso, and empirical covariance computed from 50%, 62.5%, 75%, 87.5%, and 100% of the samples. Covariates are AR(1) with autocorrelation coefficient 0.3; $n = 800$, $p = 1500$; target FDR is 10%; $Y \mid X$ follows a logistic model with 50 nonzero coefficients.]
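As an aside, here is a minimal sketch of the data-generating process behind this figure, assuming unit-variance stationary AR(1) columns; the signal amplitude $8/\sqrt{n}$ is an illustrative choice, not taken from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, rho, k = 800, 1500, 0.3, 50   # dimensions and sparsity from the slide

# Stationary AR(1) covariates with autocorrelation 0.3 across coordinates
X = np.empty((n, p))
X[:, 0] = rng.standard_normal(n)
for j in range(1, p):
    X[:, j] = rho * X[:, j - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Logistic model for Y | X with k nonzero coefficients
# (amplitude 8/sqrt(n) is an illustrative choice, not from the slide)
beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta[support] = rng.choice([-1.0, 1.0], size=k) * 8 / np.sqrt(n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))
```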
Simulations with synthetic Markov chain

Markov chain covariates with 5 hidden states; binomial response.

[Figure: power and FDP over 100 repetitions vs. signal amplitude (true $F_X$).]

$n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; statistics $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.
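The statistic on this slide, $W_j = Z_j - \tilde Z_j$ with $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, can be sketched as follows for a binary response. This is a minimal illustration using scikit-learn's cross-validated $\ell_1$-penalized logistic regression; the slides do not specify an implementation, and the function name is hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

def lasso_coef_diff(X, X_tilde, y):
    """W_j = Z_j - Z~_j with Z_j = |beta_j(lambda_CV)| from an L1-penalized
    logistic regression fit on the augmented design [X, X~]."""
    p = X.shape[1]
    fit = LogisticRegressionCV(
        Cs=10, cv=5, penalty="l1", solver="liblinear"
    ).fit(np.hstack([X, X_tilde]), y)
    Z = np.abs(fit.coef_.ravel())
    return Z[:p] - Z[p:]
```

Any statistic of this kind is valid as long as swapping the columns $X_j$ and $\tilde X_j$ flips the sign of $W_j$.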
Robustness

Markov chain covariates with 5 hidden states; binomial response.

[Figure: power and FDP over 100 repetitions vs. signal amplitude (estimated $F_X$).]

$n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.
Simulations with synthetic HMM

HMM covariates with latent "clockwise" Markov chain; binomial response.

[Figure: power and FDP over 100 repetitions vs. signal amplitude (true $F_X$).]

$n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.
Robustness

HMM covariates with latent "clockwise" Markov chain; binomial response.

[Figure: power and FDP over 100 repetitions vs. signal amplitude (estimated $F_X$).]

$n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.
Out-of-sample parameter estimation

Inhomogeneous Markov chain covariates with 5 hidden states; binomial response.

[Figure: power and FDP over 100 repetitions vs. number of unsupervised observations (estimated $F_X$ from an independent dataset).]

$n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.
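The estimation step here fits $F_X$ to unlabeled covariate data alone; the figure's x-axis is the number of such unlabeled observations. As a toy illustration of that idea (a deliberate simplification: the slide's chain is inhomogeneous with hidden states, typically fitted by EM, while this sketch assumes a homogeneous chain with directly observed states coded $0, \ldots, K-1$; the function name is hypothetical):

```python
import numpy as np

def estimate_transition_matrix(X_unsup, n_states):
    """Estimate a homogeneous Markov transition matrix from an independent,
    unlabeled dataset (one chain per row, states coded 0..n_states-1)."""
    counts = np.zeros((n_states, n_states))
    for row in X_unsup:
        np.add.at(counts, (row[:-1], row[1:]), 1)  # tally observed transitions
    counts += 1e-3  # light smoothing: every transition keeps positive mass
    return counts / counts.sum(axis=1, keepdims=True)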
Model-X knockoff variables (robust version)

- i.i.d. samples from $P_{XY}$
- Distribution $P_X$ of $X$ only 'approximately' known
- Distribution $P_{Y|X}$ of $Y \mid X$ completely unknown

Knockoffs w.r.t. user input $Q_X$ (Barber, C. and Samworth, '18):
originals $X = (X_1, \ldots, X_p)$, knockoffs $\tilde X = (\tilde X_1, \ldots, \tilde X_p)$.

(1) Pairwise exchangeability w.r.t. $Q_X$: if $X \sim Q_X$, then
$$(X, \tilde X)_{\mathrm{swap}(S)} \stackrel{d}{=} (X, \tilde X)$$
e.g. $(X_1, X_2, X_3, \tilde X_1, \tilde X_2, \tilde X_3)_{\mathrm{swap}(\{2,3\})} \stackrel{d}{=} (X_1, \tilde X_2, \tilde X_3, \tilde X_1, X_2, X_3)$.

(2) Ignore $Y$ when constructing knockoffs: $\tilde X \perp\!\!\!\perp Y \mid X$.

Only the conditionals $Q(X_j \mid X_{-j})$ are required, and they do not have to be mutually compatible.
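For concreteness, here is a minimal sketch of exact knockoff construction in the Gaussian special case $Q_X = N(0, \Sigma)$, with the equicorrelated choice of $s$ from Candès, Fan, Janson and Lv (2018); it assumes $\Sigma$ is a known correlation matrix, and the function name is illustrative:

```python
import numpy as np

def gaussian_knockoffs(X, Sigma, rng):
    """Exact knockoffs for rows of X ~ N(0, Sigma), Sigma a correlation matrix:
    X~ | X ~ N(X - X Sigma^{-1} S, 2S - S Sigma^{-1} S), equicorrelated S = s*I
    with s = min(1, 2*lambda_min(Sigma))."""
    p = Sigma.shape[0]
    s = min(1.0, 2.0 * np.linalg.eigvalsh(Sigma).min()) * 0.999  # shrink for stability
    Sigma_inv_S = s * np.linalg.inv(Sigma)       # Sigma^{-1} S since S = s*I
    mu = X - X @ Sigma_inv_S                     # conditional mean (Sigma^{-1} symmetric)
    V = 2.0 * s * np.eye(p) - s * Sigma_inv_S    # conditional covariance 2S - S Sigma^{-1} S
    L = np.linalg.cholesky(V)
    return mu + rng.standard_normal(X.shape) @ L.T
```

With $Q_X$ only approximately known, one would plug an estimate $\hat\Sigma$ (e.g., the graphical lasso fits from the robustness figures) into the same construction.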
FDR control

Select $\hat S = \{ j : W_j \ge \tau \}$, where
$$\tau = \min\Big\{ t : \underbrace{\frac{1 + |\{ j : W_j \le -t \}|}{1 \vee |\{ j : W_j \ge t \}|}}_{\widehat{\mathrm{FDP}}(t)} \le q \Big\}$$

Theorem (Barber and C. ('15)). If the user input $Q_X$ is correct ($Q_X = P_X$), then for knockoff+
$$\mathbb{E}\left[ \frac{\#\,\text{false positives}}{\#\,\text{selections}} \right] \le q.$$
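A direct transcription of this threshold rule (a minimal sketch; `knockoff_plus_threshold` is an illustrative name, not a published API):

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Knockoff+ data-dependent threshold: smallest t among the |W_j| with
    FDP^(t) = (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):   # candidate thresholds, ascending
        fdp = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp <= q:
            return t
    return np.inf  # no threshold meets the target: select nothing

# usage: selected = np.where(W >= knockoff_plus_threshold(W, q=0.10))[0]
```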
Robustness of knockoffs?

Does exchangeability hold approximately when $Q_X \ne P_X$?

[Figure: signs of the $W_j$, ordered by $|W_j|$, displayed as a sequence of $+$/$-$ coin flips.]

If $P_X = Q_X$, the null coins are unbiased and independent. Problem: if $P_X \ne Q_X$, the coins may be slightly biased and slightly dependent.
KL divergence condition

The KL condition:
$$\widehat{\mathrm{KL}}_j := \sum_i \log \left( \frac{P_j(X_{ij} \mid X_{i,-j})\, Q_j(\tilde X_{ij} \mid X_{i,-j})}{Q_j(X_{ij} \mid X_{i,-j})\, P_j(\tilde X_{ij} \mid X_{i,-j})} \right) \le \epsilon$$

$\mathbb{E}[\widehat{\mathrm{KL}}_j]$ is the KL divergence between the distributions of $(X_j, \tilde X_j, X_{-j}, \tilde X_{-j})$ and $(\tilde X_j, X_j, X_{-j}, \tilde X_{-j})$.
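A direct transcription of $\widehat{\mathrm{KL}}_j$ (a sketch; the function and argument names are hypothetical, and the caller supplies per-sample log conditional densities under the true $P_j$ and the user's $Q_j$):

```python
import numpy as np

def khat_kl_j(logP_X, logP_Xt, logQ_X, logQ_Xt):
    """KL^_j from per-sample log conditional densities:
    logP_X[i]  = log P_j(X_ij  | X_i,-j),  logP_Xt[i] = log P_j(X~_ij | X_i,-j),
    logQ_X[i]  = log Q_j(X_ij  | X_i,-j),  logQ_Xt[i] = log Q_j(X~_ij | X_i,-j)."""
    return np.sum((logP_X - logQ_X) - (logP_Xt - logQ_Xt))
```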
From KL condition to FDR control

Theorem (Barber, C. and Samworth (2018)). For any $\epsilon \ge 0$,
$$\mathbb{E}\left[ \frac{\#\{\text{false positives } j \text{ with } \widehat{\mathrm{KL}}_j \le \epsilon\}}{\#\,\text{selections}} \right] \le q \exp(\epsilon).$$

Corollary.
$$\mathrm{FDR} \le \min_{\epsilon \ge 0} \left\{ q \exp(\epsilon) + \mathbb{P}\Big( \max_{\text{null } j} \widehat{\mathrm{KL}}_j > \epsilon \Big) \right\}$$

This bound is information-theoretically optimal.
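To gauge the bound numerically: with target $q = 0.10$ and $\epsilon = 0.05$, the first term is $0.10 \times e^{0.05} \approx 0.105$, so as long as the estimated conditionals keep every null $\widehat{\mathrm{KL}}_j$ below $0.05$ with high probability, FDR control degrades only marginally.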
New directions
ML-inspired knockoffs
Joint with S. Bates, Y. Romano, M. Sesia and J. Zhou

- Knockoffs for graphical models
- Knockoffs via restricted Boltzmann machines
- Knockoffs via variational auto-encoders?
- Knockoffs via generative adversarial networks?
Improving power? Joint with Z. Ren and M. Sesia