Sailing Through Data: Discoveries and Mirages
Emmanuel Candès, Stanford University
2018 Machine Learning Summer School, Buenos Aires, June 2018
Robustness
[Figure: FDR (left) and power (right) vs. relative Frobenius norm error of the covariance estimate used to build the knockoffs. Methods compared: exact covariance, graphical lasso, and empirical covariance computed from 50%, 62.5%, 75%, 87.5%, and 100% of the samples. Covariates are AR(1) with autocorrelation coefficient 0.3; $n = 800$, $p = 1500$; target FDR is 10%; $Y \mid X$ follows a logistic model with 50 nonzero coefficients.]
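As an aside, here is a minimal sketch of the data-generating process behind this figure, assuming unit-variance stationary AR(1) columns; the signal amplitude $8/\sqrt{n}$ is an illustrative choice, not taken from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, rho, k = 800, 1500, 0.3, 50   # dimensions and sparsity from the slide

# Stationary AR(1) covariates with autocorrelation 0.3 across coordinates
X = np.empty((n, p))
X[:, 0] = rng.standard_normal(n)
for j in range(1, p):
    X[:, j] = rho * X[:, j - 1] + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Logistic model for Y | X with k nonzero coefficients
# (amplitude 8/sqrt(n) is an illustrative choice, not from the slide)
beta = np.zeros(p)
support = rng.choice(p, size=k, replace=False)
beta[support] = rng.choice([-1.0, 1.0], size=k) * 8 / np.sqrt(n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta)))
```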
Simulations with synthetic Markov chain

Markov chain covariates with 5 hidden states; binomial response.

[Figure: power and FDP over 100 repetitions vs. signal amplitude (true $F_X$).]

$n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; statistics $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.
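The statistic on this slide, $W_j = Z_j - \tilde Z_j$ with $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, can be sketched as follows for a binary response. This is a minimal illustration using scikit-learn's cross-validated $\ell_1$-penalized logistic regression; the slides do not specify an implementation, and the function name is hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

def lasso_coef_diff(X, X_tilde, y):
    """W_j = Z_j - Z~_j with Z_j = |beta_j(lambda_CV)| from an L1-penalized
    logistic regression fit on the augmented design [X, X~]."""
    p = X.shape[1]
    fit = LogisticRegressionCV(
        Cs=10, cv=5, penalty="l1", solver="liblinear"
    ).fit(np.hstack([X, X_tilde]), y)
    Z = np.abs(fit.coef_.ravel())
    return Z[:p] - Z[p:]
```

Any statistic of this kind is valid as long as swapping the columns $X_j$ and $\tilde X_j$ flips the sign of $W_j$.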
Robustness

Markov chain covariates with 5 hidden states; binomial response.

[Figure: power and FDP over 100 repetitions vs. signal amplitude (estimated $F_X$).]

$n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.
Simulations with synthetic HMM

HMM covariates with latent "clockwise" Markov chain; binomial response.

[Figure: power and FDP over 100 repetitions vs. signal amplitude (true $F_X$).]

$n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.
Robustness

HMM covariates with latent "clockwise" Markov chain; binomial response.

[Figure: power and FDP over 100 repetitions vs. signal amplitude (estimated $F_X$).]

$n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.
Out-of-sample parameter estimation

Inhomogeneous Markov chain covariates with 5 hidden states; binomial response.

[Figure: power and FDP over 100 repetitions vs. number of unsupervised observations (estimated $F_X$ from an independent dataset).]

$n = 1000$, $p = 1000$, target FDR $\alpha = 0.1$; $Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$.
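The estimation step here fits $F_X$ to unlabeled covariate data alone; the figure's x-axis is the number of such unlabeled observations. As a toy illustration of that idea (a deliberate simplification: the slide's chain is inhomogeneous with hidden states, typically fitted by EM, while this sketch assumes a homogeneous chain with directly observed states coded $0, \ldots, K-1$; the function name is hypothetical):

```python
import numpy as np

def estimate_transition_matrix(X_unsup, n_states):
    """Estimate a homogeneous Markov transition matrix from an independent,
    unlabeled dataset (one chain per row, states coded 0..n_states-1)."""
    counts = np.zeros((n_states, n_states))
    for row in X_unsup:
        np.add.at(counts, (row[:-1], row[1:]), 1)  # tally observed transitions
    counts += 1e-3  # light smoothing: every transition keeps positive mass
    return counts / counts.sum(axis=1, keepdims=True)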
Model-X knockoff variables (robust version)

- i.i.d. samples from $P_{XY}$
- Distribution $P_X$ of $X$ only 'approximately' known
- Distribution $P_{Y|X}$ of $Y \mid X$ completely unknown

Knockoffs w.r.t. user input $Q_X$ (Barber, C. and Samworth, '18):
originals $X = (X_1, \ldots, X_p)$, knockoffs $\tilde X = (\tilde X_1, \ldots, \tilde X_p)$.

(1) Pairwise exchangeability w.r.t. $Q_X$: if $X \sim Q_X$, then
$$(X, \tilde X)_{\mathrm{swap}(S)} \stackrel{d}{=} (X, \tilde X)$$
e.g. $(X_1, X_2, X_3, \tilde X_1, \tilde X_2, \tilde X_3)_{\mathrm{swap}(\{2,3\})} \stackrel{d}{=} (X_1, \tilde X_2, \tilde X_3, \tilde X_1, X_2, X_3)$.

(2) Ignore $Y$ when constructing knockoffs: $\tilde X \perp\!\!\!\perp Y \mid X$.

Only the conditionals $Q(X_j \mid X_{-j})$ are required, and they do not have to be mutually compatible.
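For concreteness, here is a minimal sketch of exact knockoff construction in the Gaussian special case $Q_X = N(0, \Sigma)$, with the equicorrelated choice of $s$ from Candès, Fan, Janson and Lv (2018); it assumes $\Sigma$ is a known correlation matrix, and the function name is illustrative:

```python
import numpy as np

def gaussian_knockoffs(X, Sigma, rng):
    """Exact knockoffs for rows of X ~ N(0, Sigma), Sigma a correlation matrix:
    X~ | X ~ N(X - X Sigma^{-1} S, 2S - S Sigma^{-1} S), equicorrelated S = s*I
    with s = min(1, 2*lambda_min(Sigma))."""
    p = Sigma.shape[0]
    s = min(1.0, 2.0 * np.linalg.eigvalsh(Sigma).min()) * 0.999  # shrink for stability
    Sigma_inv_S = s * np.linalg.inv(Sigma)       # Sigma^{-1} S since S = s*I
    mu = X - X @ Sigma_inv_S                     # conditional mean (Sigma^{-1} symmetric)
    V = 2.0 * s * np.eye(p) - s * Sigma_inv_S    # conditional covariance 2S - S Sigma^{-1} S
    L = np.linalg.cholesky(V)
    return mu + rng.standard_normal(X.shape) @ L.T
```

With $Q_X$ only approximately known, one would plug an estimate $\hat\Sigma$ (e.g., the graphical lasso fits from the robustness figures) into the same construction.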
FDR control

Select $\hat S = \{ j : W_j \ge \tau \}$, where
$$\tau = \min\Big\{ t : \underbrace{\frac{1 + |\{ j : W_j \le -t \}|}{1 \vee |\{ j : W_j \ge t \}|}}_{\widehat{\mathrm{FDP}}(t)} \le q \Big\}$$

Theorem (Barber and C. ('15)). If the user input $Q_X$ is correct ($Q_X = P_X$), then for knockoff+
$$\mathbb{E}\left[ \frac{\#\,\text{false positives}}{\#\,\text{selections}} \right] \le q.$$
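A direct transcription of this threshold rule (a minimal sketch; `knockoff_plus_threshold` is an illustrative name, not a published API):

```python
import numpy as np

def knockoff_plus_threshold(W, q):
    """Knockoff+ data-dependent threshold: smallest t among the |W_j| with
    FDP^(t) = (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):   # candidate thresholds, ascending
        fdp = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp <= q:
            return t
    return np.inf  # no threshold meets the target: select nothing

# usage: selected = np.where(W >= knockoff_plus_threshold(W, q=0.10))[0]
```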
Robustness of knockoffs?

Does exchangeability hold approximately when $Q_X \ne P_X$?

[Figure: signs of the $W_j$, ordered by $|W_j|$, displayed as a sequence of $+$/$-$ coin flips.]

If $P_X = Q_X$, the null coins are unbiased and independent. Problem: if $P_X \ne Q_X$, the coins may be slightly biased and slightly dependent.
KL divergence condition

The KL condition:
$$\widehat{\mathrm{KL}}_j := \sum_i \log \left( \frac{P_j(X_{ij} \mid X_{i,-j})\, Q_j(\tilde X_{ij} \mid X_{i,-j})}{Q_j(X_{ij} \mid X_{i,-j})\, P_j(\tilde X_{ij} \mid X_{i,-j})} \right) \le \epsilon$$

$\mathbb{E}[\widehat{\mathrm{KL}}_j]$ is the KL divergence between the distributions of $(X_j, \tilde X_j, X_{-j}, \tilde X_{-j})$ and $(\tilde X_j, X_j, X_{-j}, \tilde X_{-j})$.
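A direct transcription of $\widehat{\mathrm{KL}}_j$ (a sketch; the function and argument names are hypothetical, and the caller supplies per-sample log conditional densities under the true $P_j$ and the user's $Q_j$):

```python
import numpy as np

def khat_kl_j(logP_X, logP_Xt, logQ_X, logQ_Xt):
    """KL^_j from per-sample log conditional densities:
    logP_X[i]  = log P_j(X_ij  | X_i,-j),  logP_Xt[i] = log P_j(X~_ij | X_i,-j),
    logQ_X[i]  = log Q_j(X_ij  | X_i,-j),  logQ_Xt[i] = log Q_j(X~_ij | X_i,-j)."""
    return np.sum((logP_X - logQ_X) - (logP_Xt - logQ_Xt))
```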
From KL condition to FDR control

Theorem (Barber, C. and Samworth (2018)). For any $\epsilon \ge 0$,
$$\mathbb{E}\left[ \frac{\#\{\text{false positives } j \text{ with } \widehat{\mathrm{KL}}_j \le \epsilon\}}{\#\,\text{selections}} \right] \le q \exp(\epsilon).$$

Corollary.
$$\mathrm{FDR} \le \min_{\epsilon \ge 0} \left\{ q \exp(\epsilon) + \mathbb{P}\Big( \max_{\text{null } j} \widehat{\mathrm{KL}}_j > \epsilon \Big) \right\}$$

This bound is information-theoretically optimal.
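To gauge the bound numerically: with target $q = 0.10$ and $\epsilon = 0.05$, the first term is $0.10 \times e^{0.05} \approx 0.105$, so as long as the estimated conditionals keep every null $\widehat{\mathrm{KL}}_j$ below $0.05$ with high probability, FDR control degrades only marginally.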
New directions
ML-inspired knockoffs
Joint with S. Bates, Y. Romano, M. Sesia and J. Zhou

- Knockoffs for graphical models
- Knockoffs via restricted Boltzmann machines
- Knockoffs via variational auto-encoders?
- Knockoffs via generative adversarial networks?
Improving power? Joint with Z. Ren and M. Sesia