
Sailing Through Data: Discoveries and Mirages
Emmanuel Candès, Stanford University
2018 Machine Learning Summer School, Buenos Aires, June 2018

Controlled variable selection
[Figure: Manhattan plot of -log10(P) association scores (scale 0 to 15) for Crohn's disease]


1. Knockoffs as negative controls

[Figure: feature importance scores for 1000 variables, with original features plotted alongside their knockoff copies, which serve as negative controls]

2–3. Exchangeability of feature importance statistics

Knockoff-agnostic feature importance scores:
$(Z_1, \ldots, Z_p, \tilde{Z}_1, \ldots, \tilde{Z}_p) = z([\mathbf{X}, \tilde{\mathbf{X}}], \mathbf{y})$,
where the first block scores the original features and the second block their knockoffs.

[Figure: original vs. knockoff importance scores across 1000 variables]

This lecture: we can construct knockoff features such that
- $j$ null $\Rightarrow (Z_j, \tilde{Z}_j) \overset{d}{=} (\tilde{Z}_j, Z_j)$
- more generally, $T$ a subset of nulls $\Rightarrow (Z, \tilde{Z})_{\mathrm{swap}(T)} \overset{d}{=} (Z, \tilde{Z})$

4. Knockoffs-adjusted scores

Adjusted scores $W_j$ with a flip-sign property: an ordering of the variables plus 1-bit p-values.

Combine $Z_j$ and $\tilde{Z}_j$ into a single (knockoff) score $W_j = w_j(Z_j, \tilde{Z}_j)$ satisfying
$w_j(\tilde{Z}_j, Z_j) = -w_j(Z_j, \tilde{Z}_j)$,
e.g. $W_j = Z_j - \tilde{Z}_j$, or $W_j = (Z_j \vee \tilde{Z}_j) \cdot \begin{cases} +1, & Z_j > \tilde{Z}_j \\ -1, & Z_j \le \tilde{Z}_j \end{cases}$

$\Rightarrow$ Conditional on $|W|$, the signs of the null $W_j$'s are i.i.d. coin flips.

[Diagram: null $W_j$'s fall symmetrically on the + and - sides of 0 along the $|W|$ axis]
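To make the two example statistics concrete, here is a minimal Python sketch (my own illustration, not code from the lecture); `Z` and `Z_tilde` are assumed to be arrays of importance scores for the original features and their knockoffs.

```python
import numpy as np

def difference_stat(Z, Z_tilde):
    """W_j = Z_j - Z_tilde_j: flips sign if Z_j and Z_tilde_j are swapped."""
    return np.asarray(Z) - np.asarray(Z_tilde)

def signed_max_stat(Z, Z_tilde):
    """W_j = max(Z_j, Z_tilde_j), signed +1 if Z_j > Z_tilde_j and -1 otherwise."""
    Z, Z_tilde = np.asarray(Z), np.asarray(Z_tilde)
    return np.maximum(Z, Z_tilde) * np.where(Z > Z_tilde, 1.0, -1.0)
```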

5. Selection by sequential testing

$\mathcal{S}^+(t) = \{j : W_j \ge t\}$, $\mathcal{S}^-(t) = \{j : W_j \le -t\}$, and
$\widehat{\mathrm{FDP}}(t) = \dfrac{1 + |\mathcal{S}^-(t)|}{1 \vee |\mathcal{S}^+(t)|}$.

[Diagram: a threshold $t$ sweeps along the $|W|$ axis; the $-$ signs beyond $t$ estimate the number of false $+$ signs beyond $t$]

Theorem (Barber and C. ('15)). Select $\mathcal{S}^+(\tau)$ with $\tau = \min\{t : \widehat{\mathrm{FDP}}(t) \le q\}$. Then
- Knockoff:  $\mathbb{E}\!\left[\dfrac{\#\,\text{false positives}}{\#\,\text{selections} + q^{-1}}\right] \le q$
- Knockoff+: $\mathbb{E}\!\left[\dfrac{\#\,\text{false positives}}{\#\,\text{selections}}\right] \le q$

(Knockoff+ uses the estimate $\widehat{\mathrm{FDP}}$ exactly as written; the Knockoff variant drops the "+1" in the numerator.)
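And a minimal Python sketch of the data-dependent threshold and selection rule (again my own illustration, not the lecture's code); `offset=1` gives the Knockoff+ estimate above, `offset=0` the Knockoff variant.

```python
import numpy as np

def knockoff_threshold(W, q=0.10, offset=1):
    """Smallest t with (offset + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    W = np.asarray(W)
    candidates = np.sort(np.abs(W[W != 0]))
    for t in candidates:
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf   # nothing can be selected at level q

# Toy usage: 50 signal-like statistics plus 950 roughly symmetric nulls.
rng = np.random.default_rng(0)
W = np.concatenate([np.abs(rng.normal(3.0, 1.0, 50)), rng.normal(0.0, 1.0, 950)])
tau = knockoff_threshold(W, q=0.10)
selected = np.flatnonzero(W >= tau)
print(f"selected {len(selected)} variables at nominal FDR q = 0.10")
```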

6. Some Pretty Math... (I Think): Proof Sketch of FDR Control

7–11. Why does all this work?

$\tau = \min\left\{t : \dfrac{1 + |\mathcal{S}^-(t)|}{|\mathcal{S}^+(t)| \vee 1} \le q\right\}$, with $\mathcal{S}^+(t) = \{j : W_j \ge t\}$ and $\mathcal{S}^-(t) = \{j : W_j \le -t\}$.

$\mathrm{FDP}(\tau) = \dfrac{\#\{j \text{ null} : j \in \mathcal{S}^+(\tau)\}}{\#\{j : j \in \mathcal{S}^+(\tau)\} \vee 1} = \dfrac{\#\{j \text{ null} : j \in \mathcal{S}^+(\tau)\}}{1 + \#\{j \text{ null} : j \in \mathcal{S}^-(\tau)\}} \cdot \dfrac{1 + \#\{j \text{ null} : j \in \mathcal{S}^-(\tau)\}}{\#\{j : j \in \mathcal{S}^+(\tau)\} \vee 1}$

The second factor is at most $q$: every null in $\mathcal{S}^-(\tau)$ is counted in $|\mathcal{S}^-(\tau)|$, so the factor is bounded by $\widehat{\mathrm{FDP}}(\tau) \le q$. Writing $V^+(\tau) = \#\{j \text{ null} : j \in \mathcal{S}^+(\tau)\}$ and $V^-(\tau) = \#\{j \text{ null} : j \in \mathcal{S}^-(\tau)\}$,

$\mathrm{FDP}(\tau) \le q \cdot \dfrac{V^+(\tau)}{1 + V^-(\tau)}$.

To show: $\mathbb{E}\left[\dfrac{V^+(\tau)}{1 + V^-(\tau)}\right] \le 1$.

12–15. Martingales

$\dfrac{V^+(t)}{1 + V^-(t)}$ is a (super)martingale with respect to $\mathcal{F}_t = \sigma\big(\{V^\pm(u)\}_{u \le t}\big)$.

[Diagram: null signs along the $|W|$ axis with two thresholds $s < t$; $V^+(t)$ and $V^-(t)$ count the null $+$'s and $-$'s beyond $t$]

With $V^+(s) + V^-(s) = m$: conditioned on $V^+(s) + V^-(s)$, the count $V^+(s)$ is hypergeometric, and

$\mathbb{E}\left[\dfrac{V^+(s)}{1 + V^-(s)} \,\middle|\, V^\pm(t),\, V^+(s) + V^-(s)\right] \le \dfrac{V^+(t)}{1 + V^-(t)}$.

16. Optional stopping theorem

$\mathrm{FDR} \le q\, \mathbb{E}\left[\dfrac{V^+(\tau)}{1 + V^-(\tau)}\right] \le q\, \mathbb{E}\left[\dfrac{V^+(0)}{1 + V^-(0)}\right] \le q$,

since $V^+(0) \sim \mathrm{Bin}(\#\text{nulls}, 1/2)$ and $V^-(0) = \#\text{nulls} - V^+(0)$.
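A quick numerical check of the final bound (my own addition, not from the slides): with $V \sim \mathrm{Bin}(m, 1/2)$ playing the role of $V^+(0)$ and $m - V$ the role of $V^-(0)$, the expectation stays below 1 for every $m$.

```python
import numpy as np
from scipy.stats import binom

# If V ~ Bin(m, 1/2) and V^- = m - V, check that E[ V / (1 + m - V) ] <= 1.
for m in [1, 2, 5, 10, 50, 200]:
    k = np.arange(m + 1)
    expectation = np.sum(binom.pmf(k, m, 0.5) * k / (1 + m - k))
    print(f"m = {m:4d}:  E[V/(1+m-V)] = {expectation:.6f}")   # numerically 1 - 2**(-m)
```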

17. Knockoffs for Random Features (joint with Fan, Janson & Lv)

18–21. Variable selection in arbitrary models

Random pair $(X, Y)$, perhaps with thousands or millions of covariates: $p(Y \mid X)$ depends on $X$ through which variables?

Working definition of null variables: $j \in \mathcal{H}_0$ (null) iff $Y \perp\!\!\!\perp X_j \mid X_{-j}$.

Local Markov property $\Rightarrow$ the non-nulls are the smallest subset $\mathcal{S}$ (the Markov blanket) such that
$Y \perp\!\!\!\perp \{X_j\}_{j \in \mathcal{S}^c} \mid \{X_j\}_{j \in \mathcal{S}}$.

Logistic model: $\mathbb{P}(Y = 0 \mid X) = \dfrac{1}{1 + e^{X^\top \beta}}$. If the variables $X_{1:p}$ are not perfectly dependent, then $j \in \mathcal{H}_0 \iff \beta_j = 0$.

22–25. Knockoff features (random X)

Setting: i.i.d. samples from $p(X, Y)$; the distribution of $X$ is known, while the distribution of $Y \mid X$ (the likelihood) is completely unknown.

Originals $X = (X_1, \ldots, X_p)$, knockoffs $\tilde{X} = (\tilde{X}_1, \ldots, \tilde{X}_p)$, obeying:

(1) Pairwise exchangeability: $(X, \tilde{X})_{\mathrm{swap}(S)} \overset{d}{=} (X, \tilde{X})$ for any subset $S$,
e.g. for $S = \{2, 3\}$, $(X, \tilde{X})_{\mathrm{swap}(\{2,3\})} = (X_1, \tilde{X}_2, \tilde{X}_3, \tilde{X}_1, X_2, X_3) \overset{d}{=} (X_1, X_2, X_3, \tilde{X}_1, \tilde{X}_2, \tilde{X}_3)$.

(2) $\tilde{X} \perp\!\!\!\perp Y \mid X$ (ignore $Y$ when constructing knockoffs).

26–27. Exchangeability of feature importance statistics

Theorem (C., Fan, Janson & Lv ('16)). For knockoff-agnostic scores and any subset $T$ of nulls,
$(Z, \tilde{Z})_{\mathrm{swap}(T)} \overset{d}{=} (Z, \tilde{Z})$.

This holds no matter the relationship between $Y$ and $X$, and it holds conditionally on $Y$.

$\Rightarrow$ FDR control (conditional on $Y$), no matter the relationship between $X$ and $Y$.

28–33. Knockoffs for Gaussian features

Swapping any subset of original and knockoff features must leave the joint distribution invariant,
e.g. for $T = \{2, 3\}$: $(X_1, \tilde{X}_2, \tilde{X}_3, \tilde{X}_1, X_2, X_3) \overset{d}{=} (X_1, X_2, X_3, \tilde{X}_1, \tilde{X}_2, \tilde{X}_3)$. In particular, $\tilde{X} \overset{d}{=} X$.

Suppose $X \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. Possible solution:
$(X, \tilde{X}) \sim \mathcal{N}\!\left( \begin{bmatrix} \boldsymbol{\mu} \\ \boldsymbol{\mu} \end{bmatrix},\ \begin{bmatrix} \boldsymbol{\Sigma} & \boldsymbol{\Sigma} - \mathrm{diag}\{s\} \\ \boldsymbol{\Sigma} - \mathrm{diag}\{s\} & \boldsymbol{\Sigma} \end{bmatrix} \right)$,
with $s$ chosen so that the joint covariance is positive semidefinite.

Given $X$, sample $\tilde{X}$ from the conditional law $\tilde{X} \mid X$ (Gaussian regression formula).
This is different from the knockoff features for fixed X!
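A minimal Python sketch of this Gaussian sampler (my own illustration under the setup above, not code from the lecture); `mu`, `Sigma`, and an admissible `s` are assumed given, and the conditional mean and covariance follow from the standard Gaussian conditioning formula applied to the joint distribution displayed above.

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, s, rng=None):
    """Sample model-X knockoffs for the rows of X, assuming X ~ N(mu, Sigma).

    s is a nonnegative vector chosen so that the joint covariance
    [[Sigma, Sigma - diag(s)], [Sigma - diag(s), Sigma]] is positive semidefinite.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)          # Sigma^{-1} diag(s)
    # Gaussian conditioning: X~ | X=x is N(x - D Sigma^{-1}(x - mu), 2D - D Sigma^{-1} D)
    cond_mean = X - (X - mu) @ Sigma_inv_D
    cond_cov = 2 * D - D @ Sigma_inv_D
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))   # small jitter for stability
    return cond_mean + rng.standard_normal((n, p)) @ L.T

# Toy usage with an AR(1) correlation matrix and a slightly conservative
# version of the usual "equicorrelated" choice of s.
p, rho = 10, 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
s = np.full(p, 0.99 * min(1.0, 2 * np.linalg.eigvalsh(Sigma).min()))
X = np.random.default_rng(0).multivariate_normal(np.zeros(p), Sigma, size=200)
X_tilde = gaussian_knockoffs(X, np.zeros(p), Sigma, s)
```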

34. Knockoffs inference with random features

Pros:
- Holds in finite samples
- No matter the dependence between Y and X
- No matter the dimensionality
- No parameters, no p-values needed

Cons:
- Need to know the distribution of the covariates

35–38. Relationship with the classical setup

Classical:
- Observations of X are fixed; inference is conditional on the observed values
- Strong model linking Y and X
- Useful inference even if the model is inexact

MF Knockoffs:
- Observations of X are random (1)
- Model free (2)
- Useful inference even if the model is inexact (3)

(1) Often appropriate in 'big' data applications, e.g. SNPs of randomly sampled subjects
(2) Shifts the 'burden' of knowledge
(3) More later

39. Shift in the burden of knowledge

When are our assumptions useful?
- When we have large amounts of unsupervised data (e.g. economic studies with the same covariate information but different responses)
- When we have more prior information about the covariates than about their relationship with a response (e.g. GWAS)
- When we control the distribution of X (experimental crosses in genetics, gene knockout experiments, ...)

40. Obstacles to obtaining p-values

$Y \mid X \sim \mathrm{Bernoulli}\big(\mathrm{logit}^{-1}(X^\top \beta)\big)$

[Figure: histograms of null logistic regression p-values with n = 500 and p = 200; left panel: global null, AR(1) design; right panel: 20 nonzero coefficients, AR(1) design. In both, the null p-values pile up near zero instead of being uniform.]
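A rough Monte Carlo illustration of this miscalibration (my own sketch, not the lecture's code: it uses Wald p-values from statsmodels and a hypothetical AR(1) autocorrelation of 0.5, whereas the slide does not specify the test or the correlation level; occasional convergence warnings from the solver are possible).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p, reps, rho = 500, 200, 20, 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))   # AR(1) design
L = np.linalg.cholesky(Sigma)

pvals = []
for _ in range(reps):
    X = rng.standard_normal((n, p)) @ L.T
    y = rng.integers(0, 2, size=n)           # global null: y independent of X
    fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0, maxiter=100)
    pvals.extend(fit.pvalues[1:])            # drop the intercept
pvals = np.asarray(pvals)
print("P{p-value <= 0.05} ~", round((pvals <= 0.05).mean(), 3))   # well above 0.05
```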

41. Obstacles to obtaining p-values

Table: Inflated null p-value probabilities (estimated Monte Carlo SEs in parentheses)

P{p-val <= t}   Setting (1)     Setting (2)     Setting (3)     Setting (4)
t = 5%          16.89% (0.37)   19.17% (0.39)   16.88% (0.37)   16.78% (0.37)
t = 1%           6.78% (0.25)    8.49% (0.28)    7.02% (0.26)    7.03% (0.26)
t = 0.1%         1.53% (0.12)    2.27% (0.15)    1.87% (0.14)    2.04% (0.14)

42–43. Shameless plug: distribution of high-dimensional LRTs

Wilks' phenomenon (1938): $2 \log L \overset{d}{\to} \chi^2_{\mathrm{df}}$.

Sur, Chen, Candès (2017): in high dimensions, $2 \log L \overset{d}{\to} \kappa\!\left(\tfrac{p}{n}\right) \chi^2_{\mathrm{df}}$ for a rescaling factor $\kappa(p/n)$.

[Figure: histograms of null p-values; the classical chi-square calibration is far from uniform, while the rescaled calibration is approximately uniform]

44. ‘Low’ dim. linear model with dependent covariates

$Z_j = |\hat{\beta}_j(\hat{\lambda}_{\mathrm{CV}})|$, $\quad W_j = Z_j - \tilde{Z}_j$

[Figure: FDR and power vs. autocorrelation coefficient for a Gaussian response, comparing BHq Marginal, BHq Max Lik., MF Knockoffs, and Orig. Knockoffs; low-dimensional setting with n = 3000, p = 1000]

45. ‘Low’ dim. logistic model with independent covariates

$Z_j = |\hat{\beta}_j(\hat{\lambda}_{\mathrm{CV}})|$, $\quad W_j = Z_j - \tilde{Z}_j$

[Figure: FDR and power vs. coefficient amplitude for a binomial response, comparing BHq Marginal, BHq Max Lik., and MF Knockoffs; low-dimensional setting with n = 3000, p = 1000]

46. ‘High’ dim. logistic model with dependent covariates

$Z_j = |\hat{\beta}_j(\hat{\lambda}_{\mathrm{CV}})|$, $\quad W_j = Z_j - \tilde{Z}_j$

[Figure: FDR and power vs. autocorrelation coefficient for a binomial response, comparing BHq Marginal and MF Knockoffs; high-dimensional setting with n = 3000, p = 6000]

47–48. Bayesian knockoff statistics

BVS (Bayesian variable selection): $Z_j = \mathbb{P}(\beta_j \neq 0 \mid \mathbf{y}, \mathbf{X})$
LCD (lasso coefficient difference): $W_j = Z_j - \tilde{Z}_j$

[Figure: FDR and power vs. amplitude for BVS knockoffs and LCD knockoffs; n = 300, p = 1000, Bayesian linear model with 60 expected variables]

Inference is correct even if the prior is wrong or the MCMC has not converged.

49. Partial summary

- No valid p-values, even for logistic regression
- Shifts the burden of knowledge to X (the covariates); makes sense in many contexts
- Robustness: simulations show the properties of the inference hold even when the model for X is only approximately right; we always have access to diagnostic checks (later)
- When the assumptions are appropriate, we gain a lot of power and can use sophisticated selection techniques

50. How to Construct Knockoffs for Some Graphical Models (joint with Sabatti & Sesia)

51–59. A general construction (C., Fan, Janson and Lv, '16)

Goal: construct $\tilde{X}$ with $(X, \tilde{X})_{\mathrm{swap}(S)} \overset{d}{=} (X, \tilde{X})$, e.g. for $p = 3$,
$(X_1, \tilde{X}_2, \tilde{X}_3, \tilde{X}_1, X_2, X_3) \overset{d}{=} (X_1, X_2, X_3, \tilde{X}_1, \tilde{X}_2, \tilde{X}_3)$.

Algorithm (Sequential Conditional Independent Pairs, SCIP):
for j = 1, ..., p:
    sample $\tilde{X}_j$ from the law of $X_j \mid X_{-j}, \tilde{X}_{1:j-1}$

Example with $p = 3$:
- Sample $\tilde{X}_1$ from $X_1 \mid X_{-1}$; the joint law of $(X, \tilde{X}_1)$ is known.
- Sample $\tilde{X}_2$ from $X_2 \mid X_{-2}, \tilde{X}_1$; the joint law of $(X, \tilde{X}_{1:2})$ is known.
- Sample $\tilde{X}_3$ from $X_3 \mid X_{-3}, \tilde{X}_{1:2}$; the joint law of $(X, \tilde{X})$ is known and is pairwise exchangeable!

Usually not practical, but easy in some cases (e.g. Markov chains).

60–70. Knockoff copies of a Markov chain

$X = (X_1, X_2, \ldots, X_p)$ is a Markov chain, $X \sim \mathrm{MC}(q_1, Q)$:
$p(X_1, \ldots, X_p) = q_1(X_1) \prod_{j=2}^{p} Q_j(X_j \mid X_{j-1})$

[Diagram: observed variables $X_1, X_2, X_3, X_4$ and knockoff variables $\tilde{X}_1, \tilde{X}_2, \tilde{X}_3, \tilde{X}_4$, with the knockoffs built one coordinate at a time]

The general SCIP algorithm can be implemented efficiently in the case of a Markov chain.

71–81. Recursive update of normalizing constants

Sampling $\tilde{X}_1$:
$p(X_1 \mid X_{-1}) = p(X_1 \mid X_2) = \dfrac{p(X_1, X_2)}{p(X_2)} = \dfrac{q_1(X_1)\, Q_2(X_2 \mid X_1)}{Z_1(X_2)}$, where $Z_1(z) = \sum_u q_1(u)\, Q_2(z \mid u)$.

Sampling $\tilde{X}_2$:
$p(X_2 \mid X_{-2}, \tilde{X}_1) = p(X_2 \mid X_1, X_3, \tilde{X}_1) \propto \dfrac{Q_2(X_2 \mid X_1)\, Q_3(X_3 \mid X_2)\, Q_2(X_2 \mid \tilde{X}_1)}{Z_1(X_2)}$,
with normalization constant $Z_2(X_3)$, where $Z_2(z) = \sum_u \dfrac{Q_2(u \mid X_1)\, Q_3(z \mid u)\, Q_2(u \mid \tilde{X}_1)}{Z_1(u)}$.

Sampling $\tilde{X}_3$:
$p(X_3 \mid X_{-3}, \tilde{X}_1, \tilde{X}_2) = p(X_3 \mid X_2, X_4, \tilde{X}_1, \tilde{X}_2) \propto \dfrac{Q_3(X_3 \mid X_2)\, Q_4(X_4 \mid X_3)\, Q_3(X_3 \mid \tilde{X}_2)}{Z_2(X_3)}$,
with normalization constant $Z_3(X_4)$, where $Z_3(z) = \sum_u \dfrac{Q_3(u \mid X_2)\, Q_4(z \mid u)\, Q_3(u \mid \tilde{X}_2)}{Z_2(u)}$.

And so on for the remaining $\tilde{X}_j$: computationally efficient, $O(p)$.
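A minimal Python sketch of this recursion for a discrete-state chain (my own illustration written from the formulas above, not the authors' code); `q1` is the initial distribution and `Q` a list of transition matrices.

```python
import numpy as np

def markov_knockoffs(x, q1, Q, rng=None):
    """Knockoff copy of one observed discrete Markov chain, via the recursion above.

    x  : observed chain, ints in {0, ..., K-1}, length p
    q1 : initial distribution, shape (K,)
    Q  : list of p-1 transition matrices; Q[0][a, b] = P(X_2 = b | X_1 = a), etc.
    """
    rng = np.random.default_rng() if rng is None else rng
    p, K = len(x), len(q1)
    x_tilde = np.empty(p, dtype=int)
    Z_prev = np.ones(K)                      # Z_0(.) = 1 by convention

    for j in range(p):
        # "Past" factor: q1 at j = 0, otherwise transitions out of X_{j-1} and X~_{j-1},
        # divided by the previous normalizing constant Z_{j-1}.
        if j == 0:
            w = np.array(q1, dtype=float)
        else:
            w = Q[j - 1][x[j - 1], :] * Q[j - 1][x_tilde[j - 1], :] / Z_prev
        if j < p - 1:
            probs = w * Q[j][:, x[j + 1]]    # times the "future" factor Q_{j+1}(X_{j+1} | .)
            Z_prev = Q[j].T @ w              # Z_j(z) = sum_u w(u) Q_{j+1}(z | u), reused next step
        else:
            probs = w                        # last position: no future factor
        x_tilde[j] = rng.choice(K, p=probs / probs.sum())

    return x_tilde

# Toy usage: a 2-state chain of length 5.
rng = np.random.default_rng(0)
Qmat = np.array([[0.9, 0.1], [0.2, 0.8]])
x = np.array([0, 0, 1, 1, 1])
x_tilde = markov_knockoffs(x, q1=np.array([0.5, 0.5]), Q=[Qmat] * 4, rng=rng)
```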

82–87. Hidden Markov Models (HMMs)

$X = (X_1, X_2, \ldots, X_p)$ is an HMM if
$H \sim \mathrm{MC}(q_1, Q)$ (latent Markov chain) and $X_j \mid H \overset{\mathrm{ind.}}{\sim} X_j \mid H_j \sim f_j(X_j; H_j)$ (emission distribution).

[Diagram: latent chain $H_1 \to H_2 \to H_3$ with emitted observations $X_1, X_2, X_3$]

The $H$ variables are latent; only the $X$ variables are observed.

88. Haplotypes and genotypes

Haplotype: the set of alleles on a single chromosome, coded 0/1 for the common/rare allele.
Genotype: the unordered pair of alleles at a single marker.

Example:
Haplotype M:   0 1 0 1 1 0
Haplotype P:   1 1 0 0 1 1
Genotypes:     1 2 0 1 2 1

89–91. A phenomenological HMM for haplotype & genotype data

[Figure: six haplotypes; color indicates the 'ancestor' at each marker (Scheet, '06)]

The same HMM is used for:
- Haplotype estimation/phasing (Browning, '11)
- Imputation of missing SNPs (Marchini, '10)
- Software: IMPUTE (Marchini, '07), fastPHASE (Scheet, '06), MaCH (Li, '10)

New application of the same HMM: generation of knockoff copies of genotypes!
Each genotype is the sum of two independent HMM haplotype sequences.

92–95. Knockoff copies of a hidden Markov model

Theorem (Sesia, Sabatti, C. '17). A knockoff copy $\tilde{X}$ of $X$ can be constructed as follows:
(1) Sample $H$ from $p(H \mid X)$ using the forward-backward algorithm.
(2) Generate a knockoff $\tilde{H}$ of $H$ using the SCIP algorithm for a Markov chain.
(3) Sample $\tilde{X}$ from the emission distribution of $X$ given $H = \tilde{H}$.

[Diagram: observed variables $X_{1:3}$, imputed latent variables $H_{1:3}$, knockoff latent variables $\tilde{H}_{1:3}$, and knockoff variables $\tilde{X}_{1:3}$]
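A compact Python sketch of the three steps for a fully discrete, time-homogeneous HMM (my own illustration of the recipe, not the authors' implementation); it assumes the `markov_knockoffs` function sketched earlier is in scope and that a single emission matrix `f` is shared across positions.

```python
import numpy as np

def hmm_knockoffs(x, q1, Q, f, rng=None):
    """Knockoff copy of one discrete HMM observation x, following steps (1)-(3).

    Assumes markov_knockoffs() from the Markov-chain sketch is already defined.
    q1 : initial distribution of the latent chain, shape (K,)
    Q  : latent transition matrix, Q[a, b] = P(H_{j+1} = b | H_j = a), shape (K, K)
    f  : emission matrix, f[h, v] = P(X_j = v | H_j = h), shape (K, M)
    """
    rng = np.random.default_rng() if rng is None else rng
    p, K = len(x), len(q1)

    # (1) Forward filtering / backward sampling: draw H ~ p(H | X).
    alpha = np.empty((p, K))
    alpha[0] = q1 * f[:, x[0]]
    alpha[0] /= alpha[0].sum()
    for j in range(1, p):
        alpha[j] = f[:, x[j]] * (alpha[j - 1] @ Q)
        alpha[j] /= alpha[j].sum()
    h = np.empty(p, dtype=int)
    h[-1] = rng.choice(K, p=alpha[-1])
    for j in range(p - 2, -1, -1):
        w = alpha[j] * Q[:, h[j + 1]]
        h[j] = rng.choice(K, p=w / w.sum())

    # (2) Knockoff copy of the imputed latent Markov chain (SCIP recursion).
    h_tilde = markov_knockoffs(h, q1, [Q] * (p - 1), rng=rng)

    # (3) Sample the knockoff observations from the emissions given H~.
    return np.array([rng.choice(f.shape[1], p=f[h_tilde[j]]) for j in range(p)])
```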
