Knockoffs as negative controls

[Figure: scatter plot of feature importance (0-4) against variable index (0-1000) for original and knockoff variables; the knockoff importances serve as negative controls.]
Exchangeability of feature importance statistics

Knockoff-agnostic feature importance $Z$:

$(Z_1, \ldots, Z_p, \tilde Z_1, \ldots, \tilde Z_p) = z([X, \tilde X], y)$

[Figure: feature importance of originals vs. knockoffs against variable index (0-1000).]

This lecture: can construct knockoff features such that

$j$ null $\Rightarrow (Z_j, \tilde Z_j) \overset{d}{=} (\tilde Z_j, Z_j)$

more generally, $T$ subset of nulls $\Rightarrow (Z, \tilde Z)_{\mathrm{swap}(T)} \overset{d}{=} (Z, \tilde Z)$
Knockoffs-adjusted scores

Adjusted scores $W_j$ with flip-sign property: combine $Z_j$ and $\tilde Z_j$ into a single (knockoff) score

$W_j = w_j(Z_j, \tilde Z_j)$ with $w_j(\tilde Z_j, Z_j) = -w_j(Z_j, \tilde Z_j)$

e.g. $W_j = Z_j - \tilde Z_j$, or $W_j = (Z_j \vee \tilde Z_j) \cdot \begin{cases} +1 & Z_j > \tilde Z_j \\ -1 & Z_j \le \tilde Z_j \end{cases}$

$\Rightarrow$ Conditional on $|W|$, the signs of the null $W_j$'s are i.i.d. coin flips ("1-bit p-values")

[Diagram: variables ordered by $|W|$, each marked $+$ or $-$; under the null each sign is equally likely.]
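A quick simulation (my own illustration, not from the slides) makes the coin-flip property concrete for the first score above, $W_j = Z_j - \tilde Z_j$: when $(Z_j, \tilde Z_j)$ is exchangeable, the sign of $W_j$ is a fair coin flip.

```python
import numpy as np

# Minimal sketch: draw an exchangeable null pair (Z, Z~) and check that the
# signs of W = Z - Z~ behave like i.i.d. fair coin flips.
rng = np.random.default_rng(0)
Z = rng.exponential(size=100_000)        # null importances (any common law works)
Z_tilde = rng.exponential(size=100_000)  # independent copies play the knockoff role
W = Z - Z_tilde
print(np.mean(np.sign(W) > 0))           # ~0.5: a null sign is + with prob 1/2
```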
Selection by sequential testing

$\mathcal{S}_+(t) = \{j : W_j \ge t\}$, $\mathcal{S}_-(t) = \{j : W_j \le -t\}$

Select $\mathcal{S}_+(t)$ $\Rightarrow$ $\widehat{\mathrm{FDP}}(t) = \dfrac{1 + |\mathcal{S}_-(t)|}{1 \vee |\mathcal{S}_+(t)|}$

Theorem (Barber and C. ('15)). Select $\mathcal{S}_+(\tau)$ with $\tau = \min\{t : \widehat{\mathrm{FDP}}(t) \le q\}$. Then

Knockoff: $\mathbb{E}\left[\dfrac{\#\text{false positives}}{\#\text{selections} + q^{-1}}\right] \le q$

Knockoff+: $\mathbb{E}\left[\dfrac{\#\text{false positives}}{\#\text{selections}}\right] \le q$
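A short sketch of the selection rule (my own illustration; the function and variable names are not from the slides): scan the observed $|W_j|$ values as candidate thresholds and stop at the first one where the estimated FDP drops below $q$.

```python
import numpy as np

def knockoff_threshold(W, q, offset=1):
    """tau = min{ t : (offset + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q }.
    offset=1 corresponds to the knockoff+ guarantee; np.inf selects nothing."""
    for t in np.sort(np.abs(W[W != 0])):   # candidate thresholds
        fdp_hat = (offset + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf

# usage: selected = np.where(W >= knockoff_threshold(W, q=0.10))[0]
```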
Some Pretty Math... (I Think) Proof Sketch of FDR Control
Why does all this work?

$\mathcal{S}_+(t) = \{j : W_j \ge t\}$, $\mathcal{S}_-(t) = \{j : W_j \le -t\}$, $\tau = \min\left\{ t : \dfrac{1 + |\mathcal{S}_-(t)|}{|\mathcal{S}_+(t)| \vee 1} \le q \right\}$

Write $V_+(t) = \#\{j \text{ null} : j \in \mathcal{S}_+(t)\}$ and $V_-(t) = \#\{j \text{ null} : j \in \mathcal{S}_-(t)\}$. Then

$\mathrm{FDP}(\tau) = \dfrac{V_+(\tau)}{\#\{j : j \in \mathcal{S}_+(\tau)\} \vee 1} = \dfrac{V_+(\tau)}{1 + V_-(\tau)} \cdot \dfrac{1 + V_-(\tau)}{|\mathcal{S}_+(\tau)| \vee 1} \le q \cdot \dfrac{V_+(\tau)}{1 + V_-(\tau)}$

since $V_-(\tau) \le |\mathcal{S}_-(\tau)|$ and the definition of $\tau$ gives $\dfrac{1 + |\mathcal{S}_-(\tau)|}{|\mathcal{S}_+(\tau)| \vee 1} \le q$.

To show: $\mathbb{E}\left[\dfrac{V_+(\tau)}{1 + V_-(\tau)}\right] \le 1$
Martingales

$\dfrac{V_+(t)}{1 + V_-(t)}$ is a (super)martingale w.r.t. $\mathcal{F}_t = \sigma(\{V_\pm(u)\}_{u \le t})$

[Diagram: signs of the null $W_j$'s along the $|W|$ axis; $V_+(t)$ and $V_-(t)$ count null $+$'s and $-$'s beyond threshold $t$, with $s \le t$ marked.]

Conditioned on $V_+(s) + V_-(s) = m$, $V_+(s)$ is hypergeometric, and

$\mathbb{E}\left[ \dfrac{V_+(s)}{1 + V_-(s)} \,\middle|\, V_\pm(t),\; V_+(s) + V_-(s) \right] \le \dfrac{V_+(t)}{1 + V_-(t)}$
Optional stopping theorem

$\mathrm{FDR} \le q\, \mathbb{E}\left[\dfrac{V_+(\tau)}{1 + V_-(\tau)}\right] \le q\, \mathbb{E}\left[\dfrac{V_+(0)}{1 + V_-(0)}\right] \le q, \qquad V_+(0) \sim \mathrm{Bin}(\#\text{nulls}, 1/2)$
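The last inequality can also be checked directly; this computation is supplementary, not on the slide. If all null $W_j$'s are nonzero, then $V_-(0) = \#\text{nulls} - V_+(0)$, and for $B \sim \mathrm{Bin}(m, 1/2)$, using $k \binom{m}{k} = (m + 1 - k) \binom{m}{k-1}$,

\[
\mathbb{E}\left[\frac{B}{1 + m - B}\right]
  = \sum_{k=1}^{m} \binom{m}{k}\, 2^{-m}\, \frac{k}{1 + m - k}
  = 2^{-m} \sum_{k=1}^{m} \binom{m}{k-1}
  = 1 - 2^{-m} \le 1.
\]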
Knockoffs for Random Features Joint with Fan, Janson & Lv
Variable selection in arbitrary models

Random pair $(X, Y)$ (perhaps thousands/millions of covariates): $p(Y \mid X)$ depends on $X$ through which variables?

Working definition of null variables: say $j \in \mathcal{H}_0$ is null iff $Y \perp\!\!\!\perp X_j \mid X_{-j}$

Local Markov property $\Rightarrow$ the non-nulls are the smallest subset $\mathcal{S}$ (Markov blanket) s.t. $Y \perp\!\!\!\perp \{X_j\}_{j \in \mathcal{S}^c} \mid \{X_j\}_{j \in \mathcal{S}}$

Logistic model: $\mathbb{P}(Y = 0 \mid X) = \dfrac{1}{1 + e^{X^\top \beta}}$. If the variables $X_{1:p}$ are not perfectly dependent, then $j \in \mathcal{H}_0 \iff \beta_j = 0$
Knockoff features (random X)

i.i.d. samples from $p(X, Y)$: the distribution of $X$ is known; the distribution of $Y \mid X$ (likelihood) is completely unknown

Originals $X = (X_1, \ldots, X_p)$; knockoffs $\tilde X = (\tilde X_1, \ldots, \tilde X_p)$

(1) Pairwise exchangeability: $(X, \tilde X)_{\mathrm{swap}(S)} \overset{d}{=} (X, \tilde X)$,
e.g. $(X_1, X_2, X_3, \tilde X_1, \tilde X_2, \tilde X_3)_{\mathrm{swap}(\{2,3\})} = (X_1, \tilde X_2, \tilde X_3, \tilde X_1, X_2, X_3)$

(2) $\tilde X \perp\!\!\!\perp Y \mid X$ (ignore $Y$ when constructing knockoffs)
Exchangeability of feature importance statistics

Theorem (C., Fan, Janson & Lv ('16)). For knockoff-agnostic scores and any subset $T$ of nulls,

$(Z, \tilde Z)_{\mathrm{swap}(T)} \overset{d}{=} (Z, \tilde Z)$

This holds no matter the relationship between $Y$ and $X$, and it holds conditionally on $Y$

$\Rightarrow$ FDR control (conditional on $Y$) no matter the relationship between $X$ and $Y$
Knockoffs for Gaussian features

Swapping any subset of original and knockoff features leaves the joint distribution invariant,
e.g. $T = \{2, 3\}$: $(X_1, \tilde X_2, \tilde X_3, \tilde X_1, X_2, X_3) \overset{d}{=} (X_1, X_2, X_3, \tilde X_1, \tilde X_2, \tilde X_3)$. Note $\tilde X \overset{d}{=} X$.

$X \sim \mathcal{N}(\mu, \Sigma)$. Possible solution:

$(X, \tilde X) \sim \mathcal{N}\!\left( \begin{pmatrix} \mu \\ \mu \end{pmatrix},\; \begin{pmatrix} \Sigma & \Sigma - \mathrm{diag}\{s\} \\ \Sigma - \mathrm{diag}\{s\} & \Sigma \end{pmatrix} \right)$, with $s$ such that the joint covariance is $\succeq 0$

Given $X$, sample $\tilde X$ from $\tilde X \mid X$ (regression formula). Different from knockoff features for fixed $X$!
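A minimal numpy sketch of this step (my own illustration, assuming a valid $s$ is supplied): with $D = \mathrm{diag}\{s\}$, standard Gaussian conditioning gives $\tilde X \mid X \sim \mathcal{N}\big(\mu + (\Sigma - D)\Sigma^{-1}(X - \mu),\; 2D - D\Sigma^{-1}D\big)$.

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, s, rng=None):
    """Sample knockoff copies for the rows of X, where each row ~ N(mu, Sigma).
    A sketch: choosing s (so that the joint covariance is PSD) is assumed done."""
    rng = np.random.default_rng() if rng is None else rng
    p = len(s)
    D = np.diag(s)
    Sinv_D = np.linalg.solve(Sigma, D)                    # Sigma^{-1} D
    cond_mean = mu + (X - mu) @ (np.eye(p) - Sinv_D)      # mu + (X-mu) Sigma^{-1} (Sigma-D)
    cond_cov = 2 * D - D @ Sinv_D                         # 2D - D Sigma^{-1} D
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))  # small jitter for stability
    return cond_mean + rng.standard_normal(X.shape) @ L.T
```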
Knockoffs inference with random features

Pros:
- Holds for finite samples
- No matter the dependence between $Y$ and $X$
- No matter the dimensionality
- No parameters
- No p-values

Cons:
- Need to know the distribution of covariates
Relationship with classical setup

Classical:
- Observations of $X$ are fixed; inference is conditional on observed values
- Strong model linking $Y$ and $X$
- Useful inference even if model inexact

MF Knockoffs:
- Observations of $X$ are random [1]
- Model free [2]
- Useful inference even if model inexact [3]

[1] Often appropriate in 'big' data apps: e.g. SNPs of subjects randomly sampled
[2] Shifts the 'burden' of knowledge
[3] More later
Shift in the burden of knowledge

When are our assumptions useful?
- When we have large amounts of unsupervised data (e.g. economic studies with the same covariate info but different responses)
- When we have more prior information about the covariates than about their relationship with a response (e.g. GWAS)
- When we control the distribution of $X$ (experimental crosses in genetics, gene knockout experiments, ...)
Obstacles to obtaining p-values

$Y \mid X \sim \mathrm{Bernoulli}(\mathrm{logit}(X^\top \beta))$

[Figure: distribution of null logistic regression p-values with $n = 500$ and $p = 200$; left panel: global null, AR(1) design; right panel: 20 nonzero coefficients, AR(1) design.]
Obstacles to obtaining p-values

Table: inflated null p-value probabilities, with estimated Monte Carlo SEs in parentheses

P{p-val <= ...}   Sett. (1)       Sett. (2)       Sett. (3)       Sett. (4)
5%                16.89% (0.37)   19.17% (0.39)   16.88% (0.37)   16.78% (0.37)
1%                 6.78% (0.25)    8.49% (0.28)    7.02% (0.26)    7.03% (0.26)
0.1%               1.53% (0.12)    2.27% (0.15)    1.87% (0.14)    2.04% (0.14)
Shameless plug: distribution of high-dimensional LRTs

Wilks' phenomenon (1938): $2 \log L \overset{d}{\to} \chi^2_{\mathrm{df}}$

Sur, Chen, Cand\`es (2017): $2 \log L \overset{d}{\to} \kappa\!\left(\frac{p}{n}\right) \chi^2_{\mathrm{df}}$

[Figure: histograms of null p-values under the classical $\chi^2$ calibration and under the rescaled limit.]
‘Low’ dim. linear model with dependent covariates

$Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$

[Figure: FDR and power vs. autocorrelation coefficient for BHq Marginal, BHq Max Lik., MF Knockoffs, and Orig. Knockoffs; Gaussian response, low-dimensional setting: $n = 3000$, $p = 1000$.]
‘Low’ dim. logistic model with indep. covariates

$Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$

[Figure: FDR and power vs. coefficient amplitude for BHq Marginal, BHq Max Lik., and MF Knockoffs; binomial response, low-dimensional setting: $n = 3000$, $p = 1000$.]
‘High’ dim. logistic model with dependent covariates

$Z_j = |\hat\beta_j(\hat\lambda_{\mathrm{CV}})|$, $W_j = Z_j - \tilde Z_j$

[Figure: FDR and power vs. autocorrelation coefficient for BHq Marginal and MF Knockoffs; binomial response, high-dimensional setting: $n = 3000$, $p = 6000$.]
Bayesian knockoff statistics

BVS (Bayesian variable selection): $Z_j = \mathbb{P}(\beta_j \ne 0 \mid y, X)$
LCD (Lasso coeff. difference): $W_j = Z_j - \tilde Z_j$

[Figure: FDR and power vs. amplitude for BVS knockoffs and LCD knockoffs; $n = 300$, $p = 1000$, Bayesian linear model with 60 expected variables.]

Inference is correct even if the prior is wrong or the MCMC has not converged
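For the BVS statistic, the posterior probabilities can be read off MCMC output; a hypothetical sketch (the array name and shape convention are illustrative, not from the slides):

```python
import numpy as np

def bvs_statistics(beta_draws, p):
    """beta_draws: (n_mcmc, 2p) posterior draws for the coefficients of [X, X~]
    under a sparsity-inducing prior. Z_j estimates P(beta_j != 0 | y, X);
    W_j = Z_j - Z~_j inherits the flip-sign property."""
    pip = np.mean(beta_draws != 0, axis=0)   # posterior inclusion probabilities
    return pip[:p] - pip[p:]
```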
Partial summary

- No valid p-values even for logistic regression
- Shifts the burden of knowledge to $X$ (covariates); makes sense in many contexts
- Robustness: simulations show the properties of the inference hold even when the model for $X$ is only approximately right, and we always have access to diagnostic checks (later)
- When the assumptions are appropriate $\Rightarrow$ gain a lot of power, and can use sophisticated selection techniques
How to Construct Knockoffs for some Graphical Models Joint with Sabatti & Sesia
A general construction (C., Fan, Janson and Lv, '16)

Goal: $(X_1, \tilde X_2, \tilde X_3, \tilde X_1, X_2, X_3) \overset{d}{=} (X_1, X_2, X_3, \tilde X_1, \tilde X_2, \tilde X_3)$, and so on for every swap

Algorithm: Sequential Conditional Independent Pairs (SCIP)
for j = 1, ..., p:
    sample $\tilde X_j$ from the law of $X_j \mid X_{-j}, \tilde X_{1:j-1}$

e.g. $p = 3$:
- Sample $\tilde X_1$ from $X_1 \mid X_{-1}$; the joint law of $(X, \tilde X_1)$ is known
- Sample $\tilde X_2$ from $X_2 \mid X_{-2}, \tilde X_1$; the joint law of $(X, \tilde X_{1:2})$ is known
- Sample $\tilde X_3$ from $X_3 \mid X_{-3}, \tilde X_{1:2}$; the joint law of $(X, \tilde X)$ is known and pairwise exchangeable!

Usually not practical, but easy in some cases (e.g. Markov chains)
Knockoff copies of a Markov chain

$X = (X_1, X_2, \ldots, X_p)$ is a Markov chain:

$p(X_1, \ldots, X_p) = q_1(X_1) \prod_{j=2}^{p} Q_j(X_j \mid X_{j-1}) \qquad (X \sim \mathrm{MC}(q_1, Q))$

[Diagram: observed chain $X_1 \to X_2 \to X_3 \to X_4$ with knockoff variables $\tilde X_1, \ldots, \tilde X_4$ attached one at a time.]

The general algorithm can be implemented efficiently in the case of a Markov chain
Recursive update of normalizing constants

Sampling $\tilde X_1$:

$p(X_1 \mid X_{-1}) = p(X_1 \mid X_2) = \dfrac{p(X_1, X_2)}{p(X_2)} = \dfrac{q_1(X_1)\, Q_2(X_2 \mid X_1)}{Z_1(X_2)}, \qquad Z_1(z) = \sum_u q_1(u)\, Q_2(z \mid u)$

Sampling $\tilde X_2$:

$p(X_2 \mid X_{-2}, \tilde X_1) = p(X_2 \mid X_1, X_3, \tilde X_1) \propto \dfrac{Q_2(X_2 \mid X_1)\, Q_3(X_3 \mid X_2)\, Q_2(X_2 \mid \tilde X_1)}{Z_1(X_2)}$

with normalization constant $Z_2(X_3)$, where $Z_2(z) = \sum_u \dfrac{Q_2(u \mid X_1)\, Q_3(z \mid u)\, Q_2(u \mid \tilde X_1)}{Z_1(u)}$

Sampling $\tilde X_3$:

$p(X_3 \mid X_{-3}, \tilde X_1, \tilde X_2) = p(X_3 \mid X_2, X_4, \tilde X_1, \tilde X_2) \propto \dfrac{Q_3(X_3 \mid X_2)\, Q_4(X_4 \mid X_3)\, Q_3(X_3 \mid \tilde X_2)}{Z_2(X_3)}$

with normalization constant $Z_3(X_4)$, where $Z_3(z) = \sum_u \dfrac{Q_3(u \mid X_2)\, Q_4(z \mid u)\, Q_3(u \mid \tilde X_2)}{Z_2(u)}$

And so on for the remaining $\tilde X_j$: computationally efficient, $O(p)$
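A sketch of the full sampler for a discrete chain (my own implementation of the recursion above; the 0-based indexing conventions are mine, and the treatment of the last variable, which has no look-ahead factor, is my reading of the recursion):

```python
import numpy as np

def markov_chain_knockoff(x, q1, Q, rng=None):
    """SCIP knockoff sampler for one observation of a discrete Markov chain.

    x  : (p,) observed states in {0, ..., K-1}
    q1 : (K,) initial distribution
    Q  : (p, K, K); Q[j][u, v] = P(X_j = v | X_{j-1} = u) for j >= 1
         (0-based indexing; Q[0] is unused)
    """
    rng = np.random.default_rng() if rng is None else rng
    p, K = len(x), len(q1)
    x_tilde = np.empty(p, dtype=int)
    Z = np.ones(K)                              # Z_0(.) = 1 by convention
    for j in range(p):
        # factors in u (the candidate value of X~_j), divided by Z_{j-1}(u)
        if j == 0:
            w = q1 / Z
        else:
            w = Q[j][x[j - 1], :] * Q[j][x_tilde[j - 1], :] / Z
        if j < p - 1:
            probs = w * Q[j + 1][:, x[j + 1]]   # look-ahead factor Q_{j+1}(x_{j+1} | u)
            Z = w @ Q[j + 1]                    # Z_j(z) = sum_u w(u) Q_{j+1}(z | u)
        else:
            probs = w                           # last variable: no look-ahead
        x_tilde[j] = rng.choice(K, p=probs / probs.sum())
    return x_tilde
```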
Hidden Markov Models (HMMs)

$X = (X_1, X_2, \ldots, X_p)$ is an HMM if

$H \sim \mathrm{MC}(q_1, Q)$ (latent Markov chain)
$X_j \mid H \sim X_j \mid H_j \sim f_j(X_j; H_j)$ independently across $j$ (emission distribution)

[Diagram: latent chain $H_1 \to H_2 \to H_3$ emitting $X_1, X_2, X_3$.]

The $H$ variables are latent and only the $X$ variables are observed
Haplotypes and genotypes

Haplotype: set of alleles on a single chromosome; 0/1 for common/rare allele
Genotype: unordered pair of alleles at a single marker

Haplotype M:  0 1 0 1 1 0
Haplotype P:  1 1 0 0 1 1
Genotypes:    1 2 0 1 2 1
A phenomenological HMM for haplotype & genotype data

[Figure: six haplotypes; color indicates 'ancestor' at each marker (Scheet, '06).]

- Haplotype estimation/phasing (Browning, '11)
- Imputation of missing SNPs (Marchini, '10)
- Software: IMPUTE (Marchini, '07), fastPHASE (Scheet, '06), MaCH (Li, '10)

New application of the same HMM: generation of knockoff copies of genotypes!
Each genotype: sum of two independent HMM haplotype sequences
Knockoff copies of a hidden Markov model

Theorem (Sesia, Sabatti, C. '17). A knockoff copy $\tilde X$ of $X$ can be constructed as:
(1) Sample $H$ from $p(H \mid X)$ using the forward-backward algorithm
(2) Generate a knockoff $\tilde H$ of $H$ using the SCIP algorithm for a Markov chain
(3) Sample $\tilde X$ from the emission distribution of $X$ given $H = \tilde H$

[Diagram: imputed latent variables $H_1, H_2, H_3$, knockoff latent variables $\tilde H_1, \tilde H_2, \tilde H_3$, observed variables $X_1, X_2, X_3$, and knockoff variables $\tilde X_1, \tilde X_2, \tilde X_3$.]
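A sketch combining the three steps for a discrete HMM (my own illustration, reusing markov_chain_knockoff from above; emis[j][h, v] = $f_j(v; h)$ is an assumed tabular emission distribution):

```python
import numpy as np

def sample_hidden_path(x, q1, Q, emis, rng):
    """Step (1): draw H ~ p(H | X) by forward filtering, backward sampling."""
    p, K = len(x), len(q1)
    alpha = np.empty((p, K))
    alpha[0] = q1 * emis[0][:, x[0]]
    alpha[0] /= alpha[0].sum()
    for j in range(1, p):
        alpha[j] = (alpha[j - 1] @ Q[j]) * emis[j][:, x[j]]
        alpha[j] /= alpha[j].sum()
    h = np.empty(p, dtype=int)
    h[-1] = rng.choice(K, p=alpha[-1])
    for j in range(p - 2, -1, -1):              # backward sampling of the path
        probs = alpha[j] * Q[j + 1][:, h[j + 1]]
        h[j] = rng.choice(K, p=probs / probs.sum())
    return h

def hmm_knockoff(x, q1, Q, emis, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    h = sample_hidden_path(x, q1, Q, emis, rng)       # step (1)
    h_tilde = markov_chain_knockoff(h, q1, Q, rng)    # step (2): SCIP on the chain
    # step (3): re-emit observations from the knockoff hidden path
    return np.array([rng.choice(emis[j].shape[1], p=emis[j][h_tilde[j]])
                     for j in range(len(x))])
```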