Graphlet Screening (GS) Achieves Optimal Rate in Variable Selection

Jiashun Jin, Carnegie Mellon University

In collaboration with Cun-Hui Zhang (Rutgers) and Qi Zhang (Univ. of Pittsburgh)
Variable selection

Y = Xβ + z,  X = X_{n,p},  z ∼ N(0, I_n)

◮ p ≫ n ≫ 1
◮ signals are rare and weak
◮ let G = X′X be the Gram matrix
◮ diagonals of G are normalized to 1
◮ G is sparse (few large entries in each row)
Subset selection

(1/2)‖Y − Xβ‖_2^2 + (λ^2/2)‖β‖_0

◮ L_0-penalization method
◮ Variants: Cp, AIC, BIC, RIC
◮ Computationally challenging

Mallows (1973), Akaike (1974), Schwarz (1978), Foster & George (1994)
The lasso

(1/2)‖Y − Xβ‖_2^2 + λ‖β‖_1

◮ L_1-penalization method; Basis Pursuit
◮ widely used
◮ computationally efficient even when p is large
◮ in the noiseless case, if the signals are sufficiently sparse, equivalent to L_0-penalization

Chen et al. (1998); Tibshirani (1996); Donoho (2006)
Limitation of L_0-penalization, I

Ex. Y = Xβ + z, z ∼ N(0, I_n), each β_j takes values in {0, τ}, and

G = X′X = diag(D, D, . . . , D),  D = [1, a; a, 1]

{1, 2, . . . , p} partitions into 3 types of 2×2 blocks:
◮ I. No signal
◮ II. One signal
◮ III. Two signals
Limitation of L_0-penalization, II

◮ one-stage method
◮ one tuning parameter
◮ does not exploit 'local' graphical structure

Therefore, many penalization methods (e.g., lasso, SCAD, MC+, Dantzig selector) are non-optimal, as L_0-penalization is the 'idol' these methods mimic.

'local': neighboring nodes in the geodesic distance of a graph (TBD)
Where are the signals?

Tukey, J.W. (1965). Which part of the sample contains the information? Proc. Natl. Acad. Sci.

John Wilder Tukey (1915–2000)
Graph of Strong Dependence (GOSD)

GOSD is the graph 𝒢 = (V, E):
◮ V = {1, 2, . . . , p}: each variable is a node
◮ an edge between nodes i and j iff |G(i, j)| ≥ 1/log(p), say
◮ G = X′X sparse ⇒ 𝒢 sparse
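As a concrete sketch (Python with networkx; function and variable names are ours, not from the talk), the GOSD can be built by thresholding the Gram matrix:

```python
# A minimal sketch of the GOSD construction, assuming a normalized Gram
# matrix G = X'X with unit diagonal.
import numpy as np
import networkx as nx

def build_gosd(G: np.ndarray) -> nx.Graph:
    """Connect nodes i, j whenever |G[i, j]| >= 1/log(p)."""
    p = G.shape[0]
    thresh = 1.0 / np.log(p)
    gosd = nx.Graph()
    gosd.add_nodes_from(range(p))
    for i in range(p):
        for j in range(i + 1, p):
            if abs(G[i, j]) >= thresh:
                gosd.add_edge(i, j)
    return gosd
```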
Signal sparsity and graph sparsity

◮ Despite its sparsity, 𝒢 is usually complicated
◮ Denote the support of β by S = S(β) = {1 ≤ i ≤ p : β_i ≠ 0}; restricting the nodes to S forms a subgraph 𝒢_S
◮ Key insight: 𝒢_S decomposes into many small-size components that are disconnected from each other

Component: a maximal connected subgraph
For today

Graphlet Screening (GS):
◮ gs-step: graphlet screening by sequential χ²-tests
◮ gc-step: graphlet cleaning by penalized MLE
◮ Focus: rare and weak signals
Graphlet screening (gs-step), I. Initial stage

Y = Xβ + z,  X = X_{n,p},  z ∼ N(0, I_n);  𝒢: GOSD

◮ Fix m ≥ 1 (small)
◮ Let {𝒢_t : 1 ≤ t ≤ T} be all connected subgraphs of 𝒢 with size ≤ m,
◮ arranged by size, breaking ties lexicographically

Example (p = 10, m = 3, T = 30):
{𝒢_t, 1 ≤ t ≤ T}:
{1}, {2}, . . . , {10}
{1, 2}, {1, 7}, . . . , {9, 10}
{1, 2, 4}, {1, 2, 7}, . . . , {8, 9, 10}

[Figure: a 10-node GOSD whose connected subgraphs of size ≤ 3 are listed above]
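Below is a sketch of this enumeration step, reusing the networkx graph from the previous sketch; the grow-by-neighbors strategy is one standard way to list connected subgraphs and is not prescribed by the slides.

```python
# Enumerate all connected subgraphs of size <= m, sorted by size and then
# lexicographically, as the gs-step requires.
from itertools import chain
import networkx as nx

def connected_subgraphs(gosd: nx.Graph, m: int):
    """Grow connected node sets one neighbor at a time, deduplicating."""
    found = {frozenset([v]) for v in gosd.nodes}
    frontier = list(found)
    for _ in range(m - 1):
        nxt = []
        for s in frontier:
            # extend s by any neighbor of one of its nodes
            nbrs = set(chain.from_iterable(gosd.neighbors(v) for v in s)) - s
            for u in nbrs:
                t = s | {u}
                if t not in found:
                    found.add(t)
                    nxt.append(t)
        frontier = nxt
    return sorted((sorted(s) for s in found), key=lambda s: (len(s), s))
```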
gs-step, II. Updating stage

X = [x_1, x_2, . . . , x_p];  {𝒢_t}_{t=1}^T: all connected subgraphs with size ≤ m

For t = 1, 2, . . . , T:
◮ S_{t−1}: set of retained indices from the last stage
◮ F = 𝒢_t ∩ S_{t−1}: nodes accepted previously
◮ D = 𝒢_t \ F: nodes currently under investigation
◮ P_F: projection from R^n onto the subspace spanned by {x_j : j ∈ F}
◮ Define T(Y; D, F) = ‖P_{𝒢_t} Y‖² − ‖P_F Y‖²
◮ Add the nodes in D to S_{t−1} iff T(Y; D, F) > t(D, F), where t(D, F) is a threshold (TBD)

Once accepted, a node is kept until the end of the gs-step
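A runnable sketch of one such sweep (Python with NumPy/SciPy; not from the slides — in particular, the upper chi-square quantile used here is an illustrative stand-in for the tuned thresholds t(D, F)):

```python
# One gs-step sweep over the ordered connected subgraphs.
import numpy as np
from scipy.stats import chi2

def proj_sq_norm(Y, X, idx):
    """||P_idx Y||^2 via least squares on the columns X[:, idx]."""
    if not idx:
        return 0.0
    Xs = X[:, idx]
    coef, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
    return float(np.dot(Xs @ coef, Y))   # Y' P Y = (P Y)'Y

def gs_step(Y, X, subgraphs, q=0.99):
    retained = set()
    for g in subgraphs:
        F = sorted(retained & set(g))    # nodes accepted previously
        D = sorted(set(g) - retained)    # nodes under investigation
        if not D:
            continue
        T_stat = proj_sq_norm(Y, X, sorted(g)) - proj_sq_norm(Y, X, F)
        if T_stat > chi2.ppf(q, df=len(D)):   # t(D, F), illustrative choice
            retained |= set(D)
    return retained
```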
Comparison with marginal regression (computational complexity)

◮ Marginal screening
  ◮ ineffective (neglects 'local' graphical structure)
◮ 'Brute-force' m-variate screening is computationally challenging: O(p^m)
◮ gs-step
  ◮ only screens connected subgraphs of 𝒢
  ◮ if the maximum degree of 𝒢 is ≤ K, then there are ≤ C(eK)^m p such subgraphs

Fan & Lv (2008), Wasserman & Roeder (2009), Frieze & Molloy (1999)
Two important properties of the gs-step

S* ≡ S_T: set of surviving nodes at the end of the gs-step

If both the signals and the graph 𝒢 are sparse:
◮ Sure Screening (SS): S* retains all but a small proportion of the signals
◮ Separable After Screening (SAS): S* decomposes into many small-size components
Reduction to many small-size regressions, I

I_0 ⊂ S*: a component;  G = X′X
G^{I_0}: row restriction;  G^{I_0, I_0}: row & column restriction

◮ Restrict the regression to I_0:
  Y = Xβ + z  ⇒  X′Y = X′Xβ + X′z  ⇒  (X′Y)^{I_0} = (Gβ)^{I_0} + (X′z)^{I_0}
◮ (X′z)^{I_0} ∼ N(0, G^{I_0, I_0}) since z ∼ N(0, I_n)
◮ Key: (Gβ)^{I_0} ≈ G^{I_0, I_0} β^{I_0}
◮ Result: many small-size regressions:
  (X′Y)^{I_0} ≈ N(G^{I_0, I_0} β^{I_0}, G^{I_0, I_0})
Reduction to small-size regressions, II

Why (Gβ)^{I_0} ≡ G^{I_0} β ≈ G^{I_0, I_0} β^{I_0}?

G^{I_0} β = [ G^{I_0, I_0}  G^{I_0, J_0}  ⋯ ] (β^{I_0}; 0; β^{J_0}; 0; ⋯)

◮ I_0, J_0 ⊂ S*: components
◮ By the SS property, β ≈ 0 outside S*
◮ By the SAS property, G^{I_0, J_0} ≈ 0
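A tiny simulation, illustrative only, checking this approximation on a toy near-orthogonal design (all names and sizes here are arbitrary):

```python
# Compare (X'Y) restricted to a component I0 with G^{I0,I0} beta^{I0};
# under the SS and SAS properties the two agree up to N(0, G^{I0,I0}) noise.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p)) / np.sqrt(n)   # columns have norm ~ 1
beta = np.zeros(p); beta[[3, 4]] = 4.0         # one small signal component
Y = X @ beta + rng.standard_normal(n)

G = X.T @ X
I0 = [3, 4]
lhs = (X.T @ Y)[I0]
rhs = G[np.ix_(I0, I0)] @ beta[I0]
print(lhs, rhs)   # close, up to the N(0, G^{I0,I0}) noise term
```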
Graphlet cleaning (gc-step)

Y = Xβ + z,  z ∼ N(0, I_n)

◮ I_0: a component of S*;  S*: set of all surviving nodes
◮ β^{I_0}: restriction of β to the rows in I_0
◮ X^{*, I_0}: restriction of X to the columns in I_0

Fixing (u^{gs}, v^{gs}):
◮ j ∉ S*: set β̂_j = 0
◮ j ∈ S*: estimate β^{I_0} by minimizing
  ‖P^{I_0}(Y − X^{*, I_0} θ)‖² + (u^{gs})² ‖θ‖_0,
  where each entry of θ is either 0 or ≥ v^{gs} in magnitude
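A sketch of this minimization by exhaustive search over supports within one component (feasible because, by SAS, components are small). For simplicity it uses the full residual in place of the projected residual ‖P^{I_0}(·)‖, and u_gs, v_gs are tuning parameters set by hand:

```python
# gc-step on one small component I0: search all supports, refit each by
# least squares, and require every nonzero entry to exceed v_gs in magnitude.
from itertools import combinations
import numpy as np

def gc_step(Y, X, I0, u_gs, v_gs):
    best_theta, best_obj = np.zeros(len(I0)), np.inf
    X0 = X[:, I0]
    for k in range(len(I0) + 1):
        for supp in combinations(range(len(I0)), k):
            theta = np.zeros(len(I0))
            if supp:
                Xs = X0[:, list(supp)]
                coef, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
                if np.min(np.abs(coef)) < v_gs:   # enforce |theta_j| >= v_gs
                    continue
                theta[list(supp)] = coef
            obj = np.sum((Y - X0 @ theta) ** 2) + u_gs**2 * k
            if obj < best_obj:
                best_theta, best_obj = theta, obj
    return best_theta
```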
Random design model

Y = Xβ + z,  X = [X_1, X_2, . . . , X_n]′,  X_i ∼ iid N(0, (1/n)Ω)

◮ Ω: unknown correlation matrix
◮ Examples: compressive sensing, computer security

Dinur & Nissim (2004), Nowak et al. (2007)
Rare and Weak signal model

Y = Xβ + z,  z ∼ N(0, I_n)

β = b ∘ µ,  b_i ∼ iid Bernoulli(ε),  µ ∈ Θ*_p(τ, a)

◮ b ∘ µ ∈ R^p: (b ∘ µ)_j = b_j µ_j
◮ Θ*_p(τ, a) = {µ ∈ R^p : τ ≤ |µ_j| ≤ aτ},  a > 1
◮ Two key parameters — ε: sparsity; τ: (minimum) signal strength
Asymptotic framework

Use p as the driving asymptotic parameter, and tie (ε, τ, n) to p through fixed parameters:
◮ Signal rarity: ε = ε_p = p^{−ϑ},  0 < ϑ < 1
◮ Signal weakness: τ = τ_p = √(2r log(p)),  r > 0
◮ Sample size: n = n_p = p^θ,  (1 − ϑ) < θ < 1, so that pε_p ≪ n_p ≪ p
Limitation of the 'Oracle Property'

The oracle property, or the probability of exact support recovery, is a widely used criterion for assessing optimality in variable selection.

However, when signals are rare and weak, exact recovery is usually impossible.
Minimax Hamming distance

Measure errors with the Hamming distance:

H_p(β̂, ε_p, µ; Ω) = E[ Σ_{j=1}^p 1{ sgn(β̂_j) ≠ sgn(β_j) } ]

Minimax Hamming distance:

Hamm*_p(ϑ, θ, r, a, Ω) = inf_{β̂} sup_{µ ∈ Θ*_p(τ_p, a)} H_p(β̂, ε_p, µ; Ω)
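For concreteness, the per-realization count that this risk averages is trivial to compute (a minimal sketch):

```python
# Empirical Hamming selection error: count sign disagreements between
# an estimate beta_hat and the truth beta.
import numpy as np

def hamming_error(beta_hat: np.ndarray, beta: np.ndarray) -> int:
    return int(np.sum(np.sign(beta_hat) != np.sign(beta)))
```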
Exponent ρ*_j = ρ*_j(ϑ, r, Ω)

Define ω = ω(S_0, S_1; Ω) = inf_δ { δ′Ωδ }, where

δ ≡ u^{(0)} − u^{(1)},  with u^{(k)}_i = 0 for i ∉ S_k and 1 ≤ |u^{(k)}_i| ≤ a for i ∈ S_k,  k = 0, 1

Define

ρ(S_0, S_1; ϑ, r, a, Ω) = ((|S_0| + |S_1|)/2) ϑ + (ωr)/4 + ((|S_1| − |S_0|)² ϑ²)/(4ωr)

The minimax rate critically depends on the exponents

ρ*_j = ρ*_j(ϑ, r; Ω) = min_{(S_0, S_1): j ∈ S_0 ∪ S_1} ρ(S_0, S_1; ϑ, r, a, Ω)

◮ do not depend on (θ, a) (under mild regularity conditions)
◮ computable; explicit form for some Ω
Graph of Least Favorable (GOLF)

Define the least favorable configuration at site j (attaining the minimum above):

(S*_{0j}, S*_{1j}) = argmin_{(S_0, S_1): j ∈ S_0 ∪ S_1} ρ(S_0, S_1; ϑ, r, a, Ω)

Definition. GOLF is the graph 𝒢⋄ = (V, E), where V = {1, 2, . . . , p} and there is an edge between j and k if and only if

(S*_{0j} ∪ S*_{1j}) ∩ (S*_{0k} ∪ S*_{1k}) ≠ ∅
Lower bound

β = b ∘ µ,  b_j ∼ iid Bernoulli(ε_p),  µ ∈ Θ*_p(τ_p, a)

ε_p = p^{−ϑ},  τ_p = √(2r log(p))

Theorem 1. Let d(𝒢⋄) be the maximum degree of GOLF. As p → ∞,

Hamm*_p(ϑ, θ, r, a, Ω) ≥ (L_p / d(𝒢⋄)) Σ_{j=1}^p p^{−ρ*_j},

where L_p is a generic multi-log(p) term.