Sparse Recovery via Differential Inclusions
Yuan Yao, School of Mathematical Sciences, Peking University
September 2, 2014
with Stanley Osher (UCLA), Feng Ruan (PKU & Stanford), Jiechao Xiong (PKU), and Wotao Yin (UCLA), et al.
Outline
1. Inverse Scale Space (ISS) Dynamics
   - ISS
   - Dynamics of Bregman Inverse Scale Space
   - Discrete Algorithm: Linearized Bregman Iteration
2. Path Consistency Theory
   - Sign-consistency
   - $\ell_2$-consistency
3. Discussion
ISS: Background

Assume that $\beta^* \in \mathbb{R}^p$ is sparse and unknown. Consider recovering $\beta^*$ from
$$y = X\beta^* + \epsilon,$$
where $\epsilon$ is noise. Notation:
- $S := \mathrm{supp}(\beta^*)$ and $T$ is its complement
- $X_S$ ($X_T$) denotes the columns of $X$ with indices restricted to $S$ ($T$)
- $\epsilon \sim \mathcal{N}(0, \sigma^2)$ (sub-Gaussian in general)
- $X$ is $n$-by-$p$, with $p \gg n$
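The setup above can be simulated directly. A minimal sketch in Python/NumPy (the function name, sizes, and noise level are our own illustrative choices, not from the talk):

```python
import numpy as np

def make_problem(n=50, p=200, s=5, sigma=0.1, seed=0):
    """Generate a sparse recovery instance y = X beta* + eps."""
    rng = np.random.default_rng(seed)
    # Design matrix with i.i.d. Gaussian entries, scaled so columns have unit-ish norm
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    # Sparse signal: s nonzeros of magnitude at least 1, random signs
    beta_star = np.zeros(p)
    beta_star[:s] = rng.choice([-1.0, 1.0], size=s) * (1.0 + rng.random(s))
    # Noisy observations
    y = X @ beta_star + sigma * rng.standard_normal(n)
    return X, y, beta_star
```

Here the support $S$ is $\{1, \ldots, s\}$ by construction and $T$ is the rest.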
ISS: Statistical Consistency of Algorithms

- Orthogonal Matching Pursuit (OMP, Mallat-Zhang'93)
  - noise-free: Tropp'04
  - noisy: Cai-Wang'11
- LASSO (Tibshirani'96)
  - sign-consistency: Yuan-Lin'06, Zhao-Yu'06, Zou'07, Wainwright'09
  - $\ell_2$-consistency: Ritov-Bickel-Tsybakov'09 (also Dantzig)
- Related: BPDN (Chen-Donoho-Saunders'96), Dantzig Selector (Candes-Tao'07)
- Anything else you would like to hear?
ISS: Optimization + Noise = High-Dimensional Statistics?

- $p \gg n$: the empirical risk
$$\min_\beta L(\beta) := \frac{1}{n}\sum_{i=1}^n \rho(y_i - x_i^T\beta), \quad \text{convex } \rho \text{ (Huber'73)},$$
cannot be strongly convex
- in the presence of noise, not every minimizer in $\arg\min L(\beta)$ is desirable: most overfit
- a convex constraint or penalty avoids overfitting and is tractable, but leads to bias $\Rightarrow$ non-convex? (hard to find the global optimizer)
- dynamics: every algorithm is a dynamical system (Turing), not necessarily one optimizing an objective function
ISS: Inverse Scale Space (ISS) Dynamics

- Bregman ISS:
$$\dot\rho(t) = \frac{1}{n}X^T(y - X\beta(t)), \quad \rho(t) \in \partial\|\beta(t)\|_1.$$
Its limit is a solution to $\min_\beta \|\beta\|_1$ s.t. $X^Ty = X^TX\beta$.
- Linearized Bregman ISS:
$$\dot\rho(t) + \frac{1}{\kappa}\dot\beta(t) = \frac{1}{n}X^T(y - X\beta(t)), \quad \rho(t) \in \partial\|\beta(t)\|_1.$$
Its limit is a solution to $\min_\beta \|\beta\|_1 + \frac{1}{2\kappa}\|\beta\|_2^2$ s.t. $X^Ty = X^TX\beta$.
ISS: Algorithmic Regularization

We claim that there exist points on the paths $(\beta(t), \rho(t))_{t \ge 0}$ which are
- sparse
- sign-consistent (the same sparsity pattern of nonzeros as the true signal)
- unbiased (or at least less biased than LASSO)
Bias of LASSO: Oracle Estimator

If $S$ were disclosed by an oracle, the oracle estimator would be the subset least squares solution with $\tilde\beta^*_T = 0$ and, for $\Sigma_n = \frac{1}{n}X_S^TX_S \to \Sigma_S$,
$$\tilde\beta^*_S = \Sigma_n^{-1}\left(\frac{1}{n}X_S^Ty\right) = \beta^*_S + \Sigma_n^{-1}\frac{1}{n}X_S^T\epsilon. \quad (1)$$
"Oracle properties":
- Model selection consistency: $\mathrm{supp}(\tilde\beta^*) = S$;
- Normality: $\tilde\beta^*_S \sim \mathcal{N}(\beta^*_S, \frac{\sigma^2}{n}\Sigma_n^{-1})$.
So $\tilde\beta^*$ is unbiased, i.e. $\mathbb{E}[\tilde\beta^*] = \beta^*$.
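Equation (1) is just least squares restricted to the true support. A sketch (our own helper name; `numpy.linalg.lstsq` computes the subset least squares coefficients $\Sigma_n^{-1}\frac{1}{n}X_S^Ty$):

```python
import numpy as np

def oracle_estimator(X, y, S):
    """Subset least squares on the support S; zeros on the complement T."""
    p = X.shape[1]
    S = np.asarray(S)
    beta = np.zeros(p)
    # Solves min over b of ||y - X_S b||_2, i.e. Sigma_n^{-1} (1/n) X_S^T y
    beta[S] = np.linalg.lstsq(X[:, S], y, rcond=None)[0]
    return beta
```

In the noiseless case this recovers $\beta^*$ exactly when $X_S$ has full column rank.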
Bias of LASSO: Recall LASSO

LASSO:
$$\min_\beta \|\beta\|_1 + \frac{t}{2n}\|y - X\beta\|_2^2.$$
Optimality condition:
$$\frac{\rho_t}{t} = \frac{1}{n}X^T(y - X\beta_t), \quad (2a)$$
$$\rho_t \in \partial\|\beta_t\|_1, \quad (2b)$$
where $\lambda = 1/t$ is often used in the literature.
- Tibshirani'1996 (LASSO)
- Chen-Donoho-Saunders'1996 (BPDN)
Bias of LASSO: The Bias of LASSO

- Path consistency: $\exists\, \tau_n \in (0, \infty)$ with $\mathrm{supp}(\hat\beta_{\tau_n}) = S$ (e.g., Zhao-Yu'06, Zou'06, Yuan-Lin'07, Wainwright'09)
- LASSO is biased:
$$(\hat\beta_{\tau_n})_S = \tilde\beta^*_S - \frac{1}{\tau_n}\Sigma_n^{-1}\rho_{\tau_n}, \quad \tau_n > 0$$
- e.g. $X = \mathrm{Id}$, $n = p = 1$:
$$\hat\beta_\tau = \begin{cases} 0, & \text{if } \tau < 1/y; \\ y - 1/\tau, & \text{otherwise}, \end{cases}$$
- (Fan-Li'2001) a non-convex penalty is necessary (SCAD, Zhang's PLUS, Zou's Adaptive LASSO, etc.)
- Any other simple scheme?
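The 1-D example above is soft-thresholding of $y$ at level $1/\tau$, and the bias is exactly $1/\tau$ on the support. A tiny sketch (function name is ours):

```python
import math

def lasso_1d(y, tau):
    """Minimizer of |b| + (tau/2)(y - b)^2: soft-threshold y at level 1/tau."""
    return math.copysign(max(abs(y) - 1.0 / tau, 0.0), y)
```

For $y = 2$ and $\tau = 1$ the estimate is $y - 1/\tau = 1$, falling short of the oracle value $y = 2$ by the bias $1/\tau = 1$; for $\tau < 1/y = 0.5$ the estimate is $0$.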
Dynamics of Bregman Inverse Scale Space: Differentiation of LASSO's KKT Equation

Taking the derivative (assuming differentiability) w.r.t. $t$:
$$\frac{\rho_t}{t} = \frac{1}{n}X^T(y - X\beta_t) \;\Rightarrow\; \dot\rho_t = \frac{1}{n}X^T\bigl(y - X(t\dot\beta_t + \beta_t)\bigr), \quad \rho_t \in \partial\|\beta_t\|_1$$
- Debiasing: sign-consistency ($\mathrm{sign}(\beta_\tau) = \mathrm{sign}(\beta^*)$) $\Rightarrow$ the oracle estimator: $\beta'_\tau := \tau\dot\beta_\tau + \beta_\tau = \tilde\beta^*$
- e.g. $X = \mathrm{Id}$, $n = p = 1$:
$$\beta'_t = \begin{cases} 0, & \text{if } t < 1/y; \\ y, & \text{otherwise}, \end{cases}$$
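The debiasing identity $\beta'_t = t\dot\beta_t + \beta_t$ can be checked numerically in the 1-D example, approximating $\dot\beta_t$ by a centered finite difference (a sketch with our own names; $y > 0$ assumed):

```python
def lasso_path_1d(y, t):
    """LASSO path for X = Id, n = p = 1, y > 0."""
    return 0.0 if t < 1.0 / y else y - 1.0 / t

def debiased_1d(y, t, h=1e-6):
    """beta'_t = t * d(beta_t)/dt + beta_t via a finite difference."""
    dbeta = (lasso_path_1d(y, t + h) - lasso_path_1d(y, t - h)) / (2 * h)
    return t * dbeta + lasso_path_1d(y, t)
```

Away from the kink at $t = 1/y$, this returns $0$ for $t < 1/y$ and the unbiased value $y$ afterwards.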
Dynamics of Bregman Inverse Scale Space: Inverse Scale Space (ISS)

Nonlinear ODE (differential inclusion):
$$\dot\rho_t = \frac{1}{n}X^T(y - X\beta_t), \quad (3a)$$
$$\rho_t \in \partial\|\beta_t\|_1, \quad (3b)$$
starting at $t = 0$ with $\rho(0) = \beta(0) = 0$.
- Replace $\rho_t/t$ in the LASSO optimality condition by $d\rho/dt$
- Burger-Gilboa-Osher-Xu'06 (image recovery: the objects in an image are recovered in an inverse-scale order as $t$ increases, with larger objects appearing in $\beta_t$ first)
Dynamics of Bregman Inverse Scale Space: Solution Path

- $\beta_t$ is piecewise constant in $t$:
$$\beta_{t_{k+1}} = \arg\min_\beta \frac{\|y - X\beta\|^2}{2} \;\text{ s.t. }\; (\rho_{t_{k+1}})_i\,\beta_i \ge 0\ \forall i \in S_{k+1}, \quad \beta_j = 0\ \forall j \in T_{k+1}. \quad (4)$$
- $t_{k+1} = \sup\{t > t_k : \rho_{t_k} + \frac{t - t_k}{n}X^T(y - X\beta_{t_k}) \in \partial\|\beta_{t_k}\|_1\}$
- $\rho_t$ is piecewise linear in $t$:
$$\rho_t = \rho_{t_k} + \frac{t - t_k}{t_{k+1} - t_k}\left(\rho_{t_{k+1}} - \rho_{t_k}\right), \quad \beta_t = \beta_{t_k}, \quad t \in [t_k, t_{k+1})$$
- Sign consistency: $\rho_t = \mathrm{sign}(\beta^*) \Rightarrow \beta_t = \tilde\beta^*$
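For instance, starting from $\beta_0 = 0$ we have $\rho_t = \frac{t}{n}X^Ty$, so the first breakpoint $t_1$ is the first time some $|\rho_i|$ reaches $1$. A sketch of this first step (names are ours):

```python
import numpy as np

def first_breakpoint(X, y):
    """From beta = 0: rho_t = (t/n) X^T y, so t_1 = 1 / max_i |(X^T y)_i / n|."""
    n = X.shape[0]
    c = np.abs(X.T @ y) / n
    return 1.0 / c.max(), int(c.argmax())
```

The returned index is the first coordinate to enter the model.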
Discretized Algorithm: Linearized Bregman Iteration

Damped dynamics (continuous solution path):
$$\dot\rho_t + \frac{1}{\kappa}\dot\beta_t = \frac{1}{n}X^T(y - X\beta_t), \quad \rho_t \in \partial\|\beta_t\|_1. \quad (5)$$
The Linearized Bregman Iteration is its forward Euler discretization (Osher-Burger-Goldfarb-Xu-Yin'05, Yin-Osher-Goldfarb-Darbon'08): for $\rho_k \in \partial\|\beta_k\|_1$,
$$\rho_{k+1} + \frac{1}{\kappa}\beta_{k+1} = \rho_k + \frac{1}{\kappa}\beta_k + \frac{\alpha_k}{n}X^T(y - X\beta_k),$$
- Damping factor: $\kappa > 0$
- Step size: $\alpha_k$
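With the substitution $z_k = \rho_k + \beta_k/\kappa$, the constraint $\rho_k \in \partial\|\beta_k\|_1$ gives $\beta_k = \kappa\,\mathrm{Shrink}(z_k, 1)$, and the iteration becomes a single explicit update on $z$. A minimal runnable sketch (function names and the constant step size/parameter values are illustrative choices of ours):

```python
import numpy as np

def shrink(z, lam):
    """Soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def linearized_bregman(X, y, kappa=10.0, alpha=0.05, iters=500):
    """Forward-Euler Linearized Bregman: z_{k+1} = z_k + (alpha/n) X^T (y - X beta_k),
    with z = rho + beta/kappa and beta_k = kappa * shrink(z_k, 1)."""
    n, p = X.shape
    z = np.zeros(p)
    for _ in range(iters):
        beta = kappa * shrink(z, 1.0)
        z = z + (alpha / n) * (X.T @ (y - X @ beta))
    return kappa * shrink(z, 1.0)
```

For stability the step size should satisfy roughly $\alpha\kappa\,\|X^TX\|/n < 2$; in the noiseless, well-conditioned case the iterates approach the limit of the damped dynamics.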
Linearized Bregman Iteration: Comparisons

Linearized Bregman Iteration:
$$z_{t+1} = z_t - \alpha_t X^T\bigl(X\,\kappa\,\mathrm{Shrink}(z_t, 1) - y\bigr)$$
- This is not ISTA: $z_{t+1} = \mathrm{Shrink}\bigl(z_t - \alpha_t X^T(Xz_t - y), \lambda\bigr)$; ISTA solves LASSO for a fixed $\lambda$
- This is not OMP, which only adds variables
- This is not Donoho-Maleki-Montanari's AMP
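The contrast is easiest to see in code: both updates use the same soft-threshold, but Linearized Bregman shrinks inside the gradient step at a fixed level 1, while ISTA shrinks after the gradient step at the penalty level $\lambda$. A side-by-side sketch (our own helper names):

```python
import numpy as np

def shrink(z, lam):
    """Soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lb_step(z, X, y, alpha, kappa):
    """One Linearized Bregman update: gradient step on z, shrink inside."""
    return z - alpha * X.T @ (X @ (kappa * shrink(z, 1.0)) - y)

def ista_step(z, X, y, alpha, lam):
    """One ISTA update for LASSO at fixed lambda: gradient step, then shrink."""
    return shrink(z - alpha * X.T @ (X @ z - y), lam)
```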
AUC of ISS often beats LASSO

[Figure] Setup: $n = 200$, $p = 100$, $S = \{1, \ldots, 30\}$, $x_i \sim \mathcal{N}(0, \Sigma_p)$ ($\sigma_{ij} = 1/(3p)$ for $i \ne j$ and $1$ otherwise).
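The stated design can be generated as follows (a sketch; the function name and seed are ours, while $n$, $p$, and $\Sigma_p$ follow the slide):

```python
import numpy as np

def correlated_design(n=200, p=100, seed=0):
    """Rows x_i ~ N(0, Sigma_p) with sigma_ij = 1/(3p) off-diagonal, 1 on-diagonal."""
    rng = np.random.default_rng(seed)
    Sigma = np.full((p, p), 1.0 / (3 * p))
    np.fill_diagonal(Sigma, 1.0)
    # Sigma = (1 - r) I + r 11^T with r = 1/(3p) > 0 is positive definite
    L = np.linalg.cholesky(Sigma)
    return rng.standard_normal((n, p)) @ L.T
```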
But regularization paths are different.

[Figure]
Path Consistency Theory

We present a consistency theory addressing:
- under what conditions one can achieve
  - sign consistency (model selection consistency)
  - $\ell_2$-consistency ($\|\beta(t) - \tilde\beta^*\|_2 \le O(\sqrt{s\log p/n})$)
- when sign-consistency holds, the Bregman ISS path returns the oracle estimator, without bias
- early stopping regularizes against overfitting the noise
Assumptions

(A1) Restricted strong convexity: $\exists\, \gamma \in (0, 1]$,
$$\frac{1}{n}X_S^TX_S \ge \gamma I$$
(A2) Incoherence/irrepresentable condition: $\exists\, \eta \in (0, 1)$,
$$\left\|\frac{1}{n}X_T^TX_S\left(\frac{1}{n}X_S^TX_S\right)^{-1}\right\|_\infty = \left\|X_T^T(X_S^\dagger)^T\right\|_\infty \le 1 - \eta$$
- The incoherence condition is used independently in Tropp'04, Yuan-Lin'05, Zhao-Yu'06, Zou'06, Wainwright'09, etc.
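Both assumptions can be checked numerically for a given design and support. A sketch (our own helper, not from the talk): (A1) asks for $\gamma > 0$, and (A2) asks for the induced $\infty$-norm below to be strictly less than $1$:

```python
import numpy as np

def check_conditions(X, S):
    """Return (gamma, irr): the smallest eigenvalue of (1/n) X_S^T X_S and the
    infinity-norm of (1/n) X_T^T X_S ((1/n) X_S^T X_S)^{-1}."""
    n, p = X.shape
    S = np.asarray(S)
    T = np.setdiff1d(np.arange(p), S)
    Sigma_SS = X[:, S].T @ X[:, S] / n
    gamma = float(np.linalg.eigvalsh(Sigma_SS).min())
    M = (X[:, T].T @ X[:, S] / n) @ np.linalg.inv(Sigma_SS)
    irr = float(np.abs(M).sum(axis=1).max())  # induced infinity (max-row-sum) norm
    return gamma, irr
```

For an orthogonal design, the cross-term vanishes, so (A2) holds with any $\eta < 1$.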