Sparse Recovery via Differential Inclusions

  1. Outline Bregman Iteration Path Consistency Discussion Sparse Recovery via Differential Inclusions Yuan Yao School of Mathematical Sciences Peking University September 2, 2014 with Stanley Osher (UCLA), Feng Ruan (PKU & Stanford), Jiechao Xiong (PKU), and Wotao Yin (UCLA), et al. Yuan Yao Bregman ISS

  2. Outline
     1. Inverse Scale Space (ISS) Dynamics: ISS Dynamics of Bregman Inverse Scale Space; Discrete Algorithm: Linearized Bregman Iteration
     2. Path Consistency Theory: Sign-consistency; $\ell_2$-consistency
     3. Discussion

  3. [ISS] Background
     Assume that $\beta^* \in \mathbb{R}^p$ is sparse and unknown. Consider recovering $\beta^*$ from $y = X\beta^* + \epsilon$, where $\epsilon$ is noise. Notation:
     • $S := \mathrm{supp}(\beta^*)$, and $T$ is its complement.
     • $X_S$ ($X_T$) denotes the columns of $X$ with indices restricted to $S$ ($T$).
     • $\epsilon \sim N(0, \sigma^2)$ (sub-Gaussian in general).
     • $X$ is $n$-by-$p$, with $p \gg n$.
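A minimal sketch of this setup in Python with NumPy may help fix the notation. The Gaussian design, the sizes `n`, `p`, `s`, the noise level `sigma`, the signal magnitude and the seed are all illustrative assumptions, not values from the slides.

```python
import numpy as np

def make_sparse_problem(n=50, p=200, s=5, sigma=0.1, seed=0):
    """Draw y = X beta* + eps with an s-sparse beta* (illustrative values only)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))                     # n-by-p design with p >> n
    S = np.sort(rng.choice(p, size=s, replace=False))   # support S of beta*
    T = np.setdiff1d(np.arange(p), S)                   # complement T
    beta_star = np.zeros(p)
    beta_star[S] = 2.0 * rng.choice([-1.0, 1.0], size=s)
    eps = sigma * rng.standard_normal(n)                # Gaussian noise
    y = X @ beta_star + eps
    return X, y, beta_star, S, T

X, y, beta_star, S, T = make_sparse_problem()
X_S, X_T = X[:, S], X[:, T]   # column restrictions used throughout the slides
```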

  4. [ISS] Statistical Consistency of Algorithms
     • Orthogonal Matching Pursuit (OMP, Mallat-Zhang’93)
        • noise-free: Tropp’04
        • noise: Cai-Wang’11
     • LASSO (Tibshirani’96)
        • sign-consistency: Yuan-Lin’06, Zhao-Yu’06, Zou’07, Wainwright’09
        • $\ell_2$-consistency: Ritov-Bickel-Tsybakov’09 (also Dantzig)
        • related: BPDN (Chen-Donoho-Saunders’96), Dantzig Selector (Candes-Tao’07)
     • Anything else you want to hear?

  5. [ISS] Optimization + Noise = H.D. Statistics?
     • $p \gg n$: the loss cannot be strongly convex,
       $\min_\beta L(\beta) := \frac{1}{n}\sum_{i=1}^{n} \rho(y_i - x_i^T\beta)$, convex $\rho$ (Huber’73)
     • in the presence of noise, not every optimizer $\arg\min L(\beta)$ is desired: mostly overfitting
     • convex constraint/penalization: avoids overfitting and is tractable, but leads to bias ⇒ non-convex? (hard to find a global optimizer)
     • dynamics: every algorithm is a dynamics (Turing), not necessarily optimizing an objective function

  9. [ISS] Inverse Scale Space (ISS) Dynamics
     • Bregman ISS: $\dot\rho(t) = \frac{1}{n}X^T(y - X\beta(t))$, $\rho(t) \in \partial\|\beta(t)\|_1$.
       Limit is the solution to $\min_\beta \|\beta\|_1$ s.t. $X^Ty = X^TX\beta$.
     • Linearized Bregman ISS: $\dot\rho(t) + \frac{1}{\kappa}\dot\beta(t) = \frac{1}{n}X^T(y - X\beta(t))$, $\rho(t) \in \partial\|\beta(t)\|_1$.
       Limit is the solution to $\min_\beta \|\beta\|_1 + \frac{1}{2\kappa}\|\beta\|_2^2$ s.t. $X^Ty = X^TX\beta$.

  10. [ISS] Algorithmic regularization
      We claim that there exist points on their paths $(\beta(t), \rho(t))_{t \ge 0}$ which are
      • sparse
      • sign-consistent (the same sparsity pattern of nonzeros as the true signal)
      • unbiased (or less biased than LASSO)

  11. [Bias of LASSO] Oracle Estimator
      If $S$ is disclosed by an oracle, the oracle estimator is the subset least squares solution with $\tilde\beta^*_T = 0$ and, for $\Sigma_n = \frac{1}{n}X_S^TX_S \to \Sigma_S$,
      $\tilde\beta^*_S = \Sigma_n^{-1}\left(\frac{1}{n}X_S^Ty\right) = \beta^*_S + \frac{1}{n}\Sigma_n^{-1}X_S^T\epsilon$.   (1)
      “Oracle properties”
      • Model selection consistency: $\mathrm{supp}(\tilde\beta^*) = S$;
      • Normality: $\tilde\beta^*_S \sim N\left(\beta^*_S, \frac{\sigma^2}{n}\Sigma_n^{-1}\right)$.
      So $\tilde\beta^*$ is unbiased, i.e. $E[\tilde\beta^*] = \beta^*$.
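The oracle estimator is just least squares restricted to the true support, which the following sketch illustrates. The sizes, support, coefficients and noise level are made-up illustrative values, and `oracle_estimator` is a hypothetical helper name. The last lines check numerically that the estimate equals the true coefficients plus the projected noise term, as in formula (1).

```python
import numpy as np

def oracle_estimator(X, y, S):
    """Least squares restricted to the support S; zero on the complement."""
    n, p = X.shape
    X_S = X[:, S]
    Sigma_n = X_S.T @ X_S / n
    beta = np.zeros(p)
    beta[S] = np.linalg.solve(Sigma_n, X_S.T @ y / n)
    return beta

# Illustrative data (all values below are assumptions of this sketch).
rng = np.random.default_rng(0)
n, p = 100, 20
S = np.array([0, 3, 7])
beta_star = np.zeros(p)
beta_star[S] = [2.0, -1.5, 1.0]
X = rng.standard_normal((n, p))
eps = 0.5 * rng.standard_normal(n)
y = X @ beta_star + eps

beta_tilde = oracle_estimator(X, y, S)

# Numerical check of (1): beta_tilde_S = beta*_S + Sigma_n^{-1} X_S^T eps / n
X_S = X[:, S]
Sigma_n = X_S.T @ X_S / n
rhs = beta_star[S] + np.linalg.solve(Sigma_n, X_S.T @ eps / n)
identity_holds = np.allclose(beta_tilde[S], rhs)
```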

  12. [Bias of LASSO] Recall LASSO
      LASSO: $\min_\beta \|\beta\|_1 + \frac{t}{2n}\|y - X\beta\|_2^2$.
      Optimality condition:
      $\frac{\rho_t}{t} = \frac{1}{n}X^T(y - X\beta_t)$,   (2a)
      $\rho_t \in \partial\|\beta_t\|_1$,   (2b)
      where $\lambda = 1/t$ is often used in the literature.
      • Tibshirani’1996 (LASSO)
      • Chen-Donoho-Saunders’1996 (BPDN)

  13. [Bias of LASSO] The Bias of LASSO
      • Path consistency: $\exists\, \tau_n \in (0, \infty)$ such that $\mathrm{supp}(\hat\beta_{\tau_n}) = S$ (e.g., Zhao-Yu’06, Zou’06, Yuan-Lin’07, Wainwright’09)
      • LASSO is biased: $(\hat\beta_{\tau_n})_S = \tilde\beta^*_S - \frac{1}{\tau_n}\Sigma_n^{-1}(\rho_{\tau_n})_S$, $\tau_n > 0$
      • e.g. $X = \mathrm{Id}$, $n = p = 1$ (with $y > 0$): $\hat\beta_\tau = 0$ if $\tau < 1/y$; $\hat\beta_\tau = y - 1/\tau$ otherwise
      • (Fan-Li’2001) non-convex penalty is necessary (SCAD, Zhang’s PLUS, Zou’s Adaptive LASSO, etc.)
      • Any other simple scheme?
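The scalar example can be checked numerically. The sketch below assumes the parametrization used on these slides, i.e. minimizing $|b| + \frac{\tau}{2}(y - b)^2$, and compares the closed form (soft-thresholding at $1/\tau$) with a brute-force grid search; `lasso_1d`, the grid and the test values of $\tau$ are illustrative choices. For any finite $\tau$ the estimate stays $1/\tau$ short of $y$, which is the bias.

```python
import numpy as np

def lasso_1d(y, tau):
    """Minimizer of |b| + (tau / 2) * (y - b)**2: soft-thresholding at 1 / tau."""
    return np.sign(y) * max(abs(y) - 1.0 / tau, 0.0)

y = 2.0
grid = np.linspace(-5.0, 5.0, 200001)   # brute-force check of the closed form
results = []
for tau in [0.25, 1.0, 4.0, 100.0]:
    objective = np.abs(grid) + 0.5 * tau * (y - grid) ** 2
    b_grid = grid[np.argmin(objective)]
    results.append((tau, lasso_1d(y, tau), b_grid))
    print(tau, lasso_1d(y, tau), b_grid)
```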

  14. [Dynamics of Bregman Inverse Scale Space] Differentiation of LASSO’s KKT Equation
      Taking the derivative (assuming differentiability) w.r.t. $t$:
      $\frac{\rho_t}{t} = \frac{1}{n}X^T(y - X\beta_t) \ \Rightarrow\ \dot\rho_t = \frac{1}{n}X^T\left(y - X(\dot\beta_t t + \beta_t)\right)$, $\rho_t \in \partial\|\beta_t\|_1$
      • Debias: sign-consistency ($\mathrm{sign}(\beta_\tau) = \mathrm{sign}(\beta^*)$) ⇒ oracle estimator $\beta'_\tau := \dot\beta_\tau \tau + \beta_\tau = \tilde\beta^*$
      • e.g. $X = \mathrm{Id}$, $n = p = 1$: $\beta'_t = 0$ if $t < 1/y$; $\beta'_t = y$ otherwise
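For the scalar example ($X = \mathrm{Id}$, $n = p = 1$, assuming $y > 0$), the debiasing step can be verified by hand from the LASSO solution $\beta_t = y - 1/t$ on $t \ge 1/y$:

```latex
\beta_t = y - \frac{1}{t}, \qquad \dot\beta_t = \frac{1}{t^{2}} \qquad (t \ge 1/y)
\quad\Longrightarrow\quad
\beta'_t = \dot\beta_t\, t + \beta_t = \frac{1}{t} + y - \frac{1}{t} = y .
```

The shrinkage term $-1/t$ is cancelled exactly by $\dot\beta_t\, t$, so $\beta'_t$ recovers $y$ as soon as the variable is selected.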

  15. [Dynamics of Bregman Inverse Scale Space] Inverse scale space (ISS)
      Nonlinear ODE (differential inclusion):
      $\dot\rho_t = \frac{1}{n}X^T(y - X\beta_t)$,   (3a)
      $\rho_t \in \partial\|\beta_t\|_1$,   (3b)
      starting at $t = 0$ with $\rho(0) = \beta(0) = 0$.
      • Replace $\rho/t$ in LASSO by $d\rho/dt$
      • Burger-Gilboa-Osher-Xu’06: in image recovery, the dynamics recovers the objects in an image in inverse-scale order as $t$ increases (larger objects appear in $\beta_t$ first)

  16. [Dynamics of Bregman Inverse Scale Space] Solution Path
      • $\beta_t$ is piecewise constant in $t$:
        $\beta_{t_{k+1}} = \arg\min_\beta \|y - X\beta\|_2^2$ subject to $(\rho_{t_{k+1}})_i \beta_i \ge 0 \ \ \forall i \in S_{k+1}$, $\beta_j = 0 \ \ \forall j \in T_{k+1}$.   (4)
      • $t_{k+1} = \sup\left\{ t > t_k : \rho_{t_k} + \frac{t - t_k}{n}X^T(y - X\beta_{t_k}) \in \partial\|\beta_{t_k}\|_1 \right\}$
      • $\rho_t$ is piecewise linear in $t$: for $t \in [t_k, t_{k+1})$,
        $\rho_t = \rho_{t_k} + \frac{t - t_k}{t_{k+1} - t_k}\left(\rho_{t_{k+1}} - \rho_{t_k}\right)$, $\beta_t = \beta_{t_k}$
      • Sign consistency: $\rho_t = \mathrm{sign}(\beta^*) \Rightarrow \beta_t = \tilde\beta^*$
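In general each knot of the path requires a sign-constrained least squares solve. As a hedged illustration only, the sketch below covers the special case of an orthonormal design, assuming $\frac{1}{n}X^TX = I$ and writing $z = \frac{1}{n}X^Ty$. The dynamics then decouples per coordinate: $\rho_i(t) = t z_i$ until it hits $\pm 1$ at $t_i = 1/|z_i|$, after which $\beta_i$ jumps to $z_i$ and $\rho_i$ stays at $\mathrm{sign}(z_i)$. The function name and the test vector are hypothetical.

```python
import numpy as np

def iss_path_orthonormal(z):
    """Closed-form Bregman ISS path, ASSUMING X^T X / n = I and z = X^T y / n.

    Coordinate i stays at zero until t_i = 1 / |z_i|, then jumps to z_i.
    Returns the sorted finite jump times and a function t -> (beta_t, rho_t).
    """
    z = np.asarray(z, dtype=float)
    with np.errstate(divide="ignore"):
        t_enter = 1.0 / np.abs(z)                    # inf where z_i = 0

    def path(t):
        active = t >= t_enter
        beta = np.where(active, z, 0.0)              # piecewise constant in t
        rho = np.where(active, np.sign(z), t * z)    # piecewise linear in t
        return beta, rho

    return np.sort(t_enter[np.isfinite(t_enter)]), path

z = np.array([2.0, -0.5, 0.1, 0.0])
knots, path = iss_path_orthonormal(z)
beta, rho = path(3.0)   # evaluated after the first two jumps (t = 0.5 and t = 2)
```

Large coordinates enter first and enter at their unshrunken value, in contrast to the LASSO path where they enter at zero and grow.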

  17. [Discrete Algorithm: Linearized Bregman Iteration] Discretized Algorithm
      Damped Dynamics: continuous solution path
      $\dot\rho_t + \frac{1}{\kappa}\dot\beta_t = \frac{1}{n}X^T(y - X\beta_t)$, $\rho_t \in \partial\|\beta_t\|_1$.   (5)
      Linearized Bregman Iteration as forward Euler discretization (Osher-Burger-Goldfarb-Xu-Yin’05, Yin-Osher-Goldfarb-Darbon’08): for $\rho_k \in \partial\|\beta_k\|_1$,
      $\rho_{k+1} + \frac{1}{\kappa}\beta_{k+1} = \rho_k + \frac{1}{\kappa}\beta_k + \frac{\alpha_k}{n}X^T(y - X\beta_k)$,
      • Damping factor: $\kappa > 0$
      • Step size: $\alpha_k$
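The iteration can be sketched in a few lines of NumPy by tracking the single variable $z_k = \rho_k + \beta_k/\kappa$, from which $\beta_k = \kappa\,\mathrm{Shrink}(z_k, 1)$ is recovered by soft-thresholding. This is a sketch under assumptions rather than the authors' implementation: it uses a constant step size, and the default step (chosen so that $\alpha\kappa\|X^TX/n\|_2 = 1$) and the default `kappa` are heuristic choices of this example.

```python
import numpy as np

def shrink(z, lam=1.0):
    """Soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def linearized_bregman(X, y, kappa=10.0, alpha=None, n_iter=500):
    """Forward-Euler discretization of the damped dynamics, constant step size.

    Tracks z_k = rho_k + beta_k / kappa, so that beta_k = kappa * shrink(z_k, 1).
    Returns the whole path of beta_k, one row per iteration.
    """
    n, p = X.shape
    if alpha is None:
        # Heuristic (assumption of this sketch): alpha * kappa * ||X^T X / n||_2 = 1.
        alpha = n / (kappa * np.linalg.norm(X, 2) ** 2)
    z = np.zeros(p)                       # rho_0 = beta_0 = 0
    path = np.zeros((n_iter, p))
    for k in range(n_iter):
        beta = kappa * shrink(z, 1.0)
        path[k] = beta
        z = z + (alpha / n) * (X.T @ (y - X @ beta))
    return path
```

Each row of `path` is one point on the regularization path, so early stopping amounts to choosing a row index rather than a penalty level.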

  18. [Discrete Algorithm: Linearized Bregman Iteration] Comparisons
      Linearized Bregman Iteration: $z_{t+1} = z_t - \alpha_t X^T\left(X \kappa\, \mathrm{Shrink}(z_t, 1) - y\right)$
      • This is not ISTA: $z_{t+1} = \mathrm{Shrink}\left(z_t - \alpha_t X^T(Xz_t - y), \lambda\right)$
         • ISTA solves LASSO for fixed $\lambda$
      • This is not OMP, which only adds in variables.
      • This is not Donoho-Maleki-Montanari’s AMP
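The difference between the two updates is where the thresholding sits, which a side-by-side sketch makes concrete. Both functions follow the formulas on this slide; the names `lbi_step` and `ista`, the constant step size and the iteration count are assumptions of this example. In ISTA the thresholded vector is the iterate itself and converges to one LASSO solution; in the Linearized Bregman update the thresholding is applied inside the gradient and $z$ itself is never thresholded.

```python
import numpy as np

def shrink(z, lam):
    """Soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lbi_step(z, X, y, alpha, kappa):
    """One Linearized Bregman update; the sparse estimate is kappa * shrink(z, 1)."""
    return z - alpha * (X.T @ (X @ (kappa * shrink(z, 1.0)) - y))

def ista(X, y, lam, alpha, n_iter=1000):
    """ISTA as written on the slide: z <- Shrink(z - alpha * X^T (X z - y), lam).

    Its fixed point minimizes 0.5 * ||y - X z||^2 + (lam / alpha) * ||z||_1,
    i.e. a single LASSO problem with a fixed penalty.
    """
    z = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = shrink(z - alpha * (X.T @ (X @ z - y)), lam)
    return z
```

ISTA has to be rerun for every penalty level, whereas iterating `lbi_step` from $z = 0$ traces a whole path of estimates in one pass.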

  19. [Discrete Algorithm: Linearized Bregman Iteration] AUC of ISS often beats LASSO
      $n = 200$, $p = 100$, $S = \{1, \dots, 30\}$, $x_i \sim N(0, \Sigma_p)$ ($\sigma_{ij} = 1/(3p)$ for $i \ne j$ and $1$ otherwise)

  20. [Discrete Algorithm: Linearized Bregman Iteration] But regularization paths are different.

  21. Path Consistency Theory
      We are going to present a consistency theory addressing:
      • Under what conditions one can achieve
         • sign consistency (model selection consistency)
         • $\ell_2$-consistency ($\|\beta(t) - \tilde\beta^*\|_2 \le O(\sqrt{s \log p / n})$)
      • When sign-consistency holds, the Bregman ISS path returns the oracle estimator without bias
      • Early stopping as regularization against overfitting noise

  22. Assumptions
      (A1) Restricted Strong Convexity: $\exists\, \gamma \in (0, 1]$, $\frac{1}{n}X_S^TX_S \ge \gamma I$
      (A2) Incoherence/Irrepresentable Condition: $\exists\, \eta \in (0, 1)$,
      $\left\|\frac{1}{n}X_T^TX_S^\dagger\right\|_\infty = \left\|\frac{1}{n}X_T^TX_S\left(\frac{1}{n}X_S^TX_S\right)^{-1}\right\|_\infty \le 1 - \eta$
      • The incoherence condition is used independently in Tropp’04, Yuan-Lin’05, Zhao-Yu’06, Zou’06, Wainwright’09, etc.
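Both assumptions can be evaluated numerically for a given design and candidate support. The sketch below, with the hypothetical helper `check_assumptions`, computes the largest admissible $\gamma$ as the smallest eigenvalue of $\frac{1}{n}X_S^TX_S$ and the margin $\eta$ from the right-hand expression in (A2), taking $\|\cdot\|_\infty$ as the maximum absolute row sum. A non-positive `eta` means the irrepresentable condition fails for that support.

```python
import numpy as np

def check_assumptions(X, S):
    """Return (gamma, eta) for (A1) and (A2) on a given design X and support S.

    gamma: smallest eigenvalue of X_S^T X_S / n.
    eta:   1 - || (X_T^T X_S / n) (X_S^T X_S / n)^{-1} ||_inf  (max abs row sum).
    """
    n, p = X.shape
    S = np.asarray(S)
    T = np.setdiff1d(np.arange(p), S)
    X_S, X_T = X[:, S], X[:, T]
    Sigma_SS = X_S.T @ X_S / n
    Sigma_TS = X_T.T @ X_S / n
    gamma = np.linalg.eigvalsh(Sigma_SS).min()
    irrep = np.linalg.norm(Sigma_TS @ np.linalg.inv(Sigma_SS), ord=np.inf)
    return gamma, 1.0 - irrep
```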
