Sparse Recovery via Differential Inclusions
Yuan Yao, School of Mathematical Sciences, Peking University
September 2, 2014
with Stanley Osher (UCLA), Feng Ruan (PKU & Stanford), Jiechao Xiong (PKU), and Wotao Yin (UCLA), et al.
Outline
1. Inverse Scale Space (ISS) Dynamics
   - ISS
   - Dynamics of Bregman Inverse Scale Space
   - Discrete Algorithm: Linearized Bregman Iteration
2. Path Consistency Theory
   - Sign-consistency
   - $\ell_2$-consistency
3. Discussion
ISS: Background

Assume that $\beta^* \in \mathbb{R}^p$ is sparse and unknown. Consider recovering $\beta^*$ from
$$y = X\beta^* + \epsilon,$$
where $\epsilon$ is noise. Notation:
- $S := \mathrm{supp}(\beta^*)$ and $T$ is its complement
- $X_S$ ($X_T$) denotes the columns of $X$ with indices restricted to $S$ ($T$)
- $\epsilon \sim \mathcal{N}(0, \sigma^2)$ (sub-Gaussian in general)
- $X$ is $n$-by-$p$, with $p \gg n$
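The setup above can be simulated directly. A minimal sketch in Python/NumPy (the function name, sizes, and noise level are our own illustrative choices, not from the talk):

```python
import numpy as np

def make_problem(n=50, p=200, s=5, sigma=0.1, seed=0):
    """Generate a sparse recovery instance y = X beta* + eps."""
    rng = np.random.default_rng(seed)
    # Design matrix with i.i.d. Gaussian entries, scaled so columns have unit-ish norm
    X = rng.standard_normal((n, p)) / np.sqrt(n)
    # Sparse signal: s nonzeros of magnitude at least 1, random signs
    beta_star = np.zeros(p)
    beta_star[:s] = rng.choice([-1.0, 1.0], size=s) * (1.0 + rng.random(s))
    # Noisy observations
    y = X @ beta_star + sigma * rng.standard_normal(n)
    return X, y, beta_star
```

Here the support $S$ is $\{1, \ldots, s\}$ by construction and $T$ is the rest.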
ISS: Statistical Consistency of Algorithms

- Orthogonal Matching Pursuit (OMP, Mallat-Zhang'93)
  - noise-free: Tropp'04
  - noisy: Cai-Wang'11
- LASSO (Tibshirani'96)
  - sign-consistency: Yuan-Lin'06, Zhao-Yu'06, Zou'07, Wainwright'09
  - $\ell_2$-consistency: Ritov-Bickel-Tsybakov'09 (also Dantzig)
- Related: BPDN (Chen-Donoho-Saunders'96), Dantzig Selector (Candes-Tao'07)
- Anything else you would like to hear?
ISS: Optimization + Noise = High-Dimensional Statistics?

- $p \gg n$: the empirical risk
$$\min_\beta L(\beta) := \frac{1}{n}\sum_{i=1}^n \rho(y_i - x_i^T\beta), \quad \text{convex } \rho \text{ (Huber'73)},$$
cannot be strongly convex
- in the presence of noise, not every minimizer in $\arg\min L(\beta)$ is desirable: most overfit
- a convex constraint or penalty avoids overfitting and is tractable, but leads to bias $\Rightarrow$ non-convex? (hard to find the global optimizer)
- dynamics: every algorithm is a dynamical system (Turing), not necessarily one optimizing an objective function
ISS: Inverse Scale Space (ISS) Dynamics

- Bregman ISS:
$$\dot\rho(t) = \frac{1}{n}X^T(y - X\beta(t)), \quad \rho(t) \in \partial\|\beta(t)\|_1.$$
Its limit is a solution to $\min_\beta \|\beta\|_1$ s.t. $X^Ty = X^TX\beta$.
- Linearized Bregman ISS:
$$\dot\rho(t) + \frac{1}{\kappa}\dot\beta(t) = \frac{1}{n}X^T(y - X\beta(t)), \quad \rho(t) \in \partial\|\beta(t)\|_1.$$
Its limit is a solution to $\min_\beta \|\beta\|_1 + \frac{1}{2\kappa}\|\beta\|_2^2$ s.t. $X^Ty = X^TX\beta$.
ISS: Algorithmic Regularization

We claim that there exist points on the paths $(\beta(t), \rho(t))_{t \ge 0}$ which are
- sparse
- sign-consistent (the same sparsity pattern of nonzeros as the true signal)
- unbiased (or at least less biased than LASSO)
Bias of LASSO: Oracle Estimator

If $S$ were disclosed by an oracle, the oracle estimator would be the subset least squares solution with $\tilde\beta^*_T = 0$ and, for $\Sigma_n = \frac{1}{n}X_S^TX_S \to \Sigma_S$,
$$\tilde\beta^*_S = \Sigma_n^{-1}\left(\frac{1}{n}X_S^Ty\right) = \beta^*_S + \Sigma_n^{-1}\frac{1}{n}X_S^T\epsilon. \quad (1)$$
"Oracle properties":
- Model selection consistency: $\mathrm{supp}(\tilde\beta^*) = S$;
- Normality: $\tilde\beta^*_S \sim \mathcal{N}(\beta^*_S, \frac{\sigma^2}{n}\Sigma_n^{-1})$.
So $\tilde\beta^*$ is unbiased, i.e. $\mathbb{E}[\tilde\beta^*] = \beta^*$.
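Equation (1) is just least squares restricted to the true support. A sketch (our own helper name; `numpy.linalg.lstsq` computes the subset least squares coefficients $\Sigma_n^{-1}\frac{1}{n}X_S^Ty$):

```python
import numpy as np

def oracle_estimator(X, y, S):
    """Subset least squares on the support S; zeros on the complement T."""
    p = X.shape[1]
    S = np.asarray(S)
    beta = np.zeros(p)
    # Solves min over b of ||y - X_S b||_2, i.e. Sigma_n^{-1} (1/n) X_S^T y
    beta[S] = np.linalg.lstsq(X[:, S], y, rcond=None)[0]
    return beta
```

In the noiseless case this recovers $\beta^*$ exactly when $X_S$ has full column rank.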
Bias of LASSO: Recall LASSO

LASSO:
$$\min_\beta \|\beta\|_1 + \frac{t}{2n}\|y - X\beta\|_2^2.$$
Optimality condition:
$$\frac{\rho_t}{t} = \frac{1}{n}X^T(y - X\beta_t), \quad (2a)$$
$$\rho_t \in \partial\|\beta_t\|_1, \quad (2b)$$
where $\lambda = 1/t$ is often used in the literature.
- Tibshirani'1996 (LASSO)
- Chen-Donoho-Saunders'1996 (BPDN)
Bias of LASSO: The Bias of LASSO

- Path consistency: $\exists\, \tau_n \in (0, \infty)$ with $\mathrm{supp}(\hat\beta_{\tau_n}) = S$ (e.g., Zhao-Yu'06, Zou'06, Yuan-Lin'07, Wainwright'09)
- LASSO is biased:
$$(\hat\beta_{\tau_n})_S = \tilde\beta^*_S - \frac{1}{\tau_n}\Sigma_n^{-1}\rho_{\tau_n}, \quad \tau_n > 0$$
- e.g. $X = \mathrm{Id}$, $n = p = 1$:
$$\hat\beta_\tau = \begin{cases} 0, & \text{if } \tau < 1/y; \\ y - 1/\tau, & \text{otherwise}, \end{cases}$$
- (Fan-Li'2001) a non-convex penalty is necessary (SCAD, Zhang's PLUS, Zou's Adaptive LASSO, etc.)
- Any other simple scheme?
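The 1-D example above is soft-thresholding of $y$ at level $1/\tau$, and the bias is exactly $1/\tau$ on the support. A tiny sketch (function name is ours):

```python
import math

def lasso_1d(y, tau):
    """Minimizer of |b| + (tau/2)(y - b)^2: soft-threshold y at level 1/tau."""
    return math.copysign(max(abs(y) - 1.0 / tau, 0.0), y)
```

For $y = 2$ and $\tau = 1$ the estimate is $y - 1/\tau = 1$, falling short of the oracle value $y = 2$ by the bias $1/\tau = 1$; for $\tau < 1/y = 0.5$ the estimate is $0$.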
Dynamics of Bregman Inverse Scale Space: Differentiation of LASSO's KKT Equation

Taking the derivative (assuming differentiability) w.r.t. $t$:
$$\frac{\rho_t}{t} = \frac{1}{n}X^T(y - X\beta_t) \;\Rightarrow\; \dot\rho_t = \frac{1}{n}X^T\bigl(y - X(t\dot\beta_t + \beta_t)\bigr), \quad \rho_t \in \partial\|\beta_t\|_1$$
- Debiasing: sign-consistency ($\mathrm{sign}(\beta_\tau) = \mathrm{sign}(\beta^*)$) $\Rightarrow$ the oracle estimator: $\beta'_\tau := \tau\dot\beta_\tau + \beta_\tau = \tilde\beta^*$
- e.g. $X = \mathrm{Id}$, $n = p = 1$:
$$\beta'_t = \begin{cases} 0, & \text{if } t < 1/y; \\ y, & \text{otherwise}, \end{cases}$$
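The debiasing identity $\beta'_t = t\dot\beta_t + \beta_t$ can be checked numerically in the 1-D example, approximating $\dot\beta_t$ by a centered finite difference (a sketch with our own names; $y > 0$ assumed):

```python
def lasso_path_1d(y, t):
    """LASSO path for X = Id, n = p = 1, y > 0."""
    return 0.0 if t < 1.0 / y else y - 1.0 / t

def debiased_1d(y, t, h=1e-6):
    """beta'_t = t * d(beta_t)/dt + beta_t via a finite difference."""
    dbeta = (lasso_path_1d(y, t + h) - lasso_path_1d(y, t - h)) / (2 * h)
    return t * dbeta + lasso_path_1d(y, t)
```

Away from the kink at $t = 1/y$, this returns $0$ for $t < 1/y$ and the unbiased value $y$ afterwards.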
Dynamics of Bregman Inverse Scale Space: Inverse Scale Space (ISS)

Nonlinear ODE (differential inclusion):
$$\dot\rho_t = \frac{1}{n}X^T(y - X\beta_t), \quad (3a)$$
$$\rho_t \in \partial\|\beta_t\|_1, \quad (3b)$$
starting at $t = 0$ with $\rho(0) = \beta(0) = 0$.
- Replace $\rho_t/t$ in the LASSO optimality condition by $d\rho/dt$
- Burger-Gilboa-Osher-Xu'06 (image recovery: the objects in an image are recovered in an inverse-scale order as $t$ increases, with larger objects appearing in $\beta_t$ first)
Dynamics of Bregman Inverse Scale Space: Solution Path

- $\beta_t$ is piecewise constant in $t$:
$$\beta_{t_{k+1}} = \arg\min_\beta \frac{\|y - X\beta\|^2}{2} \;\text{ s.t. }\; (\rho_{t_{k+1}})_i\,\beta_i \ge 0\ \forall i \in S_{k+1}, \quad \beta_j = 0\ \forall j \in T_{k+1}. \quad (4)$$
- $t_{k+1} = \sup\{t > t_k : \rho_{t_k} + \frac{t - t_k}{n}X^T(y - X\beta_{t_k}) \in \partial\|\beta_{t_k}\|_1\}$
- $\rho_t$ is piecewise linear in $t$:
$$\rho_t = \rho_{t_k} + \frac{t - t_k}{t_{k+1} - t_k}\left(\rho_{t_{k+1}} - \rho_{t_k}\right), \quad \beta_t = \beta_{t_k}, \quad t \in [t_k, t_{k+1})$$
- Sign consistency: $\rho_t = \mathrm{sign}(\beta^*) \Rightarrow \beta_t = \tilde\beta^*$
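For instance, starting from $\beta_0 = 0$ we have $\rho_t = \frac{t}{n}X^Ty$, so the first breakpoint $t_1$ is the first time some $|\rho_i|$ reaches $1$. A sketch of this first step (names are ours):

```python
import numpy as np

def first_breakpoint(X, y):
    """From beta = 0: rho_t = (t/n) X^T y, so t_1 = 1 / max_i |(X^T y)_i / n|."""
    n = X.shape[0]
    c = np.abs(X.T @ y) / n
    return 1.0 / c.max(), int(c.argmax())
```

The returned index is the first coordinate to enter the model.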
Discretized Algorithm: Linearized Bregman Iteration

Damped dynamics (continuous solution path):
$$\dot\rho_t + \frac{1}{\kappa}\dot\beta_t = \frac{1}{n}X^T(y - X\beta_t), \quad \rho_t \in \partial\|\beta_t\|_1. \quad (5)$$
The Linearized Bregman Iteration is its forward Euler discretization (Osher-Burger-Goldfarb-Xu-Yin'05, Yin-Osher-Goldfarb-Darbon'08): for $\rho_k \in \partial\|\beta_k\|_1$,
$$\rho_{k+1} + \frac{1}{\kappa}\beta_{k+1} = \rho_k + \frac{1}{\kappa}\beta_k + \frac{\alpha_k}{n}X^T(y - X\beta_k),$$
- Damping factor: $\kappa > 0$
- Step size: $\alpha_k$
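With the substitution $z_k = \rho_k + \beta_k/\kappa$, the constraint $\rho_k \in \partial\|\beta_k\|_1$ gives $\beta_k = \kappa\,\mathrm{Shrink}(z_k, 1)$, and the iteration becomes a single explicit update on $z$. A minimal runnable sketch (function names and the constant step size/parameter values are illustrative choices of ours):

```python
import numpy as np

def shrink(z, lam):
    """Soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def linearized_bregman(X, y, kappa=10.0, alpha=0.05, iters=500):
    """Forward-Euler Linearized Bregman: z_{k+1} = z_k + (alpha/n) X^T (y - X beta_k),
    with z = rho + beta/kappa and beta_k = kappa * shrink(z_k, 1)."""
    n, p = X.shape
    z = np.zeros(p)
    for _ in range(iters):
        beta = kappa * shrink(z, 1.0)
        z = z + (alpha / n) * (X.T @ (y - X @ beta))
    return kappa * shrink(z, 1.0)
```

For stability the step size should satisfy roughly $\alpha\kappa\,\|X^TX\|/n < 2$; in the noiseless, well-conditioned case the iterates approach the limit of the damped dynamics.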
Linearized Bregman Iteration: Comparisons

Linearized Bregman Iteration:
$$z_{t+1} = z_t - \alpha_t X^T\bigl(X\,\kappa\,\mathrm{Shrink}(z_t, 1) - y\bigr)$$
- This is not ISTA: $z_{t+1} = \mathrm{Shrink}\bigl(z_t - \alpha_t X^T(Xz_t - y), \lambda\bigr)$; ISTA solves LASSO for a fixed $\lambda$
- This is not OMP, which only adds variables
- This is not Donoho-Maleki-Montanari's AMP
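The contrast is easiest to see in code: both updates use the same soft-threshold, but Linearized Bregman shrinks inside the gradient step at a fixed level 1, while ISTA shrinks after the gradient step at the penalty level $\lambda$. A side-by-side sketch (our own helper names):

```python
import numpy as np

def shrink(z, lam):
    """Soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lb_step(z, X, y, alpha, kappa):
    """One Linearized Bregman update: gradient step on z, shrink inside."""
    return z - alpha * X.T @ (X @ (kappa * shrink(z, 1.0)) - y)

def ista_step(z, X, y, alpha, lam):
    """One ISTA update for LASSO at fixed lambda: gradient step, then shrink."""
    return shrink(z - alpha * X.T @ (X @ z - y), lam)
```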
AUC of ISS often beats LASSO

[Figure] Setup: $n = 200$, $p = 100$, $S = \{1, \ldots, 30\}$, $x_i \sim \mathcal{N}(0, \Sigma_p)$ ($\sigma_{ij} = 1/(3p)$ for $i \ne j$ and $1$ otherwise).
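The stated design can be generated as follows (a sketch; the function name and seed are ours, while $n$, $p$, and $\Sigma_p$ follow the slide):

```python
import numpy as np

def correlated_design(n=200, p=100, seed=0):
    """Rows x_i ~ N(0, Sigma_p) with sigma_ij = 1/(3p) off-diagonal, 1 on-diagonal."""
    rng = np.random.default_rng(seed)
    Sigma = np.full((p, p), 1.0 / (3 * p))
    np.fill_diagonal(Sigma, 1.0)
    # Sigma = (1 - r) I + r 11^T with r = 1/(3p) > 0 is positive definite
    L = np.linalg.cholesky(Sigma)
    return rng.standard_normal((n, p)) @ L.T
```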
But regularization paths are different.

[Figure]
Path Consistency Theory

We present a consistency theory addressing:
- under what conditions one can achieve
  - sign consistency (model selection consistency)
  - $\ell_2$-consistency ($\|\beta(t) - \tilde\beta^*\|_2 \le O(\sqrt{s\log p/n})$)
- when sign-consistency holds, the Bregman ISS path returns the oracle estimator, without bias
- early stopping regularizes against overfitting the noise
Assumptions

(A1) Restricted strong convexity: $\exists\, \gamma \in (0, 1]$,
$$\frac{1}{n}X_S^TX_S \ge \gamma I$$
(A2) Incoherence/irrepresentable condition: $\exists\, \eta \in (0, 1)$,
$$\left\|\frac{1}{n}X_T^TX_S\left(\frac{1}{n}X_S^TX_S\right)^{-1}\right\|_\infty = \left\|X_T^T(X_S^\dagger)^T\right\|_\infty \le 1 - \eta$$
- The incoherence condition is used independently in Tropp'04, Yuan-Lin'05, Zhao-Yu'06, Zou'06, Wainwright'09, etc.
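Both assumptions can be checked numerically for a given design and support. A sketch (our own helper, not from the talk): (A1) asks for $\gamma > 0$, and (A2) asks for the induced $\infty$-norm below to be strictly less than $1$:

```python
import numpy as np

def check_conditions(X, S):
    """Return (gamma, irr): the smallest eigenvalue of (1/n) X_S^T X_S and the
    infinity-norm of (1/n) X_T^T X_S ((1/n) X_S^T X_S)^{-1}."""
    n, p = X.shape
    S = np.asarray(S)
    T = np.setdiff1d(np.arange(p), S)
    Sigma_SS = X[:, S].T @ X[:, S] / n
    gamma = float(np.linalg.eigvalsh(Sigma_SS).min())
    M = (X[:, T].T @ X[:, S] / n) @ np.linalg.inv(Sigma_SS)
    irr = float(np.abs(M).sum(axis=1).max())  # induced infinity (max-row-sum) norm
    return gamma, irr
```

For an orthogonal design, the cross-term vanishes, so (A2) holds with any $\eta < 1$.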