Greedy Sparsity-Constrained Optimization
Sohail Bahmani, with Petros Boufounos and Bhiksha Raj
45th Asilomar Conference, Nov. 2011

Outline
- Background: Compressed Sensing
- Problem Formulation: Generalizing Compressed Sensing
- Example
- Prior Work
- GraSP Algorithm
- Main Result
- Required Conditions
- Example: ℓ2-regularized Logistic Regression

Compressed Sensing (1)
Linear inverse problem:
  $\mathbf{y} = \mathbf{A}\mathbf{x}^\star + \mathbf{e}$
- $\mathbf{x}^\star \in \mathbb{R}^p$: sparse signal
- $\mathbf{A} \in \mathbb{R}^{n \times p}$: measurement matrix
- $\mathbf{y} \in \mathbb{R}^n$: measurements
- $\mathbf{e} \in \mathbb{R}^n$: noise
Given $\mathbf{y}$ and $\mathbf{A}$ with $n \ll p$, estimate $\mathbf{x}^\star$.
Applications: biomedical imaging, image denoising, image segmentation, filter design, system identification, etc.
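
A minimal numpy sketch of this measurement model; the problem sizes, the Gaussian measurement matrix, and the noise level are illustrative choices, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 50, 200, 5                          # n measurements, ambient dimension p, sparsity s
x_true = np.zeros(p)                          # s-sparse signal x*
x_true[rng.choice(p, s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((n, p)) / np.sqrt(n)  # measurement matrix A (n << p)
e = 0.01 * rng.standard_normal(n)             # additive noise e
y = A @ x_true + e                            # measurements y = A x* + e
```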

Compressed Sensing (2)
  $\|\mathbf{x}\|_0 = |\mathrm{supp}(\mathbf{x})| = |\{i : x_i \ne 0\}|, \qquad \|\mathbf{x}\|_1 = \sum_{i=1}^{p} |x_i|$
- ℓ0-minimization (L0): $\arg\min_{\mathbf{x}} \|\mathbf{x}\|_0$ subject to $\|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2 \le \epsilon$
- ℓ1-minimization (L1): $\arg\min_{\mathbf{x}} \|\mathbf{x}\|_1$ subject to $\|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2 \le \epsilon$
- ℓ0-constrained LS (C0): $\arg\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2^2$ subject to $\|\mathbf{x}\|_0 \le s$
- ℓ1-constrained LS (C1): $\arg\min_{\mathbf{x}} \|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2^2$ subject to $\|\mathbf{x}\|_1 \le R$
Convexify (L0)→(L1) and (C0)→(C1): use the ℓ1-norm as a proxy for the ℓ0-pseudonorm.
Alternatively, attack (L0)/(C0) directly with (greedy) approximate solvers.
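
As one example of a greedy approximate solver for the (C0)-type problem, here is a minimal iterative hard thresholding (IHT) sketch; IHT is named here only for illustration (it is not on the slides), and the step size and iteration count are arbitrary choices.

```python
import numpy as np

def iht(A, y, s, step=1.0, n_iter=200):
    """Approximate argmin ||Ax - y||_2^2 subject to ||x||_0 <= s (problem (C0))."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = x + step * (A.T @ (y - A @ x))   # gradient step on the least-squares cost
        keep = np.argsort(np.abs(x))[-s:]    # indices of the s largest magnitudes
        pruned = np.zeros_like(x)
        pruned[keep] = x[keep]               # hard-threshold to an s-sparse vector
        x = pruned
    return x
```

In practice the step size is chosen no larger than $1/\|\mathbf{A}\|_2^2$ so that the gradient step does not diverge.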

Generalizing Compressed Sensing
Common assumptions in CS:
- The relation between the input and the response has a linear form, $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{e}$; more generally, consider nonlinear relations.
- The error is usually measured by the squared error $f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2^2$; more generally, consider other measures of fidelity.
General formulation: let $f : \mathbb{R}^p \to \mathbb{R}$ be a cost function. Approximate the solution to
  $\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} f(\mathbf{x})$ subject to $\|\mathbf{x}\|_0 \le s$.
For $f(\mathbf{x}) = \|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2^2$ we get the ℓ0-constrained least-squares formulation of CS.
We will see the ℓ2-regularized logistic loss as another example of $f(\mathbf{x})$.
More generally, $f(\mathbf{x})$ can be the empirical loss associated with some observations in statistical estimation problems.
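
To make the general formulation concrete, a tiny sketch with two instances of the cost $f$: the least-squares cost from CS and an empirical logistic loss. Both are written here only for illustration; either can be handed to a sparsity-constrained solver that needs nothing but $f$ and its gradient.

```python
import numpy as np

def least_squares_cost(A, y):
    # f(x) = ||Ax - y||_2^2, the l0-constrained LS objective from CS
    return lambda x: np.sum((A @ x - y) ** 2)

def logistic_cost(A, y):
    # empirical logistic loss: mean of log(1 + exp(<a_i, x>)) - y_i * <a_i, x>
    return lambda x: np.mean(np.logaddexp(0.0, A @ x) - y * (A @ x))
```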

Example: Gene Selection Problem
- Data points $\mathbf{a} \in \mathbb{R}^p$: gene-expression coefficients obtained from tissue samples.
- Labels $y \in \{0,1\}$: determine healthy ($y = 0$) vs. cancer ($y = 1$) samples.
- Observations: $n$ iid copies of $(\mathbf{a}, y)$, namely the instances $(\mathbf{a}_i, y_i)$, $i = 1,\dots,n$.
- Restriction: fewer samples than dimensions, i.e., $n < p$.
- Goal: find $s \ll p$ entries (i.e., variables) of the data points $\mathbf{a}$ from which the label $y$ can be predicted with the least "error".
- Nonlinearity: in the MLE, $y \mid \mathbf{a}$ has a likelihood function that depends on an $s$-sparse parameter vector $\mathbf{x}$.
- Empirical loss: $f(\mathbf{x}) = -\frac{1}{n}\sum_{i=1}^{n} \log \ell(\mathbf{x}; \mathbf{a}_i, y_i)$.
- Minimize the loss (equivalent to maximizing the joint likelihood) to estimate the true parameter $\mathbf{x}^\star$.
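
For the Bernoulli (logistic) model used in this example, the per-sample term $-\log \ell(\mathbf{x}; \mathbf{a}_i, y_i)$ can be written out explicitly; this short derivation is not on the slides but fills in the step between the empirical loss above and the logistic loss that appears a few slides later.

```latex
% Per-sample negative log-likelihood under y_i | a_i ; x ~ Bernoulli(sigma(<a_i, x>))
\begin{aligned}
-\log \ell(\mathbf{x};\mathbf{a}_i,y_i)
  &= -\,y_i \log \sigma(\langle \mathbf{a}_i,\mathbf{x}\rangle)
     - (1-y_i)\log\bigl(1-\sigma(\langle \mathbf{a}_i,\mathbf{x}\rangle)\bigr),
     \qquad \sigma(t) = \frac{1}{1+e^{-t}}, \\
  &= \log\bigl(1 + e^{\langle \mathbf{a}_i,\mathbf{x}\rangle}\bigr)
     - y_i \langle \mathbf{a}_i,\mathbf{x}\rangle .
\end{aligned}
```

Averaging this over the $n$ samples gives exactly the logistic loss $\mathcal{L}(\mathbf{x})$ used later.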

Prior Work
In the statistical estimation framework: convex $f$ + ℓ1-regularization
- Kakade et al. [AISTATS'09]: loss functions from the exponential family
- Negahban et al. [NIPS'09]: M-estimators and "decomposable" norms
- Agarwal et al. [NIPS'10]: projected gradient descent with an ℓ1-constraint
Issue: the sparsity of the solution cannot be guaranteed to be optimal, because
- nonlinearity causes solution-dependent error bounds that can become very large, and
- ℓ1-regularization is merely a proxy used to induce sparsity.
We consider a greedy algorithm for the problem:
- the algorithm enforces sparsity directly, and
- it generally has lower computational complexity.

Algorithm: Gradient Support Pursuit (GraSP)
Inspired by the CoSaMP algorithm [Needell & Tropp '09].
Input: $f(\cdot)$ and $s$.  Output: $\hat{\mathbf{x}}$.
0. Initialize: $\hat{\mathbf{x}} = \mathbf{0}$.
Repeat:
  1. Compute gradient: $\mathbf{z} = \nabla f(\hat{\mathbf{x}})$
  2. Identify coordinates: $\Omega = \mathrm{supp}(\mathbf{z}_{2s})$, the $2s$ largest-magnitude entries of the gradient
  3. Merge supports: $\mathcal{T} = \mathrm{supp}(\hat{\mathbf{x}}) \cup \Omega$
  4. Find crude estimate: $\mathbf{b} = \arg\min_{\mathbf{x}} f(\mathbf{x})$ s.t. $\mathbf{x}|_{\mathcal{T}^c} = \mathbf{0}$ (tractable because $f$ obeys certain conditions)
  5. Prune (update): $\hat{\mathbf{x}} = \mathbf{b}_s$, keeping the $s$ largest-magnitude entries
Until the halting condition holds.
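
A minimal Python sketch of these five steps. Assumptions: a gradient oracle grad for $f$, a fixed iteration count in place of the halting condition, and plain gradient descent restricted to the merged support as the inner solver for step 4; this is an illustrative sketch, not the authors' reference implementation.

```python
import numpy as np

def grasp(grad, p, s, n_iter=50, inner_iter=100, inner_step=0.1):
    """Gradient Support Pursuit sketch: estimate an s-sparse minimizer of f over R^p."""
    x = np.zeros(p)
    for _ in range(n_iter):                          # "Repeat" (fixed count instead of a halting test)
        z = grad(x)                                  # 1. compute gradient
        omega = np.argsort(np.abs(z))[-2 * s:]       # 2. identify the 2s largest gradient coordinates
        T = np.union1d(omega, np.flatnonzero(x))     # 3. merge with the current support
        b = np.zeros(p)
        b[T] = x[T]
        for _ in range(inner_iter):                  # 4. crude estimate: minimize f restricted to T
            b[T] -= inner_step * grad(b)[T]
        keep = T[np.argsort(np.abs(b[T]))[-s:]]      # 5. prune: keep the s largest entries of b
        x = np.zeros(p)
        x[keep] = b[keep]
    return x
```

For a least-squares cost, step 4 could instead be solved exactly as a small least-squares problem on the columns indexed by $\mathcal{T}$, which is how CoSaMP handles the corresponding step.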

Main Result
Theorem. If $f$ satisfies certain properties, then the estimate obtained at the $i$-th iteration of GraSP obeys
  $\|\hat{\mathbf{x}}^{(i)} - \mathbf{x}^\star\|_2 \le \kappa^i \|\mathbf{x}^\star\|_2 + C \,\|\nabla f(\mathbf{x}^\star)|_{\mathcal{I}}\|_2$,
where $\mathcal{I}$ contains the indices of the $3s$ largest coordinates of $\nabla f(\mathbf{x}^\star)$ in magnitude.
For $\kappa < 1$ (i.e., a contraction factor) we get a linear rate of convergence, up to an approximation error.
In statistical estimation problems, $\|\nabla f(\mathbf{x}^\star)|_{\mathcal{I}}\|_2$ can be related to the statistical precision of the estimator.

Required Conditions
Definition (Stable Hessian Property). For $f : \mathbb{R}^p \to \mathbb{R}$ with Hessian $\mathbf{H}_f(\cdot)$, let
  $A_k(\mathbf{x}) := \sup \{\, \boldsymbol{\Delta}^{\mathsf T} \mathbf{H}_f(\mathbf{x}) \boldsymbol{\Delta} \;:\; |\mathrm{supp}(\mathbf{x}) \cup \mathrm{supp}(\boldsymbol{\Delta})| \le k,\ \|\boldsymbol{\Delta}\|_2 = 1 \,\}$
  $B_k(\mathbf{x}) := \inf \{\, \boldsymbol{\Delta}^{\mathsf T} \mathbf{H}_f(\mathbf{x}) \boldsymbol{\Delta} \;:\; |\mathrm{supp}(\mathbf{x}) \cup \mathrm{supp}(\boldsymbol{\Delta})| \le k,\ \|\boldsymbol{\Delta}\|_2 = 1 \,\}$
Then we say $f$ satisfies the SHP of order $k$ with constant $\mu_k$ if $A_k(\mathbf{x}) / B_k(\mathbf{x}) \le \mu_k$ for all $k$-sparse vectors $\mathbf{x}$.
SHP basically says that symmetric restrictions of the Hessian are well-conditioned.
For $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{A}\mathbf{x} - \mathbf{y}\|_2^2$, as in CS, SHP implies the Restricted Isometry Property:
  $\frac{1 + \delta_k}{1 - \delta_k} \le \mu_k \;\Rightarrow\; \delta_k \le \frac{\mu_k - 1}{\mu_k + 1}$.
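
A small numerical illustration of $A_k(\mathbf{x})$ and $B_k(\mathbf{x})$ for the least-squares cost, where the Hessian is constant and the SHP quantities reduce to extreme eigenvalues of $k$-column submatrices of $\mathbf{A}$. The matrix sizes, the Gaussian $\mathbf{A}$, and the number of random supports probed are assumptions made for illustration; sampling random supports only gives a lower estimate of the true constant $\mu_k$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 100, 400, 10
A = rng.standard_normal((n, p)) / np.sqrt(n)       # normalized Gaussian sensing matrix

mu_k_est = 1.0
for _ in range(200):                               # probe random supports of size k
    J = rng.choice(p, size=k, replace=False)
    H_J = A[:, J].T @ A[:, J]                      # restriction of the Hessian A^T A to support J
    w = np.linalg.eigvalsh(H_J)                    # ascending eigenvalues
    mu_k_est = max(mu_k_est, w[-1] / w[0])         # condition number of the restricted Hessian

# the slide's RIP bound (mu - 1)/(mu + 1), evaluated at the sampled estimate of mu_k
delta_at_est = (mu_k_est - 1) / (mu_k_est + 1)
print(f"sampled lower estimate of mu_k: {mu_k_est:.2f}, (mu-1)/(mu+1) = {delta_at_est:.2f}")
```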

Example: ℓ2-regularized Logistic Regression
Logistic model: $y \mid \mathbf{a}; \mathbf{x} \sim \mathrm{Bernoulli}\!\left(\frac{1}{1 + e^{-\langle \mathbf{a}, \mathbf{x} \rangle}}\right)$.
For iid observation pairs $(\mathbf{a}_i, y_i)$, $i = 1,\dots,n$, write the logistic loss as
  $\mathcal{L}(\mathbf{x}) := \frac{1}{n} \sum_{i=1}^{n} \left[ \log\!\left(1 + e^{\langle \mathbf{a}_i, \mathbf{x} \rangle}\right) - y_i \langle \mathbf{a}_i, \mathbf{x} \rangle \right]$.
ℓ2-regularized logistic regression with sparsity constraint:
  $\arg\min_{\mathbf{x}} \; f(\mathbf{x}) = \mathcal{L}(\mathbf{x}) + \frac{\eta}{2}\|\mathbf{x}\|_2^2$ subject to $\|\mathbf{x}\|_0 \le s$.
We can show $\mu_k \le 1 + \frac{\alpha_k}{4\eta}$, where $\alpha_k = \max_{|\mathcal{J}| \le k} \lambda_{\max}\!\left(\frac{1}{n}\mathbf{A}_{\mathcal{J}}^{\mathsf T}\mathbf{A}_{\mathcal{J}}\right)$ and $\mathbf{A}_{\mathcal{J}}$ is the submatrix of the data matrix (rows $\mathbf{a}_i^{\mathsf T}$) with columns indexed by $\mathcal{J}$.
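
A hedged sketch of the gradient of this regularized objective, wired into the grasp() sketch given after the algorithm slide; the data, the regularization weight eta, and the sparsity level below are toy choices made up for the example.

```python
import numpy as np

def logistic_grad(A, y, eta):
    """Gradient of L(x) + (eta/2)||x||_2^2 for data rows a_i in A and labels y in {0,1}."""
    n = A.shape[0]
    def grad(x):
        probs = 1.0 / (1.0 + np.exp(-(A @ x)))     # Bernoulli probabilities sigma(<a_i, x>)
        return A.T @ (probs - y) / n + eta * x
    return grad

# toy usage: recover a sparse parameter from fewer observations than dimensions (n < p)
rng = np.random.default_rng(1)
n, p, s = 150, 300, 5
x_star = np.zeros(p)
x_star[rng.choice(p, s, replace=False)] = 3.0 * rng.standard_normal(s)
A = rng.standard_normal((n, p))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(A @ x_star))))
x_hat = grasp(logistic_grad(A, y, eta=0.1), p=p, s=s)
print("recovered support:", np.flatnonzero(x_hat))
```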

Main Result Revisited
Recall: if $f$ satisfies certain properties, then the estimate obtained at the $i$-th iteration of GraSP obeys
  $\|\hat{\mathbf{x}}^{(i)} - \mathbf{x}^\star\|_2 \le \kappa^i \|\mathbf{x}^\star\|_2 + C \,\|\nabla f(\mathbf{x}^\star)|_{\mathcal{I}}\|_2$.
Theorem. If $f$ satisfies the SHP of order $4s$ with constant $\mu_{4s} < \sqrt{2}$ and $B_{4s}(\mathbf{x}) > \epsilon$, then the estimate obtained at the $i$-th iteration of GraSP obeys
  $\|\hat{\mathbf{x}}^{(i)} - \mathbf{x}^\star\|_2 \le \left(\mu_{4s}^2 - 1\right)^i \|\mathbf{x}^\star\|_2 + \frac{2\,(\mu_{4s} + 2)}{\epsilon\,(2 - \mu_{4s}^2)} \,\|\nabla f(\mathbf{x}^\star)|_{\mathcal{I}}\|_2$,
where $\mathcal{I}$ contains the indices of the $3s$ largest coordinates of $\nabla f(\mathbf{x}^\star)$ in magnitude.

Summary
- Extend CS results to nonlinear models and different error measures.
- ℓ1-regularization may not yield sufficiently sparse solutions because of the type of cost functions introduced by nonlinearities in the model.
- GraSP algorithm:
  - a greedy method that always gives a sparse solution;
  - accuracy is guaranteed for the class of functions that satisfy the SHP;
  - linear rate of convergence up to the approximation error.
- Some interesting problems to study:
  - deterministic results, e.g., using an equivalent of incoherence;
  - relaxing the SHP to an entirely local condition.