  1. Greedy Sparsity-Constrained Optimization
     Sohail Bahmani, with Petros Boufounos and Bhiksha Raj
     45th Asilomar Conference, Nov. 2011

  2. Outline
     - Background: Compressed Sensing
     - Problem Formulation: Generalizing Compressed Sensing; Example
     - Prior Work
     - GraSP Algorithm
     - Main Result: Required Conditions; Example: ℓ2-regularized Logistic Regression

  3. Compressed Sensing (1)
     - Linear inverse problem: y = Ax⋆ + e
       - x⋆ ∈ ℝ^p: sparse signal
       - A ∈ ℝ^{n×p}: measurement matrix
       - y ∈ ℝ^n: measurement
       - e ∈ ℝ^n: noise
     - Given y and A with n ≪ p, estimate x⋆.
     - Applications: biomedical imaging, image denoising, image segmentation, filter design, system identification, etc.
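
To make the measurement model concrete, here is a minimal synthetic instance in Python; the dimensions, sparsity level, and noise scale are illustrative choices of mine, not values from the talk:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, s = 50, 200, 5                  # n << p measurements, s-sparse signal

    x_star = np.zeros(p)                  # sparse signal x* in R^p
    support = rng.choice(p, size=s, replace=False)
    x_star[support] = rng.standard_normal(s)

    A = rng.standard_normal((n, p)) / np.sqrt(n)   # measurement matrix
    e = 0.01 * rng.standard_normal(n)              # noise
    y = A @ x_star + e                             # measurements y = A x* + e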

  4. Compressed Sensing (2)
     - Notation: ‖x‖₀ = |supp(x)| = |{i : xᵢ ≠ 0}| and ‖x‖₁ = Σᵢ |xᵢ|
     - ℓ0-minimization (L0): argminₓ ‖x‖₀ subject to ‖Ax − y‖₂ ≤ ε
     - ℓ1-minimization (L1): argminₓ ‖x‖₁ subject to ‖Ax − y‖₂ ≤ ε
       - Convexification of (L0): use the ℓ1-norm as a proxy for the ℓ0-pseudonorm.
     - ℓ0-constrained LS (C0): argminₓ ‖Ax − y‖₂² subject to ‖x‖₀ ≤ s
       - Attacked with (greedy) approximate solvers.
     - ℓ1-constrained LS (C1): argminₓ ‖Ax − y‖₂² subject to ‖x‖₁ ≤ R
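
One standard way to attack the ℓ1 formulations is proximal gradient descent (iterative soft thresholding, ISTA); below is a minimal sketch for the Lagrangian relative of (L1)/(C1). The 1/L step size is the usual choice, while the regularization weight and iteration budget are illustrative assumptions:

    import numpy as np

    def soft_threshold(v, t):
        """Proximal map of t*||.||_1: shrink every entry toward zero by t."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista(A, y, lam=0.05, n_iter=500):
        """Minimize 0.5*||Ax - y||_2^2 + lam*||x||_1 by proximal gradient."""
        L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(n_iter):
            grad = A.T @ (A @ x - y)       # gradient of the smooth part
            x = soft_threshold(x - grad / L, lam / L)
        return x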

  5. Generalizing Compressed Sensing
     - Common assumptions in CS, and how we relax them:
       - The relation between the input and the response has a linear form, y = Ax + e; instead, consider nonlinear relations.
       - The error is usually measured in squared error, f(x) = ‖Ax − y‖₂²; instead, consider other measures of fidelity.
     - General formulation: let f: ℝ^p → ℝ be a cost function. Approximate the solution to
         x̂ = argminₓ f(x) subject to ‖x‖₀ ≤ s.
     - For f(x) = ‖Ax − y‖₂² we recover the ℓ0-constrained least-squares formulation (C0) of CS.
     - We will see the ℓ2-regularized logistic loss as another example of f(x).
     - More generally, f(x) can be the empirical loss associated with some observations in statistical estimation problems.
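
Enforcing ‖x‖₀ ≤ s directly corresponds to projecting onto the set of s-sparse vectors (hard thresholding); here is a minimal sketch, with a function name of my choosing, that the pruning step of GraSP below also relies on:

    import numpy as np

    def hard_threshold(v, s):
        """Project v onto s-sparse vectors: keep the s largest-magnitude entries."""
        out = np.zeros_like(v)
        keep = np.argsort(np.abs(v))[-s:]   # indices of the s largest |v_i|
        out[keep] = v[keep]
        return out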

  6. Example
     - Gene selection problem:
       - Data points a ∈ ℝ^p: gene-expression coefficients obtained from tissue samples
       - Labels y ∈ {0, 1}: distinguish healthy (y = 0) from cancerous (y = 1) samples
       - Observations: n iid copies of (a, y), namely {(aᵢ, yᵢ)}ᵢ₌₁ⁿ
       - Restriction: fewer samples than dimensions, i.e., n < p
       - Goal: find s ≪ p entries (i.e., variables) of the data points a from which the label y can be predicted with the least “error”
     - Nonlinearity → MLE:
       - y | a has a likelihood function that depends on an s-sparse parameter vector x.
       - Empirical loss: f(x) = −(1/n) Σᵢ₌₁ⁿ log ℓ(x; aᵢ, yᵢ)
       - Minimizing the loss (equivalent to maximizing the joint likelihood) estimates the true parameter x⋆.
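
As a sketch, this empirical loss is just an averaged negative log-likelihood with the likelihood left as a plug-in; the function below is a hypothetical illustration of that structure, not code from the talk:

    import numpy as np

    def empirical_loss(x, samples, log_likelihood):
        """f(x) = -(1/n) * sum_i log l(x; a_i, y_i), with l supplied by the model."""
        return -np.mean([log_likelihood(x, a_i, y_i) for a_i, y_i in samples])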

  7. Prior Work
     - In the statistical estimation framework: convex f + ℓ1-regularization
       - Kakade et al. [AISTATS ’09]: loss functions from the exponential family
       - Negahban et al. [NIPS ’09]: M-estimators and “decomposable” norms
       - Agarwal et al. [NIPS ’10]: projected gradient descent with an ℓ1-constraint
     - Issue: optimal sparsity of the solution cannot be guaranteed, because
       - nonlinearity causes solution-dependent error bounds that can become very large, and
       - ℓ1-regularization is merely a proxy to induce sparsity.
     - We consider a greedy algorithm for the problem:
       - the algorithm enforces sparsity directly, and
       - it generally has lower computational complexity.

  8. Algorithm: Gradient Support Pursuit (GraSP)
     - Inspired by the CoSaMP algorithm [Needell & Tropp ’09].
     - Input: f(·) and s. Output: x̂.
       0. Initialize: x̂ = 0.
       Repeat:
         1. Compute gradient: z = ∇f(x̂).
         2. Identify coordinates: Ω = supp(z_{2s}), the 2s largest-magnitude entries of the gradient.
         3. Merge supports: 𝒯 = supp(x̂) ∪ Ω.
         4. Find a crude estimate: b = argminₓ f(x) subject to x|_{𝒯ᶜ} = 0 (tractable because f obeys certain conditions).
         5. Prune: x̂ = b_s, the best s-term approximation of b.
       Until the halting condition holds.
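
A compact Python sketch of GraSP along these lines. The restricted minimization in step 4 is done here by plain gradient descent over the merged support; the step size, iteration budgets, and fixed-iteration halting rule are implementation choices of mine, not part of the algorithm's specification:

    import numpy as np

    def grasp(grad_f, p, s, n_outer=50, inner_steps=100, lr=0.1):
        """Gradient Support Pursuit: approximately minimize f(x) s.t. ||x||_0 <= s."""
        x = np.zeros(p)                               # 0. initialize x^ = 0
        for _ in range(n_outer):
            z = grad_f(x)                             # 1. compute gradient
            omega = np.argsort(np.abs(z))[-2 * s:]    # 2. 2s largest gradient coords
            T = np.union1d(omega, np.flatnonzero(x))  # 3. merge supports
            b = np.zeros(p)                           # 4. crude estimate on support T
            for _ in range(inner_steps):              #    via restricted gradient descent
                b[T] -= lr * grad_f(b)[T]             #    move only inside T
            x = np.zeros(p)                           # 5. prune to the s largest entries
            keep = np.argsort(np.abs(b))[-s:]
            x[keep] = b[keep]
        return x

For the least-squares cost this behaves like a CoSaMP-style recovery, e.g. with grad_f = lambda x: A.T @ (A @ x - y).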

  9. Main Result
     Theorem. If f satisfies certain properties, then the estimate x⁽ⁱ⁾ obtained at the i-th iteration of GraSP obeys
         ‖x⁽ⁱ⁾ − x⋆‖₂ ≤ κⁱ ‖x⋆‖₂ + C ‖∇f(x⋆)|_ℐ‖₂,
     where ℐ contains the indices of the 3s largest-magnitude coordinates of ∇f(x⋆).
     - For κ < 1 (i.e., a contraction factor), we get a linear rate of convergence up to an approximation error.
     - In statistical estimation problems, ‖∇f(x⋆)|_ℐ‖₂ can be related to the statistical precision of the estimator.
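
To see why κ < 1 yields a linear rate up to a floor, unroll the per-iteration contraction; this is a standard argument sketched under the assumption that the theorem's bound comes from a one-step recursion with constant C′ and initialization x⁽⁰⁾ = 0:

    \|x^{(i)} - x^\star\|_2
        \le \kappa \|x^{(i-1)} - x^\star\|_2 + C' \|\nabla f(x^\star)|_{\mathcal I}\|_2
        \le \kappa^i \|x^{(0)} - x^\star\|_2
            + C' \sum_{j=0}^{i-1} \kappa^j \|\nabla f(x^\star)|_{\mathcal I}\|_2
        \le \kappa^i \|x^\star\|_2 + \frac{C'}{1-\kappa} \|\nabla f(x^\star)|_{\mathcal I}\|_2

so the constant C in the slide's statement absorbs the geometric series C′/(1 − κ).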

  10. Required Conditions
     Definition (Stable Hessian Property). For f: ℝ^p → ℝ with Hessian H_f(·), let
         A_k(x) := sup { Δᵀ H_f(x) Δ : |supp(x) ∪ supp(Δ)| ≤ k, ‖Δ‖₂ = 1 },
         B_k(x) := inf { Δᵀ H_f(x) Δ : |supp(x) ∪ supp(Δ)| ≤ k, ‖Δ‖₂ = 1 }.
     Then we say f satisfies the SHP of order k with constant μ_k if A_k(x)/B_k(x) ≤ μ_k for all k-sparse vectors x.
     - SHP basically says that symmetric restrictions of the Hessian are well-conditioned.
     - For f(x) = ½‖Ax − y‖₂² as in CS, SHP implies the Restricted Isometry Property:
         (1 + δ_k)/(1 − δ_k) ≤ μ_k ⇒ δ_k ≤ (μ_k − 1)/(μ_k + 1).
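
For the quadratic CS cost the Hessian is the constant matrix AᵀA, so A_k and B_k reduce to extreme eigenvalues of k×k principal submatrices. Below is a small brute-force check of the restricted condition number; it is my own illustration, exhaustive over supports and therefore only sensible for tiny p and k:

    import numpy as np
    from itertools import combinations

    def shp_constant_quadratic(A, k):
        """Worst-case mu_k for f(x) = 0.5*||Ax - y||_2^2, whose Hessian is A^T A."""
        H = A.T @ A
        worst = 0.0
        for S in combinations(range(A.shape[1]), k):
            eig = np.linalg.eigvalsh(H[np.ix_(S, S)])  # restriction to support S
            worst = max(worst, eig[-1] / eig[0])       # restricted condition number
        return worst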

  11. Example: ℓ2-regularized Logistic Regression
     - Logistic model: y | a; x ~ Bernoulli(1/(1 + e^{−⟨a,x⟩}))
     - For iid observation pairs {(aᵢ, yᵢ)}ᵢ₌₁ⁿ, write the logistic loss as
         L(x) := (1/n) Σᵢ₌₁ⁿ [ log(1 + e^{⟨aᵢ,x⟩}) − yᵢ ⟨aᵢ, x⟩ ].
     - ℓ2-regularized logistic regression with a sparsity constraint:
         argminₓ f(x) = L(x) + (η/2) ‖x‖₂² subject to ‖x‖₀ ≤ s.
     - We can show μ_k ≤ 1 + α_k/(4η), where α_k = max_𝒥 λ_max(A_𝒥) subject to |𝒥| ≤ k, with A_𝒥 the restriction of the data Gram matrix to the coordinates in 𝒥.
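
Here is a sketch of this regularized logistic objective and its gradient, in a form that plugs into the grasp sketch above; eta's default value and the factory-style interface are my choices:

    import numpy as np

    def make_logistic_objective(A, y, eta=0.1):
        """l2-regularized logistic loss L(x) + (eta/2)||x||_2^2 and its gradient."""
        n = A.shape[0]

        def loss(x):
            z = A @ x
            return np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * eta * (x @ x)

        def grad(x):
            z = A @ x
            prob = 1.0 / (1.0 + np.exp(-z))         # Bernoulli success probabilities
            return A.T @ (prob - y) / n + eta * x   # gradient of loss plus ridge term

        return loss, grad

For example: loss, grad = make_logistic_objective(A, y) followed by x_hat = grasp(grad, p=A.shape[1], s=5).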

  12. Main Result Revisited
     Recall the generic bound ‖x⁽ⁱ⁾ − x⋆‖₂ ≤ κⁱ ‖x⋆‖₂ + C ‖∇f(x⋆)|_ℐ‖₂; under the SHP the constants become explicit.
     Theorem. If f satisfies the SHP of order 4s with constant μ_{4s} < 2 and B_{4s}(x) > ε, then the estimate x⁽ⁱ⁾ obtained at the i-th iteration of GraSP obeys
         ‖x⁽ⁱ⁾ − x⋆‖₂ ≤ (μ_{4s} − 1)ⁱ ‖x⋆‖₂ + [2(μ_{4s} + 2) / (ε(2 − μ_{4s}))] ‖∇f(x⋆)|_ℐ‖₂,
     where ℐ contains the indices of the 3s largest-magnitude coordinates of ∇f(x⋆). Note that μ_{4s} < 2 makes the contraction factor κ = μ_{4s} − 1 smaller than 1.

  13. Summary
     - Extending CS results to nonlinear models and different error measures:
       - ℓ1-regularization may not yield sufficiently sparse solutions because of the type of cost functions introduced by nonlinearities in the model.
     - GraSP algorithm:
       - A greedy method that always produces a sparse solution.
       - Accuracy is guaranteed for the class of functions that satisfy the SHP.
       - Linear rate of convergence up to the approximation error.
     - Some interesting problems to study:
       - Deterministic results, e.g., using an equivalent of incoherence.
       - Relaxing the SHP to an entirely local condition.
