On the convergence rate of iterative ℓ1 minimization algorithms
Ignace Loris, Applied Inverse Problems, Vienna, 21/7/2009 (PowerPoint presentation)


  1. On the convergence rate of iterative ℓ1 minimization algorithms. Ignace Loris, Applied Inverse Problems, Vienna, 21/7/2009.

  2. Theme: comparison of iterative algorithms for sparse recovery (sparse models, ℓ1 penalties, iterative algorithms); influence of noise and ill-conditioning on effectiveness; influence of noise and ill-conditioning on speed.

  3. Overview: sparsity through an ℓ1 penalty, illustrated on a toy example; when does it work? versus how to make it work?; speed comparison of iterative algorithms; speed comparison of 'continuation' algorithms.

  4. Setting. Data vector: y (noisy). Model vector: x (coefficients in some basis). Linear relationship: Kx = y. Problems: insufficient data, inconsistent data, ill-conditioning of K. Solution: minimize a penalized least-squares functional: $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + \mathrm{penalty}(x)$.

  5. Old and new penalties. Traditional: ℓ2-norm penalties, $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + \lambda \|x\|^2$ (advantage: linear equations). Sometimes better reconstructions are obtained by using other prior information. Suppose the desired model x is sparse (few nonzero components); in that case an ℓ1 penalty is suitable (Donoho et al. 2006, etc.): $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$.
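A quick way to make the "linear equations" advantage concrete: the ℓ2-penalized minimizer solves the normal equations $(K^T K + \lambda I)\,x = K^T y$. The numpy sketch below is only illustrative; the matrix sizes, sparsity pattern and λ value are assumptions, not taken from the slides.

```python
import numpy as np

def ridge_solution(K, y, lam):
    """Minimize ||Kx - y||^2 + lam*||x||^2 by solving the normal
    equations (K^T K + lam*I) x = K^T y (a plain linear system)."""
    N = K.shape[1]
    return np.linalg.solve(K.T @ K + lam * np.eye(N), K.T @ y)

# Illustrative toy problem (sizes and values are arbitrary choices):
rng = np.random.default_rng(0)
K = rng.standard_normal((30, 100))
x_input = np.zeros(100)
x_input[[3, 17, 42]] = [0.8, 0.5, 0.3]            # a sparse model
y = K @ x_input + 0.01 * rng.standard_normal(30)  # noisy data
x_l2 = ridge_solution(K, y, lam=0.1)              # generally not sparse
```

The ℓ1-penalized problem has no such closed-form linear solution, which is why the rest of the talk is about direct and iterative algorithms for it.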

  6. Question 1: when does it work? Experiment: input: a sparse $x_{\mathrm{input}}$; data: $y = Kx_{\mathrm{input}} + n$; ℓ1 reconstruction: $x_{\mathrm{rec}}$. Compare $x_{\mathrm{rec}}$ to $x_{\mathrm{input}}$. When does this work?

  7. ℓ1 penalty for sparse recovery: toy example. [Figure: the matrix K and the sparse input $x_{\mathrm{input}}$.] Synthetic noisy data: $y = Kx_{\mathrm{input}} + n$.

  8. ℓ1 penalty for sparse recovery: toy example. [Figure: the synthetic noisy data $y = Kx_{\mathrm{input}} + n$.] Two reconstructions are compared: $\min_x \|Kx - y\|^2 + \lambda \|x\|^2$ and $\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$.

  9. ℓ1 penalty for sparse recovery: toy example. [Figure: the ℓ2-penalized and ℓ1-penalized reconstructions obtained from the noisy data $y = Kx_{\mathrm{input}} + n$.]

  10. ℓ1 penalty for sparse recovery: toy example. [Figure: summary panel showing the sparse input $x_{\mathrm{input}}$, the data $y = Kx_{\mathrm{input}} + n$, and the two reconstructions $\min_x \|Kx - y\|^2 + \lambda \|x\|^2$ and $\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$ side by side.]

  11. Success rate of noiseless sparse recovery with ℓ1. No noise, K = random matrix (elements drawn from a Gaussian distribution). Prepare data: $y = Kx_{\mathrm{input}}$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_{Kx = y} \|x\|_1$.

  12. Success rate of noiseless sparse recovery with ℓ1. No noise, K = random matrix (elements drawn from a Gaussian distribution). Prepare data: $y = Kx_{\mathrm{input}}$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_{Kx = y} \|x\|_1$. [Figure: phase diagram with k/n on the vertical axis and n/N on the horizontal axis; the gray level gives the probability of $x_{\mathrm{rec}} = x_{\mathrm{input}}$.] Here N = number of variables, n = number of data, k = number of nonzero components in the model. White: success rate of perfect recovery = 100% ('compressed sensing'). Bruckstein et al. (2009).
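For reference, the noiseless reconstruction $\arg\min_{Kx=y}\|x\|_1$ (basis pursuit) is a linear program after the split $x = u - v$ with $u, v \geq 0$. The sketch below shows one way to run a single trial of such a success-rate experiment with SciPy; the sizes, sparsity level and tolerance are illustrative assumptions, and this is not the code behind the phase diagram.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(K, y):
    """min ||x||_1 subject to Kx = y, as the LP: minimize sum(u) + sum(v)
    subject to [K, -K][u; v] = y and u, v >= 0, with x = u - v."""
    n, N = K.shape
    res = linprog(c=np.ones(2 * N),
                  A_eq=np.hstack([K, -K]), b_eq=y,
                  bounds=(0, None), method="highs")
    return res.x[:N] - res.x[N:]

# One trial of the recovery experiment (N, n, k chosen arbitrarily):
rng = np.random.default_rng(1)
N, n, k = 100, 40, 5
K = rng.standard_normal((n, N))
x_input = np.zeros(N)
x_input[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
y = K @ x_input
x_rec = basis_pursuit(K, y)
perfect = np.linalg.norm(x_rec - x_input) <= 1e-6 * np.linalg.norm(x_input)
```

Repeating such trials over many random draws for each (n/N, k/n) pair gives the success probabilities shown in the phase diagram.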

  13. Reconstruction error for noisy sparse recovery with ℓ1. 2% noise, K = random matrix (elements drawn from a Gaussian distribution). Prepare data: $y = Kx_{\mathrm{input}} + n$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$ with λ chosen such that $\|Kx_{\mathrm{rec}} - y\| = \|n\|$.

  14. Reconstruction error for noisy sparse recovery with ℓ1. 2% noise, K = random matrix (elements drawn from a Gaussian distribution). Prepare data: $y = Kx_{\mathrm{input}} + n$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$ with $\|Kx_{\mathrm{rec}} - y\| = \|n\|$. [Figure: phase diagram with k/n versus n/N; the gray level gives the average error $\|x_{\mathrm{rec}} - x_{\mathrm{input}}\| / \|x_{\mathrm{input}}\|$; the marked contour corresponds to a reconstruction error of 2%.] Here N = number of variables, n = number of data, k = number of nonzero components in $x_{\mathrm{input}}$. Loris and Verhoeven (2009).

  15. Reconstruction error for noisy sparse recovery with ℓ1. 2% noise, K = submatrix of a square matrix with condition number $10^4$. Prepare data: $y = Kx_{\mathrm{input}} + n$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$ with $\|Kx_{\mathrm{rec}} - y\| = \|n\|$.

  16. Reconstruction error for noisy sparse recovery with ℓ1. 2% noise, K = submatrix of a square matrix with condition number $10^4$. Prepare data: $y = Kx_{\mathrm{input}} + n$. Reconstruct: $x_{\mathrm{rec}} = \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$ with $\|Kx_{\mathrm{rec}} - y\| = \|n\|$. [Figure: phase diagram with k/n versus n/N; the gray level gives the average error $\|x_{\mathrm{rec}} - x_{\mathrm{input}}\| / \|x_{\mathrm{input}}\|$; the marked contour corresponds to a reconstruction error of 2%.] Here N = number of variables, n = number of data, k = number of nonzero components in $x_{\mathrm{input}}$. Loris and Verhoeven (2009).
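In these noisy experiments λ is not chosen freely: it is tuned so that the residual matches the noise level, $\|Kx_{\mathrm{rec}} - y\| = \|n\|$ (a discrepancy principle). Below is a hedged sketch of one way to do that by bisection; `solve_l1` is a hypothetical placeholder for any solver of the ℓ1-penalized problem (for instance the IST or FISTA iterations shown later), and the bisection count is an arbitrary choice.

```python
import numpy as np

def discrepancy_lambda(K, y, noise_norm, solve_l1, n_bisect=30):
    """Bisection on lambda so that ||K x_rec(lambda) - y|| ~ noise_norm.
    Relies on the residual norm growing with lambda.
    `solve_l1(K, y, lam)` is a placeholder for any solver of
    min ||Kx - y||^2 + 2*lam*||x||_1."""
    lo, hi = 0.0, np.max(np.abs(K.T @ y))   # above hi, x_rec(lambda) = 0
    for _ in range(n_bisect):
        lam = 0.5 * (lo + hi)
        x = solve_l1(K, y, lam)
        if np.linalg.norm(K @ x - y) > noise_norm:
            hi = lam     # residual too large: lambda must decrease
        else:
            lo = lam     # residual too small: lambda may increase
    lam = 0.5 * (lo + hi)
    return lam, solve_l1(K, y, lam)
```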

  17. Important consequence for the evaluation of speed-up: success of sparse recovery by ℓ1 minimisation depends on noise and condition number. Our goal here: evaluate the convergence rate of iterative ℓ1 minimization algorithms, i.e. does $x^{(n)} \to \bar{x} \equiv \arg\min_x \|Kx - y\|^2 + 2\lambda \|x\|_1$? Hence: compare $x^{(n)}$ with $\bar{x}$, not with $x_{\mathrm{input}}$! Catch-22? (The reference minimizer $\bar{x}$ must itself be computed first.)

  18. Direct method for finding $\bar{x}(\lambda)$. Minimization of $F(x) = \|Kx - y\|^2 + 2\lambda \|x\|_1$. Nonlinear variational equations: $(K^T(y - K\bar{x}))_i = \lambda\,\mathrm{sgn}(\bar{x}_i)$ if $\bar{x}_i \neq 0$, and $|(K^T(y - K\bar{x}))_i| \leq \lambda$ if $\bar{x}_i = 0$. The variational equations are piecewise linear, hence $\bar{x}(\lambda)$ is piecewise linear in λ and a direct method exists (Efron et al., 2004; Osborne et al., 2000).
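These variational (optimality) equations also give a practical verification test for any of the algorithms discussed later: a vector is the minimizer of F exactly when every component satisfies one of the two conditions. A small numpy check (the tolerance is an arbitrary choice):

```python
import numpy as np

def satisfies_variational_eqs(K, y, x, lam, tol=1e-8):
    """Check the optimality conditions of F(x) = ||Kx-y||^2 + 2*lam*||x||_1:
      (K^T (y - Kx))_i = lam * sgn(x_i)   for components with x_i != 0,
      |(K^T (y - Kx))_i| <= lam           for components with x_i == 0."""
    g = K.T @ (y - K @ x)
    nz = x != 0
    on_support = np.all(np.abs(g[nz] - lam * np.sign(x[nz])) <= tol)
    off_support = np.all(np.abs(g[~nz]) <= lam + tol)
    return on_support and off_support
```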

  19. Direct method for finding $\bar{x}(\lambda)$. If $\lambda \geq \lambda_{\max} \equiv \max_i |(K^T y)_i|$, then $\bar{x}(\lambda) = 0$. $\bar{x}(\lambda > 0)$ is found by letting λ decrease from $\lambda_{\max}$; a 'small' linear system has to be solved at every step. Continue until the desired solution $\bar{x}(\lambda_n)$ is reached (e.g. when $\|K\bar{x} - y\|^2 = \|n\|^2$). Result: $\bar{x}(\lambda)$ for $\lambda_{\max} \geq \lambda \geq \lambda_{\min}$. Time complexity: at first linear, later cubic. [Figure: computation time (s) versus the number of nonzero components of $\bar{x}$.] Alternative: iterative methods.

  20. Iterative soft-thresholding (IST): $x^{(0)} = 0$, $x^{(n+1)} = T(x^{(n)})$ with $T(x) \equiv S_\lambda\!\left(x + K^T(y - Kx)\right)$ and $S_\lambda$ = component-wise soft-thresholding. Advantages: (1) very simple; (2) $F(x^{(n)}) - F(\bar{x}) \leq \|x^{(0)} - \bar{x}\|^2 / (2n)$ for all n > 0.
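A minimal sketch of the IST iteration above. It assumes K has been rescaled so that $\|K\| \leq 1$ (the usual condition for this unit-step thresholded Landweber iteration to converge); that normalization, and the fixed iteration count, are assumptions rather than details from the slide.

```python
import numpy as np

def soft_threshold(z, lam):
    """Component-wise soft-thresholding S_lambda."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ist(K, y, lam, n_iter=1000):
    """Iterative soft-thresholding for min ||Kx - y||^2 + 2*lam*||x||_1.
    Assumes ||K|| <= 1 (rescale K and y beforehand if necessary)."""
    x = np.zeros(K.shape[1])                            # x^(0) = 0
    for _ in range(n_iter):
        x = soft_threshold(x + K.T @ (y - K @ x), lam)  # x^(n+1) = T(x^(n))
    return x
```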

  21. Iterative soft-thresholding. IST is not very fast: [Figure: relative error $\|x^{(n)} - \bar{x}\| / \|\bar{x}\|$ versus computation time over roughly 20 hours.] Real inversions involve millions of variables, so a speed-up is needed.

  22. Fast Iterative Soft-Thresholding Algorithm, FISTA (Beck and Teboulle, 2008): $x^{(n+1)} = T\!\left(x^{(n)} + \beta_n (x^{(n)} - x^{(n-1)})\right)$ with the same $T(x) \equiv S_\lambda\!\left(x + K^T(y - Kx)\right)$, and where $(\beta_n)_n$ is a fixed sequence of numbers. [Figure: the sequence $\beta_n$.] Advantages: (1) very simple; (2) $F(x^{(n)}) - F(\bar{x}) \leq 2\|x^{(0)} - \bar{x}\|^2 / n^2$ for all n > 0. Goes back to (Nesterov, 1983).
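A sketch of the accelerated iteration. The slide only states that $(\beta_n)_n$ is a fixed sequence; the particular choice below, $\beta_n = (t_n - 1)/t_{n+1}$ with $t_{n+1} = (1 + \sqrt{1 + 4t_n^2})/2$, is the standard Beck and Teboulle choice and is therefore an assumption here. As in the IST sketch, $\|K\| \leq 1$ is assumed.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def fista(K, y, lam, n_iter=1000):
    """FISTA for min ||Kx - y||^2 + 2*lam*||x||_1 (assumes ||K|| <= 1).
    Applies the same map T(x) = S_lam(x + K^T(y - Kx)) as IST, but at the
    extrapolated point x^(n) + beta_n (x^(n) - x^(n-1))."""
    x_prev = np.zeros(K.shape[1])
    x = np.zeros(K.shape[1])
    t = 1.0
    for _ in range(n_iter):
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        beta = (t - 1.0) / t_next                 # fixed momentum sequence
        z = x + beta * (x - x_prev)
        x_prev, x = x, soft_threshold(z + K.T @ (y - K @ z), lam)
        t = t_next
    return x
```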

  23. Other iterative methods. Projected steepest descent (Daubechies et al. 2008): $x^{(n+1)} = P_R\!\left(x^{(n)} + \beta^{(n)} r^{(n)}\right)$ with $r^{(n)} = K^T(y - Kx^{(n)})$ and $\beta^{(n)} = \|r^{(n)}\|^2 / \|Kr^{(n)}\|^2$; $P_R(\cdot)$ is the orthogonal projection onto an ℓ1 ball of radius R. The 'GPSR' algorithm (Figueiredo et al., 2007) uses auxiliary variables $u, v \geq 0$ with $x = u - v$. The 'ℓ1-ls' algorithm (Kim et al., 2007) is an interior point method using preconditioned conjugate gradient substeps. SpaRSA (Wright et al., 2009), ... How to choose? How to compare?
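The projection $P_R$ onto an ℓ1 ball has a closed form once the entries are sorted by magnitude: it is a soft-thresholding whose level is determined by the radius R. The sketch below uses that standard sort-based projection (not necessarily the implementation of Daubechies et al.) together with the steepest-descent step length from the slide; the iteration count is an arbitrary choice.

```python
import numpy as np

def project_l1_ball(x, R):
    """Orthogonal projection of x onto {z : ||z||_1 <= R} (sort-based)."""
    if np.sum(np.abs(x)) <= R:
        return x.copy()
    u = np.sort(np.abs(x))[::-1]                        # magnitudes, descending
    css = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - R) / j > 0)[0][-1]      # last active index
    theta = (css[rho] - R) / (rho + 1)                  # threshold level
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def projected_steepest_descent(K, y, R, n_iter=500):
    """x^(n+1) = P_R(x^(n) + beta^(n) r^(n)) with r^(n) = K^T(y - K x^(n))
    and beta^(n) = ||r^(n)||^2 / ||K r^(n)||^2."""
    x = np.zeros(K.shape[1])
    for _ in range(n_iter):
        r = K.T @ (y - K @ x)
        if not np.any(r):                               # gradient is zero: stop
            break
        beta = (r @ r) / np.linalg.norm(K @ r) ** 2
        x = project_l1_ball(x + beta * r, R)
    return x
```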

  24. How to compare? Evaluation of speed-up. Convergence: $\|x^{(n)} - \bar{x}\| \to 0$, but how fast? [Figure: relative error versus computation time for several algorithms (a)-(e), on time scales of minutes to hours.] This only gives an idea for one value of the penalization parameter λ; prefer a range of λ: $\lambda_{\max} \geq \lambda \geq \lambda_{\min}$. In one picture: (1) the error $\|x^{(n)} - \bar{x}\| / \|\bar{x}\|$, (2) the penalty λ, (3) the computer time needed t. How?
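One way to generate the data behind such an 'approximation isochrone' picture: for each λ on a logarithmic grid below $\lambda_{\max}$, run the iterative algorithm for a fixed wall-clock budget and record the relative error against a reference minimizer $\bar{x}(\lambda)$. In the sketch below the reference is simply a much longer run of the same iteration (in the talk it comes from the direct method), IST is used as the example algorithm, and all grid sizes and time budgets are illustrative assumptions.

```python
import time
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ist_for(K, y, lam, seconds):
    """Run IST (assuming ||K|| <= 1) for a fixed wall-clock time budget."""
    x = np.zeros(K.shape[1])
    t0 = time.perf_counter()
    while time.perf_counter() - t0 < seconds:
        x = soft_threshold(x + K.T @ (y - K @ x), lam)
    return x

def isochrone(K, y, seconds, n_lambda=14, ref_seconds=600.0):
    """Relative error ||x - xbar|| / ||xbar|| as a function of
    log2(lambda_max / lambda), after a fixed computation time `seconds`.
    xbar is approximated here by a much longer run of the same iteration."""
    lam_max = np.max(np.abs(K.T @ y))
    curve = []
    for j in range(1, n_lambda + 1):          # skip j = 0, where xbar = 0
        lam = lam_max / 2.0 ** j
        xbar = ist_for(K, y, lam, ref_seconds)
        x = ist_for(K, y, lam, seconds)
        curve.append((j, np.linalg.norm(x - xbar) / np.linalg.norm(xbar)))
    return curve
```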

  25. Approximation isochrones: IST. [Figure: relative error $\|x^{(n)} - \bar{x}\| / \|\bar{x}\|$ versus $\log_2(\lambda_{\max}/\lambda)$ from 0 to about 14 (large λ, more sparse, on the left; small λ, less sparse, on the right), with one curve per fixed computation time (1 m to 10 m); the top axis shows the size of $\mathrm{supp}\,\bar{x}(\lambda)$: 20, 100, 200.]

  26. Approximation isochrones: FISTA. [Figure: same axes as the previous slide, relative error $\|x^{(n)} - \bar{x}\| / \|\bar{x}\|$ versus $\log_2(\lambda_{\max}/\lambda)$, with one curve per fixed computation time; top axis: size of $\mathrm{supp}\,\bar{x}(\lambda)$: 20, 100, 200.]
