On the regularization properties of some spectral gradient methods

Daniela di Serafino
Department of Mathematics and Physics, Second University of Naples
daniela.diserafino@unina2.it

with contributions from R. De Asmundis, G. Landi, W. W. Hager, G. Toraldo, M. Viola, H. Zhang

PING (Inverse Problems in Geophysics) GNCS Project Opening Workshop – Florence, April 6, 2016
Outline
1. Linear discrete inverse problems and gradient methods
2. Recent spectral gradient methods for QP: SDA and SDC
3. Regularization properties of SDA and SDC
4. Extension to bound-constrained QP
5. Possible applications in solving nonlinear inverse problems
Linear discrete inverse problems and gradient methods

Linear discrete inverse problem

$b = Ax + n$,  with $A \in \mathbb{R}^{p \times n}$, $x \in \mathbb{R}^n$, $n \in \mathbb{R}^p$, $p \ge n$

- $A$ and $b$: known data; $A$ ill-conditioned, with singular values decaying to zero, and full rank
- $n$: unknown, representing perturbations in the data
- $x$: unknown, representing the object to be recovered

Reformulation as a linear least squares problem: $\min_{x \in \mathbb{R}^n} \frac{1}{2}\|b - Ax\|^2$

Exact least squares solution:

$x^\dagger = A^\dagger b = \sum_{i=1}^n \frac{u_i^T b}{\sigma_i}\, v_i = x_{\mathrm{true}} + \sum_{i=1}^n \frac{u_i^T n}{\sigma_i}\, v_i$

where $A = U \Sigma V^T$, $U = [u_1, \dots, u_p] \in \mathbb{R}^{p \times p}$, $V = [v_1, \dots, v_n] \in \mathbb{R}^{n \times n}$, $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_n) \in \mathbb{R}^{p \times n}$

useless, because the noise is amplified!
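To make the amplification concrete, here is a minimal numerical sketch (not from the slides; the test matrix, noise level, and all names are illustrative assumptions): it builds an ill-conditioned $A$ through its SVD, simulates $b = A x_{\mathrm{true}} + n$, and forms the naive solution $x^\dagger$ component by component.

```python
import numpy as np

# Illustrative setup: random orthogonal U, V and rapidly decaying
# singular values give an ill-conditioned, full-rank A.
rng = np.random.default_rng(0)
p, n = 80, 64
U, _ = np.linalg.qr(rng.standard_normal((p, p)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** np.linspace(0, -8, n)        # singular values decaying to zero
A = U[:, :n] @ np.diag(sigma) @ V.T

x_true = rng.standard_normal(n)
noise = 1e-4 * rng.standard_normal(p)        # perturbation in the data
b = A @ x_true + noise

# Naive SVD solution: x_dagger = sum_i (u_i^T b / sigma_i) v_i
x_dagger = V @ ((U[:, :n].T @ b) / sigma)
print(np.linalg.norm(x_dagger - x_true))     # huge: noise / sigma_i blows up
```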
Linear discrete inverse problems and gradient methods

Filter factors and iterative regularization

Regularization by filter factors: $x_{\mathrm{reg}} = \sum_{i=1}^n \phi_i \, \frac{u_i^T b}{\sigma_i}\, v_i$

Choose $\phi_i \approx 1$ to preserve the components of the solution corresponding to large $\sigma_i$'s, and $\phi_i \approx 0$ to filter out the components corresponding to small $\sigma_i$'s.

Iterative regularization methods, with a suitable early stop, can provide useful regularized solutions $x_{\mathrm{reg}}$.

Widely investigated classical iterative methods (see, e.g., [Hanke '95; Engl, Hanke & Neubauer '96; Nagy & Palmer '05]):
- Landweber and Steepest Descent (SD): very slow but "stable" convergence; rarely used in practice unless coupled with ad hoc preconditioners
- CG (CGLS, LSQR): fast in reducing the error, but too sensitive to stopping criteria (an early or late stop may significantly deteriorate the solution)
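As a concrete illustration of the filter-factor idea (the slides leave the $\phi_i$ generic; the Tikhonov-type choice and the value of `lam` below are assumptions), reusing `U`, `V`, `sigma`, `b`, `x_true`, and `n` from the previous sketch:

```python
# Tikhonov-type filter factors: phi_i ~ 1 for sigma_i >> lam,
# phi_i ~ 0 for sigma_i << lam (lam chosen by hand here).
lam = 1e-3
phi = sigma**2 / (sigma**2 + lam**2)
x_reg = V @ (phi * (U[:, :n].T @ b) / sigma)
print(np.linalg.norm(x_reg - x_true))        # far smaller than the naive error
```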
Linear discrete inverse problems and gradient methods

Gradient methods for convex quadratic problems

QP: $\min_{x \in \mathbb{R}^n} f(x) \equiv \frac{1}{2} x^T Q x - c^T x$

General framework:

    choose x_0 ∈ R^n; k = 0
    while (not stop cond) do
        g_k = Q x_k − c
        compute a suitable steplength α_k
        x_{k+1} = x_k − α_k g_k
        k = k + 1
    end while

- old origins [Cauchy 1847; Akaike 1959; Forsythe 1968]
- long considered bad and ineffective because of its slow convergence rate and oscillatory behaviour

Starting from [Barzilai & Borwein '88], several more efficient gradient methods have been developed, with steplengths related to spectral properties of the Hessian [Friedlander, Martínez, Molina & Raydan '99; Dai & Yuan '03, '05; Fletcher '05, '12; Dai, Hager, Schittkowski & Zhang '06; Yuan '06, '08; Frassoldati, Zanni & Zanghirati '08; De Asmundis, dS, Riccio & Toraldo '13; De Asmundis, dS, Hager, Toraldo & Zhang '14; Gonzaga & Schneider '15]

⇒ interest in the use of the new gradient methods as regularization methods [Ascher, van den Doel, Huang & Svaiter '09; Cornelio, Porta, Prato & Zanni '13; De Asmundis, dS & Landi '16]
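The framework translates almost line by line into code; a minimal sketch (function names and stopping rule are illustrative), with the classical Cauchy steplength $\alpha_k^{SD} = g_k^T g_k / (g_k^T Q g_k)$ as one possible rule:

```python
import numpy as np

def gradient_method(Q, c, x0, steplength, tol=1e-8, max_iter=10000):
    """Generic gradient scheme for  min 0.5 x^T Q x - c^T x."""
    x = x0.copy()
    for k in range(max_iter):
        g = Q @ x - c                  # g_k = Q x_k - c
        if np.linalg.norm(g) < tol:    # stop condition
            break
        x = x - steplength(k, g, Q) * g
    return x

def cauchy_step(k, g, Q):
    """Classical (exact line search) SD steplength."""
    return (g @ g) / (g @ (Q @ g))
```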
Linear discrete inverse problems and gradient methods

Analysis of gradient methods (for linear least squares)

$g_k = A^T (A x_k - b)$, $k = 0, 1, 2, \dots$

Write $g_k$ in terms of the SVD of $A$: if $g_0 = \sum_{i=1}^n \mu_i^0 v_i$, then

$g_k = \sum_{i=1}^n \mu_i^k v_i$,  $\mu_i^k = \mu_i^0 \prod_{j=0}^{k-1} (1 - \alpha_j \sigma_i^2)$

- if at the $k$-th iteration $\mu_i^k = 0$ for some $i$, then $\mu_i^l = 0$ for $l > k$
- $\mu_i^k = 0$ iff $\mu_i^0 = 0$ or $\alpha_j = 1/\sigma_i^2$ for some $j < k$
- $\alpha_k \approx 1/\sigma_r^2 \;\Longrightarrow\;$
  $|\mu_r^{k+1}| \ll |\mu_r^k|$,
  $|\mu_i^{k+1}| < |\mu_i^k|$ if $i > r$,
  $|\mu_i^{k+1}| > |\mu_i^k|$ if $i < r$ and $\sigma_i^2 > 2\sigma_r^2$

Non-restrictive assumptions: $\sigma_1 > \sigma_2 > \dots > \sigma_n$, $\mu_1^0 \neq 0$, $\mu_n^0 \neq 0$
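The recurrence is easy to verify numerically; a quick sketch (random data, all names illustrative) checking $\mu_i^{k+1} = (1 - \alpha_k \sigma_i^2)\,\mu_i^k$ for one gradient step:

```python
import numpy as np

# Check mu_i^{k+1} = (1 - alpha_k * sigma_i^2) * mu_i^k for the
# components of g_k in the right singular basis of A.
rng = np.random.default_rng(1)
p, n = 30, 20
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

x = rng.standard_normal(n)
g = A.T @ (A @ x - b)
mu = Vt @ g                        # components mu_i^k of g_k along v_i
alpha = 0.1 / sigma[0]**2          # any steplength
g_new = A.T @ (A @ (x - alpha * g) - b)
mu_new = Vt @ g_new
print(np.allclose(mu_new, (1 - alpha * sigma**2) * mu))   # True
```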
Recent spectral gradient methods for QP: SDA and SDC

A framework for building fast gradient methods

A new steplength selection rule:

$\alpha_k = \begin{cases} \alpha_k^{SD} & \text{if } \mathrm{mod}(k, h+m) < h \\ \bar\alpha_s & \text{otherwise, with } s = \max\{i \le k : \mathrm{mod}(i, h+m) = h\} \end{cases}$,  $h \ge 2$

where $\alpha_k^{SD}$ is the classical (Cauchy) SD steplength and $\bar\alpha_s$ is a "special" steplength with spectral properties.

In other words: perform $h$ consecutive exact line searches and then compute a different steplength, to be kept constant and applied in $m$ consecutive gradient iterations (see the sketch below).
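A sketch of how this alternation could be coded (a hypothetical helper, not the authors' implementation; the defaults h=2, m=4 are arbitrary illustrative choices): h Cauchy steps, then a "special" steplength computed once and frozen for the next m iterations. SDA and SDC below differ only in `special_step`.

```python
import numpy as np

def alternated_gradient(Q, c, x0, special_step, h=2, m=4,
                        tol=1e-8, max_iter=10000):
    """h exact line searches, then one special steplength kept for m steps."""
    x = x0.copy()
    sd_steps, g_norms = [], []        # histories of alpha_k^SD and ||g_k||
    alpha_bar = None
    for k in range(max_iter):
        g = Q @ x - c
        g_norms.append(np.linalg.norm(g))
        if g_norms[-1] < tol:
            break
        sd_steps.append((g @ g) / (g @ (Q @ g)))   # Cauchy steplength
        if k % (h + m) < h:
            alpha = sd_steps[-1]
        else:
            if k % (h + m) == h:                   # s = k: freeze a new step
                alpha_bar = special_step(sd_steps, g_norms)
            alpha = alpha_bar
        x = x - alpha * g
    return x
```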
Recent spectral gradient methods for QP: SDA and SDC

SDA method [De Asmundis, dS, Riccio & Toraldo '13]

Set $\bar\alpha_s = \tilde\alpha_s$, where

$\tilde\alpha_s = \left( \dfrac{1}{\alpha_s^{SD}} + \dfrac{1}{\alpha_{s-1}^{SD}} \right)^{-1}$

Theorem. Let $\{x_k\}$ be the sequence of iterates generated by the SD method applied to the least squares problem, starting from any point $x_0$. Then

$\lim_{k \to \infty} \tilde\alpha_k = \dfrac{1}{\sigma_1^2 + \sigma_n^2}$.

SDA (SD with Alignment) combines
- the tendency of SD to choose its search direction in $\mathrm{span}\{v_1, v_n\}$
- the tendency of the gradient method with $\alpha_k = 1/(\sigma_1^2 + \sigma_n^2)$ to align the search direction with $v_n$

R-linear convergence, but a significant improvement of practical convergence speed over SD.
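In the sketch above, SDA would correspond to a `special_step` like the following (an assumed helper mirroring the formula for $\tilde\alpha_s$):

```python
def sda_step(sd_steps, g_norms):
    """alpha_tilde_s = (1/alpha_s^SD + 1/alpha_{s-1}^SD)^(-1)."""
    return 1.0 / (1.0 / sd_steps[-1] + 1.0 / sd_steps[-2])
```

Usage, with the hypothetical driver above: `x = alternated_gradient(Q, c, x0, sda_step)`.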
Recent spectral gradient methods for QP: SDA and SDC

SDC method [De Asmundis, dS, Hager, Toraldo & Zhang '14]

Set $\bar\alpha_s$ equal to the Yuan steplength [Yuan '06]:

$\alpha_s^Y = 2 \left( \sqrt{ \left( \dfrac{1}{\alpha_{s-1}^{SD}} - \dfrac{1}{\alpha_s^{SD}} \right)^2 + \dfrac{4 \|g_s\|^2}{\left( \alpha_{s-1}^{SD} \|g_{s-1}\| \right)^2} } + \dfrac{1}{\alpha_{s-1}^{SD}} + \dfrac{1}{\alpha_s^{SD}} \right)^{-1}$

Theorem. Let $\{x_k\}$ be the sequence generated by the SD method applied to the least squares problem, starting from any point $x_0$. Then

$\lim_{k \to \infty} \alpha_k^Y = \dfrac{1}{\sigma_1^2}$.
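Similarly, SDC would plug the Yuan steplength into the same hypothetical framework (again an assumed helper; note it also needs the last two gradient norms):

```python
import numpy as np

def sdc_step(sd_steps, g_norms):
    """Yuan steplength alpha_s^Y from the last two Cauchy steps."""
    a_s, a_sm1 = sd_steps[-1], sd_steps[-2]
    g_s, g_sm1 = g_norms[-1], g_norms[-2]
    root = np.sqrt((1.0/a_sm1 - 1.0/a_s)**2
                   + 4.0 * g_s**2 / (a_sm1 * g_sm1)**2)
    return 2.0 / (root + 1.0/a_sm1 + 1.0/a_s)
```

Usage: `x = alternated_gradient(Q, c, x0, sdc_step)`.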