On the regularization properties of some spectral gradient methods

Daniela di Serafino
Department of Mathematics and Physics, Second University of Naples
daniela.diserafino@unina2.it

with contributions from R. De Asmundis, G. Landi, W. W. Hager, G. Toraldo, M. Viola, H. Zhang

PING (Inverse Problems in Geophysics) GNCS Project Opening Workshop – Florence, April 6, 2016
Outline
1. Linear discrete inverse problems and gradient methods
2. Recent spectral gradient methods for QP: SDA and SDC
3. Regularization properties of SDA and SDC
4. Extension to bound-constrained QP
5. Possible applications in solving nonlinear inverse problems
Linear discrete inverse problems and gradient methods

Linear discrete inverse problem

$b = Ax + n$,  with $A \in \mathbb{R}^{p \times n}$, $x \in \mathbb{R}^n$, $n \in \mathbb{R}^p$, $p \ge n$

- $A$ and $b$: known data; $A$ ill-conditioned, with singular values decaying to zero, and full rank
- $n$: unknown, representing perturbations in the data
- $x$: unknown, representing the object to be recovered

Reformulation as a linear least squares problem: $\min_{x \in \mathbb{R}^n} \frac{1}{2}\|b - Ax\|^2$

Exact least squares solution:

$x^\dagger = A^\dagger b = \sum_{i=1}^n \frac{u_i^T b}{\sigma_i}\, v_i = x_{\mathrm{true}} + \sum_{i=1}^n \frac{u_i^T n}{\sigma_i}\, v_i$

where $A = U \Sigma V^T$, $U = [u_1, \dots, u_p] \in \mathbb{R}^{p \times p}$, $V = [v_1, \dots, v_n] \in \mathbb{R}^{n \times n}$, $\Sigma = \mathrm{diag}(\sigma_1, \dots, \sigma_n) \in \mathbb{R}^{p \times n}$

useless, because the noise is amplified!
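To make the amplification concrete, here is a minimal numerical sketch (not from the slides; the test matrix, noise level, and all names are illustrative assumptions): it builds an ill-conditioned $A$ through its SVD, simulates $b = A x_{\mathrm{true}} + n$, and forms the naive solution $x^\dagger$ component by component.

```python
import numpy as np

# Illustrative setup: random orthogonal U, V and rapidly decaying
# singular values give an ill-conditioned, full-rank A.
rng = np.random.default_rng(0)
p, n = 80, 64
U, _ = np.linalg.qr(rng.standard_normal((p, p)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 10.0 ** np.linspace(0, -8, n)        # singular values decaying to zero
A = U[:, :n] @ np.diag(sigma) @ V.T

x_true = rng.standard_normal(n)
noise = 1e-4 * rng.standard_normal(p)        # perturbation in the data
b = A @ x_true + noise

# Naive SVD solution: x_dagger = sum_i (u_i^T b / sigma_i) v_i
x_dagger = V @ ((U[:, :n].T @ b) / sigma)
print(np.linalg.norm(x_dagger - x_true))     # huge: noise / sigma_i blows up
```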
Linear discrete inverse problems and gradient methods

Filter factors and iterative regularization

Regularization by filter factors: $x_{\mathrm{reg}} = \sum_{i=1}^n \phi_i \, \frac{u_i^T b}{\sigma_i}\, v_i$

Choose $\phi_i \approx 1$ to preserve the components of the solution corresponding to large $\sigma_i$'s, and $\phi_i \approx 0$ to filter out the components corresponding to small $\sigma_i$'s.

Iterative regularization methods, with a suitable early stop, can provide useful regularized solutions $x_{\mathrm{reg}}$.

Widely investigated classical iterative methods (see, e.g., [Hanke '95; Engl, Hanke & Neubauer '96; Nagy & Palmer '05]):
- Landweber and Steepest Descent (SD): very slow but "stable" convergence; rarely used in practice unless coupled with ad hoc preconditioners
- CG (CGLS, LSQR): fast in reducing the error, but too sensitive to stopping criteria (an early or late stop may significantly deteriorate the solution)
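As a concrete illustration of the filter-factor idea (the slides leave the $\phi_i$ generic; the Tikhonov-type choice and the value of `lam` below are assumptions), reusing `U`, `V`, `sigma`, `b`, `x_true`, and `n` from the previous sketch:

```python
# Tikhonov-type filter factors: phi_i ~ 1 for sigma_i >> lam,
# phi_i ~ 0 for sigma_i << lam (lam chosen by hand here).
lam = 1e-3
phi = sigma**2 / (sigma**2 + lam**2)
x_reg = V @ (phi * (U[:, :n].T @ b) / sigma)
print(np.linalg.norm(x_reg - x_true))        # far smaller than the naive error
```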
Linear discrete inverse problems and gradient methods

Gradient methods for convex quadratic problems

QP: $\min_{x \in \mathbb{R}^n} f(x) \equiv \frac{1}{2} x^T Q x - c^T x$

General framework:

    choose x_0 ∈ R^n; k = 0
    while (not stop cond) do
        g_k = Q x_k − c
        compute a suitable steplength α_k
        x_{k+1} = x_k − α_k g_k
        k = k + 1
    end while

- old origins [Cauchy 1847; Akaike 1959; Forsythe 1968]
- long considered bad and ineffective because of its slow convergence rate and oscillatory behaviour

Starting from [Barzilai & Borwein '88], several more efficient gradient methods have been developed, with steplengths related to spectral properties of the Hessian [Friedlander, Martínez, Molina & Raydan '99; Dai & Yuan '03, '05; Fletcher '05, '12; Dai, Hager, Schittkowski & Zhang '06; Yuan '06, '08; Frassoldati, Zanni & Zanghirati '08; De Asmundis, dS, Riccio & Toraldo '13; De Asmundis, dS, Hager, Toraldo & Zhang '14; Gonzaga & Schneider '15]

⇒ interest in the use of the new gradient methods as regularization methods [Ascher, van den Doel, Huang & Svaiter '09; Cornelio, Porta, Prato & Zanni '13; De Asmundis, dS & Landi '16]
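The framework translates almost line by line into code; a minimal sketch (function names and stopping rule are illustrative), with the classical Cauchy steplength $\alpha_k^{SD} = g_k^T g_k / (g_k^T Q g_k)$ as one possible rule:

```python
import numpy as np

def gradient_method(Q, c, x0, steplength, tol=1e-8, max_iter=10000):
    """Generic gradient scheme for  min 0.5 x^T Q x - c^T x."""
    x = x0.copy()
    for k in range(max_iter):
        g = Q @ x - c                  # g_k = Q x_k - c
        if np.linalg.norm(g) < tol:    # stop condition
            break
        x = x - steplength(k, g, Q) * g
    return x

def cauchy_step(k, g, Q):
    """Classical (exact line search) SD steplength."""
    return (g @ g) / (g @ (Q @ g))
```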
Linear discrete inverse problems and gradient methods

Analysis of gradient methods (for linear least squares)

$g_k = A^T (A x_k - b)$, $k = 0, 1, 2, \dots$

Write $g_k$ in terms of the SVD of $A$: if $g_0 = \sum_{i=1}^n \mu_i^0 v_i$, then

$g_k = \sum_{i=1}^n \mu_i^k v_i$,  $\mu_i^k = \mu_i^0 \prod_{j=0}^{k-1} (1 - \alpha_j \sigma_i^2)$

- if at the $k$-th iteration $\mu_i^k = 0$ for some $i$, then $\mu_i^l = 0$ for $l > k$
- $\mu_i^k = 0$ iff $\mu_i^0 = 0$ or $\alpha_j = 1/\sigma_i^2$ for some $j < k$
- $\alpha_k \approx 1/\sigma_r^2 \;\Longrightarrow\;$
  $|\mu_r^{k+1}| \ll |\mu_r^k|$,
  $|\mu_i^{k+1}| < |\mu_i^k|$ if $i > r$,
  $|\mu_i^{k+1}| > |\mu_i^k|$ if $i < r$ and $\sigma_i^2 > 2\sigma_r^2$

Non-restrictive assumptions: $\sigma_1 > \sigma_2 > \dots > \sigma_n$, $\mu_1^0 \neq 0$, $\mu_n^0 \neq 0$
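The recurrence is easy to verify numerically; a quick sketch (random data, all names illustrative) checking $\mu_i^{k+1} = (1 - \alpha_k \sigma_i^2)\,\mu_i^k$ for one gradient step:

```python
import numpy as np

# Check mu_i^{k+1} = (1 - alpha_k * sigma_i^2) * mu_i^k for the
# components of g_k in the right singular basis of A.
rng = np.random.default_rng(1)
p, n = 30, 20
A = rng.standard_normal((p, n))
b = rng.standard_normal(p)
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

x = rng.standard_normal(n)
g = A.T @ (A @ x - b)
mu = Vt @ g                        # components mu_i^k of g_k along v_i
alpha = 0.1 / sigma[0]**2          # any steplength
g_new = A.T @ (A @ (x - alpha * g) - b)
mu_new = Vt @ g_new
print(np.allclose(mu_new, (1 - alpha * sigma**2) * mu))   # True
```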
Recent spectral gradient methods for QP: SDA and SDC

A framework for building fast gradient methods

A new steplength selection rule:

$\alpha_k = \begin{cases} \alpha_k^{SD} & \text{if } \mathrm{mod}(k, h+m) < h \\ \bar\alpha_s & \text{otherwise, with } s = \max\{i \le k : \mathrm{mod}(i, h+m) = h\} \end{cases}$,  $h \ge 2$

where $\alpha_k^{SD}$ is the classical (Cauchy) SD steplength and $\bar\alpha_s$ is a "special" steplength with spectral properties.

In other words: perform $h$ consecutive exact line searches and then compute a different steplength, to be kept constant and applied in $m$ consecutive gradient iterations (see the sketch below).
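A sketch of how this alternation could be coded (a hypothetical helper, not the authors' implementation; the defaults h=2, m=4 are arbitrary illustrative choices): h Cauchy steps, then a "special" steplength computed once and frozen for the next m iterations. SDA and SDC below differ only in `special_step`.

```python
import numpy as np

def alternated_gradient(Q, c, x0, special_step, h=2, m=4,
                        tol=1e-8, max_iter=10000):
    """h exact line searches, then one special steplength kept for m steps."""
    x = x0.copy()
    sd_steps, g_norms = [], []        # histories of alpha_k^SD and ||g_k||
    alpha_bar = None
    for k in range(max_iter):
        g = Q @ x - c
        g_norms.append(np.linalg.norm(g))
        if g_norms[-1] < tol:
            break
        sd_steps.append((g @ g) / (g @ (Q @ g)))   # Cauchy steplength
        if k % (h + m) < h:
            alpha = sd_steps[-1]
        else:
            if k % (h + m) == h:                   # s = k: freeze a new step
                alpha_bar = special_step(sd_steps, g_norms)
            alpha = alpha_bar
        x = x - alpha * g
    return x
```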
Recent spectral gradient methods for QP: SDA and SDC

SDA method [De Asmundis, dS, Riccio & Toraldo '13]

Set $\bar\alpha_s = \tilde\alpha_s$, where

$\tilde\alpha_s = \left( \dfrac{1}{\alpha_s^{SD}} + \dfrac{1}{\alpha_{s-1}^{SD}} \right)^{-1}$

Theorem. Let $\{x_k\}$ be the sequence of iterates generated by the SD method applied to the least squares problem, starting from any point $x_0$. Then

$\lim_{k \to \infty} \tilde\alpha_k = \dfrac{1}{\sigma_1^2 + \sigma_n^2}$.

SDA (SD with Alignment) combines
- the tendency of SD to choose its search direction in $\mathrm{span}\{v_1, v_n\}$
- the tendency of the gradient method with $\alpha_k = 1/(\sigma_1^2 + \sigma_n^2)$ to align the search direction with $v_n$

R-linear convergence, but a significant improvement of practical convergence speed over SD.
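In the sketch above, SDA would correspond to a `special_step` like the following (an assumed helper mirroring the formula for $\tilde\alpha_s$):

```python
def sda_step(sd_steps, g_norms):
    """alpha_tilde_s = (1/alpha_s^SD + 1/alpha_{s-1}^SD)^(-1)."""
    return 1.0 / (1.0 / sd_steps[-1] + 1.0 / sd_steps[-2])
```

Usage, with the hypothetical driver above: `x = alternated_gradient(Q, c, x0, sda_step)`.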
Recent spectral gradient methods for QP: SDA and SDC

SDC method [De Asmundis, dS, Hager, Toraldo & Zhang '14]

Set $\bar\alpha_s$ equal to the Yuan steplength [Yuan '06]:

$\alpha_s^Y = 2 \left( \sqrt{ \left( \dfrac{1}{\alpha_{s-1}^{SD}} - \dfrac{1}{\alpha_s^{SD}} \right)^2 + \dfrac{4 \|g_s\|^2}{\left( \alpha_{s-1}^{SD} \|g_{s-1}\| \right)^2} } + \dfrac{1}{\alpha_{s-1}^{SD}} + \dfrac{1}{\alpha_s^{SD}} \right)^{-1}$

Theorem. Let $\{x_k\}$ be the sequence generated by the SD method applied to the least squares problem, starting from any point $x_0$. Then

$\lim_{k \to \infty} \alpha_k^Y = \dfrac{1}{\sigma_1^2}$.
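Similarly, SDC would plug the Yuan steplength into the same hypothetical framework (again an assumed helper; note it also needs the last two gradient norms):

```python
import numpy as np

def sdc_step(sd_steps, g_norms):
    """Yuan steplength alpha_s^Y from the last two Cauchy steps."""
    a_s, a_sm1 = sd_steps[-1], sd_steps[-2]
    g_s, g_sm1 = g_norms[-1], g_norms[-2]
    root = np.sqrt((1.0/a_sm1 - 1.0/a_s)**2
                   + 4.0 * g_s**2 / (a_sm1 * g_sm1)**2)
    return 2.0 / (root + 1.0/a_sm1 + 1.0/a_s)
```

Usage: `x = alternated_gradient(Q, c, x0, sdc_step)`.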