Spectral properties of steplength selections in gradient methods: from unconstrained to constrained optimization


  1. Spectral properties of steplength selections in gradient methods: from unconstrained to constrained optimization
     L. Zanni, Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, Italy
     Variational Methods and Optimization in Imaging, IHP, Paris, 4-8 February 2019
     Joint work with: S. Crisci and V. Ruggiero (University of Ferrara, Italy), F. Porta (University of Modena and Reggio Emilia, Italy)

  2. Outline
     1. Gradient methods for unconstrained problems
        - Spectral properties of steplength selections
        - Design of selection rules by exploiting spectral properties
        - From the quadratic case to general unconstrained problems
     2. Gradient projection methods for box-constrained problems
        - Spectral properties of steplengths in the quadratic case
        - New steplength rules taking the constraints into account
     3. Scaled gradient projection methods
        - Defining the diagonal scaling
        - Steplengths in variable metric approaches
        - Practical behaviour in imaging
     4. Conclusions

  3. Motivation for the steplength analysis
     Constrained optimization problem:
         min_{x ∈ Ω} f(x)    (1)
     where f : R^N → R is a continuously differentiable function and Ω ⊂ R^N is a nonempty closed convex set defined by simple constraints.
     Gradient Projection (GP) methods for (1):
         x^{(k+1)} = x^{(k)} + ϑ_k d^{(k)},   d^{(k)} = P_Ω(x^{(k)} - α_k ∇f(x^{(k)})) - x^{(k)},
     with ϑ_k ∈ (0, 1], α_k > 0 and P_Ω(x) = argmin_{z ∈ Ω} ‖z - x‖.
     Usually the updating rules for the steplength α_k are those exploited in the unconstrained case: is this a suitable choice?
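As an aside (not from the slides), one GP iteration is easy to sketch for box constraints, where P_Ω reduces to componentwise clipping; the diagonal quadratic objective and all names below are assumptions made for the example:

```python
# Hypothetical sketch of one gradient-projection iteration for a
# box-constrained quadratic f(x) = 1/2 x^T A x - b^T x with diagonal A,
# so every operation is componentwise.

def project_box(x, lo, hi):
    """P_Omega for box constraints lo <= x <= hi (componentwise clipping)."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]

def gp_step(x, diagA, b, lo, hi, alpha, theta=1.0):
    """x^{(k+1)} = x^{(k)} + theta * (P_Omega(x^{(k)} - alpha * grad f) - x^{(k)})."""
    grad = [a * xi - bi for a, xi, bi in zip(diagA, x, b)]
    y = project_box([xi - alpha * gi for xi, gi in zip(x, grad)], lo, hi)
    return [xi + theta * (yi - xi) for xi, yi in zip(x, y)]

# One step from x = (2, -3) with A = diag(1, 4), b = (0.5, 1), Omega = [0, 1]^2:
x_next = gp_step([2.0, -3.0], [1.0, 4.0], [0.5, 1.0],
                 [0.0, 0.0], [1.0, 1.0], alpha=0.5)
# x_next lands inside the box: [1.0, 1.0]
```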

  4. Spectral analysis of steplength selections
     ➤ The unconstrained case
     ➤ The box-constrained case
     ➤ The Scaled Gradient Projection methods

  5. Steplength selection: the unconstrained case
     The recipe exploited by state-of-the-art selection rules:
     - define steplengths by trying to capture, in an inexpensive way, some second-order information;
     - design selection rules in the strictly convex quadratic case
           f(x) = (1/2) x^T A x - b^T x,   A symmetric positive definite,
       where second-order information ↔ spectral properties of A;
     - design selection rules that generalize, in an inexpensive way, to non-quadratic cases: ∇²f(x^{(k)}) depends on the iteration, but ∇²f(x^{(k)}) → ∇²f(x*).

  6. A popular example: the Barzilai-Borwein (BB) selection rules
     Consider the gradient method for the problem min f(x):
         x^{(k+1)} = x^{(k)} - α_k ∇f(x^{(k)}),   k = 0, 1, ...
     Suggestion [Barzilai-Borwein, IMA J. Num. Anal. 1988]: force the matrix (α_k I)^{-1} to approximate the Hessian ∇²f(x^{(k)}) by imposing quasi-Newton properties:
         α_k^{BB1} = argmin_{α ∈ R} ‖(αI)^{-1} s^{(k-1)} - z^{(k-1)}‖ = (s^{(k-1)T} s^{(k-1)}) / (s^{(k-1)T} z^{(k-1)})
     or
         α_k^{BB2} = argmin_{α ∈ R} ‖s^{(k-1)} - (αI) z^{(k-1)}‖ = (s^{(k-1)T} z^{(k-1)}) / (z^{(k-1)T} z^{(k-1)}),
     where s^{(k-1)} = x^{(k)} - x^{(k-1)} and z^{(k-1)} = ∇f(x^{(k)}) - ∇f(x^{(k-1)}).
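The two closed forms above are cheap to evaluate; a minimal sketch (function name assumed for illustration):

```python
# Sketch: the two BB steplengths from the differences
# s = x^{(k)} - x^{(k-1)} and z = grad f(x^{(k)}) - grad f(x^{(k-1)}).

def bb_steplengths(s, z):
    """Return (alpha_BB1, alpha_BB2) = (s^T s / s^T z, s^T z / z^T z)."""
    ss = sum(si * si for si in s)
    sz = sum(si * zi for si, zi in zip(s, z))
    zz = sum(zi * zi for zi in z)
    return ss / sz, sz / zz

# For a quadratic with A = diag(1, 4) one has z = A s:
s = [1.0, 1.0]
z = [1.0, 4.0]
bb1, bb2 = bb_steplengths(s, z)   # bb1 = 2/5, bb2 = 5/17
```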

  7. Spectral properties of the BB steplength rules
     Consider a gradient method for the quadratic unconstrained case:
         min f(x) ≡ (1/2) x^T A x - b^T x,   A = diag(λ_1, ..., λ_N),   0 < λ_1 < ··· < λ_N,
         x^{(k+1)} = x^{(k)} - α_k g^{(k)},   g^{(k)} = ∇f(x^{(k)}),   k = 0, 1, ...
     Then
         g_i^{(k+1)} = (1 - α_k λ_i) g_i^{(k)},   i = 1, ..., N.
     - α_k = 1/λ_i  ⇒  g_i^{(k+1)} = 0  ⇒  g_i^{(k+j)} = 0, j = 2, 3, ...
     - α_{k+i-1} = 1/λ_i, i = 1, ..., N  ⇒  g^{(k+N)} = 0 (finite termination)
     ⇒ α_k must aim at approximating the inverses of the eigenvalues of A.
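The finite-termination property can be checked numerically; the toy spectrum and right-hand side below are made up for the example:

```python
# With A = diag(lam), taking the exact steplengths alpha = 1/lam_i in turn
# zeroes one gradient component per iteration, via the componentwise update
# g_i <- (1 - alpha * lam_i) * g_i, so g = 0 after N steps.
lam = [1.0, 10.0, 100.0]
b = [1.0, 2.0, 3.0]
x = [0.0, 0.0, 0.0]
g = [l * xi - bi for l, xi, bi in zip(lam, x, b)]   # g = A x - b
for li in lam:
    alpha = 1.0 / li
    x = [xi - alpha * gi for xi, gi in zip(x, g)]
    g = [(1.0 - alpha * l) * gi for l, gi in zip(lam, g)]
# now every component of g is (numerically) zero: finite termination in N = 3 steps
```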

  8. BB rules in the quadratic case
         1/λ_N ≤ α_k^{BB2} = (g^{(k-1)T} A g^{(k-1)}) / (g^{(k-1)T} A² g^{(k-1)}) ≤ α_k^{BB1} = (g^{(k-1)T} g^{(k-1)}) / (g^{(k-1)T} A g^{(k-1)}) ≤ 1/λ_1
     Example:
         f(x) = (1/2) x^T A x - b^T x,   A = diag(λ_1, ..., λ_10),   λ_i = 111 i - 110,
     with b a random vector, b_i ∈ [-10, 10]; stopping rule: ‖g^{(k)}‖ ≤ 10^{-8} ‖g^{(0)}‖.
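The spectral bounds can be verified on the toy spectrum above; the gradient vector below is an arbitrary choice for illustration:

```python
# Check that for the toy spectrum lambda_i = 111*i - 110 the BB values
# computed at a gradient g satisfy 1/lambda_N <= alpha_BB2 <= alpha_BB1 <= 1/lambda_1
# (both are Rayleigh-quotient ratios, hence trapped in the spectrum's range).
lam = [111 * i - 110 for i in range(1, 11)]   # 1, 112, 223, ..., 1000
g = [1.0, -2.0, 0.5, 3.0, -1.0, 0.25, 2.0, -0.5, 1.5, -3.0]   # arbitrary gradient
gg = sum(gi * gi for gi in g)
gAg = sum(l * gi * gi for l, gi in zip(lam, g))
gA2g = sum(l * l * gi * gi for l, gi in zip(lam, g))
bb1 = gg / gAg     # alpha_BB1 = g^T g / g^T A g
bb2 = gAg / gA2g   # alpha_BB2 = g^T A g / g^T A^2 g
assert 1.0 / lam[-1] <= bb2 <= bb1 <= 1.0 / lam[0]
```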

  9. Quadratic case: exploiting spectral properties
     In the quadratic case (A = diag(λ_1, ..., λ_N), 0 < λ_1 < ··· < λ_N) we have:
     - g_j^{(k+1)} = (1 - α_k λ_j) g_j^{(k)}, j = 1, ..., N; hence, if α_k ≈ 1/λ_i:
           |g_i^{(k+1)}| ≪ |g_i^{(k)}|                        → very useful
           |g_j^{(k+1)}| < |g_j^{(k)}| if j < i               → useful
           |g_j^{(k+1)}| > |g_j^{(k)}| if j > i, λ_j > 2λ_i   → dangerous
     - α_k^{BB2} / α_k^{BB1} = cos²(g^{(k-1)}, A g^{(k-1)}).
     Idea for improving the BB rules:
     - force a sequence of small α_k^{BB2} to reduce |g_i| for large i, leading to gradients in which these components are not dominant;
     - after a sequence of small α_k, if α_k^{BB2}/α_k^{BB1} ≈ 1, exploit α^{BB1} = (g^T g)/(g^T A g), aiming at obtaining α^{BB1} ≈ 1/λ_i for small i.
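A tiny illustration of the three regimes (the eigenvalues and gradient are made-up numbers): with α_k = 1/λ_i each component updates as g_j ← (1 - α_k λ_j) g_j, so component i vanishes while components with λ_j > 2λ_i grow.

```python
lam = [1.0, 2.0, 5.0]
g = [1.0, 1.0, 1.0]
alpha = 1.0 / lam[1]            # alpha_k = 1/lambda_i with i = 2
g_new = [(1.0 - alpha * l) * gi for l, gi in zip(lam, g)]
# g_new = [0.5, 0.0, -1.5]:
#   component 2 is annihilated               (very useful)
#   component 1 shrinks, since lam_1 < lam_2 (useful)
#   component 3 grows, since lam_3 > 2*lam_2 (dangerous)
```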

 10. Practical implementations of this idea: ABB and ABBmin rules
     Alternate Barzilai-Borwein (ABB) selection rule [Zhou-Gao-Dai, COAP (2006)]:
         α_k^{ABB} = α_k^{BB2}   if α_k^{BB2}/α_k^{BB1} < τ, with τ ∈ (0, 1),
         α_k^{ABB} = α_k^{BB1}   otherwise.
     ABBmin rule [Frassoldati-Zanghirati-Zanni, JIMO (2008)]:
         α_k^{ABBmin} = min{ α_j^{BB2} | j = max{1, k - M_α}, ..., k }   if α_k^{BB2}/α_k^{BB1} < τ,
         α_k^{ABBmin} = α_k^{BB1}   otherwise,
     where M_α > 0 is a parameter.
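A minimal sketch of the two switching rules (function names and default parameter values are assumptions, not from the cited papers):

```python
def abb_step(bb1, bb2, tau=0.5):
    """ABB: use alpha_BB2 when alpha_BB2/alpha_BB1 < tau, otherwise alpha_BB1."""
    return bb2 if bb2 / bb1 < tau else bb1

def abbmin_step(bb1, bb2_history, tau=0.5, M_alpha=5):
    """ABBmin: bb2_history holds alpha_j^{BB2} for j <= k, most recent last;
    when the ratio test fires, take the minimum over the last M_alpha + 1 values,
    i.e. over j = max(1, k - M_alpha), ..., k."""
    if bb2_history[-1] / bb1 < tau:
        return min(bb2_history[-(M_alpha + 1):])
    return bb1

# The ratio test switches between the two BB values:
a1 = abb_step(1.0, 0.3)                    # 0.3/1.0 < 0.5  -> BB2 = 0.3
a2 = abb_step(1.0, 0.9)                    # 0.9/1.0 >= 0.5 -> BB1 = 1.0
a3 = abbmin_step(1.0, [0.5, 0.2, 0.4])     # test fires -> min of history = 0.2
```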

 11. ABB and ABBmin rules on the previous toy problem
     [Figures: steplength histories α_k vs. iterations for ABB and ABBmin, and the relative error ‖x^{(k)} - x*‖/‖x*‖ vs. iterations for the compared methods.]
     Compared methods:
     - CSD (Cauchy Steepest Descent): α_k = argmin_{α > 0} f(x^{(k)} - α g^{(k)})
     - BB1: α_k = α_k^{BB1}
     - BB2: α_k = α_k^{BB2}
     - ABB: alternation
     - ABBmin: modified alternation

 12. Similar behaviour on randomly generated test problems
     Quadratic test problems with N = 1000:
     - log-spaced spectrum: λ_1 = 1, λ_N = 10^4, with λ_i, i = 2, ..., N-1, log-spaced;
     - two-cluster spectrum: λ_i = λ_min + (λ_max - λ_min) s_i, i = 1, ..., N, with λ_min = 1, λ_max = 10^3 and s_i ∈ (0, 0.2) for i = 1, ..., N/2, s_i ∈ (0.8, 1) for i = N/2 + 1, ..., N.
     [Di Serafino-Ruggiero-Toraldo-Zanni, AMC 2018]
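The two spectra can be generated as follows; this is a sketch of the construction described above (the random seed is an assumption):

```python
import random

random.seed(0)
N = 1000

# Log-spaced spectrum: lambda_1 = 1, lambda_N = 1e4.
lam_log = [10.0 ** (4.0 * i / (N - 1)) for i in range(N)]

# Two-cluster spectrum: lambda_i = lam_min + (lam_max - lam_min) * s_i, with
# s_i in (0, 0.2) for the first half and s_i in (0.8, 1) for the second half,
# so the eigenvalues pile up near the two ends of [1, 1e3].
lam_min, lam_max = 1.0, 1.0e3
s = [random.uniform(0.0, 0.2) if i < N // 2 else random.uniform(0.8, 1.0)
     for i in range(N)]
lam_cluster = [lam_min + (lam_max - lam_min) * si for si in s]
```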

 13. Other efficient steplength rules based on spectral properties
     [Pronzato-Zhigljavsky, Comput. Optim. Appl. 50 (2011)]
     [Fletcher, Math. Program. Ser. A 135 (2012)]
     [Pronzato-Zhigljavsky-Bukina, Acta Appl. Math. 127 (2013)]
     [De Asmundis-Di Serafino-Riccio-Toraldo, IMA J. Numer. Anal. 33 (2013)]
     [De Asmundis-Di Serafino-Hager-Toraldo-Zhang, Comput. Optim. Appl. 59 (2014)]
     [Gonzaga-Schneider, Comput. Optim. Appl. 63 (2016)]
     [Gonzaga, Math. Program. Ser. A 160 (2016)]
     These rules:
     - aim at breaking the well-known cycling behaviour of the Steepest Descent method;
     - share an R-linear convergence rate in the quadratic case;
     - do not all generalize easily to non-quadratic problems (BB-based rules have this crucial property).

 14. General unconstrained problems: min_{x ∈ R^N} f(x)
     Gradient methods with nonmonotone linesearch:
         Initialization: x^{(0)} ∈ R^N, 0 < α_min ≤ α_max, α_0 ∈ [α_min, α_max], δ, σ ∈ (0, 1), M ∈ N
         for k = 0, 1, ...
             f_ref = max{ f(x^{(k-j)}), 0 ≤ j ≤ min(k, M) }
             ν_k = α_k
             while f(x^{(k)} - ν_k g^{(k)}) > f_ref - σ ν_k g^{(k)T} g^{(k)}   (line search)
                 ν_k = δ ν_k
             end
             x^{(k+1)} = x^{(k)} - ν_k g^{(k)}
             define a tentative steplength α_{k+1} ∈ [α_min, α_max]
         end
     ➤ Tentative steplength: exploit effective steplength selections designed for the quadratic case and generalizable in an inexpensive way.
     ➤ R-linear convergence of {f(x^{(k)})} when f is strongly convex with Lipschitz-continuous gradient ([Dai, JOTA 2002], [Dai-Liao, IMA J. Num. Anal. 2002]).
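The scheme above can be sketched in code; the choice of BB1 (clipped to [α_min, α_max]) as the tentative steplength is one common option assumed here for the example, not prescribed by the slide.

```python
def grad_method_nonmonotone(f, grad, x0, alpha0=1.0, alpha_min=1e-5,
                            alpha_max=1e5, delta=0.5, sigma=1e-4, M=10,
                            max_iter=200, tol=1e-8):
    """Nonmonotone-linesearch gradient method (sketch of the slide's scheme)."""
    x = list(x0)
    g = grad(x)
    f_hist = [f(x)]
    alpha = alpha0
    for _ in range(max_iter):
        gTg = sum(gi * gi for gi in g)
        if gTg ** 0.5 <= tol:
            break
        f_ref = max(f_hist[-(M + 1):])     # max of the last min(k, M) + 1 values
        nu = alpha
        # backtracking until f(x - nu*g) <= f_ref - sigma * nu * g^T g
        while f([xi - nu * gi for xi, gi in zip(x, g)]) > f_ref - sigma * nu * gTg:
            nu *= delta
        x_new = [xi - nu * gi for xi, gi in zip(x, g)]
        g_new = grad(x_new)
        # tentative steplength: BB1 on (s, z), projected into [alpha_min, alpha_max]
        s = [xn - xo for xn, xo in zip(x_new, x)]
        z = [gn - go for gn, go in zip(g_new, g)]
        sz = sum(si * zi for si, zi in zip(s, z))
        bb1 = sum(si * si for si in s) / sz if sz > 0 else alpha_max
        alpha = min(max(bb1, alpha_min), alpha_max)
        x, g = x_new, g_new
        f_hist.append(f(x))
    return x

# Usage on f(x) = 1/2 x^T diag(1, 100) x (minimizer at the origin):
f = lambda x: 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)
grad = lambda x: [x[0], 100.0 * x[1]]
sol = grad_method_nonmonotone(f, grad, [1.0, 1.0])
```

The nonmonotone reference value f_ref (rather than f(x^{(k)}) itself) is what lets the long BB-type steps survive the linesearch without being cut back at every iteration.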
