Trust Regions in Large-Scale Optimization and Regularization

Marielba Rojas
Department of Informatics and Mathematical Modelling, Technical University of Denmark
Visiting Delft University of Technology, The Netherlands

GAMM Workshop on Applied and Numerical Linear Algebra
Technische Universität Hamburg-Harburg, Hamburg, Germany
September 11-12, 2008
Part of this work is joint with Sandra A. Santos (Campinas, Brazil) and Danny C. Sorensen (Rice University, USA). Thanks to Wake Forest University, CERFACS, and T.U. Delft.
Outline

- Trust Regions in Optimization
- Trust Regions in Regularization
- The Trust-Region Subproblem (TRS)
- Methods for the large-scale TRS
- Comparisons
- Applications
- Concluding Remarks
Trust Regions in Optimization
Unconstrained Optimization

    min_{x ∈ ℝⁿ} f(x)

where f(x) is a nonlinear, twice continuously differentiable function.

Most methods for this problem generate a sequence of iterates x_0, x_1, ..., x_k such that f(x_{k+1}) < f(x_k). Each new iterate minimizes a simple (linear, quadratic) model of f.

Two strategies to move from x_k to x_{k+1} = x_k + d: Line Search and Trust Region.

Consider the following quadratic model of f at x_k:

    q_k(d) = f(x_k) + ∇f(x_k)^T d + (1/2) d^T H d,

where H is a symmetric matrix.
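As a concrete illustration (not part of the slides), here is a minimal Python/NumPy sketch of building and evaluating such a quadratic model; the test function f(x) = Σ x_i⁴ and the point x_k are purely illustrative.

```python
import numpy as np

def quadratic_model(f_k, g_k, H_k):
    """Return q_k(d) = f(x_k) + g_k^T d + 0.5 d^T H_k d for gradient g_k and symmetric H_k."""
    def q(d):
        return f_k + g_k @ d + 0.5 * d @ (H_k @ d)
    return q

# Illustrative data: f(x) = sum(x_i^4) at x_k = (1, 2).
x_k = np.array([1.0, 2.0])
f_k = np.sum(x_k**4)                 # f(x_k)
g_k = 4 * x_k**3                     # gradient of f at x_k
H_k = np.diag(12 * x_k**2)           # Hessian of f at x_k (symmetric)

q_k = quadratic_model(f_k, g_k, H_k)
d = np.array([0.1, -0.1])
print(q_k(d), np.sum((x_k + d)**4))  # model value vs. true f(x_k + d)
```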
Unconstrained Optimization

Line Search Methods:
- Find the minimizer d_k of the convex quadratic q_k(d).
- Search along d_k for a suitable step length α. The step is α d_k.
- Require positive definite H.

Trust-Region Methods:
- Find a minimizer of q_k in {d ∈ ℝⁿ s.t. ‖d‖ ≤ ∆_k}, ∆_k > 0.
- {d ∈ ℝⁿ s.t. ‖d‖ ≤ ∆_k} is the trust region: a region where we trust the model q_k to be a good representation of f. ∆_k is the trust-region radius.
- The step is d_k (see the sketch below).
- Do not require positive definite H.
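A minimal sketch of one admissible trust-region step, assuming nothing beyond NumPy: it computes only the Cauchy point, the minimizer of q_k along the steepest-descent direction inside the ball, rather than an exact solution of the subproblem; H, g, and ∆ below are illustrative.

```python
import numpy as np

def cauchy_point(g, H, delta):
    """Minimizer of q(d) = g^T d + 0.5 d^T H d along -g, subject to ||d|| <= delta."""
    gnorm = np.linalg.norm(g)
    gHg = g @ (H @ g)
    if gHg <= 0:
        tau = 1.0                                   # model decreases along -g up to the boundary
    else:
        tau = min(gnorm**3 / (delta * gHg), 1.0)    # interior minimizer along -g, if inside
    return -tau * (delta / gnorm) * g

# Illustrative data: note that an indefinite H is allowed.
H = np.array([[2.0, 0.0], [0.0, -1.0]])
g = np.array([1.0, 1.0])
print(cauchy_point(g, H, delta=0.5))
```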
Unconstrained Optimization

Remarks:
- Line Search and Trust Region are globalization techniques: they transform local methods into global ones, i.e. methods that converge to a stationary point or to a local minimizer from any starting point.
- Trust-Region Methods are slightly more robust.
- The Levenberg-Marquardt Method (1944, 1963) for nonlinear least-squares problems is considered the first trust-region method (Moré 1978).
Trust-Region Methods

Given x_0 and ∆_0
begin
  k := 0;  ∆ := ∆_0;
  repeat
    set d_k as a solution to  min q_k(d)  s.t.  ‖d‖ ≤ ∆;
    ρ := ( f(x_k) − f(x_k + d_k) ) / ( q_k(0) − q_k(d_k) );   % gain factor
    if ρ > 0.75,  ∆ := 2·∆;   end
    if ρ < 0.25,  ∆ := ∆ / 3; end
    if ρ > 0,  x_{k+1} := x_k + d_k;  else  x_{k+1} := x_k;  end
    k := k + 1;
  until convergence
end
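The loop above could be sketched in Python roughly as follows; this is not the speaker's implementation. The inner solver here is just the Cauchy point from the earlier sketch, and the quadratic test problem is illustrative; any TRS solver could be substituted.

```python
import numpy as np

def trust_region_minimize(f, grad, hess, x0, delta0=1.0, tol=1e-8, max_iter=200):
    """Basic trust-region loop following the pseudocode above.
    solve_subproblem is a placeholder inner solver (here: the Cauchy point)."""
    def solve_subproblem(g, H, delta):
        gnorm = np.linalg.norm(g)
        gHg = g @ (H @ g)
        tau = 1.0 if gHg <= 0 else min(gnorm**3 / (delta * gHg), 1.0)
        return -tau * (delta / gnorm) * g

    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        if np.linalg.norm(g) < tol:
            break
        d = solve_subproblem(g, H, delta)
        pred = -(g @ d + 0.5 * d @ (H @ d))     # predicted reduction q_k(0) - q_k(d_k)
        rho = (f(x) - f(x + d)) / pred          # gain factor
        if rho > 0.75:
            delta *= 2.0
        if rho < 0.25:
            delta /= 3.0
        if rho > 0:
            x = x + d                           # accept the step
    return x

# Illustrative usage on a simple convex test problem; minimizer is (3, -1).
f    = lambda x: (x[0] - 3.0)**2 + 2.0 * (x[1] + 1.0)**2
grad = lambda x: np.array([2.0 * (x[0] - 3.0), 4.0 * (x[1] + 1.0)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])
print(trust_region_minimize(f, grad, hess, x0=[0.0, 0.0]))
```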
Trust-Region Methods

Main calculation per iteration: the Trust-Region Subproblem (TRS)

    min  (1/2) d^T H d + g^T d    s.t.  ‖d‖ ≤ ∆

where:
- g = ∇f(x_k).
- H is a symmetric matrix, usually an approximation to ∇²f(x_k).
- ∆ > 0.
Trust Regions in Regularization
Regularization: Linear

Tikhonov Regularization:

    min_{x ∈ ℝⁿ}  (1/2) ‖Ax − b‖₂² + λ ‖x‖₂²

- A ∈ ℝ^{m×n}, m ≥ n large, from ill-posed problems.
- b ∈ ℝ^m, containing noise, and A^T b ≠ 0.
- λ > 0 is the Tikhonov regularization parameter.

is equivalent to (see Eldén 1977)

    min  (1/2) ‖Ax − b‖₂²    s.t.  ‖x‖₂ ≤ ∆    (TRS)

where ∆ > 0 plays the role of the regularization parameter.
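A small numerical sketch (not from the talk) of the λ ↔ ∆ correspondence, assuming a synthetic ill-conditioned A and noisy b: each λ is mapped to ∆ = ‖x_λ‖, the radius for which the TRS has the same solution. The Tikhonov problem is solved here via the standard stacked least-squares formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 30
A = rng.standard_normal((m, n)) @ np.diag(np.logspace(0, -6, n))  # synthetic ill-conditioned A
b = A @ np.ones(n) + 1e-3 * rng.standard_normal(m)                # noisy right-hand side

def tikhonov(A, b, lam):
    """Solve min 0.5*||Ax-b||^2 + lam*||x||^2 via the stacked least-squares formulation."""
    n = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(2.0 * lam) * np.eye(n)])
    b_aug = np.concatenate([b, np.zeros(n)])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]

# Each lam defines Delta = ||x_lam||; the TRS with that trust-region radius
# has the same solution x_lam.
for lam in [1e-6, 1e-3, 1e-1]:
    x_lam = tikhonov(A, b, lam)
    print(f"lambda = {lam:.0e}   Delta = ||x_lambda|| = {np.linalg.norm(x_lam):.4f}")
```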
Regularization: Nonlinear, Constrained

    min_{x ∈ S}  f(x) + λ g(x)

where f, g are nonlinear functions, S ⊂ ℝⁿ, and λ is a regularization parameter.

Example 1:  min ‖F(x)‖₂² + λ ‖x‖₂²  s.t.  x ∈ ℝⁿ,  with F: ℝⁿ → ℝ^m.
Could be solved with a trust-region method (a sketch of this approach follows below): Google returns 11,600 hits for "Levenberg-Marquardt nonlinear regularization" (all words), and 10,800 for "trust region nonlinear regularization" (all words).

Example 2:  min (1/2) ‖Ax − b‖²  s.t.  ‖x‖₂ ≤ ∆,  x ≥ 0.
Could be solved with a trust-region-based method (Rojas & Steihaug 2002).
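A minimal sketch of Example 1 using SciPy's trust-region-reflective least-squares solver: the term λ‖x‖₂² is absorbed by appending √λ·x to the residual vector. The model F, the data, and the value of λ are hypothetical, not from the talk.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical nonlinear model F: R^2 -> R^m (exponential-fitting residuals).
t = np.linspace(0.0, 3.0, 40)
data = 2.0 * np.exp(-1.5 * t) + 0.01 * np.random.default_rng(1).standard_normal(t.size)

def F(x):
    return x[0] * np.exp(-x[1] * t) - data        # residual vector

lam = 1e-2   # regularization parameter (illustrative value)

def augmented_residual(x):
    # min ||F(x)||^2 + lam*||x||^2  ==  min || [ F(x) ; sqrt(lam)*x ] ||^2
    return np.concatenate([F(x), np.sqrt(lam) * x])

sol = least_squares(augmented_residual, x0=np.array([1.0, 1.0]), method="trf")
print(sol.x)
```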
TRS in Optimization and Regularization

Optimization:
- Several TRS
- (potential) Hard Case; Near Hard Case not common

Regularization:
- Linear: One TRS; Nonlinear, Constrained: Several TRS
- (potential) Hard Case; Near Hard Case likely
The Trust-Region Subproblem
The Trust-Region Subproblem (TRS)

    min  (1/2) x^T H x + g^T x    s.t.  ‖x‖ ≤ ∆

- H ∈ ℝ^{n×n}, H = H^T, n large.
- g ∈ ℝⁿ, g ≠ 0.
- ∆ > 0.
- ‖·‖ is the Euclidean norm.

In optimization: H ≈ ∇²f(x_k), g = ∇f(x_k).
In (linear) regularization: H = A^T A, g = −A^T b.
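In the large-scale regularization setting, H = AᵀA is typically not formed explicitly; the following sketch (with a random dense A as a stand-in) shows how H can be supplied only through matrix-vector products, using SciPy's LinearOperator.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

rng = np.random.default_rng(0)
m, n = 2000, 500
A = rng.standard_normal((m, n))     # stand-in for a large operator from an ill-posed problem
b = rng.standard_normal(m)

# H = A^T A is never formed; only products H*v = A^T (A v) are provided.
H = LinearOperator((n, n), matvec=lambda v: A.T @ (A @ v), dtype=float)
g = -A.T @ b

v = rng.standard_normal(n)
print(np.allclose(H.matvec(v), (A.T @ A) @ v))   # True: matrix-free product matches explicit H
```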
Characterization of Solutions (Gay 1981, Sorensen 1982)

x* with ‖x*‖ ≤ ∆ is a solution of the TRS with Lagrange multiplier λ* if and only if
(i)   (H − λ* I) x* = −g;
(ii)  H − λ* I is positive semidefinite;
(iii) λ* ≤ 0;
(iv)  λ* (‖x*‖ − ∆) = 0.

Remark: ‖x‖ − ∆ = 0, with x = x(λ) solving (H − λI)x = −g, is the secular equation.
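A small dense sketch, not one of the large-scale methods discussed later in this talk, that computes a pair (x*, λ*) satisfying (i)-(iv): it assumes the easy (non-hard) case and locates λ* by bisection on a reciprocal form of the secular equation.

```python
import numpy as np

def solve_trs_dense(H, g, delta, tol=1e-10, max_iter=200):
    """Small dense TRS solver illustrating conditions (i)-(iv) above.
    Assumes the easy (non-hard) case; uses bisection on the secular equation."""
    evals, Q = np.linalg.eigh(H)                  # eigenvalues ascending
    a = Q.T @ g                                   # g in the eigenbasis of H

    def x_norm(lam):                              # ||x(lam)|| where (H - lam I) x = -g
        return np.linalg.norm(a / (evals - lam))

    # Interior solution: lam* = 0, H positive definite, ||x(0)|| <= delta.
    if evals[0] > 0 and x_norm(0.0) <= delta:
        return Q @ (-a / evals), 0.0

    # Boundary solution: lam* < min(eig_min, 0) with ||x(lam*)|| = delta.
    hi = min(evals[0], 0.0)
    lo = evals[0] - np.linalg.norm(g) / delta     # known lower bound on lam*
    for _ in range(max_iter):
        lam = 0.5 * (lo + hi)
        if 1.0 / x_norm(lam) - 1.0 / delta > 0:   # secular equation, reciprocal form
            lo = lam
        else:
            hi = lam
        if hi - lo < tol * (1.0 + abs(hi)):
            break
    return Q @ (-a / (evals - lam)), lam

# Illustrative example with an indefinite H: the solution lies on the boundary.
H = np.diag([-2.0, 1.0, 3.0])
g = np.array([1.0, 1.0, 1.0])
x_star, lam_star = solve_trs_dense(H, g, delta=1.0)
print(np.linalg.norm(x_star), lam_star)          # ||x*|| = delta, lam* <= 0
```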