  1. MATH529 – Fundamentals of Optimization: Trust Region Algorithms
     Marco A. Montes de Oca, Mathematical Sciences, University of Delaware, USA

  2. Line Search vs. Trust Region
     Line Search:
     - Select a search (descent) direction p_k.
     - Select a step size α_k that ensures sufficient decrease of f(x_k + α_k p_k).
     - Move to the new point x_{k+1} = x_k + α_k p_k.
     Trust Region:
     - Build a model m_k of f at x_k (similar to Newton's method):
         m_k(p) = f_k + g_k^T p + (1/2) p^T B_k p.
     - Solve p_k = argmin_{p ∈ R^n} m_k(p)  s.t.  ||p|| ≤ ∆_k.
     - If the predicted decrease is good enough, then x_{k+1} = x_k + p_k. Otherwise, x_{k+1} = x_k and improve the model.
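A minimal sketch of the quadratic model in code (not part of the slides; f_k, g_k, and B_k are assumed to be supplied by the caller):

    import numpy as np

    def model(p, f_k, g_k, B_k):
        # Quadratic model m_k(p) = f_k + g_k^T p + (1/2) p^T B_k p
        return f_k + g_k @ p + 0.5 * p @ B_k @ p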

  3. Acceptance criterion
     To measure how well the predicted decrease matches the actual decrease, we use
         ρ_k = (f(x_k) − f(x_k + p_k)) / (m_k(0) − m_k(p_k)).
     Given that m_k(0) − m_k(p_k) > 0:
     - If ρ_k < 0, the predicted reduction is not obtained; the step is rejected and ∆_k is decreased.
     - If ρ_k ≈ 1, accept p_k and increase ∆_k.
     - If ρ_k > 0 but not ≈ 1, accept p_k and do not change ∆_k.
     - If ρ_k > 0 but ≈ 0, the step may or may not be accepted, and ∆_k is decreased.

  4. Algorithm
     Initialization: k = 0, ∆_0 > 0, and x_0 by educated guess. Set η_g ∈ (0, 1) (typically η_g = 0.9), η_a ∈ (0, η_g) (typically η_a = 0.1), γ_e ≥ 1 (typically γ_e = 2), and γ_s ∈ (0, 1) (typically γ_s = 0.5).
     Until convergence do:
     - Build the model m_k(p).
     - Solve the trust region subproblem (result: p_k).
     - Test the acceptance criterion (result: ρ_k).
     - If ρ_k ≥ η_g, then x_{k+1} = x_k + p_k and ∆_{k+1} = γ_e ∆_k.
     - Else if ρ_k ≥ η_a, then x_{k+1} = x_k + p_k.
     - Else (ρ_k < η_a), set ∆_{k+1} = γ_s ∆_k.
     - Increase k by one.
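A sketch of this outer loop in Python, following the update rules above. It assumes a routine solve_subproblem(g_k, B_k, delta) is available (for example, the Cauchy point or dogleg step of the later slides); that name and the keyword parameters are illustrative, not part of the slides.

    import numpy as np

    def trust_region(f, grad, hess, x0, delta0=1.0,
                     eta_g=0.9, eta_a=0.1, gamma_e=2.0, gamma_s=0.5,
                     tol=1e-8, max_iter=100):
        # Outer trust-region loop; solve_subproblem is assumed to return an
        # approximate minimizer of the model within radius delta.
        x, delta = np.asarray(x0, dtype=float), delta0
        for k in range(max_iter):
            g_k, B_k = grad(x), hess(x)
            if np.linalg.norm(g_k) < tol:
                break
            p_k = solve_subproblem(g_k, B_k, delta)           # trust-region subproblem
            predicted = -(g_k @ p_k + 0.5 * p_k @ B_k @ p_k)  # m_k(0) - m_k(p_k)
            actual = f(x) - f(x + p_k)
            rho = actual / predicted
            if rho >= eta_g:              # very good agreement: accept and expand
                x, delta = x + p_k, gamma_e * delta
            elif rho >= eta_a:            # acceptable: accept, keep delta
                x = x + p_k
            else:                         # poor agreement: reject and shrink
                delta = gamma_s * delta
        return x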

  5. Solving the trust region subproblem approximately
     We want to solve the subproblem as efficiently as possible, and we want a solution that decreases the model at least as much as steepest descent would, subject to the size of the trust region.

  6. Solving the trust region subproblem approximately
     [Figure] From Ruszczyński A., "Nonlinear Optimization", p. 268, Princeton University Press, 2006.

  7. Cauchy Point
     The Cauchy point can be found by minimizing the model along a line segment.
     Let p_k^s = −∆_k g_k / ||g_k|| (the point at the border of the trust region in the direction of steepest descent).
     The Cauchy point is p_k^C = τ_k p_k^s = −τ_k ∆_k g_k / ||g_k||. To find τ_k, consider
         g(τ) = m_k(τ p_k^s) = f_k + g_k^T (τ p_k^s) + (1/2)(τ p_k^s)^T B_k (τ p_k^s)
              = f_k + τ g_k^T p_k^s + (τ²/2)(p_k^s)^T B_k p_k^s.
     Differentiating with respect to τ: 0 = g'(τ) = g_k^T p_k^s + τ (p_k^s)^T B_k p_k^s, which means that

  8. Cauchy Point
         τ_k = − g_k^T p_k^s / ((p_k^s)^T B_k p_k^s).   (1)
     Substituting p_k^s = −∆_k g_k / ||g_k|| in (1):
         τ_k = − g_k^T (−∆_k g_k / ||g_k||) / ((−∆_k g_k / ||g_k||)^T B_k (−∆_k g_k / ||g_k||)) = ||g_k||³ / (∆_k g_k^T B_k g_k).
     However, there may be two problems: (a) τ_k > 1, so that τ_k p_k^s falls outside the trust region, or (b) g_k^T B_k g_k ≤ 0, that is, B_k is not positive definite. So, we define the Cauchy point as follows:
     Definition (Cauchy Point): p_k^C = τ_k p_k^s = −τ_k ∆_k g_k / ||g_k||, where τ_k = 1 if g_k^T B_k g_k ≤ 0, and τ_k = min{1, ||g_k||³ / (∆_k g_k^T B_k g_k)} otherwise.
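A direct translation of this definition into code (a sketch; g_k is assumed nonzero):

    import numpy as np

    def cauchy_point(g_k, B_k, delta_k):
        # Cauchy point p_k^C = -tau_k * delta_k * g_k / ||g_k|| (definition above)
        g_norm = np.linalg.norm(g_k)
        curvature = g_k @ B_k @ g_k          # g_k^T B_k g_k
        if curvature <= 0.0:
            tau = 1.0                        # model is not convex along -g_k
        else:
            tau = min(1.0, g_norm**3 / (delta_k * curvature))
        return -tau * delta_k * g_k / g_norm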

  9. Cauchy step is a baseline of performance
     - A reduction at least as good as the one obtained with the Cauchy step guarantees that the trust-region method is convergent.
     - The Cauchy step is just a steepest descent step with fixed length (∆_k). (Thus, it is inefficient.)
     - The direction of the Cauchy step does not depend directly on B_k, which means that curvature information is not exploited in its calculation.

  10. Improvements over the Cauchy step
     The main idea is to incorporate information provided by the "full step" (Newton step for the local model m_k): p_k^B = −B_k^{-1} g_k, whenever ||p_k^B|| ≤ ∆_k.
     Dogleg Method
     Let p_k^⋆ be the solution to the subproblem. If ∆_k ≥ ||p_k^B||, then p_k^⋆ = p_k^B. If, however, ∆_k ≪ ||p_k^B||, then p_k^⋆ ≈ p_k^s = −∆_k g_k / ||g_k||.
     The idea of the dogleg method is to combine these two directions and search for the minimum of the model along the resulting path p̃(τ):
         p̃(τ) = τ p_k^U                          for 0 ≤ τ ≤ 1,
         p̃(τ) = p_k^U + (τ − 1)(p_k^B − p_k^U)    for 1 < τ ≤ 2,
     where 0 ≤ τ ≤ 2 and p_k^U = −(g_k^T g_k / g_k^T B_k g_k) g_k, i.e., the steepest descent step with exact length (note that if ||p_k^C|| < ∆_k, then p_k^U = p_k^C).

  11. Dogleg Method
     [Figure] Adapted from Nocedal J. and Wright S., "Numerical Optimization", 2nd Ed., p. 74, Springer, 2006.

  12. Dogleg Method
     If B_k is positive definite, m_k(p̃(τ)) is a decreasing function of τ (Lemma 4.2, page 75). Therefore:
     - The minimum along p̃(τ) is attained at τ = 2 if ||p_k^B|| ≤ ∆_k.
     - If ||p_k^B|| > ∆_k, we need to find τ such that ||p̃(τ)|| = ∆_k.

  13. Dogleg Method
     Example: f(x, y) = x² + 10y²
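A sketch of the dogleg step, assuming B_k is positive definite; the boundary value of τ is obtained from the quadratic equation ||p_k^U + (τ − 1)(p_k^B − p_k^U)|| = ∆_k. The trailing lines apply it to the slide's example f(x, y) = x² + 10y²; the evaluation point (1, 1) and the radius 0.5 are illustrative choices, not from the slides.

    import numpy as np

    def dogleg_step(g_k, B_k, delta_k):
        # Dogleg step along the path p~(tau), assuming B_k positive definite.
        p_B = -np.linalg.solve(B_k, g_k)                    # full (Newton) step
        if np.linalg.norm(p_B) <= delta_k:
            return p_B
        p_U = -(g_k @ g_k) / (g_k @ B_k @ g_k) * g_k        # steepest-descent step with exact length
        if np.linalg.norm(p_U) >= delta_k:
            return -delta_k * g_k / np.linalg.norm(g_k)     # truncated steepest-descent step
        # Find tau in (1, 2] with ||p_U + (tau - 1)(p_B - p_U)|| = delta_k
        d = p_B - p_U
        a, b, c = d @ d, 2 * (p_U @ d), p_U @ p_U - delta_k**2
        t = (-b + np.sqrt(b**2 - 4 * a * c)) / (2 * a)      # positive root, t = tau - 1
        return p_U + t * d

    # Illustrative use on f(x, y) = x^2 + 10 y^2 at the point (1, 1):
    g = np.array([2.0, 20.0])                               # gradient at (1, 1)
    B = np.array([[2.0, 0.0], [0.0, 20.0]])                 # exact Hessian
    print(dogleg_step(g, B, delta_k=0.5))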

  14. 2D Subspace Minimization
     The dogleg path is completely contained in the plane spanned by p_k^U and p_k^B. Therefore, one may extend the search to the whole subspace spanned by p_k^U and p_k^B, span[p_k^U, p_k^B].

  15. 2D Subspace Minimization
     Given span[p_k^U, p_k^B] = { v | v = a p_k^U + b p_k^B, a, b ∈ R }, the subproblem is thus:
         min_{a,b ∈ R}  f_k + (a p_k^U + b p_k^B)^T ∇f_k + (1/2)(a p_k^U + b p_k^B)^T B_k (a p_k^U + b p_k^B)
         s.t.  ||a p_k^U + b p_k^B|| ≤ ∆_k,
     which can be solved using tools from constrained optimization. (To be discussed after the break.)
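A brute-force sketch of this subproblem, assuming B_k is positive definite and p_k^U, p_k^B are linearly independent: the unconstrained 2x2 reduced system is solved first, and if that solution violates the radius, the boundary of the elliptical constraint in (a, b)-space is scanned on a grid. In practice one would instead use the constrained-optimization tools mentioned on the slide; the function and parameter names below are illustrative.

    import numpy as np

    def two_d_subspace_step(g_k, B_k, delta_k, n_grid=360):
        p_B = -np.linalg.solve(B_k, g_k)
        p_U = -(g_k @ g_k) / (g_k @ B_k @ g_k) * g_k
        S = np.column_stack([p_U, p_B])                  # n x 2 basis of span[p_U, p_B]
        Gr, Br = S.T @ g_k, S.T @ B_k @ S                # reduced gradient and Hessian
        M = S.T @ S                                      # metric: ||S c||^2 = c^T M c
        c = np.linalg.solve(Br, -Gr)                     # unconstrained minimizer in the subspace
        if c @ M @ c <= delta_k**2:
            return S @ c
        # Otherwise scan the boundary ||S c|| = delta_k via c = delta_k * L^{-T} (cos t, sin t)
        L = np.linalg.cholesky(M)
        best, best_val = None, np.inf
        for t in np.linspace(0.0, 2 * np.pi, n_grid, endpoint=False):
            c_t = delta_k * np.linalg.solve(L.T, np.array([np.cos(t), np.sin(t)]))
            val = Gr @ c_t + 0.5 * c_t @ Br @ c_t
            if val < best_val:
                best, best_val = c_t, val
        return S @ best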

  16. Issues

  17. Indefinite Hessians
     Problem: Newton's step may not be a descent direction.
     Example: Newton's step solves the system Hf_k p = −∇f_k. Now, with
         Hf_k = [ 10  0  0 ]
                [  0  3  0 ]
                [  0  0 −1 ]
     and ∇f_k = (1, −3, 2)^T, we solve Hf_k p = −(1, −3, 2)^T = (−1, 3, −2)^T, which gives p = (−1/10, 1, 2)^T. However, p^T ∇f_k > 0, thus p is not a descent direction.
     Solution approaches:
     - Replace negative eigenvalues by some small positive number.
     - Replace negative eigenvalues by their negatives (absolute values).

  18. Replace negative eigenvalues by some small positive number
     Now
         Hf_k = [ 10  0  0       ]
                [  0  3  0       ]
                [  0  0  10^(−6) ]
     so p^T ∇f_k < 0, but p = ?


  20. Replace negative eigenvalues by their negative
     Now
         Hf_k = [ 10  0  0 ]
                [  0  3  0 ]
                [  0  0  1 ]
     so p^T ∇f_k < 0, but p = ?
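A sketch of the two eigenvalue-repair strategies from slides 17-20, applied to the example Hessian and gradient above; the strategy names and the tolerance eps are illustrative choices. The printed steps answer the "p = ?" question on the slides and confirm the descent condition p^T ∇f_k < 0.

    import numpy as np

    def modify_hessian(H, strategy="small_positive", eps=1e-6):
        # Repair an indefinite (symmetric) Hessian by altering its negative eigenvalues.
        lam, Q = np.linalg.eigh(H)                    # H = Q diag(lam) Q^T
        if strategy == "small_positive":
            lam = np.where(lam <= 0.0, eps, lam)      # replace negatives by a small positive number
        else:
            lam = np.abs(lam)                         # replace negatives by their absolute value
        return Q @ np.diag(lam) @ Q.T

    H = np.diag([10.0, 3.0, -1.0])
    grad = np.array([1.0, -3.0, 2.0])
    for s in ("small_positive", "flip_sign"):
        p = np.linalg.solve(modify_hessian(H, s), -grad)
        print(s, p, "descent:", p @ grad < 0)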

  21. In practice
     Perturb B_k with βI such that:
     - (B_k + βI) p = −g_k,
     - β(∆_k − ||p||) = 0, and
     - B_k + βI is positive semidefinite,
     with β ∈ (−λ_1, −2λ_1], where λ_1 is the most negative eigenvalue of B_k.
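A minimal sketch of this perturbation. For simplicity β is set to −2λ_1, the upper end of the interval on the slide, when B_k is indefinite; a full implementation would additionally adjust β so that the complementarity condition β(∆_k − ||p||) = 0 holds (p on the boundary whenever β > 0).

    import numpy as np

    def shifted_newton_step(B_k, g_k, delta_k):
        # Perturb B_k with beta*I so that B_k + beta*I is positive definite,
        # then solve (B_k + beta*I) p = -g_k.  beta = -2*lambda_1 is a simple
        # choice from the slide's interval (-lambda_1, -2*lambda_1].
        lam_1 = np.linalg.eigvalsh(B_k)[0]            # most negative (smallest) eigenvalue
        beta = -2.0 * lam_1 if lam_1 < 0.0 else 0.0
        p = np.linalg.solve(B_k + beta * np.eye(B_k.shape[0]), -g_k)
        # Not done here: tune beta until beta * (delta_k - ||p||) = 0.
        return p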

  22. Further improvements
     - Iterative solution of the subproblem: to avoid direct Hessian manipulation.
     - Scaling: ||D p|| ≤ ∆_k. This creates elliptical trust regions, which reduce the problem of different scaling of some variables.

  23. Other methods
     - Conjugate Gradient Methods: A set of nonzero vectors {p_0, p_1, ..., p_n} is conjugate with respect to a symmetric positive definite matrix A if p_i^T A p_j = 0 for all i ≠ j.
     - Quasi-Newton Methods: Use changes in gradient information to estimate a model of the function in order to achieve superlinear convergence. Example: B_{k+1} α_k p_k = ∇f_{k+1} − ∇f_k (BFGS method).
     - Derivative-free methods.
     - Heuristic methods.
