Reduced-Hessian Methods for Constrained Optimization

Philip E. Gill, University of California, San Diego
Joint work with Michael Ferry & Elizabeth Wong

11th US & Mexico Workshop on Optimization and its Applications, Huatulco, Mexico, January 8–12, 2018
Our honoree . . .
Outline

1. Reduced-Hessian Methods for Unconstrained Optimization
2. Bound-Constrained Optimization
3. Quasi-Wolfe Line Search
4. Reduced-Hessian Methods for Bound-Constrained Optimization
5. Some Numerical Results
Reduced-Hessian Methods for Unconstrained Optimization
Definitions

Minimize $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^2$, with a quasi-Newton line-search method: Given $x_k$, let $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, and $H_k \approx \nabla^2 f(x_k)$. Choose $p_k$ such that $x_k + p_k$ minimizes the quadratic model
$$q_k(x) = f_k + g_k^T (x - x_k) + \tfrac{1}{2} (x - x_k)^T H_k (x - x_k).$$

If $H_k$ is positive definite, then $p_k$ satisfies
$$H_k p_k = -g_k. \qquad \text{(qN step)}$$
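As a concrete illustration, here is a minimal NumPy sketch of the (qN step); the function name is mine, not from the talk. With $H_k$ positive definite, minimizing the quadratic model reduces to a single linear solve.

```python
import numpy as np

def qn_step(H, g):
    """Quasi-Newton step: minimize the quadratic model by solving H p = -g.
    Assumes H is symmetric positive definite."""
    return np.linalg.solve(H, -g)
```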
Definitions

Define $x_{k+1} = x_k + \alpha_k p_k$, where $\alpha_k$ is obtained from a line search on $\phi_k(\alpha) = f(x_k + \alpha p_k)$.

• Armijo condition:
$$\phi_k(\alpha) < \phi_k(0) + \eta_A \alpha \phi_k'(0), \qquad \eta_A \in (0, \tfrac{1}{2})$$

• (Strong) Wolfe conditions:
$$\phi_k(\alpha) < \phi_k(0) + \eta_A \alpha \phi_k'(0), \qquad \eta_A \in (0, \tfrac{1}{2})$$
$$|\phi_k'(\alpha)| \le \eta_W |\phi_k'(0)|, \qquad \eta_W \in (\eta_A, 1)$$
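A minimal sketch of testing these conditions for a trial step $\alpha$, with the slide's $\eta_A$, $\eta_W$ as keyword arguments; `f` and `grad_f` are assumed user-supplied callables, and the default parameter values are common choices, not taken from the talk.

```python
import numpy as np

def satisfies_strong_wolfe(f, grad_f, x_k, p_k, alpha, eta_A=1e-4, eta_W=0.9):
    """True iff alpha satisfies the Armijo and strong Wolfe conditions."""
    phi_0 = f(x_k)                            # phi_k(0)
    dphi_0 = grad_f(x_k) @ p_k                # phi_k'(0); negative for descent
    phi_a = f(x_k + alpha * p_k)              # phi_k(alpha)
    dphi_a = grad_f(x_k + alpha * p_k) @ p_k  # phi_k'(alpha)
    armijo = phi_a <= phi_0 + eta_A * alpha * dphi_0
    curvature = abs(dphi_a) <= eta_W * abs(dphi_0)
    return armijo and curvature
```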
[Figure: plot of $\phi_k(\alpha) = f(x_k + \alpha p_k)$ against $\alpha$, illustrating the steps accepted by the Wolfe conditions.]
Quasi-Newton Methods

Updating $H_k$:
• $H_0 = \sigma I_n$, where $\sigma > 0$.
• Compute $H_{k+1}$ as the BFGS update to $H_k$, i.e.,
$$H_{k+1} = H_k - \frac{1}{s_k^T H_k s_k} H_k s_k s_k^T H_k + \frac{1}{y_k^T s_k} y_k y_k^T,$$
where $s_k = x_{k+1} - x_k$, $y_k = g_{k+1} - g_k$, and $y_k^T s_k$ approximates the curvature of $f$ along $p_k$.
• The Wolfe condition guarantees $y_k^T s_k > 0$, so that $H_k$ can be updated (the update remains positive definite).

One option to calculate $p_k$:
• Store the upper-triangular Cholesky factor $R_k$, where $R_k^T R_k = H_k$.
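A minimal sketch of this update in NumPy, assuming dense storage of $H_k$ (a practical code would update the Cholesky factor $R_k$ instead). The curvature safeguard is a standard precaution, not something stated on the slide.

```python
import numpy as np

def bfgs_update(H, s, y, tol=1e-10):
    """BFGS update of H given s = x_{k+1} - x_k and y = g_{k+1} - g_k.
    The update is skipped when the curvature y^T s is not safely positive
    (a Wolfe line search guarantees y^T s > 0)."""
    ys = y @ s
    if ys <= tol * np.linalg.norm(s) * np.linalg.norm(y):
        return H
    Hs = H @ s
    return H - np.outer(Hs, Hs) / (s @ Hs) + np.outer(y, y) / ys
```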
Reduced-Hessian Methods (Fenelon, 1981 and Siegel, 1992)

Let $\mathcal{G}_k = \operatorname{span}(g_0, g_1, \ldots, g_k)$ and let $\mathcal{G}_k^\perp$ be the orthogonal complement of $\mathcal{G}_k$ in $\mathbb{R}^n$.

Result. Consider a quasi-Newton method with BFGS update applied to a general nonlinear function. If $H_0 = \sigma I$ ($\sigma > 0$), then:
• $p_k \in \mathcal{G}_k$ for all $k$.
• If $z \in \mathcal{G}_k$ and $w \in \mathcal{G}_k^\perp$, then $H_k z \in \mathcal{G}_k$ and $H_k w = \sigma w$.
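The Result can be checked numerically. The following sketch (my construction, assuming a generic smooth test function and a fixed step in place of a line search) runs a few BFGS iterations from $H_0 = \sigma I$ and confirms that $p_k \in \mathcal{G}_k$ and that $H_k w = \sigma w$ for $w \in \mathcal{G}_k^\perp$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 6, 2.0
A = rng.standard_normal((n, n)); A = A.T @ A + np.eye(n)   # SPD, eigenvalues >= 1
grad = lambda x: A @ x - 0.1 * np.sin(x)   # gradient of a generic C^2 function

x = rng.standard_normal(n)
H = sigma * np.eye(n)                      # H_0 = sigma * I
G = [grad(x)]                              # gradients g_0, ..., g_k
for k in range(3):
    g = G[-1]
    p = np.linalg.solve(H, -g)             # qN step
    s = 0.01 * p                           # fixed small step, illustration only
    y = grad(x + s) - g
    H = H - np.outer(H @ s, H @ s) / (s @ H @ s) + np.outer(y, y) / (y @ s)
    x = x + s
    G.append(g + y)                        # g_{k+1}

Z, _ = np.linalg.qr(np.array(G).T)         # orthonormal basis for G_k
w = rng.standard_normal(n)
w -= Z @ (Z.T @ w)                         # a vector in G_k^perp
print(np.allclose(p, Z @ (Z.T @ p)))       # p_k lies in G_k: True
print(np.allclose(H @ w, sigma * w))       # H_k w = sigma * w: True
```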
Reduced-Hessian Methods

Significance of $p_k \in \mathcal{G}_k$:
• No need to minimize the quadratic model over the full space.
• Search directions lie in an expanding sequence of subspaces.

Significance of $H_k z \in \mathcal{G}_k$ and $H_k w = \sigma w$:
• The curvature stored in $H_k$ along any unit vector in $\mathcal{G}_k^\perp$ is $\sigma$.
• All nontrivial curvature information in $H_k$ can be stored in a smaller $r_k \times r_k$ matrix, where $r_k = \dim(\mathcal{G}_k)$.
Reduced-Hessian Methods

Given a matrix $B_k \in \mathbb{R}^{n \times r_k}$ whose columns span $\mathcal{G}_k$, let
• $B_k = Z_k T_k$ be the QR decomposition of $B_k$;
• $W_k$ be a matrix whose orthonormal columns span $\mathcal{G}_k^\perp$;
• $Q_k = \begin{pmatrix} Z_k & W_k \end{pmatrix}$.

Then $H_k p_k = -g_k \iff (Q_k^T H_k Q_k)\, Q_k^T p_k = -Q_k^T g_k$, where
$$Q_k^T H_k Q_k = \begin{pmatrix} Z_k^T H_k Z_k & Z_k^T H_k W_k \\ W_k^T H_k Z_k & W_k^T H_k W_k \end{pmatrix} = \begin{pmatrix} Z_k^T H_k Z_k & 0 \\ 0 & \sigma I_{n-r_k} \end{pmatrix}, \qquad Q_k^T g_k = \begin{pmatrix} Z_k^T g_k \\ 0 \end{pmatrix}.$$
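A quick numerical illustration of this block structure (my construction, not from the talk): build an $H$ with curvature $\sigma$ on $\mathcal{G}_k^\perp$, as guaranteed by the Result above, and confirm that $Q_k^T H Q_k$ is block diagonal with trailing block $\sigma I_{n-r_k}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, sigma = 7, 3, 2.0
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
Z, W = Q[:, :r], Q[:, r:]                  # Z spans G_k, W spans G_k^perp
M = rng.standard_normal((r, r)); M = M.T @ M + np.eye(r)
H = Z @ M @ Z.T + sigma * (W @ W.T)        # H with the reduced-Hessian structure

T = Q.T @ H @ Q
print(np.allclose(T[:r, r:], 0))                      # Z^T H W = 0: True
print(np.allclose(T[r:, r:], sigma * np.eye(n - r)))  # W^T H W = sigma*I: True
```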
Reduced-Hessian Methods

A reduced-Hessian (RH) method obtains $p_k$ from
$$p_k = Z_k q_k, \quad \text{where } q_k \text{ solves } Z_k^T H_k Z_k\, q_k = -Z_k^T g_k, \qquad \text{(RH step)}$$
which is equivalent to (qN step). In practice, we use a Cholesky factorization $R_k^T R_k = Z_k^T H_k Z_k$.

• The new gradient $g_{k+1}$ is accepted iff $\|(I - Z_k Z_k^T)\, g_{k+1}\| > \epsilon$.
• Store and update $Z_k$, $R_k$, $Z_k^T p_k$, $Z_k^T g_k$, and $Z_k^T g_{k+1}$.
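A minimal sketch of the RH step (names are mine). For illustration the reduced Hessian is formed and factored from scratch; a practical implementation updates $R_k$ directly. The final lines verify the claimed equivalence with (qN step) on an $H$ with the structure above and $g_k \in \mathcal{G}_k$.

```python
import numpy as np

def rh_step(H, Z, g):
    """RH step: p = Z q, where q solves (Z^T H Z) q = -Z^T g via Cholesky."""
    R = np.linalg.cholesky(Z.T @ H @ Z).T              # upper triangular, R^T R = Z^T H Z
    q = np.linalg.solve(R, np.linalg.solve(R.T, -Z.T @ g))
    return Z @ q

# Equivalence check on an H with the reduced-Hessian structure and g in G_k.
rng = np.random.default_rng(2)
n, r, sigma = 8, 3, 1.5
Z, _ = np.linalg.qr(rng.standard_normal((n, r)))
M = rng.standard_normal((r, r)); M = M.T @ M + np.eye(r)
H = Z @ M @ Z.T + sigma * (np.eye(n) - Z @ Z.T)
g = Z @ rng.standard_normal(r)
print(np.allclose(rh_step(H, Z, g), np.linalg.solve(H, -g)))   # True
```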
Why the equivalence holds:
$$H_k = Q_k Q_k^T H_k Q_k Q_k^T = \begin{pmatrix} Z_k & W_k \end{pmatrix} \begin{pmatrix} Z_k^T H_k Z_k & 0 \\ 0 & \sigma I_{n-r_k} \end{pmatrix} \begin{pmatrix} Z_k^T \\ W_k^T \end{pmatrix} = Z_k (Z_k^T H_k Z_k) Z_k^T + \sigma (I - Z_k Z_k^T).$$

$\Rightarrow$ any $z$ such that $Z_k^T z = 0$ satisfies $H_k z = \sigma z$.
Reduced-Hessian Method Variants

Reinitialization: If $g_{k+1} \notin \mathcal{G}_k$, the Cholesky factor $R_k$ is updated as
$$R_{k+1} \leftarrow \begin{pmatrix} R_k & 0 \\ 0 & \sqrt{\sigma_{k+1}} \end{pmatrix},$$
where $\sigma_{k+1}$ is based on the latest estimate of the curvature, e.g.,
$$\sigma_{k+1} = \frac{y_k^T s_k}{s_k^T s_k}.$$

Lingering: Restrict the search direction to a smaller subspace and allow the subspace to expand only when $f$ is suitably minimized on that subspace.
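A sketch of the reinitialization update (array-based, my naming; a production code would update $R_k$ in place). The new diagonal entry $\sqrt{\sigma_{k+1}}$ installs the curvature estimate $y_k^T s_k / s_k^T s_k$ along the newly accepted direction.

```python
import numpy as np

def reinitialize_factor(R, s, y):
    """Expand upper-triangular R by one dimension, placing sqrt(sigma_{k+1})
    on the new diagonal entry, where sigma_{k+1} = y^T s / s^T s."""
    sigma_new = (y @ s) / (s @ s)        # requires y^T s > 0 (Wolfe search)
    r = R.shape[0]
    R_new = np.zeros((r + 1, r + 1))
    R_new[:r, :r] = R
    R_new[r, r] = np.sqrt(sigma_new)
    return R_new
```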