A Dynamic Approach to Scaling in Bundle Methods for Convex Optimization
Christoph Helmberg, joint work with Alois Pichler, TU Chemnitz

Outline:
• The Bundle Method and the Aggregate
• Dynamic Choice of the Proximal Term
• Relation to the Hessian in the Smooth Case
• A Cheaper Scaling Heuristic
• Implementational Issues
• Some Numerical Experiments
The Bundle Method for Nonsmooth Convex Optimization

  min f(y)  s.t.  y ∈ R^M,

with f: R^M → R convex (nonsmooth), M = {1, ..., m} some index set.

f is specified by a first order oracle: given ȳ ∈ R^M it returns
• f(ȳ) ∈ R, the function value,
• g(ȳ) ∈ R^M, some subgradient (not nec. unique) satisfying
    f(y) ≥ f(ȳ) + ⟨g(ȳ), y − ȳ⟩  ∀ y ∈ R^M  (subgradient inequality).

Each ω = (γ, g) with γ = f(ȳ) − ⟨g, ȳ⟩ generates a linear minorant of f:
  f_ω(y) := γ + ⟨g, y⟩ ≤ f(y)  ∀ y ∈ R^M.

The collected minorants form the bundle; from this we select a model
  W ⊆ conv { (γ, g) : g = g(ȳ^i), γ = f(ȳ^i) − ⟨g, ȳ^i⟩, i = 1, ..., k }.

Any closed proper convex function is the sup over the set 𝒲 of its linear minorants,
  f(y) = sup_{(γ,g) ∈ 𝒲} γ + ⟨g, y⟩;  choose compact W ⊆ 𝒲.

Maximizing over all ω ∈ W gives a cutting model minorizing f:
  f_W(y) := max_{ω ∈ W} f_ω(y) ≤ f(y)  ∀ y ∈ R^M.
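As a minimal sketch of these definitions (not code from the talk): a first order oracle for an illustrative piecewise-linear convex f(y) = max_i (⟨a_i, y⟩ + b_i), the minorant ω = (γ, g) it generates, and the cutting model f_W built from a small bundle. The data a_i, b_i and all function names are assumptions for illustration.

```python
# Illustrative oracle for f(y) = max_i (<a_i, y> + b_i); the data below
# is an assumed example, chosen so subgradients are easy to read off.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]   # the vectors a_i
B = [0.0, 0.0, 1.0]                          # the offsets b_i

def oracle(y):
    """Return f(y) and one subgradient g(y) (not necessarily unique)."""
    i = max(range(len(A)), key=lambda j: dot(A[j], y) + B[j])
    return dot(A[i], y) + B[i], A[i]

def minorant(y_bar):
    """omega = (gamma, g) with gamma = f(y_bar) - <g, y_bar>."""
    f_val, g = oracle(y_bar)
    return f_val - dot(g, y_bar), g

def cutting_model(bundle, y):
    """f_W(y) = max over (gamma, g) in the bundle of gamma + <g, y>."""
    return max(gamma + dot(g, y) for gamma, g in bundle)

# The cutting model minorizes f at every point:
bundle = [minorant((1.0, 0.0)), minorant((0.0, 1.0))]
y = (0.3, -0.2)
assert cutting_model(bundle, y) <= oracle(y)[0] + 1e-12
```

Each oracle call adds one cut to the bundle, and f_W improves monotonically as cuts accumulate.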
Proximal Bundle Method [Lemaréchal78, Kiwiel90]

Input: a convex function given by a first order oracle.

[Figures: the convex function; the cutting plane model with g ∈ ∂f(ŷ); solving the augmented model → ȳ; improving the cutting model in ȳ]

1. Find a candidate by solving the quadratic model
     min_y max_{ω ∈ W} f_ω(y) + (u/2) ‖y − ŷ‖²
2. Evaluate the function and determine a subgradient (oracle).
3. Decide on a null step or a descent step.
4. Update the model to contain at least the aggregate and the new minorant, and iterate.
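Steps 1–4 can be sketched in a few lines, under assumptions not made in the talk: the model W keeps only the aggregate and the newest minorant (the minimum allowed by step 4), so the quadratic subproblem's dual reduces to maximizing a concave quadratic over one multiplier ξ ∈ [0, 1], solvable in closed form. The test function and all parameter values are illustrative.

```python
# Sketch of a proximal bundle method with a two-cut model (aggregate +
# newest minorant). Names, test function, and parameters are assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def oracle(y):
    """First order oracle for the test function f(y) = max(y1, y2, 1-y1-y2)."""
    cuts = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((-1.0, -1.0), 1.0)]
    g, b = max(cuts, key=lambda c: dot(c[0], y) + c[1])
    return dot(g, y) + b, g

def two_cut_qp(w1, w2, y_hat, u):
    """Solve min_y max(f_w1(y), f_w2(y)) + (u/2)||y - y_hat||^2 via its dual:
    maximize the concave quadratic phi(xi) = gamma_xi + <g_xi, y_hat>
    - ||g_xi||^2/(2u) over xi in [0, 1]; return the aggregate (gamma, g)."""
    (gam1, g1), (gam2, g2) = w1, w2
    d = [a - b for a, b in zip(g1, g2)]
    dd = dot(d, d)
    if dd == 0.0:
        xi = 1.0 if gam1 >= gam2 else 0.0
    else:
        xi = (u * (gam1 - gam2 + dot(d, y_hat)) - dot(d, g2)) / dd
        xi = min(1.0, max(0.0, xi))          # clamp to the simplex
    g = [xi * a + (1.0 - xi) * b for a, b in zip(g1, g2)]
    return xi * gam1 + (1.0 - xi) * gam2, g

def proximal_bundle(oracle, y0, u=1.0, kappa=0.1, tol=1e-8, max_iter=500):
    y_hat = list(y0)
    f_hat, g = oracle(y_hat)
    agg = newest = (f_hat - dot(g, y_hat), g)
    for _ in range(max_iter):
        gamma, g_bar = two_cut_qp(agg, newest, y_hat, u)    # step 1 (dual)
        y_cand = [yh - gi / u for yh, gi in zip(y_hat, g_bar)]
        model_val = gamma + dot(g_bar, y_cand)              # model value at candidate
        if f_hat - model_val <= tol:                        # predicted decrease small
            break
        f_cand, g_cand = oracle(y_cand)                     # step 2: oracle call
        newest = (f_cand - dot(g_cand, y_cand), g_cand)
        if f_hat - f_cand >= kappa * (f_hat - model_val):   # step 3: descent step?
            y_hat, f_hat = y_cand, f_cand
        agg = (gamma, g_bar)                                # step 4: keep aggregate
    return y_hat, f_hat

y_star, f_star = proximal_bundle(oracle, (0.0, 0.0))
# the test function's optimum is f* = 1/3 at y = (1/3, 1/3)
assert 1.0 / 3.0 - 1e-9 <= f_star < 0.37
```

Practical implementations keep a richer bundle and solve the subproblem with a QP solver; the two-cut model is the smallest variant for which the standard convergence theory still applies.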
The Aggregate and Convergence

Given weight u > 0, the quadratic subproblem is a saddle point problem:
  min_y max_{ω ∈ W} f_ω(y) + (u/2) ‖y − ŷ‖²
    = max_{ξ ≥ 0, Σ ξ_ω = 1} min_y Σ_{(γ,g) ∈ W} ξ_ω (γ + ⟨g, y⟩) + (u/2) ‖y − ŷ‖².

Determining the saddle point (ȳ, ω̄) over R^m × conv W yields
• ω̄ = (γ̄, ḡ), the aggregate (the "best" minorant in conv W),
• ȳ = ŷ − (1/u) ḡ, the next candidate for evaluation.
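The candidate formula follows by carrying out the inner minimization explicitly; the following derivation (not on the slide, but standard) fills in that step:

```latex
% For fixed multipliers \xi, write the aggregate
%   \bar\gamma = \textstyle\sum_\omega \xi_\omega \gamma_\omega, \qquad
%   \bar g     = \textstyle\sum_\omega \xi_\omega g_\omega .
% The inner problem is an unconstrained strongly convex quadratic in y:
\min_y \;\; \bar\gamma + \langle \bar g, y\rangle
          + \tfrac{u}{2}\,\|y-\hat y\|^2 .
% Setting the gradient to zero,
\bar g + u\,(y-\hat y) = 0
\quad\Longrightarrow\quad
y = \hat y - \tfrac{1}{u}\,\bar g ,
% and substituting back gives the concave dual in \xi alone:
\max_{\xi \ge 0,\; \sum_\omega \xi_\omega = 1} \;\;
\bar\gamma + \langle \bar g, \hat y\rangle - \tfrac{1}{2u}\,\|\bar g\|^2 .
```

So the dual optimizer determines the aggregate (γ̄, ḡ), and the primal optimizer ȳ = ŷ − (1/u) ḡ is recovered for free.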