
Computational Optimization Advanced Topics: Nonsmooth Optimization



  1. Computational Optimization Advanced Topics: Nonsmooth Optimization. Reference: A. Ruszczynski, Nonlinear Optimization, 2006.

  2. Best Linear Separator: Supporting Plane Method. Maximize the distance ("margin") between two parallel supporting planes $x \cdot w = \beta$ and $x \cdot w = \delta$: $\text{margin} = \dfrac{\delta - \beta}{\|w\|}$.

  3. Linearly Inseparable Case: Soft Margin Method. Penalize violations with the hinge loss $\max(0,\, 1 - y_i(x_i \cdot w + b))$: $\min_{w,b}\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \max(0,\, 1 - y_i(x_i \cdot w + b))$.
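
As a concrete illustration (an addition, not part of the slides), the objective above can be evaluated in a few lines of NumPy; the function name soft_margin_objective and the arguments X, y, C are my own choices, and y is assumed to hold +/-1 labels.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(x_i . w + b))."""
    margins = y * (X @ w + b)                 # y_i (x_i . w + b) for every example
    hinge = np.maximum(0.0, 1.0 - margins)    # per-example hinge loss
    return 0.5 * np.dot(w, w) + C * hinge.sum()
```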

  4. Nonsmooth Optimization. If the objective is not differentiable or the constraints are not differentiable, then the problem is nonsmooth. For today’s lecture, assume everything is convex but possibly nonsmooth.

  5. Common nonsmooth problems: problems involving max functions, problems involving absolute values, exact penalty formulations, and Lagrangian dual problems.

  6. Strategy I: Smooth the nonsmooth problem by reformulating with added variables and constraints: $\min_{w,b,z}\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} z_i$ s.t. $y_i(x_i \cdot w + b) + z_i \ge 1$, $z_i \ge 0$, $i = 1, \dots, \ell$. But this increases the problem size.

  7. Strategy II Tackle the nonsmooth problem directly. Problems can still be quite nice. Convex functions are always continuous. Need to generalize optimality conditions. Need to generalize algorithms.

  8. Subgradient: a generalization of the gradient. Definition: let $f : R^n \to R$ be a convex function. A vector $g \in R^n$ such that $f(y) \ge f(x) + g'(y - x)$ for all $y$ is a subgradient of $f$ at $x$. (Figure: the hinge loss with a supporting line satisfying $f(y) \ge f(x) + g'(y - x)$.)
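
For a quick check of the definition (an example added here, not on the slide), consider the absolute value function at its kink:

$$ f(x) = |x|,\ x = 0:\quad f(y) \ge f(0) + g\,(y - 0)\ \text{for all } y \iff |y| \ge g\,y\ \text{for all } y \iff |g| \le 1, \quad\text{so } \partial f(0) = [-1,\,1]. $$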

  9. Subdifferential. The subgradient may not be unique. The set of all subgradients of $f$ at $x$ is called the subdifferential: $g \in \partial f(x)$. If $f$ is differentiable, the subdifferential consists of one point, the gradient of $f$ at $x$.

  10. Subgradient of $f(x) = \max(0, 1 - x)$: $\partial f(x) = \{0\}$ if $x > 1$; $\partial f(x) = [-1, 0]$ if $x = 1$; $\partial f(x) = \{-1\}$ if $x < 1$. (Figure: the hinge loss with a supporting line satisfying $f(y) \ge f(x) + g'(y - x)$.)
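
The cases above translate directly into code. This small helper is a sketch of mine (the choice of value at the kink is arbitrary); it returns one valid subgradient at any point.

```python
def hinge_subgradient(x):
    """Return one subgradient of f(x) = max(0, 1 - x)."""
    if x > 1.0:
        return 0.0     # f is flat to the right of the kink
    if x < 1.0:
        return -1.0    # f has slope -1 to the left of the kink
    return -0.5        # at x = 1 any value in [-1, 0] is a valid subgradient
```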

  11. Subgradient Method. Analogous to steepest descent. Basic algorithm: $x^{k+1} = x^k - \alpha_k g^k$, where $g^k \in \partial f(x^k)$ and $\alpha_k = \tau_k \gamma_k$ is the stepsize, e.g. $\gamma_k = \frac{1}{k}$ and $\tau_k = \dfrac{\text{const}}{\max(C, \|g^k\|)}$.
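
A minimal sketch of this method applied to the soft-margin objective from slide 3, assuming the diminishing, normalized step rule $\alpha_k = \tau / (k \cdot \max(1, \|g^k\|))$; the function and argument names are mine, and y is again assumed to hold +/-1 labels.

```python
import numpy as np

def svm_subgradient_method(X, y, C=1.0, tau=1.0, iters=500):
    """Subgradient method for 0.5*||w||^2 + C*sum_i max(0, 1 - y_i(x_i.w + b))."""
    w, b = np.zeros(X.shape[1]), 0.0
    best_f, best_w, best_b = np.inf, w.copy(), b
    for k in range(1, iters + 1):
        margins = y * (X @ w + b)
        active = margins < 1.0                                     # examples inside the margin
        g_w = w - C * (y[active, None] * X[active]).sum(axis=0)    # a subgradient in w
        g_b = -C * y[active].sum()                                 # a subgradient in b
        f = 0.5 * w @ w + C * np.maximum(0.0, 1.0 - margins).sum()
        if f < best_f:                                # keep the best iterate seen so far,
            best_f, best_w, best_b = f, w.copy(), b   # since the objective need not decrease
        alpha = tau / (k * max(1.0, np.hypot(np.linalg.norm(g_w), g_b)))
        w, b = w - alpha * g_w, b - alpha * g_b
    return best_w, best_b
```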

  12. The stepsize is harder: the subgradient is not necessarily a direction of descent, i.e., $-g^k$ for $g^k \in \partial f(x^k)$ may point uphill (figure: contour plot of the function). But fixed stepsize schemes can still work.

  13. Subgradient Descent Algorithms. Like gradient descent, but with a subgradient. Catch: the function may not decrease! The stepsize is a bit tricky: usually use fixed step sizes that must be sufficiently small, or use trust region methods. Converges despite all that.

  14. Next hardest problem: solve $\min f(x)$ s.t. $x \in X_0$, assuming the projection of $x$ onto $X_0$ is easy, for example $P(x) = \arg\min_c \{\|c - x\|^2 : L \le c \le U\}$.

  15. Projected Subgradient Descent Method. Basic algorithm: $x^{k+1} = P(x^k - \alpha_k g^k)$, where $g^k \in \partial f(x^k)$ and $\alpha_k$ is the stepsize. The point $x^k$ is optimal if $x^k = P(x^k - \alpha_k g^k)$.
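
A small sketch for the box-constrained case $L \le x \le U$, where the projection is a componentwise clip; the step rule and the names project_box, f_subgrad, tau are my own assumptions.

```python
import numpy as np

def project_box(x, L, U):
    """Euclidean projection onto the box {x : L <= x <= U}."""
    return np.clip(x, L, U)

def projected_subgradient(f_subgrad, x0, L, U, tau=1.0, iters=1000):
    x = project_box(np.asarray(x0, dtype=float), L, U)
    for k in range(1, iters + 1):
        g = f_subgrad(x)
        alpha = tau / (k * max(1.0, np.linalg.norm(g)))   # diminishing, normalized step
        x = project_box(x - alpha * g, L, U)              # step, then project back
    return x

# Example: min ||x - t||_1 over 0 <= x <= 1; sign(x - t) is a subgradient of the objective.
t = np.array([1.5, -0.3, 0.4])
x_star = projected_subgradient(lambda x: np.sign(x - t), np.zeros(3), 0.0, 1.0)
```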

  16. Cutting Plane Methods. Observe that the subgradient inequality holds for all $y$: $f(y) \ge f(x^k) + g^{k\prime}(y - x^k)$, so $f(y) \ge \max_k \{ f(x^k) + g^{k\prime}(y - x^k) \}$; e.g. $f(y) \ge f(x^1) + g^{1\prime}(y - x^1)$ and $f(y) \ge f(x^2) + g^{2\prime}(y - x^2)$.

  17. Cutting Plane Algorithm. To solve $\min f(x)$ with $f$ subdifferentiable: start with $x^1$. For $k = 1, 2, \dots$: compute $g^k \in \partial f(x^k)$; solve the master problem $\min_{y,z} z$ s.t. $z \ge f(x^i) + g^{i\prime}(y - x^i)$, $i = 1, \dots, k$, and set $x^{k+1}$ to the minimizing $y$; if $f(x^{k+1}) = f(x^k)$, then stop: $x^{k+1}$ is optimal.
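
A sketch of this loop using scipy.optimize.linprog for the master LP. Two liberties are taken here (my assumptions, not the slide's): a box $-B \le y \le B$ keeps the master bounded, as in the dual version on slide 21, and the stopping test is the standard gap between the best value seen and the master's lower bound rather than the exact equality test above.

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane(f, subgrad, x0, B=10.0, tol=1e-6, max_iters=200):
    """Minimize a convex f given an oracle returning one subgradient per point."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    c = np.zeros(n + 1); c[-1] = 1.0          # master objective: minimize z (last variable)
    bounds = [(-B, B)] * n + [(None, None)]   # box on y, z free
    A_ub, b_ub, best = [], [], np.inf
    for _ in range(max_iters):
        fx, g = f(x), subgrad(x)
        best = min(best, fx)
        A_ub.append(np.append(g, -1.0))       # cut: g.y - z <= g.x - f(x)
        b_ub.append(np.dot(g, x) - fx)
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
        if not res.success:
            break
        x, lower = res.x[:n], res.x[-1]       # new trial point and lower bound on min f
        if best - lower <= tol:
            break
    return x

# Example: f(y) = max(|y1 - 1|, |y2 + 2|), a piecewise-linear convex function.
f = lambda y: max(abs(y[0] - 1.0), abs(y[1] + 2.0))
def subgrad(y):
    g = np.zeros(2)
    if abs(y[0] - 1.0) >= abs(y[1] + 2.0):
        g[0] = np.sign(y[0] - 1.0)
    else:
        g[1] = np.sign(y[1] + 2.0)
    return g
y_star = cutting_plane(f, subgrad, np.zeros(2))   # converges toward (1, -2)
```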

  18. Cutting Plane Method Converges for quite general cases If f is piecewise linear, requires a finite number of cuts. Easy to adapt to linearly constrained cases as well. Can converge slowly. Number of cuts is not bounded in general.

  19. The Dual Problem Is Nonsmooth. Convex program: $\min f(x)$ s.t. $Ax = b$, $x \in X_0$. Lagrangian dual function: $\theta(\lambda) = \min_{x \in X_0} f(x) + \lambda'(b - Ax)$. Lagrangian dual problem: $\max_{\lambda \ge 0} \theta(\lambda)$.

  20. Dual Function Subgradient. A subgradient is found by solving $x^k \in \arg\min_{x \in X_0} f(x) + \lambda^{k\prime}(b - Ax)$; then $g^k = (b - Ax^k) \in \partial\theta(\lambda^k)$.
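
A sketch for the special case $f(x) = c'x$ and $X_0 = \{x : 0 \le x \le u\}$ (this instance, and the names c, u, A, b, are my choices); here the inner minimization is separable and can be solved by inspection, which makes the recipe above easy to show in code.

```python
import numpy as np

def dual_value_and_subgradient(lam, c, A, b, u):
    """theta(lam) = min_{0 <= x <= u} c.x + lam.(b - A x); return (theta, subgradient)."""
    reduced = c - A.T @ lam               # coefficient of x in the Lagrangian
    x = np.where(reduced < 0.0, u, 0.0)   # separable box minimization: pick u_j or 0
    theta = c @ x + lam @ (b - A @ x)
    return theta, b - A @ x               # g = b - A x^k is a subgradient of theta
```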

  21. Cutting Plane Method for the Dual Problem. Similar to the unconstrained case, except solve, for some large fixed $C$, $\lambda^{k+1} \in \arg\max_y z$ s.t. $z \le \theta(\lambda^i) + g^{i\prime}(y - \lambda^i)$, $i = 1, \dots, k$, $-C \le y \le C$. The $C$ constraints ensure the problem always has a solution.

  22. Recover Primal Variables. At optimality we need to get back the primal solution $x^*$. Look at the KKT conditions of the master problem. One can show that, using the multipliers $u$ of the master, $x^* = \sum_{i=1}^{k} u_i x^i$.

  23. Bundle Methods. The problem with cutting plane methods is that they may require too many cuts. Bundle methods get around this difficulty by using a regularized master problem: $\min_{z,\, y \in X_0}\; z + \frac{\rho}{2}\|y - w^k\|^2$ s.t. $z \ge f(x^i) + g^{i\prime}(y - x^i)$, $i \in J_k$.
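
A sketch of one regularized master solve using scipy.optimize.minimize with SLSQP (a generic NLP solver standing in for a proper QP solver); the representation of the bundle as (x_i, f_i, g_i) triples and the names cuts, w_center, rho are my own, and $X_0 = R^n$ is assumed for simplicity.

```python
import numpy as np
from scipy.optimize import minimize

def solve_bundle_master(cuts, w_center, rho):
    """min_{y,z} z + (rho/2)*||y - w_center||^2  s.t.  z >= f_i + g_i.(y - x_i) for each cut."""
    n = w_center.size

    def objective(v):                  # v packs [y, z]
        y, z = v[:n], v[-1]
        return z + 0.5 * rho * np.sum((y - w_center) ** 2)

    constraints = [
        {"type": "ineq",               # SLSQP convention: fun(v) >= 0
         "fun": lambda v, xi=xi, fi=fi, gi=gi: v[-1] - fi - gi @ (v[:n] - xi)}
        for (xi, fi, gi) in cuts
    ]
    v0 = np.append(w_center, max(fi for _, fi, _ in cuts))
    res = minimize(objective, v0, method="SLSQP", constraints=constraints)
    return res.x[:n], res.x[-1]        # new trial point y and model value z
```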

  24. Bundle Methods. $w^k$ is called the center. You don’t want to change the center unless you have added enough constraints to get a good decrease. Can drop some or all of the constraints that have zero Lagrangian multipliers in the regularized master problem.

  25. Bundle algorithm
0. Set k = 1, J = {}, and v^1 = -infinity. Calculate f(x^k) and g^k.
1. If f(x^k) > v^k, add cut k to the constraints in J.
2. If k = 1 or f(x^k) <= (1 - a) f(w^{k-1}) + a f^{k-1}(x^k), then w^k = x^k; else w^k = w^{k-1}. Solve the restricted master for (x^{k+1}, v^{k+1}).
3. If f^k(x^{k+1}) = f(w^k), then stop: x^{k+1} is optimal.
4. Update J by removing cuts with zero multipliers in the solution of the restricted master.

  26. Bundle Methods for Nonsmooth Optimization. No step size needed. Nice check for optimality: if the function achieves its lower bound, it is optimal. Reduces to a series of nice convex quadratic subproblems. Can remove constraints while still adding new ones. Finite convergence for piecewise linear convex functions with polyhedral constraints. Can be extended to nonconvex nonsmooth optimization, but things get a bit more tricky. Still only uses first-order information, so it can be slow.
