Recent advances on the acceleration of first-order methods in convex optimization


  1. Recent advances on the acceleration of first-order methods in convex optimization. Juan Peypouquet, Universidad Técnica Federico Santa María. Second Workshop on Algorithms and Dynamics for Games and Optimization, Santiago, January 25, 2016.

  2. Content
  - Basic first-order descent methods
  - Nesterov's acceleration
  - Dynamic interpretation
  - Damped Inertial Gradient System (DIGS)
  - Properties of DIGS trajectories and accelerated algorithms
  - A first-order variant bearing second-order information in time and space

  7. BASIC DESCENT METHODS

  8. Basic (first-order) descent methods. Steepest descent dynamics:
     ẋ(t) = −∇ϕ(x(t)),   x(0) = x₀.
  Along the trajectory, the function value decreases:
     (d/dt) ϕ(x(t)) = ⟨∇ϕ(x(t)), ẋ(t)⟩ = −∥∇ϕ(x(t))∥² = −∥ẋ(t)∥².
  [Figure: a trajectory x(t) starting at x₀ with gradient ∇ϕ(x₀), descending toward the solution set S.]
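  A minimal numerical check of this decrease property (not part of the slides): integrate the steepest-descent ODE with a small explicit Euler step and verify that ϕ(x(t)) is nonincreasing, as the identity (d/dt) ϕ(x(t)) = −∥ẋ(t)∥² predicts. The quadratic ϕ below is an illustrative choice.

```python
import numpy as np

# phi(x) = 0.5 * x^T A x with a fixed symmetric positive definite A (illustrative choice)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def phi(x):
    return 0.5 * x @ A @ x

def grad_phi(x):
    return A @ x

x = np.array([1.0, -2.0])   # initial condition x(0) = x0
h = 1e-3                    # explicit Euler time step
values = [phi(x)]
for _ in range(5000):
    x = x - h * grad_phi(x)          # x(t + h) ~ x(t) - h * grad phi(x(t))
    values.append(phi(x))

# phi(x(t)) should be (numerically) nonincreasing along the trajectory
assert all(later <= earlier + 1e-12 for earlier, later in zip(values, values[1:]))
```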

  10. Basic (first-order) descent methods
  Explicit discretization → gradient method (Cauchy, 1847):
     (x_{k+1} − x_k)/λ = −∇ϕ(x_k)   ⟺   x_{k+1} = x_k − λ∇ϕ(x_k).
  Implicit discretization → proximal method (Martinet, 1970):
     (z_{k+1} − z_k)/λ = −∇ϕ(z_{k+1})   ⟺   z_{k+1} + λ∇ϕ(z_{k+1}) = z_k.
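  A quick illustration of the two discretizations, assuming (purely for the example) the one-dimensional quadratic ϕ(x) = x²/2, for which the implicit equation has a closed-form solution: the explicit gradient step can blow up when λ is too large, while the implicit proximal step contracts for every λ > 0.

```python
# For phi(x) = 0.5 * x**2 we have phi'(x) = x, so
#   explicit step:  x_{k+1} = x_k - lam * x_k
#   implicit step:  z_{k+1} + lam * z_{k+1} = z_k  =>  z_{k+1} = z_k / (1 + lam)

def gradient_step(x, lam):
    return x - lam * x            # x_{k+1} = x_k - lam * grad phi(x_k)

def proximal_step(z, lam):
    return z / (1.0 + lam)        # solves z_{k+1} + lam * grad phi(z_{k+1}) = z_k

x = z = 1.0
lam = 2.5                          # deliberately too large for the explicit scheme
for _ in range(20):
    x = gradient_step(x, lam)
    z = proximal_step(z, lam)

print(x, z)  # gradient iterate oscillates and diverges; proximal iterate tends to 0
```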

  12. Basic (first-order) descent methods
  Gradient step: x_{k+1} = x_k − λ∇ϕ(x_k).   Proximal step: z_{k+1} + λ∇ϕ(z_{k+1}) = z_k.
  [Figure: from a common point z_k = x_k, the gradient iterate x_{k+1} and the proximal iterate z_{k+1}, relative to the solution set S.]

  14. Pros and cons
  Gradient method
  + Lower computational cost per iteration (explicit formula); easy to implement.
  − Convergence depends strongly on the regularity of the function (typically ϕ ∈ C^{1,1}) and on the step sizes.
  Proximal point algorithm
  + More stable; convergence guaranteed for a larger class of functions (∇ϕ → ∂ϕ), independently of the step size.
  − Higher computational cost per iteration (implicit formula); often requires inexact computation.

  16. Combining smooth and nonsmooth functions
  Problem: min { Φ(x) := F(x) + G(x) : x ∈ H }, where F is not smooth but G is.
  Forward-backward method (x_k → x_{k+1/2} → x_{k+1}):
     x_{k+1} + λ∂F(x_{k+1}) ∋ x_{k+1/2} = x_k − λ∇G(x_k),
  that is, x_{k+1} = Prox_{λF} ∘ Grad_{λG}(x_k).
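  A short sketch of the forward-backward iteration x_{k+1} = Prox_{λF}(x_k − λ∇G(x_k)). The concrete choices G(x) = ½∥Ax − b∥² and F(x) = μ∥x∥₁ (so that Prox_{λF} is componentwise soft-thresholding) are assumptions made for illustration, not part of the slides; A, b, μ and the step-size rule are placeholders.

```python
import numpy as np

def soft_threshold(y, t):
    # Prox of t*||.||_1: componentwise shrinkage toward 0
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

def forward_backward(A, b, mu, n_iter=500):
    lam = 1.0 / np.linalg.norm(A, 2) ** 2    # step size <= 1/L, with L = ||A||^2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad_G = A.T @ (A @ x - b)            # forward (explicit gradient) step on G
        x_half = x - lam * grad_G             # x_{k+1/2} = x_k - lam * grad G(x_k)
        x = soft_threshold(x_half, lam * mu)  # backward (proximal) step on F
    return x

# Tiny synthetic example: recover a sparse vector from a few random measurements
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true
print(forward_backward(A, b, mu=0.1)[:10])
```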
