Acceleration through Optimistic No-Regret Dynamics


  1. Acceleration through Optimistic No-Regret Dynamics
     Jun-Kun Wang and Jacob Abernethy, Georgia Tech

  2. Convex Optimization
     $\min_{x \in \mathcal{X}} f(x)$  (1)
     Methods: Gradient Descent, the Frank-Wolfe method, Nesterov's accelerated method, Heavy Ball, etc.

  3. Convex Optimization
     $\min_{x \in \mathcal{X}} f(x)$  (1)
     Methods: Gradient Descent, the Frank-Wolfe method, Nesterov's accelerated method, Heavy Ball, etc.
     For $L$-smooth convex problems $\min_{x \in \mathcal{X}} f(x)$: Nesterov's accelerated method converges at rate $O\big(\tfrac{1}{T^2}\big)$.
     For $L$-smooth and $\mu$-strongly convex problems $\min_{x \in \mathcal{X}} f(x)$, with $\kappa := \tfrac{L}{\mu}$: Nesterov's accelerated method converges at rate $O\big(\exp(-\tfrac{T}{\sqrt{\kappa}})\big)$.

  4. Online learning (minimizing regret)
     Online learning protocol:
     1: for t = 1 to T do
     2:   Play $w_t$ according to $\text{OnlineAlgorithm}\big(\ell_1(w_1), \dots, \ell_{t-1}(w_{t-1})\big)$.
     3:   Receive loss function $\ell_t(\cdot)$ and suffer loss $\ell_t(w_t)$.
     4: end for
     $\mathrm{Regret}_T := \sum_{t=1}^{T} \ell_t(w_t) - \sum_{t=1}^{T} \ell_t(w^*)$.
     For convex loss functions $\{\ell_t(\cdot)\}_{t=1}^{T}$: $\frac{\mathrm{Regret}_T}{T} = O\big(\frac{1}{\sqrt{T}}\big)$.
     For strongly convex loss functions $\{\ell_t(\cdot)\}_{t=1}^{T}$: $\frac{\mathrm{Regret}_T}{T} = O\big(\frac{\log T}{T}\big)$.
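
The following is a minimal sketch of this protocol in Python, with online gradient descent as the learner; the quadratic losses, the step size $\eta_t = 1/\sqrt{t}$, and the horizon are illustrative choices, not taken from the talk.

```python
import numpy as np

# Minimal sketch of the online learning protocol with Online Gradient Descent
# (OGD) as the learner; the quadratic losses and eta_t = 1/sqrt(t) are
# illustrative choices only.
rng = np.random.default_rng(0)
T, d = 200, 5
z = rng.normal(size=(T, d))                         # hidden parameters of the losses
loss = lambda w, t: 0.5 * np.sum((w - z[t]) ** 2)   # ell_t(w) = 1/2 ||w - z_t||^2

w, total = np.zeros(d), 0.0
for t in range(T):
    total += loss(w, t)                  # play w_t, then suffer ell_t(w_t)
    grad = w - z[t]                      # gradient of ell_t at w_t
    w -= grad / np.sqrt(t + 1)           # OGD update with eta_t = 1/sqrt(t)

w_star = z.mean(axis=0)                  # best fixed comparator for these losses
regret = total - sum(loss(w_star, t) for t in range(T))
print(f"average regret Regret_T / T = {regret / T:.4f}")   # decays like O(1/sqrt(T))
```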

  5. New perspective: a two-player zero-sum game
     A zero-sum game (Fenchel game): $g(x, y) := \langle x, y \rangle - f^*(y)$.
     $V^* := \min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} g(x, y) = \min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} \langle x, y \rangle - f^*(y) = \min_{x \in \mathcal{X}} f(x)$.

  6. New perspective: a two-player zero-sum game
     A zero-sum game (Fenchel game): $g(x, y) := \langle x, y \rangle - f^*(y)$.
     $V^* := \min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} g(x, y) = \min_{x \in \mathcal{X}} \max_{y \in \mathcal{Y}} \langle x, y \rangle - f^*(y) = \min_{x \in \mathcal{X}} f(x)$.
     Equivalent to solving the underlying optimization problem!
     If $(\hat{x}, \hat{y})$ is an $\epsilon$-equilibrium of the game, then $f(\hat{x}) \le \min_x f(x) + \epsilon$.
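
As a quick illustration of the Fenchel game (a toy instance of my own, not from the talk): for $f(x) = \frac{L}{2}\|x\|^2$ the conjugate is $f^*(y) = \frac{\|y\|^2}{2L}$, and maximizing $g(x, \cdot)$ over $y$ recovers $f(x)$, so the min-max value of the game equals $\min_x f(x)$.

```python
import numpy as np

# Illustration (not from the slides): for f(x) = (L/2)||x||^2 the conjugate is
# f*(y) = ||y||^2 / (2L), so max_y <x, y> - f*(y) is attained at y = L x and
# reproduces f(x). Hence min_x max_y g(x, y) equals min_x f(x).
L = 2.0
f      = lambda x: 0.5 * L * np.dot(x, x)
f_conj = lambda y: np.dot(y, y) / (2.0 * L)
g      = lambda x, y: np.dot(x, y) - f_conj(y)

x = np.array([1.0, -3.0])
y_best = L * x                           # argmax_y g(x, y) for this particular f
print(np.isclose(g(x, y_best), f(x)))    # True: the inner max reproduces f(x)
```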

  7. Meta algorithm for the Fenchel game
     Algorithm 0: Meta Algorithm
     1: Given a sequence of weights $\{\alpha_t\}$.
     2: for t = 1, 2, ..., T do
     3:   $y_t := \text{OnlineAlgorithm}_{\mathcal{Y}}\big(g(x_1, \cdot), \dots, g(x_{t-1}, \cdot)\big)$.
     4:   $x_t := \text{OnlineAlgorithm}_{\mathcal{X}}\big(g(\cdot, y_1), \dots, g(\cdot, y_{t-1}), g(\cdot, y_t)\big)$.
     5:   y-player's loss function: $\alpha_t \ell_t(y) := \alpha_t \big(f^*(y) - \langle x_t, y \rangle\big)$.
     6:   x-player's loss function: $\alpha_t h_t(x) := \alpha_t \big(\langle x, y_t \rangle - f^*(y_t)\big)$.
     7: end for
     8: Output $(\bar{x}_T, \bar{y}_T) := \Big(\frac{\sum_{s=1}^{T} \alpha_s x_s}{A_T}, \frac{\sum_{s=1}^{T} \alpha_s y_s}{A_T}\Big)$.
     Let $x^* = \arg\min_x f(x)$.
     $\alpha\text{-REG}^y := \sum_{t=1}^{T} \alpha_t \ell_t(y_t) - \min_{y \in \mathcal{Y}} \sum_{t=1}^{T} \alpha_t \ell_t(y)$  (2)
     $\alpha\text{-REG}^x := \sum_{t=1}^{T} \alpha_t h_t(x_t) - \sum_{t=1}^{T} \alpha_t h_t(x^*)$  (3)

  8. Meta algorithm for the Fenchel game
     Algorithm 0: Meta Algorithm
     1: Given a sequence of weights $\{\alpha_t\}$.
     2: for t = 1, 2, ..., T do
     3:   $y_t := \text{OnlineAlgorithm}_{\mathcal{Y}}\big(g(x_1, \cdot), \dots, g(x_{t-1}, \cdot)\big)$.
     4:   $x_t := \text{OnlineAlgorithm}_{\mathcal{X}}\big(g(\cdot, y_1), \dots, g(\cdot, y_{t-1}), g(\cdot, y_t)\big)$.
     5:   y-player's loss function: $\alpha_t \ell_t(y) := \alpha_t \big(f^*(y) - \langle x_t, y \rangle\big)$.
     6:   x-player's loss function: $\alpha_t h_t(x) := \alpha_t \big(\langle x, y_t \rangle - f^*(y_t)\big)$.
     7: end for
     8: Output $(\bar{x}_T, \bar{y}_T) := \Big(\frac{\sum_{s=1}^{T} \alpha_s x_s}{A_T}, \frac{\sum_{s=1}^{T} \alpha_s y_s}{A_T}\Big)$.
     Define the weighted average regret $\overline{\alpha\text{-REG}} := \frac{\alpha\text{-REG}}{A_T}$, where $A_T := \sum_{t=1}^{T} \alpha_t$.
     Theorem: $f(\bar{x}_T) \le \min_x f(x) + \frac{\alpha\text{-REG}^x}{A_T} + \frac{\alpha\text{-REG}^y}{A_T}$.
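
Below is a sketch of the theorem in action on a toy instance; the quadratic $f$, the FTL/gradient-descent players, and all constants are my own illustrative choices. The weighted regrets (2) and (3) are computed explicitly and the bound $f(\bar{x}_T) \le \min_x f(x) + \frac{\alpha\text{-REG}^x + \alpha\text{-REG}^y}{A_T}$ is checked numerically.

```python
import numpy as np

# Numerical check of the theorem on a toy instance (the quadratic, the players,
# and all constants are illustrative choices): f(x) = (L/2)||x||^2 on R^2, so
# f*(y) = ||y||^2 / (2L), min_x f(x) = 0 and x* = 0. The y-player follows the
# leader (y_t = grad f of the weighted average so far) and the x-player runs
# gradient descent; any pair of no-regret algorithms could be plugged in.
L, d, T = 2.0, 2, 100
gamma = 1.0 / (4.0 * L)
f      = lambda x: 0.5 * L * np.dot(x, x)
f_conj = lambda y: np.dot(y, y) / (2.0 * L)

alpha = np.arange(1, T + 1, dtype=float)         # weights alpha_t = t
x, wsum = np.array([3.0, -1.0]), np.zeros(d)     # x_0 and running sum of alpha_s x_s
xs, ys = [], []
for t in range(1, T + 1):
    xbar_prev = wsum / alpha[:t - 1].sum() if t > 1 else x   # at t = 1, fall back to x_0
    y = L * xbar_prev                            # FTL: argmin_y sum_{s<t} alpha_s l_s(y)
    x = x - gamma * alpha[t - 1] * y             # gradient-descent step on alpha_t h_t
    wsum += alpha[t - 1] * x
    xs.append(x); ys.append(y)

A_T = alpha.sum()
x_bar = wsum / A_T
y_star = L * x_bar                               # argmin_y sum_t alpha_t l_t(y), closed form here
reg_y = sum(a * (f_conj(yi) - xi @ yi) for a, xi, yi in zip(alpha, xs, ys)) \
      - sum(a * (f_conj(y_star) - xi @ y_star) for a, xi in zip(alpha, xs))
reg_x = sum(a * (xi @ yi) for a, xi, yi in zip(alpha, xs, ys))   # h_t(x_t) - h_t(x*) with x* = 0
print(f(x_bar), "<=", reg_x / A_T + reg_y / A_T)                 # the theorem's guarantee
```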

  9. Nesterov's 1983 accelerated method (unconstrained optimization: $\min_{x \in \mathbb{R}^n} f(x)$)
     Algorithm 1: Nesterov's method from the Meta Algorithm
     1: Given the sequence of weights $\{\alpha_t = t\}$.
     2: for t = 1, 2, ..., T do
     3:   y-player plays Optimistic-FTL:
          $y_t \leftarrow \nabla f(\tilde{x}_t) = \arg\min_{y \in \mathcal{Y}} \sum_{s=1}^{t-1} \alpha_s \ell_s(y) + m_t(y)$,
          where $m_t(y) = \alpha_t \ell_{t-1}(y)$ and $\tilde{x}_t := \frac{1}{A_t}\big(\alpha_t x_{t-1} + \sum_{s=1}^{t-1} \alpha_s x_s\big)$.
     4:   x-player plays Gradient Descent:
     5:   $x_t = x_{t-1} - \gamma_t \alpha_t \nabla h_t(x) = x_{t-1} - \gamma_t \alpha_t y_t = x_{t-1} - \gamma_t \alpha_t \nabla f(\tilde{x}_t)$.
     6: end for
     7: Output $(\bar{x}_T, \bar{y}_T) := \Big(\frac{\sum_{s=1}^{T} \alpha_s x_s}{A_T}, \frac{\sum_{s=1}^{T} \alpha_s y_s}{A_T}\Big)$.
     The averaged iterates satisfy
     $\bar{x}_{t+1} = \bar{x}_t - \frac{1}{4L} \nabla f(\tilde{x}_{t+1}) + \big(\frac{t-1}{t+2}\big)(\bar{x}_t - \bar{x}_{t-1})$.
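
A minimal sketch of Algorithm 1 on a toy smooth quadratic, assuming the constant step size $\gamma_t = \frac{1}{4L}$ and the problem instance below (both are my choices); the point is only to show the Optimistic-FTL / gradient-descent loop and the roughly $O(1/T^2)$ decay of the optimality gap.

```python
import numpy as np

# Sketch of Algorithm 1 on a toy smooth quadratic; the instance and the choice
# gamma_t = 1/(4L) are illustrative. The y-player's Optimistic-FTL step reduces
# to evaluating grad f at the look-ahead average x_tilde_t, and the x-player
# takes a weighted gradient-descent step.
L, b = 4.0, np.array([2.0, -1.0, 0.5])
f      = lambda x: 0.5 * L * np.dot(x - b, x - b)   # min f = 0, attained at b
grad_f = lambda x: L * (x - b)

T, gamma = 200, 1.0 / (4.0 * L)
x = np.zeros(3)                              # x_0
wsum, A = np.zeros(3), 0.0                   # running sum of alpha_s x_s and A_t
for t in range(1, T + 1):
    alpha_t = float(t)                       # weights alpha_t = t
    A += alpha_t
    x_tilde = (alpha_t * x + wsum) / A       # Optimistic-FTL prediction point
    y = grad_f(x_tilde)                      # y_t = grad f(x_tilde_t)
    x = x - gamma * alpha_t * y              # gradient-descent step on alpha_t h_t
    wsum += alpha_t * x

x_bar = wsum / A
print("optimality gap f(x_bar_T) - min f:", f(x_bar))   # decays like O(1/T^2)
```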

  10. Other accelerated variants (constrained optimization: $\min_{x \in \mathcal{K}} f(x)$)
      Algorithm 2: Nesterov's method from the Meta Algorithm
      1: Given the sequence of weights $\{\alpha_t = t\}$.
      2: for t = 1, 2, ..., T do
      3:   y-player plays Optimistic-FTL:
           $y_t \leftarrow \nabla f(\tilde{x}_t) = \arg\min_{y \in \mathcal{Y}} \sum_{s=1}^{t-1} \alpha_s \ell_s(y) + m_t(y)$,
           where $m_t(y) = \alpha_t \ell_{t-1}(y)$ and $\tilde{x}_t := \frac{1}{A_t}\big(\alpha_t x_{t-1} + \sum_{s=1}^{t-1} \alpha_s x_s\big)$.
      4:   (A) x-player plays Mirror Descent:
      5:   $x_t = \arg\min_{x \in \mathcal{K}} \gamma_t \langle x, \alpha_t y_t \rangle + V_{x_{t-1}}(x)$.
      6:   Or, (B) x-player plays Be-The-Regularized-Leader:
      7:   $x_t = \arg\min_{x \in \mathcal{K}} \sum_{s=1}^{t} \langle x, \alpha_s y_s \rangle + \frac{1}{\eta} R(x)$.
      8: end for
      9: Output $(\bar{x}_T, \bar{y}_T) := \Big(\frac{\sum_{s=1}^{T} \alpha_s x_s}{A_T}, \frac{\sum_{s=1}^{T} \alpha_s y_s}{A_T}\Big)$.
      (A) recovers Nesterov's 1988 (1-memory) accelerated method and (B) recovers Nesterov's 2005 ($\infty$-memory) accelerated method.
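
A sketch of option (A) only: the x-player's mirror-descent step when $\mathcal{K}$ is the probability simplex and the mirror map is the negative entropy, so that $V_{x_{t-1}}(x)$ is the KL divergence and the update has a closed multiplicative-weights form. The choice of mirror map and all numbers are illustrative assumptions, not fixed by the slides.

```python
import numpy as np

# Sketch of option (A): the x-player's mirror-descent step on the probability
# simplex with the entropic mirror map, so V_{x_{t-1}}(x) = KL(x || x_{t-1}).
# Then  x_t = argmin_{x in simplex} gamma_t <x, alpha_t y_t> + KL(x || x_{t-1})
# has the closed-form solution below. The mirror map is an illustrative choice.
def mirror_descent_step(x_prev, y_t, alpha_t, gamma_t):
    logits = np.log(x_prev) - gamma_t * alpha_t * y_t
    w = np.exp(logits - logits.max())        # subtract the max for numerical stability
    return w / w.sum()                       # renormalize back onto the simplex

x = np.full(4, 0.25)                         # start from the uniform distribution
y = np.array([1.0, -0.5, 0.2, 0.0])          # a gradient reported by the y-player
print(mirror_descent_step(x, y, alpha_t=3.0, gamma_t=0.1))
```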

  11. Heavy Ball method (unconstrained optimization: $\min_{x \in \mathbb{R}^n} f(x)$)
      Algorithm 3: Heavy Ball from the Meta Algorithm
      1: Given the sequence of weights $\{\alpha_t = t\}$.
      2: for t = 1, 2, ..., T do
      3:   y-player plays FTL:
           $y_t \leftarrow \nabla f(\bar{x}_{t-1}) = \arg\min_{y \in \mathcal{Y}} \sum_{s=1}^{t-1} \alpha_s \ell_s(y)$, where $\bar{x}_{t-1} := \frac{\sum_{s=1}^{t-1} \alpha_s x_s}{A_{t-1}}$.
      4:   x-player plays Gradient Descent:
      5:   $x_t = x_{t-1} - \gamma_t \alpha_t \nabla h_t(x) = x_{t-1} - \gamma_t \alpha_t y_t = x_{t-1} - \gamma_t \alpha_t \nabla f(\bar{x}_{t-1})$.
      6: end for
      7: Output $(\bar{x}_T, \bar{y}_T) := \Big(\frac{\sum_{s=1}^{T} \alpha_s x_s}{A_T}, \frac{\sum_{s=1}^{T} \alpha_s y_s}{A_T}\Big)$.
      In terms of the averaged iterates:
      $\bar{x}_t = \bar{x}_{t-1} - \frac{\gamma_t \alpha_t^2}{A_t} \nabla f(\bar{x}_{t-1}) + \frac{\alpha_t A_{t-2}}{A_t \alpha_{t-1}} (\bar{x}_{t-1} - \bar{x}_{t-2})$.  (Heavy Ball)
      $\bar{x}_t = \bar{x}_{t-1} - \frac{\gamma_t \alpha_t^2}{A_t} \nabla f(\tilde{x}_t) + \frac{\alpha_t A_{t-2}}{A_t \alpha_{t-1}} (\bar{x}_{t-1} - \bar{x}_{t-2})$.  (Nesterov's alg.)
      The two methods differ only in where the gradient is evaluated.
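
A sketch that checks the first identity numerically: running the meta algorithm with FTL and gradient descent on a toy quadratic (the instance is mine, and since FTL has no history at $t = 1$, the initialization $y_1 = \nabla f(x_0)$ is my choice), the averaged iterates satisfy the heavy-ball recursion above exactly.

```python
import numpy as np

# Sanity check on a toy quadratic (an illustration, not the slides' experiment):
# run the meta algorithm with FTL for the y-player and gradient descent for the
# x-player, then verify that the weighted averages x_bar_t satisfy the
# heavy-ball recursion stated above. Here f(x) = (L/2)||x - b||^2, alpha_t = t.
L, b = 2.0, np.array([1.0, -2.0])
grad_f = lambda x: L * (x - b)
A = lambda t: t * (t + 1) / 2.0              # A_t = sum_{s <= t} s

T, gamma = 50, 1.0 / (4.0 * L)
x_t = np.zeros(2)                            # x_0; at t = 1, FTL has no history,
xbars, wsum = [], np.zeros(2)                # so we take y_1 = grad f(x_0).
for t in range(1, T + 1):
    anchor = xbars[-1] if xbars else x_t
    y = grad_f(anchor)                       # FTL best response: gradient at x_bar_{t-1}
    x_t = x_t - gamma * t * y                # gradient-descent step on alpha_t h_t
    wsum += t * x_t
    xbars.append(wsum / A(t))                # weighted average x_bar_t

ok = all(
    np.allclose(
        xbars[t - 1],
        xbars[t - 2]
        - gamma * t**2 / A(t) * grad_f(xbars[t - 2])
        + t * A(t - 2) / (A(t) * (t - 1)) * (xbars[t - 2] - xbars[t - 3]),
    )
    for t in range(3, T + 1)
)
print("heavy-ball recursion holds for t >= 3:", ok)   # expected: True
```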

  12. Analysis: $L$-smooth convex optimization problems
      y-player plays Optimistic-FTL:
      $y_t \leftarrow \nabla f(\tilde{x}_t) = \arg\min_{y \in \mathcal{Y}} \sum_{s=1}^{t-1} \alpha_s \ell_s(y) + \alpha_t \ell_{t-1}(y)$.
      $\alpha\text{-REG}^y := \sum_{t=1}^{T} \alpha_t \ell_t(y_t) - \min_{y \in \mathcal{Y}} \sum_{t=1}^{T} \alpha_t \ell_t(y) \le \sum_{t=1}^{T} \frac{L \alpha_t^2}{A_t} \|x_{t-1} - x_t\|^2$.
      x-player plays Mirror Descent:
      $x_t = \arg\min_{x \in \mathcal{K}} \gamma_t \langle x, \alpha_t \nabla f(\tilde{x}_t) \rangle + V_{x_{t-1}}(x)$.
      $\alpha\text{-REG}^x := \sum_{t=1}^{T} \alpha_t h_t(x_t) - \sum_{t=1}^{T} \alpha_t h_t(x^*) \le \frac{D}{\gamma_T} - \sum_{t=1}^{T} \frac{1}{2\gamma_t} \|x_{t-1} - x_t\|^2$,
      where $D$ is a constant such that $V_{x_t}(x^*) \le D$ for all $t$.
      Combining: $f(\bar{x}_T) - \min_{x \in \mathcal{X}} f(x) \le \frac{1}{A_T} \Big[ \frac{D}{\gamma_T} + \sum_{t=1}^{T} \big( \frac{\alpha_t^2 L}{A_t} - \frac{1}{2\gamma_t} \big) \|x_{t-1} - x_t\|^2 \Big] = O\big(\frac{LD}{T^2}\big)$,
      where the summation term is non-positive (and can be dropped) as long as $\gamma_t$ satisfies $\frac{1}{CL} \le \gamma_t \le \frac{1}{4L}$ for some constant $C > 4$.
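
For completeness, here is how the two regret bounds combine with the theorem from slide 8 when $\alpha_t = t$; the bookkeeping of constants below is my own, following the slide's stated conditions on $\gamma_t$.

```latex
% Combining the bounds with \alpha_t = t, A_T = T(T+1)/2, and 1/(CL) <= \gamma_t <= 1/(4L):
\begin{align*}
f(\bar{x}_T) - \min_{x \in \mathcal{X}} f(x)
  &\le \frac{\alpha\text{-REG}^x + \alpha\text{-REG}^y}{A_T} \\
  &\le \frac{1}{A_T}\Big[\frac{D}{\gamma_T}
      + \sum_{t=1}^{T}\Big(\frac{\alpha_t^2 L}{A_t} - \frac{1}{2\gamma_t}\Big)\|x_{t-1}-x_t\|^2\Big] \\
  &\le \frac{D}{\gamma_T A_T}
   \;\le\; \frac{CLD}{T(T+1)/2} \;=\; O\Big(\frac{LD}{T^2}\Big).
\end{align*}
% The sum can be dropped because \alpha_t^2 L / A_t = 2tL/(t+1) \le 2L \le 1/(2\gamma_t)
% whenever \gamma_t \le 1/(4L), so every term of the sum is non-positive.
```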

  13. Thank you!
      Other instances of the meta-algorithm:
      - Accelerated linear rate of Nesterov's method for strongly convex and smooth problems
      - Accelerated Proximal Method
      - Accelerated Frank-Wolfe
      Come to our poster #156!
