On Adaptive Strategies and Convex Optimization Algorithms
Joon Kwon, joint work with Panayotis Mertikopoulos
Institut de Mathématiques de Jussieu, Université Pierre-et-Marie-Curie, Paris, France
Workshop on Algorithms and Dynamics for Games and Optimization, Playa Blanca, Tongoy, Chile, October 2013
Framework

$(V, \|\cdot\|)$ a finite-dimensional normed space and $(V^*, \|\cdot\|_*)$ its dual; $C \subset V$ a convex compact set. Nature chooses a sequence $u_1, \dots, u_n, \dots \in V^*$.

▶ Choose $x_1 \in C$; $u_1$ is revealed; get payoff $\langle u_1 | x_1 \rangle$.
▶ ...
▶ At stage $n+1$, knowing $u_1, \dots, u_n$, choose $x_{n+1} \in C$; $u_{n+1}$ is revealed; get payoff $\langle u_{n+1} | x_{n+1} \rangle$.

A strategy is $\sigma = (\sigma_n)_{n \geq 1}$ with $\sigma_{n+1} : (V^*)^n \to C$, $(u_1, \dots, u_n) \mapsto x_{n+1}$. Goal: maximize $\sum_{k=1}^n \langle u_k | x_k \rangle$.
The Case of the Simplex

▶ $V = V^* = \mathbb{R}^d$
▶ $C = \Delta_d = \left\{ x \in \mathbb{R}^d_+ : \sum_{i=1}^d x_i = 1 \right\}$ ⇝ probability distributions on $\{1, \dots, d\}$
▶ Choose $x_{n+1} \in \Delta_d$; draw $i_{n+1} \in \{1, \dots, d\}$ according to $x_{n+1}$; get payoff $u_{n+1, i_{n+1}}$.

Then
$$\mathbb{E}\left[ \sum_{k=1}^n u_{k, i_k} \right] = \sum_{k=1}^n \langle u_k | x_k \rangle.$$
The Regret

Wish: a strategy $\sigma$ such that
$$\forall (u_n)_{n \geq 1}, \quad \limsup_{n \to +\infty} \frac{1}{n} \underbrace{\left( \max_{x \in C} \sum_{k=1}^n \langle u_k | x \rangle - \sum_{k=1}^n \langle u_k | x_k \rangle \right)}_{\text{Regret}} \leq 0.$$
Speed of convergence?
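As a sanity check on the definition, the regret of a fixed baseline can be computed numerically; a minimal sketch on the simplex, where the uniform-play baseline and the random payoff sequence are illustrative assumptions (for a linear payoff, the best fixed $x$ is attained at a vertex):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 1000
payoffs = rng.uniform(-1, 1, size=(n, d))  # Nature's sequence u_1, ..., u_n

# Baseline strategy (illustrative): always play the uniform distribution.
x = np.full(d, 1.0 / d)
realized = sum(payoffs[k] @ x for k in range(n))

# For linear payoffs on the simplex, the best fixed point is a vertex,
# i.e. the coordinate with the largest cumulative payoff.
best_fixed = payoffs.sum(axis=0).max()

regret = best_fixed - realized
print(regret / n)  # average regret; a no-regret strategy drives this to 0
```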
Extension to Convex Losses

▶ $\ell_n : C \to \mathbb{R}$ convex loss functions
▶ Loss: $\ell_n(x_n)$

$$\sum_{k=1}^n \ell_k(x_k) - \min_{x \in C} \sum_{k=1}^n \ell_k(x) = \max_{x \in C} \sum_{k=1}^n \left( \ell_k(x_k) - \ell_k(x) \right)$$
$$\leq \max_{x \in C} \sum_{k=1}^n \langle \nabla \ell_k(x_k) | x_k - x \rangle$$
$$= \max_{x \in C} \sum_{k=1}^n \langle -\nabla \ell_k(x_k) | x \rangle - \sum_{k=1}^n \langle -\nabla \ell_k(x_k) | x_k \rangle$$
$$= \max_{x \in C} \sum_{k=1}^n \langle u_k | x \rangle - \sum_{k=1}^n \langle u_k | x_k \rangle, \quad \text{with } u_n = -\nabla \ell_n(x_n).$$
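The key inequality above is plain first-order convexity, $\ell(x_k) - \ell(x) \leq \langle \nabla\ell(x_k) | x_k - x \rangle$; a quick numerical check on a sample convex function (log-sum-exp is an illustrative choice, not from the talk):

```python
import numpy as np

# l convex gives l(x) >= l(xk) + <grad l(xk) | x - xk>, i.e.
# l(xk) - l(x) <= <grad l(xk) | xk - x>.
l = lambda x: np.log(np.exp(x).sum())         # log-sum-exp, a convex function
grad = lambda x: np.exp(x) / np.exp(x).sum()  # its gradient (softmax)

rng = np.random.default_rng(1)
for _ in range(200):
    xk, x = rng.normal(size=4), rng.normal(size=4)
    assert l(xk) - l(x) <= grad(xk) @ (xk - x) + 1e-12
```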
Convex Optimization

▶ $f : C \to \mathbb{R}$ a convex function, $\ell_n = f$

$$\frac{1}{n} \sum_{k=1}^n \ell_k(x_k) - \min_{x \in C} \frac{1}{n} \sum_{k=1}^n \ell_k(x) = \frac{1}{n} \sum_{k=1}^n f(x_k) - \min_{x \in C} f(x)$$
A Family of Strategies

$$u_1, u_2, \dots, u_n \in V^* \;\longrightarrow\; \sum_{k=1}^n u_k \in V^* \;\longrightarrow\; x_{n+1} = Q\left( \sum_{k=1}^n u_k \right), \qquad Q : V^* \to C.$$
$$Q_h : V^* \to C, \qquad y \mapsto \arg\max_{x \in C} \{ \langle y | x \rangle - h(x) \}$$
with $h : C \to \mathbb{R}$ convex:
▶ $h$ continuous ⇝ $Q_h(y)$ exists
▶ $h$ strictly convex ⇝ $Q_h(y)$ is unique

$$x_{n+1} = Q_h\left( \eta_n \sum_{k=1}^n u_k \right) = Q_h(y_n), \qquad \eta_n > 0 \text{ and nonincreasing}.$$
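For the entropy regularizer on the simplex, $Q_h$ has a closed softmax form (made explicit on a later slide); a small sketch checking the defining argmax property against random candidate points:

```python
import numpy as np

def Q_entropy(y):
    """Q_h for h(x) = sum_i x_i log x_i on the simplex: the softmax map."""
    z = np.exp(y - y.max())  # shift for numerical stability
    return z / z.sum()

def objective(y, x):
    """<y|x> - h(x), the quantity Q_h(y) maximizes over the simplex."""
    h = np.where(x > 0, x * np.log(x), 0.0).sum()
    return y @ x - h

rng = np.random.default_rng(2)
y = rng.normal(size=4)
x_star = Q_entropy(y)
for _ in range(100):
    x = rng.dirichlet(np.ones(4))  # random point of the simplex
    assert objective(y, x) <= objective(y, x_star) + 1e-9
```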
Some Known Strategies and Algorithms

▶ Exponential Weight Algorithm (EWA)
▶ $1/\sqrt{n}$-Exponential Weight Algorithm ($1/\sqrt{n}$-EWA)
▶ Vanishingly Smooth Fictitious Play (VSFP)
▶ Smooth Fictitious Play (SFP)
▶ Projected Subgradient Method (PSM)
▶ Mirror Descent (MD)
▶ Online Gradient Descent (OGD)
▶ Online Mirror Descent (OMD)
▶ Follow the Regularized Leader (FRL)
Exponential Weight Algorithm

▶ $C = \Delta_d$
$$x_{n+1, i} = \frac{\exp\left( \eta \sum_{k=1}^n u_{k,i} \right)}{\sum_{j=1}^d \exp\left( \eta \sum_{k=1}^n u_{k,j} \right)}.$$
With $h(x) = \sum_{i=1}^d x_i \log x_i$, we get $Q_h(y)_i = e^{y_i} / \sum_{j=1}^d e^{y_j}$, hence
$$x_{n+1} = Q_h\left( \eta \sum_{k=1}^n u_k \right).$$
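A minimal implementation of this update; the random payoffs and the step size $\eta = \sqrt{\log d / n}$ are illustrative assumptions:

```python
import numpy as np

def exponential_weights(payoffs, eta):
    """Play x_{n+1} proportional to exp(eta * cumulative payoff)."""
    n, d = payoffs.shape
    plays = np.empty((n, d))
    cum = np.zeros(d)
    for k in range(n):
        z = np.exp(eta * (cum - cum.max()))  # stabilized softmax
        plays[k] = z / z.sum()
        cum += payoffs[k]
    return plays

rng = np.random.default_rng(3)
n, d = 500, 4
payoffs = rng.uniform(-1, 1, size=(n, d))
plays = exponential_weights(payoffs, eta=np.sqrt(np.log(d) / n))
regret = payoffs.sum(axis=0).max() - (payoffs * plays).sum()
print(regret / n)  # small average regret: the algorithm is no-regret
```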
Projected Subgradient Method

$$x_{n+1} = \arg\min_{x \in C} \| x - y_n \|_2, \qquad y_n = -\sum_{k=1}^n \gamma_k \nabla f(x_k)$$
$$x_{n+1} = \arg\min_{x \in C} \left\{ \|x\|_2^2 - 2 \langle y_n | x \rangle + \|y_n\|_2^2 \right\} = \arg\max_{x \in C} \left\{ \langle y_n | x \rangle - \tfrac{1}{2} \|x\|_2^2 \right\}$$
so that $h(x) = \tfrac{1}{2} \|x\|_2^2$ and $u_n = -\gamma_n \nabla f(x_n)$.
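A minimal sketch of the lazy form above, minimizing $f(x) = \frac{1}{2}\|x - p\|_2^2$ over the Euclidean unit ball; the target $p$, the ball as $C$, and the step sizes $\gamma_k = 1/\sqrt{k}$ are illustrative assumptions:

```python
import numpy as np

def project_ball(y):
    """Euclidean projection onto the unit ball (our choice of C here)."""
    norm = np.linalg.norm(y)
    return y if norm <= 1.0 else y / norm

p = np.array([2.0, 1.0])   # target outside the ball
grad_f = lambda x: x - p   # gradient of f(x) = ||x - p||^2 / 2

# Lazy form: project the cumulated step y_n = -sum_k gamma_k grad f(x_k).
y = np.zeros(2)
x = project_ball(y)
for k in range(1, 2001):
    y -= (1.0 / np.sqrt(k)) * grad_f(x)  # gamma_k = 1/sqrt(k)
    x = project_ball(y)

print(x)  # converges to p / ||p||, the constrained minimizer
```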
| Name | $C$ | $h$ | $\eta_n$ | $u_n$ | $\|\cdot\|$ | References |
|------|-----|-----|----------|-------|-------------|------------|
| EW | $\Delta_d$ | $\sum_{i=1}^d x_i \log x_i$ | $\eta$ | – | $\|\cdot\|_1$ | Littlestone, Warmuth 1994; Sorin 2009 |
| $1/\sqrt{n}$-EW | $\Delta_d$ | $\sum_{i=1}^d x_i \log x_i$ | $\eta/\sqrt{n}$ | – | $\|\cdot\|_1$ | Auer, Cesa-Bianchi, Gentile 2002 |
| VSFP | $\Delta_d$ | any | $\eta n^\alpha$, $\alpha \in (-1, 0)$ | – | $\|\cdot\|_1$ | Benaïm, Faure 2013 |
| SFP | $\Delta_d$ | any | $\eta/n$ | – | $\|\cdot\|_1$ | Fudenberg, Levine 1995; Benaïm, Hofbauer, Sorin 2006 |
| PSM | any | $\frac{1}{2}\|\cdot\|_2^2$ | 1 | $-\gamma_n \nabla f(x_n)$ | $\|\cdot\|_2$ | Polyak 1969? |
| MD | any | any | 1 | $-\gamma_n \nabla f(x_n)$ | any | Nemirovski, Yudin 1983; Beck, Teboulle 2003 |
| OGD | any | $\frac{1}{2}\|\cdot\|_2^2$ | 1 | $-\gamma_n \nabla f_n(x_n)$ | $\|\cdot\|_2$ | Zinkevich 2003 |
| OMD | any | any | $\eta$ | $-\nabla f_n(x_n)$ | any | Shalev-Shwartz 2007 |
| FRL | any | any | $\eta$ | – | any | Shalev-Shwartz 2007 |
Interrelations

(Diagram relating the strategies: FRL, MD, OGD, VSFP, SFP, $1/\sqrt{n}$-EW, OMD, PSM, EW.)
The Continuous-Time Counterpart

$$u : \mathbb{R}_+ \to V^* \text{ measurable}, \qquad \eta : \mathbb{R}_+ \to \mathbb{R}_+^* \text{ continuous and nonincreasing}.$$
Discrete time: $x_{n+1} = Q_h\left( \eta_n \sum_{k=1}^n u_k \right)$. Continuous time:
$$\tilde{x}_t = Q_h\left( \eta_t \int_0^t u_s \, ds \right) = Q_h(y_t).$$

Theorem. For every $(u_t)_{t \in \mathbb{R}_+}$ and every $t \geq 0$,
$$\max_{x \in C} \int_0^t \langle u_s | x \rangle \, ds - \int_0^t \langle u_s | \tilde{x}_s \rangle \, ds \leq \frac{h_{\max} - h_{\min}}{\eta_t}.$$
The Analysis

Goal:
$$\max_{x \in C} \int_0^t \langle u_s | x \rangle \, ds - \int_0^t \langle u_s | \tilde{x}_s \rangle \, ds \leq \frac{h_{\max} - h_{\min}}{\eta_t}.$$
For any $x \in C$,
$$\int_0^t \langle u_s | x \rangle \, ds = \frac{1}{\eta_t} \langle y_t | x \rangle \leq \frac{h^*(y_t)}{\eta_t} + \frac{h(x)}{\eta_t}$$
$$\leq \frac{h^*(0)}{\eta_0} + \int_0^t \underbrace{\frac{d}{ds}\left( \frac{h^*(y_s)}{\eta_s} \right)}_{\leq \, \langle u_s | \tilde{x}_s \rangle + h_{\min} \dot{\eta}_s / \eta_s^2} ds + \frac{h_{\max}}{\eta_t}$$
$$\leq -\frac{h_{\min}}{\eta_0} + \int_0^t \langle u_s | \tilde{x}_s \rangle \, ds + h_{\min} \left( \frac{1}{\eta_0} - \frac{1}{\eta_t} \right) + \frac{h_{\max}}{\eta_t} \qquad \text{(using } h^*(0) = -h_{\min}\text{)}$$
$$\leq \int_0^t \langle u_s | \tilde{x}_s \rangle \, ds + \frac{h_{\max} - h_{\min}}{\eta_t}.$$
Back to Discrete Time

The continuous-time bound:
$$\max_{x \in C} \int_0^t \langle u_s | x \rangle \, ds - \int_0^t \langle u_s | \tilde{x}_s \rangle \, ds \leq \frac{h_{\max} - h_{\min}}{\eta_t}.$$
Given $(u_n)_{n \geq 1}$, $h$, $(\eta_n)_{n \geq 1}$, set $u_t = u_{\lceil t \rceil}$ and let $\eta_t$ be a continuous interpolation of $\eta_n$. Then
$$x_{n+1} = Q_h(y_n), \quad y_n = \eta_n \sum_{k=1}^n u_k, \qquad \tilde{x}_t = Q_h(y_t), \quad y_t = \eta_t \int_0^t u_s \, ds.$$
To relate
$$\max_{x \in C} \sum_{k=1}^n \langle u_k | x \rangle - \sum_{k=1}^n \langle u_k | x_k \rangle \quad \text{to} \quad \max_{x \in C} \int_0^n \langle u_t | x \rangle \, dt - \int_0^n \langle u_t | \tilde{x}_t \rangle \, dt,$$
we must compare $\int_0^n \langle u_t | \tilde{x}_t \rangle \, dt$ with $\int_0^n \langle u_t | \tilde{x}_{\lfloor t \rfloor} \rangle \, dt$.
$$\left| \langle u_s | \tilde{x}_{\lfloor s \rfloor} \rangle - \langle u_s | \tilde{x}_s \rangle \right| = \left| \langle u_s | \tilde{x}_{\lfloor s \rfloor} - \tilde{x}_s \rangle \right| \leq \underbrace{\| u_s \|_*}_{\leq 1} \left\| \tilde{x}_{\lfloor s \rfloor} - \tilde{x}_s \right\|$$
$$\leq \left\| Q_h(y_{\lfloor s \rfloor}) - Q_h(y_s) \right\| \leq K \left\| y_s - y_{\lfloor s \rfloor} \right\|_*$$
$$\leq K \left\| \int_{\lfloor s \rfloor}^s \left( \eta_v u_v + (-\dot{\eta}_v) \int_0^v u_w \, dw \right) dv \right\|_* \leq K \left( \eta_s - s \dot{\eta}_s \right)$$
$$Q_h = \nabla h^*; \qquad h \;\; K\text{-strongly convex} \implies \nabla h^* \;\; \tfrac{1}{K}\text{-Lipschitz}.$$

Definition. $f$ is $C$-strongly convex wrt $\|\cdot\|$ if $\forall x, y$, $\forall \lambda \in [0, 1]$,
$$f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y) - \frac{C}{2} \lambda (1 - \lambda) \| y - x \|^2.$$

Examples: $\sum_{i=1}^d x_i \log x_i$ is 1-strongly convex wrt $\|\cdot\|_1$; $\frac{1}{2} \|\cdot\|_2^2$ is 1-strongly convex wrt $\|\cdot\|_2$.
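The 1-strong convexity of the entropy wrt $\|\cdot\|_1$ can be probed numerically against the definition (random simplex points; a check, not a proof):

```python
import numpy as np

def entropy_h(x):
    """h(x) = sum_i x_i log x_i, with the convention 0 log 0 = 0."""
    return np.where(x > 0, x * np.log(x), 0.0).sum()

rng = np.random.default_rng(4)
for _ in range(1000):
    x, y = rng.dirichlet(np.ones(5)), rng.dirichlet(np.ones(5))
    lam = rng.uniform()
    lhs = entropy_h(lam * x + (1 - lam) * y)
    rhs = (lam * entropy_h(x) + (1 - lam) * entropy_h(y)
           - 0.5 * lam * (1 - lam) * np.abs(y - x).sum() ** 2)
    assert lhs <= rhs + 1e-12  # the C = 1 strong-convexity inequality
```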
Theorem. Assume:
1. $h$ is $K$-strongly convex on $C$ wrt $\|\cdot\|$;
2. $(\eta_n)_{n \geq 1}$ is positive and nonincreasing;
3. $\eta_t$ is a continuous and nonincreasing interpolation of $(\eta_n)$;
4. $x_{n+1} = Q_h\left( \eta_n \sum_{k=1}^n u_k \right)$.

Then, for every sequence with $\| u_n \|_* \leq M$,
$$\max_{x \in C} \sum_{k=1}^n \langle u_k | x \rangle - \sum_{k=1}^n \langle u_k | x_k \rangle \leq \frac{h_{\max} - h_{\min}}{\eta_n} + \frac{M^2}{K} \int_0^n \left( \eta_t - t \dot{\eta}_t \right) dt.$$
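The bound can be instantiated and checked for the entropy regularizer on the simplex with a constant step $\eta_n = \eta$ (then $K = 1$, $h_{\max} - h_{\min} = \log d$, $\dot{\eta}_t = 0$, and the integral term reduces to $M^2 \eta n$); the payoff distribution and parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
d, n, eta, M = 5, 400, 0.05, 1.0
u = rng.uniform(-M, M, size=(n, d))  # guarantees ||u_n||_inf <= M

cum = np.zeros(d)
realized = 0.0
for k in range(n):
    z = np.exp(eta * (cum - cum.max()))  # x_{k+1} = Q_h(eta * sum of past u)
    x = z / z.sum()
    realized += u[k] @ x
    cum += u[k]

regret = u.sum(axis=0).max() - realized
bound = np.log(d) / eta + M**2 * eta * n  # theorem's right-hand side here
print(regret <= bound)
```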
| Name | Assumption | Bound on the regret |
|------|-----------|---------------------|
| EW | $\|u_n\|_\infty \leq 1$ | $\frac{\log d}{\eta} + \eta n$ |
| $1/\sqrt{n}$-EW | $\|u_n\|_\infty \leq 1$ | $\left( \frac{\log d}{\eta} + 3\eta \right) \sqrt{n}$ |
| VSFP | $\|u_n\|_\infty \leq 1$ | $\frac{h_{\max} - h_{\min}}{\eta} n^{-\alpha} + \frac{\eta (1 - \alpha)}{K (1 + \alpha)} n^{\alpha + 1}$ |
| SFP | $\|u_n\|_\infty \leq 1$ | $\frac{h_{\max} - h_{\min}}{\eta} n + \frac{\eta (1 + \log n)}{K}$ |
| PSM | $\|\nabla f\|_2 \leq M$ | $\frac{\|C\|^2 / 2 + M^2 \sum_{k=1}^n \gamma_k^2}{\sum_{k=1}^n \gamma_k}$ |
| MD | $\|\nabla f\|_* \leq M$ | $\frac{h_{\max} - h_{\min} + (M^2 / 2K) \sum_{k=1}^n \gamma_k^2}{\sum_{k=1}^n \gamma_k}$ |
| OGD | $\|\nabla f_n\|_2 \leq M$ | $\frac{\|C\|^2 / 2 + M^2 \sum_{k=1}^n \gamma_k^2}{\sum_{k=1}^n \gamma_k}$ |
| OMD | $\|\nabla f_n\|_* \leq M$ | $\frac{h_{\max} - h_{\min}}{\eta} + \frac{\eta M^2 n}{K}$ |
| FRL | $\|u_n\|_* \leq M$ | $\frac{h_{\max} - h_{\min}}{\eta} + \frac{\eta M^2 n}{K}$ |