  1. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. Alekh Agarwal (UC Berkeley), Ofer Dekel and Lin Xiao (Microsoft Research).

  2. Online Convex Optimization (Full-Info): a repeated game between a Player and an Adversary over a convex set $K$.

  3. At each round $t$, the Player picks a point $x_t \in K$.

  4. The Adversary then chooses a convex loss function $\ell_t$.

  5. The Player observes the entire function $\ell_t$, in particular the gradient $\nabla \ell_t(x_t)$, and updates $x_{t+1} = \Pi_K(x_t - \eta \nabla \ell_t(x_t))$, where $\Pi_K$ denotes Euclidean projection onto $K$.

  6. After $T$ rounds, the Player's goal is to minimize the regret $R_T = \sum_{t=1}^T \ell_t(x_t) - \min_{x \in K} \sum_{t=1}^T \ell_t(x)$.
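
As a point of reference, here is a minimal sketch of the full-information update in Python/NumPy; the unit-ball feasible set, the linear losses and the step size are illustrative assumptions rather than anything specified on the slides.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto a ball of the given radius (our stand-in for K)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def online_gradient_descent(grad_losses, d, eta):
    """Full-info OCO: play x_t, observe the gradient of loss_t at x_t, take a projected step."""
    x = np.zeros(d)  # x_1: an arbitrary starting point in K
    played = []
    for grad in grad_losses:
        played.append(x.copy())
        x = project_ball(x - eta * grad(x))  # x_{t+1} = Pi_K(x_t - eta * grad loss_t(x_t))
    return played

# Illustrative run with linear losses loss_t(x) = <c_t, x>, whose gradient is c_t.
rng = np.random.default_rng(0)
T, d = 100, 5
cs = [rng.normal(size=d) for _ in range(T)]
points = online_gradient_descent([lambda x, c=c: c for c in cs], d, eta=0.1)
```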

  7. Bandit Convex Optimization: the same Player-versus-Adversary game, but with far less feedback.

  8. At each round $t$, the Player again picks a point $x_t \in K$.

  9. The Adversary chooses $\ell_t$ but reveals only the single scalar value $\ell_t(x_t)$; the Player never sees the function $\ell_t$ or its gradient.

  10. Bandit Gradient Descent [FKM'05]: the Player runs a full-information style update on a sequence of points $x_t$ that it never actually plays.

  11. At each round $t$, the Player draws a unit vector $u_t$ uniformly at random and plays the perturbed point $y_t = x_t + \delta u_t$.

  12. The Adversary reveals the single value $\ell_t(y_t)$.

  13. From this one number the Player forms a gradient estimate $g_t$ and updates $x_{t+1} = \Pi_{(1-\xi)K}(x_t - \eta_t g_t)$; projecting onto the shrunken set $(1-\xi)K$ keeps the perturbed points inside $K$. Regret is measured at the points actually played: $R_T = \sum_{t=1}^T \ell_t(y_t) - \min_{x \in K} \sum_{t=1}^T \ell_t(x)$.
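
A minimal sketch of this one-point scheme, again assuming a unit-ball $K$; the step size, perturbation radius and shrinkage factor are placeholder choices.

```python
import numpy as np

def unit_vector(d, rng):
    """Draw a uniformly random point on the unit sphere."""
    u = rng.normal(size=d)
    return u / np.linalg.norm(u)

def bandit_gd_one_point(losses, d, eta, delta, xi, rng):
    """FKM'05-style bandit gradient descent: one function value per round."""
    x = np.zeros(d)
    played = []
    for loss in losses:
        u = unit_vector(d, rng)
        y = x + delta * u                 # the point actually played
        played.append(y)
        g = (d / delta) * loss(y) * u     # one-point gradient estimate
        x = x - eta * g
        norm = np.linalg.norm(x)          # project onto the shrunken set (1 - xi) K
        if norm > 1.0 - xi:
            x *= (1.0 - xi) / norm
    return played
```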

  14. A survey of known regret bounds:

                    Linear               Convex               Strongly Convex
                    Upper      Lower     Upper      Lower     Upper        Lower
      Full-Info     O(√T)      Ω(√T)     O(√T)      Ω(√T)     O(log T)     Ω(log T)

      Deterministic results against completely adaptive adversaries in Full-Info.

  15. A survey of known regret bounds:

                    Linear               Convex                 Strongly Convex
                    Upper      Lower     Upper        Lower     Upper         Lower
      Full-Info     O(√T)      Ω(√T)     O(√T)        Ω(√T)     O(log T)      Ω(log T)
      Bandit        O(√T)      Ω(√T)     O(T^{3/4})   Ω(√T)     O(T^{2/3})    Ω(√T)?

      Deterministic results against completely adaptive adversaries in Full-Info.
      High-probability results against adaptive adversaries for Bandit.

  16. The Multi-Point (MP) feedback setup. We want to interpolate between bandit and full information: the Player is allowed $k$ queries per round, playing points $y_{t,1}, \dots, y_{t,k}$, and the Adversary reveals the value of $\ell_t$ at every point picked. Regret is averaged over the points played:
      $R_T = \frac{1}{k} \sum_{t=1}^T \sum_{i=1}^k \ell_t(y_{t,i}) - \min_{x \in K} \sum_{t=1}^T \ell_t(x)$.

  17. A survey of known regret bounds:

                    Linear               Convex                 Strongly Convex
                    Upper      Lower     Upper        Lower     Upper         Lower
      Full-Info     O(√T)      Ω(√T)     O(√T)        Ω(√T)     O(log T)      Ω(log T)
      Bandit        O(√T)      Ω(√T)     O(T^{3/4})   Ω(√T)     O(T^{2/3})    Ω(√T)?
      MP Bandit     O(√T)      Ω(√T)     O(√T)        Ω(√T)     O(log T)      Ω(log T)

      Deterministic results against completely adaptive adversaries in Full-Info.
      High-probability results against adaptive adversaries for Bandit.

  18. Properties of the gradient estimator $g_t$ [FKM'05]: $g_t = \frac{d}{\delta}\, \ell_t(x_t + \delta u_t)\, u_t$. It is unbiased for linear functions and nearly unbiased for general convex functions. [Figure: $\ell_t$ near $x_t$, with the points $x_t - \delta$ and $x_t + \delta$ marked.]

  19. Regret bounds scale with $\|g_t\|$, and $\|g_t\|$ grows as $1/\delta$: the observed value $\ell_t(x_t + \delta u_t)$ stays bounded while the $d/\delta$ factor blows up as $\delta \to 0$. [Figure: the values $\ell_t(x_t - \delta)$ and $\ell_t(x_t + \delta)$ on the same one-dimensional picture.]
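
A quick numerical sanity check of the unbiasedness claim for a linear loss, with the $1/\delta$ scaling of $\|g_t\|$ visible in the last line; the sampled expectation and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d, delta, n = 5, 0.1, 1_000_000
c = rng.normal(size=d)                 # linear loss: loss(x) = <c, x>, true gradient c
x = rng.normal(size=d)

U = rng.normal(size=(n, d))
U /= np.linalg.norm(U, axis=1, keepdims=True)   # n uniform unit vectors u_t
vals = (x + delta * U) @ c                      # observed values loss(x + delta * u)
G = (d / delta) * vals[:, None] * U             # one-point estimates g_t

print(np.round(G.mean(axis=0), 2))       # close to the true gradient c (unbiased)
print(np.round(c, 2))
print(np.linalg.norm(G, axis=1).mean())  # typical norm of g_t scales like d/delta
```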

  20. Gradient Descent with two queries per round (GD2P). Estimates the gradient as $\tilde{g}_t = \frac{d}{2\delta}\left(\ell_t(x_t + \delta u_t) - \ell_t(x_t - \delta u_t)\right) u_t$ and updates $x_{t+1} = \Pi_{(1-\xi)K}(x_t - \eta\, \tilde{g}_t)$.

  21. At each round $t$, the Player plays the symmetric pair of points $\{y_{t,1}, y_{t,2}\} = \{x_t + \delta u_t,\; x_t - \delta u_t\}$.

  22. The Adversary reveals both values, $\ell_1(y_{1,1})$ and $\ell_1(y_{1,2})$ in the first round and $\ell_t(y_{t,1})$, $\ell_t(y_{t,2})$ in general.

  23. The Player forms $\tilde{g}_t$ from the two values and takes the projected gradient step above.
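
A minimal sketch of GD2P under the same unit-ball assumption as the earlier snippets; the parameters here are placeholders, not the tuned values from the theorems below.

```python
import numpy as np

def gd2p(losses, d, eta, delta, xi, rng):
    """Two-point bandit gradient descent (GD2P sketch).

    Each round queries loss_t at x_t + delta*u and x_t - delta*u,
    forms the two-point estimate, and takes a projected step.
    Assumes K is the unit Euclidean ball.
    """
    x = np.zeros(d)
    played = []
    for loss in losses:
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        y1, y2 = x + delta * u, x - delta * u            # the two queried points
        played.append((y1, y2))
        g = (d / (2 * delta)) * (loss(y1) - loss(y2)) * u
        x = x - eta * g
        norm = np.linalg.norm(x)                         # project onto (1 - xi) K
        if norm > 1.0 - xi:
            x *= (1.0 - xi) / norm
    return played
```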

  24. Properties of the gradient estimator $\tilde{g}_t$. Recall $g_t = \frac{d}{\delta}\, \ell_t(x_t + \delta u_t)\, u_t$ and $\tilde{g}_t = \frac{d}{2\delta}\left(\ell_t(x_t + \delta u_t) - \ell_t(x_t - \delta u_t)\right) u_t$. The two are identical in expectation, $\mathbb{E}\, \tilde{g}_t = \mathbb{E}\, g_t$: since $u_t$ and $-u_t$ are equally likely, the second query contributes the same expectation as the first.

  25. Unlike $g_t$, the estimator $\tilde{g}_t$ has bounded norm: $\|\tilde{g}_t\| = \frac{d}{2\delta}\, |\ell_t(x_t + \delta u_t) - \ell_t(x_t - \delta u_t)|\, \|u_t\|$.

  26. Because $\ell_t$ is $G$-Lipschitz, $|\ell_t(x_t + \delta u_t) - \ell_t(x_t - \delta u_t)| \le G \|2\delta u_t\| = 2\delta G$, hence $\|\tilde{g}_t\| \le dG$.
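
The contrast between the two estimators is easy to see numerically. This spot check uses an arbitrary 1-Lipschitz loss as a stand-in; everything in it is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
loss = lambda x: np.sqrt(1.0 + x @ x)   # 1-Lipschitz stand-in loss (G = 1)
x = rng.normal(size=d)

for delta in (1e-1, 1e-2, 1e-3):
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)
    g_one = (d / delta) * loss(x + delta * u) * u
    g_two = (d / (2 * delta)) * (loss(x + delta * u) - loss(x - delta * u)) * u
    # One-point norm grows like 1/delta; two-point norm stays <= d*G = 5.
    print(delta, round(np.linalg.norm(g_one), 1), round(np.linalg.norm(g_two), 3))
```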

  27. Regret analysis for gradient descent with two queries. Assumptions: a bounded non-empty set, $rB \subseteq K \subseteq DB$ for the unit ball $B$; Lipschitz losses, $|\ell_t(x) - \ell_t(y)| \le G\|x - y\|$ for all $x, y \in K$ and all $t$; and $\sigma_t$-strong convexity, $\ell_t(y) \ge \ell_t(x) + \langle \nabla \ell_t(x), y - x \rangle + \frac{\sigma_t}{2}\|x - y\|^2$.

      Theorem. Under the above assumptions, let $\sigma_1 > 0$. If the GD2P algorithm is run with $\eta_t = \frac{1}{\sigma_{1:t}}$ (where $\sigma_{1:t} = \sum_{s=1}^t \sigma_s$), $\delta = \frac{\log T}{T}$ and $\xi = \frac{\delta}{r}$, then for any $x \in K$,
      $\mathbb{E} \sum_{t=1}^T \frac{1}{2}\left(\ell_t(y_{t,1}) + \ell_t(y_{t,2})\right) - \mathbb{E} \sum_{t=1}^T \ell_t(x) \le \frac{d^2 G^2}{2} \sum_{t=1}^T \frac{1}{\sigma_{1:t}} + G\left(3 + \frac{D}{r}\right)\log T$.

  28. Regret bound for convex, Lipschitz functions.

      Corollary. Suppose the set $K$ is bounded and non-empty, and $\ell_t$ is convex and $G$-Lipschitz for all $t$. If the GD2P algorithm is run with $\eta_t = \frac{1}{\sqrt{T}}$, $\delta = \frac{\log T}{T}$ and $\xi = \frac{\delta}{r}$, then
      $\mathbb{E} \sum_{t=1}^T \frac{1}{2}\left(\ell_t(y_{t,1}) + \ell_t(y_{t,2})\right) - \min_{x \in K} \mathbb{E} \sum_{t=1}^T \ell_t(x) \le (d^2 G^2 + D^2)\sqrt{T} + G\left(3 + \frac{D}{r}\right)\log T$.

      This is optimal, by the matching lower bound in the full-information setup. The bound also holds with high probability against adaptive adversaries.

  29. Regret bound for strongly convex, Lipschitz functions.

      Corollary. Suppose the set $K$ is bounded and non-empty, and $\ell_t$ is $\sigma$-strongly convex and $G$-Lipschitz for all $t$. If the GD2P algorithm is run with $\eta_t = \frac{1}{\sigma t}$, $\delta = \frac{\log T}{T}$ and $\xi = \frac{\delta}{r}$, then
      $\mathbb{E} \sum_{t=1}^T \frac{1}{2}\left(\ell_t(y_{t,1}) + \ell_t(y_{t,2})\right) - \min_{x \in K} \mathbb{E} \sum_{t=1}^T \ell_t(x) \le \left(\frac{d^2 G^2}{\sigma} + G\left(3 + \frac{D}{r}\right)\right)\log T$.

      This is optimal, by the matching lower bound in the full-information setup.

  30. Extension to other gradient estimators. Conditions on the estimator:
      Bounded exploration (BE): $\|x_t - y_{t,i}\| \le \delta$.
      Bounded gradient estimator (BG): $\|\tilde{g}_t\| \le G_1$.
      Approximately unbiased (AU): $\|\mathbb{E}_t\, \tilde{g}_t - \nabla \ell_t(x_t)\| \le c\delta$.

      Theorem. Let $K$ be bounded and non-empty, and let $\ell_t$ be $\sigma_t$-strongly convex with $\sigma_1 > 0$. For any gradient estimator satisfying the conditions above, the regret of the GD2P algorithm is bounded as
      $\mathbb{E} \sum_{t=1}^T \frac{1}{2}\left(\ell_t(y_{t,1}) + \ell_t(y_{t,2})\right) - \mathbb{E} \sum_{t=1}^T \ell_t(x) \le \frac{G_1^2}{2} \sum_{t=1}^T \frac{1}{\sigma_{1:t}} + G\left(1 + 2c + \frac{D}{r}\right)\log T$.

  31. Analysis of other estimators for smooth functions. Need to establish conditions (BE), (BG) and (AU). Smoothness assumption: $\ell_t(y) \le \ell_t(x) + \langle \nabla \ell_t(x), y - x \rangle + \frac{L}{2}\|x - y\|^2$. Examples: the squared $\ell_p$ norm $\|x - \theta\|_p^2$ for $p \ge 2$; the quadratic loss $(y - w^T x)^2$ for bounded $x$; the logistic loss $\log(1 + \exp(-w^T x))$. [Figure: a smooth loss $\ell(x)$.]
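
As an illustration, the smoothness inequality can be spot-checked numerically for the logistic loss; the smoothness constant $L = \|x\|^2/4$ and the random vectors below are assumptions made for this demo.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
x_feat = rng.normal(size=d)  # fixed data vector; the loss is a function of w

def logistic(w):
    return np.log1p(np.exp(-w @ x_feat))

def grad_logistic(w):
    z = -w @ x_feat
    return -x_feat * (1.0 / (1.0 + np.exp(-z)))  # -x * sigmoid(-w.x)

# Smoothness: loss(v) <= loss(w) + <grad(w), v - w> + (L/2) * ||v - w||^2,
# with L = ||x_feat||^2 / 4 valid for the logistic loss.
L = (x_feat @ x_feat) / 4.0
w, v = rng.normal(size=d), rng.normal(size=d)
lhs = logistic(v)
rhs = logistic(w) + grad_logistic(w) @ (v - w) + 0.5 * L * (v - w) @ (v - w)
print(lhs <= rhs + 1e-12)  # True
```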

  32. A Randomized Coordinate Descent algorithm. Pick a coordinate $i_t \in \{1, \dots, d\}$ uniformly at random. Play $y_{t,1} = x_t + \delta e_{i_t}$ and $y_{t,2} = x_t - \delta e_{i_t}$. Set $\tilde{g}_t = \frac{d}{2\delta}\left(\ell_t(y_{t,1}) - \ell_t(y_{t,2})\right) e_{i_t}$.

  33. For smooth losses, condition (AU) holds: $\|\mathbb{E}_t\, \tilde{g}_t - \nabla \ell_t(x_t)\| \le \frac{\sqrt{d}\, L\, \delta}{4}$. This yields the same regret bound as before, with only one-dimensional gradient updates.
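
A minimal sketch of one round of this variant, with the same illustrative unit-ball projection and placeholder parameters as the earlier snippets.

```python
import numpy as np

def coordinate_bandit_step(loss, x, eta, delta, xi, rng):
    """One round of the randomized coordinate-descent variant.

    Queries loss at x +/- delta along a random coordinate and takes a
    one-dimensional projected gradient step. Assumes K is the unit ball.
    """
    d = x.shape[0]
    i = rng.integers(d)                       # random coordinate i_t (0-based here)
    e = np.zeros(d)
    e[i] = 1.0
    y1, y2 = x + delta * e, x - delta * e     # the two queried points
    g = (d / (2 * delta)) * (loss(y1) - loss(y2)) * e
    x_new = x - eta * g
    norm = np.linalg.norm(x_new)              # project onto (1 - xi) K
    if norm > 1.0 - xi:
        x_new *= (1.0 - xi) / norm
    return x_new, (y1, y2)
```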
