On the Convergence of No-regret Learning in Selfish Routing



1. On the Convergence of No-regret Learning in Selfish Routing. ICML 2014, Beijing, June 23, 2014. Walid Krichene (UC Berkeley, walid@cs.berkeley.edu), Benjamin Drighès (École Polytechnique, benjamin.drighes@polytechnique.edu), Alexandre Bayen (UC Berkeley, bayen@berkeley.edu).

2. Introduction. Routing game: players choose routes. Population distributions: µ(t) ∈ ∆^{P_1} × ··· × ∆^{P_K}. Nash equilibria: the set N. Under no-regret dynamics, the time average µ̄(t) = (1/t) Σ_{τ≤t} µ(τ) → N. Does µ(t) itself converge to N?

3. Outline. 1. Online learning in the routing game. 2. Convergence of µ̄(t). 3. Convergence of µ(t).

4. Routing game. Figure: Example network (nodes v0–v6). Directed graph (V, E). Population X_k with path set P_k.

5. Routing game. Figure: Example network. Directed graph (V, E). Population X_k with path set P_k. Player x ∈ X_k plays a distribution over paths π(x) ∈ ∆^{P_k}. Population distribution over paths: µ^k ∈ ∆^{P_k}, with µ^k = ∫_{X_k} π(x) dm(x). Loss on path p: ℓ^k_p(µ).


7. Online learning model. Each player maintains a distribution π(t) ∈ ∆^{P_1}, samples a path p ∼ π(t), discovers the loss vector ℓ(t) ∈ [0, 1]^{P_1}, then updates to π(t+1).

8. The Hedge algorithm. Update the distribution according to the observed loss: π_p(t+1) ∝ π_p(t) · exp(−η_t ℓ^k_p(t)).
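
A minimal sketch of this multiplicative-weights step (not from the slides; the function name and array layout are illustrative):

```python
import numpy as np

def hedge_update(pi, loss, eta):
    """One Hedge step: reweight each path by exp(-eta * loss_p), then renormalize."""
    weights = pi * np.exp(-eta * loss)   # loss holds the observed per-path losses in [0, 1]
    return weights / weights.sum()

# Example: three paths, uniform start, learning rate 0.5.
pi = np.ones(3) / 3
print(hedge_update(pi, np.array([0.2, 0.9, 0.5]), eta=0.5))
```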

9. Nash equilibria. Definition: µ ∈ N if, for every population k and every path p ∈ P_k with positive mass, ℓ^k_p(µ) ≤ ℓ^k_{p'}(µ) for all p' ∈ P_k. How to compute Nash equilibria?

10. Nash equilibria. Definition: µ ∈ N if, for every population k and every path p ∈ P_k with positive mass, ℓ^k_p(µ) ≤ ℓ^k_{p'}(µ) for all p' ∈ P_k. How to compute Nash equilibria? Convex formulation.
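
The equilibrium condition above is easy to check numerically. A small illustrative sketch (the callable loss_fn, the argument layout, and the tolerance are assumptions, not part of the slides):

```python
import numpy as np

def is_nash(mu, loss_fn, tol=1e-6):
    """Return True if, in every population, each path carrying positive mass
    has (near-)minimal loss -- the Nash condition from the slide."""
    for mu_k, loss_k in zip(mu, loss_fn(mu)):
        if np.any((mu_k > 0) & (loss_k > loss_k.min() + tol)):
            return False
    return True
```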

11. Nash equilibria. Convex potential function: V(µ) = Σ_e ∫_0^{(Mµ)_e} c_e(u) du. V is convex and ∇_{µ^k} V(µ) = ℓ^k(µ). The minimizer is not unique. How do players find a Nash equilibrium? Iterative play.

12. Nash equilibria. Convex potential function: V(µ) = Σ_e ∫_0^{(Mµ)_e} c_e(u) du. V is convex and ∇_{µ^k} V(µ) = ℓ^k(µ). The minimizer is not unique. How do players find a Nash equilibrium? Iterative play. Ideally, the dynamics are distributed and have reasonable information requirements.
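
A minimal sketch of this potential on a concrete instance, assuming M is the edge-path incidence matrix and c_e are the per-edge congestion functions (the two-path example is made up for illustration):

```python
import numpy as np
from scipy.integrate import quad

def potential(mu, M, edge_costs):
    """Potential from the slide: V(mu) = sum_e integral_0^{(M mu)_e} c_e(u) du."""
    edge_flows = M @ mu                                    # aggregate flow on each edge
    return sum(quad(c, 0.0, f)[0] for c, f in zip(edge_costs, edge_flows))

M = np.array([[1.0, 1.0],                                  # edge 0 is used by both paths
              [0.0, 1.0]])                                 # edge 1 is used only by path 1
edge_costs = [lambda u: 1.0 + u, lambda u: 0.5 + 2.0 * u]  # increasing edge costs
print(potential(np.array([0.5, 0.5]), M, edge_costs))
```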

13. Assume sublinear regret dynamics. Losses are in [0, 1]. The expected loss of player x at time t is ⟨π(t)(x), ℓ^k(µ(t))⟩. Discounted regret: r̄(T)(x) = [Σ_{t≤T} γ_t ⟨π(t)(x), ℓ^k(µ(t))⟩ − min_p Σ_{t≤T} γ_t ℓ^k_p(µ(t))] / Σ_{t≤T} γ_t.

14. Assume sublinear regret dynamics. Losses are in [0, 1]. The expected loss of player x at time t is ⟨π(t)(x), ℓ^k(µ(t))⟩. Discounted regret: r̄(T)(x) = [Σ_{t≤T} γ_t ⟨π(t)(x), ℓ^k(µ(t))⟩ − min_p Σ_{t≤T} γ_t ℓ^k_p(µ(t))] / Σ_{t≤T} γ_t. Assumptions on the discount factors: γ_t > 0, γ_t ↓ 0, Σ_t γ_t = ∞.
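
A short sketch of this quantity for one player's realized trajectory (the array names and shapes are assumptions):

```python
import numpy as np

def discounted_regret(pis, losses, gammas):
    """Discounted regret after T steps.

    pis    : (T, P) array of the player's distributions pi(t)
    losses : (T, P) array of per-path losses in [0, 1]
    gammas : (T,)   array of positive discount factors gamma_t
    """
    expected = np.sum(gammas * np.einsum('tp,tp->t', pis, losses))  # discounted expected loss
    best_fixed = np.min(gammas @ losses)                            # best fixed path in hindsight
    return (expected - best_fixed) / gammas.sum()
```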

15. Convergence to Nash equilibria. Population regret: r̄^k(T) = (1/m(X_k)) ∫_{X_k} r̄(T)(x) dm(x). Convergence of averages to Nash equilibria: if the update has sublinear population regret, then the discounted average µ̄(T) = Σ_{t≤T} γ_t µ(t) / Σ_{t≤T} γ_t converges: lim_{T→∞} d(µ̄(T), N) = 0.

16. Convergence to Nash equilibria. Population regret: r̄^k(T) = (1/m(X_k)) ∫_{X_k} r̄(T)(x) dm(x). Convergence of averages to Nash equilibria: if the update has sublinear population regret, then the discounted average µ̄(T) = Σ_{t≤T} γ_t µ(t) / Σ_{t≤T} γ_t converges: lim_{T→∞} d(µ̄(T), N) = 0. Proof idea: show V(µ̄(T)) − V(µ*) ≤ Σ_k r̄^k(T). A similar result appears in Blum et al. (2006).
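
One way the key inequality in this proof sketch can be assembled from facts stated on earlier slides (V convex, ∇_{µ^k} V = ℓ^k); an outline under those assumptions, not the slides' full argument:

```latex
\begin{align*}
V(\bar\mu(T)) - V(\mu^\ast)
 &\le \frac{\sum_{t\le T}\gamma_t\bigl(V(\mu(t)) - V(\mu^\ast)\bigr)}{\sum_{t\le T}\gamma_t}
   && \text{(Jensen, $V$ convex)}\\
 &\le \frac{\sum_{t\le T}\gamma_t\sum_k\bigl\langle \ell^k(\mu(t)),\,\mu^k(t)-\mu^{\ast k}\bigr\rangle}{\sum_{t\le T}\gamma_t}
   && \text{(first-order convexity, } \nabla_{\mu^k}V=\ell^k\text{)}
\end{align*}
```

Comparing each µ^{∗k} against the best fixed path in population k then bounds the right-hand side by the population regrets, up to the population normalization.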

17. Convergence of a dense subsequence. Proposition: under any algorithm with sublinear discounted regret, a dense subsequence of (µ(t))_t converges to N. That is, the subsequence (µ(t))_{t∈𝒯} converges, where 𝒯 is dense: lim_{T→∞} [Σ_{t∈𝒯, t≤T} γ_t] / [Σ_{t≤T} γ_t] = 1.

18. Convergence of a dense subsequence. Proposition: under any algorithm with sublinear discounted regret, a dense subsequence of (µ(t))_t converges to N. That is, the subsequence (µ(t))_{t∈𝒯} converges, where 𝒯 is dense: lim_{T→∞} [Σ_{t∈𝒯, t≤T} γ_t] / [Σ_{t≤T} γ_t] = 1. Proof: absolute Cesàro convergence implies convergence of a dense subsequence.

19. Example: Hedge with learning rates γ_t. π_p(t+1) ∝ π_p(t) · exp(−η_t ℓ^k_p(t)). Regret bound: under Hedge with η_t = γ_t, r̄(T)(x) ≤ [ρ ln(1/π_min(0)(x)) + c Σ_{t≤T} γ_t²] / Σ_{t≤T} γ_t, where π_min(0)(x) is the smallest initial path probability and ρ, c are constants.
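
For the step sizes used in the simulations that follow, γ_t = 1/(10 + t), this bound indeed vanishes; a quick check of the two sums involved:

```latex
\sum_{t \le T} \gamma_t = \sum_{t \le T} \frac{1}{10+t} \sim \ln T \to \infty,
\qquad
\sum_{t \le T} \gamma_t^2 \le \sum_{t \ge 1} \frac{1}{(10+t)^2} < \infty,
\qquad\text{so}\quad
\bar r(T)(x) = O\!\left(\frac{1}{\ln T}\right) \to 0 .
```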

20. Simulations. Figure: Example network.

21. Simulations. Figure: Path losses ℓ^k_p(µ(τ)) and strategies under the Hedge algorithm with γ_τ = 1/(10 + τ). Population 1 uses paths p0 = (v0, v4, v5, v1), p1 = (v0, v4, v6, v1), p2 = (v0, v1); population 2 uses paths p3 = (v2, v4, v5, v3), p4 = (v2, v4, v6, v3), p5 = (v2, v3). Both populations start from the uniform distribution (µ^1(0), µ^2(0) uniform), and µ^1(τ), µ^2(τ) converge to a Nash equilibrium as τ → ∞.
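
A minimal end-to-end sketch of this kind of experiment (a toy two-population instance with made-up affine path losses; it does not reproduce the network or cost functions from the slides):

```python
import numpy as np

def path_losses(mu1, mu2):
    """Illustrative losses in [0, 1]: the first path of each population shares a congested edge."""
    shared = mu1[0] + mu2[0]
    l1 = np.array([0.1 + 0.4 * shared, 0.3 + 0.4 * mu1[1], 0.7])
    l2 = np.array([0.1 + 0.4 * shared, 0.3 + 0.4 * mu2[1], 0.7])
    return l1, l2

mu1 = np.ones(3) / 3            # uniform initial distributions, as in the figure
mu2 = np.ones(3) / 3
for t in range(500):
    gamma = 1.0 / (10 + t)      # step sizes gamma_t = 1/(10 + t) from the slides
    l1, l2 = path_losses(mu1, mu2)
    mu1 *= np.exp(-gamma * l1); mu1 /= mu1.sum()   # Hedge update, population 1
    mu2 *= np.exp(-gamma * l2); mu2 /= mu2.sum()   # Hedge update, population 2

print(mu1, mu2)                 # population path distributions after 500 rounds
```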

22. Sufficient conditions for convergence of (µ(t))_t. We have µ̄(t) → N.

23. Sufficient conditions for convergence of (µ(t))_t. We have µ̄(t) → N. Sufficient condition: if V(µ(t)) converges (the sequence µ(t) itself need not converge), then V(µ(t)) → V*, and hence µ(t) → N, since V is continuous and µ(t) lies in the compact set ∆.

24. Replicator dynamics. Imagine an underlying continuous time in which the updates happen at times γ_1, γ_1 + γ_2, .... Figure: Underlying continuous time (update instants on the time axis).

25. Replicator dynamics. Imagine an underlying continuous time in which the updates happen at times γ_1, γ_1 + γ_2, .... Figure: Underlying continuous time. In the update equation µ_p(t+1) ∝ µ_p(t) · exp(−γ_t ℓ_p(t)), take γ_t → 0. We obtain an autonomous ODE, the replicator equation: for all p ∈ P_k, dµ^k_p/dt = µ^k_p (⟨ℓ^k(µ), µ^k⟩ − ℓ^k_p(µ)). (1) This equation also appears in evolutionary game theory.
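
A small sketch integrating the replicator equation (1) for a single population (the affine loss function is illustrative, not the slides' network):

```python
import numpy as np
from scipy.integrate import solve_ivp

def replicator_rhs(t, mu, loss_fn):
    """Replicator equation: d mu_p / dt = mu_p * (<loss(mu), mu> - loss_p(mu))."""
    loss = loss_fn(mu)
    return mu * (loss @ mu - loss)

loss_fn = lambda mu: np.array([0.1 + 0.8 * mu[0], 0.3 + 0.4 * mu[1], 0.7])
mu0 = np.ones(3) / 3                                    # start from the uniform distribution
sol = solve_ivp(replicator_rhs, (0.0, 100.0), mu0, args=(loss_fn,))
print(sol.y[:, -1])                                     # approaches a stationary point of (1)
```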

26. Replicator dynamics. Replicator equation: for all p ∈ P_k, dµ^k_p/dt = µ^k_p (⟨ℓ^k(µ), µ^k⟩ − ℓ^k_p(µ)).

27. Replicator dynamics. Replicator equation: for all p ∈ P_k, dµ^k_p/dt = µ^k_p (⟨ℓ^k(µ), µ^k⟩ − ℓ^k_p(µ)). Theorem (Fischer and Vöcking, 2004): every solution of the ODE (1) converges to the set of its stationary points.

28. Replicator dynamics. Replicator equation: for all p ∈ P_k, dµ^k_p/dt = µ^k_p (⟨ℓ^k(µ), µ^k⟩ − ℓ^k_p(µ)). Theorem (Fischer and Vöcking, 2004): every solution of the ODE (1) converges to the set of its stationary points. Proof: V is a Lyapunov function for the dynamics.
