  1. Escaping Saddle Points in Constant Dimensional Spaces: an Agent-based Modeling Perspective. Grant Schoenebeck, University of Michigan; Fang-Yi Yu, Harvard University

  2. Results • Analyze the convergence rate of a family of stochastic processes • Three related applications: evolutionary game theory, dynamics on social networks, and stochastic gradient descent [Venn diagram of the three areas]

  3. Target Audience [Venn diagram: Evolutionary Game Theory, Dynamics on Social Networks, Stochastic Gradient Descent]

  4. Target Audience [the same Venn diagram, rescaled]

  5. Target Audience (still not-to-scale) [Venn diagram: Evolutionary Game Theory, Dynamics on Social Networks, Stochastic Gradient Descent]

  6. Outline • Escaping saddle points [diagram: evolutionary game theory, dynamics on social networks, stochastic gradient descent]

  7. Outline • Escaping saddle points • Case study: dynamics on social networks [same diagram]

  8. ESCAPING SADDLE POINTS: upper bounds and lower bounds

  9. Reinforced random walk with F. A discrete-time stochastic process {X_k : k = 0, 1, …} in ℝ^d that admits the representation X_{k+1} − X_k = (1/n)(F(X_k) + U_k). [Diagram: the step from X_k to X_{k+1} decomposed into (1/n)F(X_k) and (1/n)U_k]

  10. Reinforced random walk with F. A discrete-time stochastic process {X_k : k = 0, 1, …} in ℝ^d with X_{k+1} − X_k = (1/n)(F(X_k) + U_k). • Expected difference (drift), F(X_k) [diagram as before]

  11. Reinforced random walk with F. A discrete-time stochastic process {X_k : k = 0, 1, …} in ℝ^d with X_{k+1} − X_k = (1/n)(F(X_k) + U_k). • Expected difference (drift), F(X_k) • Unbiased noise, U_k [diagram as before]

  12. Reinforced random walk with F. A discrete-time stochastic process {X_k : k = 0, 1, …} in ℝ^d with X_{k+1} − X_k = (1/n)(F(X_k) + U_k). • Expected difference (drift), F(X_k) • Unbiased noise, U_k • Step size, 1/n [diagram as before]
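To make the template concrete, here is a minimal Python sketch of one trajectory of such a process; the function name simulate_rrw, the Gaussian form of the noise U_k, and all parameters are illustrative assumptions, not from the talk.

```python
import numpy as np

def simulate_rrw(F, n, steps, x0, rng=None):
    """One trajectory of X_{k+1} - X_k = (1/n) * (F(X_k) + U_k)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    traj = [x.copy()]
    for _ in range(steps):
        u = rng.standard_normal(x.shape)  # placeholder zero-mean noise U_k
        x = x + (F(x) + u) / n
        traj.append(x.copy())
    return np.vstack(traj)
```

For instance, simulate_rrw(lambda x: -x, n=1000, steps=5000, x0=0.5) runs a walk whose drift pulls it toward the attracting fixed point 0.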

  13. Examples. A discrete-time Markov process {X_k : k = 0, 1, …} in ℝ^d with X_{k+1} − X_k = (1/n)(F(X_k) + U_k). • Agent-based models with n agents: evolutionary games, dynamics on social networks • Heuristic local search algorithms with uniform step size 1/n

  14. Node Dynamic on complete graphs [SY18] • Let f_ND : [0,1] → [0,1]; n agents interact on a complete graph • Each agent v has an initial binary state C_0(v) ∈ {0,1} • At round k: pick a node v uniformly at random; compute the fraction of opinion 1, X_k = |C_k^{−1}(1)|/n; update C_{k+1}(v) to 1 w.p. f_ND(X_k), to 0 o.w.
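A minimal simulation of the node dynamic as defined above; the name node_dynamic, the coin-flip initial states, and the incremental count of opinion-1 nodes are implementation choices of this sketch, not from [SY18].

```python
import numpy as np

def node_dynamic(f_nd, n, rounds, p0=0.5, rng=None):
    """Simulate the node dynamic: each round, one uniformly random agent
    resamples its state to 1 with probability f_nd(X_k), else to 0."""
    rng = np.random.default_rng() if rng is None else rng
    states = (rng.random(n) < p0).astype(int)  # initial states C_0(v)
    ones = states.sum()                        # running count of opinion 1
    xs = [ones / n]
    for _ in range(rounds):
        v = rng.integers(n)                    # pick an agent uniformly
        x = ones / n                           # fraction X_k of opinion 1
        new = 1 if rng.random() < f_nd(x) else 0
        ones += new - states[v]
        states[v] = new
        xs.append(ones / n)
    return xs
```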

  15. Node Dynamic includes several existing dynamics. Update functions: • Voter model • Iterative majority [Mossel et al. 14] • Iterative 3-majority [Doerr et al. 11] [Plot: the three update functions f_ND on [0,1]]
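For reference, closed forms of these update functions under the usual reading of each dynamic (copy one uniform sample; adopt the current global majority; adopt the majority of three uniform samples); the exact variants plotted on the slide may differ.

```python
def f_voter(x):
    # voter model: copy the opinion of a uniformly random agent
    return x

def f_majority(x):
    # iterative majority: adopt the current global majority opinion
    return 1.0 if x > 0.5 else 0.0 if x < 0.5 else 0.5

def f_3majority(x):
    # 3-majority: sample three agents, adopt their majority opinion;
    # P(at least two of three are 1) = 3x^2(1-x) + x^3 = 3x^2 - 2x^3
    return 3 * x**2 - 2 * x**3
```

With the earlier sketch, node_dynamic(f_3majority, n=1000, rounds=20000) simulates iterative 3-majority.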

  16. Node Dynamic as a reinforced random walk on ℝ. Let X_k be the fraction of nodes in state 1 at round k. [Left column restates the node dynamic of slide 14.]

  17. Node Dynamic as a reinforced random walk on ℝ. Let X_k be the fraction of nodes in state 1 at round k. Given X_k, the expected number of nodes in state 1 after round k is E[nX_{k+1} ∣ X_k] = nX_k + (f_ND(X_k) − X_k). [Left column restates the node dynamic of slide 14.]

  18. Node Dynamic as a reinforced random walk on ℝ. The same identity, annotated: the added term f_ND(X_k) is the probability the picked node is updated to 1, and the subtracted X_k is the probability it was in state 1 before the update. [Left column restates the node dynamic of slide 14.]

  19. Node Dynamic as a reinforced random walk on ℝ. Dividing by n: E[X_{k+1} ∣ X_k] − X_k = (1/n)(f_ND(X_k) − X_k), so the drift is F(X_k) = f_ND(X_k) − X_k. [Left column restates the node dynamic of slide 14.]

  20. Node Dynamic as a reinforced random walk on ℝ. Altogether: X_{k+1} − X_k = (1/n)(f_ND(X_k) − X_k + U_k), with drift f_ND(X_k) − X_k and noise U_k. [Left column restates the node dynamic of slide 14.]
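Putting slides 17 through 20 together in one display (the chosen node contributes f_ND(X_k) in expectation and replaces a node whose expected prior state was X_k):

```latex
\begin{align*}
\mathbb{E}[nX_{k+1}\mid X_k] &= nX_k + \big(f_{ND}(X_k) - X_k\big)\\
\mathbb{E}[X_{k+1}-X_k\mid X_k] &= \tfrac{1}{n}\big(f_{ND}(X_k)-X_k\big) = \tfrac{1}{n}F(X_k)\\
X_{k+1}-X_k &= \tfrac{1}{n}\big(F(X_k)+U_k\big),\qquad \mathbb{E}[U_k\mid X_k]=0.
\end{align*}
```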

  21. Question. Given F and U, what is the limit of X_k for sufficiently large n? X_{k+1} − X_k = (1/n)(F(X_k) + U_k)

  22. Mean field approximation. The process X_{k+1} − X_k = (1/n)(F(X_k) + U_k) is approximated by the ODE x′ = F(x).

  23. Mean field approximation. If n is large enough, then for k = O(n), X_k ≈ x_{k/n}, by Wormald et al. '95.
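A small Euler integrator suffices to compare the mean-field ODE against simulation; the name mean_field and the step size dt are hypothetical choices for this sketch.

```python
import numpy as np

def mean_field(F, x0, T, dt=1e-3):
    """Euler-integrate x' = F(x) up to time T; returns times and values."""
    ts = np.arange(0.0, T, dt)
    xs = np.empty_like(ts)
    x = float(x0)
    for i in range(len(ts)):
        xs[i] = x
        x += dt * F(x)
    return ts, xs
```

Plotting xs at times t = k/n against the X_k returned by the node_dynamic sketch above (with F(x) = f_ND(x) − x) illustrates the Wormald-style concentration.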

  24. Regular point. If n is large enough, then for k = O(n), X_k ≈ x_{k/n}.

  25. Fixed point, F(x*) = 0. If n is large enough, then for k = O(n), X_k ≈ x_{k/n}.
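For context (standard dynamical-systems background, not stated on the slide): a fixed point x* is hyperbolic when the Jacobian DF(x*) has no eigenvalue with zero real part, and the sign pattern classifies it:

```latex
F(x^*)=0,\qquad \lambda_1,\dots,\lambda_d = \operatorname{eig}\big(DF(x^*)\big):\quad
\begin{cases}
\operatorname{Re}\lambda_i<0\ \forall i & \text{attracting}\\
\operatorname{Re}\lambda_i>0\ \forall i & \text{repelling}\\
\text{mixed signs, none zero} & \text{saddle (non-attracting)}
\end{cases}
```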

  26. Escaping a non-attracting fixed point. How long does it take the process to escape a non-attracting fixed point?

  27. Escaping a non-attracting fixed point. How long does it take the process to escape a non-attracting fixed point? 1. Θ(n) 2. Θ(n log n) 3. Θ(n log⁴ n) 4. Θ(n²)

  28. Escaping a non-attracting fixed point. How long does it take the process to escape a non-attracting fixed point? 1. Θ(n) 2. Θ(n log n) (answer) 3. Θ(n log⁴ n) 4. Θ(n²)

  29. Lower bound. Escaping the saddle point region takes at least Ω(n log n) steps. [Diagram: ε-ball around the fixed point, with X_0 = x*]

  30. Upper bound. Escaping the saddle point region takes at most O(n log n) steps, if… [Diagram: ε-ball around x* inside the regular region; starting at X_0 = x*, the process reaches X_T outside with T = O(n log n)]

  31. Upper bound. Escaping the saddle point region takes at most O(n log n) steps, if • Noise U_k: a martingale difference, bounded, and noisy (its covariance matrix is large) • Expected difference F ∈ C²: x* is hyperbolic. [Diagram as on slide 30]

  32. Gradient-like dynamics. Converges to an attracting fixed-point region in O(n log n) steps, if • Noise U_k: a martingale difference, bounded, and noisy • Expected difference F ∈ C²: all fixed points are hyperbolic, and F admits a potential function.
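As a concrete instance of a gradient-like dynamic, SGD with constant step size 1/n fits the template with drift F = −∇f, so the objective f itself serves as the potential function. The sketch below is an assumption-laden illustration (Gaussian noise, hypothetical names), not the paper's setup.

```python
import numpy as np

def sgd_rrw(grad_f, n, steps, x0, noise_std=1.0, rng=None):
    """SGD with constant step size 1/n as a reinforced random walk:
    drift F(x) = -grad_f(x); U_k models minibatch gradient noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        u = rng.normal(0.0, noise_std, size=x.shape)
        x = x + (-grad_f(x) + u) / n
    return x

# On the saddle of f(x, y) = x**2 - y**2 (gradient (2x, -2y)), isotropic
# noise lets the walk leave a neighborhood of the origin, consistent with
# the O(n log n) escape bound.
```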

  33. Outline • Escaping saddle points [diagram: evolutionary game theory, dynamics on social networks, stochastic gradient descent]

  34. Outline • Escaping saddle points • Case study: dynamics on social networks [same diagram]

  35. (DIS)AGREEMENT IN PLANTED COMMUNITY NETWORKS: dynamics on social networks

  36. Echo chamber. Beliefs are amplified through interactions in segregated systems.

  37. Echo chamber. Beliefs are amplified through interactions in segregated systems.

  38. Echo chamber. Beliefs are amplified through interactions in segregated systems: rich-get-richer dynamics combined with community structure.

  39. Question. What is the consensus time, given a rich-get-richer opinion formation rule and a given level of intercommunity connectivity?
