Escaping Saddle Points in Constant Dimensional Spaces: an Agent-based Modeling Perspective
Grant Schoenebeck, University of Michigan
Fang-Yi Yu, Harvard University
Results
• Analyze the convergence rate of a family of stochastic processes
• Three related applications:
  – Evolutionary game theory
  – Dynamics on social networks
  – Stochastic gradient descent
[Venn diagram: evolutionary game theory, dynamics on social networks, stochastic gradient descent]
Target Audience
[Venn diagram: evolutionary game theory, dynamics on social networks, stochastic gradient descent]

Target Audience (still not-to-scale)
[Venn diagram: evolutionary game theory, dynamics on social networks, stochastic gradient descent]
Outline
• Escaping saddle points
• Case study: dynamics on social networks
[Venn diagram: evolutionary game theory, dynamics on social networks, stochastic gradient descent]
ESCAPING SADDLE POINTS
Upper bounds and lower bounds
Reinforced random walk with $F$
A discrete-time stochastic process $\{X_k : k = 0, 1, \dots\}$ in $\mathbb{R}^d$ that admits the representation
$$X_{k+1} - X_k = \frac{1}{n}\left(F(X_k) + U_k\right)$$
• Expected difference (drift): $F(X_k)$
• Unbiased noise: $U_k$
• Step size: $1/n$
[Diagram: one step from $X_k$ to $X_{k+1}$, decomposed into the drift term $\frac{1}{n}F(X_k)$ and the noise term $\frac{1}{n}U_k$]
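As a concrete illustration, here is a minimal simulation sketch of such a process; the drift $F$ and the uniform noise distribution are placeholder choices for illustration, not from the talk.

```python
import numpy as np

def simulate_rrw(F, n, x0, steps, rng):
    """Simulate X_{k+1} - X_k = (1/n)(F(X_k) + U_k) with bounded, mean-zero noise."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(steps):
        u = rng.uniform(-1.0, 1.0, size=x.shape)  # placeholder unbiased noise U_k
        x = x + (F(x) + u) / n
        traj.append(x.copy())
    return np.array(traj)

# Example: a 1-d drift with a repelling fixed point at 0.5 and attracting
# fixed points at 0 and 1 (illustrative only).
rng = np.random.default_rng(0)
traj = simulate_rrw(lambda x: x * (1 - x) * (x - 0.5), n=1000, x0=[0.5], steps=20000, rng=rng)
print(traj[-1])  # the noise pushes the process off 0.5 toward 0 or 1
```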
Examples
A discrete-time Markov process $\{X_k : k = 0, 1, \dots\}$ in $\mathbb{R}^d$ that admits the representation $X_{k+1} - X_k = \frac{1}{n}\left(F(X_k) + U_k\right)$:
• Agent-based models with $n$ agents
  – Evolutionary games
  – Dynamics on social networks
• Heuristic local search algorithms with uniform step size $1/n$
Node dynamics on complete graphs [SY18]
• Let $f_{ND}: [0,1] \to [0,1]$. $n$ agents interact on a complete graph.
• Each agent $v$ has an initial binary state $C_0(v) \in \{0,1\}$.
• At round $k$:
  – Pick a node $v$ uniformly at random.
  – Compute the fraction of opinion 1, $X_k = |C_k^{-1}(1)|/n$.
  – Update $C_{k+1}(v)$ to 1 w.p. $f_{ND}(X_k)$; to 0 otherwise.
Node dynamics include several existing dynamics:
• Voter model
• Iterative majority [Mossel et al. 14]
• Iterative 3-majority [Doerr et al. 11]
[Plot: the update functions $f_{ND}$ of the voter, majority, and 3-majority dynamics on $[0,1]$]
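The slide plots these update functions without giving formulas. Below is a sketch of their commonly used forms on the complete graph; the voter and 3-majority expressions are standard, while the hard-threshold form for iterative majority is an assumption here.

```python
def f_voter(x):
    # Copy the opinion of a uniformly random node: f_ND(x) = x.
    return x

def f_majority(x):
    # Adopt the current global majority opinion (hard threshold, tie at 1/2).
    return 1.0 if x > 0.5 else (0.0 if x < 0.5 else 0.5)

def f_3majority(x):
    # Sample 3 nodes with replacement and adopt their majority opinion:
    # f_ND(x) = P(at least 2 of 3 draws are 1) = x^3 + 3x^2(1 - x) = 3x^2 - 2x^3.
    return 3 * x**2 - 2 * x**3
```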
Node dynamics as a reinforced random walk on $\mathbb{R}$
• Let $X_k$ be the fraction of nodes in state 1 at round $k$.
• Given $X_k$, the picked node is in state 1 with probability $X_k$ and is updated to 1 with probability $f_{ND}(X_k)$, so the expected number of nodes in state 1 after round $k$ is
$$E[nX_{k+1} \mid X_k] = nX_k + \left(f_{ND}(X_k) - X_k\right).$$
• Dividing by $n$,
$$E[X_{k+1} - X_k \mid X_k] = \frac{1}{n}\left(f_{ND}(X_k) - X_k\right),$$
so the drift is $F(X_k) = f_{ND}(X_k) - X_k$.
• Equivalently,
$$X_{k+1} - X_k = \frac{1}{n}\left(f_{ND}(X_k) - X_k + U_k\right),$$
with drift $f_{ND}(X_k) - X_k$ and noise $U_k$.
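A minimal agent-level simulation sketch of the node dynamics, using the 3-majority update as a stand-in for $f_{ND}$; it can be used to check the drift identity above empirically.

```python
import numpy as np

def node_dynamics(f_nd, n, x0, rounds, rng):
    """Run the node dynamics on the complete graph; return the fraction of 1s per round."""
    state = (rng.random(n) < x0).astype(int)  # initial binary states C_0(v)
    fracs = [state.mean()]
    for _ in range(rounds):
        v = rng.integers(n)                   # pick a node uniformly at random
        x = state.mean()                      # fraction of opinion 1, X_k
        state[v] = 1 if rng.random() < f_nd(x) else 0
        fracs.append(state.mean())
    return np.array(fracs)

rng = np.random.default_rng(1)
f_nd = lambda x: 3 * x**2 - 2 * x**3          # 3-majority as a stand-in for f_ND
fracs = node_dynamics(f_nd, n=1000, x0=0.6, rounds=20000, rng=rng)
print(fracs[-1])  # drifts toward the attracting fixed point at 1
```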
Question
Given $F$ and $U$, what is the limit of $X_k$ for sufficiently large $n$?
$$X_{k+1} - X_k = \frac{1}{n}\left(F(X_k) + U_k\right)$$
Mean field approximation
$$X_{k+1} - X_k = \frac{1}{n}\left(F(X_k) + U_k\right) \quad\longrightarrow\quad x' = F(x)$$
Mean field approximation
If $n$ is large enough, then for $k = O(n)$, $X_k \approx x(k/n)$ [Wormald et al. 95].
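A sketch comparing the process $X_k$ at step $k$ with the Euler-integrated mean-field ODE at time $k/n$, for the 3-majority drift $F(x) = f_{ND}(x) - x$; the horizon and step sizes are illustrative choices.

```python
import numpy as np

def mean_field(F, x0, T, dt=1e-3):
    # Euler-integrate the mean-field ODE x' = F(x) up to time T.
    x, t = float(x0), 0.0
    while t < T:
        x += dt * F(x)
        t += dt
    return x

F = lambda x: 3 * x**2 - 2 * x**3 - x   # drift of the 3-majority node dynamics

# Exact transition law of the fraction X_k on the complete graph: the picked
# node's old state is Bernoulli(X_k), its new state is Bernoulli(f_ND(X_k)).
rng = np.random.default_rng(2)
n, k_max, x = 2000, 20000, 0.6
for _ in range(k_max):
    f_nd = 3 * x**2 - 2 * x**3
    x += (int(rng.random() < f_nd) - int(rng.random() < x)) / n
print(x, mean_field(F, 0.6, T=k_max / n))  # X_k ≈ x(k/n) for large n
```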
Regular point
If $n$ is large enough, then for $k = O(n)$, $X_k \approx x(k/n)$.
Fixed point, $F(x^*) = 0$
If $n$ is large enough, then for $k = O(n)$, $X_k \approx x(k/n)$.
Escaping non-attracting fixed points
How long does the process take to escape a non-attracting fixed point?
1. $\Theta(n)$
2. $\Theta(n \log n)$
3. $\Theta(n \log^4 n)$
4. $\Theta(n^2)$
Escaping non-attracting fixed points
How long does the process take to escape a non-attracting fixed point?
1. $\Theta(n)$
2. $\Theta(n \log n)$ ← the answer
3. $\Theta(n \log^4 n)$
4. $\Theta(n^2)$
Lower bound
Escaping the saddle point region takes at least $\Omega(n \log n)$ steps.
[Diagram: an $\epsilon$-ball around the fixed point, with the process started at $X_0 = x^*$]
Upper bound
Escaping the saddle point region takes at most $O(n \log n)$ steps, if…
[Diagram: the process started at $X_0 = x^*$ leaves the region by $X_T$, $T = O(n \log n)$]
Upper bound
Escaping the saddle point region takes at most $O(n \log n)$ steps, if
• Noise $U_k$:
  – martingale difference
  – bounded
  – noisy (the covariance matrix is bounded below)
• Expected difference $F \in C^2$:
  – $x^*$ is hyperbolic
[Diagram: the process started at $X_0 = x^*$ leaves the region by $X_T$, $T = O(n \log n)$]
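In LaTeX, a plausible formalization of these conditions, following standard stochastic-approximation assumptions (the exact constants and quantifiers from the paper are not on the slide):

```latex
\begin{itemize}
  \item \textbf{Noise.} $(U_k)$ is a bounded martingale difference sequence whose
        covariance is uniformly bounded below: for some constants $C, c > 0$,
        \[
          \mathbb{E}[U_k \mid \mathcal{F}_k] = 0, \qquad
          \|U_k\| \le C, \qquad
          \mathbb{E}\!\left[U_k U_k^\top \mid \mathcal{F}_k\right] \succeq c\, I.
        \]
  \item \textbf{Drift.} $F \in C^2$, and $x^*$ is a hyperbolic non-attracting fixed
        point: $F(x^*) = 0$, no eigenvalue of the Jacobian $DF(x^*)$ has zero real
        part, and at least one eigenvalue has positive real part.
\end{itemize}
```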
Gradient-like dynamics
Converges to an attracting-fixed-point region in $O(n \log n)$ steps, if
• Noise $U_k$:
  – martingale difference
  – bounded
  – noisy
• Expected difference $F \in C^2$:
  – fixed points are hyperbolic
  – $F$ admits a potential function
Outline
• Escaping saddle points
• Case study: dynamics on social networks
[Venn diagram: evolutionary game theory, dynamics on social networks, stochastic gradient descent]
(DIS)AGREEMENT IN PLANTED COMMUNITY NETWORKS
Dynamics on social networks
Echo chamber
Beliefs are amplified through interactions in segregated systems.
• Rich-get-richer
• Community structure
Question
What is the consensus time, given a rich-get-richer opinion formation process and a given level of intercommunity connectivity?
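A simulation sketch for this question: 3-majority (rich-get-richer) dynamics on a planted two-community graph. The graph model, the parameters p_in and p_out, and the stopping rule are hypothetical illustration choices, not the parameters studied in the talk.

```python
import numpy as np

def consensus_time(n, p_in, p_out, max_rounds, rng):
    """3-majority dynamics on a planted 2-community graph; rounds until consensus."""
    comm = np.arange(n) < n // 2                 # planted community labels
    same = comm[:, None] == comm[None, :]
    adj = rng.random((n, n)) < np.where(same, p_in, p_out)
    adj = np.triu(adj, 1)
    adj = adj | adj.T                            # undirected planted-partition graph
    nbrs = [np.flatnonzero(adj[v]) for v in range(n)]
    state = comm.astype(int)                     # opinions initially split by community
    ones = int(state.sum())
    for k in range(max_rounds):
        v = rng.integers(n)
        if len(nbrs[v]) >= 3:
            sample = rng.choice(nbrs[v], size=3, replace=True)
            new = int(state[sample].sum() >= 2)  # adopt the majority of 3 random neighbors
            ones += new - state[v]
            state[v] = new
        if ones == 0 or ones == n:
            return k + 1
    return max_rounds                            # censored: no consensus observed

rng = np.random.default_rng(3)
# Consensus time as intercommunity connectivity p_out varies (illustrative values).
for p_out in (0.01, 0.05, 0.2):
    print(p_out, consensus_time(n=200, p_in=0.5, p_out=p_out, max_rounds=10**6, rng=rng))
```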