
Empirical-evidence Equilibria in Stochastic Games, Nicolas Dudebout (PowerPoint presentation)



  1. Empirical-evidence Equilibria in Stochastic Games. Nicolas Dudebout, Georgia Institute of Technology.

  2. Empirical-evidence Equilibria (EEEs). [Diagram: Agent 1, Nature, Agent 2.] At a Nash equilibrium of a stochastic game, each agent plays an optimal strategy for a POMDP (the one induced by the other agents' strategies). EEE approach: 0. Pick arbitrary strategies. 1. Formulate simple but consistent models. 2. Design strategies optimal w.r.t. the models, then go back to 1. The fixed points are EEEs. Example: asset management on the stock market.



  5. Empirical-evidence Equilibria (EEEs), build: the diagram adds a "Nature 1" node next to Agent 1; the text is unchanged from slide 2.
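The EEE approach on slide 2 is a fixed-point iteration: fit a consistent model to the current strategy, then best-respond to that model, and repeat. Below is a minimal Python sketch of that loop. It is not the talk's implementation. It assumes a single agent whose model is just one number, a depth-0 prediction of ℙ[High]. `find_eee`, `toy_model`, and `toy_best_response` are hypothetical names, and the toy numbers are invented for illustration.

```python
from typing import Callable, Hashable, Optional, Tuple


def find_eee(
    sigma0: Hashable,
    fit_model: Callable[[Hashable], float],
    best_response: Callable[[float], Hashable],
    max_iter: int = 100,
) -> Optional[Tuple[Hashable, float]]:
    """Alternate model fitting and best response until the strategy stops changing.

    Returns (sigma, mu) at a fixed point, or None if the iteration cycles
    or runs out of iterations.
    """
    sigma = sigma0
    seen = set()
    for _ in range(max_iter):
        mu = fit_model(sigma)          # step 1: model consistent with sigma
        new_sigma = best_response(mu)  # step 2: strategy optimal w.r.t. mu
        if new_sigma == sigma:         # fixed point: (sigma, mu) is an EEE
            return sigma, mu
        if new_sigma in seen:          # came back to an earlier strategy: cycling
            return None
        seen.add(sigma)
        sigma = new_sigma
    return None


# Hypothetical toy: strategy 1 = buy, 0 = hold; model = predicted P[High].
def toy_model(buy: int) -> float:
    # Assumed: buying pushes P[High] from 0.3 up to 0.5.
    return 0.3 + 0.2 * buy


def toy_best_response(mu: float) -> int:
    # Assumed: buying is worthwhile only while P[High] < 0.6.
    return 1 if mu < 0.6 else 0


print(find_eee(0, toy_model, toy_best_response))  # (1, 0.5)
```

With exact best responses the loop can also cycle instead of converging. This happens, for instance, if buying pushed ℙ[High] above the buying threshold while holding pulled it below. For that reason the sketch returns `None` when it revisits a strategy that is not a fixed point.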

  6. Single-agent Setup. Agent: $\max_{\sigma} \mathbb{E}_{\sigma}\left[\sum_{t=0}^{\infty} \delta^{t} u(x_t, a_t, \hat{s}_t)\right]$. Nature. Model: $z^{+} \sim m(z)$, $\hat{s} \sim \mu(z)$. Goals: • $\mu$ consistent with $\sigma$ • $\sigma$ optimal w.r.t. $\mu$

  7. Single-agent Setup (build): the agent now acts and evolves, $a \sim \sigma(h)$ and $x^{+} \sim f(x, a, s)$, where $s$ is the signal coming from Nature; objective, model, and goals are as on slide 6.

  8. Single-agent Setup (build): Nature's side is added, $s \sim \nu(w)$ and $w^{+} \sim n(w, x, a)$, alongside the agent's $a \sim \sigma(h)$ and $x^{+} \sim f(x, a, s)$ and the model $z^{+} \sim m(z)$, $\hat{s} \sim \mu(z)$.

  9. Single-agent Setup (build): a "Model" label is added on the agent's side; the agent ($a \sim \sigma(h)$, $x^{+} \sim f(x, a, s)$), Nature ($s \sim \nu(w)$, $w^{+} \sim n(w, x, a)$), objective, and goals are as on slide 8.

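  10. Single-agent Setup (complete). • Agent: $\max_{\sigma} \mathbb{E}_{\sigma}\left[\sum_{t=0}^{\infty} \delta^{t} u(x_t, a_t, \hat{s}_t)\right]$, with $a \sim \sigma(x, z)$ and $x^{+} \sim f(x, a, s)$ • Nature: $s \sim \nu(w)$, $w^{+} \sim n(w, x, a)$ • Model: $z^{+} \sim m(z)$, $\hat{s} \sim \mu(z)$ • Goals: $\mu$ consistent with $\sigma$; $\sigma$ optimal w.r.t. $\mu$. The strategy now depends only on the agent's own state $x$ and the model memory $z$, instead of on $h$.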

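The sketch below makes the single-agent objective concrete. It estimates $\mathbb{E}_{\sigma}[\sum_t \delta^t u(x_t, a_t, s_t)]$ by Monte Carlo on the true closed loop (agent plus Nature) and truncates the infinite sum at a finite horizon.

It rests on several assumptions:
- The callables `sigma`, `f`, `n`, `nu`, `m`, `u` stand in for σ, f, n, ν, m, u from the slides. `estimate_objective` is a hypothetical name.
- Each callable receives a `random.Random` for its randomness.
- Timing is assumed: the signal $s_t$ is drawn before the action, but it enters the memory $z$ only afterwards.

```python
import random
from typing import Any, Callable


def estimate_objective(
    sigma: Callable, f: Callable, n: Callable, nu: Callable,
    m: Callable, u: Callable, x0: Any, w0: Any, z0: Any,
    delta: float = 0.95, horizon: int = 300, runs: int = 200, seed: int = 0,
) -> float:
    """Monte Carlo estimate of E_sigma[sum_t delta^t u(x_t, a_t, s_t)], truncated at `horizon`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x, w, z = x0, w0, z0
        ret, disc = 0.0, 1.0
        for _ in range(horizon):
            s = nu(w, rng)            # Nature emits the signal: s ~ nu(w)
            a = sigma(x, z, rng)      # a ~ sigma(x, z): own state and memory only
            ret += disc * u(x, a, s)  # discounted stage reward
            x_next = f(x, a, s, rng)  # x+ ~ f(x, a, s)
            w = n(w, x, a, rng)       # w+ ~ n(w, x, a), using the pre-update x
            z = m(z, s)               # memory absorbs the signal (assumed timing)
            x = x_next
            disc *= delta
        total += ret
    return total / runs


# Usage: fair-coin signal, reward equal to the signal, trivial agent.
v = estimate_objective(
    sigma=lambda x, z, rng: 0,
    f=lambda x, a, s, rng: x,
    n=lambda w, x, a, rng: w,
    nu=lambda w, rng: rng.randint(0, 1),
    m=lambda z, s: z,
    u=lambda x, a, s: float(s),
    x0=0, w0=0, z0=(), delta=0.9, horizon=100, runs=2000,
)
print(v)  # roughly 0.5 / (1 - 0.9) = 5
```

Evaluating the same strategy against the model rather than the true loop would replace the draw from `nu(w)` with a draw from μ(z). That version is the quantity the agent maximizes under "σ optimal w.r.t. μ".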

  12. Depth-$k$ Consistency. Binary stochastic process $s$: 0100010001001010010110111010000111010101... Definition: two processes $s$ and $\hat{s}$ are depth-$k$ consistent if they have the same $k$-characteristic: • 0-characteristic: $\mathbb{P}[s = 0]$, $\mathbb{P}[s = 1]$ • 1-characteristic: $\mathbb{P}[s s^{+} = 00]$, $\mathbb{P}[s s^{+} = 10]$, $\mathbb{P}[s s^{+} = 01]$, $\mathbb{P}[s s^{+} = 11]$ • ... • $k$-characteristic: probabilities of all strings of length $k + 1$
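From a sampled binary sequence, the $k$-characteristic can be estimated by counting sliding windows of length $k + 1$. The sketch below does this. It assumes the process is stationary and ergodic, so that window frequencies approximate the probabilities.

`characteristic` and `depth_k_consistent` are hypothetical helpers. The tolerance `tol` is also an assumption of the sketch, since the definition asks for exactly equal probabilities.

```python
from collections import Counter


def characteristic(seq: str, k: int) -> dict:
    """Empirical k-characteristic: frequency of each string of length k + 1."""
    if len(seq) <= k:
        raise ValueError("sequence must be longer than k")
    windows = [seq[i:i + k + 1] for i in range(len(seq) - k)]
    counts = Counter(windows)
    return {w: c / len(windows) for w, c in counts.items()}


def depth_k_consistent(a: str, b: str, k: int, tol: float = 0.02) -> bool:
    """Compare two empirical k-characteristics entry by entry, within tol."""
    ca, cb = characteristic(a, k), characteristic(b, k)
    return all(abs(ca.get(w, 0.0) - cb.get(w, 0.0)) <= tol for w in set(ca) | set(cb))


# Same 0-characteristic (half 0s, half 1s), different 1-characteristic.
print(depth_k_consistent("0011" * 50, "01" * 100, 0))  # True
print(depth_k_consistent("0011" * 50, "01" * 100, 1))  # False
```

The example shows why depth matters. Both sequences agree on single-symbol frequencies, but "01" repeated never contains the pairs 00 or 11.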

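  13. Complete Picture. Fix a depth $k \in \mathbb{N}$; $z$ contains the last $k$ observed signals. Agent: $a \sim \sigma(x, z)$, $x^{+} \sim f(x, a, s)$. Nature: $s \sim \nu(w)$, $w^{+} \sim n(w, x, a)$. Model: $z^{+} \sim m_k(z)$, $\hat{s} \sim \mu(z)$, with $\mu\big(z = (s_1, s_2, \ldots, s_k)\big)[s_{k+1}] = \mathbb{P}_{\sigma}\big[s_{t+1} = s_{k+1} \mid s_t = s_k, \ldots, s_{t-k+1} = s_1\big]$. That is, the model predicts the next signal with the conditional probability, under $\sigma$, of seeing it after the same $k$ most recent signals.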
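The model formula above can be estimated from a single long run of signals by counting what follows each window of $k$ signals. This relies on the ergodicity assumption that makes time averages match $\mathbb{P}_{\sigma}$.

The sketch below has three parts:
- `memory_update` plays the role of $m_k$.
- `fit_model` builds the empirical $\mu$.
- `sample_prediction` draws $\hat{s} \sim \mu(z)$.

All three names are hypothetical. Signals are taken to be single characters.

```python
import random
from collections import Counter, defaultdict


def memory_update(z: tuple, s, k: int) -> tuple:
    """m_k: append the newest signal and keep only the last k signals."""
    if k == 0:
        return ()  # depth-0 model: no memory at all
    return (z + (s,))[-k:]


def fit_model(signals, k: int) -> dict:
    """Empirical depth-k model: model[z][s_next] = frequency of s_next after window z."""
    counts = defaultdict(Counter)
    for t in range(k, len(signals)):
        z = tuple(signals[t - k:t])  # the k signals preceding time t
        counts[z][signals[t]] += 1
    model = {}
    for z, ctr in counts.items():
        total = sum(ctr.values())
        model[z] = {s: c / total for s, c in ctr.items()}
    return model


def sample_prediction(model: dict, z: tuple, rng: random.Random):
    """Draw s_hat ~ mu(z) from the fitted model."""
    dist = model[z]
    return rng.choices(list(dist), weights=list(dist.values()))[0]


mu = fit_model("01" * 10, 1)
print(mu)  # {('0',): {'1': 1.0}, ('1',): {'0': 1.0}}
```

Windows that never occur along the sample path get no entry. A full implementation would need a convention for them, for example a uniform prediction.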

  14. Empirical-evidence Optimality. Definition: $(\sigma, \mu)$ is an empirical-evidence optimum (EEO) for $k$ iff • $\sigma$ is optimal w.r.t. $\mu$ • $\mu$ is depth-$k$ consistent with $\sigma$. Definition: $(\sigma, \mu)$ is an $\epsilon$-empirical-evidence optimum ($\epsilon$-EEO) for $k$ iff • $\sigma$ is $\epsilon$-optimal w.r.t. $\mu$ • $\mu$ is depth-$k$ consistent with $\sigma$.


  16. Existence Result. Theorem: for all $k$ and $\epsilon > 0$, there exists an $\epsilon$-EEO for $k$. Proof sketch: • a technical assumption ensures ergodicity of $s$ • $\sigma : \mathcal{X} \times \mathcal{Z} \to \Delta(\mathcal{A})$ is parametrized over a simplex • $T : \sigma \overset{\text{consistency}}{\longmapsto} \mu \overset{\epsilon\text{-optimality}}{\longmapsto} \sigma$ is continuous • apply Brouwer's fixed point theorem to $T$.

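The proof sketch hinges on $T$ being continuous, which an $\epsilon$-optimal response can provide and an exact best response generally cannot. The toy below makes this visible in one dimension. The strategy simplex is $[0, 1]$ (the probability of buying), so Brouwer's theorem reduces to the intermediate value theorem.

This is a hypothetical illustration, not the construction used in the talk. The price model, the payoffs (buy: $0.5 - \mu$, hold: $0$), and the logit smoothing are all assumptions of this sketch. Restricted to pure strategies, the exact best response maps 0 to 1 and 1 to 0, so it has no pure fixed point. The logit response is continuous, and bisection finds its fixed point.

```python
import math


def consistent_model(p: float) -> float:
    """Hypothetical depth-0 model: P[High] when the agent buys with probability p."""
    return 0.2 + 0.6 * p


def pure_response(mu: float) -> int:
    """Exact best response under the assumed payoffs (buy: 0.5 - mu, hold: 0)."""
    return 1 if mu < 0.5 else 0


def logit_response(mu: float, temp: float = 0.1) -> float:
    """Logit (softmax) probability of buying.

    With two actions, softmax loses at most temp * ln(2) in expected payoff
    relative to the best action, so it is epsilon-optimal with
    epsilon = temp * ln(2), and it is continuous in mu.
    """
    advantage = 0.5 - mu
    return 1.0 / (1.0 + math.exp(-advantage / temp))


def T(p: float, temp: float = 0.1) -> float:
    """Continuous map p -> mu -> p: consistency, then epsilon-optimality."""
    return logit_response(consistent_model(p), temp)


def fixed_point(temp: float = 0.1, iters: int = 60) -> float:
    """Bisection on g(p) = T(p) - p; g(0) >= 0 and g(1) <= 0, so a root exists."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if T(mid, temp) - mid >= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(pure_response(consistent_model(0)), pure_response(consistent_model(1)))  # 1 0
p_star = fixed_point()
print(p_star, consistent_model(p_star))  # both 0.5 up to floating-point rounding
```

In higher dimensions bisection no longer applies. The existence argument still holds, but actually computing a fixed point then needs other tools, such as damped iteration.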

  18. Multiagent Setup (two agents shown). Agent 1: $a_1 \sim \sigma_1(x_1, z_1)$, $x_1^{+} \sim f_1(x_1, a_1, s_1)$; model $z_1^{+} \sim m_{1,k_1}(z_1)$, $\hat{s}_1 \sim \mu_1(z_1)$. Agent 2: $a_2 \sim \sigma_2(x_2, z_2)$, $x_2^{+} \sim f_2(x_2, a_2, s_2)$; model $z_2^{+} \sim m_{2,k_2}(z_2)$, $\hat{s}_2 \sim \mu_2(z_2)$. Nature: $(s_1, s_2) \sim \nu(w)$, $w^{+} \sim n(w, x_1, a_1, x_2, a_2)$.

  19. Empirical-evidence Equilibrium. Strategies $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_N)$, models $\mu = (\mu_1, \mu_2, \ldots, \mu_N)$, depths $k = (k_1, k_2, \ldots, k_N)$. Definition: $(\sigma, \mu)$ is an empirical-evidence equilibrium (EEE) for $k$ iff • for all $i$, $\sigma_i$ is optimal w.r.t. $\mu_i$ • for all $i$, $\mu_i$ is depth-$k_i$ consistent with $\sigma$ (the joint strategy profile). Theorem: for all $k$ and $\epsilon > 0$, there exists an $\epsilon$-EEE for $k$.
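The two "for all $i$" conditions can be checked mechanically once each agent's optimality and each model's consistency can be evaluated. The sketch below does this for a static, stateless toy with depth-0 models. In that setting consistency means $\mu_i$ equals the ℙ[High] induced by the joint profile, and $\sigma_i$ is agent $i$'s probability of buying.

The price model and payoffs are hypothetical, as are the helper names. In the slides' setting, optimality would instead be judged over the infinite discounted horizon.

```python
def induced_p_high(strategies) -> float:
    """Hypothetical price model: each unit of buying probability adds 0.3 to P[High]."""
    return 0.2 + 0.3 * sum(strategies)


def buy_payoff(mu: float) -> float:
    """Hypothetical payoff of buying under model mu (holding pays 0)."""
    return 0.5 - mu


def is_eps_eee(strategies, models, eps: float = 1e-6, tol: float = 1e-6) -> bool:
    """Check both EEE conditions for every agent i."""
    p_high = induced_p_high(strategies)  # what the joint profile actually induces
    for sigma_i, mu_i in zip(strategies, models):
        best = max(buy_payoff(mu_i), 0.0)        # best value under agent i's model
        achieved = sigma_i * buy_payoff(mu_i)    # value of mixing with prob. sigma_i
        if achieved < best - eps:                # sigma_i not eps-optimal w.r.t. mu_i
            return False
        if abs(mu_i - p_high) > tol:             # mu_i not depth-0 consistent with sigma
            return False
    return True


print(is_eps_eee([0.5, 0.5], [0.5, 0.5]))  # True
print(is_eps_eee([1, 1], [0.8, 0.8]))      # False
```

In the second profile the models are consistent ($0.2 + 0.3 \cdot 2 = 0.8$), yet buying is suboptimal under those models. The optimality condition therefore fails even though the consistency condition holds.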

  20. Learning Setup. 0. Pick arbitrary depth-0 models $\mu_0$. 1. Design strategies $\sigma$ optimal w.r.t. the models $\mu_t$. 2. Formulate consistent models $\mu^{\text{upd}}_t$ and update $\mu^{i}_{t+1} = \mu^{i}_t + \alpha(\mu^{i,\text{upd}}_t - \mu^{i}_t) = (1 - \alpha)\mu^{i}_t + \alpha\,\mu^{i,\text{upd}}_t$, then go back to 1. Example: • State: holdings $x_i \in \{0, \ldots, M\}$ • Action: sell one, hold, or buy one, $a_i \in \{-1, 0, 1\}$ • Dynamic: $x_i^{+} = x_i + a_i$ • Signal: price $p \in \{\text{Low}, \text{High}\}$ • Stage cost: $p \cdot a_i$ • Nature: market trend $b \in \{\text{Bull}, \text{Bear}\}$, $w = (b, p)$
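The learning loop, with its damped update $\mu^{i}_{t+1} = (1 - \alpha)\mu^{i}_t + \alpha\,\mu^{i,\text{upd}}_t$, is sketched below for two agents with depth-0 models. Each $\mu_i$ is a single predicted ℙ[High].

This is not the slide's stock-market example, whose price dynamics are not given. Instead it uses the following hypothetical stand-ins, all labeled as assumptions:
- Decision rule: an agent buys iff its predicted ℙ[High] < 0.5.
- Price model: ℙ[High] = 0.2 + 0.3 × (number of buyers).
- Shared signal: both agents observe the same price, so they share $\mu^{\text{upd}}$.

`learning_loop` is a hypothetical name.

```python
def learning_loop(mu0, alpha: float = 0.1, steps: int = 500):
    """Damped learning with depth-0 models; returns the history of (mu_1, ..., mu_N)."""
    mu = list(mu0)
    history = [tuple(mu)]
    for _ in range(steps):
        # 1. Strategies optimal w.r.t. the current models (assumed threshold rule).
        buys = [1 if m < 0.5 else 0 for m in mu]
        # 2. Consistent depth-0 model under these strategies (assumed price model).
        p_high = 0.2 + 0.3 * sum(buys)
        # Damped update toward the consistent model, for every agent.
        mu = [(1 - alpha) * m + alpha * p_high for m in mu]
        history.append(tuple(mu))
    return history


final = learning_loop([0.9, 0.1])[-1]
print(final)  # both entries close to 0.5
```

With a constant $\alpha$ and exact best responses, the models approach 0.5 and may then chatter in a narrow band around it. Under these made-up payoffs, 0.5 is the price frequency at which buying and holding are equally attractive. A decreasing step size would shrink that band.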

  21. Learning Results: Offline. [Plot: prediction $\mu^{i}_t[\text{High}]$ (0 to 1) versus time $t$ (0 to 100), for agents $i = 1$ and $i = 2$.]

  22. Learning Results: Online. [Plot: prediction $\mu^{i}_t[\text{High}]$ (0 to 1) versus time $t$ (0 to 100), for agents $i = 1$ and $i = 2$.]

  23. Concluding Remarks. Comparison with mean-field equilibria: • identical agents with a specific signal • depth-0 model • a large number of agents is needed to recover Nash equilibrium. Future directions: • endogenous model ($z^{+} \sim m(z, x, a)$) • quality of EEEs • learning EEEs.

