
Counterfactual Regret Minimization and Domination in Extensive-Form Games

Richard Gibson, University of Alberta, Edmonton, Alberta, Canada


  1. Counterfactual Regret Minimization and Domination in Extensive-Form Games. Richard Gibson, University of Alberta, Edmonton, Alberta, Canada.

  2. Overview: Counterfactual Regret Minimization (CFR).

  3. Overview: Counterfactual Regret Minimization (CFR) provably solves for a Nash equilibrium in 2-player zero-sum extensive-form games.

  4. Overview: Counterfactual Regret Minimization (CFR) provably solves for a Nash equilibrium in 2-player zero-sum extensive-form games. It also seems to work well in extensive-form games with any number of players.

  5. Overview: Counterfactual Regret Minimization (CFR) provably solves for a Nash equilibrium in 2-player zero-sum extensive-form games. It also seems to work well in extensive-form games with any number of players.
     Question: Why do CFR strategies work well in extensive-form games outside of the 2-player zero-sum case?

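  6. Extensive-Form Games
     [Figure: game tree. Chance (C) deals QJ or QK with probability 0.5 each. Player 1 checks (c) or bets (b). After a check, player 2 checks or bets; after a bet, player 2 folds (f) or calls (c). If player 2 bets after a check, player 1 folds or calls. Leaves show the payoffs (player 1, player 2).]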

  7. Extensive-Form Games
     [Figure: the QJ/QK game tree.]
     Information sets group states that are indistinguishable to the player.

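  8. Extensive-Form Games
     [Figure: the QJ/QK game tree with each action labeled by its probability under an example strategy profile. Player 1 checks with probability 0.4 and bets with 0.6; facing a bet, player 1 folds with 0.2 and calls with 0.8. Player 2 holding the J checks with 0.7 and bets with 0.3 after a check, and folds with 0 and calls with 1 against a bet. Player 2 holding the K checks with 0.9 and bets with 0.1 after a check, and folds with 1 and calls with 0 against a bet.]
     A strategy profile $\sigma = (\sigma_1, \sigma_2)$ assigns a probability distribution over actions at each information set.
     Example: the probability that player 1 checks is $\sigma_1(Q?, c) = 0.4$.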

  9. Extensive-Form Games
     [Figure: the QJ/QK game tree labeled with the example strategy profile.]
     $u_i(\sigma)$ is the expected utility for player $i$, assuming players play according to $\sigma$.
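The expected utility on this slide can be computed by walking the tree. The sketch below is a minimal illustration, not code from the talk: the payoff table and the example probabilities are read off the slide figure as reconstructed here (an assumption), and the names `PAYOFF_P1` and `u1` are hypothetical.

```python
# Terminal payoffs for player 1, keyed by player 2's card (player 1 holds Q).
# Histories: cc = check-check, cbf/cbc = check-bet then player 1 folds/calls,
# bf/bc = player 1 bets then player 2 folds/calls.
# Assumption: this table is reconstructed from the slide figure.
PAYOFF_P1 = {
    "J": {"cc": 1, "cbf": -1, "cbc": 2, "bf": 1, "bc": 2},
    "K": {"cc": -1, "cbf": -1, "cbc": -2, "bf": 1, "bc": -2},
}


def u1(p1_check, p1_call, p2):
    """Expected utility of player 1 (player 2 receives the negation).

    p1_check: probability that player 1 checks, sigma_1(Q?, c).
    p1_call:  probability that player 1 calls after check-bet.
    p2[card]: (prob. player 2 bets after a check, prob. player 2 calls a bet).
    """
    total = 0.0
    for card in ("J", "K"):  # chance deals each card with probability 0.5
        pay = PAYOFF_P1[card]
        p2_bet, p2_call = p2[card]
        facing_bet = (1 - p1_call) * pay["cbf"] + p1_call * pay["cbc"]
        after_check = (1 - p2_bet) * pay["cc"] + p2_bet * facing_bet
        after_bet = (1 - p2_call) * pay["bf"] + p2_call * pay["bc"]
        total += 0.5 * (p1_check * after_check + (1 - p1_check) * after_bet)
    return total


# Example profile from the figure: player 1 checks 0.4 and calls 0.8;
# player 2 bets 0.3 / calls 1.0 with the J and bets 0.1 / calls 0.0 with the K.
example_p2 = {"J": (0.3, 1.0), "K": (0.1, 0.0)}
value = u1(0.4, 0.8, example_p2)
print(f"u_1 = {value:.3f}, u_2 = {-value:.3f}")
```

Because the game is zero-sum, player 2's expected utility is simply the negation of player 1's.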

  10. Counterfactual Regret Minimization (CFR) [Zinkevich et al., NIPS 2007]
      ● CFR is an iterative algorithm that generates a sequence of strategy profiles $(\sigma^1, \sigma^2, \ldots, \sigma^T)$ over $T$ iterations.
      ● Final output of CFR: $\sigma^{AVG} = \mathrm{Average}(\sigma^1, \sigma^2, \ldots, \sigma^T)$.
      ● For 2-player zero-sum games, $\sigma^{AVG}$ is an $\epsilon$-Nash equilibrium, with $\epsilon \to 0$ as $T \to \infty$:
        $u_1(\sigma_1^{AVG}, \sigma_2^{AVG}) \ge \max_{\sigma_1^*} u_1(\sigma_1^*, \sigma_2^{AVG}) - \epsilon$
        $u_2(\sigma_1^{AVG}, \sigma_2^{AVG}) \ge \max_{\sigma_2^*} u_2(\sigma_1^{AVG}, \sigma_2^*) - \epsilon$
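To make the "iterate, then average" idea concrete, here is a small sketch of regret matching in self-play on rock-paper-scissors. This is a deliberate simplification and an assumption of this example: it is not full CFR, which applies a regret-matching update at every information set of an extensive-form game using counterfactual values. The game, the starting regrets, and all function names are hypothetical choices for illustration.

```python
# Rock-paper-scissors payoffs for player 1 (row) against player 2 (column).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]
N = 3


def regret_matching(regrets):
    """Play in proportion to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / N] * N


def run(T):
    """Run T iterations of self-play and return both average strategies."""
    # Asymmetric starting regrets so the iterates do not start at equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    sums = [[0.0] * N, [0.0] * N]
    for _ in range(T):
        s1, s2 = regret_matching(regrets[0]), regret_matching(regrets[1])
        # Value of each pure action against the opponent's current strategy.
        v1 = [sum(A[a][b] * s2[b] for b in range(N)) for a in range(N)]
        v2 = [sum(-A[a][b] * s1[a] for a in range(N)) for b in range(N)]
        ev1 = sum(s1[a] * v1[a] for a in range(N))
        ev2 = sum(s2[b] * v2[b] for b in range(N))
        for a in range(N):
            regrets[0][a] += v1[a] - ev1
            regrets[1][a] += v2[a] - ev2
            sums[0][a] += s1[a]
            sums[1][a] += s2[a]
    return [[x / T for x in s] for s in sums]


def exploitability(avg1, avg2):
    """Largest gain either player gets by deviating from the average profile."""
    value = sum(avg1[a] * A[a][b] * avg2[b] for a in range(N) for b in range(N))
    best1 = max(sum(A[a][b] * avg2[b] for b in range(N)) for a in range(N))
    best2 = max(sum(-A[a][b] * avg1[a] for a in range(N)) for b in range(N))
    return max(best1 - value, best2 + value)


avg1, avg2 = run(10000)
eps = exploitability(avg1, avg2)
print("average strategies:", avg1, avg2)
print("epsilon of the average profile:", eps)
```

The quantity returned by `exploitability` plays the role of $\epsilon$ in the two inequalities above: it is the most either player could gain by deviating unilaterally from the average profile.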

  11. Counterfactual Regret Minimization (CFR)
      ● Outside of 2-player zero-sum games, $\sigma^{AVG}$ is not necessarily an approximate Nash equilibrium [Abou Risk and Szafron, AAMAS 2010].
        – A player may gain by deviating from $\sigma^{AVG}$.
      ● In these games, a Nash equilibrium might not be the most appropriate solution concept anyway.
      ● On the other hand, $\sigma^{AVG}$ performs very well in practice...

  12. Annual Computer Poker Competition

      3-Player Limit Hold'em - 2009
        Agent                      Instant Run-off: Round 0
        Hyperborean-Eqm             319 ± 2
        Hyperborean-BR              299 ± 2
        akuma                       151 ± 2
        dpp                         171 ± 2
        CMURingLimit                -37 ± 2
        dcu3pl                      -63 ± 2
        Bluechip                   -548 ± 2

      3-Player Limit Hold'em - 2010
        Agent                      Instant Run-off: Round 0
        Hyperborean.iro             144 ± 32
        dcu3pl.tbr                   98 ± 30
        LittleRock                   65 ± 35
        Arnold3                    -135 ± 39
        Bender                     -172 ± 16

      3-Player Limit Hold'em - 2011
        Agent                      Instant Run-off: Round 0
        Sartre3p                    243 ± 20
        Hyperborean-3p-limit-iro    204 ± 20
        LittleRock                  113 ± 19
        AAIMontybot                  96 ± 44
        dcubot3plr                   77 ± 19
        OwnBot                       -4 ± 30
        Bnold3                      -91 ± 22
        Entropy                    -108 ± 36
        player.zeta.3p             -530 ± 33

  13. Counterfactual Regret Minimization (CFR)
      ● In games with more than 2 players, $\sigma^{AVG}$ is a “good” strategy. Why?
      ● What properties make a strategy good in games with more than 2 players?
      ● We know what a bad strategy is...

  14. Domination
      [Figure: the QJ/QK game tree.]

  15. Domination
      [Figure: the QJ/QK game tree; holding the J and facing a bet, player 2 folds with probability 0 and calls with probability 1.]
      Consider any player 2 strategy $\sigma_2^{J,c}$ that always calls with the Jack when faced with a bet.

  16. Domination
      [Figure: the QJ/QK game tree; holding the J and facing a bet, player 2 always calls, reaching the payoff (2,-2).]
      $u_2(\sigma_1, \sigma_2^{J,c}) = \ldots + 0.5\,\sigma_1(Q?, b)(1)(-2) + \ldots$

  17. Domination
      [Figure: the QJ/QK game tree; holding the J and facing a bet, player 2 folds with probability 1 and calls with probability 0.]
      $u_2(\sigma_1, \sigma_2^{J,c}) = \ldots + 0.5\,\sigma_1(Q?, b)(1)(-2) + \ldots$
      Now consider the same player 2 strategy, except that it always folds the J. Call it $\sigma_2^{J,f}$.

  18. Domination
      [Figure: the QJ/QK game tree; holding the J and facing a bet, player 2 always folds, reaching the payoff (1,-1).]
      $u_2(\sigma_1, \sigma_2^{J,c}) = \ldots + 0.5\,\sigma_1(Q?, b)(1)(-2) + \ldots$
      $u_2(\sigma_1, \sigma_2^{J,f}) = \ldots + 0.5\,\sigma_1(Q?, b)(1)(-1) + \ldots$ (the elided terms are the same in both expressions)

  19. Domination
      [Figure: the QJ/QK game tree; holding the J and facing a bet, player 2 always folds, reaching the payoff (1,-1).]
      $u_2(\sigma_1, \sigma_2^{J,c}) = \ldots + 0.5\,\sigma_1(Q?, b)(1)(-2) + \ldots$
      $u_2(\sigma_1, \sigma_2^{J,f}) = \ldots + 0.5\,\sigma_1(Q?, b)(1)(-1) + \ldots$ (the elided terms are the same in both expressions)
      $u_2(\sigma_1, \sigma_2^{J,c}) \le u_2(\sigma_1, \sigma_2^{J,f})$ for all $\sigma_1$.
      $u_2(\sigma_1, \sigma_2^{J,c}) < u_2(\sigma_1, \sigma_2^{J,f})$ if $\sigma_1(Q?, b) > 0$.
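The two inequalities can be checked numerically. The sketch below evaluates player 2's expected utility on a grid of player 1 strategies, once with the J always calling a bet and once with the J always folding, while holding the rest of player 2's strategy fixed. The payoff table is reconstructed from the slide figure, and the fixed probabilities in `REST` are arbitrary; both are assumptions of this example, as are the names `PAYOFF_P2` and `u2`.

```python
# Player 2's payoff at each terminal history, keyed by player 2's card.
# Histories: cc = check-check, cbf/cbc = check-bet then player 1 folds/calls,
# bf/bc = player 1 bets then player 2 folds/calls.
# Assumption: this table is reconstructed from the slide figure.
PAYOFF_P2 = {
    "J": {"cc": -1, "cbf": 1, "cbc": -2, "bf": -1, "bc": -2},
    "K": {"cc": 1, "cbf": 1, "cbc": 2, "bf": -1, "bc": 2},
}


def u2(p1_check, p1_call, p2):
    """Expected utility of player 2.

    p2[card] = (prob. of betting after a check, prob. of calling a bet).
    """
    total = 0.0
    for card in ("J", "K"):  # chance deals each card with probability 0.5
        pay = PAYOFF_P2[card]
        p2_bet, p2_call = p2[card]
        p1_faces_bet = (1 - p1_call) * pay["cbf"] + p1_call * pay["cbc"]
        after_check = (1 - p2_bet) * pay["cc"] + p2_bet * p1_faces_bet
        after_bet = (1 - p2_call) * pay["bf"] + p2_call * pay["bc"]
        total += 0.5 * (p1_check * after_check + (1 - p1_check) * after_bet)
    return total


# Everything except the J-facing-a-bet decision is held fixed (arbitrary values).
REST = {"J_bet": 0.3, "K": (0.1, 0.6)}
grid = [k / 4 for k in range(5)]
gaps = {}  # (p1_check, p1_call) -> u2 with J folding minus u2 with J calling
for p1_check in grid:
    for p1_call in grid:
        call_j = {"J": (REST["J_bet"], 1.0), "K": REST["K"]}
        fold_j = {"J": (REST["J_bet"], 0.0), "K": REST["K"]}
        gaps[(p1_check, p1_call)] = (
            u2(p1_check, p1_call, fold_j) - u2(p1_check, p1_call, call_j)
        )

for (p1_check, p1_call), gap in sorted(gaps.items()):
    print(f"sigma_1(Q?, b) = {1 - p1_check:.2f}, call = {p1_call:.2f}: gap = {gap:.3f}")
```

In this reconstruction the gap works out to $0.5\,\sigma_1(Q?, b)$, matching the single term that differs between the two expressions: it is never negative and is strictly positive whenever player 1 bets with positive probability.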

  20. Domination
      [Figure: the QJ/QK game tree.]
      $\sigma_2$ is a dominated strategy if there exists $\sigma_2'$ such that
      $u_2(\sigma_1, \sigma_2, \sigma_3, \ldots) \le u_2(\sigma_1, \sigma_2', \sigma_3, \ldots)$ for all $\sigma_1, \sigma_3, \ldots$, and
      $u_2(\sigma_1, \sigma_2, \sigma_3, \ldots) < u_2(\sigma_1, \sigma_2', \sigma_3, \ldots)$ for some $\sigma_1, \sigma_3, \ldots$
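The definition translates directly into a pairwise test. The sketch below assumes each strategy is summarized by its vector of utilities against every pure profile of the opponents; since expected utility is linear in the opponents' mixing, comparing two fixed strategies on pure opponent profiles is enough. Note the limitation: this only finds domination by another strategy listed in the table, whereas finding a dominating mixed strategy in general needs a linear program. The payoff table and the function names are hypothetical.

```python
def weakly_dominates(u_prime, u):
    """True if payoff vector u_prime dominates u in the sense defined above.

    Both vectors list one player's utility against each pure profile of the
    opponents, in the same order: never worse, and strictly better at least once.
    """
    if len(u_prime) != len(u):
        raise ValueError("payoff vectors must cover the same opponent profiles")
    never_worse = all(x >= y for x, y in zip(u_prime, u))
    sometimes_better = any(x > y for x, y in zip(u_prime, u))
    return never_worse and sometimes_better


def dominated_strategies(payoffs):
    """Names of strategies dominated by some other strategy in the table."""
    return sorted(
        name
        for name, u in payoffs.items()
        if any(
            weakly_dominates(v, u)
            for other, v in payoffs.items()
            if other != name
        )
    )


# Hypothetical payoff table: rows are one player's strategies, columns are
# the opponents' pure profiles.
table = {
    "A": [1, 0, 2],
    "B": [1, -1, 2],  # never better than A, strictly worse in column 2
    "C": [0, 3, 0],
}
dominated = dominated_strategies(table)
print(dominated)
```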

  21. Domination
      [Figure: the QJ/QK game tree.]
      $\sigma_2^{J,c}$ is dominated by $\sigma_2^{J,f}$.
      $\sigma_2^{K,f}$ is dominated by $\sigma_2^{K,c}$.

  22. Domination
      [Figure: the QJ/QK game tree.]
      Define a dominated action to be an action such that any strategy that always plays that action is dominated (assuming that player plays to reach that action).

  23. Domination
      [Figure: the QJ/QK game tree with player 2's dominated actions removed: facing a bet, player 2 can only fold the J, reaching (1,-1), and can only call with the K, reaching (-2,2).]

  24. Domination
      [Figure: the reduced QJ/QK game tree; player 1 checks with probability 0 and bets with probability 1.]
      Consider the player 1 strategy $\sigma_1^{b}$ that always bets.

  25. Domination
      [Figure: the reduced QJ/QK game tree; player 1 always bets, player 2 folds the J and calls with the K.]
      $u_1(\sigma_1^{b}, \sigma_2) = 0.5(1)(1)(1) + 0.5(1)(1)(-2) = -0.5$

  26. Domination
      [Figure: the reduced QJ/QK game tree; player 1 checks with probability 1 and, facing a bet, folds with probability 0 and calls with probability 1.]
      $u_1(\sigma_1^{b}, \sigma_2) = 0.5(1)(1)(1) + 0.5(1)(1)(-2) = -0.5$
      Now consider the player 1 strategy $\sigma_1^{cc}$ that checks then calls.

  27. Domination
      [Figure: the reduced QJ/QK game tree; player 1 checks then calls, player 2 checks with the J and bets with the K.]
      $u_1(\sigma_1^{b}, \sigma_2) = 0.5(1)(1)(1) + 0.5(1)(1)(-2) = -0.5$
      $u_1(\sigma_1^{cc}, \sigma_2^{Jc,Kb}) = 0.5(1)(1)(1) + 0.5(1)(1)(1)(-2) = -0.5$

  28. Domination
      [Figure: the reduced QJ/QK game tree; player 1 checks then calls, player 2 bets with the J and checks with the K.]
      $u_1(\sigma_1^{b}, \sigma_2) = 0.5(1)(1)(1) + 0.5(1)(1)(-2) = -0.5$
      $u_1(\sigma_1^{cc}, \sigma_2^{Jc,Kb}) = 0.5(1)(1)(1) + 0.5(1)(1)(1)(-2) = -0.5$
      $u_1(\sigma_1^{cc}, \sigma_2^{Jb,Kc}) = 0.5(1)(1)(1)(2) + 0.5(1)(1)(-1) = +0.5$
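The comparison between "always bet" and "check then call" can be tabulated against every remaining pure strategy of player 2. The sketch below assumes the reduced game described on these slides (facing a bet, player 2 folds the J and calls with the K) and a payoff table reconstructed from the figure; the names `PAYOFF_P1` and `u1_pure` are hypothetical.

```python
from itertools import product

# Terminal payoffs for player 1, keyed by player 2's card (player 1 holds Q).
# Histories: cc = check-check, cbc = check-bet-call, bf = bet-fold, bc = bet-call.
# Assumption: this table is reconstructed from the slide figure.
PAYOFF_P1 = {
    "J": {"cc": 1, "cbc": 2, "bf": 1, "bc": 2},
    "K": {"cc": -1, "cbc": -2, "bf": 1, "bc": -2},
}

# With dominated actions removed, player 2's reply to a bet is fixed.
P2_VS_BET = {"J": "f", "K": "c"}


def u1_pure(p1, p2_after_check):
    """Expected utility of player 1 for pure strategies.

    p1: "b" (always bet) or "cc" (check, then call a bet).
    p2_after_check: maps player 2's card to "c" (check) or "b" (bet).
    """
    total = 0.0
    for card in ("J", "K"):  # chance deals each card with probability 0.5
        if p1 == "b":
            history = "b" + P2_VS_BET[card]
        elif p2_after_check[card] == "c":
            history = "cc"
        else:
            history = "cbc"
        total += 0.5 * PAYOFF_P1[card][history]
    return total


rows = {}  # (J action, K action) -> (u1 for always bet, u1 for check-call)
for j_action, k_action in product("cb", repeat=2):
    p2 = {"J": j_action, "K": k_action}
    rows[(j_action, k_action)] = (u1_pure("b", p2), u1_pure("cc", p2))

for (j_action, k_action), (bet_value, cc_value) in rows.items():
    print(f"J {j_action}, K {k_action}: always bet {bet_value:+.1f}, check-call {cc_value:+.1f}")
```

In this reconstruction, always betting earns -0.5 against every remaining player 2 strategy, while check-then-call never earns less and earns more against three of the four.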

  29. Domination
      [Figure: the reduced QJ/QK game tree.]
      $\sigma_1$ is an iteratively dominated strategy if there exists $\sigma_1'$ such that
      $u_1(\sigma_1, \sigma_2, \sigma_3, \ldots) \le u_1(\sigma_1', \sigma_2, \sigma_3, \ldots)$ for all non-iteratively dominated $\sigma_2, \sigma_3, \ldots$, and
      $u_1(\sigma_1, \sigma_2, \sigma_3, \ldots) < u_1(\sigma_1', \sigma_2, \sigma_3, \ldots)$ for some non-iteratively dominated $\sigma_2, \sigma_3, \ldots$

  30. Domination
      [Figure: the reduced QJ/QK game tree.]
      $\sigma_1^{b}$ is iteratively dominated by $\sigma_1^{cc}$.

  31. Domination
      [Figure: the reduced QJ/QK game tree.]
      Define an iteratively dominated action to be an action such that any strategy that always plays that action is iteratively dominated (assuming that player plays to reach that action).
