Accelerating Best Response Calculation in Large Extensive Games

Michael Johanson, Kevin Waugh, Michael Bowling, Martin … University of Alberta. July 21, 2011


  1. University of Alberta Agents. Figure: Best Response (mbb/g) by Year, 2006 to 2011, for Computer Poker Competition and Man-vs-Machine agents. Annotations: "2007 Man-Machine: Narrow Human Win" and "2008 Man-Machine: Narrow Computer Win". Wednesday, November 14, 2012

  2. Evaluating the University of Alberta agents. Comparing Abstraction Techniques: Percentile HS, Public PHS, k-Means Earthmover. Figure: Best Response (mbb/g) against Abstraction Size (# information sets), 1E+06 to 1E+10.

  3. Evaluating Computer Poker Agents: 2010 Competition

                    Rockhopper  GGValuta  HyperB (UofA)  PULPO  GS6 (CMU)  Littlerock  Best Response
     Rockhopper              -         6              3      7         37          77            300
     GGValuta               -6         -              3      1         31          77            237
     HyperB (UofA)          -3        -3              -      2         31          70            135
     PULPO                  -7        -1             -2      -         32         125            399
     GS6 (CMU)             -37       -31            -31    -32          -          47            318
     Littlerock            -77       -77            -70   -125        -47           -            421

  4. Conclusion. Fast best-response calculation in imperfect information games. The previously intractable computation can now be run in a day! The computer poker community is making steady progress towards robust strategies. Many additional exciting results are in the paper and at the poster!

  5. More details at our poster! Today, 4:00 - 5:20, Room 120-121

  6. Additional Slides: Expectimax Public Tree n^2 to n Abstraction Pathologies CFR Hyperborean Polaris 2009 Additional Tilting Graphs

  7. Leduc Hold’em Pathologies

     Abstraction             Best Response
     Real Game vs Real Game            0
     J.Q.K vs Real Game             55.2
     [JQ].K vs Real Game            69.0
     J.[QK] vs Real Game           126.3
     [JQK] vs Real Game            219.3
     [JQ].K vs [JQ].K              272.2
     [JQ].K vs J.Q.K               274.1
     Real Game vs J.[QK]           345.7
     Real Game vs [JQ].K           348.9
     J.Q.K vs J.Q.K                359.9
     J.Q.K vs [JQ].K               401.3
     J.[QK] vs J.[QK]              440.6
     Real Game vs [JQK]            459.5
     Real Game vs J.Q.K            491.0
     [JQK] vs [JQK]                755.8

  8. Expectimax. Diagram: My Tree, Your Tree. Reach: 2: 0.5, K: 0.5. Diagram labels: 0.5, 0.5, -0.29.

  9. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5*0.75, K: 0.5*0.1. Diagram labels: -0.29, 0.05, 0.38.

  10. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5*0.75, K: 0.5*0.1. Diagram labels: -0.29, 0.05, 0.38.

  11. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5*0.75*0.25, K: 0.5*0.1*0.9. Diagram labels: -0.29, -0.045, 0.09, 0.05.

  12. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5*0.75, K: 0.5*0.1. Diagram labels: -0.29, 0.05, 0.38, -0.045.

  13. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5*0.75*0.75, K: 0.5*0.1*0.1. Diagram labels: -0.29, -0.045, 0.14, 0.28, 0.005.

  14. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5*0.75, K: 0.5*0.1. Diagram labels: -0.29, 0.1, 0.05, 0.38.

  15. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5*0.75, K: 0.5*0.1. Diagram labels: -0.29, 0.05, 0.38, 0.1.

  16. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5*0.75, K: 0.5*0.1. Diagram labels: -0.29, 0.1, -0.05, 0.05, 0.38.

  17. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5*0.75, K: 0.5*0.1. Diagram labels: 0.1, -0.29, 0.05, 0.38, 0.1, -0.05.

  18. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5, K: 0.5. Diagram labels: 0.5, 0.5, -0.19.

  19. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5, K: 0.5. Diagram labels: 0.5, 0.5, -0.19.

  20. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5, K: 0.5. Diagram labels: 0.5, 0.5, -0.19.

  21. Conventional Best Response in one tree walk. Diagram: My Tree, Your Tree. Reach: 2: 0.5*0.25, K: 0.5*0.9. Diagram labels: -0.19, 0.45, 0.13.

  22. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5, K: 0.5. My Value: 2: (blank), K: (blank).

  23. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.25, K: 0.5*0.9. My Value: 2: (blank), K: (blank).

  24. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.25, K: 0.5*0.9. My Value labels in the diagram: -0.45, 0.13, 0.13, -0.45, -0.45, 0.13.

  25. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.25, K: 0.5*0.9. My Value labels in the diagram: -0.45, 0.13, 0.13, -0.45, -0.45, 0.13.

  26. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.25, K: 0.5*0.9. My Value labels in the diagram: -0.29, -0.29, 0.13, -0.29, -0.45, -0.29, -0.29, -0.45, -0.29, 0.13.

  27. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.25, K: 0.5*0.9. My Value labels in the diagram: -0.29, 0.13, 0.13, -0.29, -0.29, 0.13, -0.29, 0.13, -0.45, -0.29, -0.29, -0.45, -0.29, 0.13.

  28. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5, K: 0.5. My Value labels in the diagram: -0.29, 0.13, 0.13, -0.29, -0.29, 0.13.

  29. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.75, K: 0.5*0.1. My Value labels in the diagram: 0.13, -0.29, -0.29, 0.13.

  30. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.75, K: 0.5*0.1. My Value labels in the diagram: 0.13, -0.29, -0.29, 0.13.

  31. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.75, K: 0.5*0.1. My Value labels in the diagram: -0.05, 0.13, 0.09, -0.29, -0.29, 0.13, 0.09, -0.05, -0.05, 0.09.

  32. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.75, K: 0.5*0.1. My Value labels in the diagram: -0.05, 0.13, 0.09, -0.29, -0.29, 0.13, 0.09, -0.05, -0.05, 0.09.

  33. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.75, K: 0.5*0.1. My Value labels in the diagram: 0.14, 0.13, 0.14, -0.29, -0.29, 0.13, 0.09, 0.14, -0.05, 0.14, -0.05, 0.14, 0.09, 0.14.

  34. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.75, K: 0.5*0.1. My Value labels in the diagram: 0.14, 0.13, 0.14, -0.29, 0.09, 0.23, -0.29, 0.13, 0.09, 0.14, -0.05, 0.14, 0.09, 0.23, -0.05, 0.14, 0.09, 0.14.

  35. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.75, K: 0.5*0.1. My Value labels in the diagram: 0.14, 0.13, 0.14, -0.29, 0.09, 0.23, -0.29, 0.13, 0.09, 0.23.

  36. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.75, K: 0.5*0.1. My Value labels in the diagram: -0.05, 0.13, 0.19, -0.29, 0.09, -0.29, 0.23, 0.19, -0.05, 0.13, 0.09, -0.05, 0.23, 0.19.

  37. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.75, K: 0.5*0.1. My Value labels in the diagram: 0.09, 0.23, 0.13, 0.09, 0.23, -0.29, -0.29, 0.09, 0.13, 0.23, 0.09, -0.05, 0.23, 0.19.

  38. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.75, K: 0.5*0.1. My Value labels in the diagram: -0.2, 0.36, -0.2, 0.36, -0.2, 0.36.

  39. 1: Walking the Public Tree. Their Reach Prob: 2: 0.5*0.75, K: 0.5*0.1. My Value labels in the diagram: 0.17, 0.17, 0.18.
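
The first steps of the public tree walk (slides 22 to 24) can be sketched in a few lines. This is only an illustration under assumptions: a toy game in which the opponent holds a 2 or a K with probability 0.5 each, a winner's utility of 1, and K encoded as rank 13. The function names `update_reach` and `showdown_values` are hypothetical and do not come from the slides.

```python
def update_reach(reach, action_probs):
    """Scale each opponent hand's reach probability by the probability
    that the opponent takes the observed action while holding that hand."""
    return [r * p for r, p in zip(reach, action_probs)]


def showdown_values(my_ranks, their_ranks, their_reach, u=1.0):
    """Return one value per hand I might hold at a showdown node.

    Each value sums +u * reach over opponent hands I beat and
    -u * reach over opponent hands that beat me; ties contribute 0.
    """
    values = []
    for mine in my_ranks:
        total = 0.0
        for theirs, reach in zip(their_ranks, their_reach):
            if mine > theirs:
                total += u * reach
            elif mine < theirs:
                total -= u * reach
        values.append(total)
    return values


# Assumed encoding of the toy game's two hands: a 2 and a King (rank 13).
ranks = [2, 13]

# Chance deals each hand with probability 0.5; the opponent then takes an
# action with probability 0.25 when holding the 2 and 0.9 when holding the K.
reach = update_reach([0.5, 0.5], [0.25, 0.9])

# One pass over the node yields my value for both of my possible hands.
values = showdown_values(ranks, ranks, reach)
```

With these inputs `reach` is `[0.125, 0.45]` and `values` is approximately `[-0.45, 0.125]`, which agrees with the -0.45 and 0.13 shown on slide 24 after rounding.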

  40. Polaris 2008

      Agent        Size  Tilt          Best Response
      Pink         266m  0, 0, 0, 0          235.294
      Orange       266m  7, 0, 0, 7          227.457
      Peach        266m  0, 0, 0, 7          228.325
      Red          115m  0, -7, 0, 0         257.231
      Green        115m  0, -7, 0, -7        263.702
      (Reference)  115m  0, 0, 0, 0          266.797

  41. Polaris. Figure: Polaris and Hyperborean by year, 2006 to 2011, on a vertical axis from 0 to 500.

  42. Polaris. Figure: Polaris and Hyperborean by year, 2006 to 2011, on a vertical axis from 0 to 500. Annotation: "Man-vs-Machine 2007: Narrow loss".

  43. Polaris. Figure: Polaris and Hyperborean by year, 2006 to 2011, on a vertical axis from 0 to 500. Annotations: "Man-vs-Machine 2007: Narrow loss" and "Man-vs-Machine 2008: Narrow win".

  44. Polaris. Figure: Polaris and Hyperborean by year, 2006 to 2011, on a vertical axis from 0 to 500. Annotations: "Man-vs-Machine 2007: Narrow loss" and "Man-vs-Machine 2008: Narrow win".

  45. Tilting. Figure: Exploitability (mb/g), 260 to 340, against Percent bonus for winner, -20 to 20.

  46. Tilting: 7%. Figure: Exploitability (mb/g), 200 to 400, against Abstraction size (millions of information sets), 0 to 300. Series: Untilted Perc. E[HS^2], Untilted k-Means, Tilted Perc. E[HS^2], Tilted k-Means.

  47. Counterfactual Regret Minimization: Abstract-Game Best Response. 10-bucket Perfect Recall, Percentile 10 E[HS^2]. Figure: Exploitability (mbb/g), 0 to 20, against Iterations (million), 0 to 2000.

  48. Counterfactual Regret Minimization: Real Game Best Response. 10-bucket Perfect Recall, Percentile 10 E[HS^2]. Figure: Exploitability (mbb/g), 200 to 300, against Iterations (million), 0 to 8000.

  49. Hyperborean 2009. Figure: Best Response (mbb/g), 0 to 500, by Year, 2006 to 2011, for Polaris and Hyperborean, with one point marked "?".

  50. Abstraction: Perc HS^2

  51. Abstraction: k-Means

  52. Abstraction: HS Distributions. Distribution over future outcomes for hand AsAd. Figure: Probability (0 to 1) against Expected Hand Strength (0 to 1), with E[HS] marked.

  53. Abstraction: HS Distributions. Distribution over future outcomes for hand 2s7c. Figure: Probability (0 to 1) against Expected Hand Strength (0 to 1), with E[HS] marked.

  54. k-Means Earthmover Abstraction
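
The slides name the k-Means Earthmover abstraction without showing how two hand-strength distributions are compared. As a hedged illustration only, the sketch below computes the one-dimensional earth mover's distance between two histograms that share the same equal-width bins and the same total mass. The function name `earth_movers_distance` is hypothetical, and whether the authors compute the distance this way is an assumption.

```python
from itertools import accumulate


def earth_movers_distance(hist_a, hist_b):
    """1-D earth mover's distance between two histograms, measured in bins.

    Assumes both histograms use the same equal-width bins and carry the
    same total mass. In one dimension the distance is the sum of absolute
    differences between the two cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    cdf_a = accumulate(hist_a)
    cdf_b = accumulate(hist_b)
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


# All mass in the first bin versus all mass in the last bin of three:
# one unit of mass has to move two bins.
far_apart = earth_movers_distance([1, 0, 0], [0, 0, 1])

# The same shape shifted right by one bin: one unit of mass moves one bin.
shifted = earth_movers_distance([0.5, 0.5, 0.0], [0.0, 0.5, 0.5])
```

A distance of this kind separates two hands whose expected hand strength is the same but whose distributions over future outcomes differ, which a single E[HS] value cannot do.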

  55. 3: Fast Terminal Node Evaluation

      His Reach Probs    My Values
      0.1                ?
      0.05               ?
      0.02               ?

  56. 3: Fast Terminal Node Evaluation. u = utility for winner.

      His Reach Probs    My Values
      0.1                 0*0.1 +  u*0.05 + u*0.02 + ...
      0.05               -u*0.1 +  0*0.05 + u*0.02 + ...
      0.02               -u*0.1 + -u*0.05 + 0*0.02 + ...
      ...                ...

  57. 3: Fast Terminal Node Evaluation. The obvious O(n^2) algorithm:

      r[i] = his reach probs
      v[i] = my values
      u = utility for the winner

      for( a = each of my hands )
        for( b = each of his hands )
          if( a > b )
            v[a] += u*r[b]
          else if( a < b )
            v[a] -= u*r[b]
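
The quadratic loop above translates directly into Python. Slide 6 also lists "n^2 to n", but the linear version is not part of this transcript, so the second function below is only one plausible sketch of it. It assumes both players share the same list of hands, that the list is sorted from weakest to strongest, and that holding a hand does not rule out any opponent hand. The names `terminal_values_quadratic` and `terminal_values_linear` are hypothetical.

```python
def terminal_values_quadratic(ranks, reach, u=1.0):
    """The O(n^2) loop from the slide: compare every hand pair."""
    n = len(ranks)
    values = [0.0] * n
    for a in range(n):
        for b in range(n):
            if ranks[a] > ranks[b]:
                values[a] += u * reach[b]
            elif ranks[a] < ranks[b]:
                values[a] -= u * reach[b]
    return values


def terminal_values_linear(ranks, reach, u=1.0):
    """A single pass over hands sorted weakest-first (an assumed approach).

    For each group of tied hands, value = u * (reach mass of strictly
    weaker hands - reach mass of strictly stronger hands).
    """
    n = len(ranks)
    total = sum(reach)
    values = [0.0] * n
    weaker = 0.0  # reach mass of hands strictly weaker than the current group
    i = 0
    while i < n:
        # Collect the run of hands that tie with hand i.
        j = i
        tied = 0.0
        while j < n and ranks[j] == ranks[i]:
            tied += reach[j]
            j += 1
        stronger = total - weaker - tied
        for k in range(i, j):
            values[k] = u * (weaker - stronger)
        weaker += tied
        i = j
    return values


# The three reach probabilities from slide 56, reordered weakest hand first.
ranks = [1, 2, 3]
reach = [0.02, 0.05, 0.1]
slow = terminal_values_quadratic(ranks, reach)
fast = terminal_values_linear(ranks, reach)
```

Both functions give approximately `[-0.15, -0.08, 0.07]` for this input, matching the three rows on slide 56 with u = 1. Real poker hands share cards, so a practical version would also have to subtract the reach mass of opponent hands that conflict with the hand being evaluated; that step is omitted here.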
