CSCI 3210: Computational Game Theory
Linear Programming and 2-Player Zero-Sum Games
10/5/20
Ref: Wikipedia: https://en.wikipedia.org/wiki/Linear_programming and [AGT] Ch 1
Mohammad T. Irfan
Email: mirfan@bowdoin.edu
Web: www.bowdoin.edu/~mirfan
Course: www.bowdoin.edu/~mirfan/CSCI-3210.html

2-player zero-sum game
- Prove that a Nash equilibrium (NE) exists, in two ways:
  1. Nash's theorem
     - Doesn't give an algorithm (why?)
  2. Linear programming
     - Gives an algorithm

Example: 2-player zero-sum game
- Penalty kick game (each cell: shooter's payoff, goalkeeper's payoff; mixed-strategy probabilities in parentheses)

                           Goalkeeper
                           Left (0.42)    Right (0.58)
  Shooter   Left (0.38)    0.58, 0.42     0.95, 0.05
            Right (0.62)   0.93, 0.07     0.70, 0.30

Example: 2-player zero-sum game
- Assumption (wlog): the sum of payoffs in each cell is 0

                     Column player
                     L         R
  Row player   U     2, -2     -1, 1
               D     -3, 3     4, -4

  Row player's payoffs only:
         L     R
    U    2    -1
    D   -3     4

- More than 2 actions?
  - Need an algorithm
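For a 2x2 game like these, no LP is needed. The minimal Python sketch below (our own illustration; the function name `mixed_2x2` is hypothetical) uses the indifference principle: each player mixes so that the opponent is indifferent between their two actions. It assumes the game has no pure-strategy equilibrium, so both players mix strictly and the denominators are nonzero. Applied to the penalty kick table it recovers the probabilities shown in parentheses, and for the zero-sum example it gives P(U) = 7/10 for the row player and P(L) = 1/2 for the column player.

```python
from fractions import Fraction as F

def mixed_2x2(row_pay, col_pay):
    """Fully mixed equilibrium of a 2x2 game via the indifference principle.

    Assumes no pure-strategy equilibrium exists (both players mix strictly).
    Returns (p, q): p = P(row plays its first action),
                    q = P(column plays its first action).
    """
    (a, b), (c, d) = row_pay   # row player's payoffs
    (e, f), (g, h) = col_pay   # column player's payoffs
    # p makes the column player indifferent: p*e + (1-p)*g == p*f + (1-p)*h
    p = (h - g) / (e - f - g + h)
    # q makes the row player indifferent: q*a + (1-q)*b == q*c + (1-q)*d
    q = (d - b) / (a - b - c + d)
    return p, q

# Penalty kick game: shooter = row player, goalkeeper = column player.
shooter = [[F("0.58"), F("0.95")], [F("0.93"), F("0.70")]]
keeper = [[F("0.42"), F("0.05")], [F("0.07"), F("0.30")]]
p_kick, q_kick = mixed_2x2(shooter, keeper)
print(round(float(p_kick), 2), round(float(q_kick), 2))  # 0.38 0.42

# Zero-sum example: A for the row player, -A for the column player.
A = [[2, -1], [-3, 4]]
row_A = [[F(x) for x in r] for r in A]
col_A = [[-x for x in r] for r in row_A]
p_ex, q_ex = mixed_2x2(row_A, col_A)
print(p_ex, q_ex)  # 7/10 1/2
```

This shortcut does not extend directly to games where players have more than two actions, which is why the slides turn to LP.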

Linear Programming (LP)
(We will come back to game theory later.)

Applications
- Optimization
  - Production, machine scheduling, employee scheduling, supply chain management, etc.
- Game theory
- In general: optimization

LP
1. Variables (or decision variables)
   - We can choose the values of these variables
   - What's the goal?
   - What range of values can we choose? Integer vs. real? Any other restrictions?
2. Objective function (What's the goal?)
   - Minimization or maximization
   - Must be linear in the variables
3. Constraints (What values?)
   - Restrict the values of the decision variables
   - Must be linear in the variables

Example 1: LP formulation & geometric interpretation
- Someone is planning his day-to-day life. Outside of 10 hours of sleep every day, he wants to set aside a few hours for studying and a few hours for connecting with friends.
- He gets 10 units/hr of payoff from studying and 20 units/hr of payoff from connecting with friends.
- He must study at least 6 hours every day. Also, he feels guilty if he spends more than 6 hours with friends.
- How should he allocate his time optimally?
  - Variables?
  - Objective function?
  - Constraints?

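LP formulation (x1 = hours of study per day, x2 = hours with friends per day)
Maximize 10 x1 + 20 x2
subject to
  x1 >= 6
  x2 <= 6
  x1 + x2 <= 14
  x1, x2 >= 0

[Figure: the feasible region in the (x1, x2) plane, bounded by x1 = 6, x2 = 6, and x1 + x2 = 14, with the origin (0,0) and axis marks at 6 and 14. One of the vertices (black dots) gives the optimal solution. Note: the white region is where x1, x2 >= 0.]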
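If an LP is feasible and bounded, an optimum is attained at a vertex of the feasible region, so a two-variable LP like Example 1 can be solved by brute force: intersect every pair of constraint boundaries, keep the feasible points, and pick the best. The Python sketch below does this with exact fractions. The helper names are hypothetical, and the method is only a teaching aid: it does not detect unboundedness and scales poorly.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint (a1, a2, b) means a1*x1 + a2*x2 <= b.
# ">=" constraints are multiplied by -1 first (x1 >= 6 becomes -x1 <= -6).
CONSTRAINTS = [
    (-1, 0, -6),   # x1 >= 6
    (0, 1, 6),     # x2 <= 6
    (1, 1, 14),    # x1 + x2 <= 14
    (-1, 0, 0),    # x1 >= 0
    (0, -1, 0),    # x2 >= 0
]
OBJECTIVE = (10, 20)  # maximize 10*x1 + 20*x2

def vertices(constraints):
    """Yield every feasible intersection point of two constraint boundaries."""
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:  # parallel boundaries: no unique intersection
            continue
        # Cramer's rule for a1*x + a2*y = b, c1*x + c2*y = d
        x = Fraction(b * c2 - a2 * d, det)
        y = Fraction(a1 * d - b * c1, det)
        if all(p * x + q * y <= r for p, q, r in constraints):
            yield (x, y)

def solve_2d_lp(objective, constraints):
    """Best vertex. Assumes the LP is feasible (else max() raises) and bounded."""
    def value(v):
        return objective[0] * v[0] + objective[1] * v[1]
    best = max(vertices(constraints), key=value)
    return best, value(best)

best, value = solve_2d_lp(OBJECTIVE, CONSTRAINTS)
print(*best, value)  # 8 6 200
```

It reports x1 = 8 hours of study and x2 = 6 hours with friends, for a payoff of 200.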

Example 2: infeasible LP
- He wants to sleep 10 hours/day, study at least 10 hours/day, and do other activities for at least 5 hours/day. How should he allocate his time?

Example 3: more variables & constraints
- He gets 15 units/hr of payoff for the first 3 hours of studying and 10 units/hr after 3 hours of studying (basically, his brain slows down). He also gets 20 units/hr of payoff from connecting with friends.
- Sleeps 10 hours/day
- Wants at least 6 hours of study/day
- Wants at most 6 hours of time with friends/day
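One way to keep Example 3 linear (our own formulation, not taken from the slides) is to split study time into two variables: s1 for the first 3 hours, paid at 15 units/hr, and s2 for study beyond 3 hours, paid at 10 units/hr; x2 is time with friends, as in Example 1. Because the first hours pay more, an optimal solution fills s1 before using s2 (moving an hour from s2 to s1 gains 5 units and breaks no constraint), so the split models the declining payoff without extra constraints.

```latex
\begin{aligned}
\text{maximize}\quad & 15\,s_1 + 10\,s_2 + 20\,x_2 \\
\text{subject to}\quad & s_1 \le 3 \\
& s_1 + s_2 \ge 6 \\
& x_2 \le 6 \\
& s_1 + s_2 + x_2 \le 14 \\
& s_1,\ s_2,\ x_2 \ge 0
\end{aligned}
```

Working it by hand gives s1 = 3, s2 = 5, x2 = 6, for a payoff of 45 + 50 + 120 = 215.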

Example 4: unbounded LP
- A tennis player is making a plan for practicing serves and volleys. She gets a payoff of 10 from every serve and 5 from every volley.
- She wants to practice serves at least 100 times a day and doesn't want to practice volleys more than 500 times a day. What's her optimal plan?

Matrix algebra
- Images from this tutorial: http://www.intmath.com/matrices-determinants/3-matrices.php
- Examples pictured: a 4x1 matrix (AKA a vector) and a 3x3 matrix

Matrix multiplication
- A 2x3 matrix multiplied by a 3x2 matrix: the inner dimensions (3 and 3) must match
- The result is a 2x2 matrix

Transpose of a matrix
- Transpose operator: superscript T
  \[ A = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}, \qquad A^T = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \]
- \( (AB)^T = B^T A^T \)
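Both operations can be sketched in a few lines of Python on plain lists of rows (the helper names are our own). The sketch uses the 3x2 matrix A from the slide and a hypothetical 3x2 matrix B, multiplies the 2x3 matrix \(A^T\) by B as on the multiplication slide, and spot-checks the transpose rule on that product, i.e. \((A^T B)^T = B^T A\).

```python
def transpose(m):
    """Rows become columns: entry (i, j) moves to (j, i)."""
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    """Product of an n x k matrix and a k x m matrix (lists of rows)."""
    if len(x[0]) != len(y):
        raise ValueError("inner dimensions must match")
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]

A = [[1, 4], [2, 5], [3, 6]]   # the 3x2 matrix from the slide
A_T = transpose(A)             # 2x3: [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [0, 1], [1, 1]]   # hypothetical 3x2 matrix for illustration
P = matmul(A_T, B)             # (2x3)(3x2) -> 2x2
print(P)  # [[4, 5], [10, 11]]
# (A^T B)^T == B^T (A^T)^T, an instance of (AB)^T = B^T A^T
print(transpose(P) == matmul(transpose(B), transpose(A_T)))  # True
```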

Solving LP
- Example 1:
  Maximize 10 x1 + 20 x2
  subject to x1 >= 6, x2 <= 6, x1 + x2 <= 14, x1, x2 >= 0

Algorithms for solving LP
- Simplex (Dantzig, 1947)
  - Worst-case exponential time
  - Fast in practice
- Ellipsoid (Khachiyan, 1979)
  - O(n^4 L) for n variables and L input bits
  - Weakly polynomial: polynomial in n and L, so the running time depends on the bit length of the input numbers
- Karmarkar's algorithm (Karmarkar, 1984)
  - O(n^3.5 L) for n variables and L input bits
  - Also weakly polynomial, but a breakthrough for practical reasons
- Open problem: is there a strongly polynomial algorithm for LP?

LP Duality (von Neumann, 1947)
- Interview with Dantzig:
  http://www.personal.psu.edu/ecb5/Courses/M475W/WeeklyReadings/Week%2015/An_Interview_with_George_Dantzig.pdf
- If the "primal" LP is a maximization, its "dual" is a minimization, and vice versa.
- Every variable of the primal LP leads to a constraint in the dual LP, and every constraint of the primal LP leads to a variable in the dual LP.
- The dual of the dual is the primal.

Definition of dual LP
Source: Applied Mathematical Programming book
[Figure: example (not reproduced)]
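The bookkeeping in these rules can be sketched in Python for LPs in the canonical form used on the definition slide: primal "maximize \(c^T x\) subject to \(Ax \le b\), \(x \ge 0\)" and dual "minimize \(b^T y\) subject to \(A^T y \ge c\), \(y \ge 0\)". Rewriting the dual as the equivalent maximization "maximize \((-b)^T y\) subject to \((-A^T) y \le -c\), \(y \ge 0\)" puts it back in canonical form, so the same (hypothetical) `dual` function can be applied twice. The example data are Example 1's LP, with x1 >= 6 written as -x1 <= -6.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def dual(c, A, b):
    """Dual of (max c^T x s.t. A x <= b, x >= 0), returned in the same canonical max form.

    min b^T y s.t. A^T y >= c, y >= 0   is rewritten as
    max (-b)^T y s.t. (-A^T) y <= -c, y >= 0.
    """
    return ([-bi for bi in b],
            [[-a for a in row] for row in transpose(A)],
            [-ci for ci in c])

# Example 1 (daily planner) in canonical form.
c = [10, 20]
A = [[-1, 0], [0, 1], [1, 1]]
b = [-6, 6, 14]

c_d, A_d, b_d = dual(c, A, b)
print(len(c_d), len(b_d))               # 3 2: one dual variable per primal constraint,
                                        # one dual constraint per primal variable
print(dual(c_d, A_d, b_d) == (c, A, b))  # True: the dual of the dual is the primal
```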

Definition of dual LP (continued)
Primal: maximize \( c^T x \) subject to \( Ax \le b \), \( x \ge 0 \)
Dual: minimize \( b^T y \) subject to \( A^T y \ge c \), \( y \ge 0 \)

Example 5: LP duality
- How many Bowdoin logs and chocolate cakes should Thorne make to maximize its revenue? Derive the primal and dual LPs.
- Objective function. Each log has a satisfaction of 10 (or a price of $10), each cake 5.
- Constraints. For both desserts, the chef needs to use an oven, a food processor, and a boiler.

  Equipment        Processing time/log   Processing time/cake   Total available time
  Oven             5 min                 1 min                  85 min
  Food processor   1 min                 10 min                 300 min
  Boiler           4 min                 6 min                  120 min

Example 5 (continued)
- Revenue: $10/log, $5/cake
- Processing times and available times: as in the table above
- Primal LP:
- Dual LP:

Dual: intuition
- Moulton wants to borrow Thorne's equipment for a day for a special event.
- Moulton will pay Thorne y1, y2, and y3 dollars per minute for the 3 pieces of equipment, respectively, such that:
  1. (Dual objective) Moulton minimizes the total cost of renting.
  2. (Dual constraints) Moulton makes sure that Thorne recoups, through rental income, the payoff lost on each dessert.
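Applying the canonical definition to the table gives one consistent way to fill in the two LPs. Here x1 and x2 (our naming) are the numbers of logs and cakes, and y1, y2, y3 are Moulton's per-minute prices for the oven, food processor, and boiler, as on the intuition slide.

```latex
\begin{aligned}
\textbf{Primal:}\quad \text{maximize}\quad & 10\,x_1 + 5\,x_2 \\
\text{subject to}\quad & 5\,x_1 + x_2 \le 85 && \text{(oven)}\\
& x_1 + 10\,x_2 \le 300 && \text{(food processor)}\\
& 4\,x_1 + 6\,x_2 \le 120 && \text{(boiler)}\\
& x_1,\ x_2 \ge 0 \\[6pt]
\textbf{Dual:}\quad \text{minimize}\quad & 85\,y_1 + 300\,y_2 + 120\,y_3 \\
\text{subject to}\quad & 5\,y_1 + y_2 + 4\,y_3 \ge 10 && \text{(log)}\\
& y_1 + 10\,y_2 + 6\,y_3 \ge 5 && \text{(cake)}\\
& y_1,\ y_2,\ y_3 \ge 0
\end{aligned}
```

Each dual constraint says that the rent for the equipment time one dessert would use is at least the revenue Thorne gives up by not making that dessert.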

Daily planner (Example 1)
Primal LP:
  Maximize 10 x1 + 20 x2
  subject to
    x1 >= 6 (or, -x1 <= -6)
    x2 <= 6
    x1 + x2 <= 14
    x1, x2 >= 0
Dual LP?
- Work out the solutions by hand. What's the dual interpretation?

Weak duality theorem
[Figure: objective function values increase upward; the dual LP (minimize) lies above the primal LP (maximize), with a possible "Gap?" between them.]

Weak duality theorem
- The objective value of any feasible solution of the dual LP (minimization) is an upper bound on the optimal value of the primal LP (maximization). [That's how we defined the dual!]
- Proof (next slide)
- Equivalently, the objective value of any feasible solution of the primal LP (maximization) is a lower bound on the optimal value of the dual LP (minimization).

Proof: weak duality theorem
Primal: maximize \( c^T x \) subject to \( Ax \le b \), \( x \ge 0 \)
Dual: minimize \( b^T y \) subject to \( A^T y \ge c \), \( y \ge 0 \)
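The proof slide lists only the two LPs. One standard argument, for any primal-feasible x and dual-feasible y, is the following chain, where each step uses only the constraints shown:

```latex
\begin{aligned}
c^T x \;&\le\; (A^T y)^T x && \text{since } c \le A^T y \text{ and } x \ge 0\\
&=\; y^T A x \\
&\le\; y^T b && \text{since } A x \le b \text{ and } y \ge 0\\
&=\; b^T y
\end{aligned}
```

So \(c^T x \le b^T y\): every dual-feasible objective value bounds every primal-feasible one, and in particular the primal optimum.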

Implications: weak duality theorem
- What will happen if the primal (or dual) is unbounded?
  - Primal unbounded ⇒ dual infeasible
  - Dual unbounded ⇒ primal infeasible
  - Both primal and dual may be infeasible (although this is not implied by the theorem)

Strong duality theorem
- If the primal LP has a finite optimal solution, then so does the dual LP. Moreover, these two optimal solutions have the same objective function value.
- In other words, if either the primal or the dual LP has a finite optimal solution, the gap between them is 0.
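By weak duality, a primal-feasible x and a dual-feasible y with equal objective values must both be optimal. The Python sketch below checks this for Example 5 using a candidate pair we worked out by hand (our own computation, not from the slides): x = 15 logs and 10 cakes, and y = (20/13, 0, 15/26) dollars per minute. Both objectives equal 200, so the gap is 0, as the strong duality theorem promises. The helper names are hypothetical.

```python
from fractions import Fraction as F

# Example 5 (Thorne): maximize c^T x subject to A x <= b, x >= 0
A = [[5, 1],    # oven: minutes per log, per cake
     [1, 10],   # food processor
     [4, 6]]    # boiler
b = [85, 300, 120]   # available minutes
c = [10, 5]          # revenue per log, per cake

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(x):
    return all(xi >= 0 for xi in x) and all(dot(row, x) <= bi for row, bi in zip(A, b))

def dual_feasible(y):
    columns = list(zip(*A))  # rows of A^T
    return all(yi >= 0 for yi in y) and all(dot(col, y) >= cj for col, cj in zip(columns, c))

x = [F(15), F(10)]                # candidate: 15 logs, 10 cakes
y = [F(20, 13), F(0), F(15, 26)]  # candidate rental prices per minute
assert primal_feasible(x) and dual_feasible(y)
print(dot(c, x), dot(b, y))  # 200 200
```

In this solution the food processor is not fully used (115 of 300 minutes), and its price y2 is 0.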

Complementary slackness
- When strong duality holds (both LPs have finite optimal solutions):
  - Primal constraint non-binding (strict inequality) at OPT ⇒ corresponding dual variable = 0 at OPT
  - A similar condition holds for dual constraints and primal variables.
  - The reverse implication may not hold!

2-player zero-sum game
Algorithm via LP duality

Example 6: 2-player zero-sum game
- Assumption (wlog): the sum of payoffs in each cell is 0

                     Column player
                     L         R
  Row player   U     2, -2     -1, 1
               D     -3, 3     4, -4

  Matrix A (row player's payoffs):
         L     R
    U    2    -1
    D   -3     4

- Example: at (U, L), the row player gains 2 and the column player loses 2.

Row player
- How much gain can the row player guarantee?
  - Call it \( v_r \)
  - Wants the largest \( v_r \) possible
- Row: choose a mixed strategy p (a vector of probabilities) to maximize \( v_r \)
- Expected gain of row when column plays j (or expected loss of column for playing j) = \( \sum_i p_i A_{i,j} = (p^T A)_j \)
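To see the formula in action, the Python sketch below (helper names hypothetical) evaluates \((p^T A)_j\) for the Example 6 matrix at a few candidate values of P(U) and takes the minimum over j. That minimum is what the row player can count on when the column player responds in the way worst for row. Among these candidates, P(U) = 7/10 guarantees the most, 1/2, because it equalizes the two columns.

```python
from fractions import Fraction as F

A = [[2, -1],    # row U: payoffs against column L, R
     [-3, 4]]    # row D

def expected_gains(p, A):
    """(p^T A)_j for every column j: row's expected gain if column plays j."""
    return [sum(p[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def guarantee(p, A):
    """Gain that mixed strategy p guarantees: the column player picks the worst column for row."""
    return min(expected_gains(p, A))

for p_up in [F(0), F(1, 2), F(7, 10), F(1)]:
    p = [p_up, 1 - p_up]
    print(p_up, guarantee(p, A))
# prints:
# 0 -3
# 1/2 -1/2
# 7/10 1/2
# 1 -1
```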

Row player's LP
Row player's thought process: maximize my guaranteed gain v, knowing that the column player will minimize his loss (in other words, v can be at most the column player's expected loss for every one of his actions j).

\( v_r = \max v \)
subject to
  \( \sum_i p_i A_{i,j} \ge v \), for each action j of the column player
  \( \sum_i p_i = 1 \)
  \( p_i \ge 0 \), for each action i of the row player

Column player
- How little (\( v_c \)) can the column player pay to row?
- Choose a mixed strategy q (a vector of probabilities) to minimize \( v_c \)
- Expected gain of row player for playing i (or expected loss of column player when row plays i) = \( (Aq)_i = \sum_j A_{i,j} q_j \)
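When the optimizing player has only two actions, each LP has a single free probability t, and the objective is the lower (or upper) envelope of one line per opponent action, so an optimum lies at t = 0, t = 1, or where two lines cross. The Python sketch below exploits this shortcut (our own, not the general LP algorithm from the slides; the name `best_mix` is hypothetical). It solves the row player's LP for games with two rows and the column player's LP for games with two columns, and for Example 6 it reports \(v_r = v_c = 1/2\), as LP duality predicts. For larger games one would pass the two LPs to a general LP solver.

```python
from fractions import Fraction as F
from itertools import combinations

def best_mix(lines, maximize_min):
    """Optimize over t in [0, 1] the envelope of the lines t*a + (1-t)*b.

    maximize_min=True:  max over t of min over lines (row player's LP, 2 rows)
    maximize_min=False: min over t of max over lines (column player's LP, 2 columns)
    The envelope is piecewise linear, so an optimum lies at t = 0, t = 1,
    or where two lines cross; every such candidate is tried.
    """
    candidates = {F(0), F(1)}
    for (a1, b1), (a2, b2) in combinations(lines, 2):
        denom = (a1 - b1) - (a2 - b2)
        if denom != 0:
            t = F(b2 - b1, denom)  # solves t*a1 + (1-t)*b1 == t*a2 + (1-t)*b2
            if 0 <= t <= 1:
                candidates.add(t)
    envelope = min if maximize_min else max  # opponent's best response
    pick = max if maximize_min else min      # optimizing player's choice

    def value(t):
        return envelope(t * a + (1 - t) * b for a, b in lines)

    t_best = pick(sorted(candidates), key=value)
    return t_best, value(t_best)

A = [[2, -1], [-3, 4]]  # Example 6, row player's payoffs
# Row: t = P(U); one line per column j.  Column: t = P(L); one line per row i.
p_up, v_r = best_mix([(A[0][j], A[1][j]) for j in range(2)], maximize_min=True)
q_left, v_c = best_mix([(A[i][0], A[i][1]) for i in range(2)], maximize_min=False)
print(p_up, v_r, q_left, v_c)  # 7/10 1/2 1/2 1/2
```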
