
ECE700.07: Game Theory with Engineering Applications, Lecture 6: Repeated Games

ECE700.07: Game Theory with Engineering Applications, Lecture 6: Repeated Games, by Seyed Majid Zahedi. Outline: finitely and infinitely repeated games with and without perfect monitoring; trigger strategies; folk theorems.


  1. ECE700.07: Game Theory with Engineering Applications, Lecture 6: Repeated Games. Seyed Majid Zahedi

  2. Outline
     • Finitely and infinitely repeated games with and without perfect monitoring
     • Trigger strategies
     • Folk theorems
     • Readings: MAS Sec. 6.1, GT Sec. 5.1 and 5.5

  3. Finitely Repeated Games (with Perfect Monitoring)
     • In repeated games, stage game $G$ is played by same agents for $R$ rounds
     • Agents discount utilities by discount factor $0 \le \delta \le 1$
     • Game is denoted by $G^R(\delta)$
     • At each round, outcomes of all past rounds are observed by all agents
     • Agents' overall utility is sum of discounted utilities at each round
     • Given sequence of utilities $u_i^{(1)}, \dots, u_i^{(R)}$:
       $$u_i = \sum_{r=1}^{R} \delta^{r-1} u_i^{(r)}$$
     • In general, strategies at each round could depend on history of play
     • Memoryless (also called stationary) strategies are special cases
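The discounted sum on this slide can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function name `discounted_utility` and the sample utility stream are assumptions.

```python
def discounted_utility(stage_utilities, delta):
    """Sum of delta**(r-1) * u_r over rounds r = 1..R.

    stage_utilities[0] is the utility in round 1, so the exponent
    produced by enumerate (0, 1, 2, ...) equals r - 1.
    """
    return sum(delta ** k * u for k, u in enumerate(stage_utilities))


# Three rounds with utility -1 each, discount factor 0.5:
# -1 - 0.5 - 0.25 = -1.75
print(discounted_utility([-1, -1, -1], 0.5))
```

With `delta = 1` the function reduces to a plain sum of the stage utilities.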

  4. Example: Finitely-Repeated Prisoners' Dilemma
     • Suppose that Prisoners' Dilemma is played for $R$ ($< \infty$) rounds

       | Prisoner 1 \ Prisoner 2 | Stay Silent | Confess  |
       |-------------------------|-------------|----------|
       | Stay Silent             | (-1, -1)    | (-3, 0)  |
       | Confess                 | (0, -3)     | (-2, -2) |

     • What is SPE of this game?
     • We can use backward induction
     • In last round, Confess (C) is dominant strategy for both agents, so (C, C) is played
     • Last-round play does not depend on history, so same argument applies to every earlier round
     • There exists unique SPE, which is (C, C) at each round regardless of history
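The first step of the backward induction, that Confess strictly dominates Stay Silent in the stage game, can be checked mechanically. The sketch below is illustrative only; the dictionary `U` and the helper `strictly_dominates` are hypothetical names, and the utilities are copied from the table on this slide.

```python
# Row player's utilities, keyed by (own action, opponent action).
# S = stay silent, C = confess. The game is symmetric, so the same
# table applies to the other prisoner.
U = {("S", "S"): -1, ("S", "C"): -3, ("C", "S"): 0, ("C", "C"): -2}
ACTIONS = ["S", "C"]


def strictly_dominates(a, b):
    """True if action a is strictly better than b against every opponent action."""
    return all(U[(a, opp)] > U[(b, opp)] for opp in ACTIONS)


print(strictly_dominates("C", "S"))  # True
print(strictly_dominates("S", "C"))  # False
```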

  5. SPE in Finitely Repeated Games
     • [Theorem] If stage game $G$ has unique pure strategy equilibrium $s^*$, then $G^R(\delta)$ has unique SPE in which $s^{(r)} = s^*$ for all $r = 1, \dots, R$, regardless of history
     • [Proof] By backward induction, at round $R$, we have $s^{(R)} = s^*$. Given this, we have $s^{(R-1)} = s^*$, and continuing inductively, $s^{(r)} = s^*$ for all $r = 1, \dots, R$, regardless of history

  6. Infinitely Repeated Games
     • Infinitely repeated play of $G$ with discount factor $\delta$ is denoted by $G^\infty(\delta)$
     • Agents' utility is average of discounted utilities at each round
     • For $\delta < 1$, given sequence of utilities $u_i^{(1)}, u_i^{(2)}, \dots$:
       $$u_i = (1 - \delta) \sum_{r=1}^{\infty} \delta^{r-1} u_i^{(r)}$$
     • For $\delta = 1$, given sequence of utilities $u_i^{(1)}, u_i^{(2)}, \dots$:
       $$u_i = \lim_{R \to \infty} \frac{\sum_{r=1}^{R} u_i^{(r)}}{R}$$
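A short worked example may help show why the factor $(1 - \delta)$ makes this an average. Assume, purely for illustration, that an agent receives the same stage utility $c$ in every round. Then for $\delta < 1$ the geometric series gives

$$
u_i = (1 - \delta) \sum_{r=1}^{\infty} \delta^{r-1} c = (1 - \delta) \cdot \frac{c}{1 - \delta} = c
$$

so a constant stream of $c$ is worth exactly $c$, which puts repeated-game utilities on the same scale as stage-game utilities.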

  7. Trigger Strategies (TS)
     • Agents get punished if they deviate from agreed profile
     • In non-forgiving TS (or grim TS), punishment continues forever
       $$s_i^{(t)} = \begin{cases} s_i^* & \text{if } s^{(r)} = s^*, \ \forall r < t \\ s_i^j & \text{if } s_j^{(r)} \ne s_j^*, \ \exists r < t \end{cases}$$
     • Here, $s^*$ is agreed profile, and $s_i^j$ is punishment strategy of $i$ for agent $j$
     • Single deviation by $j$ triggers agent $i$ to switch to $s_i^j$, forever

  8. Trigger Strategies in Repeated Prisoners' Dilemma

       | Prisoner 1 \ Prisoner 2 | Stay Silent | Confess  |
       |-------------------------|-------------|----------|
       | Stay Silent             | (-1, -1)    | (-3, 0)  |
       | Confess                 | (0, -3)     | (-2, -2) |

     • Suppose both agents use following trigger strategy
       • Play S unless someone has ever played C in past
       • Play C forever if someone has played C in past
     • Under what conditions is this SPE?
     • We use one-stage deviation principle
     • Step 1: (S is best response to S)
       • Utility from S: $-(1 - \delta)(1 + \delta + \delta^2 + \cdots) = -(1 - \delta)/(1 - \delta) = -1$
       • Utility from C: $-(1 - \delta)(0 + 2\delta + 2\delta^2 + \cdots) = -2\delta(1 - \delta)/(1 - \delta) = -2\delta$
       • S is better than C if $\delta \ge 1/2$
     • Step 2: (C is best response to C)
       • Other agents will always play C, thus C is best response
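The Step 1 comparison can be reproduced numerically. The sketch below is an illustration under stated assumptions: `normalized_value` and `grim_trigger_step1_holds` are hypothetical helper names, and only Step 1 is checked (Step 2 holds for every discount factor because C is the static best response to C).

```python
def normalized_value(first, rest, delta):
    """(1 - delta) * (first + rest*delta + rest*delta**2 + ...), in closed form.

    The tail rest*delta/(1 - delta) times (1 - delta) simplifies to delta*rest.
    Requires 0 <= delta < 1.
    """
    return (1 - delta) * first + delta * rest


def grim_trigger_step1_holds(delta):
    """True if staying silent is at least as good as a one-time confession."""
    cooperate = normalized_value(-1, -1, delta)  # (S, S) in every round
    deviate = normalized_value(0, -2, delta)     # confess once, then (C, C) forever
    return cooperate >= deviate


print(grim_trigger_step1_holds(0.4))  # False: below the 1/2 threshold
print(grim_trigger_step1_holds(0.9))  # True
```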

  9. Remarks
     • Cooperation is equilibrium, but so are many other strategy profiles
     • If $s^*$ is NE of $G$, then "each agent plays $s_i^*$" is SPE of $G^R(\delta)$
       • Future play of other agents is independent of how each agent plays
       • Optimal play is to maximize current utility, i.e., play static best response
     • Sets of equilibria for finite and infinite horizon versions can be different
       • Multiplicity of equilibria in repeated prisoners' dilemma only occurs at $R = \infty$
       • For any finite $R$ (thus for $R \to \infty$), repeated prisoners' dilemma has unique SPE

  10. TS in Finitely Repeated Games
     • If $G$ has multiple equilibria, then $G^R(\delta)$ does not have unique SPE
     • Consider following example

       | Agent 1 \ Agent 2 | x       | y       | z        |
       |-------------------|---------|---------|----------|
       | x                 | (3, 3)  | (0, 4)  | (-2, 0)  |
       | y                 | (4, 0)  | (1, 1)  | (-2, 0)  |
       | z                 | (0, -2) | (0, -2) | (-1, -1) |

     • Stage game has two pure NE: (y, y) and (z, z)
     • Socially optimal outcome, (x, x), is not equilibrium
     • In twice repeated play, we can support (x, x) in first round
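The claim about the stage game's pure equilibria can be verified by brute force. This is a minimal sketch, assuming the payoff table of this slide; `PAYOFFS` and `pure_nash_equilibria` are hypothetical names chosen for the illustration.

```python
from itertools import product

ACTIONS = ["x", "y", "z"]
# PAYOFFS[(a1, a2)] = (utility of agent 1, utility of agent 2)
PAYOFFS = {
    ("x", "x"): (3, 3),  ("x", "y"): (0, 4),  ("x", "z"): (-2, 0),
    ("y", "x"): (4, 0),  ("y", "y"): (1, 1),  ("y", "z"): (-2, 0),
    ("z", "x"): (0, -2), ("z", "y"): (0, -2), ("z", "z"): (-1, -1),
}


def pure_nash_equilibria():
    """Return all action profiles where neither agent gains by deviating alone."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        u1, u2 = PAYOFFS[(a1, a2)]
        agent1_ok = all(PAYOFFS[(d, a2)][0] <= u1 for d in ACTIONS)
        agent2_ok = all(PAYOFFS[(a1, d)][1] <= u2 for d in ACTIONS)
        if agent1_ok and agent2_ok:
            equilibria.append((a1, a2))
    return equilibria


print(pure_nash_equilibria())  # [('y', 'y'), ('z', 'z')]
```

The profile (x, x) is absent from the result because either agent gains by switching to y (4 instead of 3).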

  11. TS in Finitely Repeated Games (cont.)
     • Trigger strategy
       • Play x in first round
       • Play y in second round if opponent played x; otherwise, play z
     • We can use one-shot deviation principle
     • For simplicity, suppose $\delta = 1$
     • Playing x first and y next leads to utility of 3 + 1 = 4
     • Playing y first triggers opponent to play z next, which leads to utility of 4 - 1 = 3
     • Deviation is not profitable!
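The two-round comparison can be written out as a small calculation. The sketch below is illustrative: `U1`, `opponent_second_round`, and `best_total` are hypothetical names, the utilities are agent 1's entries from the 3x3 example game, and the opponent is assumed to follow the trigger strategy exactly.

```python
# Agent 1's stage utilities, keyed by (own action, opponent action).
U1 = {
    ("x", "x"): 3, ("x", "y"): 0, ("x", "z"): -2,
    ("y", "x"): 4, ("y", "y"): 1, ("y", "z"): -2,
    ("z", "x"): 0, ("z", "y"): 0, ("z", "z"): -1,
}
ACTIONS = ["x", "y", "z"]


def opponent_second_round(own_first_action):
    """Opponent's round-2 action under the trigger strategy."""
    return "y" if own_first_action == "x" else "z"


def best_total(first_action, delta=1.0):
    """Best two-round utility after playing first_action against the trigger strategy."""
    opp2 = opponent_second_round(first_action)
    best_second = max(U1[(a, opp2)] for a in ACTIONS)
    # The opponent plays x in round 1 under the trigger strategy.
    return U1[(first_action, "x")] + delta * best_second


print(best_total("x"))  # 4.0: cooperate, then best-respond to y
print(best_total("y"))  # 3.0: deviate, then best-respond to z
```

Second-round deviations need no separate check here, since both continuation profiles, (y, y) and (z, z), are Nash equilibria of the stage game.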

  12. Repetition Can Lead to Bad Outcomes
     • Consider this game

       | Agent 1 \ Agent 2 | x      | y       | z        |
       |-------------------|--------|---------|----------|
       | x                 | (2, 2) | (2, 1)  | (0, 0)   |
       | y                 | (1, 2) | (1, 1)  | (-1, 0)  |
       | z                 | (0, 0) | (0, -1) | (-1, -1) |

     • Strategy x strictly dominates y and z for both agents
     • Unique Nash equilibrium of stage game is (x, x)
     • If $\delta \ge 1/2$, this game has SPE in which (y, y) is played in every round
     • It is supported by slightly more complicated strategy than grim trigger
       • I. Play y in every round unless someone deviates, then go to II
       • II. Play z. If no one deviates go to I. If someone deviates stay in II
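The threshold of 1/2 can be checked by applying the one-shot deviation principle to the two states I and II. The following is a sketch under stated assumptions, not the lecture's own derivation: it uses unnormalized discounted sums, assumes a discount factor strictly below 1, and the names `U`, `PRESCRIBED`, and `no_profitable_one_shot_deviation` are hypothetical.

```python
# Row agent's stage utilities, keyed by (own action, opponent action).
U = {
    ("x", "x"): 2, ("x", "y"): 2, ("x", "z"): 0,
    ("y", "x"): 1, ("y", "y"): 1, ("y", "z"): -1,
    ("z", "x"): 0, ("z", "y"): 0, ("z", "z"): -1,
}
ACTIONS = ["x", "y", "z"]
PRESCRIBED = {"I": "y", "II": "z"}  # action both agents play in each state


def no_profitable_one_shot_deviation(delta):
    """Check both states for a profitable single deviation (0 <= delta < 1)."""
    # Value of conforming: state I yields 1 forever; state II yields -1, then state I.
    value_I = U[("y", "y")] / (1 - delta)
    value_II = U[("z", "z")] + delta * value_I
    value = {"I": value_I, "II": value_II}
    for state, action in PRESCRIBED.items():
        best_deviation = max(U[(a, action)] for a in ACTIONS if a != action)
        # Any deviation sends play to state II in the next round.
        if best_deviation + delta * value_II > value[state] + 1e-12:
            return False
    return True


print(no_profitable_one_shot_deviation(0.4))  # False
print(no_profitable_one_shot_deviation(0.6))  # True
```

In both states the binding condition reduces to a discount factor of at least 1/2, which matches the bound stated on the slide.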

  13. Feasible and Individually Rational Utilities
     • $V = \text{Convex hull of } \{ v \in \mathbb{R}^{\mathcal{I}} \mid \text{there exists } s \in S \text{ such that } u(s) = v \}$
     • Utility in repeated game is just a weighted average of utilities in stage game
     • Note that $V \ne \{ v \in \mathbb{R}^{\mathcal{I}} \mid \text{there exists } \sigma \in \Sigma \text{ such that } u(\sigma) = v \}$
     • Recall minmax value of agent $i$
       $$\underline{v}_i = \min_{\sigma_{-i}} \max_{s_i} u_i(s_i, \sigma_{-i})$$
     • Also recall minmax strategy against $i$
       $$m^i_{-i} = \arg\min_{\sigma_{-i}} \max_{s_i} u_i(s_i, \sigma_{-i})$$
     • Utility vector $v \in \mathbb{R}^{\mathcal{I}}$ is strictly individually rational if $v_i > \underline{v}_i$, $\forall i$
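As a concrete instance of the minmax definitions, the sketch below computes the minmax value for the Prisoners' Dilemma used earlier in the lecture. It is a simplification and not the general definition: the opponent is restricted to pure actions, which in general gives only an upper bound on the mixed-strategy minmax value (in this particular game the two coincide). The names `U` and `pure_minmax` are hypothetical.

```python
# Agent's utilities, keyed by (own action, opponent action).
# S = stay silent, C = confess.
U = {("S", "S"): -1, ("S", "C"): -3, ("C", "S"): 0, ("C", "C"): -2}
ACTIONS = ["S", "C"]


def pure_minmax(utility, actions):
    """Return (value, punishing action) over pure opponent actions only.

    For each opponent action, the agent picks its best response; the
    opponent then chooses the action that makes that best response worst.
    """
    return min(
        (max(utility[(own, opp)] for own in actions), opp)
        for opp in actions
    )


print(pure_minmax(U, ACTIONS))  # (-2, 'C')
```

Under this calculation, a utility vector for the repeated Prisoners' Dilemma is strictly individually rational when each agent receives more than -2.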
