Announcements: Homework 1 Out
Ø HW1 and a LaTeX template for solutions are out on the course website: http://www.haifeng-xu.com/cs6501fa19
  • The HW solution template is for your convenience, but not required. Feel free to use your own template
Ø Due in two weeks: Thursday 09/19 3:30 pm, right before class
Ø Homework submission
  1. Submit your PDF to UVA-ColLab (the Collab course website is just up)
  2. And hand a hard copy to Jing or Minbiao before class
Ø Start early, and hope you enjoy it!
CS6501: Topics in Learning and Game Theory (Fall 2019)
Introduction to Game Theory (I)
Instructor: Haifeng Xu
Outline
Ø Games and Their Basic Representation
Ø Nash Equilibrium and Its Computation
Ø Other (More General) Classes of Games
(Recall) Example 1: Prisoner's Dilemma
Ø Two members A, B of a criminal gang are arrested
Ø They are questioned in two separate rooms
  • No communication between them
Q: How should each prisoner act?
Ø Both of them betray, even though the outcome (-1, -1), where both stay silent, is better for both
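Example 2: Traffic Light Game
Ø Two cars, A and B, heading in orthogonal directions
Each cell lists (A's payoff, B's payoff):

|         | B: STOP  | B: GO        |
|---------|----------|--------------|
| A: STOP | (-3, -2) | (-3, 0)      |
| A: GO   | (0, -2)  | (-100, -100) |

Q: What are the equilibria?
Answer: (STOP, GO) and (GO, STOP)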
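Example 3: Rock-Paper-Scissors
Each cell lists (Player 1's payoff, Player 2's payoff); rows are Player 1's actions, columns are Player 2's:

|         | Rock    | Paper   | Scissor |
|---------|---------|---------|---------|
| Rock    | (0, 0)  | (-1, 1) | (1, -1) |
| Paper   | (1, -1) | (0, 0)  | (-1, 1) |
| Scissor | (-1, 1) | (1, -1) | (0, 0)  |

Q: What is an equilibrium?
Ø Need to randomize: no deterministic action pair can make both players happy
Ø Common sense suggests playing (1/3, 1/3, 1/3)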
Example 4: Selfish Routing
Ø One unit of flow from $s$ to $t$, consisting of (infinitely many) individuals, each controlling an infinitesimally small amount of flow
Ø Each individual wants to minimize his own travel time
Q: What is the equilibrium status?
Ø Half a unit of flow goes through each path
Ø Social cost = 3/2
Q: What is the equilibrium status after adding a superior highway with 0 travel cost, i.e., an edge with cost $c(x) = 0$?
Ø Everyone takes the blue path
Ø Social cost = 2
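The two social costs above can be checked numerically. The sketch below assumes the standard Braess network, since the slide's figure is not reproduced here: routes s->v->t and s->w->t, where edges s->v and w->t cost x (the flow they carry) and edges v->t and s->w cost 1, with the added highway running v->w at cost 0. The node names v and w and all function names are illustrative assumptions.

```python
def costs_without_highway(f_top):
    """Travel times of the two routes when f_top of the unit flow uses s->v->t."""
    f_bottom = 1.0 - f_top
    top = f_top + 1.0        # s->v costs x, v->t costs 1
    bottom = 1.0 + f_bottom  # s->w costs 1, w->t costs x
    return top, bottom


def costs_with_highway(f_top, f_bottom, f_zigzag):
    """Travel times of the three routes once the zero-cost edge v->w exists."""
    x_sv = f_top + f_zigzag     # total flow on edge s->v
    x_wt = f_bottom + f_zigzag  # total flow on edge w->t
    top = x_sv + 1.0
    bottom = 1.0 + x_wt
    zigzag = x_sv + 0.0 + x_wt  # s->v, then highway v->w, then w->t
    return top, bottom, zigzag


def social_cost(flows, costs):
    """Total travel time: sum over routes of (flow on route) * (route cost)."""
    return sum(f * c for f, c in zip(flows, costs))


# Before the highway: an even split equalizes both routes.
before = costs_without_highway(0.5)
print(before, social_cost((0.5, 0.5), before))

# After the highway: all flow on the zigzag route; no other route is faster.
after = costs_with_highway(0.0, 0.0, 1.0)
print(after, social_cost((0.0, 0.0, 1.0), after))
```

Under these assumed edge costs, both routes cost 3/2 at the even split, and after the highway is added every route costs 2 when all flow uses the highway, so no individual gains by switching.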
Key Characteristics of These Games
Ø Each agent wants to maximize her own payoff
Ø An agent's payoff depends on other agents' actions
Ø The interaction stabilizes at a state where no agent can increase her payoff via unilateral deviation
Strategic Games Are Ubiquitous
Ø Pricing
Ø Sponsored search
  • Drives 90%+ of Google's revenue
Ø FCC's allocation of spectrum to radio frequency users
Ø National security, border patrolling, counter-terrorism
  • Optimize resource allocation against attackers/adversaries
Ø Kidney exchange: decides who gets which kidney and when
Ø Entertainment games: poker, blackjack, Go, chess, ...
Ø Social choice problems such as voting, fair division, etc.
These are just a few example domains where computer science has made a significant impact; there are many others.
Main Components of a Game
Ø Players: participants of the game; each may be an individual, an organization, a machine, an algorithm, etc.
Ø Strategies: actions available to each player
Ø Outcome: the profile of player strategies
Ø Payoffs: a function mapping an outcome to a utility for each player
Normal-Form Representation
Ø $n$ players, denoted by the set $[n] = \{1, \cdots, n\}$
Ø Player $i$ takes action $a_i \in A_i$
Ø An outcome is the action profile $a = (a_1, \cdots, a_n)$
  • As a convention, $a_{-i} = (a_1, \cdots, a_{i-1}, a_{i+1}, \cdots, a_n)$ denotes all actions excluding $a_i$
Ø Player $i$ receives payoff $u_i(a)$ for any outcome $a \in \Pi_{i=1}^{n} A_i$
  • $u_i(a) = u_i(a_i, a_{-i})$ depends on other players' actions
Ø $\{A_i, u_i\}_{i \in [n]}$ are public knowledge
This is the most basic game model
Ø There are game models with richer and more intricate structures
Illustration: Prisoner's Dilemma
Ø 2 players: 1 and 2
Ø $A_i = \{\text{silent}, \text{betray}\}$ for $i = 1, 2$
Ø An outcome can be, e.g., $a = (\text{silent}, \text{silent})$
Ø $u_1(a), u_2(a)$ are pre-defined, e.g., $u_1(\text{silent}, \text{silent}) = -1$
Ø The whole game is public knowledge; players take actions simultaneously
  • Equivalently, each player acts without knowing the others' actions
Dominant Strategy
An action $a_i$ is a dominant strategy for player $i$ if $a_i$ is better than any other action $a_i' \in A_i$, regardless of what actions the other players take. Formally,
  $u_i(a_i, a_{-i}) \geq u_i(a_i', a_{-i}), \quad \forall a_i' \neq a_i \text{ and } \forall a_{-i}$
Note: "strategy" is just another term for "action"
Ø Betray is a dominant strategy for both players in the Prisoner's Dilemma
Ø Dominant strategies do not always exist
  • For example, the traffic light game:

|      | STOP     | GO           |
|------|----------|--------------|
| STOP | (-3, -2) | (-3, 0)      |
| GO   | (0, -2)  | (-100, -100) |
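The definition above translates directly into a brute-force check for two-player games. The sketch below is illustrative only: the function name and data layout are assumptions, and for the Prisoner's Dilemma only $u_1(\text{silent}, \text{silent}) = -1$ comes from the slides, while the remaining payoffs are assumed textbook values. The traffic light payoffs are taken from the table above.

```python
def dominant_actions(player, actions, payoff):
    """Actions of `player` (0 or 1) that are at least as good as every
    alternative against every opponent action, in a two-player game.
    payoff[(a_row, a_col)] is the pair (row utility, column utility)."""
    other = 1 - player

    def utility(own, opp):
        profile = (own, opp) if player == 0 else (opp, own)
        return payoff[profile][player]

    return [
        a for a in actions[player]
        if all(utility(a, b) >= utility(alt, b)
               for alt in actions[player]
               for b in actions[other])
    ]


# Traffic light game, payoffs from the table above.
traffic_actions = (("STOP", "GO"), ("STOP", "GO"))
traffic_payoff = {
    ("STOP", "STOP"): (-3, -2), ("STOP", "GO"): (-3, 0),
    ("GO", "STOP"): (0, -2), ("GO", "GO"): (-100, -100),
}

# Prisoner's Dilemma: only the (silent, silent) entry is from the slides;
# the other three entries are ASSUMED for illustration.
pd_actions = (("silent", "betray"), ("silent", "betray"))
pd_payoff = {
    ("silent", "silent"): (-1, -1), ("silent", "betray"): (-3, 0),
    ("betray", "silent"): (0, -3), ("betray", "betray"): (-2, -2),
}

print(dominant_actions(0, traffic_actions, traffic_payoff))
print(dominant_actions(0, pd_actions, pd_payoff))
print(dominant_actions(1, pd_actions, pd_payoff))
```

With these numbers, neither driver has a dominant action in the traffic light game, while betray is dominant for both prisoners.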
Equilibrium
Ø An outcome $a^*$ is an equilibrium if no player has an incentive to deviate unilaterally. More formally, for every player $i \in [n]$,
  $u_i(a_i^*, a_{-i}^*) \geq u_i(a_i, a_{-i}^*), \quad \forall a_i \in A_i$
  • A special case of Nash equilibrium, a.k.a. pure strategy NE
Ø If each player has a dominant strategy, they form an equilibrium
Ø But an equilibrium does not need to consist of dominant strategies

Traffic Light Game:

|         | B: STOP  | B: GO        |
|---------|----------|--------------|
| A: STOP | (-3, -2) | (-3, 0)      |
| A: GO   | (0, -2)  | (-100, -100) |

Pure strategy NE does not always exist, e.g., in Rock-Paper-Scissors:

|         | Rock    | Paper   | Scissor |
|---------|---------|---------|---------|
| Rock    | (0, 0)  | (-1, 1) | (1, -1) |
| Paper   | (1, -1) | (0, 0)  | (-1, 1) |
| Scissor | (-1, 1) | (1, -1) | (0, 0)  |
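Because the pure-strategy equilibrium condition only quantifies over finitely many unilateral deviations, small games can be checked by enumeration. The following is a minimal sketch, with an assumed function name and data layout, applied to the two payoff tables from the slides.

```python
from itertools import product


def pure_nash_equilibria(actions, payoff):
    """Enumerate action profiles where no player gains by deviating alone.
    actions: one tuple of actions per player.
    payoff: dict mapping a profile to a tuple of utilities, one per player."""
    equilibria = []
    for profile in product(*actions):
        stable = True
        for i, own_actions in enumerate(actions):
            for alt in own_actions:
                deviated = profile[:i] + (alt,) + profile[i + 1:]
                if payoff[deviated][i] > payoff[profile][i]:
                    stable = False  # player i strictly prefers `alt`
        if stable:
            equilibria.append(profile)
    return equilibria


traffic_actions = (("STOP", "GO"), ("STOP", "GO"))
traffic_payoff = {
    ("STOP", "STOP"): (-3, -2), ("STOP", "GO"): (-3, 0),
    ("GO", "STOP"): (0, -2), ("GO", "GO"): (-100, -100),
}

moves = ("Rock", "Paper", "Scissor")
beats = {("Rock", "Scissor"), ("Paper", "Rock"), ("Scissor", "Paper")}
rps_actions = (moves, moves)
rps_payoff = {
    (r, c): (0, 0) if r == c else ((1, -1) if (r, c) in beats else (-1, 1))
    for r in moves for c in moves
}

print(pure_nash_equilibria(traffic_actions, traffic_payoff))
print(pure_nash_equilibria(rps_actions, rps_payoff))
```

The enumeration recovers (STOP, GO) and (GO, STOP) for the traffic light game and finds no pure equilibrium in Rock-Paper-Scissors. Its running time grows with the number of action profiles, so it is only practical for small games.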
Outline
Ø Games and Their Basic Representation
Ø Nash Equilibrium and Its Computation
Ø Other (More General) Classes of Games
Pure vs Mixed Strategy
Ø Pure strategy: take an action deterministically
Ø Mixed strategy: can randomize over actions
  • Described by a distribution $x_i$ where $x_i(a_i)$ = prob. of taking action $a_i$
  • The $|A_i|$-dimensional simplex $\Delta_{A_i} := \{x_i : \sum_{a_i \in A_i} x_i(a_i) = 1, \ x_i(a_i) \geq 0\}$ contains all possible mixed strategies for player $i$
  • Players draw their own actions independently
Ø Given strategy profile $x = (x_1, \cdots, x_n)$, the expected utility of player $i$ is
  $\sum_{a \in A} u_i(a) \cdot \Pi_{j \in [n]} x_j(a_j)$
  • Often denoted as $u_i(x)$ or $u_i(x_i, x_{-i})$ or $u_i(x_1, \cdots, x_n)$
  • When $x_i$ corresponds to some pure strategy $a_i$, we also write $u_i(a_i, x_{-i})$
  • For fixed $x_{-i}$, $u_i(x_i, x_{-i})$ is linear in $x_i$
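The expected-utility formula sums over every action profile, weighting each payoff by the product of the players' independent probabilities. Here is a small sketch of that computation using exact fractions; the function name, the dictionary layout, and the particular mixed strategies are assumptions chosen for illustration, while the payoffs are those of the traffic light game.

```python
from fractions import Fraction
from itertools import product


def expected_utility(player, strategies, actions, payoff):
    """Sum over profiles a of u_player(a) * prod_j x_j(a_j).
    strategies[j] maps each action of player j to its probability."""
    total = Fraction(0)
    for profile in product(*actions):
        prob = Fraction(1)
        for j, a_j in enumerate(profile):
            prob *= strategies[j][a_j]  # players randomize independently
        total += prob * payoff[profile][player]
    return total


traffic_actions = (("STOP", "GO"), ("STOP", "GO"))
traffic_payoff = {
    ("STOP", "STOP"): (-3, -2), ("STOP", "GO"): (-3, 0),
    ("GO", "STOP"): (0, -2), ("GO", "GO"): (-100, -100),
}

# Illustrative mixed strategies (not from the slides).
x_a = {"STOP": Fraction(1, 2), "GO": Fraction(1, 2)}
x_b = {"STOP": Fraction(1, 4), "GO": Fraction(3, 4)}

print(expected_utility(0, [x_a, x_b], traffic_actions, traffic_payoff))
print(expected_utility(1, [x_a, x_b], traffic_actions, traffic_payoff))
```

A pure strategy is the special case where one action has probability 1, which is how $u_i(a_i, x_{-i})$ can be evaluated with the same function.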
Best Responses
Fix any $x_{-i}$. A strategy $x_i^*$ is called a best response to $x_{-i}$ if
  $u_i(x_i^*, x_{-i}) \geq u_i(x_i, x_{-i}), \quad \forall x_i \in \Delta_{A_i}$.
Claim. There always exists a pure best response.
Proof: the linear program "max $u_i(x_i, x_{-i})$ subject to $x_i \in \Delta_{A_i}$" has a vertex optimal solution, and the vertices of $\Delta_{A_i}$ are exactly the pure strategies.
Remark: If $x_i^*$ is a best response to $x_{-i}$, then every $a_i$ in the support of $x_i^*$ (i.e., $x_i^*(a_i) > 0$) must be equally good, and all of them are pure best responses.
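Since a pure best response always exists, it suffices to evaluate $u_i(a_i, x_{-i})$ for each pure action and keep the maximizers. The sketch below does this for two-player games, using the Rock-Paper-Scissors payoffs from the earlier slide; the function names and the skewed opponent strategy are illustrative assumptions.

```python
from fractions import Fraction


def pure_action_values(player, opponent_strategy, actions, payoff):
    """u_i(a_i, x_{-i}) for every pure action a_i of `player` (0 or 1)."""
    other = 1 - player
    values = {}
    for own in actions[player]:
        total = Fraction(0)
        for opp in actions[other]:
            profile = (own, opp) if player == 0 else (opp, own)
            total += opponent_strategy[opp] * payoff[profile][player]
        values[own] = total
    return values


def pure_best_responses(player, opponent_strategy, actions, payoff):
    """All pure actions attaining the maximum expected utility."""
    values = pure_action_values(player, opponent_strategy, actions, payoff)
    best = max(values.values())
    return [a for a in actions[player] if values[a] == best]


moves = ("Rock", "Paper", "Scissor")
beats = {("Rock", "Scissor"), ("Paper", "Rock"), ("Scissor", "Paper")}
rps_actions = (moves, moves)
rps_payoff = {
    (r, c): (0, 0) if r == c else ((1, -1) if (r, c) in beats else (-1, 1))
    for r in moves for c in moves
}

uniform = {m: Fraction(1, 3) for m in moves}
rock_heavy = {"Rock": Fraction(1, 2), "Paper": Fraction(1, 4), "Scissor": Fraction(1, 4)}

print(pure_best_responses(0, uniform, rps_actions, rps_payoff))
print(pure_best_responses(0, rock_heavy, rps_actions, rps_payoff))
```

Against the uniform opponent every action is equally good, which matches the remark about supports: any mixture over these tied actions is also a best response. Against the Rock-heavy opponent only Paper is optimal.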
Nash Equilibrium (NE)
A mixed strategy profile $x^* = (x_1^*, \cdots, x_n^*)$ is a Nash equilibrium if
  $u_i(x_i^*, x_{-i}^*) \geq u_i(x_i, x_{-i}^*), \quad \forall x_i \in \Delta_{A_i}, \ \forall i \in [n]$.
That is, for any $i$, $x_i^*$ is a best response to $x_{-i}^*$.
Remarks
Ø An equivalent condition: $u_i(x_i^*, x_{-i}^*) \geq u_i(a_i, x_{-i}^*), \ \forall a_i \in A_i, \ \forall i \in [n]$
  • Since there always exists a pure best response
Ø It is not clear yet that such a mixed strategy profile would exist
  • Recall that a pure strategy Nash equilibrium may not exist
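The equivalent condition in the remarks makes a candidate profile easy to verify: only deviations to pure actions need to be tested. The following sketch, with assumed function names and data layout, applies that test to Rock-Paper-Scissors and confirms the "common sense" uniform profile from the earlier example.

```python
from fractions import Fraction
from itertools import product


def expected_utility(player, strategies, actions, payoff):
    """Sum over profiles a of u_player(a) * prod_j x_j(a_j)."""
    total = Fraction(0)
    for profile in product(*actions):
        prob = Fraction(1)
        for j, a_j in enumerate(profile):
            prob *= strategies[j][a_j]
        total += prob * payoff[profile][player]
    return total


def is_nash_equilibrium(strategies, actions, payoff):
    """True if no player can gain by switching to any pure action."""
    for i, own_actions in enumerate(actions):
        current = expected_utility(i, strategies, actions, payoff)
        for a_i in own_actions:
            # Point mass on a_i replaces player i's mixed strategy.
            pure = {b: Fraction(int(b == a_i)) for b in own_actions}
            deviated = strategies[:i] + [pure] + strategies[i + 1:]
            if expected_utility(i, deviated, actions, payoff) > current:
                return False
    return True


moves = ("Rock", "Paper", "Scissor")
beats = {("Rock", "Scissor"), ("Paper", "Rock"), ("Scissor", "Paper")}
rps_actions = (moves, moves)
rps_payoff = {
    (r, c): (0, 0) if r == c else ((1, -1) if (r, c) in beats else (-1, 1))
    for r in moves for c in moves
}

uniform = {m: Fraction(1, 3) for m in moves}
always_rock = {"Rock": Fraction(1), "Paper": Fraction(0), "Scissor": Fraction(0)}

print(is_nash_equilibrium([uniform, uniform], rps_actions, rps_payoff))
print(is_nash_equilibrium([always_rock, always_rock], rps_actions, rps_payoff))
```

This only verifies a given profile; it does not search for an equilibrium or show that one exists, which is the question the last remark leaves open.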