CMU 15-896 Noncooperative games 4: Stackelberg games Teacher: Ariel Procaccia
A curious game
      Left  Right
Up    1,1   3,0
Down  0,0   2,1
• Playing up is a dominant strategy for the row player
• So the column player would play left
• Therefore, (1,1) is the only Nash equilibrium outcome
15896 Spring 2016: Lecture 20
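The equilibrium claim on this slide can be checked by brute force. The sketch below enumerates pure strategy profiles only; the names `GAME` and `pure_nash` are made up for illustration. Because up is strictly dominant for the row player, no mixed equilibrium can exist either, so the pure check is enough for this game.

```python
from itertools import product

# Payoffs from the slide: GAME[row][col] = (row player's payoff, column player's payoff).
# Row 0 = up, row 1 = down; column 0 = left, column 1 = right.
GAME = [[(1, 1), (3, 0)],
        [(0, 0), (2, 1)]]

def pure_nash(game):
    """Return all pure strategy profiles where neither player gains by deviating."""
    rows, cols = range(len(game)), range(len(game[0]))
    equilibria = []
    for r, c in product(rows, cols):
        row_ok = all(game[r][c][0] >= game[r2][c][0] for r2 in rows)
        col_ok = all(game[r][c][1] >= game[r][c2][1] for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria

print(pure_nash(GAME))  # [(0, 0)], i.e. (up, left) with payoffs 1,1
```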
Commitment is good
• Suppose the game is played as follows:
  o Row player commits to playing a row
  o Column player observes the commitment and chooses a column
• Row player can commit to playing down! The column player then prefers right, so the row player gets 2 instead of 1
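Commitment to mixed strategy
• By committing to a mixed strategy, the row player can guarantee a reward of almost 2.5
• Example: up with probability .49, down with probability .51; the column player then plays right with probability 1, and the row player gets .49·3 + .51·2 = 2.49
• Called a Stackelberg (mixed) strategy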
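The arithmetic behind the commitment slides can be reproduced with exact fractions. This is a minimal sketch; `leader_payoff` is a hypothetical helper, and it assumes that a follower who is exactly indifferent breaks ties in the leader's favor (the slide sidesteps that assumption by using .49/.51 instead of .5/.5).

```python
from fractions import Fraction

# Row (leader) and column (follower) payoffs from the slides.
# Row 0 = up, row 1 = down; column 0 = left, column 1 = right.
U_ROW = [[1, 3], [0, 2]]
U_COL = [[1, 0], [0, 1]]

def leader_payoff(p_up):
    """Leader's expected payoff when committing to up with probability p_up."""
    p_up = Fraction(p_up)
    p_down = 1 - p_up
    col_values = [p_up * U_COL[0][c] + p_down * U_COL[1][c] for c in range(2)]
    best = max(col_values)
    # Assumption: among the follower's best responses, pick the one the leader prefers.
    candidates = [c for c in range(2) if col_values[c] == best]
    return max(p_up * U_ROW[0][c] + p_down * U_ROW[1][c] for c in candidates)

print(leader_payoff(1))                  # 1: commit to up, follower plays left
print(leader_payoff(0))                  # 2: commit to down, follower plays right
print(leader_payoff(Fraction(49, 100)))  # 249/100
```

Under the tie-breaking assumption, `leader_payoff(Fraction(1, 2))` evaluates to 5/2, the limit that the .49/.51 commitment approaches.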
Computing Stackelberg
• Theorem [Conitzer and Sandholm 2006]: In 2-player normal form games, an optimal Stackelberg strategy can be found in poly time
• Theorem [ditto]: the problem is NP-hard when the number of players is 3
Tractability: 2 players
• For each pure follower strategy $s_2$, we compute via the LP below a strategy $x_1$ for the leader such that
  o Playing $s_2$ is a best response for the follower
  o Under this constraint, $x_1$ is optimal
• Choose $x_1^*$ that maximizes leader value

$$
\begin{aligned}
\max\ & \sum_{s_1 \in S} x_1(s_1)\, u_1(s_1, s_2) \\
\text{s.t.}\ & \forall s_2' \in S:\ \sum_{s_1 \in S} x_1(s_1)\, u_2(s_1, s_2) \ge \sum_{s_1 \in S} x_1(s_1)\, u_2(s_1, s_2') \\
& \sum_{s_1 \in S} x_1(s_1) = 1 \\
& \forall s_1 \in S:\ x_1(s_1) \in [0,1]
\end{aligned}
$$
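The one-LP-per-follower-strategy idea can be sketched without an LP solver when the leader has exactly two pure strategies: the leader's mixed strategy is a single number p, each best-response constraint is linear in p, so the feasible set is an interval and the linear objective is maximized at an endpoint. The function name `stackelberg_2rows` and the matrices `U1`, `U2` are illustrative assumptions, and ties are again assumed to be broken in the leader's favor. A general implementation would hand each LP to a solver instead.

```python
from fractions import Fraction

def stackelberg_2rows(U1, U2):
    """Leader has two pure strategies (rows 0 and 1); p = probability of row 0.
    U1[i][j], U2[i][j] are leader / follower payoffs.
    Returns (leader value, p, follower strategy j), or None if nothing is feasible."""
    n = len(U1[0])
    best = None
    for j in range(n):                        # one small LP per follower strategy j
        lo, hi = Fraction(0), Fraction(1)     # feasible interval for p
        feasible = True
        for k in range(n):
            # Best-response constraint for j against k: p*a + (1-p)*b >= 0
            a = Fraction(U2[0][j] - U2[0][k])
            b = Fraction(U2[1][j] - U2[1][k])
            slope = a - b                     # constraint reads slope*p + b >= 0
            if slope > 0:
                lo = max(lo, -b / slope)
            elif slope < 0:
                hi = min(hi, -b / slope)
            elif b < 0:
                feasible = False
        if not feasible or lo > hi:
            continue
        for p in (lo, hi):                    # linear objective: check both endpoints
            value = p * U1[0][j] + (1 - p) * U1[1][j]
            if best is None or value > best[0]:
                best = (value, p, j)
    return best

U1 = [[1, 3], [0, 2]]   # leader payoffs from the earlier 2x2 game
U2 = [[1, 0], [0, 1]]   # follower payoffs
print(stackelberg_2rows(U1, U2))
```

On the 2x2 game from the earlier slides this returns value 5/2 with p = 1/2 and the follower playing right, matching the "almost 2.5" reward.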
Application: security
• Airport security: deployed at LAX
• Federal Air Marshals
• Coast Guard
• Idea:
  o Defender commits to mixed strategy
  o Attacker observes and best responds
Security games
• Set of targets $T$
• Set of security resources $\Omega$ available to the defender (leader)
• Set of schedules $\Sigma \subseteq 2^T$
• Resource $\omega$ can be assigned to one of the schedules in $A(\omega) \subseteq \Sigma$
• Attacker chooses one target to attack
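Security games
• For each target $t$, there are four numbers: $u_d^+(t) \ge u_d^-(t)$, and $u_a^+(t) \le u_a^-(t)$
• Let $c$ be the vector of coverage probabilities, with $c_t$ the probability that target $t$ is covered
• The utilities to the defender/attacker under $c$ if target $t$ is attacked are

$$
\begin{aligned}
U_d(t, c) &= u_d^+(t)\, c_t + u_d^-(t)\,(1 - c_t) \\
U_a(t, c) &= u_a^+(t)\, c_t + u_a^-(t)\,(1 - c_t)
\end{aligned}
$$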
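A small numeric sketch of how coverage probabilities turn into utilities. It assumes the usual form in which a target's expected utility is the coverage-weighted average of a "covered" and an "uncovered" payoff; all numbers and the helper names `attacked_utilities` and `attacker_best_response` are invented for illustration.

```python
from fractions import Fraction as F

def attacked_utilities(c, u_covered, u_uncovered):
    """Expected utility per target: c_t * covered payoff + (1 - c_t) * uncovered payoff."""
    return [ct * uc + (1 - ct) * uu for ct, uc, uu in zip(c, u_covered, u_uncovered)]

def attacker_best_response(c, ua_covered, ua_uncovered):
    """Index of the target that maximizes the attacker's expected utility."""
    ua = attacked_utilities(c, ua_covered, ua_uncovered)
    return max(range(len(c)), key=lambda t: ua[t])

# Hypothetical 3-target instance (not from the slides).
coverage = [F(1, 2), F(1, 4), F(0)]
ua_cov, ua_unc = [-1, -1, -1], [4, 2, 1]    # attacker prefers uncovered targets
ud_cov, ud_unc = [1, 1, 1], [-4, -2, -1]    # defender prefers covered targets

target = attacker_best_response(coverage, ua_cov, ua_unc)
print(target)                                                  # 0
print(attacked_utilities(coverage, ud_cov, ud_unc)[target])    # -3/2
```

Even though target 0 receives the most coverage here, it stays the most attractive one, which is why the defender has to optimize the whole coverage vector rather than guard targets independently.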
This is a 2-player Stackelberg game. Can we compute an optimal strategy for the defender in polynomial time?
Solving security games
• Consider the case of $\Sigma = T$, i.e., resources are assigned to individual targets (schedules have size 1)
• Nevertheless, number of leader strategies is exponential
• Theorem [Korzhyk et al. 2010]: Optimal leader strategy can be computed in poly time
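To get a feel for the blow-up, the sketch below counts pure leader strategies by explicit enumeration and compares the count with the number of (resource, target) pairs. It assumes distinguishable resources, each of which may be sent to any single target; both function names are hypothetical.

```python
from itertools import product

def count_pure_strategies(num_targets, num_resources):
    """Enumerate every assignment of each resource to one target."""
    return sum(1 for _ in product(range(num_targets), repeat=num_resources))

def count_resource_target_pairs(num_targets, num_resources):
    """Number of (resource, target) pairs, one marginal probability per pair."""
    return num_targets * num_resources

print(count_pure_strategies(5, 3))        # 125
print(count_resource_target_pairs(5, 3))  # 15
```

The first count grows as (number of targets) to the power (number of resources), while the second grows only as their product.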
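A compact LP
• LP formulation similar to previous one

$$
\begin{aligned}
\max\ & U_d(t^*, c) \\
\text{s.t.}\ & \forall \omega \in \Omega,\ \forall \sigma \in A(\omega):\quad 0 \le c_{\omega,\sigma} \le 1 \\
& \forall t \in T:\quad c_t = \sum_{\omega \in \Omega}\ \sum_{\sigma \in A(\omega):\, t \in \sigma} c_{\omega,\sigma} \le 1 \\
& \forall \omega \in \Omega:\quad \sum_{\sigma \in A(\omega)} c_{\omega,\sigma} \le 1 \\
& \forall t \in T:\quad U_a(t, c) \le U_a(t^*, c)
\end{aligned}
$$

• Advantage: logarithmic in #leader strategies
• Problem: do probabilities correspond to strategy?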
[Figure: the coverage matrix with rows (0.7, 0.2, 0.1) and (0, 0.3, 0.7), decomposed into the 0/1 matrices [0 0 1; 0 1 0], [0 1 0; 0 0 1], [1 0 0; 0 1 0], and [1 0 0; 0 0 1]]
Fixing the probabilities
• Theorem [Birkhoff-von Neumann]: Consider an $m \times n$ matrix $M$ with real numbers $a_{ij} \in [0,1]$, such that for each $i$, $\sum_j a_{ij} \le 1$, and for each $j$, $\sum_i a_{ij} \le 1$ ($M$ is kinda doubly stochastic). Then there exist matrices $M^1, \dots, M^q$ and weights $w^1, \dots, w^q$ such that:
  1. $\sum_k w^k = 1$
  2. $\sum_k w^k M^k = M$
  3. For each $k$, $M^k$ is kinda doubly stochastic and its elements are in $\{0,1\}$
• The probabilities $c_{\omega,t}$ satisfy theorem's conditions
• By 3, each $M^k$ is a deterministic strategy
• By 1, we get a mixed strategy
• By 2, gives right probs
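The three conditions of the theorem can be checked mechanically on the example matrix with rows (0.7, 0.2, 0.1) and (0, 0.3, 0.7). In the sketch below, the four 0/1 matrices are the ones from the example figure, while the weights 0.1, 0.2, 0.2, 0.5 were worked out by hand for this illustration and are not read off the slide. The code only verifies a given decomposition; it does not compute one.

```python
from fractions import Fraction as F

# Marginal coverage matrix: rows = resources, columns = targets.
M = [[F(7, 10), F(2, 10), F(1, 10)],
     [F(0),     F(3, 10), F(7, 10)]]

# (weight, deterministic assignment) pairs; weights are an assumption of this sketch.
PARTS = [
    (F(1, 10), [[0, 0, 1], [0, 1, 0]]),
    (F(2, 10), [[0, 1, 0], [0, 0, 1]]),
    (F(2, 10), [[1, 0, 0], [0, 1, 0]]),
    (F(5, 10), [[1, 0, 0], [0, 0, 1]]),
]

def is_deterministic(P):
    """Condition 3: 0/1 entries with every row sum and column sum at most 1."""
    entries_ok = all(x in (0, 1) for row in P for x in row)
    rows_ok = all(sum(row) <= 1 for row in P)
    cols_ok = all(sum(col) <= 1 for col in zip(*P))
    return entries_ok and rows_ok and cols_ok

def check_decomposition(M, parts):
    weights_ok = sum(w for w, _ in parts) == 1                  # condition 1
    mix = [[sum(w * P[i][j] for w, P in parts)
            for j in range(len(M[0]))] for i in range(len(M))]
    return weights_ok and mix == M and all(                     # conditions 2 and 3
        is_deterministic(P) for _, P in parts)

print(check_decomposition(M, PARTS))  # True
```

Sampling one of the four assignments according to its weight then gives a mixed strategy whose marginals are exactly the entries of `M`.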
Generalizing?
• What about schedules of size 2?
• Air Marshals domain has such schedules: outgoing+incoming flight (bipartite graph)
• [Figure: bipartite graph example labeled with four values of 0.5]
• Previous approach fails
• Theorem [Korzhyk et al. 2010]: problem is NP-hard
Criticisms
• Problematic assumptions:
  1. The attacker exactly observes the defender’s mixed strategy
  2. The defender knows the attacker’s utility function
  3. The attacker behaves in a perfectly rational way
• We will focus on relaxing assumption #1
Limited surveillance
• Let us compare two worlds:
  1. Status quo: The defender optimizes against an attacker with unlimited observations (i.e., complete knowledge of the defender’s strategy), but the attacker actually has only $\tau$ observations
  2. Ideal: The defender optimizes against an attacker with $\tau$ observations, and, miraculously, the attacker indeed has exactly $\tau$ observations
Limited surveillance
• Theorem [Blum et al. 2014]: Assume that utilities are normalized to be in […]. For any $\tau$, there is a zero-sum security game such that the difference between worlds 1 and 2 is […]
• Lemma: If […], there exists a family $\mathcal{D} = \{D_1, \dots, D_r\}$ of subsets of $A$ such that:
  1. $\forall i,\ |D_i| = |A|/2$
  2. Each $a \in A$ is in exactly $r/2$ members of $\mathcal{D}$
  3. If $\mathcal{D}' \subset \mathcal{D}$ and $|\mathcal{D}'| \le \tau$ then $\bigcup \mathcal{D}' \ne A$
Proof of theorem
• […] resources, each can defend any […] targets; […] targets
• For any target $t$, zero-sum utilities with […] and […]
• Poll: The optimal strategy (in the status quo world) defends each target with probability roughly…?
Proof of theorem
• Next we define a much better strategy against an attacker with $\tau$ observations
• $A$ = subset of targets $\{1, \dots, […]\} \subseteq T$
• Define $D_1, \dots, D_r$ as in the lemma
• Pure strategy $s_i$ covers $D_i$; this is valid because $|D_i| = |A|/2 = […]$ (by property 1)
• Let $s^*$ be the uniform distribution over $s_1, \dots, s_r$
• By property 2, $s^*$ covers each target in $A$ with probability ½
• By property 3, $\tau$ observations from $s^*$ would show some target in $A$ never being covered; that target is attacked ∎
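The three facts the proof relies on can be tested on a toy family of sets: every pure strategy covers half of the target subset, the uniform mixture covers each target with probability ½, and any collection of at most τ observed pure strategies leaves some target never covered. The family below (six targets, four sets, τ = 2) is a hand-made example for illustration, not the construction from the lemma, and `check_family` is a hypothetical name.

```python
from itertools import combinations
from fractions import Fraction

def check_family(A, family, tau):
    """Return three booleans: (each set has size |A|/2,
    uniform mixture covers each target w.p. 1/2,
    no collection of at most tau sets covers all of A)."""
    r = len(family)
    half_size = all(2 * len(D) == len(A) for D in family)
    coverage = {t: Fraction(sum(t in D for D in family), r) for t in A}
    half_cov = all(p == Fraction(1, 2) for p in coverage.values())
    # tau observations reveal at most tau distinct sets, so check every such collection.
    never_full = all(set().union(*seen) != set(A)
                     for k in range(1, tau + 1)
                     for seen in combinations(family, k))
    return half_size, half_cov, never_full

targets = range(6)
family = [{0, 1, 2}, {2, 3, 4}, {4, 5, 0}, {1, 3, 5}]
print(check_family(targets, family, 2))  # (True, True, True)
print(check_family(targets, family, 3))  # (True, True, False)
```

With three observations the first three sets already cover every target, so this particular family only works against an attacker with at most two observations.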
Limited surveillance
• Theorem [Blum et al. 2014]: For any zero-sum security game with […] targets, […] resources, and a set of schedules with max coverage […], and for any $\tau$ observations, the difference between the two worlds is at most […]