Strategic Inference with a Single Private Sample
Erik Miehling, Roy Dong, Cédric Langbort, and Tamer Başar
CDC 2019, December 11, 2019
General Setting
[Figure labels: learning phase, game]
• We are interested in settings of strategic interaction where one or more players have the opportunity to extract information from the environment before a game is played
• Learning actions are observed, but the outcomes of the learning actions are not (they are private to the learner)
• Players then play a game based on their subjective beliefs
Reconnaissance in Cyber Security
• Motivating (physical) security example…
• Consider an attacker visiting multiple locations to determine where to launch an attack. The defender observes where the attacker visited, but does not know what information the attacker obtained
• The attacker and defender then simultaneously choose which target to attack/defend
• In the above setting, one player is acting while the other is observing. Since learning actions are observed, one must consider the effect of an agent's learning decision on the belief of the other agent
• Fundamental problem in all multi-agent decision environments (known as signaling in the team/game theory literature)
Learning in Multi-agent Settings
• Increasing focus on the intersection of learning and game theory
• Recent workshop at EC 2019, "Learning in the presence of strategic behavior"
  • Learning from data that is produced by agents who have a vested interest in the outcome or the learning process
  • Learning a model for the strategic behavior of one or more agents by observing their interactions
  • Learning as a model of interactions between agents
  • Interactions between multiple learners
Related Work
• Strategic experimentation:
  • [Bolton & Harris, '99; Rosenberg et al., '07; Heidhues et al., '15]
• Incentivizing exploration:
  • [Mansour et al., '16; Slivkins, '17; Chen et al., '18]
• We study a simple setting in which the learning agent receives a single sample of private information from a distribution privately known by another (observing) agent
• Objective of our work: to understand how the learning agent's private information influences the observing agent's inference process
The Game Model: Payoffs
[Figure labels: attacker, defender]
• Two players: attacker (A) and defender (D)
• Each player has two actions, corresponding to the two targets
• The attacker's payoff for choosing the risky target is stochastic; the attacker receives a reward with some probability
• The attacker's payoff for the safe target is certain; the attacker receives a fixed reward
• The attacker wishes to choose a different target than the defender, whereas the defender wishes to choose the same target as the attacker
• If the same target is chosen, the attacker incurs a capture cost
The Game Model: Information
[Figure labels: attacker, defender, sample]
• The defender knows the true value of the distribution parameter, but the attacker only possesses a prior distribution
• The prior is further assumed to be common knowledge between the attacker and defender
• Before targets are selected, the attacker receives a single private sample from the uncertain target and forms its posterior
• The defender does not see this sample and thus does not know the attacker's posterior
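The posterior step described on this slide can be sketched numerically. This is a minimal illustration, not the authors' formulation: it assumes the sample is a single Bernoulli draw (reward received or not) and that the prior is a discrete distribution over candidate reward probabilities. The function names and the dictionary representation are hypothetical.

```python
def posterior_after_sample(prior, sample):
    """Bayes update of a discrete prior after one Bernoulli sample.

    prior  -- dict mapping each candidate reward probability to its prior
              weight (hypothetical representation, assumed for illustration)
    sample -- 1 if the sampled reward was received, 0 otherwise
    """
    if sample not in (0, 1):
        raise ValueError("sample must be 0 or 1")
    # Weight each candidate parameter by the likelihood of the observation.
    unnormalized = {
        theta: weight * (theta if sample == 1 else 1.0 - theta)
        for theta, weight in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("sample has zero probability under the prior")
    # Normalize so the posterior weights sum to one.
    return {theta: u / total for theta, u in unnormalized.items()}


def posterior_mean(posterior):
    """Expected reward probability under the given belief."""
    return sum(theta * weight for theta, weight in posterior.items())


# Illustrative prior: the parameter is either 0.2 or 0.8, equally likely.
prior = {0.2: 0.5, 0.8: 0.5}
print(posterior_after_sample(prior, 1))
print(posterior_after_sample(prior, 0))
```

Because the defender knows the true parameter but not the sample, the defender can only reason about the distribution over these posteriors, not the realized one.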
The Game Model
• In summary, the (subjective) payoffs for the attacker and costs for the defender are as follows…
[Payoff matrices: attacker's reward, defender's cost]
Best Response Functions
• The game is a static game of incomplete information, and thus we seek to find Bayesian Nash equilibria
• Players' strategies represent the probability that the player will choose a particular target given their type
• In the players' best response functions, the type of the attacker is the private sample and the type of the defender is the distribution parameter
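A best response picks the action with the highest expected payoff given the player's belief. The sketch below illustrates this for the attacker only. The payoff form is an assumption made for illustration (expected reward minus the capture cost weighted by the probability that the defender picks the same target), and all parameter names are hypothetical. It is not the payoff specification from the slides.

```python
def attacker_best_response(p_success, reward_risky, reward_safe,
                           capture_cost, prob_defend_risky):
    """Return the attacker's preferred target, "risky" or "safe".

    p_success         -- attacker's posterior mean of the risky target's
                         reward probability
    prob_defend_risky -- attacker's belief that the defender defends the
                         risky target
    The payoff form below is an assumed, simplified one for illustration.
    """
    # Expected payoff of the risky target: uncertain reward minus the
    # expected capture cost if the defender is also there.
    u_risky = p_success * reward_risky - capture_cost * prob_defend_risky
    # Expected payoff of the safe target: certain reward minus the expected
    # capture cost if the defender defends the safe target instead.
    u_safe = reward_safe - capture_cost * (1.0 - prob_defend_risky)
    # Ties are broken toward the risky target (an arbitrary choice).
    return "risky" if u_risky >= u_safe else "safe"


print(attacker_best_response(0.8, 1.0, 0.5, 1.0, 0.0))
print(attacker_best_response(0.8, 1.0, 0.5, 1.0, 1.0))
```

In the Bayesian game, `prob_defend_risky` would itself be an expectation of the defender's strategy over the attacker's posterior on the defender's type, which is what couples the two best responses.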
Best Response Function: Attacker
[Equations: derivation of the attacker's best response]
Best Response Functions: Defender
[Equations: derivation of the defender's best response]
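Equilibrium Characterization
Theorem: The game has at most one pure strategy saddle-point equilibrium, characterized by three disjoint regions of the parameter space:
1) Always risky: both players choose the risky target for all types
2) Always safe: both players choose the safe target for all types
3) Information dependent: the players' strategies depend on their types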
Equilibrium discussion: Region 1
• Throughout the discussion, we assume that …; specifically, let …
• Region 1: Always risky
  • Both players choose the risky target independent of their private information
Equilibrium discussion: Region 2
• Region 2: Always safe
  • Both players choose the safe target independent of their private information
Equilibrium discussion: Region 3
• Region 3: Information dependent
  • The equilibrium strategies of the players depend on their private information
  • Attacker: sample dependent
  • Defender: distribution dependent
Equilibrium discussion
• First investigate the conditions for … and …, which are given by the following conditions:
  (i) … is high relative to …, so target … looks sufficiently desirable
  (ii) The attacker believes it to be very likely that … (due to a higher …) and therefore would get caught should it choose …. The capture cost deters the attacker from choosing ….
  (iii) & (iv) are analogous
Equilibrium discussion
• The third equilibrium region is formed as the intersection of two regions
  • The attacker chooses … if it receives a good sample, …; the defender defends the better target
  • The attacker chooses … if it receives a good sample, …; the defender defends the better target
Equilibrium discussion
• Taking the intersection…
Sensitivity Analysis
• We investigate the equilibrium regions as the capture cost increases
• As the capture cost increases, the attacker requires a higher level of certainty that the risky target will generate a reward
• All pure strategy equilibrium regions vanish when …
Concluding Remarks
• Motivated by cyber security settings, we have introduced a simple asymmetric information game model for describing the influence of a learner's (the attacker's) private information on the inference process of an observing agent (the defender)
• The subsequent game admits at most one pure strategy equilibrium which, depending on the parameters of the game, takes different forms:
  • Two equilibrium regions in which the players ignore their private information; an intermediate region in which the players use their private information
  • In the intermediate region, the attacker always follows its private sample
Thank you!