Strategic Inference with a Single Private Sample

  1. Strategic Inference with a Single Private Sample
     Erik Miehling, Roy Dong, Cédric Langbort, and Tamer Başar
     CDC 2019, December 11, 2019

  2. General Setting
     (Figure: a learning phase followed by the game)
     • We are interested in settings of strategic interaction where one or more players have the opportunity to extract information from the environment before a game is played
     • Learning actions are observed, but the outcomes of the learning actions are not (they are private to the learner)
     • Players then play a game based on their subjective beliefs

  3. Reconnaissance in Cyber Security
     • Motivating (physical) security example: consider an attacker visiting multiple locations to determine where to launch an attack. The defender observes where the attacker visited, but does not know what information the attacker obtained
     • The attacker and defender then simultaneously choose which target to attack/defend
     • In the above setting, one player is acting while the other is observing. Since learning actions are observed, one must consider the effect of an agent's learning decision on the belief of the other agent
     • Fundamental problem in all multi-agent decision environments (known as signaling in the team/game theory literature)

  4. Learning in Multi-agent Settings
     • Increasing focus on the intersection of learning and game theory
     • Recent workshop at EC 2019, "Learning in the presence of strategic behavior"
     • Learning from data that is produced by agents who have a vested interest in the outcome or the learning process
     • Learning a model for the strategic behavior of one or more agents by observing their interactions
     • Learning as a model of interactions between agents
     • Interactions between multiple learners

  5. Related Work
     • Strategic experimentation: [Bolton & Harris, '99; Rosenberg et al., '07; Heidhues et al., '15]
     • Incentivizing exploration: [Mansour et al., '16; Slivkins, '17; Chen et al., '18]
     • We study a simple setting in which the learning agent receives a single sample of private information from a distribution privately known by another (observing) agent
     • Objective of our work: to understand how the learning agent's private information influences the observing agent's inference process

  6. The Game Model — Payoffs
     • Two players: attacker (A) and defender (D)
     • Each player has two actions, corresponding to the two targets
     • The attacker's payoff for choosing the risky target is stochastic; the attacker receives a reward with some probability
     • The attacker's payoff for the safe target is certain; the attacker receives a fixed reward
     • The attacker wishes to choose a different target than the defender, whereas the defender wishes to choose the same target as the attacker
     • If the same target is chosen, the attacker incurs a capture cost
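
To make the payoff structure above concrete, here is a minimal Python sketch. The names r (the risky target's reward), s (the safe target's certain reward), c (the capture cost), and theta (the probability that the risky target pays off) are my own illustrative stand-ins rather than the slide's notation, and the zero-sum reading (the defender's cost equals the attacker's payoff) is an assumption suggested by the saddle-point language later in the deck.

    # Illustrative two-target attacker/defender payoffs (assumed notation):
    # the risky target pays r with probability theta and 0 otherwise, the safe
    # target pays s for sure, and the attacker loses c whenever the defender
    # defends the target it attacks.

    def attacker_payoff(attack_risky, defend_risky, risky_pays, r, s, c):
        """Realized attacker payoff for one play of the game."""
        reward = (r if risky_pays else 0.0) if attack_risky else s
        captured = (attack_risky == defend_risky)
        return reward - (c if captured else 0.0)

    def expected_attacker_payoff(attack_risky, defend_risky, theta, r, s, c):
        """Attacker payoff with the risky target's outcome averaged out."""
        reward = theta * r if attack_risky else s
        captured = (attack_risky == defend_risky)
        return reward - (c if captured else 0.0)

Under the zero-sum assumption, the defender's cost at any action pair is simply the corresponding expected attacker payoff.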

  7. The Game Model — Information
     • The defender knows the true value of the distribution parameter, but the attacker only possesses a prior distribution over it
     • The prior is further assumed to be common knowledge between the attacker and defender
     • Before targets are selected, the attacker receives a single private sample from the uncertain target and forms its posterior
     • The defender does not see this sample and thus does not know the attacker's posterior
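
The slide only says that the attacker holds a commonly known prior over the uncertain target and updates it after one private sample; the Beta-Bernoulli form below is an assumption made purely to make the update concrete, not the paper's parameterization.

    # Hypothetical belief update: a Beta(a, b) prior over theta, revised after a
    # single Bernoulli(theta) sample drawn from the risky target.

    def posterior_mean(a, b, sample):
        """Posterior mean of theta after observing one sample in {0, 1}."""
        return (a + sample) / (a + b + 1.0)

    # Example: under a uniform Beta(1, 1) prior, a good sample moves the
    # attacker's estimate from 1/2 up to 2/3, a bad sample down to 1/3.
    # The defender sees that a sample was taken, but not its value.
    print(posterior_mean(1.0, 1.0, 1), posterior_mean(1.0, 1.0, 0))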

  8. The Game Model
     • In summary, the (subjective) payoffs for the attacker and the costs for the defender are as follows (table: attacker's reward, defender's cost)

  9. Best Response Functions
     • The game is a static game of incomplete information, and thus we seek to find Bayesian Nash equilibria
     • Players' strategies represent the probability that a player will choose a particular target given their type
     • The best response functions of the players are defined in terms of their types, where the type of the attacker is its private sample and the type of the defender is the distribution parameter
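
Since the slide's formulas are not reproduced here, the following is a numeric sketch of the two best-response maps under the illustrative payoffs and Beta prior from the sketches above. mu is the attacker's posterior mean given its sample, d is the probability the defender defends the risky target, and a1/a0 are the probabilities the attacker attacks the risky target after a good/bad sample; all of these names are assumptions.

    # Attacker's best response. Its type is the private sample, summarized by
    # the posterior mean mu. Returns the probability of attacking the risky
    # target (0.5 signals indifference).
    def attacker_best_response(mu, d, r, s, c):
        u_risky = mu * r - d * c          # caught with probability d
        u_safe = s - (1.0 - d) * c        # caught with probability 1 - d
        if u_risky > u_safe:
            return 1.0
        if u_risky < u_safe:
            return 0.0
        return 0.5

    # Defender's best response. Its type is the distribution parameter theta.
    # It averages over the attacker's sample (whose law it knows) and, under the
    # zero-sum reading, minimizes the attacker's expected payoff.
    def defender_best_response(theta, a1, a0, r, s, c):
        p_risky = theta * a1 + (1.0 - theta) * a0   # prob. the attacker goes risky
        cost_defend_risky = p_risky * (theta * r - c) + (1.0 - p_risky) * s
        cost_defend_safe = p_risky * theta * r + (1.0 - p_risky) * (s - c)
        if cost_defend_risky < cost_defend_safe:
            return 1.0
        if cost_defend_risky > cost_defend_safe:
            return 0.0
        return 0.5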

  10. Best Response Function — Attacker

  11. Best Response Functions — Defender

  12. Equilibrium characterization
     • Theorem — The game has at most one pure strategy saddle-point equilibrium, characterized by three disjoint parameter regions: 1) an always-risky region, 2) an always-safe region, and 3) an information-dependent region (each discussed on the following slides)
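
The threshold conditions from the theorem are not reproduced above, but the region structure can be probed numerically with the helpers sketched earlier: enumerate the deterministic strategy profiles and keep the ones that are mutual best responses. This is a brute-force stand-in under the illustrative model, not the paper's closed-form characterization.

    from itertools import product

    # A profile is (a1, a0, d): the attacker's action after a good / bad sample
    # and the defender's action, where 1.0 means the risky target and 0.0 the
    # safe one. Requires posterior_mean and the best-response helpers above.
    def is_pure_equilibrium(a1, a0, d, theta, prior, r, s, c):
        mu1 = posterior_mean(prior[0], prior[1], 1)
        mu0 = posterior_mean(prior[0], prior[1], 0)
        ok_a1 = attacker_best_response(mu1, d, r, s, c) in (a1, 0.5)
        ok_a0 = attacker_best_response(mu0, d, r, s, c) in (a0, 0.5)
        ok_d = defender_best_response(theta, a1, a0, r, s, c) in (d, 0.5)
        return ok_a1 and ok_a0 and ok_d

    def pure_equilibria(theta, prior, r, s, c):
        """Enumerate deterministic profiles and keep the mutual best responses."""
        return [(a1, a0, d) for a1, a0, d in product((0.0, 1.0), repeat=3)
                if is_pure_equilibrium(a1, a0, d, theta, prior, r, s, c)]

When a profile survives in this toy model, it plays the role of one of the three regions above: always risky, always safe, or sample dependent.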

  13. Equilibrium discussion — Region 1
     • Throughout the discussion, we assume a specific parameterization of the game
     • Region 1: Always risky
     • Both players choose the risky target, independent of their private information

  14. Equilibrium discussion — Region 2
     • Region 2: Always safe
     • Both players choose the safe target, independent of their private information

  15. Equilibrium discussion — Region 3
     • Region 3: Information dependent
     • The equilibrium strategies of the players depend on their private information
     • Attacker: sample dependent
     • Defender: distribution dependent

  16. Equilibrium discussion
     • First investigate the conditions for the players' equilibrium actions, which hold when:
     (i) one target's reward is high relative to the other's, so that target looks sufficiently desirable
     (ii) the attacker believes it to be very likely that it would get caught should it choose a given target; the capture cost deters the attacker from choosing that target
     (iii) & (iv) are analogous

  17. Equilibrium discussion
     • The third equilibrium region is formed as the intersection of two regions
     • In one region, the attacker chooses one of the targets if it receives a good sample, and the defender defends the better target
     • In the other region, the attacker chooses the other target if it receives a good sample, and the defender defends the better target

  18. Equilibrium discussion
     • Taking the intersection…

  19. Sensitivity Analysis
     • We investigate the equilibrium regions as the capture cost increases
     • As the capture cost increases, the attacker requires a higher level of certainty that the risky target will generate a reward
     • All pure strategy equilibrium regions vanish when the capture cost is sufficiently large
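
A small sweep with the pure_equilibria helper from the previous sketch (and made-up parameter values) illustrates the claim: for small capture costs a pure profile survives, while for large capture costs none does, since the interaction starts to resemble matching pennies, which has no pure equilibrium.

    # Sweep the capture cost c for one illustrative parameter setting and report
    # which deterministic profiles remain mutual best responses.
    theta, prior, r, s = 0.7, (1.0, 1.0), 10.0, 4.0
    for c in (1.0, 2.0, 5.0, 10.0, 20.0):
        print(f"c = {c:5.1f}: pure equilibria -> {pure_equilibria(theta, prior, r, s, c)}")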

  20. Concluding Remarks
     • Motivated by cyber security settings, we have introduced a simple asymmetric information game model for describing the influence of a learner's (the attacker's) private information on the inference process of an observing agent (the defender)
     • The subsequent game admits at most one pure strategy equilibrium which, depending on the parameters of the game, takes different forms: two equilibrium regions in which the players ignore their private information, and an intermediate region in which the players use their private information
     • In the intermediate region, the attacker always follows its private sample
     Thank you!
