Cooperative Multiagent Patrolling for Detecting Multiple Illegal Actions Under Uncertainty



  1. Cooperative Multiagent Patrolling for Detecting Multiple Illegal Actions Under Uncertainty. Best Paper, ICTAI 2016. Aurélie Beynier, 19 June 2017.

  2. Context
     • Multiagent patrolling in adversarial domains.
     • Patrollers have to cooperate to patrol a set of sites and prevent illegal actions (poaching, illegal fishing, intrusion across a border) by multiple adversaries.
     • Patrollers have partial observability of the system.
     • Uncertainty on action outcomes.
     • A priori unknown, dynamic strategies of the adversaries.

  3. Problem Setting
     • m heterogeneous defenders (agents i with i ∈ [1, m]).
     • n target sites t_j to visit (with j ∈ [1, n]).
     • An unknown number of adversaries trying to perform illegal actions on the target sites.
     • The environment topology is represented as a graph G = (N, E) with N = {t_1, ..., t_n}, where E is the set of possible routes between the targets.
     • Uncertainty on move durations: each edge e = (t_k, t_j) ∈ E is assigned a probability distribution C_{k,j} over possible travel durations.
     • Each patrolling agent has limited observability of the system: she observes her own location and the adversaries present on the current target site.
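     As a rough illustration of this setting, the topology and the uncertain travel durations could be stored as follows (a minimal sketch in Python; the class and field names are assumptions, not taken from the paper):

```python
# Sketch of the patrolling environment: targets are graph nodes and each edge
# (t_k, t_j) carries a discrete distribution C_{k,j} over travel durations.
from dataclasses import dataclass, field


@dataclass
class PatrolGraph:
    targets: list[str]                                   # N = {t_1, ..., t_n}
    # edges[(t_k, t_j)] = C_{k,j}, a mapping duration -> probability
    edges: dict[tuple[str, str], dict[int, float]] = field(default_factory=dict)

    def travel_duration_dist(self, t_k: str, t_j: str) -> dict[int, float]:
        return self.edges[(t_k, t_j)]


# Example: moving from t1 to t2 takes 2 time steps with probability 0.7, 3 with 0.3
g = PatrolGraph(targets=["t1", "t2", "t3"],
                edges={("t1", "t2"): {2: 0.7, 3: 0.3}})
print(g.travel_duration_dist("t1", "t2"))
```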

  4. Related works
     Security games [PPT+07, BGA09, KJT+09, ABVT13]
     • The adversary is able to perform extensive surveillance and obtains full knowledge of the patrolling strategy.
     • The adversary conducts a one-shot attack.
     • No effective cooperation between the patrollers, except in [SJY+14] (but with a single, fully rational adversary).
     Resource conservation games [FST15, NSG+16, QHJT14]
     • Multiple intruders performing illegal actions.
     • Objective: maximizing the number of detected illegal actions.
     • Limited observability of the intruders.
     • No effective cooperation between the patrollers.

  5. Issues
     Each patrolling agent
     • must decide, in an autonomous but cooperative way, which target to visit at each decision step,
     • must deal with the uncertainty on action outcomes,
     • partially observes the state of the system (illegal actions performed and states of the other agents),
     • has no a priori knowledge of the adversaries' strategies,
     • must face evolving strategies of the adversaries.

  6. Overview of the approach
     Main components
     • Context definition from observations on the adversaries
     • DEC-POMDP formalization of the decision problem based on the current context
     • Online policy computation
     • Online detection of context changes
     Pipeline: Context definition (PI) → DEC-POMDP formalization → Patrolling strategy computation → Patrolling strategy execution

  7. Adversary model
     • Detected illegal actions are the only information about the adversaries.
     • We define the probability PI_i(t) that the adversaries perform an illegal action on site t_i at time t → the current context.
     • Let NI_i(t − H, t) be the number of adversaries detected on target t_i (defined for all t_i in N) between t − H and t.
     • Each agent estimates PI_i(t) using the following equation:
       PI_i(t) = NI_i(t − H, t) / Σ_{t_k ∈ N} NI_k(t − H, t)    (1)
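     A minimal sketch of how each agent could compute Eq. (1) from its detection counts (variable names are illustrative; the uniform fallback when nothing has been detected is an assumption, the slides do not specify that case):

```python
# Estimate the context PI from the detections counted over the window [t-H, t].
def estimate_context(detections: dict[str, int]) -> dict[str, float]:
    """detections[t_i] = NI_i(t-H, t), number of adversaries detected on target t_i."""
    total = sum(detections.values())
    if total == 0:
        # Assumption: with no detection in the window, fall back to a uniform context.
        return {t: 1.0 / len(detections) for t in detections}
    return {t: count / total for t, count in detections.items()}


# Example: 3 detections on t1, 1 on t2 and none on t3 over the last H steps
print(estimate_context({"t1": 3, "t2": 1, "t3": 0}))  # {'t1': 0.75, 't2': 0.25, 't3': 0.0}
```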

  8. DEC-POMDP background
     DECentralized Partially Observable Markov Decision Process [BZI02]
     • Ag = {1, ..., m} is a set of m agents,
     • S is the set of world states,
     • A = A_1 × ··· × A_m is the set of possible joint actions a = (a_1, ..., a_m),
     • T is the transition function giving the probability T(s′ | s, a) that the system moves to s′ when executing a from s,
     • O = O_1 × ··· × O_m is the set of joint observations o = (o_1, ..., o_m),
     • Ω is the observation function giving the probability Ω(o | s, a) of observing o when executing a from s,
     • R(s′ | s, a) is the reward obtained when executing action a from s and moving to s′.
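     For illustration, the tuple above could be packaged as a plain container like the following (a sketch only; the paper does not prescribe any data structure, and the callables stand in for T, Ω and R):

```python
# Generic DEC-POMDP container: sets are stored as sequences, the transition,
# observation and reward functions as callables over joint actions.
from dataclasses import dataclass
from typing import Callable, Sequence

State = int
JointAction = tuple        # one individual action per agent
JointObservation = tuple   # one individual observation per agent


@dataclass
class DecPOMDP:
    agents: Sequence[int]                           # Ag = {1, ..., m}
    states: Sequence[State]                         # S
    joint_actions: Sequence[JointAction]            # A = A_1 x ... x A_m
    joint_observations: Sequence[JointObservation]  # O = O_1 x ... x O_m
    transition: Callable[[State, JointAction, State], float]              # T(s' | s, a)
    observation: Callable[[JointObservation, State, JointAction], float]  # Omega(o | s, a)
    reward: Callable[[State, JointAction, State], float]                  # R(s' | s, a)
```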

  9. DEC-POMDP for multiagent patrolling
     • A state s_t at time t is defined by: the position of each agent, the list of targets where an illegal action is currently observed, the idleness of each target, and the elapsed time of each current move.
     • An individual action a_i consists in moving to a target t_j (t_j ∈ N).
     • Transition probabilities are defined from the probabilities on move durations and from the probabilities of detecting illegal actions.
     • Each agent observes her current position and the illegal actions on the currently patrolled target.
     • The observation function is deterministic.
     • The reward function is defined so as to reward detected illegal actions.
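     A hypothetical state container matching the components listed above (field names are assumptions, the slide only enumerates the components informally):

```python
# One state of the patrolling DEC-POMDP, kept immutable so it can index dictionaries.
from dataclasses import dataclass


@dataclass(frozen=True)
class PatrolState:
    positions: tuple[str, ...]        # current target (or edge) of each agent
    illegal_observed: frozenset[str]  # targets where an illegal action is currently observed
    idleness: tuple[int, ...]         # time elapsed since each target was last visited
    move_elapsed: tuple[int, ...]     # elapsed time of each agent's current move
```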

  10. DEC-POMDP for multiagent patrolling
     Transition function
     • The probability that an agent reaches her target is defined from the probability distributions C_{k,j}.
     • The probabilities of detecting illegal actions are estimated using the current context PI. Let w_j(t) be the probability of observing an illegal action on t_j at time t:
       w_j(t) = Σ_{x=0}^{min(Δ_int, idle_j)} P(I_j(t − x))
       where I_j(t − x) denotes the event "an illegal action is initiated at t − x on t_j" and is derived from PI.
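     The detection probability w_j(t) above could be computed as in the sketch below, where the per-step initiation probabilities P(I_j(t − x)) are assumed to be precomputed from the context PI (how exactly they are derived is not detailed on this slide):

```python
# w_j(t): probability of observing an illegal action on t_j, summing the chances
# that it was initiated at some step within the last min(Delta_int, idle_j) steps.
def detection_probability(p_initiated: list[float], delta_int: int, idleness: int) -> float:
    """p_initiated[x] = P(I_j(t - x)), probability an illegal action started x steps ago."""
    window = min(delta_int, idleness)
    return sum(p_initiated[x] for x in range(window + 1))


# Example: a constant 5% initiation chance per step and a detection window of 4 steps
print(detection_probability([0.05] * 10, delta_int=4, idleness=8))  # 0.25
```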

  11. DEC-POMDP solving: background
     • Solving a DEC-POMDP consists in computing a joint policy π = ⟨π_1, ..., π_m⟩ where π_i is the individual policy of agent i.
     • Individual policies are coordinated and take into account uncertainty and partial observability.
     • The joint policy is usually computed offline.
     • The joint policy is then executed in a distributed way.

  12. DEC-POMDP solving: background (figure)

  13. DEC-POMDP solving: background
     Observation-action history
     The history of observations and actions of an agent is the sequence of observations made and actions taken by the agent along the execution: θ̄_i^t = (o_i^1, a_i^1, o_i^2, a_i^2, ..., o_i^t, a_i^t).
     Observation history
     If the policy is deterministic, the history can be summarized by the sequence of observations: θ̄_i^t = (o_i^1, o_i^2, ..., o_i^t).
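     As a tiny illustration (names are made up), a deterministic individual policy can then be represented as a map from observation histories to the next individual action:

```python
# Observation-action history of agent i, and the shorter observation history.
theta_full = [("o1", "move_t2"), ("o2", "move_t3")]   # (o_i^1, a_i^1, o_i^2, a_i^2)
theta_obs = tuple(o for o, _ in theta_full)           # ("o1", "o2")

# Deterministic individual policy: observation history -> next action
policy_i = {(): "move_t1", ("o1",): "move_t2", ("o1", "o2"): "move_t3"}
print(policy_i[theta_obs])  # "move_t3"
```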

  14. DEC-POMDP solving
     For a given context, the DEC-POMDP can be optimally solved by existing algorithms in a centralized way. But:
     • Poor scalability because of the high complexity: optimally solving a DEC-POMDP is NEXP-complete [BZI02].
     • Centralized computation should be avoided.
     • The policy has to be frequently updated.

  15. DEC-POMDP solving
     We propose an evolutionary algorithm (adapted from the (1+1) algorithm) to optimize the patrolling strategy over a finite horizon T:
     • it can be executed in a decentralized way,
     • it allows the agents to exploit the strategies of the previous context to compute the new strategies,
     • it is anytime and scales well, but offers no guarantee on the quality of the solutions.

  16. DEC-POMDP solving
     champion = RandomIndividual()
     championValue = Evaluate(champion)
     while deadline not reached do
       challenger = Mutation(champion)
       challengerValue = Evaluate(challenger)
       if challengerValue > championValue then
         champion = challenger
         championValue = challengerValue
       end if
     end while
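     A minimal Python rendering of this (1+1) loop (a sketch under the assumption that mutation and evaluation are supplied by the caller; in the paper these operate on patrolling strategies, here a toy bit-vector problem stands in):

```python
import random
import time


def one_plus_one(random_individual, mutate, evaluate, budget_s: float):
    """Anytime (1+1) search: keep the better of the champion and a mutated challenger."""
    champion = random_individual()
    champion_value = evaluate(champion)
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:        # stop when the time budget is spent
        challenger = mutate(champion)
        challenger_value = evaluate(challenger)
        if challenger_value > champion_value:
            champion, champion_value = challenger, challenger_value
    return champion, champion_value


# Toy usage: maximize the number of ones in a bit vector with single-bit flips
def random_bits():
    return [random.randint(0, 1) for _ in range(16)]

def flip_one_bit(bits):
    copy = bits[:]
    i = random.randrange(len(copy))
    copy[i] = 1 - copy[i]
    return copy

best, value = one_plus_one(random_bits, flip_one_bit, sum, budget_s=0.05)
print(best, value)
```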

  17. Limiting Communication
     • Patrolling agents need to communicate their observations about detected illegal actions.
     • This allows the agents to deduce the new context, but it is risky and resource-consuming.
     • We propose to measure the relevance of a piece of information: only relevant information is communicated.
     • The relevance measures the distance (symmetric Kullback-Leibler divergence) between the current probability distribution PI and the new one PI′ obtained by taking the new information into account:
       D(P, Q) = Σ_i P(i) log(P(i)/Q(i)) + Σ_i Q(i) log(Q(i)/P(i))
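     The relevance test could be sketched as below (the symmetric KL form follows the formula above; the smoothing constant and the threshold value are assumptions, the slides do not give them):

```python
import math


def symmetric_kl(p: dict[str, float], q: dict[str, float], eps: float = 1e-9) -> float:
    """D(P, Q) = sum_i P(i) log(P(i)/Q(i)) + sum_i Q(i) log(Q(i)/P(i))."""
    d = 0.0
    for t in p:
        pi, qi = max(p[t], eps), max(q.get(t, 0.0), eps)   # eps avoids log(0)
        d += pi * math.log(pi / qi) + qi * math.log(qi / pi)
    return d


def is_relevant(pi_old: dict[str, float], pi_new: dict[str, float], threshold: float = 0.1) -> bool:
    """Communicate an observation only if it changes the context enough."""
    return symmetric_kl(pi_old, pi_new) > threshold


print(is_relevant({"t1": 0.5, "t2": 0.5}, {"t1": 0.9, "t2": 0.1}))  # True
```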
