4.4. Evolutionary learning of cooperative behavior: OLEMAS (I)
Denzinger and Fuchs (1996)

OLEMAS: OffLine Evolution of Multi-Agent Systems

Basic problems tackled:
• How can we specify tasks on a high, abstract level and leave the concrete problem solving to learning by the MAS?
• How can we use combined training of agents to make them show cooperative behavior without needing much communication, relying instead on instinctive, reactive behavior?

Evolutionary Algorithms: Genetic Algorithms
Basic idea: use the biological model of evolution to improve solutions to problems (a code sketch of this loop follows below).
1. Generate an initial set of (not very good) solutions to your problem (the initial population).
2. Repeat until an end condition is fulfilled:
   ● Generate new solutions out of the current population (using genetic operators), such that better solutions in the population are used with higher probability (quality ⇒ fitness).
   ● Generate the next population out of the old and the new individuals.

OLEMAS - Basic Scenario: Pursuit Games (I)
Several hunter agents have to catch one or several prey agents on a grid world.

OLEMAS - Basic Scenario: Pursuit Games (II)
[Figures only: example grid-world pursuit scenarios.]

OLEMAS - Basic Scenario: Pursuit Games (III)
Aspects leading to different variants:
• Structure and size of the grid
• Form, possible actions, speed, observation abilities, and communication abilities of the agents
• Selection of preys and hunters, use of obstacles (bystanders)
• Strategy of the preys
• Start situation
• Goal of the game
• ...

OLEMAS - Basic Scenario: Pursuit Games (IV)
Discussion:
• Different variants require rather different strategies of the hunters ⇒ a real need for learning cooperative behavior
• There are different ways to introduce random behavior (strategy of the prey, random start positions)
• Allows investigating the co-evolution of hunters and preys
• Can be used to simulate robot planning/control problems
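Before turning to the learning algorithm, here is a minimal Python sketch of the generic GA loop described above. The parameter values and the callable interfaces (`random_individual`, `fitness`, `crossover`, `mutate`) are assumptions for illustration, not details from the paper; note that lower fitness is better, matching the OLEMAS convention introduced later.

```python
import random

def genetic_algorithm(random_individual, fitness, crossover, mutate,
                      pop_size=50, elite_fraction=0.3,
                      mutation_prob=0.2, generations=100):
    """Generic GA loop; lower fitness is better (as in OLEMAS)."""
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the current population; better (lower-fitness)
        # individuals are preferred as parents.
        population.sort(key=fitness)
        elite = population[:max(2, int(elite_fraction * pop_size))]
        # Fill the rest of the next population with offspring of the elite.
        offspring = []
        while len(elite) + len(offspring) < pop_size:
            p1, p2 = random.sample(elite, 2)
            child = crossover(p1, p2)
            if random.random() < mutation_prob:
                child = mutate(child)
            offspring.append(child)
        population = elite + offspring
    return min(population, key=fitness)
```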
OLEMAS - Agent Architecture
Use of prototypical situation-action pairs (SAPs) and the nearest-neighbor rule (see 2.2.1).
Situation: relative positions of all other agents to the agent, plus type and orientation of the other agents:
(x1, y1, o1, t1, …, xn, yn, on, tn)
Why?
• Very expressive
• A strategy can easily be changed by adding/modifying a pair
(Code sketches of this architecture and the learning algorithm follow at the end of this subsection.)

OLEMAS - Learning algorithm (I)
Evolutionary approach using a GA:
Individuals (solution candidates): for one agent we have a set of situation-action pairs (SAPs), and an individual has one such set for each agent for which a strategy has to be learned.
Fitness of an individual:
Basic idea: try out the strategy by simulating the behavior of the team.
If the simulation leads to success, count the number of moves (rounds).

OLEMAS - Learning algorithm (II)
If the simulation is unsuccessful, compute for each round the sum of the distances of all hunters to the nearest prey agent, and sum up this result over all rounds. If there are random influences in the game, use the mean value of several simulations.
Therefore, the lower its fitness, the better an individual is.
Note that this fitness measure is not without problems (agents blocking each other do not contribute, neither do obstacles), but it is applicable to a wide range of problems.

OLEMAS - Learning algorithm (III)
Realizing the genetic operators:
The crossover operator randomly selects SAPs for each agent from the strategies of that agent in the two parent individuals.
Mutation can exchange an SAP in a strategy for a randomly generated other SAP, add a random SAP to a strategy, or delete a random SAP from a strategy.

OLEMAS - Learning algorithm (IV)
Generating the initial population:
All SAPs in an individual are generated randomly. The number of SAPs in a strategy is also a random number between a pre-defined minimal and maximal value.
Since the crucial steps in the hunt are coordinating several hunters around a prey (i.e., all agents are close to each other), we modified the random number generator so that smaller numbers are generated with a higher probability.

OLEMAS - Learning algorithm (V)
Generating offspring:
Select the r percent best individuals ("elite selection").
Repeatedly select two individuals randomly out of this pool.
Then perform a crossover and, with a given probability, additionally perform a mutation.
Next generation: add to the selected r percent of the old generation 100-r percent newly generated individuals.
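A minimal Python sketch of the agent architecture above: a strategy is a set of prototypical SAPs, and the agent acts by applying the action of the SAP whose situation vector is nearest to the current one. The Euclidean distance and the numeric encoding of orientation/type are assumptions, not details from the paper.

```python
import math
from dataclasses import dataclass

# A situation is the vector (x1, y1, o1, t1, ..., xn, yn, on, tn):
# relative position, orientation, and type of every other agent,
# assumed here to be encoded numerically.
Situation = tuple

@dataclass
class SAP:
    situation: Situation
    action: str        # e.g. "N", "S", "E", "W", "stay"

def distance(s1: Situation, s2: Situation) -> float:
    # Assumption: plain Euclidean distance over the numeric encoding;
    # the actual similarity measure may weight components differently.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))

def choose_action(strategy: list[SAP], current: Situation) -> str:
    """Nearest-neighbor rule: act as the most similar prototype dictates."""
    best = min(strategy, key=lambda sap: distance(sap.situation, current))
    return best.action
```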
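The fitness measure from Learning algorithm (I)/(II) could be sketched as follows. The `simulate` callable is a hypothetical stand-in for the pursuit-game simulator, and averaging successful and unsuccessful runs on one scale is a simplification of ours.

```python
def fitness(individual, simulate, runs=5, max_rounds=200):
    """Lower is better. `individual` maps each learning agent to its SAP set.
    `simulate` is assumed to return (success, rounds_used,
    per_round_distances), where per_round_distances[r] is the sum over
    all hunters of the distance to the nearest prey in round r."""
    scores = []
    for _ in range(runs):  # average over runs if the game is randomized
        success, rounds_used, per_round_distances = simulate(
            individual, max_rounds=max_rounds)
        if success:
            scores.append(rounds_used)               # fewer moves = better
        else:
            scores.append(sum(per_round_distances))  # closer overall = better
    return sum(scores) / len(scores)
```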
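The genetic operators from slides (III) and (IV) might look like the sketch below; the generation step itself matches the generic GA loop shown earlier. How crossover picks SAPs from the two parents, the `random_sap` generator, and the triangular bias toward small strategies are all assumptions.

```python
import random

def crossover(parent1, parent2):
    """Per agent, randomly select SAPs from the two parents (slide III).
    Assumption: the child keeps roughly one parent's worth of SAPs."""
    child = {}
    for agent in parent1:
        pool = parent1[agent] + parent2[agent]
        k = (len(parent1[agent]) + len(parent2[agent])) // 2
        child[agent] = random.sample(pool, k)
    return child

def mutate(individual, random_sap):
    """Exchange, add, or delete a random SAP in one agent's strategy."""
    agent = random.choice(list(individual))
    strategy = list(individual[agent])
    op = random.choice(["exchange", "add", "delete"])
    if op == "exchange" and strategy:
        strategy[random.randrange(len(strategy))] = random_sap()
    elif op == "add":
        strategy.append(random_sap())
    elif op == "delete" and len(strategy) > 1:
        del strategy[random.randrange(len(strategy))]
    child = dict(individual)
    child[agent] = strategy
    return child

def biased_strategy_size(min_n, max_n):
    # Smaller strategy sizes with higher probability (slide IV);
    # a triangular distribution is an assumption about how the biased
    # random number generator was realized.
    return min(int(random.triangular(min_n, max_n, min_n)), max_n)
```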
Characterization of the learning
The behavior of all hunter agents is
• learned offline,
• using one learner, and
• without a teacher.
The learning is achieved through experiences gained by simulating the task to do, and it involves random influences.

Discussion (I)
✚ For many game variants requiring only small situation vectors, very good strategies can be evolved, even if the variant contains random influences.
✚ The concept is rather general for reactive MAS.
✚ The resulting agents are simple and fast.
✚ Only little problem-specific knowledge is necessary:
   ✚ defining the situation vector
   ✚ some components in the fitness function

Discussion (II)
- Bigger situation vectors result in enormous search spaces.
- Measuring only the team is rather coarse ⇒ good single agents or good SAPs can get lost due to being in a bad individual.
- Nearly no problem-specific knowledge was used ⇒ a lot of room for improvement (fitness measure!).
- The GA has a lot of "obscure" parameters for which good values are difficult to predict without performing experiments.

Evolutionary learning of cooperative behavior: OLEMAS (II)
Denzinger and Kordt (2000)

OLEMAS: OnLine Evolution of Multi-Agent Systems

Basic problems tackled:
• How can we use offline learning in online learning?
• How can we model other agents to use them in our own decision making (more precisely, in the simulations used by learning)?

OLEMAS - Online Learning
Agents act in a pursuit game (the real world) and have to learn while trying to win the game.
Basic ideas:
• Use a special action "learn"
• Use models of other agents
• As "learn": use offline learning with shorter simulations, the models of the other agents, and the current situation as the start situation

OLEMAS - "learn"
• Takes a fixed amount of rounds
• Performed after at least min and at most max rounds
• Each positive action lengthens the time until the next "learn", each negative action shortens it
• First uses the current situation, the models of the other agents, and the duration of "learn" to predict the situation after "learn"
• Then performs offline learning with this situation as start and a very limited length of the simulations
• The best strategy found is combined with the current strategy to get the new strategy
(A sketch of this mechanism follows below.)
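A minimal sketch of the online "learn" mechanism described above. All helper callables (`predict_situation`, `offline_ga`, `combine`), the credit counter, and the parameter values are assumptions about how this could be realized, not details from the paper.

```python
from dataclasses import dataclass

@dataclass
class OnlineLearner:
    strategy: list                  # current SAP strategy of this agent
    other_models: dict              # models of the other agents
    min_rounds: int = 10            # earliest round for the next "learn"
    max_rounds: int = 40            # latest round for the next "learn"
    learn_duration: int = 5         # fixed number of rounds "learn" takes
    short_horizon: int = 15         # very limited simulation length
    rounds_since_learn: int = 0
    credit: int = 0                 # shifted by good/bad action outcomes

    def note_outcome(self, positive: bool):
        # Positive actions postpone the next "learn"; negative ones hasten it.
        self.credit += 1 if positive else -1

    def should_learn(self) -> bool:
        if self.rounds_since_learn < self.min_rounds:
            return False
        return self.rounds_since_learn >= self.max_rounds or self.credit < 0

    def learn(self, situation, predict_situation, offline_ga, combine):
        # 1. Predict the situation after the learn phase from the current
        #    situation and the models of the other agents.
        predicted = predict_situation(situation, self.other_models,
                                      self.learn_duration)
        # 2. Run the offline GA from the predicted situation, with very
        #    short simulations, learning only this agent's strategy.
        best = offline_ga(predicted, self.other_models, self.short_horizon)
        # 3. Combine the best strategy found with the current one.
        self.strategy = combine(self.strategy, best)
        self.rounds_since_learn = 0
        self.credit = 0
```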
OLEMAS - Modeling other agents
• Most accurate model: communicate and ask for the strategy of the other agent
• If that is not possible: use the situation-action pairs observed in the real world so far as the SAPs of the other agent, and assume it uses SAPs and the nearest-neighbor rule as its agent model (see the sketch at the end of this section)
Learning (by heart) is thus also used to model other agents.

Characterization of the learning
The behavior of the individual hunter agents is
• learned online,
• each using its own learner (of the same type), and
• without a teacher.
The learning is achieved through experiences gained by simulating the task to do, and it involves random influences.

Discussion (I)
✚ Can deal with much more complex game variants
✚ Can automatically divide the problem into subproblems (or steps) and learn strategies for the current step only
✚ Can deal with changing game scenarios
✚ Results in much smaller search spaces (due to the limited simulation length and having to learn for one agent only)

Discussion (II)
- Online learning can result in bad early decisions that cannot be unmade later and therefore will not allow the team to win.
- What if the other agents learn/change their behavior?
- What if our models are not accurate?

Evolutionary learning of cooperative behavior: OLEMAS (III)
Denzinger and Ennis (2002)

Basic problems tackled:
• How can we use the strategies of agents in successful teams to help a new agent, replacing an old one, fit into the team much more quickly than by learning from scratch (if the new agent has different abilities)?

OLEMAS - The new guy in an experienced team
Basic idea: modify the GA to make use of the strategy of the old agent while still allowing enough flexibility to deal with the new abilities.
Differences in abilities:
• different forms
• different moves
• different speeds
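Returning to the fallback agent model under "Modeling other agents" above, a sketch: the other agent's observed situation-action pairs are simply recorded "by heart" and replayed via the nearest-neighbor rule. It reuses the hypothetical `SAP`, `Situation`, and `choose_action` definitions from the earlier agent-architecture sketch; the default action for an empty model is our assumption.

```python
class ObservedAgentModel:
    """Model another agent from its observed behavior: every
    (situation, action) pair seen in the real world becomes an SAP,
    and the model predicts with the nearest-neighbor rule."""

    def __init__(self):
        self.observed_saps: list[SAP] = []

    def observe(self, situation: Situation, action: str):
        # Record what the other agent actually did in this situation.
        self.observed_saps.append(SAP(situation, action))

    def predict(self, situation: Situation) -> str:
        # Assumption: before anything has been observed, predict a no-op.
        if not self.observed_saps:
            return "stay"
        return choose_action(self.observed_saps, situation)
```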