
1. 4.3. Reinforcement learning for forming coalitions: the DFG algorithm
Weiß (1995); DFG: Dissolution and Formation of Groups
Basic problems tackled:
- How can several agents learn what actions they can perform in parallel?
- How can several agents learn what sets of actions have to be executed sequentially?
Multi-Agent Systems, Jörg Denzinger

2. Reinforcement Learning (I)
Watkins (1989), Sutton (1987)
Let's use our single-agent definition: an agent Ag has in Dat, for each pair (s, a) ∈ Sit × Act, an evaluation e(s, a). Its decision function then always selects, in a situation s, the action a for which e(s, a) is optimal.
Learning is performed by getting feedback after an action or action sequence; a learn function Q distributes this feedback among the evaluations.

3. Reinforcement Learning (II)
The interesting part of reinforcement learning (often also called Q-learning) is how the learn function Q is defined. There are many possibilities; an especially important point is how the feedback is distributed after action sequences. There are obvious similarities to learning in neural networks. The basic agent architecture resembles Markov processes, and their theory is used for proving properties of Q-functions. From time to time, random decisions have to be made to try out new situation-action combinations → exploration
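Since the following slides build on this machinery, here is a minimal tabular sketch of the evaluation table e(s, a), the greedy decision function, and one possible Q-style learn function with occasional random exploration. The learning rate alpha, the discount gamma, and the exploration rate epsilon are illustrative assumptions, not part of the slides.

```python
import random
from collections import defaultdict

# Evaluation table e(s, a) over Sit x Act; unseen pairs default to 0.0 (an assumption).
e = defaultdict(float)

def decide(situation, actions, epsilon=0.1):
    """Decision function: normally pick the action with the best evaluation in
    this situation; with probability epsilon pick a random action (exploration)."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: e[(situation, a)])

def learn(situation, action, feedback, next_situation, next_actions,
          alpha=0.3, gamma=0.9):
    """One possible learn function Q: move e(s, a) toward the received feedback
    plus the discounted best evaluation of the next situation (standard Q-update)."""
    best_next = max((e[(next_situation, a)] for a in next_actions), default=0.0)
    e[(situation, action)] += alpha * (feedback + gamma * best_next
                                       - e[(situation, action)])
```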

4. The DFG Algorithm - Scenario (I)
A set of organizations competes for furthering a given task. The general procedure: for each occurring situation, each organization is allowed to bid its next solution step, and only the solution step of the best organization is executed, thus generating the next situation. An organization itself consists of compatible agents and smaller organizations; in the following, we call these organizations and agents units. The units of a winning organization perform the actions that their decision functions suggest for the current situation.

5. The DFG Algorithm - Scenario (II)
This is why the units have to be compatible, i.e. no action of one unit can prevent the action of another unit. In each organization there is one agent acting as leader: it computes the bids of the organization, receives the rewards (feedback) for the organization, and represents the whole organization. We want organizations to be dependent on situations!

6. The DFG Algorithm - Examples for organizations
[Diagrams: a flat organization and a hierarchical organization]
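One possible way to represent the units from the scenario slides: an agent is an atomic unit, and an organization is a composite of compatible sub-units with one agent acting as leader. The class and attribute names are assumptions for illustration; the two example objects mirror the flat and hierarchical diagrams of this slide.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str

@dataclass
class Organization:
    members: list                 # compatible agents and/or smaller organizations
    leader: Agent                 # computes the bids, receives the rewards
    evaluations: dict = field(default_factory=dict)   # E_i^j, keyed by situation

# Flat organization: a leader plus peer agents.
a1, a2, a3 = Agent("a1"), Agent("a2"), Agent("a3")
flat = Organization(members=[a1, a2, a3], leader=a1)

# Hierarchical organization: contains a smaller organization as one of its units.
hierarchical = Organization(members=[flat, Agent("a4")], leader=a1)
```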

7. The DFG Algorithm - Rationale
Obviously, for each situation we want to find the organization whose units perform all possible actions that can be performed in parallel and that are also sensible, i.e. they further the problem solution process. The DFG algorithm tries to learn these organizations.

8. The DFG Algorithm - The basic cycle
The DFG algorithm learns by extending, dissolving and forming organizations.
Basic cycle:
1. Competition: evaluation and selection of actions
2. Modification of evaluations: former and active organizations get rewarded
3. Development of organizations: dissolving and forming of organizations

9. Competition
S_j: current situation
U_i: organization that could act in the current situation
B_i^j = (α + β) · E_i^j: bid of U_i for S_j, where
  α: learn factor
  β: random factor
  E_i^j: evaluation of the combined actions of U_i for S_j so far
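A sketch of the bid computation and the selection of the winning organization, reusing the Organization objects sketched above. The concrete values chosen for α, the range of the random factor β, and E_init are assumptions for illustration.

```python
import random

ALPHA = 0.3        # learn factor alpha (assumed example value)
BETA_MAX = 0.05    # upper bound for the random factor beta (assumed example value)
E_INIT = 1.0       # predefined initial evaluation E_init, see slide 13 (assumed value)

def bid(org, situation):
    """Bid of organization U_i for situation S_j: B_i^j = (alpha + beta) * E_i^j."""
    e_ij = org.evaluations.get(situation, E_INIT)
    beta = random.uniform(0.0, BETA_MAX)
    return (ALPHA + beta) * e_ij

def competition(situation, organizations):
    """Every organization that could act in the situation bids; only the solution
    step of the highest bidder will be executed."""
    return max(organizations, key=lambda org: bid(org, situation))
```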

10. Modification of evaluations
Let U_i be the organization winning in situation S_j, and U_k the winning organization that led from situation S_l to S_j. Modify the evaluations as follows:
E_i^j := E_i^j − α · E_i^j + R_extern
E_k^l := E_k^l + α · E_i^j
where R_extern is the external feedback provided by the environment.
→ This stabilizes successful action sequences and destabilizes unsuccessful sequences.
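The two update rules of this slide as code: winner_now is U_i (winning in S_j), winner_before is U_k (whose action led from S_l to S_j). The attribute names and the value of α follow the assumptions of the previous sketches.

```python
E_INIT = 1.0   # assumed initial evaluation, as on slide 13

def modify_evaluations(winner_now, s_j, winner_before, s_l, r_extern, alpha=0.3):
    """DFG evaluation update:
       E_i^j := E_i^j - alpha * E_i^j + R_extern  (winner hands on a share, collects feedback)
       E_k^l := E_k^l + alpha * E_i^j             (former winner receives that share)"""
    e_ij = winner_now.evaluations.get(s_j, E_INIT)
    share = alpha * e_ij                       # computed from the old value of E_i^j
    winner_now.evaluations[s_j] = e_ij - share + r_extern
    if winner_before is not None:              # there is no predecessor in the first step
        e_kl = winner_before.evaluations.get(s_l, E_INIT)
        winner_before.evaluations[s_l] = e_kl + share
```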

11. Development of organizations (I)
- After starting the system, and as long as the evaluation of a unit is increasing, there is no need to look for alternative organizations, i.e. no extensions, no defections.
- An interest in alternative organizations starts when the evaluation of a unit decreases or stagnates. In order to detect this, the leader (or the agent itself) computes a moving mean value of the last n modifications of the evaluation of the unit.

12. Development of organizations (II)
- Organizations interested in alternatives form a new (combined) organization if the moving mean value becomes smaller than the evaluation before n+1 modifications, multiplied by a so-called formation factor. First, the unit with the highest evaluation selects one cooperation partner, namely the compatible unit with the highest evaluation; this is then repeated among the remaining units until all units have found a new partner or there are no compatible units left.

13. Development of organizations (III)
- An organization is dissolved by its leader if the mean value of its evaluation falls below its initial evaluation (from when it was formed), multiplied by a so-called dissolution factor.
- Whenever a unit has to bid for the first time in a situation, it uses a predefined value E_init.
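A sketch of the development step from the last three slides: a moving mean over the last n evaluations signals stagnation, a formation factor triggers the greedy pairing of the highest-evaluated compatible units, and a dissolution factor triggers dissolving. The window size n, both factors, and the attribute names (evaluation, evaluation_history, initial_evaluation) are assumptions for illustration.

```python
FORMATION_FACTOR = 0.9     # assumed example value
DISSOLUTION_FACTOR = 0.5   # assumed example value
N = 5                      # number of modifications in the moving mean (assumed)

def wants_alternative(unit):
    """A unit becomes interested in alternatives when the moving mean of its last
    n evaluations is smaller than the evaluation before n+1 modifications,
    multiplied by the formation factor."""
    h = unit.evaluation_history
    if len(h) < N + 1:
        return False
    return sum(h[-N:]) / N < FORMATION_FACTOR * h[-(N + 1)]

def form_organizations(units, compatible):
    """Greedy pairing: the interested unit with the highest evaluation selects the
    compatible interested unit with the highest evaluation; this is repeated among
    the remaining units until no compatible partners are left."""
    candidates = sorted((u for u in units if wants_alternative(u)),
                        key=lambda u: u.evaluation, reverse=True)
    new_orgs, used = [], []
    for u in candidates:
        if u in used:
            continue
        partners = [v for v in candidates
                    if v is not u and v not in used and compatible(u, v)]
        if partners:
            new_orgs.append((u, partners[0]))   # u would act as leader (sketch)
            used.extend([u, partners[0]])
    return new_orgs

def should_dissolve(org):
    """An organization is dissolved by its leader when the moving mean of its
    evaluation falls below its initial evaluation times the dissolution factor."""
    h = org.evaluation_history
    if not h:
        return False
    window = h[-N:]
    return sum(window) / len(window) < DISSOLUTION_FACTOR * org.initial_evaluation
```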

14. Characterization of the DFG algorithm
Each unit permanently performs
- online learning
- with a teacher who specifies the quality of its behavior.
Learning is achieved by gathering experience.

15. Discussion
+ Good solution to the problem scenario
+ Rather fine-grained tuning of organizations to situations is possible
- Only sensible for a small Sit and a small Mact
- In order to allow for learning, the same situations have to occur very often
- Big administrative overhead in the agents
