Multi-agent reinforcement learning for new generation control systems
Manuel Graña 1,2; Borja Fernandez-Gauna 2
1 ENGINE centre, Wroclaw Technological University
2 Computational Intelligence Group (www.ehu.eus/ccwintco), University of the Basque Country (UPV/EHU)
IDEAL, 2015
Overall view of the talk
• Comments on Reinforcement Learning and Multi-Agent Reinforcement Learning
  • not a tutorial
• Our own contributions of recent years (mostly Borja's)
  • improvements on RL that avoid traps
  • a "new" coordination mechanism in MARL: D-RR-QL
• A glimpse of a promising avenue of research in MARL
Contents
Introduction
Reinforcement Learning
  Single-Agent RL
  State-Action Vetoes
  Undesired State-Action Prediction
  Transfer Learning
  Continuous action and state spaces
MARL-based control
  Multi-Agent RL (MARL)
  Distributed Value Functions
  Distributed Round-Robin Q-Learning (D-RR-QL)
Ideas for future research
Conclusions
Introduction
Motivation
• Goals of innovation in control systems:
  • attain an acceptable control system
    • when the system's dynamics are not fully understood or precisely modeled
    • when training feedback is sparse or minimal
  • autonomous learning
  • adaptability to changing environments
  • distributed controllers robust to component failures
  • large multicomponent systems
• Minimal human designer input
Example
• Multi-robot transportation of a hose
  • strong non-linear dynamical interactions through an elastic, deformable link
  • hard constraints:
    • robots could drive over the hose, overstretch it, collide, ...
  • sources of uncertainty: hose position, hose weight and intrinsic forces (elasticity)
Reinforcement Learning for controller design
• Reinforcement Learning
  • agent-environment interaction
  • learning action policies from rewards
  • time-delayed rewards
  • almost unsupervised learning
• Advantages:
  • the designer does not specify (input, output) training samples
  • rewards are positive upon task completion
  • model free
  • autonomous adaptation to slowly changing conditions
• Exploitation vs. exploration dilemma
Reinforcement Learning
Single-Agent RL
Markov Decision Process (MDP)
• Single-agent environment interaction is modeled as a Markov Decision Process ⟨S, A, P, R⟩
  • S: the set of states the system can have
  • A: the set of actions from which the agent can choose
  • P: the transition function
  • R: the reward function
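A rough Python sketch of this tuple as a data structure (the type names and the deterministic-transition signature are illustrative assumptions, not part of the slides):

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[Tuple[int, int], Tuple[int, int]]  # e.g. discretized positions of two robots
Action = Tuple[str, str]                          # e.g. a joint action (up1, left2)

@dataclass
class MDP:
    states: List[State]                           # S
    actions: List[Action]                         # A
    transition: Callable[[State, Action], State]  # P (deterministic case)
    reward: Callable[[State], float]              # R (state-based reward)
```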
Single-agent approach
• The simplest approach to the multi-robot hose transportation task:
  • a single central agent learns how to control all robots
The set of states: S
• Simple state model
  • S is a set of discrete states
  • State: the discretized spatial positions of the two robots, e.g. ⟨(2, 2), (4, 4)⟩
  • In a 5 × 4 grid (20 cells per robot), a total of 20² = 400 states (see the enumeration sketch below)
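A minimal sketch of how this joint state space could be enumerated; the 1-indexed coordinate convention is an assumption for illustration:

```python
from itertools import product

# 5 x 4 grid with 1-indexed cells: 20 cells per robot,
# and a joint state is a pair of cells such as ((2, 2), (4, 4)).
cells = [(x, y) for x in range(1, 6) for y in range(1, 5)]
states = [(p1, p2) for p1, p2 in product(cells, cells)]
print(len(states))  # 400 = 20^2
```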
Single-Agent MDP
Observation: a single-agent MDP can deal with multicomponent systems
• the state space is the product space of the component state spaces
• the action space is the space of joint actions
• the dynamics of all components are lumped together
• the reward is global to the system
• equivalent to a centralized monolithic controller
The set of actions: A
• Discrete set of actions for each robot:
  • A1 = {up1, down1, left1, right1}
  • A2 = {up2, down2, left2, right2}
• If we want the agent to move both robots at the same time, the set of joint actions is A = A1 × A2:
  • A = {up1/up2, up1/down2, ..., down1/up2, down1/down2, ...}
  • 16 different joint actions (enumerated in the sketch below)
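A short sketch that enumerates the joint-action space as a Cartesian product; the string labels are illustrative only:

```python
from itertools import product

A1 = ["up1", "down1", "left1", "right1"]
A2 = ["up2", "down2", "left2", "right2"]

# Joint-action space A = A1 x A2
joint_actions = list(product(A1, A2))
print(len(joint_actions))  # 16
```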
The transition function: P
• Defines the state transitions induced by action execution
• Deterministic (state-action mapping): P : S × A → S
  • s′ = P(s, a): the state s′ observed after a is executed in s
• Stochastic (probability distribution): P : S × A × S → [0, 1]
  • p(s′ | s, a): the probability of observing s′ after a is executed in s
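To make the two cases concrete, here is a hedged sketch for a single robot on the grid; the clamping at the borders and the slip probability are assumptions, not part of the slides:

```python
import random

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def det_transition(state, action, width=5, height=4):
    """Deterministic P: the unique next state s' = P(s, a), clamped to the grid."""
    x, y = state
    dx, dy = MOVES[action]
    return (min(max(x + dx, 1), width), min(max(y + dy, 1), height))

def stochastic_transition(state, action, slip=0.1):
    """Stochastic P: with probability `slip` the robot stays put, otherwise it moves."""
    return state if random.random() < slip else det_transition(state, action)
```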
The reward function: R
• This function returns the environment's evaluation of either
  • the agent's last decision, i.e. the action executed: R : S × A → ℝ
  • the state reached: R : S → ℝ
• It is the objective function to be maximized
  • given by the system designer
• A reward function for our hose transportation task:
  $R(s) = \begin{cases} 1 & \text{if } s = \text{Goal} \\ 0 & \text{otherwise} \end{cases}$
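The goal-based reward above translates almost directly into code; the goal configuration used here is a hypothetical placeholder:

```python
GOAL_STATE = ((5, 4), (4, 4))  # assumed goal positions of the two robots

def reward(state):
    """R(s) = 1 if the goal configuration is reached, 0 otherwise."""
    return 1.0 if state == GOAL_STATE else 0.0
```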
Learning
• The goal of the agent is to learn a policy π(s) that maximizes the expected accumulated reward
• Each time step (see the interaction-loop sketch below):
  • the agent observes the state s
  • applying policy π, it chooses and executes action a
  • a new state s′ is observed and reward r is received by the agent
  • the agent "learns" by updating its estimates of the values of states and actions
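A generic sketch of that interaction loop; the `env.reset()`/`env.step()` and `agent.act()`/`agent.update()` interfaces are assumptions chosen for illustration:

```python
def run_episode(env, agent, max_steps=1000):
    s = env.reset()                       # observe the initial state
    for _ in range(max_steps):
        a = agent.act(s)                  # choose an action with the current policy
        s_next, r, done = env.step(a)     # execute it, observe s' and the reward r
        agent.update(s, a, r, s_next)     # update the value estimates
        s = s_next
        if done:
            break
```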
Q-Learning
• State value function: the expected rewards obtained from state s following policy π(s):
  $V^\pi(s) = E_\pi\left\{ \sum_{t=0}^{\infty} \gamma^t r_t \;\middle|\; s_0 = s \right\}$
• Discount parameter γ
  • weights immediate rewards higher than future ones
• State-action value function Q(s, a):
  $Q^\pi(s, a) = E_\pi\left\{ \sum_{t=0}^{\infty} \gamma^t r_t \;\middle|\; s_0 = s \wedge a_0 = a \right\}$
Q-Learning
• Q-Learning: iterative estimation of the Q-values:
  $Q_t(s, a) = (1 - \alpha)\, Q_{t-1}(s, a) + \alpha \left( r_t + \gamma \max_{a'} Q_{t-1}(s', a') \right)$,
  where α is the learning gain
• Tabular representation: store the value of each state-action pair (|S| · |A| entries)
• In our example, with 2 robots on 20 cells each and 4 actions per robot, the Q-table size is |S| · |A| = 20² · 4² = 6400 (see the sketch below)
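A compact tabular Q-learning sketch consistent with the update rule above; the environment interface and the hyperparameter values are assumptions rather than the authors' implementation:

```python
from collections import defaultdict
import random

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)                                  # tabular Q[(s, a)], default 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection (exploration vs. exploitation)
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next, r, done = env.step(a)
            # Q_t(s,a) = (1 - alpha) Q_{t-1}(s,a) + alpha (r + gamma * max_a' Q_{t-1}(s',a'))
            best_next = max(Q[(s_next, act)] for act in actions)
            Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
            s = s_next
    return Q
```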
Action-selection policy
• Convergence: Q-learning converges to the optimal Q-table
  • iff all possible state-action pairs are visited infinitely often
• Exploration: requires trying suboptimal actions to gather information (needed for convergence)
• ε-greedy action selection policy (sketched below):
  $\pi_\varepsilon(s) = \begin{cases} \text{random action} & \text{with probability } \varepsilon \\ \arg\max_{a \in A} Q(s, a) & \text{with probability } 1 - \varepsilon \end{cases}$
• Exploitation: selects the action $a^{*} = \arg\max_{a} Q(s, a)$
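A standalone version of the ε-greedy rule above, assuming Q is stored as a dictionary keyed by (state, action) pairs as in the earlier sketch:

```python
import random

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(actions)                  # explore: random action
    return max(actions, key=lambda a: Q[(s, a)])       # exploit: argmax_a Q(s, a)
```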
Learning
Observation:
• learning often requires the repetition of experiments
  • repetition often means that simulation is the only practical way to learn
• autonomous learning implies exploration
  • non-stationarity calls for permanent exploration