Decentralized Non-Communicating Multiagent Collision Avoidance with Deep Reinforcement Learning
By Yu Fan Chen, Miao Liu, Michael Everett, and Jonathan P. How
Presenter: Jared Choi
Motivation
• Finding a path is computationally expensive due to:
  • Collision checking
  • Feasibility checking
  • Efficiency checking
• Offline learning (shift the expensive computation offline)
Background
• A sequential decision-making problem can be formulated as a Markov Decision Process (MDP)
• M = <S, A, P, R, γ>
  • S: state space
  • A: action space
  • P: state transition model
  • R: reward function
  • γ: discount factor
State Space (M = <S, A, P, R, γ>)
• S: state space
• The system's state is constructed by concatenating the two agents' individual states
  • Observable state vector: position (x, y), velocity (x, y), radius
  • Unobservable state vector: goal position (x, y), preferred speed, heading angle
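A minimal sketch of that concatenation, assuming flat NumPy vectors; the field ordering below is an illustrative assumption, not the paper's exact parameterization:

import numpy as np

def joint_state(own_full_state, other_observable_state):
    """Concatenate the ego agent's full state with the other agent's
    observable state to form the joint state fed to the value network.

    own_full_state: [px, py, vx, vy, radius, goal_x, goal_y, v_pref, heading]
    other_observable_state: [px, py, vx, vy, radius]
    (Field order is an assumption for illustration.)
    """
    return np.concatenate([own_full_state, other_observable_state])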
Action Space (M = <S, A, P, R, γ>)
• A: action space
• Set of permissible velocity vectors, a(s) = v
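The slide treats actions as velocity vectors directly; one common implementation choice (an assumption here, not something stated on the slide) is to discretize them into a finite set of speed and heading-offset pairs:

import numpy as np

def build_action_set(v_pref, num_speeds=5, num_headings=7, max_turn=np.pi / 6):
    """Hypothetical discretization of the permissible velocities a(s) = v.
    Speeds span (0, v_pref]; headings are offsets limited to +/- max_turn.
    The specific counts and limits are illustrative assumptions."""
    speeds = np.linspace(v_pref / num_speeds, v_pref, num_speeds)
    headings = np.linspace(-max_turn, max_turn, num_headings)
    actions = [(s * np.cos(h), s * np.sin(h)) for s in speeds for h in headings]
    actions.append((0.0, 0.0))  # allow stopping in place
    return np.array(actions)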
State Transition Model (M = <S, A, P, R, γ>)
• P: state transition model
• A probabilistic state transition model
• Determined by the agents' kinematics
• Unknown to us
Reward Function (M = <S, A, P, R, γ>)
• R: reward function
• Rewards the agent for reaching its goal
• Penalizes the agent for getting too close to, or colliding with, the other agent
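A sketch of a reward with that shape; the numeric constants and thresholds below are illustrative assumptions, not values quoted on the slide:

def reward(dist_to_goal, dist_to_other, goal_tol=0.1, too_close=0.2):
    """Reward reaching the goal, penalize collisions, and apply a smaller,
    distance-scaled penalty for passing uncomfortably close to the other agent.
    All constants here are assumptions for illustration."""
    if dist_to_other < 0.0:          # bodies overlap: collision
        return -0.25
    if dist_to_other < too_close:    # too close for comfort
        return -0.1 + 0.5 * dist_to_other
    if dist_to_goal < goal_tol:      # reached the goal position
        return 1.0
    return 0.0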
Discount Factor (M = <S, A, P, R, γ>)
• γ: discount factor
Value Function
• V(s): the value of a state, i.e. the expected cumulative discounted reward from s
• The value depends on γ
  • γ close to 1: we care about our long-term reward
  • γ close to 0: we care only about our immediate reward
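In standard notation (a textbook definition consistent with the MDP above, not copied from the slide):

V^{\pi}(s) = \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t}\, R\big(s_t, \pi(s_t)\big) \;\middle|\; s_0 = s \right]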
Optimal Policy
• π*(s): the policy that picks, at each state, the action leading to the best trajectory
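A standard way to express the greedy optimal policy with respect to the optimal value function (again textbook notation, not slide content):

\pi^{*}(s) = \operatorname*{argmax}_{a \in A} \; \mathbb{E}\big[ R(s, a) + \gamma\, V^{*}(s') \big]

where s' is the state reached after executing action a from s.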
Value Function and Optimal Policy (figure from David Silver's slides)
Value Function and Optimal Policy
• Every state s has a value V(s)
• Store it in a lookup table
  • In a grid world: 16 values
  • In motion planning: infinitely many values (continuous state space)
• Solution: approximate the value function with a neural network
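A minimal sketch of such a value-network approximator in PyTorch; the input dimension and hidden sizes are illustrative assumptions, not the paper's architecture:

import torch.nn as nn

class ValueNetwork(nn.Module):
    """Maps a joint-state vector to a scalar value estimate V(s).
    state_dim=14 matches the 9 + 5 joint state sketched earlier (an assumption)."""
    def __init__(self, state_dim=14, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, joint_state):
        return self.net(joint_state)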
Value Function and Optimal Policy (figure from David Silver's slides)
Value Function and Optimal Policy
Collision Avoidance with Deep Reinforcement Learning
1. Train the value network using ORCA
2. Train again with deep reinforcement learning
Collision Avoidance with Deep Reinforcement Learning
1. Train the value network using ORCA
• Why pre-train?
  • Initializing the neural network well is crucial to convergence
  • We want the network to output something reasonable from the start
• Generate 500 trajectories as a training set
• Each trajectory contains 40 state-value pairs (20,000 pairs in total)
• Back-propagate to minimize the loss function (a regression error between the network's prediction and the ORCA-derived value); see the sketch below
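A minimal sketch of this supervised pre-training step, assuming the ORCA trajectories have already been converted into (state, value) tensors; the optimizer, learning rate, and batch size are illustrative assumptions:

import torch

def pretrain(value_net, states, values, epochs=50, lr=1e-3, batch_size=256):
    """Supervised regression of the value network onto ORCA-generated
    state-value pairs (about 20,000 of them in the setup described above)."""
    opt = torch.optim.Adam(value_net.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()  # squared-error regression loss
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(states, values),
        batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for s, v in loader:
            opt.zero_grad()
            loss = loss_fn(value_net(s).squeeze(-1), v)
            loss.backward()   # back-propagate the regression error
            opt.step()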
Collision Avoidance with Deep Reinforcement Learning
2. Train again with deep reinforcement learning
• Simulate episodes, collect experience, and refine the value network by backpropagation [animated walkthrough on the original slides; a sketch follows below]
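A minimal sketch of this second training phase under common deep-RL assumptions (epsilon-greedy exploration plus an experience buffer); env, its methods, and all hyperparameters here are hypothetical stand-ins for the simulator, not details taken from the slides:

import random
import torch

def rl_finetune(value_net, env, episodes=1000, gamma=0.97, epsilon=0.1, lr=1e-4):
    """Refine the pre-trained value network from simulated experience.
    Assumes env returns states as torch tensors and exposes the helper
    methods used below (all hypothetical)."""
    opt = torch.optim.Adam(value_net.parameters(), lr=lr)
    buffer = []  # (state, target value) pairs collected from past episodes
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy: occasionally explore, otherwise pick the action whose
            # immediate reward plus discounted next-state value looks best
            if random.random() < epsilon:
                action = env.sample_action()
            else:
                action = max(env.actions(), key=lambda a:
                             env.reward(state, a)
                             + gamma * value_net(env.predict_next(state, a)).item())
            next_state, reward, done = env.step(action)
            buffer.append((state, reward + gamma * value_net(next_state).item()))
            state = next_state
        # update the value network on a random minibatch of stored targets
        batch = random.sample(buffer, min(len(buffer), 256))
        s = torch.stack([b[0] for b in batch])
        y = torch.tensor([b[1] for b in batch])
        opt.zero_grad()
        torch.nn.functional.mse_loss(value_net(s).squeeze(-1), y).backward()
        opt.step()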
Results
Q&A
Quiz
• Values are updated after each episode (T/F)
• The value function needs to be trained with ORCA (T/F)
• The ORCA path does not need to be optimal (T/F)