  1. Autonomous Agents: Assault game - A3C agent
     Kosmas Pinitas (2016030010), Technical University of Crete, February 23, 2020

  2. Outline
     Background
     ◮ Environment
     ◮ MDPs
     ◮ Q-Learning
     ◮ Policy Gradients
     A3C
     ◮ Definition
     ◮ Advantages
     Model
     ◮ Architecture
     ◮ Results
     References

  3. Background: Environment
     States: 4 grayscale images (84 x 84)
     Actions: 7 supported actions (6 permitted actions)
     ◮ do nothing, shoot, move left, move right, shoot left, shoot right
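     A minimal sketch of this kind of preprocessing (grayscale, resize to 84 x 84, stack of the last 4 frames) is shown below. It assumes OpenCV and NumPy, and all names are illustrative rather than the project's actual code.

```python
# Sketch of frame preprocessing: grayscale, resize to 84x84, stack of 4 frames.
# Assumes OpenCV (cv2) and NumPy; names are illustrative, not the project's code.
from collections import deque

import cv2
import numpy as np


def preprocess(rgb_frame: np.ndarray) -> np.ndarray:
    """Convert an RGB Atari frame to a normalized 84x84 grayscale image."""
    gray = cv2.cvtColor(rgb_frame, cv2.COLOR_RGB2GRAY)
    resized = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0


class FrameStack:
    """Keeps the 4 most recent preprocessed frames as one (4, 84, 84) state."""

    def __init__(self, size: int = 4):
        self.frames = deque(maxlen=size)

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        f = preprocess(first_frame)
        self.frames.extend([f] * self.frames.maxlen)
        return np.stack(list(self.frames))

    def step(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(preprocess(frame))
        return np.stack(list(self.frames))
```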

  4. Background: MDPs
     A Markov Decision Process (MDP) is a tuple (S, A, P_a, R_a) where:
     ◮ S is a finite set of states,
     ◮ A is a finite set of actions,
     ◮ P_a(s, s') is the probability that action a in state s at time t leads to state s' at time t + 1,
     ◮ R_a(s, s') is the immediate reward (or expected immediate reward) received after transitioning from state s to state s' due to action a.
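     As a purely hypothetical toy example (not taken from the presentation), the components P_a and R_a of a small MDP can be written out as explicit tables:

```python
# Toy 2-state, 2-action MDP written as explicit tables (hypothetical example).
# P[s][a] is a list of (next_state, probability); R[s][a][s'] is the reward.
P = {
    0: {0: [(0, 0.9), (1, 0.1)], 1: [(1, 1.0)]},
    1: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]},
}
R = {
    0: {0: {0: 0.0, 1: 1.0}, 1: {1: 2.0}},
    1: {0: {0: 0.0, 1: 1.0}, 1: {1: 0.0}},
}


def expected_reward(s: int, a: int) -> float:
    """Expected immediate reward of taking action a in state s."""
    return sum(p * R[s][a][s2] for s2, p in P[s][a])


print(expected_reward(0, 0))  # 0.9 * 0.0 + 0.1 * 1.0 = 0.1
```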

  5. Background: Q-Learning
     The goal of Q-learning is to learn a policy that tells the agent which action to take in each state. It does not require a model of the environment, and it can handle problems with stochastic transitions and rewards.
     Q_new(s_t, a_t) = Q(s_t, a_t) + α · ( r_t + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) )
     ◮ r_t is the reward received when moving from state s_t to state s_{t+1},
     ◮ α is the learning rate (step size) and determines to what extent newly acquired information overrides old information,
     ◮ γ is the discount factor and determines the importance of future rewards.
     For problems with high dimensionality, a neural network is used as a Q-function approximator in order to reduce the complexity (Deep Q-Learning).
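     A minimal tabular version of this update rule might look as follows; the table sizes and hyperparameters are illustrative assumptions, not the project's settings.

```python
# Minimal tabular Q-learning update (illustrative sketch, not the project's code).
import numpy as np

n_states, n_actions = 16, 6
alpha, gamma = 0.1, 0.99          # learning rate, discount factor
Q = np.zeros((n_states, n_actions))


def q_update(s, a, r, s_next):
    """One Q-learning step: move Q(s, a) toward the TD target r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```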

  6. Background: Q-Learning (Cont.)

  7. Background: Policy Gradients
     Direct approximation of the policy function π(s).
     J(π) = E_{ρ^{s_0}}[ V(s_0) ]   (objective function)
     ∇_θ J(π) = E_{s∼ρ^π, a∼π(s)}[ A(s, a) · ∇_θ log π(a | s) ]   (gradient)
     ◮ ∇_θ log π(a | s) gives a direction in which the log-probability of taking action a in state s increases.
     ◮ A(s, a) is a scalar value that tells us the advantage of taking this action.
     ◮ Combining the two terms, the likelihood of actions that are better than average is increased, and the likelihood of actions worse than average is decreased.
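     In code, this gradient is usually implemented as a surrogate loss whose gradient matches the expression above. The sketch below assumes a PyTorch policy network that outputs action logits; all names (policy_net, states, actions, advantages) are illustrative.

```python
# Policy-gradient loss for a batch of (state, action, advantage) samples.
# Hedged sketch assuming a PyTorch policy network that outputs action logits.
import torch
from torch.distributions import Categorical


def policy_gradient_loss(policy_net, states, actions, advantages):
    logits = policy_net(states)                    # (batch, n_actions)
    dist = Categorical(logits=logits)
    log_probs = dist.log_prob(actions)             # log pi(a | s)
    # Minimizing this loss ascends E[A(s, a) * grad_theta log pi(a | s)].
    return -(log_probs * advantages.detach()).mean()
```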

  8. Background: Policy Gradients (Cont.)

  9. A3C: Definition
     Asynchronous
     ◮ Multiple agents run in parallel; each has its own network parameters and its own copy of the environment.
     ◮ These agents learn only from their respective environments.
     ◮ As each agent gains more knowledge, it contributes to the total knowledge of the global network.
     Advantage
     ◮ A(s, a) = Q(s, a) − V(s) = r + γ V(s') − V(s)
     ◮ Expresses how good it is to take an action a in a state s compared to the average.
     Actor-Critic
     ◮ Combines the best parts of policy-gradient and value-iteration methods.
     ◮ Predicts both the value function V(s) and the optimal policy function π(s).
     ◮ The agent uses the value function (critic) to update the policy function (actor), which is a stochastic policy.
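     A hedged sketch of the one-step advantage and the combined A3C loss (policy, value, and entropy terms) is given below, assuming a PyTorch actor-critic model that returns (logits, value); the coefficients are illustrative assumptions, not the presentation's settings.

```python
# One-step advantage and combined A3C loss (policy + value + entropy terms).
# Assumes a PyTorch actor-critic model returning (logits, value); illustrative only.
import torch
from torch.distributions import Categorical


def a3c_loss(model, s, a, r, s_next, done, gamma=0.99, value_coef=0.5, entropy_coef=0.01):
    logits, value = model(s)                       # pi(.|s) logits and V(s)
    with torch.no_grad():
        _, value_next = model(s_next)              # V(s') for the bootstrap target
        target = r + gamma * value_next.squeeze(-1) * (1.0 - done)
    advantage = target - value.squeeze(-1)         # A(s, a) = r + gamma*V(s') - V(s)

    dist = Categorical(logits=logits)
    policy_loss = -(dist.log_prob(a) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()           # critic regression toward the target
    entropy = dist.entropy().mean()                # encourages exploration
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```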

  10. A3C: Actor-Critic Network
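      The network from the figure is not reproduced here; the sketch below shows a typical actor-critic architecture for stacked 84 x 84 inputs (a shared convolutional trunk with separate policy and value heads), which is an assumption rather than the exact model on the slide.

```python
# Typical actor-critic network for (4, 84, 84) inputs: shared conv trunk,
# one head for the policy logits and one for the state value.
# Assumed architecture, not necessarily the one shown on the slide.
import torch
import torch.nn as nn


class ActorCritic(nn.Module):
    def __init__(self, n_actions: int = 6):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
        )
        self.policy_head = nn.Linear(256, n_actions)   # actor: action logits
        self.value_head = nn.Linear(256, 1)            # critic: V(s)

    def forward(self, x: torch.Tensor):
        h = self.trunk(x)
        return self.policy_head(h), self.value_head(h)
```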

  11. A3C: Advantages
      ◮ Faster and more robust than the standard reinforcement learning algorithms.
      ◮ Performs better than other reinforcement learning techniques because of the diversification of knowledge.
      ◮ Can be used on discrete as well as continuous action spaces.

  12. Model: Architecture

  13. Model: Results

  14. Model: Results (Cont.)

  15. Model: Results (Cont.)

  16. References
      ◮ Environment: https://gym.openai.com/envs/Assault-ram-v0/
      ◮ MDP: https://en.wikipedia.org/wiki/Markov_decision_process
      ◮ Q-Learning: https://en.wikipedia.org/wiki/Q-learning
      ◮ Policy Gradients: https://jaromiru.com/2017/02/16/lets-make-an-a3c-theory/
      ◮ A3C:
        https://jaromiru.com/2017/02/16/lets-make-an-a3c-theory/
        https://www.geeksforgeeks.org/asynchronous-advantage-actor-critic-a3c-algorithm/
