LunarLander-v2 using Deep Reinforcement Learning
A project developed for the Autonomous Agents course PLH513
Portokalakis Petros, February 2020
Simple Game
● 8-dimensional state space
● 4 actions per state
● +100 points for landing
● -100 points when crashed
● Infinite fuel, but -0.3 points per frame when firing the main engine
● +10 points for each leg ground contact (to encourage smooth landing)
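As a quick check of the numbers above, here is a minimal sketch (not part of the original project code) that inspects the LunarLander-v2 environment with OpenAI Gym, using the Gym API as it existed around 2020:

```python
import gym

env = gym.make("LunarLander-v2")
print(env.observation_space.shape)  # (8,): position, velocity, angle, angular velocity, leg contacts
print(env.action_space.n)           # 4: do nothing, fire left, fire main, fire right engine

state = env.reset()                                               # initial 8-dimensional state
next_state, reward, done, info = env.step(env.action_space.sample())  # take a random action
```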
Deep Reinforcement Learning
Objective: approximate the optimal Q-function (which satisfies the Bellman equation)
Neural network:
● 8-node input layer (dimensionality of the state space)
● 150-node fully connected 1st hidden layer
● 128-node fully connected 2nd hidden layer
● 4-node output layer (Q-values for the actions)
The 4-layer architecture works well across a range of hidden-layer sizes; a 5-layer network fails to even train the agent.
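A minimal sketch of the 8-150-128-4 network described above, assuming Keras/TensorFlow (the slides do not name the framework) and the learning rate from the hyperparameter slide:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

def build_q_network(state_dim=8, n_actions=4, lr=0.001):
    model = Sequential([
        Dense(150, activation="relu", input_shape=(state_dim,)),  # 1st hidden layer
        Dense(128, activation="relu"),                            # 2nd hidden layer
        Dense(n_actions, activation="linear"),                    # one Q-value per action
    ])
    model.compile(loss="mse", optimizer=Adam(learning_rate=lr))
    return model
```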
Deep Reinforcement Learning: Advancing performance
Experience replay:
● Every tuple (s, a, r, s', done) is stored in a replay buffer (max length = 1M)
● Randomly sample a batch of 64 previous experiences to break the correlation between consecutive samples
● Predict the best action for all items in the batch via the neural network
● Update the neural network weights
● Generate episodes via exploration or exploitation
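A minimal sketch of such a replay buffer; the class and method names are illustrative, not taken from the project code:

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, maxlen=1_000_000):
        self.buffer = deque(maxlen=maxlen)  # oldest experiences are dropped first

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform random sampling breaks the correlation between
        # consecutive transitions within an episode.
        return random.sample(self.buffer, batch_size)
```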
Deep Reinforcement Learning: Advancing performance
● Calculating the loss between the output Q-value and the target Q-value requires a second pass through the network for the next state
● s and s' share the same network and are only one step apart
● Optimization becomes unstable
Target network: use a network identical to the policy network, but update the target network's weights only every C iterations (C is a hyperparameter)
● The first pass occurs with the policy network
● The second pass occurs with the target network
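A minimal sketch of the two-pass update, assuming Keras/NumPy and reusing the build_q_network helper from the earlier sketch (C and all names here are illustrative):

```python
import numpy as np

policy_net = build_q_network()                   # updated every training step
target_net = build_q_network()                   # frozen copy, synced every C steps
target_net.set_weights(policy_net.get_weights())

def train_on_batch(batch, gamma=0.99):
    states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
    q_values = policy_net.predict(states)        # first pass: policy network on s
    q_next = target_net.predict(next_states)     # second pass: target network on s'
    targets = rewards + gamma * np.max(q_next, axis=1) * (1 - dones)
    q_values[np.arange(len(batch)), actions] = targets
    policy_net.fit(states, q_values, verbose=0)  # regress Q(s,a) toward the targets
```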
Deep Reinforcement Learning: Advancing performance
Abstract version of the implemented agent algorithm (sketched below)
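The original slide showed the algorithm as a figure; below is a hedged reconstruction of that abstract loop, reusing the env, ReplayBuffer, networks, and train_on_batch from the earlier sketches (all names and the episode count are illustrative, not from the project code):

```python
import random
import numpy as np

num_episodes = 500        # illustrative value, not taken from the slides
C = 1000                  # target-network sync period (hyperparameter)
buffer = ReplayBuffer(1_000_000)
epsilon, step = 1.0, 0

for episode in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection: explore with probability epsilon, else exploit.
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(policy_net.predict(state[np.newaxis])[0]))
        next_state, reward, done, _ = env.step(action)
        buffer.store(state, action, reward, next_state, done)

        if len(buffer.buffer) >= 64:
            train_on_batch(buffer.sample(64))    # learn from a random minibatch
        step += 1
        if step % C == 0:
            target_net.set_weights(policy_net.get_weights())
        state = next_state
    epsilon = max(0.01, epsilon * 0.99)          # decay exploration after each episode
```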
Deep Reinforcement Learning: Performance of Lunar Lander
Deep Reinforcement Learning: Performance of Lunar Lander
Adding a third hidden layer
Deep Reinforcement Learning: Hyperparameter Tuning

Hyperparameter             Value
Starting epsilon           1
Minimum epsilon            0.01
Decay factor of epsilon    0.99
Discount factor gamma      0.99
Learning rate              0.001
Batch size                 64
Replay buffer size         1000000
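The same values as a plain config dict, together with the epsilon decay rule they imply (a sketch; the variable names are not from the project code):

```python
config = {
    "epsilon_start": 1.0,
    "epsilon_min": 0.01,
    "epsilon_decay": 0.99,
    "gamma": 0.99,
    "learning_rate": 0.001,
    "batch_size": 64,
    "buffer_size": 1_000_000,
}

# Epsilon is multiplied by the decay factor after each episode,
# but never drops below the minimum.
epsilon = config["epsilon_start"]
for episode in range(3):
    epsilon = max(config["epsilon_min"], epsilon * config["epsilon_decay"])
    print(episode, round(epsilon, 4))   # 0.99, 0.9801, 0.9703
```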
Thank you. Questions?
Contact: pportokalakis@gmail.com