
LunarLander-v2 using Deep Reinforcement Learning



  1. LunarLander-v2 using Deep Reinforcement Learning. A project developed for the Autonomous Agents course (PLH513). Portokalakis Petros, February 2020.

  2. Simple Game
     ● 8-dimensional state space
     ● 4 actions per state
     ● +100 points for landing
     ● -100 points when crashed
     ● Infinite fuel, but -0.3 points per frame when firing the main engine
     ● +10 for each leg ground contact (to encourage smooth landing)

  3. Deep Reinforcement Learning
     Objective: approximate the optimal Q-function (which satisfies the Bellman equation).
     Neural network:
     ● 8-node input layer (the dimensionality of the state space)
     ● 150-node fully connected 1st hidden layer
     ● 128-node fully connected 2nd hidden layer
     ● 4-node output layer (Q-values for the actions)
     The 4-layer approach works well with a variety of hidden-layer node counts. 5 layers proved insufficient to even train the agent.
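The 8-150-128-4 layout above can be sketched in plain Python to make the shapes concrete. This is only an illustration: the slides do not name the activation function, the weight initialisation, or the library used, so the ReLU on the hidden layers, the uniform initialisation, and the helper names `init_network`, `forward` and `count_parameters` are all assumptions. A real agent would normally use a deep learning library instead.

```python
import random

# Layer widths from the slide: input, 1st hidden, 2nd hidden, output.
LAYER_SIZES = [8, 150, 128, 4]

def init_network(sizes, seed=0):
    """Build a list of (weights, biases) pairs, one per fully connected layer.

    Small uniform random weights are an assumption; the slides do not say
    how the network was initialised.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, state):
    """Map an 8-dimensional state to 4 Q-values, one per action."""
    x = list(state)
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            # Assumed activation: ReLU on hidden layers, linear output.
            x = [max(0.0, v) for v in x]
    return x

def count_parameters(layers):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

network = init_network(LAYER_SIZES)
q_values = forward(network, [0.0] * 8)
```

Here `q_values` holds one estimate per action, and the greedy policy picks the index of the largest entry.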

  4. Deep Reinforcement Learning: Advancing performance
     Experience replay:
     ● Every tuple (s, a, r, s’, done) is stored in a replay buffer (maxlength = 1M)
     ● Randomly sample a batch of previous experiences (64) to break the correlation between consecutive samples
     ● Predict the best action for all items in the batch via the NN
     ● Update the neural network weights
     ● Generate episodes via exploration or exploitation
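The storage and sampling steps above can be sketched with the standard library alone. The class name `ReplayBuffer` and its method names are hypothetical, not taken from the project. Only the capacity (1M) and the batch size (64) come from the slides.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s', done) transitions."""

    def __init__(self, max_length=1_000_000, seed=None):
        # A deque with maxlen silently drops the oldest item when full.
        self.memory = deque(maxlen=max_length)
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive transitions of the same episode.
        return self.rng.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

# Small demonstration with a reduced capacity so eviction is visible.
buffer = ReplayBuffer(max_length=100, seed=0)
for t in range(150):
    buffer.store(state=t, action=t % 4, reward=0.0, next_state=t + 1, done=False)
batch = buffer.sample(64)
```

In a training loop, sampling would typically start only once the buffer holds at least one full batch.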

  5. Deep Reinforcement Learning: Advancing performance
     ● Calculating the loss between the output Q-value and the target Q-value requires a second pass through the network, for the next state s’
     ● s and s’ share the same network and are only one step apart
     ● Optimization becomes unstable, because every weight update also shifts the target
     Target network: use a network identical to the policy network, but update the target network's weights only every C iterations (C is a hyperparameter).
     The first pass occurs with the policy network; the second pass occurs with the target network.
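As a rough sketch of the second pass, the target for each sampled transition is y = r when the episode ended, and y = r + gamma * max over a’ of Q_target(s’, a’) otherwise. The function names `td_targets` and `sync_target` are hypothetical, `frozen_target_q` stands in for a real target network, and the value of C is not given in the slides, so it is left as a parameter.

```python
import copy

GAMMA = 0.99  # discount factor from the hyperparameter slide

def td_targets(batch, target_q, gamma=GAMMA):
    """Compute one-step targets using the target network only.

    `target_q` is any callable mapping a state to a list of 4 Q-values.
    """
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            # Terminal transition: no future reward to bootstrap from.
            targets.append(reward)
        else:
            targets.append(reward + gamma * max(target_q(next_state)))
    return targets

def sync_target(policy_weights, target_weights, step, sync_every):
    """Return a copy of the policy weights every `sync_every` steps (C),
    otherwise keep the current target weights unchanged."""
    if step % sync_every == 0:
        return copy.deepcopy(policy_weights)
    return target_weights

def frozen_target_q(_state):
    # Stand-in for the target network: fixed Q-values for illustration.
    return [0.0, 1.0, 2.0, 3.0]

batch = [
    ((0.0,) * 8, 2, 1.0, (0.1,) * 8, False),    # ordinary step
    ((0.1,) * 8, 0, -100.0, (0.2,) * 8, True),  # crash ends the episode
]
targets = td_targets(batch, frozen_target_q)
```

Because the target weights stay fixed between syncs, the targets do not move with every gradient step, which is the stabilising effect the slide describes.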

  6. Deep Reinforcement Learning: Advancing performance Abstract version of the agent algorithm implemented

  7. Deep Reinforcement Learning: Performance of Lunar Lander

  8. Deep Reinforcement Learning: Performance of Lunar Lander Adding a third hidden layer

  9. Deep Reinforcement Learning: Hyperparameter Tuning
     Hyperparameter            Value
     Starting epsilon          1
     Minimum epsilon           0.01
     Decay factor of epsilon   0.99
     Discount factor gamma     0.99
     Learning rate             0.001
     Batch size                64
     Replay buffer             1000000
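The three epsilon values suggest a multiplicative decay schedule combined with epsilon-greedy action selection. The sketch below assumes the decay is applied once per episode, which the slides do not state, and the names `decay_epsilon` and `select_action` are hypothetical.

```python
import random

EPSILON_START = 1.0   # starting epsilon
EPSILON_MIN = 0.01    # minimum epsilon
EPSILON_DECAY = 0.99  # decay factor of epsilon

def decay_epsilon(epsilon):
    """Shrink epsilon by the decay factor, never going below the minimum."""
    return max(EPSILON_MIN, epsilon * EPSILON_DECAY)

def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # random action (exploration)
    # Index of the largest Q-value (exploitation).
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Count how many decay steps it takes to reach the minimum
# (assumed here to be one decay per episode).
epsilon = EPSILON_START
decays_to_minimum = 0
while epsilon > EPSILON_MIN:
    epsilon = decay_epsilon(epsilon)
    decays_to_minimum += 1
```

Under that per-episode assumption, epsilon reaches its floor of 0.01 after 459 decays, so the agent would act almost entirely greedily from that point on.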

  10. Thank you. Questions? Contact: pportokalakis@gmail.com
