
osu!mania Reinforcement Learning Agent



  1. osu!mania Reinforcement Learning Agent Χρυσομάλλης Ιάσων ichrysomallis@isc.tuc.gr 2014030078

  2. Contents
     - Introduction
     - osu!mania game
     - Graphical User Interface customization
     - Agent’s environment
     - Approach and variable definition
     - Q-learning
     - Deep reinforcement learning
     - Future plans

  3. Introduction
     Topic: develop an agent that learns to play the video game osu!mania through reinforcement learning.
     Two agents:
     - Q-learning agent
     - Deep reinforcement learning agent

  4. osu!mania game
     Rhythm game in which notes fall down the screen. Numbered screen elements:
     1. Single-tap notes
     2. Hold notes
     3. Judgment bar
     4. Player keys
     5. Combo
     6. Hitburst
     7. Overall accuracy
     8. Score

  5. Graphical User Interface customization
     - The environment is fully customizable; every element can be changed.
     - Each element is painted in a solid color RGB = [X, 100, 100], where X is set according to the element’s identity (see the element numbers on slide 4).

  6. Agent’s environment
     - Screenshots are recorded and decoded using the known RGB values of the elements.
     - Only a small fraction of the screen carries relevant information, so only specific boxes are recorded.
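The decoding step can be sketched as a lookup on the red value of each recorded pixel. This is only an illustration: the slides do not give the actual X values, so the mapping `ELEMENTS` (X equal to the element number) and the helper names `element_id` and `classify_column` are assumptions.

```python
# Assumed mapping: the red value X equals the element number from the game slide.
# The slides do not list the real X values, so these entries are placeholders.
ELEMENTS = {
    1: "single-tap note",
    2: "hold note",
    3: "judgment bar",
}
BACKGROUND = 0


def element_id(pixel):
    """Map one (R, G, B) pixel to an element id, or BACKGROUND otherwise."""
    red, green, blue = pixel
    # Skin colors have the form [X, 100, 100]; anything else is ignored.
    if green == 100 and blue == 100 and red in ELEMENTS:
        return red
    return BACKGROUND


def classify_column(pixels):
    """Classify every pixel of a recorded column (a sequence of RGB tuples)."""
    return [element_id(p) for p in pixels]
```

Because every skin color shares the same green and blue values, a single comparison on the red channel is enough to tell the elements apart, which is presumably why the skin was recolored this way.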

  7. Approach and variable definition (1)
     - Every column behaves identically, so the problem can be reduced to learning a single column.
     - Agent’s actions:
       1. Instantaneous key tap
       2. Key press (no release)
       3. Key release
       4. Do nothing

  8. Approach and variable definition (2)
     - Rewards: (reward values shown on the slide are not preserved here)
     - Epsilon:
       o Initial value = 1
       o Decay value = 0.9977
       o Minimum value = 0.01
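The epsilon values above suggest a multiplicative decay with a floor. A minimal sketch follows, assuming the decay is applied once per learning step or episode (the slides do not say which); the function names are illustrative.

```python
import random

EPSILON_START = 1.0    # initial value (from the slides)
EPSILON_DECAY = 0.9977 # decay value (from the slides)
EPSILON_MIN = 0.01     # minimum value (from the slides)


def decay_epsilon(epsilon):
    """Apply one multiplicative decay step, never going below the minimum."""
    return max(EPSILON_MIN, epsilon * EPSILON_DECAY)


def choose_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a list of action values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random action
    # exploit: index of the largest action value
    return max(range(len(q_values)), key=q_values.__getitem__)


def steps_to_minimum():
    """Count decay steps until epsilon first reaches EPSILON_MIN."""
    epsilon, steps = EPSILON_START, 0
    while epsilon > EPSILON_MIN:
        epsilon = decay_epsilon(epsilon)
        steps += 1
    return steps
```

With these constants epsilon reaches its floor after roughly 2000 decay steps, since 0.9977^2000 is approximately 0.01.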

  9. Approach and variable definition (3)
     - State:
       o One column of 200 pixels
       o Only the red (R) channel
       o Three possible values per pixel (no note, single-tap note, hold note)
     - Deep reinforcement learning:
       o Raw input of the column
     - Q-learning:
       o Only 8 pixels, to limit state complexity: one pixel is taken every 15 pixels of the recorded column
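To illustrate the reduced Q-learning state, the sketch below samples 8 pixels spaced 15 apart and encodes them as a base-3 integer. The starting offset, the 0/1/2 encoding of the three pixel values, and the function names are assumptions; the slides only give the counts.

```python
COLUMN_HEIGHT = 200  # pixels in the recorded column (from the slides)
STRIDE = 15          # one pixel kept every 15 pixels (from the slides)
NUM_SAMPLES = 8      # pixels kept for the Q-learning state (from the slides)
NUM_VALUES = 3       # assumed encoding: 0 = no note, 1 = single-tap, 2 = hold

# Size of the resulting Q-table state space: 3 ** 8 = 6561.
NUM_STATES = NUM_VALUES ** NUM_SAMPLES


def sample_column(column, start=0):
    """Keep NUM_SAMPLES pixels spaced STRIDE apart; `start` is an assumed offset."""
    return [column[start + i * STRIDE] for i in range(NUM_SAMPLES)]


def state_index(samples):
    """Encode the sampled pixels as a base-3 integer usable as a Q-table row."""
    index = 0
    for value in samples:
        index = index * NUM_VALUES + value
    return index
```

Using the full 200-pixel column would give 3^200 possible states, which no table can hold; 8 samples keep the table at 6561 rows.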

  10. Q-learning
     - Algorithm: (shown on the slide; not preserved here)
     - Steps:
       1. Receive the current state
       2. Choose an action based on epsilon
       3. Execute the action
       4. Receive the new state
       5. Check whether the song is over
       6. Update the Q-table
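The final step, updating the Q-table, can be sketched with the standard tabular Q-learning rule. The learning rate `ALPHA` and discount factor `GAMMA` are not given in the slides, so the values here are placeholders, and the action names simply mirror the four actions listed earlier.

```python
from collections import defaultdict

ACTIONS = ["tap", "press", "release", "nothing"]
ALPHA = 0.1  # learning rate (assumed, not given in the slides)
GAMMA = 0.9  # discount factor (assumed, not given in the slides)


def make_q_table():
    """Q-table mapping a state to one value per action, zero-initialized."""
    return defaultdict(lambda: [0.0] * len(ACTIONS))


def q_update(q_table, state, action, reward, next_state, done):
    """One tabular Q-learning update; returns the new Q(state, action)."""
    # No future value is bootstrapped once the song is over.
    future = 0.0 if done else max(q_table[next_state])
    target = reward + GAMMA * future
    q_table[state][action] += ALPHA * (target - q_table[state][action])
    return q_table[state][action]
```

The `done` flag corresponds to the "check whether the song is over" step: on the last transition the target is just the reward.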

  11. Deep reinforcement learning
     - Neural network model (Keras): (shown on the slide; not preserved here)
     - Steps: identical to Q-learning except for the last one. Instead of updating a Q-table, transitions are saved in a temporary memory and the model is trained on a smaller, randomly selected sample (batch).
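The temporary memory and random batch selection can be sketched as a small replay buffer. This is not the author's implementation: the class name `ReplayMemory`, the capacity, and the batch size are hypothetical, and the Keras training call is left out because the model is not preserved in this transcript.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # The oldest transitions are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        """Return a random batch without replacement (smaller if memory is short)."""
        size = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), size)

    def __len__(self):
        return len(self.buffer)
```

Sampling at random rather than training on consecutive frames breaks the strong correlation between neighbouring screenshots, which is the usual motivation for this design in DQN-style agents.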

  12. Results: Q-learning agent and DQN agent (figures not preserved here)

  13. Future plans
     - Try different combinations of neural network model layers
     - Design the neural network model in TensorFlow
     - Run the agent on a GPU instead of the CPU
     - Use a high-end computer
