
Agent for Ms. Pac-Man vs. Ghost Team Competition (PowerPoint presentation)

  1. Agent for Ms. Pac-Man vs. Ghost Team Competition
     Stergios Plataniotis (Πλατανιώτης Στέργιος), 2012030142
     ΠΛΗ513 - Autonomous Agents

  2. Target
     • Try to maximize your score by eating as many pills, power pills, and ghosts as you can
     • Available moves are UP, DOWN, LEFT, RIGHT, and NEUTRAL
     • Partial observability (PO): Ms. Pac-Man can only see along the vertical and horizontal lines from her current position
     • This causes problems, because she is more likely to get stuck in local-maximum states when no food or ghosts are visible around her
     • It is also harder because the ghosts communicate with each other and can easily trap her before she even realizes it
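To make the partial-observability rule concrete, here is a minimal sketch of line-of-sight visibility. It assumes a simplified maze stored as a list of strings where `#` is a wall and every other character is walkable; the names `visible_cells` and `DIRECTIONS` are hypothetical, and the real competition framework (with tunnels and its own maze graph) is not modelled.

```python
# Hypothetical grid model: '#' is a wall, any other character is walkable.
DIRECTIONS = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}


def visible_cells(grid, pos):
    """Return the cells visible from pos, scanning each direction until a wall or the edge."""
    row, col = pos
    seen = set()
    for dr, dc in DIRECTIONS.values():
        r, c = row + dr, col + dc
        # Keep walking in a straight line while the cell is inside the maze and not a wall.
        while 0 <= r < len(grid) and 0 <= c < len(grid[r]) and grid[r][c] != "#":
            seen.add((r, c))
            r += dr
            c += dc
    return seen
```

For example, in a maze with a block in the middle, a pill at the far corner is invisible even though it is only a few steps away, which is exactly how the agent can end up in a local maximum with no food in sight.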

  3. Q-learning
     • An implementation of the Q-learning reinforcement learning algorithm
     • A table stores a value for every (state, move) pair
     • After every round, we update the entry for the previous (state, move) pair
     • Parameters: the learning rate α and the discount factor γ
     • Tabular Q-learning is proven to converge to the optimal Q-values, and hence an optimal policy, when 0 ≤ γ < 1, every (state, move) pair keeps being visited, and α decays appropriately over time (Σα = ∞, Σα² < ∞)
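The table update described above is the standard Q-learning rule Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. A minimal sketch of such a table follows; the class name `QTable`, the zero initialisation, and the `terminal` flag (no bootstrapping after Ms. Pac-Man is eaten or the game ends) are assumptions for illustration, not details from the slides.

```python
from collections import defaultdict

MOVES = ("UP", "DOWN", "LEFT", "RIGHT", "NEUTRAL")


class QTable:
    """Tabular Q-learning: one value per (state, move) pair, initialised to 0."""

    def __init__(self, alpha, gamma):
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.q = defaultdict(float)

    def best_value(self, state):
        """max over moves of Q(state, move)."""
        return max(self.q[(state, m)] for m in MOVES)

    def update(self, state, move, reward, next_state, terminal=False):
        """Apply Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a)) and return the new value."""
        if terminal:
            target = reward  # nothing to bootstrap from after a terminal step
        else:
            target = reward + self.gamma * self.best_value(next_state)
        self.q[(state, move)] += self.alpha * (target - self.q[(state, move)])
        return self.q[(state, move)]
```

With α = 0.5 and γ = 0.9, a step penalty of -2.5 into a state whose best move is worth 10 moves Q(s, a) from 0 to 0.5 × (-2.5 + 0.9 × 10) = 3.25.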

  4. Move selection
     • ε-greedy (an ε-soft policy): in each round we choose a random move with a small probability ε; this is done only during learning, to encourage exploration
     • Otherwise, we choose the move greedily, i.e. the move with the highest Q-value
     • If several moves share the best value, we can either keep our previous move, if it is still optimal, or choose one of the optimal moves at random
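A minimal sketch of this selection rule, assuming the Q-values live in a plain dict keyed by `(state, move)` with missing entries treated as 0. It combines the two tie-breaking options from the slide (keep the previous move if it is still optimal, otherwise pick an optimal move at random); the function name `select_move` and its `rng` parameter are hypothetical.

```python
import random

MOVES = ("UP", "DOWN", "LEFT", "RIGHT", "NEUTRAL")


def select_move(q, state, epsilon, previous_move=None, rng=random):
    """epsilon-greedy move choice over a dict q mapping (state, move) -> value."""
    if rng.random() < epsilon:
        return rng.choice(MOVES)  # explore: uniformly random move
    values = {m: q.get((state, m), 0.0) for m in MOVES}
    best = max(values.values())
    optimal = [m for m in MOVES if values[m] == best]
    if previous_move in optimal:
        return previous_move  # keep the old move while it is still optimal
    return rng.choice(optimal)  # otherwise break the tie at random
```

Setting `epsilon=0.0` gives the purely greedy policy used after training; keeping the previous move on ties avoids needless direction changes, which the reward function also penalizes.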

  5. State generalization
     • The Ms. Pac-Man game has a huge number of different states
     • We generalize the states using a few specific features. We check:
        • whether there is a wall up, down, left, and right of Ms. Pac-Man
        • whether a threatening ghost is approaching her
        • the direction of the nearest food, or of the nearest exit if she is being chased
     • This dramatically reduces the number of possible states and makes learning faster
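One way to encode these features is as a small hashable tuple that can index the Q-table directly. The sketch below assumes four boolean wall flags, a boolean "ghost approaching" flag, and a direction feature with four possible values; the slides do not specify the exact encoding, so `State` and `all_states` are hypothetical.

```python
from itertools import product
from typing import NamedTuple

DIRECTIONS = ("UP", "DOWN", "LEFT", "RIGHT")


class State(NamedTuple):
    """Hypothetical compact state built from the features listed on the slide."""
    wall_up: bool
    wall_down: bool
    wall_left: bool
    wall_right: bool
    ghost_approaching: bool
    target_direction: str  # nearest food, or the escape exit when being chased


def all_states():
    """Enumerate every combination of feature values."""
    walls = product((False, True), repeat=4)
    return [State(*w, g, d) for w, g, d in product(walls, (False, True), DIRECTIONS)]
```

Under these assumptions there are only 2⁴ × 2 × 4 = 128 generalized states, so every entry of the Q-table can be visited many times during training.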

  6. Reward Function
     • The reward function gives a positive value when Ms. Pac-Man does something good and a negative value when she does something bad
     • We encourage her to eat pills, power pills, and ghosts (+20)
     • We penalize being eaten by a ghost (-350), hitting a wall (-100), reversing direction (-6), and every step she takes (-2.5), so that she learns to find the shortest path
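The reward values above can be combined into a single per-step function. This is a sketch that assumes the terms simply add up, that +20 is given per item eaten, and that the step penalty applies on every step; the function signature is hypothetical.

```python
# Reward values taken from the slide.
EAT_REWARD = 20.0  # per pill, power pill, or ghost eaten
EATEN_PENALTY = -350.0
WALL_PENALTY = -100.0
REVERSE_PENALTY = -6.0
STEP_PENALTY = -2.5

OPPOSITE = {"UP": "DOWN", "DOWN": "UP", "LEFT": "RIGHT", "RIGHT": "LEFT"}


def reward(move, previous_move, items_eaten, was_eaten, hit_wall):
    """Sum the reward terms for one step (assumed additive)."""
    r = STEP_PENALTY
    r += EAT_REWARD * items_eaten
    if was_eaten:
        r += EATEN_PENALTY
    if hit_wall:
        r += WALL_PENALTY
    # NEUTRAL has no opposite, so it is never counted as a reversal.
    if move in OPPOSITE and OPPOSITE[move] == previous_move:
        r += REVERSE_PENALTY
    return r
```

For instance, eating one pill while turning from UP to LEFT yields -2.5 + 20 = 17.5, while reversing from RIGHT to LEFT without eating yields -2.5 - 6 = -8.5.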

  7. Results
     • Trained for thousands of games, with an exploration probability ε that starts at 0.1 and decays over time
     • Scores against the two ghost teams:

         Ghost team          Average score   Max score   Min score
         StarterGhostsComm   3671            13200       810
         StarterGhosts       3650            14860       670
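The slides state only that ε starts at 0.1 and decays; the schedule itself is not given. A minimal sketch, assuming a geometric per-game decay with a hypothetical rate of 0.999 and an optional lower bound:

```python
def epsilon_schedule(game, start=0.1, decay=0.999, floor=0.0):
    """Exploration probability for training game number `game` (0-based).

    Assumption: geometric decay, epsilon = start * decay**game, never below floor.
    """
    return max(floor, start * decay ** game)
```

With these values ε drops to roughly 0.037 after 1000 games; at evaluation time ε is set to 0, matching the rule that random moves are used only during learning.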

  8. Future Work
     • In the future, a genetic algorithm could be used to search for an optimal pair of parameters α and γ
     • A neural network could also be used to train the agent better; with deep networks this is known as deep learning, an approach popular for its strong results
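The genetic-algorithm idea could look roughly like the sketch below: a population of (α, γ) pairs in [0, 1], truncation selection, uniform crossover, and Gaussian mutation. Everything here is a hypothetical illustration of the proposed future work; in practice `fitness` would train an agent with the given parameters and return its average score, which is expensive, so results are worth caching.

```python
import random


def tune_alpha_gamma(fitness, generations=60, pop_size=20, mutation=0.05, seed=0):
    """Minimal genetic algorithm maximising fitness(alpha, gamma) over [0, 1] x [0, 1]."""
    rng = random.Random(seed)

    def clip(x):
        return min(1.0, max(0.0, x))

    population = [(rng.random(), rng.random()) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: fitness(*p), reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection; the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            (a1, g1), (a2, g2) = rng.sample(parents, 2)
            # Uniform crossover per parameter, then Gaussian mutation clipped to [0, 1].
            alpha = clip(rng.choice((a1, a2)) + rng.gauss(0.0, mutation))
            gamma = clip(rng.choice((g1, g2)) + rng.gauss(0.0, mutation))
            children.append((alpha, gamma))
        population = parents + children
    return max(population, key=lambda p: fitness(*p))
```

Because the best half of each generation is kept, the best pair found never gets worse from one generation to the next.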

  9. Thank you for your time!
