Combining Cooperative and Adversarial Coevolution in the Context of Pac-Man


  1. Combining Cooperative and Adversarial Coevolution in the Context of Pac-Man
by Alexander Dockhorn and Rudolf Kruse
Institute for Intelligent Cooperating Systems, Department for Computer Science, Otto von Guericke University Magdeburg
Universitaetsplatz 2, 39106 Magdeburg, Germany
Email: {alexander.dockhorn, rudolf.kruse}@ovgu.de

  2. Contents
I. Pac-Man and the Mrs. Pac-Man vs. Ghost Team Challenge
II. Previous Competition Submissions
III. Genetic Programming and Partial Observation
IV. Combined Coevolution Framework
V. Conclusion, Limitations and Future Work

  3. What is Pac-Man?
• Pac-Man is an arcade video game released by Namco in 1980.
• It yielded the second-highest gross revenue of all arcade games (approx. 7.27 billion dollars).
• Pac-Man is the best-known video game character among American consumers [source].
• The four ghosts: Blinky, Pinky, Inky, and Clyde/Sue.

  4. Pac-Man's Goals
• Pac-Man's task is to traverse a maze and eat all the pills.
• Four ghosts hunt him and try to stop him.
• Eating one of the four power pills allows Pac-Man to eat ghosts for a short duration (successive ghosts score 200, 400, 800, and 1600 points).
• Each of those actions scores Pac-Man points.
• After all pills have been eaten, the next level starts.
• The game ends when Pac-Man is eaten by a ghost and no continues remain.

  5. Mrs. Pac-Man vs. Ghost Team Competition
• The Mrs. Pac-Man vs. Ghost Team Competition has been held since 2007.
• This work is part of this year's competition, which features partial observation.
• The competition allows participants to program agents for Mrs. Pac-Man and the Ghost Team.
• In contrast to previous installments, agents only receive information about objects in their line of sight, plus general information about the map.

  6. Related Work
• Previous competition installments included agents based on:
  – State Machines [Gallagher and Ryan]
  – MCTS [Robles, Tong, Nguyen]
  – Neural Networks [Gallagher and Ledwich]
  – Ant Colony Algorithms
  – Genetic Programming [Alhejali, Brandstetter]
• It is not clear how well those solutions translate to the partial observation scenario!

  7. Genetic Programming
• The behavior of each individual is encoded as a tree.
• The tree combines simple control structures that take input from the game and map it to an appropriate output.
• Evolutionary algorithms are used to create a diverse set of trees while improving the fitness of the applied trees over time.
• Mutation and crossover operators modify parts of the trees, as sketched below.
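A minimal sketch of the tree representation and a subtree-crossover operator. The `Node` class and its methods are hypothetical names chosen for illustration; this shows the general technique, not the authors' implementation.

```python
import copy
import random

class Node:
    """A behavior-tree node: function nodes have children, terminals do not."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def nodes(self):
        """Yield (parent, child_index, node) for every node in the tree."""
        stack = [(None, 0, self)]
        while stack:
            parent, i, node = stack.pop()
            yield parent, i, node
            for j, child in enumerate(node.children):
                stack.append((node, j, child))

def crossover(a, b):
    """Return a copy of tree `a` with one random subtree replaced by a random subtree of `b`."""
    offspring = copy.deepcopy(a)
    targets = [(p, i) for p, i, _ in offspring.nodes() if p is not None]
    if not targets:
        return offspring  # `a` is a single terminal: nothing below the root to replace
    parent, i = random.choice(targets)
    _, _, donor = random.choice(list(b.nodes()))
    parent.children[i] = copy.deepcopy(donor)
    return offspring
```

Mutation works the same way, except the donor subtree is generated randomly instead of taken from a second parent.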

  8. Genetic Programming for Ghost Agents
• Implemented nodes should give access to all capabilities of the API while being as general as possible.
• We differentiate function nodes, data terminals, and action terminals (see the sketch below):
  – Function Nodes: basic control structures (e.g., If…Then…Else… nodes) and Boolean or numeric operators
  – Data Terminals: query the API and the internal memory
  – Action Terminals: perform a basic action provided by the API
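A sketch of how the three node categories could be interpreted at decision time, reusing the `Node` class from above. The `game` query and move methods are assumed names, not the actual competition API.

```python
def evaluate(node, game):
    """Recursively interpret a behavior tree; action terminals end the recursion."""
    if node.name == "IfThenElse":                 # function node: control structure
        condition, then_branch, else_branch = node.children
        chosen = then_branch if evaluate(condition, game) else else_branch
        return evaluate(chosen, game)
    if node.name == "And":                        # function node: Boolean operator
        return all(evaluate(child, game) for child in node.children)
    if node.name == "IsGhostClose":               # data terminal: queries the API
        return game.distance_to_closest_ghost() < game.close_threshold
    if node.name == "ToClosestPill":              # action terminal: basic API action
        return game.move_towards(game.closest_pill())
    raise ValueError(f"unhandled node type: {node.name}")
```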

  9. Mrs. Pac-Man Data and Action Terminals
Data Terminals:
• IsPowerPillStillAvailable
• AmICloseToPower
• AmIEmpowered
• IsGhostClose
• SeeingGhost
• DistanceToGhostNr<1,2,3,4>
• EmpoweredTime
Action Terminals:
• FromClosestGhost
• ToClosestEdibleGhost
• ToClosestPowerPill
• ToClosestPill
This approach was adapted from previous competition submissions!
Due to the partial observation restrictions, we extended most data terminals with a short-term memory (sketched below), which:
• remembers the last seen position of a ghost,
• simulates its behavior for a few ticks,
• and is cleared after a tick threshold is reached.
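A minimal sketch of such a short-term memory, under the stated assumptions of a fixed tick threshold and a caller-supplied one-step ghost simulator; all names are illustrative.

```python
class GhostMemory:
    """Remembers the last seen ghost positions and ages them out."""
    def __init__(self, tick_threshold=30):
        self.tick_threshold = tick_threshold
        self.entries = {}  # ghost id -> (last believed position, age in ticks)

    def observe(self, ghost_id, position):
        """A ghost came into line of sight: reset its memory entry."""
        self.entries[ghost_id] = (position, 0)

    def tick(self, simulate_one_step):
        """Advance each remembered ghost by one simulated step; clear stale entries."""
        for ghost_id, (position, age) in list(self.entries.items()):
            if age + 1 > self.tick_threshold:
                del self.entries[ghost_id]  # threshold reached: memory is cleared
            else:
                self.entries[ghost_id] = (simulate_one_step(ghost_id, position), age + 1)
```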

  10. Evaluation in a Partial Observation Scenario
• We first validated whether genetic programming works with partial observation.
• A ghost team of simple state-machine agents was used as the opponent for evolved Pac-Man agents.
• The average performance, as well as the performance of the best Pac-Man, improved only slightly over time.

  11. Ghost Team Data and Action Terminals
Data Terminals:
• SeeingPacMan
• IsPacManClose
• IsPacManCloseToPower
• IsEdible
• IsPowerPillStillAvailable
• DistanceToOtherGhosts
• EstimatedDistance (illustrated below)
Action Terminals:
• ToPacMan
• FromPacMan
• FromClosestPowerPill
• ToClosestPowerPill
• Split
• Group
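As an illustration, an `EstimatedDistance` terminal could fall back on a short-term memory (like the one sketched earlier, here tracking Mrs. Pac-Man instead of ghosts) when she is out of sight; every method name below is an assumption, not the competition API.

```python
def estimated_distance(game, memory, ghost_id):
    """Distance to Mrs. Pac-Man if visible, else to her last remembered position."""
    if game.can_see_pacman(ghost_id):
        return game.distance_to_pacman(ghost_id)
    remembered = memory.entries.get("pacman")
    if remembered is not None:
        position, _age = remembered
        return game.path_length(game.ghost_position(ghost_id), position)
    return float("inf")  # nothing known: treat Mrs. Pac-Man as unreachably far
```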

  12. Evaluating Genetic Programming for Ghost Teams
• Two Pac-Man agents were used as opponents for the evolved ghost teams:
  – SimpleAI = state-machine agent
  – MCTSAI = Monte Carlo Tree Search agent
• Two approaches were compared (see the sketch below):
  – uniform: ghost teams are made of four instances of the same individual; all individuals share the same population → single evolution
  – diverse: ghost teams are made of four instances of different individuals; each individual comes from one of four populations → cooperative coevolution
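A sketch of the difference between the two team setups; `population` and `populations` stand for hypothetical lists of evolved individuals.

```python
import random

def uniform_team(population):
    """Single evolution: a team is four instances of the same individual."""
    individual = random.choice(population)
    return [individual] * 4

def diverse_team(populations):
    """Cooperative coevolution: one individual from each of four populations."""
    assert len(populations) == 4
    return [random.choice(population) for population in populations]
```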

  13. Single Evolution vs. Cooperative Coevolution

  14. Genetic Programming Summary
• Agents for both parties can be learned using genetic programming.
• However, we need a suitable opponent to assist the generation of complex behavior.
• Opponents need to be hand-coded in the current framework:
  → time-consuming
  → can miss possible strategies
  → can be limited in play strength
• How can we combine both genetic programming procedures to get suitable Pac-Man agents AND Ghost Team agents?

  15. Combined Coevolution Framework
• Ghosts are split into 4 populations; Mrs. Pac-Man agents have one population.
• Each population exhibits its own strategy.
• The best individuals per population will survive (one generation of this loop is sketched below).
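A high-level sketch of one generation of such a combined framework, with five populations (four ghost populations plus one for Mrs. Pac-Man) evaluated against each other. `play_game`, `select`, and `vary` are hypothetical placeholders, not functions from the authors' code.

```python
import random

def one_generation(pacman_pop, ghost_pops, play_game, select, vary):
    """Evaluate, select, and vary all five populations once."""
    # Adversarial evaluation: each Pac-Man faces a mixed team, one ghost per population.
    for pacman in pacman_pop:
        team = [random.choice(population) for population in ghost_pops]
        pacman_score, ghost_score = play_game(pacman, team)
        pacman.fitness = pacman_score
        for ghost in team:  # cooperative part: teammates share the team's score
            ghost.fitness = ghost_score
    # The best individuals per population survive ...
    pacman_pop = select(pacman_pop)
    ghost_pops = [select(population) for population in ghost_pops]
    # ... and each population is refilled via mutation and crossover.
    return vary(pacman_pop), [vary(population) for population in ghost_pops]
```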

  16. Combined Coevolution Framework
• The general idea: when one agent type becomes stronger, its opponents need to react.
• Our evaluation shows bumps in Pac-Man's fitness values, which degrade over time:
  – these correspond to faster strategy changes at the beginning
  – and to higher complexity at the end of the evolutionary process

  17. Combined Coevolution Framework
• We repeated the learning process 10 times to gain insights into its general behavior:
  – the average points of Pac-Man and the Ghost Team converge over time
  – the best individuals per population quickly foster new strategies in the next generations
  – the overall complexity increases very slowly

  18. Insights
• The combined genetic programming reaches the same levels of complexity as single evolutionary processes, but is incredibly slow in doing so.
• Why does the complexity increase so slowly?
  – Due to the scoring of the game, a few basic Pac-Man strategies have a high return.
  – This cycle dominates the first generations: Favor Pills → Chase → Defend Power-Pills → Eat Ghosts → …
• Open question: how can we promote complexity?

  19. Conclusions
• Genetic programming proved capable of generating both simple and complex behavior in agents.
• Using four diverse ghost controllers performed better and converged faster than using only one kind of ghost:
  – either it is generally better to have mixed ghost teams,
  – or individuals from the single population need more time to build up comparable complexity.
• Combining both genetic programming procedures potentially removes the need to create suitable opponents.

  20. Limitations and Open Research Questions
• Strategy loops hinder the combined framework in creating more complex strategies:
  – Can those loops be detected during the evolutionary process?
  – Can we promote more complex solutions?
• Local maxima hinder the process:
  – Replace the game-induced scoring?
  – Use a dynamic scoring function that takes current strategies into account? (One possible reading is sketched below.)
• How can other agent types be included, e.g., learning a multi-objective MCTS score function?
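One speculative reading of the dynamic-scoring idea: blend the raw game score with a novelty bonus over recently observed behaviors, so agents that merely repeat the currently dominant strategy are not over-rewarded. This goes beyond what the slides propose; behaviors are assumed to be numeric feature vectors.

```python
import math

def dynamic_fitness(game_score, behavior, recent_behaviors, weight=0.2):
    """behavior: a feature vector, e.g. (pills eaten, ghosts eaten, time empowered)."""
    if not recent_behaviors:
        return game_score
    # Reward behaviors that differ from the currently dominant strategies.
    novelty = min(math.dist(behavior, b) for b in recent_behaviors)
    return game_score * (1.0 + weight * novelty)
```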

  21. Thank you for your attention!
Check for updates on our project at: http://fuzzy.cs.ovgu.de/wiki/pmwiki.php/Mitarbeiter/Dockhorn
(Download of our project files will be made available soon)
by Alexander Dockhorn and Rudolf Kruse
Institute for Intelligent Cooperating Systems, Department for Computer Science, Otto von Guericke University Magdeburg
Universitaetsplatz 2, 39106 Magdeburg, Germany
Email: {alexander.dockhorn, rudolf.kruse}@ovgu.de
