Crawling in Rogue's Dungeons with (Partitioned) A3C

Andrea Asperti, Daniele Cortesi, Francesco Sovrano
University of Bologna
Department of Informatics: Science and Engineering (DISI)

Fourth International Conference on Machine Learning, Optimization, and Data Science
September 13-16, 2018, Volterra, Tuscany, Italy
Learning to play Rogue through Reinforcement Learning

Rogue: a famous video game of the '80s, the ancestor of the roguelike genre. The player (the rogue) must retrieve the Amulet of Yendor inside a dungeon composed of many levels, collecting objects and fighting enemies.

We exclusively focus on roaming inside the dungeon: find the stairs and take them to descend to the next level.
Why games

Game-like environments, providing abstractions of real-life situations, have been at the core of many recent breakthroughs in Deep RL (mostly by DeepMind):

• Atari games: DQN [4], A3C [3] (Mnih et al.)
• Sokoban: imagination augmentation [6] (Weber et al.)
• Labyrinth: ACER [5] (Wang et al.)

Mazes and labyrinths are a traditional topic of reinforcement learning, often requiring memory, attention, and the acquisition of complex, non-reactive behaviors based on long-term planning.
Why Rogue

Rogue has many challenging features for Deep RL:

• no level replay: dungeons are randomly generated and always different from each other
• partial observability (POMDP): the map gets discovered during exploration
• sparse rewards

The ASCII interface allows us to focus on the really challenging aspects of the game, bypassing image-recognition problems (by now well understood).
Rogueinabox

In previous works [2, 1] we developed an API for Rogue, easing the development of automatic agents; the library was tested on many architectures, including Q-learning, A3C, and ACER.

Rogueinabox allows an easy configuration of many game parameters, such as:

◮ monsters
◮ traps and secret passages
◮ dark rooms and mazes
◮ starvation
◮ location of the amulet

Figure: a Rogue level configured to just contain mazes
Achievements overview

Proviso:
◮ we just focus on movement: no monsters, objects, food, ...
◮ learning is based on a single level: find and take the stairs
  - no dark rooms, no traps, no hidden passages
  - maximum 500 moves

Achievements:

agent        | random | DQN [2] | this work
success rate | 7%     | 23%     | 98%

Table: Achievements overview
Main architectural ingredients

1. the adoption of A3C as learning framework
2. an agent-centered, cropped representation of the state
3. a supervised partition of the problem into a predefined set of situations, each one delegated to a different A3C agent

A3C (Mnih et al.) is an "on-policy" technique:
- Asynchronous: exploiting a set of asynchronous agents
- Advantage: a formal notion expressing the convenience of an action in a given state
- Actor-Critic: the policy π is the actor and the value function V is the critic.
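To make the Advantage notion concrete, here is a minimal NumPy sketch of the n-step returns and advantages that drive A3C-style updates; the rollout arrays and the discount factor γ = 0.99 are illustrative assumptions, not values taken from this work:

```python
import numpy as np

def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Discounted n-step returns and advantages for an A3C-style update.

    rewards:         rewards collected along a rollout, r_0 .. r_{T-1}
    values:          critic estimates V(s_0) .. V(s_{T-1})
    bootstrap_value: V(s_T), the critic's estimate for the state after the rollout
    """
    returns = np.zeros(len(rewards))
    R = bootstrap_value
    for t in reversed(range(len(rewards))):
        R = rewards[t] + gamma * R      # discounted return from step t onward
        returns[t] = R
    advantages = returns - np.asarray(values)  # A(s_t, a_t) ~ R_t - V(s_t)
    return returns, advantages

# Each asynchronous worker computes these on its own rollout, then pushes
# gradients that increase log pi(a_t|s_t) * advantage (actor) and decrease
# (R_t - V(s_t))^2 (critic) to the shared weights.
```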
Neural network

[Figure: architecture of the neural network]
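The network consumes the agent-centered, cropped representation of the state (ingredient 2 above). A minimal sketch of how such a crop could be extracted from the ASCII screen, assuming a 17x17 window and space padding purely for illustration, not the exact choices of this work:

```python
import numpy as np

def crop_around_agent(screen, agent_row, agent_col, height=17, width=17, pad=' '):
    """Return a fixed-size window of the ASCII map centered on the rogue.

    screen: 2-D array of characters (the discovered portion of the level);
    cells falling outside the map are filled with `pad`.
    """
    crop = np.full((height, width), pad, dtype=screen.dtype)
    for i in range(height):
        for j in range(width):
            r = agent_row - height // 2 + i
            c = agent_col - width // 2 + j
            if 0 <= r < screen.shape[0] and 0 <= c < screen.shape[1]:
                crop[i, j] = screen[r, c]
    return crop
```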
Situations and rewards

Situations:
1. corridor
2. stairs in view
3. adjacent to wall
4. other

Rewards:
1. +1 for entering a new door
2. +1 for discovering a new door
3. +10 for descending the stairs
4. −0.01 for any other action

Situations and rewards are quite ad hoc (weak!!). See the sketch below.
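A hedged sketch of how the partition could be wired together: a hand-written classifier maps each state to one of the four situations and selects the corresponding A3C agent, while a shaped reward mirrors the list above. The boolean features (on_corridor, etc.) and the event names are hypothetical placeholders, not the actual implementation:

```python
# Hypothetical sketch: partitioned control with one A3C agent per situation.
CORRIDOR, STAIRS_IN_VIEW, ADJACENT_TO_WALL, OTHER = range(4)

def classify_situation(obs):
    """Map hand-crafted boolean features of the ASCII map to a situation id.

    `obs` is a dict of predicates; the feature extraction itself and the
    order of the checks are illustrative assumptions.
    """
    if obs["stairs_in_view"]:
        return STAIRS_IN_VIEW
    if obs["on_corridor"]:
        return CORRIDOR
    if obs["adjacent_to_wall"]:
        return ADJACENT_TO_WALL
    return OTHER

def reward(event):
    """Shaped rewards, mirroring the list on this slide."""
    if event == "entered_new_door":
        return 1.0
    if event == "discovered_new_door":
        return 1.0
    if event == "descended_stairs":
        return 10.0
    return -0.01

# At each step the situation selects the dedicated A3C agent:
#   agent = agents[classify_situation(obs)]
#   action = agent.act(state)
```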
Demo!

Figure: agent's behaviour after 40 million iterations

A longer version is available on YouTube.
Conclusions

• The rogue's movement is not perfect, but satisfactory
• Some design choices are weak:
  - situations
  - rewarding mechanism
  - cropped view
• We are already working on these issues, with promising results

Thanks for your attention!
Bibliography

[1] A. Asperti, C. De Pieri, M. Maldini, G. Pedrini, and F. Sovrano. A modular deep-learning environment for Rogue. WSEAS Transactions on Systems and Control, 12, 2017.
[2] A. Asperti, C. De Pieri, and G. Pedrini. Rogueinabox: an environment for roguelike learning. International Journal of Computers, 2:146-154, 2017.
[3] V. Mnih et al. Asynchronous methods for deep reinforcement learning. CoRR, abs/1602.01783, 2016.
[4] V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
[5] Z. Wang et al. Sample efficient actor-critic with experience replay. CoRR, abs/1611.01224, 2016.
[6] T. Weber et al. Imagination-augmented agents for deep reinforcement learning. CoRR, abs/1707.06203, 2017.