“Humies” Competition, GECCO 2018
Emergent solutions to high-dimensional multi-task reinforcement learning
Stephen Kelly & Malcolm Heywood
Why does the result qualify as human competitive?

[Figure: game-playing agent loop. The agent receives the visual state s(t) from the game title (Atari, Doom), emits an atomic action a(t), and is scored by the game score at the end of evaluation.]
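A minimal sketch of this evaluation loop, using the ale-py bindings to the Arcade Learning Environment. The Agent interface and the ROM path are hypothetical placeholders, not the authors' implementation.

```python
# Evaluation loop from the figure: visual state in, atomic action out,
# fitness is the end-of-evaluation game score.
from ale_py import ALEInterface

def evaluate(agent, rom_path: str) -> int:
    ale = ALEInterface()
    ale.loadROM(rom_path)
    actions = ale.getMinimalActionSet()
    ale.reset_game()
    score = 0
    while not ale.game_over():
        s = ale.getScreenRGB()        # visual state s(t)
        a = agent.act(s)              # index of an atomic action a(t)
        score += ale.act(actions[a])  # reward accrues into the game score
    return score
```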
Visual RL dominated by deep learning

• DQN (2015)
  – Visual RL on the Arcade Learning Environment (49 titles)
  – Q-learning with deep learning
  – Cropped visual image (84 × 84)
  – Frame stacking (removes the interleaving of sprites & stochastic properties)
  – “able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games” [Nature (2015) Vol. 518]
• Gorila (2015), Double Q (2016), Dueling DQN (2016), A3C (2016), Noisy DQN (2017), Distributional DQN (2017), Rainbow (2018)
• One policy per game title
• Learning parameters and DNN topology identified a priori
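The cropping and frame stacking mentioned above are not spelled out on the slide; this is a sketch of the standard DQN input pipeline following Mnih et al. (2015), an illustrative reconstruction rather than the authors' code.

```python
# Standard DQN preprocessing: de-flicker with a pixel-wise max over
# consecutive raw frames (removes sprite interleaving), grayscale,
# resize to 84x84, and stack the last 4 frames so motion is visible.
from collections import deque
import numpy as np
import cv2

class DQNPreprocessor:
    def __init__(self, k: int = 4):
        self.frames = deque(maxlen=k)
        self.prev_raw = None

    def observe(self, raw: np.ndarray) -> np.ndarray:
        rgb = raw if self.prev_raw is None else np.maximum(raw, self.prev_raw)
        self.prev_raw = raw
        gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
        small = cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)
        self.frames.append(small)
        while len(self.frames) < self.frames.maxlen:
            self.frames.append(small)          # pad at episode start
        return np.stack(self.frames, axis=0)   # network input (4, 84, 84)
```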
Visual RL compared to ‘human’

[Figure: best and worst game title per algorithm, plotted as log(% human score), where % human score = 100 × (algorithm − random) / (human − random). Algorithms: TPG, DQN, Gorila, Double-DQN, H-NEAT. Above the 100% human line an algorithm is better than human, below it worse; the compared algorithms are statistically equivalent.]
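The human-normalized score used on the axis above is simple to compute; a one-function sketch:

```python
def pct_human(algorithm: float, human: float, random: float) -> float:
    """Human-normalized score from the figure:
    100 * (algorithm - random) / (human - random).
    100 means human-level; above 100 is better than human."""
    return 100.0 * (algorithm - random) / (human - random)
```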
Visual RL and multi-task learning

• Multiple game titles played by a single agent
• Single-title DQN provides the baseline
• Best DNN result needs prior knowledge regarding parameters and topology
• Constitutes an example of a task pertaining to ‘Artificial General Intelligence’
Multi-title TPG versus single-title DQN

[Figure: multi-task TPG score per game title as a percentage of the single-title DQN score (log scale); above 100% the multi-task TPG agent is better than single-title DQN, below it worse. Group 1: Alien, Asteroids, Bank Heist, Battle Zone, Bowling. Group 2: Centipede, Chopper Command, Fishing Derby, Frostbite, Kangaroo. Group 3: Krull, Kung-Fu Master, Ms. Pac-Man, Private Eye, Time Pilot.]
Why [is our entry] ‘best’ in comparison to other entries?

• Single-title task
  – TPG provides solutions competitive with human and DQN
• Multi-title task
  – Agents have to be competitive over multiple game titles
  – TPG multi-task solution is competitive with DQN trained under a single-title setting
  – DNN state-of-the-art in the single task does not address the multi-title task
• TPG for the single-title task is a special case of TPG for the multi-title task
The ‘icing on the cake’

• TPG addresses multiple issues simultaneously:
  – Complexity of topology is emergent and:
    • Highly modular
    • Unique to the task
    • Explicitly reflects a decomposition of the task
  – No image-specific instructions, just:
    • Four 2-argument operators {+, −, ×, ÷}
    • Three 1-argument operators {log, exp, cosine}
    • One conditional operator
  – TPG is highly efficient computationally
• Some examples…
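To make the instruction set concrete, here is a minimal sketch of a register-machine (linear GP) program restricted to the eight operators above. The register count, protected division, and the conditional's skip semantics are illustrative assumptions, not the authors' exact design.

```python
import math

def run_program(program, pixels, n_regs=8):
    """program: list of (op, dst, src1, src2) tuples; a source index
    addresses a register or, when negative, a raw screen pixel."""
    r = [0.0] * n_regs
    fetch = lambda i: pixels[-i - 1] if i < 0 else r[i]
    skip = False
    for op, dst, s1, s2 in program:
        if skip:                       # previous conditional failed
            skip = False
            continue
        x, y = fetch(s1), fetch(s2)
        if   op == '+':   r[dst] = x + y
        elif op == '-':   r[dst] = x - y
        elif op == '*':   r[dst] = x * y
        elif op == '/':   r[dst] = x / y if y != 0 else 0.0  # protected
        elif op == 'log': r[dst] = math.log(abs(x)) if x != 0 else 0.0
        elif op == 'exp': r[dst] = math.exp(min(x, 50.0))    # overflow guard
        elif op == 'cos': r[dst] = math.cos(x)
        elif op == 'if':  skip = not (x < y)  # conditionally skip next op
    return r[0]  # the program's bid on the current state
```

Note that no convolution or other image-specific operator appears anywhere: each program simply computes a bid over raw pixels, and a team follows its highest bidder.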
Emergent complexity

[Figure: number of teams (nodes) versus generation, log scale 1–200, for champion policy graphs on Alien, Asteroids, Boxing, Bowling, and Ms. Pac-Man; ‘Rand’ marks random initialization. Overall solution-graph complexity emerges and grows over generations, while the number of teams visited per decision during test (ditto for pixels used) remains far lower.]
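The gap between overall and per-decision complexity falls out of how a policy graph is evaluated: each decision follows a single path of winning bids from the root team to an atomic action, so only a fraction of the graph (and of the pixels) is ever touched. A minimal traversal sketch, reusing the run_program interpreter above; Team and Learner are hypothetical containers.

```python
class Learner:
    def __init__(self, program, action):
        self.program = program
        self.action = action           # atomic action OR another Team

class Team:
    def __init__(self, learners):
        self.learners = learners       # assumed: at least one atomic action

def decide(team, pixels, visited=None):
    visited = set() if visited is None else visited
    visited.add(id(team))
    # Learners pointing at an already-visited team are skipped, so the
    # traversal cannot loop; every path ends at an atomic action.
    eligible = [l for l in team.learners
                if not (isinstance(l.action, Team) and id(l.action) in visited)]
    best = max(eligible, key=lambda l: run_program(l.program, pixels))
    if isinstance(best.action, Team):
        return decide(best.action, pixels, visited)
    return best.action                 # an atomic action a(t)
```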
Emergent discovery of multi-title solutions

[Figure: champion multi-task policy graph; the subgraphs visited when playing Ms. Pac-Man, Frostbite, and Centipede are distinct, an emergent decomposition of the multi-title task.]
Run-time complexity

DQN (3.2 GHz Intel i7-4700s):
• ≈1.6 million weights in the MLP
• ≈3.2 million convolution operations in the DNN
• 5 decisions per second; 330 decisions per second with GPU acceleration

TPG (2.2 GHz Intel E5-2650):
• Single title: 71–2346 instructions per decision (avg); 758–2853 decisions per second
• Multi-title: 413–869 instructions per decision (avg); 1832–2922 decisions per second
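Decisions per second is straightforward to measure; a minimal timing sketch, where the agent and the pre-captured frames are hypothetical stand-ins (a real measurement would run against the live game, as in evaluate() above).

```python
import time

def decisions_per_second(agent, frames) -> float:
    start = time.perf_counter()
    for s in frames:
        agent.act(s)                   # one decision per visual state
    return len(frames) / (time.perf_counter() - start)
```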
Questions?