REINFORCEMENT LEARNING AS A PRODUCTION TOOL ON AAA GAMES
Olivier DELALLEAU & Adrien LOGUT
2017-10-18
AGENDA
» Project Overview
» Fighting in For Honor
» Driving in Watch_Dogs 2
Project Overview
Build AIs that can play games like our players would
» FOR HONOR: Olivier Delalleau, Frédéric Doll, Maxim Peter
» WATCH_DOGS 2: Adrien Logut, Olivier Lamothe-Penelle
Motivations
» Automated testing
» Design assistance
» In-game AI
Why Reinforcement Learning?
[Google Trends chart: "reinforcement learning" vs. "genetic algorithm" vs. "imitation learning"]
» Alternatives considered: evolutionary methods, imitation learning
RL & Video Games (recent, incomplete list)
Atari, Doom, Minecraft, Universe, SNES, StarCraft II, Dota 2, Unity
CENTURION GLADIATOR SHINOBI HIGHLANDER
Autonomous Driving
Watch_Dogs 2
» Open world game within a living city
» Takes place in San Francisco
» Living city: cars in the street
▪ Cars need to be controlled by an AI
Objectives
How is it currently done?
» PID controller with custom curves
▪ Hand-tuned curves take a lot of time
▪ Not precise
➢ How about Reinforcement Learning?
Reinforcement Learning
What the agent can see and do
[Diagram: Agent ↔ Environment loop]
» State: distance to the road (0.1), velocity (0.3), desired speed (0.9), ...
» Action: acceleration [0, 1], brake [-1, 1], steering [-1, 1]
» Reward: e.g. +3
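To make the state/action/reward loop concrete, below is a minimal gym-style sketch of the driving environment interface. The class, its method names, and the placeholder dynamics are illustrative assumptions, not the game's actual API; only the state fields and action ranges come from the slide.

```python
import numpy as np

class DrivingEnv:
    """Gym-style view of the driving task (illustrative only, not the game's real API)."""

    def reset(self):
        # State: distance to the road, current velocity, desired speed, ...
        return np.array([0.1, 0.3, 0.9], dtype=np.float32)

    def step(self, action):
        # Action: [acceleration in [0, 1], brake in [-1, 1], steering in [-1, 1]].
        next_state = np.array([0.05, 0.5, 0.9], dtype=np.float32)  # placeholder physics
        reward = 3.0   # e.g. +3 when tracking the path at the desired speed
        done = False
        return next_state, reward, done
```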
Reinforcement Learning
What the agent can see and do
➢ Continuous states
▪ Neural network to approximate $Q(s_t, a_t)$
➢ Continuous actions
▪ Cannot use the greedy policy from DQN (For Honor)
▪ Neural network to approximate a policy $a_t \sim \mu(s_t)$
Reinforcement Learning
Actor-Critic architecture
» Two neural networks approximate functions
▪ Actor: $a_t \sim \mu(s_t)$
▪ Critic: $Q(s_t, a_t)$
» Critic update: Q-learning (same as For Honor), estimating the expected discounted reward
» Actor update: policy gradient
$\nabla_{\theta^{\mu}} J = \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_t,\, a=\mu(s_t)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s=s_t}$
[Diagram: $s_t$ feeds the Actor, which outputs $a_t$; the Critic takes $(s_t, a_t)$ and outputs $Q(s_t, a_t)$]
Reinforcement Learning
Actor update – Intuition
» Actor update: policy gradient
» Intuition: the critic gives the direction in which to update the actor
▪ "In which way should I change the actor parameters in order to maximize the critic output, given a state?"
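As a concrete illustration of the actor-critic updates above, here is a minimal sketch of one training step in the DDPG style, assuming PyTorch. Network sizes, the replay buffer, target networks and exploration noise are omitted, and none of the names below come from the talk.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 8, 3
gamma = 0.99

# Critic approximates Q(s, a); actor approximates the deterministic policy mu(s).
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)

def update(s, a, r, s_next, done):
    """One actor-critic update from a batch of transitions (r and done shaped [N, 1])."""
    # Critic update: regress Q(s, a) onto the one-step Q-learning target.
    with torch.no_grad():
        q_next = critic(torch.cat([s_next, actor(s_next)], dim=1))
        target = r + gamma * (1.0 - done) * q_next
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor update: deterministic policy gradient, i.e. follow dQ/da * da/dtheta
    # by maximizing the critic's output for the actor's own actions.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```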
First experiment
Since we have the PID, what about imitating it?
» Supervised learning on the actor
▪ Updated with the mean squared error between the actor output $a_{t,\text{actor}}$ and the PID output $a_{t,\text{PID}}$:
$\delta_t = \left(a_{t,\text{actor}} - a_{t,\text{PID}}\right)^2$
[Diagram: the state $s_t$ feeds both the Actor and the PID; the actor is updated toward the PID output]
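A minimal sketch of this supervised warm-start, assuming PyTorch and a hypothetical pid_controller(state) helper that returns the PID action for a given state: the actor is regressed onto the PID output with a mean squared error loss.

```python
import torch
import torch.nn as nn

def pretrain_actor(actor, states, pid_controller, epochs=10, lr=1e-3):
    """Supervised warm-start: regress the actor onto the PID controller's actions."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    # Targets a_{t,PID}: the PID controller's action for every recorded state.
    pid_actions = torch.stack([pid_controller(s) for s in states])
    for _ in range(epochs):
        pred = actor(states)                              # a_{t,actor}
        loss = nn.functional.mse_loss(pred, pid_actions)  # (a_actor - a_PID)^2, averaged
        opt.zero_grad()
        loss.backward()
        opt.step()
    return actor
```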
First experiment
Supervised vs. original: slight improvement
Reward Shaping
Defining the reward function
» The reward is the only signal received by the agent
▪ "Am I doing well or badly?"
» This is the key part of reinforcement learning
▪ Called reward shaping
▪ Requires a good understanding of the problem
» For driving:
▪ Follow the given path at the right speed
▪ Stop when needed
Reward Shaping
Defining the reward function – Configuration
Three main components are measured:
» Velocity along the path $v_x$
» Velocity perpendicular to the path $v_y$
» Distance from the path $d$
Reward Shaping
Defining the reward function – Desired speed
» Positive reward when driving close to the desired speed
» Negative when far from the desired speed
» Punish more when driving faster than when driving slower
[Plot: reward vs. velocity along the path $v_x$, desired speed marked in red]
Reward Shaping
Defining the reward function – Velocity $v_y$
» Only negative reward
» Want to punish harder for small values (power < 1)
[Plot: reward vs. velocity perpendicular to the path $v_y$]
Reward Shaping
Defining the reward function – Distance
» Only negative reward
» Want to punish less for small values (power > 1)
» (a sketch combining the three components follows below)
[Plot: reward vs. distance from the path $d$]
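Putting the three components together, here is a hedged sketch of such a shaped reward in Python. The coefficients and exponents are illustrative placeholders, not the tuned values from the talk; only the overall shapes (asymmetric speed penalty, power < 1 on $v_y$, power > 1 on $d$) follow the slides.

```python
def shaped_reward(v_x, v_y, d, desired_speed):
    """Illustrative driving reward: track the desired speed and stay on the path."""
    # Desired speed: positive near the target, punished more when too fast than too slow.
    speed_error = v_x - desired_speed
    if speed_error > 0:
        r_speed = 1.0 - 2.0 * speed_error   # driving faster than desired: harsher penalty
    else:
        r_speed = 1.0 + speed_error         # driving slower than desired: milder penalty

    # Perpendicular velocity: only negative; power < 1 punishes small values harder.
    r_vy = -abs(v_y) ** 0.5

    # Distance from the path: only negative; power > 1 is lenient for small deviations.
    r_dist = -abs(d) ** 2

    return r_speed + r_vy + r_dist
```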
Results The learning curve
Results
Good results after 15 minutes of training
Results One model to rule them all?
Results
One model to rule them all?
» Each vehicle has its own physical model
» Accelerate, steer, brake all react differently across vehicles
» We can still group physically similar vehicles
» Bigger vehicles (buses, trucks, ...) need more state information
Results
Need to deal with a lot of variance
» The game is not deterministic
» Even with seeding, results differ between runs
Tools
Multi-dimensional function visualizer
» Developed with PyQt5
» Loads the models and plots their output
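The actual tool is built with PyQt5; as a simplified illustration of the same idea, the sketch below sweeps one state dimension while holding the others fixed and plots a single action output with matplotlib. The actor is assumed to be a plain callable mapping a NumPy state to a NumPy action; all names here are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_actor_slice(actor, base_state, dim, values, action_index=0):
    """Sweep one state dimension, hold the others fixed, and plot one action output."""
    outputs = []
    for v in values:
        state = base_state.copy()
        state[dim] = v
        outputs.append(actor(state)[action_index])
    plt.plot(values, outputs)
    plt.xlabel("state[{}]".format(dim))
    plt.ylabel("action[{}]".format(action_index))
    plt.show()

# Example: how the steering output reacts as the distance to the path varies.
# plot_actor_slice(trained_actor, base_state=np.zeros(8), dim=0,
#                  values=np.linspace(-1.0, 1.0, 100))
```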
Tools
Archives reader and comparison tool
» Developed with PyQt5
» Loads the metrics and plots them to compare models
What's next?
Awesome stuff!
» Analyze what could be introduced into the game
▪ Level of quality? Robustness? Computation time? Learning time? Size of models in memory?
» Try other learning algorithms
» Optimize the workflow with multiple agents
Conclusion
Reinforcement learning is promising
» Found efficient fighting behavior in For Honor
» Already better driving in Watch_Dogs 2 compared to the PID controller
It is just the beginning...
» Still a lot of work and research to do
» Not ready to use in production... yet
The future? Player-facing AIs
Thank you! Do you have questions? laforge@ubisoft.com PS: we’re hiring (!)