GPU Accelerated Markov Decision Processes in Crowd Simulation

Sergio Ruiz
Computer Science Department, Tecnológico de Monterrey, CCM, Mexico City, México
sergio.ruiz.loza@itesm.mx

Benjamín Hernández
National Center for Computational Sciences, Oak Ridge National Laboratory, Tennessee, USA
hernandezarb@ornl.gov
Contents
• Introduction
• Optimization Approaches
• Problem solving strategy
• A simple example
• Algorithm description
• Results
• Conclusions & future work
Crowd Simulation
• Path Planning
• Local Collision Avoidance (LCA)
Optimization Approaches
• According to (Reyes et al. 2009, Foka and Trahanias 2003), Markov Decision Processes (MDPs) are computationally inefficient: as the state space grows, the problem becomes intractable.
• Decomposition offers the possibility of solving large MDPs (Sucar 2007, Meuleau et al. 1998, Singh and Cohn 1998), either by state space decomposition or by process decomposition.
• (Mausam and Weld 2004) follow the idea of concurrency to solve MDPs, generating near-optimal solutions by extending the Labeled Real-Time Dynamic Programming method.
Optimization Approaches
• (Sucar 2007) proposes a parallel implementation of weakly coupled MDPs.
• (Jóhansson 2009) presents a dynamic programming framework that implements the Value Iteration algorithm to solve MDPs using CUDA.
• (Noer 2013) explores the design and implementation of a point-based Value Iteration algorithm for Partially Observable MDPs (POMDPs) with approximate solutions. The GPU implementation supports belief state pruning, which avoids unnecessary calculations.
Problem Solving Strategy
• We propose a parallel Value Iteration MDP solving algorithm to guide groups of agents toward assigned goals while avoiding obstacles interactively. For optimal performance, the algorithm runs over a hexagonal grid in the context of a Fully Observable MDP.
Problem Solving Strategy
A Markov Decision Process is a tuple $M = (S, A, T, R)$:
• $S$ is a finite set of states. In our case, 2D cells.
• $A$ is a finite set of actions. In our case, 6 directions.
• $T$ is a transition model $T(s, a, s')$.
• $R$ is a reward function $R(s)$.
A policy $\pi$ is a solution that specifies the action for an agent at a given state. $\pi^*$ is the optimal policy.
Problem Solving Strategy: Value Iteration
$\pi^*_t(s) = \arg\max_a Q_t(s, a)$
$Q_t(s, a) = R(s, a) + \gamma \sum_{j=0}^{5} T_{sj}\, V_{t-1}(j)$
$V_t(s) = Q_t(s, \pi^*(s)); \quad V_0(s) = 0$
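As a minimal illustration of the update above, the following sequential C++ sketch performs one Value Iteration sweep over a grid with six neighbors per cell (the hexagonal layout). The container layout, the neighbor table, and the out-of-bounds convention (a value of 0) are assumptions for illustration only, not the authors' implementation.

```cpp
#include <vector>
#include <array>
#include <algorithm>

// One Value Iteration sweep:
//   V_t(s) = max_a [ R(s,a) + gamma * sum_j T_{sj} * V_{t-1}(j) ]
// neighbors[s][d] is the cell reached from s in hex direction d, or -1 when
// the move leaves the grid (its value then contributes 0).
std::vector<float> valueIterationSweep(
    const std::vector<std::array<int, 6>>& neighbors,
    const std::vector<std::array<float, 6>>& R,   // R[s][a]
    const std::vector<float>& Vprev,              // V_{t-1}
    float gamma, float p, float q)                // p: intended move, q: drift
{
    std::vector<float> V(Vprev.size());
    for (size_t s = 0; s < Vprev.size(); ++s) {
        float best = -1e30f;
        for (int a = 0; a < 6; ++a) {
            // Expected value of the successor: probability p of moving in the
            // intended direction a, probability q of drifting to each other one.
            float expected = 0.0f;
            for (int d = 0; d < 6; ++d) {
                int j = neighbors[s][d];
                float vj = (j >= 0) ? Vprev[j] : 0.0f;   // out of bounds -> 0
                expected += ((d == a) ? p : q) * vj;
            }
            best = std::max(best, R[s][a] + gamma * expected);
        }
        V[s] = best;   // value of the best action, i.e. Q_t(s, pi*(s))
    }
    return V;
}
```

Sweeps are repeated until the values (or the induced policy) stop changing, which is exactly what the parallel formulation in the following slides accelerates.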
Problem Solving Strategy
• We propose to temporarily override the optimal policy when the agent density in a cell is above a certain threshold $\tau$.
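One possible reading of this override is sketched below, under the assumption that an agent falls back to the best-valued direction whose target cell is still below the threshold; the slide does not specify the fallback rule, so the function and its names are purely illustrative.

```cpp
#include <array>
#include <vector>

// Sketch (not the authors' code): among the 6 hex directions, pick the
// highest-valued one whose target cell holds fewer than tau agents; if every
// neighbor is saturated, keep the optimal action anyway.
int chooseAction(const std::array<float, 6>& Q,          // Q-values of this cell
                 const std::array<int, 6>& neighbors,    // -1 = out of bounds
                 const std::vector<int>& agentsPerCell,  // current density
                 int tau)
{
    int best = 0, bestFree = -1;
    for (int a = 1; a < 6; ++a)
        if (Q[a] > Q[best]) best = a;                    // optimal policy pi*
    for (int a = 0; a < 6; ++a) {
        int j = neighbors[a];
        bool free = (j >= 0) && (agentsPerCell[j] < tau);
        if (free && (bestFree < 0 || Q[a] > Q[bestFree])) bestFree = a;
    }
    return (bestFree >= 0) ? bestFree : best;            // override pi* if possible
}
```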
A simplified example
$\pi^*_t(s) = \arg\max_a Q_t(s, a)$
$Q_t(s, a) = R(s, a) + \gamma \sum_{j=0}^{2} T_{sj}\, V_{t-1}(j)$

Grid rewards:
      1    2    3    4
  a  -3   -3   -3  +100
  b  -3   -3        -100
  c  -3   -3   -3    -3

What is $\pi$ for cell a3?
$A = \{N, W, E\}$; $\gamma = 1$ (for simplicity)
Transitions: $p = 0.8$ (probability of taking the intended action), $q = 0.1$ (probability of taking another action)

$\pi(a3) = \max\{Q(a3, W), Q(a3, N), Q(a3, E)\}$
$Q(a3, E) = 100 + 1.0\,(0.8(100) + 0.1(-3) + 0.1(0))$
$Q(a3, W) = -3 + 1.0\,(0.1(100) + 0.8(-3) + 0.1(0))$
$Q(a3, N) = 0 + 1.0\,(0.1(100) + 0.1(-3) + 0.8(0))$
=> the maximum is $Q(a3, E)$
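The arithmetic above can be verified with a few standalone lines of C++; this snippet only reproduces the three Q-values of the slide and is not part of the solver.

```cpp
#include <cstdio>

// Q-values for cell a3 with gamma = 1, p = 0.8, q = 0.1 (values from the slide).
int main() {
    float QE = 100 + 1.0f * (0.8f * 100 + 0.1f * -3 + 0.1f * 0);  // = 179.7
    float QW =  -3 + 1.0f * (0.1f * 100 + 0.8f * -3 + 0.1f * 0);  // =   4.6
    float QN =   0 + 1.0f * (0.1f * 100 + 0.1f * -3 + 0.8f * 0);  // =   9.7
    std::printf("E=%.1f  W=%.1f  N=%.1f\n", QE, QW, QN);          // E is the argmax
    return 0;
}
```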
Algorithm
• Data collection: each cell needs to know the rewards from its neighboring cells and the out-of-bounds values.
• Input generation: build the transition matrices $T^a_{sj}$ and the reward terms $R(s, a) = RW$.
• Value Iteration: the optimal policy is computed using parallel transformations and a parallel reduction by key.
Algorithm: input generation
• Transition matrix requirements. Each matrix is $|A| \times |A|$, i.e., each cell can compute its neighboring info:

$Q = \begin{bmatrix} p & \cdots & p \\ \vdots & \ddots & \vdots \\ p & \cdots & p \end{bmatrix} \qquad T_P = \begin{bmatrix} q_i & \cdots & q_i \\ \vdots & \ddots & \vdots \\ q_i & \cdots & q_i \end{bmatrix}$

$D_A = \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix} \qquad D_B = \begin{bmatrix} 0 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 0 \end{bmatrix}$

with $q_i = RE_{i-1}$, $r \in [1, MDP_{rows}]$, $c \in [1, MDP_{columns}]$.
Algorithm: input generation
• Each per-cell block combines $Q$ and $T_P$ through element-wise (Hadamard) products with the masks $D_A$ and $D_B$:

$T_{r,c} = Q \circ D_A + T_P \circ D_B = \begin{bmatrix} p & q & q \\ q & p & q \\ q & q & p \end{bmatrix}$

(shown here for $|A| = 3$ as in the simplified example: the intended-move probability $p$ on the diagonal, the drift probability $q$ elsewhere). Each block $T_{r,c}$ represents a cell.
• Transition matrix $T^a_{sj}$ computation:

$T^a_{sj} = \begin{bmatrix} T_{1,1} & \cdots & T_{1,MDP_{columns}} \\ \vdots & \ddots & \vdots \\ T_{MDP_{rows},1} & \cdots & T_{MDP_{rows},MDP_{columns}} \end{bmatrix}$
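A small sketch of how one such per-cell block could be assembled is given below, assuming $D_A$ is the identity mask and $D_B$ its off-diagonal complement; boundary and obstacle adjustments are omitted and the function name is illustrative.

```cpp
#include <vector>

// Build one |A| x |A| per-cell block T_{r,c} = Q o D_A + T_P o D_B:
// the intended-move probability p on the diagonal (identity mask D_A) and the
// drift probability q everywhere else (complement mask D_B), stored row-major.
std::vector<float> buildCellBlock(int A, float p, float q)
{
    std::vector<float> Trc(A * A);
    for (int i = 0; i < A; ++i)
        for (int j = 0; j < A; ++j) {
            float DA = (i == j) ? 1.0f : 0.0f;   // identity mask
            float DB = 1.0f - DA;                // off-diagonal mask
            Trc[i * A + j] = p * DA + q * DB;    // Q o D_A + T_P o D_B
        }
    return Trc;
}
```

The full $T^a_{sj}$ is then the block matrix that places one such block per cell of the grid, as shown on the slide above.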
Algorithm: Parallel Value Iteration
1. Computation of Q-values:
$Q_t = RW + \gamma\, T^a_{sj}\, V$
Consecutive parallel transformations (mult, mult, sum) result in a matrix $Q_t$ that stores an $|A|$-tuple of values, one per action, for each cell.
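One way to realize this step with CUDA Thrust (the library named in the implementation) is sketched below. The flattened data layout, the functor names, and the use of a key-wise reduction for the inner sum are assumptions for illustration, not the authors' actual kernels.

```cpp
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/discard_iterator.h>

// Assumed layout:
//   tv[k]  = T_{s,a,j} * V(j), flattened so that each consecutive run of A
//            entries belongs to one (cell, action) row;
//   reward = R(s,a) = RW, one entry per (cell, action) row.
// The key-wise sum plus the final transform give Q_t = RW + gamma * T * V.
struct RowIndex {                 // maps a flat index to its (cell, action) row
    int A;
    __host__ __device__ int operator()(int k) const { return k / A; }
};

struct AddReward {                // r + gamma * sum_j T_{s,a,j} V(j)
    float gamma;
    __host__ __device__ float operator()(float r, float sumTV) const {
        return r + gamma * sumTV;
    }
};

void computeQ(const thrust::device_vector<float>& tv,      // size: rows * A
              const thrust::device_vector<float>& reward,  // size: rows
              thrust::device_vector<float>& Q,             // size: rows
              int A, float gamma)
{
    int rows = reward.size();
    Q.resize(rows);
    thrust::device_vector<float> sums(rows);
    // sum over j for every (cell, action) row
    thrust::reduce_by_key(
        thrust::make_transform_iterator(thrust::counting_iterator<int>(0), RowIndex{A}),
        thrust::make_transform_iterator(thrust::counting_iterator<int>(rows * A), RowIndex{A}),
        tv.begin(),
        thrust::make_discard_iterator(),
        sums.begin());
    // Q(s,a) = R(s,a) + gamma * sum
    thrust::transform(reward.begin(), reward.end(), sums.begin(), Q.begin(),
                      AddReward{gamma});
}
```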
Algorithm: Parallel Value Iteration
2. Selection of best Q-values:
• Parallel reduction by key: within every consecutive $|A|$-tuple of $Q_t$, the index of the largest value gives the current best policy for that cell.
3. Check for convergence:
• Stop when $\pi_t - \pi_{t-1} = [0, \ldots, 0]$, i.e., no cell changes its chosen action.
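Steps 2 and 3 can likewise be sketched with Thrust: a key-wise maximum over (value, action) pairs yields the per-cell argmax, and a vector comparison detects convergence. The data layout matches the previous sketch and is, again, an assumption rather than the authors' code.

```cpp
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/equal.h>
#include <thrust/functional.h>
#include <thrust/tuple.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/discard_iterator.h>

struct CellIndex {                 // flat index -> cell id
    int A;
    __host__ __device__ int operator()(int k) const { return k / A; }
};
struct ActionIndex {               // flat index -> action id within the cell
    int A;
    __host__ __device__ int operator()(int k) const { return k % A; }
};

typedef thrust::tuple<float, int> QA;   // (Q-value, action index)

struct ArgMax {                    // keep the pair with the larger Q-value
    __host__ __device__ QA operator()(const QA& x, const QA& y) const {
        return (thrust::get<0>(x) >= thrust::get<0>(y)) ? x : y;
    }
};

// Writes the best action per cell into `policy` and returns true when the
// policy is identical to `prevPolicy` (convergence).
bool bestActions(const thrust::device_vector<float>& Q,       // size: cells * A
                 thrust::device_vector<int>& policy,          // size: cells
                 const thrust::device_vector<int>& prevPolicy,
                 int A)
{
    int n = Q.size();
    thrust::device_vector<float> bestQ(policy.size());
    auto keys  = thrust::make_transform_iterator(thrust::counting_iterator<int>(0), CellIndex{A});
    auto acts  = thrust::make_transform_iterator(thrust::counting_iterator<int>(0), ActionIndex{A});
    auto pairs = thrust::make_zip_iterator(thrust::make_tuple(Q.begin(), acts));
    // per-cell argmax via reduction by key over (value, action) pairs
    thrust::reduce_by_key(keys, keys + n, pairs,
                          thrust::make_discard_iterator(),
                          thrust::make_zip_iterator(
                              thrust::make_tuple(bestQ.begin(), policy.begin())),
                          thrust::equal_to<int>(), ArgMax());
    // converged when no cell changed its chosen action
    return thrust::equal(policy.begin(), policy.end(), prevPolicy.begin());
}
```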
Crowd Navigation Video
https://www.youtube.com/watch?v=369td2O8dxY
Results: test scenarios
• Scenarios: Office (1,584 cells), Maze (100x100 cells), Champ de Mars (100x100 cells)
• Implementation: CUDA Thrust with OpenMP and CUDA backends
• CPU: Intel Core i7 running at 3.40 GHz
• ARM (Jetson TK1): 32-bit ARM quad-core Cortex-A15 CPU running at 2.32 GHz
• GPUs: Tegra K1 (192 CUDA cores), Tesla K40c (2880 CUDA cores), GeForce GTX TITAN (2688 CUDA cores)
Results: GPU performance
Results: GPU speedup
• Intel CPU baseline: 8 threads
• ARM CPU baseline: 4 threads
Conclusion
• Parallelization of the proposed algorithm was made possible by formulating it in terms of massive matrix operations, leveraging the data parallelism of GPU computing to reduce the MDP solution time.
• We demonstrated that standard parallel transformation and reduction operations provide the means to solve MDPs via Value Iteration with high performance.
Conclusion
• Taking advantage of the proposed hexagonal grid partitioning method, our implementation provides a good level of space discretization and performance.
• We obtained a 90x speedup using GPUs, enabling us to simulate crowd behavior interactively.
• We found the Jetson TK1 GPU to deliver remarkable performance, opening many possibilities for incorporating real-time MDP solvers into mobile robotics.
Future Work
• Reinforcement learning: evaluate different parameter values to obtain policy convergence in the fewest iterations without losing precision in the generated paths.
• Couple the MDP solver with a Local Collision Avoidance method to obtain more precise simulation results at the microscopic level.
• Investigate further applications of our MDP solver beyond the context of crowd simulation.
GPU Accelerated Markov Decision Processes in Crowd Simulation

Further reading:
Ruiz, S. and Hernández, B. "A parallel solver for Markov Decision Process in Crowd Simulation," MICAI 2015, 14th Mexican International Conference on Artificial Intelligence, Cuernavaca, Mexico, IEEE. ISBN 978-1-5090-0323-5.

Thank you!

Benjamín Hernández
National Center for Computational Sciences, Oak Ridge National Laboratory, Tennessee, USA
hernandezarb@ornl.gov

Sergio Ruiz
Computer Science Department, Tecnológico de Monterrey, CCM, Mexico City, México
sergio.ruiz.loza@itesm.mx

This research was partially supported by: CONACyT SNI-54067, CONACyT PhD scholarship 375247, an NVIDIA Hardware Grant, and the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, under DOE Contract No. DE-AC05-00OR22725.
Additional Results: Intel CPU
Additional Results: ARM CPU