GPU Accelerated Markov Decision Processes in Crowd Simulation
Sergio Ruiz, Computer Science Department, Tecnológico de Monterrey, CCM, Mexico City, México
Benjamín Hernández, National Center for Computational Sciences, Oak Ridge National Laboratory, Tennessee, USA
Contents
• Introduction
• Optimization Approaches
• Problem solving strategy
• A simple example
• Algorithm description
• Results
• Conclusions & future work
Crowd Simulation
• Path Planning
• Local Collision Avoidance (LCA)
Optimization Approaches
• According to (Reyes et al. 2009, Foka and Trahanias 2003), Markov Decision Processes (MDPs) are computationally inefficient: as the state space grows, the problem becomes intractable.
• Decomposition makes it possible to solve large MDPs (Sucar 2007, Meuleau et al. 1998, Singh and Cohn 1998), through either state space decomposition or process decomposition.
• (Mausam and Weld 2004) follow the idea of concurrency to solve MDPs, generating close-to-optimal solutions by extending the Labeled Real-Time Dynamic Programming method.
Optimization Approaches
• (Sucar 2007) proposes a parallel implementation of weakly coupled MDPs.
• (Jóhansson 2009) presents a dynamic programming framework that implements the Value Iteration algorithm to solve MDPs using CUDA.
• (Noer 2013) explores the design and implementation of a point-based Value Iteration algorithm for Partially Observable MDPs (POMDPs) with approximate solutions. The GPU implementation supports belief state pruning, which avoids unnecessary calculations.
Problem Solving Strategy
• We propose a parallel Value Iteration algorithm for solving MDPs that interactively guides groups of agents toward assigned goals while avoiding obstacles. For best performance, the algorithm runs over a hexagonal grid in the context of a Fully Observable MDP.
Problem Solving Strategy
• A Markov Decision Process is a tuple $M = \langle S, A, T, R \rangle$:
  • S is a finite set of states. In our case, 2D cells.
  • A is a finite set of actions. In our case, 6 directions.
  • T is a transition model T(s, a, s').
  • R is a reward function R(s).
• A policy $\pi$ is a solution that specifies the action for an agent at a given state. $\pi^*$ is the optimal policy.
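The state and action sets above can be illustrated with a small neighbor lookup on a hexagonal grid. This is only a sketch: the axial coordinate convention, the order of the six directions, and the names `HEX_DIRECTIONS` and `neighbors` are assumptions made for illustration and are not taken from the slides.

```python
# Six axial-coordinate offsets, one per action (assumed convention and order).
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def neighbors(cell, width, height):
    """Return, for each of the six actions, the neighboring cell or None.

    `cell` is a (q, r) pair in axial coordinates; a neighbor is None when the
    action would leave the width x height grid (an out-of-bounds move).
    """
    q, r = cell
    result = []
    for dq, dr in HEX_DIRECTIONS:
        nq, nr = q + dq, r + dr
        inside = 0 <= nq < width and 0 <= nr < height
        result.append((nq, nr) if inside else None)
    return result
```

An interior cell gets six distinct neighbors, while a corner cell such as `(0, 0)` gets `None` for the actions that point outside the grid.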
Problem Solving Strategy: Value Iteration
$$\pi_t^*(s) = \arg\max_a Q_t(s, a)$$
$$Q_t(s, a) = R(s, a) + \gamma \sum_{j=0}^{5} T^a_{sj} V^{j}_{t-1}$$
$$V_t(s) = Q_t(s, \pi^*(s)); \quad V_0(s) = 0$$
Here j runs over the six directions of a cell.
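The Value Iteration update can be sketched sequentially for a generic small MDP. This is a minimal sketch, not the GPU implementation described in this talk: the function name `value_iteration`, the list-based data layout, and the tolerance-based stopping rule are assumptions chosen for illustration.

```python
def value_iteration(rewards, transitions, gamma=0.9, tol=1e-9, max_iters=1000):
    """Sequential Value Iteration sketch.

    rewards[s][a]     : immediate reward R(s, a)
    transitions[s][a] : list of (probability, next_state) pairs
    Returns (values, policy) where policy[s] is the index of the best action.
    """
    n_states = len(rewards)
    values = [0.0] * n_states  # V_0(s) = 0
    policy = [0] * n_states
    for _ in range(max_iters):
        new_values = []
        for s in range(n_states):
            # Q_t(s, a) = R(s, a) + gamma * sum_j T(s, a, j) * V_{t-1}(j)
            q = [
                rewards[s][a]
                + gamma * sum(p * values[j] for p, j in transitions[s][a])
                for a in range(len(rewards[s]))
            ]
            best = max(range(len(q)), key=q.__getitem__)  # arg max over actions
            policy[s] = best
            new_values.append(q[best])  # V_t(s) = Q_t(s, pi*(s))
        delta = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values, policy


# Hypothetical two-state example: in state 0, action 0 stays (reward 0) and
# action 1 moves to the absorbing state 1 (reward 1).
example_rewards = [[0.0, 1.0], [0.0]]
example_transitions = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)]]]
example_values, example_policy = value_iteration(
    example_rewards, example_transitions, gamma=0.5
)
```

In the hypothetical two-state example, moving to the absorbing state is the better choice in state 0, so the returned policy selects action 1 there.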
Problem Solving Strategy
• We propose to temporarily override the optimal policy when agent density in a cell is above a certain threshold $\tau$.
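One possible reading of this override rule is sketched below. The slide does not say how the replacement action is chosen, so everything here is an assumption: the function name `choose_action`, the idea of falling back to the next-best action by Q-value, and measuring density on the cell an action leads to are all hypothetical.

```python
def choose_action(ranked_actions, target_density, tau):
    """Pick an action, overriding the optimal one when its target is crowded.

    ranked_actions : actions sorted best-first by Q-value (assumed available)
    target_density : dict mapping each action to the agent density of the
                     cell that action leads to (hypothetical measure)
    tau            : density threshold above which a cell counts as crowded
    """
    for action in ranked_actions:
        if target_density[action] <= tau:
            return action
    # Every reachable cell is crowded: keep the optimal action (assumption).
    return ranked_actions[0]
```

Under these assumptions the override is temporary by construction: as soon as the density of the optimal target drops to the threshold or below, the first entry of `ranked_actions` is returned again.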
A simplified example
Rewards per row (columns 1 to 4):
a: -3, -3, -3, +100
b: -3, -3, -100
c: -3, -3, -3, -3
$A = \{N, W, E\}$, $\gamma = 1$ (for simplicity).
Transitions: p = 0.8 (probability of taking the current action), q = 0.1 (probability of taking another action).
$$\pi_t^*(s) = \arg\max_a Q_t(s, a), \quad Q_t(s, a) = R(s, a) + \gamma \sum_{j=0}^{2} T^a_{sj} V^{j}_{t-1}$$
What is $\pi$ for cell a3?
$$\pi(a3) = \arg\max \{Q(a3, W), Q(a3, N), Q(a3, E)\}$$
$$Q(a3, E) = 100 + 1.0\,(0.8(100) + 0.1(-3) + 0.1(0))$$
$$Q(a3, W) = -3 + 1.0\,(0.1(100) + 0.8(-3) + 0.1(0))$$
$$Q(a3, N) = 0 + 1.0\,(0.1(100) + 0.1(-3) + 0.8(0))$$
=> the maximum is $Q(a3, E)$, so $\pi(a3) = E$.
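The arithmetic of this example can be checked with a few lines of Python. The sketch assumes, as the three expressions on the slide suggest, that the east neighbor contributes 100, the west neighbor -3, and the out-of-bounds north neighbor 0; the dictionary and function names are illustrative only.

```python
GAMMA = 1.0
P, Q = 0.8, 0.1  # probability of the chosen action / of each other action

# Values used on the slide for cell a3 (north is out of bounds, assumed 0).
neighbor_value = {"E": 100.0, "W": -3.0, "N": 0.0}
immediate_reward = {"E": 100.0, "W": -3.0, "N": 0.0}


def q_value(action):
    """Q(a3, action) = R + gamma * sum over neighbors of T * V."""
    expected = sum(
        (P if other == action else Q) * value
        for other, value in neighbor_value.items()
    )
    return immediate_reward[action] + GAMMA * expected


q_values = {action: q_value(action) for action in ("E", "W", "N")}
best_action = max(q_values, key=q_values.get)
```

Evaluating the three expressions gives approximately 179.7 for E, 4.6 for W and 9.7 for N, which agrees with the slide's conclusion that E is the best action for a3.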
Algorithm
Each Q-value has the form $Q(s, a) = R(s, a) + \gamma \sum_{j=0}^{2} T^a_{sj} V^{j}$, e.g. $Q(a3, E) = 100 + 1.0\,(0.8(100) + 0.1(-3) + 0.1(0))$. The algorithm evaluates it in three steps:
– Data collect: the current cell needs to know the rewards of neighboring cells and the out-of-bounds values.
– Input generation: build $T^a_{sj}$ and $R(s, a) = RW$.
– Value Iteration: the optimal policy is computed using parallel transformations and parallel reduction by key.
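Algorithm: input generation
• Transition matrix requirements:
$$T_P = \begin{bmatrix} p & \cdots & p \\ \vdots & \ddots & \vdots \\ p & \cdots & p \end{bmatrix}, \quad T^{Q}_{r,c} = \begin{bmatrix} q_i & \cdots & q_i \\ \vdots & \ddots & \vdots \\ q_i & \cdots & q_i \end{bmatrix}$$
$$D_A = \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix}, \quad D_B = \begin{bmatrix} 0 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 0 \end{bmatrix}$$
Dimensions: |A| x |A|, i.e. each cell can compute neighboring info.
$$q_i = \frac{q}{RE_i - 1}, \quad r \in [1, MDP_{rows}], \quad c \in [1, MDP_{columns}]$$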
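Algorithm: input generation
$$T_{r,c} = T_P \circ D_A + T^{Q}_{r,c} \circ D_B = \begin{bmatrix} p & q & q \\ q & p & q \\ q & q & p \end{bmatrix}$$
where $\circ$ is the element-wise product. In the simplified example (p = 0.8, q = 0.1), the rows of this matrix supply the transition weights:
$$Q(a3, E) = 100 + 1.0\,(0.8(100) + 0.1(-3) + 0.1(0))$$
$$Q(a3, W) = -3 + 1.0\,(0.1(100) + 0.8(-3) + 0.1(0))$$
$$Q(a3, N) = 0 + 1.0\,(0.1(100) + 0.1(-3) + 0.8(0))$$
Transition matrix $T^a_{sj}$ computation (each block $T_{r,c}$ represents a cell):
$$T^a_{sj} = \begin{bmatrix} T_{1,1} & \cdots & T_{1,MDP_{columns}} \\ \vdots & \ddots & \vdots \\ T_{MDP_{rows},1} & \cdots & T_{MDP_{rows},MDP_{columns}} \end{bmatrix}$$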
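The per-cell transition block can be reproduced with plain element-wise operations. The following is a minimal sketch using Python lists; the helper names `hadamard`, `add` and `cell_transition_matrix` are hypothetical, and a single value `q_i` is assumed to be already known for the cell.

```python
def hadamard(x, y):
    """Element-wise product of two equally sized matrices."""
    return [[a * b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(row_x, row_y)] for row_x, row_y in zip(x, y)]


def cell_transition_matrix(p, q_i, n_actions):
    """Build the |A| x |A| block with p on the diagonal and q_i elsewhere."""
    t_p = [[p] * n_actions for _ in range(n_actions)]    # all entries p
    t_q = [[q_i] * n_actions for _ in range(n_actions)]  # all entries q_i
    d_a = [[1 if i == j else 0 for j in range(n_actions)] for i in range(n_actions)]
    d_b = [[0 if i == j else 1 for j in range(n_actions)] for i in range(n_actions)]
    return add(hadamard(t_p, d_a), hadamard(t_q, d_b))
```

Calling `cell_transition_matrix(0.8, 0.1, 3)` yields rows with 0.8 on the diagonal and 0.1 elsewhere, matching the weights used in the simplified example. Because every entry is computed independently, the same construction maps naturally onto data-parallel element-wise operations.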
Algorithm: Parallel Value Iteration
1. Computation of Q-values.
$$\pi_t = RW + \gamma T^a_{sj} V$$
Consecutive parallel transformations (mult, mult, sum) result in a matrix Q that stores, for each cell, an |A|-tuple of candidate values, one per action.
Algorithm: Parallel Value Iteration
2. Selection of best Q-values.
– Parallel reduction: in every consecutive |A|-tuple of $\pi_t$, the index of the largest value indicates the current best policy.
3. Check for convergence.
– If $\pi_t - \pi_{t-1} = [0, \dots, 0]$, the policy has converged.
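The three steps can be emulated sequentially to show the data layout they rely on. This is a sketch in plain Python, not the Thrust implementation: the flat array layout (entry k belongs to cell `k // n_actions`), the function names, and the sample inputs are assumptions for illustration. On a GPU, each loop body would run as an independent parallel transformation or segmented reduction.

```python
def compute_q(rw, t_rows, neighbor_values, gamma, n_actions):
    """Step 1: one Q-value per (cell, action) entry of the flat arrays."""
    q = []
    for k, row in enumerate(t_rows):
        cell = k // n_actions  # the key shared by one cell's |A| entries
        products = [p * v for p, v in zip(row, neighbor_values[cell])]  # mult
        q.append(rw[k] + gamma * sum(products))  # mult by gamma, then sum
    return q


def reduce_by_key_argmax(q, n_actions):
    """Step 2: per cell, index of the largest value in its |A|-tuple."""
    policy = []
    for start in range(0, len(q), n_actions):
        segment = q[start:start + n_actions]
        policy.append(max(range(n_actions), key=segment.__getitem__))
    return policy


def converged(policy, previous_policy):
    """Step 3: the difference of consecutive policies is all zeros."""
    return all(a == b for a, b in zip(policy, previous_policy))


# Hypothetical input: two cells, three actions each.
block = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
sample_q = compute_q(
    rw=[100.0, -3.0, 0.0, 0.0, 50.0, -3.0],
    t_rows=block + block,
    neighbor_values=[[100.0, -3.0, 0.0], [0.0, 50.0, -3.0]],
    gamma=1.0,
    n_actions=3,
)
sample_policy = reduce_by_key_argmax(sample_q, 3)
```

With the hypothetical input, the first cell prefers its first action and the second cell its second action. Repeating steps 1 and 2 until `converged` returns true gives the stopping rule of step 3.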
Crowd Navigation Video
Results: test scenarios
• Office (1,584 cells)
• Maze (100x100 cells)
• Champ de Mars (100x100 cells)
Implementation: CUDA Thrust, with OpenMP and CUDA backends.
CPU: Intel Core i7 running at 3.40 GHz.
ARM (Jetson TK1): 32-bit quad-core ARM Cortex-A15 CPU running at 2.32 GHz.
GPUs: Tegra K1 (192 CUDA cores), Tesla K40c (2880 CUDA cores), GeForce GTX TITAN (2688 CUDA cores).
Results: GPU performance
Results: GPU speedup
Intel CPU baseline: 8 threads
ARM CPU baseline: 4 threads
Conclusion
• Parallelization of the proposed algorithm was made possible by formulating it in terms of massive matrix operations, leveraging the data parallelism of GPU computing to reduce the MDP solution time.
• We demonstrated that standard parallel transformation and reduction operations provide the means to solve MDPs via Value Iteration efficiently.
Conclusion
• Taking advantage of the proposed hexagonal grid partitioning method, our implementation provides a good level of space discretization and performance.
• We obtained a 90x speedup using GPUs, enabling us to simulate crowd behavior interactively.
• We found the Jetson TK1 GPU to have remarkable performance, opening many possibilities for incorporating real-time MDP solvers in mobile robotics.
Future Work
• Reinforcement learning. Evaluate different parameter values to obtain policy convergence in the least number of iterations without losing precision in the generated paths.
• Couple the MDP solver with a Local Collision Avoidance method to obtain more precise simulation results at the microscopic level.
• Investigate further applications of our MDP solver beyond the context of crowd simulation.
Further reading: Ruiz, S., Hernandez, B. “A parallel solver for Markov Decision Process in Crowd Simulation.” MICAI 2015, 14th Mexican International Conference on Artificial Intelligence, Cuernavaca, Mexico. IEEE. ISBN 978-1-5090-0323-5.
Thank you!
This research was partially supported by: CONACyT SNI-54067, CONACyT PhD scholarship 375247, Nvidia Hardware Grant and Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, under DOE Contract No. DE-AC05-00OR22725.
Additional Results: Intel CPU
Additional Results: ARM CPU
