GPU Accelerated Markov Decision Processes in Crowd Simulation

Sergio Ruiz
Computer Science Department, Tecnológico de Monterrey, CCM, Mexico City, México
sergio.ruiz.loza@itesm.mx

Benjamín Hernández
National Center for Computational Sciences, Oak Ridge National Laboratory, Tennessee, USA
hernandezarb@ornl.gov
Contents
• Introduction
• Optimization Approaches
• Problem solving strategy
• A simple example
• Algorithm description
• Results
• Conclusions & future work
Crowd Simulation
• Path Planning
• Local Collision Avoidance (LCA)
Optimization Approaches
• According to (Reyes et al. 2009, Foka and Trahanias 2003), Markov Decision Processes (MDPs) are computationally inefficient: as the state space grows, the problem becomes intractable.
• Decomposition offers the possibility of solving large MDPs (Sucar 2007, Meuleau et al. 1998, Singh and Cohn 1998), either by state space decomposition or by process decomposition.
• (Mausam and Weld 2004) follow the idea of concurrency to solve MDPs, generating near-optimal solutions by extending the Labeled Real-Time Dynamic Programming method.
Optimization Approaches
• (Sucar 2007) proposes a parallel implementation of weakly coupled MDPs.
• (Jóhansson 2009) presents a dynamic programming framework that implements the Value Iteration algorithm to solve MDPs using CUDA.
• (Noer 2013) explores the design and implementation of a point-based Value Iteration algorithm for Partially Observable MDPs (POMDPs) with approximate solutions. The GPU implementation supports belief state pruning, which avoids unnecessary calculations.
Problem Solving Strategy
• We propose a parallel Value Iteration MDP solving algorithm to guide groups of agents toward assigned goals while avoiding obstacles interactively. For optimal performance, the algorithm runs over a hexagonal grid in the context of a Fully Observable MDP.
Problem Solving Strategy
A Markov Decision Process is a tuple $M = (S, A, T, R)$:
• $S$ is a finite set of states. In our case, 2D cells.
• $A$ is a finite set of actions. In our case, 6 directions.
• $T$ is a transition model $T(s, a, s')$.
• $R$ is a reward function $R(s)$.
A policy $\pi$ is a solution that specifies the action for an agent at a given state. $\pi^*$ is the optimal policy.
Problem Solving Strategy: Value Iteration
$\pi^*_t(s) = \arg\max_a Q_t(s, a)$
$Q_t(s, a) = R(s, a) + \gamma \sum_{j=0}^{5} T_{sj}\, V_{t-1}(j)$
$V_t(s) = Q_t(s, \pi^*(s)); \quad V_0(s) = 0$
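As a minimal illustration of the update above, the following sequential C++ sketch performs one Value Iteration sweep over a grid with six neighbors per cell (the hexagonal layout). The container layout, the neighbor table, and the out-of-bounds convention (a value of 0) are assumptions for illustration only, not the authors' implementation.

```cpp
#include <vector>
#include <array>
#include <algorithm>

// One Value Iteration sweep:
//   V_t(s) = max_a [ R(s,a) + gamma * sum_j T_{sj} * V_{t-1}(j) ]
// neighbors[s][d] is the cell reached from s in hex direction d, or -1 when
// the move leaves the grid (its value then contributes 0).
std::vector<float> valueIterationSweep(
    const std::vector<std::array<int, 6>>& neighbors,
    const std::vector<std::array<float, 6>>& R,   // R[s][a]
    const std::vector<float>& Vprev,              // V_{t-1}
    float gamma, float p, float q)                // p: intended move, q: drift
{
    std::vector<float> V(Vprev.size());
    for (size_t s = 0; s < Vprev.size(); ++s) {
        float best = -1e30f;
        for (int a = 0; a < 6; ++a) {
            // Expected value of the successor: probability p of moving in the
            // intended direction a, probability q of drifting to each other one.
            float expected = 0.0f;
            for (int d = 0; d < 6; ++d) {
                int j = neighbors[s][d];
                float vj = (j >= 0) ? Vprev[j] : 0.0f;   // out of bounds -> 0
                expected += ((d == a) ? p : q) * vj;
            }
            best = std::max(best, R[s][a] + gamma * expected);
        }
        V[s] = best;   // value of the best action, i.e. Q_t(s, pi*(s))
    }
    return V;
}
```

Sweeps are repeated until the values (or the induced policy) stop changing, which is exactly what the parallel formulation in the following slides accelerates.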
Problem Solving Strategy
• We propose to temporarily override the optimal policy when the agent density in a cell is above a certain threshold $\tau$.
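One possible reading of this override is sketched below, under the assumption that an agent falls back to the best-valued direction whose target cell is still below the threshold; the slide does not specify the fallback rule, so the function and its names are purely illustrative.

```cpp
#include <array>
#include <vector>

// Sketch (not the authors' code): among the 6 hex directions, pick the
// highest-valued one whose target cell holds fewer than tau agents; if every
// neighbor is saturated, keep the optimal action anyway.
int chooseAction(const std::array<float, 6>& Q,          // Q-values of this cell
                 const std::array<int, 6>& neighbors,    // -1 = out of bounds
                 const std::vector<int>& agentsPerCell,  // current density
                 int tau)
{
    int best = 0, bestFree = -1;
    for (int a = 1; a < 6; ++a)
        if (Q[a] > Q[best]) best = a;                    // optimal policy pi*
    for (int a = 0; a < 6; ++a) {
        int j = neighbors[a];
        bool free = (j >= 0) && (agentsPerCell[j] < tau);
        if (free && (bestFree < 0 || Q[a] > Q[bestFree])) bestFree = a;
    }
    return (bestFree >= 0) ? bestFree : best;            // override pi* if possible
}
```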
A simplified example
$\pi^*_t(s) = \arg\max_a Q_t(s, a)$
$Q_t(s, a) = R(s, a) + \gamma \sum_{j=0}^{2} T_{sj}\, V_{t-1}(j)$

Grid rewards:
      1    2    3    4
  a  -3   -3   -3  +100
  b  -3   -3        -100
  c  -3   -3   -3    -3

What is $\pi$ for cell a3?
$A = \{N, W, E\}$; $\gamma = 1$ (for simplicity)
Transitions: $p = 0.8$ (probability of taking the intended action), $q = 0.1$ (probability of taking another action)

$\pi(a3) = \max\{Q(a3, W), Q(a3, N), Q(a3, E)\}$
$Q(a3, E) = 100 + 1.0\,(0.8(100) + 0.1(-3) + 0.1(0))$
$Q(a3, W) = -3 + 1.0\,(0.1(100) + 0.8(-3) + 0.1(0))$
$Q(a3, N) = 0 + 1.0\,(0.1(100) + 0.1(-3) + 0.8(0))$
=> the maximum is $Q(a3, E)$
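The arithmetic above can be verified with a few standalone lines of C++; this snippet only reproduces the three Q-values of the slide and is not part of the solver.

```cpp
#include <cstdio>

// Q-values for cell a3 with gamma = 1, p = 0.8, q = 0.1 (values from the slide).
int main() {
    float QE = 100 + 1.0f * (0.8f * 100 + 0.1f * -3 + 0.1f * 0);  // = 179.7
    float QW =  -3 + 1.0f * (0.1f * 100 + 0.8f * -3 + 0.1f * 0);  // =   4.6
    float QN =   0 + 1.0f * (0.1f * 100 + 0.1f * -3 + 0.8f * 0);  // =   9.7
    std::printf("E=%.1f  W=%.1f  N=%.1f\n", QE, QW, QN);          // E is the argmax
    return 0;
}
```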
Algorithm
• Data collection: each cell needs to know the rewards from its neighboring cells and the out-of-bounds values.
• Input generation: build the transition matrices $T^a_{sj}$ and the reward terms $R(s, a) = RW$.
• Value Iteration: the optimal policy is computed using parallel transformations and a parallel reduction by key.
Algorithm: input generation
• Transition matrix requirements. Each matrix is $|A| \times |A|$, i.e., each cell can compute its neighboring info:

$Q = \begin{bmatrix} p & \cdots & p \\ \vdots & \ddots & \vdots \\ p & \cdots & p \end{bmatrix} \qquad T_P = \begin{bmatrix} q_i & \cdots & q_i \\ \vdots & \ddots & \vdots \\ q_i & \cdots & q_i \end{bmatrix}$

$D_A = \begin{bmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{bmatrix} \qquad D_B = \begin{bmatrix} 0 & \cdots & 1 \\ \vdots & \ddots & \vdots \\ 1 & \cdots & 0 \end{bmatrix}$

with $q_i = RE_{i-1}$, $r \in [1, MDP_{rows}]$, $c \in [1, MDP_{columns}]$.
Algorithm: input generation
• Each per-cell block combines $Q$ and $T_P$ through element-wise (Hadamard) products with the masks $D_A$ and $D_B$:

$T_{r,c} = Q \circ D_A + T_P \circ D_B = \begin{bmatrix} p & q & q \\ q & p & q \\ q & q & p \end{bmatrix}$

(shown here for $|A| = 3$ as in the simplified example: the intended-move probability $p$ on the diagonal, the drift probability $q$ elsewhere). Each block $T_{r,c}$ represents a cell.
• Transition matrix $T^a_{sj}$ computation:

$T^a_{sj} = \begin{bmatrix} T_{1,1} & \cdots & T_{1,MDP_{columns}} \\ \vdots & \ddots & \vdots \\ T_{MDP_{rows},1} & \cdots & T_{MDP_{rows},MDP_{columns}} \end{bmatrix}$
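A small sketch of how one such per-cell block could be assembled is given below, assuming $D_A$ is the identity mask and $D_B$ its off-diagonal complement; boundary and obstacle adjustments are omitted and the function name is illustrative.

```cpp
#include <vector>

// Build one |A| x |A| per-cell block T_{r,c} = Q o D_A + T_P o D_B:
// the intended-move probability p on the diagonal (identity mask D_A) and the
// drift probability q everywhere else (complement mask D_B), stored row-major.
std::vector<float> buildCellBlock(int A, float p, float q)
{
    std::vector<float> Trc(A * A);
    for (int i = 0; i < A; ++i)
        for (int j = 0; j < A; ++j) {
            float DA = (i == j) ? 1.0f : 0.0f;   // identity mask
            float DB = 1.0f - DA;                // off-diagonal mask
            Trc[i * A + j] = p * DA + q * DB;    // Q o D_A + T_P o D_B
        }
    return Trc;
}
```

The full $T^a_{sj}$ is then the block matrix that places one such block per cell of the grid, as shown on the slide above.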
Algorithm: Parallel Value Iteration
1. Computation of Q-values:
$Q_t = RW + \gamma\, T^a_{sj}\, V$
Consecutive parallel transformations (mult, mult, sum) result in a matrix $Q_t$ that stores an $|A|$-tuple of values, one per action, for each cell.
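One way to realize this step with CUDA Thrust (the library named in the implementation) is sketched below. The flattened data layout, the functor names, and the use of a key-wise reduction for the inner sum are assumptions for illustration, not the authors' actual kernels.

```cpp
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/discard_iterator.h>

// Assumed layout:
//   tv[k]  = T_{s,a,j} * V(j), flattened so that each consecutive run of A
//            entries belongs to one (cell, action) row;
//   reward = R(s,a) = RW, one entry per (cell, action) row.
// The key-wise sum plus the final transform give Q_t = RW + gamma * T * V.
struct RowIndex {                 // maps a flat index to its (cell, action) row
    int A;
    __host__ __device__ int operator()(int k) const { return k / A; }
};

struct AddReward {                // r + gamma * sum_j T_{s,a,j} V(j)
    float gamma;
    __host__ __device__ float operator()(float r, float sumTV) const {
        return r + gamma * sumTV;
    }
};

void computeQ(const thrust::device_vector<float>& tv,      // size: rows * A
              const thrust::device_vector<float>& reward,  // size: rows
              thrust::device_vector<float>& Q,             // size: rows
              int A, float gamma)
{
    int rows = reward.size();
    Q.resize(rows);
    thrust::device_vector<float> sums(rows);
    // sum over j for every (cell, action) row
    thrust::reduce_by_key(
        thrust::make_transform_iterator(thrust::counting_iterator<int>(0), RowIndex{A}),
        thrust::make_transform_iterator(thrust::counting_iterator<int>(rows * A), RowIndex{A}),
        tv.begin(),
        thrust::make_discard_iterator(),
        sums.begin());
    // Q(s,a) = R(s,a) + gamma * sum
    thrust::transform(reward.begin(), reward.end(), sums.begin(), Q.begin(),
                      AddReward{gamma});
}
```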
Algorithm: Parallel Value Iteration
2. Selection of best Q-values:
• Parallel reduction by key: within every consecutive $|A|$-tuple of $Q_t$, the index of the largest value gives the current best policy for that cell.
3. Check for convergence:
• Stop when $\pi_t - \pi_{t-1} = [0, \ldots, 0]$, i.e., no cell changes its chosen action.
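Steps 2 and 3 can likewise be sketched with Thrust: a key-wise maximum over (value, action) pairs yields the per-cell argmax, and a vector comparison detects convergence. The data layout matches the previous sketch and is, again, an assumption rather than the authors' code.

```cpp
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/equal.h>
#include <thrust/functional.h>
#include <thrust/tuple.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/discard_iterator.h>

struct CellIndex {                 // flat index -> cell id
    int A;
    __host__ __device__ int operator()(int k) const { return k / A; }
};
struct ActionIndex {               // flat index -> action id within the cell
    int A;
    __host__ __device__ int operator()(int k) const { return k % A; }
};

typedef thrust::tuple<float, int> QA;   // (Q-value, action index)

struct ArgMax {                    // keep the pair with the larger Q-value
    __host__ __device__ QA operator()(const QA& x, const QA& y) const {
        return (thrust::get<0>(x) >= thrust::get<0>(y)) ? x : y;
    }
};

// Writes the best action per cell into `policy` and returns true when the
// policy is identical to `prevPolicy` (convergence).
bool bestActions(const thrust::device_vector<float>& Q,       // size: cells * A
                 thrust::device_vector<int>& policy,          // size: cells
                 const thrust::device_vector<int>& prevPolicy,
                 int A)
{
    int n = Q.size();
    thrust::device_vector<float> bestQ(policy.size());
    auto keys  = thrust::make_transform_iterator(thrust::counting_iterator<int>(0), CellIndex{A});
    auto acts  = thrust::make_transform_iterator(thrust::counting_iterator<int>(0), ActionIndex{A});
    auto pairs = thrust::make_zip_iterator(thrust::make_tuple(Q.begin(), acts));
    // per-cell argmax via reduction by key over (value, action) pairs
    thrust::reduce_by_key(keys, keys + n, pairs,
                          thrust::make_discard_iterator(),
                          thrust::make_zip_iterator(
                              thrust::make_tuple(bestQ.begin(), policy.begin())),
                          thrust::equal_to<int>(), ArgMax());
    // converged when no cell changed its chosen action
    return thrust::equal(policy.begin(), policy.end(), prevPolicy.begin());
}
```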
Crowd Navigation Video
https://www.youtube.com/watch?v=369td2O8dxY
Results: test scenarios
• Scenarios: Office (1,584 cells), Maze (100x100 cells), Champ de Mars (100x100 cells)
• Implementation: CUDA Thrust with OpenMP and CUDA backends
• CPU: Intel Core i7 running at 3.40 GHz
• ARM (Jetson TK1): 32-bit ARM quad-core Cortex-A15 CPU running at 2.32 GHz
• GPUs: Tegra K1 (192 CUDA cores), Tesla K40c (2880 CUDA cores), GeForce GTX TITAN (2688 CUDA cores)
Results: GPU performance
Results: GPU speedup
• Intel CPU baseline: 8 threads
• ARM CPU baseline: 4 threads
Conclusion
• Parallelization of the proposed algorithm was made possible by formulating it in terms of massive matrix operations, leveraging the data parallelism of GPU computing to reduce the MDP solution time.
• We demonstrated that standard parallel transformation and reduction operations provide the means to solve MDPs via Value Iteration with high performance.
Conclusion
• Taking advantage of the proposed hexagonal grid partitioning method, our implementation provides a good level of space discretization and performance.
• We obtained a 90x speedup using GPUs, enabling us to simulate crowd behavior interactively.
• We found the Jetson TK1 GPU to deliver remarkable performance, opening many possibilities for incorporating real-time MDP solvers into mobile robotics.
Future Work
• Reinforcement learning: evaluate different parameter values to obtain policy convergence in the fewest iterations without losing precision in the generated paths.
• Couple the MDP solver with a Local Collision Avoidance method to obtain more precise simulation results at the microscopic level.
• Investigate further applications of our MDP solver beyond the context of crowd simulation.
GPU Accelerated Markov Decision Processes in Crowd Simulation

Further reading:
Ruiz, S. and Hernández, B. "A parallel solver for Markov Decision Process in Crowd Simulation," MICAI 2015, 14th Mexican International Conference on Artificial Intelligence, Cuernavaca, Mexico, IEEE. ISBN 978-1-5090-0323-5.

Thank you!

Benjamín Hernández
National Center for Computational Sciences, Oak Ridge National Laboratory, Tennessee, USA
hernandezarb@ornl.gov

Sergio Ruiz
Computer Science Department, Tecnológico de Monterrey, CCM, Mexico City, México
sergio.ruiz.loza@itesm.mx

This research was partially supported by: CONACyT SNI-54067, CONACyT PhD scholarship 375247, an NVIDIA Hardware Grant, and the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory, under DOE Contract No. DE-AC05-00OR22725.
Additional Results: Intel CPU
Additional Results: ARM CPU