

  1. A Dynamic Programming-based MCMC Framework for Solving DCOPs with GPUs
  Ferdinando Fioretto (1,2), joint work with William Yeoh (2) and Enrico Pontelli (2)
  (1) University of Michigan, (2) New Mexico State University
  CP 2016, Toulouse

  2. [Introduction · GPUs · DMCMC · Results · Conclusions]
  Distributed Discrete Optimization with Preferences

  3. GPUs
  • Every new desktop/laptop is now equipped with a graphics processing unit (GPU).
  • A GPU is a massively parallel architecture.
  • For most of their life, these GPUs sit idle.
  • General-purpose GPU applications: numerical analysis, bioinformatics, deep learning, MathWorks MATLAB.

  4. Outline
  • Introduction
  • GPUs
  • D-MCMC
  • Results
  • Conclusions

  5. Multi-Agent Constraint Optimization
  • A DCOP is a tuple <X, D, F, A, α>, where:
    • X is a set of variables.
    • D is a set of finite domains, one for each variable.
    • F is a set of constraints between variables.
    • A is a set of agents, controlling the variables in X.
    • α is a mapping from variables to agents.
  [Figure: agents control variables linked by constraints; a DCOP contrasted with a centralized solver running a centralized algorithm.]
  Example constraint utility table:
    x_a  x_b  U
    0    0    3
    0    1    20
    1    0    2
    1    1    5

  6. Multi-Agent Constraint Optimization
  • A DCOP is a tuple <X, D, F, A, α>, where:
    • X is a set of variables.
    • D is a set of finite domains, one for each variable.
    • F is a set of constraints between variables.
    • A is a set of agents, controlling the variables in X.
    • α is a mapping from variables to agents.
  [Figure: agent a_i controls boundary variables B_i (here x_1, x_3), shared with other agents, and local variables L_i (here x_2, x_4, x_5).]

  7. Multi-Agent Constraint Optimization
  • A DCOP is a tuple <X, D, F, A, α>, where:
    • X is a set of variables.
    • D is a set of finite domains, one for each variable.
    • F is a set of constraints between variables.
    • A is a set of agents, controlling the variables in X.
    • α is a mapping from variables to agents.
  • GOAL: find a utility-maximal assignment:
      x* = argmax_x F(x) = argmax_x Σ_{f ∈ F} f(x|_{scope(f)})
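As a concrete (toy, centralized) illustration of the objective above, a brute-force solver can enumerate every assignment and keep the utility-maximal one. The variable names and the single constraint table below are made up for the sketch, reusing the utilities shown on the earlier example slide; real DCOP algorithms are distributed and far more efficient than this enumeration.

```python
from itertools import product

# Toy DCOP: two binary variables x_a, x_b and one constraint (utility table).
domains = {"x_a": [0, 1], "x_b": [0, 1]}
constraints = [
    (("x_a", "x_b"), {(0, 0): 3, (0, 1): 20, (1, 0): 2, (1, 1): 5}),
]

def utility(assignment):
    # F(x) = sum over constraints f of f(x restricted to scope(f))
    return sum(table[tuple(assignment[v] for v in scope)]
               for scope, table in constraints)

def solve_brute_force():
    # x* = argmax_x F(x), enumerated over the full Cartesian product.
    names = list(domains)
    best = max(product(*(domains[n] for n in names)),
               key=lambda vals: utility(dict(zip(names, vals))))
    return dict(zip(names, best))

best = solve_brute_force()
print(best, utility(best))  # x_a=0, x_b=1 attains the maximal utility 20
```

The enumeration is exponential in the number of variables, which is exactly why the deck turns to sampling-based approximations next.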

  8. MCMC Sampling
  • MCMC algorithms approximate probability distributions.
  • They use a proposal distribution to generate a sequence of samples z(1), z(2), …, which forms a Markov chain.
  • The quality of the samples improves as a function of the number of steps.
  Source: http://xr0038.hatenadiary.jp/
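As a minimal illustration of these bullets (not the algorithm of this paper), a Metropolis–Hastings sampler draws candidates from a proposal distribution and accepts or rejects them, producing a Markov chain whose empirical distribution approaches the target as the number of steps grows. The target density and proposal below are arbitrary choices for the sketch.

```python
import math
import random

def metropolis_hastings(log_p, proposal, x0, steps, rng):
    """Generate a Markov chain z(1), z(2), ... targeting exp(log_p),
    using a symmetric proposal distribution."""
    x, samples = x0, []
    for _ in range(steps):
        cand = proposal(x, rng)            # draw from the proposal distribution
        accept = math.exp(min(0.0, log_p(cand) - log_p(x)))
        if rng.random() < accept:          # accept/reject step
            x = cand
        samples.append(x)
    return samples

# Toy target: standard normal; symmetric random-walk proposal.
rng = random.Random(0)
chain = metropolis_hastings(
    log_p=lambda z: -0.5 * z * z,
    proposal=lambda z, r: z + r.uniform(-1.0, 1.0),
    x0=5.0, steps=20000, rng=rng)
mean = sum(chain[5000:]) / len(chain[5000:])  # discard burn-in
```

Even though the chain starts far from the mode (at 5.0), the post-burn-in sample mean lands near the target's mean of 0, illustrating how sample quality improves with more steps.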

  9. MCMC Sampling
  • MCMC sampling algorithms can be used to solve DCOPs [Nguyen et al., AAMAS 2013].
  • MCMC sampling algorithms can be used to solve the maximum a posteriori (MAP) estimation problem.
  • The authors provide a mapping from solving a DCOP to solving a MAP problem.

  10. Graphics Processing Units (GPUs)
  • A GPU is a massively parallel architecture:
    • thousands of multi-threaded computing cores;
    • very high memory bandwidth;
    • ~80% of transistors devoted to data processing rather than caching.
  • However:
    • GPU cores are slower than CPU cores;
    • GPU memories have different sizes and access times;
    • GPU programming is more challenging and time-consuming.

  11. Execution Model
  • A thread is the basic parallel unit, identified by a thread ID.
  • Threads are organized into blocks.
  • Blocks are scheduled in parallel over several streaming multiprocessors (SMs).
  • Single Instruction Multiple Thread (SIMT) parallel model.
  [Figure: the CPU launches kernels on the GPU; each kernel is a grid of blocks, and each block is a grid of threads.]

  12. Memory Hierarchy
  • The GPU memory architecture is rather involved.
  • Registers: fastest; accessible only by a single thread; lifetime of the thread.
  • Shared memory: fast; accessible by all threads in a block.
  • Global memory: high access latency; potential for traffic congestion.
  [Figure: a grid of blocks, each with per-thread registers and per-block shared memory, above the device-wide global and constant memories accessed by the host.]

  13. CUDA: Compute Unified Device Architecture
  [Figure: the host (CPU) and the device (GPU).]

  14. CUDA: Compute Unified Device Architecture
  • Allocate device memory and copy the input data from the host into the device's global memory:
      cudaMalloc(&deviceV, sizeV);
      cudaMemcpy(deviceV, hostV, sizeV, ...);

  15. CUDA: Compute Unified Device Architecture
  • Kernel invocation; the launch configuration gives the number of blocks and the number of threads per block:
      cudaKernel<<<nBlocks, nThreads>>>(...);

  16. CUDA: Compute Unified Device Architecture
  • Copy the results back from the device's global memory to the host:
      cudaMemcpy(hostV, deviceV, sizeV, ...);

  17. D-MCMC: Related Work — D-Gibbs [Nguyen et al., AAMAS 2013]
  Algorithm 1: Gibbs(z_1, …, z_n)
    1  for i = 1 to n do
    2      z_i^0 ← Initialize(z_i)
    3  end
    4  for t = 1 to T do
    5      for i = 1 to n do
    6          z_i^t ← Sample(P(z_i | z_1^t, …, z_{i-1}^t, z_{i+1}^{t-1}, …, z_n^{t-1}))
    7      end
    8  end
  • Computing the normalizing constant can be expensive.
  • Many samples are needed to converge.
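The pseudocode above can be sketched in plain Python. The factor tables and variable names here are invented for illustration, and each conditional is normalized by exhaustive enumeration over the (small) variable domain, which is exactly the normalizing-constant cost the slide warns about.

```python
import math
import random

def gibbs(domains, factors, T, rng):
    """Algorithm 1: initialize z, then for T sweeps resample each z_i
    conditioned on the current values of all the other variables."""
    z = {v: rng.choice(dom) for v, dom in domains.items()}   # lines 1-3
    for _ in range(T):                                       # line 4
        for v, dom in domains.items():                       # line 5
            # Unnormalized P(z_v = d | rest), proportional to exp(total utility).
            weights = []
            for d in dom:
                z[v] = d
                score = sum(t[tuple(z[u] for u in scope)] for scope, t in factors)
                weights.append(math.exp(score))
            total = sum(weights)                             # normalizing constant Z
            z[v] = rng.choices(dom, weights=[w / total for w in weights])[0]  # line 6
    return z

# Toy model strongly favoring the assignment (x1=0, x2=1).
domains = {"x1": [0, 1], "x2": [0, 1]}
factors = [(("x1", "x2"), {(0, 0): 0.0, (0, 1): 3.0, (1, 0): 0.0, (1, 1): 0.0})]
sample = gibbs(domains, factors, T=50, rng=random.Random(42))
```

Any single returned sample is random, but across many chains the high-utility assignment (0, 1) dominates, which is how sampling approximates the DCOP maximization.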

  18. DMCMC
  • Each agent controls several variables.
  • Given values for its boundary variables, each agent can solve its local sub-problem independently of the other agents.
  [Figure: agents a_i and a_j, each with local variables L_i, L_j and boundary variables B_i, B_j connecting the two sub-problems.]

  19. DMCMC
  • Each agent controls several variables.
  • Given values for its boundary variables, it finds a solution for its local sub-problem using MCMC algorithms: Gibbs sampling and Metropolis–Hastings.
  [Figure: agent a_i with boundary variables B_i and local variables L_i.]
  Joint utility table:
    x_1  x_2  x_3  x_4  x_5  U
    0    0    3    2    5    21
    0    1    2    1    4    20
    0    2    3    5    1    32

  20. DMCMC: Local Sampling Process — 3 Levels of Parallelism
  • Level 1: each row of the joint utility table is computed in parallel, using several blocks [Fioretto et al., CP-15].
    x_1  x_2  x_3  x_4  x_5  U
    0    0    3    2    5    21
    0    1    2    1    4    20
    0    2    3    5    1    32

  21. DMCMC: Local Sampling Process — 3 Levels of Parallelism
  • Level 2: R multiple samples are drawn in parallel.
    x_1  x_2  x_3  x_4  x_5  U
    0    0    3    2    5    21
    0    0    2    1    4    20
    0    0    3    5    1    32
    0    1    2    1    4    20
    0    1    3    5    1    32
    0    1    2    1    4    20
    0    2    3    5    1    32
    …

  22. DMCMC: Local Sampling Process — 3 Levels of Parallelism
  • Level 3: the Gibbs sampling process itself; each thread evaluates the conditional for one domain value d_{i0}, d_{i1}, d_{i2}, …:
      q(x_k = d | x_l ∈ L_i \ {x_k}) = (1/Z) exp( Σ_{f_j ∈ F_i} f_j(z | x_{f_j}) )
  [Figure: within a block, one thread per domain value evaluates its copy of the conditional.]
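At this third level, one GPU thread is assigned to each domain value d: every thread evaluates the unnormalized conditional for its own value, and a normalization over the per-thread results yields the sampling distribution. A sequential Python sketch of what the threads compute jointly (the factor table and names are invented for illustration):

```python
import math

def gibbs_conditional(x_k, domain, assignment, factors):
    """Compute q(x_k = d | rest) for every d in domain.
    On the GPU, each iteration of this loop would be one thread."""
    scores = []
    for d in domain:                       # thread id <-> domain value d
        z = dict(assignment, **{x_k: d})
        s = sum(t[tuple(z[u] for u in scope)] for scope, t in factors)
        scores.append(math.exp(s))         # exp(sum of local utilities)
    Z = sum(scores)                        # normalizing constant
    return [s / Z for s in scores]

factors = [(("x1", "x2"), {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 0.0})]
q = gibbs_conditional("x1", [0, 1], {"x2": 1}, factors)
```

Because each thread touches only its own domain value, the per-value evaluations are independent; only the final normalization by Z requires combining results across threads.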

  23. Algorithm Design and Data Structures
  • Ensure data accesses are coalesced. [Figure: good vs. bad access patterns.]
  • Minimize the accesses to global memory.
  • Pad the utility tables' rows; use perfect hashing.
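A hedged sketch of the row-padding idea: pad each utility-table row out to a multiple of a fixed alignment (32 entries here, chosen to evoke a warp; the constant and layout are illustrative, not the paper's exact scheme) so that every row starts at an aligned offset and consecutive threads reading a row can coalesce their accesses.

```python
def pad_rows(table, align=32, fill=0):
    """Flatten a list-of-rows table, padding each row to a multiple
    of `align` entries so row r starts at offset r * padded_width."""
    width = max(len(row) for row in table)
    padded_width = ((width + align - 1) // align) * align  # round up
    flat = []
    for row in table:
        flat.extend(row)
        flat.extend([fill] * (padded_width - len(row)))    # pad to alignment
    return flat, padded_width

# Joint utility table from the earlier slide: rows of [x1..x5, U].
table = [[0, 0, 3, 2, 5, 21], [0, 1, 2, 1, 4, 20], [0, 2, 3, 5, 1, 32]]
flat, stride = pad_rows(table)  # each row now starts at a 32-aligned offset
```

The trade-off is extra memory for the fill entries in exchange for predictable, aligned row offsets, which is what coalesced global-memory access patterns need.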
