Exploring the Use of GPUs in Constraint Solving: A Preliminary Investigation
Federico Campeotto (1,2), Alessandro Dal Palù (3), Agostino Dovier (1), Ferdinando Fioretto (1,2), Enrico Pontelli (2)
1. Università di Udine   2. New Mexico State University   3. Università di Parma
San Diego, CA, January 2014
Introduction
Every new desktop/laptop comes equipped with a powerful graphics processing unit (GPU).
These GPUs are general purpose, i.e., we can program them.
For most of their life, however, they sit completely idle (unless some kid is continuously playing with your PC).
The question is: can we exploit this computing power for constraint solving?
We present a preliminary investigation, focusing on constraint solving.
Constraint Satisfaction Problems
A Constraint Satisfaction Problem (CSP) is defined by:
• X = {x_1, ..., x_n}, an n-tuple of variables;
• D = {D_{x_1}, ..., D_{x_n}}, the set of variable domains;
• C, a finite set of constraints over X: c(x_{i_1}, ..., x_{i_m}) is a relation c(x_{i_1}, ..., x_{i_m}) ⊆ D_{x_{i_1}} × ... × D_{x_{i_m}}.
A solution of a CSP is a tuple ⟨s_1, ..., s_n⟩ ∈ ×_{i=1}^{n} D_{x_i} such that for each c(x_{i_1}, ..., x_{i_m}) ∈ C we have ⟨s_{i_1}, ..., s_{i_m}⟩ ∈ c.
CSP solvers alternate two steps (a minimal sketch follows below):
1. Labeling: select a variable and (non-deterministically) assign it a value from its domain.
2. Constraint propagation: propagate the assignment through the constraints, possibly detecting inconsistencies.
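The labeling/propagation alternation can be pictured with a tiny host-side sketch in CUDA C++ (it also compiles as plain C++). This is an illustration only: the names Domain, propagate and search are hypothetical, the propagation step is stubbed out, and domains are capped at 32 values to keep the code short.

#include <cstdint>
#include <vector>

using Domain = uint32_t;                          // one bit per value, at most 32 values

// Stub propagation step: a real engine would prune domains here (e.g., run AC).
// This stub only reports failure when some domain has been wiped out.
bool propagate(std::vector<Domain>& D) {
    for (Domain d : D) if (d == 0) return false;
    return true;
}

bool search(std::vector<Domain>& D) {
    if (!propagate(D)) return false;              // 2. constraint propagation
    int x = -1;
    for (std::size_t i = 0; i < D.size(); ++i)    // pick the first unassigned variable
        if ((D[i] & (D[i] - 1)) != 0) { x = static_cast<int>(i); break; }
    if (x < 0) return true;                       // every variable is a singleton: solution
    Domain dom = D[x];
    for (int v = 0; v < 32; ++v) {                // 1. labeling: try each remaining value
        if (!(dom & (1u << v))) continue;
        std::vector<Domain> Dtry = D;
        Dtry[x] = 1u << v;                        // assign x = v
        if (search(Dtry)) { D = Dtry; return true; }
    }
    return false;                                 // all values failed: backtrack
}

A real engine would of course interleave a non-trivial propagation step with smarter variable and value selection heuristics.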
Consistency techniques
Idea: replace the current CSP with a "simpler", yet equivalent, one.
Definition (Arc Consistency). The most common notion of local consistency is arc consistency (AC). Consider a binary constraint c ∈ C with scp(c) = {x_i, x_j}, x_i, x_j ∈ X. We say that c is arc consistent if:
• ∀ a ∈ D_{x_i} ∃ b ∈ D_{x_j} such that (a, b) ∈ c;
• ∀ b ∈ D_{x_j} ∃ a ∈ D_{x_i} such that (a, b) ∈ c.
AC can be enforced by iteratively removing, from the domains of the variables involved in a constraint, all values that have no support in the constraint, until a fixpoint is reached (see the sketch below).
The propagation engine computes a mutual fixpoint of all the constraints.
Several algorithms based on a fixpoint loop achieve (arc) consistency: AC3, AC4, AC6, etc.
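A minimal sequential sketch of this revise-until-fixpoint idea, for a single binary constraint given extensionally as a list of allowed pairs. All names are illustrative; an actual engine works on many constraints at once and schedules revisions with a queue.

#include <cstdint>
#include <utility>
#include <vector>

using Domain = uint32_t;  // one bit per value

// Remove from D[xi] every value a with no support b in D[xj] such that (a, b) ∈ c.
// Returns true if D[xi] changed.
bool revise(std::vector<Domain>& D, int xi, int xj,
            const std::vector<std::pair<int, int>>& c) {
    Domain supported = 0;
    for (auto [a, b] : c)
        if ((D[xi] >> a & 1u) && (D[xj] >> b & 1u))
            supported |= 1u << a;
    bool changed = (D[xi] & ~supported) != 0;
    D[xi] &= supported;
    return changed;
}

// Naive fixpoint loop over a single binary constraint on {xi, xj}.
void make_arc_consistent(std::vector<Domain>& D, int xi, int xj,
                         const std::vector<std::pair<int, int>>& c) {
    std::vector<std::pair<int, int>> cT;              // transposed relation, for revising xj
    for (auto [a, b] : c) cT.emplace_back(b, a);
    bool changed = true;
    while (changed)
        changed = revise(D, xi, xj, c) | revise(D, xj, xi, cT);
}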
Why GPUs? GPUs, in a few minutes
Compute Unified Device Architecture (CUDA)
A GPU is a parallel machine with many computing cores, shared and local memories, and the ability to schedule the execution of a very large number of threads.
However, things are not that easy: cores are organized hierarchically, the different memories behave differently, ... it is not easy to obtain a good speed-up (a minimal kernel-launch sketch follows below).
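For readers new to CUDA, a minimal (hypothetical) example of the host/device split: the host allocates global memory, launches a kernel on a grid of thread blocks, and each thread handles one element. Names and sizes are illustrative.

#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(int* data, int n, int factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) data[i] *= factor;                    // one thread per element
}

int main() {
    const int n = 1 << 20;
    int* d_data;
    cudaMalloc(&d_data, n * sizeof(int));            // allocate device (global) memory
    cudaMemset(d_data, 0, n * sizeof(int));
    int threads = 256;
    int blocks = (n + threads - 1) / threads;        // enough blocks to cover n elements
    scale<<<blocks, threads>>>(d_data, n, 2);        // host launches the kernel on the GPU
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}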
CUDA: Host, Global, Device
[figure slides illustrating the host/device execution model]
CUDA: Memories
[figure illustrating the CUDA memory hierarchy]
How to...
Can we perform propagation on GPGPUs?
We will present a constraint engine that uses the GPU to propagate constraints in parallel.
Several issues: memory accesses, slow GPU cores, data transfers, ...
Different design choices.
Preliminary results.
Parallel Constraint Solving: Parallel Consistency
Establishing arc consistency is P-complete.
There are different parallel AC-based algorithms that can achieve 3-4x speedups.
Two main parallel strategies:
1. parallel AC algorithms using shared memory
2. distributed AC algorithms
We focus on a shared-memory AC algorithm.
Parallel AC algorithm - 1
Parallel algorithms for node and (bound) arc consistency.
Strategy: check consistency on all the arcs in the constraint queue simultaneously → O(nd) instead of O(ed^3).
We adopted three levels of parallelism (see the kernel sketch below):
• Constraints: one parallel block for each constraint;
• Variables: one parallel thread for each variable;
• CPU for efficient (cheap) propagators and GPU for expensive propagators.
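A hedged CUDA sketch of the first two levels of parallelism: one block per constraint in the propagation queue, one thread per variable in that constraint's scope. The propagator shown is a simple binary disequality x != y on bitset domains; the struct layout and all names are our own illustration, not the solver's actual code.

#include <cuda_runtime.h>

struct Constraint {
    int scope[2];   // indices of the two variables (binary constraint)
};

// Device-side propagator for x != y on bitset domains:
// if the other variable is bound to a single value, remove that value from D[var].
__device__ void prune_neq(unsigned* D, int var, int other, int* changed) {
    unsigned od = D[other];
    if (od != 0 && (od & (od - 1)) == 0) {           // other domain is a singleton
        unsigned newd = D[var] & ~od;
        if (newd != D[var]) {
            D[var] = newd;
            atomicExch(&changed[var], 1);            // raise an event for var
        }
    }
}

__global__ void propagate_queue(unsigned* D,             // bitset domains, one per variable
                                const Constraint* queue, // constraints scheduled for propagation
                                int* changed) {          // per-variable event flags
    const Constraint c = queue[blockIdx.x];              // block <-> constraint
    int t = threadIdx.x;                                 // thread <-> variable in the scope
    if (t < 2)
        prune_neq(D, c.scope[t], c.scope[1 - t], changed);
}

A real engine would dispatch on the constraint type inside the block and, for wider constraints, iterate or synchronize within the block until the constraint's local fixpoint is reached.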
Parallel AC algorithm - 2
[figure]
Parallel AC algorithm - 3
The constraint engine is based on the notion of events (it is not AC3!).
Event: a change in the domain of a variable.
The queue of propagators is updated accordingly (a host-side sketch follows below)...
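A host-side sketch of this event-driven scheduling, assuming each propagator subscribes to the variables it watches. Event, subscribers and on_event are hypothetical names, not the engine's actual API.

#include <queue>
#include <set>
#include <vector>

enum class Event { kNone, kBoundsChange, kValueRemoved, kInstantiated };

struct Engine {
    std::vector<std::vector<int>> subscribers;  // variable -> ids of propagators watching it
    std::queue<int> pending;                    // propagators waiting to run
    std::set<int> enqueued;                     // avoid duplicate entries in the queue

    // Called whenever propagation reports an event on variable `var`.
    void on_event(int var, Event e) {
        if (e == Event::kNone) return;
        for (int p : subscribers[var])
            if (enqueued.insert(p).second)      // enqueue each propagator at most once
                pending.push(p);
    }
};

When a propagator is popped and executed, it is removed from `enqueued` so that later events can reschedule it.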
Choices: Domain representation
Domains are represented as bitsets.
Four extra fields are used per domain: (1) sign, (2) min, (3) max, and (4) event (see the sketch below).
Using bit-wise operators on domains reduces the performance gap between GPU cores and CPU cores.
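A possible reading of this layout, as a sketch. The exact meaning of the four fields is our assumption; the bit-wise removal routine illustrates why the same propagator code can run on both CPU and GPU cores.

#include <cstdint>

struct BitsetDomain {
    uint32_t bits;      // bit i set  <=>  value i is still in the domain
    int32_t  sign;      // status flag (assumed: e.g. failed / singleton / standard domain)
    int32_t  min;       // smallest value still in the domain
    int32_t  max;       // largest value still in the domain
    int32_t  event;     // last event raised on this domain (assumed encoding)
};

// Bit-wise pruning works identically on CPU and GPU cores.
__host__ __device__ inline bool remove_value(BitsetDomain& d, int v) {
    uint32_t m = 1u << v;
    if (!(d.bits & m)) return false;   // value already absent: no event
    d.bits &= ~m;
    if (v == d.min) while (d.min <= d.max && !(d.bits & (1u << d.min))) ++d.min;
    if (v == d.max) while (d.max >= d.min && !(d.bits & (1u << d.max))) --d.max;
    d.event = 1;                       // illustrative "domain changed" event code
    if (d.bits == 0) d.sign = -1;      // illustrative "failed" status
    return true;
}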
Choices: Status representation
The status of the computation is represented by a vector of M · |V| integer values, where M is a multiple of 32.
We take advantage of the device cache: global memory accesses are cached and served as 128-byte memory transactions.
Coalesced memory accesses: accesses by consecutive threads target contiguous locations in global memory (see the sketch below).
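A sketch of how such a flat layout makes warp accesses coalesce, assuming M = 32 words per variable. The value of M and the kernel itself are illustrative only.

#include <cuda_runtime.h>

// M integers per variable, stored contiguously; M is a multiple of 32.
#define M 32

__global__ void copy_status(const int* __restrict__ status,   // M * |V| integers
                            int* __restrict__ snapshot,
                            int num_vars) {
    int var  = blockIdx.x;          // one block per variable
    int word = threadIdx.x;         // one thread per word of that variable's record
    if (var < num_vars && word < M)
        // threads 0..31 of a warp read status[var*M + 0 .. var*M + 31]:
        // 32 consecutive 4-byte words, i.e. one 128-byte transaction
        snapshot[var * M + word] = status[var * M + word];
}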
Choices: Data transfers
[figure]