Simulation of Computing P Systems: A GPU Design for the Factorization Problem
Miguel Á. Martínez-del-Amor, David Orellana-Martín, Ignacio Pérez-Hurtado, Luis Valencia-Cabrera, Agustín Riscos-Núñez, Mario J. Pérez-Jiménez
Research Group on Natural Computing
Dept. Computer Science and Artificial Intelligence
Universidad de Sevilla
CMC19, 4-7 September 2018, Dresden (Germany)
M.Á. Martínez-del-Amor et al. (RGNC) Simulation of Computing P Systems CMC19, Dresden (Germany)

Contents
1. GPU computing fundamentals
2. GPU simulators for P systems
   - Structure of a GPU simulator
   - State of the art
   - Other P system models
3. Concepts for specific simulators
4. Future research lines

1. GPU computing fundamentals

GPU computing
- Graphics Processing Unit (GPU).
- Data-parallel computing model: SPMD programming model (Single Program, Multiple Data: the same program runs on multiple data).
- Shared memory system.
- New programming languages: CUDA, OpenCL, DirectCompute.
- A GPU features thousands of cores.

NVIDIA's technology: CUDA programming model [1]
- Heterogeneous model: CPU (host) + GPU (device).
- All threads execute the same code (kernel) in parallel.
- Three-level hierarchy of threads (grid, blocks, threads).
- Memory hierarchy (global, shared within block).
[1] W.-M. Hwu, D. Kirk. Programming massively parallel processors, Morgan Kaufmann, 2010.
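The thread hierarchy above can be illustrated with a small CPU-side emulation in Python. This is only a sketch and not CUDA code: the helper names (`global_thread_ids`, `run_kernel`, `double_kernel`) are hypothetical, the grid and blocks are one-dimensional, and the emulated threads run one after another, whereas on a GPU they would run in parallel.

```python
# CPU-side emulation of CUDA's 1-D thread indexing (illustrative only).

def global_thread_ids(grid_dim, block_dim):
    """Yield (block_idx, thread_idx, global_id) for a 1-D grid of 1-D blocks."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            # Same formula a CUDA kernel typically uses:
            # blockIdx.x * blockDim.x + threadIdx.x
            yield block_idx, thread_idx, block_idx * block_dim + thread_idx


def run_kernel(kernel, grid_dim, block_dim, *args):
    """Run `kernel` once per emulated thread (sequentially in this sketch)."""
    for block_idx, thread_idx, gid in global_thread_ids(grid_dim, block_dim):
        kernel(block_idx, thread_idx, gid, *args)


def double_kernel(block_idx, thread_idx, gid, data):
    # Every thread runs the same code on a different element (SPMD).
    # The bounds check matters because 2 x 4 = 8 threads cover 5 elements.
    if gid < len(data):
        data[gid] *= 2


data = [1, 2, 3, 4, 5]
run_kernel(double_kernel, 2, 4, data)  # grid of 2 blocks, 4 threads each
print(data)  # [2, 4, 6, 8, 10]
```

The bounds check inside the kernel reflects a common pattern: the grid is usually sized in whole blocks, so some threads have no element to process.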

Why is the GPU interesting for simulating P systems?
Desired properties:
- High level of parallelism (up to 4000 cores).
- Shared memory system (easily synchronized).
- Scalability and portability.
- Known languages: C/C++, Python, Fortran...
- Cheap technology everywhere (cost and maintenance).
Undesired properties:
- Best performance requires a lot of research.
- Programming model imposes many restrictions.

2. GPU simulators for P systems: Structure of a GPU simulator

GPU simulator workflow - Initialization (I)
CPU (serial code):
1. Read P system information: P system model description + initial configuration.
2. Allocate memory in GPU (incl. all possible membranes to be generated during computation).
3. Copy P system information to GPU.
4. Copy P system initial config to GPU.
GPU (parallel code), GPU memory:
- P system info (rules, alphabet).
- P system configuration.
- Auxiliary (rule selection).

GPU simulator workflow - Simulation - Selection (II)
CPU (serial code):
1. Read P system information: P system model description + initial configuration.
2. Allocate memory in GPU.
3. Copy P system information to GPU.
4. Copy P system initial config to GPU.
5. Call to Selection Kernel(s), which run on the GPU grid.
GPU (parallel code), GPU memory:
- P system info.
- P system configuration.
- Auxiliary.

GPU simulator workflow - Simulation - Execution (III)
CPU (serial code):
1. Read P system information: P system model description + initial configuration.
2. Allocate memory in GPU.
3. Copy P system information to GPU.
4. Copy P system initial config to GPU.
5. Call to Selection Kernel(s), which run on the GPU grid.
6. Call to Execution Kernel(s), which run on the GPU grid.
REPEAT steps 5 and 6.
GPU (parallel code), GPU memory:
- P system info.
- P system configuration.
- Auxiliary.

GPU simulator workflow - Wrap up (IV)
CPU (serial code):
1. Read P system information: P system model description + initial configuration.
2. Allocate memory in GPU (incl. all possible membranes to be generated during computation).
3. Copy P system information to GPU.
4. Copy P system initial config to GPU.
5. Call to Selection Kernel(s).
6. Call to Execution Kernel(s).
7. Copy P system configuration(s) back to CPU memory.
8. Report outcome of simulation.
GPU (parallel code), GPU memory:
- P system info.
- P system configuration.
- Auxiliary.
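The four workflow phases can be sketched as a host-side driver loop. The Python below is a minimal illustration under strong assumptions, not the simulator presented in the talk: the "GPU memory" is just a deep copy, the toy model has a single membrane with non-cooperative rewriting rules, and the names `simulate`, `select` and `execute` are hypothetical stand-ins for the selection and execution kernels.

```python
import copy


def select(rules, config):
    """Selection phase: each object with a rule is consumed maximally."""
    return {obj: n for obj, n in config.items() if obj in rules and n > 0}


def execute(rules, config, selection):
    """Execution phase: remove consumed objects, then add the products."""
    new_config = dict(config)
    for obj, times in selection.items():
        new_config[obj] -= times
    for obj, times in selection.items():
        for product in rules[obj]:
            new_config[product] = new_config.get(product, 0) + times
    return {obj: n for obj, n in new_config.items() if n > 0}


def simulate(rules, initial_config, max_steps):
    # Initialization: the deep copies stand in for host-to-device transfers.
    dev_rules = copy.deepcopy(rules)
    dev_config = copy.deepcopy(initial_config)
    # Simulation: repeat selection + execution until no rule is applicable.
    for _ in range(max_steps):
        selection = select(dev_rules, dev_config)
        if not selection:
            break
        dev_config = execute(dev_rules, dev_config, selection)
    # Wrap up: copy the final configuration back to "CPU memory".
    return copy.deepcopy(dev_config)


# Toy rules (assumed for illustration): a -> b b, b -> c
rules = {"a": ["b", "b"], "b": ["c"]}
final = simulate(rules, {"a": 1}, max_steps=10)
print(final)  # {'c': 2}
```

Splitting each step into a selection phase and an execution phase mirrors the two kinds of kernels in the workflow: rules are first chosen against a fixed configuration, and only afterwards is the configuration updated.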

2. GPU simulators for P systems: State of the art

Simulation approaches
- Generic approach: simulator for a variant / class (under restrictions).
- Specific approach: simulator for a certain family / model.

Simulating models (“generic” approach)
P systems with active membranes:
- Rooted tree of membranes.
- Polarization and no cooperation (only one object in LHS).
- Rules: evolution, send-in, send-out, division and dissolution.
Assumptions to simplify the simulator:
- Confluent models.
- Only two-level trees (skin and elementary membranes).

Simulating models (“generic” approach)
Mapping double parallelism:
- Membranes to thread blocks.
- Objects to threads: thanks to non-cooperative rules, it is enough to check the existence of one object to trigger a rule.
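The mapping above can be sketched as follows, again as a sequential Python emulation rather than real GPU code. The alphabet, the rule table `RULES` and the function names are hypothetical. Each membrane plays the role of a thread block and each alphabet symbol the role of a thread; because the rules are non-cooperative, a thread only needs the membrane charge and the multiplicity of its own object. The sketch ignores the competition between rules that real active-membrane semantics impose within a membrane.

```python
ALPHABET = ["a", "b", "c"]

# Hypothetical rule table: (object, membrane charge) -> rule name.
RULES = {("a", "0"): "evolve a", ("c", "+"): "send-out c"}


def selection_kernel(block_idx, thread_idx, charges, multisets, selected):
    """One emulated thread: block = membrane, thread = object."""
    obj = ALPHABET[thread_idx]
    rule = RULES.get((obj, charges[block_idx]))
    # Non-cooperative rules: the presence of this one object is enough.
    if rule is not None and multisets[block_idx][thread_idx] > 0:
        selected[block_idx][thread_idx] = rule


def launch_selection(charges, multisets):
    """Emulate a grid with one block per membrane, one thread per object."""
    selected = [[None] * len(ALPHABET) for _ in multisets]
    for block_idx in range(len(multisets)):
        for thread_idx in range(len(ALPHABET)):
            selection_kernel(block_idx, thread_idx, charges, multisets, selected)
    return selected


charges = ["0", "+"]                 # one charge per membrane
multisets = [[2, 0, 1], [0, 1, 3]]   # multiplicities of a, b, c per membrane
selected = launch_selection(charges, multisets)
print(selected)
# [['evolve a', None, None], [None, None, 'send-out c']]
```

Note that threads whose object is absent from the membrane do no useful work, which is why the density of objects per membrane matters for performance.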

Simulating models (“generic” approach): performance analysis
Two benchmarks (on a C1060 with 240 cores):
- A. A simple test P system [2]. Max speedup: 5.8x.
- B. An efficient solution to SAT. Max speedup: 1.5x ($n = 18$, $2^{18}$ membranes).
Density of objects per membrane = (# objects in reality) / (worst case), where worst case = alphabet size:
- Test A: 100%.
- Test B: ∼15%.
[2] One division rule: $[d]_2 \to [d]_2 [d]_2$; many evolution rules: $[o_i \to o_i]_2$, $0 \le i \le N$.
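A rough Python sketch of test A and of the density measure may help to read these figures. It rests on assumptions not stated on the slide: the alphabet is taken to be d plus o_0, ..., o_N, the initial membrane is assumed to hold one copy of every object, and the names `density` and `step` are hypothetical.

```python
def density(membrane, alphabet_size):
    """Fraction of the alphabet actually present in one membrane."""
    present = sum(1 for count in membrane.values() if count > 0)
    return present / alphabet_size


def step(membranes):
    """One step of test A: each o_i is rewritten to itself (no change),
    and every membrane containing d divides into two identical copies."""
    result = []
    for m in membranes:
        if m.get("d", 0) > 0:
            result.extend([dict(m), dict(m)])
        else:
            result.append(dict(m))
    return result


N = 3  # small value chosen for illustration
alphabet = ["d"] + [f"o{i}" for i in range(N + 1)]
membranes = [{obj: 1 for obj in alphabet}]  # assumed initial configuration

for _ in range(4):
    membranes = step(membranes)

print(len(membranes))                        # 16, i.e. 2**4
print(density(membranes[0], len(alphabet)))  # 1.0, i.e. 100%
print(2 ** 18)                               # 262144 membranes in test B
```

Under these assumptions every thread of every block has an object to process in test A, whereas in test B only about 15% of them do, which is consistent with the lower speedup reported for the SAT benchmark.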

Simulating models (“generic” approach)
Foreseen performance by Sevilla Carpets:
- D. Orellana-Martín et al. Sevilla Carpets revisited: Enriching the Membrane Computing toolbox. Fundamenta Informaticae, 134 (2014), 153-166.
- The flatter the carpet, the higher the degree of parallelism in the system (and so, in the simulation).

Simulating models (“specific” approach): cell-like solution to SAT
- P systems with active membranes.
- A specific linear-time solution to SAT, with exponential workspace.
- Encoding: objects are the literals of the formula plus auxiliary objects (counters, etc.); membranes are truth assignments.
- A 4-stage solution:
  1. Generation
  2. Synchronization
  3. Check out
  4. Output
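The four stages can be mirrored by a brute-force Python sketch. This is not the P system itself, only an illustration of the encoding: each generated truth assignment plays the role of one membrane, clauses are lists of signed integers (a positive number k stands for x_k, a negative one for its negation, which is an assumed input format), and `solve_sat` is a hypothetical name.

```python
from itertools import product


def solve_sat(clauses, n):
    """Decide satisfiability of a CNF formula over variables 1..n."""
    # 1. Generation: one "membrane" per truth assignment (2**n in total).
    membranes = [dict(zip(range(1, n + 1), values))
                 for values in product([False, True], repeat=n)]

    # 2. Synchronization: in the P system, membranes wait until all of them
    #    have been generated. In this sequential sketch the list is already
    #    complete, so nothing needs to be done.

    # 3. Check out: every membrane checks all clauses against its assignment.
    satisfied = [
        all(any(m[abs(lit)] == (lit > 0) for lit in clause)
            for clause in clauses)
        for m in membranes
    ]

    # 4. Output: answer "yes" if at least one membrane satisfies the formula.
    return "yes" if any(satisfied) else "no"


# (x1 or x2) and (not x1 or x2) and (x1 or not x2)
print(solve_sat([[1, 2], [-1, 2], [1, -2]], n=2))  # yes
# x1 and not x1
print(solve_sat([[1], [-1]], n=1))                 # no
```

The exponential workspace shows up directly in stage 1: the list of membranes has 2**n entries, and stage 3 is the part that parallelizes naturally, with one membrane per thread block.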

Simulating models (“specific” approach): cell-like solution to SAT - parallel design
- Membranes to thread blocks.
- Objects in initial multiset to threads: the number of threads is constrained to the number of different objects in the initial multiset.
