Programming Soft Processors in High Performance Reconfigurable Computing Andrew W. H. House & Paul Chow University of Toronto Workshop on Soft Processor Systems 26 October 2008
Outline ● Introduction ● High Performance Reconfigurable Computing ● Programming Models for HPRC ● Our Proposed Programming Model ● Implications for Soft Processors ● Conclusion
Introduction ● Traditional microprocessors are starting to reach a performance plateau. – In HPC there has been much interest in using accelerators (like GPUs) for computation. ● Reconfigurable hardware has much to offer. – Application speedups, lower power consumption, and flexibility are all possible. ● Soft processors can have an important role in high performance reconfigurable computing systems.
High Performance Reconfigurable Computing
High Performance Reconfigurable Computing ● HPRC is a branch of high performance computing (HPC) that uses reconfigurable hardware (typically FPGAs) to accelerate computation. ● HPRC shares a number of characteristics with other accelerator-based HPC solutions. ● Let's start with some definitions....
High Performance Computing ● Massively parallel multiprocessor systems ● Shared or distributed memory ● High end systems have specialized interconnection network
HPRC Classifications ● We can classify HPRC systems into three categories: – Accelerated Reconfigurable Multiprocessors – Application-Specific Reconfigurable Multiprocessors – Heterogeneous Peer Reconfigurable Multiprocessors ● These mirror our more general definitions for accelerator-based systems.
Accelerated Reconfigurable Multiprocessor ● Reconfigurable hardware is a co-processor ● Subordinate to the CPU ● May have its own memory, but the CPU controls access to main memory and the interconnect
Application-Specific Reconfigurable Multiprocessor ● Uses only FPGAs or other reconfigurable hardware as computing elements ● FPGAs have direct connections to main memory and interconnect
Heterogeneous Peer Reconfigurable Multiprocessor ● FPGAs are first-class computing elements, with the same access to system resources as CPUs and other accelerators ● This is the most likely scenario for the future of reconfigurable computing(?)
Soft Processors in HPRC ● Can soft processors compete with hard ones? Maybe. – Many soft processors in parallel can exploit massive FPGA on-chip memory bandwidth. – Application-specific soft processors can offer significant performance (Mitrion, Tensilica...). ● Soft processors can also be used for: – Controlling interaction between hardware kernels. – Interfacing hardware with host CPUs.
Programming Models for HPRC
Programming Models for HPRC ● Consider existing programming models: – Parallel programming for HPC – Programming for reconfigurable computing – Hardware-Software Co-design ● Effectively integrating soft processors into HPRC poses challenges for all of these paradigms.
Challenges in Programming HPRC ● Handling Multiple FPGAs – Multiple processors in each? ● Heterogeneity – Both hard and soft processors, plus other processing elements. ● Application Partitioning – Between processing elements, or between processors and hardware kernels. ● Synthesis – Generation of application-specific processors (ASPs) and kernels from a high-level description.
Parallel Programming for HPC ● Good at handling massive data parallelism. ● Shared memory programming and message passing are dominant. – Single-program multiple-data (SPMD) most scalable paradigm for message passing. – Partitioned global address space (PGAS) offers shared memory abstraction for distributed memory machines. ● SPMD/PGAS approaches not as effective in heterogeneous environments.
Application to Soft Processors ● Program them in the same way! – TMD-MPI is an MPI implementation for Xilinx MicroBlaze and PowerPC processors. – Couples processors and hardware kernels. – Regular MPI applications are ported easily. ● But standard models don't scale in a heterogeneous environment. – Runtime partitioning systems like RapidMind or Intel Ct are not currently relevant for FPGA-based systems.
Programming Model Requirements ● For these emerging HPRC systems, the programming model must: – Support heterogeneity, – Be scalable, – Be synthesizable, – Assume limited system services, – Support varied types of computation, – Expose coarse- and fine-grain parallelism, – Separate algorithm and implementation, – Be architecture-independent, – Provide execution model transparency.
[Table: existing parallel programming models and representative languages, evaluated against the nine requirements from the previous slide. The rotated column headers and per-model check marks did not survive extraction; the model/language rows are:]
● Data Parallel: HPF, C*, Dataparallel C, mpC, RapidMind
● Dataflow and Functional: Simulink, LabVIEW, Prograph, Lustre, Multilisp, SISAL, SAC, Cilk, CellSs, Mitrion-C
● Stream Computing: StreamIT
● CSP: Occam, MPI, PVM, Active Messages
● HDLs: Verilog/VHDL, Handel-C
● Shared Memory: PRAM, SHMEM, OpenMP, Linda, Orca
● PGAS: Split-C, UPC, Titanium, Co-Array Fortran, ZPL, Fortress, X10, Chapel
● Parallel Objects: CAL
So what now? ● None of the existing models meet our nine general criteria. ● Making the best use of soft processors will also require a tool that can automatically: – Partition applications. – Identify parts of application for soft processors. – Generate soft ASPs. – Generate software for whole system. – Generate hardware kernels where needed. – Manage communication. ● We need a new programming model.
Our Proposed Programming Model
A New Programming Model ● New language (or adaptation of existing language) designed to meet the nine requirements. ● Include features such as: – Data-parallel operations. – Region-based array management. – Global view of memory. – High level, implicit communication and synchronization. – Emphasizes use of libraries and functions. ● Compatible with soft processor design flow.
The Armada Language ● A high-level language for writing simulations. ● Data-parallel, PGAS-style language. ● Provides high-level operators and built-in functions (e.g., matrix multiply). ● Functions are free from side effects. ● Programs are interpreted as dataflow. ● No pointers/direct memory manipulation. ● Region-based array management.
Sample Code
for(t := 1 to NUMSTEPS) {
    // Calculate forces
    foreach(i,j in i := 1 TO NP : j := i+1 TO NP) {
        var real forceij := calculateForce(pMass[i], pMass[j], pPos[i], pPos[j]);
        allForces[i,j] := forceij;
        allForces[j,i] := -forceij;
    }
    // sum columns of allForces array
    summedForces[] := sum(allForces[1 TO NP, 1 ACROSS NP]);
    // update velocities and positions
    var array[1 TO NP] of triple a := summedForces[]/pMass[];
    pPos[] := pPos[] + (pVel[]*TIMESTEP) + (0.5 * (TIMESTEP^2) * a[]);
    pVel[] := pVel[] + (a[]*TIMESTEP);
}
Comments on Armada ● Meant for writing algorithms and simulations, not systems programming. ● Designed to expose as much parallelism as possible, while reducing or eliminating troublesome features (e.g., pointer indirection). ● But it is still just a language. – Effectiveness will be heavily dependent on the back-end tools and design flow.
Planned Armada Design Flow ● [Diagram] Armada program file(s) → Front-end Compiler → Application Description File → Back-end Compiler (partitioning, code generation, interfacing with other tools; guided by Platform Description File(s)) → generated outputs: C++ code, C++ code using MPI, HDL code for a single FPGA, HDL code & scripts for multi-FPGA