


  1. Programming Soft Processors in High Performance Reconfigurable Computing Andrew W. H. House & Paul Chow University of Toronto Workshop on Soft Processor Systems 26 October 2008

  2. Outline ● Introduction ● High Performance Reconfigurable Computing ● Programming Models for HPRC ● Our Proposed Programming Model ● Implications for Soft Processors ● Conclusion

  3. Introduction ● Traditional microprocessors are starting to reach a performance plateau. – In HPC there has been much interest in using accelerators (like GPUs) for computation. ● Reconfigurable hardware has much to offer. – Application speedups, lower power consumption, and flexibility are all possible. ● Soft processors can have an important role in high performance reconfigurable computing systems.

  4. High Performance Reconfigurable Computing

  5. High Performance Reconfigurable Computing ● HPRC is a branch of high performance computing (HPC) that uses reconfigurable hardware (typically FPGAs) to accelerate computation. ● HPRC shares a number of characteristics with other accelerator-based HPC solutions. ● Let's start with some definitions....

  6. High Performance Computing ● Massively parallel multiprocessor systems ● Shared or distributed memory ● High-end systems have specialized interconnection networks

  7. HPRC Classifications ● We can classify HPRC systems into three categories: – Accelerated Reconfigurable Multiprocessors – Application-Specific Reconfigurable Multiprocessors – Heterogeneous Peer Reconfigurable Multiprocessors ● These mirror our more general definitions for accelerator-based systems.

  8. Accelerated Reconfigurable Multiprocessor ● Reconfigurable hardware is a co-processor ● Subordinate to the CPU ● May have its own memory, but the CPU controls access to main memory and the interconnect

  9. Application-Specific Reconfigurable Multiprocessor ● Uses only FPGAs or other reconfigurable hardware as computing elements ● FPGAs have direct connections to main memory and interconnect

  10. Heterogeneous Peer Reconfigurable Multiprocessor ● FPGAs are first-class computing elements, with the same access to system resources as CPUs and other accelerators ● This is the most likely scenario for the future of reconfigurable computing(?)

  11. Soft Processors in HPRC ● Can soft processors compete with hard ones? Maybe. – Many soft processors in parallel can exploit massive FPGA on-chip memory bandwidth. – Application-specific soft processors can offer significant performance (Mitrion, Tensilica...). ● Soft processors can also be used for: – Controlling interaction between hardware kernels. – Interfacing hardware with host CPUs.
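To make the control and interfacing roles concrete, here is a minimal C++ sketch of a soft processor sequencing two hardware kernels through memory-mapped control registers. The addresses, register layout, and kernel roles are assumptions for illustration only, not details from the presentation.

  #include <cstdint>

  // Hypothetical memory-mapped control/status registers for two hardware kernels.
  // The addresses and bit layout are placeholders for a real platform's memory map.
  volatile uint32_t* const KERNEL_A_CTRL   = reinterpret_cast<volatile uint32_t*>(0xC0000000);
  volatile uint32_t* const KERNEL_A_STATUS = reinterpret_cast<volatile uint32_t*>(0xC0000004);
  volatile uint32_t* const KERNEL_B_CTRL   = reinterpret_cast<volatile uint32_t*>(0xC0010000);
  volatile uint32_t* const KERNEL_B_STATUS = reinterpret_cast<volatile uint32_t*>(0xC0010004);

  const uint32_t START = 0x1;   // written to CTRL to start a kernel
  const uint32_t DONE  = 0x1;   // set in STATUS when a kernel finishes

  // The soft processor only orchestrates: start kernel A, wait for it,
  // then start kernel B, which consumes A's result from shared memory.
  void run_pipeline_step() {
      *KERNEL_A_CTRL = START;
      while ((*KERNEL_A_STATUS & DONE) == 0) { /* poll until A is done */ }
      *KERNEL_B_CTRL = START;
      while ((*KERNEL_B_STATUS & DONE) == 0) { /* poll until B is done */ }
  }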

  12. Programming Models for HPRC

  13. Programming Models for HPRC ● Consider existing programming models: – Parallel programming for HPC – Programming for reconfigurable computing – Hardware-Software Co-design ● Effectively integrating soft processors into HPRC provides challenges to all of these paradigms.

  14. Challenges in Programming HPRC ● Handling Multiple FPGAs – Multiple processors in each? ● Heterogeneity – Both hard and soft processors, plus other processing elements. ● Application Partitioning – Between processing elements, or between processors and hardware kernels. ● Synthesis – Generation of ASPs/kernels from a high-level description.

  15. Parallel Programming for HPC ● Good at handling massive data parallelism. ● Shared memory programming and message passing are dominant. – Single-program multiple-data (SPMD) most scalable paradigm for message passing. – Partitioned global address space (PGAS) offers shared memory abstraction for distributed memory machines. ● SPMD/PGAS approaches not as effective in heterogeneous environments.
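As a reference point for the SPMD message-passing style, a minimal C++/MPI sketch follows: every rank runs the same program on its own slice of the data, and a collective combines the partial results. The problem size, data values, and the assumption that the rank count divides N are illustrative only.

  #include <mpi.h>
  #include <vector>
  #include <cstdio>

  int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      const int N = 1 << 20;
      const int chunk = N / size;                    // each rank owns one chunk
      std::vector<double> local(chunk, rank + 1.0);  // stand-in for real input data

      double local_sum = 0.0;
      for (double x : local) local_sum += x;         // purely local computation

      double global_sum = 0.0;                       // combine partial results on rank 0
      MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
      if (rank == 0) std::printf("sum = %f\n", global_sum);

      MPI_Finalize();
      return 0;
  }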

  16. Application to Soft Processors ● Program them in the same way! – TMD-MPI is an MPI implementation for Xilinx MicroBlaze and PPC processors. – Couples processors and hardware kernels. – Regular MPI applications are ported easily. ● But standard models don't scale in a heterogeneous environment. – Runtime partitioning systems like RapidMind or Intel Ct are not currently relevant for FPGA-based systems.
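A hedged sketch of the coupling idea behind TMD-MPI: rank 0 plays the control (soft) processor and rank 1 stands in for another processor or a hardware kernel that exposes the same message-passing interface. Only standard MPI point-to-point calls appear here; the tags, buffer size, and the doubling "computation" are placeholders, not part of the TMD-MPI API itself.

  #include <mpi.h>
  #include <vector>

  int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      const int N = 256;
      std::vector<float> buf(N);

      if (rank == 0) {                               // control processor
          for (int i = 0; i < N; ++i) buf[i] = static_cast<float>(i);
          MPI_Send(buf.data(), N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);   // hand work to the "kernel"
          MPI_Recv(buf.data(), N, MPI_FLOAT, 1, 1, MPI_COMM_WORLD,
                   MPI_STATUS_IGNORE);                                // collect the result
      } else if (rank == 1) {                        // soft processor or hardware kernel
          MPI_Recv(buf.data(), N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          for (float& x : buf) x *= 2.0f;            // stand-in for the accelerated computation
          MPI_Send(buf.data(), N, MPI_FLOAT, 0, 1, MPI_COMM_WORLD);
      }

      MPI_Finalize();
      return 0;
  }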

  17. Programming Model Requirements ● For these emerging HPRC systems, the programming model must: – Support heterogeneity, – Be scalable, – Be synthesizable, – Assume limited system services, – Support varied types of computation, – Expose coarse- and fine-grain parallelism, – Separate algorithm and implementation, – Be architecture-independent, – Provide execution model transparency.

  18. [Table] Existing programming models and languages, grouped by category, evaluated against the nine requirements from the previous slide (heterogeneity, scalability, synthesizability, limited system services, varied types of computation, coarse- and fine-grain parallelism, separation of algorithm and implementation, architecture independence, execution model transparency). Data Parallel: HPF, C*, Dataparallel C, mpC, RapidMind. Dataflow and Functional: Simulink, LabVIEW, Prograph, Lustre, Multilisp, SISAL, SAC, Cilk, CellSs, Mitrion-C. Stream Computing: StreamIt. CSP: Occam, MPI, PVM, Active Messages. HDLs: Verilog/VHDL, Handel-C. Shared Memory: PRAM, SHMEM, OpenMP, Linda, Orca. PGAS: Split-C, UPC, Titanium, Co-Array Fortran, ZPL, Fortress, X10, Chapel. Parallel Objects: CAL.

  19. So what now? ● None of the existing models meet our nine general criteria. ● Making the best use of soft processors will also require a tool that can automatically: – Partition applications. – Identify parts of application for soft processors. – Generate soft ASPs. – Generate software for whole system. – Generate hardware kernels where needed. – Manage communication. ● We need a new programming model.

  20. Our Proposed Programming Model

  21. A New Programming Model ● New language (or adaptation of an existing language) designed to meet the nine requirements. ● Includes features such as: – Data-parallel operations. – Region-based array management. – Global view of memory. – High-level, implicit communication and synchronization. – Emphasis on libraries and functions. ● Compatible with the soft processor design flow.

  22. The Armada Language ● A high-level language for writing simulations. ● Data-parallel, PGAS-style language. ● Provides high-level operators and built-in functions (e.g., matrix multiply). ● Functions are free from side effects. ● Programs are interpreted as dataflow. ● No pointers/direct memory manipulation. ● Region-based array management.

  23. Sample Code
  for(step := 1 to NUMSTEPS) {
      // Calculate forces
      foreach(i,j in i := 1 TO NP : j := i+1 TO NP) {
          var real forceij := calculateForce(pMass[i], pMass[j], pPos[i], pPos[j]);
          allForces[i,j] := forceij;
          allForces[j,i] := -forceij;
      }
      // sum columns of allForces array
      summedForces[] := sum(allForces[1 TO NP, 1 ACROSS NP]);
      // update velocities and positions
      var array[1 TO NP] of triple a := summedForces[]/pMass[];
      pPos[] := pPos[] + (pVel[]*TIMESTEP) + (0.5 * (TIMESTEP^2) * a[]);
      pVel[] := pVel[] + (a[]*TIMESTEP);
  }
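For comparison, here is a sequential C++ sketch of the same time step, reduced to one spatial dimension so the force remains a single scalar as in the Armada sample. NP, NUMSTEPS, TIMESTEP, and calculateForce are stand-ins for the application's own definitions; the Armada version expresses the same loops as whole-array and foreach operations, which is what the compiler can exploit for parallelism.

  #include <vector>
  #include <cmath>

  constexpr int    NP       = 64;      // number of particles (illustrative)
  constexpr int    NUMSTEPS = 1000;    // number of time steps (illustrative)
  constexpr double TIMESTEP = 0.01;

  // Softened 1-D gravity-like pairwise force; a placeholder for the real model.
  double calculateForce(double mi, double mj, double xi, double xj) {
      double d = xj - xi;
      return mi * mj * d / (std::abs(d * d * d) + 1e-9);
  }

  void simulate(std::vector<double>& pMass, std::vector<double>& pPos,
                std::vector<double>& pVel) {          // each vector has NP entries
      std::vector<std::vector<double>> allForces(NP, std::vector<double>(NP, 0.0));
      for (int step = 0; step < NUMSTEPS; ++step) {
          // Calculate pairwise forces; the force matrix is antisymmetric.
          for (int i = 0; i < NP; ++i)
              for (int j = i + 1; j < NP; ++j) {
                  double f = calculateForce(pMass[i], pMass[j], pPos[i], pPos[j]);
                  allForces[i][j] = f;
                  allForces[j][i] = -f;
              }
          // Sum the forces on each particle, then update velocity and position.
          for (int i = 0; i < NP; ++i) {
              double sum = 0.0;
              for (int j = 0; j < NP; ++j) sum += allForces[i][j];
              double a = sum / pMass[i];
              pPos[i] += pVel[i] * TIMESTEP + 0.5 * TIMESTEP * TIMESTEP * a;
              pVel[i] += a * TIMESTEP;
          }
      }
  }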

  24. Comments on Armada ● Meant for writing algorithms and simulations, not systems programming. ● Designed to expose as much parallelism as possible, while reducing or eliminating troublesome features (e.g., pointer indirection). ● But it is still just a language. – Effectiveness will be heavily dependent on the back-end tools and design flow.

  25. Planned Armada Design Flow [Diagram] Armada program file(s) → Front-end Compiler → Application Description File and Platform Description File(s) → Back-end Compiler (partitioning, code generation, interfacing with other tools) → generated outputs: C++ code, C++ code using MPI, HDL code for a single FPGA, and HDL code & scripts for multi-FPGA designs.
