Parallel programming with Session Java

Nicholas Ng (nickng@doc.ic.ac.uk), Imperial College London


  1. Parallel programming with Session Java. Nicholas Ng (nickng@doc.ic.ac.uk), Imperial College London. Outline: Introduction; Parallel programming examples; Target architecture and benchmarks; Theory of multichannel primitives; Conclusion.

  2. Motivation: Parallel designs are difficult and error-prone (e.g. MPI). Session types ensure communication safety in concurrent systems. So we use session types to design safe parallel algorithms for high-performance clusters.

  3. Contributions: An implementation of parallel n-body simulation: (1) programmed in Session Java (SJ), a full implementation of session types; (2) uses FPGAs on the AXEL heterogeneous cluster. A formal description of the multicast outwhile and inwhile SJ primitives in session types. Showed type soundness and the progress property for SJ parallel programs connected in a ring topology. Proved the SJ n-body implementation deadlock-free.

  4. Session types: A typing system for the π-calculus [HVK98]. The π-calculus models structured interactions between processes. Main idea: every communication primitive should have a dual. Example (conventional type system): in int i = 9, i has type int and 9 has type int. Process A: c_ab!⟨9⟩; P (send 9 to B via channel c_ab). Process B: c_ab?(x).Q (receive x from A via channel c_ab). A has type Send int (or c_ab : ![int]); B has type Receive int (or c_ab : ?[int]).
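The duality idea on this slide can be sketched in plain Java (this is not SJ syntax). The class `SessionDual` and its methods are hypothetical names introduced only for illustration; a session type is modelled, as an assumption, by a list of action strings such as `![int]` (send) and `?[int]` (receive).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a session type is a list of actions such as "![int]" or "?[int]".
final class SessionDual {
    // Flip the direction of one action: send (!) becomes receive (?) and vice versa.
    static String dualAction(String action) {
        char direction = action.charAt(0);
        String payload = action.substring(1);
        if (direction == '!') return "?" + payload;
        if (direction == '?') return "!" + payload;
        throw new IllegalArgumentException("unknown action: " + action);
    }

    // The dual of a whole type is the action-by-action dual, in the same order.
    static List<String> dual(List<String> type) {
        List<String> result = new ArrayList<>();
        for (String action : type) result.add(dualAction(action));
        return result;
    }

    // Two endpoints are compatible when one type is exactly the dual of the other.
    static boolean areDual(List<String> a, List<String> b) {
        return dual(a).equals(b);
    }
}
```

With this model, process A's `![int]` and process B's `?[int]` are dual, while two endpoints that both send are not.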

  5. Session programming with SJ: Session Java (SJ) [HYH08] is a full implementation of binary session types in Java. It provides a socket programming interface with e.g. accept(), request(), send(), receive(). Workflow of an SJ program: (1) declare the session type (protocol) of the program in SJ; (2) the SJ compiler checks local session type conformance; (3) a runtime duality check is made with the communicating program.

  6. SJ features for parallel programming: Iteration chaining. Multi-channel inwhile and outwhile in place of reduce-scatter operations. Master: <s1,s2>.outwhile(i<42){...} Forwarder1: s3.outwhile(s1.inwhile){...} Forwarder2: s4.outwhile(s2.inwhile){...} End: <s3,s4>.inwhile(){...} Figure: Master, Forwarder 1, Forwarder 2, and End.
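Iteration chaining can be imitated in plain Java with threads and queues. This is a rough sketch under assumptions, not the SJ runtime: the class `IterationChain`, its methods, and the use of `BlockingQueue<Boolean>` to carry the loop flag are all hypothetical. The Master evaluates the loop condition and sends it on s1 and s2, each Forwarder relays the flag it receives, and End loops while both incoming flags are true.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Plain-Java sketch of iteration chaining; all names are hypothetical, not SJ API.
final class IterationChain {
    interface Task { void run() throws InterruptedException; }

    // Master, like <s1,s2>.outwhile(i < limit): sends the loop flag on every channel.
    static void master(int limit, List<BlockingQueue<Boolean>> outs) throws InterruptedException {
        for (int i = 0; ; i++) {
            boolean go = i < limit;
            for (BlockingQueue<Boolean> out : outs) out.put(go);
            if (!go) return;
        }
    }

    // Forwarder, like out.outwhile(in.inwhile): relays the flag, counts loop bodies run.
    static int forwarder(BlockingQueue<Boolean> in, BlockingQueue<Boolean> out) throws InterruptedException {
        int iterations = 0;
        while (true) {
            boolean go = in.take();
            out.put(go);
            if (!go) return iterations;
            iterations++;
        }
    }

    // End, like <s3,s4>.inwhile(): reads one flag from every incoming channel per iteration.
    static int end(List<BlockingQueue<Boolean>> ins) throws InterruptedException {
        int iterations = 0;
        while (true) {
            boolean go = true;
            for (BlockingQueue<Boolean> in : ins) {
                boolean flag = in.take();
                go = go && flag;
            }
            if (!go) return iterations;
            iterations++;
        }
    }

    static Thread spawn(Task task) {
        Thread t = new Thread(() -> {
            try { task.run(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        t.start();
        return t;
    }

    // Wires Master -> Forwarder1/Forwarder2 -> End and returns End's iteration count.
    static int run(int limit) {
        BlockingQueue<Boolean> s1 = new LinkedBlockingQueue<>();
        BlockingQueue<Boolean> s2 = new LinkedBlockingQueue<>();
        BlockingQueue<Boolean> s3 = new LinkedBlockingQueue<>();
        BlockingQueue<Boolean> s4 = new LinkedBlockingQueue<>();
        Thread m = spawn(() -> master(limit, List.of(s1, s2)));
        Thread f1 = spawn(() -> forwarder(s1, s3));
        Thread f2 = spawn(() -> forwarder(s2, s4));
        try {
            int n = end(List.of(s3, s4));
            m.join(); f1.join(); f2.join();
            return n;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because only the Master evaluates the condition, every process in the chain runs the same number of iterations; `IterationChain.run(42)` mirrors the `i<42` condition on the slide.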

  7. Simple example: N-body simulation. n particles follow Newton's laws of motion. Calculate the resultant force acting on each particle. Displace the particle based on the net force acting on it. Figure: The resultant force is the vector sum of all forces.
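The vector sum of forces can be sketched as follows. The slide does not state the force law, so this sketch assumes pairwise Newtonian gravity in two dimensions; the class `NBodyForce`, the method `netForce`, and the parameter `g` (gravitational constant, passed in so units can be chosen freely) are hypothetical.

```java
// Hypothetical sketch: net force on one particle as the vector sum of pairwise forces.
final class NBodyForce {
    // pos[j] = {x, y} of particle j; mass[j] = its mass; g = gravitational constant (assumed law).
    // Assumes no two particles share a position (the distance would be zero).
    static double[] netForce(int i, double[][] pos, double[] mass, double g) {
        double fx = 0.0;
        double fy = 0.0;
        for (int j = 0; j < pos.length; j++) {
            if (j == i) continue;                 // a particle exerts no force on itself
            double dx = pos[j][0] - pos[i][0];
            double dy = pos[j][1] - pos[i][1];
            double r2 = dx * dx + dy * dy;
            double r = Math.sqrt(r2);
            double f = g * mass[i] * mass[j] / r2; // magnitude of the pairwise force
            fx += f * dx / r;                      // add the component along each axis
            fy += f * dy / r;
        }
        return new double[] {fx, fy};
    }
}
```

Two equal masses placed symmetrically on either side of a particle cancel out, which gives a quick sanity check for the summation.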

  8. Simple example: N-body simulation. Implemented in a ring topology with 3 kinds of processes: Master, Worker (multiple), and LastWorker. (1) Each is allocated a partition of particles. (2) Calculate resultant forces on the received set of particles. (3) Forward to the next node. (4) Repeat until the end of one time step. Figure: ring of Master, Workers, and LastWorker.
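The ring steps above can be simulated sequentially to see why every particle ends up interacting with every other one. This is a sketch under assumptions: the class `RingPass` is hypothetical, particles are one-dimensional positions, and the toy interaction `x_j - x_i` stands in for the real force so the result is easy to check. Each node keeps its own partition, accumulates against the partition currently visiting it, then forwards that visiting partition to the next node.

```java
// Hypothetical sequential simulation of the ring pipeline (toy interaction, not real forces).
final class RingPass {
    // parts[k] is the partition owned by node k. Returns acc[k][i]: the accumulated
    // interaction on the i-th particle owned by node k after one full trip round the ring.
    static double[][] ringAccumulate(double[][] parts) {
        int p = parts.length;
        double[][] acc = new double[p][];
        for (int k = 0; k < p; k++) acc[k] = new double[parts[k].length];

        double[][] visiting = parts.clone();   // visiting[k]: partition currently held at node k
        for (int step = 0; step < p; step++) {
            // Step 2: every node accumulates against the partition it currently holds.
            for (int k = 0; k < p; k++)
                for (int i = 0; i < parts[k].length; i++)
                    for (double x : visiting[k])
                        acc[k][i] += x - parts[k][i];
            // Step 3: forward the visiting partition to the next node in the ring.
            double[][] next = new double[p][];
            for (int k = 0; k < p; k++) next[(k + 1) % p] = visiting[k];
            visiting = next;
        }
        return acc;
    }
}
```

After as many steps as there are nodes, each partition has visited every node once, so the accumulated value equals the direct all-pairs sum.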

  9. Another example: Jacobi method. An iteration-based method for solving the Discrete Poisson Equation, used in physics and the natural sciences. Given an initial prediction, iterate until converged or until an upper limit of iterations is reached. Figure: A sub-matrix of calculation, a 4x4 grid whose rows are (-, edge, edge, -), (edge, value, value, edge), (edge, value, value, edge), (-, edge, edge, -).

  10. Another example: Jacobi method. Implemented in a mesh topology (2D decomposition) with 9 kinds of processes: one for each edge case and a Worker in the center. (1) Each is allocated a sub-matrix of values. (2) Calculate the average of neighbouring values for every element. (3) Exchange edges with adjacent sub-grids. (4) Repeat until converged. Figure: 3x3 mesh of processes: Master, WorkerNorth, WorkerNorthEast; WorkerWest, Worker, WorkerEast; WorkerSouthWest, WorkerSouth, WorkerSouthEast.
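The per-node computation in steps 2 and 4 can be sketched on a single sub-matrix. This is a minimal sketch, assuming a zero source term (so each interior value becomes the plain average of its four neighbours) and fixed edge values; the edge exchange between nodes is left out. The class `JacobiSketch` and its methods are hypothetical names.

```java
// Hypothetical single-node Jacobi sketch: fixed edges, zero source term assumed.
final class JacobiSketch {
    // One sweep: every interior cell becomes the average of its four neighbours.
    static double[][] sweep(double[][] u) {
        int rows = u.length;
        int cols = u[0].length;
        double[][] next = new double[rows][];
        for (int r = 0; r < rows; r++) next[r] = u[r].clone();   // edges are copied unchanged
        for (int r = 1; r < rows - 1; r++)
            for (int c = 1; c < cols - 1; c++)
                next[r][c] = 0.25 * (u[r - 1][c] + u[r + 1][c] + u[r][c - 1] + u[r][c + 1]);
        return next;
    }

    // Largest absolute change between two grids of the same shape.
    static double maxDiff(double[][] a, double[][] b) {
        double max = 0.0;
        for (int r = 0; r < a.length; r++)
            for (int c = 0; c < a[r].length; c++)
                max = Math.max(max, Math.abs(a[r][c] - b[r][c]));
        return max;
    }

    // Iterate until the change drops below tol or maxIter sweeps have been done.
    static double[][] solve(double[][] u, double tol, int maxIter) {
        for (int it = 0; it < maxIter; it++) {
            double[][] next = sweep(u);
            double change = maxDiff(u, next);
            u = next;
            if (change < tol) break;
        }
        return u;
    }
}
```

In the distributed version described on the slide, each sweep would be followed by an exchange of edge rows and columns with the neighbouring sub-grids before the next sweep starts.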

  11. AXEL: a heterogeneous cluster. Axel [TL10] is a heterogeneous cluster that contains different Processing Elements (PE) on each node. CPU: off-the-shelf multicore x86 CPU. GPU (Graphics Processing Unit): NVIDIA Tesla, a dedicated general-purpose GPU. FPGA (Field Programmable Gate Array): reconfigurable hardware. AXEL is a 16-node NNUS cluster. Each node can be used as an individual PC. Nodes are connected by high-speed Ethernet.

  12. Performance benchmark results: Compared against MPJ Express [SCB09], an implementation of MPI in Java. Performance is competitive. Figure (left): n-body simulation, runtime (milliseconds) against number of particles per node, for Multi-channel SJ, Old SJ, and MPJ Express. Figure (right): Jacobi solution of the Discrete Poisson Equation, runtime (seconds) against partition size, for Multi-channel SJ, Old SJ, and MPJ Express.

  13. Performance benchmark results (with FPGA): Better performance with more particles. Best performance: SJ+FPGA is 2x faster than the SJ implementation. Figure: runtime (milliseconds) against number of particles, for SJ + FPGA, SJ, and MPJ Express.

  14. Well-formed topology: Multichannel inwhile and outwhile are not safe on their own. A well-formed topology satisfies three conditions: the topology is constructed as a DAG with 1 root node and 1 sink node; individual pairs of sessions are dual; iteration is controlled by a single condition in the Master node. Deadlock freedom holds for a group of processes in a well-formed topology. Figure: DAG over nodes 1 to 9.
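The first, structural condition can be checked mechanically. The sketch below covers only that condition (a DAG with exactly one root and one sink); it does not check session duality or the single loop condition. The class `TopologyCheck`, the 0-based node numbering, and the edge-list encoding are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical structural check: is the graph a DAG with exactly one root and one sink?
final class TopologyCheck {
    // n nodes numbered 0..n-1; each edge is {from, to}.
    static boolean isSingleRootSinkDag(int n, int[][] edges) {
        int[] inDegree = new int[n];
        int[] outDegree = new int[n];
        List<List<Integer>> successors = new ArrayList<>();
        for (int v = 0; v < n; v++) successors.add(new ArrayList<>());
        for (int[] e : edges) {
            successors.get(e[0]).add(e[1]);
            outDegree[e[0]]++;
            inDegree[e[1]]++;
        }

        int roots = 0;
        int sinks = 0;
        for (int v = 0; v < n; v++) {
            if (inDegree[v] == 0) roots++;
            if (outDegree[v] == 0) sinks++;
        }
        if (roots != 1 || sinks != 1) return false;

        // Kahn's algorithm: the graph is acyclic iff every node can be removed in turn.
        Deque<Integer> ready = new ArrayDeque<>();
        for (int v = 0; v < n; v++) if (inDegree[v] == 0) ready.add(v);
        int visited = 0;
        while (!ready.isEmpty()) {
            int v = ready.poll();
            visited++;
            for (int w : successors.get(v)) if (--inDegree[w] == 0) ready.add(w);
        }
        return visited == n;
    }
}
```

The Master, two Forwarders, and End arrangement corresponds to the diamond `{0,1}, {0,2}, {1,3}, {2,3}`, which passes this check.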

  15. Future (and ongoing) work: A C-based language implementing session types. Higher performance with FPGAs or other acceleration hardware. Can integrate with AXEL or similar HPC application toolchains.


  17. References: [HVK98] Kohei Honda, Vasco T. Vasconcelos, and Makoto Kubo. Language primitives and type disciplines for structured communication-based programming. In ESOP'98, volume 1381 of LNCS, pages 122–138, 1998. [HYH08] Raymond Hu, Nobuko Yoshida, and Kohei Honda. Session-based distributed programming in Java. In ECOOP'08, volume 5142 of LNCS, pages 516–541, 2008. [SCB09] Aamir Shafi, Bryan Carpenter, and Mark Baker. Nested Parallelism for Multi-core HPC Systems using Java. Journal of Parallel and Distributed Computing, 69(6):532–545, 2009. [TL10] Kuen Hung Tsoi and Wayne Luk. Axel: a heterogeneous cluster with FPGAs and GPUs. In FPGA '10: Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays, pages 115–124, New York, NY, USA, 2010. ACM.
