rMPI: Message Passing on Multicore Processors with On-Chip Interconnect
19 October 2009
www.ntnu.no

Outline
- Background
- RAW microprocessor
- rMPI
- Evaluation
- Discussion

Why?
- Chips offer on-chip networks
- Ease programmability
- MPI is a well-known standard
- Existing code bases are easy to migrate
- Fine-grained program control if necessary

RAW overview
- Developed at MIT
- Tiled architecture (16 tiles in the ASIC implementation)
- 8-stage, in-order, single-issue pipeline
- 32 kB hardware-managed data cache
- 32 kB software-managed instruction cache
- 64 kB software-managed switch instruction memory

Architecture overview

RAW architecture
- ISA allows direct control over the network
- Four 32-bit networks
  • Two static, routed at compile time
  • Two dynamic, programmable
- General Dynamic Network (GDN)
  • Used by rMPI
  • 32-bit header
  • Messages up to 32 words
  • Guarantees that messages are delivered atomically and in order
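
To illustrate what the 32-word GDN limit means for software, here is a minimal Python sketch of splitting a longer message into packets. It assumes the one-word header counts toward the 32-word limit (31 payload words per packet); the slide does not say so. The function name `split_into_gdn_packets` and the header fields are hypothetical, not the RAW or rMPI format.

```python
GDN_MAX_WORDS = 32   # from the slide: messages up to 32 words
HEADER_WORDS = 1     # from the slide: one 32-bit header
# Assumption: the header counts toward the 32-word limit.
PAYLOAD_WORDS = GDN_MAX_WORDS - HEADER_WORDS


def split_into_gdn_packets(dest_tile, words):
    """Split a list of 32-bit words into (header, payload) packets."""
    packets = []
    for start in range(0, len(words), PAYLOAD_WORDS):
        payload = words[start:start + PAYLOAD_WORDS]
        # Hypothetical header: destination tile and payload length only.
        header = {"dest": dest_tile, "length": len(payload)}
        packets.append((header, payload))
    return packets
```

Under these assumptions a 100-word message becomes four packets carrying 31, 31, 31 and 7 payload words.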

RAW pipeline

rMPI
- MPI on RAW
- Borrowed ideas from LAM/MPI and MPICH
- 75 KLOC!

rMPI architecture

rMPI packet format

Receiving
- Uses the RAW fast interrupt handler
- Interrupt handler sorts and assembles packets
- Drains the network of its contents
- Interrupt-driven design:
  • Allows asynchronous communication and computation
  • Reduces network contention
  • Avoids deadlocks (blocking sends)
  • No OS layer that increases delay
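
The sorting and assembling done on the receive side can be pictured with a toy Python model. This is a sketch under assumptions, not the rMPI implementation: the class `Receiver`, the method `on_packet`, and the fields `source`, `tag` and `total_words` are hypothetical, since the actual packet format is not reproduced in this text. It relies on the in-order delivery that the GDN guarantees, so packets from one sender never need reordering, while packets from different senders may interleave.

```python
from collections import defaultdict


class Receiver:
    """Toy model of an interrupt-driven receive path."""

    def __init__(self):
        # (source, tag) -> words received so far for an unfinished message
        self.partial = defaultdict(list)
        # Fully assembled messages as (source, tag, words)
        self.complete = []

    def on_packet(self, source, tag, total_words, payload):
        """Called once per arriving packet, like an interrupt handler."""
        key = (source, tag)
        self.partial[key].extend(payload)
        # Once every word has arrived, hand the message over.
        if len(self.partial[key]) >= total_words:
            self.complete.append((source, tag, self.partial.pop(key)))
```

Because every arriving packet is consumed immediately, the network is drained even when the application has not yet posted a matching receive.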

Methodology
- Results collected with a simulator
- LAM/MPI:
  • 128 nodes
  • Two 2 GHz Opteron CPUs per node, 4 GB RAM (only one CPU used)
  • 10GB Ethernet
- Speedups are relative to a single CPU on each platform running a serial implementation
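
The speedup metric described above is the usual ratio of serial to parallel run time, measured on the same platform. A minimal Python sketch follows; the function names are chosen here for illustration, and the numbers in the usage note are made up, not measured results.

```python
def speedup(serial_time, parallel_time):
    """Speedup of a parallel run over the serial run on the same platform."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, n_procs):
    """Fraction of ideal (linear) speedup achieved on n_procs processors."""
    return speedup(serial_time, parallel_time) / n_procs
```

For example, a program that takes 80 time units serially and 10 units on 16 processors has a speedup of 8 and an efficiency of 0.5.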

End-to-end overhead

End-to-end overhead comparison

Problems
- Balance between performance and programmability
- GDN requires manual packet splitting and reassembly in software
- rMPI adds too much overhead for small packets
- Guidelines for future designers:
  • Handle packet splitting and sending
  • Prevent deadlocks
  • Find a middle ground between GDN and rMPI

Performance scaling
- Jacobi relaxation
  • Low send/receive overhead
  • 16x16 to 2048x2048 matrices
- Matrix multiply
- Trapezoidal integration
- Parallel pi estimation
- Better performance scalability for computationally intensive applications
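
As an illustration of why a benchmark such as trapezoidal integration is computation-heavy and communication-light, here is a minimal Python sketch of the usual decomposition. It simulates the ranks in a loop instead of calling MPI; the function names are hypothetical, and the final `sum` stands in for a reduction such as `MPI_Reduce` with `MPI_SUM`. It is not the benchmark code used in the evaluation.

```python
import math


def local_trapezoid(f, a, b, n):
    """Trapezoidal rule for f on [a, b] with n equal subintervals."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2.0
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def parallel_trapezoid(f, a, b, n, n_ranks):
    """Split [a, b] evenly over n_ranks; n must be divisible by n_ranks."""
    local_n = n // n_ranks
    width = (b - a) / n_ranks
    partials = []
    for rank in range(n_ranks):   # each iteration is one rank's local work
        local_a = a + rank * width
        partials.append(local_trapezoid(f, local_a, local_a + width, local_n))
    # One number per rank is all that has to be communicated.
    return sum(partials)
```

Each rank does `local_n` function evaluations but contributes a single value to the reduction, so the ratio of computation to communication grows with the problem size. One common way to estimate pi fits the same pattern: `parallel_trapezoid(lambda x: 4.0 / (1.0 + x * x), 0.0, 1.0, 1600, 16)`.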

Jacobi speedup

Speedup summary

DRAM impact

Overhead

Instruction cache size

Matrix multiply

LAM/MPI latency

Discussion