1. Second Programming Assignment
   - Parallel Implementation of Minimum Spanning Tree
   - Deadline: Monday April 27
   - Implementation & Benchmark
   - Based on Borůvka's algorithm
   - Platform: DAS-4 (16 nodes, 2 CPUs/node, 2 threads/core)
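For orientation, here is a minimal sequential sketch of Borůvka's algorithm, assuming the graph is given as an edge list; the names Edge, find and boruvka_mst are illustrative only, and the assignment's distributed version would have to split this per-round work over the MPI processes introduced below.

   #include <stdlib.h>

   typedef struct { int u, v; double w; } Edge;

   /* Union-find lookup with path halving. */
   static int find(int *parent, int x) {
       while (parent[x] != x) {
           parent[x] = parent[parent[x]];
           x = parent[x];
       }
       return x;
   }

   /* Returns the MST weight of a connected graph with n vertices and m edges.
      For a strictly minimum tree, equal edge weights need consistent tie-breaking. */
   double boruvka_mst(int n, int m, const Edge *edges) {
       int *parent   = malloc(n * sizeof(int));   /* component of each vertex             */
       int *cheapest = malloc(n * sizeof(int));   /* cheapest outgoing edge per component */
       for (int i = 0; i < n; i++) parent[i] = i;

       double mst_weight = 0.0;
       int components = n;

       while (components > 1) {
           for (int i = 0; i < n; i++) cheapest[i] = -1;

           /* Phase 1: every component selects its lightest outgoing edge. */
           for (int e = 0; e < m; e++) {
               int cu = find(parent, edges[e].u);
               int cv = find(parent, edges[e].v);
               if (cu == cv) continue;
               if (cheapest[cu] == -1 || edges[e].w < edges[cheapest[cu]].w) cheapest[cu] = e;
               if (cheapest[cv] == -1 || edges[e].w < edges[cheapest[cv]].w) cheapest[cv] = e;
           }

           /* Phase 2: add the selected edges and merge their endpoints' components. */
           for (int i = 0; i < n; i++) {
               int e = cheapest[i];
               if (e == -1) continue;
               int cu = find(parent, edges[e].u);
               int cv = find(parent, edges[e].v);
               if (cu == cv) continue;            /* endpoints already merged this round */
               mst_weight += edges[e].w;
               parent[cu] = cv;                   /* union the two components            */
               components--;
           }
       }
       free(parent);
       free(cheapest);
       return mst_weight;
   }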

2. Second Programming Assignment
   - Parallelization is only to be done by distributing the work over different nodes. Per node, only one CPU core and one thread will be used, so the maximum degree of parallelism is limited to 16!
   - Test graphs are taken from the UF Sparse Matrix Collection. Three of them were selected: mouse_gene (29M nnz), ldoor (42M nnz), nlpkkt240 (760M nnz).

3. A Short DAS History (1)
   - DAS configurations at Leiden University.
   - DAS-3 (2007): 32 nodes
     - Each node: 2 CPUs (single core), 4GB RAM
     - Totals: 64 CPUs / 64 cores / 64 threads, 256GB RAM
   - DAS-4 (2011): 16 nodes
     - Each node: 2 CPUs (4 cores, 2 threads/core), 48GB RAM
     - Totals: 32 CPUs / 128 cores / 256 threads, 768GB RAM

4. A Short DAS History (2)
   - DAS-5 (2015): 24 nodes
     - Each node: 2 CPUs (8 cores, 2 threads/core), 64GB RAM
     - Totals: 48 CPUs / 384 cores / 768 threads, 1536GB RAM
   [Chart: growth in cores, threads, and memory (GB) from 2007 to 2011 to 2015]

  5. MPI l Communication between processes in a distributed program is typically implemented using MPI: Message Passing Interface. l MPI is a generic API that can be implemented in different ways: - Using specific interconnect hardware, such as InfiniBand. - Using TCP/IP over plain Ethernet. - Or even used (emulated) on Shared Memory for inter process communication on the same node.

6. Some MPI basic functions
   - #include <mpi.h>
   - Initialize the library: MPI_Init(&argc, &argv);
   - Determine the number of processes that take part:
       int n_procs;
       MPI_Comm_size(MPI_COMM_WORLD, &n_procs);
     (MPI_COMM_WORLD is the initially defined universe intracommunicator for all processes)
   - Determine the ID of this process:
       int id;
       MPI_Comm_rank(MPI_COMM_WORLD, &id);
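A minimal sketch putting these calls together in one program (the program text itself is not from the slides; MPI_Finalize is covered on a later slide):

   #include <stdio.h>
   #include <mpi.h>

   int main(int argc, char *argv[])
   {
       int n_procs, id;

       MPI_Init(&argc, &argv);                    /* initialize the MPI library        */
       MPI_Comm_size(MPI_COMM_WORLD, &n_procs);   /* number of participating processes */
       MPI_Comm_rank(MPI_COMM_WORLD, &id);        /* rank (ID) of this process         */

       printf("Hello from process %d of %d\n", id, n_procs);

       MPI_Finalize();                            /* shut down, see slide 10           */
       return 0;
   }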

7. Sending Messages
   MPI_Send(buffer, count, datatype, dest, tag, comm);
   - buffer: pointer to the data buffer.
   - count: number of items to send.
   - datatype: data type of the items (see next slide). All items must be of the same type.
   - dest: rank number of the destination.
   - tag: message tag (integer), may be 0. You can use this to distinguish between different messages.
   - comm: communicator, for instance MPI_COMM_WORLD.
   Note: this is a blocking send!
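As a hedged illustration (not part of the slides), a blocking send is typically paired with MPI_Recv, which appears under "Other calls" below; the value, tag and ranks here are arbitrary, and the sketch assumes the program is started with at least two processes:

   #include <stdio.h>
   #include <mpi.h>

   int main(int argc, char *argv[])
   {
       int id, value;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &id);

       if (id == 0) {
           value = 42;
           /* buffer, count, datatype, dest, tag, comm */
           MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
       } else if (id == 1) {
           MPI_Status status;
           /* buffer, count, datatype, source, tag, comm, status */
           MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
           printf("process 1 received %d\n", value);
       }

       MPI_Finalize();
       return 0;
   }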

8. MPI data types
   - You must specify a data type when performing MPI transmissions.
   - For instance, for built-in C types:
     - "int" translates to MPI_INT
     - "unsigned int" to MPI_UNSIGNED
     - "double" to MPI_DOUBLE, and so on.
   - You can define your own MPI data types, for example if you want to send/receive custom structures.
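To illustrate the last point, the sketch below registers an MPI datatype for a hypothetical edge structure via MPI_Type_create_struct; the struct layout and the helper name make_edge_type are assumptions for illustration, not part of the assignment:

   #include <stddef.h>        /* offsetof */
   #include <mpi.h>

   typedef struct { int u, v; double weight; } edge_t;

   /* Builds and commits an MPI datatype describing edge_t.
      Free it with MPI_Type_free() when it is no longer needed. */
   MPI_Datatype make_edge_type(void)
   {
       MPI_Datatype edge_type;
       int          blocklengths[2]  = { 2, 1 };            /* two ints, one double */
       MPI_Aint     displacements[2] = { offsetof(edge_t, u),
                                         offsetof(edge_t, weight) };
       MPI_Datatype types[2]         = { MPI_INT, MPI_DOUBLE };

       MPI_Type_create_struct(2, blocklengths, displacements, types, &edge_type);
       MPI_Type_commit(&edge_type);                         /* must commit before use */
       return edge_type;
   }

   /* Usage, e.g.: MPI_Send(edges, n_edges, edge_type, dest, tag, MPI_COMM_WORLD); */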

9. Other calls
   - MPI_Recv()
   - MPI_Isend(), MPI_Irecv() - non-blocking send/receive
   - MPI_Scatter(), MPI_Gather()
   - MPI_Bcast()
   - MPI_Reduce()
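A small sketch (not from the slides) combining two of these collectives: the root scatters an integer array, every process sums its own slice, and MPI_Reduce adds up the partial sums at the root; the array size and contents are purely illustrative:

   #include <stdio.h>
   #include <stdlib.h>
   #include <mpi.h>

   enum { PER_PROC = 4 };                         /* items handled by each process */

   int main(int argc, char *argv[])
   {
       int id, n_procs;

       MPI_Init(&argc, &argv);
       MPI_Comm_rank(MPI_COMM_WORLD, &id);
       MPI_Comm_size(MPI_COMM_WORLD, &n_procs);

       int *data = NULL;
       if (id == 0) {                             /* only the root owns the full array */
           data = malloc(n_procs * PER_PROC * sizeof(int));
           for (int i = 0; i < n_procs * PER_PROC; i++) data[i] = i;
       }

       int local[PER_PROC];
       MPI_Scatter(data, PER_PROC, MPI_INT, local, PER_PROC, MPI_INT, 0, MPI_COMM_WORLD);

       int local_sum = 0, total = 0;
       for (int i = 0; i < PER_PROC; i++) local_sum += local[i];

       MPI_Reduce(&local_sum, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
       if (id == 0) {
           printf("total = %d\n", total);         /* sum of 0 .. n_procs*PER_PROC-1 */
           free(data);
       }

       MPI_Finalize();
       return 0;
   }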

10. Shutting down
   - MPI_Finalize()

11. Example: Computing Pi
   Source: http://www.mcs.anl.gov/research/projects/mpi/tutorial/mpiexmpl/src/pi/C/

   #include <stdio.h>
   #include <math.h>
   #include <mpi.h>

   int main(int argc, char *argv[])
   {
       int done = 0, n, myid, numprocs, i;
       double PI25DT = 3.141592653589793238462643;
       double mypi, pi, h, sum, x;

       MPI_Init(&argc, &argv);
       MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
       MPI_Comm_rank(MPI_COMM_WORLD, &myid);

12. Example: Computing Pi (continued)
       while (!done) {
           if (myid == 0) {
               printf("Enter number of intervals: (0 quits) ");
               scanf("%d", &n);
           }
           MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
           if (n == 0)
               break;

           h = 1.0 / (double) n;
           sum = 0.0;
           for (i = myid + 1; i <= n; i += numprocs) {
               x = h * ((double)i - 0.5);
               sum += 4.0 / (1.0 + x*x);
           }
           mypi = h * sum;

           MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
           if (myid == 0)
               printf("pi = approximately %.16f, Error is %.16f\n",
                      pi, fabs(pi - PI25DT));
       }
       MPI_Finalize();
       return 0;
   }

13. Compiling and Running
   - Compile using "mpicc"; this automatically links in the necessary libraries.
   - Run the program using "mpirun". On systems like DAS, mpirun is started from a "job script".
   - The job script is submitted to the job scheduler, which will run your job once your resource reservation can be fulfilled.
   - See the assignment text for links to more detailed manuals.
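For illustration only (the source file name and process count are made up, and on DAS the mpirun line normally goes inside the job script described in the assignment text rather than being typed interactively):

   mpicc -O2 -o mst mst.c
   mpirun -np 16 ./mst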
