Euro PVM/MPI 2003, Venezia, Italia: Efficient Parallel Implementation of Transitive Closure of Digraphs



  1. 1/22 Euro PVM/MPI 2003, Venezia, Italia. Efficient Parallel Implementation of Transitive Closure of Digraphs. C. E. R. Alves (Universidade São Judas Tadeu), E. N. Cáceres (Universidade Federal de Mato Grosso do Sul), A. A. Castro Jr. (Universidade Católica Dom Bosco), S. W. Song (Universidade de São Paulo), J. L. Szwarcfiter (Universidade Federal do Rio de Janeiro).

  2. 2/22 The Transitive Closure Problem
  • Used in many areas, such as network planning and distributed systems design.
  • Used in problems such as all shortest paths in a directed graph and breadth-first spanning trees.
  • Given a directed graph D(V, E) with |V| = n and |E| = m, we present a parallel algorithm to compute its transitive closure using p processors, each with O(n^2/p) local memory.

  3. 3/22 Example: [figure] A directed graph on vertices 1 through 6.

  4. 4/22 Example: [figure] Its transitive closure: a green edge joins i to j whenever j can be reached from i.

  5. 5/22 BSP/CGM Model
  CGM (Coarse Grained Multicomputer) model: p processors, each with its own local memory, communicating through a network. The algorithm alternates between
  • Computation rounds: each processor computes independently.
  • Communication rounds: each processor sends/receives data to/from other processors.
  Goals:
  • Obtain a linear speedup in p.
  • Minimize the number of rounds.

  6. 6/22 The CGM Model: [figure] One superstep: processors P0 through P(p−1) each perform local computation (computation round), then exchange data in a global communication round that ends at a synchronization barrier.
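  This round structure maps naturally onto MPI. The sketch below is ours, not the paper's code: the local work is a placeholder, and the collective call stands in for both the global communication and the synchronization barrier.

  #include <mpi.h>

  /* One BSP/CGM superstep: a computation round of purely local work,
     then a communication round that also acts as the barrier. The
     local work here is a placeholder assumption. */
  void cgm_superstep(int *chunk, int n_local, int *all)
  {
      int i;
      for (i = 0; i < n_local; i++)          /* computation round */
          chunk[i] += 1;                     /* placeholder local work */
      MPI_Allgather(chunk, n_local, MPI_INT, /* communication round */
                    all, n_local, MPI_INT, MPI_COMM_WORLD);
  }

  A full program wraps such supersteps in MPI_Init/MPI_Finalize and repeats them until the computation converges.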

  7. 7/22 Previous Parallel Algorithms
  1. PRAM:
  • Karp et al.: CREW, O(log^2 n) time with O(M(n)) processors, where M(n) is the best known sequential bound for multiplying two n × n matrices over a ring.
  • JáJá: CRCW, O(log n) time with O(n^3) processors.
  2. Cáceres et al.: acyclic digraphs with a linear extension labeling, O(log p) rounds with O(n^3/p) local time.
  3. Dependency graph approach:
  • Pagourtzis et al.: O(p) rounds with O(n^3/p) local time.

  8. 8/22 Warshall's Algorithm
  Algorithm 1: Warshall's Algorithm
  Input: adjacency matrix M (n × n) of graph G
  Output: transitive closure of graph G
  for k ← 1 until n do
    for i ← 1 until n do
      for j ← 1 until n do
        M[i, j] ← M[i, j] or (M[i, k] and M[k, j])
      end for
    end for
  end for
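  In C, Algorithm 1 is only a few lines. This sketch is ours, not the paper's code; the fixed size N and the 0/1 integer matrix layout are assumptions for brevity.

  enum { N = 6 };   /* number of vertices; 6 matches the earlier example */

  /* Warshall's algorithm: on return, M[i][j] is 1 exactly when vertex j
     is reachable from vertex i. O(n^3) time, as in Algorithm 1. */
  void warshall(int M[N][N])
  {
      int i, j, k;
      for (k = 0; k < N; k++)
          for (i = 0; i < N; i++)
              for (j = 0; j < N; j++)
                  if (M[i][k] && M[k][j])
                      M[i][j] = 1;
  }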

  9. 9/22 Partitioning the Adjacency Matrix: [figure] The matrix divided into blocks of rows and columns (four blocks in the picture); updating entry (i, j) for pivot k uses row k and column k.

  10. 10/22 The Parallel Algorithm
  Algorithm 2: Parallel Warshall
  Input: adjacency matrix M stored in the p processors; each processor q (1 ≤ q ≤ p) stores the submatrices M[(q−1)n/p + 1 .. qn/p][1..n] and M[1..n][(q−1)n/p + 1 .. qn/p].
  Output: transitive closure of graph G, represented by the transformed matrix M.

  11. 11/22 Algorithm 3: Parallel Warshall (continued)
  Each processor q (1 ≤ q ≤ p) does the following.
  repeat
    for k = (q−1)n/p + 1 until qn/p do
      for i = 1 until n do
        for j = 1 until n do
          if M[i][k] = 1 and M[k][j] = 1 then
            M[i][j] = 1 (if M[i][j] belongs to a processor other than q, store it for subsequent transmission to that processor)
          end if
        end for
      end for
    end for
    Send the stored data to the corresponding processors.
    Receive from the other processors the data that belong to processor q.
  until no new matrix entry updates are done
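  The sketch below shows how this loop can look in MPI. It is our simplification, not the paper's implementation: every rank holds a full copy of M, and the buffered point-to-point transmissions of Algorithm 3 are replaced by logical-OR all-reduces, trading the O(n^2/p) memory bound for brevity.

  #include <mpi.h>

  enum { N = 512 };   /* matrix order; assumed divisible by p */

  /* Parallel Warshall, simplified: each rank applies its own block of
     pivots k, then all copies of M are OR-merged; iterate until no
     rank adds a new entry. */
  void parallel_warshall(int M[N][N], int rank, int p)
  {
      int i, j, k, changed = 1;
      int lo = rank * (N / p), hi = lo + (N / p);   /* pivot block */
      while (changed) {              /* repeat ... until no updates */
          changed = 0;
          for (k = lo; k < hi; k++)  /* computation round */
              for (i = 0; i < N; i++)
                  for (j = 0; j < N; j++)
                      if (M[i][k] && M[k][j] && !M[i][j]) {
                          M[i][j] = 1;
                          changed = 1;
                      }
          /* communication round: OR all copies of M together */
          MPI_Allreduce(MPI_IN_PLACE, M, N * N, MPI_INT, MPI_LOR,
                        MPI_COMM_WORLD);
          /* global termination test: did any rank update an entry? */
          MPI_Allreduce(MPI_IN_PLACE, &changed, 1, MPI_INT, MPI_LOR,
                        MPI_COMM_WORLD);
      }
  }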

  12. 12/22 The Main Idea
  • Partition V(D).
  • For each part, construct the subdigraph formed by the edges of D that have at least one endpoint in that part (see the sketch below).
  • Compute the transitive closure within each part.
  • Send the computed transitive edges to the proper part.
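  A small C sketch of the subdigraph construction step; the edge-list representation and the even block partition of the vertices are our illustrative assumptions.

  typedef struct { int u, v; } Edge;

  /* Owner of vertex v when n vertices are split evenly among p parts. */
  static int owner(int v, int n, int p) { return v / (n / p); }

  /* Keep the edges of D with at least one endpoint in part q;
     returns the number of edges written to out[]. */
  int build_local_subdigraph(const Edge *edges, int m, int q,
                             int n, int p, Edge *out)
  {
      int e, kept = 0;
      for (e = 0; e < m; e++)
          if (owner(edges[e].u, n, p) == q || owner(edges[e].v, n, p) == q)
              out[kept++] = edges[e];
      return kept;
  }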

  13. 13/22 Example: [figure] A directed graph on vertices 1 through 8.

  14. 14/22 Example: [figure] The digraph split between Processor 0 and Processor 1; each processor holds the edges with at least one endpoint in its part.

  15. 15/22 Example: [figure] Each processor computes the transitive closure of its local subdigraph.

  16. 16/22 Example: [figure] After the computed transitive edges are sent to the proper parts, Processor 0 and Processor 1 together hold the transitive closure of the whole digraph.

  17. 17/22 Implementation
  • 64-node Beowulf cluster of low-cost microcomputers, each with 256 MB RAM, 256 MB swap, an Intel Pentium III CPU at 448.956 MHz, and 512 KB cache.
  • 100 Mb fast-Ethernet switch.
  • Code in standard ANSI C with LAM-MPI version 6.5.6.
  • Tests on randomly generated digraphs with a 20% probability of an edge between any two vertices.
  • In all the tests, the number of communication rounds required was less than log p.

  18. 18/22 Implementation Results: [plot] Running time in seconds versus number of processors (up to 64) for inputs of size 480x480 and 512x512.

  19. 19/22 Implementation Results: [plot] Running time in seconds versus number of processors for inputs of size 960x960, 1024x1024, and 1920x1920.

  20. 20/22 Implementation Results: [plot] Speedup versus number of processors for inputs of size 480x480 and 512x512.

  21. 21/22 Implementation Results: [plot] Speedup versus number of processors for inputs of size 960x960, 1024x1024, and 1920x1920.

  22. 22/22 Conclusion
  A BSP/CGM algorithm for the transitive closure problem.
  • Digraph with n vertices and m edges.
  • Measured number of communication rounds: O(log p).
  • Local computation time: O(mn/p).
