
Performance Results of Running Parallel Applications on the InteGrade
Edson Norberto Cáceres, Henrique Mongelli, Leonardo Loureiro, Christiane Nishibe, Siang Wun Song
October 29, 2008

Outline: Introduction; The 0-1 Knapsack Problem; Local Alignment Problem; Conclusions and Future Work.

The BSP/CGM Model
BSP/CGM model: p processors, each with its own local memory, communicating through a network.
The algorithm alternates between two kinds of rounds:
    Computation rounds: each processor computes independently.
    Communication rounds: each processor sends/receives data to/from other processors.
Goals:
    Obtain a speed-up that is linear in p.
    Minimize the number of rounds.

Implementations
C and C++ languages with the LAM/MPI library.
SPMD paradigm.
6 Pentium IV 1.7 GHz nodes.
6 AMD Athlon 1.6 GHz nodes.
1 GB memory.
1 Gbit interconnection network.

The 0-1 Knapsack Problem: Introduction
Motivation:
    Classical combinatorial problem with a wide range of applications.
    Integer programming problem with one constraint.
    Good algorithms for the 0-1 Knapsack Problem are of interest to the Integer Programming research area.
NP-complete problem.
Two basic approaches: Dynamic Programming (DP) and Branch-and-Bound (B&B).
DP runs in O(nW) time, which is pseudo-polynomial.
Our result: a BSP/CGM algorithm with O(p) communication rounds that requires communication only with a few neighbor processors.

The 0-1 Knapsack Problem: Definition
S = {1, 2, ..., n} is a set of n distinct items.
The i-th item is worth v_i dollars and weighs w_i kilos; v_i and w_i are integers.
W is the integer capacity of the knapsack.
Which items should be selected in order to fill the knapsack with the most valuable load without exceeding the capacity constraint?

$$\max \left\{ \sum_{i=1}^{n} v_i z_i \;:\; \sum_{i=1}^{n} w_i z_i \le W,\ z_i \in \{0, 1\} \right\}$$
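As a point of reference for the definition above, here is a minimal sequential sketch of the textbook O(nW) dynamic program in Python. It is not the authors' C/C++ code; the function name `knapsack_01` and the sample instance are made up for illustration, and the sketch assumes non-negative integer weights.

```python
def knapsack_01(values, weights, capacity):
    """Return (best total value, chosen item indices, 0-based).

    Illustrative helper, not taken from the slides.
    """
    n = len(values)
    # f[r][c] = best value using only the first r items with capacity c.
    f = [[0] * (capacity + 1) for _ in range(n + 1)]
    for r in range(1, n + 1):
        v, w = values[r - 1], weights[r - 1]
        for c in range(capacity + 1):
            f[r][c] = f[r - 1][c]                      # z_r = 0: skip item r
            if w <= c and f[r - 1][c - w] + v > f[r][c]:
                f[r][c] = f[r - 1][c - w] + v          # z_r = 1: take item r
    # Walk back through the table to recover which items were taken.
    chosen, c = [], capacity
    for r in range(n, 0, -1):
        if f[r][c] != f[r - 1][c]:
            chosen.append(r - 1)
            c -= weights[r - 1]
    chosen.reverse()
    return f[n][capacity], chosen


# Hypothetical instance: three items, capacity 50.
best, items = knapsack_01([60, 100, 120], [10, 20, 30], 50)
print(best, items)  # prints: 220 [1, 2]
```

The table has (n + 1) × (W + 1) entries, which is where the pseudo-polynomial O(nW) bound comes from.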

The 0-1 Knapsack Problem: The BSP/CGM Algorithm
Approach: wavefront dynamic programming, following Alves et al.
p processors, each with O(Wn/p) local memory.
The goal is to compute the optimal solution matrix f for the item set S = {1, 2, ..., n}.
w, where w[i] is the weight of item i, is broadcast to all processors.
v is divided into p pieces of size n/p.
Each P_i, 1 ≤ i ≤ p, receives the i-th piece of v, that is, v[(i − 1)n/p + 1 .. i·n/p].

The 0-1 Knapsack Problem: The BSP/CGM Algorithm
P_i^k: the work of processor P_i at round k.
P_1 starts computing at round 0. P_1 and P_2 can work at round 1; P_1, P_2 and P_3 at round 2, and so on.
After computing its k-th block f_i^k, P_i sends the boundary R_i^k to P_{i+1}.
Using R_i^k, P_{i+1} computes f_{i+1}^k.
After p − 1 rounds, P_p receives R_{p−1}^1 and computes f_p^1.
In round 2p − 2, P_p receives R_{p−1}^p and computes f_p^p.
Good, but poor load balancing.
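The round structure described above can be tabulated with a short Python sketch. It assumes, as the slide's examples imply, that processor P_i computes its k-th block in round (i − 1) + (k − 1), with rounds counted from 0. The function name `wavefront_schedule` is hypothetical.

```python
def wavefront_schedule(p):
    """Return, for each round 0 .. 2p-2, the (processor, block) pairs active in it.

    Processors and blocks are numbered 1..p. The rule "P_i computes block k in
    round (i - 1) + (k - 1)" is an assumption read off the slide's examples.
    """
    rounds = {}
    for i in range(1, p + 1):
        for k in range(1, p + 1):
            rounds.setdefault((i - 1) + (k - 1), []).append((i, k))
    return [sorted(rounds[r]) for r in range(2 * p - 1)]


p = 4
schedule = wavefront_schedule(p)
for r, active in enumerate(schedule):
    print(f"round {r}: {len(active)} of {p} processors busy -> {active}")

# Fraction of processor-rounds that do useful work: p*p / (p * (2p - 1)).
busy = sum(len(active) for active in schedule) / (p * len(schedule))
print(f"average utilization: {busy:.2f}")
```

Only the middle round keeps all p processors busy. The pipeline takes p − 1 rounds to fill and p − 1 rounds to drain, which is one way to read the slide's remark about poor load balancing.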

The 0-1 Knapsack Problem: The BSP/CGM Algorithm
Input: (1) the number p of processors; (2) the number i of the processor, where 1 ≤ i ≤ p; and (3) the array w, the capacity W of the knapsack and the subarray v_i of size n/p.
Output: f(r, c) = max{ f(r − 1, c − w[r]) + v[r], f(r − 1, c) }, where 1 ≤ c ≤ W and (i − 1)n/p + 1 ≤ r ≤ i·n/p.

for 1 ≤ k ≤ p do
    if i = 1 then
        for (k − 1)W/p + 1 ≤ c ≤ kW/p and (i − 1)n/p + 1 ≤ r ≤ i·n/p do
            compute f(r, c);
        end for
        send(R_i^k, P_{i+1});
    end if
    if i ≠ 1 then
        receive(R_{i−1}^k, P_{i−1});
        for (k − 1)W/p + 1 ≤ c ≤ kW/p and (i − 1)n/p + 1 ≤ r ≤ i·n/p do
            compute f(r, c);
        end for
        if i ≠ p then
            send(R_i^k, P_{i+1});
        end if
    end if
end for

[Figure: the matrix f partitioned into p strips of n/p items, one per processor, and p blocks of W/p capacities. Each block is labeled P_i^k with the round k in which P_i computes it, from P_1^0 in the first block to P_p^{2p−2} in the last; R_i^k marks the boundary sent to P_{i+1}.]
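To make the block order of the algorithm above concrete, the following Python sketch simulates it sequentially in one process: there is no MPI here, and the "messages" are simply reads from a shared table. It assumes the standard 0-1 recurrence f(r, c) = max{ f(r − 1, c − w[r]) + v[r], f(r − 1, c) }, positive integer weights, and nearly equal splits when n or W is not divisible by p. The names `knapsack_wavefront` and `bounds` are hypothetical.

```python
def knapsack_wavefront(values, weights, capacity, p):
    """Fill the DP table block by block in wavefront order; return f[n][capacity].

    Sequential simulation only: processor i "owns" the i-th strip of items and
    handles capacity block k in round (i - 1) + (k - 1).
    """
    n = len(values)
    f = [[0] * (capacity + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0

    def bounds(total, parts, j):
        # 1-based inclusive range of the j-th of `parts` nearly equal pieces.
        return (j - 1) * total // parts + 1, j * total // parts

    for rnd in range(2 * p - 1):               # rounds 0 .. 2p-2
        for i in range(1, p + 1):
            k = rnd - (i - 1) + 1              # block that P_i handles this round
            if not 1 <= k <= p:
                continue                       # P_i is idle in this round
            r_lo, r_hi = bounds(n, p, i)
            c_lo, c_hi = bounds(capacity, p, k)
            for r in range(r_lo, r_hi + 1):
                v, w = values[r - 1], weights[r - 1]
                for c in range(c_lo, c_hi + 1):
                    best = f[r - 1][c]                      # skip item r
                    if w <= c:
                        best = max(best, f[r - 1][c - w] + v)  # take item r
                    f[r][c] = best
    return f[n][capacity]


# Hypothetical instance; the result should not depend on p.
for p in (1, 2, 3, 4):
    print(p, knapsack_wavefront([60, 100, 120], [10, 20, 30], 50, p))
```

Every value a block reads lies either in the same strip (computed earlier by the same processor) or in the previous strip at the same or a smaller capacity, which the previous processor finished at least one round earlier. That is why a single boundary message per block from P_{i−1} to P_i is enough.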

The 0-1 Knapsack Problem: Experimental Results (LAM × InteGrade)
Running times in seconds. The slide does not define columns I and II; the title and the plot legend (Cluster, Grid) suggest I = LAM on the cluster and II = InteGrade on the grid.

       4096 × 1024      8192 × 2048      16384 × 4096     32768 × 8192
p      I       II       I       II       I       II       I       II
1      0.071   0.084    0.283   0.367    1.105   1.105    4.050   4.591
2      0.063   0.072    0.250   0.278    0.992   1.053    3.953   4.065
4      0.057   0.078    0.244   0.280    0.952   1.146    3.718   4.079
8      0.050   -        0.173   -        0.645   -        2.390   -

[Figure: running time (s) versus number of CPUs (1, 2, 4, 8) for the cluster and grid runs at the four problem sizes.]
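The table reports raw times only. A small Python sketch can turn column I into speedup T(1)/T(p) and efficiency T(1)/(p·T(p)). The numbers are copied from the table; the dictionary layout and the function name `speedup_and_efficiency` are illustrative choices.

```python
# Times in seconds from column I of the table, keyed by problem size and p.
times_col_I = {
    "4096 x 1024":  {1: 0.071, 2: 0.063, 4: 0.057, 8: 0.050},
    "8192 x 2048":  {1: 0.283, 2: 0.250, 4: 0.244, 8: 0.173},
    "16384 x 4096": {1: 1.105, 2: 0.992, 4: 0.952, 8: 0.645},
    "32768 x 8192": {1: 4.050, 2: 3.953, 4: 3.718, 8: 2.390},
}


def speedup_and_efficiency(times):
    """Return {p: (speedup, efficiency)} relative to the p = 1 time."""
    t1 = times[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in sorted(times.items())}


for size, times in times_col_I.items():
    row = speedup_and_efficiency(times)
    summary = ", ".join(f"p={p}: {s:.2f}x ({e:.0%})" for p, (s, e) in row.items())
    print(f"{size}: {summary}")
```

For the largest instance the speedup on 8 processors is 4.050 / 2.390, about 1.7, well short of the linear speed-up named as a goal of the model.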

Local Alignment Problem: Introduction
Motivation: local alignment is used to determine whether two sequences of nucleotides or proteins have similar functionality or an evolutionary relationship.
Basic approach: Dynamic Programming (DP), O(nm) time.
Our result: a BSP/CGM algorithm with O(p) communication rounds and O(mn/p) complexity that requires communication only with a few neighbor processors.
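For readers who have not seen the O(nm) dynamic program, here is a minimal Python sketch of the classic Smith-Waterman local alignment score with a linear gap penalty. This is a stand-in for illustration, not the authors' algorithm: the slides do not state their scoring scheme, so the `match`, `mismatch` and `gap` values below are assumptions, and the function name is hypothetical.

```python
def local_alignment_score(s1, s2, match=2, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring parameters are illustrative defaults, not from the slides.
    """
    m, n = len(s1), len(s2)
    prev = [0] * (n + 1)          # DP row for the previous character of s1
    best = 0
    for i in range(1, m + 1):
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            diag = prev[j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            # 0 lets an alignment restart anywhere, which is what makes it local.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(local_alignment_score("TTACGTT", "GGACGGG"))  # prints: 6
```

Each cell depends only on its upper, left and upper-left neighbors, the same dependency pattern that allows a wavefront order across processors. Keeping just two rows is enough for the score; recovering the alignment itself would need the full table or a traceback structure.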

Local Alignment Problem: The BSP/CGM Algorithm
Input: (1) sequences S1 of size m and S2 of size n; (2) number of processors p; (3) rank of processor i; (4) each processor of rank i holds s1[0 .. m − 1] and s2[i ∗ (n/p) .. (i + 1) ∗ (n/p)].
Output: best local alignment between S1 and S2.

matrix A(m+1, blockSize+1), matrix B(m+1, blockSize+1), matrix C(m+1, blockSize+1)
blockSize ← n/p
next ← i + 1
previous ← i − 1
col ← 1
for round ← 0 to p − 1 do
    col ← col + blockSize
    if i ≠ 0 then
        receive(A[0, col .. col + blockSize], previous)
        receive(B[0, col .. col + blockSize], previous)
        receive(C[0, col .. col + blockSize], previous)
    end if
    compute A[1 .. m, col .. col + blockSize]
    compute B[1 .. m, col .. col + blockSize]
    compute C[1 .. m, col .. col + blockSize]
    if i ≠ p − 1 then
        send(A[m, col .. col + blockSize], next)
        send(B[m, col .. col + blockSize], next)
        send(C[m, col .. col + blockSize], next)
    end if
end for

Local Alignment Problem: Experimental Results (LAM × InteGrade)
[Figure: running time (s) versus number of CPUs (1, 2, 4, 8) for the cluster and grid runs with sequence sizes 4096 × 4096, 8192 × 8192, 16384 × 16384 and 32768 × 32768.]

Conclusions
BSP/CGM algorithms are suitable for grids.
BSP/CGM DP algorithms can be implemented using the wavefront strategy.

Future Work
Fault-tolerant BSP/CGM algorithms.
"Balance" the size of the messages.
