DF21500 Multicore Computing, 2009
Prof. Dr. Christoph Kessler, IDA, Linköping University

Programming and Parallelization with Algorithmic Skeletons
An Introduction


Example: 1D smoothening in C

[Figure: an array of n elements (plus boundary cells), processed by a single processor P]

    float filter ( float a, float b, float c ) { return wa*a + wb*b + wc*c; }
    ...
    void main ( int argc, char *argv[] )
    {
        float *array = new_FloatArray( n+2 );
        float *tmp   = new_FloatArray( n+2 );
        ...
        while (globalerr > 0.1) {
            for (i=1; i<=n; i++)
                tmp[i] = filter( array[i-1], array[i], array[i+1] );
            globalerr = 0.0;
            for (i=1; i<=n; i++)
                globalerr = fmax( globalerr, fabs( array[i] - tmp[i] ) );
            for (i=1; i<=n; i++)
                array[i] = tmp[i];
        }
    }


Complexity of Parallel Algorithms and Programs

- Many different parallel programming models
- Identify parallelism ("tasks")
- Synchronization and communication?
- Memory structure, memory consistency?
- Load balancing, scheduling?
- Network structure?
- Error prone, hard to debug
- ...

Can we make parallel programming as easy as sequential programming?


Example: 1D smoothening in C + MPI

[Figure: the array of n elements partitioned into blocks of localsize elements, one block (plus halo cells) per processor P]

    ...
    void main ( int argc, char *argv[] )
    {
        MPI_Comm com = MPI_COMM_WORLD;
        MPI_Init( &argc, &argv );
        MPI_Comm_size( com, &np );
        MPI_Comm_rank( com, &me );
        ...
        localsize = (int) ceil( (float) n / np );
        local = new_FloatArray( localsize + 2 );
        ...
        while (globalerr > 0.1) {
            if (me > 0)    MPI_Send( local+1, 1, MPI_FLOAT, left_neighbor,  10, com );
            if (me < np-1) MPI_Send( last,    1, MPI_FLOAT, right_neighbor, 20, com );
            for (i=2; i<localsize; i++)          /* interior elements, overlapped with communication */
                tmp[i] = filter( local[i-1], local[i], local[i+1] );
            if (me < np-1) MPI_Recv( local+localsize+1, 1, MPI_FLOAT, right_neighbor, 10, com, ... );
            if (me > 0)    MPI_Recv( local,             1, MPI_FLOAT, left_neighbor,  20, com, ... );
            tmp[1]         = filter( local[0], local[1], local[2] );
            tmp[localsize] = filter( local[localsize-1], local[localsize], local[localsize+1] );
            localerr = 0.0;
            for (i=1; i<=localsize; i++)
                localerr = fmax( localerr, fabs( local[i] - tmp[i] ) );
            MPI_Allreduce( &localerr, &globalerr, 1, MPI_FLOAT, MPI_MAX, com );
            for (i=1; i<=localsize; i++)
                local[i] = tmp[i];
        }
        ...
    }


Observation

- The same characteristic form of parallelism, communication, and synchronization is re-applicable for all occurrences of the same specific structure of computation ((parallel) algorithmic paradigm, building block, pattern, ...):
  - Elementwise operations on arrays
  - Reductions
  - Scan (prefix operation)
  - Divide-and-conquer
  - Farming independent tasks
  - Pipelining
  - ...
- Most of these have both sequential and parallel implementations.


Example: 1D smoothening in C + Skeletons

[Figure: the distributed array of n elements spread block-wise over the processors P]

    float filter ( float a, float b, float c ) { return wa*a + wb*b + wc*c; }
    float elemError ( float a, float b )       { return fabs( a - b ); }
    ...
    void main ( int argc, char *argv[] )
    {
        DistrFloatArray *array = new_DistrFloatArray( n+2 );
        DistrFloatArray *tmp   = new_DistrFloatArray( n+2 );
        DistrFloatArray *err   = new_DistrFloatArray( n+2 );
        ...
        while (globalerr > 0.1) {
            map_with_overlap( filter, 1, tmp, array+1, n );
            map( elemError, err, array+1, tmp, n );
            reduce( fmax, &globalerr, err, n );
            map( copy, array+1, tmp, n );
        }
        ...
    }
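For reference, a minimal sequential sketch of how the skeleton calls used above (map_with_overlap, map, reduce) could be realized in plain C. The names and parameter order mimic the calls on the slide, but the exact signatures and the split of map into unary and binary variants are assumptions for illustration, not the actual library interface; a parallel implementation would distribute these loops over the processors.

    #include <stddef.h>

    /* element operations as on the slide */
    typedef float (*Stencil3)(float, float, float);   /* e.g. filter            */
    typedef float (*Unary)(float);                    /* e.g. copy              */
    typedef float (*Binary)(float, float);            /* e.g. elemError         */

    /* map_with_overlap, radius 1: out[i] = f(in[i-1], in[i], in[i+1]).
       The caller is expected to pass a pointer to the first interior element
       of an array with one halo cell on each side (cf. array+1 on the slide). */
    void map_with_overlap(Stencil3 f, int radius,
                          float *out, const float *in, size_t n)
    {
        (void) radius;                    /* fixed to 1 in this sketch */
        for (size_t i = 0; i < n; i++)
            out[i] = f(in[i-1], in[i], in[i+1]);
    }

    /* elementwise map; the slide's map is used with one and two input arrays,
       shown here as two separate functions */
    void map1(Unary f, float *out, const float *in, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = f(in[i]);
    }

    void map2(Binary f, float *out, const float *a, const float *b, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            out[i] = f(a[i], b[i]);
    }

    /* reduction with a binary associative operator op (assumes n >= 1) */
    void reduce(Binary op, float *result, const float *in, size_t n)
    {
        float acc = in[0];
        for (size_t i = 1; i < n; i++)
            acc = op(acc, in[i]);
        *result = acc;
    }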
Data parallelism

Given:
- One or several data containers x, z, ... with n elements each, e.g. arrays x = (x_1, ..., x_n), z = (z_1, ..., z_n), ...
- An operation f on individual elements of x, z, ... (e.g. incr, sqrt, mult, ...)

Compute: y = f(x) = ( f(x_1), ..., f(x_n) )

Parallelizability: Each data element defines a task.
- Fine-grained parallelism
- Portionable, fits very well on all parallel architectures

Notation with higher-order function:
- y = map( f, x )
- Variant: map with overlap: y_i = f( x_{i-k}, ..., x_{i+k} ), i = 0, ..., n-1


Data-parallel Reduction

Given:
- A data container x with n elements, e.g. an array x = (x_1, ..., x_n)
- A binary, associative operation op on individual elements of x (e.g. add, max, bitwise-or, ...)

Compute: y = OP_{i=1..n} x_i = x_1 op x_2 op ... op x_n

Parallelizability: Exploit the associativity of op.

Notation with higher-order function:
- y = reduce( op, x )


Parallel Divide-and-Conquer

(Sequential) divide-and-conquer:
- Divide: decompose the problem instance P into one or several smaller independent instances of the same problem, P_1, ..., P_k.
- For all i: if P_i is trivial, solve it directly; else, solve P_i by recursion.
- Combine: combine the solutions of the P_i into an overall solution for P.

Parallel divide-and-conquer:
- Recursive calls can be done in parallel.
- Parallelize, if possible, also the divide and combine phases.
- Switch to sequential divide-and-conquer when enough parallel tasks have been created.

Notation with higher-order function:
- solution = DC( divide, combine, istrivial, solvedirectly, n, P )


Task farming

- Independent computations f_1, f_2, ..., f_m could be done in parallel and/or in arbitrary order, e.g.
  - independent loop iterations
  - independent function calls

[Figure: a dispatcher distributes the tasks f_1, f_2, ..., f_m to workers; a collector gathers the results]

- Scheduling problem:
  - n tasks onto p processors
  - static or dynamic
  - load balancing

[Figure: schedule of tasks f_1, f_2, ... on processors P1, P2, P3 over time]

Notation with higher-order function:
- (y_1, ..., y_m) = farm( f_1, ..., f_m )( x_1, ..., x_n )


Example: Parallel Divide-and-Conquer

Example: parallel QuickSort over a float array x
- Divide: partition the array (elements <= pivot, elements > pivot).
- Combine: trivial, concatenate the sorted sub-arrays.

    sorted = DC( partition, concatenate, nIsSmall, qsort, n, x )


Example: Parallel Divide-and-Conquer (2)

Example: parallel Sum over an integer array x
- Exploit associativity: Sum( x_1, ..., x_n ) = Sum( x_1, ..., x_{n/2} ) + Sum( x_{n/2+1}, ..., x_n )
- Divide: trivial, split the array x in place.
- Combine is just an addition.

    y = DC( split, add, nIsSmall, addFewInSeq, n, x )

Data-parallel reductions are an important special case of DC.
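To make the DC notation concrete, here is a compact sequential sketch of the Sum example in C. The interface mirrors the slide's y = DC( divide, combine, istrivial, solvedirectly, n, x ), but since the divide step is trivial (splitting the array in place at n/2), it is folded into the skeleton here; the types, the threshold value, and the omission of an explicit divide argument are assumptions for illustration. In a parallel version, the two recursive calls would be spawned as independent tasks.

    #include <stddef.h>

    #define SEQ_THRESHOLD 8   /* switch to sequential solving below this size (assumed value) */

    static int nIsSmall(size_t n) { return n <= SEQ_THRESHOLD; }

    /* solve small instances directly: plain sequential summation */
    static float addFewInSeq(const float *x, size_t n)
    {
        float s = 0.0f;
        for (size_t i = 0; i < n; i++) s += x[i];
        return s;
    }

    /* combine phase: for Sum it is just an addition */
    static float add(float a, float b) { return a + b; }

    /* DC skeleton, divide done implicitly by splitting the array in place */
    static float DC(float (*combine)(float, float),
                    int   (*istrivial)(size_t),
                    float (*solvedirectly)(const float *, size_t),
                    const float *x, size_t n)
    {
        if (istrivial(n))
            return solvedirectly(x, n);
        /* the two recursive calls are independent and could run as parallel tasks */
        float left  = DC(combine, istrivial, solvedirectly, x,         n / 2);
        float right = DC(combine, istrivial, solvedirectly, x + n / 2, n - n / 2);
        return combine(left, right);
    }

    /* usage: float y = DC( add, nIsSmall, addFewInSeq, x, n ); */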
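Similarly, a minimal sketch of the farm skeleton's semantics, written here as a sequential loop over independent tasks; the Task record and the single-float task signature are simplifications assumed for illustration. Because the tasks are independent, a dispatcher could hand each loop iteration to any free worker (statically or dynamically) and a collector would gather the results.

    #include <stddef.h>

    typedef struct {
        float (*f)(float);   /* the task's function */
        float  x;            /* the task's input    */
    } Task;

    /* farm: apply each independent task; the execution order is irrelevant,
       which is what enables dynamic scheduling onto p workers */
    static void farm(const Task *tasks, float *results, size_t m)
    {
        for (size_t i = 0; i < m; i++)            /* dispatcher loop            */
            results[i] = tasks[i].f(tasks[i].x);  /* any free worker could run this */
    }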