Uninterpreted Functions: Their Use in Code Transformation
Catherine Olschanowsky, Mary Hall, Michelle Strout
CHiLL-I/E Team, Collaborators, and Funding
• Mary Hall (University of Utah)
• Michelle Strout (University of Arizona)
• Catherine Olschanowsky (Boise State University)
• Mahdi Soltan Mohammadi (University of Arizona)
• Payal Nandi (University of Utah)
• Eddie Davis (Boise State University)
• Wei He (University of Arizona)
• Jongsoo Park, Hongbo Rong, Raj Barik (Intel)
• Anand Venkat (Intel, PhD in 2016 at Utah)
This work was supported in part by NSF grant CCF-1563732.
I/E Transformations
• Inspectors: Traverse index arrays at runtime, collecting information and generating new index arrays.
• Executors: Execute the original computation using the information and/or index arrays produced.
[Figure: CHiLL I/E workflow. Compile-time labels: Irregular Computation, Transformation Script, CHiLL compiler, CUDA-CHiLL, Sparse Polyhedral Framework, Compositions of Loop and Data Transformations. Runtime labels: Inspector 1 (e.g. index set splitting), Inspector 2 (e.g. compact-and-pad), Inspector K, Composed Inspector, Index Arrays, Explicit Functions, Executor (Transformed Irregular Computation), Programmer-Defined Inspector/Executor API.]
Effective at Improving Performance
• Wavefront parallelism [Mirchandaney 88] [Rauchwerger 98] [Zhuang 09] [Park 14]
• Distributed memory parallelism [Saltz 91] [Basumallik 09] [Ravishankar 12]
• Automatic dense-to-sparse data transformation [Mateev 00] [Pugh 98] [Arnold 10]
• Data and iteration reordering of parallel and reduction loops for improved data locality [Ding 99] [Mitchell 99] [Mellor-Crummey 01] [Han 06]
• Sparse tiling for aggregating across loops [Douglas 00] (Strout 01) [Mohiyuddin 09] (Krieger 13)
Performance vs. Generality
• Slower, compiler generated
• Faster, specifically optimized I/E

Fast, Compiler-Generated I/Es Require a Common Abstraction
Common among:
• The Inspector
• The Executor
• The Loop Transformation Framework
Uninterpreted functions at compile time, explicit functions at runtime
Transformation Framework for Sparse Codes
Problem: need to modify current approaches to ...
• Express inspector-executor transformations (Cathie)
• Perform data dependence analysis (Michelle)
• Express sparse data transformations (Mary)
Approach: uninterpreted functions to represent
• Non-affine loop bounds
• Memory accesses
• Run-time reordering functions
• Run-time groupings
Sparse Polyhedral Framework (SPF)
• Loop transformation framework built on the polyhedral model
• Uses uninterpreted functions to represent index arrays
• Enables the composition of inspector-executor transformations
• Exposes opportunities for the compiler to simplify indirect array accesses
SPF: Uninterpreted Functions Represent Index Arrays

y = A*x

    // Dense matrix-vector multiply
    for (i = 0; i < N; i++) {
      for (j = 0; j < N; j++) {
        y[i] += A[i][j] * x[j];
      }
    }

    // Sparse matrix-vector multiply (SpMV)
    for (i = 0; i < n; i++) {
      for (k = rowptr[i]; k < rowptr[i+1]; k++) {
        y[i] += val[k] * x[col[k]];
      }
    }

[Figure: sparse matrix A with columns numbered 0 to 5, shown with its CSR arrays as printed on the slide]
    rowptr: 0 3 5 6
    val:    4 7 9 3 1 2 6
    col:    0 1 3 1 4 5 1

Index arrays such as rowptr appear in the iteration space as uninterpreted functions.

Iteration space (CSR):
    I = { [i, k] | 0 ≤ i < n ∧ rowptr(i) ≤ k < rowptr(i + 1) }
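To make the compile-time versus runtime distinction concrete, here is a minimal C sketch, not taken from the slides. At compile time rowptr is only a symbol in the set I; at runtime it is an explicit array, so membership in I can be evaluated directly. The function names in_iteration_space and spmv_csr are hypothetical, chosen for this illustration.

```c
#include <assert.h>

/* Membership test for the CSR iteration space
 *   I = { [i, k] | 0 <= i < n  and  rowptr(i) <= k < rowptr(i + 1) }.
 * At runtime the uninterpreted function rowptr() is just an array lookup.
 * Short-circuit evaluation keeps rowptr[i] and rowptr[i + 1] in bounds. */
int in_iteration_space(int n, const int *rowptr, int i, int k)
{
    return 0 <= i && i < n && rowptr[i] <= k && k < rowptr[i + 1];
}

/* SpMV over CSR storage: y += A * x.
 * rowptr has n + 1 entries; col and val have rowptr[n] entries each. */
void spmv_csr(int n, const int *rowptr, const int *col,
              const double *val, const double *x, double *y)
{
    assert(n >= 0 && rowptr[0] == 0);
    for (int i = 0; i < n; i++) {
        /* Non-affine loop bounds: they depend on the contents of rowptr. */
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++) {
            /* Indirect memory access: col[k] selects the entry of x. */
            y[i] += val[k] * x[col[k]];
        }
    }
}
```

For a 2x3 matrix with rows (1, 0, 2) and (0, 3, 0), the arrays would be rowptr = {0, 2, 3}, col = {0, 2, 1}, val = {1, 2, 3}. Every (i, k) pair visited by spmv_csr satisfies in_iteration_space, which is the property the set notation encodes.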
SPF: Representing Inspector-Executor Transformations with Uninterpreted Functions

Coalesce Transformation
    T = { [i, k] → [k'] | k' = c(i, k) ∧ 0 ≤ k' < NNZ }
    NNZ = count(I)
    c = order(I)

Old Iterators as Functions of the New Iterator
    c⁻¹(k')[0] = i
    c⁻¹(k')[1] = k

    // SpMV for CSR (Compressed Sparse Row)
    for (i = 0; i < n; i++) {
      for (k = rowptr[i]; k < rowptr[i+1]; k++) {
        y[i] += a[k] * x[col[k]];
      }
    }

    // Inspector code
    NNZ = count( rowptr )
    c = order( rowptr )
    c_inv = inverse( c )

    // Executor code
    // SpMV for COO (Coordinate Storage)
    for (k' = 0; k' < NNZ; k'++) {
      y[c_inv[k'][0]] += a[c_inv[k'][1]] * x[col[c_inv[k'][1]]];
    }
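The inspector and executor pseudocode above can be sketched in plain C as follows. This is an illustration under stated assumptions, not the code CHiLL or IEGenLib generates: order(I) is assumed to mean the lexicographic order of the original (i, k) iterations, count, order, and inverse are folded into one pass, and the names iter_pair, inspect_coalesce, and spmv_coalesced are hypothetical. The slide indexes c_inv as a two-column array; a small struct stands in for that here.

```c
#include <assert.h>
#include <stdlib.h>

/* One entry of c_inv: the original iterators (i, k) for a coalesced index k'. */
typedef struct { int i; int k; } iter_pair;

/* Inspector: traverse rowptr once and build c_inv, so that
 * c_inv[k'] holds the (i, k) pair with c(i, k) = k'.
 * Assumes rowptr is a valid CSR row pointer (non-decreasing, n + 1 entries).
 * Writes NNZ to *nnz_out. Returns NULL if NNZ is 0 or allocation fails.
 * The caller frees the result. */
iter_pair *inspect_coalesce(int n, const int *rowptr, int *nnz_out)
{
    int nnz = rowptr[n] - rowptr[0];          /* NNZ = count(I) */
    *nnz_out = nnz;
    if (nnz <= 0) return NULL;

    iter_pair *c_inv = malloc((size_t)nnz * sizeof *c_inv);
    if (c_inv == NULL) return NULL;

    int kp = 0;                               /* k' assigned in lexicographic order */
    for (int i = 0; i < n; i++) {
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++) {
            c_inv[kp].i = i;
            c_inv[kp].k = k;
            kp++;
        }
    }
    assert(kp == nnz);
    return c_inv;
}

/* Executor: the coalesced loop over k' in [0, NNZ). */
void spmv_coalesced(int nnz, const iter_pair *c_inv, const int *col,
                    const double *a, const double *x, double *y)
{
    for (int kp = 0; kp < nnz; kp++) {
        int i = c_inv[kp].i;                  /* c_inv(k')[0] */
        int k = c_inv[kp].k;                  /* c_inv(k')[1] */
        y[i] += a[k] * x[col[k]];
    }
}
```

The inspector cost is paid once per sparsity pattern, so it amortizes when the executor runs many times on matrices with the same rowptr. With lexicographic ordering c_inv[k'].k equals k', which is the kind of simplification of indirect accesses a compiler could exploit.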
UFs within a Framework
• Omega+
• IEGenLib
Stop by Eddie's poster to learn about his proposed IR.