  1. Loop Transformations. Sebastian Hack, Saarland University, Compiler Construction W2015

  2. Loop Transformations: Example matmul.c

  3. Optimization Goals
     - Increase locality (caches)
     - Facilitate prefetching (contiguous access patterns)
     - Vectorization (SIMD instructions, contiguity, avoid divergence)
     - Parallelization (shared and non-shared memory systems)

  4. Dependences
     - True (flow) dependence (RAW = read after write)
     - Anti dependence (WAR = write after read)
     - Output dependence (WAW = write after write)
     Anti and output dependences are called false dependences. They only arise when we consider memory cells instead of values. SSA eliminates false dependences by renaming.
     If Sj is dependent on Si, we write Si δ Sj. Sometimes we also indicate the kind of dependence.
       1: a = 1;
       2: b = a;
       3: a = a + b;
       4: c = a;
     S1 δ^f S2,  S1 δ^o S3,  S2 δ^a S3,  ...
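
     To make the renaming claim concrete, here is a sketch of the example above in SSA form (the names a1, a2, b1, c1 are illustrative, not from the slides):

       a1 = 1;        /* S1 */
       b1 = a1;       /* S2: the flow dependence S1 δ^f S2 remains */
       a2 = a1 + b1;  /* S3: renaming a to a2 removes the output
                         dependence S1 δ^o S3 and the anti
                         dependence S2 δ^a S3 */
       c1 = a2;       /* S4: flow dependence on S3 */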

  5. Dependences
     - Must be preserved for correctness
     - Impose an order on statement instances
     - Compilers represent dependences on syntactic entities (CFG nodes, AST nodes, statements, etc.)
     - Each syntactic entity then stands for all its instances
     - For scalar variables this is ok
     - For arrays (especially in loops) this is too coarse-grained

  6. Dependences in Loops
       for i = 1 to 3
         1: X[i] = Y[i] + 1
         2: X[i] = X[i] + X[i-1]
     - loop-independent flow dependence from S1 to S2
     - loop-carried flow dependence from S2 to S2
     - loop-carried anti dependence from S2 to S2

  7. Example: GEMVER Kernel
       for (i = 0; i < N; i++)
         for (j = 0; j < N; j++)
           S1: A[i,j] = A[i,j] + u1[i] * v1[j] + u2[i] * v2[j]
       for (k = 0; k < N; k++)
         for (l = 0; l < N; l++)
           S2: x[k] = x[k] + beta * A[l,k] * y[l]
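
     A compilable C version of this kernel, for reference (the function signature and the A[i][j] indexing are assumptions; the slide uses A[i,j] notation):

       #include <stddef.h>

       /* GEMVER-style kernel: a rank-two update of A followed by a
          transposed matrix-vector product accumulated into x. */
       void gemver(size_t N, double beta, double A[N][N],
                   const double u1[], const double v1[],
                   const double u2[], const double v2[],
                   double x[], const double y[])
       {
           for (size_t i = 0; i < N; i++)          /* S1 */
               for (size_t j = 0; j < N; j++)
                   A[i][j] = A[i][j] + u1[i] * v1[j] + u2[i] * v2[j];

           for (size_t k = 0; k < N; k++)          /* S2 */
               for (size_t l = 0; l < N; l++)
                   x[k] = x[k] + beta * A[l][k] * y[l];
       }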

  8. Dependences in Loops
       for i = 1 to 3
         1: X[i] = Y[i] + 1
         2: X[i] = X[i] + X[i-1]
     Fully unrolled:
       X[1] = Y[1] + 1
       X[1] = X[1] + X[0]
       X[2] = Y[2] + 1
       X[2] = X[2] + X[1]
       X[3] = Y[3] + 1
       X[3] = X[3] + X[2]
     How to determine dependences in loops?
     - Conceptually, unroll loops entirely.
     - Every instance then has one syntactic entity.
     - Construct the dependence graph.
     In practice, this is infeasible: loop bounds may not be constant, and even if they were, the graph would be too big. We need a more compact representation.

  9. Iteration Space
     The iteration space of a loop is the set of all iterations of that loop.
       for i = 1 to 3
         1: X[i] = Y[i] + 1
         2: X[i] = X[i] + X[i-1]
     In the following, we'll be interested in loop nests whose iteration space can be described by the integer points inside a polyhedron. Each iteration of a loop nest of depth n is then given by an n-dimensional iteration vector.
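
     As a worked illustration (not from the slides), a depth-2 nest with a triangular iteration space:

       for i = 1 to 4
         for j = 1 to i
           S

     Iteration space: { (i, j) | 1 ≤ i ≤ 4, 1 ≤ j ≤ i }, i.e. the 2-dimensional iteration vectors (1,1), (2,1), (2,2), (3,1), ..., (4,4). The bounds are affine, so the space is exactly the set of integer points in a polyhedron.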

  10. Dependence Distance Vectors
        for i = 1 to 3
          for j = 1 to 3
            X[i,j] = X[i,j-1] + X[i-1,j-1]
      Dependence vectors: (0,1), (1,1)
      - One way to represent dependences is distance vectors.
      - If a statement instance with iteration vector t is dependent on an instance with iteration vector s, the distance vector for these two instances is d = t − s.
      - Uniform dependences are described by distance vectors that do not contain index variables.

  11. Direction Vectors
      - Used to approximate distance vectors
      - Or, if dependences cannot be represented by distance vectors (non-uniform dependences)
      - A vector (ρ1, ..., ρn) of "directions" ρi ∈ {<, ≤, =, ≥, >, ∗}
      - Consider two statements s, t and all distance vectors of their instances. A direction vector ρ is legal for s and t if for all dependent instances with iteration vectors s and t it holds that s[k] ρ[k] t[k] for all 1 ≤ k ≤ n.
      - Examples:
        – The distance vector (0,1) corresponds to (=, <)
        – The distance vector (1,1) corresponds to (<, <)
        – The distance vectors { (0,i) | −n ≤ i ≤ n } correspond to (=, ∗)

  12. Loop-Carried Dependences
        for i = 1 to N
          for j = 1 to M
            A[i,j]     = A[i,j]
            B[i,j+1]   = B[i,j]
            C[i+1,j+1] = B[i,j+1]
      - Dependence on A is not loop-carried
      - Dependence on B is carried by the j loop
      - Dependence on C is carried by the i loop
      Let k be the first non-= entry in the direction vector of a dependence: the dependence is carried by the k-th nested loop. The dependence level is k (∞ if the direction vector is all =).

  13. Loop Unswitching
      Before:
        for i = 1 to N
          for j = 1 to M
            if X[i] > 0
              S
            else
              T
      After:
        for i = 1 to N
          if X[i] > 0
            for j = 1 to M
              S
          else
            for j = 1 to M
              T
      - Hoist the conditional as far outside as possible
      - Enables other transformations
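
      A concrete C instance of this transformation (the scaling example and all names are illustrative, not from the slides):

        /* Before: the test on scale[i] is invariant in the inner loop
           but is evaluated in every (i, j) iteration. */
        void unswitch_before(int n, int m, double a[n][m], const double scale[n]) {
            for (int i = 0; i < n; i++)
                for (int j = 0; j < m; j++)
                    if (scale[i] > 0) a[i][j] *= scale[i];
                    else              a[i][j] = 0.0;
        }

        /* After unswitching: the test runs once per i, and each inner
           loop is branch-free (and easier to vectorize). */
        void unswitch_after(int n, int m, double a[n][m], const double scale[n]) {
            for (int i = 0; i < n; i++)
                if (scale[i] > 0)
                    for (int j = 0; j < m; j++) a[i][j] *= scale[i];
                else
                    for (int j = 0; j < m; j++) a[i][j] = 0.0;
        }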

  14. Loop Peeling
      Before:
        for i = 1 to N
          S
      After (first iteration peeled):
        if N ≥ 1
          S          (instance i = 1)
        for i = 2 to N
          S
      - Align the trip count to a certain number (multiple of N)
      - The peeled iteration is a place where loop-invariant code can be executed non-redundantly
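
      A C sketch of the second bullet (the function g and all names are hypothetical): peeling the first iteration gives a spot where an invariant computation runs exactly once, and only if the loop runs at all.

        double g(double x);   /* some loop-invariant computation */

        /* Before */
        void peel_before(int n, double x, const double in[], double out[]) {
            for (int i = 0; i < n; i++)
                out[i] = in[i] * g(x);   /* g(x) conceptually recomputed each time */
        }

        /* After peeling iteration 0: g(x) is computed non-redundantly. */
        void peel_after(int n, double x, const double in[], double out[]) {
            if (n >= 1) {
                double gx = g(x);
                out[0] = in[0] * gx;
                for (int i = 1; i < n; i++)
                    out[i] = in[i] * gx;
            }
        }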

  15. Index Set Splitting
      Before:
        assert 1 ≤ M < N
        for i = 1 to N
          S
      After:
        for i = 1 to M
          S
        for i = M + 1 to N
          S
      - Create specialized variants for different cases, e.g. vectorization (aligned and contiguous accesses)
      - Can be used to remove conditionals from loops
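
      For instance, splitting at a known boundary removes a branch from the loop body (a sketch; the example and names are assumptions):

        /* Before: a conditional on i in every iteration (assumes 0 <= m <= n). */
        void split_before(int n, int m, const double x[], double y[]) {
            for (int i = 0; i < n; i++)
                y[i] = (i < m) ? x[i] : 0.0;
        }

        /* After splitting the index set at m: both loops are branch-free. */
        void split_after(int n, int m, const double x[], double y[]) {
            for (int i = 0; i < m; i++) y[i] = x[i];
            for (int i = m; i < n; i++) y[i] = 0.0;
        }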

  16. Loop Unrolling
      Before:
        for i = 1 to N
          S
      After (unrolled by factor U, with a remainder loop):
        for (i = 0; i + U <= N; i += U)
          S(i+0)
          S(i+1)
          ...
          S(i+U-1)
        for (; i < N; i++)
          S(i)
      - Creates more instruction-level parallelism inside the loop
      - Less speculation on out-of-order processors, less branching
      - Increases pressure on the instruction / trace cache (code bloat)
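
      A concrete C instance, unrolling a reduction by four (the summation example is an assumption, not from the slides):

        /* Sum of an array, unrolled by 4 with independent partial sums
           to expose instruction-level parallelism. Note: reassociating
           the floating-point additions changes rounding slightly. */
        double sum4(const double *x, int n) {
            double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
            int i;
            for (i = 0; i + 4 <= n; i += 4) {
                s0 += x[i + 0];
                s1 += x[i + 1];
                s2 += x[i + 2];
                s3 += x[i + 3];
            }
            double s = (s0 + s1) + (s2 + s3);
            for (; i < n; i++)   /* remainder loop for the last 0..3 elements */
                s += x[i];
            return s;
        }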

  17. Loop Fusion
      Before:
        for i = 1 to N
          S
        for i = 1 to N
          T
      After:
        for i = 1 to N
          S
          T
      - Saves loop control overhead
      - Increases locality if both loops access the same data
      - Increases instruction-level parallelism
      - Important after inlining library functions
      - Not always legal: dependences must be preserved
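
      A C sketch (the element-wise example is an assumption). Fusion is legal here because T(i) reads only the b[i] written by S(i) in the same iteration:

        /* Before: b is produced completely, then consumed in a second pass. */
        void fuse_before(int n, const double a[], double b[], double c[]) {
            for (int i = 0; i < n; i++) b[i] = a[i] + 1.0;    /* S */
            for (int i = 0; i < n; i++) c[i] = b[i] * b[i];   /* T */
        }

        /* After fusion: b[i] is still in a register / hot cache line
           when T uses it. */
        void fuse_after(int n, const double a[], double b[], double c[]) {
            for (int i = 0; i < n; i++) {
                b[i] = a[i] + 1.0;    /* S */
                c[i] = b[i] * b[i];   /* T */
            }
        }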

  18. Loop Interchange
      Before:
        for i = 1 to N
          for j = 1 to M
            S
      After:
        for j = 1 to M
          for i = 1 to N
            S
      - Expose more locality
      - Expose parallelism
      - Legality: preserve data dependences; the direction vector (<, >) is forbidden
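
      A classic C instance of the locality bullet (an illustration, not from the slides): with row-major arrays, interchanging makes the innermost accesses contiguous.

        /* Before: j outermost means stride-n accesses in the inner loop. */
        void interchange_before(int n, double a[n][n], const double b[n][n]) {
            for (int j = 0; j < n; j++)
                for (int i = 0; i < n; i++)
                    a[i][j] += b[i][j];
        }

        /* After interchange: unit-stride inner accesses, cache- and
           prefetcher-friendly in row-major C. Legal here because the
           body has no loop-carried dependence. */
        void interchange_after(int n, double a[n][n], const double b[n][n]) {
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    a[i][j] += b[i][j];
        }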

  19. Parallelization / Vectorization
      Before:
        for i = 1 to N
          S
      After:
        parallel for i = 1 to N
          S
      - The loop must not carry a dependence
      - Vectorization nowadays uses SIMD code → strip mining
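
      In C, such a dependence-free loop can be parallelized with OpenMP (a minimal sketch, assuming no loop-carried dependences in the body; compile with -fopenmp):

        /* No iteration reads data written by another iteration, so the
           loop carries no dependence and may run in parallel. */
        void scale(int n, const double *x, double *y) {
            #pragma omp parallel for
            for (int i = 0; i < n; i++)
                y[i] = 2.0 * x[i];
        }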

  20. Strip Mining
      Before:
        for i = 1 to N
          S
      After (strip width U, assuming U divides N):
        for (i = 0; i < N; i += U)
          for (j = 0; j < U; j++)
            S(i + j)
      - strip-mine + interchange = tiling
      - Vectorization is a kind of strip mining
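
      To illustrate the first bullet: strip-mining both loops of a 2D nest and interchanging so the strip loops come first yields tiling. A C sketch (the transpose-add body and names are assumptions; assumes the tile width B divides n):

        enum { B = 64 };   /* tile (strip) width */

        /* a[i][j] += b[j][i], tiled: strip-mine i and j by B, then
           interchange so the tile loops (ii, jj) are outermost. Each
           B x B tile of a and b is reused while it is hot in cache. */
        void transpose_add_tiled(int n, double a[n][n], const double b[n][n]) {
            for (int ii = 0; ii < n; ii += B)
                for (int jj = 0; jj < n; jj += B)
                    for (int i = ii; i < ii + B; i++)
                        for (int j = jj; j < jj + B; j++)
                            a[i][j] += b[j][i];
        }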
