CS553 Lecture Data Dependence Analysis

Java HotSpot VM

Optimizations in Java HotSpot Server VM (a JIT)
– uses SSA: dead code elimination, LICM, CSE, CP
– range check elimination
– loop unrolling
– instruction scheduling for the UltraSPARC III
– OOP optimizations for Java reflection API
– hot spot detection
– virtual method inlining, with dynamic deoptimization to undo it
Other features
– generational copying collection, mark and compact or incremental for old objects
– fast thread synchronization using "a breakthrough"

Compiling for Parallelism & Locality

Last time
– Data dependences and loops
Today
– Finish data dependence analysis for loops

Dependence Testing in General

General code
    do i_1 = l_1,h_1
      ...
      do i_n = l_n,h_n
        A(f(i_1,...,i_n)) = ... A(g(i_1,...,i_n))
      enddo
      ...
    enddo

There exists a dependence between iterations I = (i_1,...,i_n) and J = (j_1,...,j_n) when
– f(I) = g(J)
– (l_1,...,l_n) ≤ I, J ≤ (h_1,...,h_n)

Algorithms for Solving the Dependence Problem

Heuristics
– GCD test (Banerjee76, Towle76): determines whether integer solution is possible, no bounds checking
– Banerjee test (Banerjee79): checks real bounds
– Independent-variables test (pg. 820): useful when inequalities are not coupled
– I-Test (Kong et al. 90): integer solution in real bounds
– Lambda test (Li et al. 90): all dimensions simultaneously
– Delta test (Goff et al. 91): pattern matches for efficiency
– Power test (Wolfe et al. 92): extended GCD and Fourier-Motzkin combination
Use some form of Fourier-Motzkin elimination for integers, exponential worst-case
– Parametric Integer Programming (Feautrier91)
– Omega test (Pugh92)
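The general formulation (f(I) = g(J) with both iterations inside the loop bounds) can be checked by exhaustive search on small iteration spaces. The sketch below is only an illustration of the definition, not one of the listed algorithms; the function name `brute_force_dependences` and the sample subscripts are assumptions made for this example.

```python
from itertools import product

def brute_force_dependences(f, g, bounds):
    """Return every (I, J) pair with f(I) == g(J).

    bounds is a list of inclusive (low, high) pairs, one per loop level.
    Exhaustive search: only practical for tiny iteration spaces.
    """
    ranges = [range(lo, hi + 1) for lo, hi in bounds]
    points = list(product(*ranges))
    return [(I, J) for I in points for J in points if f(*I) == g(*J)]

# Illustrative loop: do i = 1,5 ; A(3*i+2) = A(2*i+1)+1 ; enddo
pairs = brute_force_dependences(lambda i: 3 * i + 2,   # subscript written
                                lambda i: 2 * i + 1,   # subscript read
                                [(1, 5)])
print(pairs)  # [((1,), (2,)), ((3,), (5,))]
```

The practical tests avoid this enumeration because its cost grows with the square of the number of iterations.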
Dependence Testing

Consider the following code…
    do i = 1,5
      A(3*i+2) = A(2*i+1)+1
    enddo

Question
– How do we determine whether one array reference depends on another across iterations of an iteration space?

Dependence Testing: Simple Case

Sample code
    do i = l,h
      A(a*i+c_1) = ... A(a*i+c_2)
    enddo

Dependence?
– a*i_1 + c_1 = a*i_2 + c_2, or
– a*i_1 - a*i_2 = c_2 - c_1
– Solution exists if a divides c_2 - c_1

Example

Code
    do i = l,h
      A(2*i+2) = A(2*i-2)+1
    enddo

Dependence?
    2*i_1 - 2*i_2 = -2 - 2 = -4
    (yes, 2 divides -4)

Kind of dependence?
– Anti? i_2 + d = i_1 ⇒ d = -2
– Flow? i_1 + d = i_2 ⇒ d = 2

GCD Test

Idea
– Generalize test to linear functions of iterators

Code
    do i = l_i,h_i
      do j = l_j,h_j
        A(a_1*i + a_2*j + a_0) = ... A(b_1*i + b_2*j + b_0) ...
      enddo
    enddo

Again
– a_1*i_1 - b_1*i_2 + a_2*j_1 - b_2*j_2 = b_0 - a_0
– Solution exists if gcd(a_1,a_2,b_1,b_2) divides b_0 - a_0
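The GCD test above reduces to a single divisibility check on the coefficients of the linear equation. A minimal sketch follows, assuming the caller has already moved all iterator terms to the left-hand side; the name `gcd_test` and its argument layout are hypothetical.

```python
from functools import reduce
from math import gcd

def gcd_test(coeffs, rhs):
    """Return True when an integer solution may exist.

    coeffs: iterator coefficients on the left-hand side,
            e.g. [a_1, -b_1, a_2, -b_2]
    rhs:    the constant on the right-hand side, b_0 - a_0
    Loop bounds are ignored, so True means "dependence not ruled out".
    """
    g = reduce(gcd, (abs(c) for c in coeffs), 0)
    if g == 0:                 # no iterator terms at all
        return rhs == 0
    return rhs % g == 0

# A(2*i+2) = A(2*i-2)+1  gives  2*i_1 - 2*i_2 = -4
print(gcd_test([2, -2], -4))   # True
# Hypothetical A(2*i) = A(2*i+1)  gives  2*i_1 - 2*i_2 = 1
print(gcd_test([2, -2], 1))    # False
```

A result of False proves independence; a result of True is inconclusive because the solution may lie outside the loop bounds.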
Example

Code
    do i = l_i,h_i
      do j = l_j,h_j
        A(4*i + 2*j + 1) = ... A(6*i + 2*j + 4) ...
      enddo
    enddo

gcd(4,-6,2,-2) = 2
Does 2 divide 4-1? No: 4 - 1 = 3 is odd, so there is no dependence.

Banerjee Test

    for (i=L; i<=U; i++) {
      x[a_0 + a_1*i] = ...
      ... = x[b_0 + b_1*i]
    }

Does a_0 + a_1*i = b_0 + b_1*i' for some real i and i'?
If so then (a_1*i - b_1*i') = (b_0 - a_0)
Determine upper and lower bounds on (a_1*i - b_1*i'); a dependence is possible only if b_0 - a_0 lies between them.

    for (i=1; i<=5; i++) {
      x[i+5] = x[i];
    }

upper bound = a_1*max(i) - b_1*min(i') = 4
lower bound = a_1*min(i) - b_1*max(i') = -4
b_0 - a_0 = 0 - 5 = -5, which lies outside [-4, 4], so there is no dependence.

Distance Vectors: Legality

Definition
– A dependence vector, v, is lexicographically nonnegative when the left-most nonzero entry in v is positive or all elements of v are zero
    Yes: (0,0,0), (0,1), (0,2,-2)
    No: (-1), (0,-2), (0,-1,1)
– A dependence vector is legal when it is lexicographically nonnegative (assuming that indices increase as we iterate)

Why are lexicographically negative distance vectors illegal?
What are legal direction vectors?

Direction Vector

Definition
– A direction vector serves the same purpose as a distance vector when less precision is required or available
– Element i of a direction vector is <, >, or = based on whether the source of the dependence precedes, follows or is in the same iteration as the target in loop i

Example
    do i = 1,6
      do j = 1,5
        A(i,j) = A(i-1,j-1)+1
      enddo
    enddo

Direction vector: (<,<)
Distance vector: (1,1)
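The Banerjee bounds check for the single-loop case can be sketched as below. This is an illustration under stated assumptions: the function name `banerjee_test` is hypothetical, and taking the min and max over both endpoints is a small generalization so that negative coefficients are handled too (the slide formulas assume nonnegative a_1 and b_1).

```python
def banerjee_test(a0, a1, b0, b1, lo, hi):
    """Return True when a dependence cannot be ruled out.

    Write: x[a0 + a1*i], read: x[b0 + b1*i'], with lo <= i, i' <= hi.
    Checks whether b0 - a0 lies within the real-valued range of
    a1*i - b1*i' over the loop bounds.
    """
    a_ends = (a1 * lo, a1 * hi)    # extremes of a1*i on [lo, hi]
    b_ends = (b1 * lo, b1 * hi)    # extremes of b1*i' on [lo, hi]
    lower = min(a_ends) - max(b_ends)
    upper = max(a_ends) - min(b_ends)
    return lower <= b0 - a0 <= upper

# for (i=1; i<=5; i++) x[i+5] = x[i];   bounds are [-4, 4], b0 - a0 = -5
print(banerjee_test(5, 1, 0, 1, 1, 5))  # False
# Hypothetical variant x[i+2] = x[i];   b0 - a0 = -2 is within [-4, 4]
print(banerjee_test(2, 1, 0, 1, 1, 5))  # True
```

Because the bounds are computed over the reals, True only means that the test could not prove independence.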
Loop-Carried Dependences

Definition
– A dependence D = (d_1,...,d_n) is carried at loop level i if d_i is the first nonzero element of D

Example
    do i = 1,6
      do j = 1,6
        A(i,j) = B(i-1,j)+1
        B(i,j) = A(i,j-1)*2
      enddo
    enddo

Distance vectors:
    (0,1) for accesses to A
    (1,0) for accesses to B

Loop-carried dependences
– The j loop carries dependence due to A
– The i loop carries dependence due to B

Parallelization

Idea
– The iterations of a loop may be executed in parallel if the loop carries no dependences

Example (different from last slide)
    do i = 1,6
      do j = 1,5
        A(i,j) = B(i-1,j-1)+1
        B(i,j) = A(i,j-1)*2
      enddo
    enddo

Parallelize i loop?

Distance Vectors:
    (0,1) for A (flow)
    (1,1) for B (flow)

Scalar Expansion: Motivation

Problem
– Loop-carried dependences inhibit parallelism
– Scalar references result in loop-carried dependences

Example
    do i = 1,6
      t = A(i) + B(i)
      C(i) = t + 1/t
    enddo

Can this loop be parallelized? No.
What kind of dependences are these? Anti dependences.

Convention for these slides: Arrays start with upper case letters, scalars do not

Scalar Expansion

Idea
– Eliminate false dependences by introducing extra storage

Example
    do i = 1,6
      T(i) = A(i) + B(i)
      C(i) = T(i) + 1/T(i)
    enddo

Can this loop be parallelized?
Disadvantages?
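The distance-vector definitions (carried level, lexicographic nonnegativity, direction vectors) translate directly into small helper functions. The sketch below is illustrative only; the helper names are hypothetical and loop levels are numbered from 1, outermost first.

```python
def carried_level(d):
    """Return the 1-based loop level that carries distance vector d,
    or None when every element is zero (loop-independent)."""
    for level, x in enumerate(d, start=1):
        if x != 0:
            return level
    return None

def is_lex_nonnegative(d):
    """True when the left-most nonzero element is positive or d is all zeros."""
    for x in d:
        if x != 0:
            return x > 0
    return True

def direction_vector(d):
    """Map each distance to '<' (positive), '>' (negative) or '=' (zero)."""
    return tuple('<' if x > 0 else '>' if x < 0 else '=' for x in d)

print(carried_level((0, 1)))          # 2: carried by the j loop
print(carried_level((1, 0)))          # 1: carried by the i loop
print(is_lex_nonnegative((0, 2, -2))) # True
print(direction_vector((1, 1)))       # ('<', '<')
```

A loop at level k carries no dependence exactly when `carried_level` never returns k for any dependence in the nest.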
Scalar Expansion Details

Restrictions
– The loop must be a countable loop, i.e. the loop trip count must be independent of the body of the loop
– The expanded scalar must have no upward exposed uses in the loop
    do i = 1,6
      print(t)
      t = A(i) + B(i)
      C(i) = t + 1/t
    enddo
– Nested loops may require much more storage
– When the scalar is live after the loop, we must move the correct array value into the scalar

Example 2: Parallelization (reprise)

Why can't this loop be parallelized?
    do i = 1,100
      A(i) = A(i-1)+1
    enddo
Distance Vector: (1)

Why can this loop be parallelized?
    do i = 1,100
      A(i) = A(i)+1
    enddo
Distance Vector: (0)

Example 1: Loop Permutation (reprise)

Sample code
    do j = 1,6
      do i = 1,5
        A(j,i) = A(j,i)+1
      enddo
    enddo

after permutation:
    do i = 1,5
      do j = 1,6
        A(j,i) = A(j,i)+1
      enddo
    enddo

Why is this legal?
– No loop-carried dependences, so we can arbitrarily change order of iteration execution

Concepts

Improve performance by ...
– improving data locality
– parallelizing the computation

Data Dependence Testing
– general formulation of the problem
– GCD test and Banerjee test

Data Dependences
– iteration space
– distance vectors and direction vectors
– loop carried
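The scalar expansion transformation and its live-out restriction can be sketched in Python as follows. This is an illustration under stated assumptions, not a compiler implementation: the function names are hypothetical, lists stand in for the Fortran arrays, and the final copy of `T[-1]` into `t` models the case where the scalar is live after the loop.

```python
def original_loop(A, B):
    """Scalar version: every iteration reuses t, so iterations are coupled."""
    C = [0.0] * len(A)
    t = None
    for i in range(len(A)):
        t = A[i] + B[i]
        C[i] = t + 1 / t
    return C, t

def expanded_loop(A, B):
    """Expanded version: t becomes T(i), so iterations are independent."""
    T = [a + b for a, b in zip(A, B)]   # extra storage, one slot per iteration
    C = [x + 1 / x for x in T]          # each element computed independently
    t = T[-1] if T else None            # restore t because it is live after the loop
    return C, t

A = [1.0, 2.0, 3.0]
B = [1.0, 1.0, 1.0]
print(original_loop(A, B) == expanded_loop(A, B))  # True
```

The extra list `T` is the storage cost of the transformation; in a loop nest it would need one slot per iteration of every enclosing loop being parallelized.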
Next Time

Lecture
– Loop transformations for parallelism and locality

Suggested Exercises
– 11.3.2, 11.3.3, 11.6.2, 11.6.5, examples in slides