Scalable Array SSA and Array Dataflow Analysis
Silvius Rus, Guobin He, Lawrence Rauchwerger

SSA Program Representation: Scalars

Example 1, original code:
    v = 100
    If (x>0) Then
      v = 100
    EndIf
    Print v

Example 1, SSA form:
    v1 = 100
    If (x>0) Then
      v2 = 100
    EndIf
    v3 = φ(v2, v1)
    Print v3

Example 2, original code:
    v = 100
    If (x>0) Then
      v = 50
    EndIf
    If (x>0) Then
      Print v
    EndIf

Example 2, Gated SSA form:
    v1 = 100
    If (x>0) Then
      v2 = 50
    EndIf
    v3 = γ(x>0, v2, v1)
    If (x>0) Then
      Print v3
    EndIf

Constant Propagation using SSA

SSA form, before CP:
    v1 = 100
    If (x>0) Then
      v2 = 100
    EndIf
    v3 = φ(v2, v1)
    Print v3

SSA form, after CP:
    v1 = 100
    If (x>0) Then
      v2 = 100
    EndIf
    v3 = φ(100, 100)
    Print 100

Gated SSA form, before CP:
    v1 = 100
    If (x>0) Then
      v2 = 50
    EndIf
    v3 = γ(x>0, v2, v1)
    If (x>0) Then
      Print v3
    EndIf

Gated SSA form, after CP:
    v1 = 100
    If (x>0) Then
      v2 = 50
    EndIf
    v3 = γ(x>0, 50, 100)
    If (x>0) Then
      Print 50
    EndIf
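
The two folding rules on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `fold_phi` and `fold_gamma`, and the dictionary of facts known at the use site, are hypothetical and do not come from the talk.

```python
def fold_phi(args):
    """phi(a1, a2, ...): a constant only if every incoming value is the same constant."""
    first = args[0]
    return first if all(a == first for a in args) else None


def fold_gamma(gate, if_true, if_false, known_facts):
    """gamma(gate, if_true, if_false): pick a branch when the gate's truth is known at the use.

    known_facts maps a gate expression (a string here) to True/False when the
    use site is itself guarded by that expression.
    """
    if gate in known_facts:
        return if_true if known_facts[gate] else if_false
    return if_true if if_true == if_false else None


# Example 1: v3 = phi(100, 100), so Print v3 becomes Print 100.
print(fold_phi([100, 100]))
# Example 2: v3 = gamma(x>0, 50, 100) used under "If (x>0)", so Print v3 becomes Print 50.
print(fold_gamma("x>0", 50, 100, {"x>0": True}))
```

A plain φ cannot fold the second example because 50 and 100 differ; the gate carried by γ is what lets the guarded use select the constant 50.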

Array SSA: Motivation

    Subroutine set (A, k, v)
      A(k) = v
    End
    ...
    Call set(A, 1, 0)
    If (x>0) Then
      Call set(A, 2, 1)
    EndIf
    Do j = 3, 10
      Call set(A, j, 3)
      Call set(A, j+8, 4)
    EndDo
    Print A(1) + A(5)
    If (x>0) Then
      Print A(2)
    EndIf

The example involves call sites, loops, and control flow.

Simple solution: treat arrays as scalars.
    A1(1) = 100
    A2(2) = 200
    Print A2(1)
Too conservative! Must consider subscripts!

Previous Work

Analytical array subregion-based dataflow frameworks
– Scalable and expressive
– No standard form, hence harder to use than SSA; sometimes biased towards a particular analysis technique
– Triolet CC ’86, Callahan SC ’87, Gross SPE ’90, Burke TOPLAS ’90, Feautrier IJPP ’91, Maydan SPPL ’93, Tu LCPC ’93, Pugh TR ’94, Gu SC ’95, Hall SC ’95, Creusillet LCPC ’96, Haghighat TOPLAS ’96, Hoeflinger ’98, Moon ICS ’98, Wonnacott LCPC ’00

Element-wise data flow information as Array SSA by enumeration
– More accurate than treating arrays as scalars, easy to use
– Complexity proportional to the dimension of the array, hence not scalable
– At compile-time, only applicable to constant subscript expressions
– Knobe SPPL ’98, Sarkar SAS ’98

Array SSA Desiderata

Analytical and explicit data flow information at array element level

Array Data Flow: Partial Kills

Scalars:
    x1 = 100
    x2 = 200
    Print x2
The use of x2 may be replaced with the value defined by x2 because it kills all its reaching definitions (x1).

Array SSA:
    A1(1) = 100        DEF(A1) = {1}
    A2(2) = 200        KILL(A2) = {2}
    Print A2(1)        USE(A2) = {1}
DEF(A1) and KILL(A2) are disjoint. The use of A2 may not be replaced with the value defined by A2 because it does not kill A1(1).

But how do we get from A2 to A1?
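
The contrast between a full kill and a partial kill can be made concrete with plain Python sets. This is a minimal sketch under simplifying assumptions: regions are explicit index sets, and the helper name `reaching_after_write` is hypothetical.

```python
def reaching_after_write(prev_defs, new_name, new_region):
    """Update the map {definition name -> indices still reaching} after one write.

    The new write kills only the indices it covers; everything else that an
    earlier definition wrote keeps reaching later uses.
    """
    updated = {}
    for name, region in prev_defs.items():
        survivors = region - new_region
        if survivors:
            updated[name] = survivors
    updated[new_name] = set(new_region)
    return updated


# Scalar case: model the scalar as a one-element region; x2 kills x1 completely.
print(reaching_after_write({"x1": {0}}, "x2", {0}))
# Array case: A2 writes element 2 only, so A1 still reaches the use of element 1.
print(reaching_after_write({"A1": {1}}, "A2", {2}))
```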

Use-Def Chains: δ Nodes

Original code:
    A(1) = …
    A(2) = …
    A(1) = …
    Print A(1)
    Print A(2)
    Print A(10)

Array SSA:
    A1(1) = …
    [A2, {1}] = δ(A0, [A1, {1}])
    A3(2) = …
    [A4, [1:2]] = δ(A0, [A2, {1}], [A3, {2}])
    A5(1) = …
    [A6, [1:2]] = δ(A0, [A4, {2}], [A5, {1}])
    Print A6(1)
    Print A6(2)
    Print A6(10)

Use-def chains (diagram with nodes A5, A4, A3, A0): following the δ regions, A6(1) is defined by A5, A6(2) is defined by A3 (reached through A4), and A6(10) matches no δ argument, so it falls through to A0.

Array SSA use-def chains: just like scalar SSA + compare access regions.

δ Nodes: Formal Definition

    …
    A_before(…) = …
    …
    A_current(…) = …

$$[A_{total}, @A_t] = \delta(A_{undef}, [A_{before}, @A_t^b], [A_{current}, @A_t^c])$$
$$@A_t^b \cap @A_t^c = \emptyset, \qquad @A_t = @A_t^b \cup @A_t^c$$

• $@A_t^b$ is the array region defined before $A_{current}$ and reaching $A_{total}$
• $@A_t^b = @A_{before} - @A_{current}$
• $@A_t^c$ is the array region defined by $A_{current}$ and reaching $A_{total}$
• $@A_t^c = @A_{current}$
• Need analytical representation for @ sets!
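
The δ definition translates directly into set arithmetic. The sketch below uses explicit Python sets in place of the analytical @ regions, which is an assumption made purely for illustration; the function name `delta_regions` is hypothetical.

```python
def delta_regions(before_region, current_region):
    """Return (@A_t^b, @A_t^c, @A_t) for one delta node.

    before_region  : indices defined before the current statement (@A_before)
    current_region : indices defined by the current statement (@A_current)
    """
    reach_before = before_region - current_region    # @A_t^b = @A_before - @A_current
    reach_current = set(current_region)              # @A_t^c = @A_current
    assert not (reach_before & reach_current)        # the two regions are disjoint
    return reach_before, reach_current, reach_before | reach_current


# A(1) = ... followed by A(2) = ... : both elements reach, each from its own definition.
print(delta_regions({1}, {2}))
# {1, 2} defined before, then A(1) = ... again: only element 2 still comes from "before".
print(delta_regions({1, 2}, {1}))
```

The second call reproduces the arguments [A4, {2}] and [A5, {1}] of the δ node that defines A6 in the use-def chain example.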

Expressing @ Sets: Array Region Representation

    Subroutine set (A, k, v)
      A(k) = v
    End
    ...
    Call set(A, 1, 0)
    If (x>0) Then
      Call set(A, 2, 1)
    EndIf
    Do j = 3, 10
      Call set(A, j, 3)
      Call set(A, j+8, 4)
    EndDo
    Print A(1) + A(5)
    If (x>0) Then
      Print A(2)
    EndIf

Region representation (diagram): the gated call site set(k=2) contributes element 2 under the gate x>0; the loop j=3,10 takes the union (∪) of the call sites set(k=j) and set(k=j+8), which aggregates to [3:18].

Run-time Linear Memory Access Descriptor (RT_LMAD)

T = { LMAD, ∩, ∪, −, (, ), #, x, Θ, Gate, Recurrence, Call Site }
N = { RT_LMAD }
S = RT_LMAD
P = { RT_LMAD → LMAD | ( RT_LMAD )
      RT_LMAD → RT_LMAD ∩ RT_LMAD
      RT_LMAD → RT_LMAD ∪ RT_LMAD
      RT_LMAD → RT_LMAD − RT_LMAD
      RT_LMAD → RT_LMAD # Gate
      RT_LMAD → RT_LMAD x Recurrence
      RT_LMAD → RT_LMAD Θ Call Site }

LMAD = Start + [Stride1:Span1, Stride2:Span2, ...]

1. Closed form for references in If blocks, Do loops, sequence of blocks
2. Closed with respect to set operations: difference, union
3. Control-flow sensitive and interprocedural
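
To make the LMAD notation concrete, the sketch below enumerates the addresses a single LMAD describes. It assumes that each span is the total distance covered along its dimension (so a stride of 1 with a span of 7 yields 8 addresses); the slide does not spell this out, and the function name `lmad_points` is hypothetical.

```python
from itertools import product


def lmad_points(start, dims):
    """Enumerate Start + [Stride1:Span1, Stride2:Span2, ...] as a set of offsets.

    dims is a list of (stride, span) pairs with stride > 0. Assumption: span is
    the distance between the first and last offset along that dimension.
    """
    axes = [range(0, span + 1, stride) for stride, span in dims]
    return {start + sum(offsets) for offsets in product(*axes)}


# Loop "Do j = 3, 10" writing A(j) and A(j+8):
writes_j = lmad_points(3, [(1, 7)])       # elements 3..10
writes_j8 = lmad_points(11, [(1, 7)])     # elements 11..18
print(sorted(writes_j | writes_j8))       # the aggregated region [3:18]
```

Enumeration is used here only to show what the descriptor denotes; the point of the representation on the slide is that regions stay in closed form and are not enumerated.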

δ Nodes for a Sequence of Blocks

    Subroutine set (A, k, v)
      A(k) = v
    End

    Call set(A, 1, 0)
    If (x>0) Then
      Call set(A, 2, 1)
    EndIf
    Do j = 3, 10
      Call set(A, j, 3)
      Call set(A, j+8, 4)
    EndDo

δ Nodes for a Sequence of Blocks

    Subroutine set (A, k, v)
      A(k) = v
    End

    Call set(A1, 1, 0)
    If (x>0) Then
      Call set(A3, 2, 1)
    EndIf
    Do j = 3, 10
      Call set(A5, j, 3)
      Call set(A5, j+8, 4)
    EndDo

δ Nodes for a Sequence of Blocks

    Subroutine set (A, k, v)
      A(k) = v
    End

    Call set(A1, 1, 0)
    [A2, {1}] = δ(A0, [A1, {1}])
    If (x>0) Then
      Call set(A3, 2, 1)
    EndIf
    [A4, {1} ∪ ((x>0)#{2})] = δ(A0, [A2, {1}], [A3, (x>0)#{2}])
    Do j = 3, 10
      Call set(A5, j, 3)
      Call set(A5, j+8, 4)
    EndDo
    [A6, {1} ∪ ((x>0)#{2}) ∪ [3:18]] = δ(A0, [A4, {1} ∪ ((x>0)#{2})], [A5, [3:18]])

Use-def chains:
    Print A6(100)  →  A0
    Print A6(3)    →  A5
    Print A6(11)   →  A5
    Print A6(1)    →  A2
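
Once the value of x is known, the gated region (x>0)#{2} can be evaluated and the use-def lookups on this slide reduce to membership tests. The sketch below hard-codes the regions of the final δ chain as explicit sets; the helper names `gate` and `reaching_definition` are hypothetical.

```python
def gate(condition, region):
    """Model `condition # region`: the region exists only when the gate holds."""
    return set(region) if condition else set()


def reaching_definition(index, x):
    """Return the SSA name whose region contains `index`, or A0 if none does."""
    pieces = [
        ("A5", set(range(3, 19))),   # [3:18], written in the loop
        ("A3", gate(x > 0, {2})),    # (x>0) # {2}, written under the If
        ("A2", {1}),                 # {1}, written by the first call
    ]
    for name, region in pieces:
        if index in region:
            return name
    return "A0"                      # not defined in this sequence of blocks


for index in (100, 3, 11, 1):
    print(index, reaching_definition(index, x=1))
```

Element 2 resolves to A3 only when x > 0; otherwise it also falls through to A0.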

Definitions in Loops: µ nodes

    Do j = 3, 10
      Call set(A, j, 3)
      Call set(A, j+8, 4)
    EndDo

Which definitions reach these uses?
    ? Print A(j-2)
    ? Print A(3)
    ? Print A(11)

Definitions in Loops: µ nodes

    Do j = 3, 10
      Call set(A1, j, 3)
      [A2, {j}] = δ(A5, [A1, {j}])
      Call set(A3, j+8, 4)
      [A4, {j} ∪ {j+8}] = δ(A5, [A2, {j}], [A3, {j+8}])
    EndDo

Definitions in Loops: µ nodes

    Do j = 3, 10
      [A5, [3:j-1] ∪ [11:j+7]] = µ(A0, [A1, [3:j-1]], [A3, [11:j+7]])
      Call set(A1, j, 3)
      [A2, {j}] = δ(A5, [A1, {j}])
      Call set(A3, j+8, 4)
      [A4, {j} ∪ {j+8}] = δ(A5, [A2, {j}], [A3, {j+8}])
    EndDo
    [A6, [3:18]] = δ(A0, [A5, [3:18]])

Which definitions reach these uses?
    ? Print A4(j-2)
    ? Print A6(3)
    ? Print A6(11)

Definitions in Loops: µ nodes

    Do j = 3, 10
      [A5, ?] = µ(A0, [A1, ?], [A3, ?])
      Call set(A1, j+2, 3)
      Call set(A3, j+8, 4)
    EndDo

$$[A_n, @A_n] = \mu(A_0, [A_1, @A_n^1], [A_2, @A_n^2], \ldots, [A_m, @A_n^m]),$$
where
$$@A_n^k(j) = \bigcup_{i=1}^{j-1} \Big[ @A_k(i) - \Big( Kill_s(i) \cup \bigcup_{l=i+1}^{j-1} Kill_a(l) \Big) \Big],$$
$$Kill_s = \bigcup_{h=k+1}^{m} @A_h, \qquad Kill_a = \bigcup_{h=1}^{m} @A_h.$$

$@A_n^k(j)$ is the array region defined by $A_k$ that reaches $A_n$ upon entry to iteration $j$.
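
The µ formula can be checked by brute force on small loops. The sketch below evaluates it with explicit sets, reading Kill_s(i) as the definitions that follow A_k within iteration i and Kill_a(l) as all definitions in a later iteration l. The function name `mu_region`, the 0-based definition index and the `first_iter` parameter (the loop's lower bound, where the formula uses 1) are assumptions made for this illustration.

```python
def mu_region(k, j, defs, first_iter):
    """Region defined by the k-th definition that still reaches the mu node
    upon entry to iteration j.

    defs[h](i) returns the set of indices written by definition h in iteration i.
    """
    m = len(defs)
    reaching = set()
    for i in range(first_iter, j):
        # Kill_s(i): definitions that follow definition k within iteration i.
        kill_s = set().union(*(defs[h](i) for h in range(k + 1, m)))
        # Kill_a(l): every definition in the later iterations i+1 .. j-1.
        kill_a = set().union(*(defs[h](l) for l in range(i + 1, j) for h in range(m)))
        reaching |= defs[k](i) - (kill_s | kill_a)
    return reaching


# Loop writing A(j) and A(j+8), j = 3..10, seen on entry to iteration j = 6.
no_overlap = [lambda i: {i}, lambda i: {i + 8}]
print(sorted(mu_region(0, 6, no_overlap, 3)))   # [3:j-1]
print(sorted(mu_region(1, 6, no_overlap, 3)))   # [11:j+7]

# Loop writing A(j+2) and A(j+8): later A(j+2) writes overwrite early A(j+8) writes.
overlap = [lambda i: {i + 2}, lambda i: {i + 8}]
print(sorted(mu_region(1, 11, overlap, 3)))
```

For the first loop the result agrees with the regions [3:j-1] and [11:j+7] attached to the µ node in the earlier example. The second loop is the one on this slide, where the kill terms actually remove elements.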

Definitions in Loops: Iteration Vectors

Iteration vectors
– For a given array element, which iteration wrote it last?
– Important for: Forward substitution, Last value assignment
– Not important for: Privatization
– Hard to express when loop nests span subroutines

We express the dual entity
– For a given iteration j, what is the set of all memory locations last defined at j?
– Example: Last value assignment. Compute LVA(j) as set of memory locations
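
One way to read the dual formulation is that LVA(j) holds the locations written in iteration j and not written again in any later iteration. The sketch below computes that set by enumeration. Both this reading and the names `last_defined_at`, `writes` and `last_iter` are assumptions for illustration, not the talk's algorithm, which works on closed-form regions.

```python
def last_defined_at(j, writes, last_iter):
    """Locations whose final write in the loop happens in iteration j.

    writes(i) returns the set of locations written in iteration i;
    last_iter is the loop's upper bound.
    """
    later = set().union(*(writes(l) for l in range(j + 1, last_iter + 1)))
    return writes(j) - later


def writes(i):
    # Hypothetical loop body: A(i+2) = ... ; A(i+8) = ... for i = 3..10
    return {i + 2, i + 8}


print(sorted(last_defined_at(3, writes, 10)))    # element 11 is rewritten in iteration 9
print(sorted(last_defined_at(10, writes, 10)))   # the last iteration keeps all its writes
```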

Control Dependence: π nodes

    Subroutine set (A, k, v)
      A(k) = v
    End

Original code:
    If (x>0) Then
      Call set(A, 2, 1)
    EndIf
    If (x>0) Then
      Print A(2)
    EndIf

Array SSA:
    If (x>0) Then
      [A1, ∅] = π(x>0, A0)
      Call set(A2, 2, 1)
      [A3, {2}] = δ(A1, [A2, {2}])
    EndIf
    [A4, (x>0)#{2}] = δ(A0, [A3, (x>0)#{2}])
    If (x>0) Then
      [A5, ∅] = π(x>0, A4)
      Print A5(2)
    EndIf

• Different from SSA: new name without definition.
• Essential to control-sensitive data flow analysis.

Reaching Definitions

Given:
– An SSA name Au
– An array region Use
– A block to limit the search, GivenBlock

Find [A1, R1], [A2, R2], …, [An, Rn], [⊥, R0] such that:
– Use = R1 ∪ R2 ∪ … ∪ Rn ∪ R0
– Rj ∩ Rk = ∅, ∀ 1 ≤ j ≠ k ≤ n
– Region Rk was defined by Ak, k = 1, …, n
– R0 was not defined within GivenBlock

Example:
– Privatization: GivenBlock = loop body, prove R0 empty

Reaching Definitions

$$[A_{total}, @A_t] = \delta(A_{undef}, [A_{before}, @A_t^b], [A_{current}, @A_t^c])$$

Algorithm SearchRD(A_t, Use, GivenBlock)
    If (A_t not in GivenBlock) Then Report [⊥, Use]; Stop
    If (A_t not an SSA gate) Then Report [A_t, Use]; Stop
    Call SearchRD(A_before, @A_t^b ∩ Use, Block(A_before))
    Call SearchRD(A_current, @A_t^c ∩ Use, Block(A_current))
    Call SearchRD(A_undef, Use − @A_t, Block(A_undef))

Special operations:
• Expand descriptors at µ gates
• Add conditionals at π gates
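
The recursion in SearchRD can be sketched in Python for δ gates only. Several simplifications are assumed here: regions are explicit sets, a block is modelled as the set of SSA names it contains, the same block is passed down instead of Block(·), empty queries are skipped, and µ and π gates are not handled. The names `search_rd` and `GATES` are hypothetical.

```python
# Each delta gate: name -> (undef name, [(argument name, region reaching the gate), ...])
GATES = {
    "A2": ("A0", [("A1", {1})]),
    "A4": ("A0", [("A2", {1}), ("A3", {2})]),
    "A6": ("A0", [("A4", {2}), ("A5", {1})]),
}


def search_rd(name, use, block, gates):
    """Return a list of (definition, region) pairs covering `use`.

    A definition of None stands for the bottom symbol: not defined in `block`.
    """
    if not use:
        return []                      # nothing left to look for (added shortcut)
    if name not in block:
        return [(None, use)]           # Report [bottom, Use]
    if name not in gates:
        return [(name, use)]           # an ordinary definition: Report [A_t, Use]
    undef, parts = gates[name]
    found, covered = [], set()
    for child, region in parts:
        found += search_rd(child, region & use, block, gates)
        covered |= region
    found += search_rd(undef, use - covered, block, gates)
    return found


block = {"A1", "A2", "A3", "A4", "A5", "A6"}
print(search_rd("A6", {1, 2, 10}, block, GATES))
```

With the δ chain from the use-def example, the query for elements 1, 2 and 10 through A6 splits into A5 for element 1, A3 for element 2, and the undefined remainder for element 10.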

Array Constant Propagation

Array constant collection
– Attach values to reaching definitions sets
– Unite sets with the same constant value

Constant propagation and substitution
– Full loop unrolling
– Subprogram specialization
– Aggressive dead code elimination

Constant Collection

Intraprocedural collection:
– Use the SearchRD algorithm
– Attach values to reaching definitions sets
– Collect values from assignment statements
– Unite sets with same attached value

Interprocedural collection
– Push available sets at call sites
– Collect intraprocedurally
– Return collected constants back to calling context
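
The "attach values, then unite sets with the same value" steps can be illustrated on straight-line assignments. The sketch below ignores gates, loops and call sites, and walks the assignments in program order instead of using SearchRD; the function name `collect_constants` and the final extra assignment in the example are hypothetical.

```python
def collect_constants(assignments):
    """Map each constant value to the array region currently holding it.

    assignments is a list of (region, value) pairs in program order, where
    region is a set of indices assigned the constant `value`.
    """
    by_value = {}
    for region, value in assignments:
        for known in by_value.values():
            known -= region                                # a later write kills older values
        by_value.setdefault(value, set()).update(region)   # unite regions with equal value
    return {value: reg for value, reg in by_value.items() if reg}


assignments = [
    ({1}, 0),                   # Call set(A, 1, 0)
    (set(range(3, 11)), 3),     # loop: Call set(A, j, 3), j = 3..10
    (set(range(11, 19)), 4),    # loop: Call set(A, j+8, 4), j = 3..10
    ({1}, 3),                   # hypothetical extra write A(1) = 3
]
print(collect_constants(assignments))
```

After the last assignment element 1 joins the region holding the constant 3, and the region for the constant 0 becomes empty and is dropped.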