GRAPHITE Two Years After: First Lessons Learned From Real-World Polyhedral Compilation


  1. GRAPHITE Two Years After: First Lessons Learned From Real-World Polyhedral Compilation
     Konrad Trifunovic (2), Albert Cohen (2), David Edelsohn (3), Li Feng (6), Tobias Grosser (5), Harsha Jagasia (1), Razya Ladelsky (4), Sebastian Pop (1), Jan Sjödin (1), Ramakrishna Upadrasta (2)
     (1) Open Source Compiler Engineering, AMD, Austin, Texas, USA
     (2) INRIA Saclay – Île-de-France and LRI, Paris-Sud 11 University, Orsay, France
     (3) IBM T. J. Watson Research, Yorktown Heights, USA
     (4) IBM Haifa Research, Haifa, Israel
     (5) University of Passau, Passau, Germany
     (6) Xi’an Jiaotong University, Xi’an, China
     January 30, 2010. GROW Workshop, Jan 2010, Pisa, Italy

  5. 1. Motivation: keeping sustained performance increase
     - Multi-level parallelism
       - Instruction-level parallelism (ILP): instruction scheduling
       - Data-level parallelism: vectorization
       - Thread-level parallelism: automatic parallelization
     - Memory hierarchy
       - Caches
       - Registers
       - Scratchpad memories
     - Need for complex program (loop) optimizations

  8. 2. Why polyhedral model in GCC?
     - Source-to-source compilers
       - Syntax based
       - Output source code might lose semantic information
       - Need for source code normalization
     - Low-level internal polyhedral representation
       - Semantics based
       - SSA GIMPLE form
       - Scalar evolution analysis (inductions, reductions)
       - Leveraging > 100 optimization passes of GCC
       - Tight interaction with vectorizer, parallelizer and memory layout optimizations

  9. 3. Compilation workflow
     [Figure: compilation workflow. Front ends (C, C++, F95) produce GENERIC, which is lowered to GIMPLE and then to GIMPLE+CFG+SSA+LOOP. The GRAPHITE pass takes GIMPLE, SSA, CFG as input and consists of SCoP detection (yielding SCoPs), GPOLY construction, transformations on GPOLY with a legality check (yielding transformed GPOLY), and GLOOG (CLOOG based) code generation back to GIMPLE, SSA, CFG. The result is lowered to RTL and then to ASM for x86, PPC, SPU.]

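  11. 4. Polyhedral model – Domains
      GPOLY: iteration domains

          for (v=0; v<N; v++)
            for (h=0; h<N; h++)
              out[v][h] = 0;

      $D_S = \{ (v, h) \mid 0 \le v, h \le N - 1 \}$

      The same domain as a system of affine inequalities over $(v, h, N, 1)$:

      $$
      \begin{bmatrix}
       1 &  0 & 0 &  0 \\
      -1 &  0 & 1 & -1 \\
       0 &  1 & 0 &  0 \\
       0 & -1 & 1 & -1
      \end{bmatrix}
      \begin{pmatrix} v \\ h \\ N \\ 1 \end{pmatrix} \ge 0
      \qquad
      \begin{array}{l}
      v \ge 0 \\ v \le N - 1 \\ h \ge 0 \\ h \le N - 1
      \end{array}
      $$

      [Figure: the domain drawn in the (v, h) plane, bounded by v ≥ 0, v < N, h ≥ 0, h < N.]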
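The matrix form of the iteration domain can be checked mechanically. The following is a minimal sketch in C, assuming the constraint rows are ordered as v ≥ 0, v ≤ N − 1, h ≥ 0, h ≤ N − 1 over the column vector (v, h, N, 1). The helper names `in_domain` and `count_points` are hypothetical and are not part of GRAPHITE.

```c
#include <stdbool.h>

/* Constraint matrix A: row r encodes
   A[r][0]*v + A[r][1]*h + A[r][2]*N + A[r][3]*1 >= 0. */
static const int A[4][4] = {
    { 1,  0, 0,  0},  /* v >= 0     */
    {-1,  0, 1, -1},  /* v <= N - 1 */
    { 0,  1, 0,  0},  /* h >= 0     */
    { 0, -1, 1, -1},  /* h <= N - 1 */
};

/* Hypothetical helper: true if (v, h) satisfies every row of A
   for the parameter value n. */
bool in_domain(int v, int h, int n)
{
    const int x[4] = {v, h, n, 1};
    for (int r = 0; r < 4; r++) {
        int dot = 0;
        for (int c = 0; c < 4; c++)
            dot += A[r][c] * x[c];
        if (dot < 0)
            return false;
    }
    return true;
}

/* Hypothetical helper: count the integer points of the domain inside
   the bounding box [-1, n] x [-1, n]. */
int count_points(int n)
{
    int count = 0;
    for (int v = -1; v <= n; v++)
        for (int h = -1; h <= n; h++)
            if (in_domain(v, h, n))
                count++;
    return count;
}
```

For a given `n`, `count_points(n)` should match the number of iterations the loop nest executes, which is a quick sanity check that the inequalities describe the same set as the loop bounds.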

  12. 4. Polyhedral model – Data accesses
      Data accesses: mapping iterations to memory

      $f(\mathbf{i}, \mathbf{g}) = F \times (\mathbf{i}, \mathbf{g}, 1)^T$

      [Figure: the iteration domain in the (v, h) plane, bounded by v ≥ 0, v < N, h ≥ 0, h < N, with axes also labelled t1 and t2, mapped onto the linearized memory layout out[1][1], out[1][2], out[1][3], out[2][1], out[2][2], out[2][3], out[3][1], out[3][2], out[3][3].]
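As an illustration of the access function, here is a minimal C sketch for the reference `out[v][h]`. It assumes that F has one row per array subscript and columns ordered as (v, h, N, 1), and that the array is stored row-major with 0-based subscripts (the slide's figure labels elements from 1). The names `access_subscripts` and `linear_offset` are hypothetical.

```c
#include <stddef.h>

/* Assumed access matrix F for out[v][h]: subscript 0 is v, subscript 1 is h. */
static const int F[2][4] = {
    {1, 0, 0, 0},
    {0, 1, 0, 0},
};

/* Hypothetical helper: s = F x (v, h, n, 1)^T. */
void access_subscripts(int v, int h, int n, int s[2])
{
    const int x[4] = {v, h, n, 1};
    for (int r = 0; r < 2; r++) {
        s[r] = 0;
        for (int c = 0; c < 4; c++)
            s[r] += F[r][c] * x[c];
    }
}

/* Hypothetical helper: row-major position of the accessed element
   in an n x n array. */
size_t linear_offset(int v, int h, int n)
{
    int s[2];
    access_subscripts(v, h, n, s);
    return (size_t)s[0] * (size_t)n + (size_t)s[1];
}
```

With `n = 3`, consecutive values of `h` map to consecutive offsets, while consecutive values of `v` are `n` elements apart, which is the stride information a memory-aware transformation needs.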

  21. 4. Polyhedral model – Scheduling
      Scheduling: execution order

      $t = \theta_S(\mathbf{i}) = \Theta_S \times (\mathbf{i}, \mathbf{g}, 1)^T$

      Original schedule:

          for (v=0; v<N; v++)
            for (h=0; h<N; h++)
              out[v][h] = 0;

      $\Theta_S = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$

      Transformed schedule (loop interchange):

          for (t1=0; t1<N; t1++)
            for (t2=0; t2<N; t2++)
              out[t2][t1] = 0;

      $\Theta'_S = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

      [Figure: the iteration domain in the (v, h) plane, bounded by v ≥ 0, v < N, h ≥ 0, h < N, shown three times with different traversal orders.]
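The effect of the two schedule matrices can be sketched in C as follows. This assumes the 2x2 matrices act on the iteration vector (v, h) only and that time stamps are executed in lexicographic order of (t1, t2). The domain size `N = 3` and the names `apply_schedule` and `nth_iteration` are hypothetical. The sketch relies on the fact that both matrices are their own inverse, so applying the matrix to a time stamp recovers the iteration.

```c
enum { N = 3 };  /* assumed domain size for the illustration */

static const int THETA_IDENTITY[2][2]    = {{1, 0}, {0, 1}};
static const int THETA_INTERCHANGE[2][2] = {{0, 1}, {1, 0}};

/* Hypothetical helper: out = theta x (a, b)^T. */
void apply_schedule(const int theta[2][2], int a, int b, int out[2])
{
    out[0] = theta[0][0] * a + theta[0][1] * b;
    out[1] = theta[1][0] * a + theta[1][1] * b;
}

/* Hypothetical helper: return v * N + h for the k-th iteration executed
   when time stamps (t1, t2) are scanned in lexicographic order.
   Valid only for self-inverse theta, as is the case for both matrices. */
int nth_iteration(const int theta[2][2], int k)
{
    int t1 = k / N;
    int t2 = k % N;
    int iter[2];
    apply_schedule(theta, t1, t2, iter);  /* inverse equals theta here */
    return iter[0] * N + iter[1];
}
```

Under `THETA_IDENTITY` the second executed iteration is (v, h) = (0, 1); under `THETA_INTERCHANGE` it is (1, 0). The set of iterations is unchanged and only their order differs.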

  22. 5. SSA-based polyhedral model

      MVT kernel:

          for (i = 0; i < N; i++) {
            b[i] = 0;
            for (j = 0; j < N; j++)
              b[i] += A[i][j] * x[j];
          }

      GIMPLE control-flow graph:

          bb 3:
            i_21 = PHI <i_11(7), 0(2)>
            b[i_21] = 0.0;
            b_I_lsm.5_16 = b[i_21];
          bb 4:
            j_22 = PHI <j_10(5), 0(3)>
            pre.3_28 = PHI <D.3_9(5), 0.0(3)>
            D.0_6 = A[i_21][j_22];
            D.1_7 = x[j_22];
            D.2_8 = D.1_7 * D.0_6;
            D.3_9 = D.2_8 + pre.3_28;
            b_I_lsm.5_5 = D.3_9;
            j_10 = j_22 + 1;
            if (j_10 < N) goto <bb 5>; else goto <bb 6>;
          bb 5:
            goto <bb 4>
          bb 6:
            b_I_lsm.5_30 = PHI <b_I_lsm.5_5(4)>
            b[i_21] = b_I_lsm.5_30;
            i_11 = i_21 + 1;
            if (i_11 < N) goto <bb 7>; else goto <bb 8>;
          bb 7:
            goto <bb 3>;
          bb 8:
            return;

      Polyhedral representation:

      $D_{S_{bb3}} = \{ (i) \mid 0 \le i \le N - 1 \}$
      $D_{S_{bb4}} = \{ (i, j) \mid 0 \le i \le N - 1 \wedge 0 \le j \le N - 1 \}$
      $D_{S_{bb6}} = \{ (i) \mid 0 \le i \le N - 1 \}$
      $F_{dr_1} = \{ (i, a, s_1) \mid a = 0 \wedge s_1 = i \wedge 0 \le s_1 \le N - 1 \}$
      $F_{dr_2} = \{ (i, j, a, s_1) \mid a = 1 \wedge s_1 = j \wedge 0 \le s_1 \le N - 1 \}$
      $F_{dr_4} = \{ (i, a, s_1) \mid a = 0 \wedge s_1 = i \wedge 0 \le s_1 \le N - 1 \}$
      $\theta_{bb3} = \{ (i, t_1, t_2, t_3) \mid t_1 = 0 \wedge t_2 = i \wedge t_3 = 0 \}$
      $\theta_{bb4} = \{ (i, j, t_1, t_2, t_3, t_4, t_5) \mid t_1 = 0 \wedge t_2 = i \wedge t_3 = 1 \wedge t_4 = j \wedge t_5 = 0 \}$
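The schedules of bb 3 and bb 4 fix their relative execution order through lexicographic comparison of time stamps. A minimal C sketch follows. It assumes that the three-dimensional time stamp of bb 3 is padded with trailing zeros to five dimensions before comparison, which is a choice made for this illustration. The names `sched_bb3`, `sched_bb4`, `lex_cmp` and `bb3_before_bb4` are hypothetical.

```c
enum { DIMS = 5 };  /* length of the longest time stamp (bb 4) */

/* theta_bb3: (t1, t2, t3) = (0, i, 0), zero-padded to DIMS (assumption). */
void sched_bb3(int i, int t[DIMS])
{
    t[0] = 0; t[1] = i; t[2] = 0; t[3] = 0; t[4] = 0;
}

/* theta_bb4: (t1, ..., t5) = (0, i, 1, j, 0). */
void sched_bb4(int i, int j, int t[DIMS])
{
    t[0] = 0; t[1] = i; t[2] = 1; t[3] = j; t[4] = 0;
}

/* Lexicographic comparison: -1 if a executes first, 1 if b does, 0 if equal. */
int lex_cmp(const int a[DIMS], const int b[DIMS])
{
    for (int d = 0; d < DIMS; d++) {
        if (a[d] < b[d]) return -1;
        if (a[d] > b[d]) return 1;
    }
    return 0;
}

/* Hypothetical helper: 1 if bb 3 at iteration i3 executes before
   bb 4 at iteration (i4, j4), else 0. */
int bb3_before_bb4(int i3, int i4, int j4)
{
    int a[DIMS], b[DIMS];
    sched_bb3(i3, a);
    sched_bb4(i4, j4, b);
    return lex_cmp(a, b) < 0;
}
```

The static dimension t3 (0 for bb 3, 1 for bb 4) is what places the initialization `b[i] = 0` before the inner loop within the same outer iteration i, matching the textual order of the MVT kernel.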

  24. 6. Research – Cost-modelling for vectorization [Trifunovic et al. 2009]

      Original loop nest:

          for (v=0; v<N; v++)
            for (h=0; h<N; h++) {
              s=0;
              for (i=0; i<K; i++)
                for (j=0; j<K; j++)
                  s+= img[v+i][h+j] * filter[i][j];
              out[v][h]= s;
            }

      Vectorized inner loop (array-slice notation, vector width 4):

          for (v=0; v<N; v++)
            for (h=0; h<N; h++) {
              s=0;
              for (i=0; i<K; i++) {
                vs[0:3]={0,0,0,0};
                for (j=0; j<K; j+=4) {
                  vs[0:3]+= img[v+i][h+j:h+j+3] * filter[i][j:j+3];
                }
                s+= sum(vs[0:3]);
              }
              out[v][h] = s;
            }
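The slice notation on the slide is not valid C. The sketch below emulates it with an explicit four-lane inner loop so that the scalar and strip-mined versions can be compared. It assumes integer data, fixed sizes N = K = 4, and K divisible by the vector width. The names `conv_scalar`, `conv_strip_mined` and `versions_agree` are hypothetical, and the test data is arbitrary.

```c
#include <assert.h>

enum { N = 4, K = 4, M = N + K - 1 };  /* assumed sizes; img is M x M */

/* Scalar convolution, as in the original loop nest. */
void conv_scalar(int img[M][M], int filter[K][K], int out[N][N])
{
    for (int v = 0; v < N; v++)
        for (int h = 0; h < N; h++) {
            int s = 0;
            for (int i = 0; i < K; i++)
                for (int j = 0; j < K; j++)
                    s += img[v + i][h + j] * filter[i][j];
            out[v][h] = s;
        }
}

/* Strip-mined version: the loop over l stands for one 4-wide vector
   operation, and the final sum is the horizontal reduction of vs. */
void conv_strip_mined(int img[M][M], int filter[K][K], int out[N][N])
{
    assert(K % 4 == 0);  /* no remainder loop in this sketch */
    for (int v = 0; v < N; v++)
        for (int h = 0; h < N; h++) {
            int s = 0;
            for (int i = 0; i < K; i++) {
                int vs[4] = {0, 0, 0, 0};
                for (int j = 0; j < K; j += 4)
                    for (int l = 0; l < 4; l++)
                        vs[l] += img[v + i][h + j + l] * filter[i][j + l];
                s += vs[0] + vs[1] + vs[2] + vs[3];
            }
            out[v][h] = s;
        }
}

/* Hypothetical check: 1 if both versions give identical results
   on arbitrary deterministic test data, else 0. */
int versions_agree(void)
{
    int img[M][M], filter[K][K], a[N][N], b[N][N];
    for (int r = 0; r < M; r++)
        for (int c = 0; c < M; c++)
            img[r][c] = r * M + c;
    for (int i = 0; i < K; i++)
        for (int j = 0; j < K; j++)
            filter[i][j] = i - j;
    conv_scalar(img, filter, a);
    conv_strip_mined(img, filter, b);
    for (int v = 0; v < N; v++)
        for (int h = 0; h < N; h++)
            if (a[v][h] != b[v][h])
                return 0;
    return 1;
}
```

Integer elements are used so that the reordered summation gives bit-identical results. With floating-point data the two versions may differ by rounding, which is one reason a compiler needs permission to reassociate before vectorizing such a reduction.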
