Register Pressure in Software-Pipelined Loop Nests: Fast Computation and Impact on Architecture Design

  1. Register Pressure in Software-Pipelined Loop Nests: Fast Computation and Impact on Architecture Design. Alban Douillet, Guang R. Gao, {douillet,ggao}@capsl.udel.edu, Department of Electrical & Computer Engineering, University of Delaware. 18th International Workshop on Languages and Compilers for Parallel Computing, Hawthorne, New York, October 20-22, 2005. A. Douillet, G.R. Gao (Univ. of Delaware) Register Pressure in SWP’ed Loop Nests LCPC’05 1 / 43

  2. Introduction. Loop nests dominate scientific applications. Single-dimension Software Pipelining (SSP) software-pipelines the most profitable loop in a loop nest, but the resulting schedules have high register pressure, and register allocation is time-consuming. We therefore need a fast method to evaluate register pressure, in order to: detect infeasible schedules before calling the register allocator; measure the quality of a register allocation solution; estimate register needs for future architecture designs.

  3. Outline: 1 Loop Nest Software-Pipelining; 2 Problem Statement; 3 Definitions & Issues; 4 Fast Register Pressure Computation; 5 Experiments; 6 Conclusion.

  4. Outline, section 1: Loop Nest Software-Pipelining.

  5. Modulo Scheduling (MS). The most popular SWP technique: well studied and understood, with a full array of loop optimizations. It targets a single loop and executes its iterations in parallel, issuing a new iteration every T cycles (the initiation interval). [Figure: FOR J=0,4 with stages b, c, d; the schedule consists of a prolog (b; c b), a kernel (d c b) and an epilog (d c; d).]
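
The prolog/kernel/epilog shape follows directly from issuing one iteration per initiation interval. A minimal Python sketch, assuming each stage lasts exactly T cycles so that time can be counted in whole stages, and listing rows oldest iteration first; the function name `modulo_schedule` is hypothetical:

```python
def modulo_schedule(stages: list[str], n_iters: int) -> list[str]:
    """One row per step of T cycles; each row lists the stage executed by every
    in-flight iteration, oldest iteration first. Iteration i starts at step i."""
    n_stages = len(stages)
    rows = []
    for step in range(n_iters + n_stages - 1):
        # Iteration i is in stage (step - i) if it has started and not finished.
        row = [stages[step - i] for i in range(n_iters) if 0 <= step - i < n_stages]
        rows.append(" ".join(row))
    return rows


# FOR J=0,4 has 5 iterations; the loop body is split into stages b, c, d.
for row in modulo_schedule(["b", "c", "d"], n_iters=5):
    print(row)
```

With 3 stages and 5 iterations, the first 2 rows form the prolog, the next 3 rows the kernel (d c b) and the last 2 rows the epilog.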

  6. Modulo Scheduling (continued). But MS is limited to the innermost loop, and loop transformations that bring ILP or data-cache-reuse potential to the innermost loop are not always possible.

  7. Single-Dimension Software-Pipelining (SSP). Proposed by Rong et al. (CGO’04, PLDI’05), SSP software-pipelines the most profitable loop level in a loop nest. It is equivalent to MS when the innermost level is selected and can be seen as a generalization of MS to loop nests. It delivers a proven performance boost over MS and can take advantage of the loop optimizations used for MS. It is called single-dimension because it simplifies the multi-dimensional data dependence graph (DDG) into a one-dimensional DDG.

  8. SSP Kernel. SSP generates a kernel similar to the MS kernel: enclosed stages sharing a single initiation interval T. L_1 is the outermost loop and L_n the innermost; S_i is the number of stages at level i. [Figure: kernel whose stages each span cycles 0, 1, ..., T−2, T−1, with the innermost stages nested inside the middle-loop stages.]

  9. SSP Ideal Schedule. The ideal schedule is generated using the kernel as a template: a new outermost iteration is issued every T cycles, so outermost iterations execute in parallel, while the inner iterations execute sequentially within one outermost iteration. Problem: this creates resource conflicts.

  10. SSP Ideal Schedule: Example. Loop nest: FOR I=1,4 { a; FOR J=1,3 { b; c; d } END FOR; e } END FOR, with T=1, S_1=5 and S_2=3. [Figure: the ideal schedule, in which staggered outermost iterations require the same stages at the same time (resource conflicts).]
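
The conflicts in this example can be reproduced with a small Python model. It is a simplification, not the authors' scheduler: each stage lasts one step of T cycles, each outermost iteration executes `a`, then the inner stages `b c d` once per inner iteration, then `e`, and two iterations needing the same kernel stage in the same step count as a resource conflict. All function names are hypothetical.

```python
from collections import Counter


def stage_sequence(pre: list[str], inner: list[str], post: list[str], n_inner: int) -> list[str]:
    """Stages run by one outermost iteration: outer-only stages, the innermost
    stages once per inner iteration (sequentially), then the trailing stages."""
    return pre + inner * n_inner + post


def ideal_schedule(seq: list[str], starts: list[int]) -> list[list[tuple[int, str]]]:
    """Row s holds the (outer iteration, stage) pairs executing at step s."""
    rows: list[list[tuple[int, str]]] = [[] for _ in range(max(starts) + len(seq))]
    for it, start in enumerate(starts):
        for k, stage in enumerate(seq):
            rows[start + k].append((it, stage))
    return rows


def conflicts(rows: list[list[tuple[int, str]]]) -> list[int]:
    """Steps at which two iterations need the same kernel stage (same resources)."""
    return [s for s, row in enumerate(rows)
            if any(n > 1 for n in Counter(stage for _, stage in row).values())]


seq = stage_sequence(["a"], ["b", "c", "d"], ["e"], n_inner=3)  # FOR J=1,3
ideal = ideal_schedule(seq, starts=[0, 1, 2, 3])                 # FOR I=1,4, T=1
print(conflicts(ideal))  # [4, 5, 6, 7, 8, 9]
```

Iterations 0 and 3 both reach stage b at step 4 and keep colliding through step 9, while any S_n = 3 consecutive iterations never collide.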

  11. SSP Final Schedule. The final schedule delays some outermost iterations to avoid resource conflicts: outermost iterations execute in groups of S_n, separated by a delay, which yields a resource-conflict-free schedule. [Figure: final schedule of the example, with groups of S_n iterations.]
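
The grouping can be checked on the running example. This Python sketch assumes that every group of S_n iterations after the first is pushed back by (N_n − 1)·S_n·T cycles relative to the ideal schedule, where N_n (here 3) is the inner trip count; that delay rule is an assumption of the sketch, although on this example it is the smallest delay that removes every conflict. Function names are hypothetical.

```python
from collections import Counter


def final_starts(n_outer: int, s_n: int, n_inner: int) -> list[int]:
    """Start step (in units of T) of each outermost iteration.

    Assumption: iterations run in groups of s_n, and group g is delayed by
    g * (n_inner - 1) * s_n steps with respect to the ideal schedule.
    """
    return [i + (i // s_n) * (n_inner - 1) * s_n for i in range(n_outer)]


def has_conflict(seq: list[str], starts: list[int]) -> bool:
    """True if two iterations use the same kernel stage at the same step."""
    used: Counter = Counter()
    for start in starts:
        for k, stage in enumerate(seq):
            used[(start + k, stage)] += 1
    return any(n > 1 for n in used.values())


seq = ["a"] + ["b", "c", "d"] * 3 + ["e"]  # running example, T = 1
starts = final_starts(n_outer=4, s_n=3, n_inner=3)
print(starts, has_conflict(seq, starts))  # [0, 1, 2, 9] False
```

Starting the fourth iteration one step earlier (at step 8) reintroduces a conflict on stage b.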

  12. SSP Loop Patterns. Patterns: Outermost Loop Pattern (OLP), Inner Loop Execution Segment (ILES), Innermost Loop Pattern (ILP), and Draining & Filling Pattern (DFP). Composition: the OLP contains all S kernel stages (e d c b a); the ILES is a cyclic combination of S_n consecutive stages. [Figure: final schedule annotated with the OLP kernel rows, the ILES with its ILP and DFP parts, and the delay.]

  13. SSP Implementation. Loop nest → Loop Selection → Dependence Simplification (1-D DDG) → Schedule Construction (kernel) → Register Allocation (register-allocated kernel) → Code Generation → final schedule.

  14. Outline, section 2: Problem Statement.

  15. Motivation. Feasibility of schedules must be determined: register allocation is time-consuming, and schedules made infeasible by high register pressure are not uncommon. The quality of the register allocator must be evaluated: how far is it from the optimal solution? Actual register needs must be evaluated for architecture designs: are register files big enough?

  16. Problem Statement. Given a loop nest and an SSP schedule for it, evaluate the register pressure MaxLive of the final schedule, considering only rotating registers. MaxLive is the maximum number of live variables at any given cycle of the final schedule; the definition is similar to the one used for MS.
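
For reference, MaxLive can always be obtained by brute force once every lifetime of the final schedule is known, which is exactly the enumeration a fast method tries to avoid. A Python sketch, assuming lifetimes are given as (start, end) cycle pairs with end >= start, and that a value occupies a register from its definition cycle up to, but not including, its kill cycle:

```python
def max_live(lifetimes: list[tuple[int, int]]) -> int:
    """Maximum number of simultaneously live values (sweep over cycle events)."""
    events = []
    for start, end in lifetimes:
        events.append((start, +1))  # value becomes live at its definition cycle
        events.append((end, -1))    # value dies at its kill cycle
    # Tuples sort kills (-1) before definitions (+1) at the same cycle, so a
    # register freed at cycle c can be reused by a value defined at cycle c.
    events.sort()
    live = best = 0
    for _, delta in events:
        live += delta
        best = max(best, live)
    return best


print(max_live([(0, 3), (1, 4), (2, 5), (4, 6)]))  # 3
print(max_live([(0, 2), (2, 4)]))                  # 1: back-to-back lifetimes share a register
```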

  17. Updated SSP Implementation. Loop nest → Loop Selection → Dependence Simplification (1-D DDG) → Schedule Construction (kernel) → Register Pressure Evaluation. If the register pressure is too high, choose a different loop level or increase the initiation interval; otherwise continue with Register Allocation (register-allocated kernel) → Code Generation → final schedule.
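
The feedback loop added to the pipeline can be sketched as a driver around two callbacks. In this Python sketch `build_kernel` and `max_live` stand in for the schedule construction and register pressure evaluation steps, all names are hypothetical, and the search order (raise T at the current level before moving to the next candidate level) is an assumption, since the slide only says to choose a different loop level or increase the initiation interval.

```python
from typing import Callable, Optional, Sequence


def schedule_with_pressure_check(
    levels: Sequence[int],
    min_ii: dict[int, int],
    max_ii: int,
    build_kernel: Callable[[int, int], Optional[object]],
    max_live: Callable[[object], int],
    num_regs: int,
) -> Optional[tuple[int, int, object]]:
    """Return (level, T, kernel) for the first kernel whose estimated register
    pressure fits in num_regs, or None if every candidate is too demanding."""
    for level in levels:                               # candidate loop levels, in priority order
        for ii in range(min_ii[level], max_ii + 1):    # try increasing initiation intervals
            kernel = build_kernel(level, ii)
            if kernel is None:                         # no valid kernel at this T
                continue
            if max_live(kernel) <= num_regs:           # cheap check before register allocation
                return level, ii, kernel
    return None


# Toy stubs: level 1 always needs 70 registers, level 2 needs 40 + T.
result = schedule_with_pressure_check(
    levels=[1, 2], min_ii={1: 2, 2: 3}, max_ii=5,
    build_kernel=lambda level, ii: (level, ii),
    max_live=lambda kernel: 70 if kernel[0] == 1 else 40 + kernel[1],
    num_regs=64,
)
print(result)  # (2, 3, (2, 3))
```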

  18. Outline, section 3: Definitions & Issues.

  19. Definitions. A scalar lifetime has a start (the definition cycle of the value), an end (the kill cycle of the value) and an omega (the number of outermost iterations spanned). Classification: global lifetimes (constant values) and input & output lifetimes (prolog and epilog) are ignored; local lifetimes stay within the same outermost iteration; cross-iteration lifetimes extend across outermost iterations. [Figure: a local and a cross-iteration lifetime, showing start, end and omega.]
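
The classification can be written down as a small data type. In this Python sketch the two boolean flags are hypothetical stand-ins for whatever the compiler uses to recognise constants and prolog/epilog values; only local (omega = 0) and cross-iteration (omega > 0) lifetimes are kept for MaxLive, as the slide states.

```python
from dataclasses import dataclass


@dataclass
class Lifetime:
    start: int                      # definition cycle of the value
    end: int                        # kill cycle of the value
    omega: int                      # number of outermost iterations spanned
    is_constant: bool = False       # hypothetical flag: loop-invariant (global) value
    in_prolog_epilog: bool = False  # hypothetical flag: input/output value

    def kind(self) -> str:
        if self.is_constant:
            return "global"
        if self.in_prolog_epilog:
            return "input/output"
        return "local" if self.omega == 0 else "cross-iteration"


def counted(lifetimes: list[Lifetime]) -> list[Lifetime]:
    """Keep the lifetimes that contribute to MaxLive: local and cross-iteration."""
    return [lt for lt in lifetimes if lt.kind() in ("local", "cross-iteration")]


lifetimes = [
    Lifetime(0, 3, omega=0),
    Lifetime(2, 7, omega=1),
    Lifetime(0, 9, omega=0, is_constant=True),
    Lifetime(0, 1, omega=0, in_prolog_epilog=True),
]
print([lt.kind() for lt in lifetimes])
print(len(counted(lifetimes)))  # 2
```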

  20. Issues. Lifetime patterns are more complex than in MS: the initiation rate is not constant, which stretches lifetimes; the same stage may have different lifetime patterns, because a stage is not always followed by the same stages; and the first and last instances of the same stage differ.

  21. Lifetimes Example. [Figure: lifetimes in the final schedule of the running example, over the OLP rows (e d c b a) and the ILES rows (d c, c d).]

  22. Outline, section 4: Fast Register Pressure Computation.

  23. Method Overview. Method keys: separate OLP from ILES instances of stages; separate first from last instances of stages; separate local from cross-iteration lifetimes. Steps: (1) count the local lifetimes in the first instance of each stage; (2) count the local lifetimes in the last instance of each stage; (3) count the cross-iteration lifetimes in each stage; (4) list all possible combinations of stages in the schedule; (5) add up the lifetimes of each combination in the OLP and ILES. MaxLive is the highest value.
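
Steps (4) and (5) can be read as a maximum over per-cycle sums. This Python sketch is a simplified interpretation, not the paper's exact formulas: it assumes the counts from steps (1) to (3) are already available per stage and per cycle, and that each combination lists the stage instances that run together in one OLP or ILES row. Function and parameter names, and the numbers in the example, are hypothetical.

```python
def max_live_from_counts(
    local_first: dict[str, list[int]],
    local_last: dict[str, list[int]],
    cross: dict[str, list[int]],
    combinations: list[list[tuple[str, str]]],
) -> int:
    """Largest per-cycle sum of lifetime counts over all stage combinations.

    local_first[s][c] / local_last[s][c]: local lifetimes live at cycle c of the
    first / last instance of stage s; cross[s][c]: cross-iteration lifetimes of
    stage s live at cycle c. A combination is a list of (stage, "first"|"last").
    """
    local = {"first": local_first, "last": local_last}
    best = 0
    for combo in combinations:
        n_cycles = len(cross[combo[0][0]])  # T, assumed identical for all stages
        for c in range(n_cycles):
            live = sum(local[inst][s][c] + cross[s][c] for s, inst in combo)
            best = max(best, live)
    return best


# Hypothetical counts for a kernel with T = 2 and two stages a, b.
local_first = {"a": [1, 2], "b": [0, 1]}
local_last = {"a": [1, 1], "b": [2, 0]}
cross = {"a": [1, 0], "b": [0, 1]}
combos = [[("a", "first"), ("b", "last")], [("a", "last"), ("b", "first")]]
best = max_live_from_counts(local_first, local_last, cross, combos)
print(best)  # 4
```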

  24. Local Lifetimes. Computed by traditional liveness analysis, for both the first and the last instance of each stage s: for each cycle c of the stage, between 0 and T − 1, plus the live-out set of the stage (c = T). A stage of level i is visited i times. Notation: LT_local(s, c, first/last). [Figure: first and last stage instances at levels 1, 2 and 3.]
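
Per-stage local counts come from ordinary backward liveness analysis over the T cycles of a stage instance. A Python sketch, assuming each cycle is described by the sets of values it defines and uses, and that the first and last instances of a stage differ only in the live-out set supplied at c = T; names and data are hypothetical:

```python
def local_live_counts(defs: list[set[str]], uses: list[set[str]], live_out: set[str]) -> list[int]:
    """Backward liveness analysis over the T cycles of one stage instance.

    Returns T + 1 counts: entry c (0 <= c < T) is the number of values live on
    entry to cycle c; entry T is the size of the stage's live-out set.
    """
    T = len(defs)
    counts = [0] * (T + 1)
    live = set(live_out)
    counts[T] = len(live)
    for c in range(T - 1, -1, -1):
        live = uses[c] | (live - defs[c])  # live-in = uses | (live-out - defs)
        counts[c] = len(live)
    return counts


defs = [{"x"}, {"y"}, set()]
uses = [set(), {"x"}, {"y"}]
print(local_live_counts(defs, uses, live_out=set()))  # [0, 1, 1, 0]  last instance
print(local_live_counts(defs, uses, live_out={"x"}))  # [0, 1, 2, 1]  first instance
```

Because x is still needed after the first instance, it stays live until the end of that instance, raising the count on entry to cycle 2 from 1 to 2.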
