 
Beyond Polyhedral Analysis of OpenStream Programs
Nuno Miguel Nobre
nunomiguel.nobre@manchester.ac.uk
Joint work with: Andi Drebes, Graham Riley and Antoniu Pop
IMPACT 2019: January 23, 2019 | Valencia, Spain
How to exploit today’s machines efficiently?
Task-parallel streaming dataflow models have strong assets:
• Point-to-point synchronization
  ▪ Hide latency
• Numerous opportunities for parallelism
  ▪ Task, data and pipeline
• Scheduling is the runtime’s job
• Provide functional determinism
But also disadvantages:
• Manually specified tasks
  ▪ Challenging dependency specification
  ▪ Hard debugging
  ▪ What’s the right granularity?
• Memory footprint: no in-place writes
2 / 15
Why the polyhedral model?
• Arbitrarily compose loop transformations, including tiling: granularity control
• Static program analysis: streams' memory footprint/bounding
• Multi-objective: parallelism, vectorization, multi-level cache reuse
• Compact program representation, unlike graph algorithms
• Despite restrictions: stencils, dense linear algebra and image filters
3 / 15
Outline
1) Manual granularity tuning
  • Motivating example: Gauss-Seidel stencil
2) Stream bounding & automatic granularity tuning
  • The polynomial indexing problem
  • Future work solutions
4 / 15
OpenStream: a (very) short overview
Data-flow extension to OpenMP
• Tasks: units of work spawned as concurrent coroutines, created dynamically at runtime
• Streams: unbounded channels for communication between tasks
Tasks access stream elements through sliding windows.
[Figure: a stream holding elements a, b, c, d; producer tasks p1 and p2 and consumer tasks c1 and c2 each access a window of the stream]
Stream accesses dictate the dependencies between tasks.
5 / 15
1D Gauss-Seidel: stencil code granularity tuning
[Figure: stencil with weights 1/2 and 1/2 on the two neighbours; iteration space with axes i and j. Legend: previous iteration, current iteration, current grid point, not yet computed, flow dependence distance vector]
Sequential C [SeqC]
    for (i = 0; i < I; ++i)
      for (j = 1; j < N - 1; ++j)
        phi[j] = (phi[j - 1] + phi[j + 1]) / 2;
OpenStream: Fine-grained tasks [OS-FG]
    stream_array S[N];
    for (i = 0; i < I; ++i)
      for (j = 1; j < N - 1; ++j)
        task {
          read once from S[j];      // phi[j] (discarded)
          peek once from S[j - 1];  // phi[j - 1]
          peek once from S[j + 1];  // phi[j + 1]
          write once into S[j];     // phi[j]
          // work function:
          // phi[j] = (phi[j - 1] + phi[j + 1]) / 2;
        }
6 / 15
1D Gauss-Seidel: stencil code granularity tuning
1) Semantically equivalent C code (SA)
2) Pluto source-to-source compiler
3) OpenMP parallel code [OMP-PT]
4) OpenStream: Pluto-tiled tasks [OS-PT]
OpenStream: Spatially tiled tasks [OS-ST]
[Figures: iteration spaces with axes i and j for OS-PT and OS-ST. Legend: loop iteration/fine-grained task, loop tile/Pluto-tiled task (OS-PT) or spatially tiled task (OS-ST), flow dependence distance vector between tiles]
7 / 15
1D Gauss-Seidel: results 8 / 15
2D Gauss-Seidel: a visual picture
OpenStream: Fine-grained tasks [OS-FG]
[Figure: iteration space with axes i, j and k. Legend: previous iteration, current iteration, current grid point, not yet computed, flow dependence distance vector]
9 / 15
2D Gauss-Seidel: a visual picture
OpenStream: Pluto-tiled tasks [OS-PT]
OpenStream: Spatially tiled tasks [OS-ST]
[Figures: tiled iteration spaces with axes i, j and k, one for OS-PT and one for OS-ST]
10 / 15
2D Gauss-Seidel: results 11 / 15
The polynomial problem
• Stream indexing is polynomial
  ▪ e.g. parametric tiling
• Deadlock undecidability
  ▪ Albert Cohen, Alain Darte, and Paul Feautrier. 2016. Static Analysis of OpenStream Programs
• Schedule found: no deadlock
  ▪ Paul Feautrier and Albert Cohen. 2018. On Polynomial Code Generation
12 / 15