
Bounded Stream Scheduling in Polyhedral OpenStream



  1. Bounded Stream Scheduling in Polyhedral OpenStream
     Nuno Miguel Nobre | nunomiguel.nobre@manchester.ac.uk
     Andi Drebes | andi.drebes@inria.fr
     Graham Riley | graham.riley@manchester.ac.uk
     Antoniu Pop | antoniu.pop@manchester.ac.uk
     IMPACT 2020: January 22, 2020 | Bologna, Italy

  9. The case for streaming dataflow languages
     Instead of barrier synchronization, point-to-point synchronization:
     • Hide latency
     • More opportunities for parallelism (figure panels: Task, Data, Pipeline)
     • Scheduling is the runtime's job
     • Provide functional determinism
     • No in-place writes: fewer dependencies; memory footprint
     [Figure: task graphs for the Task, Data and Pipeline panels; device labels GPU and FPGA]

  10. Outline
      1) OpenStream
         • Overview & polyhedral subset
         • Computing dependencies and schedules
      2) Stream bounding
         • Basic strategy & limitations
         • Usage guidelines

  16. OpenStream: a short overview
      Data-flow extension to OpenMP:
      • Tasks: units of work spawned as concurrent coroutines, created dynamically at runtime
      • Streams: unbounded channels for communication between tasks
      Tasks access stream elements through windows:

          stream s;
          task p1 { write three times to s; }
          task p2 { write two times to s; }
          task r  { peek three times from s; }
          task c  { read five times from s; }

      [Figure: stream s with markers W_s and R_s; windows of p1, p2, r and c on the stream; dependency graph over p1, p2, r, c]
      Task dependencies: determined by the control program and the accesses on stream s, through overlapping windows
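
The window mechanism in the four-task example can be sketched in a few lines of Python. This is only an illustrative model, not OpenStream's runtime: the names `windows` and `dependencies` are hypothetical, and it assumes that a peek accesses elements without advancing the read position while a read does advance it.

```python
# Hypothetical model: each task claims a contiguous window of stream indices.
def windows(accesses):
    """accesses: list of (task, kind, burst) in control-program order.

    kind is 'write', 'read' or 'peek'.
    Returns {task: (kind, first_index, last_index)}.
    """
    write_pos = 0  # next stream index to be written
    read_pos = 0   # next stream index to be read
    result = {}
    for task, kind, burst in accesses:
        if kind == "write":
            result[task] = (kind, write_pos, write_pos + burst - 1)
            write_pos += burst
        else:
            result[task] = (kind, read_pos, read_pos + burst - 1)
            if kind == "read":  # assumption: a peek does not advance the read position
                read_pos += burst
    return result


def dependencies(wins):
    """A reader depends on every writer whose window overlaps its own."""
    deps = []
    for producer, (pkind, pfirst, plast) in wins.items():
        if pkind != "write":
            continue
        for consumer, (ckind, cfirst, clast) in wins.items():
            if ckind != "write" and pfirst <= clast and cfirst <= plast:
                deps.append((producer, consumer))
    return deps


program = [("p1", "write", 3), ("p2", "write", 2),
           ("r", "peek", 3), ("c", "read", 5)]
wins = windows(program)
deps = dependencies(wins)
print(wins)
print(deps)
```

Under these assumptions, `r` overlaps only the window of `p1`, while `c` overlaps the windows of both `p1` and `p2`.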

  19. Polyhedral OpenStream: computing dependencies

          stream s;
          parameter N;
          for (i = 0; i < N; ++i)
            task tw { write two times to s; }    // W_s(t_w, i) = 2i, window: [2i, 2i + 1]
          for (j = 0; j < N/2; ++j)
            task tc { read four times from s; }  // R_s(t_c, j) = 4j, window: [4j, 4j + 3]

      Polyhedral control program:
      • No nested task creation
      • Affine control statements
      Can statically count W_s and R_s and obtain access windows:
      • Ehrhart polynomials
      • Brion generating functions
      Compute dependencies by intersecting windows:
      2i ≤ 4j + 3 ∧ 4j ≤ 2i + 1, which over the integers simplifies to 2j ≤ i ≤ 2j + 1
      [Figure: dependency graph with t_{w,0} and t_{w,1} feeding t_{c,0}]
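
As a sanity check on the window intersection, the following Python sketch enumerates the overlapping (writer, reader) pairs for a small N and compares them with the closed form 2j ≤ i ≤ 2j + 1. The helper names are hypothetical, and the brute-force enumeration is a stand-in for the symbolic counting the slide refers to, not a replacement for it.

```python
def write_window(i):
    """Window of writer t_w,i: two elements starting at W_s = 2i."""
    return (2 * i, 2 * i + 1)


def read_window(j):
    """Window of reader t_c,j: four elements starting at R_s = 4j."""
    return (4 * j, 4 * j + 3)


def overlaps(a, b):
    """Closed integer intervals a and b share at least one index."""
    return a[0] <= b[1] and b[0] <= a[1]


def dependencies(n):
    """All (i, j) such that writer i feeds reader j, by brute force."""
    return [(i, j)
            for i in range(n)
            for j in range(n // 2)
            if overlaps(write_window(i), read_window(j))]


N = 8
deps = dependencies(N)
# Closed form from the slide: reader j depends on writers 2j and 2j + 1.
closed_form = [(i, j) for j in range(N // 2) for i in (2 * j, 2 * j + 1)]
print(sorted(deps) == sorted(closed_form))
```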

  22. Polyhedral OpenStream: scheduling
      Dependencies: polynomial (in)equalities $p_i(x)$, semi-algebraic sets:
      $$S = \{\, x \in \mathbb{R}^d \mid p_1(x) \ge 0,\ p_2(x) \ge 0,\ \dots,\ p_n(x) \ge 0 \,\}$$
      A polynomial $P(x)$ is strictly positive in $S$ iff:
      $$P(x) = \sum_{k \in \mathbb{N}^n} \lambda_k \, p_1^{k_1}(x) \, p_2^{k_2}(x) \cdots p_n^{k_n}(x), \qquad \lambda_k \ge 0, \quad \sum \lambda_k > 0$$
      Cannot possibly exhaust all $k$ in finite time:
      • Semi-decidable (undecidable) problem
      • In practice, ∼ conservative 'Farkas lemma'
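
A small worked instance may help make the certificate concrete. It reuses the two-loop example, where reader $t_{c,j}$ depends on writer $t_{w,i}$ whenever $2j \le i \le 2j + 1$. The schedules $\theta_w$ and $\theta_c$ below are assumptions chosen for illustration and do not come from the slides; $p_1, p_2$ are the dependence constraints and the $\lambda$ values are the non-negative multipliers.

```latex
\begin{align*}
S &= \{\, (i,j) \in \mathbb{R}^2 \mid p_1 = i - 2j \ge 0,\ p_2 = 2j + 1 - i \ge 0 \,\} \\
\theta_w(i) &= i, \qquad \theta_c(j) = 2j + 2 \qquad \text{(assumed schedules)} \\
P(i,j) &= \theta_c(j) - \theta_w(i) = 2j + 2 - i \\
       &= \underbrace{1}_{\lambda_{(0,0)}} \cdot 1
        + \underbrace{0}_{\lambda_{(1,0)}} \cdot p_1
        + \underbrace{1}_{\lambda_{(0,1)}} \cdot p_2
\end{align*}
```

All multipliers are non-negative and their sum is positive, so $P \ge 1 > 0$ on $S$: under these assumed schedules every reader runs strictly after the writers it depends on. Only exponent vectors with $k_1 + k_2 \le 1$ are needed here, which is the affine case covered by Farkas' lemma.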

  24. Stream bounding: back-pressure WaRs (write-after-read dependencies)

          stream s;
          parameter N;
          for (i = 0; i < N; ++i)
            task tw { write two times to s; }
          for (j = 0; j < N/2; ++j)
            task tc { read four times from s; }

      Stream bound: 4 elements
      [Figure: stream s bounded to 4 elements, with the read windows of t_{c,0} and t_{c,1} and the writes of t_{w,0} to t_{w,3}; dependency graph over t_{w,0}, t_{w,1}, t_{w,2}, t_{w,3}, …, t_{c,0}, t_{c,1}]
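
The back-pressure dependencies induced by a bound of 4 elements can be sketched as follows. The sketch assumes the bounded stream behaves like a circular buffer, so that stream element k reuses the slot of element k - 4 and a writer must wait for every reader of the elements it overwrites. That buffer model and the function names are assumptions for illustration, not a description of the OpenStream runtime.

```python
BOUND = 4  # stream bound from the slide, in elements


def write_window(i):
    """Window of writer t_w,i: [2i, 2i + 1]."""
    return (2 * i, 2 * i + 1)


def read_window(j):
    """Window of reader t_c,j: [4j, 4j + 3]."""
    return (4 * j, 4 * j + 3)


def war_dependencies(n, bound=BOUND):
    """Return (j, i) pairs: writer t_w,i must wait for reader t_c,j."""
    deps = []
    for i in range(n):
        w_first, w_last = write_window(i)
        # Elements whose slots are about to be overwritten (assumed circular reuse).
        old_first, old_last = w_first - bound, w_last - bound
        if old_last < 0:
            continue  # these slots have never been used before
        for j in range(n // 2):
            r_first, r_last = read_window(j)
            if old_first <= r_last and r_first <= old_last:
                deps.append((j, i))
    return deps


deps = war_dependencies(8)
print(deps)
```

In this model the first two writers are unconstrained, and from then on each reader t_c,j releases the two writers t_w,2j+2 and t_w,2j+3.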
