Refactoring Conventional Task Schedulers to Exploit Asymmetric ARM big.LITTLE Architectures in Dense Linear Algebra

Sandra Catalán, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí, Luis Costero, Francisco D. Igual, Katzalin Olcoz


  1. Refactoring Conventional Task Schedulers to Exploit Asymmetric ARM big.LITTLE Architectures in Dense Linear Algebra. Sandra Catalán, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí, Luis Costero, Francisco D. Igual, Katzalin Olcoz

  2. https://www.youtube.com/watch?v=

  3. Task parallelism

  4. Contribution: asymmetry-aware DLA library + asymmetry-oblivious scheduler

  5. Contribution: asymmetry-aware DLA library (data parallelism) + asymmetry-oblivious scheduler (task parallelism)

  6. Contribution: virtual cores; asymmetry-aware DLA library (data parallelism) + asymmetry-oblivious scheduler (task parallelism)

  7. Software execution models for ARM big.LITTLE

  8. Target architecture

  9. Execution models: CPU migration (CPUM), cluster switching mode, global task scheduling (GTS)

  10. Parallel execution of DLA operations on multi-threaded architectures

  11. $A = U^T U$

  12. Runtime task scheduling of DLA operations ● Task scheduling for the Cholesky factorization
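The task decomposition that a runtime schedules for the Cholesky factorization can be sketched as follows. This is a minimal illustration, not the code from the presentation: it assumes an `nt` x `nt` tiled upper-triangular factorization $A = U^T U$ built from the usual POTRF, TRSM, SYRK and GEMM kernels, and the function name `cholesky_task_graph` and the tuple-based task labels are hypothetical.

```python
def cholesky_task_graph(nt):
    """Tasks and dependencies of a tiled A = U^T U factorization (nt x nt tiles).

    Returns a dict mapping each task to the set of tasks it must wait for.
    Only read-after-write and write-after-write dependencies are tracked,
    which is enough here because no tile is overwritten after it is read
    in its final form.
    """
    last_writer = {}  # tile (i, j) -> task that last wrote it
    deps = {}         # task -> set of predecessor tasks

    def add(task, reads, writes):
        touched = reads + [writes]
        deps[task] = {last_writer[t] for t in touched if t in last_writer}
        last_writer[writes] = task

    for k in range(nt):
        # Factorize the diagonal tile: A_kk = U_kk^T U_kk
        add(("POTRF", k), [], (k, k))
        for j in range(k + 1, nt):
            # Triangular solve for the tiles to the right of the diagonal
            add(("TRSM", k, j), [(k, k)], (k, j))
        for i in range(k + 1, nt):
            for j in range(i, nt):
                # Update the trailing submatrix
                if i == j:
                    add(("SYRK", k, i, j), [(k, i)], (i, j))
                else:
                    add(("GEMM", k, i, j), [(k, i), (k, j)], (i, j))
    return deps
```

For `nt = 3` this yields ten tasks. A task becomes ready once every task in its dependency set has finished, which is the information a task scheduler needs to run independent tasks concurrently.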

  13. Runtime task scheduling of DLA operations ● Task scheduling in heterogeneous architectures – The runtime distinguishes between CPU and GPU targets: OmpSs, StarPU, MAGMA, libflame – Tasks assigned depending on target properties and specific techniques are applied

  14. Runtime task scheduling of DLA operations ● Task scheduling in asymmetric architectures – Asymmetry-conscious runtime: Botlev-OmpSs – Criticality-aware Task Scheduler policy – Each task is mapped to a single core
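The idea of a criticality-aware policy can be sketched as below. This is a simplified, static illustration and not the Botlev-OmpSs implementation: it assumes that criticality is estimated from the bottom level of each task (the length of the longest path from the task to the end of the graph), that every task has unit cost, and that tasks on the critical path go to big cores while the rest go to LITTLE cores. All function names and the example graph are hypothetical.

```python
from functools import lru_cache


def bottom_levels(succ):
    """Bottom level of each task: longest path (in tasks) to a sink."""
    @lru_cache(maxsize=None)
    def level(task):
        return 1 + max((level(s) for s in succ[task]), default=0)
    return {task: level(task) for task in succ}


def critical_path(succ):
    """Follow the successor with the largest bottom level from the top task."""
    levels = bottom_levels(succ)
    task = max(levels, key=levels.get)
    path = [task]
    while succ[task]:
        task = max(succ[task], key=levels.get)
        path.append(task)
    return path


def map_tasks(succ):
    """Assign critical tasks to a big core and the others to a LITTLE core."""
    critical = set(critical_path(succ))
    return {t: ("big" if t in critical else "LITTLE") for t in succ}


# Example: successors in a 3 x 3 tiled Cholesky task graph (illustrative names).
succ = {
    "POTRF0": ["TRSM01", "TRSM02"],
    "TRSM01": ["SYRK11", "GEMM12"],
    "TRSM02": ["GEMM12", "SYRK22"],
    "SYRK11": ["POTRF1"],
    "GEMM12": ["TRSM12"],
    "SYRK22": ["SYRK22b"],
    "POTRF1": ["TRSM12"],
    "TRSM12": ["SYRK22b"],
    "SYRK22b": ["POTRF2"],
    "POTRF2": [],
}
mapping = map_tasks(succ)
```

In this example the chain through the diagonal tiles is mapped to big cores, while `TRSM02`, `GEMM12` and `SYRK22` are mapped to LITTLE cores. Each task still runs on exactly one core, which is the property the slide highlights.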

  15. Data parallel libraries of BLAS3 kernels ● Multi-threaded implementation of the BLAS-3

  16. Data parallel libraries of BLAS3 kernels ● Data-parallel libraries for asymmetric architectures: – Global Task Scheduling – Dynamic workload distribution between the clusters – Static workload distribution in a cluster – Specific loop strides for each type of core
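The combination of a dynamic distribution between clusters and a static distribution inside each cluster can be sketched as follows. This is a deterministic simulation written for illustration, not the library's code: the strides (152 and 32 iterations), the relative speeds (3:1) and both function names are assumptions chosen only to make the example concrete.

```python
def dynamic_between_clusters(n, strides, speeds):
    """Simulate clusters grabbing chunks of a loop of n iterations.

    Whichever cluster becomes idle first takes the next chunk, using its
    own stride, so a faster cluster ends up with more iterations.
    """
    clock = {c: 0.0 for c in strides}  # simulated time at which each cluster is idle
    chunks = []
    start = 0
    while start < n:
        cluster = min(clock, key=lambda c: (clock[c], c))
        end = min(start + strides[cluster], n)
        chunks.append((cluster, start, end))
        clock[cluster] += (end - start) / speeds[cluster]
        start = end
    return chunks


def static_within_cluster(start, end, cores):
    """Split one chunk as evenly as possible among the cores of a cluster."""
    base, extra = divmod(end - start, cores)
    ranges = []
    lo = start
    for c in range(cores):
        hi = lo + base + (1 if c < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


# Hypothetical parameters: larger stride and higher speed for the big cluster.
chunks = dynamic_between_clusters(
    1000,
    strides={"big": 152, "little": 32},
    speeds={"big": 3.0, "little": 1.0},
)
per_core = static_within_cluster(0, 10, cores=4)
```

The per-cluster stride reflects the last bullet above: each core type works on chunk sizes suited to it, while the dynamic step balances the load between clusters without knowing their relative speed in advance.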

  17. Retargeting existing task schedulers to asymmetric architectures

  18. Evaluation of conventional runtimes on AMPs

  19. Combining conventional runtimes with asymmetric libraries ● GTS model (inspired by CPUM) – Virtual cores, each composed of one Cortex-A15 and one Cortex-A7 – Both cores are active simultaneously ● Parallelism: – Task-level: symmetric runtime – Data-level: asymmetric library
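The virtual-core idea can be sketched as below. This is an illustration under stated assumptions rather than the authors' implementation: the core IDs, the 75 % share given to the big core, and the names `VirtualCore`, `build_virtual_cores` and `split_task` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualCore:
    """One scheduling unit seen by the symmetric runtime: a big + LITTLE pair."""
    big: int     # ID of the Cortex-A15 core (assumed numbering)
    little: int  # ID of the Cortex-A7 core (assumed numbering)


def build_virtual_cores(big_ids, little_ids):
    """Pair each big core with one LITTLE core."""
    if len(big_ids) != len(little_ids):
        raise ValueError("sketch assumes the same number of big and LITTLE cores")
    return [VirtualCore(b, l) for b, l in zip(big_ids, little_ids)]


def split_task(n_iters, big_share):
    """Split the iterations of one task between the two cores of a virtual core.

    Returns the (start, end) ranges for the big and the LITTLE core.
    """
    big_iters = round(n_iters * big_share)
    return (0, big_iters), (big_iters, n_iters)


# Hypothetical layout: cores 4-7 are big, cores 0-3 are LITTLE.
vcores = build_virtual_cores([4, 5, 6, 7], [0, 1, 2, 3])
big_range, little_range = split_task(100, big_share=0.75)
```

The runtime only sees `len(vcores)` identical workers, so it can remain asymmetry-oblivious. The asymmetry is handled inside each task, where the library divides the work between the two physical cores of the pair.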

  20. Combining conventional runtimes with asymmetric libraries ● Comparison with other approaches: ✔ Any conventional task scheduler will work transparently with no special modifications ✔ Any improvement in the runtime will impact the performance on an AMP ✔ Any improvement in the asymmetry-aware library will impact the performance on an AMP ✗ Requires a tuned asymmetry-aware DLA library

  21. Experimental results

  22. Performance evaluation of the asymmetric BLIS

  23. Performance evaluation of the asymmetric BLIS

  24. Integration of the asymmetric BLIS in a conventional task scheduler

  25. Performance comparison versus asymmetry-aware task scheduler

  26. Conclusions

  27. In this work... ● Task-parallelism + Data-parallelism on AMPs ● Reuse of existing task schedulers. ● Competitive with asymmetry-aware schedulers

  28. Thank you
