
Vector Load Balancing in Charm++ - Ronak Buch, Parallel Programming Laboratory


  1. Vector Load Balancing in Charm++
     Ronak Buch (rabuch2@illinois.edu)
     Parallel Programming Laboratory, University of Illinois at Urbana-Champaign
     October 21, 2020
     18th Annual Workshop on Charm++ and Its Applications

  2. Load Balancing
     • Load balancing is a hallmark of Charm++
     • Performance often limited by maximum load on a PE
     • RTS measures load and migrates objects in response
     • Dynamic, irregular applications have been able to achieve high performance and scalability because of it

  3. What is Load?
     • Load is really just a proxy value we use to reason about performance
       ◦ In truth, we want to minimize execution time
       ◦ Unbalanced, fast program > balanced, slow program
     • CPU time per object by itself is often a sufficient metric for this value
     • However, in the same way measuring cache misses or pipeline stalls improves upon merely profiling, sometimes more detail is helpful

  4. Vector Load Balancing
     • Rather than being a single value, load is now a vector of multiple values
       ◦ Store vector loads in LBDatabase
       ◦ Pass vector loads to strategies
       ◦ Use vector loads in strategies
     • Can be used generically: for various hardware measurements (CPU/GPU/network/memory), discrete parts of an iteration, application specific parameters, etc.

  5. Vector Strategies
     • Extra dimensionality makes vector load balancing computationally difficult
     • Objects can no longer be totally ordered
     • Want to minimize the maximum in each dimension
     • NP-complete problem, so only interested in approximations
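The loss of a total order can be made concrete with a small sketch. The names below (`Load`, `dominates`, `comparable`, `maxPerDimension`) are illustrative assumptions and are not part of Charm++; loads are assumed to be non-negative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: one entry per load dimension.
using Load = std::vector<double>;

// True if a is at least as large as b in every dimension.
bool dominates(const Load& a, const Load& b) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    if (a[i] < b[i]) return false;
  return true;
}

// Two loads can be ordered only if one dominates the other.
// With a single dimension this is always true; with more it often is not.
bool comparable(const Load& a, const Load& b) {
  return dominates(a, b) || dominates(b, a);
}

// Per-dimension maximum over all PEs, i.e. the vector whose every
// component a vector strategy would like to keep small.
// Assumes non-negative loads, so 0.0 is a safe starting value.
Load maxPerDimension(const std::vector<Load>& peLoads) {
  assert(!peLoads.empty());
  Load result(peLoads.front().size(), 0.0);
  for (const Load& pe : peLoads) {
    assert(pe.size() == result.size());
    for (std::size_t i = 0; i < pe.size(); ++i)
      result[i] = std::max(result[i], pe[i]);
  }
  return result;
}
```

For example, the loads (2, 0) and (0, 3) are not comparable under this definition, which is why a strategy cannot simply sort objects by load as a scalar greedy algorithm would.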

  6. Vector Strategies
     • A simple strategy finds object with global maximum load dimension and places it on PE with minimum load in that dimension
       ◦ Only works well when object has load in only one dimension
     • For more realistic cases, have to consider vector holistically
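A minimal sketch of the simple strategy, under stated assumptions: objects are processed in decreasing order of their largest component, ties between PEs go to the lowest index, and all PEs start empty. The function and type names are hypothetical and do not correspond to Charm++ code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

using Load = std::vector<double>;  // hypothetical: one entry per dimension

// Index of the largest component of v (first one wins on ties).
std::size_t argmaxDim(const Load& v) {
  assert(!v.empty());
  return static_cast<std::size_t>(std::max_element(v.begin(), v.end()) - v.begin());
}

// Returns assignment[i] = index of the PE chosen for object i.
std::vector<std::size_t> greedyMaxDimension(const std::vector<Load>& objs,
                                            std::size_t numPes, std::size_t dims) {
  assert(numPes > 0);
  std::vector<Load> peLoad(numPes, Load(dims, 0.0));

  // Visit objects from the largest single-dimension load to the smallest.
  std::vector<std::size_t> order(objs.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
    return objs[a][argmaxDim(objs[a])] > objs[b][argmaxDim(objs[b])];
  });

  std::vector<std::size_t> assignment(objs.size(), 0);
  for (std::size_t i : order) {
    assert(objs[i].size() == dims);
    const std::size_t d = argmaxDim(objs[i]);  // the object's dominant dimension
    std::size_t best = 0;                      // PE with the least load in dimension d
    for (std::size_t p = 1; p < numPes; ++p)
      if (peLoad[p][d] < peLoad[best][d]) best = p;
    for (std::size_t k = 0; k < dims; ++k) peLoad[best][k] += objs[i][k];
    assignment[i] = best;
  }
  return assignment;
}
```

Because the PE is chosen by looking at one dimension only, any load the object carries in other dimensions is added blindly, which is consistent with the caveat that this works well only when each object has load in a single dimension.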

  7. Vector Strategies
     • Find object with maximum $p$-norm and place on PE with minimum $p$-norm after placement
       ◦ Works well, but computationally expensive
       ◦ PE “weight” varies with object, i.e. $\|(2,0)\|_2 < \|(0,3)\|_2$, but when adding $(3,0)$, $\|(5,0)\|_2 > \|(3,3)\|_2$
     • Calculate average load vector in $d$-space and create a normal hyperplane, then repeatedly allow furthest PE below the hyperplane to choose an object
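The placement step of the norm-based strategy can be sketched as follows. This covers only the choice of PE for one object, not the full strategy, and the helper names (`pNorm`, `addLoads`, `bestPeFor`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Load = std::vector<double>;  // hypothetical: one entry per dimension

// p-norm of a load vector, for p >= 1.
double pNorm(const Load& v, double p) {
  assert(p >= 1.0);
  double sum = 0.0;
  for (double x : v) sum += std::pow(std::fabs(x), p);
  return std::pow(sum, 1.0 / p);
}

// Component-wise sum of two load vectors of equal length.
Load addLoads(const Load& a, const Load& b) {
  assert(a.size() == b.size());
  Load out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[i] + b[i];
  return out;
}

// Index of the PE whose load has the smallest p-norm AFTER receiving obj.
// Every candidate PE needs its norm recomputed for every object, which is
// where the extra cost relative to scalar greedy placement comes from.
std::size_t bestPeFor(const Load& obj, const std::vector<Load>& peLoads, double p) {
  assert(!peLoads.empty());
  std::size_t best = 0;
  double bestNorm = pNorm(addLoads(peLoads[0], obj), p);
  for (std::size_t i = 1; i < peLoads.size(); ++i) {
    const double n = pNorm(addLoads(peLoads[i], obj), p);
    if (n < bestNorm) {
      bestNorm = n;
      best = i;
    }
  }
  return best;
}
```

With PE loads (2, 0) and (0, 3) and an incoming object (3, 0), the first PE has the smaller norm before placement, but the second PE is chosen because $\|(3,3)\|_2 < \|(5,0)\|_2$. PEs therefore cannot be kept in a single priority queue keyed on their current load.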

  8. New Load Balancing APIs - Phase
     • Many applications have orthogonal phases within an iteration separated by barriers (or weaker synchronization)
     • New functions have been added to track phases for load balancing:
       ◦ void CkMigratable::CkLBSetPhase(int phase) - Until called again, all automatic LB measurements for calling chare attributed to specified phase
       ◦ int CkMigratable::CkLBGetPhase() - Returns current phase
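The semantics described for the phase calls can be modeled with a small stand-in class. This is not Charm++ code: `MockMigratable` and its members are hypothetical, and `recordMeasuredTime` stands in for the runtime's automatic timing. The sketch assumes each phase maps to one dimension of the load vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a migratable object; NOT CkMigratable.
class MockMigratable {
 public:
  explicit MockMigratable(std::size_t numPhases) : load_(numPhases, 0.0) {}

  // Models the described behavior of CkLBSetPhase: every measurement
  // recorded after this call is attributed to the given phase.
  void setPhase(int phase) {
    assert(phase >= 0 && static_cast<std::size_t>(phase) < load_.size());
    phase_ = phase;
  }

  // Models CkLBGetPhase: returns the phase currently in effect.
  int getPhase() const { return phase_; }

  // Stand-in for the runtime measuring one entry method execution.
  void recordMeasuredTime(double seconds) {
    load_[static_cast<std::size_t>(phase_)] += seconds;
  }

  // One accumulated load value per phase.
  const std::vector<double>& vectorLoad() const { return load_; }

 private:
  int phase_ = 0;
  std::vector<double> load_;
};
```

In this model an object that spends 1.5 s in phase 0 and then 0.75 s in phase 1 ends up with the load vector (1.5, 0.75) rather than the scalar 2.25, so a strategy can balance each phase separately.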

  9. New Load Balancing APIs - Manual
     • Added new API for recording vector load data
       ◦ void CkMigratable::CkLBSetObjTime(LBRealType load, int dimension) - Sets specified dimension of vector load for calling chare
       ◦ std::vector<LBRealType> CkMigratable::CkLBGetObjVectorLoad() - Returns current vector load for calling chare

  10. Using Vector Strategies
     • Currently only strategies built on top of TreeLB support vector load balancing
       ◦ TreeLB is new flexible, optimized replacement of CentralLB and HybridLB
       ◦ Eventually all non-distributed strategies should use TreeLB
     • If vector loads are detected in the LB database, a vector version of the chosen strategy is automatically used if available

  11. Writing Vector Strategies
     • Objects and PEs are templated on dimension, replicated in a static constexpr field for external access
     • A specific dimension of Object or PE load is accessible with LBRealType getLoad(int dimension)
     • Template specialization allows LB author to handle vector and non-vector cases

  12. Writing Vector Strategies
     template <typename O, typename P, typename S>
     class Example : public Strategy<O, P, S> {
     public:
       void solve(std::vector<O>& objs, std::vector<P>& procs,
                  S& solution, bool objsSorted) {
         // vector implementation
       }
     };

     template <typename P, typename S>
     class Example<Obj<1>, P, S> : public Strategy<Obj<1>, P, S> {
     public:
       void solve(std::vector<Obj<1>>& objs, std::vector<P>& procs,
                  S& solution, bool objsSorted) {
         // scalar implementation
       }
     };
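To see the specialization pattern compile and dispatch outside of Charm++, the sketch below supplies minimal stand-ins for the surrounding types. `Obj`, `Proc`, `Solution`, the `dimension` field name, and the pure virtual `Strategy::solve` are all assumptions made for illustration; the real TreeLB types carry more state and may differ in shape.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in: an object templated on its number of load dimensions.
template <int N>
struct Obj {
  static constexpr int dimension = N;  // dimension exposed for external access
  std::array<double, N> load{};
  double getLoad(int d) const {
    assert(d >= 0 && d < N);
    return load[static_cast<std::size_t>(d)];
  }
};

struct Proc {};                                     // hypothetical PE type
struct Solution { std::string usedImplementation; };  // records which path ran

// Hypothetical base class with the solve signature shown on the slide.
template <typename O, typename P, typename S>
class Strategy {
 public:
  virtual ~Strategy() = default;
  virtual void solve(std::vector<O>& objs, std::vector<P>& procs,
                     S& solution, bool objsSorted) = 0;
};

// Primary template: selected for any object type other than Obj<1>.
template <typename O, typename P, typename S>
class Example : public Strategy<O, P, S> {
 public:
  void solve(std::vector<O>&, std::vector<P>&, S& solution, bool) override {
    solution.usedImplementation = "vector";
  }
};

// Partial specialization: selected when the object type is exactly Obj<1>.
template <typename P, typename S>
class Example<Obj<1>, P, S> : public Strategy<Obj<1>, P, S> {
 public:
  void solve(std::vector<Obj<1>>&, std::vector<P>&, S& solution, bool) override {
    solution.usedImplementation = "scalar";
  }
};
```

Instantiating `Example<Obj<3>, Proc, Solution>` picks the primary template, while `Example<Obj<1>, Proc, Solution>` picks the specialization, so the choice between vector and scalar code is made at compile time with no runtime branch.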

  13. Vector LB Performance - AMPI
     [Figure: AMPI - No Load Balancing]

  14. Vector LB Performance - AMPI
     [Figure: AMPI - Regular Load Balancing]

  15. Vector LB Performance - AMPI
     [Figure: AMPI - Vector Load Balancing]

  16. Vector LB Performance - AMPI
     [Figure: LB Off; Phase Unaware (1.44x speedup); Phase Aware (1.67x speedup)]

  17. Vector LB Performance
     [Figure: Timeline of phase-based application]

  18. Vector LB Performance
     [Figure: No LB]

  19. Vector LB Performance
     [Figure: GreedyLB (non-vector)]

  20. Vector LB Performance
     [Figure: Vector Greedy]

  21. Applications
     • ChaNGa
       ◦ Working, but no performance results at scale yet
       ◦ Time spent in each rung of multi-stepping corresponds to dimension in vector
     • NAMD
       ◦ In process of making vector of CPU and GPU load
     • Please contact me if you think your application would benefit!

  22. Future Vector LB Work
     • Performance is still an issue, so optimizations needed
       ◦ Discretization, clustering, space-partitioning, etc. should go a long way
     • Exploit distribution of load per-dimension
     • Integrate HAPI into load measurement to automatically record accelerator load
     • Add support for constraint based objective functions for cache/memory balancing

  23. Conclusions
     • Applications often have scope for improved load balance
     • As programming techniques and hardware become more complex, this scope will likely increase
     • Providing more detailed load data via Vector LB has been shown to improve decision quality over traditional LB in testing
