Dynamic Load Balancing in Charm++
Abhinav S Bhatele
Parallel Programming Lab, UIUC
Outline
• Dynamic Load Balancing framework in Charm++
• Measurement Based Load Balancing
• Examples:
  – Hybrid Load Balancers
  – Topology-aware Load Balancers
• User Control and Flexibility
• Future Work
Dynamic Load-Balancing
• Task of load balancing (LB)
  – Given a collection of migratable objects and a set of processors
  – Find a mapping of objects to processors
    • Almost the same amount of computation on each processor
  – Additional constraints
    • Keep communication between processors to a minimum
    • Take the topology of the machine into consideration
• Dynamic mapping of chares to processors
  – Load on processors keeps changing during the actual execution
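
To make the mapping task concrete, here is a minimal sketch of one common heuristic: place the heaviest objects first, each on the currently least-loaded processor. It is a generic illustration, not the code of any Charm++ strategy; the names `greedy_map` and `processor_loads` and the sample loads are hypothetical.

```python
import heapq

def greedy_map(object_loads, num_procs):
    """Map objects to processors, heaviest object first, always onto the
    processor with the smallest load so far. Illustrative only."""
    heap = [(0.0, p) for p in range(num_procs)]  # (current load, processor id)
    heapq.heapify(heap)
    mapping = {}
    for obj, load in sorted(object_loads.items(), key=lambda kv: -kv[1]):
        proc_load, proc = heapq.heappop(heap)    # least-loaded processor
        mapping[obj] = proc
        heapq.heappush(heap, (proc_load + load, proc))
    return mapping

def processor_loads(object_loads, mapping, num_procs):
    """Total computation assigned to each processor under a mapping."""
    loads = [0.0] * num_procs
    for obj, proc in mapping.items():
        loads[proc] += object_loads[obj]
    return loads

# Hypothetical per-object computation times (seconds).
sample = {"a": 4.0, "b": 3.0, "c": 2.0, "d": 1.0}
mapping = greedy_map(sample, 2)
print(processor_loads(sample, mapping, 2))
```

This sketch balances computation only; the additional constraints on the slide (communication, topology) are not modeled here.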
Load-Balancing Approaches
• A rich set of strategies in Charm++
• Two main ideas
  – No correlation between successive iterations
    • Fully dynamic
    • Seed load balancers
  – Load varies slightly over iterations
    • CSE, Molecular Dynamics simulations
    • Measurement-based load balancers
Principle of Persistence
• Object communication patterns and computational loads tend to persist over time
  – In spite of dynamic behavior
    • Abrupt and large, but infrequent changes (e.g. AMR)
    • Slow and small changes (e.g. particle migration)
• Parallel analog of the principle of locality
  – A heuristic that holds for most CSE applications
Measurement Based Load Balancing
• Based on principle of persistence
• Runtime instrumentation (LB Database)
  – communication volume and computation time
• Measurement based load balancers
  – Use the database periodically to make new decisions
  – Many alternative strategies can use the database
    • Centralized vs. distributed
    • Greedy improvements vs. complete reassignment
    • Topology-aware
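
As a rough illustration of the instrumentation idea, the sketch below accumulates per-object computation time and per-pair communication volume, then hands a snapshot to whatever strategy runs at the next load balancing step. `ToyLBDatabase` and its methods are hypothetical names for this example and do not reflect the actual Charm++ LB database interface.

```python
from collections import defaultdict

class ToyLBDatabase:
    """Hypothetical stand-in for a measurement database: it records what
    happened recently, which (by the principle of persistence) is used as
    the prediction for the near future."""

    def __init__(self):
        self.compute_time = defaultdict(float)  # object id -> seconds
        self.comm_bytes = defaultdict(int)      # (obj, obj) -> bytes

    def record_compute(self, obj, seconds):
        self.compute_time[obj] += seconds

    def record_message(self, src, dst, nbytes):
        # Store undirected edges: order the pair so (a, b) and (b, a) merge.
        key = (src, dst) if src <= dst else (dst, src)
        self.comm_bytes[key] += nbytes

    def snapshot_and_reset(self):
        """Return the measurements gathered since the last call and start
        a fresh measurement window."""
        snap = (dict(self.compute_time), dict(self.comm_bytes))
        self.compute_time.clear()
        self.comm_bytes.clear()
        return snap

db = ToyLBDatabase()
db.record_compute(0, 0.25)
db.record_compute(0, 0.25)
db.record_message(1, 0, 100)
db.record_message(0, 1, 50)
loads, comm = db.snapshot_and_reset()
print(loads, comm)
```

Any strategy (centralized, distributed, topology-aware) could consume the same snapshot, which is the point of separating measurement from decision making.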
Load Balancer Strategies
• Centralized
  – Object load data are sent to processor 0
  – Integrate to a complete object graph
  – Migration decision is broadcast from processor 0
  – Global barrier
• Distributed
  – Load balancing among neighboring processors
  – Build partial object graph
  – Migration decision is sent to its neighbors
  – No global barrier
Load Balancing on Large Machines
• Existing load balancing strategies don’t scale on extremely large machines
• Limitations of centralized strategies:
  – Central node: memory/communication bottleneck
  – Decision-making algorithms tend to be very slow
• Limitations of distributed strategies:
  – Difficult to achieve well-informed load balancing decisions
Simulation Study - Memory Overhead
Simulation performed with the performance simulator BigSim
[Figure: memory usage (MB, axis 0 to 500) versus number of objects (128K, 256K, 512K, 1M), with one series for 32K processors and one for 64K processors]
lb_test benchmark is a parameterized program that creates a specified number of communicating objects in a 2D mesh.
Load Balancing Execution Time
[Figure: execution time (in seconds, axis 0 to 400) versus number of objects (128K, 256K, 512K, 1M) for GreedyLB, GreedyCommLB, and RefineLB]
Execution time of load balancing algorithms on a 64K processor simulation
Hierarchical Load Balancers
• Hierarchical distributed load balancers
  – Divide into processor groups
  – Apply different strategies at each level
  – Scalable to a large number of processors
Hierarchical Tree (an example)
64K processor hierarchical tree
• Level 2: 1 node
• Level 1: 64 nodes (0, 1024, ..., 63488, 64512)
• Level 0: groups of 1024 processors (0 to 1023, 1024 to 2047, ..., 63488 to 64511, 64512 to 65535)
Apply different strategies at each level
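
The arithmetic behind this tree can be checked with a short sketch. It assumes, as the node labels in the figure suggest, that each group of 1024 consecutive processors is represented at level 1 by its lowest-numbered member; the function names are hypothetical.

```python
def group_leader(proc, group_size=1024):
    """Level-1 node for a processor, assuming the leader is the
    lowest-numbered processor of its group (an assumption read off
    the figure labels, not a documented rule)."""
    return (proc // group_size) * group_size

def level1_leaders(num_procs=65536, group_size=1024):
    """All level-1 nodes for a machine split into equal groups."""
    return list(range(0, num_procs, group_size))

leaders = level1_leaders()
print(len(leaders), leaders[:2], leaders[-2:])
print(group_leader(2047), group_leader(65535))
```

With 65536 processors and groups of 1024, this gives 64 level-1 nodes, matching the counts shown for each level.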
An Example: Hybrid LB
• Processors are divided into independent groups, and the groups are organized in a hierarchy (decentralized)
• Each group has a leader (the central node) which performs centralized load balancing
• A particular hybrid strategy that works well
Gengbin Zheng, PhD Thesis, 2005
Our HybridLB Scheme
[Figure: the 64K processor tree, with one top node, level-1 nodes 0, 1024, ..., 63488, 64512, and processor groups 0 to 1023, 1024 to 2047, ..., 63488 to 64511, 64512 to 65535. Diagram labels: Refinement-based load balancing; Load Data; Load Data (OCG); token; object; Greedy-based load balancing]
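
The figure names refinement-based and greedy-based balancing as the two ingredients of the scheme. Unlike a complete reassignment, a refinement pass starts from the current placement and moves only as many objects as needed. The sketch below is a generic illustration of that idea under assumed names (`refine`, `tolerance`); it is not the RefineLB or HybridLB implementation.

```python
def refine(assignment, loads, num_procs, tolerance=1.05):
    """Move objects off overloaded processors until each is at most
    tolerance * average load, where a suitable move exists.
    Returns the new assignment and the list of migrated objects."""
    proc_load = [0.0] * num_procs
    for obj, p in assignment.items():
        proc_load[p] += loads[obj]
    limit = tolerance * sum(proc_load) / num_procs

    new_assignment = dict(assignment)
    moved = []
    # Visit processors from most to least loaded (by initial load).
    for p in sorted(range(num_procs), key=lambda q: -proc_load[q]):
        objs = sorted((o for o, q in new_assignment.items() if q == p),
                      key=lambda o: -loads[o])
        for o in objs:
            if proc_load[p] <= limit:
                break  # processor p is no longer overloaded
            target = min(range(num_procs), key=lambda q: proc_load[q])
            # Migrate only if the receiver stays within the limit.
            if proc_load[target] + loads[o] <= limit:
                new_assignment[o] = target
                proc_load[p] -= loads[o]
                proc_load[target] += loads[o]
                moved.append(o)
    return new_assignment, moved

# Hypothetical example: processor 0 holds 6 s of work, processor 1 holds 2 s.
current = {"a": 0, "b": 0, "c": 0, "d": 1}
loads = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 2.0}
new_assignment, moved = refine(current, loads, 2)
print(moved, new_assignment)
```

In this example a single migration is enough to even out the two processors, which is why refinement tends to cause far less object movement than rebuilding the mapping from scratch.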
Memory Overhead
[Figure: memory usage (MB, axis 0 to 500) versus number of objects (256K, 512K, 1M) for CentralLB and HybridLB]
Simulation of lb_test (for 64K processors)
Total Load Balancing Time
Simulation of lb_test for 64K processors
[Figure: time (s, axis 0 to 450) versus number of objects (256K, 512K, 1M) for GreedyCommLB and HybridLB(GreedyCommLB)]
N procs   4096     8192      16384
Memory    6.8MB    22.57MB   22.63MB
lb_test benchmark’s actual run on BG/L at IBM (512K objects)
Load Balancing Quality
Simulation of lb_test for 64K processors
[Figure: maximum predicted load (seconds, axis 0 to 0.12) versus number of objects (256K, 512K, 1M) for GreedyCommLB and HybridLB]
Topology-aware mapping of tasks
• Problem
  – Map tasks to processors connected in a topology, such that:
    • Compute load on processors is balanced
    • Communicating chares (objects) are placed on nearby processors.
Mapping Model
• Task graph:
  – G_t = (V_t, E_t)
  – Weighted graph, undirected edges
  – Nodes: chares; w(v_a): computation
  – Edges: communication; c_ab: bytes between v_a and v_b
• Topology graph:
  – G_p = (V_p, E_p)
  – Nodes: processors
  – Edges: direct network links
  – Ex: 3D-Torus, 2D-Mesh, Hypercube
Model (Contd.)
• Task Mapping
  – Assigns tasks to processors
  – P : V_t → V_p
• Hop-Bytes
  – Hop-bytes: communication cost
  – The cost imposed on the network is higher if more links are used
  – Weigh inter-processor communication by distance on the network
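
The hop-bytes metric can be written down directly from this model: for every task-graph edge, multiply the bytes c_ab by the network distance between the processors that host v_a and v_b, and sum. The sketch below does this for a torus, using shortest wraparound distance per dimension; the function names, the 4 x 4 x 4 torus, and the placement are assumptions chosen for illustration.

```python
def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus,
    taking the wraparound link into account in each dimension."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def hop_bytes(edges, placement, dims):
    """Sum over task-graph edges of bytes * hops between the processors
    that host the two endpoints."""
    return sum(nbytes * torus_hops(placement[u], placement[v], dims)
               for (u, v), nbytes in edges.items())

dims = (4, 4, 4)                       # hypothetical 3D torus
placement = {"a": (0, 0, 0),           # the mapping P: task -> processor
             "b": (3, 0, 0),
             "c": (0, 2, 0)}
edges = {("a", "b"): 100,              # c_ab in bytes
         ("a", "c"): 10}
print(hop_bytes(edges, placement, dims))
```

Here tasks a and b are one hop apart thanks to the wraparound link, while a and c are two hops apart, so the total is 100 * 1 + 10 * 2 = 120 hop-bytes. A topology-aware strategy would prefer placements that lower this number without unbalancing computation.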
Load Balancing Framework in Charm++
• Issues of mapping and decomposition are separated
• User has full control over mapping
• Many choices
  – Initial static mapping
  – Mapping at run-time as new objects are created
  – Write a new load balancing strategy: inherit from BaseLB
Future Work
• Hybrid Model-based Load Balancers
  – User gives a model to the LB
  – Combine it with measurement based load balancer
• Multicast-aware Load Balancers
  – Try to place targets of a multicast on the same processor
Conclusions
• Measurement based LBs are good for most cases
• Need scalable LBs in the future due to large machines like BG/L
  – Hybrid Load Balancers
  – Communication sensitive LBs
  – Topology aware LBs