Dynamic Load Balancing in Charm++

Abhinav S Bhatele
Parallel Programming Lab, UIUC

Outline
• Dynamic Load Balancing framework in Charm++
• Measurement Based Load Balancing
• Examples:
  – Hybrid Load Balancers
  – Topology-aware Load Balancers
• User Control and Flexibility
• Future Work

Dynamic Load Balancing
• Task of load balancing (LB)
  – Given a collection of migratable objects and a set of processors
  – Find a mapping of objects to processors
    • Almost the same amount of computation on each processor
  – Additional constraints
    • Keep communication between processors to a minimum
    • Take the topology of the machine into consideration
• Dynamic mapping of chares to processors
  – Load on processors keeps changing during the actual execution
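The mapping problem stated above can be illustrated with a minimal sketch of a greedy assignment: objects are taken heaviest first and each is placed on the processor that currently has the least load. This is a generic heuristic shown under stated assumptions, not the Charm++ implementation; the names `greedy_map` and `processor_loads` and the dictionary-based inputs are hypothetical, and communication and topology are ignored.

```python
import heapq


def greedy_map(object_loads, num_procs):
    """Map objects to processors, heaviest object first, least-loaded processor first.

    object_loads: dict of object id -> computation time (hypothetical input format).
    Returns a dict of object id -> processor index.
    """
    # Min-heap of (current load, processor index).
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    mapping = {}
    for obj, load in sorted(object_loads.items(), key=lambda kv: kv[1], reverse=True):
        proc_load, proc = heapq.heappop(heap)
        mapping[obj] = proc
        heapq.heappush(heap, (proc_load + load, proc))
    return mapping


def processor_loads(object_loads, mapping, num_procs):
    """Sum the load of the objects placed on each processor."""
    loads = [0.0] * num_procs
    for obj, proc in mapping.items():
        loads[proc] += object_loads[obj]
    return loads
```

For example, five objects with loads 5, 4, 3, 2 and 2 on two processors end up with per-processor loads of 9 and 7, against an ideal of 8 each.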

Load Balancing Approaches
• A rich set of strategies in Charm++
• Two main ideas
  – No correlation between successive iterations
    • Fully dynamic
    • Seed load balancers
  – Load varies slightly over iterations
    • CSE, Molecular Dynamics simulations
    • Measurement-based load balancers

Principle of Persistence
• Object communication patterns and computational loads tend to persist over time
  – In spite of dynamic behavior
    • Abrupt and large, but infrequent changes (e.g. AMR)
    • Slow and small changes (e.g. particle migration)
• Parallel analog of the principle of locality
  – A heuristic that holds for most CSE applications

Measurement Based Load Balancing
• Based on the principle of persistence
• Runtime instrumentation (LB Database)
  – Communication volume and computation time
• Measurement based load balancers
  – Use the database periodically to make new decisions
  – Many alternative strategies can use the database
    • Centralized vs. distributed
    • Greedy improvements vs. complete reassignment
    • Topology-aware
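To make the contrast between incremental improvement and complete reassignment concrete, here is a minimal sketch of a refinement step driven by measured loads. It keeps the existing mapping and migrates objects only away from overloaded processors. This is an illustration in the spirit of refinement-based strategies, not the Charm++ RefineLB code; the function name `refine`, the `tolerance` parameter and the input format are assumptions.

```python
def refine(object_loads, mapping, num_procs, tolerance=1.05):
    """Move objects off overloaded processors until no processor exceeds
    tolerance * average load, or until no further move is possible.

    object_loads: dict of object id -> measured computation time (assumed format).
    mapping: dict of object id -> current processor index.
    Returns (new mapping, list of (object, from_proc, to_proc) migrations).
    """
    mapping = dict(mapping)  # do not modify the caller's mapping
    loads = [0.0] * num_procs
    for obj, proc in mapping.items():
        loads[proc] += object_loads[obj]
    target = tolerance * sum(loads) / num_procs

    migrations = []
    while True:
        donor = max(range(num_procs), key=loads.__getitem__)
        if loads[donor] <= target:
            break  # every processor is within the tolerance
        receiver = min(range(num_procs), key=loads.__getitem__)
        # Objects on the donor that fit on the receiver without overloading it.
        candidates = [
            o for o, p in mapping.items()
            if p == donor
            and object_loads[o] > 0
            and loads[receiver] + object_loads[o] <= target
        ]
        if not candidates:
            break  # nothing can be moved without creating a new overload
        obj = max(candidates, key=object_loads.__getitem__)
        mapping[obj] = receiver
        loads[donor] -= object_loads[obj]
        loads[receiver] += object_loads[obj]
        migrations.append((obj, donor, receiver))
    return mapping, migrations
```

Because only the objects that cause the imbalance are moved, the number of migrations stays small when the load has drifted only slightly, which is the situation the principle of persistence describes.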

Load Balancer Strategies
• Centralized
  – Object load data are sent to processor 0
  – Integrate to a complete object graph
  – Migration decision is broadcast from processor 0
  – Global barrier
• Distributed
  – Load balancing among neighboring processors
  – Build partial object graph
  – Migration decision is sent to its neighbors
  – No global barrier
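The distributed style can be sketched with one round of neighbor diffusion, in which every processor looks only at its neighbors' loads and ships part of the difference to the less loaded ones. This is a generic diffusion scheme given as an assumption about what a neighbor-based strategy might look like, not a description of a specific Charm++ balancer; the name `diffusion_round`, the `alpha` factor and the treatment of load as a divisible quantity (rather than whole objects) are simplifications made for illustration.

```python
def diffusion_round(loads, neighbors, alpha):
    """One synchronous diffusion round using only neighbor information.

    loads: list of per-processor loads.
    neighbors: neighbors[p] is the list of processors adjacent to p.
    alpha: fraction of the load difference sent to each less-loaded neighbor;
           it should be small enough (e.g. 1 / (degree + 1)) to avoid overshoot.
    """
    new_loads = list(loads)
    for p, load in enumerate(loads):
        for q in neighbors[p]:
            if load > loads[q]:
                transfer = alpha * (load - loads[q])
                new_loads[p] -= transfer
                new_loads[q] += transfer
    return new_loads
```

On a ring of four processors with all load on processor 0, repeated rounds spread the load outward one hop at a time. No processor ever needs the complete object graph, which also shows why decisions made this way are less well informed than centralized ones.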

Load Balancing on Large Machines
• Existing load balancing strategies don’t scale on extremely large machines
• Limitations of centralized strategies:
  – Central node: memory/communication bottleneck
  – Decision-making algorithms tend to be very slow
• Limitations of distributed strategies:
  – Difficult to achieve well-informed load balancing decisions

Simulation Study - Memory Overhead
• Simulation performed with the performance simulator BigSim
• [Figure: memory usage (MB, axis 0 to 500) vs. number of objects (128K, 256K, 512K, 1M), for 32K and 64K processors]
• The lb_test benchmark is a parameterized program that creates a specified number of communicating objects in a 2D mesh

Load Balancing Execution Time
• [Figure: execution time (in seconds, axis 0 to 400) vs. number of objects (128K, 256K, 512K, 1M), for GreedyLB, GreedyCommLB and RefineLB]
• Execution time of load balancing algorithms on a 64K processor simulation

Hierarchical Load Balancers
• Hierarchical distributed load balancers
  – Divide into processor groups
  – Apply different strategies at each level
  – Scalable to a large number of processors

Hierarchical Tree (an example)
• 64K processor hierarchical tree
  – Level 2: 1 root
  – Level 1: 64 group leaders (0, 1024, ..., 63488, 64512)
  – Level 0: groups of 1024 processors (0-1023, 1024-2047, ..., 63488-64511, 64512-65535)
• Apply different strategies at each level
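The grouping in this example follows from simple arithmetic: 65536 processors split into groups of 1024 give 64 groups, and the leader of each group is its lowest-numbered processor. A minimal sketch, with the hypothetical helper names `group_leader` and `build_groups`:

```python
def group_leader(pe, group_size=1024):
    """Return the leader (lowest-numbered processor) of the group containing pe."""
    return (pe // group_size) * group_size


def build_groups(num_pes=65536, group_size=1024):
    """Return a dict of leader -> range of member processors for a two-level tree."""
    return {
        leader: range(leader, min(leader + group_size, num_pes))
        for leader in range(0, num_pes, group_size)
    }
```

With the defaults, `build_groups()` has 64 entries, the last leader is 64512, and `group_leader(2047)` is 1024, which matches the numbers on the slide.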

An Example: Hybrid LB
• Processors are divided into independent groups, and the groups are organized in hierarchies (decentralized)
• Each group has a leader (the central node) which performs centralized load balancing
• A particular hybrid strategy that works well

Gengbin Zheng, PhD Thesis, 2005

Our HybridLB Scheme
• [Figure: the 64K processor tree, with group leaders 0, 1024, ..., 63488, 64512 above processors 0-65535; labels: load data, load data (OCG), token, object]
• Refinement-based load balancing at the upper level
• Greedy-based load balancing at the lower level

Memory Overhead
• [Figure: memory usage (MB, axis 0 to 500) vs. number of objects (256K, 512K, 1M), for CentralLB and HybridLB]
• Simulation of lb_test (for 64K processors)

Total Load Balancing Time
• Simulation of lb_test for 64K processors
• [Figure: time (s, axis 0 to 450) vs. number of objects (256K, 512K, 1M), for GreedyCommLB and HybridLB(GreedyCommLB)]
• lb_test benchmark’s actual run on BG/L at IBM (512K objects):

  N procs   4096     8192      16384
  Memory    6.8MB    22.57MB   22.63MB

Load Balancing Quality
• Simulation of lb_test for 64K processors
• [Figure: maximum predicted load (seconds, axis 0 to 0.12) vs. number of objects (256K, 512K, 1M), for GreedyCommLB and HybridLB]

Topology-aware mapping of tasks
• Problem
  – Map tasks to processors connected in a topology, such that:
    • Compute load on processors is balanced
    • Communicating chares (objects) are placed on nearby processors

Mapping Model
• Task graph:
  – G_t = (V_t, E_t)
  – Weighted graph, undirected edges
  – Nodes: chares, w(v_a): computation
  – Edges: communication, c_ab: bytes between v_a and v_b
• Topology graph:
  – G_p = (V_p, E_p)
  – Nodes: processors
  – Edges: direct network links
  – Ex: 3D-Torus, 2D-Mesh, Hypercube

Model (Contd.)
• Task mapping
  – Assigns tasks to processors
  – P : V_t → V_p
• Hop-bytes
  – Hop-bytes: communication cost
  – The cost imposed on the network is higher if more links are used
  – Weigh inter-processor communication by distance on the network
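As an illustration of the hop-bytes metric, the sketch below weighs each task-graph edge's bytes by the number of network hops between the processors its two endpoints are mapped to, and sums over all edges. The torus distance function, the coordinate-tuple representation of processors and the names `torus_distance` and `hop_bytes` are assumptions made for this example; other topologies would need their own distance function.

```python
def torus_distance(a, b, dims):
    """Hop count between coordinates a and b on a torus with the given dimensions.

    Each dimension wraps around, so the distance along it is the shorter way round.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def hop_bytes(edges, placement, dims):
    """Total hop-bytes of a mapping.

    edges: iterable of (task_a, task_b, bytes) for the task graph.
    placement: dict of task -> processor coordinates (the mapping P).
    dims: torus dimensions, e.g. (4, 4, 4).
    """
    return sum(
        nbytes * torus_distance(placement[u], placement[v], dims)
        for u, v, nbytes in edges
    )
```

For instance, on a 4 x 4 x 4 torus, an edge carrying 100 bytes between processors (0, 0, 0) and (3, 0, 0) costs 100 hop-bytes because of the wraparound link, while 10 bytes between (0, 0, 0) and (1, 2, 0) cost 30. A topology-aware strategy tries to pick the placement that lowers this total while keeping compute load balanced.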

Load Balancing Framework in Charm++
• Issues of mapping and decomposition are separated
• User has full control over mapping
• Many choices
  – Initial static mapping
  – Mapping at runtime as newer objects are created
  – Write a new load balancing strategy: inherit from BaseLB

Future Work
• Hybrid model-based load balancers
  – User gives a model to the LB
  – Combine it with a measurement based load balancer
• Multicast-aware load balancers
  – Try to place targets of a multicast on the same processor

Conclusions
• Measurement based LBs are good for most cases
• Scalable LBs are needed in the future because of large machines like BG/L
  – Hybrid load balancers
  – Communication-sensitive LBs
  – Topology-aware LBs
