Epidemic Algorithm for Load Balancing
Harshitha Menon, Laxmikant Kalé
15th April
Outline
1 Introduction
  - Motivation
  - Background
  - Load Balancing Strategies
2 Distributed Load Balancing
  - Information Propagation
  - Load Transfer
3 Evaluation
4 Conclusion
Motivation
- Load imbalance in parallel applications
  - Performance is limited by the most overloaded processor
  - Leads to a drop in system utilization
  - Hampers scalability of the application
- The chance that one processor is severely overloaded grows as the number of processors increases
- For some applications, the computation load varies over time
Dynamic Load Balancing Framework in Charm++
- Application is composed of a large number of migratable units
- Load balancing strategy is invoked periodically
- Based on the principle of persistence
- Instruments the application tasks at a fine-grained level
- When load balancing is invoked, the framework:
  - Gathers the statistics based on the strategy (centralized or hierarchical)
  - Executes the load balancing strategy
  - Migrates objects based on the new mapping
Load Balancing Strategies
- Centralized strategies
  - Have a global view of the system (good quality load balancing)
  - Clear bottleneck beyond a few thousand processors
- Distributed strategies
  - Processors make autonomous decisions based on a local view (neighborhood)
  - Scalable
  - Yield poor load balance due to limited information
- Hierarchical load balancer
  - Each subgroup of processors collects information at its root; higher levels receive aggregated information
  - Scalable and good quality
  - May suffer from excessive data collection at the lowest levels
Grapevine - Proposed Distributed Load Balancer
Key Features
- Fully distributed scheme
- Uses partial information about the global state of the system
- Probabilistic transfer of load
- Scalable and good quality
Grapevine - Proposed Distributed Load Balancer
Two Phases
- Information propagation
- Load transfer
Information Propagation
- Based on a gossip protocol
- Each underloaded processor starts the gossip
  - Randomly samples peers and sends them its load information
- On receiving load information, a processor
  - Combines the information with what it already knows
  - Forwards it to random peers
- No explicit synchronization
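The gossip steps above can be sketched as a small simulation. This is a minimal illustration, not the Charm++ implementation: the function name `gossip_rounds`, the `fanout` and `max_rounds` parameters, and the use of lock-step rounds are all assumptions made for the sketch (the scheme described on the slide has no explicit synchronization).

```python
import random

def gossip_rounds(n, fanout, underloaded, seed=0, max_rounds=1000):
    """Count push-gossip rounds until every processor knows all underloaded ones.

    Hypothetical helper: rounds are simulated in lock-step only to keep the
    sketch simple; the slide describes an unsynchronized protocol.
    """
    rng = random.Random(seed)
    # known[p] holds the ids of underloaded processors that p has heard about.
    known = [set() for _ in range(n)]
    for u in underloaded:
        known[u].add(u)  # each underloaded processor starts the gossip with its own info
    target = set(underloaded)
    rounds = 0
    while rounds < max_rounds and any(k != target for k in known):
        # Snapshot so that information received in this round is forwarded in the next.
        snapshot = [set(k) for k in known]
        for p in range(n):
            if not snapshot[p]:
                continue  # this processor has nothing to forward yet
            for _ in range(fanout):
                peer = rng.randrange(n)      # sample a peer uniformly at random
                known[peer] |= snapshot[p]   # the peer combines it with what it knows
        rounds += 1
    return rounds

# Example: 64 processors, 3 of them underloaded, each message sent to 2 random peers.
print(gossip_rounds(64, 2, [0, 5, 9]))
```

Sampling peers uniformly corresponds to the simplest form of the protocol; a biased sampler could be swapped in at the `rng.randrange(n)` call.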
Information Propagation
- Number of rounds taken to propagate a single update: $r = O(\log_f n)$

[Figure: Expected number of rounds taken to spread information. Rounds vs. System Size (n), with curves for f=2, f=3, f=4.]
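To get a feel for the bound, the leading term log_f n can be evaluated directly. This is only a rough sketch: the O(·) notation hides constant factors, so measured round counts may be higher, and reading f as the number of peers contacted per round (the fanout) is an assumption, since the slide does not define it.

```python
import math

def log_rounds(n, f):
    """Leading term log_f(n) of the round bound; constant factors are omitted."""
    return math.log(n) / math.log(f)

# System sizes and values of f taken from the axes and legend of the plot.
for f in (2, 3, 4):
    print(f, [round(log_rounds(n, f), 1) for n in (4096, 8192, 16384)])
```

Doubling the system size adds only a constant (1 / log2 f) to this term, which is why the curves flatten as n grows.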
Information Propagation
Two Flavors
- Naive
  - Random selection
- Informed
  - Biased selection
  - Incorporate current knowledge

[Figure: Expected number of rounds taken to spread information. Rounds vs. System Size (n), with curves for Naive and Informed.]
Load Transfer
- Probabilistic transfer of load
- Naive transfer: select processors uniformly at random
- Informed transfer: select processors based on their load

$p_i = \frac{1}{Z} \times \left(1 - \frac{L_i}{L_{avg}}\right)$

- $p_i$: probability assigned to the $i$-th processor
- $L_i$: load of the $i$-th processor
- $L_{avg}$: average load of the system
- $Z$: normalization constant
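The informed-transfer formula can be illustrated with a short sketch. The function names and the example loads are hypothetical; the sketch assumes every candidate is underloaded (L_i < L_avg), so each weight 1 - L_i / L_avg is positive, and takes Z to be the sum of those weights.

```python
import random

def transfer_probabilities(loads, avg_load):
    """Map processor id -> p_i = (1/Z) * (1 - L_i / L_avg).

    `loads` maps known underloaded processors to their loads (assumed < avg_load).
    """
    weights = {p: 1.0 - load / avg_load for p, load in loads.items()}
    z = sum(weights.values())  # normalization constant Z
    if z <= 0:
        raise ValueError("no underloaded processor to receive load")
    return {p: w / z for p, w in weights.items()}

def pick_target(probs, rng=random):
    """Draw one receiving processor according to the assigned probabilities."""
    procs = list(probs)
    return rng.choices(procs, weights=[probs[p] for p in procs], k=1)[0]

# Made-up example: three underloaded processors, system average load of 40.
probs = transfer_probabilities({3: 20.0, 7: 30.0, 11: 35.0}, avg_load=40.0)
print(probs)
print(pick_target(probs, random.Random(1)))
```

The least loaded processor gets the largest probability, so requests concentrate where there is the most spare capacity. Naive transfer corresponds to replacing the weights with equal values.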
Load Transfer

[Figure: Naive Transfer and Informed Transfer, each shown as four panels plotted against Underloaded Processors: (a) Initial load, (b) Probabilities assigned, (c) Work units transferred (requests), (d) Final load.]
Quality of Load Balancing
- Quality is evaluated based on the load imbalance
- Imbalance is given by

$I = \frac{L_{max}}{L_{avg}} - 1$

[Figure: Evaluation of partial information. Max Load and Imbalance vs. Underloaded Processor Info (1 to 4096).]
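As a small worked example of the imbalance metric, the sketch below computes I for a list of per-processor loads. The function name and the sample loads are made up for illustration.

```python
def load_imbalance(loads):
    """Return I = L_max / L_avg - 1 for a non-empty list of per-processor loads."""
    avg = sum(loads) / len(loads)
    return max(loads) / avg - 1.0

# A perfectly balanced system has I = 0.
print(load_imbalance([10, 10, 10, 10]))
# Here L_max = 30 and L_avg = 20, so the most loaded processor is 50% above average.
print(load_imbalance([10, 20, 30, 20]))
```

Because only the maximum enters the metric, a single overloaded processor determines I regardless of how well the rest are balanced.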
Evaluation
- Applications
  - LeanMD
  - AMR
- Applications were run on IBM BG/Q Vesta
- Comparison with
  - GreedyLB
  - DiffusionLB
  - RefineLB
  - AmrLB
  - HybridLB
- Metrics to evaluate
  - Execution time per step, excluding LB time
  - Load balancing overhead
  - Total application time
Evaluation with LeanMD
Time per step
- Quality of our strategy is equivalent to centralized

[Figure: Time per Step (ms) vs. Number of Processes (2048 to 32768), with curves for No LB, Refine LB, Diff LB, Hybrid LB, Greedy LB and Gv LB.]
Evaluation with LeanMD
Load Balancing overhead
- Centralized strategies have high overhead
- Distributed schemes have low overhead

Load balancing cost (in seconds) of various strategies for LeanMD

              Number of Processes
Strategies    2048     4096     8192     16384    32768
HybridLB      -        1.35     0.7      0.368    0.2375
GreedyLB      8.62     8.9      10.33    11.2     23.4
RefineLB      55       50       27       34       121
DiffLB        0.039    0.043    0.040    0.043    0.040
GvLB          0.013    0.016    0.023    0.030    0.045
Evaluation with LeanMD
Total application time
- Using centralized strategies, overhead exceeds benefit
- Grapevine gives the best performance

Total application time (in seconds) for LeanMD on BG/Q

              Number of Processes
Strategies    2048     4096     8192     16384    32768
NoLB          201      102      51       25       13
HybridLB      -        72       37       20       12
GreedyLB      201      148      133      127      243
RefineLB      675      567      306      362      1227
DiffLB        140      72       37       22       13
GvLB          119      64       32       17       10
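To put the totals in relative terms, the sketch below computes how much less time GvLB takes than NoLB at each scale, using only the NoLB and GvLB rows of the table above. The helper name is hypothetical, and the slide itself reports no such percentages.

```python
# Total application time in seconds, copied from the NoLB and GvLB rows of the table.
procs = [2048, 4096, 8192, 16384, 32768]
no_lb = [201, 102, 51, 25, 13]
gv_lb = [119, 64, 32, 17, 10]

def relative_reduction(baseline, improved):
    """Fraction of the baseline time saved, e.g. 0.25 means 25% less time."""
    return 1.0 - improved / baseline

reductions = {p: relative_reduction(b, g) for p, b, g in zip(procs, no_lb, gv_lb)}
for p, r in reductions.items():
    print(f"{p:>6} processes: {100 * r:.1f}% less time than NoLB")
```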