

  1. Near-Optimal Adaptive Control of a Large Grid Application Det Buaklee Greg Tracy Mary Vernon Steve Wright Computer Science Department University of Wisconsin - Madison

  2. Talk Outline
  • Condor
  • Stochastic Optimization, ATR
  • ATR Execution Time Analysis
  • Model for Minimum Execution Time
  • Results: Optimized ATR Performance
  ICS’02, New York City, June 26, 2002

  3. Condor
  • Provides high-throughput computation
  • Manages a heterogeneous & dynamic pool of machines
  • The MW layer supports master-worker applications
    – The submitting node is the “master” node
    – Condor dynamically allocates “worker” nodes (between a requested min and max)
    – Worker nodes can drop out during the computation
  [Diagram: software stack: Application / MW Layer / Condor, with a PVM/TCP communication link]

  4. Stochastic Optimization
  • Non-trivial code: ~10,000 lines plus LP solvers
  • Optimization of a model with uncertain data
    – Large number of possible scenarios for the data
    – Arises in planning-under-uncertainty applications
  • x : vector of variables (unknowns)
    – Aim: find the x that optimizes expected model performance over all the scenarios
  • The objective function contains an expectation Q(x):

    min_x  c^T x + Q(x)   subject to  A x = b,  x ≥ 0

  5. Properties of the Expectation Q(x)
  • Probability-weighted sum of the objective over the individual scenarios ω_i, i = 1, 2, …, N:

    Q(x) = Σ_{i=1}^{N} p_i Q(x; ω_i)

  • N is the number of scenarios evaluated
    – May be sampled from the full set of scenarios
    – Increase N to improve accuracy
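The weighted sum above can be sketched in a few lines of Python. This is an illustration, not the ATR code: `Q_toy`, the scenario list, and the probabilities are all hypothetical stand-ins.

```python
# Sketch of the slide's probability-weighted sum (illustrative only).
def estimate_Q(x, scenarios, probs, Q_scenario):
    """Q(x) ~ sum_{i=1..N} p_i * Q(x; omega_i) over N sampled scenarios."""
    return sum(p * Q_scenario(x, w) for p, w in zip(probs, scenarios))

Q_toy = lambda x, w: (x - w) ** 2      # toy per-scenario objective (hypothetical)
scenarios = [1.0, 2.0, 3.0]            # sampled omega_i
probs = [0.5, 0.3, 0.2]                # p_i, summing to 1
value = estimate_Q(2.0, scenarios, probs, Q_toy)   # 0.5*1 + 0.3*0 + 0.2*1 = 0.7
```

Increasing N (more sampled scenarios) tightens the approximation of the true expectation, at the cost of more per-iteration work.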

  6. ATR Parallelism
  [Diagram: for each iteration, the master sends T tasks, in G task groups, to the workers]
  • N = 16 = number of scenarios evaluated
  • G = 4 = number of task groups
  • T = 8 = number of tasks per iteration

  7. Goals
  Given N and a set of workers:
  • Compute (near-)optimal adaptive values of B, G, T
    – Automated process
    – Fast, simple runtime computation
  • Compare adaptive and non-adaptive B, G, and grouping/scheduling of tasks
  Approach: LogP/LogGP/LoPC modeling

  8. ATR in Parallel
  • Each task i returns the value of its partial sum Σ_i Q(x; ω), and a subgradient (slope) for this partial sum
  • The master sums over tasks to obtain the complete function Q(x) and its subgradient
  [Diagram: master farms out evaluations Q(x_1; ω_1), Q(x_1; ω_2), Q(x_1; ω_3), Q(x_2; ω_1), … to workers for iterates x_1, x_2, x_3]
  • At the end of each iteration, set the new x to the minimizer of the latest approximation to Q(x)
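A minimal sketch of the aggregation step described on this slide, assuming each task reports a (partial value, partial subgradient) pair; the `aggregate` function and the sample data are illustrative, not the MW/ATR code.

```python
# The master adds per-task partial sums to recover Q(x) and a full
# subgradient (component-wise sum of the partial subgradients).
def aggregate(task_results):
    """task_results: list of (partial_value, partial_subgradient) pairs."""
    total = sum(v for v, _ in task_results)
    subgrad = [sum(col) for col in zip(*(g for _, g in task_results))]
    return total, subgrad

# Three tasks, each covering its own group of scenarios (toy numbers):
results = [(3.0, [1.0, 0.0]), (2.0, [0.5, -1.0]), (1.0, [0.0, 2.0])]
value, grad = aggregate(results)   # value = 6.0, grad = [1.5, 1.0]
```

The same pattern works for any number of tasks, which is what lets ATR vary T and G without changing the master's logic.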

  9. Execution Time Analysis
  Measure the LogP/LogGP/LoPC model parameters:
  – L (network latency)
  – o (message processing overhead)
  – G (gap per byte, i.e., bandwidth)
  – P (number of processors)
  [Diagram: per-iteration timeline showing master execution time, worker execution time, and communication time]

  10. Execution Time Measurement

  G     T     Worker      Master Time to Update     num   Master Time to Compute
              Exec Time   Model Fn m(x) (msec)      it.   a New Iterate x (sec)
              (sec) avg   avg    min    max               avg    min    max
  25    25    20.54       6.51   3.36    915        82    0.38   0.01    2.06
  50    50    10.56       6.04   3.56   1405        47    1.32   0.01    3.05
  50   100    10.36       6.83   3.64   1936        32    2.42   0.05    7.60
  100  100     5.19       5.94   3.40   1162        31    1.57   0.05    3.41
  200  200     2.69       6.12   3.84   2092        25    2.25   0.03    6.25
  400  400     1.35       6.74   3.30   2411        21    3.33   0.05   13.27

  • One-master, one-worker experiment
  • High variability in the master times

  11. (1) Worker Execution Times
  [Plot: worker execution time (sec) vs. number of scenarios evaluated (N/G, 0-1200), one curve per machine speed: 600, 780, 1100, and 1700 MIPS]
  • For a given planning problem, t_w is linear in:
    – the number of scenarios evaluated
    – processor speed
  • Total worker time = n (t_w)_max
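The per-task time suggested by this plot can be sketched as a simple linear model, reading "linear in processor speed" as a per-scenario cost inversely proportional to the machine's MIPS rating. The coefficient `cost_per_scenario_mips` is an assumed fit parameter, not a value measured in the talk.

```python
# Hypothetical linear model of worker task time: proportional to the
# scenarios per group (N/G), inversely proportional to machine speed.
def worker_time(N, G, mips, cost_per_scenario_mips=20.0):
    """Seconds for one task of N/G scenarios on a machine rated `mips`."""
    return (N / G) * cost_per_scenario_mips / mips
```

Under this sketch a 1700 MIPS machine finishes the same 100-scenario task roughly 2.8x faster than a 600 MIPS machine, which is why the slowest worker, (t_w)_max, dominates each iteration.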

  12. (2) Master Execution Times
  Updating m(x) after each task group (G) returns:
  [Plot: time to update m(x) (msec, log scale 0.01-1000) vs. worker completion event count (0-4000), for a lightly loaded master at the default debug level, a lightly loaded master at a reduced debug level, and an isolated master at a reduced debug level]
  • Variability in execution time is due to:
    – Excessive default debug I/O
    – Interference from Condor administrative tasks
  • Eliminating both makes this execution time < 1 ms, i.e., negligible

  13. (3) Master Execution Times
  Time to compute the new x:
  [Plots: time to compute the new x (sec) vs. iteration number, for the 20term problem (T = 100 and T = 200 curves, ~800 iterations) and the SSN network design problem (~40 iterations)]
  • Hard to predict the time for the next iterate
  • Same characteristic for all planning problems

  14. (3) Master Execution Times
  Generating the new x at the end of each iteration:
  [Plots vs. number of tasks T (0-1000): average time to compute the new x (sec), number of iterations n, and total master processing time (sec)]
  • The number of iterations (n) and the time to compute x per iteration depend on N and T
  • Given N, the total master processing time (t_M) is fixed!
  • To optimize: make T large, but not too large

  15. Communication Costs
  [Plots: round-trip time vs. size of data sent (0-16 KB), between local nodes and between Wisconsin and Bologna, Italy, for two experiments]
  • Round-trip time measurements
  • The critical path contains one round-trip time per iterate
  • Round-trip time << worker execution time for the message sizes used in ATR (250–1200 bytes)

  16. Effect of Basket Size
  [Plot: number of iterations n (minimum, average, and maximum) vs. basket size B (1-6); n ranges up to ~160]
  • More iterations (n) are needed for larger B
    – approximately linear relationship between B and n
  • Optimal B = 1
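The roughly linear B-to-n relationship can be illustrated with a toy model; the coefficients `n1` and `c` below are hypothetical, not fitted to the talk's data.

```python
# Toy model of iteration count vs. basket size B (hypothetical coefficients):
# n(B) = n1 * (1 + c * (B - 1)), i.e., roughly linear growth in B.
def iterations(B, n1=60, c=0.3):
    return n1 * (1 + c * (B - 1))
```

Since n grows with B while per-iteration time does not shrink once all workers are busy, B = 1 minimizes total work under this sketch, consistent with the slide's conclusion.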

  17. Model Vocabulary
  N : number of scenarios in the model
  T : number of tasks per iteration
  G : number of groups of scenarios (units of work)
  B : number of vectors x evaluated in parallel (basket size)
  t_M : total master execution time
  t_W : individual worker execution time
  n : total number of iterations

  18. Building the Model
  Master, worker, and communication times:
  • Total master execution time
    – Varies with N, T, B
    – Includes only the time to generate the new x
  • Worker execution time per iteration:
    – Very low variation
    – Consistent from one iteration to the next
  • Insignificant contributions from:
    – Communication time
    – Master updating Q(x) (if T is not too large)
  Model: total execution time ≈ t_M + n (t_w)_max

  19. Model Validation for a Homogeneous Worker Pool

  Planning   N       T    num   Total Time to   Benchmark   Execution Time (min)   Pool
  Problem                 it.   Compute New x   (t_W)
                          (n)   (t_M) (sec)     (sec)       Model   Measured
  20-terms   5,000   200  597   2762            2.35        69.4    70.5     WI pool
  ssn        40,000  100   84    297            30.97       48.8    52.2     WI-NM Flock
  ssn        20,000   50  108    180            20.91       40.8    44.7     WI-Argonne Flock
  ssn        20,000  100   84    244            20.89       33.5    36.3     WI-Argonne Flock
  ssn        20,000  200   61    295            20.88       26.4    29.3     WI-Argonne Flock
  ssn        20,000  400   44    441            20.96       22.9    24.9     WI-Argonne Flock
  ssn        10,000  100   44     64            6.32        10.3    12.1     WI pool

  Model: t_M + n t_W
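The homogeneous-pool model can be spot-checked against the first row of the table (20-terms: t_M = 2762 s, n = 597 iterations, t_W = 2.35 s), which reproduces the 69.4-minute model prediction against the 70.5-minute measurement.

```python
# Homogeneous-pool model from the slide: total time = t_M + n * t_W.
def model_minutes(t_M_sec, n, t_w_sec):
    """Predicted total execution time, converted from seconds to minutes."""
    return (t_M_sec + n * t_w_sec) / 60.0

prediction = model_minutes(2762, 597, 2.35)   # ~69.4 min, vs. 70.5 min measured
```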

  20. Model Validation for a Heterogeneous Worker Pool

  n     t_M     Worker Time t_w (sec)   Non-Adaptive Exec. Time (min)   Workers
        (sec)   avg    min    max       Model    Measured               Requested
  70    50.37   7.04   4.21   28.62     34.23    34.65                  50
  70    50.02   7.03   4.19   28.62     34.22    35.07                  50
  58    35.8    6.62   4.18   13.82     13.96    14.35                  50
  42    60.71   2.86   1.36   13.88     10.73    13.75                  150
  38    53.18   2.76   1.38    9.42      6.85     9.07                  150
  36    46.77   2.86   1.37    9.78      6.65     9.63                  150
  36    61.3    2.11   1.68   10.21      7.15     9.67                  200

  Model: t_M + n (t_w)_max
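The heterogeneous-pool model t_M + n (t_w)_max can likewise be spot-checked; with t_M = 35.8 s, n = 58 iterations, and a slowest-worker time of 13.82 s (the third table row), it gives the table's 13.96-minute model value.

```python
# Heterogeneous-pool model from the slide: each iteration waits for the
# slowest worker, so total time = t_M + n * (t_w)_max.
def model_minutes_hetero(t_M_sec, n, t_w_max_sec):
    """Predicted total execution time in minutes."""
    return (t_M_sec + n * t_w_max_sec) / 60.0

prediction = model_minutes_hetero(35.8, 58, 13.82)   # ~13.96 min
```

The gap between model and measurement widens in the later rows, where more heterogeneous workers make the non-adaptive assignment leave slow stragglers on the critical path.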

  21. Optimal Configuration for a Homogeneous Worker Pool

  Original ATR Execution Time (T = 100, G = 25):
                   B = 3    B = 6
  Reduced Debug    61 min   92 min
  Default Debug    68 min   149 min

  Near-Optimized ATR Execution Time: 18 min

  • G should equal the number of available processors
  • T should be large, up to a point
  • B should be set to 1
  3x - 6x faster!

  22. Heterogeneous Task Assignment
  [Diagram: the master node's worker queue, ordered by benchmarked speed E_w (9, 9, 10, 13, 15, 20, 20, 20), and the master node's per-iteration job queue (jobs 1-8); each job's size (e.g., 27, 18, 9; 20, 10; 26, 13; 30, 15; 40, 20) is a multiple of the assigned worker's E_w]
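One way to realize the speed-proportional assignment the figure suggests can be sketched as follows; `proportional_groups` is our illustrative helper, not the actual MW scheduler, and the speeds are the E_w values shown in the figure.

```python
# Sketch: split N scenarios into one group per worker, sized in proportion
# to each worker's benchmarked speed, so all workers finish an iteration at
# roughly the same time.
def proportional_groups(N, speeds):
    """Return a group size per worker; sizes sum to exactly N."""
    total = sum(speeds)
    sizes = [int(N * s / total) for s in speeds]
    sizes[-1] += N - sum(sizes)    # hand the rounding remainder to one worker
    return sizes

groups = proportional_groups(100, [9, 9, 10, 13, 15, 20, 20, 20])
```

Balancing group sizes this way shrinks (t_w)_max toward the average t_w, which is exactly the term the model t_M + n (t_w)_max says dominates heterogeneous runs.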
