
Performance Optimization of Throttled Time-Warp Simulation



  1. Performance Optimization of Throttled Time-Warp Simulation. Seng Chuan TAY and Yong Meng TEO, Department of Computer Science, National University of Singapore. Email: teoym@comp.nus.edu.sg, http://www.comp.nus.edu.sg/~teoym. Proceedings of the 34th Annual Simulation Symposium, IEEE Computer Society Press, pp. 211-218, Seattle, USA, April 2001.

  2. Outline
     - Introduction
     - Throttled Time-Warp Simulation
     - Analytic Performance Model
     - Model Validation and Performance Analysis
     - Conclusions

  3. Introduction
     - Time-Warp: greedy parallelism.
     - TW throttle: matches (controls) the degree of speculative event parallelism to the available machine resources (parallelism).
     - Determine the appropriate level of runtime optimism in the throttle.
     - Analytical framework based on optimizing the elapsed time; validated using simulation.

  4. Throttled Time-Warp
     - Global progress window (GPW): characterizes the simulation progress of each LP.
     - Hysteresis band: smooths the fluctuation of LVT advancement.
     - Event regulator: regulates event execution, i.e., accelerates the slow LPs and suspends LVT advancement in the fast LPs.

  5. Time-Warp Throttle
     - r: spread ratio of the length of the hysteresis zone to the length of the GPW:
       $r = \frac{hu - hl}{GFT - GVT}$
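
The spread ratio and the three zones of the global progress window can be illustrated with a minimal Python sketch. The function names, the half-open zone boundaries, and the numeric window are assumptions made for illustration; the slides only define r and name the slow, hysteresis, and fast zones.

```python
def spread_ratio(gvt: float, hl: float, hu: float, gft: float) -> float:
    """Return r = (hu - hl) / (GFT - GVT)."""
    if not gvt <= hl <= hu <= gft or gft == gvt:
        raise ValueError("expected GVT <= hl <= hu <= GFT with GFT > GVT")
    return (hu - hl) / (gft - gvt)


def zone(lvt: float, hl: float, hu: float) -> str:
    """Classify an LP by its LVT (assumed convention: [hl, hu) is the hysteresis zone)."""
    if lvt < hl:
        return "slow"
    if lvt < hu:
        return "hysteresis"
    return "fast"


# Hypothetical window: GVT = 0, GFT = 100, hysteresis zone [25, 75).
r = spread_ratio(gvt=0.0, hl=25.0, hu=75.0, gft=100.0)
print(r)                        # 0.5
print(zone(10.0, 25.0, 75.0))   # slow
```

Under this reading, an event regulator would favour LPs classified as slow and hold back those classified as fast.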

  6. Analytic Performance Model
     - Unthrottled GVT window: $GVT_{unthrottle} = W\,d$, where W is a constant.

  7. Throttled GVT Window
     - f(d): number of events on the longest path of the LP network,
       $f(d) = d \times c + (d - 1) \times \mu$
     - Assume a GVT computation is activated when a majority (> 50%) of the LVTs exceed the hysteresis zone:
       $GVT_{throttled} = f(d) \times \left(\frac{1}{2} + \frac{r}{2}\right) \times 50\% = f(d) \times \left(\frac{1}{4} + \frac{r}{4}\right)$
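
As a worked example of the two formulas on this slide, the sketch below evaluates f(d) and the throttled GVT window. The input values (d = 3, c = 4, µ = 2, r = 0.25) are hypothetical and chosen only to keep the arithmetic exact; the slide does not state what c and µ stand for, so they are treated here as given inputs.

```python
def longest_path_events(d: int, c: float, mu: float) -> float:
    """f(d) = d * c + (d - 1) * mu, as written on the slide."""
    return d * c + (d - 1) * mu


def throttled_gvt_window(f_d: float, r: float) -> float:
    """f(d) * (1/2 + r/2) * 50%, which simplifies to f(d) * (1/4 + r/4)."""
    return f_d * (0.5 + r / 2.0) * 0.5


# Hypothetical inputs, not taken from the paper.
f_d = longest_path_events(d=3, c=4.0, mu=2.0)   # 3*4 + 2*2 = 16
window = throttled_gvt_window(f_d, r=0.25)      # 16 * 0.625 * 0.5 = 5
print(f_d, window)
```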

  8. Characterization of Elapsed Time
     - $E_{Total}$: total number of true and false events executed.
     - $E_{Arr}$: total number of true and false arrival events.
     - $E_{Dep}$: total number of true and false departure events.
     - $D^+$: the expected number of undone events caused by a straggler.
     - $D_n^-$: the expected number of undone events caused by the n-th wave of anti-messages.
     - $\lambda$: arrival rate.
     - $\mu$: service rate.
     - $N_p$: number of true events processed by an LP.

     $E_{Total} = N_p \times \left(1 + D^+ + \sum_{n=1}^{l} D_n^-\right)$

     $E_{Arr} = \begin{cases} \frac{E_{Total}}{2} & \text{if } \lambda < \mu \\ E_{Total} \times \frac{\lambda}{\lambda + \mu} & \text{otherwise} \end{cases}$

     $E_{Dep} = E_{Total} - E_{Arr}$
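
The event-count definitions above translate directly into code. The following is a sketch only: the function names are hypothetical, and the inputs (N_p = 1000, D+ = 0.25, two anti-message waves of 0.125 each) are made-up values picked so that the arithmetic is exact.

```python
from typing import Sequence


def total_events(n_p: float, d_plus: float, d_minus: Sequence[float]) -> float:
    """E_Total = N_p * (1 + D+ + sum over n of D-_n)."""
    return n_p * (1.0 + d_plus + sum(d_minus))


def arrival_events(e_total: float, lam: float, mu: float) -> float:
    """E_Arr = E_Total / 2 if lambda < mu, else E_Total * lambda / (lambda + mu)."""
    if lam < mu:
        return e_total / 2.0
    return e_total * lam / (lam + mu)


def departure_events(e_total: float, e_arr: float) -> float:
    """E_Dep = E_Total - E_Arr."""
    return e_total - e_arr


# Hypothetical inputs, not measurements from the paper.
e_total = total_events(n_p=1000, d_plus=0.25, d_minus=[0.125, 0.125])
e_arr = arrival_events(e_total, lam=1.0, mu=2.0)
e_dep = departure_events(e_total, e_arr)
print(e_total, e_arr, e_dep)   # 1500.0 750.0 750.0
```

Note that the two branches of E_Arr agree at λ = µ, where λ / (λ + µ) = 1/2.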

  9. Elapsed time = Computation Time ($T_{Cp}$) + Communication Time ($T_{Cm}$) + Processor Idle Time ($T_{Id}$)

  10. Computation Time ($T_{Cp}$)
     - $T_e$: event processing time.
     - $T_s$: state saving time.

     $T_{Cp} = E_{Total} \times (T_e + T_s)$

  11. Communication Time ($T_{Cm}$)
     - $T_b$: time required to construct a message in the buffer for each transmission.
     - $T_t$: message transmission time.
     - Two communication time components:
       (i) transmission of event messages ($T_{Cm}^{E}$)
       (ii) transmission of GVT protocols ($T_{Cm}^{G}$)

  12. Transmission of event messages:
     $T_{Cm}^{E} = E_{Arr} \times T_b + E_{Dep} \times (T_b + T_t)$
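
To make the computation-time and event-message formulas concrete, the sketch below plugs in the mean times reported in the deck's parameter table (T_e = 1200, T_s = 990, T_b = 2750, T_t = 1290, all in microseconds). The event counts are hypothetical, and the function names are illustrative only.

```python
# Mean times in microseconds, as reported in the deck's parameter table.
T_E, T_S, T_B, T_T = 1200, 990, 2750, 1290


def computation_time(e_total: int) -> int:
    """T_Cp = E_Total * (T_e + T_s)."""
    return e_total * (T_E + T_S)


def event_message_time(e_arr: int, e_dep: int) -> int:
    """T_Cm^E = E_Arr * T_b + E_Dep * (T_b + T_t)."""
    return e_arr * T_B + e_dep * (T_B + T_T)


# Hypothetical event counts, not measurements from the paper.
e_arr, e_dep = 750, 750
t_cp = computation_time(e_arr + e_dep)      # 1500 * 2190
t_cm_e = event_message_time(e_arr, e_dep)   # 750 * 2750 + 750 * 4040
print(t_cp, t_cm_e)   # 3285000 5092500
```

In this reading, an arrival costs only the buffer time T_b, while a departure costs both the buffer time and the transmission time.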

  13. Transmission of GVT Protocols
     - After advancing its LVT beyond hu, an LP sends a request-for-GVT signal to the coordinator.

     [Figure: the global progress window from GVT to GFT, divided at hl and hu into the slow zone, the hysteresis zone, and the fast zone.]

     - If request signals have been received from a majority (> 50%) of the LPs in the slow and hysteresis zones, the GVT coordinator broadcasts a request-for-LPT signal to all LPs. Otherwise, the coordinator waits until more than 50% of the request signals have been received.
     - Whenever an LP receives the request-for-LPT signal, it reports its local progress time (LPT) to the GVT coordinator.
     - After all the LPTs have been received, the GVT coordinator computes the minimum and broadcasts the new GVT to all LPs.
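
The protocol steps above can be sketched as a toy, single-process coordinator. This is not the authors' implementation: the class and method names are hypothetical, the majority is counted over all LPs as a simplification, and each LP's LPT is assumed to equal its LVT.

```python
from typing import Dict, Set


class GvtCoordinator:
    """Toy coordinator for the request-for-GVT / request-for-LPT exchange."""

    def __init__(self, num_lps: int) -> None:
        self.num_lps = num_lps
        self.requests: Set[int] = set()

    def on_gvt_request(self, lp_id: int) -> bool:
        """Record a request; True once more than 50% of all LPs have asked."""
        self.requests.add(lp_id)
        return 2 * len(self.requests) > self.num_lps

    def compute_gvt(self, lpts: Dict[int, float]) -> float:
        """New GVT is the minimum over the LPTs reported by every LP."""
        if len(lpts) != self.num_lps:
            raise ValueError("need one LPT from every LP")
        self.requests.clear()
        return min(lpts.values())


# Hypothetical state: four LPs, upper hysteresis bound hu = 75.
hu = 75.0
lvts = {0: 80.0, 1: 90.0, 2: 78.0, 3: 40.0}
coordinator = GvtCoordinator(num_lps=len(lvts))

triggered = False
for lp_id, lvt in lvts.items():
    if lvt > hu:  # LP has advanced beyond hu, so it asks for a GVT round
        triggered = coordinator.on_gvt_request(lp_id)

# Assumption: each LP reports its LVT as its LPT.
new_gvt = coordinator.compute_gvt(lvts) if triggered else None
print(triggered, new_gvt)   # True 40.0
```

With four LPs, two requests are exactly 50% and do not trigger a round; the third request does.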

  14. As the GVT computation procedure incurs two transmissions and two receptions, the communication overhead per GVT computation is
     $2 \times (T_b + T_t) + 2 \times T_b = 4T_b + 2T_t$

     The communication overhead due to GVT computation is
     $T_{Cm}^{G} = \frac{N_p \times \left(1 + D^+ + \sum_{n=1}^{l} D_n^-\right)}{\widehat{GVT}} \times (4T_b + 2T_t)$
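
A short numeric sketch of this overhead follows, using the T_b and T_t values from the deck's parameter table. The total event count and the GVT window size are hypothetical, and the reading of the formula as "number of GVT rounds times cost per round" is an interpretation of the slide rather than a statement from it.

```python
# Mean times in microseconds, as reported in the deck's parameter table.
T_B, T_T = 2750, 1290


def gvt_round_cost(t_b: float, t_t: float) -> float:
    """Two transmissions and two receptions: 2*(T_b + T_t) + 2*T_b = 4*T_b + 2*T_t."""
    return 2 * (t_b + t_t) + 2 * t_b


def gvt_communication_time(e_total: float, gvt_window: float,
                           t_b: float, t_t: float) -> float:
    """T_Cm^G = (E_Total / GVT window) * (4*T_b + 2*T_t)."""
    rounds = e_total / gvt_window
    return rounds * gvt_round_cost(t_b, t_t)


per_round = gvt_round_cost(T_B, T_T)
# Hypothetical: 1500 executed events and a GVT window of 5 events.
t_cm_g = gvt_communication_time(e_total=1500, gvt_window=5.0, t_b=T_B, t_t=T_T)
print(per_round, t_cm_g)   # 13580 4074000.0
```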

  15. Processor Idle Time ($T_{Id}$)
     - Delay due to GVT computation ($T_{Id}^{G}$).
     - 50% of the events in the interval [GVT, hu) will have to be executed and their states saved.

     [Figure: global progress window with markers GVT, hl, hu, and GFT.]

     - The time delay is
       $T_{Id}^{G} = \frac{N_p \times \left(1 + D^+ + \sum_{n=1}^{l} D_n^-\right)}{2} \times (T_e + T_s)$

  16. Insufficient Executable Events
     - Let A be the elapsed time amplifier due to an insufficient number of executable events.
     - We have $T_{Cp} \times A = T_{Cp} + T_{Id}$, so $T_{Id} = T_{Cp} \times (A - 1)$.
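
The amplifier relation can be checked numerically in both directions. The values below are hypothetical (a computation time of 3,285,000 µs and A = 1.25), and the function names are illustrative.

```python
def idle_time(t_cp: float, amplifier: float) -> float:
    """T_Id = T_Cp * (A - 1); A < 1 would imply a negative idle time."""
    if amplifier < 1.0:
        raise ValueError("amplifier A must be at least 1")
    return t_cp * (amplifier - 1.0)


def amplifier_from(t_cp: float, t_id: float) -> float:
    """Invert the relation: A = (T_Cp + T_Id) / T_Cp."""
    return (t_cp + t_id) / t_cp


# Hypothetical inputs, not measurements from the paper.
t_id = idle_time(t_cp=3_285_000.0, amplifier=1.25)
print(t_id)                                # 821250.0
print(amplifier_from(3_285_000.0, t_id))   # 1.25
```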

  17. Optimized Degree of Parallelism
     - The number of events available for execution converges to $f(d) \times r$ due to the use of the TW throttle.
     - To make full use of the P processors, $f(d) \times r \geq P$, so $r \geq \frac{P}{f(d)}$.
     - To reduce rollback overhead, $r = \frac{P}{f(d)}$.
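
The choice r = P / f(d) is easy to compute once P and f(d) are known. The sketch below uses hypothetical inputs; capping r at 1 when P exceeds f(d) is an assumption added here (r is a ratio of zone length to window length), not something the slides state.

```python
def optimal_spread_ratio(num_processors: int, f_d: float) -> float:
    """Smallest r with f(d) * r >= P, i.e. r = P / f(d)."""
    if f_d <= 0:
        raise ValueError("f(d) must be positive")
    r = num_processors / f_d
    # Assumption (not from the slides): r cannot exceed 1.
    return min(r, 1.0)


# Hypothetical inputs, not the paper's configurations.
print(optimal_spread_ratio(num_processors=4, f_d=16.0))    # 0.25
print(optimal_spread_ratio(num_processors=32, f_d=16.0))   # 1.0
```

Choosing the smallest r that still keeps all P processors busy is what limits the extra speculative work, and hence the rollbacks.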

  18. Model Validation and Performance Analysis
     - A Fujitsu AP3000 distributed-memory parallel computer is used.
     - The following parameters are timed:

     | Parameter | Mean time (µs) |
     |---|---|
     | $T_e$ | 1200 |
     | $T_s$ | 990 |
     | $T_b$ | 2750 |
     | $T_t$ | 1290 |

  19. Application examples
     - 8 x 8 MIN, diameter = 3
     - 4 x 4 TORUS, diameter = 4


  21. Optimal r

     | Example | Empirical | Predicted |
     |---|---|---|
     | homogeneous MIN | 0.75 | 0.75 |
     | homogeneous TORUS | 0.7375 | 0.7273 |
     | heterogeneous pipeline | 0.5 | - |
     | heterogeneous TORUS | 0.6 | - |

  22. Effectiveness of Parallelism Throttle

     | Simulation duration (sec) | Rollbacks, unthrottled TW | Rollbacks, throttled TW |
     |---|---|---|
     | 2000000 | 37068 | 12326 |
     | 4000000 | 60261 | 22652 |
     | 6000000 | 90053 | 32578 |
     | 8000000 | 144937 | 42904 |
     | 10000000 | 174341 | 55630 |
     | 12000000 | 216712 | 65945 |

     Table 2: Comparison of Measured Rollback Counts (Heterogeneous Linear Pipeline)

     | Simulation duration (sec) | Rollbacks, unthrottled TW | Rollbacks, throttled TW |
     |---|---|---|
     | 500000 | 144195 | 120162 |
     | 1000000 | 313698 | 241302 |
     | 1500000 | 508389 | 362421 |
     | 2000000 | 728465 | 485641 |
     | 2500000 | 964502 | 602814 |
     | 3000000 | 1232452 | 724972 |

     Table 3: Comparison of Measured Rollback Counts (Heterogeneous Torus Network)
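
To read the two tables as relative savings, the sketch below computes the fractional reduction in rollbacks, 1 - throttled / unthrottled, for each row. The rollback counts are copied from Tables 2 and 3; the reduction metric and the variable names are choices made here, not something the slides report.

```python
from typing import List, Tuple

# (simulation duration, unthrottled rollbacks, throttled rollbacks)
Row = Tuple[int, int, int]

PIPELINE: List[Row] = [  # Table 2, heterogeneous linear pipeline
    (2_000_000, 37068, 12326), (4_000_000, 60261, 22652),
    (6_000_000, 90053, 32578), (8_000_000, 144937, 42904),
    (10_000_000, 174341, 55630), (12_000_000, 216712, 65945),
]
TORUS: List[Row] = [  # Table 3, heterogeneous torus network
    (500_000, 144195, 120162), (1_000_000, 313698, 241302),
    (1_500_000, 508389, 362421), (2_000_000, 728465, 485641),
    (2_500_000, 964502, 602814), (3_000_000, 1232452, 724972),
]


def rollback_reduction(rows: List[Row]) -> List[float]:
    """Fraction of rollbacks avoided by throttling, per table row."""
    return [1.0 - throttled / unthrottled for _, unthrottled, throttled in rows]


pipeline_reduction = rollback_reduction(PIPELINE)
torus_reduction = rollback_reduction(TORUS)

for name, values in (("pipeline", pipeline_reduction), ("torus", torus_reduction)):
    print(name, [f"{100 * v:.1f}%" for v in values])
```

Every row in both tables shows fewer rollbacks with the throttle than without it.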

  23. Elapsed Time Performance


  25. Sensitivity Analysis of Throttling - MIN (8 x 8)

  26. Conclusion
     - Throttling is an efficient method to reduce the elapsed time in certain TW-based parallel simulations.
     - The best elapsed time is obtained neither by a completely in-pace LVT advancement (no causality errors) nor by a completely uncontrolled TW.
     - A controlled degree of causality error is desirable in achieving optimal performance.

