Modeling Critical Sections in Amdahl’s Law and its Implications for Multicore Design


  1. Modeling Critical Sections in Amdahl’s Law and its Implications for Multicore Design. Stijn Eyerman and Lieven Eeckhout, Ghent University, Belgium. ISCA, Saint-Malo, France, June 23, 2010.

  2. Amdahl’s Law. Speedup by parallelizing fraction $f$ across $n$ processors: $S = \frac{1}{(1 - f) + \frac{f}{n}}$. Parallel performance is bounded by the sequential part: $\lim_{n \to \infty} S = \frac{1}{1 - f}$. S. Eyerman & L. Eeckhout -- ISCA 2010 -- June 23, 2010
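
The two formulas on this slide can be evaluated directly. The following is a minimal sketch; the function names `amdahl_speedup` and `amdahl_limit` are illustrative and do not come from the slides.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup when a fraction f of the work is parallelized across n processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f: float) -> float:
    """Upper bound on the speedup as n grows without bound (requires f < 1)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must lie in [0, 1)")
    return 1.0 / (1.0 - f)


if __name__ == "__main__":
    # The speedup approaches, but never exceeds, the limit 1 / (1 - f).
    for n in (1, 2, 8, 64, 1024):
        print(n, amdahl_speedup(0.95, n))
    print("limit", amdahl_limit(0.95))
```

For any fixed `f < 1`, increasing `n` only shrinks the `f / n` term, so the sequential term `1 - f` eventually dominates the denominator.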

  3. Amdahl’s software model. Sequential fraction: $f_{seq}$. Parallel fraction: $f_{par} = 1 - f_{seq}$. Can we model critical sections in Amdahl’s Law?

  4. Extending Amdahl’s software model. The program consists of a sequential part, a parallel part inside critical sections, and a parallel part outside critical sections: $f_{seq} + f_{par,cs} + f_{par,ncs} = 1$. $P_{ctn}$ = probability for two critical sections to contend.

  5. Extending Amdahl’s software model. Assumptions: each thread executes an equal share of the critical sections; critical sections are entered at random times; critical sections contend randomly.

  6. How do we compute parallel speedup in the presence of critical sections? Case #1, low contention: all threads execute equally long, so total exec time ≅ avg per-thread exec time. Case #2, high contention: total exec time ≅ avg exec time of the slowest thread.

  7. Case #1. Each thread executes a fraction $\frac{f_{par,cs}}{n}$ of critical sections. If no contention: exec time $= \frac{f_{par,cs}}{n}$. If contention with $j$ threads: exec time $= (j + 1)\,\frac{f_{par,cs}}{n}$. Avg time spent in critical section: $= \sum_{j=0}^{n-1} \Pr[\text{contend with } j \text{ threads}] \cdot (j + 1)\,\frac{f_{par,cs}}{n}$.

  8. $$\sum_{j=0}^{n-1} \Pr[\text{contend with } j \text{ threads}] \cdot (j + 1)\,\frac{f_{par,cs}}{n}$$
     $$= \sum_{i=0}^{n-1} \Pr[i \text{ of } n - 1 \text{ other threads in critical sections}] \cdot \sum_{j=0}^{i} \Pr[j \text{ of } i \text{ critical sections contend}] \cdot (j + 1)\,\frac{f_{par,cs}}{n}$$
     $$= \sum_{i=0}^{n-1} \binom{n-1}{i} P_{cs}^{\,i} (1 - P_{cs})^{n-1-i} \cdot \sum_{j=0}^{i} \binom{i}{j} P_{ctn}^{\,j} (1 - P_{ctn})^{i-j} \cdot (j + 1)\,\frac{f_{par,cs}}{n}$$
     with $P_{cs} = \frac{f_{par,cs}}{f_{par,cs} + f_{par,ncs}}$.
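
The double sum collapses once each sum is read as an expectation over a binomial distribution. The steps below are a sketch of that simplification; they use only the standard facts that binomial probabilities sum to 1 and that a binomial with $m$ trials and success probability $q$ has mean $mq$.

```latex
\begin{align*}
\sum_{j=0}^{i} \binom{i}{j} P_{ctn}^{\,j} (1 - P_{ctn})^{i-j} (j + 1)
  &= 1 + i\,P_{ctn} \\
\sum_{i=0}^{n-1} \binom{n-1}{i} P_{cs}^{\,i} (1 - P_{cs})^{n-1-i} \left(1 + i\,P_{ctn}\right)
  &= 1 + (n - 1)\,P_{cs} P_{ctn} \\
\frac{f_{par,cs}}{n} \left(1 + (n - 1)\,P_{cs} P_{ctn}\right)
  &= f_{par,cs} \left(P_{cs} P_{ctn} + \frac{1 - P_{cs} P_{ctn}}{n}\right)
\end{align*}
```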

  9. Avg time spent in critical section
     $$= \sum_{i=0}^{n-1} \binom{n-1}{i} P_{cs}^{\,i} (1 - P_{cs})^{n-1-i} \cdot \sum_{j=0}^{i} \binom{i}{j} P_{ctn}^{\,j} (1 - P_{ctn})^{i-j} \cdot (j + 1)\,\frac{f_{par,cs}}{n}$$
     $$= f_{par,cs} \cdot \left( P_{cs} P_{ctn} + \frac{1 - P_{cs} P_{ctn}}{n} \right)$$
     The term $P_{cs} P_{ctn}$ is the sequential part; the term $\frac{1 - P_{cs} P_{ctn}}{n}$ is the parallel part.
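
The equality between the double sum and the closed form can be checked numerically. This is a minimal sketch; the function names are illustrative, and it assumes $f_{par,cs} + f_{par,ncs} > 0$ so that $P_{cs}$ is defined.

```python
from math import comb, isclose


def avg_cs_time_sum(f_par_cs, f_par_ncs, p_ctn, n):
    """Average time in critical sections, evaluated as the double binomial sum."""
    p_cs = f_par_cs / (f_par_cs + f_par_ncs)
    total = 0.0
    for i in range(n):  # i of the n - 1 other threads are in critical sections
        pr_i = comb(n - 1, i) * p_cs**i * (1 - p_cs) ** (n - 1 - i)
        inner = 0.0
        for j in range(i + 1):  # j of those i critical sections contend
            pr_j = comb(i, j) * p_ctn**j * (1 - p_ctn) ** (i - j)
            inner += pr_j * (j + 1) * f_par_cs / n
        total += pr_i * inner
    return total


def avg_cs_time_closed(f_par_cs, f_par_ncs, p_ctn, n):
    """Same quantity, using the closed form f_par_cs * (x + (1 - x) / n)."""
    p_cs = f_par_cs / (f_par_cs + f_par_ncs)
    x = p_cs * p_ctn
    return f_par_cs * (x + (1 - x) / n)


if __name__ == "__main__":
    for n in (1, 2, 4, 16):
        a = avg_cs_time_sum(0.3, 0.6, 0.4, n)
        b = avg_cs_time_closed(0.3, 0.6, 0.4, n)
        print(n, a, b, isclose(a, b))
```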

  10. Back to Amdahl’s Law: $S = \frac{1}{f_{seq} + f_{par,cs} \cdot P_{cs} P_{ctn} + \frac{f_{par,cs} \cdot (1 - P_{cs} P_{ctn}) + f_{par,ncs}}{n}}$. The impact of critical sections can be modeled as a sequential plus a parallel part.
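
The case #1 speedup can be written as a small function. This is a sketch under the slide's notation; the name `speedup_case1` and the convention of setting $P_{cs} = 0$ when there is no parallel work are assumptions made for illustration.

```python
def speedup_case1(f_seq, f_par_cs, f_par_ncs, p_ctn, n):
    """Case #1 (low contention) speedup with critical sections folded into Amdahl's Law."""
    if abs(f_seq + f_par_cs + f_par_ncs - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    f_par = f_par_cs + f_par_ncs
    # Assumption: with no parallel work at all, P_cs is taken to be 0.
    p_cs = f_par_cs / f_par if f_par > 0 else 0.0
    x = p_cs * p_ctn
    sequential = f_seq + f_par_cs * x                   # does not shrink with n
    parallel = (f_par_cs * (1 - x) + f_par_ncs) / n     # shrinks with n
    return 1.0 / (sequential + parallel)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, speedup_case1(0.0, 0.5, 0.5, 0.5, n))
```

With `f_par_cs = 0` the function reduces to the classic Amdahl formula with `f = f_par_ncs`.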

  11. Case #2. Exec time is determined by the chain of contending critical sections. Approximate the total exec time as the avg exec time of the slowest thread.

  12. Avg exec time of slowest thread. Length of the chain of contending critical sections $= f_{par,cs} P_{ctn}$. Minimum execution time $= f_{seq} + f_{par,cs} P_{ctn}$. Maximum execution time $= f_{seq} + f_{par,cs} P_{ctn} + \frac{f_{par,cs} (1 - P_{ctn}) + f_{par,ncs}}{n}$. Average execution time $= f_{seq} + f_{par,cs} P_{ctn} + \frac{f_{par,cs} (1 - P_{ctn}) + f_{par,ncs}}{2 \cdot n}$.
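
The three case #2 expressions differ only in how much of the remaining parallel work is added to the contention chain. A minimal sketch, with the illustrative name `exec_time_case2`:

```python
def exec_time_case2(f_seq, f_par_cs, f_par_ncs, p_ctn, n):
    """Return (minimum, maximum, average) normalized exec time for case #2."""
    chain = f_par_cs * p_ctn                        # chain of contending critical sections
    rest = f_par_cs * (1 - p_ctn) + f_par_ncs       # remaining, parallelizable work
    t_min = f_seq + chain
    t_max = t_min + rest / n
    t_avg = t_min + rest / (2 * n)                  # midpoint of t_min and t_max
    return t_min, t_max, t_avg


if __name__ == "__main__":
    print(exec_time_case2(0.0, 0.5, 0.5, 0.5, 2))
```

The average is exactly halfway between the minimum and the maximum, which is where the factor $2 \cdot n$ in the denominator comes from.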

  13. Putting it together & validation. Q: Total exec time for parallel workload? A: max(case #1, case #2). Avg error of 3% compared to synthetic simulation. [Figure: normalized exec time versus number of threads (0 to 10), comparing formula 1 (case #1), formula 2 (case #2), and synthetic simulation, for $f_{par,cs} = 0.5$, $f_{par,ncs} = 0.5$, $P_{ctn} = 0.5$.]
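
The combined model takes the larger of the two case estimates. The sketch below evaluates it for the parameter values quoted on this slide; it reproduces only the two formulas, not the synthetic simulation, and all function names are illustrative.

```python
def time_case1(f_seq, f_par_cs, f_par_ncs, p_ctn, n):
    """Normalized exec time for case #1 (low contention)."""
    f_par = f_par_cs + f_par_ncs
    p_cs = f_par_cs / f_par if f_par > 0 else 0.0  # assumption: P_cs = 0 if no parallel work
    x = p_cs * p_ctn
    return f_seq + f_par_cs * x + (f_par_cs * (1 - x) + f_par_ncs) / n


def time_case2(f_seq, f_par_cs, f_par_ncs, p_ctn, n):
    """Normalized exec time for case #2 (average of the slowest thread)."""
    return f_seq + f_par_cs * p_ctn + (f_par_cs * (1 - p_ctn) + f_par_ncs) / (2 * n)


def total_exec_time(f_seq, f_par_cs, f_par_ncs, p_ctn, n):
    """Combined estimate: the maximum of the two cases."""
    args = (f_seq, f_par_cs, f_par_ncs, p_ctn, n)
    return max(time_case1(*args), time_case2(*args))


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, total_exec_time(0.0, 0.5, 0.5, 0.5, n))
```

For these parameters the two formulas cross at `n = 4`: case #1 gives the larger value for fewer threads and case #2 for more.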

  14. Theoretical result: $\lim_{n \to \infty} S = \frac{1}{f_{seq} + f_{par,cs} \cdot P_{ctn}}$. Parallel performance is fundamentally limited by critical sections. [Figure: $S$ (0 to 10000) as a function of $f_{par,cs}$ and $P_{ctn}$, each ranging from 0.01 to 0.1, for $f_{seq} = 0$.]
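
As a worked instance of this limit, take $f_{seq} = 0$ and the two corner values of the plotted ranges (0.01 and 0.1 for both $f_{par,cs}$ and $P_{ctn}$):

```latex
\begin{align*}
f_{par,cs} = P_{ctn} = 0.01: \quad & \lim_{n \to \infty} S = \frac{1}{0 + 0.01 \cdot 0.01} = 10\,000 \\
f_{par,cs} = P_{ctn} = 0.1:  \quad & \lim_{n \to \infty} S = \frac{1}{0 + 0.1 \cdot 0.1} = 100
\end{align*}
```

Even with no sequential code at all, a tenfold increase in both the critical-section fraction and the contention probability lowers the speedup ceiling by a factor of 100.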

  15. What are the implications for multicore design?

  16. Amdahl’s Law suggests wimpy small cores in asymmetric multicore: $S = \frac{1}{\frac{1 - f}{p} + \frac{f}{n + p}}$. The $n$ term gives linear speedup with an increasing number of small cores; the $p$ term gives sublinear speedup in single-thread performance (Pollack’s law). [M. Hill and M. Marty, IEEE Computer, 2008]
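
The trade-off between one big core and many small cores can be explored with a short sweep. This is a sketch under explicit assumptions that are not stated on this slide: a fixed budget of base core equivalents (BCEs), small cores of 1 BCE with performance 1, and a big core built from `r` BCEs with performance `sqrt(r)` (the usual reading of Pollack's law). The function names are illustrative.

```python
from math import sqrt


def asymmetric_speedup(f, p, n):
    """Speedup with one big core of performance p and n small cores of performance 1."""
    return 1.0 / ((1.0 - f) / p + f / (n + p))


def best_big_core(f, budget=256):
    """Sweep the big-core size r (in BCEs) and return (r, speedup) with the highest speedup.

    Assumptions: performance of the big core is sqrt(r); the remaining
    budget - r BCEs are each used as one small core.
    """
    best = None
    for r in range(1, budget + 1):
        s = asymmetric_speedup(f, sqrt(r), budget - r)
        if best is None or s > best[1]:
            best = (r, s)
    return best


if __name__ == "__main__":
    for f in (0.5, 0.9, 0.99, 0.999):
        print(f, best_big_core(f))
```

Because the small cores contribute linearly and the big core only as a square root, this model without critical sections favors spending most of the budget on many minimal cores whenever `f` is large.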

  17. Critical sections have a big impact on asymmetric multicore performance: $\lim_{n \to \infty} S = \frac{1}{\frac{f_{seq}}{p} + f_{par,cs} \cdot P_{ctn}}$. The sequential part ($\frac{f_{seq}}{p}$) is executed on the big core; the sequential part due to critical sections ($f_{par,cs} \cdot P_{ctn}$) is executed on the small cores.
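
The limit shows why a faster big core alone cannot remove the bottleneck: only the first term of the denominator depends on $p$. A minimal sketch, with an illustrative function name and hypothetical parameter values:

```python
def asymmetric_limit(f_seq, f_par_cs, p_ctn, p):
    """Limiting speedup (n -> infinity) with one big core of performance p."""
    return 1.0 / (f_seq / p + f_par_cs * p_ctn)


if __name__ == "__main__":
    # Hypothetical workload: 1% sequential code, 10% critical sections, 10% contention.
    for p in (1, 2, 4, 8, 16):
        print(p, asymmetric_limit(0.01, 0.1, 0.1, p))
```

However large `p` becomes, the result stays below `1 / (f_par_cs * p_ctn)`, because the contending critical sections run on the small cores.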

  18. Implication: small cores in asymmetric multicore should not be wimpy but middle-of-the-road. Intuition: small cores should be sufficiently large to execute critical sections quickly. Setup: 256 BCEs (base core equivalents), as in Hill & Marty.

  19. Asymmetric vs symmetric multicores

  20. Accelerating Critical Sections (ACS) by Suleman et al. [ASPLOS’09]
      • Execute critical sections on big core
      • Naive ACS
        – Accelerate all critical sections
      • Perfect ACS
        – Accelerate contending critical sections only
      • Selective ACS
        – Predict whether critical sections will contend
        – Mitigate false serialization

  21. Evaluating ACS

  22. Conclusions
      • Model impact of critical sections in Amdahl’s Law
      • Theoretical result
        – Parallel performance is fundamentally limited by critical sections
      • Implications for multicore design
        – Small cores in asymmetric multicore should not be wimpy but middle-of-the-road
        – Symmetric multicores may yield better performance than asymmetric multicores (w/ wimpy small cores)
        – Accelerating critical sections is a promising idea: ACS, DVFS, SMT, scalable cores
      • Long live microarchitecture!

  23. Modeling Critical Sections in Amdahl’s Law and its Implications for Multicore Design. Stijn Eyerman and Lieven Eeckhout, Ghent University, Belgium. Thank you!
