
The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors

Austin T. Clements. Thesis advisors: M. Frans Kaashoek, Nickolai Zeldovich, Robert Morris, Eddie Kohler.


  1. The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors. Austin T. Clements. Thesis advisors: M. Frans Kaashoek, Nickolai Zeldovich, Robert Morris, Eddie Kohler

  8. x86 CPU trends, 1985 to 2015 (2005 highlighted). [Plot, log scale from 1 to 100,000: clock speed (MHz), power (watts), cores per socket, total megacycles/sec. Sources: Stanford CPUDB, Intel ARK]

  11. Parallelize or perish. Software must be increasingly parallel to keep up with hardware, but scaling with parallelism is notoriously hard. [Plot: Exim mail server throughput in messages/second (0 to 10k) against cores (1 to 48)] The problem lies in the OS kernel.

  12. OS kernel scalability. Kernel scalability is important: • Many applications depend on the OS kernel • If the kernel doesn't scale, many applications won't scale. And hard: • |kernel threads| > ∑ |application threads| • Diverse and unknown workloads

  17. Current approach to scalable software development. [Timeline, 2008 to 2014: Corey (OSDI '08), Linux scalability (OSDI '10), Bonsai VM (ASPLOS '12), RadixVM (EuroSys '13)] Steps: workload → plot scalability → differential profile → fix top bottleneck.

  19. Current approach to scalable software development. Successful in practice because it focuses developer effort. Disadvantages: • Requires huge amounts of effort • New workloads expose new bottlenecks • More cores expose new bottlenecks • The real bottlenecks may be in the interface design

  23. Interface scalability example: creat("x"), creat("y"), creat("z"). [FD table: stdin, stdout, stderr] Solution: change the interface?
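The constraint behind this example is that POSIX requires a newly opened file to receive the lowest-numbered descriptor not currently open in the process. The following is a minimal Python sketch of that behavior, assuming a single-threaded process on a POSIX-like system; the helper names `create_three` and `reopen` are hypothetical and do not come from the slides.

```python
import os
import tempfile


def create_three(directory):
    """Create files x, y, z in order and return their descriptor numbers."""
    flags = os.O_CREAT | os.O_WRONLY
    return [os.open(os.path.join(directory, name), flags, 0o600)
            for name in ("x", "y", "z")]


def reopen(path):
    """Open, close, and reopen a file; return both descriptor numbers."""
    flags = os.O_CREAT | os.O_WRONLY
    first = os.open(path, flags, 0o600)
    os.close(first)
    # Nothing else was opened or closed in between, so the lowest free
    # descriptor is the one that was just released.
    second = os.open(path, flags, 0o600)
    os.close(second)
    return first, second


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        fds = create_three(d)
        print("descriptors for x, y, z:", fds)
        for fd in fds:
            os.close(fd)
        print("open, close, reopen:", reopen(os.path.join(d, "x")))
```

Because each call must know which descriptors every other call has taken, concurrent `creat` calls in one process cannot proceed independently under this rule.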

  30. Approach: Interface-driven scalability. The scalable commutativity rule: whenever interface operations commute, they can be implemented in a way that scales. [Table, columns "Commutes" and "Scalable implementation exists": creat with lowest FD (creat → 3, creat → 4): commutes ✗. creat with any FD (creat → 42, creat → 17): commutes ✓, so by the rule a scalable implementation exists ✓]
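One way to see the difference between the two interfaces is to model both allocators and compare what each core receives when the call order is swapped. This is an illustrative sketch only: the classes `LowestFdTable` and `AnyFdTable`, the per-core partitioning scheme, and the helper `results_by_core` are assumptions made for the example, not the implementation used in sv6.

```python
class LowestFdTable:
    """Model of the POSIX rule: always return the lowest free descriptor."""

    def __init__(self, in_use=(0, 1, 2)):
        self.in_use = set(in_use)  # shared by all cores

    def alloc(self, core):
        fd = 0
        while fd in self.in_use:
            fd += 1
        self.in_use.add(fd)
        return fd


class AnyFdTable:
    """Model of a relaxed rule: any free descriptor may be returned.

    Each core hands out descriptors from its own disjoint sequence,
    so an allocation touches only that core's state.
    """

    def __init__(self, ncores, first_fd=3):
        self.ncores = ncores
        self.next_fd = [first_fd + core for core in range(ncores)]

    def alloc(self, core):
        fd = self.next_fd[core]
        self.next_fd[core] += self.ncores
        return fd


def results_by_core(table, order):
    """Run one allocation per core in the given order; map core -> fd."""
    return {core: table.alloc(core) for core in order}


if __name__ == "__main__":
    print(results_by_core(LowestFdTable(), (0, 1)))
    print(results_by_core(LowestFdTable(), (1, 0)))
    print(results_by_core(AnyFdTable(2), (0, 1)))
    print(results_by_core(AnyFdTable(2), (1, 0)))
```

In the lowest-FD model the value each core receives depends on which call ran first, while in the any-FD model each core's result is the same in either order, which is what allows an implementation with no shared state.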

  31. Advantages of interface-driven scalability. The rule enables reasoning about scalability throughout the software design process. Design: guides design of scalable interfaces. Implement: sets a clear implementation target. Test: systematic, workload-independent scalability testing.

  32. Contributions. The scalable commutativity rule: • Formalization of the rule and proof of its correctness • State-dependent, interface-based commutativity. Commuter: an automated scalability testing tool. sv6: a scalable POSIX-like kernel.

  33. Outline. Defining the rule: • Definition of scalability • Intuition • Formalization. Applying the rule: • Commuter • Evaluation

  35. A scalability bottleneck. [Plot: normalized throughput (0 to 40) of gmake and Exim against cores (1 to 48); annotation: one contended cache line] A single contended cache line can wreck scalability.

  37. Cost of a contended cache line. [Plot: cycles to read (0 to 3.5k) against 1 writer + N readers (N from 1 to 80); annotation: open]

  41. What scales on today's multicores? [Table of accesses to one cache line by Core X and Core Y, where W is a write, R is a read, and - is no access: W/W ✗, W/R ✗, R/R ✓, and ✓ whenever one core does not access the line] We say two or more operations are scalable if they are conflict-free. Good approximation of current hardware.
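The conflict-free condition above can be written as a small check over the cache lines each operation reads and writes. This is a sketch under the assumption that an operation is summarized by two sets of cache-line identifiers; the `Access` type and the `conflict_free` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Cache lines one core reads and writes while running an operation."""
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflict_free(a, b):
    """True if no line written by one core is read or written by the other."""
    a_touches = a.reads | a.writes
    b_touches = b.reads | b.writes
    return not (a.writes & b_touches) and not (b.writes & a_touches)


if __name__ == "__main__":
    line = "counter"
    writer = Access(writes=frozenset({line}))
    reader = Access(reads=frozenset({line}))
    print("W/W:", conflict_free(writer, writer))
    print("W/R:", conflict_free(writer, reader))
    print("R/R:", conflict_free(reader, reader))
```

Two readers of the same line pass the check, while any pairing that includes a writer of a shared line fails it, matching the table on this slide.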

  42. The intuition behind the rule. Whenever interface operations commute, they can be implemented in a way that scales. Operations commute ⇒ results independent of order ⇒ communication is unnecessary ⇒ without communication, no conflicts.

  47. Example: Reference counter. T1: iszero() → F. T2: iszero() → F. T3: dec() → 2. T4: dec() → 1. T5: dec() → 0. Regions R1 and R2 are marked. ✓ R1 commutes; conflict-free implementation: shared counter. ✗ R2 does not commute because dec() returns counter value.

  48. Example: Reference counter. T1: iszero() → F. T2: iszero() → F. T3: dec() → 2 becomes dec() → ok. T4: dec() → 1 becomes dec() → ok. T5: dec() → 0 becomes dec() → ok. Regions R1 and R2' are marked. ✓ R1 commutes; conflict-free implementation: shared counter. ✗ R2 does not commute because dec() returns counter value.
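The reference-counter example can be replayed with a simplified commutativity check: run two operations in both orders from the same starting count and compare each operation's return value and the final state. This is a sketch for intuition rather than the formal definition from the thesis; the function names and the encoding of an operation as `count -> (new_count, result)` are assumptions made for the example.

```python
def iszero(count):
    """Read-only query; leaves the count unchanged."""
    return count, count == 0


def dec_returning_value(count):
    """Decrement and return the new counter value."""
    return count - 1, count - 1


def dec_returning_ok(count):
    """Decrement and return a constant acknowledgement."""
    return count - 1, "ok"


def commutes(count, op_a, op_b):
    """True if both orders give each operation the same result and state."""
    # Order 1: op_a runs first, then op_b.
    state, a_first = op_a(count)
    state_ab, b_second = op_b(state)
    # Order 2: op_b runs first, then op_a.
    state, b_first = op_b(count)
    state_ba, a_second = op_a(state)
    return (a_first, b_second, state_ab) == (a_second, b_first, state_ba)


if __name__ == "__main__":
    print("iszero, iszero at 3:", commutes(3, iszero, iszero))
    print("dec -> value twice at 3:",
          commutes(3, dec_returning_value, dec_returning_value))
    print("dec -> ok twice at 3:",
          commutes(3, dec_returning_ok, dec_returning_ok))
    print("dec -> ok, iszero at 1:", commutes(1, dec_returning_ok, iszero))
```

The last call is included because commutativity here depends on state: a decrement and an `iszero` give order-independent results while the count stays above zero, but not when the decrement takes it to zero.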
