Pharo VM performance

Clément Béra


  1. Pharo VM performance • Clément Béra

  2. Myself • Clément Béra • 2011-2013: Engineer on the Pharo VM • 2013-2017: PhD student • Optimisations of the Pharo VM JIT compiler

  3. Binary tree benchmark • [Bar chart, vertical axis from 0 to 16] • Interpreter (2005) • Stack (2009) • Cog V1 (2010) • Cog V2 (2011) • Spur (2014) • Sista (future)

  4. Binary tree benchmark • [Bar chart, vertical axis from 0 to 16, annotated "Pharo 5, 2016"] • Interpreter (2005) • Stack (2009) • Cog V1 (2010) • Cog V2 (2011) • Spur (2014) • Sista (future)

  5. Plan • Pharo 5 (stable) • First time we outperformed most competitors on benchmarks • Pharo 6 (released next week ???) • Pharo 7

  6. Code execution GC

  7. GC • Pharo 5: new memory manager (Spur) • Pharo 6: new compactor • Pharo 7: incremental GC ???

  8. Pharo 5: Spur • Efficient scavenges • In most applications, most GC time is now in scavenges • [Diagram: code execution and GC]

  9. Pharo 6: New compactor • Loading a 200 MB Moose model in a 250 MB image (February vs. April) • Total time: 2 min vs. 1 min 2 sec • Time in full GC: 1 min vs. 2 sec • Full GC average pause: 15 sec vs. 0.5 sec • Time in scavenge: 15 sec vs. 15 sec

  10. Pharo 6: New compactor • Loading a 200 MB Moose model in a 250 MB image (February vs. April) • Total time: 2 min vs. 1 min 2 sec • Time in full GC: 1 min vs. 2 sec • Full GC average pause: 15 sec vs. 0.5 sec • Time in scavenge: 15 sec vs. 15 sec (GC tuning gets it down to 5 sec)

  11. Pharo 7: Incremental GC ?? • Full GC pauses: ~500 ms at ~500 MB • Java default GC: 200 ms, soft real time • Solution • Incremental marking • Incremental compaction

  12. Code execution • Pharo 5: Spur got 1.8x • Pharo 6: polishing and micro-optimisations • Pharo 7: Sista gets 1.5x-5x

  13. Pharo 5: Spur 1.8x • Class table speeds up look-up caches • New immediate objects • 22-bit hash

  14. Pharo 6 • Register allocation improvements • Two-path compilation • Frameless code for setter-like methods

  15. Sista: Pharo 7? • Program introspection • Speculate on types based on previous runs • Optimize frequently used code • Deoptimize and reoptimize incorrectly speculated code

  16. Goals • Program readability • Performance

  17. Program readability array do: #yourself. array do: [ :elem | elem yourself ]. 1 to: array size do: [ :i | (array at: i) yourself ].

  18. Program readability • [Table, column headers: 0, 2, 5, 20] • array do: #yourself. 87M/sec, 28M/sec, 13M/sec, 3.7M/sec • array do: [ :elem | elem yourself ]. 15M/sec, 21M/sec, 10M/sec, 3.9M/sec • 1 to: array size do: [ :i | (array at: i) yourself ]. 94M/sec, 40M/sec, 22M/sec, 6.5M/sec

  19. Performance • [Bar chart comparing Sista and Pharo, horizontal axis from 0 to 6] • Benchmarks: Kmeans, TCAP, Richards, DeltaBlue, BinaryTree, JSJSON, SpectralNorm, ThreadRing, A*

  20. Getting stable • Supports most development workflows • Supports image recompilation • Integration has started

  21. In-image design • Smalltalk image: Scorch (CompiledCode to CompiledCode, Smalltalk-specific optimisations); CompiledCode is persisted across start-ups • Virtual machine: Cogit (CompiledCode to native code, machine-specific optimisations); native functions are discarded on shut-down • Diagram labels: Baseline JIT, Optimising JIT

  22. Missing • IDE support • Debugger • Methods to show • Stability, testing

  23. Are you interested? • Incremental GC? • VM performance? • VM features? • Come and talk to us!

  24. We are looking for… • Use-cases showing what to improve • Large real-world benchmarks • Contributors • Investment

  25. Conclusion • Pharo 5: Fastest VM • Pharo 6: Polishing • Pharo 7: Going further
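
The three iteration styles compared on slides 17 and 18 can be timed with a short Playground snippet. This is a minimal sketch, not the presenter's harness: the array size of 20, the fill value, and the choice of `bench` as the timing method are assumptions made for illustration, and absolute numbers will depend on the VM and machine.

```smalltalk
"Sketch only: time the three iteration styles from slides 17 and 18.
 The array size (20) and fill value (42) are assumed, not from the talk."
| array results |
array := Array new: 20 withAll: 42.
results := OrderedCollection new.

"Symbol used as a one-argument block"
results add: 'do: #yourself' -> [ array do: #yourself ] bench.

"Explicit block"
results add: 'do: [ :elem | ... ]' -> [ array do: [ :elem | elem yourself ] ] bench.

"Index-based loop"
results add: '1 to: size do:' -> [ 1 to: array size do: [ :i | (array at: i) yourself ] ] bench.

"Print one line per variant on the Transcript"
results do: [ :each |
	Transcript
		show: each key;
		show: ': ';
		show: each value asString;
		cr ].
```

Each `bench` call runs its block repeatedly for a few seconds, so the whole snippet takes a while to finish. Changing the size passed to `Array new:withAll:` shows how the gap between the three styles varies with the number of elements.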
