
Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors



  1. Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors
     Shoaib Akram, Jennifer B. Sartor, Kenzo Van Craeynest, Wim Heirman, Lieven Eeckhout
     Ghent University, Belgium
     Shoaib.Akram@UGent.be

  2. Popularity of Managed Languages
     The 2015 Top Ten Programming Languages, spectrum.ieee.org

  3. The Garbage Collection Advantage
     • Memory is automatically reclaimed for reuse
     • Providing this service takes extra CPU cycles
     • Concurrent collectors are suited to multicores

  4. Heterogeneous Multicores
     [Diagram: performance versus power; big cores are out-of-order, LITTLE cores are in-order.]
     • 600 Series: 4x ARM Cortex A72 (big) + 4x ARM Cortex A53 (LITTLE)
     • Exynos 8890: 4x Exynos M1 (big) + 4x ARM Cortex A53 (LITTLE)

  5. Managed Language Applications on Heterogeneous Multicores
     [Diagram: performance versus power; big out-of-order cores and LITTLE in-order cores.]
     • Application → big
     • Garbage collector → big or LITTLE?

  6. GC on big versus LITTLE
     The application and the collector run concurrently.
     • Application (on big): allocates objects on the heap
     • Collector (on big or LITTLE): identifies live objects on the heap, then reclaims the memory taken up by the remaining objects
     • Experiment: run the collector on big versus LITTLE and measure the difference in execution time

  7. GC on big versus LITTLE
     [Bar chart: % increase in execution time (y-axis, 0 to 20) when the collector runs on LITTLE instead of big.]

  9. GC on big versus LITTLE
     [Bar chart: % increase in execution time (y-axis, 0 to 20); benchmarks grouped into GC-Critical and GC-Uncritical.]
     • Some applications exhibit GC-Criticality
     • Running GC on LITTLE is detrimental for GC-Critical applications

  10. GC on big versus LITTLE
      What happens if GC runs on LITTLE for GC-Critical apps?
      • Application: allocates objects on the heap
      • Collector: identifies live objects on the heap, then reclaims the memory taken up by the remaining objects
      • The application is paused if there is no free memory on the heap because the collector is still running
      [Timeline: the application is paused while the collector finishes, so collection becomes serial.]
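
The pause described on this slide can be illustrated with a toy model. The sketch below is not from the talk: the function name `paused_ticks`, the tick-based timing, and every number are hypothetical, chosen only to show how a slower collector can leave a fast-allocating application without free heap.

```python
def paused_ticks(free_heap, alloc_per_tick, gc_cycle_ticks):
    """Ticks the application stalls during one concurrent GC cycle.

    Toy model (assumption): the application allocates at a constant rate,
    and no memory is returned until the collector finishes its cycle.
    """
    ticks_until_full = free_heap // alloc_per_tick
    return max(0, gc_cycle_ticks - ticks_until_full)


# Hypothetical numbers: the same GC cycle takes twice as long on LITTLE.
GC_TICKS_BIG, GC_TICKS_LITTLE = 8, 16

fast_allocator = dict(free_heap=100, alloc_per_tick=10)  # heap full after 10 ticks
slow_allocator = dict(free_heap=100, alloc_per_tick=5)   # heap full after 20 ticks

for name, app in [("fast", fast_allocator), ("slow", slow_allocator)]:
    on_big = paused_ticks(gc_cycle_ticks=GC_TICKS_BIG, **app)
    on_little = paused_ticks(gc_cycle_ticks=GC_TICKS_LITTLE, **app)
    print(f"{name} allocator: paused {on_big} ticks with GC on big, "
          f"{on_little} ticks with GC on LITTLE")
```

In this model only the fast allocator stalls when the collector moves to LITTLE, which mirrors the distinction between GC-Critical and GC-Uncritical applications.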

  11. Giving GC Fair Share of Big Core
      • gc-fair
        – Equally share the big core among all threads
        – Based on Van Craeynest et al. [PACT 2013]
      • Baseline is gc-on-LITTLE
        – Pin the GC threads on LITTLE cores
      • Observe the % reduction in execution time
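
The difference between the two policies can be sketched as a rotation over scheduling quanta. This is a simplified illustration, not the algorithm from the cited PACT 2013 paper: it assumes a single big core, plain round-robin rotation, and hypothetical thread names in which GC threads start with `gc`.

```python
from collections import Counter
from itertools import cycle, islice


def big_core_schedule(threads, quanta, pin_gc_on_little):
    """Return which thread holds the single big core in each quantum.

    Assumption: threads take turns in round-robin order. With
    pin_gc_on_little=True, GC threads never enter the rotation.
    """
    eligible = [t for t in threads
                if not (pin_gc_on_little and t.startswith("gc"))]
    return list(islice(cycle(eligible), quanta))


threads = ["app0", "app1", "gc0", "gc1"]  # hypothetical thread names

# gc-fair: every thread, GC included, gets the same number of big-core quanta.
fair = Counter(big_core_schedule(threads, 8, pin_gc_on_little=False))

# gc-on-LITTLE: only application threads rotate through the big core.
pinned = Counter(big_core_schedule(threads, 8, pin_gc_on_little=True))

print("gc-fair:     ", dict(fair))
print("gc-on-LITTLE:", dict(pinned))
```

Under gc-fair the application threads give up half of their big-core quanta in this example, which is why the policy only pays off when the collector is on the critical path.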

  12. Giving GC Fair Share of Big Core
      [Bar chart: % execution time reduction with gc-fair relative to gc-on-LITTLE (y-axis, -15 to 25); series for 1, 2, and 3 LITTLE cores; GC-Uncritical benchmarks.]

  14. Giving GC Fair Share of Big Core
      [Same chart as slide 12: % execution time reduction (y-axis, -15 to 25); series for 1, 2, and 3 LITTLE cores; GC-Uncritical benchmarks.]
      • Use gc-on-LITTLE for GC-Uncritical applications

  15. Giving GC Fair Share of Big Core
      [Same chart as slide 12, now with GC-Critical benchmarks added: % execution time reduction (y-axis, -15 to 25); series for 1, 2, and 3 LITTLE cores.]
      • Use gc-on-LITTLE for GC-Uncritical applications

  17. Giving GC Fair Share of Big Core
      [Same chart as slide 12, covering GC-Uncritical and GC-Critical benchmarks: % execution time reduction (y-axis, -15 to 25); series for 1, 2, and 3 LITTLE cores.]
      • Use gc-on-LITTLE for GC-Uncritical applications
      • Use gc-fair for GC-Critical applications

  18. Giving GC Fair Share of Big Core
      [Same chart as slide 12, covering GC-Uncritical and GC-Critical benchmarks: % execution time reduction (y-axis, -15 to 25); series for 1, 2, and 3 LITTLE cores.]
      • GC-Criticality depends on architecture, application, and runtime environment

  19. Our Contribution
      [Same chart as slide 12, covering GC-Uncritical and GC-Critical benchmarks: % execution time reduction (y-axis, -15 to 25); series for 1, 2, and 3 LITTLE cores.]
      • GC-Criticality-Aware Scheduler: dynamically adjusts the number of big core cycles given to the concurrent collector
      • GC-Criticality depends on architecture, application, and runtime environment

  20. GC-Criticality-Aware Scheduler
      Runtime activity → how does the scheduler react?
      [Timeline (time axis; rows: app, gc, Schd.): while the app runs alone, the scheduler state is gc-on-LITTLE.]

  21. GC-Criticality-Aware Scheduler
      gc-on-LITTLE to gc-fair
      [Timeline (time axis; rows: app, gc, Schd.): while the app runs alone, the scheduler state is gc-on-LITTLE.]

  22. GC-Criticality-Aware Scheduler
      gc-on-LITTLE to gc-fair
      [Timeline (time axis; rows: app, gc, Schd.): app alone, stop, concurrent, scan; the JVM signals the scheduler and the scheduler state moves from gc-on-LITTLE to gc-fair.]
      • The stop pause to do book-keeping is ignored
      • Scan stop pause: the JVM signals the scheduler
      • gc-fair gives equal priority to GC and the application

  23. GC-Criticality-Aware Scheduler
      Boost States
      • Stop scan pauses are observed even with gc-fair
      • How many quanta are scheduled on the big core in each scheduler state?
        – gc-on-LITTLE: first GC thread = 0, second GC thread = 0
        – gc-fair: first GC thread = 1, second GC thread = 1
      • Boost the priority of garbage: give GC more consecutive quanta on big
        – gc-boost P0: first GC thread = 1, second GC thread = 1
        – gc-boost P1: first GC thread = 1, second GC thread = 2
        – …
      • Degrade the boost state when GC is no longer critical

  24. GC-Criticality-Aware Scheduler
      gc-boost:P0 to gc-on-LITTLE
      [Timeline (time axis; rows: app, gc, Schd.): app alone, stop, concurrent, app alone; the JVM signals the scheduler and the scheduler state moves from gc-boost:P0 to gc-on-LITTLE.]
      • If no scan pause occurs in state P0, go to gc-on-LITTLE
      • The number of intervals with zero stop scan pauses before returning to gc-on-LITTLE is configurable
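
The state changes on slides 22 to 24 can be sketched as a small state machine. This is an illustrative reading of the slides, not the authors' implementation. The class and field names are hypothetical, and two behaviours are assumptions: each signalled scan pause raises the state by one level, and gc-boost:P1 degrades to gc-boost:P0 rather than straight to gc-on-LITTLE. Boost states beyond P1 are elided on the slide and are not modelled.

```python
from dataclasses import dataclass

# Big-core quanta per GC thread (first, second), as listed on slide 23.
QUANTA = {
    "gc-on-LITTLE": (0, 0),
    "gc-fair": (1, 1),
    "gc-boost:P0": (1, 1),
    "gc-boost:P1": (1, 2),
}
ORDER = list(QUANTA)  # escalation order (assumed): one level per scan pause


@dataclass
class CriticalityScheduler:
    # Intervals without a scan pause before degrading (configurable per
    # slide 24; the default of 1 is a hypothetical choice).
    zero_intervals_before_reset: int = 1
    state: str = "gc-on-LITTLE"
    calm_intervals: int = 0

    def end_of_interval(self, scan_pause_seen: bool) -> str:
        """Update the state from the JVM's signal for the last interval."""
        idx = ORDER.index(self.state)
        if scan_pause_seen:
            # GC is critical: move up one level, capped at the top state.
            self.calm_intervals = 0
            self.state = ORDER[min(idx + 1, len(ORDER) - 1)]
        else:
            self.calm_intervals += 1
            if self.calm_intervals >= self.zero_intervals_before_reset:
                self.calm_intervals = 0
                if self.state == "gc-boost:P1":
                    self.state = "gc-boost:P0"   # assumed one-step degrade
                else:
                    self.state = "gc-on-LITTLE"  # slide 24: P0 to gc-on-LITTLE
        return self.state


sched = CriticalityScheduler()
for pause in [True, True, False]:
    state = sched.end_of_interval(pause)
    print(state, QUANTA[state])
```

The example run escalates from gc-on-LITTLE through gc-fair to gc-boost:P0 on two consecutive scan pauses, then returns to gc-on-LITTLE after one interval without a pause.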

  25. GC-Criticality-Aware Scheduler
      Summary
      • The JVM detects GC-Criticality at runtime
      • It communicates criticality information down to the scheduler
      • The scheduler dynamically adapts the big core cycles given to GC

  26. Experimental Setup
      • Java Virtual Machine
        – Jikes Research Virtual Machine (Version 3.1.2)
        – Full-heap concurrent collector with two threads
        – Tackle non-determinism by warming up the JVM
        – Heap size is 2x the minimum
      • Benchmarks
        – Ten benchmarks from DaCapo
        – Vary the number of threads from 1 to 4
      • Heterogeneous Multicore Setup
        – Sniper multicore simulator (Version 4.0)
        – Different four-core heterogeneous architectures
        – Varying number of big and LITTLE cores

  27. Performance of GC-Criticality-Aware Scheduler
      3 big plus one LITTLE core
      [Bar chart: % execution time reduction (y-axis, -20 to 25) for GC-Uncritical and GC-Critical benchmarks; series: gc-fair.]

  28. Performance of GC-Criticality-Aware Scheduler
      3 big plus one LITTLE core
      [Same chart; series: gc-fair and gc-boost.]
      • gc-boost is performance neutral for GC-Uncritical applications

  29. Performance of GC-Criticality-Aware Scheduler
      3 big plus one LITTLE core
      [Same chart; series: gc-fair and gc-boost.]
      • gc-boost is performance neutral for GC-Uncritical applications
      • It improves the performance of GC-Critical applications by 14% on average

  30. Understanding the Performance Advantage of Big Core
      [Stacked bar chart: cycles per instruction (y-axis, 0 to 1.2) for Application and Collector, broken down into Base, L1-I, L1-D Miss, L2 Miss, and L3 Miss.]

  31. Understanding the Performance Advantage of Big Core
      [Same chart, with bars for the LITTLE core.]
      • The collector performs a heap traversal, chasing pointers

  32. Understanding the Performance Advantage of Big Core
      [Same chart, with bars for the LITTLE and big cores.]
      • The collector performs a heap traversal, chasing pointers
      • Instruction-level parallelism ☺, memory-level parallelism ☹

  33. Performance of GC-Criticality-Aware Scheduler
      Lowering frequency of LITTLE core
      [Bar chart: % execution time reduction (y-axis, -20 to 25) for GC-Uncritical and GC-Critical benchmarks; series: similar frequency.]

  34. Performance of GC-Criticality-Aware Scheduler
      Lowering frequency of LITTLE core
      [Same chart; series: similar frequency and 1 GHz slower.]
      • Lowering the frequency increases GC-Criticality

  35. Performance of GC-Criticality-Aware Scheduler
      Lowering frequency of LITTLE core
      [Same chart; series: similar frequency and 1 GHz slower.]
      • Lowering the frequency increases GC-Criticality
      • The scheduler improves the performance of GC-Critical applications by 20% on average

  36. Performance of GC-Criticality-Aware Scheduler
      Different # LITTLE cores
      [Bar chart: % execution time reduction (y-axis, -5 to 15) for GC-Uncritical and GC-Critical benchmarks with 1, 2, and 3 LITTLE cores (1L, 2L, 3L).]
      • The allocation rate drops with more LITTLE cores
      • gc-boost is beneficial across different numbers of LITTLE cores

  37. Energy Efficiency of GC-Criticality-Aware Scheduler
      3 big plus one LITTLE core
      [Bar chart: % reduction in energy-delay product (y-axis, -10 to 25) for GC-Uncritical and GC-Critical benchmarks.]
      • Negligible change in EDP for GC-Uncritical applications
      • 20% average reduction in EDP for GC-Critical applications
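
For readers unfamiliar with the metric, the energy-delay product (EDP) is energy multiplied by execution time, so a lower value is better. The helper below is a minimal sketch with hypothetical function names and made-up numbers that are not results from the talk. It shows that EDP can fall even when energy rises, as long as execution time shrinks enough.

```python
def edp(energy_joules, delay_seconds):
    """Energy-delay product: energy multiplied by execution time."""
    return energy_joules * delay_seconds


def percent_reduction(baseline, new):
    """Reduction of `new` relative to `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


# Hypothetical run: boosting GC costs 10% more energy but saves 20% time.
baseline_edp = edp(energy_joules=10.0, delay_seconds=2.0)
boosted_edp = edp(energy_joules=11.0, delay_seconds=1.6)

print(f"EDP reduction: {percent_reduction(baseline_edp, boosted_edp):.1f}%")
```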

  38. More in the Paper
      • Sensitivity studies
        – Varying number of total cores
        – Scheduling quantum and number of zero scan intervals
        – Heap size
      • GC-Criticality using OpenJDK’s collector

  39. Conclusions
      • Concurrent garbage collection benefits from out-of-order execution
      • Java applications that allocate rapidly exhibit GC-Criticality
      • The GC-Criticality-Aware scheduler adjusts the big core cycles given to GC on a heterogeneous multicore
        – Uses information provided by the JVM
        – Improves both performance and energy efficiency

  40. Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors
      Thank You!
      Shoaib.Akram@UGent.be
      http://users.elis.ugent.be/~sakram

  41. GC Criticality with OpenJDK’s CMS
      [Bar chart: % increase in execution time (y-axis, 0 to 8).]
