Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors. Shoaib Akram, Jennifer B. Sartor, Kenzo Van Craeynest, Wim Heirman, Lieven Eeckhout. Ghent University, Belgium. Shoaib.Akram@UGent.be
Popularity of Managed Languages. Source: The 2015 Top Ten Programming Languages, spectrum.ieee.org
The Garbage Collection Advantage • Memory is automatically reclaimed for reuse • Takes extra CPU cycles to provide the service • Concurrent collectors are suited to multicores
Heterogeneous Multicores [Figure: performance versus power; big cores are out-of-order, LITTLE cores are in-order] • 600 Series: 4x ARM Cortex A72 (big), 4x ARM Cortex A53 (LITTLE) • Exynos 8890: 4x Exynos M1 (big), 4x ARM Cortex A53 (LITTLE)
Managed Language Applications on Heterogeneous Multicores [Figure: performance versus power; big is out-of-order, LITTLE is in-order] • Application → big • Garbage collector → big or LITTLE?
GC on big versus LITTLE • Application and collector run concurrently • Application (on big): allocates objects on the heap • Collector (on big or LITTLE): identifies live objects on the heap and then reclaims the memory taken up by the remaining objects • Run the collector on big versus LITTLE and measure the difference in execution time
GC on big versus LITTLE [Figure: % increase in execution time when the collector runs on LITTLE instead of big, for GC-Critical and GC-Uncritical applications] • Some applications exhibit GC-Criticality • GC on LITTLE is detrimental for GC-Critical applications
GC on big versus LITTLE: What happens if GC runs on LITTLE for GC-Critical apps? • Application: allocates objects on the heap (Paused!!!) • Serial collection • Collector: identifies live objects on the heap and then reclaims the memory taken up by the remaining objects • The application is paused if there is no free memory on the heap because the collector is still running
Giving GC Fair Share of Big Core • gc-fair – Equally share the big core among all threads – Based on Van Craeynest et al. [PACT 2013] • Baseline is gc-on-LITTLE – Pin the GC threads on LITTLE cores • Observe the % reduction in execution time
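As a rough illustration of the two policies on this slide, the sketch below builds a quantum-by-quantum schedule for a single big core. The round-robin rotation, the thread names, and all function names are assumptions made for this example; the slide only states that gc-fair shares the big core equally among all threads and that gc-on-LITTLE pins the GC threads on LITTLE cores.

```python
from collections import Counter


def gc_fair_schedule(app_threads, gc_threads, num_quanta):
    """Rotate application and GC threads over one big core, one quantum each.

    Round-robin is an assumed way of sharing the core equally.
    """
    threads = list(app_threads) + list(gc_threads)
    return [threads[q % len(threads)] for q in range(num_quanta)]


def gc_on_little_schedule(app_threads, gc_threads, num_quanta):
    """GC threads stay pinned on LITTLE cores, so they never appear here.

    gc_threads is accepted only to keep the two signatures symmetric.
    """
    threads = list(app_threads)
    return [threads[q % len(threads)] for q in range(num_quanta)]


def percent_reduction(baseline_time, new_time):
    """% execution time reduction relative to the gc-on-LITTLE baseline."""
    return 100.0 * (baseline_time - new_time) / baseline_time


if __name__ == "__main__":
    # Hypothetical thread names: one application thread, two GC threads.
    print(Counter(gc_fair_schedule(["app0"], ["gc0", "gc1"], 6)))
    print(Counter(gc_on_little_schedule(["app0"], ["gc0", "gc1"], 6)))
```

A positive value from `percent_reduction` means gc-fair ran faster than the baseline; a negative value means it ran slower.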
Giving GC Fair Share of Big Core [Figure: % execution time reduction of gc-fair relative to gc-on-LITTLE with 1, 2, and 3 LITTLE cores, for GC-Uncritical and GC-Critical applications] • gc-on-LITTLE for GC-Uncritical • gc-fair for GC-Critical • GC-Criticality depends on architecture, application, and runtime environment
Our Contribution • GC-Criticality-Aware Scheduler • Dynamically adjusts # big core cycles given to the concurrent collector • GC-Criticality depends on architecture, application, and runtime environment
GC-Criticality-Aware Scheduler: gc-on-LITTLE to gc-fair [Figure: timeline of runtime activity (app, gc) and how the scheduler reacts, with phases App alone, Stop, Concurrent, Scan; the JVM signals the scheduler and the scheduler state moves from gc-on-LITTLE to gc-fair] • Stop pause to do book-keeping is ignored • Scan stop pause: JVM signals the scheduler • gc-fair gives equal priority to GC and app
GC-Criticality-Aware Scheduler: Boost States
• Stop scan pauses are observed even with gc-fair
Scheduler state: how many quanta are scheduled on the big core?
  gc-on-LITTLE: first GC thread = 0, second GC thread = 0
  gc-fair: first GC thread = 1, second GC thread = 1
• Boost the priority of garbage: give GC more consecutive quanta on big
  gc-boost P0: first GC thread = 1, second GC thread = 1
  gc-boost P1: first GC thread = 1, second GC thread = 2
  …
• Degrade the boost state when no longer critical
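The state table on this slide can be written down as a small lookup, shown here as a minimal sketch. The dictionary and function names are hypothetical. Only the four states that the slide spells out are included; the boost states elided by "…" are deliberately left out rather than guessed.

```python
# Quanta scheduled on the big core, as (first GC thread, second GC thread).
# Values are copied from the slide; states beyond gc-boost P1 are elided
# there and are not filled in here.
BIG_CORE_QUANTA = {
    "gc-on-LITTLE": (0, 0),
    "gc-fair": (1, 1),
    "gc-boost:P0": (1, 1),
    "gc-boost:P1": (1, 2),
}


def big_core_quanta(state):
    """Return the per-GC-thread big-core quanta for a scheduler state."""
    if state not in BIG_CORE_QUANTA:
        raise KeyError(f"no quanta listed for state {state!r}")
    return BIG_CORE_QUANTA[state]


def total_gc_quanta(state):
    """Total big-core quanta given to both GC threads in a state."""
    return sum(big_core_quanta(state))


if __name__ == "__main__":
    for name in BIG_CORE_QUANTA:
        print(name, big_core_quanta(name), total_gc_quanta(name))
```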
GC-Criticality-Aware Scheduler: gc-boost:P0 to gc-on-LITTLE [Figure: timeline of runtime activity (app, gc) with phases App alone, Stop, Concurrent, App alone; the JVM signals the scheduler and the scheduler state moves from gc-boost:P0 to gc-on-LITTLE] • If there is no scan pause in state P0, go to gc-on-LITTLE • Can configure # zero stop scan intervals before returning to gc-on-LITTLE
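The escalate and degrade behaviour described on these slides can be sketched as a small state machine. This is an illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, an "interval" is taken to be whatever period the JVM reports on, escalation is assumed to move one state at a time, and gc-boost:P1 is assumed to degrade to gc-boost:P0. The slides confirm only that a scan stop pause moves gc-on-LITTLE to gc-fair, that pauses under gc-fair lead to boosting, and that gc-boost:P0 returns to gc-on-LITTLE after a configurable number of intervals without a scan pause.

```python
class GcCriticalityScheduler:
    """Toy model of the scheduler state transitions (names are hypothetical)."""

    # Ordered from least to most big-core time for GC. Only states named
    # on the slides are modelled; further boost states are elided there.
    STATES = ["gc-on-LITTLE", "gc-fair", "gc-boost:P0", "gc-boost:P1"]

    def __init__(self, zero_pause_intervals_to_degrade=1):
        self.state = "gc-on-LITTLE"
        self.threshold = zero_pause_intervals_to_degrade
        self.quiet_intervals = 0

    def end_of_interval(self, scan_pause_seen):
        """Update the state from the JVM's signal for the last interval."""
        idx = self.STATES.index(self.state)
        if scan_pause_seen:
            # GC is critical: escalate one step (assumed), capped at P1.
            self.quiet_intervals = 0
            idx = min(idx + 1, len(self.STATES) - 1)
        else:
            self.quiet_intervals += 1
            if self.quiet_intervals >= self.threshold:
                self.quiet_intervals = 0
                if self.state in ("gc-fair", "gc-boost:P0"):
                    # Slide: no scan pause in P0 -> gc-on-LITTLE.
                    # gc-fair -> gc-on-LITTLE is an assumption.
                    idx = 0
                else:
                    # Assumed: higher boost states step down one level.
                    idx = max(idx - 1, 0)
        self.state = self.STATES[idx]
        return self.state


if __name__ == "__main__":
    sched = GcCriticalityScheduler(zero_pause_intervals_to_degrade=2)
    for seen in [True, True, True, False, False, False, False]:
        print(seen, sched.end_of_interval(seen))
```

A larger threshold keeps the collector boosted for longer after the last scan pause, trading big-core cycles for fewer repeated escalations.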
GC-Criticality-Aware Scheduler: Summary • JVM detects GC-Criticality at runtime • Communicates criticality information down to the scheduler • Scheduler dynamically adapts big core cycles given to GC
Experimental Setup • Java Virtual Machine – Jikes Research Virtual Machine (Version 3.1.2) – Full-heap concurrent collector with two threads – Tackle non-determinism by warming up the JVM – Heap size 2x the minimum • Benchmarks – Ten benchmarks from DaCapo – Vary the # threads from 1 to 4 • Heterogeneous Multicore Setup – Sniper multicore simulator (Version 4.0) – Different four-core heterogeneous architectures – Varying # of big and LITTLE cores
Performance of GC-Criticality-Aware Scheduler: 3 big plus one LITTLE core [Figure: % execution time reduction for gc-fair and gc-boost, for GC-Uncritical and GC-Critical applications] • gc-boost is performance neutral for GC-Uncritical • Improves performance of GC-Critical by 14% on average
Understanding the Performance Advantage of Big Core [Figure: cycles per instruction for the application and the collector on LITTLE and big, broken down into Base, L1-I, L1-D Miss, L2 Miss, and L3 Miss] • Collector performs a heap traversal chasing pointers • Instruction-level parallelism ☺ • Memory-level parallelism ☹
Performance of GC-Criticality-Aware Scheduler: Lowering frequency of LITTLE core [Figure: % execution time reduction with the LITTLE core at similar frequency and at 1 GHz slower, for GC-Uncritical and GC-Critical applications] • Lowering frequency increases GC-Criticality • Improves performance of GC-Critical by 20% on average
Performance of GC-Criticality-Aware Scheduler: Different # LITTLE cores [Figure: % execution time reduction for GC-Uncritical and GC-Critical applications with 1, 2, and 3 LITTLE cores (1L, 2L, 3L)] • Allocation rate lowers with more LITTLE cores • gc-boost is beneficial for different # LITTLE cores
Energy Efficiency of GC-Criticality-Aware Scheduler: 3 big plus one LITTLE core [Figure: % reduction in energy-delay product (EDP), for GC-Uncritical and GC-Critical applications] • Negligible change in EDP for GC-Uncritical • 20% average reduction in EDP for GC-Critical
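For readers unfamiliar with the metric, energy-delay product is energy consumed multiplied by execution time, so it rewards configurations that are both faster and more frugal. The sketch below shows how a % reduction in EDP could be computed. The function names and the numbers in the usage block are hypothetical and are not measurements from the talk.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """EDP = energy x execution time (lower is better)."""
    return energy_joules * delay_seconds


def percent_edp_reduction(base_energy, base_delay, new_energy, new_delay):
    """% reduction in EDP of a new configuration relative to a baseline."""
    base = energy_delay_product(base_energy, base_delay)
    new = energy_delay_product(new_energy, new_delay)
    return 100.0 * (base - new) / base


if __name__ == "__main__":
    # Made-up inputs: same energy, shorter run time under the new scheduler.
    print(percent_edp_reduction(10.0, 2.0, 10.0, 1.6))
```

Because EDP multiplies the two quantities, a scheduler that shortens execution time without raising energy reduces EDP by the same percentage as the time saved.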
More in the Paper • Sensitivity studies – Varying number of total cores – Scheduling quantum and # zero scan intervals – Heap size • GC-Criticality using OpenJDK’s collector
Conclusions • Concurrent garbage collection benefits from out-of-order execution • Java applications that allocate rapidly exhibit GC-Criticality • GC-Criticality-Aware scheduler adjusts big core cycles given to GC on a heterogeneous multicore – Uses information provided by the JVM – Improves both performance and energy efficiency
Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors. Thank You! Shoaib.Akram@UGent.be http://users.elis.ugent.be/~sakram
GC-Criticality with OpenJDK’s CMS [Figure: % increase in execution time]