
Kingsguard: Write-Rationing Garbage Collection for Hybrid Memories

Shoaib Akram (Ghent), Jennifer B. Sartor (Ghent), Kathryn S. McKinley (Google), and Lieven Eeckhout (Ghent). Shoaib.Akram@UGent.be


  1. Kingsguard: Write-Rationing Garbage Collection for Hybrid Memories. Shoaib Akram (Ghent), Jennifer B. Sartor (Ghent), Kathryn S. McKinley (Google), and Lieven Eeckhout (Ghent). Shoaib.Akram@UGent.be

  2. DRAM is facing challenges: scalability, reliability, energy.

  3. Phase change memory is promising: capacity per dollar (GB/$) is a strength. But … latency and endurance are weaknesses. [Figure: PCM cell temperature over time: reset to amorphous, set to crystalline, read.]

  4. Hybrid DRAM-PCM memory. [Figure: DRAM and PCM compared on energy, speed, capacity, and endurance.] Challenge: mitigate PCM wear-out and extend its lifetime.

  5. How to mitigate PCM wear-out? Phase change memory as … [Figure labels: wear leveling (shown three times), operating system, language runtime.]

  6. PCM only with leveling is not practical. 32 GB PCM memory, 32 cores. [Figure: lifetime in years, axis 0 to 2.]

  7. OS to limit PCM writes. [Figure labels: DRAM, PCM.] Drawbacks: coarse grained; page migrations can be costly.

  8. Managed runtime to limit PCM writes. [Figure: nursery, mature, and observer spaces in DRAM; mature space in PCM.] Our work uses garbage collection to keep highly written objects in DRAM.

  9. Distribution of writes in GC runtime. [Figure: mature and nursery spaces, GC.] Nursery: 70% of writes.

  10. Distribution of writes in GC runtime. [Figure: mature and nursery spaces, GC.] Nursery: 70% of writes. Mature: 22% of writes, to 2% of objects.

  11. Contribution: write-rationing garbage collectors. [Figure labels: mature, GC, DRAM, PCM.]

  12. Two write-rationing garbage collectors: Kingsguard-Nursery and Kingsguard-Writers.

  13. Heap organization in DRAM. [Figure: GC over nursery, mature, and large object spaces, all in DRAM.]

  14. KG-N (Kingsguard-Nursery). [Figure labels: GC, nursery, mature, large, DRAM, PCM.]

  15. KG-W (Kingsguard-Writers). [Figure: nursery, mature, large, and observer spaces in DRAM; mature and large spaces in PCM.]

  16. Observing writes. Object format: header, references, primitives. Write barrier sets a header bit on object writes. Write barrier configurations: (a) observe references; (b) observe references and primitives.
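The barrier on slide 16 can be sketched roughly as follows. This is a minimal illustration, not the Kingsguard or Jikes RVM code: the class names, the `WRITTEN_BIT` position, and the `placeSurvivor` policy (written objects stay in DRAM, unwritten ones go to PCM) are assumptions made for the example.

```java
// Hypothetical sketch of a write-observing barrier; all names are illustrative.
final class ObservedObject {
    static final int WRITTEN_BIT = 1;   // assumed position of the "written" flag
    int header;                         // stands in for the object header word
    final Object[] refs;                // reference fields
    final long[] prims;                 // primitive fields

    ObservedObject(int nRefs, int nPrims) {
        refs = new Object[nRefs];
        prims = new long[nPrims];
    }
}

enum Space { DRAM_MATURE, PCM_MATURE }

final class WriteBarrier {
    // The two configurations from the slide: references only, or references and primitives.
    private final boolean observePrimitives;

    WriteBarrier(boolean observePrimitives) {
        this.observePrimitives = observePrimitives;
    }

    // Reference stores are observed in both configurations.
    void writeRef(ObservedObject target, int slot, Object value) {
        target.header |= ObservedObject.WRITTEN_BIT;
        target.refs[slot] = value;
    }

    // Primitive stores are observed only in the second configuration.
    void writePrim(ObservedObject target, int slot, long value) {
        if (observePrimitives) {
            target.header |= ObservedObject.WRITTEN_BIT;
        }
        target.prims[slot] = value;
    }

    static boolean wasWritten(ObservedObject o) {
        return (o.header & ObservedObject.WRITTEN_BIT) != 0;
    }

    // Assumed policy: the collector keeps written objects in DRAM and moves the rest to PCM.
    static Space placeSurvivor(ObservedObject o) {
        return wasWritten(o) ? Space.DRAM_MATURE : Space.PCM_MATURE;
    }
}
```

Observing primitives as well as references catches more writes, at the price of a barrier on every primitive store; the sketch makes that trade-off a constructor flag.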

  17. Additional optimizations in KG-W. Large object optimization: allocate selected large objects in DRAM. Metadata optimization: allocate PCM metadata in DRAM.

  18. Large object optimization. [Figure labels: nursery, large, ½ of remaining nursery.] Monitor PCM write rate to turn the optimization on/off.

  19. Metadata optimization. [Figure labels: mature, meta.] Full-heap GC: mark live PCM objects. KG-W: keep mark bytes of PCM objects in DRAM.

  20. Metadata optimization. [Figure labels: mature, meta.] Full-heap GC: mark live PCM objects. KG-W: keep mark bytes of PCM objects in DRAM. address_mark_bit = start_meta + idx_pcm_obj
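The address calculation on slide 20 can be illustrated with a small side table. This is a sketch under assumptions: a Java `byte[]` stands in for the DRAM metadata region, `PcmMarkTable` and its methods are hypothetical names, and how a PCM object is mapped to its index `idx_pcm_obj` is left to the caller because the slide does not say.

```java
import java.util.Arrays;

// Hypothetical side table: one mark byte per PCM object, held in a DRAM-resident region.
final class PcmMarkTable {
    private final byte[] dramMeta;  // stands in for the DRAM metadata region
    private final int startMeta;    // offset of the first mark byte (start_meta on the slide)

    PcmMarkTable(int startMeta, int maxPcmObjects) {
        this.startMeta = startMeta;
        this.dramMeta = new byte[startMeta + maxPcmObjects];
    }

    // address_mark_bit = start_meta + idx_pcm_obj
    int markAddress(int idxPcmObj) {
        return startMeta + idxPcmObj;
    }

    // Marking a live PCM object writes to DRAM only; the object in PCM is untouched.
    void mark(int idxPcmObj) {
        dramMeta[markAddress(idxPcmObj)] = 1;
    }

    boolean isMarked(int idxPcmObj) {
        return dramMeta[markAddress(idxPcmObj)] != 0;
    }

    // Reset all mark bytes before the next full-heap collection.
    void clearAll() {
        Arrays.fill(dramMeta, startMeta, dramMeta.length, (byte) 0);
    }
}
```

The point of the side table is that a full-heap mark phase, which touches every live object, no longer writes into PCM object headers.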

  21. Evaluation methodology. Hardware: (1) simulator, (2) real. Software: Jikes Research Virtual Machine, Java applications.

  22. Simulation with Sniper. 7 DaCapo applications. 4 cores, 1 MB per core LLC. Scale simulated rates to a 32-core machine using trends from real hardware.

  23. Memory systems. Homogeneous: 32 GB DRAM; 32 GB PCM. Hybrid: 1 GB DRAM + 32 GB PCM. PCM parameters: 4X read latency, 4X write energy, 10 M writes/cell.

  24. PCM lifetimes. [Figure: lifetime in years (axis 0 to 40) for PCM-Only, KG-N, and KG-W; labeled values 9 and 17.] PCM alone is not practical. PCM lasts more than 10 years with KG-W.
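How write rate translates into lifetime can be sketched with a back-of-the-envelope bound. The slides do not give the formula used in the evaluation, so the following assumes ideal wear leveling (every cell absorbs an equal share of writes) and decimal units; `PcmLifetime` and `idealYears` are hypothetical names, and the 1 GB/s write rate in the usage note is an illustrative input, not a measured value.

```java
// Hypothetical helper: upper-bound PCM lifetime under perfectly even wear.
final class PcmLifetime {
    static final double SECONDS_PER_YEAR = 365.0 * 24 * 3600;

    // capacityBytes: PCM size; writesPerCell: endurance; writeRateBytesPerSec: sustained write rate.
    static double idealYears(double capacityBytes, double writesPerCell, double writeRateBytesPerSec) {
        // Total bytes that can be written before every cell reaches its endurance limit.
        double totalWritableBytes = capacityBytes * writesPerCell;
        return totalWritableBytes / writeRateBytesPerSec / SECONDS_PER_YEAR;
    }
}
```

With the parameters from slide 23 (32 GB, 10 M writes/cell) and an assumed 1 GB/s write rate, `PcmLifetime.idealYears(32e9, 10e6, 1e9)` comes to roughly 10 years. Lifetime is inversely proportional to write rate, so halving the rate doubles the bound. Real lifetimes are lower because wear leveling is not perfect.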

  25. EDP reduction compared to DRAM. [Figure: % reduction in EDP (axis -80 to 80, higher is better) for PCM-Only, KG-N, and KG-W; 4 cores.] EDP: energy-delay product. KG-W has 35% better EDP than DRAM-Only.
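For reference, the metric on slide 25 can be written out as below. The product form is the standard definition of energy-delay product; the percentage convention (positive means better than the DRAM-Only baseline) is an assumption inferred from the "higher is better" axis label, with $E$ the energy consumed and $T$ the execution time.

```latex
\[
\mathrm{EDP} = E \times T,
\qquad
\text{reduction}\,(\%) = 100 \times
\left(1 - \frac{\mathrm{EDP}_{\text{config}}}{\mathrm{EDP}_{\text{DRAM-Only}}}\right)
\]
```

Under this convention, a 35% reduction corresponds to $\mathrm{EDP}_{\text{config}} = 0.65 \times \mathrm{EDP}_{\text{DRAM-Only}}$.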

  26. Emulation on NUMA hardware. DRAM: socket 0. PCM: socket 1. [Figure: two CPUs and four DRAM modules.] Modify the JVM to divide the heap between DRAM and PCM. Use Intel perf monitor to measure writes.

  27. PCM write rates on NUMA hardware. [Figure: write rate in GB/s (axis 0.0 to 1.5) for PCM-Only, KG-N, and KG-W on DaCapo, Pjbb, GraphChi, and the average; one value labeled 130 MB/s.] KG-N reduces write rate by 3.8X over PCM-Only. KG-W reduces write rate by 1.9X over KG-N.

  28. Crystal Gazer: Profile-Driven Write-Rationing Garbage Collection for Hybrid Memories

  29. Takeaways. Promising to monitor heaps at a fine granularity. Write-rationing GC makes PCM practical as main memory. Similar conclusion with different evaluation methods.
