


  1. Analyzing the Impact of Useless Write-Backs on the Endurance and Energy Consumption of PCM Main Memory
     Santiago Bock, Bruce Childers, Rami Melhem, Daniel Mossé and Youtao Zhang
     University of Pittsburgh

  2. Introduction
     Datacenters are growing in size and number
       • Energy consumption will cost $7.4 billion in 2011
     Memory consumes 20% to 40% of energy in a typical server
       • Larger memories due to multi-core
       • Smaller transistor sizes leak more current
     PCM for main memory
       • Low static power due to non-volatility
       • Read performance comparable to DRAM
       • Better scalability than DRAM
       • High energy cost of writes
       • Limited write endurance

  3. Motivation
     A write-back is useless when its data is not used again
       • Avoiding useless write-backs requires future knowledge
     Idea: use application information
       • Memory allocator
       • Control flow analysis
       • Stack pointer
     Focus of this work
       • How many useless write-backs can be avoided?
       • What is the impact on endurance and energy consumption?

  4. Outline
       • Introduction
       • Motivation
       • What is Phase Change Memory?
       • What are useless write-backs?
       • How do we count useless write-backs?
       • How much can we gain?
       • Conclusions

  5. Background on PCM Main Memory
     PCM writes
       • Modify physical state
       • Slow
       • High energy cost
       • Endurance limited to 10^6 to 10^8 writes
     Main memory architecture
       • L2 cache
       • Small DRAM cache (optional)
       • Large PCM main memory

  6. Useless Write-Backs
     Cache   Status      Action    Comment
     A       A is used   Write A   A becomes dirty
     A       A is used   Read A    A is used again
     A       A is used   Read A
     B       A is dead   Read B    A is evicted and written back
     A                   Write A   Original value of A is overwritten
     The write-back of A is useless because A is dead
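The definition on this slide can be sketched as a small oracle that looks into the future of a trace. This is only an illustration, not the authors' analyzer: the function name `count_useless_writebacks`, the fully associative LRU cache, the one-address-per-line layout, and the assumption that every write overwrites the whole line are all hypothetical choices made to keep the sketch short.

```python
from collections import OrderedDict

def count_useless_writebacks(trace, capacity):
    """Return (write_backs, useless_write_backs) for a list of (op, addr).

    Assumptions (not from the slides): op is "R" or "W", each address is
    one cache line, the cache is fully associative with LRU replacement,
    and a write overwrites the entire line.
    """
    cache = OrderedDict()   # addr -> dirty flag, ordered from LRU to MRU
    writebacks = []         # (trace index of the eviction, evicted addr)
    for i, (op, addr) in enumerate(trace):
        if addr in cache:
            cache.move_to_end(addr)
        else:
            if len(cache) >= capacity:
                victim, dirty = cache.popitem(last=False)
                if dirty:
                    writebacks.append((i, victim))
            cache[addr] = False
        if op == "W":
            cache[addr] = True

    useless = 0
    for i, victim in writebacks:
        # Future knowledge: find the next reference to the evicted line.
        next_op = next((op for op, a in trace[i + 1:] if a == victim), None)
        if next_op != "R":   # overwritten, or never referenced again
            useless += 1
    return len(writebacks), useless

# The sequence from the slide, with a one-line cache.
slide_trace = [("W", "A"), ("R", "A"), ("R", "A"), ("R", "B"), ("W", "A")]
print(count_useless_writebacks(slide_trace, capacity=1))  # (1, 1)
```

The oracle needs the whole trace in advance, which is why the later slides replace it with program information that is available at run time.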

  7. Useless Write-Backs
     Detecting useless write-backs
       • Difficult to identify last read before a write
       • Use program information to detect dead memory locations
     Detecting dead memory locations depends on the type of memory region
       • Heap: use calls to malloc() and free()
       • Global: use control flow analysis
       • Stack: use the stack pointer

  8. Analysis Framework
     [Diagram: Program → Instrumentation → Trace → Analyzer → Useless write-backs → Model → Endurance gains and Energy savings; two Configuration inputs]
       • Trace: address and type of each memory reference
       • Analyzer: cache simulator and list of dead memory locations

  9. Analysis for Heap Data
     Trace              Cache   Block list (address, size)
     malloc(1) returns 3        3,1 (allocated)
     write to 3         a       3,1
     free(3)            a       3,1 (3 becomes dead!)
     read from 7        b       3,1 (write-back of a is useless!)
     malloc returns 3   b       3,1
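The heap example can be replayed with a short sketch that keeps a list of allocated blocks and a list of dead blocks next to a tiny cache simulator. The event format, the name `analyze_heap`, the fully associative LRU cache, and the one-address-per-line layout are assumptions made for illustration; the slides do not describe the analyzer at this level of detail.

```python
from collections import OrderedDict

def analyze_heap(trace, capacity=1):
    """Return (write_backs, useless_write_backs) for a heap trace.

    Hypothetical event format: ("malloc", start, size), ("free", start),
    ("read", addr), ("write", addr). One address per cache line.
    """
    cache = OrderedDict()   # addr -> dirty flag, ordered from LRU to MRU
    allocated = {}          # start -> size of live blocks
    dead = {}               # start -> size of freed blocks
    total = useless = 0

    def is_dead(addr):
        return any(s <= addr < s + n for s, n in dead.items())

    for event in trace:
        kind = event[0]
        if kind == "malloc":
            _, start, size = event
            # Reusing freed memory revives it (simplification: a dead
            # block that overlaps the new one is dropped as a whole).
            for s in [s for s, n in dead.items()
                      if s < start + size and start < s + n]:
                del dead[s]
            allocated[start] = size
        elif kind == "free":
            _, start = event
            dead[start] = allocated.pop(start)
        else:
            _, addr = event
            if addr in cache:
                cache.move_to_end(addr)
            else:
                if len(cache) >= capacity:
                    victim, dirty = cache.popitem(last=False)
                    if dirty:
                        total += 1
                        if is_dead(victim):
                            useless += 1
                cache[addr] = False
            if kind == "write":
                cache[addr] = True
    return total, useless

slide_trace = [("malloc", 3, 1), ("write", 3), ("free", 3),
               ("read", 7), ("malloc", 3, 1)]
print(analyze_heap(slide_trace))  # (1, 1)
```

Unlike an oracle, this check only uses information available at eviction time: whether the evicted line falls inside a block that has been freed and not yet reallocated.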

  10. Analysis for Global Data
      Time   Trace     Cache   Objects (id, last access, last write-back)
      1      write 5   a       5,1,0
      3      read 5    a       5,3,0
      7      read 9    b       5,3,7
      9      write 5   a       5,9,7
      At time 9, 3 < 7: useless write-back!
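The timestamp comparison in this example can be sketched as follows. The names (`analyze_globals`, the `(time, op, obj)` record format) are hypothetical, and the sketch assumes that a write overwrites the whole object; in the talk that fact would come from control flow analysis, which is not modeled here.

```python
from collections import OrderedDict

def analyze_globals(trace, capacity=1):
    """Return (write_backs, useless_write_backs) for a global-data trace.

    Hypothetical record format: (time, op, obj) with op "read" or "write"
    and strictly positive, increasing times. One object per cache line.
    """
    cache = OrderedDict()   # obj -> dirty flag, ordered from LRU to MRU
    last_access = {}        # obj -> time of the last read or write
    last_writeback = {}     # obj -> time of the last write-back
    total = useless = 0
    for time, op, obj in trace:
        # A write that arrives after a write-back, with no access in
        # between, shows that the written-back value was never used.
        if op == "write" and last_writeback.get(obj, 0) > last_access.get(obj, 0):
            useless += 1
        if obj in cache:
            cache.move_to_end(obj)
        else:
            if len(cache) >= capacity:
                victim, dirty = cache.popitem(last=False)
                if dirty:
                    total += 1
                    last_writeback[victim] = time
            cache[obj] = False
        if op == "write":
            cache[obj] = True
        last_access[obj] = time
    return total, useless

slide_trace = [(1, "write", 5), (3, "read", 5), (7, "read", 9), (9, "write", 5)]
print(analyze_globals(slide_trace))  # (1, 1)
```

At time 9 the record for object 5 is (last access 3, last write-back 7), so the write-back at time 7 is counted as useless before the record is updated to (9, 7).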

  11. Analysis for Stack Data
      Trace                 Cache   Min Stack Pointer
      read 3, stack 100             100
      write 90, stack 80    a       80
      read 5, stack 100     a       80    (stack frame becomes dead)
      read 2, stack 100     b       80    (write-back of a is useless)
      Stack addresses shown: 100, 96, 92, 88, 84, 80
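A minimal sketch of the stack check, assuming a downward-growing stack: a dirty line is dead at eviction time if its address lies between the minimum stack pointer seen so far and the current stack pointer. The record format, the name `analyze_stack`, and the two-line LRU cache are illustrative assumptions, chosen so that the eviction of line a happens at the same step as on the slide.

```python
from collections import OrderedDict

def analyze_stack(trace, capacity=2):
    """Return (write_backs, useless_write_backs) for a stack trace.

    Hypothetical record format: (op, addr, sp) with op "read" or "write"
    and sp the stack pointer at the time of the reference. The stack is
    assumed to grow toward lower addresses; one address per cache line.
    """
    cache = OrderedDict()   # addr -> dirty flag, ordered from LRU to MRU
    min_sp = None           # lowest stack pointer observed so far
    total = useless = 0
    for op, addr, sp in trace:
        min_sp = sp if min_sp is None else min(min_sp, sp)
        if addr in cache:
            cache.move_to_end(addr)
        else:
            if len(cache) >= capacity:
                victim, dirty = cache.popitem(last=False)
                if dirty:
                    total += 1
                    # Below the current stack pointer but inside the
                    # region the stack once covered: a popped frame.
                    if min_sp <= victim < sp:
                        useless += 1
            cache[addr] = False
        if op == "write":
            cache[addr] = True
    return total, useless

slide_trace = [("read", 3, 100), ("write", 90, 80),
               ("read", 5, 100), ("read", 2, 100)]
print(analyze_stack(slide_trace))  # (1, 1)
```

If the frame were still live when the line is evicted (stack pointer still at 80), the same write-back would not be counted as useless.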

  12. Methodology
      SPEC CPU2006 benchmark suite
        • 26 benchmarks
        • 52 combinations of benchmark/input
      Pin collects traces
        • 100 billion instructions
      L2 Cache
        • 1MB
        • 8-way, LRU
      DRAM Cache
        • No cache, 8MB, 16MB, 32MB and 64MB
        • 16-way, LRU
      Cache line size
        • 8B (limit study), 32B, 64B and 128B
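The parameter space listed above can be written down as a small configuration sweep. This is only a sketch of how the listed values combine; the dictionary keys and the function name `configurations` are hypothetical, and the slides do not say whether every combination was actually simulated.

```python
from itertools import product

# Values taken from the methodology slide.
DRAM_CACHE_MB = [0, 8, 16, 32, 64]   # 0 stands for "no DRAM cache"
LINE_SIZE_BYTES = [8, 32, 64, 128]   # 8B is the limit study
L2_CACHE = {"size_mb": 1, "ways": 8, "policy": "LRU"}
DRAM_WAYS = 16

def configurations():
    """Enumerate every DRAM cache size / line size pair (hypothetical layout)."""
    return [
        {
            "l2": L2_CACHE,
            "dram_mb": dram_mb,
            "dram_ways": DRAM_WAYS if dram_mb else 0,
            "dram_policy": "LRU" if dram_mb else None,
            "line_bytes": line_bytes,
        }
        for dram_mb, line_bytes in product(DRAM_CACHE_MB, LINE_SIZE_BYTES)
    ]

print(len(configurations()))  # 20
```

Five DRAM cache sizes times four line sizes gives 20 cache configurations for each of the 52 benchmark/input combinations.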

  13. Experimental Results
      Categorization of benchmarks based on memory region
        • Global intensive: more than 4MB global size
        • Heap intensive: more than 1 million object allocations
      [Two charts over the benchmark/input combinations, log scale from 1E+00 to 1E+10: "Number of Object Allocations in the Heap" and "Size of Global Region in Bytes"]

  14. Heap (8-byte cache line)
      [Two charts: "Fraction of useless write-backs" and "Energy savings"; series: DRAM, PCM; y-axis 0% to 70%]

  15. Heap (Average Endurance Gains)
      [Chart: x-axis "DRAM Cache Size" (No DRAM, 8MB, 16MB, 32MB, 64MB); y-axis 0% to 30%; series: 8, 32, 64 and 128 Byte Cache Line]

  16. Heap (Average Energy Savings)
      [Chart: x-axis "Type of Saving and DRAM Cache Size" (DRAM; PCM at 0MB, 8MB, 16MB, 32MB, 64MB; Total at 8MB, 16MB, 32MB, 64MB); y-axis 0% to 20%; series: 8, 32, 64 and 128 Byte Cache Line]

  17. Global (8-byte cache line)
      [Two charts: "Fraction of useless write-backs" and "Energy savings"; series: DRAM, PCM; y-axis 0% to 40%]

  18. Global (Average Energy Savings)
      [Chart: x-axis "Type of savings and DRAM cache size" (DRAM; PCM at 0MB, 8MB, 16MB, 32MB, 64MB); y-axis 0% to 20%; series: 8, 32, 64 and 128 Byte Cache Line]

  19. Global (Average Energy Savings)
      [Chart: x-axis "Type of savings and DRAM cache size" (DRAM; PCM at 0MB, 8MB, 16MB, 32MB, 64MB; Total at 8MB, 16MB, 32MB, 64MB); y-axis 0% to 20%; series: 8, 32, 64 and 128 Byte Cache Line]

  20. Stack
      Very few useless write-backs
        • Fraction of useless write-backs between 0% and 2.3%
        • Average endurance gains and energy savings between 0% and 0.1%
      Programs use a small part of the stack
        • 10KB to 20KB
        • Kept mostly in the cache
        • Few opportunities to evict dead data from the cache

  21. Conclusions
      We showed that a considerable number of write-backs are useless
      We showed there is potential
        • Up to 20% energy savings
        • Up to 26% endurance gains
      Next step: develop techniques to avoid useless write-backs
        • Low energy cost
        • Low performance impact

  22. Thank you! Questions?
      Santiago Bock
      sab104@cs.pitt.edu
      http://www.cs.pitt.edu/~sab104
