Analyzing the Impact of Useless Write-Backs on the Endurance and Energy Consumption of PCM Main Memory
Santiago Bock, Bruce Childers, Rami Melhem, Daniel Mossé and Youtao Zhang
University of Pittsburgh
Introduction
Datacenters are growing in size and number
  • Energy consumption will cost $7.4 billion in 2011
Memory consumes 20% to 40% of the energy in a typical server
  • Larger memories due to multi-core
  • Smaller transistor sizes leak more current
PCM for main memory
  • Low static power due to non-volatility
  • Read performance comparable to DRAM
  • Better scalability than DRAM
  • High energy cost of writes
  • Limited write endurance
Motivation
A write-back is useless when its data is never read again (for example, because it is overwritten or freed first)
  • Avoiding useless write-backs requires future knowledge
Idea: use application information
  • Memory allocator
  • Control flow analysis
  • Stack pointer
Focus of this work
  • How many useless write-backs can be avoided?
  • What is the impact on endurance and energy consumption?
Outline
  • Introduction
  • Motivation
  • What is Phase Change Memory?
  • What are useless write-backs?
  • How do we count useless write-backs?
  • How much can we gain?
  • Conclusions
Background on PCM Main Memory
PCM writes
  • Modify physical state
  • Slow
  • High energy cost
  • Limited to 10^6 to 10^8 writes per cell
Main memory architecture
  • L2 cache
  • Small DRAM cache (optional)
  • Large PCM main memory
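As a back-of-the-envelope illustration of why fewer writes matter for a memory limited to 10^6 to 10^8 writes per cell, the relation below assumes ideal wear leveling (writes spread evenly over all PCM lines) and that each avoided write-back removes exactly one PCM line write. The symbols are introduced here only for illustration: E is per-cell endurance, N the number of PCM lines, R_w the PCM line-write rate, and u the fraction of PCM writes avoided. The endurance-gain figures later in this talk may be defined differently.

```latex
T \approx \frac{E \cdot N}{R_w}
\qquad\Longrightarrow\qquad
\frac{T'}{T} = \frac{E \cdot N / \big((1-u)\,R_w\big)}{E \cdot N / R_w} = \frac{1}{1-u}
```

For example, under these assumptions, avoiding u = 0.2 of the writes gives T'/T = 1/0.8 = 1.25, a 25% longer lifetime.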
Useless Write-Backs
Action    Cache status   Comment
Write A   A              A becomes dirty
Read A    A              A is used
Read A    A              A is used again; from now on, A is dead
Read B    B              A is evicted and written back
Write A   A              Original value of A is overwritten: the write-back of A was useless because A was dead
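The sequence above can be replayed mechanically. The sketch below is a hypothetical illustration, not the authors' tool: it simulates a tiny write-back LRU cache and then applies an oracle that peeks at the rest of the trace, which is exactly the future knowledge a real system lacks. The function names are made up, and counting data that is never accessed again as useless is an assumption consistent with the definition on the Motivation slide.

```python
from collections import OrderedDict
from typing import List, Tuple

Access = Tuple[str, str]  # (operation, address); operation is "R" or "W"


def simulate_writebacks(trace: List[Access], capacity: int) -> List[Tuple[int, str]]:
    """Fully associative write-back LRU cache; returns (time, address) of each dirty eviction."""
    cache: "OrderedDict[str, bool]" = OrderedDict()  # address -> dirty bit, LRU first
    writebacks: List[Tuple[int, str]] = []
    for t, (op, addr) in enumerate(trace):
        if addr in cache:
            cache.move_to_end(addr)  # hit: mark as most recently used
        else:
            if len(cache) >= capacity:
                victim, dirty = cache.popitem(last=False)  # evict least recently used
                if dirty:
                    writebacks.append((t, victim))
            cache[addr] = False  # miss: allocate a clean line
        if op == "W":
            cache[addr] = True
    return writebacks


def is_useless(trace: List[Access], t: int, addr: str) -> bool:
    """Oracle with future knowledge: was the write-back of addr at time t useless?"""
    for op, a in trace[t + 1:]:
        if a == addr:
            return op == "W"  # overwritten before any read -> useless
    return True  # never accessed again (assumption: also counted as useless)


# The sequence from this slide, with a one-line cache.
trace = [("W", "A"), ("R", "A"), ("R", "A"), ("R", "B"), ("W", "A")]
for t, addr in simulate_writebacks(trace, capacity=1):
    print(t, addr, is_useless(trace, t, addr))  # prints: 3 A True
```

Replacing the final "Write A" with a "Read A" makes the same write-back useful, since the written-back value is read again.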
Useless Write-Backs
Detecting useless write-backs
  • Difficult to identify the last read before a write
  • Use program information to detect dead memory locations
Detecting dead memory locations depends on the type of memory region
  • Heap: use calls to malloc() and free()
  • Global: use control flow analysis
  • Stack: use the stack pointer
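Because the detection method depends on the region, an analyzer first has to decide which region an address belongs to. A minimal sketch, assuming the region boundaries are known; the `Layout` ranges below are hypothetical, and a real tool would obtain them from the loader or OS.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class Region(Enum):
    GLOBAL = "global"
    HEAP = "heap"
    STACK = "stack"
    OTHER = "other"


# Information used to find dead locations in each region, as listed on this slide.
DEAD_DETECTION: Dict[Region, str] = {
    Region.HEAP: "calls to malloc() and free()",
    Region.GLOBAL: "control flow analysis",
    Region.STACK: "the stack pointer",
}


@dataclass(frozen=True)
class Layout:
    """Hypothetical half-open [lo, hi) address ranges for each region."""
    global_range: Tuple[int, int]
    heap_range: Tuple[int, int]
    stack_range: Tuple[int, int]


def classify(addr: int, layout: Layout) -> Region:
    """Returns the region containing addr, or Region.OTHER."""
    ranges = (
        (Region.GLOBAL, layout.global_range),
        (Region.HEAP, layout.heap_range),
        (Region.STACK, layout.stack_range),
    )
    for region, (lo, hi) in ranges:
        if lo <= addr < hi:
            return region
    return Region.OTHER


layout = Layout(global_range=(0x1000, 0x2000), heap_range=(0x2000, 0x8000),
                stack_range=(0xF000, 0x10000))
for addr in (0x1800, 0x3000, 0xFF00, 0x0):
    region = classify(addr, layout)
    print(hex(addr), region.value, DEAD_DETECTION.get(region, "not tracked"))
```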
Analysis Framework
Program → Instrumentation → Trace → Analyzer → Useless write-backs → Model → Endurance gains and energy savings
  • The Analyzer and the Model each take a configuration
  • Trace: address and type of each memory reference
  • Analyzer: cache simulator and list of dead memory locations
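The Analyzer stage can be sketched as a cache simulator that reports each write-back and asks a dead-location tracker whether the evicted line was dead. This is a hypothetical sketch, not the authors' analyzer: `MemRef`, `LRUCache`, `analyze` and the `is_dead` callback are illustrative names. In this sketch `is_dead` receives the base address of the evicted line, and deciding whether the whole line is dead is left to the caller.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class MemRef:
    """One trace record: address and type of a memory reference."""
    addr: int
    is_write: bool


class LRUCache:
    """Set-associative, write-back, write-allocate cache with LRU replacement."""

    def __init__(self, size_bytes: int, ways: int, line_bytes: int) -> None:
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)
        # One OrderedDict per set: tag -> dirty bit, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, ref: MemRef) -> Optional[int]:
        """Performs one reference; returns the base address of a written-back line, if any."""
        line = ref.addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        lines = self.sets[index]
        writeback = None
        if tag in lines:
            lines.move_to_end(tag)
        else:
            if len(lines) >= self.ways:
                victim_tag, dirty = lines.popitem(last=False)
                if dirty:
                    writeback = (victim_tag * self.num_sets + index) * self.line_bytes
            lines[tag] = False
        if ref.is_write:
            lines[tag] = True
        return writeback


def analyze(trace: Iterable[MemRef], cache: LRUCache,
            is_dead: Callable[[int], bool]) -> Tuple[int, int]:
    """Returns (all write-backs, useless write-backs); is_dead is queried at eviction time."""
    total = useless = 0
    for ref in trace:
        wb = cache.access(ref)
        if wb is not None:
            total += 1
            if is_dead(wb):
                useless += 1
    return total, useless


cache = LRUCache(size_bytes=128, ways=2, line_bytes=64)  # a single 2-way set
trace = [MemRef(0, True), MemRef(64, False), MemRef(128, False)]
print(analyze(trace, cache, is_dead=lambda addr: addr == 0))  # prints: (1, 1)
```

Keeping the dead-location check behind a callback lets the same simulator serve the heap, global and stack trackers.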
Analysis for Heap Data
Blocks shown as (address, size)
Trace                Cache   Allocated blocks   Dead blocks   Note
malloc(1) returns 3          (3,1)
write to 3           a       (3,1)
free(3)              a                          (3,1)         3 becomes dead!
read from 7          b                          (3,1)         Write-back of a is useless!
malloc returns 3     b       (3,1)
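The heap bookkeeping on this slide can be sketched as follows. This is an illustrative reconstruction, not the authors' code: `HeapTracker` and `replay` are hypothetical names, the cache holds a single line at one-unit granularity to mirror the slide, and reallocation conservatively revives any overlapping dead block.

```python
from collections import OrderedDict
from typing import Dict, List, Tuple


class HeapTracker:
    """Tracks allocated and dead heap blocks from malloc()/free() events."""

    def __init__(self) -> None:
        self.allocated: Dict[int, int] = {}  # start address -> size
        self.dead: Dict[int, int] = {}       # freed blocks not yet reallocated

    def on_malloc(self, start: int, size: int) -> None:
        self.allocated[start] = size
        # Reused memory is live again. Simplification: drop any dead block that
        # overlaps the new allocation, even if it only overlaps partially.
        for d_start, d_size in list(self.dead.items()):
            if d_start < start + size and start < d_start + d_size:
                del self.dead[d_start]

    def on_free(self, start: int) -> None:
        self.dead[start] = self.allocated.pop(start)

    def is_dead(self, addr: int) -> bool:
        return any(s <= addr < s + n for s, n in self.dead.items())


def replay(events: List[Tuple], capacity: int = 1) -> Tuple[List[int], HeapTracker]:
    """Replays malloc/free/read/write events; returns addresses of useless write-backs."""
    tracker = HeapTracker()
    cache: "OrderedDict[int, bool]" = OrderedDict()  # address -> dirty bit, LRU first
    useless: List[int] = []
    for kind, *args in events:
        if kind == "malloc":
            tracker.on_malloc(*args)
        elif kind == "free":
            tracker.on_free(*args)
        else:  # "read" or "write"
            (addr,) = args
            if addr in cache:
                cache.move_to_end(addr)
            else:
                if len(cache) >= capacity:
                    victim, dirty = cache.popitem(last=False)
                    if dirty and tracker.is_dead(victim):
                        useless.append(victim)
                cache[addr] = False
            if kind == "write":
                cache[addr] = True
    return useless, tracker


# The trace from this slide: one cache line, a block at address 3 of size 1.
events = [("malloc", 3, 1), ("write", 3), ("free", 3), ("read", 7), ("malloc", 3, 1)]
useless, tracker = replay(events)
print(useless, tracker.dead)  # prints: [3] {}
```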
Analysis for Global Data
Objects shown as (id, last access, last write-back)
Time   Trace     Cache   Object
1      write 5   a       (5, 1, 0)
3      read 5    a       (5, 3, 0)
7      read 9    b       (5, 3, 7)
9      write 5   a       (5, 9, 7)   3 < 7: useless write-back!
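The timestamp test on this slide can be sketched as below: a write to an object whose last write-back is later than its last access means the written-back data was never read, so that write-back was useless. This mirrors only the trace-level bookkeeping shown here; it does not model the control flow analysis mentioned earlier, and `GlobalObject` and `analyze_globals` are hypothetical names. Objects are treated as single cache lines, and a write-back whose object is never touched again is not counted (a conservative choice).

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class GlobalObject:
    last_access: int = 0     # time of the most recent read or write
    last_writeback: int = 0  # time the object's dirty line was last written back


def analyze_globals(trace: Iterable[Tuple[int, str, int]], global_ids: Iterable[int],
                    capacity: int = 1) -> Tuple[List[Tuple[int, int]], Dict[int, GlobalObject]]:
    """trace holds (time, "R" or "W", object id); returns (write-back time, id) pairs."""
    objs = {g: GlobalObject() for g in global_ids}
    cache: "OrderedDict[int, bool]" = OrderedDict()  # id -> dirty bit, LRU first
    useless: List[Tuple[int, int]] = []
    for t, op, addr in trace:
        if addr in cache:
            cache.move_to_end(addr)
        else:
            if len(cache) >= capacity:
                victim, dirty = cache.popitem(last=False)
                if dirty and victim in objs:
                    objs[victim].last_writeback = t
            cache[addr] = False
        if op == "W":
            cache[addr] = True
        if addr in objs:
            obj = objs[addr]
            # A write with no access since the last write-back overwrites data
            # that was never read back, so that write-back was useless.
            if op == "W" and obj.last_access < obj.last_writeback:
                useless.append((obj.last_writeback, addr))
            obj.last_access = t
    return useless, objs


trace = [(1, "W", 5), (3, "R", 5), (7, "R", 9), (9, "W", 5)]
useless, objs = analyze_globals(trace, global_ids=[5])
print(useless, objs[5])  # prints: [(7, 5)] GlobalObject(last_access=9, last_writeback=7)
```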
Analysis for Stack Data
Stack addresses from 100 down to 80, in steps of 4
Trace                  Cache   Min stack pointer   Note
read 3, stack 100              100
write 90, stack 80     a       80
read 5, stack 100      a       80                  Stack frame becomes dead
read 2, stack 100      b       80                  Write-back of a is useless
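The minimum-stack-pointer rule can be sketched like this: with a downward-growing stack, any address in [min SP, current SP) belonged to a frame that has since been popped, so a dirty line evicted from that range is dead. This is a hypothetical sketch (`analyze_stack` is an illustrative name); the two-line cache is an assumption chosen so that the eviction happens at "read 2", as on the slide.

```python
from collections import OrderedDict
from typing import Iterable, List, Optional, Tuple


def analyze_stack(trace: Iterable[Tuple[str, int, int]],
                  capacity: int = 2) -> Tuple[List[int], Optional[int]]:
    """trace holds ("R" or "W", address, stack pointer); returns (useless write-backs, min SP)."""
    cache: "OrderedDict[int, bool]" = OrderedDict()  # address -> dirty bit, LRU first
    min_sp: Optional[int] = None
    useless: List[int] = []
    for op, addr, sp in trace:
        # The stack grows downward, so the lowest SP seen bounds the stack in use.
        min_sp = sp if min_sp is None else min(min_sp, sp)
        if addr in cache:
            cache.move_to_end(addr)
        else:
            if len(cache) >= capacity:
                victim, dirty = cache.popitem(last=False)
                # [min_sp, sp) once held stack frames that have since been popped: dead.
                if dirty and min_sp <= victim < sp:
                    useless.append(victim)
            cache[addr] = False
        if op == "W":
            cache[addr] = True
    return useless, min_sp


trace = [("R", 3, 100), ("W", 90, 80), ("R", 5, 100), ("R", 2, 100)]
print(analyze_stack(trace))  # prints: ([90], 80)
```

If the frame is still live when a is evicted (stack pointer still at 80), the same write-back is not flagged.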
Methodology
SPEC CPU2006 benchmark suite
  • 26 benchmarks
  • 52 benchmark/input combinations
Pin collects traces
  • 100 billion instructions
L2 cache
  • 1MB
  • 8-way, LRU
DRAM cache
  • No cache, 8MB, 16MB, 32MB and 64MB
  • 16-way, LRU
Cache line size
  • 8B (limit study), 32B, 64B and 128B
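The cache parameters above define a small design space. The sketch below enumerates it and computes the number of sets per cache as capacity / (associativity × line size). It assumes the same line size is used in the L2 and the DRAM cache; the constant and function names are illustrative.

```python
from itertools import product
from typing import Dict, List

MB = 1 << 20
L2_SIZE, L2_WAYS = 1 * MB, 8
DRAM_SIZES_MB = [0, 8, 16, 32, 64]  # 0 means no DRAM cache
DRAM_WAYS = 16
LINE_SIZES = [8, 32, 64, 128]       # bytes; 8B is the limit study


def num_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Sets in a set-associative cache: capacity / (associativity * line size)."""
    return size_bytes // (ways * line_bytes)


def sweep() -> List[Dict[str, int]]:
    """One entry per (DRAM cache size, line size) pair."""
    configs = []
    for dram_mb, line in product(DRAM_SIZES_MB, LINE_SIZES):
        configs.append({
            "line_bytes": line,
            "l2_sets": num_sets(L2_SIZE, L2_WAYS, line),
            "dram_mb": dram_mb,
            "dram_sets": num_sets(dram_mb * MB, DRAM_WAYS, line) if dram_mb else 0,
        })
    return configs


configs = sweep()
print(len(configs))  # prints: 20
```

For example, the 1MB 8-way L2 with 64B lines has 1,048,576 / (8 × 64) = 2048 sets.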
Experimental Results
Categorization of benchmarks based on memory region
  • Global intensive: more than 4MB global size
  • Heap intensive: more than 1 million object allocations
[Figures: Number of Object Allocations in the Heap, and Size of Global Region in Bytes, for each benchmark/input combination; logarithmic y-axes from 1E+00 to 1E+10]
Heap (8-byte cache line)
[Figures: Fraction of useless write-backs and Energy savings per benchmark, each with DRAM and PCM series; y-axes from 0% to 70%]
Heap (Average Endurance Gains)
[Figure: average endurance gains versus DRAM cache size (no DRAM, 8MB, 16MB, 32MB, 64MB) for 8-, 32-, 64- and 128-byte cache lines; y-axis from 0% to 30%]
Heap (Average Energy Savings)
[Figure: average DRAM, PCM and total energy savings for DRAM cache sizes from 0MB to 64MB, for 8-, 32-, 64- and 128-byte cache lines; x-axis: type of saving and DRAM cache size; y-axis from 0% to 20%]
Global (8-byte cache line)
[Figures: Fraction of useless write-backs and Energy savings per benchmark, each with DRAM and PCM series; y-axes from 0% to 40%]
Global (Average Energy Savings)
[Figure: average DRAM, PCM and total energy savings for DRAM cache sizes from 0MB to 64MB, for 8-, 32-, 64- and 128-byte cache lines; x-axis: type of savings and DRAM cache size; y-axis from 0% to 20%]
Stack
Very few useless write-backs
  • Fraction of useless write-backs between 0% and 2.3%
  • Average endurance gains and energy savings between 0% and 0.1%
Programs use only a small part of the stack
  • 10KB to 20KB
  • Stack data stays mostly in the cache
  • Few opportunities to evict dead data from the cache
Conclusions
We showed that a considerable fraction of write-backs is useless
We showed there is potential for
  • Up to 20% energy savings
  • Up to 26% endurance gains
Next step: develop techniques to avoid useless write-backs with
  • Low energy cost
  • Low performance impact
Thank you!
Questions?
sab104@cs.pitt.edu
http://www.cs.pitt.edu/~sab104