Future Scaling of Processor-Memory Interfaces
Jung Ho Ahn†§, Norman P. Jouppi†, Christos Kozyrakis‡, Jacob Leverich‡, Robert S. Schreiber†
†HP Labs, ‡Stanford University, §Seoul National University
Executive summary: main memory system
• Challenges: performance, energy efficiency, reliability
• Holistic assessments: system-wide, multithreaded/consolidated
• Solutions: Multicore DIMM (rank subsetting), chipkill; efficiency/latency/throughput tradeoffs; capacity vs. efficiency
Nov 19, 2009 SC09 – Future Scaling of Processor-Memory Interfaces
Issues on DRAM-based main memory
• Chip multiprocessors (CMPs) demand
  - High capacity
  - High bandwidth
• Global wires improve slowly
  - Energy-efficiency challenges: DRAM power matters!
  - Performance/power variation by access pattern
• Hard/soft errors
How DRAM works
[Figure labels: banks 0 to 7; per bank, a row decoder, a memory array of 16,384 rows × 8,192 columns, sense amplifiers, and a column decoder; request in, data out. Inset: DRAM 1T1C cell at the crossing of a wordline (wl) and a bitline (bit).]
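The figure labels are enough to work out the capacity of the DRAM chip being drawn. The sketch below is only that arithmetic, assuming one 1T1C cell (one bit) at every row/column crossing and 8 banks (bank 0 to bank 7); the constant and function names are illustrative, not from the slides.

```python
# Geometry read off the slide; names are illustrative.
BANKS = 8                 # bank 0 .. bank 7
ROWS_PER_BANK = 16_384
COLUMNS_PER_ROW = 8_192   # assumed: one bit per row/column crossing


def bank_bits(rows: int = ROWS_PER_BANK, cols: int = COLUMNS_PER_ROW) -> int:
    """Number of bits stored in one bank."""
    return rows * cols


def chip_bits(banks: int = BANKS) -> int:
    """Number of bits stored in the whole chip."""
    return banks * bank_bits()


print(bank_bits() // 2**20, "Mib per bank")   # 128
print(chip_bits() // 2**30, "Gib per chip")   # 1
```

Under these assumptions each bank holds 128 Mib and the chip holds 1 Gib, and a single row is 8,192 bits wide.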
Performance/power variations [figure]
Overfetching problem (DIMM = Dual Inline Memory Module)
  - DRAM row size = 8 Kb, 8 or 16 DRAMs per DIMM
  - Cache line size = 512 b
  - Over 99% of bits are unused if row/col = 1
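The "over 99%" figure can be checked directly from the numbers on the slide. This is a minimal sketch, assuming that 8 Kb means 8,192 bits, that every DRAM listed opens one row per activation, and that row/col = 1 means a single cache line is read per activation; the function name is illustrative.

```python
def unused_fraction(row_bits: int, drams: int, line_bits: int,
                    lines_per_activation: float = 1.0) -> float:
    """Fraction of activated bits that are never transferred."""
    activated = row_bits * drams                 # bits pulled into sense amplifiers
    used = line_bits * lines_per_activation      # bits actually read out
    return 1.0 - used / activated


for drams in (8, 16):
    frac = unused_fraction(row_bits=8_192, drams=drams, line_bits=512)
    print(f"{drams} DRAMs: {frac:.2%} of activated bits unused")
```

With 8 DRAMs the unused fraction is 1 − 512/65,536 ≈ 99.2%, and with 16 DRAMs it is 1 − 512/131,072 ≈ 99.6%.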
Solution = Multicore DIMM (MCDIMM)
MCDIMM features
  - VMD = Virtual Memory Device: rank subsetting
  - Demux register
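To make the effect of rank subsetting concrete, the sketch below counts how many bits are activated and how many data-bus transfers ("beats") one cache line needs as a rank is split into more subsets. The configuration is an assumption for illustration, not taken from the slides: 8 chips per rank, 8 data pins per chip (a 64-bit rank), 8,192-bit rows, and a 512-bit cache line. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetCost:
    chips_per_subset: int   # chips that serve one access
    activated_bits: int     # bits opened by one row activation
    transfer_beats: int     # data-bus transfers needed for one cache line


def subset_cost(subsets: int, chips_per_rank: int = 8, chip_width_bits: int = 8,
                row_bits: int = 8_192, line_bits: int = 512) -> SubsetCost:
    """Per-access cost when a rank is split into `subsets` equal subsets."""
    if chips_per_rank % subsets:
        raise ValueError("subsets must divide chips_per_rank")
    chips = chips_per_rank // subsets
    width = chips * chip_width_bits          # data-path width of one subset
    if line_bits % width:
        raise ValueError("cache line must be a multiple of the subset width")
    return SubsetCost(chips, chips * row_bits, line_bits // width)


for s in (1, 2, 4, 8):
    print(s, subset_cost(s))
```

Under these assumptions, going from 1 to 8 subsets cuts the activated bits from 65,536 to 8,192 while the transfer grows from 8 to 64 beats: less overfetch per access, at the price of a longer serialized transfer.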
Demux register [figure labels: Demux Register (optional), Counter, Demultiplexer, Register]
Alternative solution = mini-rank
MCDIMM vs. mini-rank
  - Register for data path vs. control path
  - Timing constraint due to access interference
  - Load balancing between rank subsets
Governing equations
Total main memory power = $D \cdot S \cdot R \cdot SP + E_{RW} \cdot BW_{RW} + D \cdot E_{AP} \cdot f_{AP}$
  - $D$: # of DRAM chips per subset
  - $S$: # of subsets per rank
  - $R$: # of ranks per channel
  - $SP$: static power of a DRAM chip
  - $E_{RW}$: energy needed to read/write a bit
  - $BW_{RW}$: read/write bandwidth per memory channel
  - $E_{AP}$: energy to activate/precharge a row
  - $f_{AP}$: frequency of activate/precharge per memory channel
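The power equation translates directly into a small function. The sketch below is one possible reading: the units (watts, joules per bit, bits per second, joules per chip per activation, activations per second) are assumptions, since the slide does not state them, and every numeric value is a made-up placeholder rather than a measurement.

```python
def main_memory_power(D: int, S: int, R: int, SP: float,
                      E_RW: float, BW_RW: float,
                      E_AP: float, f_AP: float) -> float:
    """Total main memory power per channel, term by term as on the slide.

    Assumed units: SP [W], E_RW [J/bit], BW_RW [bit/s],
    E_AP [J per chip per activate/precharge], f_AP [1/s].
    """
    static = D * S * R * SP                  # every chip in the channel leaks
    read_write = E_RW * BW_RW                # energy per bit times bits per second
    activate_precharge = D * E_AP * f_AP     # only the D chips of a subset open a row
    return static + read_write + activate_precharge


# Hypothetical placeholder values, for illustration only.
common = dict(R=1, SP=0.05, E_RW=20e-12, BW_RW=40e9, E_AP=5e-9, f_AP=50e6)
for subsets in (1, 2, 4, 8):
    chips_per_subset = 8 // subsets          # assumed 8 chips per rank
    watts = main_memory_power(D=chips_per_subset, S=subsets, **common)
    print(f"S={subsets}: {watts:.2f} W")
```

Because D · S stays equal to the number of chips per rank, the static term does not change with the subset count; only the activate/precharge term shrinks as D gets smaller. Holding f_AP fixed across subset counts is a simplification made here for readability.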
Governing equations
$f_{AP} = \frac{f_{AP}}{f_{CM}} \cdot f_{CM} = \frac{f_{AP}}{f_{CM}} \cdot \frac{BW_{RW}}{CL} = \beta \cdot \frac{BW_{RW}}{CL}$
  - $BW_{RW}$: read/write bandwidth per memory channel
  - $f_{AP}$: frequency of activate/precharge per memory channel
  - $f_{CM}$: frequency of cache miss
  - $CL$: line size of last-level cache
  - $\beta$: row/col (bank conflict ratio)
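The chain of equalities on this slide says that the activate/precharge frequency is the cache-miss frequency scaled by the row/col ratio, and that the cache-miss frequency is the read/write bandwidth divided by the cache line size. A minimal sketch of that relation follows; the function names are illustrative, the bandwidth is assumed to be in bits per second, and the example numbers are placeholders.

```python
def cache_miss_frequency(bw_rw_bits_per_s: float, line_bits: int) -> float:
    """f_CM = BW_RW / CL: each cache miss moves one cache line."""
    return bw_rw_bits_per_s / line_bits


def activate_precharge_frequency(row_col_ratio: float,
                                 bw_rw_bits_per_s: float,
                                 line_bits: int) -> float:
    """f_AP = (row/col ratio) * BW_RW / CL."""
    return row_col_ratio * cache_miss_frequency(bw_rw_bits_per_s, line_bits)


# Placeholder example: 51.2 Gbit/s of traffic, 512-bit lines, ratio 0.5.
f_cm = cache_miss_frequency(51.2e9, 512)
f_ap = activate_precharge_frequency(0.5, 51.2e9, 512)
print(f"f_CM = {f_cm:.3g} /s, f_AP = {f_ap:.3g} /s")
```

A ratio of 1 means every column access needs its own row activation; smaller ratios mean more accesses are served from a row that is already open.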
β (row/col) on multicore applications [figure]
MCDIMM reliability issues
• SECDED
  - Single error correction, double error detection
  - Typically, a (64 + 8) ECC solution is enough
• SCCDCD
  - Single chip-error correction, double chip-error detection
  - Chipkill
  - Implementations
    • Interleaving SECDED over multiple ranks
    • Employing a stronger error-correcting code
      - $2b + l$ additional bits to correct $b$ bits of bursty error and to detect $l$ bits of bursty error
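The two overhead figures on this slide, the (64 + 8) SECDED layout and the 2b + l rule for burst codes, are easy to compare numerically. The sketch below only evaluates those expressions; the function names are illustrative, and the burst-code example (b = 4, l = 8) is a hypothetical choice, not a configuration stated in the slides.

```python
def storage_overhead(data_bits: int, check_bits: int) -> float:
    """Extra storage as a fraction of the protected data."""
    return check_bits / data_bits


def burst_code_check_bits(b: int, l: int) -> int:
    """Check bits needed to correct a b-bit burst and detect an l-bit burst."""
    return 2 * b + l


# SECDED as on the slide: 8 check bits protect 64 data bits.
print(f"SECDED (64 + 8): {storage_overhead(64, 8):.1%} overhead")

# Hypothetical burst code: correct 4-bit bursts, detect 8-bit bursts.
check = burst_code_check_bits(b=4, l=8)
print(f"burst code: {check} check bits")
```

The (64 + 8) layout costs 12.5% extra storage. For the burst code, wider bursts (for example, wider DRAM chips) raise the number of check bits linearly, which is one way to see the capacity-versus-efficiency tension the deck mentions.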
Multicore configuration [figure labels: DIMM; four groups, each with an MT Core, L1$, L2$, directory (Dir), and memory controller (MC)]
Experimental setup
• System architecture
  - 32 nm, 2 GHz in-order CMT, max IPC = 16, 64 threads
  - 64 B cache line, four 1 MB L2 caches
  - Hierarchical MESI, reverse directory
• Simulator/modeling
  - Intel Pin-based in-house simulator
  - CACTI
• Applications
  - SPLASH-2/PARSEC/SPEC CPU 2006
Energy-delay product & system power with 1 rank per memory channel [figure; panels: SPLASH-2, SPEC CPU 2006, PARSEC]
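For readers unfamiliar with the metric, energy-delay product (EDP) is energy multiplied by execution time, so with average power P and time T it equals P · T². The sketch below computes it for two invented configurations; the numbers are placeholders chosen only to show how a power saving can outweigh a small slowdown, and they are not results from the talk.

```python
def energy_delay_product(avg_power_w: float, exec_time_s: float) -> float:
    """EDP = energy * delay = (power * time) * time."""
    energy_j = avg_power_w * exec_time_s
    return energy_j * exec_time_s


# Invented placeholder numbers: a baseline, and a variant that draws
# 20% less power but runs 5% longer.
baseline = energy_delay_product(avg_power_w=100.0, exec_time_s=1.00)
variant = energy_delay_product(avg_power_w=80.0, exec_time_s=1.05)
print(f"relative EDP = {variant / baseline:.3f}")
```

Because delay enters squared, a configuration only wins on EDP if its relative power saving is larger than roughly twice its relative slowdown.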
Energy-delay product & system power with 4 ranks per memory channel [figure; panels: SPLASH-2, SPEC CPU 2006, PARSEC]
Energy-delay product & system power with Chipkill enabled [figure; panels: SPLASH-2, SPEC CPU 2006, PARSEC]
Conclusion
Challenges on main memory systems
  - Performance/capacity demands
  - Energy-efficiency goals
  - Reliability constraints
Multicore DIMM
  - Instantiation of rank subsetting
  - Gains energy efficiency & concurrency
  - Sacrifices serialization latency
  - Advantage in EDP (energy-delay product) with proper subsetting
  - Energy-efficient, capacity-inefficient reliability solution