


ENCM 501 W14 Slides for Lecture 8

Slides for Lecture 8
ENCM 501: Principles of Computer Architecture
Winter 2014 Term
Steve Norman, PhD, PEng
Electrical & Computer Engineering
Schulich School of Engineering
University of Calgary
4 February, 2014

slide 2/29
Previous Lecture
◮ conditional branches in various ISAs
◮ introduction to memory systems
◮ review of SRAM and DRAM

slide 3/29
Today’s Lecture
◮ more about DRAM
◮ introduction to caches
Related reading in Hennessy & Patterson: Sections B.1–B.2

slide 4/29
The “1T” DRAM (Dynamic RAM) cell
[Figure: DRAM cell schematic; labels: BITLINE, WORDLINE, Q]
The bit is stored as a voltage on a capacitor. A relatively high voltage at Q is a 1, and a relatively low voltage at Q is a 0.
When the stored bit is a 1, charge is slowly leaking from node Q to ground. In a DRAM array, each row of cells must periodically be read and written back to strengthen the voltages in cells with stored 1’s; this is called refresh.
DRAM gets the name dynamic from the continuing activity needed to keep the stored data valid.

slide 5/29
Writing to a DRAM cell
[Figure: DRAM cell schematic; labels: BITLINE, WORDLINE, Q]
Set BITLINE to the appropriate voltage for a 1 or a 0.
Turn on WORDLINE.
Q will take on the appropriate voltage.

slide 6/29
Reading from a DRAM cell
[Figure: DRAM cell schematic; labels: BITLINE, WORDLINE, Q]
Pre-charge BITLINE and some nearby electrically similar reference wire to the same voltage, somewhere between logic 0 and logic 1.
Turn on WORDLINE.
The cell will create a voltage difference between BITLINE and the reference wire, such that the difference can be reliably measured by a sense amplifier.
Reading a DRAM cell destroys the data in the cell. After a read, the data must be written back.
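The write, read, and refresh behaviour described on slides 4 to 6 can be sketched as a toy model. This is only an illustration: the class name `DramCell`, the voltages `V_DD` and `V_REF`, and the 10% leak per step are assumptions made up for the sketch, not values from the lecture, and real cells are analog circuits rather than Python objects.

```python
# Toy model of a 1T DRAM cell. V_DD, V_REF and the leak fraction are
# illustrative assumptions, not values from the slides.
V_DD = 1.0         # assumed voltage at Q for a stored 1
V_REF = V_DD / 2   # assumed precharge level, between logic 0 and logic 1


class DramCell:
    def __init__(self):
        self.q = 0.0  # voltage at node Q

    def write(self, bit):
        # BITLINE is set to the level for the bit, WORDLINE is turned on,
        # and Q takes on that voltage.
        self.q = V_DD if bit else 0.0

    def leak(self, fraction=0.1):
        # Charge leaks from Q toward ground; only a stored 1 is affected.
        self.q *= 1.0 - fraction

    def read(self):
        # The sense amplifier compares BITLINE with the reference wire.
        bit = 1 if self.q > V_REF else 0
        # Sharing charge with BITLINE disturbs the stored level
        # (modelled here, crudely, as Q ending up at the precharge level) ...
        self.q = V_REF
        # ... so the sensed bit must be written back.
        self.write(bit)
        return bit

    def refresh(self):
        # A refresh is a read together with the write-back it implies.
        self.read()


cell = DramCell()
cell.write(1)
for _ in range(3):
    cell.leak()
cell.refresh()              # Q is restored to V_DD
for _ in range(7):
    cell.leak()             # too long without refresh in this toy model
print(cell.read())          # the stored 1 has decayed below V_REF
```

In this sketch a stored 1 survives three leak steps (Q is about 0.73) but not seven (Q is about 0.48), which is the reason refresh has to happen periodically rather than on demand.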

slide 7/29
A 4 × 4 DRAM array
A circuit schematic is shown on the next slide.
There is no good commercial reason to build such a tiny DRAM array, but nevertheless the schematic can be used to partially explain how DRAM works.
In a read operation, half of the bitlines get used to capture bit values from DRAM cells, and the other half are used as reference wires. This technique is called folded bitlines. The schematic does not show the physical layout of folded bitlines.
The block labeled [THIS IS COMPLICATED!] has a lot to do! In there we need bitline drivers, sense amplifiers, refresh logic, and more . . .

slide 8/29
[Figure: 4 × 4 DRAM array schematic. An ADDRESS DECODER with inputs A1 and A0 drives wordlines WL3 to WL0; each wordline connects to a row of four DRAM cells; the bitlines are drawn in pairs for BL3 to BL0; a block labeled [THIS IS COMPLICATED!] has CTRL and data connections D3 to D0.]

slide 9/29
DRAM arrays have long latencies compared to SRAM arrays. Why?
1. DRAM arrays typically have much larger capacities than SRAM arrays, so the ratio of cell dimensions to bitline length is much worse for DRAM arrays.
2. A passive capacitor (DRAM) is less effective at changing bitline voltages than is an active pair of inverters (SRAM).
3. Today, SRAM circuits are usually on the same chip as processor cores, while DRAMs are off-chip, connected to processor chips by wires that may be as long as tens of millimeters.
4. DRAM circuits have to dedicate some time to refresh, but SRAM circuits don’t.

slide 10/29
A 4 GB DRAM SO-DIMM
(Image source: Wikipedia. See http://en.wikipedia.org/wiki/File:4GB_DDR3_SO-DIMM.jpg for details.)

slide 11/29
A 4 GB DRAM SO-DIMM, continued
SO-DIMM: small outline dual inline memory module.
The module in the image appears to have eight 4 Gb DRAM chips (but might have sixteen 2 Gb DRAM chips, with eight on each side of the module).
64 of the 204 connectors are for data. The rest are for address bits, control, power and ground, etc.
The module shown can receive or send data at a rate of up to 10667 MB/s: 64-bit transfers at a rate of 1333 million transfers per second.
How long would it take two such modules, working in parallel, to transfer 64 bytes to a DRAM controller?

slide 12/29
Why is DRAM bandwidth good when latency is so bad? A partial answer . . .
The internal arrangement of a typical 4 Gb DRAM chip might be four DRAM arrays, called banks, of 1 Gb each.
The dimensions of a bank would then be 2^15 rows × 2^15 columns.
So access to a single row accesses 32 Kb of data! It pays for the DRAM controller to do writes and reads of chunks of data much larger than 4 or 8 bytes.
But because the data bus width of a DIMM is only 8 bytes, these big transfers have to be serialized into multi-transfer bursts.
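The arithmetic on slides 11 and 12 can be checked with a short sketch. The function names below are hypothetical, and the burst-time figure counts only the data transfers themselves: it assumes the row is already open and ignores every other source of DRAM latency, so it is a lower bound rather than a full answer to the question on slide 11.

```python
import math

# Figures taken from the slides.
TRANSFERS_PER_SECOND = 1333e6   # 1333 million transfers per second
BYTES_PER_TRANSFER = 8          # 64 data connectors = 8 bytes per transfer


def peak_bandwidth_mb_per_s(modules=1):
    """Peak data rate in MB/s (1 MB = 10**6 bytes) for `modules` in parallel."""
    return modules * BYTES_PER_TRANSFER * TRANSFERS_PER_SECOND / 1e6


def burst_time_ns(num_bytes, modules=1):
    """Time to move `num_bytes` as back-to-back transfers (latency ignored)."""
    bytes_per_transfer = modules * BYTES_PER_TRANSFER
    transfers = math.ceil(num_bytes / bytes_per_transfer)
    return transfers / TRANSFERS_PER_SECOND * 1e9


def square_bank_side(bank_bits):
    """Rows (= columns) of a square bank holding `bank_bits` bits."""
    side = math.isqrt(bank_bits)
    if side * side != bank_bits:
        raise ValueError("bank size is not a perfect square")
    return side


print(peak_bandwidth_mb_per_s())        # close to the 10667 MB/s on the slide
print(burst_time_ns(64, modules=2))     # four 16-byte transfers
print(square_bank_side(2**30))          # rows in a 1 Gb bank
```

With the rounded rate of 1333 million transfers per second the peak works out to 10664 MB/s; the slide's 10667 MB/s corresponds to the unrounded rate of 1333⅓ million. Two modules in parallel move 16 bytes per transfer, so 64 bytes needs four transfers, or roughly 3 ns of bus time. A 1 Gb bank is 2^30 bits, which gives 2^15 = 32768 rows of 32768 bits each.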

slide 13/29
Quantifying Cache Performance
Rather than starting with a multi-core system with multiple levels of caches and complex interactions between caches and TLBs, let’s start with a simple system:
◮ one core
◮ no virtual memory
◮ only one level of caches
◮ constant processor clock frequency
This is shown on the next slide . . .
We’ll measure time in processor clock cycles.

slide 14/29
No VM, only one level of cache . . .
[Figure: block diagram; labels: CORE, L1 I-CACHE, L1 D-CACHE, DRAM CONTROLLER, DRAM MODULES]

slide 15/29
Hits and misses
The purpose of a cache memory is to provide a fast mirror of a small portion of the contents of a much larger memory one level farther away from the processor core. (In our simple system, the next level is just DRAM.)
A cache hit occurs when a memory access can be handled by a cache without any delay waiting for help from the next level out.
A cache miss, then, is a memory access that is not a hit.
L1 I-caches and D-caches are generally designed to keep the core running at full speed, in the ideal, happy, but unlikely circumstance that all memory accesses hit in these caches.

slide 16/29
Miss rates, miss penalties, memory access time (1)
Here is the definition of miss rate:
miss rate = (number of misses) / (total number of cache accesses)
Miss rate is program-dependent and also depends on the design of a cache. What kinds of programs have low miss rates? What aspects of cache design lead to low miss rates?
The miss penalty is defined as the average number of clock cycles a processor must stall in response to a miss. Even for the simple system we’re considering, it’s an average, not a constant property of the cache and DRAM hardware. Let’s write down some reasons why the length of stall might vary from one miss to the next.
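To make the miss-rate definition on slide 16 concrete, here is a minimal sketch that counts misses over a list of byte addresses. The cache model is an assumption chosen only to keep the code short: a handful of blocks with least-recently-used replacement, where any block can go anywhere. The slides have not described a cache organization at this point, so the function name `miss_rate`, the block size, and the block count are all hypothetical.

```python
from collections import OrderedDict


def miss_rate(addresses, num_blocks=4, block_size=16):
    """Return misses / accesses for a toy cache (assumed LRU replacement)."""
    if not addresses:
        raise ValueError("miss rate is undefined for zero accesses")
    cache = OrderedDict()   # block number -> None, least recently used first
    misses = 0
    for addr in addresses:
        block = addr // block_size
        if block in cache:
            cache.move_to_end(block)        # hit: mark as most recently used
        else:
            misses += 1                     # miss: fetch from the next level
            cache[block] = None
            if len(cache) > num_blocks:
                cache.popitem(last=False)   # evict the least recently used
    return misses / len(addresses)


sequential = list(range(256))               # consecutive byte addresses
strided = list(range(0, 256 * 16, 16))      # one access per 16-byte block
print(miss_rate(sequential))
print(miss_rate(strided))
```

Both traces make 256 accesses. The sequential trace misses once per 16-byte block, giving a miss rate of 16/256 = 0.0625, while the strided trace touches a new block on every access and misses every time. The same cache gives very different miss rates for different programs, which is the program dependence the slide points to.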
slide 17/29
Miss rates, miss penalties, memory access time (2)
Hit time can be defined as the length of time needed to complete a memory access in the case of a cache hit. It is likely to be 1 processor clock cycle in the case of an L1 cache.
We can now define average memory access time (AMAT) as
hit time + miss rate × miss penalty
Suppose hit time is 1 cycle and miss penalty is 100 cycles. What is AMAT if the miss rate is 0? 1%? 5%? 50%?
We’ll return to this kind of analysis to quantify the overall impact of miss rates and miss penalties on program running times, but first we’ll look qualitatively at design options for caches.

slide 18/29
We’ll start with a very specific cache design
Most textbooks on computer architecture discuss cache design options in one of two ways:
◮ lengthy exploration of most of the available options, followed by some specific examples of cache designs (this is what is done in Section B.1 of our textbook);
◮ presentation of structures that are too simple to work very well, followed by presentation of more complex structures that perform better.
Instead of either of those approaches, let’s start with a structure that would be fairly effective for an L1 cache in 2014, then consider the costs and benefits of changing that structure.
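The AMAT formula from slide 17 is simple enough to evaluate directly. The sketch below uses the hit time and miss penalty given on that slide; the function name `amat` and its parameter names are hypothetical.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time, in the same units as hit_time."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Values from the slide: hit time 1 cycle, miss penalty 100 cycles.
for rate in (0.0, 0.01, 0.05, 0.50):
    cycles = amat(hit_time=1, miss_rate=rate, miss_penalty=100)
    print(f"miss rate {rate:.0%}: AMAT = {cycles:.0f} cycles")
```

For the four miss rates in the slide's question, the formula gives 1, 2, 6, and 51 cycles. Even a 1% miss rate doubles the average access time relative to the all-hits case, because the miss penalty is so much larger than the hit time.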
