Cache Memory
CSE 675.02
Slides from Dan Garcia, UCB

The Big Picture
[Figure: block diagram of a computer.]
• Processor (active)
    • Control (“brain”)
    • Datapath (“brawn”)
• Memory (passive): where programs and data live when running
• Devices
    • Input: Keyboard, Mouse
    • Output: Display, Printer
    • Disk, Network

The Levels in Memory Hierarchy
• The higher the level, the smaller and faster the memory.
• Try to keep most of the action in the higher levels.

Memory Hierarchy (1/3)
• Processor
    • executes instructions on the order of nanoseconds to picoseconds
    • holds a small amount of code and data in registers
• Memory
    • more capacity than registers, still limited
    • access time ~50-100 ns
• Disk
    • HUGE capacity (virtually limitless)
    • VERY slow: runs ~milliseconds
Review: Why We Use Caches
[Figure: processor vs. DRAM performance, 1980-2000, log scale from 1 to 1000. µProc (CPU) improves 60%/yr. (“Moore’s Law”); DRAM improves 7%/yr.; the Processor-Memory Performance Gap grows 50% / year.]
• 1989 first Intel CPU with cache on chip
• 1998 Pentium III has two levels of cache on chip

Memory Hierarchy (2/3)
[Figure: pyramid of levels in the memory hierarchy with the Processor at the top. Level 1 (higher) through Level n (lower); distance from the processor increases and speed decreases going down; the width of each layer is the size of memory at that level.]
• As we move to deeper levels the latency goes up and price per bit goes down.
• Q: Can $/bit go up as we move deeper?

Memory Caching
• We’ve discussed three levels in the hierarchy: processor, memory, disk
• Mismatch between processor and memory speeds leads us to add a new level: a memory cache
• Implemented with SRAM technology: faster but more expensive than DRAM memory.
    • “S” = Static, no need to refresh, ~10ns
    • “D” = Dynamic, need to refresh, ~60ns
    • arstechnica.com/paedia/r/ram_guide/ram_guide.part1-1.html

Memory Hierarchy (3/3)
• If level closer to Processor, it must be:
    • smaller
    • faster
    • subset of lower levels (contains most recently used data)
• Lowest Level (usually disk) contains all available data
• Other levels?
Memory Hierarchy Analogy: Library (1/2)
• You’re writing a term paper (Processor) at a table in SEL
• SEL Library is equivalent to disk
    • essentially limitless capacity
    • very slow to retrieve a book
• Table is memory
    • smaller capacity: means you must return a book when the table fills up
    • easier and faster to find a book there once you’ve already retrieved it

Memory Hierarchy Analogy: Library (2/2)
• Open books on table are cache
    • smaller capacity: very few open books fit on the table; again, when the table fills up, you must close a book
    • much, much faster to retrieve data
• Illusion created: whole library open on the tabletop
    • Keep as many recently used books open on the table as possible, since you are likely to use them again
    • Also keep as many books on the table as possible, since that is faster than going to the library

Memory Hierarchy Basis
• Disk contains everything.
• When Processor needs something, bring it into all higher levels of memory.
• Cache contains copies of data in memory that are being used.
• Memory contains copies of data on disk that are being used.
• Entire idea is based on Temporal Locality: if we use it now, we’ll want to use it again soon (a Big Idea)

Cache Design
• How do we organize cache?
• Where does each memory address map to? (Remember that cache is a subset of memory, so multiple memory addresses map to the same cache location.)
• How do we know which elements are in cache?
• How do we quickly locate them?
Direct-Mapped Cache (1/2)
• In a direct-mapped cache, each memory address is associated with one possible block within the cache
    • Therefore, we only need to look in a single location in the cache for the data if it exists in the cache
    • Block is the unit of transfer between cache and memory

Direct-Mapped Cache (2/2)
[Figure: a 16-location memory (Memory Address 0 through F) mapped onto a 4 Byte Direct Mapped Cache (Cache Index 0 through 3).]
• Cache Location 0 can be occupied by data from:
    • Memory location 0, 4, 8, ...
    • 4 blocks => any memory location that is a multiple of 4

Issues with Direct-Mapped
• Since multiple memory addresses map to the same cache index, how do we tell which one is in there?
• What if we have a block size > 1 byte?
• Answer: divide memory address into three fields

    ttttttttttttttttt   iiiiiiiiii   oooo
    tag                 index        byte offset
    (to check if have   (to select   (within
    correct block)      block)       block)

Direct-Mapped Cache Terminology
[Figure: address split into Tag | Index | Offset; the index selects along the HEIGHT of the cache, the offset along its WIDTH.]
• All fields are read as unsigned integers.
• Index: specifies the cache index (which “row” of the cache we should look in)
• Offset: once we’ve found the correct block, specifies which byte within the block we want, i.e., which “column”
• Tag: the remaining bits after offset and index are determined; these are used to distinguish between all the memory addresses that map to the same location
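The mapping in the 4-byte cache figure can be sketched in a few lines of Python. This is only an illustration, assuming 1-byte blocks as in the figure; the names `NUM_BLOCKS`, `cache_index`, and `by_index` are hypothetical and do not come from the slides.

```python
# Illustrative sketch: a 4-block direct-mapped cache with 1-byte blocks,
# matching the 16-location memory figure (addresses 0x0 through 0xF).
NUM_BLOCKS = 4  # assumed from the "4 Byte Direct Mapped Cache" figure


def cache_index(address: int) -> int:
    """Return the single cache index that a memory address can occupy."""
    return address % NUM_BLOCKS


# Group the 16 memory addresses by the cache index they map to.
by_index = {i: [] for i in range(NUM_BLOCKS)}
for address in range(16):
    by_index[cache_index(address)].append(address)

for i, addresses in by_index.items():
    print(i, [hex(a) for a in addresses])
```

Every cache index ends up shared by four memory addresses, which is why a tag is needed to tell them apart.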
TIO: Dan’s great cache mnemonic
[Figure: address split into Tag | Index | Offset above a rectangle whose HEIGHT is the # of blocks, whose WIDTH is the size of one block (B/block), and whose AREA is the cache size (B).]

    2^(H+W) = 2^H * 2^W

    AREA (cache size, B) = HEIGHT (# of blocks) * WIDTH (size of one block, B/block)

Direct-Mapped Cache Example (1/3)
• Suppose we have 16 KB of data in a direct-mapped cache with 4 word blocks
• Determine the size of the tag, index and offset fields if we’re using a 32-bit architecture
• Offset
    • need to specify correct byte within a block
    • block contains 4 words = 16 bytes = 2^4 bytes
    • need 4 bits to specify correct byte

Direct-Mapped Cache Example (2/3)
• Index: (~index into an “array of blocks”)
    • need to specify correct row in cache
    • cache contains 16 KB = 2^14 bytes
    • block contains 2^4 bytes (4 words)
    • # blocks/cache
        = (bytes/cache) / (bytes/block)
        = (2^14 bytes/cache) / (2^4 bytes/block)
        = 2^10 blocks/cache
    • need 10 bits to specify this many rows

Direct-Mapped Cache Example (3/3)
• Tag: use remaining bits as tag
    • tag length = addr length - offset - index
        = 32 - 4 - 10 bits
        = 18 bits
    • so tag is leftmost 18 bits of memory address
• Why not full 32 bit address as tag?
    • All bytes within a block share the same block address, so the offset bits are not needed (4 bits)
    • Index must be the same for every address within a block, so it’s redundant in the tag check and can be left off to save memory (10 bits in this example)
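The arithmetic of this example can be checked with a short Python sketch. It is a minimal illustration, not part of the original slides: the function name `field_widths` and the constant `WORD_BYTES` are hypothetical, and it assumes 4-byte words and power-of-two sizes.

```python
def field_widths(cache_bytes: int, block_bytes: int, address_bits: int = 32):
    """Return (tag_bits, index_bits, offset_bits) for a direct-mapped cache.

    Assumes cache_bytes and block_bytes are both powers of two.
    """
    offset_bits = block_bytes.bit_length() - 1          # log2(bytes per block), the WIDTH
    num_blocks = cache_bytes // block_bytes             # HEIGHT = AREA / WIDTH
    index_bits = num_blocks.bit_length() - 1            # log2(number of blocks)
    tag_bits = address_bits - index_bits - offset_bits  # the remaining bits
    return tag_bits, index_bits, offset_bits


WORD_BYTES = 4  # assumption: 32-bit words

# 16 KB of data, 4-word blocks, 32-bit addresses.
print(field_widths(16 * 1024, 4 * WORD_BYTES))  # (18, 10, 4)
```

Changing the block size or cache size in the call shows how bits move between the three fields while the total stays at 32.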
Caching Terminology
• When we try to read memory, 3 things can happen:
    1. cache hit: cache block is valid and contains proper address, so read desired word
    2. cache miss: nothing in cache in appropriate block, so fetch from memory
    3. cache miss, block replacement: wrong data is in cache at appropriate block, so discard it and fetch desired data from memory (the cache always holds a copy)

Accessing data in a direct mapped cache
• Ex.: 16KB of data, direct-mapped, 4 word blocks
• Read 4 addresses
    1. 0x00000014
    2. 0x0000001C
    3. 0x00000034
    4. 0x00008014
• Memory values shown below:
    • only cache/memory level of hierarchy

    Memory
    Address (hex)   Value of Word
    ...             ...
    00000010        a
    00000014        b
    00000018        c
    0000001C        d
    ...             ...
    00000030        e
    00000034        f
    00000038        g
    0000003C        h
    ...             ...
    00008010        i
    00008014        j
    00008018        k
    0000801C        l
    ...             ...

Accessing data in a direct mapped cache
• 4 Addresses:
    • 0x00000014, 0x0000001C, 0x00000034, 0x00008014
• 4 Addresses divided (for convenience) into Tag, Index, Byte Offset fields

    Tag                  Index        Offset
    000000000000000000   0000000001   0100
    000000000000000000   0000000001   1100
    000000000000000000   0000000011   0100
    000000000000000010   0000000001   0100

16 KB Direct Mapped Cache, 16B blocks
• Valid bit: determines whether anything is stored in that row (when computer initially turned on, all entries invalid)

    Index   Valid   Tag   0x0-3   0x4-7   0x8-b   0xc-f
    0       0
    1       0
    2       0
    3       0
    4       0
    5       0
    6       0
    7       0
    ...     ...
    1022    0
    1023    0
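The division of the four addresses into fields can be reproduced with shifts and masks. The sketch below assumes the field widths of this example (18-bit tag, 10-bit index, 4-bit offset); the names `OFFSET_BITS`, `INDEX_BITS`, and `split_address` are hypothetical and chosen only for illustration.

```python
OFFSET_BITS = 4   # 16-byte blocks
INDEX_BITS = 10   # 1024 rows in the cache


def split_address(address: int):
    """Split a 32-bit address into (tag, index, offset), read as unsigned integers."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = address >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


# Print each address with its tag, index, and offset in binary.
for address in (0x00000014, 0x0000001C, 0x00000034, 0x00008014):
    tag, index, offset = split_address(address)
    print(f"{address:#010x}  {tag:018b} {index:010b} {offset:04b}")
```

Note that the first, second, and fourth addresses all select index 1; only the tag distinguishes 0x00008014 from the other two.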
1. Read 0x00000014

    Tag field            Index field   Offset
    000000000000000000   0000000001    0100

    Index   Valid   Tag   0x0-3   0x4-7   0x8-b   0xc-f
    0       0
    1       0
    2       0
    3       0
    4       0
    5       0
    6       0
    7       0
    ...     ...
    1022    0
    1023    0

So we read block 1 (0000000001)

    Tag field            Index field   Offset
    000000000000000000   0000000001    0100

    (cache unchanged: every row still has Valid = 0)

No valid data

    Tag field            Index field   Offset
    000000000000000000   0000000001    0100

    (cache unchanged: row 1 has Valid = 0)

So load that data into cache, setting tag, valid

    Tag field            Index field   Offset
    000000000000000000   0000000001    0100

    Index   Valid   Tag   0x0-3   0x4-7   0x8-b   0xc-f
    0       0
    1       1       0     a       b       c       d
    2       0
    3       0
    4       0
    5       0
    6       0
    7       0
    ...     ...
    1022    0
    1023    0
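The read procedure walked through above can be sketched as a small simulation. This is a hedged illustration rather than the slides' own code: the class `DirectMappedCache`, the `MEMORY` dictionary, and the outcome strings are hypothetical names, and the sketch assumes 4-byte words and the memory values from the example. The walkthrough above covers only the first read; the outcomes for the other three addresses are what this sketch produces when the three cases from Caching Terminology are applied.

```python
OFFSET_BITS, INDEX_BITS = 4, 10
BLOCK_BYTES = 1 << OFFSET_BITS
WORD_BYTES = 4  # assumption: 32-bit words

# Memory contents from the example: word address -> value of word.
MEMORY = {
    0x10: "a", 0x14: "b", 0x18: "c", 0x1C: "d",
    0x30: "e", 0x34: "f", 0x38: "g", 0x3C: "h",
    0x8010: "i", 0x8014: "j", 0x8018: "k", 0x801C: "l",
}


class DirectMappedCache:
    def __init__(self):
        # One row per index; all rows start invalid, as at power-on.
        self.rows = [{"valid": False, "tag": 0, "words": []}
                     for _ in range(1 << INDEX_BITS)]

    def read(self, address):
        """Return (outcome, word) for a word read at the given address."""
        offset = address & (BLOCK_BYTES - 1)
        index = (address >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = address >> (OFFSET_BITS + INDEX_BITS)
        row = self.rows[index]
        if row["valid"] and row["tag"] == tag:
            outcome = "hit"
        else:
            outcome = "miss, block replacement" if row["valid"] else "miss"
            base = address - offset  # address of the first byte in the block
            row["words"] = [MEMORY[base + i]
                            for i in range(0, BLOCK_BYTES, WORD_BYTES)]
            row["tag"] = tag
            row["valid"] = True
        return outcome, row["words"][offset // WORD_BYTES]


cache = DirectMappedCache()
results = [cache.read(a) for a in (0x00000014, 0x0000001C, 0x00000034, 0x00008014)]
for result in results:
    print(result)
```

Under these assumptions the first read misses and loads a, b, c, d into row 1, the second hits in that row, the third misses in row 3, and the fourth replaces row 1 because its tag differs.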