ECE232: Hardware Organization and Design
Lecture 22: Introduction to Caches
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB
Overview
Caches hold a subset of the data in main memory
Three types of caches
• Direct mapped
• Set associative
• Fully associative
Today: Direct mapped
• Each memory value can only be in one place in the cache
• Is it there (hit)?
• Or is it not there (miss)?
ECE232: Introduction to Caches 2
Direct Mapped Cache - Textbook
Location is determined by address
Direct mapped: only one choice
(Block address) modulo (#Blocks in cache)
• #Blocks is a power of 2
• Use low-order address bits
Direct mapped cache (assume 1 byte/block)
A 16-byte memory maps onto a 4-block direct mapped cache:
• Cache block 0 can be occupied by data from memory blocks 0, 4, 8, 12 (addresses 0000₂, 0100₂, 1000₂, 1100₂)
• Cache block 1 can be occupied by data from memory blocks 1, 5, 9, 13
• Cache block 2 can be occupied by data from memory blocks 2, 6, 10, 14
• Cache block 3 can be occupied by data from memory blocks 3, 7, 11, 15
Direct Mapped Cache – Index and Tag
A memory block address divides into a tag and an index.
The index determines the block in the cache:
• index = (address) mod (# blocks)
Since the number of cache blocks is a power of 2, the cache index is simply the lower n bits of the memory address.
Direct Mapped w/Tag
The tag determines which memory block occupies a given cache block.
• hit: cache tag field = tag bits of address
• miss: cache tag field ≠ tag bits of address
Direct Mapped Cache
The simplest mapping is a direct mapped cache
Each memory address is associated with exactly one possible block within the cache
• Therefore, we only need to look in a single location in the cache to see if the data is there
Finding Item within Block
In reality, a cache block consists of a number of bytes/words to (1) increase the cache hit rate due to the locality property and (2) reduce the cache miss time
Given the address of an item, the index tells which block of the cache to look in
Then, how do we find the requested item within the cache block?
Or, equivalently, "What is the byte offset of the item within the cache block?"
Selecting part of a block (block size > 1 byte)
If block size > 1, the rightmost bits of the address are really the offset within the indexed block. The address fields are:
TAG | INDEX | OFFSET
• Tag: to check if we have the correct block
• Index: to select a block in the cache
• Offset: byte offset within the block
Example: block size of 8 bytes; select byte 4 (i.e., the 2nd word). The memory address splits as 11 | 01 | 100 (tag = 11, index = 01, offset = 100)
Accessing data in a direct mapped cache
Three types of events:
• cache hit: cache block is valid and contains the proper address, so read the desired word
• cache miss: nothing in the cache at the appropriate block, so fetch from memory
• cache miss, block replacement: wrong data is in the cache at the appropriate block, so discard it and fetch the desired data from memory
Cache access procedure:
• (1) Use the index bits to select a cache block
• (2) If the valid bit is 1, compare the tag bits of the address with the cache block's tag bits
• (3) If they match, use the offset to read out the word/byte
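The three-step access procedure above can be sketched in Python; the `CacheLine` and `lookup` names are illustrative, not from the lecture.

```python
from dataclasses import dataclass, field

@dataclass
class CacheLine:
    valid: bool = False
    tag: int = 0
    data: bytes = field(default=b"")

def lookup(cache, addr, block_size, num_blocks):
    """Return (hit, byte) for a byte address in a direct-mapped cache."""
    offset = addr % block_size           # (3) byte offset within the block
    block_addr = addr // block_size
    index = block_addr % num_blocks      # (1) index bits select a cache block
    tag = block_addr // num_blocks
    line = cache[index]
    if line.valid and line.tag == tag:   # (2) valid bit and tag comparison
        return True, line.data[offset]
    return False, None                   # miss: caller fetches from memory
```

For example, with 4 blocks of 4 bytes each, address 21 has offset 1, index 1, and tag 1; it hits only if cache line 1 is valid and holds tag 1.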
Tags and Valid Bits
How do we know which particular block is stored in a cache location?
• Store the block address as well as the data
• Actually, only the high-order bits are needed, called the tag
What if there is no data in a location?
• Valid bit: 1 = present, 0 = not present
• Initially 0
Cache Example
8 blocks, 1 byte/block, direct mapped. Initial state:
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    N
111    N
Cache Example
Addr  Binary addr  Hit/miss  Cache block
22    10 110       Miss      110
Index  V  Tag  Data
000    N
001    N
010    N
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example
Addr  Binary addr  Hit/miss  Cache block
26    11 010       Miss      010
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example
Addr  Binary addr  Hit/miss  Cache block
22    10 110       Hit       110
26    11 010       Hit       010
Index  V  Tag  Data
000    N
001    N
010    Y  11   Mem[11010]
011    N
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example
Addr  Binary addr  Hit/miss  Cache block
16    10 000       Miss      000
3     00 011       Miss      011
16    10 000       Hit       000
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  11   Mem[11010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
Cache Example
Addr  Binary addr  Hit/miss  Cache block
18    10 010       Miss      010
Index  V  Tag  Data
000    Y  10   Mem[10000]
001    N
010    Y  10   Mem[10010]
011    Y  00   Mem[00011]
100    N
101    N
110    Y  10   Mem[10110]
111    N
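The whole example trace (addresses 22, 26, 22, 26, 16, 3, 16, 18 against the 8-block, 1-byte-per-block cache) can be replayed with a small simulator; the code itself is an illustrative sketch, not from the lecture.

```python
def simulate(addresses, num_blocks=8):
    """Return the hit/miss outcome for each address in a direct-mapped cache."""
    valid = [False] * num_blocks
    tags = [None] * num_blocks
    results = []
    for addr in addresses:
        index = addr % num_blocks      # low-order bits select the block
        tag = addr // num_blocks       # high-order bits are the tag
        if valid[index] and tags[index] == tag:
            results.append("hit")
        else:                          # miss: install the new block (replacement)
            results.append("miss")
            valid[index], tags[index] = True, tag
    return results

trace = [22, 26, 22, 26, 16, 3, 16, 18]
print(simulate(trace))
# → ['miss', 'miss', 'hit', 'hit', 'miss', 'miss', 'hit', 'miss']
```

Note the final access to 18 misses even though index 010 is valid: it holds tag 11 (from address 26), so the old block is replaced.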
Example: Larger Block Size
64 blocks, 16 bytes/block. To what block number does address 1200 map?
• Block address = 1200 / 16 = 75
• Block number = 75 modulo 64 = 11
Address fields (32-bit address):
• bits 31–10: Tag (22 bits)
• bits 9–4: Index (6 bits)
• bits 3–0: Offset (4 bits)
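The field split for this example can be checked with a short helper; `split_address` is an illustrative name, not from the lecture.

```python
def split_address(addr, block_size=16, num_blocks=64):
    """Split a byte address into (tag, index, offset) for a direct-mapped cache."""
    offset = addr % block_size          # low 4 bits for 16-byte blocks
    block_addr = addr // block_size     # strip the offset bits
    index = block_addr % num_blocks     # next 6 bits for 64 blocks
    tag = block_addr // num_blocks      # remaining high-order bits
    return tag, index, offset

print(split_address(1200))  # → (1, 11, 0): block address 75 maps to block 11
```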
Block Size Considerations
Larger blocks should reduce miss rate
• Due to spatial locality
But in a fixed-sized cache:
• Larger blocks → fewer of them
• More competition → increased miss rate
• Larger blocks → pollution
Larger blocks also mean a larger miss penalty
• Can override the benefit of reduced miss rate
• Early restart and critical-word-first can help
Cache Misses
On a cache hit, the CPU proceeds normally
On a cache miss:
• Stall the CPU pipeline
• Fetch the block from the next level of the hierarchy
• Instruction cache miss: restart instruction fetch
• Data cache miss: complete the data access
Write-Through
On a data-write hit, we could just update the block in cache
• But then cache and memory would be inconsistent
Write through: also update memory
• But this makes writes take longer
• e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles:
• Effective CPI = 1 + 0.1 × 100 = 11
Solution: write buffer
• Holds data waiting to be written to memory
• CPU continues immediately
• Only stalls on a write if the write buffer is already full
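The effective-CPI arithmetic for the write-through example works out as follows (the variable names are just for illustration):

```python
base_cpi = 1.0
store_fraction = 0.10   # 10% of instructions are stores
write_penalty = 100     # cycles for each write to memory (no write buffer)

# Each store adds the full memory-write latency to its CPI contribution.
effective_cpi = base_cpi + store_fraction * write_penalty
print(effective_cpi)  # → 11.0
```

A write buffer removes this penalty from the common case: the store deposits its data in the buffer and the CPU continues, so the 100-cycle cost is paid only when the buffer is full.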
Write-Back
Alternative: on a data-write hit, just update the block in cache
• Keep track of whether each block is dirty
When a dirty block is replaced:
• Write it back to memory
• Can use a write buffer to allow the replacing block to be read first
Measuring Cache Performance
Components of CPU time:
• Program execution cycles (includes cache hit time)
• Memory stall cycles (mainly from cache misses)
With simplifying assumptions:
Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                    = (Instructions / Program) × (Misses / Instruction) × Miss penalty
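A quick numeric instance of the stall-cycle formula, using hypothetical numbers that are not from the slide:

```python
# Hypothetical workload parameters (assumed for illustration only).
instructions = 1_000_000
misses_per_instruction = 0.02   # misses / instruction
miss_penalty = 100              # cycles per miss

# Memory stall cycles = Instructions × (Misses / Instruction) × Miss penalty
stall_cycles = instructions * misses_per_instruction * miss_penalty
print(stall_cycles)  # → 2000000.0
```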
Average Access Time
Hit time is also important for performance
Average memory access time (AMAT):
• AMAT = Hit time + Miss rate × Miss penalty
Example: CPU with 1 ns clock, hit time = 1 cycle, miss penalty = 20 cycles, I-cache miss rate = 5%
• AMAT = 1 + 0.05 × 20 = 2 ns
• 2 cycles per instruction
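The AMAT example from the slide, as a direct calculation:

```python
hit_time = 1        # cycles (1 ns with a 1 ns clock)
miss_rate = 0.05    # I-cache miss rate
miss_penalty = 20   # cycles

# AMAT = Hit time + Miss rate × Miss penalty
amat = hit_time + miss_rate * miss_penalty
print(amat)  # → 2.0 cycles, i.e. 2 ns with a 1 ns clock
```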
Summary
Today: direct mapped cache
• Performance is tied to whether values are located in the cache
• Cache miss = bad performance
Need to understand how to numerically determine system performance based on the cache hit rate
Why might direct mapped caches be bad?
• Lots of data map to the same location in the cache
Idea:
• Maybe we should have multiple locations for each data value
Next time: set associative caches