Slides for Lecture 11
ENCM 501: Principles of Computer Architecture
Winter 2014 Term

Steve Norman, PhD, PEng
Electrical & Computer Engineering
Schulich School of Engineering
University of Calgary

13 February, 2014
slide 2/25 ENCM 501 W14 Slides for Lecture 11

Previous Lecture

12:30 to 1:10pm: Quiz #1

1:15 to 1:45pm . . .
◮ fully-associative caches
◮ cache options for handling writes
◮ write buffers
◮ multi-level caches
slide 3/25 ENCM 501 W14 Slides for Lecture 11

Today’s Lecture

◮ more about multi-level caches
◮ classifying cache misses: the 3 C’s
◮ introduction to virtual memory

Related reading in Hennessy & Patterson: Sections B.2–B.4
slide 4/25 ENCM 501 W14 Slides for Lecture 11

AMAT in a two-level cache system

Textbook formula:

AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × L2 miss penalty)

It was pointed out (I think) in the previous lecture that the L1 hit time should be weighted by the L1 hit rate. What reasonable assumption would imply that such a weighting would be INCORRECT?

This definition (not from the textbook!) is incorrect: For a system with two levels of caches, the L2 hit rate of a program is the number of L2 hits divided by the total number of memory accesses.

What is a correct definition for L2 hit rate, compatible with the formula for AMAT?
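Here is a minimal C sketch of the textbook formula; the function and parameter names are invented for this example, and the comment records the reading of the L2 miss rate that the formula requires:

    /* Minimal sketch of the textbook two-level AMAT formula.
       All names here are invented for illustration. */
    double amat_two_level(double l1_hit_time, double l1_miss_rate,
                          double l2_hit_time, double l2_miss_rate,
                          double l2_miss_penalty)
    {
        /* For the formula to work, l2_miss_rate must be a local rate:
           L2 misses divided by L1 misses (the accesses that actually
           reach L2), not divided by total memory accesses. */
        return l1_hit_time
               + l1_miss_rate * (l2_hit_time
                                 + l2_miss_rate * l2_miss_penalty);
    }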
slide 5/25 ENCM 501 W14 Slides for Lecture 11

L2 cache design tradeoffs

An L1 cache must keep up with a processor core. That is a challenge to a circuit design team but keeps the problem simple: If a design is too slow, it fails.

For an L2 cache, the tradeoffs are more complex:
◮ Increasing capacity improves L2 miss rate but makes L2 hit time, chip area, and (probably) energy use worse.
◮ Decreasing capacity improves L2 hit time, chip area, and (probably) energy use, but makes L2 miss rate worse.

Suppose L1 hit time = 1 cycle, L1 miss rate = 0.020, L2 miss penalty = 100 cycles. Which is better, considering AMAT only, not chip area or energy?
◮ (a) L2 hit time = 10 cycles, L2 miss rate = 0.50
◮ (b) L2 hit time = 12 cycles, L2 miss rate = 0.40
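To compare the two options numerically, here is a small self-contained C program; the numbers are the ones given above, and the helper function is the same sketch as on the previous slide:

    #include <stdio.h>

    /* Same sketch of the two-level AMAT formula as before. */
    static double amat(double l1_hit, double l1_mr,
                       double l2_hit, double l2_mr, double l2_penalty)
    {
        return l1_hit + l1_mr * (l2_hit + l2_mr * l2_penalty);
    }

    int main(void)
    {
        /* L1 hit time = 1 cycle, L1 miss rate = 0.020,
           L2 miss penalty = 100 cycles, from the slide above. */
        printf("option (a): AMAT = %.2f cycles\n",
               amat(1.0, 0.020, 10.0, 0.50, 100.0));  /* prints 2.20 */
        printf("option (b): AMAT = %.2f cycles\n",
               amat(1.0, 0.020, 12.0, 0.40, 100.0));  /* prints 2.04 */
        return 0;
    }

Considering AMAT only, option (b) comes out ahead: its two extra cycles of hit time cost less than the misses it avoids.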
slide 6/25 ENCM 501 W14 Slides for Lecture 11

Valid bits in caches going 1 → 0

I hope it’s obvious why V bits for blocks go 0 → 1. But why might V bits go 1 → 0? In other words, why does it sometimes make sense to invalidate one or more cache blocks?

Here are two big reasons. (There are likely some other good reasons.)
◮ DMA: direct memory access.
◮ Instruction writes by O/S kernels and by programs that write their own instructions.

Let’s make some notes about each of these reasons.
slide 7/25 ENCM 501 W14 Slides for Lecture 11

3 C’s of cache misses: compulsory, capacity, conflict

It’s useful to think about the causes of cache misses.

Compulsory misses (sometimes called “cold misses”) happen on access to instructions or data that have never been in a cache. Examples include:
◮ instruction fetches in a program that has just been copied from disk to main memory;
◮ data reads of information that has just been copied from a disk controller or network interface to main memory.

Compulsory misses would happen even if a cache had the same capacity as the main memory the cache was supposed to mirror.
slide 8/25 ENCM 501 W14 Slides for Lecture 11

Capacity misses

This kind of miss arises because a cache is not big enough to contain all the instructions and/or data a program accesses while it runs.

Capacity misses for a program can be counted by simulating a program run with a fully-associative cache of some fixed capacity. Since instruction and data blocks can be placed anywhere within a fully-associative cache, it’s reasonable to assume that any miss on access to a previously accessed instruction or data item in a fully-associative cache occurs because the cache is not big enough.

Why is this a good but not perfect approximation?
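Here is a C sketch of the kind of simulation just described, assuming LRU replacement, 64-byte blocks, and a 32 KB fully-associative cache; the parameters and all names are invented for this example. Subtracting the compulsory misses (first touches of each block) from the total counted here leaves the capacity misses.

    #include <stdio.h>

    #define NUM_BLOCKS 512               /* 512 blocks x 64 B = 32 KB */
    #define BLOCK_BITS 6                 /* 64-byte blocks */

    static unsigned long tag[NUM_BLOCKS];
    static unsigned long last_use[NUM_BLOCKS];
    static int valid[NUM_BLOCKS];
    static unsigned long now;            /* logical time for LRU bookkeeping */
    static unsigned long misses;

    static void access_address(unsigned long addr)
    {
        unsigned long t = addr >> BLOCK_BITS;
        int victim = 0;
        now++;
        for (int i = 0; i < NUM_BLOCKS; i++) {
            if (valid[i] && tag[i] == t) {     /* hit: update LRU info */
                last_use[i] = now;
                return;
            }
            if (!valid[i] || last_use[i] < last_use[victim])
                victim = i;                    /* track empty or LRU block */
        }
        misses++;                              /* miss: fill the victim block */
        valid[victim] = 1;
        tag[victim] = t;
        last_use[victim] = now;
    }

    int main(void)
    {
        /* Toy trace; a real experiment would replay a full program trace. */
        unsigned long trace[] = { 0x1000, 0x1040, 0x1000, 0x2000, 0x1040 };
        for (int i = 0; i < 5; i++)
            access_address(trace[i]);
        printf("total misses = %lu\n", misses); /* 3, all compulsory here */
        return 0;
    }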
slide 9/25 ENCM 501 W14 Slides for Lecture 11

Conflict misses

Conflict misses (also called “collision misses”) occur in direct-mapped and N-way set-associative caches because too many accesses to memory generate a common index.

In the absence of a 4th kind of miss—coherence misses, which can happen when multiple processors share access to a memory system—we can write:

conflict misses = total misses − compulsory misses − capacity misses

The main idea behind increasing set-associativity is to reduce conflict misses without the time and energy problems of a fully-associative cache.
slide 10/25 ENCM 501 W14 Slides for Lecture 11

3 C’s: Data from experiments

Textbook Figure B.8 has a lot of data; it’s unreasonable to try to jam all of that data into a few lecture slides. So here’s a subset of the data, for 8 KB capacity. N is the degree of associativity, and miss rates are in misses per thousand accesses.

    N   compulsory   capacity   conflict
    1      0.1          44         24
    2      0.1          44          5
    4      0.1          44      < 0.5
    8      0.1          44      < 0.5

This is real data from practical applications. It is worthwhile to study the table to see what general patterns emerge.
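Reading the table through the identity on the previous slide, total misses per thousand accesses are just the three components summed. A quick check in C, with the table’s numbers copied in and “< 0.5” roughly treated as zero for this sketch:

    #include <stdio.h>

    int main(void)
    {
        /* Rows from the 8 KB table: N, compulsory, capacity, conflict,
           in misses per 1000 accesses ("< 0.5" approximated as 0). */
        double rows[4][4] = {
            { 1, 0.1, 44, 24 },
            { 2, 0.1, 44,  5 },
            { 4, 0.1, 44,  0 },
            { 8, 0.1, 44,  0 },
        };
        for (int r = 0; r < 4; r++)
            printf("N = %.0f: about %.1f total misses per 1000 accesses\n",
                   rows[r][0], rows[r][1] + rows[r][2] + rows[r][3]);
        return 0;
    }

One pattern stands out: associativity attacks only the conflict component, so once conflict misses are nearly gone (here, beyond 4-way), extra associativity buys almost nothing at this capacity.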
slide 11/25 ENCM 501 W14 Slides for Lecture 11

Caches and Virtual Memory

Both are essential systems to support applications running on modern operating systems. As mentioned two weeks ago, it really helps to keep in mind what problems are solved by caches and what very different problems are solved by virtual memory.

Caches are an impressive engineering workaround for difficult facts about relative latencies of memory arrays.

Virtual memory (VM) is a concept, a great design idea, that solves a wide range of problems for computer systems in which multiple applications are sharing resources.
slide 12/25 ENCM 501 W14 Slides for Lecture 11

VM preliminaries: The O/S kernel

When a computer with an operating system is powered up or reset, instructions in ROM begin the job of copying a special program called the kernel from the file system into memory.

The kernel is a vital piece of software—once it is running, it controls hardware—memory, file systems, network interfaces, etc.—and schedules the access of other running programs to processor cores.
slide 13/25 ENCM 501 W14 Slides for Lecture 11

VM preliminaries: Processes

A process can be defined as an instance of a program in execution.

Because the kernel has special behaviour and special powers not available to other running programs, the kernel is usually not considered to be a process. So when a computer is in normal operation there are many programs running concurrently: one kernel and many processes.
slide 14/25 ENCM 501 W14 Slides for Lecture 11

VM preliminaries: Examples of processes

Suppose you are typing a command into a terminal window on a Linux system. Two processes are directly involved: the terminal and a shell—the shell is the program that interprets your commands and launches other programs in response to your commands.

Suppose you enter the command gcc foo.c bar.c
A flurry of processes will come and go—one for the driver program gcc, two invocations of the compiler cc1, two invocations of the assembler as, and one invocation of the linker.

Then if you enter the command ./a.out, a process will be created from the executable you just built.
slide 15/25 ENCM 501 W14 Slides for Lecture 11

#1 problem solved by VM: protection

Processes need to be able to access memory quickly but safely.

It would be disastrous if a process could accidentally or maliciously access memory in use for kernel instructions or kernel data. It would also be disastrous if processes could accidentally or maliciously access each other’s memory.

(In the past, perhaps the #1 problem solved by VM was allowing the combined main memory use of all processes to exceed DRAM capacity. That’s still important today, but less important than it used to be, because current DRAM circuits are cheap and have huge capacities.)
slide 16/25 ENCM 501 W14 Slides for Lecture 11

How VM provides memory protection

The kernel gives a virtual address space to each process. Suppose P is a process. P can use its own virtual address space with
◮ no risk that P will access other processes’ memory;
◮ no risk other processes will access P’s memory.

(That is a slight oversimplification—modern OSes allow intentional sharing of memory by cooperating processes.)

Processes never know the physical DRAM addresses of the memory they use. The addresses used by processes are virtual addresses, which get translated into physical addresses for access to memory circuits. Translations are managed by the kernel.
slide 17/25 ENCM 501 W14 Slides for Lecture 11

Pages

The basic unit of virtual memory is called a page. The size of a page must be a power of two. Different systems have different page sizes, and in some instances a single system will support two or more different page sizes at the same time. A very common page size is 4 KB.

How many 4 KB pages are available in a system with 8 GB of memory?

An address in a VM system is split into
◮ a page number, which indicates which page the address belongs to;
◮ and a page offset, which gives the location within a page of a byte, word, or similar small chunk of data.
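The question above is a pure powers-of-two calculation; this tiny C program (constants taken from the slide) checks it:

    #include <stdio.h>

    int main(void)
    {
        unsigned long long mem_bytes  = 8ULL << 30;   /* 8 GB = 2^33 bytes */
        unsigned long long page_bytes = 4ULL << 10;   /* 4 KB = 2^12 bytes */
        /* 2^33 / 2^12 = 2^21 pages */
        printf("%llu pages\n", mem_bytes / page_bytes);  /* 2097152 */
        return 0;
    }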
slide 18/25 ENCM 501 W14 Slides for Lecture 11

Example address splits for VM

Let’s show the address split for an address width of 40 bits and a page size of 4 KB.

Let’s show the address split for an address width of 48 bits and a page size of 4 KB.

Let’s show the address split for an address width of 40 bits and a “huge page” size of 2 MB.
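Since the page offset needs exactly log2(page size) bits and the page number gets whatever is left, the three splits can be worked out mechanically. A small C sketch (the helper name is invented for this example):

    #include <stdio.h>

    /* Print the page-number/page-offset split for a given address
       width and power-of-two page size. */
    static void split(int addr_bits, unsigned long long page_size)
    {
        int offset_bits = 0;
        while ((1ULL << offset_bits) < page_size)
            offset_bits++;               /* offset_bits = log2(page_size) */
        printf("%2d-bit address, %llu-byte pages: "
               "%d-bit page number + %d-bit offset\n",
               addr_bits, page_size, addr_bits - offset_bits, offset_bits);
    }

    int main(void)
    {
        split(40, 4096ULL);            /* 28-bit page number + 12-bit offset */
        split(48, 4096ULL);            /* 36-bit page number + 12-bit offset */
        split(40, 2ULL * 1024 * 1024); /* 19-bit page number + 21-bit offset */
        return 0;
    }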