ENCM 501 W14 Slides for Lecture 11

Slides for Lecture 11
ENCM 501: Principles of Computer Architecture
Winter 2014 Term
Steve Norman, PhD, PEng
Electrical & Computer Engineering
Schulich School of Engineering
University of Calgary
13 February, 2014

Previous Lecture

12:30 to 1:10pm: Quiz #1
1:15 to 1:45pm . . .
◮ fully-associative caches
◮ cache options for handling writes
◮ write buffers
◮ multi-level caches

Today's Lecture

◮ more about multi-level caches
◮ classifying cache misses: the 3 C's
◮ introduction to virtual memory

Related reading in Hennessy & Patterson: Sections B.2–B.4

AMAT in a two-level cache system

Textbook formula:

AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × L2 miss penalty)

It was pointed out (I think) in the previous lecture that the L1 hit time should be weighted by the L1 hit rate. What reasonable assumption would imply that such a weighting would be INCORRECT?

This definition (not from the textbook!) is incorrect: For a system with two levels of caches, the L2 hit rate of a program is the number of L2 hits divided by the total number of memory accesses.

What is a correct definition for L2 hit rate, compatible with the formula for AMAT?

L2 cache design tradeoffs

An L1 cache must keep up with a processor core. That is a challenge to a circuit design team but keeps the problem simple: If a design is too slow, it fails.

For an L2 cache, the tradeoffs are more complex:
◮ Increasing capacity improves L2 miss rate but makes L2 hit time, chip area and (probably) energy use worse.
◮ Decreasing capacity improves L2 hit time, chip area, and (probably) energy use, but makes L2 miss rate worse.

Suppose L1 hit time = 1 cycle, L1 miss rate = 0.020, L2 miss penalty = 100 cycles. Which is better, considering AMAT only, not chip area or energy?
◮ (a) L2 hit time = 10 cycles, L2 miss rate = 0.50
◮ (b) L2 hit time = 12 cycles, L2 miss rate = 0.40

Valid bits in caches going 1 → 0

I hope it's obvious why V bits for blocks go 0 → 1. But why might V bits go 1 → 0?

In other words, why does it sometimes make sense to invalidate one or more cache blocks?

Here are two big reasons. (There are likely some other good reasons.)
◮ DMA: direct memory access.
◮ Instruction writes by O/S kernels and by programs that write their own instructions.

Let's make some notes about each of these reasons.
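For the L2 design tradeoff question with options (a) and (b), the textbook AMAT formula can be evaluated directly. The following is a minimal sketch, not part of the original slides; the function name `amat` and its parameter names are chosen here for illustration. It assumes, as the formula itself implies, that the L2 miss rate is measured per L2 access (that is, per L1 miss).

```python
def amat(l1_hit_time, l1_miss_rate, l2_hit_time, l2_miss_rate, l2_miss_penalty):
    """Average memory access time, in cycles, for a two-level cache system.

    Implements: AMAT = L1 hit time
                       + L1 miss rate * (L2 hit time + L2 miss rate * L2 miss penalty)
    The L2 miss rate is assumed to be misses per L2 access.
    """
    return l1_hit_time + l1_miss_rate * (l2_hit_time + l2_miss_rate * l2_miss_penalty)


# Shared parameters from the slide: L1 hit time = 1 cycle,
# L1 miss rate = 0.020, L2 miss penalty = 100 cycles.
amat_a = amat(1, 0.020, l2_hit_time=10, l2_miss_rate=0.50, l2_miss_penalty=100)
amat_b = amat(1, 0.020, l2_hit_time=12, l2_miss_rate=0.40, l2_miss_penalty=100)

print(f"(a) AMAT = {amat_a:.2f} cycles")  # (a) AMAT = 2.20 cycles
print(f"(b) AMAT = {amat_b:.2f} cycles")  # (b) AMAT = 2.04 cycles
```

Under these numbers, option (b) gives the lower AMAT: the 2-cycle increase in L2 hit time is outweighed by the drop in L2 miss rate, because each L2 miss costs 100 cycles.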
3 C's of cache misses: compulsory, capacity, conflict

It's useful to think about the causes of cache misses.

Compulsory misses (sometimes called "cold misses") happen on access to instructions or data that have never been in a cache. Examples include:
◮ instruction fetches in a program that has just been copied from disk to main memory;
◮ data reads of information that has just been copied from a disk controller or network interface to main memory.

Compulsory misses would happen even if a cache had the same capacity as the main memory the cache was supposed to mirror.

Capacity misses

This kind of miss arises because a cache is not big enough to contain all the instructions and/or data a program accesses while it runs.

Capacity misses for a program can be counted by simulating a program run with a fully-associative cache of some fixed capacity. Since instruction and data blocks can be placed anywhere within a fully-associative cache, it's reasonable to assume that any miss on access to a previously accessed instruction or data item in a fully-associative cache occurs because the cache is not big enough.

Why is this a good but not perfect approximation?

Conflict misses

Conflict misses (also called "collision misses") occur in direct-mapped and N-way set-associative caches because too many accesses to memory generate a common index.

In the absence of a 4th kind of miss (coherence misses, which can happen when multiple processors share access to a memory system), we can write:

conflict misses = total misses − compulsory misses − capacity misses

The main idea behind increasing set-associativity is to reduce conflict misses without the time and energy problems of a fully-associative cache.

3 C's: Data from experiments

Textbook Figure B.8 has a lot of data; it's unreasonable to try to jam all of that data into a few lecture slides. So here's a subset of the data, for 8 KB capacity. N is the degree of associativity, and miss rates are in misses per thousand accesses.

         miss rates
N    compulsory    capacity    conflict
1    0.1           44          24
2    0.1           44          5
4    0.1           44          < 0.5
8    0.1           44          < 0.5

This is real data from practical applications. It is worthwhile to study the table to see what general patterns emerge.

Caches and Virtual Memory

Both are essential systems to support applications running on modern operating systems. As mentioned two weeks ago, it really helps to keep in mind what problems are solved by caches and what very different problems are solved by virtual memory.

Caches are an impressive engineering workaround for difficult facts about relative latencies of memory arrays.

Virtual memory (VM) is a concept, a great design idea, that solves a wide range of problems for computer systems in which multiple applications are sharing resources.

VM preliminaries: The O/S kernel

When a computer with an operating system is powered up or reset, instructions in ROM begin the job of copying a special program called the kernel from the file system into memory.

The kernel is a vital piece of software: once it is running, it controls hardware (memory, file systems, network interfaces, etc.) and schedules the access of other running programs to processor cores.
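The 3 C's classification described in the cache-miss slides can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the method used to produce the textbook data: it assumes LRU replacement, a trace already expressed as block addresses, and a set index equal to the block address modulo the number of sets. The names `count_misses` and `classify_misses` and the example trace are hypothetical.

```python
from collections import OrderedDict


def count_misses(trace, num_blocks, ways):
    """Count misses for an LRU set-associative cache that starts empty.

    trace: sequence of integer block addresses.
    num_blocks: total capacity in blocks.
    ways: associativity (1 = direct-mapped, num_blocks = fully associative).
    """
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        cache_set = sets[block % num_sets]
        if block in cache_set:
            cache_set.move_to_end(block)        # hit: mark most recently used
        else:
            misses += 1
            if len(cache_set) == ways:
                cache_set.popitem(last=False)   # evict least recently used
            cache_set[block] = True
    return misses


def classify_misses(trace, num_blocks, ways):
    """Split total misses into compulsory, capacity, and conflict counts."""
    trace = list(trace)
    total = count_misses(trace, num_blocks, ways)
    # The first access to each distinct block always misses.
    compulsory = len(set(trace))
    # Misses in a fully-associative cache of the same capacity,
    # beyond the compulsory ones, are counted as capacity misses.
    capacity = count_misses(trace, num_blocks, num_blocks) - compulsory
    conflict = total - compulsory - capacity
    return {"total": total, "compulsory": compulsory,
            "capacity": capacity, "conflict": conflict}


# Hypothetical trace: blocks 0 and 4 share a set in a 4-block direct-mapped cache.
trace = [0, 4, 0, 4, 0, 4]
result = classify_misses(trace, num_blocks=4, ways=1)
print(result)  # {'total': 6, 'compulsory': 2, 'capacity': 0, 'conflict': 4}
```

Running the same trace with `ways=2` removes the conflict misses, which matches the general pattern in the table. Because the fully-associative baseline depends on the replacement policy, the subtraction is only an approximation and can occasionally produce odd results (even a negative conflict count) for some traces.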
VM preliminaries: Processes

A process can be defined as an instance of a program in execution.

Because the kernel has special behaviour and special powers not available to other running programs, the kernel is usually not considered to be a process.

So when a computer is in normal operation there are many programs running concurrently: one kernel and many processes.

VM preliminaries: Examples of processes

Suppose you are typing a command into a terminal window on a Linux system. Two processes are directly involved: the terminal and a shell. The shell is the program that interprets your commands and launches other programs in response to your commands.

Suppose you enter the command gcc foo.c bar.c

A flurry of processes will come and go: one for the driver program gcc, two invocations of the compiler cc1, two invocations of the assembler as, and one invocation of the linker.

Then if you enter the command ./a.out, a process will be created from the executable you just built.

#1 problem solved by VM: protection

Processes need to be able to access memory quickly but safely.

It would be disastrous if a process could accidentally or maliciously access memory in use for kernel instructions or kernel data.

It would also be disastrous if processes could accidentally or maliciously access each other's memory.

(In the past, perhaps the #1 problem solved by VM was allowing the combined main memory use of all processes to exceed DRAM capacity. That's still important today, but less important than it used to be, because current DRAM circuits are cheap and have huge capacities.)

How VM provides memory protection

The kernel gives a virtual address space to each process.

Suppose P is a process. P can use its own virtual address space with
◮ no risk that P will access other processes' memory;
◮ no risk other processes will access P's memory.

(That is a slight oversimplification: modern OSes allow intentional sharing of memory by cooperating processes.)

Processes never know the physical DRAM addresses of the memory they use. The addresses used by processes are virtual addresses, which get translated into physical addresses for access to memory circuits. Translations are managed by the kernel.

Pages

The basic unit of virtual memory is called a page.

The size of a page must be a power of two. Different systems have different page sizes, and in some instances a single system will support two or more different page sizes at the same time.

A very common page size is 4 KB.

How many 4 KB pages are available in a system with 8 GB of memory?

An address in a VM system is split into
◮ a page number, which indicates which page the address belongs to;
◮ and a page offset, which gives the location within a page of a byte, word, or similar small chunk of data.

Example address splits for VM

Let's show the address split for an address width of 40 bits and a page size of 4 KB.

Let's show the address split for an address width of 48 bits and a page size of 4 KB.

Let's show the address split for an address width of 40 bits and a "huge page" size of 2 MB.
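The page-count question and the three address-split exercises reduce to power-of-two arithmetic, sketched below. This is an illustrative sketch only; the function names `address_split` and `split_address` and the sample address are hypothetical, and KB, MB, and GB are taken to mean 2^10, 2^20, and 2^30 bytes.

```python
KB, MB, GB = 2**10, 2**20, 2**30


def address_split(address_bits, page_size_bytes):
    """Return (page_number_bits, page_offset_bits) for a given address width."""
    if page_size_bytes <= 0 or page_size_bytes & (page_size_bytes - 1):
        raise ValueError("page size must be a power of two")
    offset_bits = page_size_bytes.bit_length() - 1   # log2 of the page size
    return address_bits - offset_bits, offset_bits


def split_address(address, page_size_bytes):
    """Return (page_number, page_offset) for one address."""
    offset_bits = page_size_bytes.bit_length() - 1
    return address >> offset_bits, address & (page_size_bytes - 1)


# The three splits asked for on the slide.
split_40_4k = address_split(40, 4 * KB)   # (28, 12)
split_48_4k = address_split(48, 4 * KB)   # (36, 12)
split_40_2m = address_split(40, 2 * MB)   # (19, 21)

# Number of 4 KB pages in 8 GB of memory: 2**33 / 2**12 = 2**21.
pages_in_8gb = (8 * GB) // (4 * KB)       # 2097152

# Hypothetical address, split with 4 KB pages.
page_number, page_offset = split_address(0x12345678, 4 * KB)
print(hex(page_number), hex(page_offset))  # 0x12345 0x678
```

With 4 KB pages the offset is always the low 12 bits, so widening the address from 40 to 48 bits only adds page-number bits. Moving to 2 MB pages shifts 9 bits from the page number to the offset.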