ENCM 501: Principles of Computer Architecture
Winter 2014 Term
Slides for Lecture 15
Steve Norman, PhD, PEng
Electrical & Computer Engineering
Schulich School of Engineering
University of Calgary
6 March, 2014

slide 2/20: Previous Lecture

◮ Virtual memory, page tables, and TLBs.

slide 3/20: Today's Lecture

◮ context switches and effects on memory latency
◮ memory system summary
◮ introduction to ILP (instruction-level parallelism)
◮ review of simple pipelining

Related reading in Hennessy & Patterson: Sections C.1–C.3

slide 4/20: Context switch: a definition

Consider a computer with multiple cores. For (relative) simplicity, assume that every process on this computer has a single thread; if that's true, then a process is either running in one core only, or suspended, waiting for the OS kernel to give it some running time in one of the cores.

With this assumption, a context switch is an event in which a running process (say, Process A) is suspended, and some other process (say, Process B) gets to continue (or start) in the core where Process A was running.

In reality, in 2014, some processes will be single-threaded and others will be multi-threaded. We'll look at that in detail later in the course.

slide 5/20: Causes of context switches

What might cause an OS kernel to suspend Process A, and give Process B some time to run? Here is an incomplete list of possible reasons:

◮ the kernel receives a timer interrupt, indicating that Process A has used up a time slice;
◮ the kernel notices that Process A is blocked, waiting for input from user, disk, network, or some other source;
◮ Process A asks to be suspended, with a system call such as nanosleep;
◮ page fault in Process A—Process A tries to access a page that is not present in physical memory.

slide 6/20: Saving process state in a context switch (1)

Suppose the kernel is suspending Process A and allowing Process B to resume. The kernel will

◮ save Process A's register values (GPRs, floating-point registers, PC, other special purpose registers) by copying them to some safe location in memory;
◮ restore Process B's register values by copying them from memory into the appropriate registers.

What else does the kernel have to do regarding the states of Processes A and B?
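To make the register save/restore step concrete, here is a minimal C sketch, assuming a MIPS-like core with 32 GPRs and 32 floating-point registers. The struct layout and the names cpu_context, core_regs, and context_switch are invented for illustration; a real kernel does this work in assembly, one register at a time, and saves more state than is shown here.

#include <string.h>

/* Hypothetical per-process saved CPU state.  Field names are invented. */
struct cpu_context {
    unsigned long gpr[32];   /* general-purpose registers                 */
    double        fpr[32];   /* floating-point registers                  */
    unsigned long pc;        /* program counter at which to resume        */
    unsigned long status;    /* stands in for other special purpose regs  */
};

/* Pretend this variable is the core's actual register file. */
static struct cpu_context core_regs;

/* Suspend Process A and resume Process B on this core. */
void context_switch(struct cpu_context *proc_a, struct cpu_context *proc_b)
{
    memcpy(proc_a, &core_regs, sizeof core_regs);   /* save A's registers to memory      */
    memcpy(&core_regs, proc_b, sizeof core_regs);   /* restore B's registers from memory */
    /* The kernel must also deal with page tables, TLB entries, and caches:
       exactly the questions raised on the next slide. */
}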
slide 7/20: Saving process state in a context switch (2)

Again, suppose the kernel is suspending Process A and allowing Process B to resume.

Q1: What must the kernel do with all the physical pages of memory (also known as page frames) that Process A was using?

Q2: What must the kernel do with the page tables for Processes A and B?

Q3: What must the kernel do with the TLBs in the core where Process A was running?

Q4: What is the impact of the context switch on TLB miss rates, and I-cache and D-cache miss rates?

slide 8/20: Memory system summary

Think about this simple C function:

int add_em(const int *a, int n)
{
    int i, sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

What is the cost of the statement sum += a[i]; ?

That's not an easy question! The answer depends both on the effects of memory accesses in the recent past, and also on the impact of this particular data memory access on memory accesses in the near future. (A small timing sketch illustrating this follows slide 12 below.)

slide 9/20: ILP: Instruction-Level Parallelism

ILP is a general term for enhancing instruction throughput within a single processor core by having multiple instructions "in flight" at any given time.

Two important forms of ILP are

◮ pipelining: each instruction takes several clock cycles to complete, but instructions are started one per clock cycle
◮ multiple issue: two or more instructions are started in the same clock cycle

Modern processors use both pipelining and multiple issue, and use complex sets of related features to try to maximize instruction throughput.

slide 10/20: Review of simple pipelining

Before diving into microarchitectures with multiple pipelines, let's review the design challenges of getting a single pipeline to work fast and correctly.

The basic organization of a pipeline involves

◮ pipeline stages: A stage performs some small simple step as part of handling an instruction. For example, one stage might be responsible for reading GPR values used in an instruction, and another stage might compute memory addresses to be used in loads and stores.
◮ pipeline registers: At the end of each clock cycle, a pipeline register captures the results produced by a stage, making those results available for the next stage in the next cycle.

slide 11/20: First stage of a simple pipeline: IF (instruction fetch)

We'll look at an example pipeline that can handle a few different kinds of MIPS instructions.

The IF stage is responsible for

◮ updating the PC register as appropriate
◮ reading an instruction from memory and copying the instruction into a pipeline register so the instruction is available to the next stage, called the ID stage.

Despite what we've just learned about memory, we'll pretend that "instruction memory" is a simple functional unit that can be read within a single clock cycle!

slide 12/20

[Figure: IF-stage datapath. The PC feeds instruction memory and an adder that computes PC + 0x00000004; a mux selects between that "usual PC update" and the branch target address produced by the branch decision in the ID stage; the fetched instruction is written into the IF/ID pipeline register. All datapaths shown are 32 bits wide.]

In every single clock cycle, the IF stage will dump a new instruction into the IF/ID pipeline register.
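The following micro-benchmark is the timing sketch referred to under slide 8. It is my own illustration, not part of the lecture: it performs the same number of sum += a[i] additions twice, once visiting the array in add_em's sequential order and once in a pseudo-random order, timing both with the POSIX clock_gettime function. The array size, the LCG constants, and all names here are my own choices. On typical hardware the shuffled pass is likely to be several times slower, because most of its accesses miss in the D-cache and TLB, which is the slide's point: the cost of sum += a[i]; depends on the memory accesses around it, not just on the statement itself.

#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1L << 24)    /* 16 Mi ints = 64 MiB, much larger than any cache */

/* Wall-clock time in seconds (POSIX clock_gettime). */
static double seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

/* Simple 64-bit linear congruential generator (Knuth's MMIX constants). */
static unsigned long lcg_state = 12345;
static unsigned long lcg_next(void)
{
    lcg_state = lcg_state * 6364136223846793005UL + 1442695040888963407UL;
    return lcg_state;
}

/* Same additions as add_em, but visiting a[] in the order given by idx[]. */
static long sum_in_order(const int *a, const long *idx, long n)
{
    long i, sum = 0;
    for (i = 0; i < n; i++)
        sum += a[idx[i]];
    return sum;
}

int main(void)
{
    int  *a   = malloc(N * sizeof *a);
    long *idx = malloc(N * sizeof *idx);
    long i;
    if (a == NULL || idx == NULL)
        return 1;

    for (i = 0; i < N; i++) {
        a[i]   = 1;
        idx[i] = i;                     /* sequential order, as in add_em */
    }

    double t0 = seconds();
    long s1 = sum_in_order(a, idx, N);  /* good spatial locality */
    double t1 = seconds();

    /* Fisher-Yates shuffle of idx[]: same loads, cache- and TLB-hostile order. */
    for (i = N - 1; i > 0; i--) {
        long j = (long)(lcg_next() % (unsigned long)(i + 1));
        long tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
    }

    double t2 = seconds();
    long s2 = sum_in_order(a, idx, N);  /* poor locality: most accesses miss */
    double t3 = seconds();

    printf("sequential order: sum = %ld, %.3f s\n", s1, t1 - t0);
    printf("shuffled order:   sum = %ld, %.3f s\n", s2, t3 - t2);
    free(a);
    free(idx);
    return 0;
}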
slide 13/20: More stages

This lecture will follow the 5-stage design presented in Section C.3 of the course textbook. The stages are:

◮ IF, which we've just seen
◮ ID: instruction decode and GPR read
◮ EX: execute—perform computation in ALU (arithmetic/logic unit)
◮ MEM: access to data memory for load or store
◮ WB: writeback—write result of a load or an instruction like DADD to a GPR

Let's sketch out what each of these stages does . . .

slide 14/20: IF ID EX MEM WB (the ID stage)

Attention: This slide and others like it will not attempt to describe every detail of a pipeline stage. Instead it will just explain the general role of a stage.

The ID stage:

◮ decodes the instruction—finds out what kind of instruction it is, and what its operands are
◮ copies two GPR values into the ID/EX register
◮ copies an offset into the ID/EX register, in case the offset is needed for load, store, or branch
◮ copies some instruction address information into the ID/EX register, in case that is needed to generate a branch target address

slide 15/20: "R-type" instructions

R-type is MIPS jargon for instructions such as DADDU, DSUBU, OR, AND, etc.

An R-type instruction involves performing some simple ALU computation involving two GPR values, and writing the result to a GPR.

slide 16/20: IF ID EX MEM WB (the EX stage)

The EX stage performs a computation in the ALU.

For an R-type instruction, the ALU performs whatever operation is appropriate (add, subtract, AND, OR, etc.), and writes the result into the EX/MEM register.

For a load or store, the ALU computes a memory address, and writes the address into the EX/MEM register.

For a branch, the ALU computes a branch target address and makes a branch decision. Both of those results get written into the EX/MEM register.

Attention: The branch instruction handling described on this slide is specific to textbook Figure C.22! We'll look at problems related to that design in the next lecture.

slide 17/20: IF ID EX MEM WB (the MEM stage)

The MEM stage is mostly for data memory access by loads and stores. Again we pretend that memory is really simple!

For an R-type instruction, not much happens. Results are copied from the EX/MEM register to the MEM/WB register.

For a load, data read from memory gets copied into the MEM/WB register.

For a store, data memory is updated using an address and data found in the EX/MEM register.

For a branch, if the decision in EX was to take the branch, the PC gets updated with the branch target address.

Attention, again: The branch instruction handling described on this slide is specific to textbook Figure C.22!

slide 18/20: IF ID EX MEM WB (the WB stage)

The WB stage is used to update a GPR with the result of an R-type or load instruction.

For an R-type or load instruction, a GPR is updated, using the appropriate result from the MEM/WB register.

It wasn't mentioned before, but the 5-bit number specifying the destination register had to be passed from ID through EX and MEM to get to WB at the same time as the ALU or load result.

For a store or a branch, nothing happens in WB. Those instructions finish in MEM.
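To make the last point on slide 18 concrete, here is a toy C model of the ID/EX, EX/MEM, and MEM/WB pipeline registers, covering only R-type and load instructions. It is a sketch of my own, not the textbook's Figure C.22 design, and the field names are invented. Note that the 5-bit destination register number appears in every one of the registers, so that by the time an instruction reaches WB the hardware still knows which GPR to write.

#include <stdint.h>

/* Toy model of the pipeline registers for R-type and load instructions only.
   The real design carries more data and control signals than shown here. */

struct id_ex {             /* written by ID, read by EX */
    uint64_t rs_value;     /* first GPR operand */
    uint64_t rt_value;     /* second GPR operand */
    int64_t  offset;       /* sign-extended offset, for loads/stores/branches */
    unsigned dest_reg;     /* 5-bit destination GPR number */
};

struct ex_mem {            /* written by EX, read by MEM */
    uint64_t alu_result;   /* R-type result, or load/store address */
    uint64_t store_data;   /* value to be written to memory, for stores */
    unsigned dest_reg;     /* passed along unchanged */
};

struct mem_wb {            /* written by MEM, read by WB */
    uint64_t alu_result;   /* copied through from EX/MEM, for R-type */
    uint64_t load_data;    /* data read from memory, for loads */
    unsigned dest_reg;     /* still here, so WB knows which GPR to write */
};

/* WB for an R-type instruction: write the ALU result into the destination GPR. */
void writeback_rtype(uint64_t gpr[32], const struct mem_wb *mw)
{
    if (mw->dest_reg != 0)             /* GPR 0 is hard-wired to zero in MIPS */
        gpr[mw->dest_reg] = mw->alu_result;
}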
slide 19/20: A rough sketch of the 5-stage pipeline

[Figure: block diagram of the 5-stage pipeline. IF (PC, adder, instruction memory), ID (instruction decode, GPRs), EX (ALU), MEM (data memory), and WB, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, all driven by the same clock CLK.]

A lot of detail has been left out, but there's enough here for us to trace processing of LW followed by DSUBU followed by SW.

slide 20/20: Upcoming Topics

◮ Pipeline hazards and solutions to pipeline hazards.

Related reading in Hennessy & Patterson: Sections C.1–C.3
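As a preview of the trace suggested on slide 19 (my own sketch, assuming LW, DSUBU, and SW are independent of one another so that no hazards or stalls arise; hazards are the next lecture's topic), the stage each instruction occupies in each clock cycle is:

cycle:   1    2    3    4    5    6    7
LW       IF   ID   EX   MEM  WB
DSUBU         IF   ID   EX   MEM  WB
SW                 IF   ID   EX   MEM  WB

Once the pipeline is full, one instruction finishes per cycle even though each instruction spends five cycles in flight. As slide 18 noted, SW does no useful work in its WB slot; it effectively finishes in MEM.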