Slides for Lecture 3 ENCM 501: Principles of Computer Architecture Winter 2014 Term Steve Norman, PhD, PEng Electrical & Computer Engineering Schulich School of Engineering University of Calgary 16 January, 2014
slide 2/19: Previous Lecture
◮ brief list of ENCM 501 topics
◮ what does “computer architecture” mean?
◮ trends in computer system performance
◮ classes of computers
◮ trends in computer technology
◮ preliminaries for energy and power use
slide 3/19: Today’s Lecture
◮ completion of yesterday’s tutorial
◮ energy and power use in processors
◮ brief coverage of trends in cost
Related material in Hennessy & Patterson (our course textbook): Sections 1.5–1.6.
slide 4/19: About the Wed Jan 15 tutorial
We didn’t quite finish, and it’s useful to review how if statements and loops work at the assembly language level. So the next two slides come from yesterday’s tutorial . . .
slide 5/19: Comparisons, branches and jumps

Compare: dest gets 1 if src1 < src2, 0 otherwise . . .
    SLT dest, src1, src2
MIPS is unusual: the result does not go into a “condition code register”. Note also that there are variations for unsigned comparison and for comparison of a GPR to a constant.

Branch: if GPR1 == GPR2, goto label . . .
    BEQ GPR1, GPR2, label
There are variations. The most common is BNE: branch if not equal.

Jump: goto label . . .
    J label
slide 6/19: Example loop and if statement
Translate this fragment:

    long int *p, *q;   // R16, R17
    long int sum;      // R18

    sum = 0;
    q = p + 100;
    while (p != q) {
        if (*p > 0)
            sum += *p;
        p++;
    }

Repeat, changing the types of p and q to int *.
slide 7/19
Below is a very straightforward translation of the C code. A good C compiler with optimization turned on would produce faster but less straightforward code.

        OR     R18, R0, R0      # sum = 0
        DADDIU R17, R16, 800    # q = p + 100
    L1: BEQ    R16, R17, L2     # if (p == q) goto L2
        NOP
        LD     R8, 0(R16)       # R8 = *p
        SLT    R9, R0, R8       # R9 = 0 < *p
        BEQ    R9, R0, L3       # if (!R9) goto L3
        NOP
        DADDU  R18, R18, R8     # sum += *p
    L3: DADDIU R16, R16, 8      # p++
        J      L1               # goto L1
        NOP
    L2:                         # [next instruction after while loop]
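The control flow of the assembly translation can be mirrored in C with one goto per branch or jump. This is only a sketch: the function name `sum_positive` and the use of `int64_t` in place of `long int` are assumptions made for illustration, and the delay-slot NOPs are omitted because they have no C equivalent.

```c
#include <stdint.h>

/* Goto-form C that mirrors the branch structure of the MIPS translation.
   Labels L1, L2 and L3 correspond to the labels in the assembly code.
   The name sum_positive is hypothetical, chosen for this sketch. */
int64_t sum_positive(const int64_t *p, const int64_t *q)
{
    int64_t sum = 0;             /* OR R18, R0, R0 */
    int64_t value;               /* plays the role of R8 */
L1:
    if (p == q) goto L2;         /* BEQ R16, R17, L2 */
    value = *p;                  /* LD R8, 0(R16) */
    if (!(0 < value)) goto L3;   /* SLT R9, R0, R8 then BEQ R9, R0, L3 */
    sum += value;                /* DADDU R18, R18, R8 */
L3:
    p++;                         /* DADDIU R16, R16, 8 */
    goto L1;                     /* J L1 */
L2:
    return sum;
}
```

Calling `sum_positive(a, a + n)` on an array `a` of `n` elements adds up only the strictly positive elements, just as the while loop does.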
slide 8/19: Remarks about the previous slide
Why the NOPs? In MIPS there is a delay slot following every jump or branch. After a jump, the delay slot instruction is executed before the jump target instruction is executed. After a branch, the delay slot instruction gets executed regardless of whether the branch is taken.
Okay, but why do delay slots exist? They made sense in the 1980s: simple pipelining was feasible, but getting the jump/branch target instruction started one clock cycle after a jump/branch was not.
It’s up to compiler writers to try to find safe and useful work to do in delay slots. (Filling a delay slot with a NOP is pure waste.)
slide 9/19: “Repeat, changing the types of p and q to int *.”
◮ Change 800 to 400 for q = p + 100
◮ Change 8 to 4 for p++
◮ Change LD to LW.
Subtle detail: LW will sign-extend the 32-bit number it gets from memory to make an equivalent 64-bit number in the destination GPR, so the 64-bit SLT that follows will “do the right thing”.
You will not be tested on the weird little detail about LW!
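The constants 800, 400, 8 and 4 come from C pointer arithmetic, which scales by the size of the pointed-to type. The sketch below makes that visible; it uses the fixed-width types `int64_t` and `int32_t` as stand-ins for `long int` and `int` (an assumption about the target), and all three function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance from p to p + n for 64-bit elements.
   p + n must stay within, or one past the end of, the same array. */
ptrdiff_t byte_offset_i64(const int64_t *p, size_t n)
{
    return (const char *)(p + n) - (const char *)p;
}

/* Same measurement for 32-bit elements. */
ptrdiff_t byte_offset_i32(const int32_t *p, size_t n)
{
    return (const char *)(p + n) - (const char *)p;
}

/* Widening a signed 32-bit value to 64 bits preserves its value and sign.
   This is the C-level view of the sign extension that LW performs. */
int64_t widen(int32_t x)
{
    return (int64_t)x;
}
```

For a 100-element array, `byte_offset_i64` gives 800 and `byte_offset_i32` gives 400, matching the immediate operands in the two versions of the assembly code.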
slide 10/19: Preliminaries for energy and power use (3)
[Figure: two schematics of a gate output circuit; labels in each: V_DD, R_PU, gate output, C, R_PD.]
What are the energy flows when the gate output goes from logic 0 to logic 1? What are they when the gate output goes from logic 1 to logic 0?
slide 11/19: Energy and power
Power is the time rate of energy use. (That should not be a new idea for 4th-year engineering students!)

instantaneous power $= \dfrac{d}{dt}(\text{energy use})$

average power $= \dfrac{\text{energy use over time interval}}{\text{duration of time interval}}$
slide 12/19: Energy and power use of a single logic gate
The energy spent per clock cycle of a gate with an output that makes a 0 → 1 or 1 → 0 transition every single clock cycle is $\frac{1}{2} C V_{DD}^2$.
If the clock period is $T$, the frequency is $f = 1/T$, so the power use by the gate is $\frac{1}{2} C V_{DD}^2 / T = \frac{1}{2} f C V_{DD}^2$.
The equations are correct but an assumption here is incorrect. Why is this not a good model for power use by a logic gate in a processor circuit?
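The two single-gate formulas above translate directly into code. The following is a minimal sketch that keeps the slide's assumption of one output transition every clock cycle; the function names and parameter names are hypothetical, and all quantities are in SI units.

```c
/* Energy in joules spent per clock cycle by a gate whose output makes
   one transition every cycle: (1/2) * C * V_DD^2. */
double gate_energy_per_cycle(double c_farads, double vdd_volts)
{
    return 0.5 * c_farads * vdd_volts * vdd_volts;
}

/* Power in watts for the same gate: energy per cycle times clock
   frequency, which equals (1/2) * f * C * V_DD^2. */
double gate_power(double c_farads, double vdd_volts, double f_hertz)
{
    return gate_energy_per_cycle(c_farads, vdd_volts) * f_hertz;
}
```

With illustrative values that are assumptions, not data from the lecture (C = 1 fF, V_DD = 1 V, f = 1 GHz), the formulas give 0.5 fJ per cycle and 0.5 µW.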
slide 13/19: Energy and power use of a processor chip (1)
A useful concept, unfortunately not mentioned in Section 1.5 of your textbook, is $a$, the activity factor.
Let $C_{\text{total}}$ be the sum of all of the capacitive loads for all of the logic gates in an IC. Then $a C_{\text{total}}$ is the average capacitive load that actually does a 0 → 1 or 1 → 0 transition in a clock cycle.
Why is $a$ much less than 1 for a modern processor chip? How could $a$ be greater than 1 for certain small regions within a modern processor chip?
Which is a better way to think?
◮ $a$ is not hard for engineers to estimate, and is pretty much determined by the design of a processor chip.
◮ $a$ is scarily unpredictable.
slide 14/19: Energy and power use of a processor chip (2)
Two formulas, assuming that $a$ varies over time, but doesn’t change very much over a single processor clock cycle . . .

Energy used and heat that must be dissipated in a single clock cycle, due to switching:
$E_{\text{dynamic}} = \frac{1}{2}\, a(t)\, C_{\text{total}}\, V_{DD}^2$

Power consumption:
$P_{\text{dynamic}} = \frac{1}{2}\, f\, a(t)\, C_{\text{total}}\, V_{DD}^2$
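Because the activity factor changes over time, one way to use the power formula is to average it over a series of activity-factor samples. The sketch below does that under stated assumptions: the samples are equally spaced in time, and V_DD, f and C_total stay fixed over the whole interval. The function name `average_dynamic_power` is hypothetical.

```c
#include <stddef.h>

/* Average dynamic power in watts over n equally spaced samples of the
   activity factor a(t), with fixed total capacitance (farads), supply
   voltage (volts) and clock frequency (hertz).
   Each sample contributes (1/2) * f * a * C_total * V_DD^2.
   Returns 0.0 if there are no samples. */
double average_dynamic_power(const double *a_samples, size_t n,
                             double c_total, double vdd, double f)
{
    double sum_a = 0.0;
    size_t i;

    if (n == 0)
        return 0.0;
    for (i = 0; i < n; i++)
        sum_a += a_samples[i];
    return 0.5 * f * (sum_a / (double)n) * c_total * vdd * vdd;
}
```

For example, with assumed values of C_total = 100 nF, V_DD = 1 V and f = 2 GHz, activity-factor samples of 0.05 and 0.15 average to 0.1, which corresponds to 10 W.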
slide 15/19: Energy and power use of a processor chip (3)
An ideal CMOS logic gate does not consume any power when its output is not switching, because either its pull-up network or its pull-down network is completely turned off.
In real CMOS ICs, however, there are various paths for current to leak from $V_{DD}$ to ground:
$P_{\text{static}} = V_{DD}\, I_{\text{leakage}}$
This is a major concern at both ends of the computing spectrum:
◮ It gradually drains batteries in battery-powered embedded systems.
◮ It wastes power in servers that spend significant time idle, waiting for tasks to arrive.
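To get a feel for the battery-drain concern, the static power formula can be combined with the definition of average power from earlier in the lecture. This is a rough sketch that assumes constant leakage current and no other load on the battery; the function names are hypothetical and the numbers in the usage note are illustrative only.

```c
/* Static power in watts: P_static = V_DD * I_leakage. */
double static_power(double vdd_volts, double i_leakage_amps)
{
    return vdd_volts * i_leakage_amps;
}

/* Time in seconds for leakage alone to use up a given amount of stored
   energy (joules). Assumes p_static_watts is constant and greater than 0. */
double seconds_to_drain(double energy_joules, double p_static_watts)
{
    return energy_joules / p_static_watts;
}
```

With an assumed V_DD of 1 V and leakage current of 1 mA, static power is 1 mW, so 1000 J of stored energy would last about one million seconds, roughly 11.6 days, even if the chip never switches a gate.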
slide 16/19: Both energy and power matter in processor design
Because most processors are idle much of the time, energy spent on a typical task is a good measure of the efficiency of a processor.
However, power at maximum load is critical as well . . .
◮ The power supply must be able to supply the needed current without dropping $V_{DD}$.
◮ The cooling system must be capable of removing heat at a rate equal to average power during sustained heavy load.
slide 17/19: Energy and power management in processor chips
A simple processor chip is either on or off. When it’s on, the whole chip is on, and $V_{DD}$ and $f$ are fixed.
More complex processor chips . . .
◮ turn off idle regions within the chip;
◮ use DVFS (dynamic voltage-frequency scaling): $V_{DD}$ and $f$ go up and down with the processor load.
DVFS relies on the fact that a CMOS circuit can operate correctly over a wide range of $V_{DD}$ values. Lower $V_{DD}$ is more energy-efficient but results in slower switching times, so when $V_{DD}$ is reduced, $f$ must be reduced as well.
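The DVFS trade-off can be sketched by comparing two operating points on a task that needs a fixed number of clock cycles. The code below uses only the dynamic energy formula from this lecture and rests on several assumptions: the cycle count, activity factor and total capacitance do not depend on the operating point, and leakage is ignored. The `OperatingPoint` type and both function names are hypothetical.

```c
/* One DVFS setting: supply voltage in volts and clock frequency in hertz. */
typedef struct {
    double vdd;
    double f;
} OperatingPoint;

/* Dynamic energy in joules for a task of `cycles` clock cycles:
   cycles * (1/2) * a * C_total * V_DD^2. Frequency does not appear,
   because energy per cycle depends on voltage only. */
double task_dynamic_energy(OperatingPoint op, double cycles,
                           double a, double c_total)
{
    return cycles * 0.5 * a * c_total * op.vdd * op.vdd;
}

/* Time in seconds to finish the task: cycles / f. */
double task_time(OperatingPoint op, double cycles)
{
    return cycles / op.f;
}
```

Under these assumptions, scaling both V_DD and f to 80% of their original values cuts dynamic energy for the task to 64% while stretching run time to 125%. A fuller model would add static energy, which grows as the task takes longer.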
slide 18/19: Trends in Cost
This is a massively complex topic; we’ll look at it in the same brief and superficial way that the textbook does.
First, let’s be clear about what a chip die is and what a wafer is.
slide 19/19: Upcoming Topics
◮ measuring and reporting computer performance
◮ quantitative principles of computer design
◮ a survey of ISA design ideas
Related reading in Hennessy & Patterson: Sections 1.8–1.9, A.1–A.7