Let's Build a Processor
• Almost ready to move into Chapter 5 and start building a processor
• First, let's review Boolean logic and build the ALU we'll need (material from Appendix B)
[Figure: ALU symbol with 32-bit inputs a and b, an operation control input, and a 32-bit result]
2004 Morgan Kaufmann Publishers
Review: Boolean Algebra & Gates
• Problem: Consider a logic function with three inputs: A, B, and C.
  – Output D is true if at least one input is true
  – Output E is true if exactly two inputs are true
  – Output F is true only if all three inputs are true
• Show the truth table for these three functions.
• Show the Boolean equations for these three functions.
• Show an implementation consisting of inverters, AND, and OR gates.
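One way to check an answer to this problem is to enumerate all eight input combinations. The sketch below is only an illustration: the function names `d`, `e`, `f`, and `truth_table` are made up for this example, and the Boolean equations are one possible sum-of-products answer, not necessarily the one intended on the slide.

```python
from itertools import product

def d(a, b, c):
    # D is true if at least one input is true: D = A + B + C
    return a or b or c

def e(a, b, c):
    # E is true if exactly two inputs are true (sum of three product terms)
    return (a and b and not c) or (a and not b and c) or (not a and b and c)

def f(a, b, c):
    # F is true only if all three inputs are true: F = A B C
    return a and b and c

def truth_table():
    # One row per input combination: (A, B, C, D, E, F) as 0/1 values
    rows = []
    for a, b, c in product([False, True], repeat=3):
        rows.append((int(a), int(b), int(c),
                     int(d(a, b, c)), int(e(a, b, c)), int(f(a, b, c))))
    return rows

for row in truth_table():
    print(*row)
```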
An ALU (arithmetic logic unit)
• Let's build an ALU to support the andi and ori instructions
  – we'll just build a 1-bit ALU, and use 32 of them
[Figure: 1-bit ALU symbol with inputs a and b, an operation control, and a result output; truth table with columns op, a, b, res]
• Possible implementation (sum-of-products):
Review: The Multiplexor
• Selects one of the inputs to be the output, based on a control input
  – note: we call this a 2-input mux even though it has 3 inputs!
[Figure: 2-input mux with data inputs A (selected when S = 0) and B (selected when S = 1), control input S, and output C]
• Let's build our ALU using a mux:
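A minimal sketch of the mux-based idea, modelling each wire as the integer 0 or 1. The names `mux2` and `alu1_logic`, and the choice that op = 0 selects AND while op = 1 selects OR, are assumptions made for this illustration.

```python
def mux2(a, b, s):
    # 2-input mux as gates: C = (A AND NOT S) OR (B AND S)
    return (a & (s ^ 1)) | (b & s)

def alu1_logic(a, b, op):
    # Compute both candidate results, then let the mux pick one.
    # Assumed encoding: op = 0 selects AND, op = 1 selects OR.
    return mux2(a & b, a | b, op)

print(alu1_logic(1, 0, 0), alu1_logic(1, 0, 1))
```

Note that both the AND and the OR are always computed; the control input only chooses which one reaches the output.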
Different Implementations
• Not easy to decide the “best” way to build something
  – Don't want too many inputs to a single gate
  – Don't want to have to go through too many gates
  – For our purposes, ease of comprehension is important
• Let's look at a 1-bit ALU for addition:
  $c_{out} = a\,b + a\,c_{in} + b\,c_{in}$
  $sum = a \oplus b \oplus c_{in}$
[Figure: 1-bit adder with inputs a, b, and CarryIn and outputs Sum and CarryOut]
• How could we build a 1-bit ALU for add, and, and or?
• How could we build a 32-bit ALU?
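The two adder equations translate directly into bitwise operations. This is a small sketch with wires modelled as 0/1 integers; the function name `full_adder` is chosen here for illustration.

```python
def full_adder(a, b, c_in):
    # sum = a xor b xor c_in
    s = a ^ b ^ c_in
    # c_out = a b + a c_in + b c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out

print(full_adder(1, 1, 0))
```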
Building a 32-bit ALU
[Figure: left, a 1-bit ALU in which a mux controlled by Operation selects among the AND (0), OR (1), and adder (2) outputs, with CarryIn and CarryOut on the adder. Right, 32 such ALUs (ALU0 to ALU31) with inputs a0..a31 and b0..b31 and outputs Result0..Result31, chained so that each CarryOut feeds the next CarryIn]
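As a rough software model of this structure, the sketch below chains 32 copies of a 1-bit ALU so that each carry-out feeds the next carry-in. The names `alu1` and `alu32` and the operation encoding (0 = AND, 1 = OR, 2 = add) are assumptions for this example.

```python
def alu1(a, b, carry_in, op):
    # 1-bit ALU: AND, OR, and a full adder, with a mux selecting the result
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    result = (a & b, a | b, s)[op]   # op: 0 = AND, 1 = OR, 2 = add (assumed)
    return result, carry_out

def alu32(a, b, op, carry_in=0):
    # Ripple the carry through 32 one-bit ALUs, least significant bit first
    result = 0
    carry = carry_in
    for i in range(32):
        bit, carry = alu1((a >> i) & 1, (b >> i) & 1, carry, op)
        result |= bit << i
    return result, carry

print(alu32(5, 3, 2))
```

The loop makes the cost of this design visible: bit 31 cannot finish until the carry has passed through all 31 lower stages.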
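What about subtraction (a - b)?
• Two's complement approach: just negate b and add.
• How do we negate?
• A very clever solution:
[Figure: 1-bit ALU with a Binvert control that selects b or its complement (mux inputs 0 and 1) ahead of the logic and adder paths; the Operation mux (inputs 0, 1, 2) selects the Result; CarryIn and CarryOut on the adder]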
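In two's complement, negating b means inverting every bit and adding 1, so a - b = a + (NOT b) + 1. The extra 1 can come from the carry-in of the least significant bit. The sketch below works on whole 32-bit words rather than single bits; `add_sub` and `MASK` are names invented for this illustration.

```python
MASK = 0xFFFFFFFF  # keep values to 32 bits

def add_sub(a, b, binvert):
    # binvert = 0: a + b;  binvert = 1: a + (~b) + 1 = a - b
    b_in = (b & MASK) ^ (MASK if binvert else 0)
    carry_in = 1 if binvert else 0
    return ((a & MASK) + b_in + carry_in) & MASK

print(add_sub(7, 5, 1))        # 7 - 5
print(hex(add_sub(5, 7, 1)))   # 5 - 7 as a 32-bit two's complement pattern
```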
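Adding a NOR function
• Can also choose to invert a. How do we get “a NOR b”?
[Figure: 1-bit ALU with Ainvert and Binvert controls, each selecting an input or its complement (mux inputs 0 and 1), feeding the AND, OR, and adder paths; the Operation mux (inputs 0, 1, 2) selects the Result; CarryIn and CarryOut on the adder]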
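By De Morgan's law, NOT (a OR b) equals (NOT a) AND (NOT b), so inverting both inputs and selecting the existing AND path gives NOR. A minimal sketch, where the function name and the encoding op = 0 for AND, op = 1 for OR are assumptions for this example:

```python
def alu1_logic(a, b, ainvert, binvert, op):
    # XOR with the invert control passes the bit through (0) or flips it (1)
    a_in = a ^ ainvert
    b_in = b ^ binvert
    # Assumed encoding: op = 0 selects AND, op = 1 selects OR
    return (a_in & b_in) if op == 0 else (a_in | b_in)

# NOR: invert both inputs and select AND
for a in (0, 1):
    for b in (0, 1):
        print(a, b, alu1_logic(a, b, 1, 1, 0))
```

The same controls with the OR path selected give NAND, since (NOT a) OR (NOT b) equals NOT (a AND b).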
Tailoring the ALU to the MIPS
• Need to support the set-on-less-than instruction (slt)
  – remember: slt is an arithmetic instruction
  – produces a 1 if rs < rt and 0 otherwise
  – use subtraction: (a - b) < 0 implies a < b
• Need to support test for equality (beq $t5, $t6, $t7)
  – use subtraction: (a - b) = 0 implies a = b
Supporting slt
• Can we figure out the idea?
[Figure: two 1-bit ALUs, each with Ainvert, Binvert, Operation, and CarryIn controls and a Less input wired to mux input 3. Left: the ALU used for all other bits, with a CarryOut. Right: the ALU used for the most significant bit, which adds a Set output and overflow detection producing Overflow]
Supporting slt
[Figure: 32-bit ALU built from ALU0 to ALU31 with shared Ainvert, Binvert, and Operation controls and a rippled carry chain. The Less inputs of ALU1 to ALU31 are tied to 0; ALU31 produces the Set and Overflow outputs, and Set supplies the Less input of ALU0]
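A word-level sketch of the idea: compute a - b, take the sign bit of the difference as Set, and route it to bit 0 of the result while every other result bit is 0. The function name `slt` and the overflow formula are illustrative assumptions. Note that when the subtraction overflows, the raw sign bit gives the wrong answer, which is why the sketch also reports overflow.

```python
MASK = 0xFFFFFFFF

def slt(a, b):
    """Return (set_bit, overflow) for 32-bit two's complement patterns a, b."""
    a &= MASK
    b &= MASK
    diff = (a + (b ^ MASK) + 1) & MASK      # a - b via invert-and-add-one
    set_bit = (diff >> 31) & 1              # sign bit of the difference
    sign_a, sign_b = a >> 31, b >> 31
    # Subtraction overflows when the operand signs differ and the
    # result's sign differs from the sign of a
    overflow = int(sign_a != sign_b and set_bit != sign_a)
    return set_bit, overflow

print(slt(3, 5))                    # 3 < 5
print(slt(0x7FFFFFFF, 0x80000000))  # largest positive vs. most negative
```

In the second call the sign bit says "less than" even though the largest positive value is not less than the most negative one; the overflow flag marks that case.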
Test for equality
• Notice control lines (Ainvert, Bnegate, Operation):
  0000 = and
  0001 = or
  0010 = add
  0110 = subtract
  0111 = slt
  1100 = NOR
• Note: Zero is a 1 when the result is zero!
[Figure: 32-bit ALU built from ALU0 to ALU31 with Ainvert, Bnegate, and Operation controls; Less inputs of ALU1 to ALU31 tied to 0; Set from ALU31; a Zero output derived from all Result bits; Overflow from ALU31]
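The control encodings listed above can be exercised with a small word-level model. This is a sketch under stated assumptions: the 4-bit control is read as Ainvert, Bnegate, and a 2-bit Operation (0 = AND, 1 = OR, 2 = add, 3 = Less), Bnegate also supplies the carry-in of bit 0, and the function name `alu` is invented here. Overflow handling is left out.

```python
MASK = 0xFFFFFFFF

def alu(a, b, control):
    # Split the 4-bit control word: Ainvert | Bnegate | Operation (2 bits)
    ainvert = (control >> 3) & 1
    bnegate = (control >> 2) & 1
    op = control & 0b11
    a_in = (a & MASK) ^ (MASK if ainvert else 0)
    b_in = (b & MASK) ^ (MASK if bnegate else 0)
    total = (a_in + b_in + bnegate) & MASK   # Bnegate doubles as carry-in
    if op == 0:
        result = a_in & b_in
    elif op == 1:
        result = a_in | b_in
    elif op == 2:
        result = total
    else:
        result = (total >> 31) & 1           # slt: sign bit of a - b
    zero = int(result == 0)                  # Zero is 1 when the result is 0
    return result, zero

print(alu(6, 6, 0b0110))  # subtract equal values: result 0, Zero = 1
```

A beq can then be decided by issuing a subtract and looking only at Zero.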
Conclusion
• We can build an ALU to support the MIPS instruction set
  – key idea: use a multiplexor to select the output we want
  – we can efficiently perform subtraction using two's complement
  – we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
  – all of the gates are always working
  – the speed of a gate is affected by the number of inputs to the gate
  – the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”)
• Our primary focus is comprehension; however,
  – clever changes to organization can improve performance (similar to using better algorithms in software)
  – we saw this in multiplication; let's look at addition now
Problem: ripple carry adder is slow
• Is a 32-bit ALU as fast as a 1-bit ALU?
• Is there more than one way to do addition?
  – two extremes: ripple carry and sum-of-products
• Can you see the ripple? How could you get rid of it?
  $c_1 = b_0 c_0 + a_0 c_0 + a_0 b_0$
  $c_2 = b_1 c_1 + a_1 c_1 + a_1 b_1$    $c_2 = {?}$
  $c_3 = b_2 c_2 + a_2 c_2 + a_2 b_2$    $c_3 = {?}$
  $c_4 = b_3 c_3 + a_3 c_3 + a_3 b_3$    $c_4 = {?}$
• Not feasible! Why?
Carry-lookahead adder
• An approach in between our two extremes
• Motivation:
  – If we didn't know the value of carry-in, what could we do?
  – When would we always generate a carry? $g_i = a_i b_i$
  – When would we propagate the carry? $p_i = a_i + b_i$
• Did we get rid of the ripple?
  $c_1 = g_0 + p_0 c_0$
  $c_2 = g_1 + p_1 c_1$    $c_2 = {?}$
  $c_3 = g_2 + p_2 c_2$    $c_3 = {?}$
  $c_4 = g_3 + p_3 c_3$    $c_4 = {?}$
• Feasible! Why?
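One way to see why this is feasible is to substitute each carry into the next, so that every carry depends only on the generate and propagate signals and on c0. The sketch below does this for a 4-bit adder; the function name `cla4` and the list-based bit handling are choices made for this illustration.

```python
def cla4(a, b, c0=0):
    # Per-bit generate and propagate: g_i = a_i b_i, p_i = a_i + b_i
    abits = [(a >> i) & 1 for i in range(4)]
    bbits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(abits, bbits)]
    p = [x | y for x, y in zip(abits, bbits)]
    # Each carry is written out in terms of g, p, and c0 only (no ripple)
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (abits[i] ^ bbits[i] ^ carries[i]) << i
    return total, c4

print(cla4(0b1011, 0b0110))
```

The expressions grow with each bit position, which hints at why the same trick is not applied flat across all 32 bits.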
Use principle to build bigger adders
• Can't build a 16-bit adder this way... (too big)
• Could use ripple carry of 4-bit CLA adders
• Better: use the CLA principle again!
[Figure: 16-bit adder built from four 4-bit ALUs (ALU0 to ALU3) handling bits 0-3, 4-7, 8-11, and 12-15 and producing Result0-3, Result4-7, Result8-11, and Result12-15. Each ALU sends group propagate and generate signals (P0..P3, G0..G3) to a carry-lookahead unit, which supplies the group carries C1, C2, C3 and produces C4 as CarryOut]
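Applying the principle a second time means treating each 4-bit group as a single "super bit" with its own propagate P and generate G. The sketch below is a hedged illustration: `group_pg` and `cla16` are invented names, and the sum inside each group is computed with ordinary integer addition as a stand-in for a 4-bit carry-lookahead adder.

```python
def group_pg(a4, b4):
    """Return (P, G) for one 4-bit group."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    P = p[3] & p[2] & p[1] & p[0]            # group propagates a carry-in
    G = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]))      # group generates a carry-out
    return P, G

def cla16(a, b, c0=0):
    nibbles = [((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF) for k in range(4)]
    (P0, G0), (P1, G1), (P2, G2), (P3, G3) = [group_pg(x, y) for x, y in nibbles]
    # Second-level lookahead: group carries from P, G, and c0 only
    C1 = G0 | (P0 & c0)
    C2 = G1 | (P1 & G0) | (P1 & P0 & c0)
    C3 = G2 | (P2 & G1) | (P2 & P1 & G0) | (P2 & P1 & P0 & c0)
    C4 = (G3 | (P3 & G2) | (P3 & P2 & G1)
          | (P3 & P2 & P1 & G0) | (P3 & P2 & P1 & P0 & c0))
    result = 0
    for k, ((x, y), cin) in enumerate(zip(nibbles, [c0, C1, C2, C3])):
        # Stand-in for a 4-bit CLA: add the group with its lookahead carry-in
        result |= ((x + y + cin) & 0xF) << (4 * k)
    return result, C4

print(cla16(0x0FFF, 0x0001))
```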
ALU Summary
• We can build an ALU to support MIPS addition
• Our focus is on comprehension, not performance
• Real processors use more sophisticated techniques for arithmetic
• Where performance is not critical, hardware description languages allow designers to completely automate the creation of hardware!
Chapter Five
The Processor: Datapath & Control
• We're ready to look at an implementation of the MIPS
• Simplified to contain only:
  – memory-reference instructions: lw, sw
  – arithmetic-logical instructions: add, sub, and, or, slt
  – control flow instructions: beq, j
• Generic implementation:
  – use the program counter (PC) to supply instruction address
  – get the instruction from memory
  – read registers
  – use the instruction to decide exactly what to do
• All instructions use the ALU after reading the registers. Why? Memory-reference? Arithmetic? Control flow?
More Implementation Details
• Abstract / simplified view:
[Figure: datapath overview showing the PC, an instruction memory (Address in, Instruction out), a register file (three Register # inputs, Data ports), the ALU, a data memory (Address, Data), and two adders, one with a constant input of 4]
• Two types of functional units:
  – elements that operate on data values (combinational)
  – elements that contain state (sequential)
State Elements
• Unclocked vs. clocked
• Clocks used in synchronous logic
  – when should an element that contains state be updated?
[Figure: clock waveform labeling the rising edge, the falling edge, and the clock period (cycle time)]
An unclocked state element
• The set-reset latch
  – output depends on present inputs and also on past inputs
[Figure: set-reset latch with inputs R and S and outputs Q and its complement]
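The dependence on past inputs can be seen in a small simulation. This sketch assumes the common construction from two cross-coupled NOR gates, with R feeding the gate that drives Q and S feeding the gate that drives the complement; the function names and the settle-by-iteration approach are illustrative choices, not a timing-accurate model.

```python
def nor(x, y):
    return 1 - (x | y)

def sr_latch(s, r, q, q_bar):
    # Re-evaluate the two cross-coupled NOR gates until the outputs settle
    for _ in range(4):
        q_new = nor(r, q_bar)
        q_bar_new = nor(s, q)
        if (q_new, q_bar_new) == (q, q_bar):
            break
        q, q_bar = q_new, q_bar_new
    return q, q_bar

state = (0, 1)                    # start with Q = 0
state = sr_latch(1, 0, *state)    # set
print(state)
state = sr_latch(0, 0, *state)    # inputs released: value is remembered
print(state)
state = sr_latch(0, 1, *state)    # reset
print(state)
```

With S = R = 0 the output depends entirely on what happened earlier, which is what makes this a state element.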
Latches and Flip-flops
• Output is equal to the stored value inside the element (don't need to ask for permission to look at the value)
• Change of state (value) is based on the clock
• Latches: state changes whenever the inputs change and the clock is asserted
  – asserted means "logically true", which could mean electrically low
• Flip-flop: state changes only on a clock edge (edge-triggered methodology)
• A clocking methodology defines when signals can be read and written
  – wouldn't want to read a signal at the same time it was being written
D-latch
• Two inputs:
  – the data value to be stored (D)
  – the clock signal (C) indicating when to read & store D
• Two outputs:
  – the value of the internal state (Q) and its complement
[Figure: D-latch gate diagram with inputs C and D and outputs Q and its complement; timing diagram of D, C, and Q]
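A behavioural sketch of the D latch: while C is asserted the stored value follows D, and otherwise it holds. The class name `DLatch`, the `update` method, and the initial state of 0 are assumptions made for this example.

```python
class DLatch:
    def __init__(self):
        self.q = 0                 # assumed initial state

    def update(self, d, c):
        # Level-sensitive: the latch is transparent while C is asserted
        if c:
            self.q = d
        return self.q, 1 - self.q  # Q and its complement

latch = DLatch()
print(latch.update(1, 1))  # clock asserted: stores D = 1
print(latch.update(0, 0))  # clock deasserted: D is ignored, value held
```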
D flip-flop
• Output changes only on the clock edge
[Figure: D flip-flop built from two D latches connected in series and driven by the clock C, with outputs Q and its complement; timing diagram of D, C, and Q]
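A common way to get edge-triggered behaviour is a master-slave pair of D latches clocked on opposite levels. The sketch below assumes that arrangement, with the second latch seeing the inverted clock so the output changes on the falling edge; the class and method names are invented for this example.

```python
class DLatch:
    def __init__(self):
        self.q = 0

    def update(self, d, c):
        if c:                      # transparent while C is asserted
            self.q = d
        return self.q, 1 - self.q

class DFlipFlop:
    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()

    def update(self, d, c):
        # Master follows D while C = 1; slave copies the master while C = 0,
        # so Q only changes when the clock falls (assumed polarity)
        m_q, _ = self.master.update(d, c)
        return self.slave.update(m_q, 1 - c)

ff = DFlipFlop()
print(ff.update(1, 1))  # clock high: D captured by master, Q unchanged
print(ff.update(1, 0))  # falling edge: Q takes the captured value
print(ff.update(0, 0))  # D changes while clock is low: Q unaffected
```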