Summary of previous lecture • computer architecture = instruction set architecture + machine organisation • CPI = ∑ (CPI i x instr. count i ) / ( ∑ instr. count i ) exe. instr. cycle • minimise: X CPI X = time count time instr. code cycle CPI • CISC: count size time instr. code cycle CPI RISC: count size time dt10 2011 4.1
Computer arithmetic (3 rd Ed: p.160-175, Apx. B; 4 th Ed: p.87-94, p.224-229, Apx. C.5) • two’s complement: signed integer representation • e.g. 1011 2C = (1× -2 3 ) + (0×2 2 ) + (1×2 1 )+(1×2 0 ) = -5 ten • n-bit: range (-2 n-1 ) .. (2 n-1 -1) • sign extension: 1011 2C = 1111011 2C • overflow: A, B > 0, A+B ≤ 0 A, B < 0, A+B ≥ 0 • in MIPS: slt, slti work with two’s complement slt u , slti u work with unsigned representation (do not cause exception when overflow) dt10 2011 4.2
Logical operations • shift left logical – sll $10, $16, 8 # reg10 = reg16 « 8 bits – reg16 0 .. 0 0000 0000 1101 – reg10 0 .. 0 1101 0000 0000 20 bits introduce zeros – format 0 0 16 10 8 0 R type source1 source2 dest. shamt funct • shift left logical variable (sllv): shamt in register source1 • right shifts: srl, srlv, sra (sign-extend high order bits) • bitwise: or, and, ori, andi dt10 2011 4.3
ALU building blocks • and, or, inv, mux • ALU: Arithmetic Logic Unit n=32 for MIPS • bit-level realisation: hierarchical, regular structure dt10 2011 4.4
Bit-wise logical / selection operations • and, or, … • selector / multiplexor dt10 2011 4.5
Add / subtract dt10 2011 4.6
Deriving ALU cell by interleaving components • group components together to form larger repeated unit • the dotted boxes have the same function and interface dt10 2011 4.7
Selecting ALU operation • programmable inverter for b i (using xor) • connecting mux in series • d 0 d 1 : 00 and, 01 or, 10 add, 11 subtract • detecting overflow: exercise dt10 2011 4.8
Comparison operations • slt: set on less than, if a < b then 1 else 0 • if a < b, a-b < 0, so MSB of (a-b) is 1 • implementation – provide additional input to each cell – LSB input from MSB ALUb output, other inputs set to 0 – include additional mux in cell for selection – to select slt, s=0, d 0 =1, d 1 =1 dt10 2011 4.9
Zero detector • beq, bne: test a=b or a-b=0 • include another gate to test if output zero • summary: s d 0 d 1 011 100 101 110 111 function: set on and or add subtract less than dt10 2011 4.10
Performance estimation • clocked circuit: no combinational loops • speed limited by propagation delay through the slowest combinational path • slowest path: usually carry path • clock rate: approx. 1/(delay of slowest path) assuming – edge-triggered design – flip-flop propagation delay, set-up time, clock skew etc. negligible (see 3 rd Ed: B.7, B.11; 4 th Ed: C.7, C.11) dt10 2011 4.11
Fast addition • carry select – compute both zero-carry-in and one-carry-in after n stages – e.g. 8 bits: use three 4-bit ripple carry adders • other possibilities – carry-lookahead adder – conditional-sum adder dt10 2011 4.12
Multiply and divide (3 rd Ed: p.250-274, 4 th Ed: p.230-242) • implementing multiplication using ALU • Booth’s multiplication algorithm • implementing division using ALU • related MIPS instructions – mult, multu, div, divu, mfhi, mflo, sll, srl, sra dt10 2011 4.13
Multiply: example • multiplicand × multiplier = product 2 × 11 = 22 • idea: sum of multiplicand shifted successively by 1 bit relative to multiplier; CSAA • 0010 mc 1011 mp × ...0010 ← mc shifted 0 bit × bit 0 of mp ..0010. ← " " 1 " × " 1 " + 0010... ← " " 3 " × " 3 " 0010110 dt10 2011 4.14
Multiplication hardware: first version The multiplicand register, ALU, and Product register are all 64 bits wide, with only the Multiplier register containing 32 bits. The 32-bit multiplicand starts in the right half of the Multiplicand register, and is shifted left 1 bit on each step. The multiplier is shifted in the opposite direction at each step. The algorithm starts with the product initialised to 0. Control decides when to shift the Multiplicand and Multiplier registers and when to write new values into the Product register . dt10 2011 4.15
pr = mp × mc dt10 2011 4.16
Multiplication hardware: second version The Multiplicand register, ALU, and Multiplier register are all 32 bits wide, with only the Product register left as 64 bits. Now the product is shifted right. dt10 2011 4.17
pr = mp × mc LH: left Half dt10 2011 4.18
Booth’s insight • substitute n additions by 1 subtraction, 1 addition • successive 1s in multiplier mp ⇒ successive addition of shifted multiplicand mc 0010 mc 0110 mp × m = 00100 0010 shift left mc since mp1 = 1 k = 2 shift left mc since mp2 = 1 + 0010 001100 • given number of 1s in mp = k, and initially shifted mc = m, then summing the k terms give: m + 2m + 2 2 m +···+2 k-1 m (geometric series) dt10 2011 4.19
Booth’s algorithm • replace summing k terms m + 2m +···+ 2 k-1 m by 1 subtraction and 1 addition: -m + 2 k m (and k shifts to get the 2 k factor) • proof: let S = 1+2+···+2 k-1 , so 2×S = 2+4+···+2 k-1 +2 k S = 2×S - S = 2 k +(2 k-1 -2 k-1 )+···+(2-2)-1 = 2 k -1 • exercise: check that it works for signed numbers • algorithm detects a string of 1s in mp: 4 cases .. 0 1 1 1 1 .. 1 1 0 0 0 dt10 2011 4.20
Division • invented by Briggs • Dividend = Quotient × Divisor + Remainder 74 = 9 × 8 + 2 1001 q • ds 1000 1001010 dd (r ´ : intermediate value) -1000 align MSB(ds) and MSB(dd) 0010 r ´< ds: q = q ++ <0> 0101 1010 ← r ´ ≥ ds: q = q ++ <1> -1000 10 r (r = r ´ when finished) • compare r´ and ds: calculate r ´ = r ´ - ds – r ´ < 0, r ´ = r ´ + ds (restore old value of r ´) – r ´ ≥ 0, accept r ´ for further calculation dt10 2011 4.21
First version of the division hardware loop: r = r - ds if r ≥ 0 left shift q, LSB(q)=1 else r = r + ds, left shift (q) LSB(q) = 0 right shift ds The Divisor register, ALU, and Remainder register are all 64 bits wide, with only the Quotient register being 32 bits. The 32-bit divisor starts in the left half of the Divisor register and is shifted right 1 bit on each step. The remainder is initialised with the dividend. Control decides when to shift the Divisor and Quotient registers and when to write the new value into the Remainder register. dt10 2011 4.22
dd = (q × ds) + r dt10 2011 4.23
To note • refining division implementation – remainder shift left: reduce divisor / ALU size – combine quotient and remainder registers • MIPS instructions – multu, divu: unsigned operations – result in HI, LO registers – mflo: move data from LO register • exercise: signed numbers in – multiplication (3 rd Ed: p.180, 4 th Ed: p.234) – division (3 rd Ed: p.187, 4 th Ed: p.239) dt10 2011 4.24
Recommend
More recommend