Summary of previous lecture computer architecture = instruction - PowerPoint PPT Presentation

Summary of previous lecture • computer architecture = instruction set architecture + machine organisation • CPI = ∑ (CPI i x instr. count i ) / ( ∑ instr. count i ) exe. instr. cycle • minimise: X CPI X = time count time instr. code cycle CPI • CISC: count size time instr. code cycle CPI RISC: count size time dt10 2011 4.1

Computer arithmetic (3 rd Ed: p.160-175, Apx. B; 4 th Ed: p.87-94, p.224-229, Apx. C.5) • two’s complement: signed integer representation • e.g. 1011 2C = (1× -2 3 ) + (0×2 2 ) + (1×2 1 )+(1×2 0 ) = -5 ten • n-bit: range (-2 n-1 ) .. (2 n-1 -1) • sign extension: 1011 2C = 1111011 2C • overflow: A, B > 0, A+B ≤ 0 A, B < 0, A+B ≥ 0 • in MIPS: slt, slti work with two’s complement slt u , slti u work with unsigned representation (do not cause exception when overflow) dt10 2011 4.2

Logical operations • shift left logical – sll $10, $16, 8 # reg10 = reg16 « 8 bits – reg16 0 .. 0 0000 0000 1101 – reg10 0 .. 0 1101 0000 0000 20 bits introduce zeros – format 0 0 16 10 8 0 R type source1 source2 dest. shamt funct • shift left logical variable (sllv): shamt in register source1 • right shifts: srl, srlv, sra (sign-extend high order bits) • bitwise: or, and, ori, andi dt10 2011 4.3

ALU building blocks • and, or, inv, mux • ALU: Arithmetic Logic Unit n=32 for MIPS • bit-level realisation: hierarchical, regular structure dt10 2011 4.4

Bit-wise logical / selection operations • and, or, … • selector / multiplexor dt10 2011 4.5

Add / subtract dt10 2011 4.6

Deriving ALU cell by interleaving components • group components together to form larger repeated unit • the dotted boxes have the same function and interface dt10 2011 4.7

Selecting ALU operation • programmable inverter for b i (using xor) • connecting mux in series • d 0 d 1 : 00 and, 01 or, 10 add, 11 subtract • detecting overflow: exercise dt10 2011 4.8

Comparison operations • slt: set on less than, if a < b then 1 else 0 • if a < b, a-b < 0, so MSB of (a-b) is 1 • implementation – provide additional input to each cell – LSB input from MSB ALUb output, other inputs set to 0 – include additional mux in cell for selection – to select slt, s=0, d 0 =1, d 1 =1 dt10 2011 4.9

Zero detector • beq, bne: test a=b or a-b=0 • include another gate to test if output zero • summary: s d 0 d 1 011 100 101 110 111 function: set on and or add subtract less than dt10 2011 4.10

Performance estimation • clocked circuit: no combinational loops • speed limited by propagation delay through the slowest combinational path • slowest path: usually carry path • clock rate: approx. 1/(delay of slowest path) assuming – edge-triggered design – flip-flop propagation delay, set-up time, clock skew etc. negligible (see 3 rd Ed: B.7, B.11; 4 th Ed: C.7, C.11) dt10 2011 4.11

Fast addition • carry select – compute both zero-carry-in and one-carry-in after n stages – e.g. 8 bits: use three 4-bit ripple carry adders • other possibilities – carry-lookahead adder – conditional-sum adder dt10 2011 4.12

Multiply and divide (3 rd Ed: p.250-274, 4 th Ed: p.230-242) • implementing multiplication using ALU • Booth’s multiplication algorithm • implementing division using ALU • related MIPS instructions – mult, multu, div, divu, mfhi, mflo, sll, srl, sra dt10 2011 4.13

Multiply: example • multiplicand × multiplier = product 2 × 11 = 22 • idea: sum of multiplicand shifted successively by 1 bit relative to multiplier; CSAA • 0010 mc 1011 mp × ...0010 ← mc shifted 0 bit × bit 0 of mp ..0010. ← " " 1 " × " 1 " + 0010... ← " " 3 " × " 3 " 0010110 dt10 2011 4.14

Multiplication hardware: first version The multiplicand register, ALU, and Product register are all 64 bits wide, with only the Multiplier register containing 32 bits. The 32-bit multiplicand starts in the right half of the Multiplicand register, and is shifted left 1 bit on each step. The multiplier is shifted in the opposite direction at each step. The algorithm starts with the product initialised to 0. Control decides when to shift the Multiplicand and Multiplier registers and when to write new values into the Product register . dt10 2011 4.15

pr = mp × mc dt10 2011 4.16

Multiplication hardware: second version The Multiplicand register, ALU, and Multiplier register are all 32 bits wide, with only the Product register left as 64 bits. Now the product is shifted right. dt10 2011 4.17

pr = mp × mc LH: left Half dt10 2011 4.18

Booth’s insight • substitute n additions by 1 subtraction, 1 addition • successive 1s in multiplier mp ⇒ successive addition of shifted multiplicand mc 0010 mc 0110 mp × m = 00100 0010 shift left mc since mp1 = 1 k = 2 shift left mc since mp2 = 1 + 0010 001100 • given number of 1s in mp = k, and initially shifted mc = m, then summing the k terms give: m + 2m + 2 2 m +···+2 k-1 m (geometric series) dt10 2011 4.19

Booth’s algorithm • replace summing k terms m + 2m +···+ 2 k-1 m by 1 subtraction and 1 addition: -m + 2 k m (and k shifts to get the 2 k factor) • proof: let S = 1+2+···+2 k-1 , so 2×S = 2+4+···+2 k-1 +2 k S = 2×S - S = 2 k +(2 k-1 -2 k-1 )+···+(2-2)-1 = 2 k -1 • exercise: check that it works for signed numbers • algorithm detects a string of 1s in mp: 4 cases .. 0 1 1 1 1 .. 1 1 0 0 0 dt10 2011 4.20

Division • invented by Briggs • Dividend = Quotient × Divisor + Remainder 74 = 9 × 8 + 2 1001 q • ds 1000 1001010 dd (r ´ : intermediate value) -1000 align MSB(ds) and MSB(dd) 0010 r ´< ds: q = q ++ <0> 0101 1010 ← r ´ ≥ ds: q = q ++ <1> -1000 10 r (r = r ´ when finished) • compare r´ and ds: calculate r ´ = r ´ - ds – r ´ < 0, r ´ = r ´ + ds (restore old value of r ´) – r ´ ≥ 0, accept r ´ for further calculation dt10 2011 4.21

First version of the division hardware loop: r = r - ds if r ≥ 0 left shift q, LSB(q)=1 else r = r + ds, left shift (q) LSB(q) = 0 right shift ds The Divisor register, ALU, and Remainder register are all 64 bits wide, with only the Quotient register being 32 bits. The 32-bit divisor starts in the left half of the Divisor register and is shifted right 1 bit on each step. The remainder is initialised with the dividend. Control decides when to shift the Divisor and Quotient registers and when to write the new value into the Remainder register. dt10 2011 4.22

dd = (q × ds) + r dt10 2011 4.23

To note • refining division implementation – remainder shift left: reduce divisor / ALU size – combine quotient and remainder registers • MIPS instructions – multu, divu: unsigned operations – result in HI, LO registers – mflo: move data from LO register • exercise: signed numbers in – multiplication (3 rd Ed: p.180, 4 th Ed: p.234) – division (3 rd Ed: p.187, 4 th Ed: p.239) dt10 2011 4.24

Summary of previous lecture computer architecture = instruction - PowerPoint PPT Presentation

Summary of previous lecture computer architecture = instruction set architecture + machine organisation CPI = (CPI i x instr. count i ) / ( instr. count i ) exe. instr. cycle minimise: X CPI X = time count time

Malaysian Healthy Ageing Society Plenary Lecture Plenary Lecture Plenary Lecture Plenary

Baldwin Space Summary October 25 1 Baldwin School Space Summary 2 Baldwin School Space Summary

Previous Lecture Todays Lecture Slides for Lecture 5 ENEL 353: Digital Circuits Fall 2013

Previous Lecture Todays Lecture Slides for Lecture 30 ENEL 353: Digital Circuits Fall

Previous Lecture Todays Lecture Slides for Lecture 28 Completion of divide-by-3 counter

Previous Lecture Todays Lecture Slides for Lecture 12 ENEL 353: Digital Circuits Fall

Previous Lecture Todays Lecture Slides for Lecture 3 ENEL 353: Digital Circuits Fall 2013

Previous Lecture Todays Lecture Slides for Lecture 2 ENEL 353: Digital Circuits Fall 2013

Previous Lecture Todays Lecture Slides for Lecture 35 ENEL 353: Digital Circuits Fall

Previous Lecture Todays Lecture Slides for Lecture 32 Completion of a timing analysis

Previous Lecture Todays Lecture Slides for Lecture 26 ENEL 353: Digital Circuits Fall

Previous Lecture Todays Lecture Slides for Lecture 33 ENEL 353: Digital Circuits Fall

Previous Lecture Todays Lecture Slides for Lecture 6 ENEL 353: Digital Circuits Fall 2013

Previous Lecture Todays Lecture Slides for Lecture 27 ENEL 353: Digital Circuits Fall

1 2 Capital Program Overview Previous FCS data Overview Previous FCS data

Todays Lecture: Complete matrix example from previous lecture Image processing

Floating Point Representation and Digital Logic Lecture 11 CS301 Administrative Daily

ECE 3060 VLSI and Advanced Digital Design Lecture 13 Datapath1: ALUs and Adders Datapath

Switching Data arriving to an input port of a switch have to be moved to Data arriving to an

Introduction CPU performance factors Instruction count

Experiences Building The Largest Multitouch Screen in Latin America Ariel Molina Rueda

Power Optimization of FPGA Interconnect Via Circuit and CAD Techniques Safeen Huda and Jason

CSCI 510/610: Advanced Computer Architecture Implementing a Datapath in Verilog A Lab Manual

CS 31: Intro to Systems Digital Logic Martin Gagn Swarthmore College January 31, 2017