Outline • Multiplication in the digital domain • HW ‐ mapping Introduction to Structured VLSI Design • Pipelining optimization ‐ Integer Arithmetic and Pipelining Joachim Rodrigues Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic 8 ‐ bit Signed/Unsigned Integers Signed and Unsigned Integers Signed overflow ↑ ‐ 128 1000 0000 n-1 ‐ 127 1000 0001 � Unsigned integer: ∑ bit i • 2 i ... ... 1111 1100 i=0 MSB defines sign 1111 1101 ‐ 2 1111 1110 ‐ 1 1111 1111 � Two's complement signed integer: Signed integers 0 0000 0000 0 1 0000 0001 1 n-2 2 0000 0010 2 bit n-1 • (-2 n-1 ) + ∑ bit i • 2 i 3 0000 0011 3 ... ... ... i=0 126 Unsigned integers 126 0111 1110 Signed overflow ↓ 127 0111 1111 127 1000 0000 128 1000 0001 129 ... ... 1111 1110 254 1111 1111 255 Unsigned overflow ↓ n-1 5 4 3 2 1 0 Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic
Unsigned Overflow Examples Add/Subtract 10+6 = 16, outside [0..15] A n ‐ 1 B n ‐ 1 A 1 B 1 A 0 B 0 1010 +0110 C 4 = 1 0000 ... + + + C n ‐ 1 C n C 2 C 1 C 0 = 0 7-10 = -3, outside [0..15] C n = C 4 = 1 & add ⇔ Unsigned overflow 0111 S 0 S n ‐ 1 S 1 ⇔ Unsigned overflow Carry-out & add - 1010 same as � The HW for sum/difference (S) doesn't care about signed/unsigned 0111 0101 � Unsigned overflow = Carry ‐ out & add OR no carry-out & subtract ⇔ Unsigned overflow + 1 C 4 = 0 1101 � Signed overflow = C n ⊕ C n ‐ 1 C n = C 4 = 0 & subtract ⇔ Unsigned overflow � True sign = S n ‐ 1 ⊕ signed overflow = (A n ‐ 1 ⊕ B n ‐ 1 ⊕ C n ‐ 1 ) ⊕ (C n ⊕ C n ‐ 1 ) = A n ‐ 1 ⊕ B n ‐ 1 ⊕ C n No carry-out & subtract ⇔ Unsigned overflow Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Signed Overflow Example Multiplication � Product = Multiplicand * Multiplier 6+7 = 13, outside [-8..7] � log (product) = log (multiplicand) + log (multiplier) C 3 = 1 0110 � Width of product is (worst ‐ case) sum of widths of factors +0111 � May overflow if single length product register is used C 4 =0 1101 � Paper ‐ and ‐ pencil method � Conditional add (controlled by bits of multiplier) and shift C n ⊕ C n-1 = C 4 ⊕ C 3 = 0 ⊕ 1 = 1 ⇔ Carry-outs different ⇔ Signed overflow � Partial product progressively develops into product � 1 product bit/cycle S n-1 ⊕ signed overflow = � Unsigned and signed multiplication A n-1 ⊕ B n-1 ⊕ C n = A 3 ⊕ B 3 ⊕ C 4 = 0 ⊕ 0 ⊕ 0 = 0 ⇔ True sign = Positive/zero � Signs require extra attention � Sequential, combinational or pipelined implementation � Tradeoff between hardware resources, throughput, latency, power Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic
Multiplying Using Paper and Pencil ... more Paper and Pencil Multiplicand * Multiplier Partl ‐ product Partl ‐ multiplier We will concentrate on unsigned integers for the next few slides ! 1011*1110 0000 1110 Example: 0000 (0) + 0000 ‐ > 0000 0 111 LSB ”controls” 1011 * 1110 1011. (1) + 1011 . whether to add 0000 (*0 = zero) ‐ > 0101 10 11 ”0” or multiplicand 1011. . (1) + 1011 . . to partial product +1011. (*1 = copy) ‐ > 1000 010 1 1011. . . (1) + 1011 . . . +1011.. (*1 = copy) 10011010 1001 1010 +1011... (*1 = copy) 10011010 0 Multiplicand Partial prod uct, part.mul. Disadvantage: 2n ‐ bi t ALU Advantage: n ‐ bit ALU In decimal: 11 * 14 = 154 0: add zero, 1: add multiplicand Shifting in carry ‐ out prevents overflow Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Seq. Multiplication, Initialize Seq. Multiplication, Step n ‐ bit reg. n ‐ bit reg. Repeat step n times Multiplicand Multiplicand Load Load Control signal Conditional add C n Add C n Add C n Partial Partial product x 0 Multiplier Load Load Shift right multiplier bit 0 bit 0 2n ‐ bit reg. 2n ‐ bit reg. Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic
Seq. Multiplication, Result Don't forget ... Signed Multiplication � Either transform to multiply of non ‐ negative integers: n ‐ bit reg. Multiplicand 1. Record signs and negate any negative factors. 2. Perform unsigned multiplication. 3. Negate product if signs above differ. C n Add � Or directly perform signed multiplication: 1. Take into account the sign bit of multiplicand by shifting in true sign bits rather than carry ‐ outs, i.e. A n ‐ 1 ⊕ B n ‐ 1 ⊕ C n rather than C n . Product 2. Take into account the sign bit of multiplier by bit 0 2n ‐ bit reg. doing a conditional subtract rather than a conditional add during the last iteration. one partial product per clock cycle => very slow Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Seq. signed multiplication, step Multiplication by a Constant n ‐ bit reg. Repeat step n times Multiplicand As a designer you need to assure that division with a small constant is accomplished by a number of shifts and adds Conditional add for iteration 1.. n ‐ 1, Some numerical examples: conditional subtract for iteration n Add/ True *2 (*10 2 ): multiplicand << 1 sub sign *3 (*11 2 ): multiplicand << 1 + multiplicand *4 (*100 2 ): multiplicand << 2 *5 (*101 2 ): multiplicand << 2 + multiplicand *255 (*11111111 2 ): multiplicand << 8 – multiplicand True sign Partial Partial product x Shift right multiplier bit 0 2n ‐ bit reg. True sign = A n ‐ 1 ⊕ B n ‐ 1 ⊕ C n Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic
String of n ‐ bit Adders Carry ‐ save Adders in Multipliers � Unrolling loop lowers latency � Significantly reduced delays for multi ‐ input adders Mp 1 *Mc Mp 0 *Mc � Full ‐ adders with clever interconnect when compared to sequential 0 � Sum and carries fed separately to adder at next level add ‐ and ‐ shift at the expense � Carries drawn diagonally, sums drawn vertically of much more hardware � Typically, a final (carry ‐ propagate) adder assimilates the carries Mp 2 *Mc � n x n multiplication requires n ‐ 1 n ‐ bit adders A 0,2 B 0,2 A 0,1 B 0,1 B 0,0 C 0,2 C 0,1 A 0,0 C 0,0 � t saved_latency = n*(t clk ‐ out +t set ‐ up ) + + + CSA 0 Mp n ‐ 1 *Mc C 1,3 S 1,2 C 1,2 S 1,1 C 1,1 S 1,0 A 1,2 C 1,0 A 1,1 A 1,0 CSA 1 + + + C 2,3 P 2n ‐ 1 P 2n ‐ 2..n P n ‐ 1 P 2 P 1 P 0 S 2,2 C 2,2 S 2,1 C 2,1 S 2,0 Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic ... Pipelined Version 6 x 6 Parallel Array Multiplier MP i, j = Multiplier i AND Multiplicand j MP 0,3 MP 0,2 MP 0,1 MP 0,0 MP 1,3 MP 1,2 MP 1,1 MP 1,0 0 0 0 MP 2,3 MP 2,2 MP 2,1 + MP 2,0 + + MP 3,3 MP 3,2 MP 3,1 MP 3,0 Pipeline registers + + + Pipeline registers + + + Pipeline registers Carry ‐ propagate adder P 7 P 6 P 5 P 4 P 3 P 2 P 1 P 0 Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic
Sequential, Combinational, and Pipelined � The sequential shift ‐ and ‐ add algorithm corresponds to a for ‐ loop that may be implemented by: � a state machine or � instructions (low ‐ end microcontrollers) � The sequential algorithm may be unrolled and implemented as a deep combinational circuit: � String of n ‐ bit adders and AND ‐ gates, or Pipelining � Carry ‐ save adders, AND ‐ gates, and final (n ‐ 1) ‐ bit adder � Advantage: low latency � Disadvantage: more hardware � The deep combinational circuit may be pipelined � Advantage: very high throughput � Disadvantages: pipeline latency, more hardware, and higher power Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Laundry process Comparison • Non ‐ pipelined: – Delay: 60 min – Throughput 1/60 load per min • Pipelined: – Delay: 60 min – Throughput k/(40+k*20) load per min about 1/20 when k is large – Throughput 3 times better than non ‐ pipelined Joachim Rodrigues, Informatik og Matematisk Modellering, jnr@imm.dtu.dk Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic
Recommend
More recommend