outline
play

Outline Multiplication in the digital domain HW mapping - PowerPoint PPT Presentation

Outline Multiplication in the digital domain HW mapping Introduction to Structured VLSI Design Pipelining optimization Integer Arithmetic and Pipelining Joachim Rodrigues Joachim Rodrigues, EIT, LTH, Introduction to Structured


  1. Outline • Multiplication in the digital domain • HW ‐ mapping Introduction to Structured VLSI Design • Pipelining optimization ‐ Integer Arithmetic and Pipelining Joachim Rodrigues Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic 8 ‐ bit Signed/Unsigned Integers Signed and Unsigned Integers Signed overflow ↑ ‐ 128 1000 0000 n-1 ‐ 127 1000 0001 � Unsigned integer: ∑ bit i • 2 i ... ... 1111 1100 i=0 MSB defines sign 1111 1101 ‐ 2 1111 1110 ‐ 1 1111 1111 � Two's complement signed integer: Signed integers 0 0000 0000 0 1 0000 0001 1 n-2 2 0000 0010 2 bit n-1 • (-2 n-1 ) + ∑ bit i • 2 i 3 0000 0011 3 ... ... ... i=0 126 Unsigned integers 126 0111 1110 Signed overflow ↓ 127 0111 1111 127 1000 0000 128 1000 0001 129 ... ... 1111 1110 254 1111 1111 255 Unsigned overflow ↓ n-1 5 4 3 2 1 0 Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic

  2. Unsigned Overflow Examples Add/Subtract 10+6 = 16, outside [0..15] A n ‐ 1 B n ‐ 1 A 1 B 1 A 0 B 0 1010 +0110 C 4 = 1 0000 ... + + + C n ‐ 1 C n C 2 C 1 C 0 = 0 7-10 = -3, outside [0..15] C n = C 4 = 1 & add ⇔ Unsigned overflow 0111 S 0 S n ‐ 1 S 1 ⇔ Unsigned overflow Carry-out & add - 1010 same as � The HW for sum/difference (S) doesn't care about signed/unsigned 0111 0101 � Unsigned overflow = Carry ‐ out & add OR no carry-out & subtract ⇔ Unsigned overflow + 1 C 4 = 0 1101 � Signed overflow = C n ⊕ C n ‐ 1 C n = C 4 = 0 & subtract ⇔ Unsigned overflow � True sign = S n ‐ 1 ⊕ signed overflow = (A n ‐ 1 ⊕ B n ‐ 1 ⊕ C n ‐ 1 ) ⊕ (C n ⊕ C n ‐ 1 ) = A n ‐ 1 ⊕ B n ‐ 1 ⊕ C n No carry-out & subtract ⇔ Unsigned overflow Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Signed Overflow Example Multiplication � Product = Multiplicand * Multiplier 6+7 = 13, outside [-8..7] � log (product) = log (multiplicand) + log (multiplier) C 3 = 1 0110 � Width of product is (worst ‐ case) sum of widths of factors +0111 � May overflow if single length product register is used C 4 =0 1101 � Paper ‐ and ‐ pencil method � Conditional add (controlled by bits of multiplier) and shift C n ⊕ C n-1 = C 4 ⊕ C 3 = 0 ⊕ 1 = 1 ⇔ Carry-outs different ⇔ Signed overflow � Partial product progressively develops into product � 1 product bit/cycle S n-1 ⊕ signed overflow = � Unsigned and signed multiplication A n-1 ⊕ B n-1 ⊕ C n = A 3 ⊕ B 3 ⊕ C 4 = 0 ⊕ 0 ⊕ 0 = 0 ⇔ True sign = Positive/zero � Signs require extra attention � Sequential, combinational or pipelined implementation � Tradeoff between hardware resources, throughput, latency, power Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic

  3. Multiplying Using Paper and Pencil ... more Paper and Pencil Multiplicand * Multiplier Partl ‐ product Partl ‐ multiplier We will concentrate on unsigned integers for the next few slides ! 1011*1110 0000 1110 Example: 0000 (0) + 0000 ‐ > 0000 0 111 LSB ”controls” 1011 * 1110 1011. (1) + 1011 . whether to add 0000 (*0 = zero) ‐ > 0101 10 11 ”0” or multiplicand 1011. . (1) + 1011 . . to partial product +1011. (*1 = copy) ‐ > 1000 010 1 1011. . . (1) + 1011 . . . +1011.. (*1 = copy) 10011010 1001 1010 +1011... (*1 = copy) 10011010 0 Multiplicand Partial prod uct, part.mul. Disadvantage: 2n ‐ bi t ALU Advantage: n ‐ bit ALU In decimal: 11 * 14 = 154 0: add zero, 1: add multiplicand Shifting in carry ‐ out prevents overflow Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Seq. Multiplication, Initialize Seq. Multiplication, Step n ‐ bit reg. n ‐ bit reg. Repeat step n times Multiplicand Multiplicand Load Load Control signal Conditional add C n Add C n Add C n Partial Partial product x 0 Multiplier Load Load Shift right multiplier bit 0 bit 0 2n ‐ bit reg. 2n ‐ bit reg. Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic

  4. Seq. Multiplication, Result Don't forget ... Signed Multiplication � Either transform to multiply of non ‐ negative integers: n ‐ bit reg. Multiplicand 1. Record signs and negate any negative factors. 2. Perform unsigned multiplication. 3. Negate product if signs above differ. C n Add � Or directly perform signed multiplication: 1. Take into account the sign bit of multiplicand by shifting in true sign bits rather than carry ‐ outs, i.e. A n ‐ 1 ⊕ B n ‐ 1 ⊕ C n rather than C n . Product 2. Take into account the sign bit of multiplier by bit 0 2n ‐ bit reg. doing a conditional subtract rather than a conditional add during the last iteration. one partial product per clock cycle => very slow Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Seq. signed multiplication, step Multiplication by a Constant n ‐ bit reg. Repeat step n times Multiplicand As a designer you need to assure that division with a small constant is accomplished by a number of shifts and adds Conditional add for iteration 1.. n ‐ 1, Some numerical examples: conditional subtract for iteration n Add/ True *2 (*10 2 ): multiplicand << 1 sub sign *3 (*11 2 ): multiplicand << 1 + multiplicand *4 (*100 2 ): multiplicand << 2 *5 (*101 2 ): multiplicand << 2 + multiplicand *255 (*11111111 2 ): multiplicand << 8 – multiplicand True sign Partial Partial product x Shift right multiplier bit 0 2n ‐ bit reg. True sign = A n ‐ 1 ⊕ B n ‐ 1 ⊕ C n Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic

  5. String of n ‐ bit Adders Carry ‐ save Adders in Multipliers � Unrolling loop lowers latency � Significantly reduced delays for multi ‐ input adders Mp 1 *Mc Mp 0 *Mc � Full ‐ adders with clever interconnect when compared to sequential 0 � Sum and carries fed separately to adder at next level add ‐ and ‐ shift at the expense � Carries drawn diagonally, sums drawn vertically of much more hardware � Typically, a final (carry ‐ propagate) adder assimilates the carries Mp 2 *Mc � n x n multiplication requires n ‐ 1 n ‐ bit adders A 0,2 B 0,2 A 0,1 B 0,1 B 0,0 C 0,2 C 0,1 A 0,0 C 0,0 � t saved_latency = n*(t clk ‐ out +t set ‐ up ) + + + CSA 0 Mp n ‐ 1 *Mc C 1,3 S 1,2 C 1,2 S 1,1 C 1,1 S 1,0 A 1,2 C 1,0 A 1,1 A 1,0 CSA 1 + + + C 2,3 P 2n ‐ 1 P 2n ‐ 2..n P n ‐ 1 P 2 P 1 P 0 S 2,2 C 2,2 S 2,1 C 2,1 S 2,0 Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic ... Pipelined Version 6 x 6 Parallel Array Multiplier MP i, j = Multiplier i AND Multiplicand j MP 0,3 MP 0,2 MP 0,1 MP 0,0 MP 1,3 MP 1,2 MP 1,1 MP 1,0 0 0 0 MP 2,3 MP 2,2 MP 2,1 + MP 2,0 + + MP 3,3 MP 3,2 MP 3,1 MP 3,0 Pipeline registers + + + Pipeline registers + + + Pipeline registers Carry ‐ propagate adder P 7 P 6 P 5 P 4 P 3 P 2 P 1 P 0 Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic

  6. Sequential, Combinational, and Pipelined � The sequential shift ‐ and ‐ add algorithm corresponds to a for ‐ loop that may be implemented by: � a state machine or � instructions (low ‐ end microcontrollers) � The sequential algorithm may be unrolled and implemented as a deep combinational circuit: � String of n ‐ bit adders and AND ‐ gates, or Pipelining � Carry ‐ save adders, AND ‐ gates, and final (n ‐ 1) ‐ bit adder � Advantage: low latency � Disadvantage: more hardware � The deep combinational circuit may be pipelined � Advantage: very high throughput � Disadvantages: pipeline latency, more hardware, and higher power Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic Laundry process Comparison • Non ‐ pipelined: – Delay: 60 min – Throughput 1/60 load per min • Pipelined: – Delay: 60 min – Throughput k/(40+k*20) load per min about 1/20 when k is large – Throughput 3 times better than non ‐ pipelined Joachim Rodrigues, Informatik og Matematisk Modellering, jnr@imm.dtu.dk Joachim Rodrigues, EIT, LTH, Introduction to Structured VLSI Design jrs@eit.lth.se Integer arithmetic

Recommend


More recommend