A New Family of High-Performance Parallel Decimal Multipliers*
Alvaro Vázquez, Elisardo Antelo
Dept. of Electronic and Computer Science, University of Santiago de Compostela, Spain
alvaro@dec.usc.es, elisardo@dec.usc.es
Paolo Montuschi
Dept. of Computer Engineering, Politecnico di Torino, Italy
montuschi@polito.it
*A. Vázquez and E. Antelo supported in part by the Ministry of Science and Technology of Spain under contract TIN2004-07797-C02 and Xunta de Galicia under contract PGIDT03TIC10502PR.
ARITH 18 - Montpellier, France. June 25-27, 2007

Outline
• Introduction. Previous work.
• Implementation of decimal parallel multiplication:
– Fast carry-save addition using non-conventional BCD.
– Design of high-performance decimal p:2 CSAs.
– Parallel partial product generation.
• Architectures:
– Signed-digit (SD) radix-10.
– SD radix-4/radix-5 (combined binary/decimal).
• Evaluation and comparison.
• Conclusions.

Introduction
• High-performance decimal floating-point units.
• Parallel multiplier: scaling performance by pipelining.
• Multiplication stages:
1. Generation of partial products (PPG).
2. Reduction of partial products (PPR).
3. Conversion to non-redundant representation.
• Problems of decimal implementation:
– High value range of decimal digits (0-9): affects PPG.
– Inefficiency of conventional BCD coding: affects PPG and PPR.

Previous Work on Decimal Multiplication
• Previous proposals for PPG:
1. Direct generation of partial products (digit-by-digit).
2. Using multiplicand multiples (X, 2X, 3X, 4X, …, 9X).
– Direct implementation.
– SD multiplier. [Ex. 2 radix-5 digits: (-5X, 5X) (-2X, -X, X, 2X)]
• Previous proposals for PPR:
1. Carry-save BCD-8421.
a. Full BCD operands (3:2 CSAs + correction).
b. Carry operand of 1 bit per 4-bit digit (4-bit decimal CPAs).
2. Signed-digit representation for decimal digits.
– SD adders are more complex than CSA-based implementations.

Proposed techniques
• X multiplicand, Y multiplier: BCD integer words.
• BCD digit represented as $Z_i = \sum_{j=0}^{3} z_{i,j}\, r_j$, with weights:
– BCD-8421: $r_j = 2^j$
– BCD-4221: $(r_3, r_2, r_1, r_0) = (4, 2, 2, 1)$
– BCD-5211: $(r_3, r_2, r_1, r_0) = (5, 2, 1, 1)$
1. Decimal carry-save addition using BCD-4221.
2. Implementation of decimal CSAs for PPR.
3. Implementation of PPG using multiplier recoding:
– SD radix-10.
– SD radix-4.
– SD radix-5.
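The weighted-digit definition above can be made concrete with a small sketch. The names `WEIGHTS`, `digit_value` and `encodings` are illustrative and do not come from the slides; bit strings are written z3 z2 z1 z0.

```python
# Weight vectors (r3, r2, r1, r0) of the three digit codings named on the slide.
WEIGHTS = {
    "BCD-8421": (8, 4, 2, 1),
    "BCD-4221": (4, 2, 2, 1),
    "BCD-5211": (5, 2, 1, 1),
}

def digit_value(bits, coding):
    """Z_i = sum_j z_{i,j} * r_j for a 4-bit string such as '1001' (z3 z2 z1 z0)."""
    return sum(int(b) * r for b, r in zip(bits, WEIGHTS[coding]))

def encodings(value, coding):
    """All 4-bit strings that represent `value` under `coding` (possibly several)."""
    patterns = (format(n, "04b") for n in range(16))
    return [p for p in patterns if digit_value(p, coding) == value]

print(encodings(5, "BCD-8421"))  # ['0101']
print(encodings(5, "BCD-4221"))  # ['0111', '1001']
```

Two properties are worth noticing: BCD-4221 and BCD-5211 are redundant (some digits have two codes), and every one of the 16 bit patterns is a valid digit in [0,9] because the weights sum to 9.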

Decimal carry-save addition (BCD-8421)
• Add 3 decimal digits to produce 2 decimal digits (sum and carry digits).
$A_i, B_i, C_i, S_i, H_i \in [0,9]$, $A_i + B_i + C_i = S_i + 2H_i$, with $2H_i \in [0,18]$ and even.
• Binary 3:2 CSA applied to each bit position $(a_{i,j}, b_{i,j}, c_{i,j})$:
$s_{i,j} = \mathrm{Xor}(a_{i,j}, b_{i,j}, c_{i,j})$
$h_{i,j} = a_{i,j} b_{i,j} + (a_{i,j} + b_{i,j})\, c_{i,j}$
• Example:
          8 4 2 1
A_i:  5   0 1 0 1
B_i:  6   0 1 1 0
C_i:  9   1 0 0 1
S_i: 10   1 0 1 0
H_i:  5   0 1 0 1
2H_i = 10 (x2, with carry-out to the next digit and carry-in from the previous one), so $A_i + B_i + C_i = S_i + 2H_i = 20$.
• PROBLEM WITH BCD-8421: input digits are in [0,9] BUT the sum digit can fall outside the decimal range [0,9], anywhere in [0,15]. Sum digits require correction.
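The slide's example can be replayed with a minimal sketch of the bitwise 3:2 CSA. The function name `csa_3to2` is illustrative; digits are held as plain integers, which is valid here because a BCD-8421 code equals its binary value.

```python
def csa_3to2(a, b, c):
    """Bitwise 3:2 carry-save adder on 4-bit words: returns (sum word, carry word)."""
    s = a ^ b ^ c                    # s_j = Xor(a_j, b_j, c_j)
    h = (a & b) | ((a | b) & c)      # h_j = a_j b_j + (a_j + b_j) c_j
    return s, h

s, h = csa_3to2(5, 6, 9)
print(s, h, s + 2 * h)               # 10 5 20

# Input triples of valid BCD digits whose sum word is not a valid BCD digit.
bad = [(a, b, c) for a in range(10) for b in range(10) for c in range(10)
       if csa_3to2(a, b, c)[0] > 9]
print((5, 6, 9) in bad)              # True
```

The identity A + B + C = S + 2H always holds; what fails is only that S may leave [0,9], which is the correction problem the slide points out.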

Decimal carry-save addition (BCD-4221)
• Add 3 decimal digits to produce 2 decimal digits (sum and carry digits).
$A_i, B_i, C_i, S_i, H_i, W_i \in [0,9]$, $A_i + B_i + C_i = S_i + 2H_i = S_i + \text{L1-shift}(W_i)$
• Binary 3:2 CSA applied to each bit position $(a_{i,j}, b_{i,j}, c_{i,j})$:
$s_{i,j} = \mathrm{Xor}(a_{i,j}, b_{i,j}, c_{i,j})$
$h_{i,j} = a_{i,j} b_{i,j} + (a_{i,j} + b_{i,j})\, c_{i,j}$
• Example:
           4 2 2 1
A_i:   5   1 0 0 1
B_i:   6   1 1 0 0
C_i:   9   1 1 1 1
S_i:   6   1 0 1 0
H_i:   7   1 1 0 1
W_i:   7   1 1 0 0       (H_i recoded to BCD-5211)
2H_i: 14   1 | 1 0 0 -   (x2 = L1-shift of W_i: carry-out | digit, "-" is the carry-in position)
$A_i + B_i + C_i = S_i + 2H_i = 20$
• SOLUTION WITH BCD-4221: input digits are in [0,9] and the sum digit is always in range [0,9].
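A minimal sketch of this scheme follows: a bitwise 3:2 CSA on BCD-4221 digits, a value-preserving recoding of the carry digit to BCD-5211, and a 1-bit left shift that doubles it. All names are illustrative, and the recoder simply picks the first BCD-5211 pattern with the right value, which is an assumption and may differ from the code chosen on the slide (any pattern of the same value works).

```python
W4221 = (4, 2, 2, 1)
W5211 = (5, 2, 1, 1)

def value(bits, weights):
    """Weighted value of a digit given as a tuple (z3, z2, z1, z0)."""
    return sum(b * w for b, w in zip(bits, weights))

def csa_bits(a, b, c):
    """Bitwise 3:2 CSA on three 4-bit tuples: returns (sum digit S, carry digit H)."""
    s = tuple(x ^ y ^ z for x, y, z in zip(a, b, c))
    h = tuple((x & y) | ((x | y) & z) for x, y, z in zip(a, b, c))
    return s, h

def recode_4221_to_5211(bits):
    """Value-preserving recoding; returns the first matching BCD-5211 pattern."""
    v = value(bits, W4221)
    for n in range(16):
        cand = tuple((n >> k) & 1 for k in (3, 2, 1, 0))
        if value(cand, W5211) == v:
            return cand

def l1_shift(w):
    """Shift a BCD-5211 digit left by one bit.
    Returns (carry_out, BCD-4221 digit with the carry-in position set to 0)."""
    return w[0], (w[1], w[2], w[3], 0)

A, B, C = (1, 0, 0, 1), (1, 1, 0, 0), (1, 1, 1, 1)      # 5, 6, 9 in BCD-4221
S, H = csa_bits(A, B, C)
cout, D = l1_shift(recode_4221_to_5211(H))
print(value(S, W4221), value(H, W4221), 10 * cout + value(D, W4221))   # 6 7 14
```

The shift doubles the digit because the BCD-5211 weights (5, 2, 1, 1) times 2 are (10, 4, 2, 2): the top bit becomes a carry of weight 10 into the next digit and the remaining three bits land on the weights 4, 2, 2 of BCD-4221.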

Decimal carry-save addition (BCD-5211)
• Add 3 decimal digits to produce 2 decimal digits (sum and carry digits).
$A_i, B_i, C_i, S_i, H_i \in [0,9]$, $A_i + B_i + C_i = S_i + 2H_i = S_i + \text{L1-shift}(H_i)$, where the shifted carry digit is in BCD-4221.
• Binary 3:2 CSA applied to each bit position $(a_{i,j}, b_{i,j}, c_{i,j})$:
$s_{i,j} = \mathrm{Xor}(a_{i,j}, b_{i,j}, c_{i,j})$
$h_{i,j} = a_{i,j} b_{i,j} + (a_{i,j} + b_{i,j})\, c_{i,j}$
• Example:
           5 2 1 1
A_i:   5   1 0 0 0
B_i:   6   1 0 0 1
C_i:   9   1 1 1 1
S_i:   8   1 1 1 0
H_i:   6   1 0 0 1
2H_i: 12   1 | 0 0 1 -   (x2 = L1-shift of H_i, result in BCD-4221: carry-out | digit, "-" is the carry-in position)
      12   1 | 0 1 0 -   (recoded to BCD-5211)
$A_i + B_i + C_i = S_i + 2H_i = 20$
• SOLUTION WITH BCD-5211: input digits are in [0,9] and the sum digit is always in range [0,9].

Decimal multiplication by $\pm 2^n$ and $\pm 5^n$
• Multiplication by 2: recode the BCD-4221 digits to BCD-5211, then shift left by 1 bit; the result is in BCD-4221.
 25 (BCD-4221):  0100 1001
   digit recoding
 25 (BCD-5211):  0100 1000
   x2: L1-SHIFT
 50 (BCD-4221):  1001 0000
• Multiplication by 5: shift the BCD-4221 operand left by 3 bits; the result is in BCD-5211 and is recoded back to BCD-4221.
 25 (BCD-4221):  0000 0100 1001
   x5: L3-SHIFT
125 (BCD-5211):  0010 0100 1---
   digit recoding
125 (BCD-4221):  0001 0100 1001
• Negative operands (10's complement) by bit inversion (as in 2's complement):
0596 (BCD-4221):  0000 1001 1111 1100
 bit-complement:  1111 0110 0000 0011  (9403)
 -596 = -10000 + 9403 + 1  (the +1 is added as a hot-one)
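The three operations on this slide can be checked with a short sketch that treats an operand as a flat list of bits, most significant digit first. The helper names are illustrative, the recoding is done by value (first matching pattern, an assumption), and a leading guard digit is assumed so that no bit is shifted out.

```python
W4221 = (4, 2, 2, 1)
W5211 = (5, 2, 1, 1)

def encode_digit(d, weights):
    """First 4-bit pattern [z3, z2, z1, z0] whose weighted value is d."""
    for n in range(16):
        bits = [(n >> k) & 1 for k in (3, 2, 1, 0)]
        if sum(b * w for b, w in zip(bits, weights)) == d:
            return bits
    raise ValueError(f"{d} is not a decimal digit")

def to_bits(n, digits, weights):
    """Encode a non-negative integer as `digits` decimal digits, MSD first."""
    return [b for ch in str(n).zfill(digits) for b in encode_digit(int(ch), weights)]

def from_bits(bits, weights):
    """Decode a flat bit list, 4 bits per decimal digit, MSD first."""
    n = 0
    for i in range(0, len(bits), 4):
        n = 10 * n + sum(b * w for b, w in zip(bits[i:i + 4], weights))
    return n

def times2(bits_4221):
    """x2: recode each digit to BCD-5211, then L1-shift. Result is BCD-4221."""
    recoded = []
    for i in range(0, len(bits_4221), 4):
        d = sum(b * w for b, w in zip(bits_4221[i:i + 4], W4221))
        recoded += encode_digit(d, W5211)
    assert recoded[0] == 0, "needs a leading guard digit"
    return recoded[1:] + [0]

def times5(bits_4221):
    """x5: L3-shift of a BCD-4221 operand. Result is BCD-5211."""
    assert bits_4221[:3] == [0, 0, 0], "needs a leading guard digit"
    return bits_4221[3:] + [0, 0, 0]

def nines_complement(bits):
    """Bit inversion gives the 9's complement because the weights sum to 9."""
    return [1 - b for b in bits]

print(from_bits(times2(to_bits(25, 3, W4221)), W4221))            # 50
print(from_bits(times5(to_bits(25, 3, W4221)), W5211))            # 125
print(from_bits(nines_complement(to_bits(596, 4, W4221)), W4221)) # 9403
```

The x5 shift works for the same reason as the x2 shift: the BCD-4221 weights (4, 2, 2, 1) times 5 are (20, 10, 10, 5), which are exactly the weights 2, 1, 1 of the next digit and the weight 5 of the current digit in BCD-5211.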

Proposed decimal 3:2 CSA (BCD-4221)
$A_i + B_i + C_i = S_i + 2H_i = S_i + \text{L1-shift}(W_i)$

Proposed decimal 3:2 CSA (BCD-4221)
• Digit recoder BCD-4221 to BCD-5211 (on the critical path):
Digit   BCD-4221      BCD-5211
0       0000          0000
1       0001          0001
2       0010, 0100    0100
3       0011, 0101    0101
4       1000, 0110    0111
5       1001, 0111    1000
6       1100, 1010    1010
7       1101, 1011    1011
8       1110          1110
9       1111          1111
– AREA: 18 NAND2 (0.35 times 4-bit 3:2 CSA area).
– DELAY: 4 FO4 (0.9 times binary 3:2 CSA delay).
• Decimal (digit) 3:2 CSA:
– AREA: 66 NAND2 (1.35 times 4-bit 3:2 CSA area).
– DELAY*: 1.4 times on the carry path, same on the sum path.
*Ratio with respect to the sum path (critical path) delay of the binary 3:2 CSA.
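The recoder table can be written as a lookup and checked for value preservation. This sketch assumes that both redundant BCD-4221 codes of a digit map to the single BCD-5211 code listed for it, and that the two BCD-4221 codes of digit 4 are 1000 and 0110; the names are illustrative.

```python
# BCD-4221 pattern -> BCD-5211 pattern, bit strings written z3 z2 z1 z0.
RECODER_4221_TO_5211 = {
    "0000": "0000", "0001": "0001",
    "0010": "0100", "0100": "0100",
    "0011": "0101", "0101": "0101",
    "1000": "0111", "0110": "0111",
    "1001": "1000", "0111": "1000",
    "1100": "1010", "1010": "1010",
    "1101": "1011", "1011": "1011",
    "1110": "1110", "1111": "1111",
}

def val(bits, weights):
    return sum(int(b) * w for b, w in zip(bits, weights))

def double_digit(bits_4221):
    """The 'x2' block: recode to BCD-5211, then shift left by one bit.
    Returns (carry_out, 3 high bits of the BCD-4221 result); the lowest bit
    position is left free for the carry-in from the previous digit."""
    w = RECODER_4221_TO_5211[bits_4221]
    return int(w[0]), w[1:]

# Every one of the 16 input patterns is covered and keeps its decimal value.
ok = all(val(k, (4, 2, 2, 1)) == val(v, (5, 2, 1, 1))
         for k, v in RECODER_4221_TO_5211.items())
print(len(RECODER_4221_TO_5211), ok)   # 16 True
print(double_digit("1101"))            # (1, '011')
```

For the input 1101 (digit 7) the result reads as carry-out 1 and digit bits 011-, that is 10 + 2 + 2 = 14.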

Decimal CSA tree (BCD-4221)
• Example: 9:2 decimal CSA (digit slice).
[Figure: 9:2 decimal CSA digit slice built from 4-bit 3:2 CSAs and x2 blocks, with the critical path marked; a 2:1 mux is shown for the combined decimal/binary CSA.]
• 1.35 area ratio with respect to the binary CSA.
• 1.40 delay ratio with respect to the binary CSA.
• Hardware complexity (1 digit):
– 4-bit 3:2 CSA: 7 x 48 NAND2.
– Digit recoder (x2): 7 x 18 NAND2.
• Critical path delay:
– 1-bit 3:2 CSA: 4.5/2.2 FO4 (2/1 XOR).
– Recoder: 4 FO4 (1.75 XOR).
– 9:2 decimal CSA: 25 FO4.
– 9:2 binary CSA: 18 FO4.

Decimal CSA tree BCD-4221 (area-optimized)
• Example: 9:2 decimal CSA (digit slice).
[Figure: area-optimized 9:2 decimal CSA digit slice built from 4-bit 3:2 CSAs with x1 and x2 blocks, with the critical path marked.]
• Area optimization: group inputs with similar multiplicative factor.
• 1.20 area ratio with respect to the binary CSA.
• 1.40 delay ratio with respect to the binary CSA.
• Hardware complexity (1 digit):
– 4-bit 3:2 CSA: 7 x 48 NAND2.
– Digit recoder (x2): 5 x 18 NAND2.
• Critical path delay:
– 9:2 decimal CSA: 25 FO4.
– 9:2 binary CSA: 18 FO4.

SD radix-10 multiplier recoding
• Multiplicand X (BCD-4221), multiplier Y (BCD-8421): integer d-digit precision operands (4d bits each).
• SD radix-10 digit recoder: $Y_i \in [0,9]$ is recoded to $Yb_i \in [-5,5]$ (sign bit plus 5-bit hot-one code).
• Multiplicand multiples generation: X, 2X, 3X, 4X, 5X, using x2 and x5 blocks and a 4d-bit decimal adder.
• Selection of each partial product with a Mux-5 and the recoded sign.
[Figure: block diagram of the SD radix-10 partial product generation.]
• 1 SD radix-10 digit per multiplicand digit.
• d+1 partial products (additional encoded SD radix-10 digit).
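A minimal sketch of one way to obtain digits in [-5,5] follows. The slide gives only the digit ranges; the rule used here (subtract 10 and send a transfer of 1 to the next digit whenever $Y_i \ge 5$) is an assumption, and the function names are illustrative.

```python
def sd_radix10_recode(y_digits):
    """Recode BCD digits (least significant first) into d+1 digits in [-5, 5].
    Assumed rule: Yb_i = Y_i - 10*t_i + t_{i-1}, with t_i = 1 when Y_i >= 5."""
    out, transfer = [], 0
    for y in y_digits:
        t_out = 1 if y >= 5 else 0
        out.append(y - 10 * t_out + transfer)
        transfer = t_out
    out.append(transfer)          # the additional (d+1)-th digit, 0 or 1
    return out

def multiply_sd_radix10(x, y, d):
    """Sum of d+1 partial products, each a multiple in {0, ±X, ..., ±5X}."""
    y_digits = [(y // 10**i) % 10 for i in range(d)]
    recoded = sd_radix10_recode(y_digits)
    return sum(digit * x * 10**i for i, digit in enumerate(recoded))

print(sd_radix10_recode([6, 9, 5]))       # [-4, 0, -4, 1], i.e. 596
print(multiply_sd_radix10(1234, 596, 3))  # 735464
```

The transfer depends only on the digit itself, so all digits can be recoded in parallel, and only the multiples X to 5X (plus a sign) need to be available.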

SD radix-4 multiplier recoding
• Multiplicand X (BCD-4221), multiplier Y (BCD-8421): integer d-digit precision operands (4d bits each).
• SD radix-4 digit recoder: $Y_i \in [0,9]$ is recoded as $Yb_i = Y^U_i \cdot 4 + Y^L_i$, with $Y^U_i \in [0,2]$ and $Y^L_i \in [-2,2]$ (hot-one code and recoded sign).
• Multiplicand multiples generation: 8X, 4X, 2X, X, using x2 blocks.
• Selection of the two partial products of each digit with two Mux-2.
[Figure: block diagram of the SD radix-4 partial product generation.]
• 2 SD radix-4 digits per multiplicand digit.
• 2d partial products.
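A minimal sketch of a digit recoding that meets the stated ranges follows. The slide does not give the mapping, so the rounding rule used here is an assumption (some digits, such as 2 and 6, admit more than one valid split), and the function names are illustrative.

```python
def sd_radix4_recode_digit(y):
    """Split a BCD digit y in [0, 9] as y = 4*yu + yl with yu in [0, 2], yl in [-2, 2].
    Assumed rule: round y/4 to the nearest integer, ties going up."""
    yu = (y + 2) // 4
    yl = y - 4 * yu
    return yu, yl

def multiply_sd_radix4(x, y, d):
    """Sum of 2d partial products: per digit, one from {0, 4X, 8X} and one
    from {-2X, -X, 0, X, 2X}, both weighted by the digit position."""
    total = 0
    for i in range(d):
        yu, yl = sd_radix4_recode_digit((y // 10**i) % 10)
        total += (4 * yu * x + yl * x) * 10**i
    return total

print([sd_radix4_recode_digit(y) for y in (6, 7, 9)])  # [(2, -2), (2, -1), (2, 1)]
print(multiply_sd_radix4(1234, 596, 3))                # 735464
```

Unlike the radix-10 recoding, no transfer between digits is needed, and every multiple (X, 2X, 4X, 8X) is a power-of-2 multiple of X.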