  1. A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal (BCD) Codes
     Xiaoping Cui, Weiqiang Liu* and Wenwen Dong, College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China
     Fabrizio Lombardi, Department of Electrical and Computer Engineering, Northeastern University, Boston, USA

  2. Outline
     • Motivation
     • Review of BCD Representations and Decimal Multiplier
     • The Proposed Partial Product Tree
     • Evaluation and Comparison
     • Conclusions

  3. Motivation: Why Is Decimal Arithmetic Needed?
     • Binary arithmetic introduces conversion and rounding errors for decimal fractions (for example, 0.1 has no exact binary representation).
     • Decimal arithmetic is in high demand in applications (financial, commercial, and so on) that cannot tolerate such errors.
     • Decimal floating-point formats and arithmetic have been added to the revised IEEE 754-2008 standard.
     • High-performance decimal arithmetic circuits are therefore required.

  4. Introduction: Radix-10 Multiplier Using Redundant BCD Codes [7]
     [Figure: block diagram of the radix-10 multiplier of [7]]
     • Partial product generation (PPG): the BCD multiplier Y (4d bits) is recoded by SD radix-10 recoding into d+1 signed digits Yb_k (6 bits each). The multiples 1X to 5X of the BCD multiplicand X are precomputed, with 2X to 5X held as d+1 redundant BCD excess-3 (XS-3) digits in [-3, 12]. A block of 5-input multiplexers (MUX-5) selects the multiples that form the partial products PP[0] … PP[d], whose digits lie in the overloaded decimal digit set (ODDS) [0, 15].
     • Partial product compression: a (d+1):2 partial product reduction (PPR) tree, combining a binary compressor with a decimal 3:2 CSA, reduces the partial products to two 2d-digit operands, A (BCD excess-6) and B (BCD-8421).
     • Final decimal adder: a 2d-digit BCD adder (parallel prefix/carry-select) produces the BCD product P.
     [7] A. Vazquez, E. Antelo, and J. Bruguera, “Fast Radix-10 Multiplication Using Redundant BCD Codes,” IEEE Transactions on Computers, vol. 63, no. 8, pp. 1902–1914, Aug. 2014.

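  5. Partial Product Generation: SD Radix-10 Recoding
     Each BCD multiplier digit Y_i is recoded into a signed digit Yb_i in [-5, 5]; the sign/transfer bit ys_i is 1 when Y_i ≥ 5, so ys_{i-1} = 1 exactly when Y_{i-1} ≥ 5:

     $$
     Yb_i = \begin{cases}
     Y_i, & Y_i < 5 \text{ and } Y_{i-1} < 5 \\
     Y_i + 1, & Y_i < 5 \text{ and } Y_{i-1} \ge 5 \\
     -(10 - Y_i), & Y_i \ge 5 \text{ and } Y_{i-1} < 5 \\
     -(10 - Y_i) + 1, & Y_i \ge 5 \text{ and } Y_{i-1} \ge 5
     \end{cases}
     $$

     SD radix-10 recoding table (y5_i … y1_i are one-hot selects for |Yb_i| = 5, 4, 3, 2, 1; all zero means Yb_i = 0):

     y_{i,3} y_{i,2} y_{i,1} y_{i,0} | y5 y4 y3 y2 y1 ys (ys_{i-1} = 0) | y5 y4 y3 y2 y1 ys (ys_{i-1} = 1)
     0000                            | 00000 0                          | 00001 0
     0001                            | 00001 0                          | 00010 0
     0010                            | 00010 0                          | 00100 0
     0011                            | 00100 0                          | 01000 0
     0100                            | 01000 0                          | 10000 0
     0101                            | 10000 1                          | 01000 1
     0110                            | 01000 1                          | 00100 1
     0111                            | 00100 1                          | 00010 1
     1000                            | 00010 1                          | 00001 1
     1001                            | 00001 1                          | 00000 1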
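
A minimal Python sketch of the recoding formula, assuming (as the case conditions imply) that ys_i = 1 exactly when Y_i ≥ 5, so each Yb_i depends only on Y_i and Y_{i-1} and no carry ripples. The function names are hypothetical. The last lines check that the d+1 partial products Yb_k·X, shifted by k digits, add up to X·Y.

```python
def recode_digit(y_i, ys_prev):
    """Return (Yb_i, ys_i) for one BCD digit, given ys_{i-1} from the digit below."""
    ys = 1 if y_i >= 5 else 0              # depends only on Y_i, so nothing ripples
    return y_i + ys_prev - 10 * ys, ys     # Yb_i = Y_i + ys_{i-1} - 10*ys_i


def onehot(yb_i):
    """MUX-5 select lines y5 y4 y3 y2 y1 for |Yb_i| (all zero when Yb_i == 0)."""
    return "".join("1" if abs(yb_i) == k else "0" for k in (5, 4, 3, 2, 1))


def sd_radix10_recode(y_digits):
    """Recode little-endian BCD digits of Y into d+1 signed digits in [-5, 5]."""
    out, ys_prev = [], 0
    for y_i in list(y_digits) + [0]:       # the extra top digit absorbs ys_{d-1}
        yb, ys_prev = recode_digit(y_i, ys_prev)
        out.append(yb)
    return out


def to_digits(n, d):
    """Little-endian decimal digits of n, padded to d digits."""
    return [(n // 10**i) % 10 for i in range(d)]


x, y = 7308, 9562
yb = sd_radix10_recode(to_digits(y, 4))
print(yb)                                  # [2, -4, -4, 0, 1]
pps = [yb_k * x for yb_k in yb]            # PP[k] = Yb_k * X, one per recoded digit
assert sum(pp * 10**k for k, pp in enumerate(pps)) == x * y
```

Because every Yb_i lies in [-5, 5], only the multiples 1X to 5X (plus a sign) ever need to be selected, which is why a MUX-5 suffices.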

  6. Partial Product Generation: Redundant BCD Codes
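
As a compact reference for the codes named throughout this presentation, the sketch below decodes a 4-bit word under each of them. The weights and offsets are the conventional definitions and are an assumption insofar as the slides do not list them: BCD-4221 and BCD-5211 are weighted codes in which all 16 words are valid but several words share a value, while XS-3 and ODDS are redundant because their digit sets extend beyond [0, 9]. The `CODES` table and `decode` function are hypothetical names.

```python
# (weights MSB->LSB, offset): digit value = weighted bit sum + offset
CODES = {
    "BCD-8421": ((8, 4, 2, 1), 0),   # valid words 0000..1001 only
    "BCD-4221": ((4, 2, 2, 1), 0),   # all 16 words valid, values 0..9
    "BCD-5211": ((5, 2, 1, 1), 0),   # all 16 words valid, values 0..9
    "XS-3":     ((8, 4, 2, 1), -3),  # redundant digit set [-3, 12]
    "ODDS":     ((8, 4, 2, 1), 0),   # redundant digit set [0, 15]
    "XS-6":     ((8, 4, 2, 1), -6),  # BCD digit stored as digit + 6
}


def decode(word, code):
    """Value of a 4-bit word (int 0..15) under one of the codes above."""
    weights, offset = CODES[code]
    bits = [(word >> (3 - k)) & 1 for k in range(4)]   # MSB first
    return sum(b * w for b, w in zip(bits, weights)) + offset


for code in ("BCD-4221", "BCD-5211", "XS-3", "ODDS"):
    values = [decode(w, code) for w in range(16)]
    # code name, smallest value, largest value, number of distinct values
    print(code, min(values), max(values), len(set(values)))
```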

  7. Partial Product Generation: XS-3 Recoding (Redundant ODDS [0, 15])
     [Figure: generation of the multiples 1X to 5X of the BCD multiplicand X (digits in [0, 9]) and their selection by the MUX-5 block]
     • Step 1 (digit mappings): for each multiplicand digit X_i and multiple N, the value N·X_i + 3 ∈ [3, 9N + 3] is mapped to a digit D_i and a transfer T_i.
     • Step 2 (carry assimilation): each digit of NX is formed from D_i and T_{i-1} and lies in the XS-3 digit set [-3, 12].
     • Selection: for multiplier digit Y_k, the SD radix-10 signals Y5_k, Y4_k, Y3_k, Y2_k and Y1_k drive a MUX-5 that selects 5X_i, 4X_i, 3X_i, 2X_i or 1X_i, and the sign bit Ys_k handles negative multiples.
     • Conversion to ODDS: the XS-3 digits are converted to ODDS digits in [0, 15] by adding a precomputed correction term fc(16).
     • Advantage of XS-3 codes: difficult multiples (such as 3X) can be obtained in a carry-free manner.
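
The two steps can be sketched at the value level in Python. The split of N·X_i + 3 into T_i = ⌊(N·X_i + 3)/10⌋ and D_i + 3 = (N·X_i + 3) mod 10 is an assumption chosen to match the ranges above, not necessarily the exact mapping of [7]; with it, every digit of NX lies in [-3, 10], a subset of [-3, 12], and depends only on X_i and X_{i-1}. Reading each 4-bit XS-3 word (digit + 3) as a plain ODDS digit overstates a (d+1)-digit multiple by 33…3, which is what a precomputed correction term has to undo; the sketch checks this for one multiple only, not the aggregate constant fc(16). All function names are hypothetical.

```python
def xs3_multiple(x_digits, n):
    """Carry-free N*X (N in 2..5) as d+1 digits in the XS-3 digit set [-3, 12].

    Assumed split: N*X_i + 3 = 10*T_i + (D_i + 3) with D_i + 3 in [0, 9].
    """
    if n not in (2, 3, 4, 5):
        raise ValueError("only the multiples 2X..5X are precomputed here")
    t, dig = [], []
    for x_i in x_digits:
        v = n * x_i + 3                  # step 1: value in [3, 9N + 3]
        t.append(v // 10)                # transfer digit T_i in [0, 4]
        dig.append(v % 10 - 3)           # residual digit D_i in [-3, 6]
    out = []
    for i in range(len(x_digits) + 1):   # step 2: each transfer moves up one digit
        d_i = dig[i] if i < len(x_digits) else 0
        t_in = t[i - 1] if i > 0 else 0
        out.append(d_i + t_in)           # stays within [-3, 10]
    return out


def xs3_words(digits):
    """4-bit XS-3 words (digit + 3); read as plain binary they are ODDS digits."""
    return [v + 3 for v in digits]


def value(digits):
    """Integer value of little-endian digits (any digit set)."""
    return sum(v * 10**i for i, v in enumerate(digits))


m = xs3_multiple([6, 7, 8, 9], 3)        # 3 * 9876
print(m)                                 # [-2, 3, 6, -1, 3]
assert value(m) == 3 * 9876
# Reading the XS-3 words as ODDS digits adds 3 * 11111; subtract it to correct.
assert value(xs3_words(m)) - 3 * 11111 == 3 * 9876
```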

  8. Partial Product Compression: Decimal PP Compression Using ODDS [7]
     [Figure: the multiplier of [7], highlighting the (d+1):2 PPR tree between the MUX-5 selection block and the 2d-digit BCD adder]
     The (d+1):2 PP reduction (PPR) tree works as follows:
     (1) A regular binary CSA tree adds the ODDS partial products PP[0] … PP[d].
     (2) A binary counter counts the carries generated between the digit columns in the binary CSA tree.
     (3) The partial sums from (1) and (2) are added by the binary CSA tree and the decimal digit 3:2 compressor, giving the operands A (BCD excess-6) and B (BCD-8421) of the final BCD adder.
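
A value-level Python sketch of why counting the inter-column carries is enough: a binary CSA tree adds each digit column as plain 4-bit binary numbers, so every carry that leaves a column has weight 16 instead of 10. Since 16 = 10 + 6, each counted carry can be treated as one decimal carry into the next column plus a correction of 6 kept in the current column. This is a simplification: the model collapses the tree into a single column sum instead of keeping its two output rows, and the function names are hypothetical.

```python
def odds_value(digits):
    """Decimal value of an ODDS number (little-endian digits, each 0..15)."""
    return sum(v * 10**i for i, v in enumerate(digits))


def reduce_odds_columns(pps):
    """Binary-add each digit column; return the 4-bit residues and carry counts."""
    width = max(len(p) for p in pps)
    low, carries = [], []
    for i in range(width):
        s = sum(p[i] for p in pps if i < len(p))  # binary sum of the digit column
        low.append(s & 0xF)        # 4-bit result that stays in the column
        carries.append(s >> 4)     # number of weight-16 carries out of the column
    return low, carries


def corrected_value(low, carries):
    """Reinterpret weight-16 carries decimally: 16*c = 10*c + 6*c."""
    total = 0
    for i, (l, c) in enumerate(zip(low, carries)):
        # c decimal carries move to column i+1; 6*c stays as a correction in column i
        total += (l + 6 * c) * 10**i + c * 10**(i + 1)
    return total


pps = [[15, 3, 12], [9, 9, 9], [0, 15, 7], [4, 4, 4]]
low, carries = reduce_odds_columns(pps)
assert corrected_value(low, carries) == sum(odds_value(p) for p in pps)
```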

  9. Partial Product (PP) Compression: Decimal 3:2 CSA Based on BCD-4221/5211
     [Figure: decimal 3:2 CSA for bit j of digit i, with inputs a_{i,j}, b_{i,j}, c_{i,j} and outputs s_{i,j} (sum) and h_{i,j} (carry)]
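
A sketch of one common way to build such a decimal 3:2 CSA, assuming the usual BCD-4221/5211 approach: because the three input digits share the same bit weights, one binary full adder per bit position gives A + B + C = S + 2H with S and H both valid BCD-4221 digits. The doubling of H is done by recoding each digit to BCD-5211 and shifting left by one bit, so the weight-5 bit becomes a carry of weight 10 into the next digit. The 5211 recoding here is done at the value level for clarity (hardware would use a bit-level mapping), and all names are hypothetical.

```python
W4221 = (4, 2, 2, 1)
W5211 = (5, 2, 1, 1)


def val(bits, weights=W4221):
    """Value of one digit given as a bit tuple (MSB first)."""
    return sum(b * w for b, w in zip(bits, weights))


def enc4221(v):
    """One valid BCD-4221 encoding of a decimal digit v in 0..9."""
    if v < 4:
        return (0, 0, v >> 1, v & 1)
    if v < 8:
        return (1, 0, (v - 4) >> 1, (v - 4) & 1)
    return (1, 1, 1, v - 8)


def enc5211(v):
    """One valid BCD-5211 encoding of v in 0..9 (value-level recoding)."""
    q5 = 1 if v >= 5 else 0
    rest = ((0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1), (1, 1, 1))[v - 5 * q5]
    return (q5,) + rest


def number(digits, weights=W4221):
    """Integer value of a little-endian list of digit bit tuples."""
    return sum(val(d, weights) * 10**i for i, d in enumerate(digits))


def digits4221(n, d):
    """Little-endian BCD-4221 digits of n, padded to d digits."""
    return [enc4221((n // 10**i) % 10) for i in range(d)]


def csa_4221(a, b, c):
    """Decimal 3:2 CSA: bitwise full adders give A + B + C == S + 2*H."""
    S, H = [], []
    for da, db, dc in zip(a, b, c):
        S.append(tuple(x ^ y ^ z for x, y, z in zip(da, db, dc)))
        H.append(tuple((x & y) | (x & z) | (y & z) for x, y, z in zip(da, db, dc)))
    return S, H


def times2_4221(h):
    """Double a BCD-4221 number: recode each digit to 5211, then shift left one bit."""
    out, cin = [], 0
    for d in h + [(0, 0, 0, 0)]:
        q5, q2, q1a, q1b = enc5211(val(d))
        out.append((q2, q1a, q1b, cin))  # weights 2, 1, 1 doubled become 4, 2, 2
        cin = q5                         # weight 5 doubled is 10: carry to next digit
    return out


a, b, c = 12345678, 99999999, 50505050
S, H = csa_4221(digits4221(a, 8), digits4221(b, 8), digits4221(c, 8))
assert number(S) + number(times2_4221(H)) == a + b + c
```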

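  10. Proposed Design
     [Figure: partial product generation → partial product compression → final decimal adder; the compression stage outputs the operands A and B, and the adder produces the product P. The new design targets the PPR tree block.]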

  11. Proposed Design: A New PPR Tree
     The proposed (d+1):2 PPR (reduction) tree consists of:
     • a binary CSA tree;
     • a BCD-4221 sum correction block (decimal counter) that converts the ODDS column results to BCD-4221;
     • a decimal digit 3:2 compressor (BCD-4221).

  12. A PPR Tree for a 16 × 16-Digit Multiplier
     [Figure: PPR tree for digit column i: a 17:2 binary PPR tree on PP_i[0] … PP_i[16] producing A_i and B_i, the BCD-4221 sum correction block (an 8-bit counter and two 3-bit counters with ×6 correction), and a 6:2 decimal PPR tree producing S_i (4221) and H_i (4221)]
     • The numbers of PP rows in the 1st, 2nd, 3rd and 4th stages are 17, 9, 6 and 4, respectively.
     • The carries C_i[0] … C_i[13] generated in digit column i are counted by an 8-bit counter (C_i[0] … C_i[7]) and two 3-bit counters (C_i[8] … C_i[10] and C_i[11] … C_i[13]).
     • A 4-bit binary 4:2 compressor is used in the last compression stage.
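
The row counts 17, 9, 6, 4 can be reproduced with a small schedule calculator. The slide only states that a 4:2 compressor is used in the last stage; assuming 4:2 compressors in the first stage and 3:2 CSAs in the two middle stages (an inference from the counts, not from the slide) gives exactly this sequence. The function names are hypothetical, and the comparison with a 3:2-only tree counts stages, not gate delay.

```python
def next_rows(rows, kind):
    """Rows left after one stage that uses as many compressors as fit."""
    if kind == "4:2":
        return rows - 2 * (rows // 4)   # each 4:2 compressor: 4 rows in, 2 rows out
    if kind == "3:2":
        return rows - rows // 3         # each 3:2 CSA: 3 rows in, 2 rows out
    raise ValueError(f"unknown compressor type: {kind}")


def reduction_schedule(rows, kinds):
    """Row count before the first stage and after each listed stage."""
    counts = [rows]
    for kind in kinds:
        counts.append(next_rows(counts[-1], kind))
    return counts


# Assumed stage types; only the last-stage 4:2 compressor is stated on the slide.
print(reduction_schedule(17, ["4:2", "3:2", "3:2", "4:2"]))  # [17, 9, 6, 4, 2]
# For comparison, a tree built only from 3:2 CSAs needs six stages.
print(reduction_schedule(17, ["3:2"] * 6))                   # [17, 12, 8, 6, 4, 3, 2]
```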

  13. The Proposed PPR Tree
     [Figure: the BCD-4221 sum correction block (an 8-bit counter over C[0] … C[7] built from an HA and 3:2 CSAs, and two 3-bit counters over C[8] … C[10] and C[11] … C[13], with outputs u_{i,3..0}, v_{i,1..0} and z_{i,1..0}) feeding the 6:2 decimal PPR tree block of BCD-4221 3:2 compressors and ×2 blocks, with outputs H_i (4221) and S_i (4221)]
     • The 8-bit BCD-4221 counter is faster than a binary counter (only two 3:2 CSA delays).
     • Each 3-bit counter generates a BCD-4221 decimal correction digit using only one 3:2 compressor.
     • These counters balance the paths in the decimal 6:2 PPR tree and reduce the critical path.
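
One possible gate-level arrangement of the two counters, sketched in Python and checked exhaustively. The wiring is an assumption consistent with the HA and 3:2 blocks in the figure: in the 8-bit counter, one half adder and two 3:2 CSAs form the first level, and one 3:2 CSA per weight forms the second, so the count comes out directly as a BCD-4221 digit (u3, u2, u1, u0) after two 3:2 delays; a 3-bit counter is a single 3:2 CSA with outputs (v1, v0) of weights 2 and 1. Function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    """3:2 CSA on bits: a + b + c == s + 2*h."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a, b):
    """a + b == s + 2*h."""
    return a ^ b, a & b


def counter8_4221(c):
    """Count eight carry bits into one BCD-4221 digit (u3, u2, u1, u0), weights 4,2,2,1."""
    # Level 1: two 3:2 CSAs and one half adder over c[0..7]
    s1, h1 = full_adder(c[0], c[1], c[2])
    s2, h2 = full_adder(c[3], c[4], c[5])
    s3, h3 = half_adder(c[6], c[7])
    # Level 2: one 3:2 CSA for the weight-1 bits and one for the weight-2 bits
    u0, u1 = full_adder(s1, s2, s3)   # s1 + s2 + s3 == u0 + 2*u1
    u2, u3 = full_adder(h1, h2, h3)   # h1 + h2 + h3 == u2 + 2*u3 (weights 2 and 4)
    return u3, u2, u1, u0


def counter3(c):
    """Count three carry bits with a single 3:2 CSA: (v1, v0) with weights 2, 1."""
    v0, v1 = full_adder(c[0], c[1], c[2])
    return v1, v0


# Exhaustive check of both counters
for bits in product((0, 1), repeat=8):
    u3, u2, u1, u0 = counter8_4221(bits)
    assert 4 * u3 + 2 * u2 + 2 * u1 + u0 == sum(bits)
for bits in product((0, 1), repeat=3):
    v1, v0 = counter3(bits)
    assert 2 * v1 + v0 == sum(bits)
```

Keeping two weight-2 outputs (the redundancy of BCD-4221) is what removes the third adder level a fully binary count of eight bits would need.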

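  14. Using Hybrid (Multiple) BCD Codes
     [Figure: the code used in each stage of the multiplier]
     • PPG: BCD-8421 operands, XS-3 multiples, and ODDS partial products.
     • PPR tree: a binary PPR tree on ODDS digits, a BCD-4221 sum correction block, and a decimal PPR tree on BCD-4221 digits that produces the final two rows.
     • Adder setup and decimal adder: the BCD-4221 operands are prepared as BCD excess-6 and BCD-8421 inputs, and the decimal adder produces the BCD-8421 product.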
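
Why one adder operand is prepared in excess-6 can be shown with a short Python sketch of the digit logic: adding A + 6 to B in plain 4-bit binary produces a binary carry out of the digit exactly when A + B + carry-in ≥ 10, so the binary carry equals the decimal carry, and the excess 6 only has to be removed from digits that did not carry. The ripple loop and the names are illustrative assumptions; the actual design uses a parallel prefix/carry-select adder, which computes the same digit carries in parallel.

```python
def from4221(bits):
    """Adder setup (value level): BCD-4221 digit (b3, b2, b1, b0) -> integer 0..9."""
    b3, b2, b1, b0 = bits
    return 4 * b3 + 2 * b2 + 2 * b1 + b0


def add_xs6_bcd(a_digits, b_digits):
    """Add A (biased to excess-6) and B (BCD-8421); little-endian digit lists."""
    out, carry = [], 0
    for a, b in zip(a_digits, b_digits):
        t = (a + 6) + b + carry       # 4-bit binary add of the XS-6 and 8421 digits
        carry = t >> 4                # binary carry out == decimal carry (a + b + cin >= 10)
        low = t & 0xF
        out.append(low if carry else low - 6)  # no carry: remove the excess 6 again
    out.append(carry)
    return out


def from_digits(digits):
    """Integer value of little-endian decimal digits."""
    return sum(v * 10**i for i, v in enumerate(digits))


# Two BCD-4221 operands (little-endian digits): 39 and 85
S = [(1, 1, 1, 1), (0, 0, 1, 1)]
H = [(1, 0, 0, 1), (1, 1, 1, 0)]
total = add_xs6_bcd([from4221(d) for d in S], [from4221(d) for d in H])
print(from_digits(total))  # 124
```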

  15. Advantages of the Proposed PPR Tree
     1. A BCD-4221 counter is faster than a binary counter (an 8-bit counter has two 3:2 CSA stages, and a 3-bit counter has one 3:2 CSA stage).
     2. A non-fixed-size BCD-4221 counter correction block is used to balance the paths and reduce the critical path delay of the decimal 6:2 PPR tree.
     3. The final two PP rows are generated by a decimal PPR tree based on BCD-4221, which is easy to convert to BCD-8421.

  16. Evaluation: Area and Delay (LE-Based Model) of the Proposed 16 × 16-Digit Multiplier
     Block            Delay (#FO4)   Area (#NAND2)
     PPG stage        10.2           14900
     PPR tree         25.3           14306
     Adder setup       3.2            1050
     Decimal adder    11.5            2400
     Total            50.2           32656

  17. Evaluation: Area and Delay (LE-Based) Comparison of Different BCD Multiplier Designs
     Design               Delay (#FO4)   Area (#NAND2)
     Non-Redundant [13]   58.3           35750
     Redundant [7]        51.4           30600
     Proposed             50.2           32656
     • Compared with [13], the proposed design changes the delay by -13.89% and the area by -8.65%.
     • Compared with [7], the proposed design changes the delay by -2.33% and the area by +6.72%.
     [13] A. Vazquez, E. Antelo, and P. Montuschi, “Improved Design of High-Performance Parallel Decimal Multipliers,” IEEE Transactions on Computers, vol. 59, no. 5, pp. 679–693, May 2010.

  18. Evaluation: Area and Delay Comparison Using the NanGate 45nm Open Cell Library
     Design               Delay (ns)   Ratio   Area (μm²)   Ratio
     Proposed             3.21         1       43053.5      1
     Non-Redundant [13]   3.66         1.14    48326.1      1.12
     • The proposed design reduces the delay by 12.30% and the area by 10.9% compared with [13].
     • [7] reduces the delay by 10.75% and the area by 11.1% compared with [13]; no direct comparison with [7] is made because some parts of its PPR circuit are not described in detail.

  19. Conclusion
     • The design of parallel decimal multipliers is studied.
     • A parallel decimal multiplier based on a new PPR tree is proposed, using:
        – a BCD-4221 sum correction block with non-fixed-size counters, and
        – a decimal PPR tree based on the BCD-4221 decimal digit 3:2 compressor.
     • The proposed parallel decimal multiplier is faster than the previous best designs.
