A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal (BCD) Codes

Jun 01, 2023


SLIDE 1

A Parallel Decimal Multiplier Using Hybrid Binary Coded Decimal (BCD) Codes

Xiaoping Cui, Weiqiang Liu* and Wenwen Dong

College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China

Fabrizio Lombardi

Department of Electrical and Computer Engineering, Northeastern University, Boston, USA

SLIDE 2

Outline

Motivation

Review of BCD Representations and Decimal Multiplier

The Proposed Partial Product Tree

Evaluation and Comparison

Conclusions

SLIDE 3

Motivation

Why Is Decimal Arithmetic Needed?

Binary arithmetic introduces conversion and rounding errors.

Decimal arithmetic is in high demand in many applications (financial, commercial and so on) that cannot tolerate such errors.

A decimal specification has been added to the revised IEEE 754-2008 standard.

High-performance decimal arithmetic circuits are required.

SLIDE 4

Introduction

Partial Product Generation

Signed-Digit (SD) Radix-10 recoding

Redundant BCD excess-3 (XS-3)

Overloaded decimal digit set (ODDS) code

Double BCD recoding

[Figure: partial product generation (PPG) block. A radix-10 recoder converts Y (BCD, 4d bits) into d selector groups Yb0 … Ybd-1 of 6 bits each; the multiples 1X to 5X of X (BCD, 4d bits) are generated as XS-3 digits in [-3,12]; the selection of multiples (MUX-5) produces the partial products PP[0] … PP[d] as ODDS digits in [0,15].]

Partial Product Compression

Decimal 3:2 CSA

Binary compressor

Final Decimal Adder

Parallel prefix/carry select adder

[Figure: the (d+1):2 PPR tree reduces the partial products PP[0] … PP[d] (ODDS digits in [0,15]) to two operands, A (BCD XS-6) and B (BCD-8421), which a 2d-digit BCD adder sums to the product P (BCD).]

[7] A. Vazquez, E. Antelo, and J. Bruguera, “Fast Radix-10 Multiplication Using Redundant BCD Codes”, IEEE Transactions on Computers, vol. 63, no. 8, pp. 1902–1914, Aug. 2014.

SLIDE 5

Partial Product Generation

SD Radix-10 Recoding Scheme

$$
Yb_i = \begin{cases}
Y_i, & Y_i < 5 \;\&\&\; Y_{i-1} < 5 \\
Y_i + 1, & Y_i < 5 \;\&\&\; Y_{i-1} \ge 5 \\
-(10 - Y_i), & Y_i \ge 5 \;\&\&\; Y_{i-1} < 5 \\
-(10 - Y_i) + 1, & Y_i \ge 5 \;\&\&\; Y_{i-1} \ge 5
\end{cases}
$$

Selection signals y5i y4i y3i y2i y1i and sign ysi as a function of the BCD digit yi,3 yi,2 yi,1 yi,0 and the incoming sign ysi-1:

yi,3yi,2yi,1yi,0    y5i..y1i, ysi (ysi-1 = 0)    y5i..y1i, ysi (ysi-1 = 1)
0000                00000, 0                     00001, 0
0001                00001, 0                     00010, 0
0010                00010, 0                     00100, 0
0011                00100, 0                     01000, 0
0100                01000, 0                     10000, 0
0101                10000, 1                     01000, 1
0110                01000, 1                     00100, 1
0111                00100, 1                     00010, 1
1000                00010, 1                     00001, 1
1001                00001, 1                     00000, 1
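The recoding rule on this slide can be checked with a short behavioural sketch. The function names below are illustrative and not from the presentation; the sketch assumes digits are processed from least to most significant and that the sign of the most significant digit is kept as a final carry (one extra digit position).

```python
def sd_radix10_recode(digits):
    """Recode BCD digits (least significant first) into signed digits in [-5, 5].

    Returns (recoded_digits, final_carry). The carry is the sign ys of the
    most significant digit, which feeds one extra digit position.
    """
    recoded = []
    carry = 0  # ys_{i-1}: 1 when the previous digit was >= 5
    for y in digits:
        ys = 1 if y >= 5 else 0
        recoded.append(y + carry - 10 * ys)
        carry = ys
    return recoded, carry


def sd_value(recoded, carry):
    """Integer represented by the recoded digits plus the final carry."""
    return sum(yb * 10**i for i, yb in enumerate(recoded)) + carry * 10**len(recoded)


# 1975 is passed as [5, 7, 9, 1] (least significant digit first)
digits, carry = sd_radix10_recode([5, 7, 9, 1])
print(digits, carry, sd_value(digits, carry))
```

Because every recoded digit has magnitude at most 5, only the multiples 1X to 5X and their negations are needed when selecting partial products.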

SLIDE 6

Partial Product Generation

Redundant BCD Codes

SLIDE 7

XS-3 Recoding (Redundant ODDS [0, 15])

Partial Product Generation

[Figure: generation of the multiples NX (N = 1 to 5) of X (BCD) in XS-3. Step 1, digit mappings: each digit Xi in [0,9] yields N*Xi+3 in [3, 9N+3], expressed through the terms Di and Ti. Step 2, carry assimilation: the Di and Ti terms are combined into the digits NX0 … NXd-1 in XS-3 [-3,12], plus a carry-out. A MUX-5, driven by the SD radix-10 one-digit encoding (Y5k Y4k Y3k Y2k Y1k, Ysk) of Yk (BCD), selects among 5Xi, 4Xi, 3Xi, 2Xi and 1Xi; the selected digit in XS-3 [-3,12] becomes a digit in ODDS [0,15].]

Convert the XS-3 digits to ODDS by adding a pre-computed correction term: fc(16) = 10^32 + 07407407407407417037037037037037

Advantage of XS-3 codes: difficult multiples (such as 3X) can be obtained in a carry-free manner.
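To illustrate why a single pre-computed constant can absorb the XS-3 offset, the sketch below reads the same 4-bit fields once as ODDS digits and once as XS-3 digits. It is a simplified model with hypothetical function names; it does not reproduce fc(16) itself, whose derivation is not shown on the slides.

```python
def odds_value(nibbles):
    """Value when each 4-bit field (least significant first) is an ODDS digit in [0, 15]."""
    return sum(n * 10**i for i, n in enumerate(nibbles))


def xs3_value(nibbles):
    """Value when each stored field n encodes the XS-3 digit n - 3, i.e. a digit in [-3, 12]."""
    return sum((n - 3) * 10**i for i, n in enumerate(nibbles))


def xs3_offset(num_digits):
    """Constant gap between the two readings: 3 * 11...1 (num_digits ones)."""
    return 3 * (10**num_digits - 1) // 9


fields = [0, 15, 7, 3]
print(odds_value(fields) - xs3_value(fields), xs3_offset(len(fields)))
```

The gap depends only on the number of digits, not on the operand, so in this simplified model the offsets of all partial products can be summed ahead of time into one correction term.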

SLIDE 8

Decimal PP Compression Using ODDS

[Figure: PPG block, as in the introduction: radix-10 recoder for Y (BCD), generation of the multiples 1X to 5X as XS-3 digits in [-3,12], and selection of multiples (MUX-5) producing PP[0] … PP[d].]

Partial Product Compression

The (d+1):2 PP Reduction (PPR):

(1) A regular binary CSA tree

(2) A binary counter is used to count carries generated between the digit columns in the binary CSA tree

(3) The ODDS partial products in (1) and (2) are added by the binary CSA tree and the decimal digit 3:2 compressor

[Figure: the (d+1):2 PPR tree reduces PP[0] … PP[d] (ODDS digits in [0,15]) to A (BCD XS-6) and B (BCD-8421), which the 2d-digit BCD adder sums to P (BCD).]
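The reason for counting the inter-column carries can be sketched numerically. A carry out of a 4-bit column is worth 16 in binary but only 10 in the next decimal position, so each carry leaves a surplus of 6 in its own column. The sketch below assumes this +6-per-carry correction (the slide text does not spell it out) and uses hypothetical function names.

```python
def reduce_odds_column(nibbles):
    """Binary-add one column of ODDS digits, each in [0, 15].

    Returns (r, c): the 4-bit result kept in the column and the number of
    carries that left the column.
    """
    total = sum(nibbles)
    return total % 16, total // 16


def decimal_contribution(r, c, i):
    """Decimal weight of column i: c carries go to column i + 1, and 6 * c stays as a correction."""
    return (r + 6 * c) * 10**i + c * 10**(i + 1)


column = [15] * 17  # worst case for 17 partial products
r, c = reduce_odds_column(column)
print(r, c, decimal_contribution(r, c, 0), sum(column))
```

The carry count c is the quantity a counter must deliver for every digit column.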

SLIDE 9

Decimal PP Compression Based on BCD-4221/5211

Partial Product (PP) Compression

[Figure: decimal 3:2 CSA with inputs ai,j, bi,j, ci,j and outputs si,j, hi,j.]
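A behavioural sketch of a BCD-4221 decimal 3:2 compression may help here. It assumes the usual arrangement, not detailed on this slide, in which three 4221-coded digits are compressed bitwise by full adders and the carry digit is doubled by recoding it to BCD-5211 and shifting left by one bit. All names are illustrative, and the greedy 5211 code choice is an assumption.

```python
W4221 = (4, 2, 2, 1)
W5211 = (5, 2, 1, 1)


def val(bits, weights):
    """Value of a 4-bit digit vector (MSB first) under the given bit weights."""
    return sum(b * w for b, w in zip(bits, weights))


def csa32(a, b, c):
    """Bitwise 3:2 compression: returns the sum vector s and the carry vector h."""
    s = tuple(x ^ y ^ z for x, y, z in zip(a, b, c))
    h = tuple((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))
    return s, h


def to_5211(v):
    """One possible BCD-5211 code for a value in 0..9 (greedy choice)."""
    bits = []
    for w in W5211:
        bit = 1 if v >= w else 0
        bits.append(bit)
        v -= bit * w
    return tuple(bits)


def times2(h):
    """Double a 4221 digit: recode to 5211, then shift left by one bit.

    Returns (carry_out, digit): the shifted weights 10, 4, 2, 2 give a carry
    into the next digit and a 4221 digit whose LSB is free for a carry-in.
    """
    b3, b2, b1, b0 = to_5211(val(h, W4221))
    return b3, (b2, b1, b0, 0)


a, b, c = (1, 1, 1, 1), (0, 1, 1, 0), (1, 0, 0, 1)  # 9, 4 and 5 in BCD-4221
s, h = csa32(a, b, c)
print(val(s, W4221) + 2 * val(h, W4221))
```

Since the full adders act on bits of equal weight, a + b + c = s + 2h holds for any 4221 code words, which is why no decimal carry propagation is needed inside the compressor.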

SLIDE 10

Proposed Design

[Figure: block diagram of the proposed multiplier: operands A and B, partial product generation, partial product compression (new design of the PPR tree block), final decimal adder, product P.]

SLIDE 11

Proposed Design: A New PPR Tree

Proposed PPR (reduction) tree:

(d+1):2 binary CSA tree (ODDS to 4221)

BCD-4221 sum correction block (decimal counter)

A decimal digit 3:2 compressor (BCD-4221)

SLIDE 12

A PPR Tree for a 16×16-Digit Multiplier

[Figure: 17:2 binary PPR tree over PPi[0] … PPi[16], with an 8-bit counter and 3-bit counters collecting the carries Ci[0] … Ci[13], the BCD-4221 sum correction block, and the 6:2 decimal PPR tree producing Hi(4221) and Si(4221).]

The numbers of PP rows in the 1st, 2nd, 3rd and 4th stages are 17, 9, 6 and 4, respectively.

[Figure: 4-bit binary 4:2 compressor in the last compression stage.]

SLIDE 13

The Proposed PPR Tree

[Figure: BCD-4221 8-bit and 3-bit counter correction: the BCD-4221 sum correction block, built from 3:2 compressors and a half adder (HA), takes the carries C[0] … C[13] into an 8-bit counter and 3-bit counters.]

[Figure: 6:2 decimal PPR tree block, with inputs Ai(4221) and Bi(4*4221), ×2 blocks and 3:2 compressors, and outputs Hi(4221) and Si(4221).]

The 8-bit BCD-4221 counter is faster than a binary counter (only two 3:2 CSA delays).

3-bit counters are used to generate a BCD-4221 decimal correction digit by using only one 3:2 compressor, to balance the paths in the decimal 6:2 PPR tree and reduce the critical path.
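The two-level counter can be sketched at the bit level. The wiring below is an assumption (the slide only lists two 3:2 compressors and a half adder feeding two more 3:2 compressors), but it shows why a count of 8 bits falls directly onto the weights 4, 2, 2, 1 after two CSA levels, with no final carry-propagate step.

```python
def full_adder(a, b, c):
    """3:2 compressor: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def half_adder(a, b):
    """Returns (sum, carry)."""
    return a ^ b, a & b


def counter3_4221(b0, b1, b2):
    """Count three bits with one 3:2 compressor; result as a 4221 digit (MSB first)."""
    s, c = full_adder(b0, b1, b2)
    return (0, c, 0, s)


def counter8_4221(bits):
    """Count eight bits with two levels of compressors; result as a 4221 digit."""
    s1, c1 = full_adder(*bits[0:3])
    s2, c2 = full_adder(*bits[3:6])
    s3, c3 = half_adder(*bits[6:8])
    lo_s, lo_c = full_adder(s1, s2, s3)  # weights 1 and 2
    hi_s, hi_c = full_adder(c1, c2, c3)  # weights 2 and 4
    return (hi_c, hi_s, lo_c, lo_s)      # weights 4, 2, 2, 1


digit = counter8_4221([1, 1, 0, 1, 1, 1, 0, 1])
print(digit, sum(b * w for b, w in zip(digit, (4, 2, 2, 1))))
```

A plain binary count would need the two weight-2 bits merged by an extra adder, which is the step the redundant 4221 code avoids.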

SLIDE 14

Using Hybrid (Multiple) BCD Codes

[Figure: BCD codes used across the multiplier: BCD-8421 operands; PPG (XS-3, ODDS); binary PPR tree; BCD-4221 sum correction block; BCD-4221 PPR tree (4*4221, 2*4221, 4221); adder setup (excess-6, BCD-8421); decimal adder (BCD-8421).]

SLIDE 15

Advantages of the Proposed PPR Tree

(1) A BCD-4221 counter is faster than a binary counter (an 8-bit counter has two 3:2 CSA stages, and a 3-bit counter has one 3:2 CSA stage).

(2) A non-fixed-size BCD-4221 counter correction block is used to balance the paths and reduce the critical path delay of the decimal 6:2 PPR tree.

(3) The final two PP rows are generated using a decimal PPR tree based on BCD-4221, which is easy to convert to BCD-8421.

SLIDE 16

Evaluation

Area and Delay (LE-Based Model) for the Proposed 16×16-Digit Multiplier.

Block            Delay (#FO4)    Area (#NAND2)
PPG Stage        10.2            14900
PPR Tree         25.3            14306
Adder Setup      3.2             1050
Decimal Adder    11.5            2400
Total            50.2            32656

SLIDE 17

Evaluation

Area and Delay (LE-Based) Comparison for Different BCD Multiplier Designs.

Design                Delay (#FO4)    Area (#NAND2)
Non-Redundant [13]    58.3            35750
Redundant [7]         51.4            30600
Proposed              50.2            32656

Proposed compared with [13]: delay -13.89%, area -8.65%

Proposed compared with [7]: delay -2.23%, area +6.05%

[7] A. Vazquez, E. Antelo, and J. Bruguera, “Fast Radix-10 Multiplication Using Redundant BCD Codes”, IEEE Transactions on Computers, vol. 63, no. 8, pp. 1902–1914, Aug. 2014.

[13] A. Vazquez, E. Antelo, and P. Montuschi, “Improved Design of High-Performance Parallel Decimal Multipliers”, IEEE Transactions on Computers, vol. 59, no. 5, pp. 679–693, May 2010.

SLIDE 18

Evaluation

Area and Delay Comparison Using the NanGate 45 nm Open Cell Library

Design                Delay (ns)    Ratio    Area (μm²)    Ratio
Proposed              3.21          1        43053.5       1
Non-Redundant [13]    3.66          1.14     48326.1       1.12

The proposed design reduces the delay by 12.30% and the area by 10.9% compared with [13]. [7] reduces the delay by 10.75% and the area by 11.1% compared with [13] (no direct comparison is made, as some parts of the PPR circuit of [7] are not provided in detail).
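The quoted percentages can be recomputed from the tabulated values with a short script. The helper name and the dictionary layout are illustrative. With the rounded table entries, the figures relative to [13] are reproduced (-13.89% and -8.65% for the LE-based model, -12.30% and about -10.9% for the synthesis results), while the LE-based figures relative to [7] come out as about -2.33% and +6.72% rather than the -2.23% and +6.05% shown on the slide; the slides do not state the source of that difference.

```python
def relative_change(new, ref):
    """Percentage change of `new` relative to `ref` (negative means a reduction)."""
    return 100.0 * (new - ref) / ref


# (delay in FO4, area in NAND2 equivalents), LE-based model
le_model = {
    "Non-Redundant [13]": (58.3, 35750),
    "Redundant [7]": (51.4, 30600),
    "Proposed": (50.2, 32656),
}

# (delay in ns, area in square micrometres), NanGate 45 nm synthesis
synthesis = {
    "Non-Redundant [13]": (3.66, 48326.1),
    "Proposed": (3.21, 43053.5),
}

for table in (le_model, synthesis):
    delay, area = table["Proposed"]
    for name, (ref_delay, ref_area) in table.items():
        if name != "Proposed":
            print(name,
                  round(relative_change(delay, ref_delay), 2),
                  round(relative_change(area, ref_area), 2))
```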

SLIDE 19

Conclusion

The design of parallel decimal multipliers is studied.

A parallel decimal multiplier based on a new PPR tree is proposed by using:

A BCD-4221 sum correction block with non-fixed-size counters,

A decimal PPR tree based on the BCD-4221 decimal digit 3:2 compressor.

The proposed parallel decimal multiplier is faster than previous best designs.

SLIDE 20


Thank you! Questions?
