A Novel Modular Adder for One Thousand Bits and More Using Fast Carry Chains of Modern FPGAs
Marcin Rogawski, Ekawat Homsirikamol & Kris Gaj
George Mason University, USA
Co-Authors
Ekawat Homsirikamol, a.k.a. “Ice”: PhD Student
Marcin Rogawski: PhD @ GMU, Summer 2013; currently @ Cadence Design Systems, San Jose, CA
Motivation
• Adders used in multiple branches of science & engineering
• Basic building block of more complex arithmetic computations (multiplication, modular reduction, etc.)
• Need for long-operand adders (≥ 1024 bits) in cryptography (RSA, Diffie-Hellman, Elliptic Curve Cryptography, Pairing-Based Cryptography, post-quantum cryptography)
• FPGAs contain special dedicated resources (fast carry chains) supporting fast addition, but only for operands in the range of 32-64 bits
Fast Carry Chains of Modern FPGAs
[Figure: carry-chain structure in (a) Xilinx FPGAs and (b) Altera FPGAs, showing LUT and full-adder (FA) blocks with inputs a_i, b_i, cin and outputs s_i, cout]
• Minimize delays
• Save reconfigurable resources
Parallel Prefix Network (PPN) Adder
Parallel Prefix Network – Major Concept (1)
Given: generate-propagate signals for each bit position
$(g_{n-1}, p_{n-1}), \ldots, (g_2, p_2), (g_1, p_1), (g_0, p_0)$
Calculate (in parallel): generate-propagate signals for each block of bits starting at position 0
$(g_{[0,n-1]}, p_{[0,n-1]}), \ldots, (g_{[0,2]}, p_{[0,2]}), (g_{[0,1]}, p_{[0,1]}), (g_{[0,0]}, p_{[0,0]})$
Parallel Prefix Network – Major Concept (2)
Calculate: projected carry at position i
$pc_i = g_{[0,i-1]} + c_0 \, p_{[0,i-1]}$
Assuming $c_0 = 0$ (no need to cascade adders that are already very long):
$pc_i = g_{[0,i-1]}$, where $i = 1..n$
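The two concept slides above can be illustrated with a small software model. This is only a sketch: the function names (`bit_gp`, `combine`, `projected_carries`) are hypothetical, and the prefixes are computed serially here, whereas a real PPN evaluates them in parallel. It assumes the usual definitions g_i = a_i AND b_i and p_i = a_i XOR b_i, which the slides do not spell out.

```python
def bit_gp(a, b, n):
    """Per-bit (generate, propagate) pairs; index 0 is the least significant bit.
    Assumed definitions: g_i = a_i AND b_i, p_i = a_i XOR b_i."""
    return [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
            for i in range(n)]


def combine(hi, lo):
    """Prefix operator: merge a block `hi` with the adjacent, less significant block `lo`."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def projected_carries(a, b, n, c0=0):
    """Return [pc_1, ..., pc_n], where pc_i = g_[0,i-1] + c0 * p_[0,i-1]."""
    acc = None
    carries = []
    for pair in bit_gp(a, b, n):
        # acc holds (g_[0,i], p_[0,i]) after processing bit i
        acc = pair if acc is None else combine(pair, acc)
        g, p = acc
        carries.append(g | (c0 & p))
    return carries


print(projected_carries(0b1011, 0b0110, 4))  # [0, 1, 1, 1]
```

With c0 = 0 the propagate term vanishes, which is the simplification stated on the slide; the last element of the list is the carry-out of the n-bit addition.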
Kogge-Stone PPN
• Minimum latency ($\log_2 N$)
• Large area
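The latency and area claims for Kogge-Stone can be checked with a minimal software model. This is a sketch, not the authors' VHDL: `kogge_stone` is a hypothetical name, and it takes a list of (g, p) pairs ordered from least to most significant position.

```python
def kogge_stone(gp):
    """Kogge-Stone prefix network over (g, p) pairs.

    Returns (prefixes, levels, nodes), where prefixes[i] = (g_[0,i], p_[0,i]),
    levels is the number of network levels and nodes counts prefix operators.
    """
    pairs = list(gp)
    n = len(pairs)
    levels = 0
    nodes = 0
    dist = 1
    while dist < n:
        nxt = list(pairs)
        # Every position i >= dist merges with the position dist places below it.
        for i in range(dist, n):
            g_hi, p_hi = pairs[i]
            g_lo, p_lo = pairs[i - dist]
            nxt[i] = (g_hi | (p_hi & g_lo), p_hi & p_lo)
            nodes += 1
        pairs = nxt
        dist *= 2
        levels += 1
    return pairs, levels, nodes


prefixes, levels, nodes = kogge_stone([(0, 1), (1, 0), (0, 1), (0, 1)])
print(prefixes, levels, nodes)  # [(0, 1), (1, 0), (1, 0), (1, 0)] 2 5
```

For N = 64 this model uses 6 levels and 321 prefix operators, which illustrates the trade-off on the slide: logarithmic depth paid for with a large number of nodes.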
Brent-Kung PPN
• Good trade-off between latency ($2 \log_2 N - 2$) and area
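A comparable software model of the Brent-Kung network makes the latency figure concrete. This is a sketch under assumptions: `brent_kung` is a hypothetical name, N is assumed to be a power of two, and latency is measured as the longest chain of dependent prefix operators.

```python
def brent_kung(gp):
    """Brent-Kung prefix network over (g, p) pairs (N must be a power of two).

    Returns (prefixes, depth, nodes), where prefixes[i] = (g_[0,i], p_[0,i]),
    depth is the longest chain of dependent prefix operators and nodes counts operators.
    """
    pairs = list(gp)
    n = len(pairs)
    assert n >= 2 and n & (n - 1) == 0, "N assumed to be a power of two"
    depth = [0] * n
    nodes = 0

    def merge(i, j):
        # Merge block ending at i with the adjacent lower block ending at j.
        nonlocal nodes
        g_hi, p_hi = pairs[i]
        g_lo, p_lo = pairs[j]
        pairs[i] = (g_hi | (p_hi & g_lo), p_hi & p_lo)
        depth[i] = max(depth[i], depth[j]) + 1
        nodes += 1

    # Up-sweep: build prefixes of blocks of size 2, 4, 8, ...
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            merge(i, i - d)
        d *= 2
    # Down-sweep: fill in the remaining positions from completed prefixes.
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            merge(i, i - d)
        d //= 2
    return pairs, max(depth), nodes


_, depth, nodes = brent_kung([(0, 1)] * 8)
print(depth, nodes)  # 4 11
```

For N = 8 the model reports a depth of 4, matching 2 log2 N − 2, with 11 operators; a Kogge-Stone network of the same size needs 17 operators in 3 levels.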
Parallel Prefix Network (PPN) Adder in FPGA
• All logic must be implemented using LUTs!
• Large PPN required (e.g., n=1024)
Our High-Radix Parallel Prefix Network Adder
GPS: Generate-Propagate-Sum in Xilinx FPGAs
S: Sum unit in Xilinx FPGAs
Our High-Radix Parallel Prefix Network Adder
• GPS and S units implemented using Fast Carry Chains
• The size of PPN reduced from n=1024 to N=1024/w
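The high-radix idea can be modeled arithmetically: each w-bit word is added on its own (standing in for a GPS unit on a fast carry chain), the word-level generate/propagate pairs go through a PPN of size N = n/w, and a final per-word addition (standing in for an S unit) applies the incoming carry. The sketch below is an assumption-laden model, not the authors' circuit: `high_radix_add` is a hypothetical name, the word-level propagate is taken to be "sum word is all ones", and the PPN is replaced by a serial prefix loop for brevity.

```python
def high_radix_add(a, b, n=1024, w=16):
    """Model of an n-bit addition split into N = n // w words of w bits.

    Returns (sum mod 2**n, carry_out).
    """
    assert n % w == 0
    N = n // w
    mask = (1 << w) - 1

    # GPS stage (modeled): per-word sum, generate and propagate.
    sums, gp = [], []
    for i in range(N):
        t = ((a >> (i * w)) & mask) + ((b >> (i * w)) & mask)
        sums.append(t & mask)
        # generate = carry out of the word; propagate = word sum is all ones
        gp.append((t >> w, int((t & mask) == mask)))

    # PPN stage (serial stand-in): carries[i] is the carry into word i.
    carries = [0]
    g_acc, p_acc = 0, 1  # identity element of the prefix operator
    for g, p in gp:
        g_acc, p_acc = g | (p & g_acc), p & p_acc
        carries.append(g_acc)

    # S stage (modeled): add the incoming carry to each word sum.
    result = 0
    for i in range(N):
        result |= ((sums[i] + carries[i]) & mask) << (i * w)
    return result, carries[N]


x, y = (1 << 1024) - 1, 1
print(high_radix_add(x, y))  # (0, 1)
```

With n = 1024 and w = 16 the prefix stage handles 64 pairs instead of 1024, which is the reduction stated on the slide.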
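General Construction for the Modular Adder
$R = A + B \bmod P$, for n-bit operands A and B
$R = A + B - P$ when $A + B \ge 2^n > P$ (cout#1) or $A + B - P \ge 0$ (cout#2)
$R = A + B$ otherwise
[Figure: two cascaded n-bit adders; the first computes A + B (carry-out cout#1), the second adds the constant $2^n - P$ (carry-out cout#2); a 0/1 multiplexer selects R]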
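The selection rule on this slide can be written out as a short software model. This is a sketch assuming 0 ≤ A, B < P < 2^n; `mod_add` is a hypothetical name, and subtracting P is modeled as adding the constant 2^n − P and keeping n bits.

```python
def mod_add(a, b, p, n):
    """Modular addition with two n-bit additions and two carry-outs.

    Assumes 0 <= a, b < p < 2**n.
    """
    assert 0 <= a < p and 0 <= b < p and p < (1 << n)
    mask = (1 << n) - 1

    t1 = a + b                    # first addition
    cout1 = t1 >> n               # set when A + B >= 2**n
    s1 = t1 & mask

    t2 = s1 + ((1 << n) - p)      # second addition: adds 2**n - P, i.e. subtracts P
    cout2 = t2 >> n               # set when A + B - P >= 0 (given cout1 == 0)
    s2 = t2 & mask

    # Select the reduced sum if either carry-out is set, else the plain sum.
    return s2 if (cout1 | cout2) else s1


print(mod_add(200, 100, 251, 8))  # 49
```

In the example, 200 + 100 = 300 overflows 8 bits, so cout#1 is set and the reduced value 300 − 251 = 49 is selected.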
Our Construction for the High-Radix PPN Modular Adder
[Figure: block diagram. Inputs: w-bit words A_i, B_i for i = 0..N−1; units: GPS, GPSc, IP, PPN, S, and 0/1 multiplexers controlled by sel; signals: fg_i, fp_i, fpc_i, sg_i, sp_i, spc_i; outputs: w-bit words R_i]
GPSc: Generate-Propagate-Sum with carry
Our Construction for the High-Radix PPN Modular Adder
[Figure: the same block diagram as on the previous construction slide]
Two additions: overlapped in time, sharing resources (S units)
Target FPGA Families
Xilinx FPGAs
Technology   Low-cost     High-performance
65 nm                     Virtex-5
45 nm        Spartan-6
Altera FPGAs
Technology   Low-cost     High-performance
65 nm                     Stratix III
40 nm        Cyclone IV
Design Flow
[Figure: design flow diagram with the stages Specification, Test Vectors, RTL Design, VHDL Code, Functional Verification, Option Optimization & Parameter Exploration using GMU ATHENa (FPL 2010), FPGA Tools, Netlist, Post Place & Route Results, Timing Verification]
ATHENa used to simplify parameter exploration (multiple values of generics) and option optimization for both Xilinx and Altera FPGAs
Choosing the Best PPN & Word Size: Adders – Altera Cyclone IV
Choosing the Best PPN & Word Size: Adders – Xilinx Spartan 6
Choosing the Best PPN & Word Size: Modular Adders – Altera Cyclone IV
Choosing the Best PPN & Word Size: Modular Adders – Xilinx Spartan 6
The Best Choices of PPN Type & Word Size
               Adders            Modular Adders
Family         PPN   (w, N)      PPN   (w, N)
Cyclone IV     KS    (16, 64)    KS    (16, 64)
Stratix III    BK    (16, 64)    BK    (16, 64)
Spartan 6      BK    (32, 32)    KS    (128, 8)
Virtex 5       KS    (16, 64)    KS    (64, 16)
KS: Kogge-Stone Parallel Prefix Network
BK: Brent-Kung Parallel Prefix Network
w: word size
N: size of PPN
The Best Long-Operand Adders Proposed to Date
H.D. Nguyen, B. Pasca, T.B. Preußer, FPGA-Specific Arithmetic Optimizations of Short-Latency Adders, FPL 2011, Chania, Greece
• Adders based on Carry-Select Architecture (rather than PPN Architecture)
• Three specific architectures proposed
  • AAM: Add-Add-Multiplex
  • CAI: Compare-Add-Increment
  • CCA: Compare-Compare-Add
• Limited results (only Virtex 5) included in the original paper
• AAM architecture re-implemented and results collected for different FPGAs
Comparison with Other Adders – Virtex 5
Comparison with Other Adders – Virtex 5
Comparison with Other Adders – Spartan 6
Comparison with Other Adders – Spartan 6
Comparison with Other Adders – Cyclone IV
Comparison with Other Adders – Stratix III
Comparison Between Modular Adders – Xilinx Virtex 5
Comparison Between Modular Adders – Xilinx Virtex 5
Comparison Between Modular Adders – Xilinx Spartan 6
Comparison Between Modular Adders – Xilinx Spartan 6
Comparison Between Modular Adders – Altera Stratix III
Comparison Between Modular Adders – Altera Stratix III
Comparison Between Modular Adders – Altera Cyclone IV
Comparison Between Modular Adders – Cyclone IV
Modular Adder/Subtractor
[Figure: block diagrams of the modular adder/subtractor. Inputs: n-bit A and B, control signal SUB; constants: $2^n - P$ and P; carry-outs cout#1 and cout#2; 0/1 multiplexers select the n-bit result R]
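The slide's diagram is not recoverable in detail, so the following is only one plausible arithmetic model of a combined modular adder/subtractor, not the authors' circuit. The name `mod_add_sub` and the exact roles assigned to the carry-outs are assumptions: in subtract mode B is complemented, and a missing carry-out of the first addition (a borrow) selects the result corrected by adding P; in add mode the constant 2^n − P is used as in the general modular adder.

```python
def mod_add_sub(a, b, p, n, sub):
    """Compute (a + b) mod p when sub is False, (a - b) mod p when sub is True.

    Assumes 0 <= a, b < p < 2**n. Both modes use two n-bit additions.
    """
    assert 0 <= a < p and 0 <= b < p and p < (1 << n)
    mask = (1 << n) - 1

    if sub:
        t1 = a + ((~b) & mask) + 1   # A - B in two's complement
        cout1 = t1 >> n              # 1 means no borrow, i.e. A >= B
        s1 = t1 & mask
        s2 = (s1 + p) & mask         # correction: add P back
        return s1 if cout1 else s2

    t1 = a + b
    cout1 = t1 >> n                  # set when A + B >= 2**n
    s1 = t1 & mask
    t2 = s1 + ((1 << n) - p)         # adds 2**n - P, i.e. subtracts P
    cout2 = t2 >> n                  # set when A + B - P >= 0 (given cout1 == 0)
    s2 = t2 & mask
    return s2 if (cout1 | cout2) else s1


print(mod_add_sub(3, 5, 7, 3, sub=True))   # 5
print(mod_add_sub(3, 5, 7, 3, sub=False))  # 1
```

Both modes share the same two-addition structure and differ only in the operand and constant fed to each adder, which is consistent with the small overhead reported on the following slides.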
Overhead of Modular Adder/Subtractor – Altera Cyclone IV
Overhead of Modular Adder/Subtractor – Altera Stratix III
Overhead of Modular Adder/Subtractor – Xilinx Spartan 6
Overhead of Modular Adder/Subtractor – Xilinx Virtex 5
Proposed New Dedicated Resources of Modern FPGAs
• Dedicated (hardwired) PPNs (Kogge-Stone and/or Brent-Kung)
• Standard sizes (e.g., 32 and/or 64)
• Support for fast addition and modular addition of large operand sizes (as described in this paper)
• Support for fast addition and modular addition of medium operand sizes (up to 64), using classical PPN adders
• Pipeline registers that can be activated or bypassed
Conclusions
• A new family of High-Radix Parallel Prefix Network Adders using fast carry chains of modern FPGAs
• New family outperforming the best previously known FPGA-specific adders and modular adders for Xilinx FPGAs
• Very small performance penalty for an extension to adders/subtractors
• A proposal for embedding medium-size hardwired PPN structures in the new generations of FPGAs
Future Work
• Possible optimizations for Altera FPGAs
• Better (preferably analytical) method of choosing an optimum word size for
  • Other FPGA families
  • Other operand sizes
• Optimal method of pipelining for adders and modular adders
• Extended and more detailed proposal of new FPGA resources supporting fast addition
Thank you!
Suggestions? Questions?
ATHENa: http://cryptography.gmu.edu/athena
CERG: http://cryptography.gmu.edu