Solving the Discrete Logarithm of a 113-bit Koblitz Curve with an FPGA Cluster Erich Wenger and Paul Wolfger Graz University of Technology WECC 2014, Chennai, India
We solved… • the discrete logarithm of a 113-bit Koblitz Curve. • Challenge generated using SHA-256 • Extrapolated 24 days on 18 Virtex-6 FPGAs
ECDLP Records • In 2000: Binary Koblitz curve - ECC2K-108 using 9,500 PCs in 126 days • In 2004: Binary elliptic curve - ECC2-109 using 2,600 PCs for 510 days • In 2012: Elliptic curve over 112-bit prime field using 200 PlayStation 3 consoles for 6 months
TU Graz Records • IT-Security Lecture • 2012: 75 bit in days on quad-core • 2013: 80 bit in 17 days on Core i5-2400 • Master project: • Virtex 6 FPGA • 83 bit in (avg) 4.1 days • Room for improvement…
The higher the security level… …the lower the speed. With knowledge on the best attacks… … realistic security bounds are possible. …potentially smaller parameters can be used. …potentially faster algorithms can be used.
Elliptic Curve Discrete Logarithm Problem Parallelized Pollard’s Rho Algorithm
We are looking for…
Pollard’s Rho Algorithm Iteration function Parallelized Pollard’s Rho
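To make the idea of parallel walks with distinguished points concrete, here is a minimal sketch. It is not the deck's implementation: it swaps the elliptic curve for a toy multiplicative subgroup of prime order (p = 1019, n = 509, g = 4), uses an additive walk with a small branching table, and all names (`pollard_rho_dp`, `dp_mask`, `table`) and parameter values are illustrative assumptions.

```python
import random

def pollard_rho_dp(g, h, p, n, r=16, dp_mask=0x7, seed=1):
    """Find k with g^k = h (mod p), where g has prime order n.

    Toy stand-in for the elliptic-curve setting (assumption): every walk
    keeps a triple (x, c, d) with x = g^c * h^d and steps x -> x * R[j].
    """
    rng = random.Random(seed)
    # Branching table R[j] = g^a_j * h^b_j with known exponents (a_j, b_j).
    table = []
    for _ in range(r):
        a, b = rng.randrange(n), rng.randrange(n)
        table.append((pow(g, a, p) * pow(h, b, p) % p, a, b))
    store = {}  # distinguished point -> (c, d), shared by all walks
    for _ in range(10_000):  # each pass is one independent walk
        c, d = rng.randrange(n), rng.randrange(n)
        x = pow(g, c, p) * pow(h, d, p) % p
        for _ in range(20 * (dp_mask + 1)):
            if x & dp_mask == 0:  # distinguished: low bits are zero
                break
            rx, a, b = table[(x >> 3) % r]
            x, c, d = x * rx % p, (c + a) % n, (d + b) % n
        else:
            continue  # walk is probably stuck in a cycle, abandon it
        if x in store and (d - store[x][1]) % n != 0:
            c0, d0 = store[x]
            # g^c h^d = g^c0 h^d0  =>  k = (c0 - c) / (d - d0) mod n
            k = (c0 - c) * pow(d - d0, -1, n) % n
            if pow(g, k, p) == h:
                return k
        store[x] = (c, d)
    raise RuntimeError("no collision found")

# Usage with a known secret exponent
secret = 123
recovered = pollard_rho_dp(4, pow(4, secret, 1019), 1019, 509)
```

The only shared state is the table of distinguished points, which is why the method parallelizes well: independent walkers need to report nothing but their distinguished triples.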
Iteration Function
Iteration Function (41-bit Koblitz Curve)
• Teske [29]: $f(X_i) = X_i + R[j]$; expected 929 · 10^3 iterations, measured 906 · 10^3
• Wiener and Zuccherato [31]: $f(X_i) = \min_{0 \le l < m} \sigma^l(X_i + R[j])$; expected 145 · 10^3 iterations, measured 147 · 10^3
• Gallant et al. [14]: $f(X_i) = X_i + \sigma^l(X_i)$; expected 145 · 10^3 iterations, measured 166 · 10^3
• Bailey et al. [4]: $f(X_i) = X_i + \sigma^{(l \bmod 16)/2 + 3}(X_i)$; expected 145 · 10^3 iterations, measured 166 · 10^3
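The expected iteration counts on this slide are consistent with the usual birthday-bound estimate sqrt(pi * n / 2), divided by the square root of the number of automorphisms exploited. The sketch below assumes a group order of about 2^39 for the 41-bit curve and a Frobenius orbit of size m = 41; both values are assumptions chosen because they reproduce the slide's figures, and the function name is hypothetical.

```python
import math

def expected_iterations(group_order, automorphism_order=1):
    """Birthday-bound estimate of Pollard rho iterations.

    automorphism_order is the size of the equivalence classes the
    iteration function collapses (1 means no automorphism is used).
    """
    return math.sqrt(math.pi * group_order / (2 * automorphism_order))

n = 2**39  # assumed group order of the 41-bit test curve
plain = expected_iterations(n)           # about 929 * 10^3
with_frobenius = expected_iterations(n, 41)  # about 145 * 10^3
```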
Architecture
FPGA Development Board
ASIC Design • One NAND gate: 4 transistors, 3.136 µm² @ UMC 90nm, 1 GE • Register/Flip-flop: ~5 GE • Gate sizes: AND 1.25 GE, OR 1.25 GE, XOR 2.5 GE, NAND 1 GE, NOR 1 GE, XNOR 2.5 GE
FPGA Design (Figure: schematic of an FPGA slice with LUTs A to D and FF/LAT storage elements; figure ID ug364_03_040209)
FPGA Development Board
Multiple Small Cores (Figure: area versus time)
Core Idea
ECC Breaker (Figure: block diagram. Input triple X_i, c_i, d_i; output triple X_{i+1}, c_{i+1}, d_{i+1}. Blocks: Interface, NextInput, Iteration Function, Branching Table, Lambda Table, Point Addition (79 %) with F_{2^m} multiplier, squarer, and inverter, Point Automorphism (14 %), F_n adders and multipliers, FIFOs, Distinguished Point Storage)
Point Addition and FF Inversion (Figure: pipelined data path from inputs x1, y1, x2, y2, a to outputs x3, y3, built from ADD, MUL, SQU, and FIFO blocks. The inverter INV is a chain of stages: S + M, S + M, 3S + M, S + M, 7S + M, 14S + M, 28S + M, 56S + M, S)
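The stage labels in the inverter (S + M, S + M, 3S + M, S + M, 7S + M, 14S + M, 28S + M, 56S + M, S) match an Itoh-Tsujii addition chain 1, 2, 3, 6, 7, 14, 28, 56, 112 for m = 113. The following software sketch mirrors that chain. The reduction polynomial x^113 + x^9 + 1 is an assumption (the slides do not state it), and the function names are illustrative.

```python
M = 113
# Assumption: reduction polynomial x^113 + x^9 + 1.
POLY = (1 << 113) | (1 << 9) | 1

def gf_mul(a, b):
    """Multiply in F_2^113: carry-less product, then reduce mod POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    for i in range(r.bit_length() - 1, M - 1, -1):
        if (r >> i) & 1:
            r ^= POLY << (i - M)
    return r

def gf_sqr_n(a, n):
    """Square a exactly n times."""
    for _ in range(n):
        a = gf_mul(a, a)
    return a

def itoh_tsujii_inverse(a):
    """Return a^-1 = a^(2^113 - 2) for nonzero a; t_k denotes a^(2^k - 1)."""
    t1 = a
    t2 = gf_mul(gf_sqr_n(t1, 1), t1)       # S + M
    t3 = gf_mul(gf_sqr_n(t2, 1), t1)       # S + M
    t6 = gf_mul(gf_sqr_n(t3, 3), t3)       # 3S + M
    t7 = gf_mul(gf_sqr_n(t6, 1), t1)       # S + M
    t14 = gf_mul(gf_sqr_n(t7, 7), t7)      # 7S + M
    t28 = gf_mul(gf_sqr_n(t14, 14), t14)   # 14S + M
    t56 = gf_mul(gf_sqr_n(t28, 28), t28)   # 28S + M
    t112 = gf_mul(gf_sqr_n(t56, 56), t56)  # 56S + M
    return gf_sqr_n(t112, 1)               # final S

a = 0x1F2E3D4C5B6A79880123456789AB
check = gf_mul(a, itoh_tsujii_inverse(a))  # should be the field's 1
```

The chain costs 8 multiplications and 112 squarings in total, which is why a fully unrolled hardware pipeline for it is feasible.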
Binary Field Multiplier (method: size)
• Parallel: 5,497 LUTs
• Mastrovito: 7,104 LUTs
• Bernstein’s Batch Binary Edwards: 4,409 LUTs
• Recursive Karatsuba: 3,757 LUTs
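As an illustration of the recursive Karatsuba approach, the sketch below multiplies two binary polynomials (stored as Python integers) with three half-size products per level instead of four. It only shows the unreduced polynomial product; the recursion threshold and function names are assumptions and do not describe the hardware design from the talk.

```python
import random

def clmul(a, b):
    """Schoolbook carry-less (binary polynomial) multiplication."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def karatsuba(a, b, bits=113, threshold=8):
    """Recursive Karatsuba product of two polynomials of at most `bits` bits."""
    if bits <= threshold:
        return clmul(a, b)
    half = (bits + 1) // 2
    mask = (1 << half) - 1
    a0, a1 = a & mask, a >> half
    b0, b1 = b & mask, b >> half
    lo = karatsuba(a0, b0, half, threshold)
    hi = karatsuba(a1, b1, bits - half, threshold)
    mid = karatsuba(a0 ^ a1, b0 ^ b1, half, threshold)
    # Over F_2: a0*b1 + a1*b0 = mid + lo + hi, and addition is XOR.
    return lo ^ ((mid ^ lo ^ hi) << half) ^ (hi << (2 * half))

rng = random.Random(0)
x, y = rng.getrandbits(113), rng.getrandbits(113)
product = karatsuba(x, y)
```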
Point Automorphism (Figure: selects the smallest point among the σ^i(P). Blocks: Square units for x and y, Compare, basis conversions C → N and N → C, rotations rot 0, rot 1, rot 2, rot 3, ..., comparator tree, barrel rotate producing x', y', FIFOs; output σ^{i+1}(P))
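In a normal basis, squaring a field element is a cyclic rotation of its coefficient vector, so all σ^i(P) can be enumerated by rotating x and y together, and the smallest one can be chosen by comparing the rotated x values. The sketch below assumes elements are stored as 113-bit integers in normal basis, that squaring is a left rotation, and that "smallest" means the smallest integer value of x; these conventions and the function names are assumptions for illustration.

```python
M = 113

def rotl(v, k, m=M):
    """Cyclic left rotation of an m-bit word by k positions."""
    k %= m
    return ((v << k) | (v >> (m - k))) & ((1 << m) - 1)

def canonical(x, y, m=M):
    """Return (x', y', i): the representative of the Frobenius orbit of
    (x, y) whose x-coordinate is smallest, and the rotation count i."""
    best = min(range(m), key=lambda i: rotl(x, i, m))
    return rotl(x, best, m), rotl(y, best, m), best

x, y = 0x0123456789ABCDEF0123456789AB, 0x0FEDCBA9876543210FEDCBA98765
rep = canonical(x, y)[:2]
# Any point in the same orbit should map to the same representative.
rep_rotated = canonical(rotl(x, 17), rotl(y, 17))[:2]
```

Because every walker maps a point to the same representative of its orbit, collisions are detected between orbits rather than between individual points.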
Details • 210 pipeline stages • Per default: canonical basis • Normal basis used for point automorphism module • Karatsuba multiplier for F_{2^m} • Itoh-Tsujii inversion for F_{2^m} • Montgomery multiplier for F_n based on DSP slices
Computation Time (Plot: distinguished triples, 0 to 7,000,000, over time in days, 0 to 50; marked date: April 19th, 2014) • Extrapolated: 24 days
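Challenge Generation
    import hashlib
    # Sage code. K (the binary field), E (the curve), a, b (curve coefficients),
    # h, and the helper str_to_poly are defined elsewhere and not shown here.
    # x-coordinate of P is derived from SHA-256("0")
    PX = str_to_poly(hashlib.sha256(str(0).encode()).hexdigest())
    PY = PolynomialRing(K, 'PY').gen()
    # Solve the curve equation y^2 + x*y = x^3 + a*x^2 + b for y
    P_ROOTS = (PY^2 + PX*PY + PX^3 + a*PX^2 + b).roots()
    P = E([PX, P_ROOTS[0][0]]); P = P*h
    # x-coordinate of Q is derived from SHA-256("1")
    QX = str_to_poly(hashlib.sha256(str(1).encode()).hexdigest())
    Q_ROOTS = (PY^2 + QX*PY + QX^3 + a*QX^2 + b).roots()
    Q = E([QX, Q_ROOTS[0][0]]); Q = Q*h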
Different FPGAs (series, development kit: LUTs used, maximum frequency, price)
• Virtex-6, ML605: 38%, 261 MHz, 2,495 USD
• Spartan-6, LX150T: -, 147 MHz, 995 USD
• Artix-7, AC701: 62%, 264 MHz, 999 USD
• Virtex-7, VC707: 28%, 313 MHz, 3,495 USD
• Kintex-7, KC705: 42%, 313 MHz, 1,695 USD
Different Targets (target: iterations, costs [USD], estimated days)
• ECC2K-112: 8.5 x 10^15, 42,000, 22
• ECC2-113: 90 x 10^15, 42,000, 118
• ECC2K-130: 4,055 x 10^15, 1,000,000, 127
• ECC2-131: 46,239 x 10^15, 10,000,000, 145
• ECC2-163: 3,030 x 10^21, 1,000,000,000, 189,934
Open Issues • Power problems • Maximum frequency: 165 MHz vs 275 MHz • Multiple instances • Negation map and fruitless cycles
Random Facts • Necessary budget: • 18 FPGAs: 2,500 USD x 18 = 45,000 USD • Power consumption: different budget :-) • 1.5 man-years: 100,000 USD (different budget) • Money actually spent: 20 EUR on chocolate
Room for improvement: YES!!! • 2x speed equals 2 extra bits to attack • 128x speed equals 14 extra bits to attack
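The relation between speedup and attackable bits follows from the square-root cost of Pollard's rho: the work grows like 2^(bits/2), so a speedup factor s covers 2 * log2(s) additional bits in the same time. A small sketch of that arithmetic (the function name is hypothetical):

```python
import math

def extra_bits(speedup):
    """Extra curve bits attackable in the same time, given a speedup factor.

    Rho work scales as 2^(bits / 2), so bits grow by 2 * log2(speedup).
    """
    return 2 * math.log2(speedup)

bits_for_2x = extra_bits(2)
bits_for_128x = extra_bits(128)
```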
Prime numbers: 109, 113, 127, 131, … New Challenges (Plot: expected number of iterations [bits], 0 to 70, for 113-bit, 127-bit, and 131-bit Koblitz (a = 0, a = 1) and Weierstrass curves; series: without speedup, with speedup)