High-speed Diffie-Hellman, part 2 D. J. Bernstein University of Illinois at Chicago
Classic question about the Diffie-Hellman system: How quickly can we compute $n$th powers mod $p$? “Modular exponentiation.” Assume standard prime $p$; e.g. $p = 2^{262} - 5081$. How quickly can we compute $g^n \bmod 2^{262} - 5081$, given integers $g, n$?
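As a baseline for the question above, here is a minimal sketch of left-to-right square-and-multiply for $g^n \bmod p$. The function name `power_mod` is hypothetical, and this is only the textbook binary method, not a tuned implementation.

```python
# Textbook binary exponentiation: one squaring per exponent bit,
# plus one multiplication for every bit that is set.
P = 2**262 - 5081  # the example prime from the text

def power_mod(g, n, p=P):
    """Return g^n mod p for an integer n >= 0."""
    result = 1
    for bit in bin(n)[2:]:             # exponent bits, most significant first
        result = result * result % p   # squaring
        if bit == "1":
            result = result * g % p    # extra multiplication
    return result
```

Python's built-in `pow(g, n, p)` computes the same value and is what one would normally call.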
This talk asks the analogous question for elliptic-curve Diffie-Hellman: How quickly can we compute $n$th multiples in an elliptic-curve group? “Elliptic-curve scalar multiplication.” Assume standard field and standard elliptic curve.
e.g. NIST P-224: the elliptic curve $y^2 = x^3 - 3x + a_6$ over $\mathbf{Z}/p$. Here $p = 2^{224} - 2^{96} + 1$ and $a_6 = 18958286285566608000408668544493926415504680968679321075787234672564$.
e.g. NIST P-256: the elliptic curve $y^2 = x^3 - 3x + \cdots$ over $\mathbf{Z}/p$ where $p = 2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$.
e.g. Curve25519: the elliptic curve $y^2 = x^3 + 486662x^2 + x$ over $\mathbf{Z}/p$ where $p = 2^{255} - 19$.
Your task: Given $(x, y)$ on curve, and given integer $n \ge 0$, compute $n$th multiple of $(x, y)$ in the elliptic-curve group. Warning: Answer is not $(nx, ny)$ unless you’re extremely lucky. Elliptic-curve point addition is not vector addition; $(x, y) + (x', y')$ is almost never $(x + x', y + y')$. Can emphasize this by changing notation: a special symbol in place of $+$, $[n]$ for the $n$th multiple, etc. But this talk uses simplified notation.
Similar tasks are critical for elliptic-curve signatures. e.g. Schnorr signatures, unfortunately patented: Signer has secret key $n$, public key $nB$. To sign $m$: choose random $z$, uniform in $\{0, 1, \ldots, \#\langle B\rangle - 1\}$; compute $r = \text{SHA-256}(zB, m)$; compute $s = z + rn \bmod \#\langle B\rangle$; send $(m, r, s)$. To verify $(m, r, s)$: Check $r = \text{SHA-256}(sB - r\,nB, m)$.
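The signing and verification steps above can be sketched on a toy curve. Everything specific here is an assumption made for illustration: the tiny prime `P = 10007`, the curve constant `A6 = 7`, the brute-force base-point search, the `repr`-based point encoding fed to SHA-256, and all function names. The group is far too small to be secure, and its order need not be prime.

```python
import hashlib
import secrets

P = 10007        # toy prime with P % 4 == 3 (hypothetical, insecure)
A, A6 = -3, 7    # toy curve y^2 = x^3 - 3x + 7 over Z/P

def inv(v):
    return pow(v % P, P - 2, P)          # division via Fermat

def add(p1, p2):
    """Affine point addition; None is the group identity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                      # p2 is the negative of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * inv(2 * y1) % P
    else:
        lam = (y2 - y1) * inv(x2 - x1) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def neg(p1):
    return None if p1 is None else (p1[0], -p1[1] % P)

def mul(n, p1):
    """n-th multiple by double-and-add, n >= 0."""
    acc = None
    for bit in bin(n)[2:]:
        acc = add(acc, acc)
        if bit == "1":
            acc = add(acc, p1)
    return acc

def find_base_point():
    for x in range(P):
        rhs = (x**3 + A * x + A6) % P
        y = pow(rhs, (P + 1) // 4, P)    # square root attempt, valid since P % 4 == 3
        if y != 0 and y * y % P == rhs:
            return (x, y)

def order_of(point):
    k, q = 1, point
    while q is not None:                 # brute force; fine only for a toy group
        q = add(q, point)
        k += 1
    return k

B = find_base_point()
ORDER = order_of(B)                      # plays the role of #<B>

def h(point, m):
    data = repr(point).encode() + m      # hypothetical encoding of (point, m)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def sign(n, m):
    z = secrets.randbelow(ORDER)         # uniform in {0, ..., #<B> - 1}
    r = h(mul(z, B), m)
    s = (z + r * n) % ORDER
    return (m, r, s)

def verify(public, signature):
    m, r, s = signature
    return r == h(add(mul(s, B), neg(mul(r, public))), m)
```

Verification works because $sB - r\,nB = (z + rn)B - rnB = zB$, so the verifier hashes the same point as the signer.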
Multiples via additions
Typical recursive formulas:
$2P = P + P$.
$3P = 2P + P$.
$4P = 2P + 2P$.
$5P = 3P + 2P$.
$6P = 3P + 3P$.
$7P = 5P + 2P$.
$2nP = 7P + (2n - 7)P$ if $4 \le n < 8$.
$(2n+1)P = 2nP + P$ if $4 \le n < 8$.
$(4n+1)P = 4nP + P$ if $4 \le n < 8$.
$(4n+3)P = 4nP + 3P$ if $4 \le n < 8$.
$2nP = nP + nP$ if $8 \le n$.
$(8n+1)P = 8nP + P$ if $4 \le n$.
$(8n+3)P = 8nP + 3P$ if $4 \le n$.
$(8n+5)P = 8nP + 5P$ if $4 \le n$.
$(8n+7)P = 8nP + 7P$ if $4 \le n$.
This “addition chain” (“length-3 sliding windows”) uses $\approx \lg n$ doublings and $\approx 0.25 \lg n$ more additions to compute $nP$ for average $n$. e.g. $\approx 320$ additions for average $n \in \{0, 1, \ldots, 2^{256} - 1\}$. Some easy improvements from fast negation on elliptic curves: $(16n - 7)P = 16nP - 7P$, etc. Also use “endomorphisms” for “Koblitz curves,” “GLV curves.” More complicated methods replace $0.25$ by $\approx 1/\lg\lg n$.
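To make the operation counts concrete, here is a sketch of a standard left-to-right sliding-window multiplication with odd multiples $P, 3P, 5P, 7P$. It is not necessarily step-for-step the chain given by the recursive formulas, and the function name and the generic `add`/`zero` parameters are assumptions. The counts include the precomputation and a few trivial doublings of the identity.

```python
def sliding_window_multiply(n, P, add, zero, width=3):
    """Return (nP, doublings, additions) for n >= 0 using windows of `width` bits."""
    doublings = additions = 0
    two_p = add(P, P)
    doublings += 1
    odd = {1: P}                                  # odd multiples P, 3P, 5P, 7P
    for k in range(3, 2**width, 2):
        odd[k] = add(odd[k - 2], two_p)
        additions += 1
    bits = bin(n)[2:]
    acc, i = zero, 0
    while i < len(bits):
        if bits[i] == "0":
            acc = add(acc, acc)                   # plain doubling on a 0 bit
            doublings += 1
            i += 1
            continue
        j = min(i + width, len(bits))
        while bits[j - 1] == "0":                 # a window must end in a 1 bit
            j -= 1
        for _ in range(j - i):
            acc = add(acc, acc)
            doublings += 1
        acc = add(acc, odd[int(bits[i:j], 2)])    # one addition per window
        additions += 1
        i = j
    return acc, doublings, additions
```

Using the integers under addition as a stand-in group (so that $nP$ is just `n` when `P = 1`) gives a quick check: `sliding_window_multiply(n, 1, lambda a, b: a + b, 0)` returns `n` as its first component. With an elliptic-curve `add` the same function computes scalar multiples.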
Explicit doubling formulas
On curve $y^2 = x^3 - 3x + a_6$: $2(x, y) = (x'', y'')$ where
$\lambda = (3x^2 - 3)/2y$,
$x'' = \lambda^2 - 2x$,
$y'' = \lambda(x - x'') - y$.
7 subs etc., 2 squarings, 1 more mult, 1 division. How do we divide efficiently in a finite field?
$f/g = f g^{p-2}$ in prime field $\mathbf{Z}/p$. Can compute $g^{p-2}$ with $\approx \lg p$ squarings and $\approx (\lg p)/\lg\lg p$ more mults. e.g. $p = 2^{224} - 2^{96} + 1$: 223 squarings, 11 more mults. More generally, $f/g = f g^{q-2}$ in any field of size $q$. There are faster division methods (e.g. “Euclid”; beware timing attacks!); smaller “I/M ratio.” Special methods for some fields.
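The identity $f/g = f g^{p-2}$ can be checked directly. This is a minimal sketch with a hypothetical helper name `divide`. Python's built-in `pow` uses its own exponentiation strategy rather than the 223-squaring, 11-mult chain mentioned above, and it makes no constant-time promises.

```python
P224 = 2**224 - 2**96 + 1  # the NIST P-224 prime from the text

def divide(f, g, p=P224):
    """Return f/g in Z/p, computed as f * g^(p-2) mod p."""
    if g % p == 0:
        raise ZeroDivisionError("division by zero in Z/p")
    return f * pow(g, p - 2, p) % p
```

Multiplying the quotient back by `g` recovers `f`, which is an easy sanity check for any inputs.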
Speedup: delay divisions
Division costs many mults even with fastest division methods. Save time by delaying divisions.
Naive division-delay method: Store field elements as fractions until end of computation. Divide once before output.
Mult fractions with 2 field mults.
Divide fractions with 2 field mults.
Add fractions with 3 field mults.
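The naive division-delay method can be sketched as a small fraction class that counts field multiplications. The class name `Frac`, its counter, and the choice of prime are assumptions for illustration.

```python
P = 2**255 - 19  # any prime works; this one is borrowed from the Curve25519 example

class Frac:
    """Field element kept as num/den mod P; the division is postponed to value()."""
    mults = 0  # running count of field multiplications

    def __init__(self, num, den=1):
        self.num, self.den = num % P, den % P

    @staticmethod
    def _m(a, b):
        Frac.mults += 1
        return a * b % P

    def __mul__(self, other):       # (a/b)(c/d) = ac/bd: 2 field mults
        return Frac(self._m(self.num, other.num), self._m(self.den, other.den))

    def __truediv__(self, other):   # (a/b)/(c/d) = ad/bc: 2 field mults
        return Frac(self._m(self.num, other.den), self._m(self.den, other.num))

    def __add__(self, other):       # a/b + c/d = (ad + cb)/bd: 3 field mults
        num = self._m(self.num, other.den) + self._m(other.num, self.den)
        return Frac(num, self._m(self.den, other.den))

    def value(self):                # the single division, done once at the end
        return self.num * pow(self.den, P - 2, P) % P
```

For example, `((Frac(3, 4) + Frac(5, 6)) * Frac(7, 8) / Frac(9, 10)).value()` performs one addition, one multiplication and one division of fractions, then a single field division at the end.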
Speedup: unify denominators
For elliptic-curve doubling, have
denominator $2y$ in $\lambda = (3x^2 - 3)/2y$;
denominator $(2y)^2$ in $x'' = \lambda^2 - 2x$;
denominator $(2y)^3$ in $y'' = \lambda(x - x'') - y$.
Subsequent computations will perform separate computations on the denominators $(2y)^2, (2y)^3$ of $x'', y''$. Save time by manipulating denominators together.
“Jacobian coordinates”: Store $(x, y, z)$ to represent elliptic-curve point $(x/z^2, y/z^3)$.
$2(x/z^2, y/z^3) = (x'', y'')$ where
$\lambda = (3(x/z^2)^2 - 3)/2(y/z^3) = \alpha/2yz$ with $\alpha = 3x^2 - 3z^4$;
$x'' = \lambda^2 - 2(x/z^2) = (\alpha^2 - 8xy^2)/(2yz)^2$;
$y'' = \lambda((x/z^2) - x'') - (y/z^3) = (12xy^2\alpha - \alpha^3 - 8y^4)/(2yz)^3$.
$2(x/z^2, y/z^3) = (x_2/z_2^2, y_2/z_2^3)$ where
$z_2 = 2yz$,
$\alpha = 3x^2 - 3z^4$,
$x_2 = \alpha^2 - 8xy^2$,
$y_2 = \alpha(4xy^2 - x_2) - 8y^4$.
Easily compute with 6 squarings, $x^2, z^2, z^4, y^2, y^4, \alpha^2$; 3 more mults: $yz$, $xy^2$, $\alpha(\cdots)$. Also some subs, doublings, etc. Use fast field arithmetic: e.g., can delay carries and reductions in computing $y_2$.
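The Jacobian doubling formulas can be checked against the affine ones. In this sketch the function names are hypothetical and the prime is the P-224 prime from earlier. The doubling formulas never use $a_6$, and the agreement between the two versions is an algebraic identity in $x, y, z$, so the sample inputs used to compare them need not be curve points.

```python
P = 2**224 - 2**96 + 1

def double_affine(x, y, p=P):
    """Affine doubling for y^2 = x^3 - 3x + a6: one division."""
    lam = (3 * x * x - 3) * pow(2 * y % p, p - 2, p) % p
    x2 = (lam * lam - 2 * x) % p
    return x2, (lam * (x - x2) - y) % p

def double_jacobian(x, y, z, p=P):
    """Jacobian doubling: 6 squarings, 3 more mults, no division."""
    xx = x * x % p                          # squaring: x^2
    zz = z * z % p                          # squaring: z^2
    z4 = zz * zz % p                        # squaring: z^4
    yy = y * y % p                          # squaring: y^2
    y4 = yy * yy % p                        # squaring: y^4
    alpha = (3 * xx - 3 * z4) % p
    z2 = 2 * (y * z) % p                    # mult: yz
    xyy = x * yy % p                        # mult: x y^2
    x2 = (alpha * alpha - 8 * xyy) % p      # squaring: alpha^2
    y2 = (alpha * (4 * xyy - x2) - 8 * y4) % p   # mult: alpha(...)
    return x2, y2, z2

def to_affine(x, y, z, p=P):
    zinv = pow(z, p - 2, p)
    return x * zinv * zinv % p, y * zinv * zinv * zinv % p
```

Doubling in Jacobian coordinates and then converting should match converting first and doubling in affine coordinates.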
Speedup: difference of squares
Can compute $3x^2 - 3z^4$ as $3(x - z^2)(x + z^2)$. Replace 3 squarings by 1 mult, 1 squaring. Revised total: 4 squarings, 4 more mults.
Note: $3x^2 - 3z^4$ came from $3x^2 - 3$, derivative of $x^3 - 3x + a_6$. Wouldn’t have same speedup for, e.g., $x^3 - 5x + a_6$.
Speedup: $f^2, g^2, 2fg$
After computing $f^2$ and $g^2$ can compute $2fg$ as $(f + g)^2 - f^2 - g^2$. In particular: After computing $y^2$ and $z^2$ can compute $2yz$ as $(y + z)^2 - y^2 - z^2$. Replace 1 mult with 1 squaring. Revised total: 5 squarings, 3 more mults.
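Combining the difference-of-squares trick with the $(y + z)^2$ trick gives a doubling with 5 squarings and 3 more mults. The following is a sketch under the same assumptions as before (hypothetical function name, P-224 prime). Multiplications by the small constants 3, 4, 8 are treated as cheap and not counted.

```python
P = 2**224 - 2**96 + 1

def double_jacobian_fast(x, y, z, p=P):
    """Jacobian doubling for y^2 = x^3 - 3x + a6 with 5 squarings, 3 mults."""
    zz = z * z % p                                # squaring 1: z^2
    yy = y * y % p                                # squaring 2: y^2
    alpha = 3 * ((x - zz) * (x + zz)) % p         # mult 1: 3x^2 - 3z^4
    z2 = ((y + z) * (y + z) - yy - zz) % p        # squaring 3: 2yz
    xyy = x * yy % p                              # mult 2: x y^2
    y4 = yy * yy % p                              # squaring 4: y^4
    x2 = (alpha * alpha - 8 * xyy) % p            # squaring 5: alpha^2
    y2 = (alpha * (4 * xyy - x2) - 8 * y4) % p    # mult 3: alpha(...)
    return x2, y2, z2
```

The outputs should equal the values given by the unoptimized formulas $z_2 = 2yz$, $x_2 = \alpha^2 - 8xy^2$, $y_2 = \alpha(4xy^2 - x_2) - 8y^4$ for every input.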
Explicit addition formulas Similar speedups in formulas for adding distinct points. 5 squarings, 11 more mults. Again some opportunities to delay carries, etc.
Speedup: cache results
In adding $(x_1/z_1^2, y_1/z_1^3)$ to $(x_2/z_2^2, y_2/z_2^3)$, compute many intermediates, including $z_1^2, z_1^3$. Often add same point again to a different point; can reuse $z_1^2, z_1^3$. “Chudnovsky coordinates.”
Speedup: delay fewer divisions?
Faster divisions sometimes justify delaying fewer divisions. e.g. Do we really need fractions for $P, 3P, 5P, 7P$? Can convert $P, 3P, 5P, 7P$ out of Jacobian coordinates with one division, several mults. Then save mults in every addition of $P, 3P, 5P, 7P$. “Mixed coordinates.” Sometimes worthwhile, depending on division speed.
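One well-known way to get “one division, several mults” is simultaneous inversion (often called Montgomery's trick). The slide does not say which method is meant, so treat this choice, and the function names below, as assumptions.

```python
def batch_inverse(values, p):
    """Invert every nonzero element of `values` mod p with a single inversion."""
    prefix = []
    running = 1
    for v in values:
        prefix.append(running)              # product of all earlier values
        running = running * v % p
    inv_running = pow(running, p - 2, p)    # the only division
    inverses = [0] * len(values)
    for i in range(len(values) - 1, -1, -1):
        inverses[i] = inv_running * prefix[i] % p
        inv_running = inv_running * values[i] % p   # drop values[i] from the product
    return inverses

def jacobian_to_affine_batch(points, p):
    """Convert Jacobian (x, y, z) triples to affine (x/z^2, y/z^3) pairs."""
    z_inverses = batch_inverse([z for _, _, z in points], p)
    affine = []
    for (x, y, _), zi in zip(points, z_inverses):
        zi2 = zi * zi % p
        affine.append((x * zi2 % p, y * zi2 * zi % p))
    return affine
```

For four points such as $P, 3P, 5P, 7P$ this costs one inversion plus a handful of multiplications, instead of four inversions.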
Montgomery coordinates
On elliptic curves with “Montgomery form” $y^2 = x^3 + a_2x^2 + x$, preferably with small $(a_2 - 2)/4$: $n(x_1, \ldots) = (x_n/z_n, \ldots)$ where
$z_1 = 1$;
$x_{2m} = (x_m^2 - z_m^2)^2$;
$z_{2m} = 4x_mz_m(x_m^2 + a_2x_mz_m + z_m^2)$;
$x_{2m+1} = 4(x_mx_{m+1} - z_mz_{m+1})^2$;
$z_{2m+1} = 4(x_mz_{m+1} - z_mx_{m+1})^2x_1$.
Can also figure out $y$, or use cryptographic protocols that ignore $y$.
[Diagram: one step of the Montgomery ladder, taking $x_m, z_m, x_{m+1}, z_{m+1}$ to $x_{2m}, z_{2m}, x_{2m+1}, z_{2m+1}$ using additions, subtractions, multiplications, the constant $(a_2 - 2)/4$, and $x_1$.]
Assuming $(a_2 - 2)/4$ small, main operations are 4 squarings, 5 more mults for each bit of $n$. Compare to Jacobian coordinates: each bit of $n$ has 5 squarings, 3 more mults, and on occasion 5 more squarings, 11 more mults. Montgomery form is better if $n$ is not gigantic.
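The Montgomery-form recurrences for $x_{2m}, z_{2m}, x_{2m+1}, z_{2m+1}$ can be sketched as a ladder on Curve25519. This version applies the formulas directly rather than the tuned 4-squaring, 5-mult schedule, branches on the bits of $n$ (so it is not timing-safe), and uses hypothetical names throughout.

```python
P = 2**255 - 19
A2 = 486662      # Curve25519: y^2 = x^3 + 486662 x^2 + x

def ladder(n, x1, p=P, a2=A2):
    """Return x_n / z_n mod p, the x-coordinate of n*(x1, ...), for n >= 0."""
    x1 %= p

    def double(x, z):
        xx, zz, xz = x * x % p, z * z % p, x * z % p
        return ((xx - zz) ** 2 % p, 4 * xz * (xx + a2 * xz + zz) % p)

    def diff_add(xm, zm, xn, zn):
        # Adds mP and (m+1)P, whose difference is P = (x1 : 1).
        return (4 * (xm * xn - zm * zn) ** 2 % p,
                4 * (xm * zn - zm * xn) ** 2 * x1 % p)

    # Invariant: (xa : za) represents kP and (xb : zb) represents (k+1)P.
    xa, za, xb, zb = 1, 0, x1, 1          # k = 0: identity and P
    for bit in bin(n)[2:]:
        added = diff_add(xa, za, xb, zb)  # (2k+1)P
        if bit == "1":
            xa, za, xb, zb = added + double(xb, zb)   # k -> 2k+1
        else:
            xa, za, xb, zb = double(xa, za) + added   # k -> 2k
    return xa * pow(za, p - 2, p) % p     # one division at the very end
```

Because only $x$-coordinates are involved, a Diffie-Hellman exchange can be checked with `ladder(a, ladder(b, 9)) == ladder(b, ladder(a, 9))`, where `9` is used as a sample base $x$-coordinate.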
Choosing curves
Traditional algorithm design: Have a function $f$. Want fastest algorithm that computes $f$.
Cryptographic algorithm design: Have gigantic collection of apparently-safe functions $f$. Want fastest algorithm that computes some $f$.
Elliptic-curve Diffie-Hellman could use any elliptic curve $E$ over any finite field $\mathbf{F}_q$. Some choices of $E, \mathbf{F}_q$ are better than others. Higher speed: easier to compute $n$th multiples in $E(\mathbf{F}_q)$. Higher security: harder to find $n$ given an $n$th multiple, i.e., to solve ECDLP. Lower bandwidth. Etc. How do we choose $E, \mathbf{F}_q$? Which curves are best?
Occasionally an application has different criteria for $E, \mathbf{F}_q$. e.g. Some cryptographic protocols use “pairings” and need specific “embedding degrees.” For simplicity I’ll focus on traditional protocols: Diffie-Hellman, ECDSA, etc. Can also consider, e.g., genus-2 hyperelliptic curves. 2006.09: New speed records, faster than elliptic curves. For simplicity I’ll focus on the elliptic-curve case.
Field size?
The group $E(\mathbf{F}_q)$ has $\approx q$ elements. “Generic” algorithms such as “Pollard’s rho method” solve ECDLP using $\approx q^{1/2}$ simple operations. Highly parallelizable. e.g. $\approx 2^{40}$ simple operations to solve ECDLP if $q \approx 2^{80}$. Reject $q$: too small.