Efficient arithmetic in finite fields
D. J. Bernstein
University of Illinois at Chicago
Some examples of finite fields: Z/(2^255 - 19); (Z/(2^61 - 1))[t]/(t^5 - 3); (Z/223)[t]/(t^37 - 2); (Z/2)[t]/(t^283 - t^12 - t^7 - t^5 - 1). Topic of this talk: How quickly can we add, subtract, multiply in these fields? Answer will depend on platform: AMD Athlon, Sun UltraSPARC IV, Intel 8051, Xilinx Spartan-3, etc. Warning: different platforms often favor different fields!
Why do we care? “Modular exponentiation”: can quickly compute 4^n mod 2^262 - 5081 given n in {0, 1, 2, ..., 2^256 - 1}. Similarly, can quickly compute 4^(mn) mod 2^262 - 5081 given n and 4^m mod 2^262 - 5081. Time-savers: fast field mults, short “addition chains.” “Discrete-logarithm problem”: given 4^n mod 2^262 - 5081, find n. This computation seems harder.
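To make “quickly compute” concrete, here is a minimal square-and-multiply sketch in C. It uses the toy prime 1000003 that appears later in this talk so that every product fits in 64 bits; the helper name powmod is illustrative, and a real implementation for 2^262 - 5081 would run the same loop on top of multiprecision field arithmetic.

#include <stdint.h>
#include <stdio.h>

/* Square-and-multiply: computes base^n mod p with about 2 lg n
   multiplications in Z/p.  Toy modulus only: every product here
   must fit in 64 bits, so p is far smaller than 2^262 - 5081. */
static uint64_t powmod(uint64_t base, uint64_t n, uint64_t p) {
  uint64_t result = 1 % p;
  base %= p;
  while (n > 0) {
    if (n & 1) result = (result * base) % p;  /* multiply step */
    base = (base * base) % p;                 /* square step */
    n >>= 1;
  }
  return result;
}

int main(void) {
  printf("4^123456 mod 1000003 = %llu\n",
         (unsigned long long)powmod(4, 123456, 1000003));
  return 0;
}

The loop above is just the simplest exponent chain; shorter addition chains reduce the multiplication count further.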
Diffie-Hellman secret sharing using p = 2^262 - 5081: [Diagram: Alice has secret key m and publishes the public key 4^m mod p; Bob has secret key n and publishes 4^n mod p. Alice combines her secret m with Bob’s public key, Bob combines his secret n with Alice’s public key, and both arrive at the same shared secret 4^(mn) mod p.] Alice, Bob easily find 4^(mn) mod p. Seems harder for attacker.
Bad news: “Index calculus” solves this DLP at surprising speed! To protect against this attack, replace 2^262 - 5081 with a much larger prime. Much slower arithmetic. Alternative: elliptic-curve cryptography. Replace {1, 2, ..., 2^262 - 5082} with a comparable-size “safe elliptic-curve group.” Somewhat slower arithmetic. Either way, need fast arithmetic in a finite field.
The core question: How to multiply big integers? Child’s answer: Use a polynomial with coefficients in {0, 1, ..., 9} to represent the integer in radix 10. With this representation, multiply integers in two steps: 1. Multiply polynomials. 2. “Carry” extra digits. Polynomial multiplication involves small integers. Have split one big multiplication into many small operations.
Example of representation: 839 = 8·10^2 + 3·10^1 + 9·10^0 = value (at t = 10) of the polynomial 8t^2 + 3t^1 + 9t^0. Squaring: (8t^2 + 3t^1 + 9t^0)^2 = 64t^4 + 48t^3 + 153t^2 + 54t^1 + 81t^0. Carrying: 64t^4 + 48t^3 + 153t^2 + 54t^1 + 81t^0; 64t^4 + 48t^3 + 153t^2 + 62t^1 + 1t^0; 64t^4 + 48t^3 + 159t^2 + 2t^1 + 1t^0; 64t^4 + 63t^3 + 9t^2 + 2t^1 + 1t^0; 70t^4 + 3t^3 + 9t^2 + 2t^1 + 1t^0; 7t^5 + 0t^4 + 3t^3 + 9t^2 + 2t^1 + 1t^0. In other words, 839^2 = 703921.
What operations were used here? [Diagram: the digits 8, 3, 9 are multiplied pairwise (72, 9, 72), the products are added to get 153, the incoming carry 6 is added to get 159, and 159 is split by divide-by-10 and mod-10 into the carry 15 and the digit 9.]
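The same two steps in code: a minimal C sketch (array layout and variable names are my own) that squares the digit polynomial of 839 and then carries, reproducing 839^2 = 703921.

#include <stdio.h>

int main(void) {
  int f[3] = {9, 3, 8};   /* 839 = 8*10^2 + 3*10 + 9, low digit first */
  long h[6] = {0};

  /* 1. Multiply polynomials (here: square). */
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 3; j++)
      h[i + j] += (long)f[i] * f[j];   /* 81, 54, 153, 48, 64, low degree first */

  /* 2. Carry extra digits. */
  for (int i = 0; i < 5; i++) {
    h[i + 1] += h[i] / 10;
    h[i] %= 10;
  }

  for (int i = 5; i >= 0; i--) printf("%ld", h[i]);   /* prints 703921 */
  printf("\n");
  return 0;
}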
Scaled variation: 839 = 800 + 30 + 9 = value (at t = 1) of the polynomial 800t^2 + 30t^1 + 9t^0. Squaring: (800t^2 + 30t^1 + 9t^0)^2 = 640000t^4 + 48000t^3 + 15300t^2 + 540t^1 + 81t^0. Carrying: 640000t^4 + 48000t^3 + 15300t^2 + 540t^1 + 81t^0; 640000t^4 + 48000t^3 + 15300t^2 + 620t^1 + 1t^0; ...; 700000t^5 + 0t^4 + 3000t^3 + 900t^2 + 20t^1 + 1t^0.
What operations were used here? [Diagram: the scaled digits 800, 30, 9 are multiplied pairwise (7200, 900, 7200), the products are added to get 15300, the incoming carry 600 is added to get 15900, and 15900 is split by mod-1000 and a subtraction into 900 and the carry 15000.]
Speedup: double inside squaring. Squaring ... + f_2 t^2 + f_1 t^1 + f_0 t^0 produces coefficients such as f_4 f_0 + f_3 f_1 + f_2 f_2 + f_1 f_3 + f_0 f_4. Compute more efficiently as 2 f_4 f_0 + 2 f_3 f_1 + f_2 f_2. Or, slightly faster, 2(f_4 f_0 + f_3 f_1) + f_2 f_2. Or, slightly faster, (2 f_4) f_0 + (2 f_3) f_1 + f_2 f_2 after precomputing 2 f_1, 2 f_2, .... Have eliminated about 1/2 of the work if there are many coefficients.
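A sketch of the last variant in C, with an illustrative name square5; doubling the lower-index factor rather than the higher-index one is an arbitrary but equivalent choice.

/* Square f_4 t^4 + ... + f_0: each cross product f_i f_j (i < j)
   is computed once, using a precomputed doubled factor, instead of
   appearing twice in the coefficient sums. */
void square5(long h[9], const long f[5]) {
  long f2[5];
  for (int i = 0; i < 5; i++) f2[i] = 2 * f[i];   /* precompute 2 f_i */
  for (int k = 0; k < 9; k++) h[k] = 0;
  for (int i = 0; i < 5; i++) {
    h[2 * i] += f[i] * f[i];                      /* diagonal term f_i f_i */
    for (int j = i + 1; j < 5; j++)
      h[i + j] += f2[i] * f[j];                   /* (2 f_i) f_j for i < j */
  }
}

This uses 15 coefficient mults instead of 25; the saving approaches 1/2 as the number of coefficients grows.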
Speedup: allow negative coeffs. Recall 159 -> 15, 9. Scaled: 15900 -> 15000, 900. Alternative: 159 -> 16, -1. Scaled: 15900 -> 16000, -100. Use digits {-5, -4, ..., 4, 5} instead of {0, 1, ..., 9}. Several small advantages: easily handle negative integers; easily handle subtraction; reduce products a bit.
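A sketch of one balanced carry step in C (the name carry_balanced is mine; it assumes a nonnegative input coefficient, and a full implementation would also round correctly for negative ones).

/* Split a coefficient c_lo into q*10 + r with r in {-5,...,5} by
   rounding the quotient to nearest, so 159 splits as 16, -1
   instead of 15, 9. */
void carry_balanced(long *c_lo, long *c_hi) {
  long q = (*c_lo + 5) / 10;   /* rounded quotient (nonnegative c_lo assumed) */
  *c_hi += q;                  /* pass the carry to the next coefficient */
  *c_lo -= 10 * q;             /* remainder now lies in {-5,...,4} */
}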
Speedup: delay carries. Computing (e.g.) big ab + c^2: multiply a, b polynomials, carry; square c polynomial, carry; add, carry. e.g. a = 314, b = 271, c = 839: (3t^2 + 1t^1 + 4t^0)(2t^2 + 7t^1 + 1t^0) = 6t^4 + 23t^3 + 18t^2 + 29t^1 + 4t^0; carry: 8t^4 + 5t^3 + 0t^2 + 9t^1 + 4t^0. As before, (8t^2 + 3t^1 + 9t^0)^2 = 64t^4 + 48t^3 + 153t^2 + 54t^1 + 81t^0; carry: 7t^5 + 0t^4 + 3t^3 + 9t^2 + 2t^1 + 1t^0. Add: 7t^5 + 8t^4 + 8t^3 + 9t^2 + 11t^1 + 5t^0; carry: 7t^5 + 8t^4 + 9t^3 + 0t^2 + 1t^1 + 5t^0.
Faster: multiply a, b polynomials, square c polynomial, add, carry. (6t^4 + 23t^3 + 18t^2 + 29t^1 + 4t^0) + (64t^4 + 48t^3 + 153t^2 + 54t^1 + 81t^0) = 70t^4 + 71t^3 + 171t^2 + 83t^1 + 85t^0; carry: 7t^5 + 8t^4 + 9t^3 + 0t^2 + 1t^1 + 5t^0. Eliminate intermediate carries. Outweighs cost of handling slightly larger coefficients. Important to carry between multiplications (and squarings) to reduce coefficient size; but carries are usually a bad idea for additions, subtractions, etc.
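A sketch of the faster order of operations in C (array layout mine), computing 314·271 + 839^2 = 789015 with a single carry pass at the end.

#include <stdio.h>

int main(void) {
  int a[3] = {4, 1, 3}, b[3] = {1, 7, 2}, c[3] = {9, 3, 8};  /* 314, 271, 839 */
  long h[7] = {0};

  /* Multiply a,b polynomials and square c polynomial; add the raw
     coefficients without any intermediate carries. */
  for (int i = 0; i < 3; i++)
    for (int j = 0; j < 3; j++)
      h[i + j] += (long)a[i] * b[j] + (long)c[i] * c[j];

  /* One carry pass at the very end. */
  for (int i = 0; i < 6; i++) {
    h[i + 1] += h[i] / 10;
    h[i] %= 10;
  }

  int top = 6;
  while (top > 0 && h[top] == 0) top--;                 /* skip leading zeros */
  for (int i = top; i >= 0; i--) printf("%ld", h[i]);   /* prints 789015 */
  printf("\n");
  return 0;
}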
Speedup: polynomial Karatsuba. Computing product of polys f, g with (e.g.) deg f < 20, deg g < 20: 400 coefficient mults, 361 coefficient adds. Faster: Write f as F_0 + F_1 t^10 with deg F_0 < 10, deg F_1 < 10. Similarly write g as G_0 + G_1 t^10. Then fg = (F_0 + F_1)(G_0 + G_1) t^10 + (F_0 G_0 - F_1 G_1 t^10)(1 - t^10).
20 adds for F_0 + F_1, G_0 + G_1. 300 mults for the three products F_0 G_0, F_1 G_1, (F_0 + F_1)(G_0 + G_1). 243 adds for those products. 9 adds for F_0 G_0 - F_1 G_1 t^10, with subs counted as adds and with delayed negations. 19 adds for ...(1 - t^10). 19 adds to finish. Total 300 mults, 310 adds. Larger coefficients, slight expense; still saves time. Can apply idea recursively as poly degree grows.
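A minimal one-level Karatsuba sketch in C for this size. The name karatsuba20 is mine, and the final recombination below uses the standard arrangement of the same three half-size products; the slide’s arrangement via (F_0 G_0 - F_1 G_1 t^10)(1 - t^10) saves a few more adds.

/* One level of Karatsuba for deg f, deg g < 20: three half-size
   schoolbook products instead of four, i.e. 300 coefficient mults
   instead of 400. */
void karatsuba20(long h[39], const long f[20], const long g[20]) {
  long sf[10], sg[10], m0[19] = {0}, m1[19] = {0}, mm[19] = {0};

  for (int i = 0; i < 10; i++) {             /* F0 + F1 and G0 + G1 */
    sf[i] = f[i] + f[i + 10];
    sg[i] = g[i] + g[i + 10];
  }
  for (int i = 0; i < 10; i++)
    for (int j = 0; j < 10; j++) {
      m0[i + j] += f[i] * g[j];              /* F0 G0 */
      m1[i + j] += f[i + 10] * g[j + 10];    /* F1 G1 */
      mm[i + j] += sf[i] * sg[j];            /* (F0 + F1)(G0 + G1) */
    }

  for (int i = 0; i < 39; i++) h[i] = 0;
  for (int i = 0; i < 19; i++) {
    h[i]      += m0[i];
    h[i + 10] += mm[i] - m0[i] - m1[i];      /* middle coefficients */
    h[i + 20] += m1[i];
  }
}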
Many other algebraic speedups in polynomial multiplication: Toom, FFT, etc. Increasingly important as polynomial degree grows. O(n lg n lg lg n) coeff operations to compute an n-coeff product. Useful for sizes of n that occur in cryptography? Maybe; active research area.
Using CPU’s integer instructions. Replace radix 10 with, e.g., 2^24. Power of 2 simplifies carries. Adapt radix to platform. e.g. Every 2 cycles, Athlon 64 can compute a 128-bit product of two 64-bit integers. (5-cycle latency; parallelize!) Also low cost for 128-bit add. Reasonable to use radix 2^60. Sum of many products of digits fits comfortably below 2^128. Be careful: analyze largest sum.
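A sketch of the radix-2^60 approach in C for 5-digit (up to 300-bit) inputs, assuming a compiler that provides unsigned __int128 (GCC and Clang do); the name mul5 and the digit count are illustrative.

#include <stdint.h>

/* Schoolbook multiplication in radix 2^60: digit products are at
   most 120 bits and at most five of them land in one position, so
   the 128-bit accumulators cannot overflow.  Inputs are assumed to
   have all digits below 2^60. */
void mul5(uint64_t h[10], const uint64_t f[5], const uint64_t g[5]) {
  const uint64_t MASK = ((uint64_t)1 << 60) - 1;
  unsigned __int128 acc[9] = {0};

  for (int i = 0; i < 5; i++)
    for (int j = 0; j < 5; j++)
      acc[i + j] += (unsigned __int128)f[i] * g[j];   /* delayed carries */

  unsigned __int128 carry = 0;
  for (int i = 0; i < 9; i++) {                       /* one carry pass */
    carry += acc[i];
    h[i] = (uint64_t)carry & MASK;
    carry >>= 60;
  }
  h[9] = (uint64_t)carry;                             /* top digit */
}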
e.g. In 4 cycles, Intel 8051 can compute a 16-bit product of two 8-bit integers. Could use radix 2^6. Could use radix 2^8, with 24-bit sums. e.g. Every 2 cycles, Pentium 4 F3 can compute a 64-bit product of two 32-bit integers. (11-cycle latency; yikes!) Reasonable to use radix 2^28. Warning: Multiply instructions are very slow on some CPUs. e.g. Pentium 4 F2: 10 cycles!
Using floating-point instructions. Big CPUs have separate floating-point instructions, aimed at numerical simulation but useful for cryptography. In my experience, floating-point instructions support faster multiplication (often much, much faster) than integer instructions, except on the Athlon 64. Other advantages: portability; easily scaled coefficients.
e.g. Every 2 cycles, Pentium III can compute a 64-bit product of two floating-point numbers, and an independent 64-bit sum. e.g. Every cycle, Athlon can compute a 64-bit product and an independent 64-bit sum. e.g. Every cycle, UltraSPARC III can compute a 53-bit product and an independent 53-bit sum. Reasonable to use radix 2^24. e.g. Pentium 4 can do the same using SSE2 instructions.
How to do carries in floating-point registers? (No CPU carry instruction: not useful for simulations.) Exploit floating-point rounding: add big constant, subtract same constant. e.g. Given α with |α| ≤ 2^75: compute the 53-bit floating-point sum of α and the constant 3·2^75, obtaining a multiple of 2^24; subtract 3·2^75 from the result, obtaining the multiple of 2^24 nearest α; subtract that from α.
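A sketch of the rounding trick in C, assuming IEEE double arithmetic with round-to-nearest and no extended-precision intermediates (e.g. SSE2 math); the volatile qualifiers are only there to keep a compiler from simplifying (x + c) - c to x.

#include <stdio.h>

int main(void) {
  volatile double c = 3.0 * 0x1p75;  /* the constant 3 * 2^75 */
  /* sample coefficient, well below 2^75 in magnitude */
  volatile double x = 123456789.0 * 0x1p24 + 12345.0;

  volatile double hi = (x + c) - c;  /* multiple of 2^24 nearest x */
  double lo = x - hi;                /* remainder, |lo| <= 2^23 */

  printf("hi = %.0f  lo = %.0f\n", (double)hi, lo);
  /* prints hi = 123456789 * 2^24 and lo = 12345 */
  return 0;
}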
Reducing modulo a prime. Fix a prime p. The prime field Z/p is the set {0, 1, 2, ..., p - 1} with - defined as - mod p, + defined as + mod p, · defined as · mod p. e.g. p = 1000003: 1000000 + 50 = 47 in Z/p; -1 = 1000002 in Z/p; 117505 · 23131 = 1 in Z/p.
How to multiply in Z/p? Can use the definition: fg mod p = fg - p⌊fg/p⌋. Can multiply fg by a precomputed 1/p approximation; easily adjust to obtain ⌊fg/p⌋. Slight speedup: “2-adic inverse”; “Montgomery reduction.” We can do better: normally p is chosen with a special form (or dividing a special form; see “redundant representations”) to make fg mod p much faster.
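A sketch of why a special form helps, using p = 2^61 - 1 from the first slide so that a field element fits in one 64-bit word (the name mulmod_m61 is mine; unsigned __int128 support is assumed): since 2^61 mod p = 1, the high 61 bits of a product simply fold onto the low 61 bits.

#include <stdint.h>

/* Multiplication in Z/(2^61 - 1): no division, just a shift,
   a mask, an add, and one conditional subtraction. */
uint64_t mulmod_m61(uint64_t a, uint64_t b) {
  const uint64_t p = ((uint64_t)1 << 61) - 1;
  unsigned __int128 t = (unsigned __int128)a * b;  /* up to 122 bits */
  uint64_t lo = (uint64_t)t & p;                   /* t mod 2^61 */
  uint64_t hi = (uint64_t)(t >> 61);               /* t div 2^61 */
  uint64_t r = lo + hi;                            /* fold: 2^61 = 1 mod p */
  if (r >= p) r -= p;                              /* final conditional subtract */
  return r;
}

The same folding idea, with an extra multiplication by the small constant, applies to primes such as 2^255 - 19 or 2^262 - 5081 spread across several digits.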