

Efficient Binary Field Arithmetic and Applications to Curve-based Cryptography
Diego F. Aranha, Department of Computer Science, University of Brasília
CHES 2012 Tutorial


  1. Part I: RELIC

  2. Numbers
     RELIC is an Efficient LIbrary for Cryptography (http://code.google.com/p/relic-toolkit):
     - Research framework
     - Licensed as free software (LGPL)
     - 11 source code releases
     - 78,000 lines of code
     - 1300 visitors from 74 countries
     - 1500 downloads

     Introduction
     Limitations of other libraries:
     - Restricted portability
     - Uninteresting licensing model
     - Emphasis on standards and commercial algorithms
     Why a new cryptographic library?
     - Organization oriented toward portability
     - Complete control of the licensing model
     - Code sharing and reproducibility of results
     - Focus on research

  3. Organization
     Basic organization:
     - Meta-library
     - Compile-time configuration
     - Inspired by the GNU Multiple Precision Arithmetic Library (GMP)
     Diagram labels: Protocols, Arithmetic backend.

     Breakdown
     Arithmetic backend:
     - Architecture-dependent
     - Rigid interface with upper layers
     - Generic modules available in C and with GMP support
     - 21 functions for multiple precision integer arithmetic, 26 functions for binary fields, 32 functions for prime fields
     Why this organization?
     It is currently possible to obtain competitive timings with the same library on an 8-bit processor with 4KB of RAM and on an 8-core Intel desktop processor.

  4. Breakdown
     Binary field arithmetic:
     - Field size specified at compile time
     - 3 different strategies for squaring, 5 for multiplication, 2 for square root extraction, 2 for half-trace and 6 for inversion
     - Modular reduction by trinomials and pentanomials
     Binary curve arithmetic:
     - Supersingular, Koblitz and ordinary (standardized or not)
     - Affine, projective and mixed coordinate systems
     - 4 different strategies for random point scalar multiplication, 6 for fixed point and 4 for multiple point
     - Symmetric pairings over genus-1 or genus-2 curves

     Breakdown
     Miscellaneous:
     - Support for words of 8, 16, 32 and 64 bits
     - Static, stack, automatic and dynamic memory allocators
     - Helper macros for testing and benchmarking
     - Support for debugging, profiling, tracing and multithreading
     - Abundant Doxygen documentation
     - Deactivation of modules and automatic elimination of algorithms to reduce code size
     - Standard PRNG with configurable seed source
     - Support for FreeBSD, Linux, Mac OS X, Windows
     - Management of configuration and build system with CMake
     - Open collaboration with academia and industry

  5. Part II: Binary fields

     Introduction
     A finite field $\mathbb{F}_{p^m}$ consists of all polynomials with coefficients in $\mathbb{Z}_p$, for a prime $p$, modulo an irreducible polynomial $f(z)$ of degree $m$.
     The prime $p$ is the characteristic of the field and $m$ is the extension degree.
     A binary field $\mathbb{F}_{2^m}$ is the special case $p = 2$ and is formed by polynomials with binary coefficients.

  6. Introduction
     Example: field $\mathbb{F}_{2^8}$
     Irreducible polynomial: $f(z) = z^8 + z^4 + z^3 + z + 1$ = 1 0001 1011
     Representation:
       $a(z) = z^7 + z^3 + 1$ = 1000 1001 = 0x89
       $b(z) = z^6 + z^5 + z^2$ = 0110 0100 = 0x64
     Addition: $a(z) + b(z) = z^7 + z^6 + z^5 + z^3 + z^2 + 1$ = 1110 1101 = 0xED
     Note: $a(z) + a(z) = 2 \cdot a(z) = 0$ for all $a \in \mathbb{F}_{2^m}$.
     Multiplication: $a(z) \times b(z) = z^{13} + z^{12} + z^8 + z^6 + z^2 \bmod f(z) = z^7 + z^5 + z^4 + z^3 + 1$ = 1011 1001 = 0xB9
     Multiplication by $z$: $z \times b(z) = z^7 + z^6 + z^3$ = 1100 1000 = 0xC8 = b ≪ 1
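
The $\mathbb{F}_{2^8}$ example can be checked with a few lines of portable C. This is a minimal sketch, not code from RELIC: the function names `gf256_add`, `gf256_mul_z` and `gf256_mul` are hypothetical, and the field is fixed to the irreducible polynomial $f(z) = z^8 + z^4 + z^3 + z + 1$ from the slide. Multiplication uses the textbook shift-and-add method rather than any of the optimized strategies discussed later.

```c
#include <stdint.h>

/* Addition in F_{2^8}: coefficient-wise XOR of the two bit vectors. */
uint8_t gf256_add(uint8_t a, uint8_t b) {
    return (uint8_t)(a ^ b);
}

/* Multiplication by z: shift left by one bit; if a z^8 term appears,
 * replace it by z^4 + z^3 + z + 1 (0x1B), i.e. reduce modulo f(z). */
uint8_t gf256_mul_z(uint8_t a) {
    uint8_t overflow = (uint8_t)(a & 0x80);
    a = (uint8_t)(a << 1);
    return overflow ? (uint8_t)(a ^ 0x1B) : a;
}

/* Shift-and-add multiplication: for every set bit i of b,
 * accumulate a(z) * z^i, keeping a reduced at each step. */
uint8_t gf256_mul(uint8_t a, uint8_t b) {
    uint8_t c = 0;
    while (b != 0) {
        if (b & 1) {
            c ^= a;
        }
        a = gf256_mul_z(a);
        b >>= 1;
    }
    return c;
}
```

Calling `gf256_add(0x89, 0x64)` and `gf256_mul_z(0x64)` reproduces the addition and multiplication-by-$z$ rows of the example; `gf256_mul` performs the full product followed by reduction modulo $f(z)$.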

  7. Introduction
     Binary fields ($\mathbb{F}_{2^m}$) are omnipresent in cryptography:
     - Efficient curve-based cryptography (ECC, PBC)
     - Post-quantum cryptography
     - Block ciphers
     Many algorithms and optimizations are already described in the literature:
     - Is it possible to unify the fastest ones in a simple formulation?
     - Can such a formulation reflect the state of the art and provide new ideas?

     Objective
     Contributions:
     - Formulation of state-of-the-art binary field arithmetic using vector instructions
     - New strategy for the implementation of multiplication
     - Time-memory trade-offs to compensate for the native multiplier
     - Experimental results

  8. Arsenal
     Intel Core architecture:
     - 128-bit Streaming SIMD Extensions instructions (65/45 nm)
     - Super shuffle engine introduced in the 45 nm series
     - Carry-less multiplier introduced in the Nehalem family
     - 256-bit Advanced Vector Extensions instructions (32 nm)
     Relevant vector instructions:

     Instruction        Description                 Cost     Mnemonic
     MOVDQA             Memory load/store           3/2      ←
     PSLLQ, PSRLQ       64-bit bitwise shifts       1        ≪∤8, ≫∤8
     PXOR, PAND, POR    Bitwise XOR, AND, OR        1        ⊕, ∧, ∨
     PUNPCKLBW/HBW      Byte interleaving           3        interlo/hi
     PSLLDQ, PSRLDQ     128-bit bytewise shift      2 (1)    ≪8, ≫8
     PSHUFB             Byte shuffling              3 (1)    shuffle, lookup
     PALIGNR            Memory alignment            2 (1)    ⊳
     PCLMULQDQ          Carry-less multiplication   10 (8)   ⊗

     New SSSE3 instructions
     PSHUFB instruction (_mm_shuffle_epi8): (figure not preserved)
     Real power: we can implement in parallel any function: (figure not preserved)
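
Since the PSHUFB figures did not survive, a scalar model of the instruction may help. The sketch below follows the documented semantics of `_mm_shuffle_epi8` on 16-byte vectors: each index byte selects one of the 16 table bytes, or forces a zero when its top bit is set. The names `pshufb_model` and `nibble_apply` are hypothetical and exist only for illustration; real code would issue a single vector instruction instead of a loop.

```c
#include <stdint.h>

/* Scalar model of PSHUFB on 16-byte vectors:
 * out[i] = 0                      if bit 7 of idx[i] is set,
 * out[i] = table[idx[i] & 0x0F]   otherwise.
 * The output buffer must not overlap the inputs. */
void pshufb_model(uint8_t out[16], const uint8_t table[16],
                  const uint8_t idx[16]) {
    for (int i = 0; i < 16; i++) {
        out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 0x0F];
    }
}

/* Applies an arbitrary 4-bit to 8-bit function, given as a 16-entry
 * table, to the low nibble of each of the 16 input bytes. */
void nibble_apply(uint8_t out[16], const uint8_t table[16],
                  const uint8_t in[16]) {
    uint8_t idx[16];
    for (int i = 0; i < 16; i++) {
        idx[i] = (uint8_t)(in[i] & 0x0F); /* clear bit 7 and high nibble */
    }
    pshufb_model(out, table, idx);
}
```

Because the table is just another vector register, any function of a 4-bit input can be evaluated on 16 lanes at once, which is what makes 4-bit granular arithmetic attractive.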

  9. New SSSE3 instructions
     Example: bit manipulation (figure not preserved)

     New SSSE3 instructions
     PALIGNR instruction (_mm_alignr_epi8): (figure not preserved)
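
In place of the missing PALIGNR figure, here is a scalar model of the 128-bit form of the instruction, following the documented behaviour of `_mm_alignr_epi8`: the two 16-byte operands are concatenated, the 32-byte value is shifted right by `n` bytes, and the low 16 bytes are kept. The function name `palignr_model` is hypothetical. A plausible use in this setting, stated here as an assumption, is extracting byte-shifted windows that straddle two adjacent vectors of a multi-precision value.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of PALIGNR: treat hi:lo as one 32-byte little-endian
 * value, shift it right by n bytes and keep the low 16 bytes.
 * Bytes shifted in from beyond hi are zero. */
void palignr_model(uint8_t out[16], const uint8_t hi[16],
                   const uint8_t lo[16], uint8_t n) {
    uint8_t tmp[16];
    for (unsigned i = 0; i < 16; i++) {
        unsigned j = i + n;          /* source position in hi:lo */
        if (j < 16) {
            tmp[i] = lo[j];
        } else if (j < 32) {
            tmp[i] = hi[j - 16];
        } else {
            tmp[i] = 0;
        }
    }
    memcpy(out, tmp, 16);            /* tmp allows out to alias an input */
}
```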

  10. Binary field $\mathbb{F}_{2^m}$
      Irreducible polynomial: $f(z)$ (trinomial or pentanomial).
      Polynomial basis: $a(z) \in \mathbb{F}_{2^m}$, $a(z) = \sum_{i=0}^{m-1} a_i z^i$.
      Software representation: vector of $n = \lceil m/64 \rceil$ words (even).
      Graphical representation: (figure not preserved)

      Data types

          #if WORD == 8
          typedef uint8_t dig_t;
          #elif WORD == 16
          typedef uint16_t dig_t;
          #elif WORD == 32
          typedef uint32_t dig_t;
          #elif WORD == 64
          typedef uint64_t dig_t;
          #endif
          typedef __m128i vec_t;

  11. Useful macros

          #define LOAD     _mm_load_si128
          #define STORE    _mm_store_si128
          #define PSHUFB   _mm_shuffle_epi8
          #define XOR      _mm_xor_si128
          #define AND      _mm_and_si128
          #define SHL      _mm_slli_epi64
          #define SHR      _mm_srli_epi64
          #define SHL8     _mm_slli_si128
          #define SHR8     _mm_srli_si128
          #define UNPACKLO _mm_unpacklo_epi8
          #define UNPACKHI _mm_unpackhi_epi8
          #define CLMUL    _mm_clmulepi64_si128

      Proposed representation
      To employ 4-bit granular arithmetic, convert to split form:
        $a_L = \sum_{0 \le i < m,\ 0 \le i \bmod 8 \le 3} a_i z^i$
        $a_H = \sum_{0 \le i < m,\ 4 \le i \bmod 8 \le 7} a_i z^{i-4}$
      (Figure: a vector $A_i$ split into $A_L$ and $A_H$.)
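
The split form can be tried out on a single 64-bit word before moving to 128-bit vectors. The following is a minimal sketch under that assumption: `split_form` and `join_form` are hypothetical names, and the masks are the 64-bit analogues of keeping the low nibble of every byte in $a_L$ and the high nibble, moved down by four bits, in $a_H$.

```c
#include <stdint.h>

#define LO_NIBBLES 0x0F0F0F0F0F0F0F0FULL
#define HI_NIBBLES 0xF0F0F0F0F0F0F0F0ULL

/* Split a word into a_L (low nibble of every byte) and a_H (high
 * nibble of every byte, shifted down so it also sits in bits 0..3). */
void split_form(uint64_t a, uint64_t *lo, uint64_t *hi) {
    *lo = a & LO_NIBBLES;
    *hi = (a & HI_NIBBLES) >> 4;
}

/* Convert back: a(z) = a_H(z) * z^4 + a_L(z). The two parts occupy
 * disjoint bit positions, so XOR and OR give the same result here. */
uint64_t join_form(uint64_t lo, uint64_t hi) {
    return (hi << 4) ^ lo;
}
```

After splitting, every byte of both halves holds a value in the range 0 to 15, which is the form a 16-entry byte lookup needs as its index.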

  12. Proposed representation
      Easy to convert to split form:
        $A_L = A_i \wedge$ 0x0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F
        $A_H = (A_i \wedge$ 0xF0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0$) \gg 4$
      Easy to convert back: $a(z) = a_H(z) z^4 + a_L(z)$.

      Addition/subtraction in $\mathbb{F}_{2^m}$
      $c(z) = a(z) + b(z) = \sum_{i=0}^{m-1} (a_i \oplus b_i) z^i$
      (Figure: digits $A_{n-1}, \ldots, A_0$ and $B_{n-1}, \ldots, B_0$ added position by position to give $C_{n-1}, \ldots, C_0$.)
      Guidelines:
      - Use the XOR instruction with the largest operand size.
      - Verify the impact of higher throughput.

  13. Addition/subtraction in $\mathbb{F}_{2^m}$

          /* Assumes 64-bit digits, an even FB_DIGS and 16-byte aligned operands. */
          void fb_addn_low(dig_t *c, dig_t *a, dig_t *b) {
              int i;
              /* Each iteration processes one 128-bit vector, i.e. two digits. */
              for (i = 0; i < FB_DIGS; i += 2, c += 2, a += 2, b += 2) {
                  vec_t t0 = LOAD((vec_t *)a);
                  vec_t t1 = LOAD((vec_t *)b);
                  t0 = XOR(t0, t1);
                  STORE((vec_t *)c, t0);
              }
          }

      Squaring in $\mathbb{F}_{2^m}$
      $a(z) = \sum_{i=0}^{m-1} a_i z^i = a_{m-1} z^{m-1} + \cdots + a_2 z^2 + a_1 z + a_0$
      $a(z)^2 = \sum_{i=0}^{m-1} a_i z^{2i} = a_{m-1} z^{2m-2} + \cdots + a_2 z^4 + a_1 z^2 + a_0$
      Example:
        $a(z) = (a_{m-1}, a_{m-2}, \ldots, a_2, a_1, a_0)$
        $a(z)^2 = (a_{m-1}, 0, a_{m-2}, 0, \ldots, 0, a_2, 0, a_1, 0, a_0)$

  14. Squaring in $\mathbb{F}_{2^m}$
      Since squaring is a linear operation: $a(z)^2 = a_H(z)^2 \cdot z^8 + a_L(z)^2$.
      We can compute $a_L(z)^2$ and $a_H(z)^2$ with a lookup table.
      For $u = (u_3, u_2, u_1, u_0)$, use $\mathrm{table}(u) = (0, u_3, 0, u_2, 0, u_1, 0, u_0)$. (figure not preserved)

      Proposed squaring in $\mathbb{F}_{2^m}$
      (Figure: $A_i$ is split into $A_L$ and $A_H$; each half indexes the table with entries 00000000, 00000001, 00000100, 00000101, 00010000, 00010001, ..., 01010101 through a lookup; the two results are combined with interlo and interhi into $T_{2i}$ and $T_{2i+1}$.)
      $a(z)^2 = a_L(z)^2 + a_H(z)^2 \cdot z^8$.
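
The table-based squaring can be illustrated with a scalar sketch that squares one 64-bit word into an unreduced 128-bit result. It is not the vectorized routine from the slides: the names `SQR_TABLE` and `sqr_word` are hypothetical, nibbles are looked up one byte at a time instead of 16 at once with PSHUFB, and the interleaving step is replaced by placing each expanded byte directly at its final bit position. Modular reduction is left out.

```c
#include <stdint.h>

/* table(u) = (0, u3, 0, u2, 0, u1, 0, u0): bit k of the nibble u
 * moves to bit 2k of the result. */
const uint8_t SQR_TABLE[16] = {
    0x00, 0x01, 0x04, 0x05, 0x10, 0x11, 0x14, 0x15,
    0x40, 0x41, 0x44, 0x45, 0x50, 0x51, 0x54, 0x55
};

/* Squares the binary polynomial held in one 64-bit word.
 * out[0] receives bits 0..63 and out[1] bits 64..127 of a(z)^2. */
void sqr_word(uint64_t a, uint64_t out[2]) {
    out[0] = 0;
    out[1] = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t byte = (uint8_t)(a >> (8 * i));
        /* Low nibble squares into the low byte, high nibble into the
         * high byte: the z^8 factor in a_H(z)^2 * z^8. */
        uint64_t t = (uint64_t)SQR_TABLE[byte & 0x0F]
                   | ((uint64_t)SQR_TABLE[byte >> 4] << 8);
        /* Byte i of the input expands to bits 16i .. 16i+15. */
        out[i / 4] |= t << (16 * (i % 4));
    }
}
```

The table entries match those visible in the slide residue (00000000, 00000001, 00000100, 00000101, 00010000, 00010001, up to 01010101).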
