
Efficient Software Implementation of Binary Field Arithmetic Using Vector Instruction Sets



  1. Efficient Software Implementation of Binary Field Arithmetic Using Vector Instruction Sets

     Diego F. Aranha, Department of Computer Science, University of Brasília
     Joint work with Julio López, Darrel Hankerson and Francisco Rodríguez-Henríquez

     (Running footer: Aranha, López, Hankerson, Rodríguez-Henríquez, Efficient Binary Field Arithmetic Using Vector Instruction Sets)

  2. Introduction

     Binary fields ($\mathbb{F}_{2^m}$) are omnipresent in cryptography:
     - Efficient curve-based cryptography (ECC, PBC)
     - Post-quantum cryptography
     - Symmetric ciphers

     Many algorithms and optimizations are already described in the literature:
     - Is it possible to unify the fastest ones in a simple formulation?
     - Can such a formulation reflect the state of the art and provide new ideas?

  3. Objective and contributions

     - Formulation of state-of-the-art binary field arithmetic using vector instructions
     - New strategy for the implementation of multiplication
     - Side-channel resistance
     - Time-memory trade-offs to compensate for native multiplier
     - Experimental results

  4. Arsenal

     Intel Core architecture:
     - 128-bit Streaming SIMD Extensions instruction set
     - Super shuffle engine introduced in the 45 nm series

     Relevant vector instructions:

     | Instruction       | Description            | Cost    | Mnemonic                            |
     |-------------------|------------------------|---------|-------------------------------------|
     | MOVDQA            | Memory load/store      | 2.5     | $\leftarrow$                        |
     | PSLLQ, PSRLQ      | 64-bit bitwise shifts  | 1       | $\ll_{\nmid 8}$, $\gg_{\nmid 8}$    |
     | PXOR, PAND, POR   | Bitwise XOR, AND, OR   | 1       | $\oplus$, $\wedge$, $\vee$          |
     | PUNPCKLBW/HBW     | Byte interleaving      | 3       | interlo/hi                          |
     | PSLLDQ, PSRLDQ    | 128-bit bytewise shift | 2 (1)   | $\ll_8$, $\gg_8$                    |
     | PSHUFB            | Byte shuffling         | 3 (1)   | shuffle, lookup                     |
     | PALIGNR           | Memory alignment       | 2 (1)   | $\rhd$                              |

  5. New SSSE3 instructions

     PSHUFB instruction (_mm_shuffle_epi8):
     [Figure not preserved.]

     Real power: we can implement any function in parallel:
     [Figure not preserved.]
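The "lookup" use of PSHUFB can be illustrated with a scalar model. The sketch below is not the authors' code: it emulates the byte-selection rule of PSHUFB on 16-byte vectors in Python, and the names `pshufb` and `POPCOUNT4` are assumptions chosen for illustration. The 16-entry table plays the role of an arbitrary function on 4-bit inputs, evaluated for all 16 lanes at once.

```python
def pshufb(table: bytes, indices: bytes) -> bytes:
    """Scalar model of PSHUFB on 16-byte vectors.

    For each lane: if the top bit of the index byte is set, the result is 0;
    otherwise the result is table[index & 0x0F].
    """
    assert len(table) == 16 and len(indices) == 16
    return bytes(0 if idx & 0x80 else table[idx & 0x0F] for idx in indices)


# Hypothetical example function on 4-bit inputs: population count.
POPCOUNT4 = bytes(bin(u).count("1") for u in range(16))

if __name__ == "__main__":
    nibbles = bytes(range(16))          # one 4-bit value per lane
    print(list(pshufb(POPCOUNT4, nibbles)))
```

Any other function from 4 bits to 8 bits can be substituted by changing the table, which is what the squaring and square-root slides rely on.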

  6. New SSSE3 instructions

     Example: bit manipulation
     [Figure not preserved.]

  7. New SSSE3 instructions

     Example: bit manipulation
     [Figure not preserved.]

  8. New SSSE3 instructions

     PALIGNR instruction (_mm_alignr_epi8):
     [Figure not preserved.]
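Since the figure for PALIGNR is lost, a scalar model may help. This is a sketch under the assumption that 128-bit registers are modelled as Python integers; the name `palignr` is illustrative. It concatenates two 128-bit values, shifts the 256-bit result right by a whole number of bytes, and keeps the low 128 bits.

```python
MASK128 = (1 << 128) - 1


def palignr(hi: int, lo: int, nbytes: int) -> int:
    """Scalar model of PALIGNR: extract 16 bytes from the 32-byte value hi:lo,
    starting nbytes bytes above the least significant byte."""
    assert 0 <= hi <= MASK128 and 0 <= lo <= MASK128 and nbytes >= 0
    return (((hi << 128) | lo) >> (8 * nbytes)) & MASK128


if __name__ == "__main__":
    # The top byte of lo and the bottom byte of hi end up adjacent.
    print(hex(palignr(0xAA, 0xBB << 120, 15)))
```

Applied across a chain of registers, this gives a multi-register shift by a byte count, which is how a shift by 8 of a register vector can be realised.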

  9. Binary field $\mathbb{F}_{2^m}$

     - Irreducible polynomial: $f(z)$ (trinomial or pentanomial).
     - Polynomial basis: $a(z) \in \mathbb{F}_{2^m} = \sum_{i=0}^{m-1} a_i z^i$.
     - Software representation: vector of $n = \lceil m/64 \rceil$ words.
     - Graphical representation: [figure not preserved.]

  10. Proposed representation

      To employ 4-bit granular arithmetic, convert to split form:

      $$a_L = \sum_{\substack{0 \le i < m \\ 0 \le i \bmod 8 \le 3}} a_i z^i, \qquad a_H = \sum_{\substack{0 \le i < m \\ 4 \le i \bmod 8 \le 7}} a_i z^{i-4}.$$

      [Diagram not preserved. Labels: $A_i$, $A_L$, $A_H$.]

  11. Proposed representation

      Easy to convert to split form:

      $A_L = A_i \wedge \texttt{0x0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F}$
      $A_H = (A_i \wedge \texttt{0xF0F0F0F0F0F0F0F0F0F0F0F0F0F0F0F0}) \gg 4$

      Easy to convert back: $a(z) = a_H(z) z^4 + a_L(z)$.
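The conversion to and from split form can be sketched as follows. This is an illustrative model that treats one 128-bit register as a Python integer; the names `to_split` and `from_split` are assumptions, not identifiers from the presentation.

```python
# Masks selecting the low and high nibble of every byte in a 128-bit register.
MASK_LO = int.from_bytes(b"\x0f" * 16, "little")
MASK_HI = int.from_bytes(b"\xf0" * 16, "little")


def to_split(a: int) -> tuple[int, int]:
    """Return (a_L, a_H): low nibbles in place, high nibbles moved down by 4 bits."""
    a_lo = a & MASK_LO
    a_hi = (a & MASK_HI) >> 4
    return a_lo, a_hi


def from_split(a_lo: int, a_hi: int) -> int:
    """Recombine: a(z) = a_H(z) * z^4 + a_L(z). Addition in GF(2)[z] is XOR."""
    return (a_hi << 4) ^ a_lo


if __name__ == "__main__":
    a = 0x0123456789ABCDEFFEDCBA9876543210
    lo, hi = to_split(a)
    print(hex(lo), hex(hi), from_split(lo, hi) == a)
```

After the split, every byte of both halves holds a value below 16, so each byte can be used directly as a table index.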

  12. Squaring in $\mathbb{F}_{2^m}$

      $$a(z) = \sum_{i=0}^{m-1} a_i z^i = a_{m-1} z^{m-1} + \cdots + a_2 z^2 + a_1 z + a_0$$

      $$a(z)^2 = \sum_{i=0}^{m-1} a_i z^{2i} = a_{m-1} z^{2m-2} + \cdots + a_2 z^4 + a_1 z^2 + a_0$$

      Example:

      $a(z) = (a_{m-1}, a_{m-2}, \ldots, a_2, a_1, a_0)$
      $a(z)^2 = (a_{m-1}, 0, a_{m-2}, 0, \ldots, 0, a_2, 0, a_1, 0, a_0)$

  13. Squaring in $\mathbb{F}_{2^m}$

      Since squaring is a linear operation:

      $$a(z)^2 = a_H(z)^2 \cdot z^8 + a_L(z)^2.$$

  14. Squaring in $\mathbb{F}_{2^m}$

      Since squaring is a linear operation:

      $$a(z)^2 = a_H(z)^2 \cdot z^8 + a_L(z)^2.$$

      We can compute $a_L(z)^2$ and $a_H(z)^2$ with a lookup table.
      For $u = (u_3, u_2, u_1, u_0)$, use $\mathit{table}(u) = (0, u_3, 0, u_2, 0, u_1, 0, u_0)$:
      [Figure not preserved.]

  15. Proposed squaring in $\mathbb{F}_{2^m}$

      [Diagram not preserved. Labels: $A_i$, $A_L$, $A_H$, table (00000000, 00000001, 00000100, 00000101, 00010000, 00010001, ..., 01010101), lookup, interhi/interlo, $T_{2i}$, $T_{2i+1}$.]

      $$a(z)^2 = a_L(z)^2 + a_H(z)^2 \cdot z^8.$$
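The table-based squaring can be modelled in scalar form as below. This is a sketch, not the vectorised routine: it walks the operand byte by byte instead of using PSHUFB and byte interleaving, and the names `SQR_TABLE`, `square_poly` and `clmul` are assumptions. Reduction modulo $f(z)$ is left out.

```python
# table(u) = (0, u3, 0, u2, 0, u1, 0, u0): spread 4 bits over the even bit positions.
SQR_TABLE = [sum(((u >> t) & 1) << (2 * t) for t in range(4)) for u in range(16)]


def square_poly(a: int, nbytes: int) -> int:
    """Square a binary polynomial held in nbytes bytes (no modular reduction)."""
    result = 0
    for i in range(nbytes):
        byte = (a >> (8 * i)) & 0xFF
        lo, hi = byte & 0x0F, byte >> 4
        # a_L^2 fills result byte 2i; a_H^2 * z^8 fills result byte 2i + 1.
        result |= SQR_TABLE[lo] << (16 * i)
        result |= SQR_TABLE[hi] << (16 * i + 8)
    return result


def clmul(a: int, b: int) -> int:
    """Reference carry-less (polynomial) product over GF(2), shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


if __name__ == "__main__":
    a = 0x0123456789ABCDEFFEDCBA9876543210
    print(square_poly(a, 16) == clmul(a, a))
```

In the vector version, the two lookups produce whole registers of expanded low and high nibbles, and the interleaving step places them in alternating bytes, which is what the two `<<` offsets imitate here.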

  16. Square root extraction in $\mathbb{F}_{2^m}$

      Algorithm by Fong et al.:

      $$\sqrt{a(z)} = a_{even}(z) + \sqrt{z} \cdot a_{odd}(z)$$

  17. Square root extraction in $\mathbb{F}_{2^m}$

      Algorithm by Fong et al.:

      $$\sqrt{a(z)} = a_{even}(z) + \sqrt{z} \cdot a_{odd}(z)$$

      Since square root is also a linear operation:

      $$\begin{aligned}
      \sqrt{a(z)} &= \sqrt{a_H(z) z^4 + a_L(z)} \\
                  &= \sqrt{a_H(z)}\, z^2 + \sqrt{a_L(z)} \\
                  &= \sqrt{z} \cdot \left(a_{L_{odd}}(z) + a_{H_{odd}}(z) z^2\right) + a_{L_{even}}(z) + a_{H_{even}}(z) z^2
      \end{aligned}$$

      Note: Multiplication by $\sqrt{z}$ ideally requires shifted additions only. If that is not possible, precompute the product by $\sqrt{z}$.

  18. Proposed square root in $\mathbb{F}_{2^m}$

      [Diagram not preserved. Labels: $A_i$, shuffle, $A_L$, $A_H$, table, table $\cdot\, z^2$, lookup, $A_{even}$, $A_{odd}$.]

      $$\sqrt{a(z)} = \sqrt{z} \cdot \left(a_{L_{odd}}(z) + a_{H_{odd}}(z) z^2\right) + a_{L_{even}}(z) + a_{H_{even}}(z) z^2$$
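The even/odd decomposition behind the square root can be checked on a toy field. The sketch below is an assumption-laden illustration: it uses $\mathbb{F}_{2^7}$ with $f(z) = z^7 + z + 1$ (parameters chosen here, not taken from the slides), works on the unsplit representation rather than on $a_L$ and $a_H$, and uses the illustrative names `compress`, `field_mul` and `field_sqrt`. For a trinomial $z^m + z^k + 1$ with $m$ and $k$ odd, $\sqrt{z} = z^{(m+1)/2} + z^{(k+1)/2}$, so multiplying by $\sqrt{z}$ needs only shifted additions.

```python
M, K = 7, 1                       # toy parameters (assumption): f(z) = z^7 + z + 1
F = (1 << M) | (1 << K) | 1
# sqrt(z) = z^((M+1)/2) + z^((K+1)/2) for odd M and odd K; here z^4 + z.
SQRT_Z = (1 << ((M + 1) // 2)) | (1 << ((K + 1) // 2))


def reduce_poly(c: int) -> int:
    """Reduce a binary polynomial modulo f(z), clearing bits from the top down."""
    for i in range(c.bit_length() - 1, M - 1, -1):
        if (c >> i) & 1:
            c ^= F << (i - M)
    return c


def field_mul(a: int, b: int) -> int:
    """Shift-and-add polynomial product followed by reduction modulo f(z)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return reduce_poly(r)


def compress(a: int, parity: int) -> int:
    """Pack coefficients a_(2i + parity) into the polynomial sum of a_(2i + parity) * z^i."""
    r, i = 0, 0
    a >>= parity
    while a:
        r |= (a & 1) << i
        a >>= 2
        i += 1
    return r


def field_sqrt(a: int) -> int:
    """sqrt(a) = a_even + sqrt(z) * a_odd."""
    return compress(a, 0) ^ field_mul(SQRT_Z, compress(a, 1))


if __name__ == "__main__":
    print(all(field_mul(field_sqrt(a), field_sqrt(a)) == a for a in range(1 << M)))
```

In the split-form version, `compress` is replaced by 16-entry table lookups on the nibbles of $a_L$ and $a_H$, with the $z^2$ factor folded into the table used for the high half.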

  19. Multiplication in $\mathbb{F}_{2^m}$

      Three strategies:
      - López-Dahab comb method
      - Shuffle-based multiplication
      - Native multiplication

  20. López-Dahab multiplication in $\mathbb{F}_{2^m}$

      We can compute $u \cdot b(z)$ using shifts and additions.

      If $a(z)$ is divided into 4-bit polynomials, compute $a(z) \cdot b(z)$ by:
      [Figure not preserved.]

  21. López-Dahab multiplication in $\mathbb{F}_{2^m}$

      If the multiplier is represented in split form:

      $$\begin{aligned}
      a(z) \cdot b(z) &= b(z) \cdot \left(a_H(z) z^4 + a_L(z)\right) \\
                      &= b(z) z^4\, a_H(z) + b(z)\, a_L(z)
      \end{aligned}$$

      This is a well-known technique for removing expensive 4-bit shifts!

      Note: The core operation is accumulating $u \times$ dense $b(z)$.

  22. López-Dahab multiplication in $\mathbb{F}_{2^m}$

      Algorithm 1: LD multiplication implemented with $n$ 128-bit registers.

      Input: $a(z) = a[0..n-1]$, $b(z) = b[0..n-1]$.
      Output: $c(z) = c[0..n-1]$.
      Note: $m_i$ denotes the vector of $n/2$ 128-bit registers $(r_{i-1+n/2}, \ldots, r_i)$.

      Compute $T_0(u) = u(z) \cdot b(z)$, $T_1(u) = u(z) \cdot (b(z) z^4)$ for all $u(z)$ of degree $< 4$.
      $(r_{n-1}, \ldots, r_0) \leftarrow 0$
      for $k \leftarrow 56$ downto $0$ by $8$ do
          for $j \leftarrow 1$ to $n-1$ by $2$ do
              Let $u = (u_3, u_2, u_1, u_0)$, where $u_t$ is bit $(k+t)$ of $a[j]$.
              Let $v = (v_3, v_2, v_1, v_0)$, where $v_t$ is bit $(k+t+4)$ of $a[j]$.
              $m_{(j-1)/2} \leftarrow m_{(j-1)/2} \oplus T_0(u)$, $m_{(j-1)/2} \leftarrow m_{(j-1)/2} \oplus T_1(v)$
          end for
          $(r_{n-1}, \ldots, r_0) \leftarrow (r_{n-1}, \ldots, r_0) \rhd 8$
      end for
      for $k \leftarrow 56$ downto $0$ by $8$ do
          for $j \leftarrow 0$ to $n-2$ by $2$ do
              Let $u = (u_3, u_2, u_1, u_0)$, where $u_t$ is bit $(k+t)$ of $a[j]$.
              Let $v = (v_3, v_2, v_1, v_0)$, where $v_t$ is bit $(k+t+4)$ of $a[j]$.
              $m_{j/2} \leftarrow m_{j/2} \oplus T_0(u)$, $m_{j/2} \leftarrow m_{j/2} \oplus T_1(v)$
          end for
          if $k > 0$ then $(r_{n-1}, \ldots, r_0) \leftarrow (r_{n-1}, \ldots, r_0) \rhd 8$
      end for
      return $c = (r_{n-1}, \ldots, r_0) \bmod f(z)$
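A simplified scalar model of the comb method with the two tables $T_0$ and $T_1$ is sketched below. It is not a transcription of Algorithm 1: it uses 64-bit words in a single pass instead of pairing odd and even words in 128-bit registers, it keeps the accumulator in one Python integer, and it omits the final reduction modulo $f(z)$. The names `ld_comb_mul` and `clmul` are assumptions.

```python
W = 64  # word size in bits


def clmul(a: int, b: int) -> int:
    """Reference carry-less (polynomial) product over GF(2), shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def ld_comb_mul(a_words: list[int], b: int) -> int:
    """Comb multiplication with 4-bit windows; a is given as 64-bit words, low word first."""
    # T0[u] = u(z) * b(z) and T1[u] = u(z) * (b(z) * z^4) for all u of degree < 4.
    t0 = [clmul(u, b) for u in range(16)]
    t1 = [clmul(u, b << 4) for u in range(16)]
    c = 0
    for k in range(56, -1, -8):
        for j, word in enumerate(a_words):
            u = (word >> k) & 0xF          # low nibble of byte k/8 of word j
            v = (word >> (k + 4)) & 0xF    # high nibble of the same byte
            c ^= (t0[u] ^ t1[v]) << (W * j)
        if k > 0:
            c <<= 8                        # byte-aligned shift only
    return c


if __name__ == "__main__":
    a_words = [0x0123456789ABCDEF, 0xFEDCBA9876543210]
    b = 0x0F1E2D3C4B5A69788796A5B4C3D2E1F0
    a = a_words[0] | (a_words[1] << W)
    print(ld_comb_mul(a_words, b) == clmul(a, b))
```

Because the high nibble is looked up in a table that already carries the factor $z^4$, the accumulator only ever moves by whole bytes, which is the point of representing the multiplier in split form.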

  23. Shuffle-based multiplication in $\mathbb{F}_{2^m}$

      If both multiplicand and multiplier are represented in split form:

      $$a(z) \cdot b(z) = \left(b_H(z) z^4 + b_L(z)\right) \cdot \left(a_H(z) z^4 + a_L(z)\right)$$

      Using the Karatsuba formula, we can reduce it to 3 multiplications:

      $$a(z) \cdot b(z) = a_H b_H z^8 + \left[(a_H + a_L)(b_H + b_L) + a_H b_H + a_L b_L\right] z^4 + a_L b_L$$

      Note: The core operation is accumulating $u \times$ sparse $b_{L,H}(z)$.

      [Diagram not preserved. Labels: $\times$, $B_0$, $B_1$, ..., $B_{n-1}$.]
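The Karatsuba identity on split-form operands can be verified numerically. The sketch below only checks the algebra: the three sub-products are computed with a generic shift-and-add routine rather than with byte shuffles, and the names `split`, `split_karatsuba_mul` and `clmul` are assumptions.

```python
def clmul(a: int, b: int) -> int:
    """Reference carry-less (polynomial) product over GF(2), shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def split(a: int, nbytes: int) -> tuple[int, int]:
    """Return (a_L, a_H) for an operand of nbytes bytes."""
    mask_lo = int.from_bytes(b"\x0f" * nbytes, "little")
    return a & mask_lo, (a >> 4) & mask_lo


def split_karatsuba_mul(a: int, b: int, nbytes: int) -> int:
    """a*b = aH*bH*z^8 + [(aH+aL)(bH+bL) + aH*bH + aL*bL]*z^4 + aL*bL (no reduction)."""
    a_lo, a_hi = split(a, nbytes)
    b_lo, b_hi = split(b, nbytes)
    hh = clmul(a_hi, b_hi)
    ll = clmul(a_lo, b_lo)
    mid = clmul(a_hi ^ a_lo, b_hi ^ b_lo) ^ hh ^ ll
    return (hh << 8) ^ (mid << 4) ^ ll


if __name__ == "__main__":
    a = 0x0123456789ABCDEFFEDCBA9876543210
    b = 0x0F1E2D3C4B5A69788796A5B4C3D2E1F0
    print(split_karatsuba_mul(a, b, 16) == clmul(a, b))
```

All three sub-products take operands whose bytes hold values below 16, which matches the note that the core operation works on a sparse $b_{L,H}(z)$.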
