Software implementation of pairings
Diego de Freitas Aranha
September 21, 2011
Department of Computer Science, University of Brasília
Joint work with K. Karabina, P. Longa, C. Gebotys, J. López, D. Hankerson, A. Menezes, E. Knapp, F. Rodríguez-Henríquez, L. Fuentes-Castañeda, J.-L. Beuchat, J. Detrey, N. Estibals.
Diego F. Aranha, Software implementation of pairings
Introduction
Pairing-Based Cryptography (PBC) enables many elegant solutions to cryptographic problems:
- Identity-based encryption
- Short signatures
- Non-interactive authenticated key agreement
Pairing computation is the most expensive operation in PBC.
Important: Make it faster!
Objective
Explore new ways to accelerate serial and parallel implementations of cryptographic pairings:
- Maximize throughput
- Minimize latency
Applications: servers, real-time services.
Contributions
- Lazy reduction in extension fields
- Elimination of the penalty for negative parameterizations
- Compressed cyclotomic squarings
- Parallelization of Miller's Algorithm
- Delayed squarings and new formulations
- Notes on high security levels and the current state of the art
Bilinear pairings
Let $\mathbb{G}_1 = \langle P \rangle$ and $\mathbb{G}_2 = \langle Q \rangle$ be additive groups and $\mathbb{G}_T$ be a multiplicative group such that $|\mathbb{G}_1| = |\mathbb{G}_2| = |\mathbb{G}_T| = n$ for a prime $n$.
An efficiently computable map $e : \mathbb{G}_1 \times \mathbb{G}_2 \to \mathbb{G}_T$ is an admissible bilinear map if the following properties are satisfied:
1. Bilinearity: given $(V, W) \in \mathbb{G}_1 \times \mathbb{G}_2$ and $(a, b) \in \mathbb{Z}_n^*$: $e(aV, bW) = e(V, W)^{ab} = e(abV, W) = e(V, abW)$.
2. Non-degeneracy: $e(P, Q) \neq 1_{\mathbb{G}_T}$, where $1_{\mathbb{G}_T}$ is the identity of the group $\mathbb{G}_T$.
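The two properties can be checked exhaustively on a toy map. The sketch below is not a cryptographic pairing: it assumes both source groups are the integers modulo n = 11 under addition and sets e(V, W) = g^(VW) mod 23 with g = 2, which has order 11. Discrete logarithms are trivial in such groups, so the names and parameters are illustrative assumptions only.

```python
# Toy symmetric bilinear map (insecure, for illustration only).
N = 11        # prime group order n
Q_MOD = 23    # modulus of the target group
G = 2         # element of order 11 in (Z/23Z)^*

def e(v: int, w: int) -> int:
    """Map (v, w) in Z_n x Z_n to g^(v*w) in the order-n subgroup."""
    return pow(G, (v * w) % N, Q_MOD)

def check_bilinearity() -> bool:
    """Exhaustively test e(aV, bW) = e(V, W)^(ab) = e(abV, W) = e(V, abW)."""
    for v in range(N):
        for w in range(N):
            for a in range(1, N):
                for b in range(1, N):
                    lhs = e(a * v % N, b * w % N)
                    if lhs != pow(e(v, w), a * b, Q_MOD):
                        return False
                    if lhs != e(a * b * v % N, w):
                        return False
                    if lhs != e(v, a * b * w % N):
                        return False
    return True

bilinear = check_bilinearity()
non_degenerate = e(1, 1) != 1   # generators P = Q = 1
```

Because the toy map uses the same group on both sides, it also illustrates the symmetric case.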
Bilinear pairings
If $\mathbb{G}_1 = \mathbb{G}_2$, the pairing is symmetric.
Barreto-Naehrig curves
Let $u$ be an integer such that $p$ and $n$ below are prime:
$p = 36u^4 + 36u^3 + 24u^2 + 6u + 1$
$n = 36u^4 + 36u^3 + 18u^2 + 6u + 1$
Then $E : y^2 = x^3 + b$, $b \in \mathbb{F}_p$, is a curve of order $n$ and embedding degree $k = 12$.
Example: $u = -(2^{62} + 2^{55} + 1)$, $b = 2$ (implementation-friendly).
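The example parameter can be checked numerically. The following sketch evaluates both polynomials at u = -(2^62 + 2^55 + 1) and runs a Miller-Rabin test; the helper names `bn_params` and `is_probable_prime` are illustrative, and the primality test is probabilistic rather than a proof.

```python
import random

def bn_params(u: int):
    """Return (p, n) for the Barreto-Naehrig family at parameter u."""
    p = 36 * u**4 + 36 * u**3 + 24 * u**2 + 6 * u + 1
    n = 36 * u**4 + 36 * u**3 + 18 * u**2 + 6 * u + 1
    return p, n

def is_probable_prime(m: int, rounds: int = 32) -> bool:
    """Miller-Rabin probabilistic primality test."""
    if m < 2:
        return False
    for small in (2, 3, 5, 7, 11, 13):
        if m % small == 0:
            return m == small
    d, s = m - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    rng = random.Random(0)          # fixed seed for reproducibility
    for _ in range(rounds):
        a = rng.randrange(2, m - 1)
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, m)
            if x == m - 1:
                break
        else:
            return False            # a is a witness of compositeness
    return True

u = -(2**62 + 2**55 + 1)
p, n = bn_params(u)
both_prime = is_probable_prime(p) and is_probable_prime(n)
```

For this u, both values are 254-bit integers, and p is congruent to 3 modulo 4, which is what lets the quadratic extension be built with i^2 = -1.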
Pairing computation
The pairing $e_r(P, Q)$ is defined by the evaluation of $f_{r,P}$ at a divisor related to $Q$.
[Miller 1986] constructed $f_{r,P}$ in stages, combining Miller functions evaluated at divisors.
Pairing computation
Let $l_{U,V}$ be the equation of the line through points $U, V \in E(\mathbb{F}_{q^k})$ and $v_U$ the shorthand for $l_{U,-U}$. For any integers $a$ and $b$, we have:
1. $f_{a+b,P}(D) = f_{a,P}(D) \cdot f_{b,P}(D) \cdot \frac{l_{aP,bP}(D)}{v_{(a+b)P}(D)}$;
2. $f_{2a,P}(D) = f_{a,P}(D)^2 \cdot \frac{l_{aP,aP}(D)}{v_{2aP}(D)}$;
3. $f_{a+1,P}(D) = f_{a,P}(D) \cdot \frac{l_{aP,P}(D)}{v_{(a+1)P}(D)}$.
[Barreto et al. 2002] showed how to evaluate $f_{r,P}$ at $Q$ using the final exponentiation in the Tate pairing.
Pairing computation
Algorithm 1: Miller's Algorithm.
Input: $r = \sum_{i=0}^{\lfloor \log_2 r \rfloor} r_i 2^i$, $P$, $Q$.
Output: $e_r(P, Q)$.
    T ← P
    f ← 1
    for i = ⌊log_2(r)⌋ − 1 downto 0 do
        f ← f^2 · l_{T,T}(Q)
        T ← 2T
        if r_i = 1 then
            f ← f · l_{T,P}(Q)
            T ← T + P
        end if
    end for
    return f^{(q^k − 1)/n}
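Algorithm 1 can be traced end to end on a toy instance. The sketch below makes several assumptions that are not in the slides: the supersingular curve y^2 = x^3 + x over F_59 (60 points, embedding degree 2), the point P = (25, 30) of order 5, the distortion map (x, y) -> (-x, iy) to obtain a second argument, and vertical lines dropped because the final exponentiation removes them. It needs Python 3.8+ for `pow(x, -1, p)`.

```python
# Toy reduced Tate pairing via Miller's algorithm (illustrative parameters).
p = 59    # field prime, p % 4 == 3, so F_{p^2} = F_p[i]/(i^2 + 1)
r = 5     # prime order of P; the curve y^2 = x^3 + x has p + 1 = 60 points

def f2_mul(a, b):
    """Multiply a0 + a1*i and b0 + b1*i in F_{p^2}."""
    return ((a[0]*b[0] - a[1]*b[1]) % p, (a[0]*b[1] + a[1]*b[0]) % p)

def f2_pow(a, e):
    """Square-and-multiply exponentiation in F_{p^2}."""
    result = (1, 0)
    while e:
        if e & 1:
            result = f2_mul(result, a)
        a = f2_mul(a, a)
        e >>= 1
    return result

def ec_add(T, P):
    """Return (T + P, slope) on y^2 = x^3 + x over F_p.
    None is the point at infinity; slope is None for a vertical line."""
    if T is None:
        return P, None
    if P is None:
        return T, None
    (x1, y1), (x2, y2) = T, P
    if x1 == x2 and (y1 + y2) % p == 0:
        return None, None
    if T == P:
        lam = (3*x1*x1 + 1) * pow(2*y1 % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam*lam - x1 - x2) % p
    y3 = (lam*(x1 - x3) - y1) % p
    return (x3, y3), lam

def ec_mul(k, P):
    """Naive scalar multiplication, adequate for tiny scalars."""
    R = None
    for _ in range(k):
        R = ec_add(R, P)[0]
    return R

def phi(P):
    """Distortion map (x, y) -> (-x, i*y), stored as (x, c) meaning (x, c*i)."""
    return ((-P[0]) % p, P[1])

def line(T, lam, Q):
    """Evaluate at Q = (xq, yq*i) the line through T with slope lam."""
    if lam is None:
        return (1, 0)   # vertical lines lie in F_p and are dropped
    xq, yq = Q
    return ((-T[1] - lam*(xq - T[0])) % p, yq % p)

def miller(P, Q):
    """Miller loop of Algorithm 1 over the bits of r below the top one."""
    T, f = P, (1, 0)
    for bit in bin(r)[3:]:
        T_next, lam = ec_add(T, T)
        f = f2_mul(f2_mul(f, f), line(T, lam, Q))
        T = T_next
        if bit == "1":
            T_next, lam = ec_add(T, P)
            f = f2_mul(f, line(T, lam, Q))
            T = T_next
    return f

def tate(P, Q):
    """Miller loop followed by the final exponentiation (p^2 - 1)/r."""
    return f2_pow(miller(P, Q), (p*p - 1) // r)

P = (25, 30)
e_PQ = tate(P, phi(P))
```

With these toy parameters the result is an element of order 5 in F_{59^2}, and bilinearity can be checked directly, for example by comparing `tate(ec_mul(2, P), phi(P))` with `f2_pow(e_PQ, 2)`.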
Asymmetric pairing
$a_{opt} : \mathbb{G}_2 \times \mathbb{G}_1 \to \mathbb{G}_T$
$(Q, P) \mapsto \left( f_{r,Q}(P) \cdot l_{rQ, \pi_p(Q)}(P) \cdot l_{rQ + \pi_p(Q), -\pi_p^2(Q)}(P) \right)^{\frac{p^{12}-1}{n}}$
with $r = 6u + 2$, $\mathbb{G}_1 = E(\mathbb{F}_p)$, $\mathbb{G}_2 = E'(\mathbb{F}_{p^2})[n]$.
The towering is:
- $\mathbb{F}_{p^2} = \mathbb{F}_p[i]/(i^2 - \beta)$, where $\beta = -1$.
- $\mathbb{F}_{p^4} = \mathbb{F}_{p^2}[s]/(s^2 - \xi)$, where $\xi = 1 + i$.
- $\mathbb{F}_{p^6} = \mathbb{F}_{p^2}[v]/(v^3 - \xi)$, where $\xi = 1 + i$.
- $\mathbb{F}_{p^{12}} = \mathbb{F}_{p^4}[t]/(t^3 - s)$ or $\mathbb{F}_{p^6}[w]/(w^2 - v)$.
Generalized lazy reduction
Intuitively, it is a trade-off between addition and modular reduction:
$(a \cdot b \bmod p) + (c \cdot d \bmod p) \equiv (a \cdot b + c \cdot d) \pmod{p}$
Observation: Pairings use non-sparse primes for $\mathbb{F}_p$!
Previous state of the art ($3M + 2R$ in $\mathbb{F}_{p^2}$):
$a \cdot b = (a_0 b_0 + a_1 b_1 \beta) + [(a_0 + a_1)(b_0 + b_1) - a_0 b_0 - a_1 b_1] i$
For $k = 2^i 3^j$, total of $(3^i \cdot 6^j) M + (2 \cdot 3^{i-1} \cdot 6^j) R$.
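The 3M + 2R multiplication in F_{p^2} can be sketched as follows, assuming the BN prime of the earlier example and beta = -1. Python integers stand in for double-precision accumulators and `% P` stands in for a Montgomery reduction; the function names are illustrative.

```python
U = -(2**62 + 2**55 + 1)
P = 36 * U**4 + 36 * U**3 + 24 * U**2 + 6 * U + 1   # BN prime, beta = -1

def fp2_mul_schoolbook(a, b):
    """Schoolbook product: 4 multiplications, each reduced at once (4M + 4R)."""
    a0, a1 = a
    b0, b1 = b
    t00 = a0 * b0 % P
    t11 = a1 * b1 % P
    t01 = a0 * b1 % P
    t10 = a1 * b0 % P
    # The final corrections are single-precision additions, not full reductions.
    return ((t00 - t11) % P, (t01 + t10) % P)

def fp2_mul_lazy(a, b):
    """Karatsuba with lazy reduction: 3 multiplications, 2 reductions."""
    a0, a1 = a
    b0, b1 = b
    t0 = a0 * b0                  # unreduced double-precision products
    t1 = a1 * b1
    t2 = (a0 + a1) * (b0 + b1)
    c0 = (t0 - t1) % P            # a0*b0 + beta*a1*b1 with beta = -1
    c1 = (t2 - t0 - t1) % P       # cross term a0*b1 + a1*b0
    return (c0, c1)
```

In a fixed-width implementation the unreduced sums must stay within the bounds accepted by the reduction routine, which is where the choice of the bit length of p matters.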
Generalized lazy reduction
Idea: Suppose $\mathbb{F}_{p^2}$ is a higher extension and apply recursively!
Any component $c$ of an element in $\mathbb{F}_{p^k}$ is ultimately computed as $c = \sum \pm a_i b_j \bmod p$, requiring a single reduction.
New state of the art: total of $(3^i \cdot 6^j) M + kR$.
Remark 1: Montgomery bounds should be maintained for intermediate results. Choose $|p|$ accordingly.
Remark 2: The same idea applies to arithmetic in $E'(\mathbb{F}_{p^2})$.
Example: Multiplication in $\mathbb{F}_{p^{12}}$ goes from $54M + 36R$ to $54M + 12R$. In total, 40% of reductions are saved.
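The two cost formulas are easy to tabulate. The helper below is a small sketch (the name `mul_cost` is illustrative) that evaluates them for k = 2^i 3^j and reproduces the F_{p^12} example with i = 2, j = 1.

```python
def mul_cost(i: int, j: int):
    """Cost of one multiplication in F_{p^k}, k = 2^i * 3^j, with i >= 1.

    Returns (k, multiplications, reductions before, reductions with
    generalized lazy reduction), all counted in F_p operations.
    """
    k = 2**i * 3**j
    mults = 3**i * 6**j
    reductions_before = 2 * 3**(i - 1) * 6**j
    reductions_lazy = k
    return k, mults, reductions_before, reductions_lazy

k, m, r_old, r_new = mul_cost(2, 1)   # F_{p^12}
```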
Removing the inversion penalty
Consider $(p^{12} - 1)/n = (p^6 - 1)(p^2 + 1)(p^4 - p^2 + 1)/n$.
The hard part is $(p^4 - p^2 + 1)/n$, which requires three $|u|$-th powers.
If $u < 0$, from the pairing definition:
$a_{opt}(Q, P) = \left[ f_{|r|,Q}(P)^{-1} \cdot h \right]^{\frac{p^{12}-1}{n}}$.
By distributing the power $(p^{12} - 1)/n$, we can compute instead:
$a_{opt}(Q, P) = \left[ f_{|r|,Q}(P)^{p^6} \cdot h \right]^{\frac{p^{12}-1}{n}}$.
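The integer facts behind this slide can be verified directly. The sketch below assumes the BN example parameter and checks the factorization of the exponent, and also that n divides p^6 + 1, which is why raising to p^6 and inverting agree after the final exponentiation. The function name is illustrative.

```python
u = -(2**62 + 2**55 + 1)
p = 36 * u**4 + 36 * u**3 + 24 * u**2 + 6 * u + 1
n = 36 * u**4 + 36 * u**3 + 18 * u**2 + 6 * u + 1

def final_exponent_parts(p: int, n: int):
    """Split (p^12 - 1)/n into the easy part and the hard part."""
    full, rem_full = divmod(p**12 - 1, n)
    easy = (p**6 - 1) * (p**2 + 1)
    hard, rem_hard = divmod(p**4 - p**2 + 1, n)
    if rem_full or rem_hard:
        raise ValueError("n must divide p^4 - p^2 + 1")
    return full, easy, hard

full, easy, hard = final_exponent_parts(p, n)

# f^(p^6) and f^(-1) agree after powering by `full` exactly when
# (p^6 + 1) * full is a multiple of the group order p^12 - 1.
inverse_is_free = ((p**6 + 1) * full) % (p**12 - 1) == 0
```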
Revised pairing computation
Algorithm 2: Miller's Algorithm for general $r$, even $k$.
Input: $r = \sum_{i=0}^{\lfloor \log_2 r \rfloor} r_i 2^i$, $P$, $Q$.
Output: $e_r(P, Q)$.
    T ← P
    f ← 1
    for i = ⌊log_2(r)⌋ − 1 downto 0 do
        f ← f^2 · l_{T,T}(Q)
        T ← 2T
        if r_i = 1 then
            f ← f · l_{T,P}(Q)
            T ← T + P
        end if
    end for
    if u < 0 then T ← −T, f ← f^{q^{k/2}}
    return f^{(q^k − 1)/n}
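For the BN example, the loop runs over the bits of |r| with r = 6u + 2 negative. A rough sketch of the resulting step counts, assuming the plain binary expansion used in the algorithm as written (a signed-digit recoding would change the addition count):

```python
def miller_loop_cost(r: int):
    """Count doubling and addition steps of the binary Miller loop for |r|."""
    bits = bin(abs(r))[3:]        # drop the '0b' prefix and the leading 1 bit
    return len(bits), bits.count("1")

u = -(2**62 + 2**55 + 1)
r = 6 * u + 2                     # negative, so the loop uses |r|
doublings, additions = miller_loop_cost(r)
```

The low Hamming weight of |r| is what keeps the number of addition steps small for this choice of u.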
Compressed cyclotomic squarings
Consider $\mathbb{F}_{p^{12}} = \mathbb{F}_{p^4}[t]/(t^3 - s)$.
Let $g = \sum_{i=0}^{2} (g_{2i} + g_{2i+1} s) t^i \in G_{\phi_6}(\mathbb{F}_{p^2})$ and $g^2 = \sum_{i=0}^{2} (h_{2i} + h_{2i+1} s) t^i$ with $g_i, h_i \in \mathbb{F}_{p^2}$.
Given $C(g) = [g_2, g_3, g_4, g_5]$, it is efficient to compute $C(g^2) = [h_2, h_3, h_4, h_5]$.
Important: The decompression map $D$ requires one inversion in $\mathbb{F}_{p^2}$.
Compressed cyclotomic squarings
Recall that $|u| = 2^{62} + 2^{55} + 1$.
Idea: $g^{|u|}$ can now be computed in three steps:
1. Compute $C(g^{2^i})$ for $1 \le i \le 62$ and store $C(g^{2^{55}})$ and $C(g^{2^{62}})$.
2. Compute $D(C(g^{2^{55}})) = g^{2^{55}}$ and $D(C(g^{2^{62}})) = g^{2^{62}}$.
3. Compute $g^{|u|} = g^{2^{62}} \cdot g^{2^{55}} \cdot g$.
Remark: Montgomery's simultaneous inversion allows simultaneous decompression.
Example: Computing a $|u|$-th power is now 30% faster.
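The three steps amount to an addition chain with 62 squarings and 2 multiplications. The sketch below shows only that chain, using integers modulo m as a stand-in group: the squarings are ordinary modular squarings, not compressed ones, and the saved values play the role of the two decompressed elements. The function name is illustrative.

```python
def pow_abs_u(g: int, m: int) -> int:
    """Compute g^(2^62 + 2^55 + 1) mod m with 62 squarings and 2 multiplications."""
    saved = {}
    x = g % m
    for i in range(1, 63):
        x = x * x % m             # stand-in for one compressed squaring
        if i in (55, 62):
            saved[i] = x          # stand-in for decompressing C(g^(2^i))
    return saved[62] * saved[55] % m * g % m

M = 2**127 - 1
result = pow_abs_u(3, M)
```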
Implementation results
Table: Operation counts for different implementations of the Optimal Ate pairing at the 128-bit security level (ML: Miller loop, FE: final exponentiation).
Work                  Phase    Operations in F_p
Beuchat et al. 2010   ML       6992M + 5040R
                      FE       4647M + 4244R
                      ML+FE    11639M + 9284R
Aranha et al. 2011    ML       6504M + 2736R
                      FE       3648M + 1926R
                      ML+FE    10152M + 4662R
[Pereira et al. 2011] has a slightly lower operation count, but yields a slower implementation on the target platform.
Implementation results
Table: Timings in cycles for the asymmetric setting on 64-bit processors.
Beuchat et al. 2010
Operation           Phenom II    Core i7      Opteron      Core 2 Duo
Mult in F_{p^2}     440          435          443          590
Squaring in F_{p^2} 353          342          355          479
Miller Loop         1,338,000    1,330,000    1,360,000    1,781,000
Final Exp.          1,020,000    1,000,000    1,040,000    1,370,000
Pairing             2,358,000    2,330,000    2,400,000    3,151,000
Aranha et al. 2011
Operation           Phenom II    Core i5      Opteron      Core 2 Duo
Mult in F_{p^2}     368          412          390          560
Squaring in F_{p^2} 288          328          295          451
Miller Loop         898,000      978,000      988,000      1,275,000
Final Exp.          664,000      710,000      722,000      919,000
Pairing             1,562,000    1,688,000    1,710,000    2,194,000
Improvement         34%          28%          29%          30%
Important: Latency of around 0.5 milliseconds on a 3 GHz Phenom II X4.
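The improvement row and the latency figure follow from the pairing timings by simple arithmetic, sketched below. The dictionary keys are illustrative labels; the second column compares a Core i7 (2010) against a Core i5 (2011), as in the tables.

```python
pairing_2010 = {"Phenom II": 2_358_000, "Core i7/i5": 2_330_000,
                "Opteron": 2_400_000, "Core 2 Duo": 3_151_000}
pairing_2011 = {"Phenom II": 1_562_000, "Core i7/i5": 1_688_000,
                "Opteron": 1_710_000, "Core 2 Duo": 2_194_000}

# Relative reduction in cycle count, rounded to whole percent.
improvement = {
    cpu: round(100 * (1 - pairing_2011[cpu] / pairing_2010[cpu]))
    for cpu in pairing_2010
}

# Latency of one pairing on a 3 GHz Phenom II, in milliseconds.
latency_ms = pairing_2011["Phenom II"] / 3e9 * 1e3
```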
Parallelization
Property of Miller functions:
$f_{a \cdot b, P}(D) = f_{b,P}(D)^a \cdot f_{a, bP}(D)$
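One way this property can be used to split the Miller loop, given here as a hedged sketch rather than the talk's exact formulation: write $r = 2^w r_1 + r_0$ for some split position $w$ (an assumed parameter), apply the addition rule for Miller functions to the two summands, and then apply the property above with $a = 2^w$, $b = r_1$.

```latex
\begin{align*}
f_{r,P}(D)
  &= f_{2^w r_1,P}(D) \cdot f_{r_0,P}(D) \cdot
     \frac{l_{2^w r_1 P,\, r_0 P}(D)}{v_{rP}(D)} \\
  &= f_{r_1,P}(D)^{2^w} \cdot f_{2^w,\, r_1 P}(D) \cdot f_{r_0,P}(D) \cdot
     \frac{l_{2^w r_1 P,\, r_0 P}(D)}{v_{rP}(D)}
\end{align*}
```

The factors $f_{r_1,P}$, $f_{2^w, r_1 P}$ and $f_{r_0,P}$ depend on disjoint parts of the expansion of $r$, so they can in principle be evaluated by different cores, at the cost of first computing $r_1 P$ and of the extra squarings in $f_{r_1,P}(D)^{2^w}$.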