faster implementation of pairings

Faster Implementation of Pairings, by Francisco Rodríguez-Henríquez

ECC 2010, Redmond, USA. Faster Implementation of Pairings. Francisco Rodríguez-Henríquez (CINVESTAV, IPN, Mexico City, Mexico). Joint work with: Jean-Luc Beuchat (LCIS, University of Tsukuba, Japan), Nicolas Brisebarre (Arénaire, LIP)


  2. Security considerations for symmetric pairings
      $\hat{e} : E(\mathbb{F}_{p^m})[\ell] \times E(\mathbb{F}_{p^m})[\ell] \to \mu_\ell \subseteq \mathbb{F}_{p^{km}}^{\times}$
      The discrete logarithm problem should be hard in both $\mathbb{G}_1$ and $\mathbb{G}_T$.
      | Base field $\mathbb{F}_{p^m}$      | $\mathbb{F}_{2^m}$ | $\mathbb{F}_{3^m}$ |
      | Lower security ($\sim 2^{64}$)     | $m = 239$          | $m = 97$           |
      | Medium security ($\sim 2^{80}$)    | $m = 373$          | $m = 163$          |
      | Higher security ($\sim 2^{128}$)   | $m = 1103$         | $m = 503$          |
      $\mathbb{F}_{2^m}$: simpler finite field arithmetic
      $\mathbb{F}_{3^m}$: smaller field extension

  7. Computation of the Tate pairing
      $\hat{e} : E(\mathbb{F}_{p^m})[\ell] \times E(\mathbb{F}_{p^m})[\ell] \to \mu_\ell \subseteq \mathbb{F}_{p^{km}}^{\times}$
      Arithmetic over $\mathbb{F}_{p^m}$:
      ◮ polynomial basis: $\mathbb{F}_{p^m} \cong \mathbb{F}_p[x]/(f(x))$
      ◮ $f(x)$, degree-$m$ polynomial irreducible over $\mathbb{F}_p$
      Arithmetic over $\mathbb{F}_{p^{km}}^{\times}$:
      ◮ tower-field representation
      ◮ only arithmetic over the underlying field $\mathbb{F}_{p^m}$
      Operations over $\mathbb{F}_{p^m}$:
      ◮ $O(m)$ additions / subtractions
      ◮ $O(m)$ multiplications
      ◮ $O(m)$ Frobenius maps ($a \mapsto a^p$, i.e. squarings or cubings)
      ◮ 1 inversion
      A first idea: an all-in-one unified operator:
      ◮ shared resources
      ◮ scalable architecture
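
The polynomial-basis arithmetic listed on this slide can be sketched in software. The toy model below is an illustration only, not the authors' operator: it assumes $p = 3$, a tiny degree $m = 3$, and the polynomial $f(x) = x^3 + 2x + 1$ (irreducible over $\mathbb{F}_3$ because it is a cubic with no root in $\{0, 1, 2\}$). All function names are hypothetical.

```python
from itertools import product

M = 3                 # toy extension degree (an assumption; the talk targets m = 97 and larger)
F = [1, 2, 0, 1]      # f(x) = x^3 + 2x + 1, coefficients stored lowest degree first

def add(a, b):
    """Coefficient-wise addition in F_3."""
    return [(x + y) % 3 for x, y in zip(a, b)]

def mul(a, b):
    """Schoolbook product followed by reduction modulo f(x)."""
    prod = [0] * (2 * M - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % 3
    # Cancel the high-degree terms one at a time (f is monic).
    for d in range(2 * M - 2, M - 1, -1):
        c = prod[d]
        if c:
            for k in range(M + 1):
                prod[d - M + k] = (prod[d - M + k] - c * F[k]) % 3
    return prod[:M]

def frobenius(a):
    """Frobenius map a -> a^3 (a cubing in characteristic 3)."""
    return mul(mul(a, a), a)

def power(a, e):
    """Square-and-multiply exponentiation."""
    result = [1] + [0] * (M - 1)
    while e:
        if e & 1:
            result = mul(result, a)
        a = mul(a, a)
        e >>= 1
    return result

ELEMENTS = [list(c) for c in product(range(3), repeat=M)]
ONE = [1] + [0] * (M - 1)
```

In this 27-element field every nonzero element `a` satisfies `power(a, 26) == ONE`, and `frobenius` is additive, which is what makes cubing a cheap linear operation in hardware.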

  9. Motivations
      High speed is more important than low resources for some cryptographic applications.
      Explore the other end of the area vs. time tradeoff:
      ◮ faster but larger than the unified operator
      ◮ what about the area-time product?
      Accelerate the computation by extracting as much parallelism as possible, without dramatically increasing the resource requirements.

  16. Computation of the $\eta_T$ pairing
      The Tate pairing over $E(\mathbb{F}_{p^m})$ is computed in two main steps:
      $\hat{e}(P, Q) = \eta_T(P, Q)^M$
      Computation of the $\eta_T$ pairing:
      ◮ via Miller's algorithm: loop of $(m+1)/2$ iterations
      ◮ result only defined modulo $N$-th powers in $\mathbb{F}_{p^{km}}^{\times}$, with $N = \#E(\mathbb{F}_{p^m})$
      Final exponentiation by $M = (p^{km} - 1)/N$:
      ◮ required to obtain a unique value for each congruence class
      ◮ example in characteristic 3 ($k = 6$ and $N = 3^m + 1 \pm 3^{(m+1)/2}$):
        $M = \dfrac{3^{6m} - 1}{3^m + 1 \pm 3^{(m+1)/2}} = (3^{3m} - 1)(3^m + 1)\left(3^m + 1 \mp 3^{(m+1)/2}\right)$
      ◮ exploit the special form of the exponent: ad-hoc algorithm
      Two distinct computational requirements ⇒ use two distinct coprocessors
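
The factorization of the characteristic-3 exponent can be checked with exact integer arithmetic. This is a small sanity check rather than part of the talk. The function name and the `sign` parameter are assumptions: `sign` selects which of the two group orders $N = 3^m + 1 \pm 3^{(m+1)/2}$ is used, and the curve that corresponds to each sign is not modelled here.

```python
def char3_exponent(m, sign):
    """Return (remainder, quotient, factored form) for N = 3^m + 1 + sign * 3^((m+1)/2)."""
    if m % 2 == 0 or sign not in (1, -1):
        raise ValueError("m must be odd and sign must be +1 or -1")
    q = 3 ** m
    s = 3 ** ((m + 1) // 2)
    n = q + 1 + sign * s
    # Direct definition: M = (3^(6m) - 1) / N.
    quotient, remainder = divmod(3 ** (6 * m) - 1, n)
    # Factored form from the slide, with the opposite sign in the last factor.
    factored = (3 ** (3 * m) - 1) * (q + 1) * (q + 1 - sign * s)
    return remainder, quotient, factored

for m in (97, 193, 313, 503):
    for sign in (1, -1):
        remainder, quotient, factored = char3_exponent(m, sign)
        assert remainder == 0 and quotient == factored
```

The identity behind it is $3^{2m} - 3^m + 1 = (3^m + 1)^2 - 3^{m+1}$, a difference of two squares when $m$ is odd.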

  23. Reduced Tate pairing
      [Diagram: $E(\mathbb{F}_{3^m})[\ell] \times E(\mathbb{F}_{3^m})[\ell]$ → non-reduced pairing (iterative algorithm) → $\mathbb{F}_{3^{6m}}^{\times}$ → final exponentiation (irregular computation) → $\mu_\ell \subseteq \mathbb{F}_{3^{6m}}^{\times}$]
      ◮ Input: two points $P$ and $Q$ in $E(\mathbb{F}_{3^m})[\ell]$
      ◮ Output: an $\ell$-th root of unity in the extension $\mathbb{F}_{3^{6m}}^{\times}$
      Two very different steps

  27. Two coprocessors for the $\eta_T$ pairing
      The two operations are purely sequential.
      Only one active coprocessor at every moment.
      Pipeline the data between the two coprocessors:
      ◮ both of them are kept busy
      ◮ higher throughput
      Balance the computation time between the two coprocessors.
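
A rough timing model shows why pipelining and balancing matter. The sketch below is not taken from the talk: the function name and the cycle counts in the usage lines are illustrative assumptions. It compares running the two stages back to back with overlapping the final exponentiation of one pairing and the Miller loop of the next.

```python
def pipeline_timing(t_miller, t_final_exp, n_pairings):
    """Total cycles for n pairings: (sequential, two-stage pipelined)."""
    # Sequential: each pairing waits for both stages to finish.
    sequential = n_pairings * (t_miller + t_final_exp)
    # Pipelined: the first result takes both stages, then one result
    # comes out every max(stage time) cycles.
    pipelined = t_miller + t_final_exp + (n_pairings - 1) * max(t_miller, t_final_exp)
    return sequential, pipelined

balanced = pipeline_timing(850, 850, 10)     # hypothetical cycle counts
unbalanced = pipeline_timing(850, 100, 10)   # slow stage dominates
```

With balanced stages the steady-state throughput doubles, while in the unbalanced case the faster coprocessor mostly idles, which is why the final-exponentiation unit can be made small as long as it keeps pace.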

  29. $\eta_T$ pairing algorithm
      $\eta_T : E(\mathbb{F}_{3^m})[\ell] \times E(\mathbb{F}_{3^m})[\ell] \to \mathbb{F}_{3^{6m}}^{\times}$
      Three tasks per iteration:
      ➀ update the coordinates
      ➁ compute the line equation
      ➂ accumulate the new factor
      Total cost: 17 ×, 4 Frobenius/inverse Frobenius and 30 + over $\mathbb{F}_{3^m}$
      Cost of the inverse Frobenius: same as the Frobenius

  32. Accelerating the $\eta_T$ pairing
      Total cost: 17 ×, 2 Frobenius and 2 inverse Frobenius, and 30 + over $\mathbb{F}_{3^m}$ per iteration
      ◮ Frobenius/inverse Frobenius and +: cheap and fast operations
      ◮ critical operation: ×
      Need for a fast parallel multiplier: Karatsuba

  36. A parallel Karatsuba multiplier
      ◮ fully parallel: all sub-products are computed in parallel
      ◮ pipelined architecture: higher clock frequency, one product per cycle
      ◮ sub-products recursively implemented as Karatsuba-Ofman multipliers
      ◮ support for other variants: odd-even split, 3-way split, ...
      ◮ final reduction modulo the irreducible polynomial $f$
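
The recursion behind such a multiplier can be sketched in software. The code below is only a sequential model under stated assumptions: coefficients live in $\mathbb{F}_3$, the split is the plain two-way (low/high) variant, and the `threshold` parameter and function names are hypothetical. A hardware design computes the three sub-products in parallel and pipelines them, and the final reduction modulo $f$ is not shown.

```python
import random

def schoolbook(a, b):
    """Reference product of two coefficient lists over F_3 (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % 3
    return out

def karatsuba(a, b, threshold=4):
    """Two-way Karatsuba product of two equal-length coefficient lists over F_3."""
    n = len(a)
    if n <= threshold:
        return schoolbook(a, b)
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    pad = [0] * (len(a1) - h)              # the high half may be one coefficient longer
    sa = [(x + y) % 3 for x, y in zip(a0 + pad, a1)]
    sb = [(x + y) % 3 for x, y in zip(b0 + pad, b1)]
    lo = karatsuba(a0, b0, threshold)      # a0 * b0
    hi = karatsuba(a1, b1, threshold)      # a1 * b1
    mid = karatsuba(sa, sb, threshold)     # (a0 + a1) * (b0 + b1)
    out = [0] * (2 * n - 1)
    for i, c in enumerate(lo):
        out[i] += c
        out[i + h] -= c                    # middle term: mid - lo - hi
    for i, c in enumerate(hi):
        out[i + 2 * h] += c
        out[i + h] -= c
    for i, c in enumerate(mid):
        out[i + h] += c
    return [c % 3 for c in out]

random.seed(1)
a = [random.randrange(3) for _ in range(97)]
b = [random.randrange(3) for _ in range(97)]
assert karatsuba(a, b) == schoolbook(a, b)
```

Each level trades one sub-product for a few extra additions, which suits characteristic 3 where additions are cheap.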

  39. Accelerating the $\eta_T$ pairing
      $\eta_T$ coprocessor based on a single large multiplier:
      ◮ parallel Karatsuba architecture
      ◮ 7-stage pipeline
      ◮ one product per cycle
      Challenge: keep the multiplier busy at all times
      Careful scheduling to avoid pipeline bubbles (idle cycles):
      ◮ ensure that multiplication operands are always available
      ◮ avoid memory congestion issues
      We managed to accomplish that: our processor computes the Miller loop in just $17 \cdot (m + 3)/2$ clock cycles (including the initialization phase)
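
As a worked instance of the cycle count above, the formula can be evaluated for a few odd extension degrees. The values of $m$ are chosen here for illustration, and $C(m)$ is a hypothetical name for the cycle count:

```latex
\[
  C(m) = 17 \cdot \frac{m+3}{2}, \qquad
  C(97) = 17 \cdot 50 = 850, \quad
  C(193) = 17 \cdot 98 = 1666, \quad
  C(313) = 17 \cdot 158 = 2686 .
\]
```

The factor 17 matches the 17 multiplications per iteration, so the count corresponds to one multiplication issued on every cycle.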

  40. A parallel operator for the $\eta_T$ pairing

  45. The final exponentiation
      Compute $\hat{e}(P, Q)$ as $\eta_T(P, Q)^M$ with $\eta_T(P, Q) \in \mathbb{F}_{3^{6m}}^{\times}$ and
      $M = (3^{3m} - 1)(3^m + 1)\left(3^m + 1 \mp 3^{(m+1)/2}\right)$
      Operations over $\mathbb{F}_{3^m}$: 73 ×, $3m + 3$ Frobenius, $3m + 175$ +, and 1 inversion ($\sim \log m$ × and $m - 1$ Frobenius)
      Cost of the $\eta_T$ pairing:
      ◮ $(m + 1)/2$ iterations
      ◮ 17 ×, 10 Frobenius and 30 + over $\mathbb{F}_{3^m}$ per iteration
      The final exponentiation is much cheaper than the $\eta_T$ pairing
      Challenge for the final exponentiation:
      ◮ computation in the same time as the $\eta_T$ pairing
      ◮ ... using as few resources as possible
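
To make the comparison concrete, the operation counts quoted on this slide can be tabulated for a given $m$. The helper names are hypothetical, and the inversion is counted separately rather than expanded into its multiplications and Frobenius maps.

```python
def miller_loop_cost(m):
    """Operation counts over F_{3^m} for the eta_T loop, per the slide's figures."""
    iterations = (m + 1) // 2
    return {"mul": 17 * iterations, "frobenius": 10 * iterations, "add": 30 * iterations}

def final_exp_cost(m):
    """Operation counts over F_{3^m} for the final exponentiation, per the slide."""
    return {"mul": 73, "frobenius": 3 * m + 3, "add": 3 * m + 175, "inv": 1}

loop = miller_loop_cost(97)     # 49 iterations
fexp = final_exp_cost(97)
ratio = loop["mul"] / fexp["mul"]
```

For $m = 97$ the loop needs 833 multiplications against 73 for the exponentiation, a ratio above 11, which is the slack that lets the second coprocessor be much smaller than the first.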

  49. The final exponentiation
      Design the smallest possible architecture supporting all the required operations over $\mathbb{F}_{3^m}$:
      ◮ purely sequential scheduling
      ◮ although some parallelism is required
      We found that using the inverse Frobenius operator is advantageous for computing the final exponentiation (as long as the irreducible polynomials are inverse-Frobenius friendly).
      New coprocessor with two arithmetic units:
      ◮ a standalone multiplier, based on a parallel-serial scheme
      ◮ a unified operator supporting addition/subtraction, inverse Frobenius map and inverse double Frobenius map

  50. A coprocessor for the final exponentiation

  51. Agenda
      1 Context
      2 Hardware accelerator for the Tate pairing over supersingular curves
        ◮ Implementation Results in Hardware
      3 Software accelerator for the Tate pairing over supersingular curves
        ◮ Computing the non-reduced pairing
        ◮ Final exponentiation
        ◮ Implementation results
      4 Optimal Ate Pairing over Barreto–Naehrig Curves
        ◮ Barreto–Naehrig Curves

  52. Hardware accelerators
      [Plot: calculation time (µs) versus security (bits), for Virtex-II Pro and Virtex-4 LX devices. Labelled points: 675.5 µs / $\mathbb{F}_{2^{557}}$; 100.8 µs / $\mathbb{F}_{2^{457}}$; 16.9 µs / $\mathbb{F}_{3^{313}}$; 20.9 µs / $\mathbb{F}_{3^{97}}$; 12.8 µs / $\mathbb{F}_{3^{193}}$; 6.2 µs / $\mathbb{F}_{3^{97}}$]

  57. Hardware implementation notes
      ◮ Our Xilinx FPGA implementation significantly improved on the computation time of all previously published hardware pairing coprocessors for supersingular curves.
      ◮ Somewhat surprisingly, our architecture also achieves the best area/time trade-off among supersingular pairing accelerators.
      ◮ However, limited by the FPGA's capacity, we could only reach up to 109 bits of security.
      ◮ Although it was not discussed here, we also implemented the Tate pairing over characteristic 2. Experimentally, we observed that our char 2 and char 3 accelerators achieve almost the same time performance.
      ◮ In the design process of our char 2 accelerator we found the following undocumented family of square-root friendly irreducible pentanomials: $f(x) = x^m + x^{m-d} + x^{m-2d} + x^d + 1$.
      ◮ All technical details of these designs can be found in the preprint manuscripts eprint 2009/122 and eprint 2009/398.

  67. Computing the non-reduced pairing
      $\eta_T$ pairing: shorter loop, based on Miller's algorithm.
      for $i \leftarrow 0$ to $(m-1)/2$ do
          ➀ $x_P \leftarrow \sqrt[3]{x_P}$ ; $y_P \leftarrow \sqrt[3]{y_P}$   (2 cube roots)
          ➀ $x_Q \leftarrow x_Q^3$ ; $y_Q \leftarrow y_Q^3$   (2 cubings)
          ➁ $t \leftarrow x_P + x_Q$ ; $u \leftarrow y_P y_Q$
          ➁ $S \leftarrow -t^2 \pm u\sigma - t\rho - \rho^2$   (➁ in total: 2 ×, 2 +)
          ➂ $R \leftarrow R \cdot S$   (1 × over $\mathbb{F}_{3^{6m}}$)
      end for
      ➀ update of point coordinates
      ➁ computation of line equation
      ➂ accumulation of the new factor
      Multiplication is critical:
      ◮ comb right-to-left multiplier over $\mathbb{F}_{3^m}$
      Sparse multiplication over $\mathbb{F}_{3^{6m}}$:
      ◮ 15 × and 29 + over $\mathbb{F}_{3^m}$ (Beuchat et al., ARITH 18)
      ◮ 12 × and 59 + over $\mathbb{F}_{3^m}$ (Gorla et al., SAC 2007)

  72. Computing the non-reduced pairing (First core / Second core; several builds of the same algorithm, merged)
      ➀ Precomputation: for $i \leftarrow 1$ to $(m-1)/2$ do $x_P[i] \leftarrow \sqrt[3]{x_P[i-1]}$; $y_P[i] \leftarrow \sqrt[3]{y_P[i-1]}$ (2 cube roots); $x_Q[i] \leftarrow x_Q[i-1]^3$; $y_Q[i] \leftarrow y_Q[i-1]^3$ (2 cubings); end for
      ➁ Loop body: $t \leftarrow x_P[i] + x_Q[i]$ (1 +); $u \leftarrow y_P[i]\, y_Q[i]$ (1 ×); $S \leftarrow -t^2 \pm u\sigma - t\rho - \rho^2$ (1 ×, 1 +)
      ➂ Accumulation: $R \leftarrow R \cdot S$ (12 ×, 59 +)
      Single core: for $i \leftarrow 1$ to $(m-1)/2$ do ➁, ➂ end for
      Unrolled loop: for $i \leftarrow 1$ to $(m-1)/4$, with $t_0, u_0$ taken at index $2i-1$ and $t_1, u_1$ at index $2i$: $S \leftarrow (-t_0^2 \pm u_0\sigma - t_0\rho - \rho^2) \cdot (-t_1^2 \pm u_1\sigma - t_1\rho - \rho^2)$ (8 ×, 13 +); $R \leftarrow R \cdot S$ (15 ×, 67 +)
      Two cores: the first core accumulates $R_0 \leftarrow R_0 \cdot S$ over the first half of the index range and the second core accumulates $R_1 \leftarrow R_1 \cdot S$ over the second half (split at $(m-1)/4$, or at $(m-1)/8$ for the unrolled loop); finally $R \leftarrow R_0 \cdot R_1$ (15 ×, 67 +)
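
The two-core schedule on this slide works because the precomputed tables make every loop iteration independent, so the accumulated product can be split over disjoint index ranges and recombined with one final multiplication. The following is a minimal Python sketch of that structure only. It assumes integers modulo a prime as a stand-in for the extension-field elements, and the names `partial_product` and `two_core_product` are hypothetical, not taken from the talk.

```python
from concurrent.futures import ThreadPoolExecutor

P = 2**61 - 1  # stand-in prime modulus (assumption); real code multiplies in the extension field


def partial_product(factors, lo, hi):
    """Accumulate R <- R * S over the index range [lo, hi)."""
    r = 1
    for s in factors[lo:hi]:
        r = (r * s) % P
    return r


def two_core_product(factors):
    """Split the accumulation into two halves (R0, R1), then compute R <- R0 * R1."""
    mid = len(factors) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        r0 = pool.submit(partial_product, factors, 0, mid)
        r1 = pool.submit(partial_product, factors, mid, len(factors))
        return (r0.result() * r1.result()) % P


# Hypothetical per-iteration factors S_i; in the pairing they are built from the precomputed tables.
factors = [(3 * i * i + 7 * i + 1) % P for i in range(1, 255)]
sequential = partial_product(factors, 0, len(factors))
parallel = two_core_product(factors)
```

Python threads are used here only to show the data flow; they are not expected to reproduce the speed-up of a native multi-core implementation.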

  73. Agenda: (1) Context; (2) Hardware accelerator for the Tate pairing over supersingular curves: Implementation results in hardware; (3) Software accelerator for the Tate pairing over supersingular curves: Computing the non-reduced pairing, Final exponentiation, Implementation results; (4) Optimal Ate pairing over Barreto–Naehrig curves: Barreto–Naehrig curves

  76. Final exponentiation. The final exponentiation consists of raising $\hat{e}(P, Q)$ to the exponent $M = \frac{2^{4m} - 1}{N} = (2^{2m} - 1) \cdot (2^m + 1 - \nu 2^{(m+1)/2})$, where $\nu = (-1)^b$ when $m \equiv 1, 7 \pmod 8$ and $\nu = (-1)^{1-b}$ in all other cases. The computation is highly sequential and very heterogeneous. We perform this operation according to a slightly optimized version:
      ◮ Raising to the $(2^m + 1)$-th power. Raising the outcome of Miller’s algorithm to the $(2^{2m} - 1)$-th power produces an element $U \in \mathbb{F}_{2^{4m}}$ whose order divides $2^{2m} + 1$. This property allows one to save a multiplication over $\mathbb{F}_{2^{4m}}$ when raising $U$ to the $(2^m + 1)$-th power.
      ◮ Raising to the $2^{(m+1)/2}$-th power. Raising an element of $\mathbb{F}_{2^{4m}}$ to the $2^i$-th power involves $4i$ squarings and at most four additions over $\mathbb{F}_{2^m}$.
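
The factorization of the exponent can be checked with exact integer arithmetic. The sketch below assumes $N = 2^m + 1 + \nu 2^{(m+1)/2}$ (the usual group order for this family of supersingular curves; the slide does not define $N$), and the helper name `final_exponent` is hypothetical.

```python
def final_exponent(m, nu):
    """Return (M, N) with M = (2^(4m) - 1) / N, for odd m and nu in {+1, -1}."""
    assert m % 2 == 1 and nu in (1, -1)
    half = 1 << ((m + 1) // 2)        # 2^((m+1)/2)
    n = (1 << m) + 1 + nu * half      # assumed group order N
    total = (1 << (4 * m)) - 1        # 2^(4m) - 1
    assert total % n == 0             # N must divide 2^(4m) - 1
    return total // n, n


m = 1223
for nu in (1, -1):
    big_m, n = final_exponent(m, nu)
    easy = (1 << (2 * m)) - 1                            # 2^(2m) - 1
    hard = (1 << m) + 1 - nu * (1 << ((m + 1) // 2))     # 2^m + 1 - nu * 2^((m+1)/2)
    assert big_m == easy * hard                          # the factorization on the slide
    assert n * hard == (1 << (2 * m)) + 1                # N * hard = 2^(2m) + 1
```

The second assertion shows where the split comes from: $2^{4m} - 1 = (2^{2m} - 1)(2^{2m} + 1)$, and $2^{2m} + 1$ factors as $N$ times the second term of $M$.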

  80. Finite field arithmetic. Target: multi-core architectures. Arithmetic over $\mathbb{F}_{2^m}$ and $\mathbb{F}_{3^m}$: SSE instruction set. Timings are given in clock cycles and were measured on an Intel Core 2 processor working at 2.4 GHz.
      Source                     Field                      $x^p$   $\sqrt[p]{x}$   Mult
      Aranha et al., CT-RSA’10   $\mathbb{F}_{2^{1223}}$    160     166             4030
      Our work, CANS’10          $\mathbb{F}_{2^{1223}}$    480     749             5438
      Our work, CANS’10          $\mathbb{F}_{3^{509}}$     900     974             4128
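
The three operations timed on this slide ($p$-th power, $p$-th root, multiplication) can be illustrated for $p = 2$ with a toy polynomial-basis field. This is a minimal sketch assuming the small field $\mathbb{F}_{2^7} = \mathbb{F}_2[x]/(x^7 + x + 1)$, with elements stored as Python integers used as bit vectors. The function names are hypothetical, and the code does not reflect the SSE-level techniques of the timed implementations.

```python
M = 7                    # toy extension degree (assumption; the talk uses m = 1223)
F = (1 << 7) | 0b11      # f(x) = x^7 + x + 1, irreducible over F_2


def gf_mul(a, b):
    """Multiply two field elements: shift-and-add with interleaved reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a       # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if a >> M:       # degree reached m: reduce by XORing f(x)
            a ^= F
    return r


def gf_sqr(a):
    """Squaring, i.e. the Frobenius map x -> x^2."""
    return gf_mul(a, a)


def gf_sqrt(a):
    """Square root: since a^(2^m) = a, the root is a^(2^(m-1))."""
    for _ in range(M - 1):
        a = gf_sqr(a)
    return a


elements = range(1 << M)
# Squaring and square root are inverse maps on the whole field.
roots_ok = all(gf_sqr(gf_sqrt(a)) == a for a in elements)
# Squaring is additive (linear over F_2), which is why it is far cheaper than a multiplication.
additive_ok = all(gf_sqr(a ^ b) == gf_sqr(a) ^ gf_sqr(b)
                  for a in elements for b in elements)
```

Optimized libraries exploit the linearity checked by `additive_ok` to implement squaring and square root with table lookups or bit interleaving instead of a general multiplication, which is consistent with the gap between the power/root columns and the Mult column.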

  83. Implementation results. Timings achieved on an Intel Core 2 are given in millions of clock cycles (Windows XP 64-bit SP2 environment).
      Source                     Curve                         Security [bits]   # of cores   Freq. [GHz]   Calc. time [Mcycles]
      Aranha et al., CT-RSA’10   $E(\mathbb{F}_{2^{1223}})$    128               1            2.4           18.76
      Aranha et al., CT-RSA’10   $E(\mathbb{F}_{2^{1223}})$    128               2            2.4           10.08
      Aranha et al., CT-RSA’10   $E(\mathbb{F}_{2^{1223}})$    128               4            2.4           5.72
      Our work, CANS’10          $E(\mathbb{F}_{3^{509}})$     128               1            2.4           18.2
      Our work, CANS’10          $E(\mathbb{F}_{3^{509}})$     128               2            2.4           10.34
      Our work, CANS’10          $E(\mathbb{F}_{3^{509}})$     128               4            2.4           7.06
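
For readers who prefer wall-clock figures, the cycle counts on this slide can be converted to milliseconds at the stated 2.4 GHz, and the multi-core speed-up can be read off as a ratio. The sketch below only does this arithmetic on the numbers from the slide; the names `to_ms` and `speedup` are hypothetical.

```python
FREQ_GHZ = 2.4  # clock frequency stated on the slide

# (source, curve, cores) -> calculation time in millions of clock cycles, copied from the slide
MCYCLES = {
    ("Aranha et al.", "E(F_2^1223)", 1): 18.76,
    ("Aranha et al.", "E(F_2^1223)", 2): 10.08,
    ("Aranha et al.", "E(F_2^1223)", 4): 5.72,
    ("Our work", "E(F_3^509)", 1): 18.2,
    ("Our work", "E(F_3^509)", 2): 10.34,
    ("Our work", "E(F_3^509)", 4): 7.06,
}


def to_ms(mcycles, freq_ghz=FREQ_GHZ):
    """Millions of cycles -> milliseconds (1 GHz is 10^6 cycles per millisecond)."""
    return mcycles / freq_ghz


def speedup(source, curve, cores):
    """Single-core cycle count divided by the cycle count on `cores` cores."""
    return MCYCLES[(source, curve, 1)] / MCYCLES[(source, curve, cores)]


for (source, curve, cores), mc in MCYCLES.items():
    print(f"{source:14s} {curve:12s} cores={cores}  "
          f"{to_ms(mc):5.2f} ms  speedup={speedup(source, curve, cores):.2f}")
```

The speed-up ratio compares each implementation against its own single-core run, so it measures scaling rather than ranking the two implementations against each other.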
