
Pollard Rho on the PlayStation 3

Joppe W. Bos (1), Marcelo E. Kaihara (1), Peter L. Montgomery (2). (1) EPFL IC LACAL, CH-1015 Lausanne, Switzerland; (2) Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA. RAIM'09, October 27th 2009, LIP ENS Lyon.


  1. Pollard Rho on the PlayStation 3
     Joppe W. Bos (1), Marcelo E. Kaihara (1), Peter L. Montgomery (2)
     (1) EPFL IC LACAL, CH-1015 Lausanne, Switzerland
     (2) Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
     RAIM'09, October 27th 2009, LIP ENS Lyon

  2. Motivation
     Elliptic Curve Cryptography (ECC):
     - Widely standardized:
       - Standards for Efficient Cryptography 2, SEC 2 (112-521 bit)
       - Wireless Transport Layer Security Specification (112-224 bit)
       - Digital Signature Standard, FIPS 186-3, NIST (192-521 bit)
     - Security relies on the hardness of solving the Elliptic Curve Discrete Logarithm Problem (ECDLP).
     - Are the standardized key sizes secure?
     - What is the practical cost of solving the ECDLP?

  3. Objective
     - Evaluate the cost of solving the ECDLP for small key sizes: the 112-bit ECC standard.
     - Use the PlayStation 3:
       - broadly available
       - low-price platform
       - hybrid multi-core architecture
     - Implement Pollard rho on the Cell architecture:
       - design SIMD arithmetic algorithms
       - optimize modular arithmetic for the 112-bit prime

  4. ECDLP
     Settings:
     - $E$ is an elliptic curve over $\mathbb{F}_p$ with $p$ an odd prime.
     - $P \in E(\mathbb{F}_p)$ is a point of order $n$.
     - $Q = k \cdot P \in \langle P \rangle$.
     Problem: given $E$, $p$, $P$, $n$ and $Q$, what is $k$? That is, $k = \log_P Q$.
     Largest solved instance: 109-bit prime field (2002).
     - It took "$10^4$ computers (mostly PCs) running 24 hours a day for 549 days".

  5. Solving the ECDLP
     Pollard rho:
     - The most efficient algorithm in the literature (for generic curves).
     - The underlying idea of this method is to search for two distinct pairs $(c_i, d_i), (c_j, d_j) \in \mathbb{Z}/n\mathbb{Z} \times \mathbb{Z}/n\mathbb{Z}$ such that
       $c_i \cdot P + d_i \cdot Q = c_j \cdot P + d_j \cdot Q$.
       Then $(c_i - c_j) \cdot P = (d_j - d_i) \cdot Q = (d_j - d_i) \, k \cdot P$, so
       $k \equiv (c_i - c_j) \cdot (d_j - d_i)^{-1} \bmod n$.
     - J.M. Pollard. Monte Carlo methods for index computation (mod p). Mathematics of Computation, 32:918-924, 1978.
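
The last step of slide 5 is plain modular arithmetic, so it can be sketched without any curve code. The function name `recover_k` and the toy values (group order 19, secret 7) are illustrative assumptions, not taken from the slides.

```python
def recover_k(ci, di, cj, dj, n):
    """Solve c_i*P + d_i*Q = c_j*P + d_j*Q for k, where Q = k*P and P has order n."""
    denom = (dj - di) % n
    if denom == 0:
        # Both pairs are identical mod n: the collision carries no information.
        raise ValueError("useless collision: d_i == d_j (mod n)")
    # pow(denom, -1, n) is the modular inverse (Python 3.8+).
    return (ci - cj) * pow(denom, -1, n) % n


# Toy check with n = 19 and secret k = 7:
# 3*P + 5*Q = (3 + 35)*P and 12*P + 1*Q = (12 + 7)*P are both 0*P (mod 19).
print(recover_k(3, 5, 12, 1, 19))  # prints 7
```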

  6. Pollard Rho
     - "Walk" through the set $\langle P \rangle$: $X_i = c_i \cdot P + d_i \cdot Q$.
     - Iteration function $f : \langle P \rangle \to \langle P \rangle$, with $X_{i+1} = f(X_i)$, $i \ge 0$.
     - This sequence eventually collides.
     - Expected number of iterations: $\sqrt{\pi \cdot |\langle P \rangle| / 2}$.

  7. Optimization I
     Parallel version: distinguish points and send them to a central server. P.C. van Oorschot and M.J. Wiener [1999].
     - Mark points with a certain property, e.g., for $X_i = (x_i, y_i)$, DPT: $2^{24} \mid x_i$.
     - Communicate them to a central DB to check for collisions.
     - Leads to a linear speed-up in the number of processors.
     (Figure: independent walks $\ldots, X_{i-1}, X_i, X_{i+1}, X_{i+2}$, $\ldots, X'_{i-1}, X'_i, X'_{i+1}, X'_{i+2}$ and $\ldots, X''_{i-1}, X''_i, X''_{i+1}, X''_{i+2}$, all reporting to the DB.)
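
A minimal sketch of the distinguished-point bookkeeping described on slide 7. The 24-bit criterion is from the slide; the in-memory dictionary standing in for the central DB and the names `is_distinguished` and `report` are assumptions made for illustration.

```python
DPT_BITS = 24  # from the slide: a point is distinguished when 2^24 divides x_i


def is_distinguished(x, bits=DPT_BITS):
    """True when the `bits` least significant bits of x are all zero."""
    return (x & ((1 << bits) - 1)) == 0


def report(db, point, c, d):
    """Client-to-server step, with a dict playing the role of the central DB.

    Non-distinguished points are ignored. If the point was already stored
    with different coefficients, the earlier (c, d) pair is returned so the
    caller can solve for k; otherwise the point is stored and None returned.
    """
    x, _ = point
    if not is_distinguished(x):
        return None
    previous = db.get(point)
    if previous is not None and previous != (c, d):
        return previous
    db[point] = (c, d)
    return None
```

Only distinguished points travel to the server, so with a 24-bit criterion roughly one point in $2^{24}$ is communicated and stored.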

  8. Optimization II
     r-adding walks, E. Teske [2001].
     - Divide $\langle P \rangle$ into $r$ different partitions: $h : \langle P \rangle \to [0, r-1]$.
     - For each partition: $R_j = c_j \cdot P + d_j \cdot Q$.
     - With $X_i = (x_i, y_i)$: $X_{i+1} = f(X_i) = X_i + R_{h(X_i)}$.
     - $r \ge 16$ partitions $\approx$ random mapping.
     - Use the least significant 4 bits to determine the next partition.
     (Figure: from $X_{i-1}$ to $X_i$, then one of $X_i + R_0, X_i + R_1, \ldots, X_i + R_{15}$ becomes $X_{i+1}$.)
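
The r-adding walk can be tried end to end on a toy curve. The sketch below assumes the small textbook curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ with base point $(5, 1)$ of order 19, takes the 4 low bits of the x-coordinate as $h$, and stores every visited point instead of only distinguished ones. None of these choices come from the slides, and all names are hypothetical.

```python
import random

P_MOD, A = 17, 2      # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumption)
G, N = (5, 1), 19     # base point P and its prime order
R_PARTS = 16          # r = 16 partitions, as on the slide


def add(p1, p2):
    """Affine addition; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        mu = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD)
    else:
        mu = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    x3 = (mu * mu - x1 - x2) % P_MOD
    return (x3, (mu * (x1 - x3) - y1) % P_MOD)


def mul(k, pt):
    """Double-and-add scalar multiplication."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, pt)
        pt, k = add(pt, pt), k >> 1
    return acc


def h(pt):
    """Partition index: the 4 least significant bits of x (0 for infinity)."""
    return 0 if pt is None else pt[0] % R_PARTS


def solve(q, seed=0):
    """Return k with q = k*G, using an r-adding walk until a usable collision."""
    rng = random.Random(seed)
    while True:
        # Precompute R_j = c_j*P + d_j*Q for every partition j.
        table = []
        for _ in range(R_PARTS):
            c, d = rng.randrange(N), rng.randrange(N)
            table.append((c, d, add(mul(c, G), mul(d, q))))
        c, d = rng.randrange(N), rng.randrange(N)
        x = add(mul(c, G), mul(d, q))
        seen = {}
        while x not in seen:
            seen[x] = (c, d)
            rc, rd, r = table[h(x)]
            x, c, d = add(x, r), (c + rc) % N, (d + rd) % N
        c0, d0 = seen[x]
        if (d - d0) % N:  # usable collision, otherwise retry with a new table
            return (c0 - c) * pow((d - d0) % N, -1, N) % N
```

For example, `solve(mul(7, G))` recovers the scalar 7. Storing every point is only viable at toy sizes; the slides keep distinguished points only.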

  9. Optimization III
     Simultaneous inversion: trade inversions for multiplications. P.L. Montgomery [1987].
     - Suitable for cryptanalytic purposes.
     - Trade $M$ modular inversions for $3(M-1)$ modular multiplications and 1 modular inversion.
     - Affine Weierstrass representation.
     - Apply to independent walks.
     (Figure: the independent walks $X_i$, $X'_i$, $X''_i$ of slide 7.)
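
The $3(M-1)$ multiplication count can be checked with a short sketch of the simultaneous-inversion trick. This is the usual prefix-product formulation written with Python integers, not the SIMD code from the talk; `batch_inverse` and its multiplication counter are illustrative names.

```python
def batch_inverse(values, p):
    """Invert all values mod p with one inversion and 3*(M-1) multiplications.

    Returns (inverses, number_of_multiplications). All values must be
    nonzero mod p.
    """
    m = len(values)
    mults = 0
    # prefix[i] = values[0] * ... * values[i]  (M-1 multiplications)
    prefix = [values[0] % p]
    for v in values[1:]:
        prefix.append(prefix[-1] * v % p)
        mults += 1
    inv = pow(prefix[-1], -1, p)  # the single modular inversion
    out = [0] * m
    for i in range(m - 1, 0, -1):
        out[i] = inv * prefix[i - 1] % p  # inverse of values[i]
        inv = inv * values[i] % p         # inverse of prefix[i-1]
        mults += 2
    out[0] = inv
    return out, mults
```

With $M = 400$ values this costs 1197 multiplications and one inversion instead of 400 inversions.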

  10. Optimization IV
      Negation map (not used). M.J. Wiener and R.J. Zuccherato [1998].
      - Computation of the negative is cheap: $-P = (x, -y)$.
      - Given an equivalence relation $\sim$ on $\langle P \rangle$, iterate over the set of equivalence classes $\langle P \rangle / \sim$.
      - Reduce the search space by a factor of 2.
      (Figure: a walk through the pairs $(x_{i-1}, \pm y_{i-1})$, $(x_i, \pm y_i)$, $(x_{i+1}, \pm y_{i+1})$.)
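
One way to iterate over classes $\{P, -P\}$ is to map every point to a fixed representative before applying the step function. The rule used below (keep the smaller y-coordinate) is an assumption chosen for illustration; the slides do not say which representative would be used.

```python
def canonical(point, p):
    """Representative of the class {P, -P}: the point with the smaller y."""
    x, y = point
    return (x, min(y % p, (-y) % p))
```

On the toy field $\mathbb{F}_{17}$, both $(5, 1)$ and its negative $(5, 16)$ map to $(5, 1)$.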

  11. The PlayStation 3
      The Cell contains:
      - 1 "Power Processor Element" (PPE)
      - 8 "Synergistic Processing Elements" (SPEs), 6 of which are available to the user in the PS3 under Linux
      Characteristics of the SPEs:
      - Synergistic Processing Unit (SPU)
      - Access to 128 registers of 128 bits
      - SIMD operations
      - Dual pipeline (odd and even)
      - In-order processor
      - 256 KB of fast local memory (Local Store)

  12. Programming Constraints
      Memory:
      - The executable and all data should fit in the LS (256 KB).
      Branches:
      - No "smart" dynamic branch prediction.
      - Instead, "prepare-to-branch" instructions redirect instruction prefetch to branch targets.
      Instruction set limitations:
      - 16 x 16 → 32-bit multipliers (4-SIMD).
      Dual pipeline:
      - One odd and one even instruction can be dispatched per clock cycle.

  13. Arithmetic
      Using affine Weierstrass representation, let $P, Q \in E(\mathbb{F}_p) \setminus \{\mathcal{O}\}$ with $P = (x_1, y_1)$ and $Q = (x_2, y_2)$.
      If $P \ne -Q$ then $P + Q = (x_3, y_3)$, where
      $x_3 = \mu^2 - x_1 - x_2$,
      $y_3 = \mu (x_1 - x_3) - y_1$,
      $\mu = \dfrac{y_2 - y_1}{x_2 - x_1}$ if $P \ne Q$, and $\mu = \dfrac{3 x_1^2 + a}{2 y_1}$ if $P = Q$.
      Cost per addition, using Montgomery's simultaneous inversion and running $M$ curves in parallel:
      - 6 modular multiplications
      - 6 modular subtractions
      - $1/M$ modular inversions
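
The affine formulas of slide 13 translate directly into code. The sketch below uses Python integers and a per-call inversion, so it shows the formulas only and not the amortized cost; the test curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ is an assumption, not the 112-bit target curve.

```python
def ec_add(p1, p2, a, p):
    """Add two affine points on y^2 = x^3 + a*x + b over F_p.

    None represents the point at infinity. The curve constant b does not
    appear in the addition formulas.
    """
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P = -Q
    if p1 == p2:
        mu = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p  # tangent slope
    else:
        mu = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p   # chord slope
    x3 = (mu * mu - x1 - x2) % p
    y3 = (mu * (x1 - x3) - y1) % p
    return (x3, y3)


P = (5, 1)
print(ec_add(P, P, 2, 17))       # doubling: (6, 3)
print(ec_add(P, (6, 3), 2, 17))  # addition: (10, 6)
```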

  14. Integer Representation
      Integers $A, B, C, D$ are represented in radix $2^{16}$:
      $A = \sum_{i=0}^{m-1} a_i \cdot 2^{16 i}$, $B = \sum_{i=0}^{m-1} b_i \cdot 2^{16 i}$, $C = \sum_{i=0}^{m-1} c_i \cdot 2^{16 i}$, $D = \sum_{i=0}^{m-1} d_i \cdot 2^{16 i}$.
      4-SIMD layout, one vector per limb index, each lane drawn as a 16-bit high half and a 16-bit low half:
      V[0]   = (a_0, b_0, c_0, d_0)
      V[i]   = (a_i, b_i, c_i, d_i)
      V[m-1] = (a_{m-1}, b_{m-1}, c_{m-1}, d_{m-1})
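
The interleaved layout can be mimicked with tuples standing in for 4-lane vectors. This is a sketch of the data layout only; `m = 8` limbs (enough for 128-bit values) and the helper names are assumptions.

```python
LIMB_BITS = 16


def to_limbs(x, m):
    """Split x into m radix-2^16 limbs, least significant first."""
    return [(x >> (LIMB_BITS * i)) & 0xFFFF for i in range(m)]


def from_limbs(limbs):
    """Inverse of to_limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def pack4(a, b, c, d, m=8):
    """V[i] = (a_i, b_i, c_i, d_i): limb i of four integers in one vector."""
    return list(zip(to_limbs(a, m), to_limbs(b, m), to_limbs(c, m), to_limbs(d, m)))


def unpack4(vectors):
    """Recover (A, B, C, D) from the list of 4-lane vectors."""
    return tuple(from_limbs(lane) for lane in zip(*vectors))
```

With this layout one vector operation on `V[i]` acts on the same limb of four independent integers at once.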

  15. Modular Reduction
      The 112-bit prime $p$ in the target curve $E(\mathbb{F}_p)$ is
      $p = \mathtt{DB7C2ABF62E35E668076BEAD208B}_{16}$

  16. Modular Reduction
      The 112-bit prime $p$ in the target curve $E(\mathbb{F}_p)$ is
      $p = \mathtt{DB7C2ABF62E35E668076BEAD208B}_{16}$
      $p = \dfrac{2^{128} - 3}{11 \cdot 6949}$

  17. Modular Reduction
      The 112-bit prime $p$ in the target curve $E(\mathbb{F}_p)$ is
      $p = \mathtt{DB7C2ABF62E35E668076BEAD208B}_{16}$
      $p = \dfrac{2^{128} - 3}{11 \cdot 6949}$
      Perform the calculation using a redundant representation modulo
      $\tilde{p} = 11 \cdot 6949 \cdot p = 2^{128} - 3$

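  18. Fast reduction
      Use the modulus $\tilde{p} = 2^{128} - 3 = 11 \cdot 6949 \cdot p$.
      $R : \mathbb{Z}/2^{256}\mathbb{Z} \to \mathbb{Z}/2^{256}\mathbb{Z}$, $\quad x \mapsto (x \bmod 2^{128}) + 3 \cdot \lfloor x / 2^{128} \rfloor$
      $x = x_H \cdot 2^{128} + x_L \equiv x_L + 3 \cdot x_H = R(x) \bmod \tilde{p}$
      (Figure: $x$ is split into $x_h$ and $x_l$; $x' = x_l + 3 \cdot x_h$ is split into $x'_h$ and $x'_l$; $x'' = x'_l + 3 \cdot x'_h$ has high part $v_h \in \{0, 1\}$ and low part $v_l$. With overwhelming probability $v_h = 0$.)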
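
The map $R$ is a shift, a mask, one small multiplication and an addition. A minimal sketch with Python integers follows; the SIMD limb arithmetic of the talk is not modelled, and the constant names are illustrative.

```python
P = 0xDB7C2ABF62E35E668076BEAD208B  # 112-bit prime from the slides
P_TILDE = 2**128 - 3                # redundant modulus, 11 * 6949 * p
MASK128 = 2**128 - 1


def R(x):
    """One folding step: (x mod 2^128) + 3 * floor(x / 2^128).

    Because 2^128 = 3 (mod P_TILDE), R(x) is congruent to x modulo
    P_TILDE, and therefore also modulo P.
    """
    return (x & MASK128) + 3 * (x >> 128)
```

For a 256-bit input, one application leaves a value below $2^{130}$, which is why slide 18 applies the step twice.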

  19. Fast Modular Multiplication
      Proposition: for independent random 128-bit non-negative integers $x$ and $y$ there is overwhelming probability that
      $0 \le R(R(x \cdot y)) < \tilde{p}$.
      Counter-examples are easy to construct; in general only $0 \le R(R(x)) < 2^{128} + 6$ holds.
      During the whole run not a single faulty reduction occurred.
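
Both halves of slide 19 can be checked numerically: the double fold stays below $2^{128} + 6$ for any 256-bit product, and a counter-example to the tighter bound is easy to write down. The function name `mul_redundant` and the chosen counter-example are illustrative.

```python
P_TILDE = 2**128 - 3
MASK128 = 2**128 - 1


def R(x):
    """(x mod 2^128) + 3 * floor(x / 2^128), congruent to x mod P_TILDE."""
    return (x & MASK128) + 3 * (x >> 128)


def mul_redundant(x, y):
    """Multiply two 128-bit values and fold twice; result is < 2^128 + 6."""
    return R(R(x * y))


# Counter-example to "result < P_TILDE": take x = y = 2^128 - 1.
worst = mul_redundant(2**128 - 1, 2**128 - 1)
print(worst - 2**128)  # prints 1, so worst = 2^128 + 1 >= P_TILDE
```

The result is still congruent to the true product modulo $\tilde{p}$; it is only outside the range $[0, \tilde{p})$.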

  20. Distinguished Point Property
      Need to uniquely determine the partition number and the DPT property during the r-adding walk.
      $P = (\tilde{x}, \tilde{y})$ with $0 \le \tilde{x} < \tilde{p}$.
      Use a partial Montgomery reduction in order to reduce modulo $p$: $x' = \tilde{x} \cdot 2^{-16} \bmod p$.
      Check the least significant 24 bits of $x'$ in partial Montgomery representation.
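
A minimal sketch of one Montgomery step with radix $2^{16}$, which yields $\tilde{x} \cdot 2^{-16} \bmod p$ as on slide 20. The final loop of conditional subtractions is an assumption: the slide does not say how the result is brought into $[0, p)$. All names are hypothetical.

```python
P = 0xDB7C2ABF62E35E668076BEAD208B
R16 = 1 << 16
P_NEG_INV = (-pow(P, -1, R16)) % R16  # -p^{-1} mod 2^16 (p is odd)


def partial_montgomery(x):
    """Return x * 2^-16 mod p, fully reduced, for a redundant input x."""
    q = (x * P_NEG_INV) % R16  # chosen so that x + q*p is divisible by 2^16
    t = (x + q * P) >> 16      # t == x * 2^-16 (mod p)
    while t >= P:              # assumption: plain conditional subtractions
        t -= P
    return t


def is_distinguished(x_tilde, bits=24):
    """Check the low `bits` bits of the partially reduced x-coordinate."""
    return (partial_montgomery(x_tilde) & ((1 << bits) - 1)) == 0
```

Two redundant representatives of the same residue modulo $p$ give the same reduced value, so the distinguished-point test and the partition number are well defined.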

  21. Modular Inversion
      $z \equiv x^{-1} \bmod p$
      Based on the Extended Binary GCD algorithm, with $r = 2^{32}$ and working registers $A_1, B_1, A_2, B_2$ that satisfy
      $A_1 \times 1 \equiv B_1 \times x \bmod p$
      $A_2 \times 1 \equiv B_2 \times x \bmod p$
      - Compute $\gcd(A_1, A_2)$.
      - Obtain $z$ from the almost Montgomery inverse $x^{-1} \cdot 2^{k} \bmod p$.
      SIMD operations:
      $[A_1, B_1, A_2, B_2] \leftarrow [A_1 \gg t_1,\; B_1 \ll t_2,\; A_2 \gg t_2,\; B_2 \ll t_1]$
      $[A_1, B_1, A_2, B_2] \leftarrow [A_1 - A_2,\; B_1 - B_2,\; A_2,\; B_2]$
      $[A_1, B_1, A_2, B_2] \leftarrow [A_1,\; B_1,\; A_2 - A_1,\; B_2 - B_1]$
      Branches significantly reduced.
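
A scalar sketch of an extended binary GCD that applies the three update rules of slide 21 one bit at a time. The starting values $(A_1, B_1, A_2, B_2) = (p, 0, x, 1)$, the explicit shift counter `k`, and the invariant $A_i \cdot 2^k \equiv B_i \cdot x \pmod p$ are assumptions of this sketch. The branch-reduced SIMD version from the talk is not reproduced.

```python
def almost_inverse(x, p):
    """Return (b, k) with b == x^-1 * 2^k (mod p), for odd prime p, 0 < x < p."""
    a1, b1, a2, b2, k = p, 0, x, 1, 0
    while a2 != 0:
        if a1 % 2 == 0:    # halve A1 and double B2 to keep the invariant
            a1 >>= 1
            b2 <<= 1
            k += 1
        elif a2 % 2 == 0:  # halve A2 and double B1
            a2 >>= 1
            b1 <<= 1
            k += 1
        elif a1 > a2:      # both odd: subtract the smaller from the larger
            a1, b1 = a1 - a2, b1 - b2
        else:
            a2, b2 = a2 - a1, b2 - b1
    # Here a1 == gcd(p, x) == 1, so 2^k == b1 * x (mod p).
    return b1 % p, k


def inverse(x, p):
    """Remove the factor 2^k from the almost inverse."""
    b, k = almost_inverse(x, p)
    return b * pow(2, -k, p) % p


print(almost_inverse(3, 7))  # prints (5, 3): 5 == 3^-1 * 2^3 (mod 7)
print(inverse(3, 7))         # prints 5
```

Here the factor $2^k$ is removed with Python's built-in `pow`; a fixed-width implementation would typically fold this correction into a Montgomery-style multiplication.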

  22. Modular Inversion

  23. Performance Results

      | Operation       | #cycles per operation | #operations per iteration | #cycles per iteration |
      |-----------------|-----------------------|---------------------------|-----------------------|
      | Mod Mul         | 53                    | 6                         | 318                   |
      | Mod Sub         | 5                     | 6                         | 30                    |
      | Partial Mon Red | 24                    | 1                         | 24                    |
      | Mod Inv         | 4941                  | 1/400                     | 12                    |
      | Misc.           | 69                    | 1                         | 69                    |
      | Total           |                       |                           | 453                   |

      [1 SPU, 4-SIMD @ 3.2 GHz]
      Hence, our cluster of 214 PS3s computes $9.1 \cdot 10^{9} \approx 2^{33}$ iterations per second.
      It works on > 0.5M curves in parallel.
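
The totals on slide 23 can be re-derived from the table. The sketch assumes 6 usable SPEs per PS3 (slide 11) and reads the 1/400 inversion share as 400 walks per SPE sharing one inversion; both readings are interpretations, not statements from the slide.

```python
# (cycles per operation, operations per iteration), from the table
OPS = {
    "mod_mul": (53, 6),
    "mod_sub": (5, 6),
    "partial_mon_red": (24, 1),
    "misc": (69, 1),
}
INV_CYCLES, BATCH = 4941, 400  # one inversion shared by 400 walks (assumed)
CLOCK_HZ = 3.2e9
PS3_COUNT, SPES_PER_PS3 = 214, 6

cycles_per_iteration = sum(c * n for c, n in OPS.values()) + round(INV_CYCLES / BATCH)
iterations_per_second = PS3_COUNT * SPES_PER_PS3 * CLOCK_HZ / cycles_per_iteration
parallel_walks = PS3_COUNT * SPES_PER_PS3 * BATCH

print(cycles_per_iteration)            # 453
print(f"{iterations_per_second:.2e}")  # 9.07e+09
print(parallel_walks)                  # 513600
```

Under these assumptions the figures agree with the slide: about $9.1 \cdot 10^{9}$ iterations per second and a little over 0.5M walks in flight.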
