
Factoring Large Numbers with the TWIRL Device

Factoring Large Numbers with the TWIRL Device. Adi Shamir, Eran Tromer. Bicycle chain sieve [D. H. Lehmer, 1928].


  1. Factoring Large Numbers with the TWIRL Device. Adi Shamir, Eran Tromer

  2. Bicycle chain sieve [D. H. Lehmer, 1928]

  3. The Number Field Sieve Integer Factorization Algorithm
     • Best algorithm known for factoring large integers.
     • Subexponential time, subexponential space.
     • Successfully factored a 512-bit RSA key in 1999 (hundreds of workstations running for many months).
     • Record: 530-bit integer factored in 2003.

  4. NFS: Main steps
     • Relation collection (sieving) step: Find many numbers satisfying a certain (rare) property.
     • Matrix step: Find a linear dependency among the numbers found.

  5. NFS: Main steps
     • Relation collection (sieving) step: Find many numbers satisfying a certain (rare) property. This work.
     • Matrix step: Find a linear dependency among the numbers found. Cost dramatically reduced by [Bernstein 2001], followed by [LSTT 2002] and [GS 2003].

  6. Cost of sieving for RSA-1024 in 1 year
     • Traditional PC-based [Silverman 2000]: 100M PCs with 170GB RAM each: $5 × 10^12
     • TWINKLE [Lenstra, Shamir 2000][Silverman 2000]*: 3.5M TWINKLEs and 14M PCs: ~$10^11
     • Mesh-based sieving [Geiselmann, Steinwandt 2002]*: millions of devices, $10^11 to $10^10 (if at all?). Multi-wafer design – feasible?
     • Our design: $10M using standard silicon technology (0.13 µm, 1 GHz).

  7. The Sieving Problem
     • Input: a set of arithmetic progressions. Each progression has a prime interval p and value log p.
     • Output: indices where the sum of values exceeds a threshold.
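
The problem statement above can be made concrete with a small software sketch. This is only an illustration, not part of the slides: the function name `sieve`, the representation of a progression as a pair `(p, r)` (interval and first index), and the toy inputs are all assumptions.

```python
import math

def sieve(progressions, length, threshold):
    """Return the indices in [0, length) whose summed log p exceeds threshold.

    progressions: list of (p, r) pairs (hypothetical representation), meaning
    the progression r, r + p, r + 2p, ... contributes log p at each index.
    """
    totals = [0.0] * length
    for p, r in progressions:
        for a in range(r, length, p):
            totals[a] += math.log(p)
    return [a for a, total in enumerate(totals) if total > threshold]

# Toy input: the primes 2, 3, 5, 7, each starting at index 0.
toy_progressions = [(2, 0), (3, 0), (5, 0), (7, 0)]
toy_hits = sieve(toy_progressions, 25, 2.0)
```

With this toy input an index is reported only when enough primes divide it for the logarithms to add up past the threshold, which is the "rare property" the sieving step looks for.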

  8. 1024-bit NFS sieving parameters
     • Total number of indices to test: 3 × 10^23.
     • Each index should be tested against all primes up to 3.5 × 10^9.

  9. Three ways to sieve your numbers...
     [Figure: grid with primes 2 to 41 on the vertical axis and indices 0 to 24 on the horizontal axis; a mark shows each index hit by a prime's progression.]

  10. PC-based sieving, à la Eratosthenes (276–194 BC)
      One contribution per clock cycle.
      [Figure: the same grid of primes against indices; the primes axis is labeled Time and the indices axis is labeled Memory.]

  11. TWINKLE: time-space reversal
      The Weizmann INstitute Key Locating Engine [Shamir 99]
      One index handled at each clock cycle.
      [Figure: the same grid of primes against indices; the primes axis is labeled Counters and the indices axis is labeled Time.]

  12. TWIRL: compressed time
      The Weizmann Institute Relation Locator
      s = 5 indices handled at each clock cycle (real: s = 32768).
      [Figure: the same grid of primes against indices; the primes axis is labeled Various circuits and the indices axis is labeled Time.]
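
The three schedules on the preceding slides can be compared in software. The sketch below is an assumption-laden illustration, not the hardware design: the function names, the `(p, r)` progression format, and the notion of a "cycle" counter are invented here, and the inner loop of `twirl_style` stands in for s lanes that would run in parallel in hardware.

```python
import math

# Toy input (an assumption for illustration): pairs (p, r) meaning the
# progression r, r + p, r + 2p, ... contributes log p at each of its indices.
PROGRESSIONS = [(2, 0), (3, 0), (5, 0), (7, 0)]

def contribution(progs, a):
    """Sum of log p over all progressions that hit index a."""
    return sum(math.log(p) for p, r in progs if a >= r and (a - r) % p == 0)

def pc_style(progs, length, threshold):
    """Memory holds every index; each cycle adds one contribution."""
    totals = [0.0] * length
    cycles = 0
    for p, r in progs:
        for a in range(r, length, p):
            totals[a] += math.log(p)
            cycles += 1
    return [a for a in range(length) if totals[a] > threshold], cycles

def twinkle_style(progs, length, threshold):
    """Time-space reversal: each cycle handles one index in full."""
    hits, cycles = [], 0
    for a in range(length):
        cycles += 1
        if contribution(progs, a) > threshold:
            hits.append(a)
    return hits, cycles

def twirl_style(progs, length, threshold, s=5):
    """Compressed time: each cycle handles a block of s consecutive indices."""
    hits, cycles = [], 0
    for base in range(0, length, s):
        cycles += 1
        for a in range(base, min(base + s, length)):  # parallel lanes in hardware
            if contribution(progs, a) > threshold:
                hits.append(a)
    return hits, cycles
```

All three functions report the same indices; only the number of simulated cycles differs, which is the trade-off the slides illustrate.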

  13. Parallelization in TWIRL
      [Figure: TWINKLE-like pipeline.]

  14. Parallelization in TWIRL
      [Figure: three panels: TWINKLE-like pipeline; simple parallelization with factor s; TWIRL with parallelization factor s.]

  15. Example (simplified): handling large primes
      • Each prime makes a contribution once per tens of thousands of clock cycles (after time compression); in between, it is merely stored compactly in DRAM.
      • Each memory+processor unit handles tens of thousands of progressions. It computes and sends contributions across the bus, where they are added at just the right time. Timing is critical.
      [Figure: two processor/memory units.]

  16. Handling large primes (cont.)
      [Figure: a processor and its memory.]

  17. Implementing a priority queue of events
      • The memory contains a list of events of the form (p, a), meaning "a progression with interval p will make a contribution to index a". Goal: implement a priority queue.
      • The list is ordered by increasing a.
      • At each clock cycle:
        1. Read the next event (p, a).
        2. Send a log p contribution to line a (mod s) of the pipeline.
        3. Update a ← a + p.
        4. Save the new event (p, a) to the memory location that will be read just before index a passes through the pipeline.
      • To handle collisions, slack and logic are added.
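
The read, emit, update, save loop above can be sketched in software with a binary heap standing in for the ordered memory. This is an assumption for illustration only: the slides describe a hardware memory layout, not a heap, and the names `event_sieve` and `emitted` are hypothetical.

```python
import heapq
import math

def event_sieve(progressions, length, threshold, s=5):
    """Sieve [0, length) by processing events in order of increasing index a.

    progressions: list of (p, r) pairs (hypothetical format), where r is the
    first index hit by the progression with interval p.
    Returns (hits, emitted); emitted records (a, line, p) for each contribution.
    """
    # The heap orders events by a; the slides write events as (p, a).
    events = [(r, p) for p, r in progressions]
    heapq.heapify(events)
    totals = [0.0] * length
    emitted = []
    while events and events[0][0] < length:
        a, p = heapq.heappop(events)           # 1. read the next event
        totals[a] += math.log(p)               # 2. send log p to line a mod s
        emitted.append((a, a % s, p))
        heapq.heappush(events, (a + p, p))     # 3. and 4. update a, save event
    hits = [a for a in range(length) if totals[a] > threshold]
    return hits, emitted

queue_hits, queue_emitted = event_sieve([(2, 0), (3, 0), (5, 0), (7, 0)], 25, 2.0)
```

A heap costs logarithmic time per event, whereas the design on the slides aims to place each rescheduled event directly where it will next be read.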

  18. Handling large primes (cont.)
      • The memory used by past events can be reused.
      • Think of the processor as rotating around the cyclic memory.
      [Figure: a processor rotating around a cyclic memory.]

  19. Handling large primes (cont.)
      • The memory used by past events can be reused.
      • Think of the processor as rotating around the cyclic memory.
      [Figure: a processor rotating around a cyclic memory.]
      • By assigning similarly-sized primes to the same processor (+ appropriate choice of parameters), we guarantee that new events are always written just behind the read head.
      • There is a tiny (1:1000) window of activity which is "twirling" around the memory bank. It is handled by an SRAM-based cache. The bulk of storage is handled in compact DRAM.
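
The cyclic-memory idea can be illustrated with a circular array of slots. This is a simplified sketch under stated assumptions, not the TWIRL memory design: the name `cyclic_sieve`, the `slots` parameter, and the requirement that every interval p and starting index r be smaller than `slots` are choices made here so that each slot only ever holds events for the index currently under the read head.

```python
import math

def cyclic_sieve(progressions, length, threshold, slots):
    """Sieve [0, length) using a cyclic memory with the given number of slots.

    An event for index a lives in slot a % slots. Assumes 0 <= r < slots and
    0 < p < slots for every (p, r), so a rescheduled event always lands in a
    slot the read head has already passed and will revisit at the right time.
    """
    if not all(0 <= r < slots and 0 < p < slots for p, r in progressions):
        raise ValueError("every p and r must be smaller than slots")
    memory = [[] for _ in range(slots)]
    for p, r in progressions:
        memory[r].append(p)
    hits = []
    for a in range(length):                    # the read head advances one slot
        slot = a % slots
        due, memory[slot] = memory[slot], []   # read the slot and free it
        total = 0.0
        for p in due:
            total += math.log(p)
            memory[(a + p) % slots].append(p)  # reuse memory behind the head
        if total > threshold:
            hits.append(a)
    return hits

# Similarly sized primes, with slots just above the largest interval.
cyclic_hits = cyclic_sieve([(5, 0), (7, 0)], 40, 2.0, slots=8)
```

When p is close to `slots`, as with 7 and 8 here, the write position `(a + p) % slots` sits just behind the read head, which mirrors the reason the slides give for grouping similarly-sized primes.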

  20. Rational vs. algebraic sieves
      • In fact, we need to perform two sieves: rational (expensive) and algebraic (even more expensive).
      • We are interested only in indices which pass both sieves.
      • We can use the results of the rational sieve to greatly reduce the cost of the algebraic sieve.

  21. Notes
      • TWIRL is a hypothetical and untested design.
      • It uses a highly fault-tolerant wafer-scale design.
      • The following analysis is based on approximations and simulations.

  22. TWIRL for 512-bit composites
      One silicon wafer full of TWIRL devices (total cost ~$15,000) can complete the sieving in under 10 minutes. This is 1,600 times faster than the best previous design.

  23. TWIRL for 1024-bit composites
      • Operates in clusters of 3 almost independent wafers.
      • Initial investment (NRE): ~$20M
      • To complete the sieving in 1 year:
        • Use 194 clusters (~600 wafers).
        • Silicon cost: ~$2.9M
        • Total cost: ~$10M (compared to ~$1T).
      [Figure: cluster layout with eight units labeled R and one labeled A.]
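
The figures on this slide can be cross-checked with a few lines of arithmetic. The per-wafer cost and the cost ratio below are derived here from the slide's rounded numbers; they are not stated on the slide, and the variable names are illustrative.

```python
clusters = 194
wafers_per_cluster = 3
silicon_cost_usd = 2.9e6       # "~$2.9M" from the slide
total_cost_usd = 10e6          # "~$10M" from the slide
previous_cost_usd = 1e12       # "~$1T" from the slide

wafers = clusters * wafers_per_cluster           # the slide rounds this to ~600
cost_per_wafer_usd = silicon_cost_usd / wafers   # implied by the rounded figures
cost_ratio = previous_cost_usd / total_cost_usd  # previous estimate vs. this design
```

Under these rounded inputs the implied silicon cost is roughly $5,000 per wafer, and the claimed improvement over the earlier estimate is about five orders of magnitude.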

