Two Fast Parallel GCD Algorithms of Many Integers Sidi Mohamed - PowerPoint PPT Presentation

Two Fast Parallel GCD Algorithms of Many Integers
Sidi Mohamed SEDJELMACI, Laboratoire d’Informatique Paris Nord, France
ISSAC 2017, Kaiserslautern, 24-28 July 2017

Motivations
• GCD of two integers: used in computer algebra systems (CAS) as a low-level operation, in cryptography, etc.
 - Sequential: $O(n \log^2 n \log\log n)$, Knuth (70), Schönhage (71).
 - Parallel: $O_\epsilon(n/\log n)$ time with $O(n^{1+\epsilon})$ processors, Chor-Goldreich (90), Sorenson (94) and Sedjelmaci (08). Whether the problem is P-complete or in NC is still open.
• GCD of many integers: polynomial computations, matrix computations, Hermite and Smith normal forms (HNF, SNF).
 - Sequential: Blan (63), Brad (70), Hav (98), Cop (99), etc.
 - Parallel: apparently not yet addressed.

Name | Year | Worst case
--- | --- | ---
Euclid | c. 300 BC | $O(n^2)$
Lehmer | 1938 | $O(n^2)$
Stein | 1961 | $O(n^2)$
Knuth | 1970 | $O(\log^4 n \, M(n))$
Schönhage | 1971 | $O(\log n \, M(n))$
Brent-Kung | 1983 | $O(n^2)$
Jebelean-Weber | 1993 | $O(n^2)$
Sorenson | 1994 | $O(n^2/\log n)$
Stehlé et al. | 2004 | $O(\log n \, M(n))$
Möller | 2008 | $O(\log n \, M(n))$

Table 1: Sequential GCD Algorithms for two integers ($M(n)$ is the cost of multiplying two $n$-bit integers).

Authors | Time | Nb. of processors | Model
--- | --- | --- | ---
Brent-Kung, 1983 | $O(n)$ | $O(n)$ | Systolic
Purdy, 1983 | $O(n)$ | $O(n)$ | Systolic
Kannan et al., 1987 | $O(n \log\log n / \log n)$ | $O(n^{2+\epsilon})$ | PRAM-CRCW
Adleman et al., rand., 1988 | $O(\log^2 n)$ | $e^{O(\sqrt{n \log n})}$ | PRAM-CRCW
Chor-Goldreich, 1990 | $O(n/\log n)$ | $O(n^{1+\epsilon})$ | PRAM-CRCW
Sorenson, 1994 | $O(n/\log n)$ | $O(n^{1+\epsilon})$ | PRAM-CRCW
Sedjelmaci, 2008 | $O(n/\log n)$ | $O(n^{1+\epsilon})$ | PRAM-CRCW
Sorenson, rand., 2010 | $O(n \log\log n / \log n)$ | $O(n^{6+\epsilon})$ | PRAM-EREW

Table 2: Parallel GCD Algorithms for two integers ("rand." marks randomized algorithms).

Our results:
• The GCD of $n$ integers of $O(n)$ bits can be computed in $O(n/\log n)$ time with $O(n^{2+\epsilon})$ processors in the CRCW PRAM model, in the worst case.
• The GCD of $m$ integers of $O(n)$ bits can be computed in $O(n/\log n)$ time with $O(mn^{1+\epsilon})$ processors in the CRCW PRAM model, for $2 \le m \le n^{3/2}/\log n$.
• We suggest an extended GCD version for many integers and an algorithm to solve linear Diophantine equations.
• To our knowledge, this is the first time such parallel performance has been achieved for computing the GCD of many integers.

Notation:
• $A$ is a vector of $n$ (or $m$) integers of $O(n)$ bits: $A = (a_0, a_1, \dots, a_{n-1})$, with $a_i \ge 0$ and $n \ge 4$.
• An integer parameter $k$ satisfies $\log k = \Theta(\log n)$.
• $\gcd(A) = \gcd(a_0, a_1, \dots, a_{n-1})$.
• $\gcd(0, 0) = 0$.
• We use the PRAM (Parallel Random Access Machine) model of computation, in its CRCW (Concurrent Read Concurrent Write) sub-model.

Main idea for designing a fast parallel GCD algorithm for many integers:
  Find a small integer α
  Repeat
    a_I := α ;
    a_j := a_j mod α ;   (in parallel, ∀ j ≠ I)
  Until almost all the integers a_i are zero.
How can we find a small α?
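A minimal Python sketch of one pass of this loop, assuming α is already stored in a_I: every other entry is replaced by its remainder modulo α. Because gcd(a_j, α) = gcd(a_j mod α, α), the gcd of the whole vector does not change. The names `reduce_by` and `vector_gcd` are hypothetical, the list comprehension stands in for the parallel loop, and the sample vector is the Δ-GCD example after a_2 has been set to α = 58569.

```python
from functools import reduce
from math import gcd


def reduce_by(a, I):
    """Keep a[I] = alpha and replace every other a[j] by a[j] mod alpha."""
    alpha = a[I]
    if alpha <= 0:
        raise ValueError("alpha = a[I] must be positive")
    # On a CRCW PRAM these remainders are computed in parallel.
    return [x if j == I else x % alpha for j, x in enumerate(a)]


def vector_gcd(a):
    """gcd of all entries, with gcd(0, ..., 0) = 0."""
    return reduce(gcd, a, 0)


A = [912672, 815430, 58569, 565701, 662592]
B = reduce_by(A, 2)
print(B)                               # [34137, 54033, 58569, 38580, 18333]
print(vector_gcd(A) == vector_gcd(B))  # True
```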

Pigeonhole-like techniques:
Lemma 1: Let $A = \{a_1, a_2, \dots, a_n\}$ be a set of $n$ distinct positive integers such that $n \ge 2$ and $a_n/n < a_1 < a_2 < \dots < a_n$. Then there exists $i \in \{1, 2, \dots, n-1\}$ such that $a_{i+1} - a_i < a_n/n$.
(Indeed, the $n-1$ gaps $a_{i+1} - a_i$ add up to $a_n - a_1 < a_n - a_n/n = (n-1)\,a_n/n$, so at least one of them is smaller than $a_n/n$.)
A straightforward consequence is the following:
Corollary 1: Let $A = \{a_1, a_2, \dots, a_n\}$ be a set of $n$ distinct positive integers, with $n \ge 2$. Then $\min\{a_k, |a_i - a_j| > 0\} \le \max\{a_i\}/n$, where $1 \le k, i, j \le n$.
We derive the following algorithm:
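Corollary 1 can be checked by brute force. In this sketch (our own test harness, not from the slides), `smallest_gap` returns $\min\{a_k, |a_i - a_j| > 0\}$ by scanning all pairs, and `check_corollary` verifies on random sets of distinct positive integers that this value never exceeds $\max\{a_i\}/n$. The sequential O(n^2) pair scan is what the parallel algorithm performs in constant time with many processors.

```python
import random


def smallest_gap(a):
    """min{a_k} together with every positive difference |a_i - a_j| (Corollary 1)."""
    candidates = [x for x in a if x > 0]
    candidates += [abs(x - y) for x in a for y in a if x != y]
    return min(candidates)


def check_corollary(trials=1000, seed=0):
    """Test Corollary 1 on random sets of n distinct positive integers."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(2, 30)
        a = rng.sample(range(1, 10**6), n)  # n distinct positive integers
        # Corollary 1: smallest_gap(a) <= max(a) / n, checked without division
        if n * smallest_gap(a) > max(a):
            return False
    return True


print(check_corollary())  # True
```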

The Δ-GCD Algorithm (Poster, ISSAC 2013)
Input: A set A = {a_0, a_1, ..., a_{n−1}} of n integers of O(n) bits, n ≥ 4.
Output: gcd(a_0, a_1, ..., a_{n−1}).
α := a_0 ; I := 0 ; p := n ;
While (α > 1) Do
  For (i = 0) to (n − 1) ParDo
    If (0 < a_i ≤ 2^n/p) Then { α := a_i ; I := i ; }
  Endfor
  If (α > 2^n/p) Then   /* Compute in parallel I, J and α */
    α := min{ |a_i − a_j| > 0 } = a_I − a_J ;
    a_I := α ;
  Endif
  For (i = 0) to (n − 1) ParDo   /* Reduce all the a_i's */
    If (i ≠ I) Then a_i := a_i mod α ;
  Endfor   /* ∀ i, 0 ≤ a_i ≤ α */
  If (∀ i ≠ I, a_i = 0) Then Return α ;
  p := np ;   /* p is O(log n) bits larger */
Endwhile
Return α .
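The loop above can be simulated sequentially to try it on small inputs. This is a minimal sketch under our own assumptions, not the poster's implementation. The parallel loops become list comprehensions. n is read as the bit size, so the threshold is 2^n/p and p is multiplied by n each round; this reading matches the worked example with n = 20. When several a_i are small enough, the smallest is chosen, whereas the pseudocode allows any of them. The pigeonhole step takes the minimum over the positive a_k and the positive differences, as in Corollary 1, which also avoids α = 0. The loop stops once at most one entry is nonzero. `delta_gcd` and `smallest_gap` are hypothetical names.

```python
from functools import reduce
from math import gcd


def smallest_gap(a):
    """(alpha, I): smallest positive entry or positive difference a[I] - a[J] (Corollary 1)."""
    alpha, I = min((x, i) for i, x in enumerate(a) if x > 0)
    for i, x in enumerate(a):
        for y in a:
            if y < x and x - y < alpha:
                alpha, I = x - y, i  # replacing a[I] by a[I] - a[J] keeps the gcd
    return alpha, I


def delta_gcd(values, n):
    """Sequential simulation of Delta-GCD; n is the bit size (max(values) < 2**n), n >= 2."""
    a = list(values)
    p = n
    while True:
        nonzero = [x for x in a if x > 0]
        if len(nonzero) <= 1:        # at most one nonzero entry left: it is the gcd
            return nonzero[0] if nonzero else 0
        bound = (1 << n) // p        # integer form of the threshold 2^n / p
        small = [i for i, x in enumerate(a) if 0 < x <= bound]
        if small:                    # a small enough a_i exists
            I = min(small, key=lambda i: a[i])
            alpha = a[I]
        else:                        # pigeonhole step
            alpha, I = smallest_gap(a)
            a[I] = alpha
        # reduce every other entry modulo alpha (a parallel loop on the PRAM)
        a = [x if i == I else x % alpha for i, x in enumerate(a)]
        p *= n                       # p gains about log2(n) bits per round


A = [912672, 815430, 721161, 565701, 662592]
print(delta_gcd(A, 20))                    # 3
print(delta_gcd(A, 20) == reduce(gcd, A))  # True
```

On this input the successive α values are 58569, 4443, 93 and 3, as in the worked example. Because the sketch always overwrites the larger entry of the closest pair, the position where α is stored can differ from the slide's (I, J).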

Example (Δ-GCD): Let A = (912672, 815430, 721161, 565701, 662592), with n = 20 (each integer has at most 20 bits). After 4 iterations, we obtain gcd(A) = 3.

Entry | initial A | after iter. 1 | after iter. 2 | after iter. 3 | after iter. 4
--- | --- | --- | --- | --- | ---
a_0 | 912672 | 34137 | 4443 | 72 | 0
a_1 | 815430 | 54033 | 717 | 93 | 0
a_2 | 721161 | 58569 | 810 | 66 | 0
a_3 | 565701 | 38580 | 3036 | 60 | 0
a_4 | 662592 | 18333 | 561 | 3 | 3
α | | 58569 | 4443 | 93 | 3
(I, J) | | (2, 4) | (0, 3) | (1, 2) | (4, −) STOP

Drawbacks of the pigeonhole technique
- The number of distinct integers matters. If $A$ contains only $O(\log n)$ distinct integers, the pigeonhole technique reduces the bit size of the integers by only $O(\log\log n)$ bits per iteration, so the main while loop needs $O(n/\log\log n)$ iterations.
- What happens if $\alpha = 0$? For example, take $n = 8$ and $A = (255, 255, 193, 161, 129, 97, 65, 65)$. Only two pairs of integers match in their 3 most significant bits, namely $(255, 255)$ and $(65, 65)$; unfortunately, both give $\alpha = 0$.
- Comparing all $O(n^2)$ pairs $(a_i, a_j)$ to find a small $\alpha = a_i - a_j > 0$ in constant parallel time requires $O(n^3)$ processors.
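The α = 0 case can be reproduced by bucketing the 8-bit example on its 3 leading bits (3 = log2 8). Two integers that share a bucket differ only in their 5 low bits, so their difference is below 2^5 = 32. Bucketing is our illustration of how close pairs might be located, not a procedure from the slides, and `leading_bit_pairs` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import combinations


def leading_bit_pairs(a, n_bits, lead):
    """Return all pairs of integers that share their `lead` most significant bits."""
    buckets = defaultdict(list)
    for x in a:
        buckets[x >> (n_bits - lead)].append(x)  # top `lead` bits of an n_bits-bit integer
    return [pair for group in buckets.values() for pair in combinations(group, 2)]


A = [255, 255, 193, 161, 129, 97, 65, 65]
pairs = leading_bit_pairs(A, n_bits=8, lead=3)
print(pairs)                           # [(255, 255), (65, 65)]
print([abs(x - y) for x, y in pairs])  # [0, 0]
```

Only (255, 255) and (65, 65) share a bucket, and both differences are 0, so this step yields no usable α.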

Solution: use other techniques
- Consider $O(\sqrt{n})$ integers and compute their differences $a_i - a_j$ to find $\alpha > 0$. This gives $O(n)$ comparisons, done in constant time with $O(n^{2+\epsilon})$ processors.
- If this fails, use a Lehmer-like reduction ($R_{ILE}$, ISSAC'2001).
- If all the $R_{ILE}$ values are zero, the reduce transformation right-shifts all the zeros of $A$, and we continue the process with this new $A$.
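A sketch of the `pigeonhole(A, N)` step called by the Δ-2GCD algorithm, under our assumption about its interface. It looks only at the first N = ⌊√n⌋ entries, returns the smallest positive difference α = a_I − a_J among them together with I, and signals failure with I = −1. With N^2 = O(n) pairs, this matches the O(n) comparisons counted above. On the 8-integer example from the previous slide, the first ⌊√8⌋ = 2 entries are both 255, so the step fails.

```python
from math import isqrt


def pigeonhole(a, N):
    """Smallest positive a[I] - a[J] with I, J < N; returns (alpha, I), or (0, -1) on failure."""
    alpha, I = 0, -1
    for i in range(N):
        for j in range(N):  # N*N = O(n) independent comparisons
            d = a[i] - a[j]
            if d > 0 and (I == -1 or d < alpha):
                alpha, I = d, i
    return alpha, I


A = [255, 255, 193, 161, 129, 97, 65, 65]
print(pigeonhole(A, isqrt(len(A))))  # (0, -1): the first 2 entries are equal
B = [900, 1000, 640, 333, 7, 12, 5, 1]
print(pigeonhole(B, isqrt(len(B))))  # (100, 1)
```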

The Lehmer-like reduction: $R_{ILE}$ and Ext-$R_{ILE}$
The $R_{ILE}$ and Ext-$R_{ILE}$ algorithms are described in Sed-ISSAC'01 and Sed-JDA'08. ILE stands for Improved Lehmer Euclid.
(1) $R_{ILE}$ is defined by:
Input: $u \ge v \ge 0$, $k = 2^m$, $m = \Theta(\log n)$.
Output: $R_{ILE}(u, v) = |au + bv| < 2v/k$, with $1 \le |a| \le k$.
Roughly speaking, $R_{ILE}(u, v)$ is computed with continued fractions.
(2) Ext-$R_{ILE}$ is the extended version of $R_{ILE}$, i.e., we add the Bézout matrix $M$ such that, for $0 \le i, j \le \lfloor\sqrt{n}\rfloor$:
$M \times (a_i, a_j)^T = (R_i, R_j)^T$, with $R_j = R_{ILE}(a_i, a_j)$,
$0 \le R_j < R_i$ and $\gcd(R_i, R_j) = \gcd(a_i, a_j)$,
$R_j < (2/k)\max\{a_i, a_j\}$.

Example: Let $u = 1\,759\,291$ and $v = 1\,349\,639$. Their binary representations are, respectively:
$11010110\;1100000111011_2 = 1\,759\,291$
$10100100\;1100000000111_2 = 1\,349\,639$
We have $n = p = 21$. For $m = 3$, we obtain $k = 2^m = 8$, $\lambda = 2m + 2 = 8$, $u_1 = 214$ and $v_1 = 164$ ($u_1$ and $v_1$ are the $\lambda = 8$ leading bits of $u$ and $v$, separated by a space above). Running the extended Euclidean algorithm (EEA) on $u_1$ and $v_1$, we obtain in turn $q$, $r$, $a$ and $b$, with $r = au_1 + bv_1$:

q | r | a | b
--- | --- | --- | ---
 | 214 | 1 | 0
 | 164 | 0 | 1
1 | 50 | 1 | −1
3 | 14 | −3 | 4
3 | 8 | 10 | −13

In our example, we obtain $a = -3$, $b = 4$, $r = 14 < v_1/k = 164/8 = 20.50$, and $R_{ILE} = |-3u + 4v| = 120\,683 < v/8 = 168\,704.88$.
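The worked example can be reproduced with the following sketch. It is our reading of the example, not the full $R_{ILE}$ procedure of Sed-ISSAC'01. It takes the λ = 2m + 2 leading bits u_1, v_1 of u and v (the same shift for both) and runs the extended Euclidean algorithm on (u_1, v_1) until the remainder drops below v_1/k, with k = 2^m. It then applies the resulting cofactors (a, b) to the full-size u and v. The bounds guaranteed by the original ($|au + bv| < 2v/k$, $1 \le |a| \le k$) are not enforced here, and `r_ile_sketch` is a hypothetical name.

```python
def r_ile_sketch(u, v, m):
    """Lehmer-like step on leading bits; returns (|a*u + b*v|, a, b). Assumes u >= v > 0."""
    if not u >= v > 0:
        raise ValueError("need u >= v > 0")
    k = 1 << m                               # k = 2^m
    lam = 2 * m + 2                          # number of leading bits kept
    shift = max(u.bit_length() - lam, 0)
    u1, v1 = u >> shift, v >> shift          # leading lambda bits of u, same shift for v
    # EEA rows (r, a, b) with r = a*u1 + b*v1
    r0, a0, b0 = u1, 1, 0
    r1, a1, b1 = v1, 0, 1
    while r1 > 0 and k * r1 >= v1:           # stop at the first r < v1 / k
        q = r0 // r1
        r0, a0, b0, r1, a1, b1 = r1, a1, b1, r0 - q * r1, a0 - q * a1, b0 - q * b1
    return abs(a1 * u + b1 * v), a1, b1


print(r_ile_sketch(1759291, 1349639, 3))  # (120683, -3, 4)
```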

Properties of $R_{ILE}$ and Ext-$R_{ILE}$:
• Parallel complexity: $O_\epsilon(n/\log n)$ time with $O(n^{1+\epsilon})$ processors on a CRCW PRAM (ISSAC'01).
• It also computes the Bézout coefficients efficiently in parallel, with the same parallel performance (JDA'08).
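The Bézout-matrix property of Ext-$R_{ILE}$ can be illustrated by keeping the last two rows of the same truncated extended Euclidean algorithm. These rows form a 2×2 integer matrix M with det M = ±1, so $(R_i, R_j) = M \cdot (a_i, a_j)$ satisfies $\gcd(R_i, R_j) = \gcd(a_i, a_j)$. This is a sequential sketch under the same assumptions as the $R_{ILE}$ example (λ = 2m + 2 leading bits, k = 2^m). It is not the parallel algorithm of JDA'08, and `ext_r_ile_sketch` is a hypothetical name.

```python
from math import gcd


def ext_r_ile_sketch(u, v, m):
    """Return (R_i, R_j, M) with M = [[a0, b0], [a1, b1]], R_i = |a0*u + b0*v|, R_j = |a1*u + b1*v|."""
    if not u >= v > 0:
        raise ValueError("need u >= v > 0")
    k, lam = 1 << m, 2 * m + 2
    shift = max(u.bit_length() - lam, 0)
    u1, v1 = u >> shift, v >> shift
    r0, a0, b0 = u1, 1, 0
    r1, a1, b1 = v1, 0, 1
    while r1 > 0 and k * r1 >= v1:
        q = r0 // r1
        # each step keeps det [[a0, b0], [a1, b1]] equal to +1 or -1
        r0, a0, b0, r1, a1, b1 = r1, a1, b1, r0 - q * r1, a0 - q * a1, b0 - q * b1
    M = [[a0, b0], [a1, b1]]
    return abs(a0 * u + b0 * v), abs(a1 * u + b1 * v), M


u, v = 1759291, 1349639
Ri, Rj, M = ext_r_ile_sketch(u, v, 3)
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
print(Ri, Rj, M, det)            # 409652 120683 [[1, -1], [-3, 4]] 1
print(gcd(Ri, Rj) == gcd(u, v))  # True
```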

High-level description of the Δ-2GCD algorithm
- Test 1: Is there a small enough $a_i > 0$ that we can take directly as $\alpha$?
- Test 2: Does the pigeonhole algorithm provide an $\alpha > 0$?
- Test 3: Use a new transformation $R$ based on continued fractions (Sed-ISSAC'01) and test whether $R > 0$.
If Test 3 fails, i.e., $R_j(a_i, a_j) = 0$ for all pairs $(a_i, a_j)$ with $i, j \le \sqrt{n}$, then $(R_i, R_j) = (R_i, 0)$ and $(a_i, a_j) \leftarrow (0, R_i)$. A new transformation, called reduce, right-shifts all the zeros in $A$. Among the $O(\sqrt{n})$ integers considered, half become zero, so the number of positive ones is halved. Moreover, this can happen at most $O(\sqrt{n})$ times since, at each step, we add $O(\sqrt{n})$ new zeros to the vector $A$.
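The reduce transformation is described only as right-shifting the zeros of A. A minimal sketch under that reading is a stable compaction that moves every nonzero entry to the front and every zero to the end; the stable order is our choice. As long as positive entries remain, they then refill the first ⌊√n⌋ positions examined by the pigeonhole and $R_{ILE}$ tests. Zeros do not contribute to the gcd, so gcd(A) is unchanged. `reduce_zeros` is a hypothetical name.

```python
from functools import reduce
from math import gcd


def reduce_zeros(a):
    """Right-shift all zeros: nonzero entries keep their relative order, zeros go to the end."""
    nonzero = [x for x in a if x != 0]
    return nonzero + [0] * (len(a) - len(nonzero))


A = [0, 12, 0, 18, 30, 0]
B = reduce_zeros(A)
print(B)                                       # [12, 18, 30, 0, 0, 0]
print(reduce(gcd, A, 0) == reduce(gcd, B, 0))  # True
```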

Δ-2GCD algorithm:
Input: A vector A = (a_0, a_1, ..., a_{n−1}), n ≥ 4 and max{a_i} < 2^n.
Output: gcd(a_0, a_1, ..., a_{n−1}).
(α, I) := (a_0, 0) ; p := n ; N := ⌊√n⌋ ;
While (α > 1) Do
  For (i = 0) to (n − 1) ParDo
    If (0 < a_i ≤ 2^n/p) then { (α, I) := (a_i, i) ; S := 1 } ;
    else S := 0 ;   /* No small a_i */
  Endfor
  If (S = 0) then (α, I) := pigeonhole(A, N) ;
  If (I = −1) then
    R := 0 ;   /* The pigeonhole fails */
    For (i, j = 0) to (N − 1) ParDo
      x_ij := R_ILE(a_i, a_j) ;
      If (x_ij > 0) then { (α, I) := (x_ij, i) ; R := 1 ; a_I := x_ij }   /* We can divide all the a_i's by α = x_ij */
      Endif
    Endfor

    If (R = 0) then   /* ∀ i, j, R_ILE(a_i, a_j) = 0 */
      A := reduce(A, N) ;
    Endif
  Endif
  If (I ≥ 0) then A := remainder(A, α, I) ;
  If (∃ a_k ≠ 0 s.t. ∀ i ≠ k ⇒ a_i = 0) then Return a_k ;
  p := np ;   /* p is O(log n) bits larger */
Endwhile
Return α .
