RNS Arithmetic for Linear Algebra of Discrete Logarithm Computations Using Parallel Architectures
Hamza Jeljeli
CARAMEL project-team, LORIA, INRIA / CNRS / Université de Lorraine
Hamza.Jeljeli@loria.fr
RAIM 2015, Rennes, April 8th, 2015
Discrete Logarithm Problem (DLP)

Discrete Logarithm
Given a cyclic group G = <g> written multiplicatively, the discrete logarithm of h ∈ G is the unique k in [0, #G − 1] such that h = g^k.

- In some groups, the DLP is computationally hard, while the inverse problem (discrete exponentiation) is easy.
- The security of cryptographic primitives relies on the difficulty of the DLP:
  - key agreement: Diffie–Hellman key exchange,
  - encryption: ElGamal encryption,
  - signature: DSA signature,
  - pairing-based cryptography, ...
- Evaluating the security level of these primitives ⇒ DLP attacks.
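As a toy illustration of this asymmetry (not part of the original slides), the following Python sketch uses an arbitrary small prime p = 101 and generator g = 2: discrete exponentiation is a single pow call, while recovering k falls back to exhaustive search.

```python
# Toy illustration of the discrete logarithm definition (values invented):
# exponentiation is easy, recovering k from h = g^k requires a search.
p = 101          # small prime, chosen arbitrarily for the example
g = 2            # generator of GF(p)^* (2 has order 100 modulo 101)
k = 53           # secret exponent
h = pow(g, k, p) # discrete exponentiation: easy

# Naive discrete logarithm: try every exponent until g^x == h.
# This brute force is exponential in the bit size of the group order,
# which is why the DLP is hard in well-chosen large groups.
x = next(x for x in range(p - 1) if pow(g, x, p) == h)
assert x == k
print(f"g^{x} = {h} (mod {p})")
```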
Linear Algebra Issued from DLP Attacks

Focus on the DLP in multiplicative subgroups of finite fields GF(q).
To attack the DLP in finite fields, index-calculus methods:
- solve the DLP in time sub-exponential or quasi-polynomial in the size of the finite field;
- require solving large sparse systems of linear equations over finite fields.

Linear Algebra Problem
Inputs: a prime ℓ that divides q − 1 and a matrix A.
Output: a non-trivial vector w such that Aw mod ℓ = 0.

- Linear algebra for factorization: arithmetic over GF(2), about 10% of the overall time.
- Linear algebra for DLP: arithmetic over GF(ℓ), about 50% of the overall time; it is a bottleneck for the computation.
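To make the problem statement concrete, here is a minimal Python sketch with an invented 2-by-2 matrix and a toy prime ℓ = 7; it finds a kernel vector by brute force, which of course does not scale to the matrix sizes below (those require the Wiedemann-type methods of the following slides).

```python
# Toy instance of the linear algebra problem (values invented for illustration):
# find a non-trivial w with A*w ≡ 0 (mod ell).
from itertools import product

ell = 7                      # small "prime ell" for the example
A = [[1, 2],                 # second row is 3 times the first, so the
     [3, 6]]                 # matrix is singular modulo ell

def matvec_mod(A, w, m):
    return [sum(a * x for a, x in zip(row, w)) % m for row in A]

# Brute-force search over GF(ell)^2, viable only at toy size.
w = next(w for w in product(range(ell), repeat=2)
         if any(w) and all(c == 0 for c in matvec_mod(A, w, ell)))
print("kernel vector w =", w)   # (1, 3): 1 + 2*3 = 7 ≡ 0 and 3 + 6*3 = 21 ≡ 0
```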
Characteristics of the Inputs

- ℓ is between 100 and 1000 bits.
- A is an N-by-N matrix, with N ranging from 10^5 to 10^8.
- A is sparse: each row of A contains ∼100 non-zero coefficients.
- The very first columns are relatively dense, then the column density decreases gradually. The row density does not change significantly.
- Non-zero coefficients lie in GF(ℓ).

Example: resolution of the DLP in GF(2^619)^×
  Size of ℓ            217 bits
  Size of matrix (N)   650k
  Average row weight   100
Linear Algebra

Harder linear algebra ⇒ heavy computations; exploit parallelism at several levels:

1. Algorithmic level: sparse linear algebra algorithms
   - Wiedemann: a sequence of O(N) iterative sparse-matrix–vector products (SpMV): ᵗx y, ᵗx A y, ᵗx A^2 y, ..., ᵗx A^{2N} y  [Euro-Par 2014]
   - Block Wiedemann: distribute the work into many parallel sequences
2. SpMV level: parallelize the SpMV over many nodes.
3. Per-node level: hardware: GPU, multi-core CPU, many-core, ...?  [WAIFI 2014]
   Which format for the sparse matrix? How to map the partial SpMV onto the architecture?
4. Arithmetic level: arithmetic over GF(ℓ).
   Representation: Residue Number System (RNS) or multi-precision?
   Accelerate the arithmetic on SIMD architectures.
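A minimal Python sketch (not from the slides) of the Krylov sequence that Wiedemann builds from repeated SpMV. The tiny dense matrix, ℓ and the vectors are invented stand-ins, and the minimal-polynomial (Berlekamp–Massey) step that follows in the real algorithm is omitted.

```python
# Sketch of the sequence s_i = x^T * A^i * y (mod ell), i = 0..2N, produced
# by 2N iterative sparse-matrix-vector products in Wiedemann's algorithm.
ell = 1009
A = [[1, 0, 5],
     [2, 3, 0],
     [0, 7, 4]]          # toy dense stand-in for the huge sparse matrix
N = len(A)
x = [3, 1, 4]
y = [2, 7, 1]

def spmv(A, u, m):
    """One matrix-vector product v = A*u mod m (dense stand-in for SpMV)."""
    return [sum(a * c for a, c in zip(row, u)) % m for row in A]

seq, v = [], y[:]
for _ in range(2 * N + 1):
    seq.append(sum(a * b for a, b in zip(x, v)) % ell)  # s_i = x^T A^i y
    v = spmv(A, v, ell)                                 # next Krylov vector

print(seq)  # Berlekamp-Massey on this sequence would give the minimal polynomial
```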
Table of Contents

1. SpMV: v ← Au mod ℓ
2. RNS for SpMV over Parallel Architectures
3. Experimental Results
Nature of the Coefficients of A

FFS-like matrices:
- A is sparse.
- All coefficients are "small" (|a_ij| ∈ [0, 2^10]).
- ∼90% of them are ±1.

NFS-like matrices: composed of 2 parts:
- A_0: a sparse N-by-(N − r) sub-matrix containing "small" coefficients (a majority of ±1).
- A_1: a dense N-by-r sub-matrix composed of "large" coefficients (in [0, ℓ]).
- r is between 0 and 10.
Required Operations for SpMV

SpMV level: v ← Au mod ℓ

Row i level, FFS-like matrices: v_i ← Σ_{j=1}^{N} a_ij · u_j mod ℓ
- v_i ← v_i ± u_j, (a_ij = ±1): frequent
- v_i ← v_i + a_ij × u_j, (|a_ij| < 2^10): less frequent
- v_i ← v_i mod ℓ (lazy reduction): not frequent

Row i level, NFS-like matrices: v_i ← Σ_{j=1}^{N−r} a_ij · u_j + Σ_{j=N−r+1}^{N} a_ij · u_j mod ℓ
- v_i ← v_i ± u_j, (a_ij = ±1): frequent
- v_i ← v_i + a_ij × u_j, (|a_ij| < 2^10): less frequent
- v_i ← v_i + a_ij × u_j, (0 ≤ a_ij < ℓ): less frequent
- v_i ← v_i mod ℓ (lazy reduction): not frequent
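The sketch below illustrates the lazy-reduction pattern for one row: ±1 coefficients cost an addition or subtraction, small coefficients cost one small multiplication, and the reduction mod ℓ is deferred to the end of the row. The row storage format, the placeholder modulus and the toy values are assumptions of the sketch, not taken from the slides.

```python
# Sketch of accumulating one row v_i = sum_j a_ij * u_j with lazy reduction.
ell = (1 << 217) - 1      # placeholder modulus of roughly the right size (not a real prime)
row = [(0, 1), (3, -1), (7, 5), (12, -1), (20, 913)]   # (column j, coefficient a_ij)
u = [j + 1 for j in range(32)]                          # toy input vector

acc = 0
for j, a in row:
    if a == 1:            # frequent case: plain addition
        acc += u[j]
    elif a == -1:         # frequent case: plain subtraction
        acc -= u[j]
    else:                 # less frequent: multiplication by a small scalar
        acc += a * u[j]

v_i = acc % ell           # lazy reduction: a single mod at the end of the row
print(v_i)
```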
Table of Contents

1. SpMV: v ← Au mod ℓ
2. RNS for SpMV over Parallel Architectures  (current section)
3. Experimental Results
A Brief Reminder on the Residue Number System (RNS)

- RNS basis: a set of n co-prime integers (p_1, ..., p_n), with P = Π_{i=1}^{n} p_i.
- RNS representation of x ∈ [0, P − 1]: x⃗ = (|x|_{p_1}, ..., |x|_{p_n}).
- Usual operations in RNS:
  - Addition: x + y → (|x_1 + y_1|_{p_1}, ..., |x_n + y_n|_{p_n})
  - Multiplication by a scalar λ < p_i: x × λ → (|x_1 × λ|_{p_1}, ..., |x_n × λ|_{p_n})
  - Multiplication: x × y → (|x_1 × y_1|_{p_1}, ..., |x_n × y_n|_{p_n})
- Warning: operations are mod P (the final result should not exceed P).
- ⇒ Fully independent parallel computations on the components.
- Comparison and division in RNS are more tricky.
- The p_i are chosen of pseudo-Mersenne form 2^k − c_i to speed up the reduction |·|_{p_i}:
  - 2^k matching a machine word: 2^32, 2^64, ...
  - c_i small compared to 2^k.
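A short Python sketch of the RNS idea, using three invented pseudo-Mersenne moduli of the form 2^13 − c_i: conversion to residues, component-wise addition and multiplication without carries between components, and CRT reconstruction to check the results modulo P.

```python
# Sketch of the RNS representation and its carry-free component-wise operations.
from math import prod

basis = [2**13 - 1, 2**13 - 3, 2**13 - 15]   # pairwise co-prime toy moduli 2^k - c_i
P = prod(basis)

def to_rns(x):
    """RNS representation of x: its residue modulo each basis element."""
    return [x % p for p in basis]

def from_rns(res):
    """CRT reconstruction of the value modulo P from its residues."""
    return sum(r * (P // p) * pow(P // p, -1, p) for r, p in zip(res, basis)) % P

x, y = 123456, 98765
xr, yr = to_rns(x), to_rns(y)

# Component-wise operations: no carry propagation between components.
add = [(a + b) % p for a, b, p in zip(xr, yr, basis)]
mul = [(a * b) % p for a, b, p in zip(xr, yr, basis)]

assert from_rns(add) == (x + y) % P
assert from_rns(mul) == (x * y) % P
```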
RNS Addition and Multiplication: Algorithms

x + y requires 2 × (ℓ − 1) < P − 1:
  Input: x⃗, y⃗: RNS representations of x, y ∈ Z/ℓZ.
  Output: z⃗: RNS representation of z = x + y.
  for each component i do z_i ← |x_i + y_i|_{p_i}

x + λ × y, with λ < 2^10, requires 2^10 × (ℓ − 1) < P − 1:
  Input: x⃗, y⃗: RNS representations of x, y ∈ Z/ℓZ, and λ ∈ [2, 2^10[.
  Output: z⃗: RNS representation of z = x + λ × y.
  for each component i do z_i ← |x_i + λ × y_i|_{p_i}

x + λ × y, with λ < ℓ, requires ℓ × (ℓ − 1) < P − 1:
  Input: x⃗, y⃗, λ⃗: RNS representations of x, y, λ ∈ Z/ℓZ.
  Output: z⃗: RNS representation of z = x + λ × y.
  for each component i do z_i ← |x_i + λ_i × y_i|_{p_i}
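The three operations above are purely component-wise, as the following sketch shows (same kind of toy basis as before; the bound conditions on P are assumed to hold and are not checked here).

```python
# Sketch of the slide's three component-wise RNS operations (toy basis).
basis = [2**13 - 1, 2**13 - 3, 2**13 - 15]

def rns(x):
    return [x % p for p in basis]

def add(xr, yr):                    # z = x + y
    return [(a + b) % p for a, b, p in zip(xr, yr, basis)]

def add_scaled_small(xr, yr, lam):  # z = x + lam*y, lam < 2^10 kept as a plain scalar
    return [(a + lam * b) % p for a, b, p in zip(xr, yr, basis)]

def add_scaled_rns(xr, yr, lamr):   # z = x + lam*y, lam < ell given in RNS itself
    return [(a + l * b) % p for a, b, l, p in zip(xr, yr, lamr, basis)]

x, y, lam = 1234, 5678, 321
assert add(rns(x), rns(y)) == rns(x + y)
assert add_scaled_small(rns(x), rns(y), lam) == rns(x + lam * y)
assert add_scaled_rns(rns(x), rns(y), rns(lam)) == rns(x + lam * y)
```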
RNS Reduction Modulo ℓ [Bernstein 94]

Problem: we have x in RNS; how to compute x mod ℓ?

Chinese Remainder Theorem (CRT) reconstruction:
  x = Σ_{i=1}^{n} γ_i · P_i mod P,   where P_i = P / p_i and γ_i := |x_i · |P_i^{-1}|_{p_i}|_{p_i}
  x = Σ_{i=1}^{n} γ_i · P_i − α · P, where α := ⌊(Σ_{i=1}^{n} γ_i · P_i) / P⌋ = ⌊Σ_{i=1}^{n} γ_i / p_i⌋

If α is known ⇒ z := Σ_{i=1}^{n} γ_i · |P_i|_ℓ − |αP|_ℓ ≡ x (mod ℓ)

z satisfies z ∈ [0, ℓ · Σ_{i=1}^{n} p_i[
⇒ Full RNS computation of z.
⇒ z is not the exact reduction of x. However, the approximate reduction guarantees that the intermediate results of the SpMV computation do not exceed a bound that we impose, which is less than P.
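A small numeric check (not from the slides) of the identity Σ γ_i P_i = x + αP with α = ⌊Σ γ_i / p_i⌋, on an invented basis; α is computed exactly with rationals here, whereas an optimized implementation estimates it from truncated γ_i.

```python
# Check of the CRT-with-correction identity behind Bernstein's reduction.
from math import prod, floor
from fractions import Fraction

basis = [2**13 - 1, 2**13 - 3, 2**13 - 15]   # toy co-prime basis
P = prod(basis)
x = 123456789                                 # any 0 <= x < P

# gamma_i = |x_i * |P_i^{-1}|_{p_i}|_{p_i}
gammas = [((x % p) * pow(P // p, -1, p)) % p for p in basis]

# alpha = floor(sum_i gamma_i / p_i), computed exactly with rationals here
alpha = floor(sum(Fraction(g, p) for g, p in zip(gammas, basis)))

# CRT with explicit correction term: sum_i gamma_i * P_i = x + alpha * P
assert sum(g * (P // p) for g, p in zip(gammas, basis)) == x + alpha * P
print("alpha =", alpha)
```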
RNS Approximate Reduction Modulo ℓ: Algorithm

Precomputations:
- Vector (|P_i^{-1}|_{p_i}) for i ∈ {1, ..., n}
- Table of RNS representations of |P_j|_ℓ for j ∈ {1, ..., n}
- Table of RNS representations of |αP|_ℓ for α ∈ {1, ..., n − 1}

Input: x⃗: RNS representation of x, with 0 ≤ x < P
Output: z⃗: RNS representation of z ≡ x (mod ℓ), with z < ℓ · Σ_{i=1}^{n} p_i

for each component i do
    γ_i ← |x_i × |P_i^{-1}|_{p_i}|_{p_i}                  /* 1 RNS product */
broadcast γ_i
compute α                                                  /* addition of n s-bit terms */
for each component i do
    z_i ← |Σ_{j=1}^{n} γ_j × ||P_j|_ℓ|_{p_i}|_{p_i}        /* (n − 1) RNS additions & n RNS products */
    z_i ← |z_i − ||αP|_ℓ|_{p_i}|_{p_i}                     /* 1 RNS subtraction */
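The following Python sketch mirrors the algorithm above on a toy basis and a toy prime ℓ = 7919. Two deviations are assumptions of the sketch, not taken from the slides: α is computed exactly rather than estimated from truncated γ_i, and the precomputed table stores |−αP|_ℓ so the correction is an addition; the result differs from the subtracted form by at most ℓ and still satisfies z ≡ x (mod ℓ) with z small enough to feed the next lazy SpMV accumulation.

```python
# Sketch of the RNS approximate reduction modulo ell: everything stays in RNS.
from math import prod
from fractions import Fraction

basis = [2**13 - 1, 2**13 - 3, 2**13 - 15]    # toy RNS basis (pairwise co-prime)
P = prod(basis)
ell = 7919                                     # toy prime standing in for ell
n = len(basis)

# Precomputations: |P_i^{-1}|_{p_i}, RNS tables of |P_j|_ell and of |-alpha*P|_ell.
inv_Pi = [pow(P // p, -1, p) for p in basis]
tab_Pj = [[((P // basis[j]) % ell) % p for p in basis] for j in range(n)]
tab_aP = [[((-a * P) % ell) % p for p in basis] for a in range(n)]

def approx_reduce(x_rns):
    """RNS value z with z ≡ x (mod ell) and z < ell*(sum(basis)+1); not fully reduced."""
    gammas = [(xi * inv) % p for xi, inv, p in zip(x_rns, inv_Pi, basis)]
    alpha = int(sum(Fraction(g, p) for g, p in zip(gammas, basis)))  # exact here
    return [(sum(g * tab_Pj[j][i] for j, g in enumerate(gammas)) + tab_aP[alpha][i]) % p
            for i, p in enumerate(basis)]

def from_rns(res):   # CRT reconstruction, used only to check the sketch
    return sum(r * (P // p) * pow(P // p, -1, p) for r, p in zip(res, basis)) % P

x = 123456789
z = from_rns(approx_reduce([x % p for p in basis]))
assert z % ell == x % ell and z < ell * (sum(basis) + 1)
```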
Required Operations for SpMV in RNS

SpMV level: v ← Au mod ℓ

Row i level, FFS-like matrices: v_i ← Σ_{j=1}^{N} a_ij · u_j mod ℓ
- v_i ← v_i ± u_j, (a_ij = ±1): frequent and easy
- v_i ← v_i + a_ij × u_j, (|a_ij| < 2^10): less frequent and easy
- v_i ← v_i mod ℓ (lazy reduction): not frequent and hard

Row i level, NFS-like matrices: v_i ← Σ_{j=1}^{N−r} a_ij · u_j + Σ_{j=N−r+1}^{N} a_ij · u_j mod ℓ
- v_i ← v_i ± u_j, (a_ij = ±1): frequent and easy
- v_i ← v_i + a_ij × u_j, (|a_ij| < 2^10): less frequent and easy
- v_i ← v_i + a_ij × u_j, (0 ≤ a_ij < ℓ): less frequent, easy but binding
- v_i ← v_i mod ℓ (lazy reduction): not frequent and hard
How Long is the RNS Basis?

FFS-like matrices:
1. Take a basis B(n, k) that handles the product by A.
   Let s be the maximal norm of the rows of A: s · ℓ · Σ_{i=1}^{n} p_i < P
   (recall that Wiedemann is iterative).

NFS-like matrices:
1. Take a minimal-length basis B(n, k) when multiplying by A_0.
2. Extend it to a larger basis B || B̂ (n + n̂, k) when multiplying by A_1:
   - s · ℓ · (Σ_{i=1}^{n} p_i + Σ_{i=1}^{n̂} p̂_i) < P             (product by A_0)
   - r · ℓ × s · ℓ · (Σ_{i=1}^{n} p_i + Σ_{i=1}^{n̂} p̂_i) < P · P̂  (product by A_1)

Basis extension: an approach similar to the reduction modulo ℓ.
For each modulus p̂_j of the new basis:
  x̂_j = |x|_{p̂_j} = |Σ_{i=1}^{n} γ_i · |P_i|_{p̂_j} − |αP|_{p̂_j}|_{p̂_j}
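A minimal sketch of the base-extension formula above, assuming a toy basis and an invented extra modulus p̂; it reuses the same γ_i and α as the reduction modulo ℓ (α again computed exactly for simplicity rather than estimated).

```python
# Sketch of RNS base extension: residue of x modulo a new modulus p_hat,
# using x = sum_i gamma_i*P_i - alpha*P.
from math import prod
from fractions import Fraction

basis = [2**13 - 1, 2**13 - 3, 2**13 - 15]     # original basis B (toy values)
P = prod(basis)
p_hat = 2**13 - 27                              # one modulus of the extension (invented)

inv_Pi = [pow(P // p, -1, p) for p in basis]    # |P_i^{-1}|_{p_i}

def extend(x_rns, q):
    """Residue of x modulo the new modulus q, from its residues in the basis B."""
    gammas = [(xi * inv) % p for xi, inv, p in zip(x_rns, inv_Pi, basis)]
    alpha = int(sum(Fraction(g, p) for g, p in zip(gammas, basis)))   # exact here
    return (sum(g * ((P // p) % q) for g, p in zip(gammas, basis)) - alpha * (P % q)) % q

x = 987654321                                   # any 0 <= x < P
assert extend([x % p for p in basis], p_hat) == x % p_hat
```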