Implementing RLWE-based Schemes Using an RSA Co-Processor

Martin R. Albrecht¹, Christian Hanser², Andrea Hoeller², Thomas Pöppelmann³, Fernando Virdia¹, Andreas Wallner²

¹ Information Security Group, Royal Holloway, University of London, UK
² Infineon Technologies Austria AG
³ Infineon Technologies AG, Germany

23 January 2019, Lattice Coding & Crypto Meeting, London
Overview

- Prelude
  - Post-quantum cryptography
- Deploying cryptography
  - Deployment in general
  - Lattice-based cryptography
- Ring arithmetic on RSA co-processors
  - Kronecker substitution
  - Splitting rings
  - Karatsuba multiplication
- Implementation
- Future directions
Prelude
Post-quantum cryptography

[Sho97] introduces a fast¹ quantum algorithm for order finding that allows factoring integers and computing discrete logarithms in Abelian groups. Since then, there has been a growing effort to develop public-key encryption and signature schemes that resist cryptanalysis by large-scale, general-purpose quantum computers.

In 2016, the US National Institute of Standards and Technology (NIST) started a multi-year process to standardise post-quantum cryptographic schemes [Nat16]. Many of the proposed schemes are based on problems defined over polynomial rings, such as the Ring Learning With Errors (RLWE) problem.

¹ Let's not go there.
Deploying cryptography
Deployment in general

In practice, cryptographic schemes have two crucial requirements²: high performance and ease of deployment. Optimised implementations are an active area of research. As part of the NIST process, designers typically provided fast software implementations targeting modern CPU architectures. However, implementations of quantum-safe schemes are also required in constrained, often embedded, environments such as microcontrollers and smart cards.

² Other than being secure in some appropriate model!
Smart cards, for instance, provide low-power 16-bit or 32-bit CPUs and small amounts of RAM. These are augmented with dedicated co-processors that let them run Diffie-Hellman key exchange (over finite fields and elliptic curves) as well as RSA encryption and signatures. For example, the Infineon SLE 78CLUFX5000 chip card provides:

- a 16-bit CPU @ 50 MHz,
- 16 Kbyte RAM,
- 500 Kbyte NVM,
- AES and SHA256 co-processors³,
- a $\mathbb{Z}_N$ adder and multiplier for $\log_2 N = 2200$ ("the RSA co-processor").

In the smart-card context, what would be required to run lattice-based cryptography?

³ And DES!
Lattice-based cryptography

Definition (LWE). For $q, n, m \in \mathbb{Z}^+$ with $m = O(n)$, let $\chi_s, \chi_e$ be probability distributions over $\mathbb{Z}_q$. Sample $A \leftarrow_{\$} \mathbb{Z}_q^{m \times n}$, $\vec{s} \leftarrow \chi_s^n$, $\vec{e} \leftarrow \chi_e^m$, and set $\vec{b} = A\vec{s} + \vec{e}$.

Decision-LWE: distinguish $(A, \vec{b})$ from uniform.
Search-LWE: recover $\vec{s}$ from $(A, \vec{b})$.
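To make the LWE definition concrete, here is a minimal Python sketch that generates a toy instance $(A, \vec{b})$ with $\vec{b} = A\vec{s} + \vec{e} \bmod q$ and checks that knowing $\vec{s}$ exposes the small error. The function name lwe_sample, the tiny parameters, and the choice of $\chi_s = \chi_e$ = uniform on $[-\eta, \eta]$ are illustrative assumptions; these parameters offer no security.

```python
import random

def lwe_sample(n, m, q, eta, rng):
    """Toy LWE instance: return (A, b, s, e) with b = A*s + e mod q.

    Assumption: chi_s and chi_e are both taken to be uniform on
    [-eta, eta]; the definition leaves them unspecified.
    """
    A = [[rng.randrange(q) for _ in range(n)] for _ in range(m)]  # uniform in Z_q^{m x n}
    s = [rng.randint(-eta, eta) for _ in range(n)]                # small secret
    e = [rng.randint(-eta, eta) for _ in range(m)]                # small error
    b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(m)]
    return A, b, s, e

rng = random.Random(2019)
n, m, q = 8, 16, 7681  # tiny, insecure parameters for illustration only
A, b, s, e = lwe_sample(n, m, q, eta=2, rng=rng)

# Whoever knows s can strip off A*s and is left with the small error e.
residue = [(b[i] - sum(A[i][j] * s[j] for j in range(n))) % q for i in range(m)]
centered = [r - q if r > q // 2 else r for r in residue]  # lift to (-q/2, q/2]
print(centered == e)  # True
```

Without $\vec{s}$, the same residues are hidden behind the uniformly random $A$, which is what Search-LWE asks an attacker to undo.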
Definition (MLWE as used in Kyber). Let $R = \mathbb{Z}[x]/(x^n + 1)$ where $n$ is a power of 2, and let $R_q = R/(q)$ for some $q \in \mathbb{Z}^+$. Let $R_q^k$ be the module of rank $k$ over $R_q$, and let $\chi$ be a probability distribution over $\mathbb{Z}_q$, applied coefficient-wise to elements of $R_q$. Sample $A \leftarrow_{\$} R_q^{k \times k}$ and $\vec{s}, \vec{e} \in R_q^k$ with coefficients drawn from $\chi$, and set $\vec{b} = A\vec{s} + \vec{e}$.

Decision-MLWE: distinguish $(A, \vec{b})$ from uniform.
Search-MLWE: recover $\vec{s}$ from $(A, \vec{b})$.

Note: every row satisfies $b_i = \sum_j A_{i,j} \cdot s_j + e_i$, a sum of products in $R_q$.
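The following sketch shows the ring arithmetic behind MLWE: multiplication in $R_q = \mathbb{Z}_q[x]/(x^n + 1)$ wraps $x^n$ around to $-1$, and each row $b_i$ is a sum of $k$ such products plus $e_i$. The helper names (polymul, polyadd, mlwe_sample) are hypothetical, and $\chi$ is taken to be uniform on $[-\eta, \eta]$ for simplicity, which is an assumption rather than Kyber's actual distribution.

```python
import random

def polymul(a, b, q):
    """Schoolbook product in R_q = Z_q[x]/(x^n + 1), using x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:  # x^(i+j) = -x^(i+j-n)
                c[i + j - n] -= a[i] * b[j]
    return [x % q for x in c]

def polyadd(a, b, q):
    return [(x + y) % q for x, y in zip(a, b)]

def mlwe_sample(n, k, q, eta, rng):
    """Toy MLWE instance: return (A, b, s, e) with b = A*s + e over R_q^k.

    Assumption: chi is uniform on [-eta, eta] (Kyber itself samples
    from a centred binomial distribution).
    """
    def small():
        return [rng.randint(-eta, eta) % q for _ in range(n)]

    A = [[[rng.randrange(q) for _ in range(n)] for _ in range(k)] for _ in range(k)]
    s = [small() for _ in range(k)]
    e = [small() for _ in range(k)]
    b = []
    for i in range(k):
        row = e[i]
        for j in range(k):  # b_i = sum_j A_{i,j} * s_j + e_i
            row = polyadd(row, polymul(A[i][j], s[j], q), q)
        b.append(row)
    return A, b, s, e

rng = random.Random(1)
A, b, s, e = mlwe_sample(n=256, k=3, q=7681, eta=2, rng=rng)
```

Each sample costs $k^2$ ring multiplications, which is why the multiplication dominates the cost of schemes such as Kyber.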
Definition (Kyber CPA PKE component).

Simplified Kyber.CPA.Gen
1. $A \leftarrow_{\$} R_q^{k \times k}$
2. $(\vec{s}, \vec{e}) \xleftarrow{\chi} R_q^k \times R_q^k$
3. $\vec{t} \leftarrow \mathrm{Compress}_q(A\vec{s} + \vec{e})$
4. return $pk_{CPA} := (\vec{t}, A)$, $sk_{CPA} := \vec{s}$

Simplified Kyber.CPA.Enc
Input: $pk_{CPA} = (\vec{t}, A)$
Input: $m \in \mathcal{M}$
1. $\vec{t} \leftarrow \mathrm{Decompress}_q(\vec{t})$
2. $(\vec{r}, \vec{e}_1, e_2) \xleftarrow{\chi} R_q^k \times R_q^k \times R_q$
3. $\vec{u} \leftarrow \mathrm{Compress}_q(A^T \vec{r} + \vec{e}_1)$
4. $v \leftarrow \mathrm{Compress}_q(\langle \vec{t}, \vec{r} \rangle + e_2 + \lceil q/2 \rfloor \cdot m)$
5. return $c := (\vec{u}, v)$

Simplified Kyber.CPA.Dec
Input: $sk_{CPA} = \vec{s}$
Input: $c = (\vec{u}, v)$
1. $\vec{u} \leftarrow \mathrm{Decompress}_q(\vec{u})$
2. $v \leftarrow \mathrm{Decompress}_q(v)$
3. return $\mathrm{Compress}_q(v - \langle \vec{s}, \vec{u} \rangle)$

The CCA-secure Kyber768 KEM is obtained by setting $n = 256$, $k = 3$, $q = 7681$ and applying an FO-like (Fujisaki-Okamoto) transform.
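As a rough illustration of why decryption works, the sketch below implements the three simplified algorithms over $R_q$ with $n = 256$, $k = 3$, $q = 7681$, but leaves out $\mathrm{Compress}_q$/$\mathrm{Decompress}_q$ and the FO-like transform. The noise distribution (uniform on $\{-1, 0, 1\}$), the encoding of $m$ as a 0/1 coefficient vector, and all function names are assumptions for this toy. With these choices every coefficient of $v - \langle \vec{s}, \vec{u} \rangle$ equals $\lceil q/2 \rfloor \cdot m_i$ plus noise $\langle \vec{e}, \vec{r} \rangle + e_2 - \langle \vec{s}, \vec{e}_1 \rangle$ of absolute value at most $2kn + 1 = 1537 < q/4$, so decoding to the nearer of $0$ and $q/2$ always succeeds.

```python
import random

# n, k, q as in Kyber768 above. ETA = 1 and uniform noise are assumptions
# chosen so that decryption is always correct in this toy version.
N, K, Q, ETA = 256, 3, 7681, 1

def mul(a, b):
    """Schoolbook product in R_q = Z_q[x]/(x^N + 1)."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < N:
                c[i + j] += ai * bj
            else:  # x^N = -1
                c[i + j - N] -= ai * bj
    return [x % Q for x in c]

def add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]

def small(rng):
    return [rng.randint(-ETA, ETA) % Q for _ in range(N)]

def matvec(M, v, transpose=False):
    """Return M*v, or M^T*v if transpose is set, for a K x K matrix over R_q."""
    out = []
    for i in range(K):
        acc = [0] * N
        for j in range(K):
            acc = add(acc, mul(M[j][i] if transpose else M[i][j], v[j]))
        out.append(acc)
    return out

def inner(u, v):
    """Return <u, v> = sum_i u_i * v_i in R_q."""
    acc = [0] * N
    for ui, vi in zip(u, v):
        acc = add(acc, mul(ui, vi))
    return acc

def gen(rng):
    A = [[[rng.randrange(Q) for _ in range(N)] for _ in range(K)] for _ in range(K)]
    s = [small(rng) for _ in range(K)]
    e = [small(rng) for _ in range(K)]
    t = [add(x, y) for x, y in zip(matvec(A, s), e)]  # no Compress_q here
    return (t, A), s

def enc(pk, m, rng):
    t, A = pk
    r = [small(rng) for _ in range(K)]
    e1 = [small(rng) for _ in range(K)]
    e2 = small(rng)
    u = [add(x, y) for x, y in zip(matvec(A, r, transpose=True), e1)]
    half = (Q + 1) // 2  # round(q/2) = 3841
    v = add(add(inner(t, r), e2), [half * bit for bit in m])
    return u, v

def dec(s, c):
    u, v = c
    w = [(x - y) % Q for x, y in zip(v, inner(s, u))]
    # w = round(q/2)*m + small noise: decode each coefficient to the nearer of 0 and q/2
    return [1 if Q // 4 < x < 3 * Q // 4 else 0 for x in w]

rng = random.Random(7681)
pk, sk = gen(rng)
msg = [rng.randrange(2) for _ in range(N)]
print(dec(sk, enc(pk, msg, rng)) == msg)  # True
```

Every step above reduces to MULADD operations in $R_q$ (the loops in matvec and inner), which is what makes a fast MULADD the central building block.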
The most expensive operation is computing

$\mathrm{MULADD}(a, b, c) := a(x) \cdot b(x) + c(x) \bmod (q, f(x)).$

To reduce its cost, the product $a(x) \cdot b(x)$ is computed using the Number Theoretic Transform (NTT).

In the embedded hardware setting, multiple designs for "RLWE co-processors" have been proposed⁴. Yet a new hardware design has to be implemented, tested, certified and deployed!

⁴ E.g. [GFS+12] [PG12] [APS13] [PG14a] [PG14b] [PDG14] [RVM+14] [CMV+15] [POG15] [RRVV15] [LPO+17]
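The sketch below compares MULADD computed directly with MULADD computed through the transform, for $f(x) = x^n + 1$ with $n = 256$ and $q = 7681$ (so $2n \mid q - 1$ and a primitive $2n$-th root of unity $\psi$ exists). For readability the transform is evaluated naively in $O(n^2)$ time rather than with the $O(n \log n)$ butterfly network used in practice; all function names are hypothetical.

```python
import random

def muladd_schoolbook(a, b, c, q):
    """MULADD(a, b, c) = a*b + c mod (q, x^n + 1), computed directly."""
    n = len(a)
    d = list(c)
    for i in range(n):
        for j in range(n):
            if i + j < n:
                d[i + j] += a[i] * b[j]
            else:  # x^n = -1
                d[i + j - n] -= a[i] * b[j]
    return [x % q for x in d]

def find_psi(n, q):
    """Return a primitive 2n-th root of unity mod the prime q (n a power of 2)."""
    assert (q - 1) % (2 * n) == 0
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # psi^n = -1, so psi has order exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity")

def ntt(a, psi, q):
    """Evaluate a(x) at the n roots psi^(2i+1) of x^n + 1 (naive O(n^2))."""
    n = len(a)
    pw = [pow(psi, k, q) for k in range(2 * n)]
    return [sum(aj * pw[(2 * i + 1) * j % (2 * n)] for j, aj in enumerate(a)) % q
            for i in range(n)]

def intt(A, psi, q):
    """Invert ntt: a_j = n^-1 * psi^-j * sum_i A_i * psi^(-2ij)."""
    n = len(A)
    pw_inv = [pow(psi, -k, q) for k in range(2 * n)]  # modular inverse powers, Python 3.8+
    n_inv = pow(n, -1, q)
    return [n_inv * pw_inv[j] * sum(Ai * pw_inv[2 * i * j % (2 * n)]
                                    for i, Ai in enumerate(A)) % q
            for j in range(n)]

def muladd_ntt(a, b, c, q, psi):
    """MULADD via the transform: pointwise products, inverse transform, add c."""
    ab = intt([x * y % q for x, y in zip(ntt(a, psi, q), ntt(b, psi, q))], psi, q)
    return [(x + y) % q for x, y in zip(ab, c)]

n, q = 256, 7681  # 2n = 512 divides q - 1 = 7680
psi = find_psi(n, q)
rng = random.Random(0)
a, b, c = ([rng.randrange(q) for _ in range(n)] for _ in range(3))
print(muladd_ntt(a, b, c, q, psi) == muladd_schoolbook(a, b, c, q))  # True
```

The pointwise product is valid because the $\psi^{2i+1}$ are exactly the $n$ roots of $x^n + 1$, so evaluating there is unaffected by the reduction modulo $f(x)$.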
Ring arithmetic on RSA co-processors
Our approach: we construct a flexible MULADD gadget by reusing the RSA co-processor already present on current smart cards. We demonstrate it by implementing a variant of Kyber with competitive performance on the SLE 78 platform.
Kronecker substitution

Kronecker substitution is a classical technique in computational algebra for reducing polynomial arithmetic to large-integer arithmetic [VZGG13, p. 245][Har09]. The fundamental idea behind this technique is that univariate polynomial arithmetic and integer arithmetic are identical except for carry propagation in the latter. For example, evaluating at $x = 100$:

$a = x + 2$, so $A = a(100) = 100 + 2 = 102$
$b = 3x + 4$, so $B = b(100) = 3 \cdot 100 + 4 = 304$
$a \cdot b = 3x^2 + 10x + 8$, and $A \cdot B = 102 \cdot 304 = 31008 = 3 \cdot 100^2 + 10 \cdot 100 + 8$

Since every coefficient of $a \cdot b$ is below 100, no carries occur and the coefficients can be read off directly as base-100 digits of $A \cdot B$.
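The sketch below reproduces the base-100 example and then applies the same idea to MULADD: pack the coefficients of $a$, $b$ and $c$ into integers at $x = 2^\ell$, perform one integer multiplication and one addition, and unpack. Python's arbitrary-precision integers stand in for the co-processor's $\mathbb{Z}_N$ adder and multiplier (an assumption: real hardware computes modulo $N$, so the packed result must stay below $N$). Reduction modulo $x^n + 1$ and $q$ is done afterwards in software, inputs are assumed reduced into $[0, q)$, and all names are illustrative rather than the authors' implementation.

```python
def kronecker_pack(coeffs, base):
    """Evaluate the polynomial sum_i coeffs[i] * x^i at x = base (Horner)."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * base + c
    return acc

def kronecker_unpack(value, base, count):
    """Read off `count` base-`base` digits, least significant first."""
    digits = []
    for _ in range(count):
        value, digit = divmod(value, base)
        digits.append(digit)
    return digits

# The slide's example: a = x + 2, b = 3x + 4 evaluated at x = 100.
A, B = kronecker_pack([2, 1], 100), kronecker_pack([4, 3], 100)
print(A, B, A * B, kronecker_unpack(A * B, 100, 3))  # 102 304 31008 [8, 10, 3]

def muladd_kronecker(a, b, c, q):
    """a*b + c mod (q, x^n + 1) using one big-integer multiply and one add.

    Coefficients must lie in [0, q). The base 2^l is chosen so that no
    coefficient of a*b + c over Z reaches 2^l, i.e. no carries cross slots.
    """
    n = len(a)
    l = (n * (q - 1) ** 2 + (q - 1)).bit_length()
    base = 1 << l
    # Python ints stand in for the Z_N multiplier/adder (assumption: no reduction mod N)
    big = kronecker_pack(a, base) * kronecker_pack(b, base) + kronecker_pack(c, base)
    d = kronecker_unpack(big, base, 2 * n - 1)  # coefficients of a*b + c in Z[x]
    # reduce mod x^n + 1 (x^(n+i) = -x^i), then mod q
    return [(d[i] - (d[i + n] if i + n < 2 * n - 1 else 0)) % q for i in range(n)]
```

For $n = 256$ and $q = 7681$ the slot width is $\ell = 34$ bits, so each packed operand has $256 \cdot 34 = 8704$ bits and the product roughly $511 \cdot 34 = 17374$ bits, far beyond the 2200-bit multiplier of the SLE 78; bridging that gap is presumably where the splitting-rings and Karatsuba items from the overview come in.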