CHES 2020
Highly Efficient Architecture of NewHope-NIST on FPGA using Low-Complexity NTT/INTT
Neng Zhang, Bohan Yang, Chen Chen, Shouyi Yin, Shaojun Wei and Leibo Liu
Institute of Microelectronics, Tsinghua University, Beijing, China
Outline
1. Introduction
2. Low-Complexity NTT/INTT
3. Hardware Architecture
4. Implementation Results
1 Introduction
NewHope: a PQC algorithm for key encapsulation mechanism (KEM)
➢ NewHope-USENIX
➢ NewHope-Simple
➢ NewHope-NIST
A candidate in the 2nd round of the NIST PQC standardization process, but not in the 3rd round
Low-complexity NTT/INTT can be utilized by other algorithms
➢ PQC: Crystals-Dilithium, qTesla, Falcon
➢ FHE: LTV, BFV
1 Introduction
Main mathematical objects of NewHope: polynomials over the ring $R_q = \mathbb{Z}_q[x]/(x^N + 1)$
➢ $q$: 12289
➢ $\omega_N$: primitive $N$-th root of unity over $\mathbb{Z}_q$
➢ $\gamma_{2N}$: square root of $\omega_N$
➢ $N$: 1024 or 512
Encryption-based KEM
➢ Key Generation: 2 NTTs
➢ Encryption: 2 NTTs, 1 INTT
➢ Decryption: 1 INTT
1 Introduction
Multiplication over the ring $\mathbb{Z}_q[x]/f(x)$
➢ $f(x)$ is arbitrary
➢ Convolution theorem
➢ $q \equiv 1 \pmod{N}$
➢ $f(x) = x^N + 1$
➢ Negative Wrapped Convolution (NWC)
➢ $q \equiv 1 \pmod{2N}$
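As an illustration of the negative wrapped convolution, the Python sketch below multiplies two polynomials modulo $x^N + 1$ and $q = 12289$ in two ways: by the schoolbook rule, and through pre-processing, NTT, pointwise product, INTT and post-processing. The function names, the naive $O(N^2)$ transform and the brute-force root search are assumptions made for illustration, not the authors' implementation. It needs Python 3.8 or later for `pow(x, -1, q)`.

```python
Q = 12289  # NewHope modulus; Q - 1 = 2**12 * 3, so Q = 1 (mod 2N) for N <= 2048


def primitive_2n_root(n, q=Q):
    """Search for gamma with gamma**n == -1 (mod q); n must be a power of two."""
    for g in range(2, q):
        cand = pow(g, (q - 1) // (2 * n), q)
        if pow(cand, n, q) == q - 1:
            return cand
    raise ValueError("no primitive 2n-th root of unity found")


def schoolbook_negacyclic(a, b, q=Q):
    """Reference product of a and b modulo x**n + 1, using x**n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


def nwc_multiply(a, b, q=Q):
    """NWC product with explicit pre- and post-processing (naive transforms)."""
    n = len(a)
    gamma = primitive_2n_root(n, q)
    omega = gamma * gamma % q

    def dft(x, w):
        # Naive O(n^2) number-theoretic transform with root w.
        return [sum(x[j] * pow(w, i * j, q) for j in range(n)) % q
                for i in range(n)]

    # Pre-processing: scale coefficient j by gamma**j.
    a_pre = [a[j] * pow(gamma, j, q) % q for j in range(n)]
    b_pre = [b[j] * pow(gamma, j, q) % q for j in range(n)]
    prod = [x * y % q for x, y in zip(dft(a_pre, omega), dft(b_pre, omega))]
    c = dft(prod, pow(omega, -1, q))
    # Post-processing: scale coefficient j by n**-1 * gamma**-j.
    n_inv, gamma_inv = pow(n, -1, q), pow(gamma, -1, q)
    return [c[j] * n_inv % q * pow(gamma_inv, j, q) % q for j in range(n)]
```

The two explicit scaling loops in `nwc_multiply` are the pre-processing and post-processing whose cost the low-complexity NTT/INTT aims to remove.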
1 Introduction
Why do we need low complexity?
➢ Low complexity leads to both low area and high speed
2.1 Low-Complexity NTT
Number of modular multiplications of NTT: $(N/2)\log N + N$
➢ $(N/2)\log N$ for the FFT, $N$ for the pre-processing
➢ Cost of the pre-processing is considerable
Low-Complexity NTT
➢ A low-complexity NTT with twiddle factors computed on-the-fly [1].
➢ Merge the pre-processing into the DIT FFT with twiddle factors pre-computed.
[1] S. Roy, et al., Compact ring-LWE cryptoprocessor. CHES 2014
2.1 Low-Complexity NTT
Derivation of the low-complexity NTT
➢ Inspired by the strategy of the Cooley-Tukey FFT
➢ Follow the divide-and-conquer method of FFT that divides in the time domain (DIT)
➢ First, the pre-processing and the FFT are written together as a summation of N items
➢ Second, the summation is split into two groups according to the parity of the index of a
2.1 Low-Complexity NTT
Derivation of the low-complexity NTT
➢ Third, the equation is grouped into two parts according to the size of index i. $\hat{a}_i^{(0)}$ and $\hat{a}_i^{(1)}$ are N/2-point NTTs of $a_{2j}$ and $a_{2j+1}$
➢ In this way, an N-point NTT can be resolved with two N/2-point NTTs
Figure: recursive decomposition of an N-point NTT into two N/2-point NTTs, four N/4-point NTTs, and so on down to 2-point NTTs
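The three derivation steps can be sketched as a recursive routine. The slides' equations did not survive extraction, so the recurrence used here is a reconstruction from the standard NWC definition $\hat{a}_i = \sum_j a_j \gamma_{2N}^{(2i+1)j}$: for $i < N/2$, $\hat{a}_i = \hat{a}_i^{(0)} + \gamma_{2N}^{2i+1}\hat{a}_i^{(1)}$ and $\hat{a}_{i+N/2} = \hat{a}_i^{(0)} - \gamma_{2N}^{2i+1}\hat{a}_i^{(1)}$. The names `lc_ntt`, `naive_ntt` and `primitive_2n_root` are hypothetical, and the twiddle factors are recomputed with `pow` rather than pre-computed as on the slides.

```python
Q = 12289  # NewHope modulus


def primitive_2n_root(n, q=Q):
    """Search for gamma with gamma**n == -1 (mod q); n must be a power of two."""
    for g in range(2, q):
        cand = pow(g, (q - 1) // (2 * n), q)
        if pow(cand, n, q) == q - 1:
            return cand
    raise ValueError("no primitive 2n-th root of unity found")


def naive_ntt(a, gamma, q=Q):
    """Reference: pre-processing and transform written as one summation."""
    n = len(a)
    return [sum(a[j] * pow(gamma, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]


def lc_ntt(a, gamma, q=Q):
    """Recursive DIT NTT with the gamma**j pre-processing merged into the twiddles.

    gamma must be a primitive 2n-th root of unity for n = len(a).
    """
    n = len(a)
    if n == 1:
        return list(a)
    gamma_sq = gamma * gamma % q          # root for the two half-size NTTs
    even = lc_ntt(a[0::2], gamma_sq, q)   # NTT of a_{2j}
    odd = lc_ntt(a[1::2], gamma_sq, q)    # NTT of a_{2j+1}
    half = n // 2
    out = [0] * n
    for i in range(half):
        # One multiplication per butterfly; no separate pre-processing pass.
        t = pow(gamma, 2 * i + 1, q) * odd[i] % q
        out[i] = (even[i] + t) % q
        out[i + half] = (even[i] - t) % q
    return out
```

Each recursion level performs N/2 multiplications, and the input is never scaled by powers of $\gamma_{2N}$ beforehand, which is the point of merging the pre-processing.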
2.1 Low-Complexity NTT
Figure: butterfly of the low-complexity NTT
Figure: dataflow of an 8-point low-complexity NTT
2.1 Low-Complexity NTT
In classic FFT: $\omega = \omega_N^{jN/m}$
Computational complexity: $(N/2)\log N + N \rightarrow (N/2)\log N$
No additional timing cost; no additional hardware resource cost
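To make the saving concrete, the short sketch below evaluates both multiplication counts for the two NewHope sizes, N = 512 and N = 1024. The helper names are hypothetical; the formulas are the ones stated on the slide.

```python
def fft_mults(n):
    """(N/2) log2 N modular multiplications; n must be a power of two."""
    return (n // 2) * (n.bit_length() - 1)


def classic_ntt_mults(n):
    """FFT multiplications plus N for the separate pre-processing pass."""
    return fft_mults(n) + n


for n in (512, 1024):
    before, after = classic_ntt_mults(n), fft_mults(n)
    print(f"N={n}: {before} -> {after} ({before - after} multiplications saved)")
# prints:
# N=512: 2816 -> 2304 (512 multiplications saved)
# N=1024: 6144 -> 5120 (1024 multiplications saved)
```

For N = 1024 the pre-processing accounts for 1024 of 6144 multiplications, one sixth of the total.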
2.2 Low-Complexity INTT
Cost of the post-processing is greater than that of the pre-processing
Number of modular multiplications of NTT and INTT: $(N/2)\log N + 2N$
➢ $(N/2)\log N$ for the FFT, $2N$ for the post-processing
Low-Complexity INTT
➢ [1] merges the scaling of $\gamma_{2N}^{-i}$ into the FFT.
➢ Further merge the scaling of $N^{-1}$ into the FFT
[1] T. Pöppelmann, et al., High-performance ideal lattice-based cryptography on 8-bit ATxmega microcontrollers. LATINCRYPT 2015
2.2 Low-Complexity INTT
Derivation of the low-complexity INTT
➢ Inspired by the strategy of the Gentleman-Sande FFT
➢ Follow the divide-and-conquer method of FFT that divides in the frequency domain (DIF)
➢ First, the post-processing and the FFT are written together as a summation of N items
➢ Second, the summation is split into two groups according to the size of the index of $\hat{a}$
2.2 Low-Complexity INTT
Derivation of the low-complexity INTT
➢ Third, the equation is grouped into two parts according to the parity of i. $a_{2i}$ and $a_{2i+1}$ correspond to N/2-point INTTs of $b_i^{(0)}$ and $b_i^{(1)}$
➢ In this way, an N-point INTT can be resolved with two N/2-point INTTs
Figure: recursive decomposition of an N-point INTT into two N/2-point INTTs, four N/4-point INTTs, and so on down to 2-point INTTs
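The INTT derivation can be sketched the same way. The slide equations are missing, so this is a reconstruction from the standard definition $a_j = N^{-1}\sum_i \hat{a}_i \gamma_{2N}^{-(2i+1)j}$. Pairing $\hat{a}_i$ with $\hat{a}_{i+N/2}$ gives the half-size inputs $b_i^{(0)} = (\hat{a}_i + \hat{a}_{i+N/2})/2$ and $b_i^{(1)} = (\hat{a}_i - \hat{a}_{i+N/2})\,\gamma_{2N}^{-(2i+1)}/2$. The function names are hypothetical, and the halving is done here by multiplying with $2^{-1} \bmod q$ for clarity. Python 3.8 or later is assumed for `pow(x, -1, q)`.

```python
Q = 12289  # NewHope modulus


def primitive_2n_root(n, q=Q):
    """Search for gamma with gamma**n == -1 (mod q); n must be a power of two."""
    for g in range(2, q):
        cand = pow(g, (q - 1) // (2 * n), q)
        if pow(cand, n, q) == q - 1:
            return cand
    raise ValueError("no primitive 2n-th root of unity found")


def naive_ntt(a, gamma, q=Q):
    """Forward NWC transform as one summation, used only to build test input."""
    n = len(a)
    return [sum(a[j] * pow(gamma, (2 * i + 1) * j, q) for j in range(n)) % q
            for i in range(n)]


def lc_intt(a_hat, gamma_inv, q=Q):
    """Recursive DIF INTT with gamma**-j and N**-1 scaling merged in.

    gamma_inv is the inverse of a primitive 2n-th root, n = len(a_hat).
    """
    n = len(a_hat)
    if n == 1:
        return list(a_hat)
    half = n // 2
    inv2 = pow(2, -1, q)
    b0, b1 = [0] * half, [0] * half
    for i in range(half):
        u, t = a_hat[i], a_hat[i + half]
        b0[i] = (u + t) * inv2 % q
        b1[i] = (u - t) * inv2 % q * pow(gamma_inv, 2 * i + 1, q) % q
    gamma_inv_sq = gamma_inv * gamma_inv % q
    out = [0] * n
    out[0::2] = lc_intt(b0, gamma_inv_sq, q)  # even-indexed coefficients
    out[1::2] = lc_intt(b1, gamma_inv_sq, q)  # odd-indexed coefficients
    return out
```

Halving at each of the log N levels accumulates to the overall factor $N^{-1}$, so no separate scaling pass is needed after the transform.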
2.2 Low-Complexity INTT
Figure: butterfly of the low-complexity INTT
Figure: dataflow of an 8-point low-complexity INTT
2.2 Low-Complexity INTT
In classic FFT: $\omega = \omega_N^{-jN/m}$
Butterfly: $u + t$, $u - t$
Computational complexity: $(N/2)\log N + 2N \rightarrow (N/2)\log N$
No additional timing cost; slightly modify the butterfly unit
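One way a butterfly can absorb the $N^{-1}$ scaling without extra multiplications is to halve both outputs with a shift and a conditional add, which works for any odd modulus. The sketch below shows that idea. The slide says only that the butterfly unit is slightly modified, so the halving trick and the names `half_mod` and `intt_butterfly` are assumptions rather than a description of the authors' circuit.

```python
Q = 12289  # NewHope modulus (odd)


def half_mod(x, q=Q):
    """Return x / 2 mod q for 0 <= x < q, using a shift and a conditional add.

    Even x: x >> 1. Odd x: (x >> 1) + (q + 1) / 2, since x + q is even.
    """
    return (x >> 1) + (x & 1) * ((q + 1) >> 1)


def intt_butterfly(u, t, w, q=Q):
    """Return ((u + t) / 2, (u - t) * w / 2) mod q with one multiplication."""
    s = half_mod((u + t) % q, q)
    d = half_mod((u - t) % q, q) * w % q
    return s, d


# Example: half_mod(3) is 6146, and 2 * 6146 = 12292 = 3 (mod 12289).
```

Compared with a plain Gentleman-Sande butterfly, only the two halving steps are new, and neither uses a multiplier.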
3 The Hardware Architecture
The architecture of NTT/INTT
Multi-bank memory
➢ Address generator [1]
➢ Log N: even √, odd ╳
➢ The execution order of the last s-loop is rearranged
[1] W. Wang, et al., VLSI design of a large number multiplier for fully homomorphic encryption. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 22(9):1879-1887, Sept 2014.
3 The Hardware Architecture
Compact Butterfly Unit
3 The Hardware Architecture
Low-Complexity Modular Multiplication
No additional multiplication; constant-time
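The slide does not show the reduction circuit, so the following is only a sketch of one multiplier-free reduction that fits the stated properties. It assumes the identity $q = 12289 = 2^{14} - 2^{12} + 1$, hence $2^{14} \equiv 2^{12} - 1 \pmod{q}$, and folds the high bits back with shifts and subtractions. The names `fold` and `reduce_12289` and the fixed count of eight folds are choices made for this example.

```python
Q = 12289  # = 2**14 - 2**12 + 1


def fold(x):
    """One folding step: replace hi * 2**14 by hi * (2**12 - 1), no multiplier."""
    hi, lo = x >> 14, x & 0x3FFF
    return (hi << 12) - hi + lo


def reduce_12289(x):
    """Reduce 0 <= x < 2**28 modulo Q with shifts, adds and subtracts only.

    The same eight folds run for every input; after them x < 2 * Q, so one
    conditional subtraction finishes the reduction.
    """
    for _ in range(8):
        x = fold(x)
    return x - Q if x >= Q else x


def mod_mul(a, b):
    """Product of two values below Q, reduced without a division."""
    return reduce_12289(a * b)
```

Running the same number of folds for every input mirrors the constant-time property named on the slide, although Python itself gives no timing guarantee. A hardware version would typically use far fewer, wider steps.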
3 The Hardware Architecture
The architecture of NewHope-NIST
➢ Support: key generation, encryption and decryption
➢ Doubled bandwidth matching
➢ RAM (R0, R1): two data items per address
3 The Hardware Architecture
Timing hiding
➢ Resource conflict
➢ Data dependency
A RAM may be read and written by operations in the same line.
4 Implementation Results
Implementation platform
➢ Xilinx Artix-7 FPGA
➢ Vivado 2019.1.1
Implementation Results of NTT/INTT
Figure: bar charts comparing Ours, [FS19], [KLC+7], [JGCS19], [FSM+19] and [BUC19b] in Time (us), ATP (LUT x ms), ATP (FF x ms), ATP (DSP x us) and ATP (BRAM x us)
4 Implementation Results
Implementation Results of NewHope-NIST
Figure: bar charts comparing Ours, [FSM+19], [JGCS19-1], [JGCS19-2] and [BUC19b] in Time (us) for KeyGen+Decrypt and Encrypt, in LUTs, FFs, DSPs and BRAMs, and in ATP (LUT x ms), ATP (FF x ms), ATP (DSP x us) and ATP (BRAM x us)
Conclusion
Low-complexity NTT/INTT
➢ NTT: no pre-processing
➢ INTT: no post-processing
A highly efficient architecture of NewHope-NIST
➢ A clear advantage in both speed and ATP
Low-complexity NTT/INTT can benefit other NTT-based algorithms
Thanks!