SW/HW Codesign of the Post-Quantum Cryptography Algorithm NTRUEncrypt Using HLS and RTL Design Methodologies
Farnoud Farahmand, Duc Tri Nguyen, Viet B. Dang*, Ahmed Ferozpuri and Kris Gaj
George Mason University
Post-Quantum Cryptography (PQC)
• Ongoing NIST PQC standardization process
• Total of 69 submissions in Round 1 and 26 submissions qualified for Round 2
• Challenges:
  - Mathematical complexity
  - Large amount of man-power
  - New types of basic operations
  - Constant-time implementations
  - Need for new SCA (Side-Channel Attack) countermeasures against power and electromagnetic analysis
Risks of Early Hardware Implementations
• GMU implementation of DAGS developed in Fall 2017-Spring 2018
• Preliminary results presented at the Code-Based Cryptography (CBC) workshop in April 2018
• Attack against DAGS announced on May 16, 2018
• DAGS did not qualify for Round 2
Software/Hardware Codesign
• Software
• RTL or HLS-generated hardware: most time-critical operation
SW/HW Codesign for PQC: Advantages
• Focus on a few (typically 1-3) major operations, known to be easily parallelizable
  - much shorter development time (at least by a factor of 10)
  - guaranteed substantial speed-up
• Insight regarding the performance of future instruction set extensions of modern microprocessors
• Possibility of implementing multiple candidates by the same research group, eliminating the influence of:
  - different design skills
  - operation subset (e.g., including or excluding key generation)
  - interface & protocol optimization
  - target platform
Two Major Types of Platforms
FPGA Fabric & Hard-core Processors. Examples:
• Xilinx Zynq 7000 System on Chip (SoC)
• Xilinx Zynq UltraScale+ MPSoC
• Intel Arria 10 SoC FPGAs
• Intel Stratix 10 SoC FPGAs
FPGA Fabric, including Soft-core Processors (soft-core processor w/ memory & I/O implemented in the FPGA fabric). Examples:
• Xilinx Virtex UltraScale+ FPGAs and Intel Stratix 10 FPGAs, including soft-core processors:
  - Xilinx MicroBlaze
  - Intel Nios II
  - RISC-V, originally from UC Berkeley
Selected Platform
• FPGA Family: Xilinx Zynq UltraScale+ MPSoC
• Device: XCZU9EG-2FFVB1156E
• Prototyping Board: ZCU102 Evaluation Kit from Xilinx
• Processing System: quad-core ARM Cortex-A53 Application Processing Unit, running at 1.2 GHz (only one core used for benchmarking)
• Programmable Logic: Configurable Logic Blocks (CLBs), Block RAMs, DSP units
Experimental Setup
(Figure: block diagram with the Zynq Processing System, AXI Timer, AXI DMA, Input FIFO, Hardware Accelerator, Output FIFO, and Clocking Wizard, connected through AXI Lite, AXI Full, and AXI Stream interfaces and an IRQ line; the Clocking Wizard generates the accelerator clock UUT_clk alongside the Main Clock, and each FIFO has separate write (wr_clk) and read (rd_clk) clocks.)
Selected Algorithm
• NTRUEncrypt is one of the best-known PQC algorithms and has withstood cryptanalysis.
• The speed of NTRUEncrypt in software, especially on embedded platforms, is limited by the long execution time of polynomial multiplication.
• We implement two variants of the NIST Round 1 PQC candidate NTRUEncrypt, ntru-pke-443 and ntru-pke-743, in bare-metal mode.
• Polynomial multiplication is implemented in the Programmable Logic (PL) of Zynq using two approaches: RTL and HLS.
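To make the bottleneck concrete, the sketch below shows grade-school (schoolbook) multiplication in the ring Z_q[x]/(x^n - 1) in plain C. It is an illustration under stated assumptions, not the reference code of the NTRUEncrypt submission: the modulus q = 2048 and the function name poly_mul_schoolbook are assumptions made here. Its n^2 multiply-accumulate operations (about 550,000 for n = 743) show why this step limits software speed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: q = 2048 (2^11). Because q is a power of two,
   reduction mod q is a simple bit mask. */
#define NTRU_Q 2048u
#define Q_MASK (NTRU_Q - 1u)

/* Schoolbook (grade-school) multiplication in Z_q[x]/(x^n - 1):
   c[k] = sum of a[i]*b[j] over all i + j == k (mod n), reduced mod q.
   a, b, c each hold n coefficients; c must not alias a or b.
   Cost: n*n multiply-accumulate operations. */
void poly_mul_schoolbook(uint16_t *c, const uint16_t *a, const uint16_t *b, int n)
{
    for (int k = 0; k < n; k++)
        c[k] = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int k = (i + j) % n;                 /* x^n wraps around to x^0 */
            uint32_t t = (uint32_t)c[k] + (uint32_t)a[i] * b[j];
            c[k] = (uint16_t)(t & Q_MASK);       /* reduce mod q */
        }
    }
}
```

For example, with n = 3, multiplying 1 + 2x + 3x^2 by 4 + 5x + 6x^2 gives 31 + 31x + 28x^2, because the x^3 and x^4 terms of the ordinary product wrap around to x^0 and x^1.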
Accelerator Design
Target: Minimum Execution Time
Register-Transfer Level (RTL) methodology with VHDL
• Block diagram of the Datapath and Algorithmic State Machine (ASM) chart of the Controller
High-Level Synthesis (HLS) methodology with C
• Goal: the same or a comparable number of clock cycles as in the manual RTL implementation in VHDL
• Attempt 1: reference implementation based on the grade-school algorithm for multiplication (a.k.a. schoolbook, paper-and-pencil, etc.)
• Attempt 2: optimized implementation based on rotation
• Multiple attempts at optimization using Vivado HLS directives (pragmas) and minor code changes
• Outcome 1: tens of thousands of clock cycles, compared to the expected n = 743 clock cycles
• Solution: rewriting the C code so that it matches the block diagram used to develop the VHDL code
• Outcome 2: expected functionality, with an execution time of around n clock cycles
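One way to express a rotation-based multiplier in C, in the spirit of Attempt 2 and of a datapath with n parallel multiply-accumulate units, is sketched below. It relies on the identity a*b = sum_i b_i (x^i a) in Z_q[x]/(x^n - 1), where x^i a is simply a cyclic rotation of a. This is a hypothetical illustration, not the authors' HLS code: the function name poly_mul_rotation, the bound MAX_N, and the modulus q = 2048 are assumptions, and HLS directives are only mentioned in comments rather than given as a tested configuration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NTRU_Q 2048u          /* assumed modulus q = 2^11 */
#define Q_MASK (NTRU_Q - 1u)
#define MAX_N 743             /* largest n supported by this sketch (ntru-pke-743) */

/* Rotation-based multiplication in Z_q[x]/(x^n - 1).
   Uses a*b = sum_i b[i] * (x^i * a), where x^i * a is a cyclic rotation of a.
   Each outer iteration models one clock cycle of a datapath with n parallel
   multiply-accumulate units, i.e. n iterations instead of n*n steps.
   c must not alias a or b. */
void poly_mul_rotation(uint16_t *c, const uint16_t *a, const uint16_t *b, int n)
{
    uint16_t rot[MAX_N];                    /* register holding x^i * a */
    assert(n > 0 && n <= MAX_N);
    memcpy(rot, a, (size_t)n * sizeof rot[0]);
    for (int k = 0; k < n; k++)
        c[k] = 0;

    for (int i = 0; i < n; i++) {           /* one "clock cycle" per i */
        /* In an HLS flow, this inner loop is the one to unroll fully
           (together with complete partitioning of rot and c) so that all
           n multiply-accumulates happen in the same cycle. */
        for (int k = 0; k < n; k++) {
            uint32_t t = (uint32_t)c[k] + (uint32_t)rot[k] * b[i];
            c[k] = (uint16_t)(t & Q_MASK);
        }
        /* Multiply rot by x: shift every coefficient up one position; the
           top coefficient wraps around to position 0 because x^n = 1. */
        uint16_t top = rot[n - 1];
        memmove(&rot[1], &rot[0], (size_t)(n - 1) * sizeof rot[0]);
        rot[0] = top;
    }
}
```

Executed sequentially on a CPU this still performs n^2 operations; the point of the formulation is that, once the inner loop maps to n parallel hardware units and the rotation to a circular shift register, the outer loop takes about n cycles, which is the Outcome 2 target.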
Speed-up Achieved for Polynomial Multiplication
(Figure: ENC and DEC speed-ups of the RTL and HLS implementations for ntru-pke-443 and ntru-pke-743; reported values range from 76.1 to 128.5.)
Total Speed-up Achieved for Entire ENC/DEC
(Figure: total ENC and DEC speed-ups of the RTL and HLS implementations for ntru-pke-443 and ntru-pke-743; reported values range from 2.3 to 6.8.)
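The gap between the polynomial-multiplication speed-up (76.1 to 128.5) and the total ENC/DEC speed-up (2.3 to 6.8) is consistent with Amdahl's law: only part of the computation is accelerated, and the rest still runs in software. The sketch below computes the overall speed-up for an assumed fraction f of software time spent in the accelerated operation, plus the inverse relation. The function names are hypothetical, and the model ignores data-transfer overhead between the processor and the accelerator, so it is only a rough approximation of the measured setup.

```c
#include <assert.h>

/* Amdahl's law: if a fraction f of the original software execution time is
   spent in the accelerated operation, and that operation becomes s times
   faster, the overall speed-up is 1 / ((1 - f) + f / s).
   Processor-accelerator data-transfer overhead is NOT modeled. */
double amdahl_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

/* Inverse relation: the fraction f implied by an observed overall
   speed-up t for an operation speed-up s (requires s != 1). */
double implied_fraction(double t, double s)
{
    return (1.0 - 1.0 / t) / (1.0 - 1.0 / s);
}
```

For instance, with hypothetical values s = 100 and t = 4, implied_fraction returns about 0.76, meaning roughly three quarters of the original software time would have to be spent in polynomial multiplication under this simplified model.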
Resource Utilization
(Figure: LUTs, FFs, Slices, and BRAMs used by the RTL and HLS implementations for ntru-pke-443 and ntru-pke-743; each implementation uses 1 BRAM, and the other reported values range from 7,802 to 95,329.)
Q&A
Thank You!
Questions? Comments? Suggestions?
CERG: http://cryptography.gmu.edu
ATHENa: http://cryptography.gmu.edu/athena