Part I: Introduction to Post Quantum Cryptography
Tutorial@CHES 2017 - Taipei, Tim Güneysu, Ruhr-Universität Bochum & DFKI, 04.10.2017

Overview • Goals: Provide a high-level introduction to Post-Quantum Cryptography (PQC)


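  1. Learning with Errors: Solving a system of linear equations over $\mathbb{Z}_{13}$ with $A \in \mathbb{Z}_{13}^{7 \times 4}$, secret $s \in \mathbb{Z}_{13}^{4 \times 1}$ and $b \in \mathbb{Z}_{13}^{7 \times 1}$:
     $$\begin{pmatrix} 4 & 1 & 11 & 10 \\ 5 & 5 & 9 & 5 \\ 3 & 9 & 0 & 10 \\ 1 & 3 & 3 & 2 \\ 12 & 7 & 3 & 4 \\ 6 & 5 & 11 & 4 \\ 3 & 3 & 5 & 0 \end{pmatrix} \times \begin{pmatrix} 6 \\ 9 \\ 11 \\ 11 \end{pmatrix} = \begin{pmatrix} 4 \\ 8 \\ 1 \\ 10 \\ 4 \\ 12 \\ 9 \end{pmatrix}$$
     Blue ($A$, $b$) is given; find (learn) red (the secret $s$) → solve the linear system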

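  2. Learning with Errors: Solving a system of linear equations with small noise over $\mathbb{Z}_{13}$: $A s + e = b$ with random $A \in \mathbb{Z}_{13}^{7 \times 4}$, secret $s \in \mathbb{Z}_{13}^{4 \times 1}$, small noise $e \in \mathbb{Z}_{13}^{7 \times 1}$ and a result $b \in \mathbb{Z}_{13}^{7 \times 1}$ that looks random:
     $$\begin{pmatrix} 4 & 1 & 11 & 10 \\ 5 & 5 & 9 & 5 \\ 3 & 9 & 0 & 10 \\ 1 & 3 & 3 & 2 \\ 12 & 7 & 3 & 4 \\ 6 & 5 & 11 & 4 \\ 3 & 3 & 5 & 0 \end{pmatrix} \times \begin{pmatrix} 6 \\ 9 \\ 11 \\ 11 \end{pmatrix} + \begin{pmatrix} 0 \\ -1 \\ 1 \\ 1 \\ 1 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 4 \\ 7 \\ 2 \\ 11 \\ 5 \\ 12 \\ 8 \end{pmatrix}$$
     Blue ($A$, $b$) is given; find red ($s$) → Learning with Errors (LWE) problem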
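The LWE relation $b = As + e \bmod q$ from the two slides above can be sketched in a few lines of C. This is only an illustration with the toy dimensions of the slide ($7 \times 4$, $q = 13$); the names `lwe_sample` and `mod_q` are hypothetical and not taken from the tutorial.

```c
#define LWE_M 7   /* number of equations (rows of A) */
#define LWE_N 4   /* dimension of the secret (columns of A) */
#define LWE_Q 13  /* modulus */

/* Reduce a possibly negative value into the range [0, q). */
static int mod_q(int x, int q) {
    int r = x % q;
    return r < 0 ? r + q : r;
}

/* Compute b = A*s + e mod q, one row at a time. */
void lwe_sample(int A[LWE_M][LWE_N], const int s[LWE_N],
                const int e[LWE_M], int b[LWE_M]) {
    for (int i = 0; i < LWE_M; i++) {
        int acc = 0;
        for (int j = 0; j < LWE_N; j++) {
            acc += A[i][j] * s[j];
        }
        b[i] = mod_q(acc + e[i], LWE_Q);
    }
}
```

Setting all entries of `e` to zero reproduces the noiseless system of the first slide, which plain Gaussian elimination would solve.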

  3. Key Aspects of Lattice-based Systems • Encryption and signature systems are both feasible (and secure) – Significant ciphertext expansion for (R-)LWE encryption – Decryption error probability with (R-)LWE encryption • Random sampling not only from uniform but also from discrete Gaussian distributions (not a trivial task!) • Most operations are efficient and parallelizable – (Ideal lattices) Make use of FFT for polynomial multiplication – (Standard lattices) Matrix-vector arithmetic • Reasonably large public and private keys – Given for encryption/signature constructions – Unclear for advanced services such as functional encryption (e.g., FHE)

  4. Outline • Introduction • Classes of Post-Quantum Cryptography (PQC) – Code-Based Cryptography – Lattice-Based Cryptography – Hash-Based Cryptography • Lessons Learned

  5. Hash-based Cryptography: Lamport-Diffie One-Time Signatures (LD-OTS, 1979) • Definition: Given a security parameter $n$, the set of $n$-bit vectors $U_n = \{0,1\}^n$ and a one-way function $f: U_n \to U_n$ • Secret key: Generate $2n$ random $n$-bit vectors $X = (x_{0,0}, x_{0,1}, x_{1,0}, x_{1,1}, \ldots, x_{n-1,1})$ • Public key: Compute $Y = (y_{0,0}, \ldots, y_{n-1,1})$ with $y_{i,j} = f(x_{i,j})$ for all $i, j$ • Publish public key $Y$ [Diagram: every secret value $x_{i,0}, x_{i,1}$ of $X$ is mapped through the one-way function to the public value $y_{i,0}, y_{i,1}$ of $Y$]

  6. Hash-based Cryptography: Lamport-Diffie One-Time Signatures (LD-OTS, 1979) • Definition: Given a published public key $Y$ and an $n$-bit message $M = (m_0, \ldots, m_{n-1})$ to sign • Sign: Generate signature $\sigma = (x_{0,m_0}, \ldots, x_{n-1,m_{n-1}})$ by revealing the secret values $x_{i,m_i}$ selected by the message bits • Verify: Check that $f(\sigma_i) = y_{i,m_i}$ for all $i \in [0, n-1]$ [Diagram: each message bit $m_i$ selects one of $x_{i,0}, x_{i,1}$ for the signature $\sigma$; the verifier applies the one-way function to $\sigma_i$ and compares the result with the matching entry of $Y$]
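The key generation, signing and verification steps of LD-OTS can be sketched as follows. This is a toy illustration under loud assumptions: the "one-way function" `f_toy` is the SplitMix64 mixing step, which is a bijection and therefore not one-way at all, the security parameter is shrunk to 64 bits so that values fit into `uint64_t`, and the secret values are derived from a seed instead of a TRNG. All function names are hypothetical.

```c
#include <stdint.h>

#define LD_N 64  /* toy security parameter: message length in bits */

/* Placeholder for the one-way function f (NOT one-way, illustration only). */
static uint64_t f_toy(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Key generation: X[i][b] are the 2n secret values, Y[i][b] = f(X[i][b]). */
void ld_keygen(uint64_t seed, uint64_t X[LD_N][2], uint64_t Y[LD_N][2]) {
    for (int i = 0; i < LD_N; i++) {
        for (int b = 0; b < 2; b++) {
            X[i][b] = f_toy(seed + 2u * (uint64_t)i + (uint64_t)b);
            Y[i][b] = f_toy(X[i][b]);
        }
    }
}

/* Sign: reveal x_{i, m_i} for every message bit m_i. */
void ld_sign(uint64_t msg, uint64_t X[LD_N][2], uint64_t sig[LD_N]) {
    for (int i = 0; i < LD_N; i++) {
        sig[i] = X[i][(msg >> i) & 1u];
    }
}

/* Verify: accept (return 1) iff f(sig_i) == y_{i, m_i} for all i. */
int ld_verify(uint64_t msg, const uint64_t sig[LD_N], uint64_t Y[LD_N][2]) {
    for (int i = 0; i < LD_N; i++) {
        if (f_toy(sig[i]) != Y[i][(msg >> i) & 1u]) {
            return 0;
        }
    }
    return 1;
}
```

Because a signature reveals half of the secret values, a key pair must sign only one message; this is what the Merkle construction on the following slides addresses.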

  7. Extension for Multiple Use: Merkle's Signature Scheme • Idea by R. Merkle [1979]: reduces the validity of many OTS verification keys to a single verification key using a binary tree [Diagram: binary tree of height 3 with the public MSS key $PK = V_3[0]$ as root, inner nodes $V_2[0], V_2[1]$ and $V_1[0], \ldots, V_1[3]$, and leaves $V_0[j] = g(Y_j)$ computed from the public OTS keys $Y_0, \ldots, Y_7$] • Properties and Requirements – Max. signature count determined by height $H$ of tree (fixed at setup) – Needs to keep track of already used signatures in the tree → stateful signature scheme – Can be used with any one-time signature scheme and (collision-resistant) cryptographic hash function

  8. Merkle Signature Scheme: Principle • Let $g: \{0,1\}^* \to \{0,1\}^n$ be a hash function with security parameter $n$ • Fix height $H$ and generate $2^H$ LD-OTS key pairs $(X_j, Y_j)$ with $0 \le j < 2^H$ • Notation: $V_i[j]$ with $0 \le i \le H$ and $0 \le j < 2^{H-i}$ • Example: $H = 3$ with $PK = V_3[0]$ and leaves $V_0[j] = g(Y_j)$ for the key pairs $(X_0, Y_0), \ldots, (X_7, Y_7)$ • Computation rule for inner nodes: $V_i[j] = g(V_{i-1}[2j] \,\|\, V_{i-1}[2j+1])$ with $0 < i \le H$ and $0 \le j < 2^{H-i}$
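The computation rule for inner nodes translates into a short loop that folds the leaves level by level until only the root $V_H[0]$ remains. The sketch below assumes $H = 3$ as in the example and uses a toy 64-bit mixing function `g_toy` in place of the cryptographic hash $g$; both `g_toy` and `mss_root` are hypothetical names, and the toy function offers no security.

```c
#include <stdint.h>

#define MSS_H 3                    /* tree height H */
#define MSS_LEAVES (1u << MSS_H)   /* 2^H one-time key pairs */

/* Bijective 64-bit mixer (SplitMix64 step), used only as a stand-in. */
static uint64_t mix64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Toy stand-in for g(left || right); NOT a cryptographic hash. */
static uint64_t g_toy(uint64_t left, uint64_t right) {
    return mix64(mix64(left) + right);
}

/* Compute the root V_H[0] from the leaves V_0[j] = g(Y_j),
 * using V_i[j] = g(V_{i-1}[2j] || V_{i-1}[2j+1]). */
uint64_t mss_root(const uint64_t leaves[MSS_LEAVES]) {
    uint64_t level[MSS_LEAVES];
    for (unsigned j = 0; j < MSS_LEAVES; j++) {
        level[j] = leaves[j];
    }
    /* Each pass halves the number of nodes; writing to index j is safe
     * because the inputs 2j and 2j+1 are never smaller than j. */
    for (unsigned width = MSS_LEAVES; width > 1; width /= 2) {
        for (unsigned j = 0; j < width / 2; j++) {
            level[j] = g_toy(level[2 * j], level[2 * j + 1]);
        }
    }
    return level[0];
}
```

Only the root has to be published; changing any single leaf changes the root, which is what ties all $2^H$ one-time keys to one verification key.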

  9. Key Aspects of Hash-based Cryptographic Systems • Only signature schemes available, no encryption • Moderate requirements for implementations – Second-preimage-resistant hash function (older schemes: collision-resistant) – Pseudorandom functions for OTS (XMSS) • Hard limitation on the number of signatures per tree – Height of the tree determines max. number of signatures (issue with DoS attacks for real-world systems) – Requires a record of the signatures already used (critical in untrusted environments!) – Increasing tree height increases memory requirements and computational complexity

  10. Outline • Introduction • Classes of Post-Quantum Cryptography (PQC) – Code-Based Cryptography – Lattice-Based Cryptography – Hash-Based Cryptography • Lessons Learned

  11. Lessons Learned • Post-Quantum Cryptography essential for long-term security – Code-based encryption schemes are the most mature candidates – Digital signatures from hash-based cryptography offer high confidence with respect to security and are under standardization – Lattice-based cryptography has high potential and extremely high versatility • Next topics in this tutorial (selection due to time constraints) – Efficient implementation strategies for Code-Based Cryptosystems – Efficient implementation of Lattice-Based Cryptosystems

  12. Part I: Introduction to Post Quantum Cryptography Tutorial@CHES 2017 - Taipei Tim Güneysu Ruhr-Universität Bochum & DFKI 04.10.2017 Thank you! Questions?

  13. Part II: Hardware Architectures for Post Quantum Cryptography Tutorial@CHES 2017 - Taipei Tim Güneysu Ruhr-Universität Bochum & DFKI 04.10.2017 including slides by Ingo von Maurich and Thomas Pöppelmann

  14. Tutorial Outline – Part II • Code-based Cryptography • Efficient Code-based Implementations • Lattice-based Cryptography • Efficient Lattice-based Implementations • Lessons Learned

  15. Recall: McEliece Encryption Scheme [1978] • Key Generation: Given an $[n, k]$-code $C$ with generator matrix $G$ and error-correcting capability $t$ – Private key: $(S, G, P)$, where $S$ is a scrambling matrix and $P$ is a permutation matrix – Public key: $G' = S \cdot G \cdot P$ • Encryption: Message $m \in \mathbb{F}_2^k$, error vector $e \in_R \mathbb{F}_2^n$ with $\mathrm{wt}(e) \le t$ – $x \leftarrow mG' + e$ • Decryption: Let $\Psi_H$ be a $t$-error-correcting decoding algorithm – $m \cdot S \leftarrow \Psi_H(x \cdot P^{-1})$, which removes the error $e \cdot P^{-1}$ – Extract $m$ by computing $m \cdot S \cdot S^{-1}$

  16. Security Parameters (Binary Goppa Codes) • Original proposal: McEliece with binary Goppa codes → Code properties determine key size, matrices are often large • Code parameters revisited by Bernstein, Lange and Peters • Public key is a $k \cdot (n - k)$ bit matrix (redundant part only)

  17. Code-based Cryptography for Embedded Devices [Diagram: Encrypt computes $y = Mx + e$ from $x$ with the public key $K_{pub} = M$ (matrix); Decrypt recovers $x = \Psi(y, K_{priv})$ from $y$] • Selection of the employed code is a highly critical issue – Properties of code determine key size, short keys essential – Structures in codes reduce key size, but can enable attacks – Encoding is a fast operation on all platforms (matrix multiplication) – Decoding requires efficient techniques in terms of time and memory • Basic McEliece is only CPA-secure; conversion required • Protection against side-channel and fault-injection attacks

  18. Quasi-Cyclic Moderate-Density Parity-Check Codes (QC-MDPC) • $t$-error-correcting $(n, r, w)$-QC-MDPC code of length $n = n_0 r$ • Parity-check matrix $H$ consists of $n_0$ blocks with fixed row weight $w$ • Code/Key Generation: 1. Generate $n_0$ first rows of the parity-check matrix blocks $H_i$: $h_i \in_R \mathbb{F}_2^r$ of weight $w_i$, $w = \sum_{i=0}^{n_0-1} w_i$ 2. Obtain remaining rows by $r - 1$ quasi-cyclic shifts of $h_i$ 3. $H = [H_0 | H_1 | \ldots | H_{n_0-1}]$ 4. Generator matrix of systematic form $G = [I_k | Q]$ with
      $$Q = \begin{pmatrix} (H_{n_0-1}^{-1} \cdot H_0)^T \\ (H_{n_0-1}^{-1} \cdot H_1)^T \\ \vdots \\ (H_{n_0-1}^{-1} \cdot H_{n_0-2})^T \end{pmatrix}$$

  19. Background on QC-MDPC Codes [Figure: parity-check matrix $H = [H_0 | H_1]$ for $n_0 = 2$ and the corresponding generator matrix $G$ with identity part $I$]

  20. (QC-)MDPC McEliece • Encryption: Message $m \in \mathbb{F}_2^k$, error vector $e \in_R \mathbb{F}_2^n$ with $\mathrm{wt}(e) \le t$ – $x \leftarrow mG + e$ • Decryption: Let $\Psi_H$ be a $t$-error-correcting (QC-)MDPC decoding algorithm – $mG \leftarrow \Psi_H(mG + e)$ – Extract $m$ from the first $k$ positions • Parameters for 80-bit equivalent symmetric security [MTSB13]: $n_0 = 2$, $n = 9602$, $r = 4801$, $w = 90$, $t = 84$

  21. Tutorial Outline – Part II • Code-based Cryptography • Efficient Code-based Implementations • Lattice-based Cryptography • Efficient Lattice-based Implementations • Lessons Learned

  22. Hardware Implementation of Building Blocks for McEliece/Niederreiter • Two operations – Encryption/Encoding (message → codeword via $G$): matrix-vector multiplication (with large matrices, either stored or generated on-the-fly); TRNG for error generation – Decryption/Decoding (ciphertext → message): code-specific syndrome decoding, hard-decision decoding with simple (bitwise) operations preferred; inverse-matrix-vector multiplication

  23. Efficient Decoding of MDPC Codes • Decoders for LDPC/MDPC codes: bit flipping and belief propagation • "Bit-Flipping" Decoder: 1. Compute syndrome $s$ of the ciphertext 2. Count unsatisfied parity-check equations $\#_{upc}$ for each ciphertext bit 3. Flip ciphertext bits that violate $\ge b$ equations 4. Recompute syndrome 5. Repeat until $s = 0$ or reaching max. iterations (decoding failure) • How to determine threshold $b$? – Precompute $b_i$ for each iteration [Gal62] – $b = max_{upc}$ [HP03] – $b = max_{upc} - \delta$ [MTSB13]
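The five decoder steps can be sketched on a deliberately tiny code. The example below is not a QC-MDPC instance with the parameters above: it assumes a single $7 \times 7$ circulant parity-check block with first row $h(x) = 1 + x + x^3$, chosen only because single errors are then corrected in one iteration, and it uses the threshold rule $b = max_{upc}$. The names `bf_entry`, `bf_syndrome` and `bf_decode` are hypothetical.

```c
#include <stdint.h>

#define BF_R 7  /* toy block size: H is one 7x7 circulant */

/* First row of the toy parity-check matrix: h(x) = 1 + x + x^3. */
static const uint8_t bf_h[BF_R] = {1, 1, 0, 1, 0, 0, 0};

/* Entry (i, j) of the circulant: row i is the first row rotated right by i. */
static uint8_t bf_entry(int i, int j) {
    return bf_h[(j - i + BF_R) % BF_R];
}

/* Step 1: s = H * c^T over GF(2); returns the Hamming weight of s. */
static int bf_syndrome(const uint8_t c[BF_R], uint8_t s[BF_R]) {
    int weight = 0;
    for (int i = 0; i < BF_R; i++) {
        s[i] = 0;
        for (int j = 0; j < BF_R; j++) {
            s[i] ^= (uint8_t)(bf_entry(i, j) & c[j]);
        }
        weight += s[i];
    }
    return weight;
}

/* Bit-flipping decoding with threshold b = max_upc.
 * Returns 1 on success (s = 0) and 0 on decoding failure. */
int bf_decode(uint8_t c[BF_R], int max_iter) {
    uint8_t s[BF_R];
    int upc[BF_R];
    for (int it = 0; it < max_iter; it++) {
        if (bf_syndrome(c, s) == 0) {
            return 1;
        }
        /* Step 2: count unsatisfied parity checks per ciphertext bit. */
        int max_upc = 0;
        for (int j = 0; j < BF_R; j++) {
            upc[j] = 0;
            for (int i = 0; i < BF_R; i++) {
                upc[j] += bf_entry(i, j) & s[i];
            }
            if (upc[j] > max_upc) {
                max_upc = upc[j];
            }
        }
        /* Step 3: flip every bit that reaches the threshold. */
        for (int j = 0; j < BF_R; j++) {
            if (upc[j] >= max_upc) {
                c[j] ^= 1u;
            }
        }
    }
    return bf_syndrome(c, s) == 0;  /* steps 4 and 5: final check */
}
```

In this toy code an erroneous bit violates all three of its checks while every other bit violates exactly one, so the maximum rule isolates the error. For real MDPC parameters the margin is much smaller, which is why the threshold choice matters.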

  24. FPGA Low-Resource Encryption • Target: Xilinx Spartan-6 FPGA • Scheme: QC-MDPC encryption – Given first 4801-bit row $g$ of $G$ and message $m$, compute $x = mG + e$ • Storage requirements – One 18 kBit BRAM is sufficient to store message $m$, row $g$ and the redundant part (3x4801-bit vectors) – But only two data ports are available – Read out 32 bits of the message and store them in a separate register • Error addition – Instead of starting with an all-zero redundant part, we preload it with the second half of the error vector [Diagram: BRAM holding $m$, $G$ and the redundant part, connected to control logic with XOR and a register of 32 flip-flops]

  25. FPGA Low-Resource Decryption • QC-MDPC decryption – Secret key and ciphertext consist of two blocks – Iterative vs. parallel design – Decoding is a complex task → parallel processing • BRAM-based implementation: storage requirements – Secret key (2x4801 bit) – Ciphertext (2x4801 bit) – Syndrome (4801 bit) – In total 3 BRAMs due to memory and port access requirements

  26. FPGA Low-Resource Decryption • QC-MDPC decryption • Syndrome computation $s = Hx^T$ – Similar technique as for encoding • Compare $s = 0$? – Compute binary OR of all 32-bit blocks of the syndrome • Count $\#_{upc}$ – Hamming weight of syndrome AND $h_0/h_1$ (32 bits at a time) – Accumulate Hamming weight • Bit-flipping – If $\#_{upc} \ge b_i$, invert ciphertext bit(s) and XOR $h_0/h_1$ to the syndrome while rotating both

  27. Lightweight FPGA Results • Post-PAR for Xilinx Spartan-6 XC6SLX4 & Virtex-6 XC6VLX240T • Encryption takes 735,000 cycles • Decryption takes 4,274,000 cycles on average

  28. Lightweight FPGA Comparison • Realistic public key size (0.6 kByte vs. 50-100 kByte) • Smallest McEliece FPGA implementation • Sufficient performance for many applications

  29. Tutorial Outline – Part II • Code-based Cryptography • Efficient Code-based Implementations • Lattice-based Cryptography • Efficient Lattice-based Implementations • Lessons Learned

  30. Lattice-Based Cryptography • Recall: Benefits of Lattice-Based Cryptography – We can get signatures and public key encryption from lattices and also more advanced services (IBE, FHE) – A lot of development on theory side; schemes are improving – Implementation of lattice-based cryptography is a young field; only done for a few years (except maybe for NTRU)

  31. To be Ideal or not Ideal? • Two important lines of research: random lattices and ideal lattices – Major impact on implementation (theory not that much) – Security for random lattices is better understood (ideal lattices are more structured) • Random Lattices – Operations on large matrices (e.g., 532x840) – Mostly matrix-vector multiplication modulo $q < 2^{32}$ – Large public keys (e.g., 532x840 matrix) • Ideal Lattices – Operations on polynomials with 256 or 512 coefficients – Mostly polynomial multiplication modulo $q < 2^{32}$ – Public keys are one (or two) polynomials with 256 or 512 coefficients

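  32. Learning with Errors: Solving a system of linear equations over $\mathbb{Z}_{13}$ with $A \in \mathbb{Z}_{13}^{7 \times 4}$, secret $s \in \mathbb{Z}_{13}^{4 \times 1}$ and $b \in \mathbb{Z}_{13}^{7 \times 1}$:
      $$\begin{pmatrix} 4 & 1 & 11 & 10 \\ 5 & 5 & 9 & 5 \\ 3 & 9 & 0 & 10 \\ 1 & 3 & 3 & 2 \\ 12 & 7 & 3 & 4 \\ 6 & 5 & 11 & 4 \\ 3 & 3 & 5 & 0 \end{pmatrix} \times \begin{pmatrix} 6 \\ 9 \\ 11 \\ 11 \end{pmatrix} = \begin{pmatrix} 4 \\ 8 \\ 1 \\ 10 \\ 4 \\ 12 \\ 9 \end{pmatrix}$$
      Blue ($A$, $b$) is given; find (learn) red (the secret $s$) → solve the linear system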

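  33. Learning with Errors: Solving a system of linear equations with small noise over $\mathbb{Z}_{13}$: $A s + e = b$ with random $A \in \mathbb{Z}_{13}^{7 \times 4}$, secret $s \in \mathbb{Z}_{13}^{4 \times 1}$, small noise $e \in \mathbb{Z}_{13}^{7 \times 1}$ and a result $b \in \mathbb{Z}_{13}^{7 \times 1}$ that looks random:
      $$\begin{pmatrix} 4 & 1 & 11 & 10 \\ 5 & 5 & 9 & 5 \\ 3 & 9 & 0 & 10 \\ 1 & 3 & 3 & 2 \\ 12 & 7 & 3 & 4 \\ 6 & 5 & 11 & 4 \\ 3 & 3 & 5 & 0 \end{pmatrix} \times \begin{pmatrix} 6 \\ 9 \\ 11 \\ 11 \end{pmatrix} + \begin{pmatrix} 0 \\ -1 \\ 1 \\ 1 \\ 1 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 4 \\ 7 \\ 2 \\ 11 \\ 5 \\ 12 \\ 8 \end{pmatrix}$$
      Blue ($A$, $b$) is given; find red ($s$) → Learning with Errors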

  34. (Ring) Learning with Errors: From learning with errors to ring-learning with errors • Only one line has to be stored • Shift first line on every line • Use rule that we negate the entry in case of wrap around (e.g., $-10 \equiv 3 \bmod 13$) • Resulting $7 \times 4$ matrix over $\mathbb{Z}_{13}$:
      $$\begin{pmatrix} 4 & 1 & 11 & 10 \\ 3 & 4 & 1 & 11 \\ 2 & 3 & 4 & 1 \\ 12 & 2 & 3 & 4 \\ 9 & 12 & 2 & 3 \\ 10 & 9 & 12 & 2 \\ 11 & 10 & 9 & 12 \end{pmatrix}$$
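The shift-and-negate rule can be checked with a small routine that expands the stored first line into the full matrix. The sketch uses the slide's toy values ($q = 13$, first line 4 1 11 10, seven rows); the name `negacyclic_rows` is hypothetical.

```c
#define RL_N 4   /* length of the stored first line */
#define RL_M 7   /* number of rows shown on the slide */
#define RL_Q 13  /* modulus */

/* Row k is row k-1 shifted right by one position; the entry that
 * wraps around to the front is negated modulo q. */
void negacyclic_rows(const int first[RL_N], int rows[RL_M][RL_N]) {
    for (int j = 0; j < RL_N; j++) {
        rows[0][j] = first[j];
    }
    for (int k = 1; k < RL_M; k++) {
        rows[k][0] = (RL_Q - rows[k - 1][RL_N - 1]) % RL_Q;
        for (int j = 1; j < RL_N; j++) {
            rows[k][j] = rows[k - 1][j - 1];
        }
    }
}
```

Storing one line instead of the whole matrix is the source of the small key sizes of ring-based schemes mentioned earlier in the deck.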

  35. Ring Learning with Errors: Principle • Ideal lattices correspond to ideals in the ring $R = \mathbb{Z}_q[x]/(x^n + 1)$ • Ring Learning With Errors (RLWE) sample is: $t = as + e \in R$ for uniform $a \in R$ and small discrete Gaussian distributed $s, e \leftarrow D_\sigma$ – Search-RLWE: Find $s$ when given $t$ and $a$ – Decision-RLWE: Distinguish $t$ from uniform when given $t$ and $a$ [Diagram: random $a = (34, 23, \ldots, 23)$ times small (Gaussian) secret $s = (1, -2, \ldots, 0)$ plus small (Gaussian) error $e = (0, 1, \ldots, 0)$ gives random-looking $t = (32, 43, \ldots, 12)$]

  36. Example: Polynomial Addition in $R = \mathbb{Z}_q[x]/(x^n + 1)$ • Assume ring $R = \mathbb{Z}_q[x]/(x^n + 1)$ • Assume parameters $q = 5$ and $n = 4$ • $v = 4x^3 + 2x^2 + 0x^1 + 1 = (4, 2, 0, 1)$ • $k = 2x^3 + 1x^2 + 4x^1 + 0 = (2, 1, 4, 0)$ • $s = v + k = (4 + 2 \bmod 5,\ 2 + 1,\ 4,\ 1) = (1, 3, 4, 1)$

  37. Example: Polynomial Multiplication in $R = \mathbb{Z}_q[x]/(x^n + 1)$ • $s = (1, 3, 4, 1)$ • $k = (2, 1, 4, 0)$ • Task: $z = s * k = (3, 0, 2, 0)$
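Both examples can be reproduced with schoolbook arithmetic in $\mathbb{Z}_5[x]/(x^4 + 1)$. In the sketch below, array index `d` holds the coefficient of $x^d$, so the slide's vectors (written with the highest degree first) appear reversed: $v = (4,2,0,1)$ becomes `{1, 0, 2, 4}`. The function names are hypothetical, and inputs are assumed to lie in $[0, q)$.

```c
#define PM_N 4  /* n: number of coefficients */
#define PM_Q 5  /* q: coefficient modulus */

/* c = a + b, coefficient-wise modulo q. */
void poly_add(const int a[PM_N], const int b[PM_N], int c[PM_N]) {
    for (int d = 0; d < PM_N; d++) {
        c[d] = (a[d] + b[d]) % PM_Q;
    }
}

/* c = a * b in Z_q[x]/(x^n + 1), schoolbook method with n^2 products. */
void poly_mul_negacyclic(const int a[PM_N], const int b[PM_N], int c[PM_N]) {
    for (int d = 0; d < PM_N; d++) {
        c[d] = 0;
    }
    for (int i = 0; i < PM_N; i++) {
        for (int j = 0; j < PM_N; j++) {
            int prod = (a[i] * b[j]) % PM_Q;
            int d = (i + j) % PM_N;
            if (i + j >= PM_N) {
                c[d] = (c[d] - prod + PM_Q) % PM_Q;  /* x^n = -1 */
            } else {
                c[d] = (c[d] + prod) % PM_Q;
            }
        }
    }
}
```

With `s = {1, 4, 3, 1}` and `k = {0, 4, 1, 2}` the product is `{0, 2, 0, 3}`, i.e. $z = 3x^3 + 2x$, which matches $(3, 0, 2, 0)$ on the slide.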

  38. Discrete Gaussian Distribution • $D_\sigma$ is defined by assigning weight proportional to $\rho_\sigma(x) = \exp\left(\frac{-x^2}{2\sigma^2}\right)$ • Example in $R = \mathbb{Z}_{4093}[x]/(x^{256} + 1)$: uniform $a = (-1501, 1020, 502, \ldots, -1900, 572)$, Gaussian $e = (-1, 4, -8, \ldots, 0, 1)$ • Remark on arithmetic of x-distributed values: Uniform * Gaussian = Uniform; Gaussian * Gaussian = larger Gaussian

  39. Gaussian Sampling: Options • Cumulative Distribution Table (CDT) Sampling • Rejection Sampling • Bernoulli Sampling • Knuth-Yao Sampling • [DG14] Efficient sampling from discrete Gaussians for lattice-based cryptography on a constrained device, Dwarakanath and Galbraith, Applicable Algebra in Engineering, Communication and Computing, 2014 • [DDLL14] Lattice Signatures and Bimodal Gaussians, Léo Ducas, Alain Durmus, Tancrède Lepoint and Vadim Lyubashevsky, CRYPTO '13

  40. Ring-LWE Encryption Scheme [LP11/LPR10] • Gen: Choose $a \leftarrow R$ and $r_1, r_2 \leftarrow D_\sigma$; pk: $p = r_1 - a \cdot r_2 \in R$; sk: $r_2$ • Enc($a, p, m \in \{0,1\}^n$): $e_1, e_2, e_3 \leftarrow D_\sigma$; $\bar{m} = \mathrm{encode}(m)$; ciphertext: $[c_1 = a \cdot e_1 + e_2,\ c_2 = p \cdot e_1 + e_3 + \bar{m}]$ • Dec($c = [c_1, c_2], r_2$): Output $\mathrm{decode}(c_1 \cdot r_2 + c_2)$ • Correctness: $c_1 r_2 + c_2 = (a e_1 + e_2) r_2 + p e_1 + e_3 + \bar{m} = r_2 a e_1 + r_2 e_2 + r_1 e_1 - r_2 a e_1 + e_3 + \bar{m} = \bar{m} + r_2 e_2 + r_1 e_1 + e_3$, where $\bar{m}$ is large and $r_2 e_2 + r_1 e_1 + e_3$ is small

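  41. Ring-LWE Encryption: Parameters • $R = \mathbb{Z}_{4093}[x]/(x^{256} + 1)$, $n$-bit message mapped to $n$ coefficients • Error correction: – Encode($m$): Return $m \cdot \lfloor q/2 \rfloor$ – Decode($x$): If $q/4 < x < 3q/4$ return 1, else return 0 • Example: $m = (0, 1, \ldots, 1, 0)$ → $\bar{m} = \mathrm{encode}(m) = (0, 2046, \ldots, 2046, 0)$ → $\bar{m} + r_2 e_2 + r_1 e_1 + e_3 = (402, 1907, \ldots, 2631, 4024)$ → decoded $m = (0, 1, \ldots, 1, 0)$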
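The threshold encoding is small enough to state directly in C. The sketch below assumes $q = 4093$ as on the slide and works on single coefficients; the comparison is scaled by 4 to avoid fractions. The function names are hypothetical.

```c
#define ENC_Q 4093  /* modulus q from the slide */

/* Encode one message bit as 0 or floor(q/2). */
int rlwe_encode_bit(int bit) {
    return bit ? ENC_Q / 2 : 0;
}

/* Decode one noisy coefficient x in [0, q):
 * return 1 if q/4 < x < 3q/4, else 0. */
int rlwe_decode_coeff(int x) {
    return (4 * x > ENC_Q) && (4 * x < 3 * ENC_Q);
}
```

Applied to the noisy coefficients of the example (402, 1907, 2631, 4024) this yields 0, 1, 1, 0. Decoding fails only if the accumulated noise pushes a coefficient across one of the two thresholds, which is the decryption error probability mentioned in Part I.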

  42. Ring-LWE Encryption: Parameters (sizes in bits)

      | Parameter set | $n$ | $q$ | $\sigma$ | Public key | Secret key | Ciphertext ($c_1, c_2$) | Security |
      |---|---|---|---|---|---|---|---|
      | (256, 4093, 8.35) [LP11] | 256 | 4093 | ~4.5 | 6,144 | 1,792 | 6,144 | ~106 bits |
      | (256, 7681, 11.32) [GFSBH12] | 256 | 7681 | ~4.8 | 6,656 | 1,792 | 6,656 | ~106 bits |
      | (512, 12289, 12.18) [GFSBH12] | 512 | 12289 | ~4.9 | 14,336 | 3,584 | 14,336 | ~256 bits |

      • Message and ciphertext: – Message space: $n$ bits – Expansion $2 \cdot \log_2 q$ – Two large polynomials ($c_1, c_2$) • Public key: one or two large polynomials ($a, p$) • Secret key: small polynomial ($r_2$)

  43. Tutorial Outline – Part II • Code-based Cryptography • Efficient Code-based Implementations • Lattice-based Cryptography • Efficient Lattice-based Implementations • Lessons Learned

  44. Hardware Implementation: Building Blocks for R-LWE • Two main components – Polynomial multiplier for $n \in \{256, 512, 1024\}$ over specific rings with coefficients of $\log_2(q) < 24$ bits – Discrete Gaussian sampler with precisely defined parameter $\sigma$

  45. Hardware Implementation: Low-Cost Design for Xilinx Spartan-6 • Row-wise polynomial multiplication ($a e_1$ / $p e_1$) – Simple address generation – Sample coefficient of $e_1$, add row of $c_1$ then add row of $c_2$, add coefficient of $e_2$ and $e_3$ • Key and ciphertext are stored in block memory • DSP block for arithmetic ($q \times q$-bit multiplier) • Modular reduction (power of two possible)

  46. Hardware Implementation: Low Area • Post-place-and-route performance on a Spartan-6 LX9 FPGA • Area savings by power of two modulus – Usage of $q = 4096$ leads to area improvement and higher clock frequency – Performance is still very good – Area consumption is low, especially for decryption

  47. Ring-LWE: Can we do better? • Schoolbook polynomial multiplication is simple and independent of parameters • Performance is reasonable but can still be improved • Remember: according to schoolbook multiplication, we need $n^2$ multiplications modulo $q$ for one polynomial multiplication – $128^2 = 16384$ – $256^2 = 65536$ – $512^2 = 262144$ – $1024^2 = 1048576$ • Can we do better?

  48. Optimization: Polynomial Multiplication based on NTT • Include algorithmic tweaks for fast polynomial multiplication • The Number Theoretic Transform (NTT) is a discrete Fourier transform (DFT) defined over a finite field or ring. For a given primitive $n$-th root of unity $\omega$ the NTT is defined as: – Forward transformation (NTT): $A[i] = \sum_{j=0}^{n-1} a[j]\,\omega^{ij}$, $i = 0, 1, \ldots, n-1$ – Inverse transformation (INTT): $a[i] = n^{-1} \sum_{j=0}^{n-1} A[j]\,\omega^{-ij}$, $i = 0, 1, \ldots, n-1$ • NTT exists if $q$ is a prime, $n$ a power of two and $q \equiv 1 \bmod 2n$ • Example: Ring-LWE encryption: $7681 \bmod (2 \cdot 256) = 1$

  49. NTT for Lattice Cryptography: Convolution Theorem • With the convolution theorem we can basically multiply two vectors/polynomials with the help of the NTT – $c = \mathrm{INTT}(\mathrm{NTT}(a) \circ \mathrm{NTT}(b))$ – Efficient algorithms are known for the conversion in both directions • Negative Wrapped Convolution: – Polynomial multiplication in $\mathbb{Z}_q[x]/(x^n + 1)$ – Runtime $O(n \log n)$ – No appending of zeros required (as for regular convolution) – Implicit polynomial reduction by $x^n + 1$
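The negative wrapped convolution can be sketched with toy parameters. The example below assumes $n = 4$, $q = 17$ and $\psi = 2$, a primitive $2n$-th root of unity modulo 17 with $\omega = \psi^2 = 4$; these values are chosen for illustration and are not parameters from the tutorial. To keep the code short, the transform is evaluated naively in $O(n^2)$ directly from the definition, so the sketch demonstrates the convolution theorem but not the $O(n \log n)$ runtime. All names are hypothetical.

```c
#include <stdint.h>

#define NT_N 4    /* toy polynomial length n */
#define NT_Q 17   /* toy prime with q = 1 mod 2n */
#define NT_PSI 2  /* primitive 2n-th root of unity mod q; omega = psi^2 */

/* Square-and-multiply exponentiation modulo q. */
static uint32_t pow_mod(uint32_t base, uint32_t exp) {
    uint32_t r = 1;
    base %= NT_Q;
    while (exp > 0) {
        if (exp & 1u) {
            r = (r * base) % NT_Q;
        }
        base = (base * base) % NT_Q;
        exp >>= 1;
    }
    return r;
}

/* Naive transform from the definition: out[i] = sum_j in[j] * w^(i*j). */
static void dft_mod(const uint32_t in[NT_N], uint32_t out[NT_N], uint32_t w) {
    for (uint32_t i = 0; i < NT_N; i++) {
        uint32_t acc = 0;
        for (uint32_t j = 0; j < NT_N; j++) {
            acc = (acc + in[j] * pow_mod(w, i * j)) % NT_Q;
        }
        out[i] = acc;
    }
}

/* c = a * b in Z_q[x]/(x^n + 1) via the negative wrapped convolution. */
void ntt_mul(const uint32_t a[NT_N], const uint32_t b[NT_N], uint32_t c[NT_N]) {
    uint32_t omega = (NT_PSI * NT_PSI) % NT_Q;
    uint32_t omega_inv = pow_mod(omega, NT_Q - 2);  /* Fermat inverse */
    uint32_t psi_inv = pow_mod(NT_PSI, NT_Q - 2);
    uint32_t n_inv = pow_mod(NT_N, NT_Q - 2);
    uint32_t ta[NT_N], tb[NT_N], fa[NT_N], fb[NT_N], fc[NT_N], tc[NT_N];

    for (uint32_t j = 0; j < NT_N; j++) {  /* pre-scale by psi^j */
        ta[j] = (a[j] % NT_Q) * pow_mod(NT_PSI, j) % NT_Q;
        tb[j] = (b[j] % NT_Q) * pow_mod(NT_PSI, j) % NT_Q;
    }
    dft_mod(ta, fa, omega);
    dft_mod(tb, fb, omega);
    for (uint32_t j = 0; j < NT_N; j++) {  /* point-wise product */
        fc[j] = fa[j] * fb[j] % NT_Q;
    }
    dft_mod(fc, tc, omega_inv);            /* inverse transform, unscaled */
    for (uint32_t j = 0; j < NT_N; j++) {  /* scale by n^-1 and psi^-j */
        c[j] = tc[j] * n_inv % NT_Q * pow_mod(psi_inv, j) % NT_Q;
    }
}
```

The scaling by powers of $\psi$ is what turns the cyclic convolution of the plain NTT into a reduction modulo $x^n + 1$, and it is the reason for requiring $q \equiv 1 \bmod 2n$ rather than only $q \equiv 1 \bmod n$.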

  50. Efficient Computation of the NTT (Cooley-Tukey) [Diagram: butterfly network with twiddle factors; multiplication by $\omega^0 = 1$] • Bit reversal required ($\mathrm{NTT}_{no \to bo}$) • Precomputation of powers of $\omega$ possible • Arithmetic is basically multiplication and reduction modulo $q$ ($\frac{n}{2} \log_2(n)$ times) • Further optimizations still possible

  51. Ring-LWE Encryption on FPGA • NTT is very fast but still quite small • Lots of improvement since [GFS+12]

  52. Tutorial Outline – Part II • Code-based Cryptography • Efficient Code-based Implementations • Lattice-based Cryptography • Efficient Lattice-based Implementations • Lessons Learned

  53. Lessons Learned • Efficient McEliece implementations with practical key sizes – QC-MDPC codes are an efficient alternative to binary Goppa codes – Note: consider attacks on decryption failure rate (ASIACRYPT 2016) – Low-cost FPGA implementation practical for key agreement scheme (in prep) • R-LWE encryption schemes are extremely efficient – R-LWE (and variants) also allow signature + advanced schemes – FPGA implementations more efficient than RSA, on par with ECC • Papers and source code available at http://www.seceng.rub.de/research/projects/pqc/ • For more papers and codes, see project websites of ICT-644729

  54. Part II: Hardware Architectures for Post Quantum Cryptography Tutorial@CHES 2017 - Taipei Tim Güneysu Ruhr-Universität Bochum & DFKI 04.10.2017 Thank you! Questions?

  55. Part III: Post Quantum Cryptography in Embedded Software Tutorial@CHES 2017 - Taipei Tim Güneysu Ruhr-Universität Bochum & DFKI 04.10.2017 including slides by Ingo von Maurich and Thomas Pöppelmann

  56. Tutorial Outline – Part III • Code-based Cryptography • Efficient Code-based Implementations • Lattice-based Cryptography • Efficient Lattice-based Implementations • Lessons Learned

  57. Recall: McEliece Encryption Scheme [1978] • Key Generation: Given an $[n, k]$-code $C$ with generator matrix $G$ and error-correcting capability $t$ – Private key: $(S, G, P)$, where $S$ is a scrambling matrix and $P$ is a permutation matrix – Public key: $G' = S \cdot G \cdot P$ • Encryption: Message $m \in \mathbb{F}_2^k$, error vector $e \in_R \mathbb{F}_2^n$ with $\mathrm{wt}(e) \le t$ – $x \leftarrow mG' + e$ • Decryption: Let $\Psi_H$ be a $t$-error-correcting decoding algorithm – $m \cdot S \leftarrow \Psi_H(x \cdot P^{-1})$, which removes the error $e \cdot P^{-1}$ – Extract $m$ by computing $m \cdot S \cdot S^{-1}$

  58. (QC-)MDPC McEliece • Encryption: Message $m \in \mathbb{F}_2^k$, error vector $e \in_R \mathbb{F}_2^n$ with $\mathrm{wt}(e) \le t$ – $x \leftarrow mG + e$ • Decryption: Let $\Psi_H$ be a $t$-error-correcting (QC-)MDPC decoding algorithm – $mG \leftarrow \Psi_H(mG + e)$ – Extract $m$ from the first $k$ positions • Parameters for 80-bit equivalent symmetric security [MTSB13]: $n_0 = 2$, $n = 9602$, $r = 4801$, $w = 90$, $t = 84$

  59. Tutorial Outline – Part III • Code-based Cryptography • Efficient Code-based Implementations • Lattice-based Cryptography • Efficient Lattice-based Implementations • Lessons Learned

  60. 32-bit ARM Microcontroller • ARM-based 32-bit microcontroller – STM32F407 @ 168 MHz – 32-bit ARM Cortex-M4 – 1 Mbyte flash, 192 kbyte SRAM – Crypto functions: TRNG, 3DES, AES, SHA-1/-256, HMAC co-processor – Costs: roughly US$ 10 • AVR-based 8-bit microcontroller – ATXMega128A1 @ 32 MHz – 8-bit AVR Xmega family – 256 Kbyte flash, 8 Kbyte SRAM – Crypto functions: DES, AES – Costs: roughly US$ 10

  61. Implementing Key Generation • Memory is a scarce resource on microcontrollers • Generate and store random sparse vectors of length 4801 with 45 bits set → store set bit locations only • Generating secret key $H = [H_0 | H_1]$ – Generate first row of $H_1$, repeat if not invertible – Generate first row of $H_0$ – Convert to sparse representation → 90 counters • Computing public key $G = [I | Q]$ – Compute $Q$ from first row of $H_1^{-1}$ and $H_0$

  62. Implementing (Plain) Encryption • Recall operation principle as for low-cost hardware – All processes are based on 32-bit operations – Set bits in message $m$ select rows of the public key $G$ – Parse $m$ bit-by-bit, XOR current row of $G$ if bit is set • Error addition for encryption – Use TRNG to provide random bits to add $t$ errors – Obtain individual error indices by rejection sampling from $\lceil \log_2 n \rceil = 14$ bits
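The error-index generation can be sketched as follows for $n = 9602$ and $t = 84$. Two assumptions are made here that the slide does not spell out: the C library's `rand()` stands in for the hardware TRNG (it is not suitable for cryptographic use), and repeated indices are rejected so that exactly $t$ distinct positions are produced. The function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define ERR_N 9602  /* code length n */
#define ERR_T 84    /* number of errors t */

/* Stand-in for the TRNG: returns 14 pseudo-random bits. */
static uint16_t random_14_bits(void) {
    return (uint16_t)(rand() & 0x3FFF);
}

/* Fill idx[0..t-1] with t distinct error positions in [0, n). */
void sample_error_indices(uint16_t idx[ERR_T]) {
    int count = 0;
    while (count < ERR_T) {
        uint16_t cand = random_14_bits();
        if (cand >= ERR_N) {
            continue;  /* reject: 14 bits cover [0, 16384), n is smaller */
        }
        int duplicate = 0;
        for (int k = 0; k < count; k++) {
            if (idx[k] == cand) {
                duplicate = 1;
                break;
            }
        }
        if (!duplicate) {
            idx[count++] = cand;
        }
    }
}
```

Since $9602 / 2^{14} \approx 0.59$, roughly four out of ten candidates are rejected, which keeps the distribution of accepted indices uniform without a modular reduction.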

  63. Implementing (Plain) Decryption • Recall syndrome computation; parity-check matrix in sparse representation – Parse ciphertext bit-by-bit – XOR row of the secret key if corresponding ciphertext bit is set • Decoding iteration – Count #bits that are set in the syndrome and current row of the parity-check matrix blocks → use 90 counters – Compare #bits to decoding threshold – Invert current ciphertext bit if #bits above threshold – Add current row to syndrome – Generate next row → increment counters (check overflows)

  64. Implementation Results

      | Scheme | Platform | Cycles/Op | Time |
      |---|---|---|---|
      | McE MDPC (keygen) | STM32F407 | 148,576,008 | 884 ms |
      | McE MDPC (enc) | STM32F407 | 16,771,239 | 100 ms |
      | McE MDPC (dec) | STM32F407 | 37,171,833 | 221 ms |
      | McE MDPC (enc) | ATxmega256 | 26,767,463 | 836 ms |
      | McE MDPC (dec) | ATxmega256 | 86,874,388 | 2.71 s |

      • 8-bit AVR platform too slow for real-world deployment – Key generation excessive, decryption roughly 3 seconds • 32-bit ARM is a suitable platform and provides built-in TRNG • Improved QcBits software for Cortex-M4 by Chou (CHES 2016)

  65. Further Implementation Remarks and Requirements • CCA2-Security for McEliece Encryption: – Additional conversion required (e.g., via Fujisaki-Okamoto, which needs a hash function and re-encryption) • Side-Channel Attacks: – Masking schemes (SCA) for McEliece by Eisenbarth et al. [SAC15], which do not include CCA2 security • Decryption Failure Rate Attacks: – Guo et al. [ASIACRYPT16] identify a correlation between decoding failures in iterative decoders (bit-flipping decoding) and the secret key

  66. Tutorial Outline – Part III • Code-based Cryptography • Efficient Code-based Implementations • Lattice-based Cryptography • Efficient Lattice-based Implementations • Lessons Learned

  67. Ring-LWE Encryption Scheme [LP11/LPR10] • Gen: Choose $a \leftarrow R$ and $r_1, r_2 \leftarrow D_\sigma$; pk: $p = r_1 - a \cdot r_2 \in R$; sk: $r_2$ • Enc($a, p, m \in \{0,1\}^n$): $e_1, e_2, e_3 \leftarrow D_\sigma$; $\bar{m} = \mathrm{encode}(m)$; ciphertext: $[c_1 = a \cdot e_1 + e_2,\ c_2 = p \cdot e_1 + e_3 + \bar{m}]$ • Dec($c = [c_1, c_2], r_2$): Output $\mathrm{decode}(c_1 \cdot r_2 + c_2)$ • Correctness: $c_1 r_2 + c_2 = (a e_1 + e_2) r_2 + p e_1 + e_3 + \bar{m} = r_2 a e_1 + r_2 e_2 + r_1 e_1 - r_2 a e_1 + e_3 + \bar{m} = \bar{m} + r_2 e_2 + r_1 e_1 + e_3$, where $\bar{m}$ is large and $r_2 e_2 + r_1 e_1 + e_3$ is small

  68. Ring-LWE Encryption: Parameters (sizes in bits)

      | Parameter set | $n$ | $q$ | $\sigma$ | Public key | Secret key | Ciphertext ($c_1, c_2$) | Security |
      |---|---|---|---|---|---|---|---|
      | (256, 4093, 8.35) [LP11] | 256 | 4093 | ~4.5 | 6,144 | 1,792 | 6,144 | ~106 bits |
      | (256, 7681, 11.32) [GFSBH12] | 256 | 7681 | ~4.8 | 6,656 | 1,792 | 6,656 | ~106 bits |
      | (512, 12289, 12.18) [GFSBH12] | 512 | 12289 | ~4.9 | 14,336 | 3,584 | 14,336 | ~256 bits |

      • Message and ciphertext: – Message space: $n$ bits – Expansion $2 \cdot \log_2 q$ – Two large polynomials ($c_1, c_2$) • Public key: one or two large polynomials ($a, p$) • Secret key: small polynomial ($r_2$)

  69. Tutorial Outline – Part III • Code-based Cryptography • Efficient Code-based Implementations • Lattice-based Cryptography • Efficient Lattice-based Implementations • Lessons Learned

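  70. Simple Implementation of RLWE-Encryption (slide annotation: the multiplication loops have to be fast)

        /* Relies on definitions not shown on the slide: the type poly (array of
           n coefficients), the globals n and q, and the helpers gauss_poly(),
           poly_init() and modq(); modq() must map negative values into [0, q). */
        void encrypt(poly a, poly p, unsigned char *plaintext, poly c1, poly c2) {
            int i, j;
            poly e1, e2, e3;

            gauss_poly(e1);  /* sample the three error polynomials */
            gauss_poly(e2);
            gauss_poly(e3);
            poly_init(c1, 0, n);  /* init with 0 */
            poly_init(c2, 0, n);  /* init with 0 */

            for (i = 0; i < n; i++) {  /* multiplication loops */
                for (j = 0; j < n; j++) {
                    /* x^n = -1: negate products that wrap around */
                    int sign = (i + j >= n) ? -1 : 1;
                    c1[(i + j) % n] = modq(c1[(i + j) % n] + a[i] * e1[j] * sign);
                    c2[(i + j) % n] = modq(c2[(i + j) % n] + p[i] * e1[j] * sign);
                }
                c1[i] = modq(c1[i] + e2[i]);  /* c1 = a*e1 + e2 */
                /* c2 = p*e1 + e3 + encode(m): add q/2 where message bit i is set */
                c2[i] = (plaintext[i >> 3] & (1 << (i % 8)))
                            ? modq(c2[i] + e3[i] + q / 2)
                            : modq(c2[i] + e3[i]);
            }
        }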

  71. Software Implementation: Main Functions for R-LWE • Two main components – Polynomial multiplier for $n \in \{256, 512, 1024\}$ over specific rings with coefficients of $\log_2(q) < 24$ bits – Discrete Gaussian sampler with precisely defined parameter $\sigma$ and tail cut $\tau$

  72. Intermediate Results • Implementation of RLWE-Encryption on the AVR 8-bit ATxmega processor running at 32 MHz • Schoolbook multiplication (SchoolMul) • Encryption is two multiplications and decryption one

  73. Recall Improvement: Polynomial Multiplication with NTT • The Number Theoretic Transform (NTT) is a discrete Fourier transform (DFT) defined over a finite field or ring. For a given primitive $n$-th root of unity $\omega$ the NTT is defined as: – Forward transformation (NTT): $A[i] = \sum_{j=0}^{n-1} a[j]\,\omega^{ij}$, $i = 0, 1, \ldots, n-1$ – Inverse transformation (INTT): $a[i] = n^{-1} \sum_{j=0}^{n-1} A[j]\,\omega^{-ij}$, $i = 0, 1, \ldots, n-1$ • NTT exists if $q$ is a prime, $n$ a power of two and $q \equiv 1 \bmod 2n$

  74. Efficient Computation of the NTT (Textbook) [Diagram: butterfly network with twiddle factors; multiplication by $\omega^0 = 1$] • Bit reversal required ($\mathrm{NTT}_{no \to bo}$) • Precomputation of powers of $\omega$ possible • Arithmetic is basically multiplication and reduction modulo $q$ ($\frac{n}{2} \log_2(n)$ times)

  75. Optimization of NTT Computation: Removal of expensive "helper" functions • Problem: Permutation (Bitrev) of polynomial is expensive – "Standard" $\mathrm{NTT}_{bo \to no}$ requires bit-reversed input and produces naturally ordered output – Bit reversal before each forward or inverse NTT • Solution: NTT algorithm can be written as – Natural to bit-reversed for forward: $\mathrm{NTT}_{no \to bo}$ – Bit-reversed to natural for inverse: $\mathrm{INTT}_{bo \to no}$ – No bit reversal necessary anymore: $\mathrm{INTT}_{bo \to no}(\mathrm{NTT}_{no \to bo}(a) \circ \mathrm{NTT}_{no \to bo}(b))$
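For reference, the bit-reversal permutation that this optimization removes can be sketched as below. It shows what the "helper" function has to do for a length-$2^{bits}$ array; the names `bit_reverse` and `bitrev_permute` are hypothetical and the code is not taken from the tutorial's implementation.

```c
#include <stdint.h>

/* Reverse the lowest `bits` bits of x. */
uint32_t bit_reverse(uint32_t x, unsigned bits) {
    uint32_t r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Permute a[0 .. 2^bits - 1] in place into bit-reversed order. */
void bitrev_permute(uint32_t *a, unsigned bits) {
    uint32_t n = 1u << bits;
    for (uint32_t i = 0; i < n; i++) {
        uint32_t j = bit_reverse(i, bits);
        if (i < j) {  /* swap each pair only once */
            uint32_t tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }
    }
}
```

Every call touches all $n$ coefficients with irregular memory accesses, which is costly on small microcontrollers. Pairing a natural-to-bit-reversed forward transform with a bit-reversed-to-natural inverse transform makes the permutation unnecessary, because the point-wise product does not depend on the order of the coefficients as long as both operands use the same order.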
