Joël Cathébras, Alexandre Carbon, Peter Milder, Renaud Sirdey, Nicolas Ventroux
DATA FLOW ORIENTED HARDWARE DESIGN OF RNS-BASED POLYNOMIAL MULTIPLICATION FOR SHE ACCELERATION
Conference on Cryptographic Hardware and Embedded Systems 2018 | Amsterdam, The Netherlands | 09-10-18
IMPLEMENTATION CHALLENGES FOR RLWE-BASED LEVELED-FHE SCHEMES
• Handling polynomials of $S = \mathbb{Z}[Y]/(G(Y))$ and $S_r = S/rS$:
  • Modulus $r$ ~ several hundred bits
  • $\deg(G)$ ~ several thousand
  • Both parameters impact security and multiplicative depth.
• Residue Number System: with $r = \prod_{j=1}^{l} r_j$, one product over $S_r$ is equivalent to $l$ independent products over the $S_{r_j}$'s:
  $b \times c \in S_r \;\Leftrightarrow\; (b_{r_1} \times c_{r_1}, \ldots, b_{r_j} \times c_{r_j}, \ldots, b_{r_l} \times c_{r_l}) \in S_{r_1} \times \cdots \times S_{r_j} \times \cdots \times S_{r_l}$
• Bajard et al. in 2016, further simplified by Halevi et al. in 2018:
  • RNS-compatible $\mathrm{FV.Dec}_{SOT}$ and $\mathrm{FV.Mult\&Relin}_{SOT}$.
  • New $\mathbf{sml}_{SOT}$: pair of $l \times l$ matrices with elements in $S_{r_j}$ for $j$ in $\{1, \ldots, l\}$.
  • Performance bottleneck: residue polynomial multiplication (products over the $S_{r_j}$'s).
• Negative Wrapped Convolution over $S_{r_j} = \mathbb{Z}_{r_j}[Y]/(G(Y))$:
  • No polynomial modular reduction.
  • Restricts the choice of $G$: $G(Y) = Y^o + 1$ with $o$ a power of 2.
  • Restricts the choice of $r_j$: $r_j \equiv 1 \bmod 2o$.
  • $2ol$ precomputed values: $(\omega_j^k)_{0 \le k < 2o}$, where $\omega_j$ is an $o$-th primitive root of $-1$ in $\mathbb{Z}_{r_j}^*$.
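The negative wrapped convolution and its use per RNS channel can be illustrated with a toy sketch. Everything below is an assumption made for illustration and not the hardware design of the talk: the function names (`find_root_of_minus_one`, `ntt`, `nwc`, `schoolbook_negacyclic`, `rns_demo`), the toy parameters ($o = 4$, moduli 17 and 41), and the naive quadratic transform used in place of a fast butterfly NTT.

```python
# Toy negative wrapped convolution (NWC) over Z_r[Y]/(Y^o + 1).
# Assumptions: r is prime, o is a power of 2, r % (2*o) == 1.
# Requires Python 3.8+ for pow(x, -1, r).

def find_root_of_minus_one(o, r):
    """Return the smallest w >= 2 with w**o == -1 (mod r)."""
    for w in range(2, r):
        if pow(w, o, r) == r - 1:
            return w
    raise ValueError("no o-th root of -1 modulo r")

def ntt(vec, root, r):
    """Naive O(o^2) number-theoretic transform (readability over speed)."""
    o = len(vec)
    return [sum(vec[i] * pow(root, i * k, r) for i in range(o)) % r
            for k in range(o)]

def nwc(b, c, w, r):
    """Product of b and c modulo (Y^o + 1, r); w is an o-th primitive root of -1."""
    o = len(b)
    d = pow(w, 2, r)                      # o-th primitive root of 1
    w_inv, d_inv, o_inv = pow(w, -1, r), pow(d, -1, r), pow(o, -1, r)
    # Pre-scale by powers of w, then transform.
    b_hat = ntt([x * pow(w, i, r) % r for i, x in enumerate(b)], d, r)
    c_hat = ntt([x * pow(w, i, r) % r for i, x in enumerate(c)], d, r)
    # Point-wise product, inverse transform, post-scale by o^-1 * w^-i.
    t = ntt([x * y % r for x, y in zip(b_hat, c_hat)], d_inv, r)
    return [x * o_inv % r * pow(w_inv, i, r) % r for i, x in enumerate(t)]

def schoolbook_negacyclic(b, c, r):
    """Reference product: wrap-around terms are subtracted since Y^o = -1."""
    o = len(b)
    out = [0] * o
    for i in range(o):
        for j in range(o):
            if i + j < o:
                out[i + j] = (out[i + j] + b[i] * c[j]) % r
            else:
                out[i + j - o] = (out[i + j - o] - b[i] * c[j]) % r
    return out

def rns_demo():
    """Check that each residue channel matches the product modulo r = 17 * 41."""
    o, moduli = 4, [17, 41]               # toy RNS basis, each r_j = 1 mod 2o
    r = 17 * 41
    b, c = [123, 456, 78, 690], [5, 600, 321, 42]
    full = schoolbook_negacyclic(b, c, r)
    for rj in moduli:
        wj = find_root_of_minus_one(o, rj)
        dj = nwc([x % rj for x in b], [x % rj for x in c], wj, rj)
        assert dj == [x % rj for x in full]
    return True

if __name__ == "__main__":
    print(nwc([0, 1, 0, 0], [0, 0, 0, 1], 2, 17))
    print(rns_demo())
```

The scaling by powers of $\omega_j$ before and after the transform is what removes the explicit reduction modulo $Y^o + 1$; the channels are independent, which is the property the RNS decomposition relies on.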
RELATED WORKS (HARDWARE ACCELERATION)
• Migliore et al. 2018: Karatsuba rather than NWC (no RNS)
  • Finer choice of $G(Y)$, allowing batching of binary messages.
  • Asymptotic complexity in $P(o^{1.585})$ vs. $P(o \log o)$: turning point ($o = 6144$, $\log_2 r = 512$).
  ⇒ Not sufficient to target large multiplicative depth.
• Öztürk et al. 2015: RNS and NTT approach for LTV scheme (no NWC)
  • Memory-access iterative NTT.
  • External pre-computation of NTT twiddle factors.
  ⇒ Uses communication bandwidth for non-payload data.
• Cousins et al. 2017: RNS and NTT approach for LTV scheme
  • Dataflow-oriented pipelined NTT.
  • Local storage of all twiddle factors at compile time.
  ⇒ Storage cost in $O(lo)$, dependent on the RNS basis size.
• Sinha Roy et al. 2015: RNS and NTT (no NWC) approach for RLWE-based scheme
  • Memory-access iterative NTT.
  • Local storage of a subset of the twiddle factors, and on-the-fly computation of the others.
  ⇒ Better storage in $O(l \log o)$, but still dependent on the RNS basis size.
⇒ This work: dataflow-oriented NWC with on-the-fly computation of twiddle factors.
NWC ARCHITECTURE PRINCIPLE
• Architecture principle: one NWC over $S$ ⟺ $O(l)$ smaller NWCs over the $S_{r_j}$'s: $D_j = \mathrm{NWC}_j(B_j, C_j)$
• Required values for $\mathrm{NWC}_j$:
  • $\omega_j$: an $o$-th primitive root of $-1$ over $\mathbb{Z}_{r_j}^*$ ⇒ $\partial_j = \omega_j^2 \bmod r_j$ is an $o$-th primitive root of $1$ over $\mathbb{Z}_{r_j}^*$
• $P(x)$ seeds $\ll P(o)$ twiddles
• Generation of $\Psi_j = (\omega_j^k)_{k=0}^{o-1}$: one set every $U = o/x$ cycles.
• Computation of $\Psi_j^{-1} = (\omega_j^{-k})_{k=0}^{o-1}$: $\Psi_j^{-1} = \mathrm{Reorder}(r_j - \Psi_j)$, since $r_j - \omega_j^k = \omega_j^{-(o-k)} \bmod r_j$.
[Figure: $\mathrm{NWC}_j$ dataflow diagram. Twiddle flow: the generators GEN TW, GEN ITW and GEN PCTW take $r_j$, $w_j$, $o_j^{-1}$ and the seeds $1, \omega_j, \ldots, \omega_j^{x-1}$, and supply $(r_j, w_j, \Psi_j)$, $(r_j, w_j, \Omega_j)$, $(r_j, w_j, \Omega_j^{-1})$, $(r_j, w_j, o_j^{-1} \cdot \Psi_j^{-1})$ and $(r_j, w_j)$, with $\Omega_j \subset \Psi_j$ and $\Omega_j^{-1} \subset \Psi_j^{-1}$ ($\partial_j = \omega_j^2 \bmod r_j$). Data flow: $B_j$ and $C_j$ pass through VEC, PW MM and NTT blocks to produce $D_j$.]
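The twiddle relations on this slide can be checked with a small sketch. It assumes a lane-parallel generator that starts from the $x$ seeds $1, \omega_j, \ldots, \omega_j^{x-1}$ and multiplies every lane by $\omega_j^x$ at each step; that inner working is an assumption, as are the function names `generate_psi` and `inverse_psi_by_reorder` and the toy values $o = 4$, $r_j = 17$, $\omega_j = 2$, $x = 2$.

```python
# Toy model of twiddle generation; not the actual generator circuit.
# Requires Python 3.8+ for pow(x, -k, r).

def generate_psi(w, o, x, r):
    """Build Psi = (w^k) for 0 <= k < o from x seeds in o/x steps.

    Assumed scheme: lanes hold w^0 .. w^(x-1); each step emits the lanes,
    then multiplies every lane by w^x.
    """
    if o % x != 0:
        raise ValueError("x must divide o")
    lanes = [pow(w, i, r) for i in range(x)]   # the x seeds
    step = pow(w, x, r)
    psi = []
    for _ in range(o // x):                    # U = o/x steps per set
        psi.extend(lanes)
        lanes = [v * step % r for v in lanes]
    return psi

def inverse_psi_by_reorder(psi, r):
    """Build Psi^-1 = (w^-k) without any modular inversion.

    Uses r - w^k = w^-(o-k) mod r, which holds because w^o = -1.
    Entry 0 is w^0 = 1; entry m (m >= 1) is r - psi[o - m].
    """
    o = len(psi)
    return [1] + [r - psi[o - m] for m in range(1, o)]

if __name__ == "__main__":
    o, r, w, x = 4, 17, 2, 2                   # 2^4 = 16 = -1 mod 17
    psi = generate_psi(w, o, x, r)
    psi_inv = inverse_psi_by_reorder(psi, r)
    # Cross-check against direct modular inverses.
    assert psi_inv == [pow(w, -k, r) for k in range(o)]
    # Omega (powers of w^2) is the even-indexed subset of Psi.
    assert psi[::2] == [pow(w * w % r, k, r) for k in range(o // 2)]
    print(psi, psi_inv)
```

Only subtractions and a reordering are needed to obtain the inverse set, which is why a single generated stream can serve both the forward and the inverse scaling.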