Public key cryptography on IoT devices
Sujoy Sinha Roy
COSIC, KU Leuven
• Small area for HW implementations
• Small code size for SW implementations
• Low power or energy or both
• Reasonably fast computation time
This talk
Lightweight hardware implementation of Elliptic Curve Cryptography (ECC)
➢ Over binary field
➢ Over prime field
Elliptic curves over binary field $\mathbb{F}_{2^m}$

Generic elliptic curves: $y^2 + xy = x^3 + ax^2 + b$, where $a$ and $b$ are from $\mathbb{F}_{2^m}$

Point addition: $P_3(x_3, y_3) = P_1(x_1, y_1) + P_2(x_2, y_2)$
  $\lambda = (y_1 + y_2)/(x_1 + x_2)$
  $x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$
  $y_3 = \lambda(x_1 + x_3) + x_3 + y_1$

Point doubling: $P_3(x_3, y_3) = 2P_1(x_1, y_1)$
  $\lambda = x_1 + y_1/x_1$
  $x_3 = \lambda^2 + \lambda + a$
  $y_3 = x_1^2 + \lambda x_3 + x_3$

Point addition and point doubling are built from finite field operations.

Scalar multiplication: base point $P(x, y)$ on the curve and scalar $n$
  $nP = P + P + P + \dots + P$

Scalar multiplication using the double-and-add algorithm:
  Scalar bits:  …  0  1  0  1  1 …
  PD:           … PD PD PD PD PD …   (one point doubling per bit)
  PA:           …    PA    PA PA …   (one point addition per 1 bit)
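The formulas above can be sketched in a few lines of Python. This is only a toy illustration: the field $\mathbb{F}_{2^4}$, the reduction polynomial $x^4 + x + 1$, and the curve parameters a = b = 1 are assumptions chosen so the example runs instantly, and all function names are hypothetical. It is neither constant-time nor secure.

```python
# Toy binary field F_{2^4}; field elements are 4-bit integers.
# Reduction polynomial x^4 + x + 1 is an assumption for illustration.
M = 4
POLY = 0b10011

def gf_mul(a, b):
    """Carry-less multiplication of a and b, reduced modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r

def gf_inv(a):
    """Inverse of a nonzero element via a^(2^M - 2)."""
    r = 1
    for _ in range(2**M - 2):
        r = gf_mul(r, a)
    return r

A, B = 1, 1   # toy curve y^2 + xy = x^3 + A x^2 + B (assumed parameters)
INF = None    # point at infinity

def on_curve(P):
    if P is INF:
        return True
    x, y = P
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x2, x) ^ gf_mul(A, x2) ^ B

def point_double(P):
    if P is INF or P[0] == 0:      # a point with x = 0 is its own negative
        return INF
    x1, y1 = P
    lam = x1 ^ gf_mul(y1, gf_inv(x1))
    x3 = gf_mul(lam, lam) ^ lam ^ A
    y3 = gf_mul(x1, x1) ^ gf_mul(lam, x3) ^ x3
    return (x3, y3)

def point_add(P, Q):
    if P is INF:
        return Q
    if Q is INF:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:                   # either Q = P or Q = -P
        return point_double(P) if y1 == y2 else INF
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    y3 = gf_mul(lam, x1 ^ x3) ^ x3 ^ y1
    return (x3, y3)

def scalar_mult(n, P):
    """Left-to-right double-and-add: PD for every bit, PA for every 1 bit."""
    R = INF
    for bit in bin(n)[2:]:
        R = point_double(R)
        if bit == "1":
            R = point_add(R, P)
    return R
```

Note that the point addition is executed only for 1 bits, so the sequence of operations depends on the scalar. This is the kind of leakage that the countermeasures later in the talk address.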
Lightweight ECC: common tricks
• Choice of elliptic curve, finite field, etc.
  ➢ Special arithmetic such as endomorphisms
  ➢ Sparse irreducible polynomial
• Efficient point multiplication algorithm
  ➢ Reduces the number of field operations
  ➢ Also reduces the number of registers
  e.g. Montgomery ladder, special encoding of the scalar, etc.
• Projective coordinate system
  ➢ Inversion free
• Affordable lightweight countermeasures
  ➢ Constant-time arithmetic, e.g. Montgomery ladder
  ➢ Random projective coordinates
  ➢ Scalar randomization (maybe?)
Uses NIST 163-bit ECC over $\mathbb{F}_{2^{163}}$
~80-bit security
“A 5.1 μJ per point-multiplication elliptic curve cryptographic processor” by V. Rozic, O. Reparaz, and I. Verbauwhede, published in IJCTA 2016.
• Co-processor architecture
• Components within the dashed (‘- -’) rectangle are implemented on the chip
• Algorithm: Montgomery ladder with projective coordinates
• Circular register file
• Digit-serial arithmetic unit (MALU)
• Full-custom balanced layout for the register file and MALU
Measurement results
• UMC 130 nm
• Core area 0.54 mm²
• Scalar multiplication 86K cycles (102 ms at 847.5 kHz)
• Power 50.4 µW at 847.5 kHz
• Energy per scalar multiplication 5.1 µJ
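As a quick consistency check, the reported time and energy follow from the cycle count, clock frequency and power. The cycle count is the rounded "86K" from the slide, so the results are approximate.

```latex
t = \frac{86\,000\ \text{cycles}}{847.5\ \text{kHz}} \approx 101.5\ \text{ms},
\qquad
E = P \cdot t \approx 50.4\ \mu\text{W} \times 101.5\ \text{ms} \approx 5.1\ \mu\text{J}
```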
Uses NIST 283-bit Koblitz curve over $\mathbb{F}_{2^{283}}$
~140-bit security
Elliptic curves over $\mathbb{F}_{2^m}$

Generic elliptic curves: $y^2 + xy = x^3 + ax^2 + b$
  Point addition: $P_3(x_3, y_3) = P_1(x_1, y_1) + P_2(x_2, y_2)$
    $x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$
    $y_3 = \lambda(x_1 + x_3) + x_3 + y_1$
  Point doubling: $P_3(x_3, y_3) = 2P_1(x_1, y_1)$
    $x_3 = \lambda^2 + \lambda + a$
    $y_3 = x_1^2 + \lambda x_3 + x_3$
  Scalar multiplication:
    Scalar bits:  …  0  1  0  1  1 …
    PD:           … PD PD PD PD PD …
    PA:           …    PA    PA PA …

Koblitz curves: $y^2 + xy = x^3 + ax^2 + 1$, $a = 0$ or $1$
  Point addition: $P_3(x_3, y_3) = P_1(x_1, y_1) + P_2(x_2, y_2)$
    $x_3 = \lambda^2 + \lambda + x_1 + x_2 + a$
    $y_3 = \lambda(x_1 + x_3) + x_3 + y_1$
  Frobenius endomorphism (replaces point doubling): $P_3(x_3, y_3)$ from $P_1(x_1, y_1)$
    $x_3 = x_1^2$
    $y_3 = y_1^2$
    Cheap!
  Scalar multiplication:
    Scalar digits: …  0  1  0  1  1 …
    FE:            … FE FE FE FE FE …
    PA:            …    PA    PA PA …
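A short derivation shows why the Frobenius map $(x, y) \mapsto (x^2, y^2)$ sends a point of a Koblitz curve to another point of the same curve. It uses only two facts: in characteristic 2, $(u + v)^2 = u^2 + v^2$, and $a^2 = a$ because $a \in \{0, 1\}$.

```latex
\begin{aligned}
y^2 + xy &= x^3 + a x^2 + 1 \\
(y^2 + xy)^2 &= (x^3 + a x^2 + 1)^2 \\
(y^2)^2 + (x^2)(y^2) &= (x^2)^3 + a^2 (x^2)^2 + 1 \\
(y^2)^2 + (x^2)(y^2) &= (x^2)^3 + a\,(x^2)^2 + 1
\end{aligned}
```

The last line is the curve equation evaluated at $(x^2, y^2)$. The same argument fails for a generic curve, since there $a^2 \neq a$ and $b^2 \neq b$ in general.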
But there is a catch …
• Generic elliptic curve: the scalar bits (… 0 1 0 1 1 …) drive the PD/PA sequence directly
• Koblitz curve: the scalar must first go through a scalar conversion; the converted digits then drive the FE/PA sequence
Several implementations of lightweight ECC over $\mathbb{F}_{2^m}$
Scalar conversion
• Step 1: the scalar is reduced using the lazy reduction by Brumley and Järvinen
• Step 2: zero-free expansion by Okeya, Takagi, and Vuillaume
⇒ For Koblitz curve K283: integer additions/subtractions of size 283 bits
Optimization
• We avoid negations
  ➢ We compute $(d_0, d_1) \leftarrow (d_0/2 - d_1,\ d_0/2)$
  ➢ We compute $(a_0, a_1) \leftarrow (2a_1,\ a_1 - a_0)$
  ➢ We compute $(b_0, b_1) \leftarrow (b_0/2 - b_1,\ b_0/2)$
  ➢ The sign is corrected at the end of the loop
• Saves 1/3 of the cycles!
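The idea of deferring the sign can be illustrated on a plain τ-adic expansion. The sketch below is an assumption-laden illustration, not the lazy reduction or the zero-free expansion used in the coprocessor: it uses the standard τ-adic NAF digit rule, where τ satisfies $\tau^2 = \mu\tau - 2$ and $\mu = -1$ for a curve with a = 0 such as K-283. Dividing $d_0 + d_1\tau$ by τ gives $(d_1 - d_0/2,\ -d_0/2)$; the second function stores the negated pair $(d_0/2 - d_1,\ d_0/2)$ instead and tracks the sign in one flag. All function names are hypothetical.

```python
MU = -1  # mu = (-1)^(1 - a); a = 0 is assumed here

def tnaf(d0, d1, mu=MU):
    """Reference tau-adic NAF of d0 + d1*tau, least significant digit first."""
    digits = []
    while d0 != 0 or d1 != 0:
        if d0 & 1:
            u = 2 - ((d0 - 2 * d1) % 4)   # u is +1 or -1
            d0 -= u
        else:
            u = 0
        digits.append(u)
        # exact division by tau (d0 is even at this point)
        d0, d1 = d1 + mu * (d0 // 2), -(d0 // 2)
    return digits

def tnaf_no_negation(d0, d1):
    """Same digits for mu = -1, but the update never negates a multi-precision value.

    The stored pair equals sign * (true pair); the sign is applied to each digit.
    """
    sign = 1
    digits = []
    while d0 != 0 or d1 != 0:
        if d0 & 1:
            u = 2 - ((d0 - 2 * d1) % 4)
            d0 -= u
        else:
            u = 0
        digits.append(sign * u)
        half = d0 // 2
        d0, d1 = half - d1, half          # (d0/2 - d1, d0/2): subtraction only
        sign = -sign
    return digits

def evaluate(digits, mu=MU):
    """Horner evaluation of sum(u_i * tau^i); returns (c0, c1) for c0 + c1*tau."""
    c0, c1 = 0, 0
    for u in reversed(digits):
        c0, c1 = -2 * c1 + u, c0 + mu * c1   # multiply by tau, then add the digit
    return c0, c1
```

For example, `evaluate(tnaf_no_negation(n, 0))` returns `(n, 0)`, and both functions produce the same digit sequence.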
SPA resistance
• A conditional multi-precision addition (controlled by a bit that is 0 or 1) reveals information about the secret scalar
• We generate $u \in \{-1, 1\}$ using the zero-free function $\Psi(\cdot)$
  ➢ $u = -1$: compute $b_0 - a_0$
  ➢ $u = +1$: compute $b_0 + a_0$
• Similar operations in both cases ⇒ increased SPA resistance!
Scalar multiplication
Scalar conversion produces a zero-free representation
• The zero-free representation is generated in (almost) constant time
• The conversion is done once per scalar ⇒ the attacker has only one trace
• The accumulator point is randomized as shown by Coron: $(X, Y, Z) = (xr,\ yr^2,\ r)$, where $r$ is random
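The randomization works because the random factor cancels when converting back to affine coordinates. The sketch below assumes López-Dahab projective coordinates, where $x = X/Z$ and $y = Y/Z^2$; this is consistent with the form $(xr, yr^2, r)$ but is an assumption rather than something stated on the slide.

```latex
\begin{aligned}
(X, Y, Z) &= (x r,\ y r^2,\ r), \qquad r \neq 0 \\
\frac{X}{Z} &= \frac{x r}{r} = x, \qquad
\frac{Y}{Z^2} = \frac{y r^2}{r^2} = y \\
Y^2 + XYZ &= X^3 Z + a X^2 Z^2 + b Z^4
\quad\Longleftrightarrow\quad
y^2 + xy = x^3 + a x^2 + b
\end{aligned}
```

Every nonzero $r$ therefore represents the same point, while the intermediate values processed by the hardware differ from run to run.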
Lightweight 283-bit Koblitz curve processor
  Area      4.3 KGE (without RAM), ~10 KGE (with RAM)
  RAM size  4032 bits
  Time      1,566,000 cycles; 98 ms (16 MHz)
  Energy    9.6 µJ
  Power     98 µW (1 MHz)
“Lightweight coprocessor for Koblitz curves: 283-bit ECC including scalar conversion with only 4300 gates” by S. S. Roy, K. Järvinen, and I. Verbauwhede in CHES 2015
An implementation over a prime field
Curve25519
$E: y^2 = x^3 + 486662x^2 + x$
• 128-bit security
• Montgomery curve
• Efficient prime $p = 2^{255} - 19$
• Known for fast arithmetic

Montgomery ladder
• Combined PA-PD
• No need to store the y-coordinate!
• Cost per combined step: $4S + 5M + M_A + 8A$ (squarings, multiplications, one multiplication by a curve constant, additions/subtractions)
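A minimal sketch of the x-only Montgomery ladder follows. The step formulas and the constant 121665 are taken from RFC 7748; the function name `ladder` is hypothetical, scalar clamping and byte encoding are omitted, and the Python-level swap is not constant-time. Counting operations in the loop body gives 4 squarings, 5 multiplications, 1 multiplication by the constant and 8 additions/subtractions, matching the cost above.

```python
P = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4, as used in RFC 7748

def ladder(k, u):
    """Return the x-coordinate of k*(u, .) on Curve25519, for 0 <= k < 2**255."""
    x1 = u % P
    x2, z2 = 1, 0        # projective x of the point at infinity
    x3, z3 = x1, 1       # projective x of the input point
    swap = 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        swap ^= bit
        if swap:         # a real implementation would use a constant-time swap
            x2, x3 = x3, x2
            z2, z3 = z3, z2
        swap = bit
        # combined differential addition and doubling (no y-coordinate needed)
        a = (x2 + z2) % P
        aa = a * a % P
        b = (x2 - z2) % P
        bb = b * b % P
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = d * a % P
        cb = c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3 = x3, x2
        z2, z3 = z3, z2
    return x2 * pow(z2, P - 2, P) % P   # single inversion at the very end
```

Because only x-coordinates are used, two parties computing `ladder(a, ladder(b, 9))` and `ladder(b, ladder(a, 9))` obtain the same value, which is the basis of the key exchange.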
Curve25519
Efficient prime $p = 2^{255} - 19$
• Modular reduction is easier:
  $C = AB = C_1 \cdot 2^{255} + C_0$
  $C \bmod p = (C_1 \cdot 19 + C_0) \bmod p$
• $15 \times 17 = 255$
• Special acceleration in HW by processing 17-bit words
• E.g. Xilinx FPGAs have 25×18 DSP multipliers
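Both points can be sketched in Python: folding the upper half of a product back with a factor of 19, and splitting a 255-bit operand into 15 limbs of 17 bits so that partial products wrapping past $2^{255}$ are simply scaled by 19. The function names are hypothetical, and the sketch models the arithmetic only, not any particular FPGA datapath.

```python
P = 2**255 - 19
MASK255 = 2**255 - 1

def reduce_p25519(c):
    """Reduce a non-negative integer modulo P using 2^255 = 19 (mod P)."""
    while c >> 255:
        c = 19 * (c >> 255) + (c & MASK255)   # C1*19 + C0
    return c - P if c >= P else c             # final conditional subtraction

LIMBS, BITS = 15, 17                          # 15 x 17 = 255

def to_limbs(a):
    """Split a < 2^255 into 15 limbs of 17 bits, least significant first."""
    return [(a >> (BITS * i)) & ((1 << BITS) - 1) for i in range(LIMBS)]

def mul_limbs(a, b):
    """Multiply modulo P with 17x17-bit partial products."""
    al, bl = to_limbs(a), to_limbs(b)
    cols = [0] * LIMBS
    for i in range(LIMBS):
        for j in range(LIMBS):
            k = i + j
            if k < LIMBS:
                cols[k] += al[i] * bl[j]
            else:
                # 2^(17k) = 2^255 * 2^(17(k-15)) = 19 * 2^(17(k-15)) (mod P)
                cols[k - LIMBS] += 19 * al[i] * bl[j]
    total = sum(c << (BITS * k) for k, c in enumerate(cols))
    return reduce_p25519(total)
```

Because the limb count times the limb width is exactly 255, the wrap-around lands on a limb boundary and no shifting across limbs is needed; each 17-bit operand also fits the 18-bit input of the DSP multipliers mentioned above.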
Throughput: 25,000 point multiplications per second
Area of point multiplier: 2,783 LUTs, 3,592 FFs, 20 DSP multipliers
Parallel processing for high throughput
Modular multiplier for Curve25519
“Efficient Elliptic-Curve Cryptography using Curve25519 on Reconfigurable Devices” by Sasdrich and Güneysu in ARC 2014
Lightweight architecture for Curve25519
• 32-bit word-serial architecture
• Single-port memory
• 32-bit multiplier parameterized for digit width w = 2, 4, 8, 12 and 16
  ➢ Speed vs area
• ASIP ⇒ programmable
“NaCl’s crypto_box in hardware” by M. Hutter, J. Schilling, P. Schwabe, and W. Wieser in CHES 2015. Architecture diagram taken from the CHES 2015 presentation.
Lightweight architecture for Curve25519: results
Note: unified implementation of Curve25519, Salsa20 and Poly1305
• Smallest configuration: area 14,648 GE, power 40 µW (including optimized RAM); key exchange takes 3,455,394 cycles
• Fastest configuration: area 17,966 GE, power 70 µW (including optimized RAM); key exchange takes 811,170 cycles
Of general interest … [product scanning]
• Classical product scanning example: $(a_3\, a_2\, a_1\, a_0) \times (b_3\, b_2\, b_1\, b_0)$
• Partial products are processed column by column; column $k$ collects every $a_i b_j$ with $i + j = k$:
  $c_0$: $a_0 b_0$
  $c_1$: $a_1 b_0,\ a_0 b_1$
  $c_2$: $a_2 b_0,\ a_1 b_1,\ a_0 b_2$
  $c_3$: $a_3 b_0,\ a_2 b_1,\ a_1 b_2,\ a_0 b_3$
  $c_4$: $a_3 b_1,\ a_2 b_2,\ a_1 b_3$
  …
• Within a column, each partial product is multiplied (×) and added (+) into an accumulator: $a_0 b_0$ yields $c_0$, then $a_1 b_0$ and $a_0 b_1$ are accumulated for $c_1$, and so on
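The column-wise schedule can be sketched as follows. The 32-bit word size and the helper names are assumptions for illustration; Python's unbounded integers stand in for the wide accumulator that a hardware multiply-accumulate unit would provide.

```python
W = 32                      # assumed word size in bits
WORD_MASK = (1 << W) - 1

def product_scanning(a, b):
    """Multiply two equal-length word arrays (least significant word first).

    Each output word c[k] is finished before the next column starts, so only
    one accumulator is live and every result word is written exactly once.
    """
    n = len(a)
    c = [0] * (2 * n)
    acc = 0
    for k in range(2 * n - 1):
        # all partial products a[i] * b[j] with i + j = k
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]
        c[k] = acc & WORD_MASK   # low word of the column sum
        acc >>= W                # the rest carries into the next column
    c[2 * n - 1] = acc
    return c

def to_words(x, n):
    """Split a non-negative integer into n words of W bits."""
    return [(x >> (W * i)) & WORD_MASK for i in range(n)]

def from_words(words):
    """Recombine a word array into an integer."""
    return sum(w << (W * i) for i, w in enumerate(words))
```

For two 128-bit operands `x` and `y`, `from_words(product_scanning(to_words(x, 4), to_words(y, 4)))` equals `x * y`.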