Arithmetic Tradeoffs on Performance/Cost/Security for Hardware Asymmetric Cryptography
Arnaud Tisserand
CNRS, Lab-STICC laboratory
CEA Seminar, July 2017

Hardware Security Group at Lab-STICC
8 faculty members and ≈ 12 PhD students / postdocs / ATER / engineers
• Hardware security for embedded systems:
  ◮ memory and communication protection
  ◮ secure OS with HW blocks, DIFT
  ◮ multicore / manycore security
• Crypto implementations in hardware & embedded software:
  ◮ asymmetric (RSA, (H)ECC, PQC)
  ◮ arithmetic aspects (operators, libraries)
  ◮ homomorphic encryption
• Secure hardware implementation:
  ◮ side channel and fault injection attacks and protections
  ◮ targets: FPGA and ASIC (reconfigurable, CGRA, ASIP)
  ◮ high-level synthesis (HLS) for security
Lab-STICC: Laboratoire des Sciences et Techniques de l’Information, de la Communication et de la Connaissance
MOCS: Méthodes, Outils, Circuits et Systèmes
Arnaud Tisserand. CNRS – Lab-STICC. Arithmetic Tradeoffs for Hardware Asymmetric Cryptography 2/53

Skills (1/2)
• Optimized hardware arithmetic operators:
  ◮ low-power operators ($x \pm y$, $x \cdot y$, $1/x$, $\sqrt{x}$, $1/\sqrt{x^2 + y^2}$, $\sum_{i=0}^{n} x_i y_i$, ...)
  ◮ multiplication by constants (scalar, vector, matrix)
  ◮ advanced computation algorithms
  ◮ advanced representations of numbers
  ◮ function approximation ($\sin(x)$, $\cos(x)$, $\exp(x)$, $\log(x)$, $\tan(x)$, ...)
  ◮ modular and finite field arithmetic ($\mathbb{F}_p$ and $\mathbb{F}_{2^m}$)
  ◮ fault-tolerant (or fault-detecting) operators
  ◮ FPGA and ASIC targets
• Tools for hardware arithmetic circuits:
  ◮ operator generators
  ◮ arithmetic circuits with bounded errors
• Software arithmetic/computation libraries:
  ◮ (public-key) cryptography
  ◮ floating-point emulation on integer processors
  ◮ multiprecision computations (up to millions of bits)
  ◮ embedded processor, multi-core and GPU targets

Skills (2/2)
• Hardware accelerators for crypto. applications:
  ◮ public-key crypto.: RSA, (H)ECC
  ◮ private-key: AES, (3)DES
  ◮ hash functions: SHAx (multi-mode)
• Crypto-processor for (Hyper)-Elliptic Curve Cryptography:
  ◮ arithmetic operators over $\mathbb{F}_p$ and $\mathbb{F}_{2^m}$, typically 100–600 bits
  ◮ optimized architectures, algorithms and number representations
• Software libraries for arithmetic and cryptography:
  ◮ ECC library for GPUs and embedded processors
  ◮ RNS library for homomorphic encryption in multicores
• Study and implementation of protections against physical attacks:
  ◮ Passive: power consumption, electromagnetic radiation, timing
  ◮ Active: fault injection (in progress)
• Levels: arithmetic algorithms, number/object representations, operators, architectures, circuit optimizations
• Trade-offs between: performance, cost (area/energy), security
• True random number generators (TRNGs)

(Hyper-)Elliptic Curve Cryptography (H)ECC
• Finite field $\mathbb{F}_p$: integer arithmetic modulo large prime $p$
• Elliptic curve over $\mathbb{F}_p$: $E : y^2 = x^3 + ax + b$
• Points on the curve $E$: $P = (x_1, y_1)$, $Q = (x_2, y_2)$, $R = (x_3, y_3)$
• Set of points on $E$:
  ◮ finite (large # about $p$)
  ◮ “forms” an abelian group
  ◮ group law: addition on points
• Two operations:
  ◮ Point addition: $P + Q \to R$, denoted ADD
  ◮ Point doubling: $P + P = [2]P \to R$, denoted DBL
[Figure: the curve $y^2 = x^3 + 4x + 20$ over $\mathbb{F}_{1009}$]

Curve Level and Field Level Operations
$\mathbb{F}_p$ elements are very large: 100–600 bits!
Point operations:
  addition      ADD   $P + Q$ (if $P \neq \pm Q$)
  doubling      DBL   $[2]P = P + P$
  tripling      TPL   $[3]P = P + P + P$
  quintupling   QPL   $[5]P = P + \cdots + P$
  septupling    SPL   $[7]P = P + \cdots + P$
  ...
Operation at curve level: sequence of ≈ 10–20 $\mathbb{F}_q$ operations
$\mathbb{F}_q$ operations: add/sub, multiplication M, square S, inversion I
[Diagram: one scalar multiplication $[k]P$ → hundreds of curve op. (ADD, DBL, ...) → thousands of field op. (M, S, I in $\mathbb{F}_p$) → clock cycles]

Costs of Curve Level Operations
Best computation costs from literature and curves over $\mathbb{F}_p$

curve-level operations
a        refs.      ADD        mADD      DBL       TPL        QPL         SPL
a ≠ −3   EFD        11M + 5S   7M + 4S   1M + 8S   5M + 10S   n.a.        n.a.
a ≠ −3   [18]       n.a.       n.a.      1M + 8S   5M + 10S   7M + 16S    15M + 24S
a ≠ −3   [22]       11M + 5S   7M + 4S   2M + 8S   6M + 11S   9M + 15S    13M + 18S
a = −3   EFD        11M + 5S   7M + 4S   3M + 5S   7M + 7S    n.a.        n.a.
a = −3   [24]       11M + 5S   7M + 4S   3M + 5S   7M + 7S    11M + 11S   18M + 11S
a = −3   [23][22]   11M + 5S   7M + 4S   3M + 5S   7M + 8S    10M + 12S   14M + 15S

a        refs.          λ DBL               λ TPL
a ≠ −3   [14][15][20]   4λ M + (4λ + 2) S   (11λ − 1) M + (4λ + 2) S

a        refs.      λ TPL / λ′ DBL
a ≠ −3   [14][15]   (11λ + 4λ′ − 1) M + (4λ + 4λ′ + 3) S

EFD: Explicit-Formulas Database http://hyperelliptic.org/EFD
mADD: A + J → J

ECC Scalar Multiplication
• main operation in ECC protocols
• $P \in E$, $Q = [k]P = P + P + \cdots + P$ ($k$ times)
• $k = (k_{n-1} k_{n-2} \ldots k_1 k_0)_2$
• $n$ = 160–600 bits
Double-and-add scalar multiplication algorithm:
  Q ← O
  for i from n − 1 to 0 do
    Q ← [2]Q                      (DBL)
    if k_i = 1 then Q ← Q + P     (ADD)
  return Q
• Scans each bit of $k$ and performs the corresponding curve-level operation
• Average cost: 0.5n ADD + n DBL (security ≈ 0.5n ones in $k$)
• Security: Elliptic curve discrete logarithm problem (ECDLP): given $P$ and $Q = [k]P$, it is computationally unfeasible to obtain $k$
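
The double-and-add loop above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it uses the small example curve y^2 = x^3 + 4x + 20 over F_1009 from the slides, plain affine coordinates with a modular inversion per operation (not the inversion-free formulas whose M/S costs are tabulated above), and a base point (1, 5) chosen here because 5^2 = 1 + 4 + 20. The names `add`, `scalar_mult` and `on_curve` are hypothetical helpers, not part of any accelerator or library.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None stands for the point at infinity O

# Example curve from the slides: y^2 = x^3 + 4x + 20 over F_1009
P_MOD, A, B = 1009, 4, 20


def on_curve(pt: Point) -> bool:
    """Check the curve equation (O is on the curve by convention)."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def add(p: Point, q: Point) -> Point:
    """Affine group law; handles ADD (P != +-Q), DBL (P == Q) and O."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # P + (-P) = O, including doubling a point with y = 0
    if p == q:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k: int, p: Point):
    """Left-to-right double-and-add; returns ([k]P, #DBL, #ADD)."""
    q: Point = None  # Q <- O
    n_dbl = n_add = 0
    for i in range(k.bit_length() - 1, -1, -1):
        q = add(q, q)  # DBL at every bit
        n_dbl += 1
        if (k >> i) & 1:
            q = add(q, p)  # ADD only when k_i = 1
            n_add += 1
    return q, n_dbl, n_add


if __name__ == "__main__":
    base = (1, 5)  # assumption: a point picked for illustration
    result, n_dbl, n_add = scalar_mult(13, base)
    print(result, n_dbl, n_add)
```

The returned counters make the cost statement concrete: the loop performs one DBL per scanned bit and one ADD per nonzero bit, so a scalar with about half of its bits set costs roughly 0.5n ADD + n DBL. Modular inversion via `pow(x, -1, m)` needs Python 3.8 or later.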

Basic Power Analysis Attack on ECC
[Figure: current I measured on the circuit between V_DD and GND; the recorded traces show the operation sequence DBL DBL DBL ADD DBL ADD DBL DBL, annotated with the key bits 0 0 0 1 1 0]
Scalar multiplication operation:
  for i from 0 to t − 1 do
    if k_i = 1 then Q = ADD(P, Q)
    P = DBL(P)

Accelerator Specifications
• Performance ⟹ hardware (HW)
  ◮ dedicated functional units
  ◮ internal parallelism
• Limited cost (embedded systems)
  ◮ reduced silicon area
  ◮ low energy (& power consumption)
  ◮ large area used at each clock cycle
• Flexibility ⟹ software (SW)
  ◮ curves, algorithms, representations (points/elements), k recoding, ...
  ◮ at design time / at run time
• Security against SCAs ⟹ HW
  ◮ secure units ($\mathbb{F}_{2^m}$, $\mathbb{F}_p$)
  ◮ secure key storage/management
  ◮ secure control
[Diagram: protocol level (encryption, signature, etc.), curve level ($[k]P$, ADD(P, Q), DBL(P)) and field level ($x \pm y$, $x \times y$, ...), annotated with HW and SW labels]

Protected $\mathbb{F}_{2^m}$ Multipliers
[Figure: number of transitions per cycle for a Mastrovito 233 multiplier, unprotected vs. protected]
Overhead: area/time < 10 %
References: PhD D. Pamula [29]; articles: [32], [31], [30]

Accelerator Architecture
[Diagram: accelerator with key mng., code mem., register file, external interface, CTRL, interconnect and functional units FU 1, FU 2, FU 3]
Data: w-bit (32, ..., 128) except for k digits; control: a few bits per unit
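
The basic power analysis attack on the unprotected scalar multiplication loop shown earlier can be illustrated with a short Python sketch. It assumes an idealized attacker who has already labelled each segment of the power trace as ADD or DBL; real traces are noisy and that classification step is not modelled here. The function names `operation_trace` and `recover_bits` are hypothetical.

```python
from typing import List


def operation_trace(key_bits: List[int]) -> List[str]:
    """Operation sequence of the loop from the slide, bits given LSB first:
    ADD when k_i = 1, then DBL at every iteration."""
    trace: List[str] = []
    for bit in key_bits:
        if bit == 1:
            trace.append("ADD")
        trace.append("DBL")
    return trace


def recover_bits(trace: List[str]) -> List[int]:
    """Read the key back from the operation sequence.

    Each DBL closes one loop iteration; an ADD seen just before it
    means the corresponding key bit was 1, otherwise it was 0.
    """
    bits: List[int] = []
    saw_add = False
    for op in trace:
        if op == "ADD":
            saw_add = True
        elif op == "DBL":
            bits.append(1 if saw_add else 0)
            saw_add = False
        else:
            raise ValueError(f"unknown operation: {op}")
    return bits


if __name__ == "__main__":
    # Operation sequence shown in the slide's figure
    observed = ["DBL", "DBL", "DBL", "ADD", "DBL", "ADD", "DBL", "DBL"]
    print(recover_bits(observed))  # [0, 0, 0, 1, 1, 0]
```

Because the sequence of curve-level operations depends directly on the key bits, a single trace is enough in this idealized model, which is why the specifications list security against SCAs as a hardware requirement.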

Protected (Old) Accelerator for $\mathbb{F}_{2^m}$
[Figure: activity traces (number of transitions) and current measurements (current in mA) versus cycles (0 to 350) for the protected Mastrovito multiplier (ADD and DBL operations) and the unprotected Mastrovito multiplier (DBL operation)]
References: [12] and [13]
Warning: old dedicated accelerator (similar behavior is expected for our new one)

Circuit-Level Protections for Arithmetic Operators

Comparison Architecture ECC 256 vs HECC 128 (1/2)
Implementations on Spartan 6 FPGAs without DSP slices
[Figure: time [ms] versus area [slices] for the ECC and HECC architecture configurations]

Comparison Architecture ECC 256 vs HECC 128 (2/2)
[Figure: speedup, area ratio (× area) and % usage for the ECC and HECC architecture configurations]
On average HECC is 40 % faster than ECC for a similar silicon cost