Hardware-Software Co-Design for Security: ECC Processor Example

Arnaud Tisserand, CNRS, Lab-STICC. SILM Workshop, Nov. 2019.


  1. Hardware-Software Co-Design for Security: ECC Processor Example
     Arnaud Tisserand, CNRS, Lab-STICC
     SILM Workshop, Nov. 2019

  2. Introduction
     Public-key (or asymmetric) cryptography (PKC):
     • RSA
     • (hyper-)elliptic curve cryptography ((H)ECC)
     • post-quantum crypto (PQC)
     Design, prototype and evaluate hardware/software (HW/SW) for PKC:
     • HW: computation units, accelerators, crypto-processors
     • SW: libraries, generators for HW, dedicated compiler for our processors
     Objectives:
     • high speed, reduced silicon area and energy consumption
     • protections against side-channel and fault-injection attacks (SCA/FIA)
     • HW: FPGA and ASIC implementations
     • SW: embedded processors implementations

  3. Elliptic Curve Cryptography (ECC)
     Elliptic curve over GF(p): $E: y^2 = x^3 + ax + b$
     Curve points representation:
     • $P = (x, y)$, affine coordinates: many field inversions
     • $P = (x, y, z, \ldots)$, redundant coordinates: significantly faster (e.g., Jacobian)
     [Figure: the curve $y^2 = x^3 + 4x + 20$ over GF(1009)]
     Scalar multiplication: $Q = [k]P = P + P + \cdots + P$ ($k$ times), the most time-consuming operation in protocols,
     where $P \in E$ and $k = (k_{n-1} k_{n-2} \ldots k_1 k_0)_2$; $k$ has 200–600 bits.
     Good and complete presentation in [14] and [10].
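To make the cost of affine coordinates concrete, here is a minimal Python sketch of the affine group law on the slide's example curve $y^2 = x^3 + 4x + 20$ over GF(1009). The names `on_curve`, `add` and `INF`, and the choice of the point (1, 5), are illustrative assumptions and not taken from the slides. Each addition or doubling performs one modular inversion, which is the cost that redundant coordinates such as Jacobian avoid.

```python
# Illustrative sketch only; names and the sample point are assumptions.
P_MOD = 1009        # field prime of the slide's example curve
A, B = 4, 20        # curve coefficients: y^2 = x^3 + 4x + 20
INF = None          # point at infinity (neutral element)


def on_curve(pt):
    """Check that pt satisfies the curve equation modulo P_MOD."""
    if pt is INF:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0


def add(p1, p2):
    """Affine addition; every non-trivial call costs one field inversion."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                          # P + (-P) = O
    if p1 == p2:                            # doubling: tangent slope
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD)
    else:                                   # addition: chord slope
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


P = (1, 5)          # on the curve since 1 + 4 + 20 = 25 = 5^2
print(on_curve(P), add(P, P), on_curve(add(P, P)))
```

The three-argument `pow` with exponent -1 computes a modular inverse and requires Python 3.8 or later.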

  4. Scalar Multiplication
     • $P \in E$
     • $k = (k_{n-1} k_{n-2} \ldots k_1 k_0)_2$
     $Q = [k]P = P + P + \cdots + P$ ($k$ times)
     Double-and-add scalar multiplication algorithm:
       Q ← O
       for i from n − 1 to 0 do
         Q ← [2]Q (DBL)
         if k_i = 1 then Q ← Q + P (ADD)
       return Q
     • scans each bit of k and performs the corresponding curve-level operation
     • average cost: 0.5n ADD + n DBL (security: ≈ 0.5n ones in k)
     [Figure: dataflow graph of a curve-level operation, with inputs PX, PY, PZ, QX, QY, QZ and a, field operations mul, sqr, add and sub on intermediate values v0 to v25, and outputs RX, RY, RZ]
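The double-and-add listing can be sketched in Python as follows. This is a hedged illustration, not the presenter's implementation: the affine helper `add`, the constant names, the base point (1, 5) and the example curve parameters (borrowed from the earlier slide) are all assumptions made for the example. The function also records the sequence of curve-level operations so the n DBL and (number of ones in k) ADD counts can be checked.

```python
# Illustrative sketch only; helper names and the sample point are assumptions.
P_MOD, A = 1009, 4      # example curve y^2 = x^3 + 4x + 20 over GF(1009)
INF = None              # point at infinity O


def add(p1, p2):
    """Affine group law, covering doubling and the point at infinity."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def double_and_add(k, p, n):
    """Scan the n-bit scalar k from bit n-1 down to bit 0."""
    q, ops = INF, []
    for i in range(n - 1, -1, -1):
        q = add(q, q)               # Q <- [2]Q
        ops.append("DBL")
        if (k >> i) & 1:
            q = add(q, p)           # Q <- Q + P, only when k_i = 1
            ops.append("ADD")
    return q, ops


base = (1, 5)                       # on the curve: 1 + 4 + 20 = 25 = 5^2
result, ops = double_and_add(0b1011, base, 4)
print(result, ops)
```

For the 4-bit scalar 1011, the recorded sequence contains 4 DBL and 3 ADD, one ADD per nonzero bit.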

  5. Side Channel Attacks
     [Figure: three abstraction levels. Protocol level: key exchange, signature, etc. Curve level: scalar multiplication [k]P, built from ADD(P, Q) and DBL(P). Field level: x ± y, x × y, ...]
     Scalar multiplication operation:
       for i from 0 to t − 1 do
         if k_i = 1 then Q = ADD(P, Q)
         P = DBL(P)
     [Figure: observed operation sequence DBL DBL DBL ADD DBL ADD DBL DBL, with the corresponding key bits 0 0 0 1 1 0]
     • simple power analysis (& variants)
     • differential power analysis (& variants)
     • horizontal/vertical/templates/... attacks
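The leak exploited by simple power analysis can be sketched without any curve arithmetic. The Python below assumes an idealized, noise-free attacker who can label each curve-level operation in a trace as ADD or DBL; the function names `operation_trace` and `recover_scalar` are hypothetical. It follows the slide's loop, where an ADD precedes the DBL of an iteration exactly when the key bit is 1.

```python
# Illustrative sketch only; assumes a perfect ADD/DBL classification.
def operation_trace(k, t):
    """Operation sequence of the slide's loop for a t-bit scalar k."""
    trace = []
    for i in range(t):              # for i from 0 to t - 1
        if (k >> i) & 1:
            trace.append("ADD")     # Q = ADD(P, Q), only when k_i = 1
        trace.append("DBL")         # P = DBL(P), every iteration
    return trace


def recover_scalar(trace):
    """Rebuild k from the operation sequence alone."""
    k, i, saw_add = 0, 0, False
    for op in trace:
        if op == "ADD":
            saw_add = True
        else:                       # a DBL closes iteration i
            if saw_add:
                k |= 1 << i
            saw_add = False
            i += 1
    return k


# Bits k_0..k_5 = 0 0 0 1 1 0, i.e. k = 24, reproduce the slide's sequence.
trace = operation_trace(24, 6)
print(trace)
print(recover_scalar(trace))
```

With these assumptions, the scalar 24 yields the sequence DBL DBL DBL ADD DBL ADD DBL DBL shown on the slide, and the scalar is read back directly from it.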

  11. Software vs Hardware Support I
      [Figure, SW: instruction management + control, memory hierarchy, register file, functional units FU 1, FU 2, FU 3 and LSU]
      [Figure, HW: CTRL, registers, operators, memory]

                    SW          HW
      FLEXIBILITY   EXCELLENT   limited
      SPEED         slow        fast
      AREA          large       small
      ENERGY        large       small
      DEVEL. COST   moderate    HUGE

      SECURITY?

  13. Activity in a Processor
      Operation to be executed: r ← x + a[i]
      [Figure: signals over time, in three columns.
       data/op.: x, a[i], +, r
       instructions: LD R1,R2 then ADD R3,R1,R4
       state: processor internal state (PIS), and AS alongside the ADD]
      • AS: ALU status
      • PIS: fetch, decode, pipeline management, bypasses, memory hierarchy, branch predictor, monitoring, etc.

  18. Our Processor Specifications
      • Performance ⇒ hardware (HW)
        ◮ dedicated functional units
        ◮ internal parallelism
      • Limited cost (embedded systems)
        ◮ reduced silicon area
        ◮ low energy (& power consumption)
        ◮ large area used at each clock cycle
      • Flexibility ⇒ software (SW)
        ◮ curves, algorithms, representations (points/elements), k recoding, ...
        ◮ at design time / at run time
      • Security against SCAs ⇒ HW
        ◮ secure units ($\mathbb{F}_{2^m}$, $\mathbb{F}_p$)
        ◮ secure key storage/management
        ◮ secure control
      [Figure: protocol level (key exchange, signature, etc.), curve level ([k]P, P + P, ADD(P, Q), DBL(P)) and field level (x ± y, x × y, ...), annotated with SW and HW]

  22. Processor Architecture
      [Figure: processor block diagram with key management (key mng.), register file, control (CTRL), and functional units FU 1, FU 2, FU 3]
