FAST ENDOMORPHISMS IN HARDWARE

Kimmo Järvinen (1, 2)
1 University of Helsinki, Computer Science, Helsinki, Finland, kimmo.u.jarvinen@helsinki.fi
2 Xiphera Ltd., Espoo, Finland, kimmo.jarvinen@xiphera.com
The 21st Workshop on Elliptic Curve Cryptography (ECC’17), Nijmegen, the Netherlands, Nov. 13–15, 2017. Talk given on November 15, 2017.

INTRODUCTION
◮ This talk surveys my work on hardware implementations of ECC with fast endomorphisms
◮ In particular: Koblitz curves, FourQ, and GLV/GLS curves
◮ In software, fast endomorphisms reduce the number of operations and lead to significant speedups
◮ In hardware, simplicity is often the key to efficiency, so the benefit of fast endomorphisms is less clear

PRELIMINARIES

SCALAR MULTIPLICATION
◮ Let $E$ be an elliptic curve defined over a finite field $\mathbb{F}_q$
◮ The points on $E$, together with the point at infinity $\mathcal{O}$, form an additive Abelian group
◮ Let $k$ be an integer and $P$ a point on $E$. Scalar multiplication is then the operation $[k]P = \underbrace{P + P + \cdots + P}_{k\ \text{times}}$
◮ Scalar multiplication is the central operation of ECC and largely determines the efficiency of the cryptosystem
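Computing $[k]P$ with $k-1$ additions is infeasible for cryptographic $k$. The textbook left-to-right double-and-add method needs $\ell - 1$ doublings for an $\ell$-bit $k$, plus one addition per remaining 1-bit. The sketch below is generic over the group operation. For testing only, it assumes a toy group: integers modulo a hypothetical $n$ under addition stand in for points on $E$. The toy group has no cryptographic meaning.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def double_and_add(k: int, P: T, add: Callable[[T, T], T], identity: T) -> tuple[T, int, int]:
    """Left-to-right double-and-add for k >= 0.

    Returns ([k]P, number of doublings, number of additions). Operations that
    only involve the identity (before the first 1-bit) are skipped and not counted.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    R, started, dbls, adds = identity, False, 0, 0
    for i in reversed(range(k.bit_length())):
        if started:
            R = add(R, R)  # point doubling
            dbls += 1
        if (k >> i) & 1:
            if started:
                R = add(R, P)  # point addition
                adds += 1
            else:
                R, started = P, True  # first 1-bit: R <- P for free
    return R, dbls, adds
```

With $k = 11 = 1011_2$ and the toy group, the call returns `(11, 3, 2)`: three doublings and two additions. The $\ell - 1$ doublings are the cost that fast endomorphisms aim to reduce.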

ECC HIERARCHY
◮ Scalar multiplication is built from point additions and point doublings
◮ Point additions and doublings are built from field additions/subtractions, field multiplications, and field inversions

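ANATOMY OF ECC HW
[Figure: block diagram of an ECC co-processor attached to a host processor. Blocks shown: ECC control logic, key storage, main memory, and a field arithmetic unit (FAU) with its own control logic, local registers, and an ALU containing multiplication logic, addition logic, and other logic.]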

FAST ENDOMORPHISMS
◮ GLV/GLS curves have an efficiently computable endomorphism $\phi$ such that $\phi(P) = [\lambda]P$
  Scalar multiplication can then be computed as $[k]P = [k_0]P + [k_1]\phi(P)$, where $k_0 + k_1\lambda \equiv k \pmod{n}$ and $n$ is the order of $P$
  If $k_0$ and $k_1$ are about the same size (roughly half the length of $k$), Shamir's trick for double scalar multiplication saves about half of the point doublings
◮ Koblitz curves are curves over $\mathbb{F}_{2^m}$ for which the Frobenius map $\phi(x, y) = (x^2, y^2)$ is an endomorphism
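The doubling savings from Shamir's trick can be illustrated with a toy model. The assumptions here are ours, not the talk's:

- The group is $\mathbb{Z}/n\mathbb{Z}$ under addition, so $[k]P = kP \bmod n$ and 0 is the identity.
- The map $P \mapsto \lambda P$ plays the role of $\phi$.
- $k_0$ and $k_1$ are simply chosen to be half length, not obtained by the lattice-based GLV decomposition.

Both scalars are scanned together from the top bit, so each doubling serves both terms. $P + Q$ is precomputed once.

```python
def shamir(k0: int, k1: int, P: int, Q: int, n: int) -> tuple[int, int, int]:
    """Compute [k0]P + [k1]Q with Shamir's trick (joint left-to-right scan).

    Toy model (assumption): the group is Z/nZ under addition, so "points" are
    integers mod n and 0 is the identity. Returns (result, doublings, additions).
    """
    table = {(1, 0): P, (0, 1): Q, (1, 1): (P + Q) % n}  # P + Q precomputed once
    R, started, dbls, adds = 0, False, 0, 0
    for i in reversed(range(max(k0.bit_length(), k1.bit_length()))):
        if started:
            R = (2 * R) % n  # one doubling per bit, shared by both scalars
            dbls += 1
        bits = ((k0 >> i) & 1, (k1 >> i) & 1)
        if bits != (0, 0):
            if started:
                adds += 1  # the first nonzero column lands on the identity for free
            R = (R + table[bits]) % n
            started = True
    return R, dbls, adds
```

For example, take 64-bit $k_0$ and $k_1$, a 127-bit modulus, $P = 1$ and $Q = \lambda \bmod n$. The joint scan then performs 63 doublings. A plain double-and-add on the full-length $k = (k_0 + k_1\lambda) \bmod n$ would need about 126.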

OVERVIEW OF CHALLENGES
◮ Fast endomorphisms require recoding of the scalars (e.g., finding $k_0, k_1$) ⇒ logic must be added (either a separate converter or an FAU instruction-set extension)
◮ The size of the overhead depends on the curve and on the implementation architecture
◮ For binary curves, the FAU supports arithmetic over $\mathbb{F}_{2^m}$, but the conversions require operations over $\mathbb{Z}$
◮ For prime curves, the FAU supports arithmetic over $\mathbb{Z}$, but it is typically highly optimized for arithmetic modulo $p$

SOFTWARE VS. HARDWARE
Software
+++ Faster scalar multiplications
- Slightly larger program memory and data memory requirements
⇒ Advantages outweigh disadvantages (almost always)
Hardware
++(+) Faster scalar multiplications (almost surely)
- - More complex control logic
-(-) New instructions needed in the FAU
-(- -) More memory/registers needed
⇒ ???

PIPELINING
[Figure: timelines of consecutive scalar multiplications, each consisting of scalar recoding, precomputation, the main for-loop, and inversion. Executed sequentially, each takes time $t_1$. Split into pipeline stages, each still takes at least $t_1$, but the interval between consecutive results drops to $t_2 < t_1$.]

PARALLELISM
◮ Stages should be balanced because throughput is determined by the slowest stage
◮ The for-loop is by far the slowest stage
◮ Solutions:
  (a) Make the for-loop faster by using more area (or make the other parts slower and save area)
  (b) Use parallel for-loop units
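The balancing argument can be checked with a back-of-the-envelope model. The stage names follow the pipelining slide, but the per-stage times used below are hypothetical placeholders, not measurements from the talk. In the model:

- Latency is the sum of the stage times.
- The interval between results is set by the slowest stage.
- Duplicating a stage $u$ times divides its contribution to the interval by $u$, assuming jobs are distributed round-robin.

```python
from typing import Dict, Optional


def pipeline_estimate(stage_times: Dict[str, float],
                      units: Optional[Dict[str, int]] = None) -> tuple[float, float, float]:
    """Return (latency, interval, throughput) of a linear pipeline.

    stage_times: time one unit of each stage needs per scalar multiplication.
    units: number of identical parallel units per stage (default 1). With u units,
           a stage accepts a new job every t/u on average; latency is unchanged.
    """
    units = units or {}
    latency = sum(stage_times.values())
    interval = max(t / units.get(name, 1) for name, t in stage_times.items())
    return latency, interval, 1.0 / interval


# Hypothetical stage times in microseconds (placeholders, not measured values)
times = {"recoding": 2.0, "precomputation": 1.0, "for-loop": 8.0, "inversion": 1.0}
print(pipeline_estimate(times))                     # single for-loop unit
print(pipeline_estimate(times, {"for-loop": 4}))    # four parallel for-loop units
```

With these placeholder numbers, a single for-loop unit limits the pipeline to one result every 8 µs. Four parallel for-loop units reduce the interval to the 2 µs of the recoding stage, which then becomes the bottleneck. The latency stays at 12 µs in both cases.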

KOBLITZ CURVES
(Joint work with J. Adikari, B.B. Brumley, V. Dimitrov, S. Sinha Roy, J. Skyttä, and I. Verbauwhede)

KOBLITZ CURVES
◮ Binary curves introduced by N. Koblitz as early as 1991 and included in many standards (e.g., NIST)
◮ Cheap Frobenius maps $\phi: (x, y) \mapsto (x^2, y^2)$ can be used instead of point doublings
◮ . . . but first the integer $k$ needs to be given as a $\tau$-adic expansion $k = \sum_{i=0}^{\ell-1} k_i \tau^i$, where $\tau = (\mu + \sqrt{-7})/2 \in \mathbb{C}$ and $\mu \in \{-1, 1\}$
[Figure: conventional double-and-add (add, dbl, dbl, add, dbl, ...) over $\mathbb{F}_{2^m}$, compared with a conversion of $k$ over $\mathbb{Z}$ followed by a sequence of point additions over $\mathbb{F}_{2^m}$]
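The connection between $\tau$ and the Frobenius map can be made explicit. Consider a Koblitz curve $E_a: y^2 + xy = x^3 + ax^2 + 1$ with $a \in \{0, 1\}$ and $\mu = (-1)^{1-a}$. On this curve, $\phi$ satisfies the characteristic equation below, and $\tau$ is a root of the same quadratic. Multiplying a point by $\tau$ can therefore be implemented as applying $\phi$, and a doubling can be expressed through Frobenius maps:

```latex
\phi^2(P) - \mu\,\phi(P) + [2]P = \mathcal{O}
\quad\Longrightarrow\quad
[2]P = \mu\,\phi(P) - \phi^2(P),
\qquad
\tau^2 - \mu\tau + 2 = 0
\;\Longrightarrow\;
\tau = \frac{\mu \pm \sqrt{-7}}{2}.
```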

SCALAR CONVERSIONS
◮ Many cryptosystems (e.g., signature schemes) also require $k$ as an integer. There are two options:
  (a) Select a random integer and find its $\tau$-adic expansion
  (b) Select a random $\tau$-adic expansion and find its integer equivalent
◮ Option (a)
  ◮ Base-$\tau$ expansions can be found analogously to binary expansions, except with divisions by $\tau$ instead of 2
  ◮ The straightforward $\tau$-adic expansion of $k$ is about twice as long as $k$
  ◮ Meier and Staffelbach: because $P = \phi^m(P)$, we have $[\alpha]P = [\beta]P$ if $\alpha \equiv \beta \pmod{\tau^m - 1}$
  ◮ Solinas: reduction modulo $(\tau^m - 1)/(\tau - 1)$ gives an expansion of length $m + a$, where $a \in \{0, 1\}$
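Option (a) can be sketched with Solinas's $\tau$-adic non-adjacent form (TNAF), a signed-digit variant of the repeated division by $\tau$ described above. Elements of $\mathbb{Z}[\tau]$ are stored as pairs $(r_0, r_1)$ meaning $r_0 + r_1\tau$, using $\tau^2 = \mu\tau - 2$. The sketch omits the reduction modulo $(\tau^m - 1)/(\tau - 1)$, so the expansion of an integer $k$ is roughly twice as long as $k$. It is a software reference model for checking results, not the hardware converter from the talk.

```python
def tnaf(k: int, mu: int) -> list[int]:
    """tau-adic NAF of the integer k, least significant digit first.

    tau = (mu + sqrt(-7)) / 2 satisfies tau^2 = mu*tau - 2, with mu in {-1, 1}.
    r0 + r1*tau is divisible by tau iff r0 is even.
    """
    if mu not in (-1, 1):
        raise ValueError("mu must be -1 or 1")
    r0, r1 = k, 0
    digits = []
    while r0 != 0 or r1 != 0:
        if r0 % 2 != 0:
            # Pick u in {-1, 1} so that (r0 - u) + r1*tau is divisible by tau^2;
            # this forces the next digit to be 0 (non-adjacent form).
            u = 2 - ((r0 - 2 * r1) % 4)
            r0 -= u
        else:
            u = 0
        digits.append(u)
        # Exact division by tau: (r0 + r1*tau) / tau = (r1 + mu*r0/2) - (r0/2)*tau
        r0, r1 = r1 + (mu * r0) // 2, (-r0) // 2
    return digits


def eval_tau(digits: list[int], mu: int) -> tuple[int, int]:
    """Evaluate sum(d_i * tau^i) by Horner's rule; returns (a, b) meaning a + b*tau."""
    a, b = 0, 0
    for d in reversed(digits):
        a, b = -2 * b, a + mu * b  # multiply by tau: (a + b*tau)*tau = -2b + (a + mu*b)*tau
        a += d
    return a, b
```

For example, `tnaf(2, 1)` returns `[0, -1, 0, -1]`, i.e., $2 = -\tau - \tau^3$ when $\mu = 1$. `eval_tau` maps the digits back to `(2, 0)`.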

SCALAR CONVERSIONS
◮ Both options require complex operations (e.g., divisions, large multiplications)
◮ High-speed implementations: prevent the conversions from becoming the bottleneck ⇒ hardware acceleration
◮ Lightweight implementations: conversions are done over $\mathbb{Z}$ ⇒ how can they be combined efficiently with $\mathbb{F}_{2^m}$ arithmetic?
◮ Lazy reduction (repeated divisions by $\tau$) and its many variations (pipelined, word-wise, . . . ) are commonly used. They lead to fast conversions, but at the expense of area

HIGH-SPEED IMPLEMENTATION
◮ The key to high speed is to accelerate the main for-loop; the other parts can be separated into different pipeline stages
◮ The for-loop consists of point additions and Frobenius maps
◮ Point additions are dominated by field multiplications (in $\mathbb{F}_{2^m}$)
◮ Point additions use the López-Dahab formulas (SAC'98)
◮ Frobenius maps $\phi(Q) = (X^2, Y^2, Z^2)$ are cheap and can be computed independently for each coordinate
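Frobenius maps are cheap because squaring in characteristic 2 is linear: $(\sum a_i x^i)^2 = \sum a_i x^{2i}$. Squaring therefore needs no partial products, only a bit spreading followed by a reduction. The sketch assumes a polynomial basis with the K-163 reduction polynomial $f(x) = x^{163} + x^7 + x^6 + x^3 + 1$. It is a software model for intuition, not the hardware architecture of the talk. A schoolbook multiplication is included only to cross-check the squaring.

```python
M = 163
# K-163 reduction polynomial: f(x) = x^163 + x^7 + x^6 + x^3 + 1 (bit i = coefficient of x^i)
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def reduce_mod_f(a: int) -> int:
    """Reduce a binary polynomial modulo f by clearing the top bit until deg < M."""
    while a.bit_length() > M:
        a ^= F << (a.bit_length() - 1 - M)
    return a


def square(a: int) -> int:
    """Square in F_{2^m}: a(x)^2 = a(x^2), so spread the bits, then reduce."""
    s = 0
    for i in range(a.bit_length()):
        if (a >> i) & 1:
            s |= 1 << (2 * i)
    return reduce_mod_f(s)


def mul(a: int, b: int) -> int:
    """Schoolbook carry-less multiplication followed by reduction (for cross-checking)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        b >>= 1
    return reduce_mod_f(p)


def frobenius(Q: tuple[int, int, int]) -> tuple[int, int, int]:
    """Frobenius map on projective coordinates: square each coordinate independently."""
    X, Y, Z = Q
    return square(X), square(Y), square(Z)
```

Because the three squarings in `frobenius` do not depend on each other, hardware can compute them in parallel. Applying `square` $m = 163$ times returns the original element, since $a^{2^m} = a$ in $\mathbb{F}_{2^m}$.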

HIGH-SPEED IMPLEMENTATION
[Figure: schedule of the field multiplications ($X_1, X_2, Z_1, Z_2, Y_1, \ldots, Y_4$) of successive point additions $Q \leftarrow Q + P = (X, Y, Z) + (x, y)$, interleaved with Frobenius maps $Q \leftarrow \phi(Q) = (X^2, Y^2, Z^2)$; three consecutive iterations are shown]

HIGH-SPEED RESULTS
◮ The above technique computes the for-loop in less than 5 µs on K-163, or 12 µs on K-283, in a Stratix II FPGA (an old device)
◮ One core performs over 200,000 op/s with a delay of 11.7 µs
◮ Multiple cores fit in one FPGA, so a single device can reach throughputs of several million op/s
◮ The delay is not spectacular compared to modern software, but the throughput is
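These numbers can be read through the pipelining argument. This is our interpretation, assuming the 200,000 op/s and 11.7 µs figures refer to the K-163 core. The interval between results matches the for-loop time, so the for-loop is the bottleneck stage. The total delay spans a little over two for-loop times, so two to three scalar multiplications are in flight at once:

```latex
\text{interval} \approx \frac{1}{200\,000\ \text{op/s}} = 5\ \mu\text{s} \approx t_{\text{for-loop}}(\text{K-163}),
\qquad
\frac{\text{delay}}{\text{interval}} \approx \frac{11.7\ \mu\text{s}}{5\ \mu\text{s}} \approx 2.3
```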

COMPACT IMPLEMENTATION
◮ Koblitz curve K-283
◮ 16-bit ALU for binary polynomial arithmetic, extended with a 16-bit integer adder/subtractor
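In a compact design, the scalar conversion needs multi-word integer arithmetic, while the field arithmetic needs carry-free operations. The sketch below shows how one 16-bit data path can serve both. It is a software model under our own assumptions: little-endian 16-bit words, two's-complement subtraction, and 18 words to cover 283 bits. It does not reproduce the actual ALU of the talk.

- For $\mathbb{F}_2[x]$, addition is a word-wise XOR.
- For $\mathbb{Z}$, a carry bit is propagated from word to word.

```python
W = 16
MASK = (1 << W) - 1


def to_words(x: int, n: int) -> list[int]:
    """Split a non-negative integer into n little-endian 16-bit words."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_words(ws: list[int]) -> int:
    """Reassemble little-endian 16-bit words into an integer."""
    return sum(w << (W * i) for i, w in enumerate(ws))


def add_words(a: list[int], b: list[int], sub: bool = False) -> tuple[list[int], int]:
    """Multi-precision a + b (or a - b in two's complement), one 16-bit word per step.

    Returns (result words, final carry). For subtraction, carry == 1 means no borrow.
    """
    carry = 1 if sub else 0  # a - b = a + ~b + 1
    out = []
    for x, y in zip(a, b):
        if sub:
            y = ~y & MASK  # bitwise complement of the 16-bit word
        s = x + y + carry
        out.append(s & MASK)
        carry = s >> W  # carry into the next word (0 or 1)
    return out, carry


def xor_words(a: list[int], b: list[int]) -> list[int]:
    """Addition in F_2[x] is carry-free: a word-wise XOR."""
    return [x ^ y for x, y in zip(a, b)]
```

The same word loop covers both number systems. Only the carry handling differs, which is why a small integer adder/subtractor next to a binary-polynomial ALU can be enough for the conversions.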
