THE STATE-OF-THE-ART OF HARDWARE IMPLEMENTATIONS OF ELLIPTIC CURVE CRYPTOGRAPHY
Kimmo Järvinen
Department of Computer Science, University of Helsinki
kimmo.u.jarvinen@helsinki.fi
ECRYPT-CSA Workshop on Hardware Benchmarking, Bochum, Germany, June 7, 2017
K. Järvinen: The State-of-the-Art of ECC HW June 7, 2017 1/43

INTRODUCTION
◮ ECC has become very popular because of high performance and short key sizes
◮ Huge numbers of HW implementations of ECC are available in the literature (we focus mainly on FPGAs)
◮ We discuss (the difficulties of) benchmarking ECC HW implementations and survey their state-of-the-art

OUTLINE
◮ Background on ECC: We present preliminaries of ECC
◮ ECC Implementations for Different Use Cases: We discuss what kinds of challenges different use cases bring to the design of ECC implementations
◮ General Discussion on Benchmarking ECC HW: We discuss benchmarking of ECC HW and the related difficulties
◮ Benchmarking ECC Implementations: We survey specific state-of-the-art ECC implementations and benchmark them against each other

BACKGROUND ON ECC

ELLIPTIC CURVE CRYPTOGRAPHY
◮ Elliptic Curve Discrete Logarithm Problem: Security is based on the difficulty of solving the ECDLP: Given two points $P$ and $Q = kP$, find the integer $k$
◮ Elliptic Curve Diffie-Hellman: One party computes $Q_A = k_A P$ and sends $Q_A$; the other computes $Q_B = k_B P$ and sends $Q_B$. They then compute $Q_{AB} = k_A Q_B$ and $Q_{AB} = k_B Q_A$, respectively
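The two computations of $Q_{AB}$ give the same point because scalar multiplication is repeated addition in an Abelian group. A short derivation, using only the symbols of the slide:

```latex
Q_{AB} = k_A Q_B = k_A (k_B P) = (k_A k_B) P = (k_B k_A) P = k_B (k_A P) = k_B Q_A
```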

SCALAR MULTIPLICATION
◮ Efficient and secure computation of scalar multiplication is essential for all elliptic curve cryptosystems
◮ Points on the curve form an additive Abelian group
◮ Scalar multiplication is carried out with a series of (a) point additions $P_3 = P_1 + P_2$ and (b) point doublings $P_3 = P_1 + P_1 = 2P_1$
◮ Point operations are computed with operations in $\mathbb{F}_q$. E.g., for $y^2 = x^3 + ax + b$, $(x_3, y_3) = (x_1, y_1) + (x_2, y_2)$ with $x_1 \neq x_2$: $x_3 = \lambda^2 - x_1 - x_2$, $y_3 = \lambda (x_1 - x_3) - y_1$, where $\lambda = \frac{y_2 - y_1}{x_2 - x_1}$
◮ Projective coordinates $(X, Y, Z)$ are used to avoid inversions
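A minimal sketch of why projective coordinates help: the affine doubling below needs one inversion per operation, while the Jacobian version (one common projective system, with $(x, y) = (X/Z^2, Y/Z^3)$) postpones all inversions to a single final conversion. The toy curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$, the point $(3, 6)$, and all function names are assumptions chosen for illustration; the slides do not fix a coordinate system.

```python
# Toy parameters (assumed for illustration only): y^2 = x^3 + 2x + 3 over F_97.
P_MOD, A, B = 97, 2, 3

def inv(x):
    """Field inversion via Fermat's little theorem (P_MOD is prime)."""
    return pow(x % P_MOD, P_MOD - 2, P_MOD)

def on_curve(x, y):
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0

def affine_double(x, y):
    """Affine doubling: one field inversion per doubling (requires y != 0)."""
    lam = (3 * x * x + A) * inv(2 * y) % P_MOD
    x3 = (lam * lam - 2 * x) % P_MOD
    y3 = (lam * (x - x3) - y) % P_MOD
    return x3, y3

def jacobian_double(X, Y, Z):
    """Jacobian doubling: multiplications and additions only, no inversion."""
    YY = Y * Y % P_MOD
    S = 4 * X * YY % P_MOD
    M = (3 * X * X + A * pow(Z, 4, P_MOD)) % P_MOD
    X3 = (M * M - 2 * S) % P_MOD
    Y3 = (M * (S - X3) - 8 * YY * YY) % P_MOD
    Z3 = 2 * Y * Z % P_MOD
    return X3, Y3, Z3

def to_affine(X, Y, Z):
    """The single inversion, paid once at the end of the computation."""
    zi = inv(Z)
    return X * zi * zi % P_MOD, Y * zi * zi * zi % P_MOD

# 4P computed two ways from P = (3, 6): two inversions vs. one.
affine_4p = affine_double(*affine_double(3, 6))
jacobian_4p = to_affine(*jacobian_double(*jacobian_double(3, 6, 1)))
```

Both paths should yield the same point; the Jacobian path trades each inversion for a few extra multiplications, which is usually a good trade when inversion is much slower than multiplication.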

ECC HIERARCHY
[Figure: three-level hierarchy. Top: SCALAR MULTIPLICATION. Middle: POINT ADDITION, POINT DOUBLE. Bottom: FIELD ADD/SUB, FIELD MULT, FIELD INV]

FIELD ARITHMETIC: Multiplication
◮ Field Multiplication: Critical operation that typically requires the most attention. One computes $c = a \times b$ in $\mathbb{F}_p$ by computing (1) $c' = a \times b$ over $\mathbb{Z}$ and (2) $c = c' \bmod p$
◮ Prime vs. Binary Fields
  (a) Binary fields do not have carry propagation and lead to very efficient multipliers in HW
  (b) Prime fields typically benefit less from HW; however, hardwired multipliers in modern FPGAs can be used
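To illustrate the "no carry propagation" point, here is a sketch of binary-field multiplication that follows the same two steps as above (multiply, then reduce), with XOR taking the place of addition. The field $\mathbb{F}_{2^8}$ with reduction polynomial $x^8 + x^4 + x^3 + x + 1$ is an assumption used only because it is small and has a well-known test value; ECC uses far larger fields (for example $m = 163$ or more), and the function names are hypothetical.

```python
def carryless_mul(a, b):
    """Step 1: polynomial product over F_2. Partial products are XORed, so no carries."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def poly_reduce(c, m, poly):
    """Step 2: reduce modulo an irreducible polynomial of degree m (bit m of poly is set)."""
    for i in range(c.bit_length() - 1, m - 1, -1):
        if (c >> i) & 1:
            c ^= poly << (i - m)
    return c

def gf2m_mul(a, b, m=8, poly=0x11B):
    """Multiplication in F_{2^m}; the defaults are a toy field, not an ECC field."""
    return poly_reduce(carryless_mul(a, b), m, poly)

product = gf2m_mul(0x57, 0x83)
```

In hardware, each XOR is a single gate with no dependency on neighbouring bit positions, which is why these multipliers can be made very compact and fast.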

FIELD ARITHMETIC: Multiplication
◮ Integer Multiplication: Large multiplications (e.g., 256 × 256-bit) typically require multiprecision algorithms even in HW
  (a) Operand-scanning vs. product-scanning vs. hybrid-scanning
  (b) Karatsuba algorithms
  (c) Squaring saves some partial multiplications because $a_i b_j = a_j b_i$ if $a = b$
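A minimal sketch of product-scanning multiplication, one of the multiprecision strategies listed above: the result is produced one output word (column) at a time, accumulating all partial products $a_i b_j$ with $i + j = k$. The 32-bit limb width and the helper names are assumptions for illustration; a real design would pick the word size to match its multiplier.

```python
W = 32                 # limb width in bits (assumed)
MASK = (1 << W) - 1

def to_limbs(x, n):
    """Split x into n limbs of W bits, least significant first."""
    return [(x >> (W * i)) & MASK for i in range(n)]

def from_limbs(limbs):
    return sum(v << (W * i) for i, v in enumerate(limbs))

def product_scanning_mul(a, b):
    """Multiply two equal-length limb vectors column by column."""
    n = len(a)
    out = [0] * (2 * n)
    acc = 0  # column accumulator; a small multi-word register in HW
    for k in range(2 * n - 1):
        for i in range(max(0, k - n + 1), min(k, n - 1) + 1):
            acc += a[i] * b[k - i]
        out[k] = acc & MASK   # emit one finished output word
        acc >>= W             # carry the rest into the next column
    out[2 * n - 1] = acc
    return out

x = 2**256 - 189
y = 2**255 - 19
product = from_limbs(product_scanning_mul(to_limbs(x, 8), to_limbs(y, 8)))
```

Compared with operand scanning, each output word is written exactly once, at the cost of a wider accumulator.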

FIELD ARITHMETIC: Multiplication
◮ Modular Reduction: The type of prime greatly affects the implementation strategy and efficiency
  (a) Mersenne primes $2^k - 1$ would be the best because reduction is an addition $c'_L + c'_H$ (with $c' = c'_H 2^k + c'_L$), but they are rare: $2^{127} - 1$, $2^{521} - 1$
  (b) Generalized Mersenne primes are used for the NIST curves; e.g., $2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$, which leads to additions/subtractions with full words
  (c) Pseudo-Mersenne primes $2^k - \gamma$ compute the reduction via $c'_L + \gamma c'_H$; e.g., Curve25519 uses $2^{255} - 19$
  (d) Barrett reduction, Montgomery domain, etc.
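A sketch of the pseudo-Mersenne reduction in item (c) for $p = 2^{255} - 19$. Since $2^{255} \equiv 19 \pmod p$, the high part of the product can be folded into the low part as $c'_L + 19\, c'_H$; the fold is repeated until the value fits in 255 bits, followed by one conditional subtraction. The function name and the loop structure are assumptions for illustration, not a description of any particular implementation.

```python
K = 255
GAMMA = 19
P = 2**K - GAMMA
LOW_MASK = (1 << K) - 1

def reduce_pseudo_mersenne(c):
    """Reduce a non-negative integer modulo 2^K - GAMMA by folding the high part."""
    while c >> K:
        # c = c_H * 2^K + c_L  and  2^K = GAMMA (mod P)
        c = (c & LOW_MASK) + GAMMA * (c >> K)
    if c >= P:       # now c < 2^K, so at most one subtraction is needed
        c -= P
    return c

a = P - 1
b = P - 2
c_reduced = reduce_pseudo_mersenne(a * b)
```

For a full-size product the loop typically runs only a couple of times, so in hardware it can be unrolled into a fixed number of fold steps.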

FIELD ARITHMETIC: Inversion
◮ Inversion: Extended Euclidean Algorithm (EEA) vs. Fermat's Little Theorem (FLT)
◮ FLT computes $a^{-1} = a^{q-2}$ in $\mathbb{F}_q$ via a series of squarings and multiplications
◮ FLT reuses the multiplier and requires only control logic
◮ FLT is inherently constant time
◮ EEA can be faster if implemented with a dedicated unit
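A minimal sketch of FLT-based inversion as a plain square-and-multiply loop. The exponent $q - 2$ is a public constant, so the sequence of squarings and multiplications is the same for every input, which is the sense in which the method is constant time. The prime $2^{255} - 19$ and the function name are assumptions chosen for illustration; optimized designs usually use a shorter addition chain than this bit-by-bit loop.

```python
P = 2**255 - 19  # example prime (assumed); any prime modulus works

def flt_inverse(a, p=P):
    """Compute a^(p-2) mod p with left-to-right square-and-multiply."""
    exponent = p - 2
    result = 1
    for i in range(exponent.bit_length() - 1, -1, -1):
        result = result * result % p      # squaring on every bit
        if (exponent >> i) & 1:
            result = result * a % p       # multiply on set bits (public pattern)
    return result

a = 0x123456789ABCDEF
a_inv = flt_inverse(a)
```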

POINT OPERATIONS
◮ Algorithms for point addition and doubling
◮ Series of field operations
◮ Explicit-Formulas Database
◮ Relevant things:
  ◮ Number of operations (multiplications and squarings)
  ◮ Parallelism
  ◮ Number of registers
  ◮ Atomicity or completeness
  ◮ etc.

SCALAR MULTIPLICATION
Input: Integer $k = \sum_{i=0}^{\ell-1} k_i 2^i$, point $P$
Output: Point $Q = kP$
  $Q \leftarrow \mathcal{O}$
  for $i = \ell - 1$ to $0$ do
    $Q \leftarrow 2Q$
    if $k_i = 1$ then $Q \leftarrow Q + P$
Structure of Scalar Multiplication:
◮ Preprocessing: precomputations with $P$, preprocessing of $k$
◮ Main for-loop: A series of point operations
◮ Coordinate conversion (inversion)
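The double-and-add loop above can be sketched as follows. The toy curve $y^2 = x^3 + 2x + 3$ over $\mathbb{F}_{97}$, the base point $(3, 6)$, the use of affine coordinates, and all names are assumptions made to keep the example short; `None` stands for the point at infinity $\mathcal{O}$. Note that this direct transcription branches on the key bits, so it is not a constant-time implementation.

```python
P_MOD, A, B = 97, 2, 3   # toy curve (assumed), far too small for real use
G = (3, 6)               # a point on the toy curve

def inv(x):
    return pow(x % P_MOD, P_MOD - 2, P_MOD)

def on_curve(pt):
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P_MOD == 0

def point_add(p1, p2):
    """Affine addition/doubling with the special cases handled explicitly."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                   # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD  # doubling slope
    else:
        lam = (y2 - y1) * inv(x2 - x1) % P_MOD         # addition slope
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return x3, y3

def scalar_mul(k, point):
    """Left-to-right double-and-add, as in the algorithm on the slide."""
    q = None
    for i in range(k.bit_length() - 1, -1, -1):
        q = point_add(q, q)
        if (k >> i) & 1:
            q = point_add(q, point)
    return q

# Diffie-Hellman style check: both orders give the same point.
shared_1 = scalar_mul(5, scalar_mul(7, G))
shared_2 = scalar_mul(7, scalar_mul(5, G))
```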

ECC IMPLEMENTATIONS FOR DIFFERENT USE CASES

WHY DO WE NEED HARDWARE?
◮ Fast Processing Speeds: HW provides very high throughput and/or low latency and can free resources from the main processor
◮ Minimal Resource Usage: HW is required if resources (e.g., chip area, power, energy, etc.) are extremely scarce
◮ Implementation Security: HW maximizes implementation security

LOW LATENCY
◮ Optimization Goal: Compute a scalar multiplication as fast as possible (time from input to output)
◮ The traditional optimization goal; the vast majority of published ECC implementations fall into this category
◮ Use fast multipliers, utilize parallelism in point operations, use precomputations, etc.

LOW LATENCY: Field Operations
◮ The latency of field multiplication dominates ⇒ Use a faster multiplier
◮ Designing a fast multiplier for, e.g., 256-bit operands is difficult
◮ In theory, using more area gives a faster multiplier
◮ Small subproducts over several clock cycles and deep pipelines are often better in practice
[Figure: time vs. area, with curves labeled THEORY and PRACTICE]

LOW LATENCY: Point Operations
◮ Independent field operations in point operations can be computed in parallel (or in a pipeline)
◮ Identify the number of parallel arithmetic blocks from the point operation formulas (e.g., Explicit-Formulas Database)
◮ Memory access may become a problem

LOW LATENCY: Point Operations
[Figure: data-flow graph of the combined differential addition and doubling, with inputs $a_{24}$, $X_1$, $Z_1$, $X_2$, $Z_2$, $X_3$, $Z_3$ and outputs $X_4$, $Z_4$, $X_5$, $Z_5$, built from $+$, $-$, and $\times$ operations]
Montgomery (1987): Differential addition and doubling
https://hyperelliptic.org/EFD/g1p/auto-montgom-xz.html#ladder-ladd-1987-m-3
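A sketch of the combined differential addition and doubling, following the operation names of the referenced EFD formula (with $4 a_{24} = a + 2$), wrapped in a simple x-only ladder. The Curve25519 parameters ($p = 2^{255} - 19$, $a = 486662$, base $x = 9$), the normalization $Z_1 = 1$, and the Python function names are assumptions for illustration. Several of the steps (for example `A`/`B` and `C`/`D`, or `DA`/`CB`) are mutually independent, which is what a parallel or pipelined datapath exploits. The `if bit` swaps are for readability only; hardware would use a multiplexer-based conditional swap.

```python
P = 2**255 - 19
CURVE_A = 486662
A24 = (CURVE_A + 2) // 4   # 4*a24 = a + 2

def ladder_step(x1, X2, Z2, X3, Z3):
    """Return (X4, Z4, X5, Z5): the double of (X2:Z2) and the sum (X2:Z2)+(X3:Z3).

    The two inputs must differ by the point with affine x-coordinate x1 (Z1 = 1).
    """
    A = (X2 + Z2) % P
    AA = A * A % P
    B = (X2 - Z2) % P
    BB = B * B % P
    E = (AA - BB) % P
    C = (X3 + Z3) % P
    D = (X3 - Z3) % P
    DA = D * A % P
    CB = C * B % P
    X5 = (DA + CB) ** 2 % P           # times Z1 = 1
    Z5 = x1 * (DA - CB) ** 2 % P
    X4 = AA * BB % P
    Z4 = E * (BB + A24 * E) % P
    return X4, Z4, X5, Z5

def x_only_mul(k, x1):
    """Montgomery ladder on x-coordinates; returns the affine x of k*P."""
    X2, Z2, X3, Z3 = 1, 0, x1, 1      # (O, P)
    for i in range(k.bit_length() - 1, -1, -1):
        bit = (k >> i) & 1
        if bit:
            X2, Z2, X3, Z3 = X3, Z3, X2, Z2
        X2, Z2, X3, Z3 = ladder_step(x1, X2, Z2, X3, Z3)
        if bit:
            X2, Z2, X3, Z3 = X3, Z3, X2, Z2
    return X2 * pow(Z2, P - 2, P) % P  # single inversion at the end

shared_a = x_only_mul(1234567, x_only_mul(7654321, 9))
shared_b = x_only_mul(7654321, x_only_mul(1234567, 9))
```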

LOW LATENCY: Scalar Multiplication
◮ Minimize the critical path
◮ Precomputations (window)
  ◮ Precompute multiples of $P$; e.g., $-(2^w - 1)P, \ldots, -3P, -P, P, 3P, \ldots, (2^w - 1)P$
  ◮ Convert the integer $k$ appropriately
  ◮ Reduces the number of point additions; fixed $P$ allows reducing the number of point doublings also
  ◮ Also constant-time alternatives exist
◮ Fast endomorphisms
  ◮ Koblitz curves: Frobenius map $(x, y) \mapsto (x^2, y^2)$ replaces doublings
  ◮ GLV/GLS curves: $\Psi(P) = \lambda P \Rightarrow kP = k_1 P + k_2 \Psi(P)$ when $k = k_1 + k_2 \lambda$
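One way to "convert the integer $k$ appropriately" is a signed-window (non-adjacent form) recoding. The sketch below is an assumption, since the slide does not name a specific recoding: it uses a width-$(w+1)$ NAF, which yields exactly the odd digit set $\pm 1, \pm 3, \ldots, \pm(2^w - 1)$ listed above and guarantees at least $w$ zero digits after every nonzero digit. It is not one of the constant-time alternatives, because the digit pattern depends on $k$.

```python
def signed_window_recode(k, w):
    """Recode k > 0 into digits d_i (least significant first) with k = sum(d_i * 2^i).

    Every nonzero digit is odd with |d_i| <= 2^w - 1.
    """
    modulus = 1 << (w + 1)
    digits = []
    while k > 0:
        if k & 1:
            d = k % modulus
            if d >= (1 << w):
                d -= modulus          # choose the negative representative
            k -= d                    # now k is divisible by 2^(w+1)
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits

k = 0xDEADBEEFCAFEBABE0123456789
digits = signed_window_recode(k, 3)
nonzero_count = sum(1 for d in digits if d != 0)
```

Each nonzero digit costs one point addition with a precomputed multiple, so fewer nonzero digits directly means fewer additions in the main loop.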

HIGH THROUGHPUT
◮ Optimization Goal: Compute as many scalar multiplications as possible in a certain time (operations per second)
◮ Simply making $t$, the latency of one scalar multiplication, smaller is not feasible (or even possible)
◮ Typically more efficient to increase $N$, the number of concurrent scalar multiplications, with parallelism and pipelining: the throughput is $T = N / t$
