How cryptographic benchmarking goes wrong

Daniel J. Bernstein

Thanks to NIST 60NANB12D261 for funding.

About PRESERVE: The mission of PRESERVE is, to design, implement, and test a secure and scalable V2X Security Subsystem for […]


[…] include many CPUs. […] ASIC?

[…] deliverable 1.1, […] Requirements of Vehicle […] Architecture”, 2011: […] 1,000 packets per […] cessing each in 1 […] e met by current […] discussed in [32], […] GHz processor […] times as long for […] a dedicated […] co-processor is necessary.”

PRESERVE deliverable 5.4, “Deployment Issues Report V4”, 2016: “the number of ECC signature verifications per second is the key performance factor for ASICs in a C2C environment ... [On a 4mm × 4mm chip] the 180nm technology may only yield enough space for one ECC core, whereas 90nm will allow for up to ten ECC cores and 55nm will allow for even more.” For 180nm core says max 100MHz, 100 verif/second.

Compare to, e.g., IAIK NIST P-256 ECC Module: 858 scalarmult/second in 111620 GE at 192 MHz at 180nm (“UMC L180GII technology using Faraday f180 standard cell library (FSA0A C), 9.3744 µm²/GE; worst case conditions (temperature 125 °C, core voltage 1.62V)”). Signature verification will be somewhat slower than scalarmult. Still close to 100× more efficient than the PRESERVE estimates.
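
A figure in the neighborhood of 100× can be reproduced with a rough area-normalized throughput estimate. The sketch below rests on an assumption that the slides do not spell out: that PRESERVE's single 180nm ECC core occupies the whole 4mm × 4mm chip. It also compares scalarmult throughput against verification throughput, so it somewhat overstates the gap. All function and variable names are illustrative.

```python
# Rough area-normalized comparison of the two 180nm ECC designs quoted above.
# Assumption (not stated in the slides): PRESERVE's one ECC core fills the
# whole 4mm x 4mm chip.

UM2_PER_MM2 = 1_000_000  # square micrometres per square millimetre


def area_mm2(gate_equivalents: int, um2_per_ge: float) -> float:
    """Silicon area in mm^2 for a design of the given size in gate equivalents."""
    return gate_equivalents * um2_per_ge / UM2_PER_MM2


def ops_per_second_per_mm2(ops_per_second: float, area: float) -> float:
    """Throughput normalized by silicon area."""
    return ops_per_second / area


iaik_area = area_mm2(111_620, 9.3744)  # IAIK module: GE count times area per GE
preserve_area = 4.0 * 4.0              # 16 mm^2 (assumed, see above)

iaik_density = ops_per_second_per_mm2(858, iaik_area)          # scalarmult/s per mm^2
preserve_density = ops_per_second_per_mm2(100, preserve_area)  # verif/s per mm^2

ratio = iaik_density / preserve_density
print(f"IAIK module area: {iaik_area:.3f} mm^2")
print(f"area-normalized advantage: {ratio:.0f}x")
```

Under these assumptions the scalarmult advantage comes out somewhat above 100×, which is consistent with verification being "somewhat slower than scalarmult" and the overall gap being close to 100×.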

Let’s go back to PRESERVE’s core argument for an ASIC.

Central claim: “As discussed in [32], a Pentium D 3.4 GHz processor needs about” 5ms (i.e., 17 million CPU cycles) for signature verification.

[32] is “Petit, J., Mammeri, Z., ‘Analysis of authentication overhead in vehicular networks’, Third Joint IFIP Wireless and Mobile Networking Conference (WMNC), 2010.”
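
The "17 million CPU cycles" figure is just the quoted time multiplied by the clock rate. A minimal sketch of that conversion, with a hypothetical helper name:

```python
def cycles_for(duration_ms: float, clock_ghz: float) -> float:
    """CPU cycles that elapse in duration_ms on a core clocked at clock_ghz.

    1 ms at 1 GHz is 1e6 cycles, hence the factor below.
    """
    return duration_ms * clock_ghz * 1e6


# Pentium D at 3.4 GHz, about 5 ms per signature verification (as quoted).
pentium_d_cycles = cycles_for(5.0, 3.4)
print(f"{pentium_d_cycles / 1e6:.0f} million cycles")
```

Working in cycles rather than milliseconds makes it easier to compare software across CPUs with different clock rates.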

[32] says “1. Introduction. Due to the huge life losses and the economic impacts resulting from vehicular collisions, many governments, automotive companies, and industry consortia have made the reduction of vehicular fatalities a top priority [1]. On average, vehicular collisions cause 102 deaths and 7900 injuries daily in the United States, leaving an economic impact of $230 billion [2]. ... [Similar story for EU:] costing €160 billion annually [3].”

Vehicles will communicate safety information. “All implementations of IEEE1609.2 standard [7] shall support the Elliptic Curve Digital Signature Algorithm (ECDSA) [8] over the two NIST curves P-224 and P-256. ... In this paper, we assess the processing and communication overhead of the authentication mechanism provided by ECDSA. ... Table II. Signature generation and verification times on a Pentium D 3.4Ghz workstation [10]”

[10] (in [32]) is “Petit, J., ‘Analysis of ECDSA Authentication Processing in VANETs’, 3rd IFIP International Conference on New Technologies, Mobility and Security (NTMS), Cairo, December 2009.”

[10] says “ECDSA was implemented using MIRACL and following the Fig.1.” For NIST P-224/P-256 on “Pentium D 3.4GHz workstation”: 2.50ms/3.33ms to sign, 4.97ms/6.63ms to verify.

  20. 8 9 ehicles will communicate safety [10] (in [32]) is “Petit Compare rmation. “All implementations J., ‘Analysis of ECDSA speeds rep IEEE1609.2 standard [7] shall Authentication Processing in of 14nm rt the Elliptic Curve Digital VANETs’, 3rd IFIP International (“2015 Intel Signature Algorithm (ECDSA) Conference on New Technologies, https://bench.cr.yp.to over the two NIST curves Mobility and Security (NTMS), 0.015ms and P-256. : : : In this Cairo, December 2009.” 0.049ms we assess the processing [10] says “ECDSA was communication overhead of implemented using MIRACL authentication mechanism and following the Fig.1.” rovided by ECDSA. : : : Table For NIST P-224/P-256 on Signature generation and “Pentium D 3.4GHz workstation”: verification times on a Pentium 2.50ms/3.33ms to sign, 3.4Ghz workstation [10]” 4.97ms/6.63ms to verify.

  21. 8 9 communicate safety [10] (in [32]) is “Petit Compare to, e.g., Ed25519 “All implementations J., ‘Analysis of ECDSA speeds reported for standard [7] shall Authentication Processing in of 14nm 3.31GHz Elliptic Curve Digital VANETs’, 3rd IFIP International (“2015 Intel Core i5-6600”) rithm (ECDSA) Conference on New Technologies, https://bench.cr.yp.to NIST curves Mobility and Security (NTMS), 0.015ms to sign (49840 P-256. : : : In this Cairo, December 2009.” 0.049ms to verify (163206 the processing [10] says “ECDSA was communication overhead of implemented using MIRACL authentication mechanism and following the Fig.1.” ECDSA. : : : Table For NIST P-224/P-256 on generation and “Pentium D 3.4GHz workstation”: times on a Pentium 2.50ms/3.33ms to sign, rkstation [10]” 4.97ms/6.63ms to verify.

  22. 8 9 safety [10] (in [32]) is “Petit Compare to, e.g., Ed25519 implementations J., ‘Analysis of ECDSA speeds reported for single co [7] shall Authentication Processing in of 14nm 3.31GHz Skylake Digital VANETs’, 3rd IFIP International (“2015 Intel Core i5-6600”) (ECDSA) Conference on New Technologies, https://bench.cr.yp.to : curves Mobility and Security (NTMS), 0.015ms to sign (49840 cycles), this Cairo, December 2009.” 0.049ms to verify (163206 cycles). cessing [10] says “ECDSA was overhead of implemented using MIRACL mechanism and following the Fig.1.” able For NIST P-224/P-256 on nd “Pentium D 3.4GHz workstation”: entium 2.50ms/3.33ms to sign, [10]” 4.97ms/6.63ms to verify.

Compare to, e.g., Ed25519 speeds reported for single core of 14nm 3.31GHz Skylake (“2015 Intel Core i5-6600”) on https://bench.cr.yp.to: 0.015ms to sign (49840 cycles), 0.049ms to verify (163206 cycles).

This chip didn’t exist in 2009. Compare instead to single core of 65nm 2.4GHz Core 2 (“2007 Intel Core 2 Quad Q6600”): 0.065ms to sign (156843 cycles), 0.232ms to verify (557082 cycles).
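
The millisecond figures above follow from the cycle counts and clock rates, and they can be set against the P-256 timings quoted from [10]. The sketch below does both. Note that it compares Ed25519 against ECDSA over NIST P-256 on different CPUs, so the ratios are only indicative; the dictionary layout and helper name are illustrative.

```python
def cycles_to_ms(cycles: int, clock_ghz: float) -> float:
    """Wall-clock time in ms for a cycle count on a core clocked at clock_ghz."""
    return cycles / (clock_ghz * 1e6)


# Ed25519 cycle counts quoted above, keyed by (CPU label, clock in GHz).
ed25519_cycles = {
    ("Core i5-6600 (Skylake)", 3.31): {"sign": 49840, "verify": 163206},
    ("Core 2 Quad Q6600", 2.4): {"sign": 156843, "verify": 557082},
}

# ECDSA NIST P-256 timings on the Pentium D 3.4GHz workstation, as quoted from [10].
petit_p256_ms = {"sign": 3.33, "verify": 6.63}

ed25519_ms = {
    cpu: {op: cycles_to_ms(c, ghz) for op, c in ops.items()}
    for (cpu, ghz), ops in ed25519_cycles.items()
}

for cpu, ops in ed25519_ms.items():
    for op, ms in ops.items():
        ratio = petit_p256_ms[op] / ms
        print(f"{cpu}: {op} {ms:.3f} ms, {ratio:.0f}x less time than [10]")
```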

2012 Bernstein–Schwabe on 720MHz ARM Cortex-A8: 0.9ms to verify (650102 cycles).

ARM Cortex-A8 cores were in
1000MHz Apple A4 in iPad 1, iPhone 4 (2010);
1000MHz Samsung Exynos 3110 in Samsung Galaxy S (2010);
1000MHz TI OMAP3630 in Motorola Droid X (2010);
800MHz Freescale i.MX50 in Amazon Kindle 4 (2011); ...

Today: in CPUs costing ≈2 EUR. Cortex-A7 is even more popular.

180nm 32-bit 2GHz Willamette (“2001 Intel Pentium 4”): 0.46ms (0.9 million cycles) for Curve25519 scalarmult using floating-point multiplier. Integer multiplier is much slower!

Nobody has ever bothered adapting this to signatures. Would be ≈0.6ms for verify.

3.4GHz Pentium D (dual core): same basic microarchitecture, more instructions, faster clock. Ed25519 would be >10× faster on one core than Petit’s software.

Bad ECDSA-NIST-P-256 design certainly has some impact:
• can’t use fastest mulmods;
• can’t use fastest curve formulas;
• need an annoying inversion;
etc. Typical estimate: 2× slower.

2000 Brown–Hankerson–López–Menezes on 400MHz Pentium II: 4.0ms/6.4ms (1.6/2.6 million cycles) for double scalarmult inside NIST P-224/P-256 verif.

2001 Bernstein, ≈1.6× faster: 0.7 million cycles on Pentium II for NIST P-224 scalarmult.

  40. Slide 13: Bad ECDSA-NIST-P-256 design certainly has some impact:
      • can’t use fastest mulmods;
      • can’t use fastest curve formulas;
      • need an annoying inversion;
      etc. Typical estimate: 2× slower.

      2000 Brown–Hankerson–López–Menezes on 400MHz Pentium II: 4.0ms/6.4ms (1.6/2.6 million cycles) for double scalarmult inside NIST P-224/P-256 verif.

      2001 Bernstein, ≈ 1.6× faster: 0.7 million cycles on Pentium II for NIST P-224 scalarmult.

      Slide 14: 2000 Brown–Hankerson–López–Menezes software uses many more cycles on P4 than on PII. e.g., P-224 scalarmult:
      1.2 million cycles on Pentium II.
      2.7 million cycles on Pentium 4.

      2001 Bernstein P-224 scalarmult:
      0.7 million cycles on Pentium II.
      0.8 million cycles on Pentium 4.
      0.9 million cycles on Pentium 4 using compressed keys.

      OpenSSL 1.0.1, P-224 verif:
      2.0 million cycles on Pentium D.

  44. Slide 14: 2000 Brown–Hankerson–López–Menezes software uses many more cycles on P4 than on PII. e.g., P-224 scalarmult:
      1.2 million cycles on Pentium II.
      2.7 million cycles on Pentium 4.

      2001 Bernstein P-224 scalarmult:
      0.7 million cycles on Pentium II.
      0.8 million cycles on Pentium 4.
      0.9 million cycles on Pentium 4 using compressed keys.

      OpenSSL 1.0.1, P-224 verif:
      2.0 million cycles on Pentium D.

      Slide 15: How did Petit manage to use 17 million cycles for P-224 verif, 22 million cycles for P-256 verif? Presumably some combination of bad mulmod and bad curve ops.

      Why did Petit reimplement ECDSA, using MIRACL for the underlying arithmetic?

      Why did Petit not simply cite previous speed literature?

      Why did Petit choose Pentium D?

      Why did BHLM choose PII?
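To put the quoted cycle counts side by side, the following sketch computes a few ratios. The numbers are copied from the slides; the variable names and the choice of which pairs to compare are assumptions made for illustration.

```python
# Cycle counts in millions, copied from the slides.
petit_p224_verif_pd = 17.0     # Petit, P-224 verif, Pentium D
openssl_p224_verif_pd = 2.0    # OpenSSL 1.0.1, P-224 verif, Pentium D
bhlm_p224 = {"Pentium II": 1.2, "Pentium 4": 2.7}        # BHLM scalarmult
bernstein_p224 = {"Pentium II": 0.7, "Pentium 4": 0.8}   # Bernstein scalarmult

def ratio(slow: float, fast: float) -> float:
    """How many times more cycles the first figure uses than the second."""
    return slow / fast

# Same operation, same CPU family: Petit's software versus OpenSSL.
petit_vs_openssl = ratio(petit_p224_verif_pd, openssl_p224_verif_pd)

# Cost of moving the same software from Pentium II to Pentium 4.
bhlm_p4_penalty = ratio(bhlm_p224["Pentium 4"], bhlm_p224["Pentium II"])
bernstein_p4_penalty = ratio(bernstein_p224["Pentium 4"], bernstein_p224["Pentium II"])

print(f"Petit vs OpenSSL (P-224 verif, Pentium D): {petit_vs_openssl:.2f}x")
print(f"BHLM, P4 vs PII: {bhlm_p4_penalty:.2f}x")
print(f"Bernstein, P4 vs PII: {bernstein_p4_penalty:.2f}x")
```

Only the first ratio compares the same operation on the same CPU; the other two show how much the platform choice alone moves a result.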

  49. Slide 15: How did Petit manage to use 17 million cycles for P-224 verif, 22 million cycles for P-256 verif? Presumably some combination of bad mulmod and bad curve ops.

      Why did Petit reimplement ECDSA, using MIRACL for the underlying arithmetic?

      Why did Petit not simply cite previous speed literature?

      Why did Petit choose Pentium D?

      Why did BHLM choose PII?

      Slide 16: Petit: “There are three main cryptographic libraries: MIRACL, OpenSSL and Crypto++. Authors in [21] proposed a comparison and concluded that MIRACL has the best performance for operations on elliptic curves over binary fields.”

      But NIST P-224 and NIST P-256 are defined over prime fields!

      [21] says “For elliptic curves over prime fields, OpenSSL has the best performance under all platforms.”

  53. Slide 16: Petit: “There are three main cryptographic libraries: MIRACL, OpenSSL and Crypto++. Authors in [21] proposed a comparison and concluded that MIRACL has the best performance for operations on elliptic curves over binary fields.”

      But NIST P-224 and NIST P-256 are defined over prime fields!

      [21] says “For elliptic curves over prime fields, OpenSSL has the best performance under all platforms.”

      Slide 17: More general situation: Paper analyzes impact of crypto upon an application.

      If the crypto sounds fast: Why is the paper interesting? Why should it be published?

      If the crypto sounds slower: Paper is more interesting. Look, here’s a speed problem! More likely to be published. More likely to motivate funding to fix the problem.

  57. Slide 17: More general situation: Paper analyzes impact of crypto upon an application.

      If the crypto sounds fast: Why is the paper interesting? Why should it be published?

      If the crypto sounds slower: Paper is more interesting. Look, here’s a speed problem! More likely to be published. More likely to motivate funding to fix the problem.

      Slide 18: Obvious question whenever an application considers crypto deployment: “Is it fast enough?”

      Many random methodologies for answering this question. Which CPU to test? What to take from literature and libraries? Reuse mulmod, or curve ops, or more?

      Slowest, least competent answers are most likely to be published.

      Situation is fully explainable by randomness + natural selection. There’s no evidence that Petit deliberately slowed down crypto.

  61. Slide 18: Obvious question whenever an application considers crypto deployment: “Is it fast enough?”

      Many random methodologies for answering this question. Which CPU to test? What to take from literature and libraries? Reuse mulmod, or curve ops, or more?

      Slowest, least competent answers are most likely to be published.

      Situation is fully explainable by randomness + natural selection. There’s no evidence that Petit deliberately slowed down crypto.

      Slide 19: Paper introducing new crypto software or hardware has same incentive to report older crypto as slow, and analogous incentive to report its own crypto as fast.

      Paper will naturally select functions, parameters, input lengths, platforms, I/O format, timing mechanism, etc. that maximize reported improvement from old to new.

      This is not the same as selecting what matters most for the users.

  65. Slide 19: Paper introducing new crypto software or hardware has same incentive to report older crypto as slow, and analogous incentive to report its own crypto as fast.

      Paper will naturally select functions, parameters, input lengths, platforms, I/O format, timing mechanism, etc. that maximize reported improvement from old to new.

      This is not the same as selecting what matters most for the users.

      Slide 20: Bit operations per bit of plaintext (assuming precomputed subkeys), as listed in recent Skinny paper:

      key   ops/bit   cipher
      128    88       Simon: 60 ops broken
      128   100       NOEKEON
      128   117       Skinny
      256   144       Simon: 106 ops broken
      128   147.2     PRESENT
      256   156       Skinny
      128   162.75    Piccolo
      128   202.5     AES
      256   283.5     AES

  66. Slide 19: Paper introducing new crypto software or hardware has same incentive to report older crypto as slow, and analogous incentive to report its own crypto as fast.

      Paper will naturally select functions, parameters, input lengths, platforms, I/O format, timing mechanism, etc. that maximize reported improvement from old to new.

      This is not the same as selecting what matters most for the users.

      Slide 20: Bit operations per bit of plaintext (assuming precomputed subkeys), not entirely listed in Skinny paper:

      key   ops/bit   cipher
      256    54       Salsa20/8
      256    78       Salsa20/12
      128    88       Simon: 60 ops broken
      128   100       NOEKEON
      128   117       Skinny
      256   126       Salsa20
      256   144       Simon: 106 ops broken
      128   147.2     PRESENT
      256   156       Skinny
      128   162.75    Piccolo
      128   202.5     AES
      256   283.5     AES
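The effect of leaving rows out of a comparison table can be checked directly. The sketch below encodes the table above as data and compares the cheapest 256-bit-key entry with and without the Salsa20 rows; the row values are copied from the slide, while the variable and function names are assumptions made for illustration.

```python
# (key bits, bit operations per bit of plaintext, cipher), copied from the slide.
rows = [
    (256, 54.0, "Salsa20/8"),
    (256, 78.0, "Salsa20/12"),
    (128, 88.0, "Simon"),
    (128, 100.0, "NOEKEON"),
    (128, 117.0, "Skinny"),
    (256, 126.0, "Salsa20"),
    (256, 144.0, "Simon"),
    (128, 147.2, "PRESENT"),
    (256, 156.0, "Skinny"),
    (128, 162.75, "Piccolo"),
    (128, 202.5, "AES"),
    (256, 283.5, "AES"),
]

def cheapest(table, key_bits):
    """Row with the fewest bit operations per bit for the given key size."""
    return min((r for r in table if r[0] == key_bits), key=lambda r: r[1])

# The same table with the Salsa20 family left out.
without_salsa = [r for r in rows if not r[2].startswith("Salsa20")]

best_256_full = cheapest(rows, 256)
best_256_partial = cheapest(without_salsa, 256)
aes256_vs_salsa20 = 283.5 / 126.0

print("cheapest 256-bit entry, full table:   ", best_256_full)
print("cheapest 256-bit entry, Salsa20 omitted:", best_256_partial)
print(f"AES-256 ops/bit relative to Salsa20: {aes256_vs_salsa20:.2f}x")
```

Which rows a paper includes decides which cipher appears to lead its table.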

  70. Slide 20: Bit operations per bit of plaintext (assuming precomputed subkeys), not entirely listed in Skinny paper:

      key   ops/bit   cipher
      256    54       Salsa20/8
      256    78       Salsa20/12
      128    88       Simon: 60 ops broken
      128   100       NOEKEON
      128   117       Skinny
      256   126       Salsa20
      256   144       Simon: 106 ops broken
      128   147.2     PRESENT
      256   156       Skinny
      128   162.75    Piccolo
      128   202.5     AES
      256   283.5     AES

      Slide 21: Many bad examples to imitate, backed by tons of misinformation.

      e.g. Do we bother searching for optimized implementations of the older crypto? Take any code! Rely on “optimizing” compiler!

      “We come so close to optimal on most architectures that we can’t do much more without using NP complete algorithms instead of heuristics. We can only try to get little niggles here and there where the heuristics get slightly wrong answers.”

  74. Slide 21: Many bad examples to imitate, backed by tons of misinformation.

      e.g. Do we bother searching for optimized implementations of the older crypto? Take any code! Rely on “optimizing” compiler!

      “We come so close to optimal on most architectures that we can’t do much more without using NP complete algorithms instead of heuristics. We can only try to get little niggles here and there where the heuristics get slightly wrong answers.”

      Slide 22: Reality is more complicated:

  78. Slide 22: Reality is more complicated:

      Slide 23: SUPERCOP benchmarking toolkit includes 2155 implementations of 595 cryptographic primitives. > 20 implementations of Salsa20.

      Haswell: Reasonably simple ref implementation compiled with gcc -O3 -fomit-frame-pointer is 6.15× slower than fastest Salsa20 implementation.

      merged implementation with “machine-independent” optimizations and best of 121 compiler options: 4.52× slower.
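A slowdown factor such as “6.15× slower than fastest” comes from timing several implementations of the same function and dividing by the best result. The sketch below shows that idea on a toy workload. It is not SUPERCOP's code: the helper names, the toy functions, and the use of the median of wall-clock timings are all assumptions made for illustration.

```python
import statistics
import time

def median_ns(func, repeats=15):
    """Median wall-clock time of func() in nanoseconds over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        func()
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)

def slowdown_table(candidates):
    """Map each candidate name to its median time divided by the fastest median."""
    medians = {name: median_ns(func) for name, func in candidates.items()}
    fastest = min(medians.values())
    return {name: m / fastest for name, m in medians.items()}

# Two hypothetical implementations of the same toy function.
def sum_squares_loop(n=20000):
    total = 0
    for i in range(n):
        total += i * i
    return total

def sum_squares_builtin(n=20000):
    return sum(i * i for i in range(n))

table = slowdown_table({"loop": sum_squares_loop, "builtin": sum_squares_builtin})
for name, factor in sorted(table.items(), key=lambda item: item[1]):
    print(f"{name}: {factor:.2f}x the fastest")
```

A benchmark that times only one candidate has no baseline, so it cannot tell how far that candidate is from the best available implementation.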

  87. Another interesting example: lattice-based signing typically means generating a huge number of random Gaussian samples.

      2017.03 Brannigan–Smyth–Oder–Valencia–O’Sullivan–Güneysu–Regazzoni “An investigation of sources of randomness within discrete Gaussian sampling”: benchmarks for RNGs, samplers.

      Qualitatively large impacts: choice of RNG ⇒ cost of sampling ⇒ cost of signing.

      Two examples of speed reported in this 2017 paper for a 3.4GHz Skylake (Intel Core i7-6700): 383.69 MByte/sec (8.86 cycles/byte) for AES CTR-DRBG using AES-NI; 106.07 MByte/sec (32 cycles/byte) for ChaCha20.

      But wait. eBACS reports 0.92 cycles/byte for AES-256-CTR, 1.18 cycles/byte for ChaCha20.

      Author non-response: “essential for us to examine standard open implementations”. Slow ones?
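
The cycles/byte figures follow from the clock speed and the reported throughput, and the gap to the eBACS figures can be worked out the same way. A minimal sketch, assuming 1 MByte = 10^6 bytes (the reading under which the slide's numbers agree). The function and variable names are illustrative only. Note that AES CTR-DRBG and AES-256-CTR are not the same construction, so the first ratio is only a rough comparison.

```python
def cycles_per_byte(clock_hz: float, mbyte_per_sec: float) -> float:
    """Convert throughput in MByte/sec (assumed 10**6 bytes) to cycles/byte."""
    return clock_hz / (mbyte_per_sec * 1e6)


CLOCK_HZ = 3.4e9  # 3.4GHz Skylake, as stated on the slide

# Throughput reported in the 2017 paper, in MByte/sec (from the slide).
reported_mbyte_per_sec = {"AES CTR-DRBG (AES-NI)": 383.69, "ChaCha20": 106.07}

# eBACS figures quoted on the slide, in cycles/byte.
# The AES entry is for AES-256-CTR, not CTR-DRBG.
ebacs_cycles_per_byte = {"AES CTR-DRBG (AES-NI)": 0.92, "ChaCha20": 1.18}

results = {}
for name, mbps in reported_mbyte_per_sec.items():
    cpb = cycles_per_byte(CLOCK_HZ, mbps)
    ratio = cpb / ebacs_cycles_per_byte[name]
    results[name] = (cpb, ratio)
    print(f"{name}: {cpb:.2f} cycles/byte, {ratio:.1f}x the eBACS figure")
```

Under these assumptions the paper's AES figure is roughly 10 times the eBACS figure, and its ChaCha20 figure is roughly 27 times the eBACS figure.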

