
Efficient Algorithms in Software. Julio López (jlopez@ic.unicamp.br)

Efficient Algorithms in Software. Julio López (jlopez@ic.unicamp.br), Institute of Computing, University of Campinas. September 2017, Havana, Cuba. ASCrypto 2017. Agenda: 1. Efficient Software Implementations: Software Efficiency, Parallel


  1. Symmetric-Key Cryptography: Data Encryption. Secure Communication
     • Alice and Bob would like to communicate over an insecure channel.
     • Charles is a malicious third party who also has access to the channel.
     • Charles should not be able to read the messages exchanged by Alice and Bob.
     Julio López (IC-UNICAMP) Efficient Algorithms in Software ASCrypto 2017 11 / 83

  2. Symmetric Data Encryption
     Using a secret key $k$, Alice and Bob can exchange encrypted messages. Charles cannot read the messages without knowing the key $k$.
     • Key generation produces the shared key $k$.
     • Encryption: $C = E_k(M)$
     • Decryption: $M = D_k(C)$

  3. Advanced Encryption Standard (AES)
     • The cipher was proposed in 1998 by Daemen and Rijmen and selected by NIST as AES in 2000.
     • AES is the current NIST standard for encrypting data using a symmetric key.
     • AES is a block cipher that encrypts a 128-bit plaintext $M$, producing a 128-bit ciphertext $C$, using a key $k$.
     • AES supports three key sizes, $|k| \in \{128, 192, 256\}$, leading to three algorithms: AES-128, AES-192, and AES-256.

  4. AES State Representation
     AES keeps track of a 128-bit state, which can be seen as a $4 \times 4$ matrix of bytes. In each round, AES applies a series of transformations to the matrix, using the round keys $k_0, \ldots, k_{N_r}$. The number of rounds depends on the key size:
     $$N_r = \begin{cases} 10 & \text{if } |k| = 128 \\ 12 & \text{if } |k| = 192 \\ 14 & \text{if } |k| = 256 \end{cases}$$
     After $N_r$ rounds, the last state is returned as the ciphertext.

  5. AES State Transformations
     • SubBytes
     • ShiftRows
     • MixColumns
     • AddRoundKey
     For decryption, the transformations are inverted and applied in reverse order.

  6. AES MixColumns (Encryption)
     Each state column $c = (c_0, c_1, c_2, c_3)$ is multiplied by a fixed polynomial, which is equivalent to a matrix product with byte arithmetic in $\mathrm{GF}(2^8)$:
     $$p_e = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\}, \qquad c' = p_e \otimes c = M_e \otimes c$$
     $$\begin{pmatrix} c'_0 \\ c'_1 \\ c'_2 \\ c'_3 \end{pmatrix} = \begin{pmatrix} 02 & 03 & 01 & 01 \\ 01 & 02 & 03 & 01 \\ 01 & 01 & 02 & 03 \\ 03 & 01 & 01 & 02 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{pmatrix}$$
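
The matrix product above can be checked with a few lines of Python. This is a minimal sketch, not an optimized implementation: the helper names `xtime`, `gmul`, and `mix_column` are illustrative, and the byte arithmetic assumes the AES reduction polynomial $x^8 + x^4 + x^3 + x + 1$.

```python
def xtime(b):
    """Multiply a byte by {02} in GF(2^8), reducing by x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b


def gmul(a, b):
    """Multiply two bytes in GF(2^8) with the shift-and-add method."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


# Encryption matrix M_e from the slide.
M_E = [
    [0x02, 0x03, 0x01, 0x01],
    [0x01, 0x02, 0x03, 0x01],
    [0x01, 0x01, 0x02, 0x03],
    [0x03, 0x01, 0x01, 0x02],
]


def mix_column(col, matrix=M_E):
    """Return matrix * col, where additions are XORs and products use gmul."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, col):
            acc ^= gmul(coeff, byte)
        out.append(acc)
    return out
```

For example, `mix_column([0xDB, 0x13, 0x53, 0x45])` maps the column to `[0x8E, 0x4D, 0xA1, 0xBC]`.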

  7. AES MixColumns (Decryption)
     Decryption multiplies each column by the inverse polynomial, again with byte arithmetic in $\mathrm{GF}(2^8)$:
     $$p_d = \{0b\}x^3 + \{0d\}x^2 + \{09\}x + \{0e\}, \qquad c' = p_d \otimes c = M_d \otimes c$$
     $$\begin{pmatrix} c'_0 \\ c'_1 \\ c'_2 \\ c'_3 \end{pmatrix} = \begin{pmatrix} 0e & 0b & 0d & 09 \\ 09 & 0e & 0b & 0d \\ 0d & 09 & 0e & 0b \\ 0b & 0d & 09 & 0e \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{pmatrix}$$
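
As a sanity check on the decryption matrix, the sketch below multiplies $M_d$ by $M_e$ in $\mathrm{GF}(2^8)$ and expects the identity. The helper names (`gmul`, `matmul`, `mix_column`) are illustrative, and the circulant $M_d$ with `0e` on the diagonal is assumed.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


M_E = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
M_D = [
    [0x0E, 0x0B, 0x0D, 0x09],
    [0x09, 0x0E, 0x0B, 0x0D],
    [0x0D, 0x09, 0x0E, 0x0B],
    [0x0B, 0x0D, 0x09, 0x0E],
]


def matmul(m1, m2):
    """4x4 matrix product over GF(2^8): XOR for addition, gmul for products."""
    product = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            for k in range(4):
                product[i][j] ^= gmul(m1[i][k], m2[k][j])
    return product


def mix_column(col, matrix):
    """Apply a 4x4 matrix to one state column."""
    return [
        gmul(row[0], col[0]) ^ gmul(row[1], col[1])
        ^ gmul(row[2], col[2]) ^ gmul(row[3], col[3])
        for row in matrix
    ]
```

Applying `M_D` to a column produced by `M_E` returns the original column, which is why decryption can undo MixColumns.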

  8. The AES-NI Instruction Set
     In 2010, Intel released a set of instructions to perform the AES algorithm.
     Encryption (plaintext to ciphertext):
     • AddRoundKey
     • AESENC, repeated $N_r - 1$ times: SubBytes, ShiftRows, MixColumns, AddRoundKey
     • AESENCLAST: SubBytes, ShiftRows, AddRoundKey
     Decryption (ciphertext to plaintext):
     • AddRoundKey
     • AESDEC, repeated $N_r - 1$ times: InvSubBytes, InvShiftRows, InvMixColumns, AddRoundKey
     • AESDECLAST: InvSubBytes, InvShiftRows, AddRoundKey

  9. AES-128 Encryption
     Encrypting a 128-bit block (held in xmm15) using the key schedule (stored in xmm0-xmm10), with $N_r = 10$:
         MOVDQA     xmm15, [rsi]    ; Load message block
         PXOR       xmm15, xmm0     ; AddRoundKey
         AESENC     xmm15, xmm1     ; Round 1
         AESENC     xmm15, xmm2     ; Round 2
         AESENC     xmm15, xmm3     ; Round 3
         AESENC     xmm15, xmm4     ; Round 4
         AESENC     xmm15, xmm5     ; Round 5
         AESENC     xmm15, xmm6     ; Round 6
         AESENC     xmm15, xmm7     ; Round 7
         AESENC     xmm15, xmm8     ; Round 8
         AESENC     xmm15, xmm9     ; Round 9
         AESENCLAST xmm15, xmm10    ; Round 10
         MOVDQA     [rdi], xmm15    ; Store cipher block
     Analogously, for decryption use AESDEC and AESDECLAST, and invert the key schedule using AESIMC.

  10. Modes of Operation
      Splitting a long message into 128-bit blocks and encrypting each one independently (ECB mode) is not secure! Modes of operation are used to encrypt arbitrary-length messages using a block cipher as a building block.
      • CBC: cipher block chaining.
      • CTR: counter mode.
      • GCM: Galois/Counter Mode (authenticated encryption).

  11. Cipher Block Chaining (CBC)
      [Figure: CBC encryption of $P_1, \ldots, P_4$ into $C_1, \ldots, C_4$ using $E_k$ and an IV (sequential execution), and CBC decryption of $C_1, \ldots, C_4$ into $P_1, \ldots, P_4$ using $D_k$ (parallel execution).]
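
The difference between sequential encryption and parallel decryption can be seen in a small Python sketch. Note the assumptions: `toy_encrypt` and `toy_decrypt` are hypothetical stand-ins for $E_k$ and $D_k$ (an XOR with the key followed by a byte rotation, with no security at all), chosen only so that the example is self-contained.

```python
BLOCK = 16  # block size in bytes


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key, block):
    """Hypothetical stand-in for E_k (NOT AES): XOR with key, rotate bytes left."""
    t = xor(block, key)
    return t[1:] + t[:1]


def toy_decrypt(key, block):
    """Inverse of toy_encrypt: rotate bytes right, XOR with key."""
    t = block[-1:] + block[:-1]
    return xor(t, key)


def cbc_encrypt(key, iv, blocks):
    """C_i = E_k(P_i xor C_{i-1}), with C_0 = IV. Each step needs the previous output."""
    out, prev = [], iv
    for p in blocks:
        prev = toy_encrypt(key, xor(p, prev))
        out.append(prev)
    return out


def cbc_decrypt(key, iv, blocks):
    """P_i = D_k(C_i) xor C_{i-1}. All inputs are known up front, so the
    D_k calls are independent and could run in parallel."""
    prevs = [iv] + blocks[:-1]
    return [xor(toy_decrypt(key, c), prev) for c, prev in zip(blocks, prevs)]
```

In `cbc_encrypt` the loop carries a dependency through `prev`, whereas the list comprehension in `cbc_decrypt` has none; this is the property that pipelined implementations exploit.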

  12. Counter Mode (CTR)
      [Figure: CTR encryption and decryption of four blocks; the counters IV+1, ..., IV+4 are encrypted with $E_k$ and the results are XORed with $P_1, \ldots, P_4$ (encryption) or $C_1, \ldots, C_4$ (decryption).]
      Both encryption and decryption can be executed in parallel. Only the encryption function of the block cipher is used.
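
A minimal Python sketch of counter mode follows. It assumes a hypothetical stand-in for $E_k$ built from a truncated SHA-256 (`toy_encrypt`); this is not AES, but CTR only ever calls the encryption direction, so an invertible cipher is not needed to illustrate the structure.

```python
import hashlib


def toy_encrypt(key, block):
    """Hypothetical stand-in for E_k (NOT AES): truncated SHA-256 of key || block."""
    return hashlib.sha256(key + block).digest()[:16]


def ctr_xcrypt(key, iv, data):
    """Encrypt or decrypt data: block j is XORed with E_k(IV + j), j = 1, 2, ...

    Every keystream block depends only on the counter, so all of them can be
    computed independently (in parallel), and the same function decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 16):
        counter = (iv + offset // 16 + 1) % (1 << 128)
        keystream = toy_encrypt(key, counter.to_bytes(16, "big"))
        chunk = data[offset:offset + 16]
        out += bytes(d ^ k for d, k in zip(chunk, keystream))
    return bytes(out)
```

Calling `ctr_xcrypt` twice with the same key and IV returns the original data.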

  13. Performance of AES-128-CBC Encryption
      The performance is determined by the latency of the AESENC instruction.
      [Figure: timeline over clock cycles 0 to 24 in which each AESENC starts only after the previous one completes, one latency apart.]
      µ-arch          Latency (cycles)   CBC-ENC (cycles per byte)
      Intel Haswell   7                  4.49
      Intel Skylake   4                  2.71
      AMD Zen         4                  2.44
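
A rough back-of-the-envelope model helps to read the table. Assuming that each 16-byte block needs $N_r = 10$ dependent AES-NI instructions and that nothing else overlaps, the cost is about $N_r \cdot \text{latency} / 16$ cycles per byte. The sketch below only evaluates this estimate next to the measured values from the slide; it is an approximation, not a performance model of any particular processor.

```python
ROUNDS = 10        # N_r for AES-128
BLOCK_BYTES = 16   # one AES block

# Values copied from the table above.
latency = {"Haswell": 7, "Skylake": 4, "Zen": 4}
measured = {"Haswell": 4.49, "Skylake": 2.71, "Zen": 2.44}

# Estimated cycles per byte when every round waits for the previous one.
estimate = {cpu: ROUNDS * lat / BLOCK_BYTES for cpu, lat in latency.items()}

for cpu in latency:
    print(f"{cpu}: estimated {estimate[cpu]:.2f} cpb, measured {measured[cpu]:.2f} cpb")
```

The estimates (4.375 for Haswell, 2.5 for Skylake and Zen) are close to the measurements, which is consistent with the claim that CBC encryption is latency-bound.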

  14. Pipelined AES Implementation
      The execution of an AESENC instruction can be overlapped with other, independent instructions of the same type.
      [Figure: timeline over clock cycles 0 to 24 with $w = 4$ independent AESENC streams interleaved; latency is the time to complete one instruction, throughput is the rate at which new instructions can be issued.]
      The processor's pipeline improves the performance of the CBC-DEC and CTR modes.

  15. Performance of AES-128-CBC Decryption
      [Figure: bar chart of running time (cycles per byte, 0.0 to 1.4) on Haswell, Skylake, and Zen for $w = 1$, $w = 2$, and $w = 4$.]
      Interleaving $w = 4$ independent AES-NI instructions improves the performance of decryption. Can we do better?

  16. Performance of AES-128-CBC Decryption
      [Figure: bar chart of running time (cycles per byte, 0.0 to 1.4) on Haswell, Skylake, and Zen for $w = 1$, $w = 2$, $w = 4$, and $w = 8$.]
      Yes! Zen has two execution units for AES-NI instructions.
      µ-arch          Latency (cycles)   CBC-ENC (cpb)   CBC-DEC (cpb)
      Intel Haswell   7                  4.49            0.63
      Intel Skylake   4                  2.71            0.62
      AMD Zen         4                  2.44            0.37

  17. Performance of AES-128-CTR Mode
      [Figure: bar chart of running time (cycles per byte, 0.0 to 1.4) on Haswell, Skylake, and Zen for a sequential implementation and for $w = 2$, $w = 4$, and $w = 8$.]
      µ-arch          Latency (cycles)   CBC-ENC (cpb)   CBC-DEC (cpb)   CTR (cpb)
      Intel Haswell   7                  4.49            0.63            0.74
      Intel Skylake   4                  2.71            0.62            0.62
      AMD Zen         4                  2.44            0.37            0.39

  18. 2.2 Hash Functions

  19. Hash Function
      A hash function maps an arbitrary-length bit string to an $n$-bit string:
      $$h \colon \{0,1\}^* \to \{0,1\}^n$$
      The output of a hash function is called the digest or hash value.
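
As a small illustration, Python's standard `hashlib` module exposes SHA-256, for which $n = 256$. The wrapper name `digest_hex` below is only illustrative.

```python
import hashlib


def digest_hex(message: bytes) -> str:
    """Return the SHA-256 digest of message as a hexadecimal string."""
    return hashlib.sha256(message).hexdigest()


# Inputs of very different lengths all map to 256-bit (32-byte) digests.
short_digest = hashlib.sha256(b"abc").digest()
long_digest = hashlib.sha256(b"x" * 1_000_000).digest()
print(len(short_digest), len(long_digest))
print(digest_hex(b"abc"))
```

Both digests are 32 bytes long regardless of the input size.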

  20. Cryptographic Properties
      • First pre-image resistance: given a hash value $r$, it should be difficult to find any message $M$ such that $r = h(M)$.
      • Second pre-image resistance: given an input $M_1$, it should be difficult to find a different input $M_2$ such that $h(M_1) = h(M_2)$.
      • Collision resistance: it should be difficult to find two different messages $M_1$ and $M_2$ such that $h(M_1) = h(M_2)$.

  21. Applications of Hash Functions
      Cryptographic hash functions have a large number of applications:
      • Verifying the integrity of files or messages.
      • Password verification.
      • Pseudo-random number generation.
      • Key derivation functions.
      • Digital signatures.

  22. NIST Hash Functions
      • 1993: SHA-0, Secure Hash Algorithm (output: 160 bits).
      • 1995: SHA-1 (output: 160 bits).
      • 2001: SHA-2 (output: 224, 256, 384, or 512 bits).
      • 2015: SHA-3, based on Keccak (output: 224, 256, 384, or 512 bits).
      • 2015: SHA-3 extendable-output functions SHAKE128 and SHAKE256 (output: $m$ bits, arbitrary).
      SHA-1 and SHA-2 are specified in FIPS 180-4; SHA-3 and SHAKE are specified in FIPS 202.

  23. 2.3 SHA2 Implementation

  24. SHA2 Algorithm
      SHA2-256 operates as follows.
      • Initialize the state $S_0$ with constant values.
      • After padding, the message is split into $n$ 512-bit blocks: $M_1, \ldots, M_n$.
      • For each block $M_j$: $S_j = \mathrm{Update}(S_{j-1}, M_j)$ for $1 \le j \le n$.
      • The digest of $M$ is $H(M) = S_n$.
      Update consists of two phases:
      1. Message Schedule.
      2. State Update.

  25. Update Phase 1: Message Schedule
      Let $w_0, \ldots, w_{15}$ be the message block $M_j$ split into 16 words of 32 bits. The message schedule then calculates 48 new words:
      $$w_i \leftarrow \sigma_0(w_{i-15}) + \sigma_1(w_{i-2}) + w_{i-7} + w_{i-16}, \quad \text{for } 16 \le i < 64,$$
      where
      $$\sigma_0(x) = \mathrm{Rot}(x, 7) \oplus \mathrm{Rot}(x, 18) \oplus \mathrm{Shr}(x, 3)$$
      $$\sigma_1(x) = \mathrm{Rot}(x, 17) \oplus \mathrm{Rot}(x, 19) \oplus \mathrm{Shr}(x, 10)$$
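
The recurrence translates almost directly into Python. In this sketch, `Rot` is taken to be a 32-bit right rotation and words are read big-endian, as in the SHA-2 specification; the function names are illustrative.

```python
MASK = 0xFFFFFFFF  # keep every word at 32 bits


def rotr(x, n):
    """32-bit right rotation (Rot in the slide)."""
    return ((x >> n) | (x << (32 - n))) & MASK


def sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def message_schedule(block):
    """Expand a 64-byte block into the 64 words w_0, ..., w_63."""
    assert len(block) == 64
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for i in range(16, 64):
        w.append((sigma0(w[i - 15]) + sigma1(w[i - 2]) + w[i - 7] + w[i - 16]) & MASK)
    return w
```

For the padded one-block message "abc", the first two computed words are `w[16] = 0x61626380` and `w[17] = 0x000F0000`.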

  26. Update Phase 2: State Update
          $(a_0, b_0, c_0, d_0, e_0, f_0, g_0, h_0) \leftarrow S$
          for $i \leftarrow 0$ to $63$ do
              $T_1 \leftarrow h_i \boxplus \Sigma_1(e_i) \boxplus \mathrm{Ch}(e_i, f_i, g_i) \boxplus k_i \boxplus w_i$
              $T_2 \leftarrow \Sigma_0(a_i) \boxplus \mathrm{Maj}(a_i, b_i, c_i)$
              $h_{i+1} \leftarrow g_i$, $g_{i+1} \leftarrow f_i$
              $f_{i+1} \leftarrow e_i$, $e_{i+1} \leftarrow d_i \boxplus T_1$
              $d_{i+1} \leftarrow c_i$, $c_{i+1} \leftarrow b_i$
              $b_{i+1} \leftarrow a_i$, $a_{i+1} \leftarrow T_1 \boxplus T_2$
          end for
          $S' \leftarrow (a_0 \boxplus a_{64}, \ldots, h_0 \boxplus h_{64})$
      Here $\boxplus$ is addition modulo $2^{32}$.
      [Figure: data flow of one iteration, from $(a_i, \ldots, h_i)$ to $(a_{i+1}, \ldots, h_{i+1})$ through $T_1$ and $T_2$ with inputs $w_i$ and $k_i$.]
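
Putting both phases together gives a complete, if slow, SHA-256. The sketch below is for illustration only and makes some choices that are not in the slides: the round constants $k_i$ and the initial state are derived from cube and square roots of the first primes (as the SHA-2 specification defines them) rather than listed, and the function names are illustrative.

```python
from math import isqrt

MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def first_primes(count):
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


PRIMES = first_primes(64)
# k_i: first 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]
# S_0: first 32 fractional bits of the square roots of the first 8 primes.
IV = [isqrt(p << 64) & MASK for p in PRIMES[:8]]


def schedule(block):
    w = [int.from_bytes(block[4 * i:4 * i + 4], "big") for i in range(16)]
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((s0 + s1 + w[i - 7] + w[i - 16]) & MASK)
    return w


def update(state, block):
    """One Update call: message schedule followed by 64 state-update rounds."""
    w = schedule(block)
    a, b, c, d, e, f, g, h = state
    for i in range(64):
        big_sigma1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_sigma1 + ch + K[i] + w[i]) & MASK
        big_sigma0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_sigma0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(x + y) & MASK for x, y in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(message):
    """Pad the message, then fold Update over its 64-byte blocks."""
    zeros = b"\x00" * ((55 - len(message)) % 64)
    padded = message + b"\x80" + zeros + (8 * len(message)).to_bytes(8, "big")
    state = IV
    for j in range(0, len(padded), 64):
        state = update(state, padded[j:j + 64])
    return b"".join(x.to_bytes(4, "big") for x in state)
```

The result can be compared against `hashlib.sha256` from the standard library to confirm that the rounds are wired correctly.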

  27. SHA New Instructions (SHA-NI)
      In 2013, Intel released the specification of the SHA New Instructions (SHA-NI).
      • Intel's Goldmont micro-architecture has supported them since 2016.
      • AMD's Zen micro-architecture added support in 2017.
      SHA1 instructions:
      • SHA1MSG1
      • SHA1MSG2
      • SHA1NEXTE
      • SHA1RNDS4
      SHA2-256 (and SHA2-224) instructions:
      • SHA256MSG1
      • SHA256MSG2
      • SHA256RNDS2

  28. Implementation of Phase 1a: Message Schedule
      The SHA256MSG1 instruction performs the following operation:
      $$x_i = \sigma_0(w_{i+1}) + w_i, \quad \text{for } 0 \le i < 4.$$
      [Figure: the input registers xmm0 and xmm1 hold $w_7, \ldots, w_0$; four $\sigma_0$ evaluations and four additions produce $x_3, \ldots, x_0$ in xmm2.]

  29. Implementation of Phase 1b: Message Schedule
      The SHA256MSG2 instruction performs the following operation:
      $$w_{i+16} = \sigma_1(w_{i+14}) + y_i, \quad \text{for } 0 \le i < 4.$$
      [Figure: the input registers xmm0 and xmm1 hold $y_3, \ldots, y_0$ and $w_{15}, \ldots, w_{12}$; four $\sigma_1$ evaluations and four additions produce $w_{19}, \ldots, w_{16}$ in xmm2.]
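
The two instructions can be modelled in Python to see how they combine into four new schedule words. This is a model of the arithmetic only, not of the register layout: the functions take plain lists of words, and the intermediate step $y_i = x_i + w_{i+9}$ (done with ordinary vector instructions in a real implementation) is an assumption that follows from the schedule recurrence.

```python
MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def sha256msg1(w0_to_w3, w4_to_w7):
    """Model of SHA256MSG1: x_i = sigma0(w_{i+1}) + w_i for i = 0..3."""
    w = list(w0_to_w3) + list(w4_to_w7)
    return [(sigma0(w[i + 1]) + w[i]) & MASK for i in range(4)]


def sha256msg2(y, w12_to_w15):
    """Model of SHA256MSG2: w_{i+16} = sigma1(w_{i+14}) + y_i for i = 0..3.

    w_18 and w_19 depend on the freshly computed w_16 and w_17.
    """
    w = {14: w12_to_w15[2], 15: w12_to_w15[3]}
    for i in range(4):
        w[i + 16] = (sigma1(w[i + 14]) + y[i]) & MASK
    return [w[16], w[17], w[18], w[19]]


def next_four_words(w):
    """Compute w_16..w_19 from w_0..w_15 using the two modelled instructions."""
    x = sha256msg1(w[0:4], w[4:8])
    y = [(x[i] + w[i + 9]) & MASK for i in range(4)]  # add w_9..w_12
    return sha256msg2(y, w[12:16])
```

The result matches the scalar recurrence $w_i = \sigma_0(w_{i-15}) + \sigma_1(w_{i-2}) + w_{i-7} + w_{i-16}$ evaluated for $i = 16, \ldots, 19$.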

  30. Implementation of Phase 2: Two Iterations
      Let $A_i = [a_i, b_i, e_i, f_i]$ and $C_i = [c_i, d_i, g_i, h_i]$ be the state at the $i$-th iteration. Then it holds that
      $$C_{i+2} = A_i.$$
      The remaining values $A_{i+2} = [a_{i+2}, b_{i+2}, e_{i+2}, f_{i+2}]$ are calculated by the SHA256RNDS2 instruction:
      $$A_{i+2} = \mathrm{SHA256RNDS2}(C_i, A_i, X), \quad \text{where } X = [w_i + k_i,\ w_{i+1} + k_{i+1}].$$

  31. Implementation of Phase 2: Two Iterations
      [Figure: data flow of two consecutive iterations, from $(a_i, \ldots, h_i)$ through $(a_{i+1}, \ldots, h_{i+1})$ to $(a_{i+2}, \ldots, h_{i+2})$, with inputs $w_i, k_i$ and $w_{i+1}, k_{i+1}$.]

  32. Implementation of Phase 2: Two Iterations
      [Figure: the same two-iteration data flow, highlighting that $c_{i+2} = a_i$, $d_{i+2} = b_i$, $g_{i+2} = e_i$, and $h_{i+2} = f_i$.]

  33. Implementation of Phase 2: Four Iterations
      Using two SHA256RNDS2 instructions, one can compute four iterations of the Update function:
      $$C_{i+2} = A_i, \qquad A_{i+2} = \mathrm{SHA256RNDS2}(C_i, A_i, X)$$
      $$C_{i+4} = A_{i+2}, \qquad A_{i+4} = \mathrm{SHA256RNDS2}(C_{i+2}, A_{i+2}, Y)$$
      where $X = [w_i + k_i,\ w_{i+1} + k_{i+1}]$ and $Y = [w_{i+2} + k_{i+2},\ w_{i+3} + k_{i+3}]$. This is equivalent to:
      $$C_{i+4} = \mathrm{SHA256RNDS2}(C_i, A_i, X)$$
      $$A_{i+4} = \mathrm{SHA256RNDS2}(A_i, C_{i+4}, Y)$$
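
The equivalence can be checked with a Python model of the instruction. The function `sha256rnds2` below follows the slide's notation, taking $C = [c, d, g, h]$, $A = [a, b, e, f]$, and the two pre-added values $w + k$; it models only the arithmetic and makes no claim about how the words are ordered inside real xmm registers.

```python
MASK = 0xFFFFFFFF


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK


def one_round(state, wk):
    """One iteration of the state update; wk is the pre-added w_i + k_i."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + wk) & MASK
    t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c))) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


def sha256rnds2(C, A, X):
    """Model of SHA256RNDS2: run two rounds and return A_{i+2} = [a, b, e, f]."""
    a, b, e, f = A
    c, d, g, h = C
    state = (a, b, c, d, e, f, g, h)
    for wk in X:
        state = one_round(state, wk)
    a, b, c, d, e, f, g, h = state
    return [a, b, e, f]


def four_rounds(A, C, X, Y):
    """Four iterations with two calls, reusing registers as in the slide."""
    C4 = sha256rnds2(C, A, X)    # A_{i+2}, which is also C_{i+4}
    A4 = sha256rnds2(A, C4, Y)   # uses C_{i+2} = A_i
    return A4, C4
```

Because $C_{i+2} = A_i$, no data movement is needed between the two calls: the two registers simply swap roles.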

  34. Performance of SHA2-256 using SHA-NI
      SHA-NI is 4-5× faster than 64-bit implementations of SHA2-256.
      [Figure: two plots against message size (1 byte to 1 MB). Left: running time in cycles per byte (logarithmic scale). Right: speedup (1× to 5×). Series: sphlib (supercop), OpenSSL, SHA-NI.]
      Can we do better?

  35. Pipelined Implementation of SHA-NI
      Like AES-NI instructions, SHA-NI instructions can be pipelined.
      [Figure: timeline over clock cycles 0 to 26 with $w = 4$ independent SHA256RNDS2 streams interleaved, annotated with latency and throughput.]
      Target scenario: hashing multiple messages, as in hash-based signatures (post-quantum cryptography).

  36. Performance of Pipelined Implementation of SHA-NI
      Example: calculating four hashes (pipelined) is 20% faster than a sequential implementation.
      [Figure: running time (cycles per byte, 1.0 to 2.5) against message size (256 bytes to 1 MB) on Zen (Ryzen 7 1800X processor) for 1, 2, 4, and 8 messages.]

  37. 2.4 SHA3 Implementation

  38. The SHA-3 Family of Functions
      SHA-3 is composed of four hash functions and two extendable-output functions (XOFs) called SHAKE.
      Function     Output size (n)   Bit-rate (r)   Security level
      SHA-3 224    224               1,152          112
      SHA-3 256    256               1,088          128
      SHA-3 384    384               832            192
      SHA-3 512    512               576            256
      SHAKE 128    n                 1,344          min(n/2, 128)
      SHAKE 256    n                 1,088          min(n/2, 256)
      The input of a SHA-3 function is split into blocks of $r$ bits. The larger the bit-rate, the faster the execution.

  39. Extendable-Output Function
      An extendable-output function (XOF) maps an arbitrary-length bit string to a variable-length digest value:
      $$\mathrm{XOF} \colon \{0,1\}^* \times \mathbb{N} \to \{0,1\}^*, \qquad (a, n) \mapsto \mathrm{XOF}(a, n) \in \{0,1\}^n$$
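
Python's standard `hashlib` module provides SHAKE128, which makes the definition easy to try out. The wrapper `xof` below is illustrative and, as an assumption for simplicity, only accepts output lengths that are a multiple of 8 bits.

```python
import hashlib


def xof(a: bytes, n: int) -> bytes:
    """Return an n-bit SHAKE128 digest of a (n must be a multiple of 8 here)."""
    if n % 8:
        raise ValueError("this sketch only supports whole bytes")
    return hashlib.shake_128(a).digest(n // 8)


short = xof(b"abc", 256)
longer = xof(b"abc", 512)
print(len(short), len(longer))
```

A shorter output is a prefix of a longer output for the same input, so `longer[:32] == short`.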

  40. The SHA-3 Design
      SHA-3 was designed using a sponge construction proposed in 2009 by Bertoni et al. The construction has three phases:
      • Initializing
      • Absorbing
      • Squeezing

  41. Sponge Construction
      Initializing: the state has 1,600 bits, all initialized to 0; the input is then split into blocks of $r$ bits.

  42. Sponge Construction
      Absorbing: each block is added (XORed) to the first $r$ bits of the state; the state is then processed by a permutation function $P$.

  43. Sponge Construction
      Squeezing: after the input has been consumed, the function $P$ is used to produce $\lfloor n/r \rfloor$ output blocks of $r$ bits, concatenated with $n \bmod r$ bits taken from the last state.

  44. Permutation Function P
      The state has 1,600 bits and is represented by a $5 \times 5$ matrix $S$ in which each entry is a 64-bit word:
      $$S = \begin{pmatrix} s_0 & s_1 & s_2 & s_3 & s_4 \\ s_5 & s_6 & s_7 & s_8 & s_9 \\ s_{10} & s_{11} & s_{12} & s_{13} & s_{14} \\ s_{15} & s_{16} & s_{17} & s_{18} & s_{19} \\ s_{20} & s_{21} & s_{22} & s_{23} & s_{24} \end{pmatrix}, \qquad S[x, y] = s_{5x+y} \text{ for } 0 \le x, y < 5.$$
      The permutation $P$ consists of 24 rounds, each applying the transformations $\theta$, $\rho$, $\pi$, $\chi$, and $\iota$.
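
The sponge phases and the permutation can be sketched compactly in Python. This is an unoptimized illustration under a few assumptions: lanes are indexed as `lanes[x][y]` and stored at byte offset $8(x + 5y)$ as in FIPS 202 (which may differ from the slide's $s_{5x+y}$ labelling), the rotation offsets and round constants are generated on the fly instead of being tabulated, and the function names are illustrative.

```python
def rol64(a, n):
    n %= 64
    return ((a << n) | (a >> (64 - n))) & 0xFFFFFFFFFFFFFFFF


def keccak_f1600(lanes):
    """Apply the 24-round permutation P to a 5x5 matrix of 64-bit lanes."""
    lfsr = 1
    for _ in range(24):
        # theta: mix each lane with the parities of two neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: the only non-linear step, applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR a round constant, generated here by an LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def sponge(rate_bits, data, suffix, out_len):
    """Sponge construction: initialize, absorb r-bit blocks, then squeeze."""
    rate = rate_bits // 8
    state = bytearray(200)  # initializing: 1,600 zero bits

    def permute():
        lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
                  for y in range(5)] for x in range(5)]
        lanes = keccak_f1600(lanes)
        for x in range(5):
            for y in range(5):
                state[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")

    offset = 0
    while len(data) - offset >= rate:  # absorbing full blocks
        for i in range(rate):
            state[i] ^= data[offset + i]
        permute()
        offset += rate
    tail = data[offset:]
    for i, byte in enumerate(tail):
        state[i] ^= byte
    state[len(tail)] ^= suffix   # domain separation bits and first padding bit
    state[rate - 1] ^= 0x80      # final padding bit
    permute()
    out = bytearray()
    while len(out) < out_len:    # squeezing
        out += state[:rate]
        if len(out) < out_len:
            permute()
    return bytes(out[:out_len])


def sha3_256(message):
    return sponge(1088, message, 0x06, 32)


def shake128(message, out_len):
    return sponge(1344, message, 0x1F, out_len)
```

The bit-rates 1,088 and 1,344 are the ones listed for SHA-3 256 and SHAKE 128; the results can be compared with `hashlib.sha3_256` and `hashlib.shake_128`.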

  45. Using 256-bit Instructions
      The SHA-3 state is stored in seven 256-bit registers ($Y_i$: 256-bit vector registers; $s_i$: 64-bit words).
      Register   Contents
      Y0         s0   s1   s2   s3
      Y1         s5   s6   s7   s8
      Y2         s10  s11  s12  s13
      Y3         s15  s16  s17  s18
      Y4         s20  s21  s22  s23
      Y5         s24  s24  s24  s24
      Y6         s4   s9   s14  s19
      • Pros: it uses just a few 256-bit vector registers.
      • Cons: the permutation instructions of AVX2 are expensive.

  46. Using 128-bit Instructions
      State representation ($X_i$: 128-bit vector registers). The state uses 13 variables of 128 bits.
      Register   Contents      Register   Contents
      X0         s0   s1       X7         s15  s16
      X1         s2   s3       X8         s17  s18
      X2         s5   s6       X9         s14  s19
      X3         s7   s8       X10        s20  s21
      X4         s4   s9       X11        s22  s23
      X5         s10  s11      X12        s24  s24
      X6         s12  s13
      • Pros: the permutation instructions of SSE4 are cheaper than those of AVX2.
      • Cons: it uses more variables.

  47. 4-way Implementation
      State representation ($Y_j$: 256-bit vector registers). Four independent states are processed at once: register $Y_j$ holds word $j$ of each state,
      $$Y_j = [\, s_j^{1},\ s_j^{2},\ s_j^{3},\ s_j^{4} \,], \quad \text{for } 0 \le j \le 24,$$
      so the state uses 25 variables of 256 bits.
      • Pros: no 64-bit permutations are needed.
      • Cons: it uses many variables, and the processor has only 16 registers.

  48. Performance of SHA3-128 Function
      Cycles per byte taken to hash a message of 4096 bytes.
      [Figure: bar chart of running time (cycles per byte, 0 to 18) on Haswell, Skylake, and Zen for the implementations x64, x64shld, AVX2, generic64, 2M-SSE, and 4M-AVX2.]
      Measurements were taken using the official Keccak Code Package.

  49. SHA3 Parallel Hashing: Two and Four Messages
      • (1M) 64-bit native instructions.
      • (2M) 128-bit vector instructions [SSE2/AVX].
      • (4M) 256-bit vector instructions [AVX2].
      [Figure: speedup (1 to 4) against number of messages (1 to 4) on Haswell, Skylake, and Zen.]
      The performance on Zen does not scale well when hashing 4 messages.

  50. Section 3 Elliptic Curve Cryptography

  51. 3.1 Elliptic Curves

  52. ECC: Software Implementation
      • Introduction
      • Point Multiplication $kP$
      • Elliptic Curve Diffie-Hellman (X25519, X448)
      • Digital Signature (EdDSA)
      • Performance (vector instructions on Intel Haswell/Skylake)

  53. Elliptic Curve Cryptography (ECC)
      • In 1985, Koblitz [8] and Miller [9] independently suggested the use of elliptic curves for cryptographic purposes.
      • ECC achieves the same security as RSA-based protocols using shorter key sizes. For example, at the 128-bit security level, RSA uses keys of 3,072 bits while ECC uses keys of 256 bits.
      • Applications of ECC: key-agreement protocols, digital signatures, Bitcoin, end-to-end encryption, smart card security.

  54. Mathematical Aspects of Elliptic Curves
      • An elliptic curve is defined by the following equation:
        $$E/\mathbb{F}_p : y^2 + a_1 xy + a_3 y = x^3 + a_2 x^2 + a_4 x + a_6,$$
        where $a_1, a_2, a_3, a_4, a_6 \in \mathbb{F}_p$ and $p$ is a prime number.
      • The points of an elliptic curve form a commutative group with $\mathcal{O}$ as identity:
        $$(E, +) = \{ (x, y) \in E \} \cup \{ \mathcal{O} \}$$
      • The addition of two different points, $(x_3, y_3) = (x_1, y_1) + (x_2, y_2)$, is calculated as follows (for a curve in the simplified form with $a_1 = a_2 = a_3 = 0$):
        $$x_3 = \left( \frac{y_2 - y_1}{x_2 - x_1} \right)^2 - x_1 - x_2$$
        $$y_3 = \left( \frac{y_2 - y_1}{x_2 - x_1} \right)(x_1 - x_3) - y_1$$
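
The addition formulas can be tried on a toy curve. The sketch below assumes the small textbook curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$, which is far too small for cryptography and is used only so the numbers can be checked by hand; the names `on_curve` and `add_distinct` are illustrative. Division in $\mathbb{F}_p$ is a multiplication by the modular inverse.

```python
P_MOD, A, B = 17, 2, 2   # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)


def on_curve(point):
    x, y = point
    return (y * y - (x ** 3 + A * x + B)) % P_MOD == 0


def add_distinct(p1, p2):
    """Add two points with different x-coordinates using the chord formulas."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("this formula requires x1 != x2")
    # slope of the line through p1 and p2
    lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


print(add_distinct((5, 1), (6, 3)))
```

Here $(5, 1) + (6, 3) = (10, 6)$, and the result is again a point on the curve.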

  59. Point Addition
      Let $P$ and $Q$ be two points on the curve. Then $P + Q$ can be computed using a geometric construction:
      • Trace a line passing through $P$ and $Q$.
      • This line intersects the curve at a third point $R$.
      • Trace a vertical line passing through $R$.
      • The point where this vertical line intersects the curve is defined as the sum $P + Q$.

  64. Point Doubling
      The addition of a point $P$ with itself can be computed as follows:
      • Trace the line tangent to the curve at the point $P$.
      • This line intersects the curve at a point $R$.
      • Trace a vertical line passing through $R$.
      • The point where this vertical line intersects the curve is defined as $2P$.
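
In coordinates, the tangent construction replaces the chord slope by the slope of the tangent line. The sketch below assumes a curve of the form $y^2 = x^3 + ax + b$, for which the standard tangent slope is $\lambda = (3x_1^2 + a) / (2y_1)$, and reuses the toy curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ as an assumed example.

```python
P_MOD, A, B = 17, 2, 2   # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)


def on_curve(point):
    x, y = point
    return (y * y - (x ** 3 + A * x + B)) % P_MOD == 0


def double(point):
    """Compute 2P with the tangent-line formulas."""
    x1, y1 = point
    if y1 == 0:
        raise ValueError("vertical tangent: 2P is the point at infinity")
    # slope of the tangent at P
    lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD) % P_MOD
    x3 = (lam * lam - 2 * x1) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


print(double((5, 1)))
```

For $P = (5, 1)$ this gives $2P = (6, 3)$, and doubling again gives $4P = (3, 1)$.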

  67. Point Multiplication kP
      Given an integer $k$ and a point $P \in E$, point multiplication is defined as:
      $$kP = \underbrace{P + P + \cdots + P}_{k \text{ times}}$$
      For example,
      $$15P = (1111)_2 P = (2^3 + 2^2 + 2^1 + 1)P = 2^3 P + 2^2 P + 2^1 P + P.$$
      In general, for $k = (k_{n-1}, \ldots, k_1, k_0)_2$,
      $$kP = k_{n-1} 2^{n-1} P + \cdots + k_1 2P + k_0 P.$$

  68. Point Multiplication: Double-and-Add Algorithm
      Input: $P \in E$ and $k \in \mathbb{Z}^+$.
      Output: $kP$.
          $(k_{n-1}, \ldots, k_1, k_0)_2 \leftarrow k$
          $Q \leftarrow \mathcal{O}$
          for $i \leftarrow n-1$ down to $0$ do
              $Q \leftarrow 2Q$
              $Q \leftarrow Q + k_i P$
          end for
          return $Q$
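
The algorithm can be run on a toy curve. The sketch below assumes the curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ with $P = (5, 1)$ as an example, represents $\mathcal{O}$ as `None`, and uses illustrative function names. It is written for clarity: the branch on each key bit makes it unsuitable for handling secret scalars.

```python
P_MOD, A = 17, 2   # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)


def add(p1, p2):
    """Group law; None represents the point at infinity O."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                      # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD)       # tangent slope
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)    # chord slope
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add, scanning the bits of k from most to least significant."""
    q = None                             # Q <- O
    for bit in bin(k)[2:]:
        q = add(q, q)                    # Q <- 2Q
        if bit == "1":
            q = add(q, point)            # Q <- Q + k_i P
    return q


print(scalar_mult(15, (5, 1)))
```

Computing $15P$ this way takes four doublings and four additions (the first ones acting on $\mathcal{O}$) instead of fourteen additions.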

  69. Techniques for kP
      The operation $kP$ can be performed using different techniques:
      • Double-and-Add Algorithm (right-to-left).
      • Montgomery Algorithm.
      • $w$-NAF representations.
      • Fixed recoding representations.
      • Elliptic curves with endomorphisms, GLV/GLS curves.

  70. Elliptic Curve Discrete Logarithm Problem (ECDLP)
      Given two points $P$ and $Q$, the problem of finding an integer $k$ such that $Q = kP$ is known as the elliptic curve discrete logarithm problem.
      • Pollard's rho algorithm is the best known algorithm for solving ECDLP. Its complexity is
        $$O\left( \sqrt{\#E(\mathbb{F}_p)} \right),$$
        where $\#E(\mathbb{F}_p) \approx p$ is the number of points on the curve.
      • For example, for an elliptic curve defined over a prime field with $p \approx 2^{256}$, about $2^{128}$ operations are required to solve ECDLP.
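
On a toy curve the problem can be solved by simply trying every $k$. The sketch below does exactly that on the assumed example curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$; note that this is exhaustive search, which takes time linear in the group order, not Pollard's rho, and it is shown only to make the problem statement concrete.

```python
from math import isqrt

P_MOD, A = 17, 2   # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed example)


def add(p1, p2):
    """Group law; None represents the point at infinity O."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def brute_force_log(base, target):
    """Return the smallest k >= 1 with k * base == target, by exhaustive search."""
    current, k = base, 1
    while current != target:
        current = add(current, base)
        k += 1
        if current is None:
            raise ValueError("target is not a multiple of base")
    return k


# For p close to 2^256, a square-root attack needs about 2^128 group operations.
print(isqrt(2 ** 256) == 2 ** 128)
```

With $P = (5, 1)$ and $Q = (7, 6)$ the search returns $k = 9$ after nine steps; for a 256-bit curve even the square-root attack is out of reach.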

  71. The Elliptic Curves Standardized by NIST
      • In 1999, NIST standardized a set of elliptic curves for computing digital signatures (ECDSA) and for the key-agreement protocol (ECDH) [10].
      • NIST's curves have the following equation:
        $$E/\mathbb{F}_p : y^2 = x^3 - 3x + b$$
      • Prime curves: P-256 and P-384.
                   P-256                                         P-384
      Security     128-bit                                       192-bit
      p            $2^{256} - 2^{224} + 2^{192} + 2^{96} - 1$    $2^{384} - 2^{128} - 2^{96} + 2^{32} - 1$
      b            0x5ac635d...27d2604b                          0xb3312fa...d3ec2aef
      #E           $2^{256} - 2^{224} + 2^{192} - 2^{128} + t$   $2^{384} - t$
      t            0xbce6faa...fc632551                          0x389cb27...333ad68d

  72. RFC 7748: Edwards/Montgomery Elliptic Curves
      In January 2016, RFC 7748 recommended the use of Curve25519 and Curve448 in two elliptic curve models:
      • Edwards curves: $E : ax^2 + y^2 = 1 + dx^2y^2$.
      • Montgomery curves: $E : v^2 = u^3 + Au^2 + u$.
                   Curve25519 (Bernstein [1, 2])                    Curve448 (Hamburg [5])
      Security     128-bit                                          224-bit
      p            $2^{255} - 19$                                   $2^{448} - 2^{224} - 1$
      (a, d, A)    $(-1,\ -\frac{121665}{121666},\ 486662)$         $(1,\ -39081,\ 156326)$
      #E           $8\ell$                                          $4\ell$
      ℓ            $2^{252} + \mathtt{0x14def9dea2f79cd65812631a5cf5d3ed}$    $2^{446} - \mathtt{0x8335dc163bb124b65129c96fde933d8d723a70aadc873d6d54a7bb0d}$
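
The parameters in the table can be sanity-checked with Python's big integers. The sketch below evaluates the fraction $d = -121665/121666$ modulo $p$ and checks that both group orders lie within the Hasse bound $|\#E - (p + 1)| \le 2\sqrt{p}$. The variable names are illustrative, and the subgroup order of Curve25519 is taken as $2^{252}$ plus the listed constant, as in RFC 7748.

```python
# Curve25519
p25519 = 2 ** 255 - 19
A25519 = 486662
d25519 = (-121665 * pow(121666, -1, p25519)) % p25519   # d as an element of F_p
ell25519 = 2 ** 252 + 0x14DEF9DEA2F79CD65812631A5CF5D3ED
order25519 = 8 * ell25519

# Curve448
p448 = 2 ** 448 - 2 ** 224 - 1
A448 = 156326
d448 = -39081 % p448
ell448 = 2 ** 446 - 0x8335DC163BB124B65129C96FDE933D8D723A70AADC873D6D54A7BB0D
order448 = 4 * ell448


def within_hasse_bound(order, p):
    """Check |order - (p + 1)| <= 2 * sqrt(p) using integers only."""
    return (order - (p + 1)) ** 2 <= 4 * p


print(within_hasse_bound(order25519, p25519), within_hasse_bound(order448, p448))
print(hex(d25519))
```

The Montgomery coefficient also ties in with the Edwards constants: $(A + 2)/4 = 121666$ for Curve25519 and $(A - 2)/4 = 39081$ for Curve448.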
