
FACE: Fast AES CTR mode Encryption Techniques based on the Reuse of Repetitive Data, by Jin Hyung Park and Dong Hoon Lee (presentation slides)



  1. FACE: Fast AES CTR mode Encryption Techniques based on the Reuse of Repetitive Data. Jin Hyung Park and Dong Hoon Lee, Center for Information Security Technologies, Korea University.

  2. Introduction. [Figure: CTR mode encryption.] The counter is incremented for each block: for the i-th block, the block cipher encrypts IV(Counter) + (i - 1) under key K, and the result is combined with Plaintext i-1 to give Ciphertext i-1 (1st block, 2nd block, ..., i-th block).

  3. Introduction. [Figure: the CTR mode encryption diagram again, unchanged: the counter is incremented for each block, IV(Counter) + (i - 1).]
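The CTR-mode diagram can be sketched in C as follows. This is a minimal illustration, not the authors' code: `block_encrypt_fn`, `toy_encrypt`, `ctr_increment` and `ctr_crypt` are hypothetical names, and `toy_encrypt` is an insecure stand-in for AES used only so the sketch is self-contained. It also assumes the whole 16-byte block is incremented as a big-endian integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback: encrypts one 16-byte block under a fixed key. */
typedef void (*block_encrypt_fn)(const uint8_t in[16], uint8_t out[16],
                                 const void *key);

/* Toy stand-in for AES, for illustration only (NOT secure):
 * XORs the block with a 16-byte key. */
static void toy_encrypt(const uint8_t in[16], uint8_t out[16], const void *key)
{
    const uint8_t *k = key;
    for (int i = 0; i < 16; i++)
        out[i] = (uint8_t)(in[i] ^ k[i]);
}

/* Increment a 16-byte counter block as a big-endian integer. */
static void ctr_increment(uint8_t ctr[16])
{
    for (int i = 15; i >= 0; i--)
        if (++ctr[i] != 0)
            break;
}

/* CTR mode: the i-th block is XORed with E_K(IV + (i - 1)).
 * Decryption is the same call with ciphertext as input. */
static void ctr_crypt(block_encrypt_fn enc, const void *key,
                      const uint8_t iv[16],
                      const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t ctr[16], ks[16];
    assert(enc != NULL);
    memcpy(ctr, iv, 16);
    for (size_t off = 0; off < len; off += 16) {
        size_t n = (len - off < 16) ? len - off : 16;
        enc(ctr, ks, key);                 /* keystream block */
        for (size_t j = 0; j < n; j++)
            out[off + j] = (uint8_t)(in[off + j] ^ ks[j]);
        ctr_increment(ctr);                /* next counter value */
    }
}
```

Because only the counter block enters the cipher, consecutive cipher inputs differ in very few bytes, which is the repetition the later slides exploit.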

  4. Introduction. < Initial Whitening phase of AES >, shown for the 1st and 2nd block.

       CTR 0 : 0x00000000 0x00000000 0x00000000 0x00000001
       CTR 1 : 0x00000000 0x00000000 0x00000000 0x00000002

       State (1st block)    State (2nd block)
       00 00 00 00          00 00 00 00
       00 00 00 00          00 00 00 00
       00 00 00 00          00 00 00 00
       00 00 00 01          00 00 00 02

       Round Key            Round Key
       8A 48 ED AC          8A 48 ED AC
       4F 7B 83 56          4F 7B 83 56
       BA 5E 6A 50          BA 5E 6A 50
       59 B3 C4 38          59 B3 C4 38

       State (after XOR)    State (after XOR)
       8A 48 ED AC          8A 48 ED AC
       4F 7B 83 56          4F 7B 83 56
       BA 5E 6A 50          BA 5E 6A 50
       59 B3 C4 39          59 B3 C4 3A
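The whitening values on this slide can be checked with a few lines of C. This is a sketch under one assumption: the slide's rows are read left to right as bytes 0 to 15 of the block. `RK0`, `whiten` and `differing_bytes` are illustrative names.

```c
#include <stdint.h>

/* Round key 0 as printed on the slide, read row by row (assumed byte order). */
static const uint8_t RK0[16] = {
    0x8A, 0x48, 0xED, 0xAC,  0x4F, 0x7B, 0x83, 0x56,
    0xBA, 0x5E, 0x6A, 0x50,  0x59, 0xB3, 0xC4, 0x38
};

/* Initial whitening: state = counter block XOR round key 0. */
static void whiten(const uint8_t ctr[16], const uint8_t rk[16],
                   uint8_t state[16])
{
    for (int i = 0; i < 16; i++)
        state[i] = (uint8_t)(ctr[i] ^ rk[i]);
}

/* Number of byte positions in which two 16-byte states differ. */
static int differing_bytes(const uint8_t a[16], const uint8_t b[16])
{
    int n = 0;
    for (int i = 0; i < 16; i++)
        n += (a[i] != b[i]);
    return n;
}
```

For the counters ending in 01 and 02, the whitened states end in 0x39 and 0x3A and agree in the other 15 bytes.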

  5. Introduction. Counter-mode Caching. [Figure: the states for CTR 0 (0x00000000 0x00000000 0x00000000 0x00000001) and CTR 1 (0x00000000 0x00000000 0x00000000 0x00000002) side by side with their round keys, labelled "Counter-mode Caching" (footnote markers * and ** in the original); the per-byte values of the figure are not recoverable from the extracted text.]

  6. Round Function - 4 Transformations. [Figure: the 16 state bytes pass through S-Box, Shift, multiplication by the mixing matrix, and Round Key addition.]

       State byte indices     After Shift
       [0] [4] [8]  [12]      [0]  [4]  [8]  [12]
       [1] [5] [9]  [13]      [5]  [9]  [13] [1]
       [2] [6] [10] [14]      [10] [14] [2]  [6]
       [3] [7] [11] [15]      [15] [3]  [7]  [11]

       Mixing matrix (applied to each column)
       2 3 1 1
       1 2 3 1
       1 1 2 3
       3 1 1 2
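Two of the four transformations on this slide, the row shift and the column mixing, can be written out directly from the index layout and the matrix. The sketch below assumes the standard AES column-major byte order (index r + 4c holds row r, column c) and the AES reduction polynomial; `shift_rows`, `xtime` and `mix_column` are illustrative names.

```c
#include <stdint.h>

/* Shift step on a column-major state: row r is rotated left by r positions. */
static void shift_rows(const uint8_t in[16], uint8_t out[16])
{
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            out[r + 4 * c] = in[r + 4 * ((c + r) % 4)];
}

/* Multiply by 2 in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t x)
{
    return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1B : 0x00));
}

/* Multiply one state column by the matrix
 *   2 3 1 1 / 1 2 3 1 / 1 1 2 3 / 3 1 1 2   (3*a = 2*a ^ a). */
static void mix_column(uint8_t col[4])
{
    uint8_t a0 = col[0], a1 = col[1], a2 = col[2], a3 = col[3];
    col[0] = (uint8_t)(xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3);
    col[1] = (uint8_t)(a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3);
    col[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3));
    col[3] = (uint8_t)((xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3));
}
```

Feeding `shift_rows` the identity state (byte i holds the value i) reproduces the shifted index pattern of the figure: the first output column is 0, 5, 10, 15.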

  7. AES Implementation Methods. < OpenSSL >

       static const u32 Te0[256] = {
           0xc66363a5U, 0xf87c7c84U, 0xee777799U, 0xf67b7b8dU,
           0xfff2f20dU, 0xd66b6bbdU, 0xde6f6fb1U, 0x91c5c554U,
           0x60303050U, 0x02010103U, 0xce6767a9U, 0x562b2b7dU,
           0xe7fefe19U, 0xb5d7d762U, 0x4dababe6U, 0xec76769aU,
           …
           0x824141c3U, 0x299999b0U, 0x5a2d2d77U, 0x1e0f0f11U,
           0x7bb0b0cbU, 0xa85454fcU, 0x6dbbbbd6U, 0x2c16163aU,
       };

       static const u32 Te3[256] = {
           0x6363a5c6U, 0x7c7c84f8U, 0x777799eeU, 0x7b7b8df6U,
           0xf2f20dffU, 0x6b6bbdd6U, 0x6f6fb1deU, 0xc5c55491U,
           0x30305060U, 0x01010302U, 0x6767a9ceU, 0x2b2b7d56U,
           0xfefe19e7U, 0xd7d762b5U, 0xababe64dU, 0x76769aecU,
           …
           0x4141c382U, 0x9999b029U, 0x2d2d775aU, 0x0f0f111eU,
           0xb0b0cb7bU, 0x5454fca8U, 0xbbbbd66dU, 0x16163a2cU,
       };

       s0 = GETU32(in     ) ^ rk[0];
       s1 = GETU32(in +  4) ^ rk[1];
       s2 = GETU32(in +  8) ^ rk[2];
       s3 = GETU32(in + 12) ^ rk[3];
       /* round 1: */
       t0 = Te0[s0 >> 24] ^ Te1[(s1 >> 16) & 0xff] ^ Te2[(s2 >> 8) & 0xff] ^ Te3[s3 & 0xff] ^ rk[ 4];
       t1 = Te0[s1 >> 24] ^ Te1[(s2 >> 16) & 0xff] ^ Te2[(s3 >> 8) & 0xff] ^ Te3[s0 & 0xff] ^ rk[ 5];
       t2 = Te0[s2 >> 24] ^ Te1[(s3 >> 16) & 0xff] ^ Te2[(s0 >> 8) & 0xff] ^ Te3[s1 & 0xff] ^ rk[ 6];
       t3 = Te0[s3 >> 24] ^ Te1[(s0 >> 16) & 0xff] ^ Te2[(s1 >> 8) & 0xff] ^ Te3[s2 & 0xff] ^ rk[ 7];

  8. AES Implementation Methods. < OpenSSL > The same table-based listing as on the previous slide (Te0 and Te3 tables, initial whitening, round 1 lookups), now annotated: Vulnerable to cache-timing attack.

  9. AES Implementation Methods. [Figure: < bitsliced form transformation (OpenSSL implementation based on [1]) >. 8 plaintext blocks (Block 0 to Block 7, bytes b0 ... b14 followed by a final counter byte) are converted into 8 128-bit registers (Register 0 to Register 7).]

       Final byte of each block          Bits shown in each register (MSB ... LSB)
       Block 0 : 00000000                Register 0 : 1010 1010
       Block 1 : 00000001                Register 1 : 1100 1100
       Block 2 : 00000010                Register 2 : 1111 0000
       Block 3 : 00000011                Register 3 : 0000 0000
       Block 4 : 00000100                Register 4 : 0000 0000
       Block 5 : 00000101                Register 5 : 0000 0000
       Block 6 : 00000110                Register 6 : 0000 0000
       Block 7 : 00000111                Register 7 : 0000 0000
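The register contents in the bitslicing figure follow from an 8x8 bit transpose of the counter bytes: register k collects bit k of the same byte from each of the 8 blocks. The sketch below covers one byte position only, and the placement of block j at bit j of the register is an assumption chosen to match the figure; `bitslice8` is an illustrative name, not OpenSSL's routine.

```c
#include <stdint.h>

/* 8x8 bit transpose: bit j of out[k] is bit k of in[j].
 * in[j]  : the same byte position taken from block j (j = 0..7)
 * out[k] : the slice of register k that holds bit k of those bytes */
static void bitslice8(const uint8_t in[8], uint8_t out[8])
{
    for (int k = 0; k < 8; k++) {
        uint8_t r = 0;
        for (int j = 0; j < 8; j++)
            r |= (uint8_t)(((in[j] >> k) & 1u) << j);
        out[k] = r;
    }
}
```

For the counter bytes 0 to 7 this yields 0xAA, 0xCC, 0xF0 and five zero bytes, i.e. 1010 1010, 1100 1100, 1111 0000 and 0000 0000 as on the slide. A change in one counter byte is therefore scattered over several registers.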

  10. AES Implementation Methods.

        Instruction        Description
        AESENC             Perform one round of an AES encryption flow
        AESENCLAST         Perform the last round of an AES encryption flow
        AESDEC             Perform one round of an AES decryption flow
        AESDECLAST         Perform the last round of an AES decryption flow
        AESKEYGENASSIST    Assist in AES round key generation
        AESIMC             Assist in AES Inverse Mix Columns
        PCLMULQDQ          Carryless multiply

        < Crypto++ >
        *block = _mm_xor_si128(*block, skeys[0]);
        /* round 1: */
        *block = _mm_aesenc_si128(*block, skeys[1]);

  11. AES Implementation Methods.

        Method        Performance (cycles per byte)   Test environment       Reference
        Table-based   10.57 + α (not for CTR)         Core 2 Quad Q6600      INDOCRYPT 2008 [1]
        Bitslicing    9.32                            Core 2 Quad Q6600      CHES 2009 [2]
        Bitslicing    7.59                            Core 2 Quad Q9550      CHES 2009 [2]
        AES-NI        1.4 - 2.0                       Westmere processor     Intel whitepaper [3]
        AES-NI        0.57                            Skylake Core i5        Crypto++ Benchmark [4]

        [1] Daniel J. Bernstein and Peter Schwabe, "New AES software speed records", INDOCRYPT 2008.
        [2] Emilia Käsper and Peter Schwabe, "Faster and Timing-Attack Resistant AES-GCM", CHES 2009.
        [3] Shay Gueron, "Intel Advanced Encryption Standard (AES) New Instructions Set", May 2010. (The first Westmere-based processors, which support AES-NI, were launched in January 2010.)
        [4] Crypto++ 6.0.0 Benchmarks, https://www.cryptopp.com/benchmarks.html, 2017.12.

  12. Problem.

        Bitslice
        - During format conversion, each byte of the input is sliced bitwise, and the sliced bits are spread to the corresponding positions of each register.
        - The input bytes needed to calculate the rest are therefore spread across the whole register.
        - Almost all instructions of the previous implementation still have to be performed.

        AES-NI
        - aesenc xmm15, xmm1: a single instruction performs the round operation.
        - The operations for the rest have to be composed of several instructions with additional operations (save, load, merge).
        - Adding such operations to calculate the rest becomes a considerable burden, even though latency and throughput differ from instruction to instruction.

  13. Our Work (FACE).
        - FACE extends counter-mode caching.
        - FACE is the first to combine counter-mode caching with a bitsliced implementation.
        - FACE is the first to apply counter-mode caching up to the round transformations of AES-NI.
        - FACE achieves the highest throughput.

  14. FACE (Fast AES Counter Mode Encryption). [Table: row and column labels were lost in extraction. Surviving cell values, in reading order: 12 bytes, 16 bytes, -, -, 12 bytes, -; 255, 255, -, -, 2^32 - 1, -; 4K, 1K, -, -; 2^40, 2^40, -, -.]
