Hardware Implementation of Block Cipher: Case Study Using AES - PowerPoint PPT Presentation

Hardware Implementation of Block Cipher: Case Study Using AES
Rei Ueno, Tohoku University

Acknowledgments
- Naofumi Homma, Tohoku Univ.
- Takafumi Aoki, Tohoku Univ.
- Sumio Morioka, Interstellar Technologies, Inc.
- Noriyuki Miura, Kobe Univ.
- Kohei Matsuda, Kobe Univ.
- Makoto Nagata, Kobe Univ.
- Shivam Bhasin, NTU
- Yves Mathieu, Telecom ParisTech
- Tarik Graba, Telecom ParisTech
- Jean-Luc Danger, Telecom ParisTech

This talk
- Given a symmetric-key cipher, how does a hardware designer implement and optimize it?
  - For practical applications:
    - Higher efficiency, unified encryption/decryption, on-the-fly key scheduling, and no block-wise pipelining
  - Case study using AES
- Disclaimer
  - Some modern lightweight ciphers are already optimized and avoid some of the concerns that arise in implementing AES
  - But I still believe that optimizations of AES implementations can be fed back into cipher design

Hardware architectures of block cipher
[Figure: area versus time for one block encryption. Design points: unrolled, round-based, and serialized. Arrows: datapath replication (toward unrolled) and resource sharing (toward serialized).]

Hardware architectures of block cipher
[Figure: area versus time for one block encryption. Design points: unrolled, round-based, and byte-serial. Labels: datapath replication, pipelining, resource sharing, datapath optimization, efficient hardware.]

For practical hardware implementation
- Block-chaining modes have been widely deployed
  - CBC, CMAC, CCM, ...
- (Un)parallelizability: an issue for block-wise pipelining
  - AES hardware achieves 53 Gbps, but only for parallelizable modes [Mathew+ JSSC2011]
  - Higher throughput ≠ lower latency
- Both encryption and decryption operations
- Importance of on-the-fly key scheduling
  - Off-the-fly (precomputed) key scheduling requires additional memory to store the expanded keys
  - The latency of calculating round keys is non-negligible if we use AES with key-tweakable modes
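To make "higher throughput ≠ lower latency" concrete, here is a minimal sketch comparing a block-wise pipelined core with a round-based core. The cycle counts and clock periods are hypothetical values chosen for illustration, not figures from the talk. The model assumes that in a chaining mode such as CBC the next block cannot enter the datapath until the previous block has finished, so a pipeline holds only one block at a time.

```python
BLOCK_BITS = 128  # AES block size in bits


def latency_ns(cycles_per_block: int, clock_ns: float) -> float:
    """Time from block input to block output."""
    return cycles_per_block * clock_ns


def throughput_gbps(cycles_per_block: int, clock_ns: float,
                    pipelined: bool, chained_mode: bool) -> float:
    """Sustained throughput in Gbps (bits per nanosecond)."""
    if pipelined and not chained_mode:
        # Parallelizable mode: the pipeline accepts one block per clock.
        return BLOCK_BITS / clock_ns
    # Chained mode, or no pipelining: one block per full latency.
    return BLOCK_BITS / latency_ns(cycles_per_block, clock_ns)


# Hypothetical designs: (cycles per block, clock period in ns, pipelined?)
PIPELINED = (10, 2.5, True)
ROUND_BASED = (10, 1.5, False)

for name, (cycles, clk, piped) in [("pipelined", PIPELINED),
                                   ("round-based", ROUND_BASED)]:
    print(name,
          "latency", latency_ns(cycles, clk), "ns,",
          "parallel mode", round(throughput_gbps(cycles, clk, piped, False), 2), "Gbps,",
          "chained mode", round(throughput_gbps(cycles, clk, piped, True), 2), "Gbps")
```

Under these assumed numbers the pipelined core wins only in parallelizable modes; in a chained mode the design with the lower latency is faster.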

Outline
- Introduction
- Related works
- Optimized architecture
- Optimization of linear functions over tower-field
- Performance evaluation
- Concluding remarks

Conventional architecture 1/2 [Lutz+, CHES 2002]
- Enc and Dec datapaths with additional selectors
  - Overhead of selectors for unification is nontrivial
  - False paths appear
www.chesworkshop.org/ches2002/presentations/Lutz.pdf

Conventional architecture 2/2 [Satoh+, AC 2001]
- Unify each pair of an operation and its inverse
  - RoundKey requires InvMixColumns
  - Some MUXs in unified operations
  - Long critical path

Tower-field implementation
- Inversion should be performed over a tower field
  - Tower-field inversion is more efficient than direct mapping (e.g., table lookup)
- Two types of tower-field implementation
  - Type-I: only inversion is performed over the tower field
  - Type-II: all operations are performed over the tower field

            Inversion (S-box)   MixColumns / InvMixColumns
Type-I      Good                Good
Type-II     Better              Bad
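The following is a minimal sketch of why tower-field inversion is cheap: an element of GF((2^4)^2) is inverted using only GF(2^4) arithmetic and a single GF(2^4) inversion. The field polynomials here (x^4 + x + 1 for the subfield, y^2 + y + λ with λ found by search) are assumptions chosen for illustration; they are not necessarily the ones used in the talk, and the isomorphic mapping to and from the AES polynomial basis is omitted.

```python
def gf16_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4) = GF(2)[x] / (x^4 + x + 1)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13  # reduce by x^4 + x + 1
    return r


def gf16_inv(a: int) -> int:
    """Inverse in GF(2^4) computed as a^14 (maps 0 to 0)."""
    r = 1
    for _ in range(14):
        r = gf16_mul(r, a)
    return r


# Pick lambda so that y^2 + y + lambda has no root in GF(2^4),
# i.e. the polynomial is irreducible and defines GF((2^4)^2).
LAMBDA = next(l for l in range(1, 16)
              if all(gf16_mul(t, t) ^ t ^ l != 0 for t in range(16)))


def tower_mul(a: int, b: int) -> int:
    """Multiply a = a1*y + a0 and b = b1*y + b0, using y^2 = y + lambda."""
    a1, a0, b1, b0 = a >> 4, a & 0xF, b >> 4, b & 0xF
    hh = gf16_mul(a1, b1)
    hi = hh ^ gf16_mul(a1, b0) ^ gf16_mul(a0, b1)
    lo = gf16_mul(a0, b0) ^ gf16_mul(LAMBDA, hh)
    return (hi << 4) | lo


def tower_inv(a: int) -> int:
    """Invert a = a1*y + a0 with one GF(2^4) inversion.

    The conjugate of a is a1*y + (a0 + a1); the product of a and its
    conjugate is the subfield element lambda*a1^2 + a0*(a0 + a1).
    """
    a1, a0 = a >> 4, a & 0xF
    norm = gf16_mul(LAMBDA, gf16_mul(a1, a1)) ^ gf16_mul(a0, a0 ^ a1)
    n_inv = gf16_inv(norm)
    return (gf16_mul(a1, n_inv) << 4) | gf16_mul(a0 ^ a1, n_inv)


# Self-check: every nonzero element times its inverse is 1.
assert all(tower_mul(a, tower_inv(a)) == 1 for a in range(1, 256))
```

The 8-bit inversion reduces to a handful of 4-bit multiplications, one 4-bit inversion, and XORs, which is the source of the area advantage over a direct 256-entry table.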

Overall architecture
- Round-based architecture
- On-the-fly key scheduler
[Figure: block diagram with a round function part (input: plaintext/ciphertext, output: ciphertext/plaintext) and a key scheduling part (input: initial key).]

Round function part
- Compress encryption and decryption datapaths by register-retiming and operation-reordering
  - Unify inversion circuits in encryption and decryption
    - Without any additional selectors (i.e., overheads)
  - Merge linear operations to reduce gates and critical delay
    - Affine/InvAffine and MixColumns/InvMixColumns
    - At most one linear operation per round
- Type-II tower-field implementation
  - Isomorphic mappings are performed at data I/O
  - Lower-area tower-field (Inv)Affine and (Inv)MixColumns

Register-retiming and operation-reordering
[Figure: original and proposed datapaths, shown side by side for encryption and for decryption.]

Key tricks (of decryption)
[Figure: decryption flow from ciphertext to plaintext through data registers, split into pre-round, round, and final operations built from AddRoundKey, InvShiftRows, InvSubBytes, and InvMixColumns.]

Key tricks (of decryption)
- Decompose InvSubBytes into InvAffine and Inversion
- Apply register-retiming so that inversion is performed first in the round operations
[Figure: decryption flow from ciphertext to plaintext through data registers, split into pre-round, round, and final operations built from AddRoundKey, InvShiftRows, InvAffine, Inversion, and InvMixColumns.]
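The decomposition can be checked in software. The sketch below uses the standard AES definitions (field polynomial x^8 + x^4 + x^3 + x + 1, affine constant 0x63, inverse affine constant 0x05) in the ordinary polynomial basis rather than a tower field; the function names are my own. It shows that SubBytes is Affine after Inversion, that InvSubBytes is Inversion after InvAffine, and that inversion is its own inverse, which is what allows one inversion circuit to serve both directions.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply in the AES field GF(2)[x] / (x^8 + x^4 + x^3 + x + 1)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def inversion(a: int) -> int:
    """Multiplicative inverse computed as a^254 (maps 0 to 0)."""
    r = 1
    for _ in range(254):
        r = gf256_mul(r, a)
    return r


def rotl8(b: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((b << n) | (b >> (8 - n))) & 0xFF


def affine(b: int) -> int:
    """AES affine transformation."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


def inv_affine(b: int) -> int:
    """Inverse of the AES affine transformation."""
    return rotl8(b, 1) ^ rotl8(b, 3) ^ rotl8(b, 6) ^ 0x05


def sub_bytes(b: int) -> int:
    return affine(inversion(b))       # encryption: Inversion, then Affine


def inv_sub_bytes(b: int) -> int:
    return inversion(inv_affine(b))   # decryption: InvAffine, then Inversion


# Both directions share the same inversion; only the linear part differs.
assert all(inv_sub_bytes(sub_bytes(x)) == x for x in range(256))
assert all(inversion(inversion(x)) == x for x in range(256))
```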

Key tricks (of decryption)
- Merge linear operations as Unified affine^-1
  - InvAffine and InvMixColumns
- Distinct AddRoundKey to avoid additional selectors or InvMixColumns for RoundKey
[Figure: decryption flow from ciphertext to plaintext through data registers, split into pre-round, round, and final operations built from AddRoundKey, InvShiftRows, InvAffine, Inversion, and Unified affine^-1.]

Resulting datapath
- Unified inversion without selector
- Disable inactive path
- At most one linear operation per round
- Only one 4:1 selector

Overall architecture
- Round-based architecture
- On-the-fly key scheduler
[Figure: block diagram with a round function part (input: plaintext/ciphertext, output: ciphertext/plaintext) and a key scheduling part (input: initial key).]

Key scheduling part
- Round key generator is dominant
  - Unify encryption and decryption datapaths
  - Keep the critical delay shorter than that of the round function part by NOT unifying some XOR gates
[Figure: key scheduling datapath, marking the unified components and the XOR gates that are not unified.]
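As a software illustration of on-the-fly key scheduling, the sketch below derives each AES-128 round key from the previous one (encryption direction) and from the next one (decryption direction), so only one 128-bit round key is held at a time. This follows the standard AES-128 key expansion; the function names are my own, and the code models the data flow only, not the unified hardware datapath of the talk.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in the AES field (reduction polynomial 0x11B)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def rotl8(b: int, n: int) -> int:
    return ((b << n) | (b >> (8 - n))) & 0xFF


def sbox(b: int) -> int:
    """AES S-box: inversion (b^254) followed by the affine transformation."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, b)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox(b) for b in range(256)]

# Round constants 0x01, 0x02, ..., 0x36 (repeated doubling in the field).
RCON = [1]
for _ in range(9):
    RCON.append(gf_mul(RCON[-1], 2))


def g(word, rcon):
    """RotWord, SubWord, then XOR the round constant into the first byte."""
    out = [SBOX[b] for b in word[1:] + word[:1]]
    out[0] ^= rcon
    return out


def xor_words(a, b):
    return [x ^ y for x, y in zip(a, b)]


def next_round_key(key: bytes, rcon: int) -> bytes:
    """Encryption direction: round key r -> round key r+1."""
    w = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    n0 = xor_words(w[0], g(w[3], rcon))
    n1 = xor_words(w[1], n0)
    n2 = xor_words(w[2], n1)
    n3 = xor_words(w[3], n2)
    return bytes(n0 + n1 + n2 + n3)


def prev_round_key(key: bytes, rcon: int) -> bytes:
    """Decryption direction: round key r+1 -> round key r."""
    w = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    p3 = xor_words(w[3], w[2])
    p2 = xor_words(w[2], w[1])
    p1 = xor_words(w[1], w[0])
    p0 = xor_words(w[0], g(p3, rcon))
    return bytes(p0 + p1 + p2 + p3)


# Walk forward through the ten rounds, then back to the initial key.
initial = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
rk = initial
for r in range(10):
    rk = next_round_key(rk, RCON[r])
last = rk
for r in reversed(range(10)):
    rk = prev_round_key(rk, RCON[r])
assert rk == initial
```

Decryption needs the last round key as its starting point; this sketch obtains it by running the forward schedule once.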

Coming back to round function part
- Major components
  - Inversion
  - Linear operations
  - Bit-parallel XOR
  - Selectors
  - (Inv)ShiftRows
- Performance depends on the construction of the inversion and the linear operations
  - Inversion: use an adoptable state-of-the-art design
  - Linear operations: depend on the XOR matrices

Multiplicative-offset
- Increase the variation in the construction of XOR matrices
  - To find optimal XOR matrices with lower Hamming weights (HWs)
- Multiply the intermediate value d_{i,j}^(r) by an offset value c and store c·d_{i,j}^(r) in the register
  - Multiplication by a fixed value is an XOR matrix operation
  - c is taken from GF(2^8) excluding 0
[Figure: original encryption flow (simplified), with pre-round, round, and post-round stages. Blocks: Iso. mapping, Inversion, Unified Affine, Iso. mapping^-1. Signals: plaintext, d_{i,j}^(1), d_{i,j}^(r), d_{i,j}^(r+1), d_{i,j}^(11), ciphertext.]

Multiplicative-offset
[Figure: proposed encryption flow (simplified), with pre-round, round, and post-round stages. Blocks: Iso. mapping, Multiply c, Inversion, Multiply c^2, Unified Affine, Multiply c^-1, Iso. mapping^-1. Signals: plaintext, c·d_{i,j}^(1), c·d_{i,j}^(r), c·d_{i,j}^(r+1), c·d_{i,j}^(11), ciphertext.]

Multiplicative-offset
- Reduce the HW of the XOR matrices for linear operations by 10%
[Figure: encryption flow (simplified) after merging the constant multiplications into the neighboring linear blocks, with pre-round, round, and post-round stages. Blocks: Merged mapping, Inversion, Merged Unified Affine, Merged mapping^-1. Signals: plaintext, c·d_{i,j}^(1), c·d_{i,j}^(r), c·d_{i,j}^(r+1), c·d_{i,j}^(11), ciphertext. The slide labels this figure "Original encryption flow (simplified)".]
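The statement that multiplication by a fixed value is an XOR matrix operation can be illustrated as follows. The sketch builds the 8x8 binary matrix for multiplication by a constant c, merges it with another linear map by matrix composition, and uses the Hamming weight of the merged matrix as a rough proxy for XOR gate count. It works in the ordinary AES polynomial basis and uses the linear part of the AES affine transformation as a stand-in linear operation; both are assumptions for illustration, since the talk's actual matrices are over a tower field and are not given in the slides. The search in `best_offset` is likewise a hypothetical, brute-force stand-in for the offset selection.

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply in the AES field (reduction polynomial 0x11B)."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r


def matrix_of(f):
    """Matrix of a GF(2)-linear byte map f, stored as 8 row bitmasks.

    Bit j of row i is set when input bit j contributes to output bit i.
    """
    cols = [f(1 << j) for j in range(8)]  # images of the basis vectors
    return [sum(((cols[j] >> i) & 1) << j for j in range(8)) for i in range(8)]


def apply(rows, x: int) -> int:
    """Matrix-vector product over GF(2): each output bit is a parity."""
    return sum((bin(row & x).count("1") & 1) << i for i, row in enumerate(rows))


def compose(a_rows, b_rows):
    """Matrix of the merged map x -> A(B(x))."""
    return matrix_of(lambda x: apply(a_rows, apply(b_rows, x)))


def mul_matrix(c: int):
    """XOR matrix for multiplication by the constant c."""
    return matrix_of(lambda x: gf256_mul(c, x))


def hamming_weight(rows) -> int:
    """Number of ones in the matrix."""
    return sum(bin(row).count("1") for row in rows)


def rotl8(b: int, n: int) -> int:
    return ((b << n) | (b >> (8 - n))) & 0xFF


def affine_linear(b: int) -> int:
    """Linear part of the AES affine transformation (constant 0x63 omitted)."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4)


AFFINE = matrix_of(affine_linear)


def best_offset(rows):
    """Try every nonzero c and return (weight, c) for the lightest merged matrix."""
    return min((hamming_weight(compose(mul_matrix(c), rows)), c)
               for c in range(1, 256))


weight, c = best_offset(AFFINE)
print("weight without offset:", hamming_weight(AFFINE))
print("lightest merged matrix: weight", weight, "at c =", hex(c))
```

Because c = 1 is among the candidates, the search can never do worse than the original matrix; any gain comes from the extra freedom the offset provides.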

Performance comparison
- Synthesized the proposed and conventional architectures
  - Logic synthesis: Design Compiler
  - Technology: Nangate 45-nm Open Cell Library

                    Area (GE)    Latency (ns)   Throughput (Gbps)   Efficiency (Kbps/GE)
Satoh et al.        16,628.67    24.97          5.64                339.10
Lutz et al.         28,301.33    16.20          7.90                279.18
Liu et al.          15,335.67    29.70          4.74                309.13
Mathew et al.       21,429.33    30.80          4.57                213.33
This work w/o MO    18,013.00    16.28          8.65                480.49
This work w/ MO     17,368.67    15.84          8.89                511.78

- 51-57% more efficient than the conventional ones
  - Multiplicative-offset (MO) improves efficiency by 7-9%
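The efficiency column can be reproduced from the other columns as throughput divided by area. The short check below recomputes it from the rounded table values; small differences from the listed figures are expected because the throughput column is rounded to two decimals.

```python
# (area in GE, throughput in Gbps, listed efficiency in Kbps/GE), from the table
DESIGNS = {
    "Satoh et al.":     (16628.67, 5.64, 339.10),
    "Lutz et al.":      (28301.33, 7.90, 279.18),
    "Liu et al.":       (15335.67, 4.74, 309.13),
    "Mathew et al.":    (21429.33, 4.57, 213.33),
    "This work w/o MO": (18013.00, 8.65, 480.49),
    "This work w/ MO":  (17368.67, 8.89, 511.78),
}


def efficiency_kbps_per_ge(area_ge: float, throughput_gbps: float) -> float:
    """Throughput per gate equivalent; 1 Gbps = 1e6 Kbps."""
    return throughput_gbps * 1e6 / area_ge


for name, (area, thru, listed) in DESIGNS.items():
    print(f"{name:18s} computed {efficiency_kbps_per_ge(area, thru):7.2f}  listed {listed:7.2f}")

# Gain of this work (with MO) over the most efficient conventional design.
best_conventional = max(v[2] for k, v in DESIGNS.items() if not k.startswith("This work"))
gain_percent = 100 * (DESIGNS["This work w/ MO"][2] / best_conventional - 1)
print(f"gain over best conventional: {gain_percent:.1f}%")
```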

Evaluation of power/energy consumption
- Gate-level timing simulation with back-annotation for estimating power consumption
  - Taking glitch effects into account

Power consumption and power-latency (PL) product at encryption

                    Power (uW) @ 100 MHz   PL product
Satoh et al.        902                    22,523
Lutz et al.         735                    11,907
Liu et al.          1,010                  29,997
Mathew et al.       1,390                  42,812
This work w/o MO    569                    9,263
This work w/ MO     465                    7,366

- Our architecture achieved the lowest power/energy
  - MO achieves a further reduction of 7-24%
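The PL product values are consistent with multiplying the power in this table by the latency (in ns) reported in the performance comparison. The check below makes that relation explicit; treating the result as an energy in uW·ns (that is, femtojoules) is my reading of the units, since the slide does not state them.

```python
# (power in uW at 100 MHz, latency in ns, listed PL product)
MEASUREMENTS = {
    "Satoh et al.":     (902, 24.97, 22523),
    "Lutz et al.":      (735, 16.20, 11907),
    "Liu et al.":       (1010, 29.70, 29997),
    "Mathew et al.":    (1390, 30.80, 42812),
    "This work w/o MO": (569, 16.28, 9263),
    "This work w/ MO":  (465, 15.84, 7366),
}


def pl_product(power_uw: float, latency_ns: float) -> float:
    """Power-latency product; uW times ns gives femtojoules."""
    return power_uw * latency_ns


for name, (power, latency, listed) in MEASUREMENTS.items():
    print(f"{name:18s} computed {pl_product(power, latency):9.1f}  listed {listed}")
```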

Encryption-only architecture
- Designed encryption-only hardware based on our philosophy
  - Compared with a representative open-source IP (SASEBO IP) and a state-of-the-art one [ARITH 2016]

                        Area (GE)    Latency (ns)   Throughput (Gbps)   Throughput/GE   Power (uW)   PL product
SASEBO IP (Table)       23,085.00    11.64          12.00               519.66          352          4,097
SASEBO IP (Comp)        11,431.67    23.04          6.06                530.16          513          11,820
ARITH 2016 (Type-I)     12,108.33    23.87          5.90                487.16          655          14,266
ARITH 2016 (Type-II)    13,249.33    21.78          6.46                487.92          755          18,022
This work               12,127.00    13.97          10.08               831.10          279          3,898

- Our architecture is 58-64% more efficient
  - Also advantageous in power/energy consumption
