Smashing the Implementation Records of AES S-box
Arash Reyhani-Masoleh, Mostafa Taha, and Doaa Ashmawy
Western University, London, Ontario, Canada
CHES 2018
Outline
• Introduction.
• Proposed AES S-box Architecture.
• New Logic-Minimization Algorithms.
• New GF((2^4)^2) Inversion.
• New Exponentiation Stage.
• New Representation of Subfield Inversion.
• New Output Multipliers.
• Comparisons and Concluding Remarks.
Introduction
[Timeline figure; axis marks: 1998, 2001, 2005, 2010, 2015, 2016, 2018]
• Milestones:
  • First introduction of Rijndael (Rijmen & Daemen).
  • Standardization of Rijndael as the AES.
  • First implementation using tower fields (Satoh et al.).
• Target: small area
  • Most compact S-box (Canright).
  • Number of gates in Canright's S-box reduced to 115 (Boyar and Peralta), then to 113 (CMT).
• Target: small delay / high efficiency
  • Depth of the S-box reduced to 16 gates (Boyar, Find and Peralta).
  • Most efficient S-box (Ueno et al.).
In this paper, we propose:
1. The most compact S-box to date.
2. The most efficient S-box to date.
Implementation Pitfalls
1. Using AND gates, although NAND gates have smaller area and delay in all technology libraries.
2. Using only simple gates, although compound gates (AND-OR-Invert, OR-AND-Invert) may be more efficient.
• We improved previous designs built from AND gates by converting them to NAND/NOR gates (targeting the STM 65-nm CMOS standard library):

| S-box | Area (GEs), Original | Area (GEs), Improved | Delay (ns), Original | Delay (ns), Improved |
|---|---|---|---|---|
| Canright [Can05b] | 200 | | 1.253 | |
| 113-gates [Boy16] | 202 | 194 | 1.523 | 1.346 |
| Depth-16 (2012) [BP12] | 230.5 | 222 | 0.960 | 0.906 |
| Depth-16 (2017) [BFP17] | 224.5 | 216 | 0.957 | 0.912 |
| Ueno et al. [UHS+15] | 256.5 | 238 | 0.831 | 0.772 |

• The smallest original: Canright (200 GEs). The smallest improved: 113-gates (194 GEs).
• The fastest original: Ueno et al. (0.831 ns). The fastest improved: Ueno et al. (0.772 ns).
At the end, we compare only against the improved versions. Formulations of the improved designs are included in the paper.
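As a small illustration of why such a conversion can preserve functionality, the sketch below checks by truth table that an AND feeding an XOR computes the same function as a NAND feeding an XNOR, and models an AND-OR-Invert compound gate. The helper names (`nand`, `xnor`, `aoi21`) are hypothetical and chosen for this example only; the actual formulations of the improved designs are those given in the paper.

```python
from itertools import product

def nand(a, b):
    """2-input NAND on single bits."""
    return 1 - (a & b)

def xnor(a, b):
    """2-input XNOR on single bits."""
    return 1 - (a ^ b)

def aoi21(a, b, c):
    """AND-OR-Invert compound gate: NOT((a AND b) OR c)."""
    return 1 - ((a & b) | c)

# (a AND b) XOR c  ==  (a NAND b) XNOR c for every input combination:
# the two inversions cancel, so the AND can be swapped for a NAND.
and_xor_matches = all(
    ((a & b) ^ c) == xnor(nand(a, b), c)
    for a, b, c in product((0, 1), repeat=3)
)

# One AOI21 cell replaces an AND followed by a NOR.
aoi_matches = all(
    aoi21(a, b, c) == 1 - ((a & b) | c)
    for a, b, c in product((0, 1), repeat=3)
)

print(and_xor_matches, aoi_matches)
```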
AES S-box
• Original S-box: g → inversion in GF(2^8) → × M → + h → s.
• Typical implementation using composite fields in normal basis.
[Block diagram; labels: g, X^-1, composite field inversion, ( )^2, X, × M, + h, s]
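The original S-box definition (inversion in GF(2^8) followed by the affine step × M, + h) can be sketched directly in software as a reference model. This is only a bit-level reference, not the composite-field hardware structure discussed in the talk; the function names are hypothetical, and the sketch assumes the standard AES parameters (reduction polynomial x^8 + x^4 + x^3 + x + 1 and constant h = 0x63).

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def gf_inv(a):
    """Inverse computed as a^254; this maps 0 to 0, as the S-box requires."""
    r = 1
    for _ in range(254):
        r = gf_mul(r, a)
    return r

def sbox(g):
    """AES S-box: field inversion, then the affine map (x M + h)."""
    x = gf_inv(g)
    s = 0
    for i in range(8):
        # Row i of M: x_i + x_(i+4) + x_(i+5) + x_(i+6) + x_(i+7), indices mod 8.
        bit = (x >> i) & 1
        for k in (4, 5, 6, 7):
            bit ^= (x >> ((i + k) % 8)) & 1
        s |= bit << i
    return s ^ 0x63  # add the constant h

table = [sbox(g) for g in range(256)]
print(hex(table[0x00]), hex(table[0x01]), hex(table[0x53]))
```

A model like this is mainly useful as a golden reference for exhaustively checking a gate-level S-box over all 256 inputs.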
Proposed AES S-box Architecture
• 12 terms are shared between the Exponentiation and Multipliers.
[Block diagram: g → T_in → composite field inversion → T_out → s; signal widths marked 12, 6, 5, 10, 5, 6]
• Block annotations: new logic-minimization algorithms (marked twice), new and improved representations, new formulations (marked twice), new multipliers.
• Everything is optimized by hand and by CAD tools at various abstraction levels, promoting the use of NAND/NOR and compound gates.
Logic-Minimization Algorithms
• Implement the isomorphic transformation matrices using the smallest number of gates.
• NP-hard problem [BMP08].
[Figure: g → T_in → input representation in GF((2^4)^2) and 12 shared terms]
Logic-Minimization Algorithms (cont.)
[Figure: first 8 rows of T_in, with callouts 1 to 3 marking the steps below]
• Previous work:
  • Cancellation-free search: gates are never used to cancel out common terms (Canright [Can05b] and Paar [Paa94]).
  • Heuristics (with cancellation): Normal-BP (Boyar and Peralta [BP10]):
    1. Test adding one gate.
    2. Compute the distance to each target (assuming no sharing).
    3. Select a gate leading to the minimum average distance. Resolve ties using different methods.
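The three Normal-BP steps can be sketched for XOR-only circuits as follows. This is a simplified illustration under stated assumptions, not the authors' implementation or the exact procedure of [BP10]: the names `distance` and `bp_greedy` are hypothetical, matrix rows are encoded as integer bitmasks over the inputs, distances are found by brute force (practical only for tiny matrices), and ties are broken by preferring the larger squared Euclidean norm of the distance vector, which is just one of several possible tie-breaking methods. Minimizing the sum of distances is equivalent to minimizing their average.

```python
from itertools import combinations

def distance(base, target):
    """Fewest XOR gates needed to build `target` from `base`, assuming no sharing."""
    if target in base:
        return 0
    for k in range(2, len(base) + 1):
        for combo in combinations(base, k):
            acc = 0
            for v in combo:
                acc ^= v
            if acc == target:
                return k - 1  # XOR of k signals costs k - 1 gates
    raise ValueError("target not reachable from base")

def bp_greedy(n_inputs, targets):
    """Greedy search; returns a list of gates as (output, operand_a, operand_b)."""
    base = [1 << i for i in range(n_inputs)]  # one bitmask per input
    dists = [distance(base, t) for t in targets]
    gates = []
    while any(dists):
        best = None
        for a, b in combinations(base, 2):  # step 1: test adding one gate
            cand = a ^ b
            if cand in base:
                continue
            # Step 2: distance to each target with the candidate included.
            new_dists = [distance(base + [cand], t) for t in targets]
            # Step 3: minimum total distance; ties go to the larger norm.
            key = (sum(new_dists), -sum(d * d for d in new_dists))
            if best is None or key < best[0]:
                best = (key, cand, a, b, new_dists)
        _, cand, a, b, dists = best
        base.append(cand)
        gates.append((cand, a, b))
    return gates

# Toy targets over x0..x3 (bit i stands for x_i):
# x0+x1, x0+x1+x2, x0+x1+x2+x3, x1+x2+x3
targets = [0b0011, 0b0111, 0b1111, 0b1110]
gates = bp_greedy(4, targets)
for out, a, b in gates:
    print(f"{out:04b} = {a:04b} ^ {b:04b}")
```

In this toy example the last target, x1+x2+x3, can be obtained as x0 + (x0+x1+x2+x3). That gate cancels x0, which is exactly the kind of step a cancellation-free search never considers.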