Better than Brute-Force – Optimized Hardware Architecture for Efficient Biclique Attacks on AES-128
Andrey Bogdanov*, Elif Bilge Kavun**, Christof Paar**, Christian Rechberger***, Tolga Yalcin**
* KU Leuven, Belgium, ** HGI-RUB, Germany, *** DTU, Denmark
Overview
• Meet-in-the-Middle with Bicliques
• Low Data Complexity Biclique Cryptanalysis of AES-128
• Optimized Brute Force Attack on AES-128
  – on FPGA
  – on ASIC
• Biclique Attack on AES-128
  – on FPGA
  – on ASIC
• Conclusion
MITM with Bicliques
• Allow all key bits to affect a part of the cipher
• Stick to a structure that enables efficient enumeration of keys and states in this part
• Structure = biclique!
[Figure: MITM with a biclique; labels: plaintexts {P}, ciphertexts {C}, K1, K2, all key bits, encryption oracle]
Low Data Complexity Biclique Cryptanalysis of AES-128
• Start modifications in the first round of AES-128
• Divide the entire space of $2^{128}$ keys into $2^{124}$ non-overlapping groups of $2^{4}$ keys
• Fix a base key and enumerate all other keys in the key group
  $x = (x_0 x_1 x_2 x_3 x_4 x_5 0 0)_2$, $a = (0 0 0 0 0 0 a_1 a_0)_2$
  $y = (y_0 y_1 y_2 y_3 y_4 y_5 0 0)_2$, $b = (0 0 0 0 0 0 b_1 b_0)_2$
• Modify the base key at two byte positions independently (in $2^{2}$ ways each)
• Follow the propagation of the modifications forwards and backwards
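The key-group structure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the byte positions `POS_A` and `POS_B` are hypothetical (the slide does not say which two key bytes are modified), and the function name `key_group` is made up for this example. The one property taken from the slide is that a base key has the two low bits of both active bytes set to zero, and that each byte is modified in $2^{2}$ ways.

```python
from itertools import product

# Hypothetical positions of the two independently modified key bytes;
# the slide does not state which positions the attack actually uses.
POS_A, POS_B = 0, 1


def key_group(base_key: bytes) -> list[bytes]:
    """Return the 16 keys of the group defined by base_key."""
    if len(base_key) != 16:
        raise ValueError("AES-128 keys are 16 bytes long")
    # In a base key, the two low bits of both active bytes are zero.
    if (base_key[POS_A] | base_key[POS_B]) & 0b11:
        raise ValueError("not a base key: low bits of active bytes must be 0")
    keys = []
    # a and b each take 2^2 values, giving 2^4 keys per group.
    for a, b in product(range(4), repeat=2):
        key = bytearray(base_key)
        key[POS_A] ^= a
        key[POS_B] ^= b
        keys.append(bytes(key))
    return keys


group = key_group(bytes(16))
```

Because four key bits are fixed to zero in every base key, there are $2^{124}$ base keys, and the groups they define do not overlap.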
Low Data Complexity Biclique Cryptanalysis of AES-128
[Figure: recomputation at matching]
Low Data Complexity Biclique Cryptanalysis of AES-128
Complexities:
• Computational complexity to precompute all states $S_a$ and $S_b$ in each key group: 0.3 AES-128 runs (first step)
• About 7.12 AES-128 runs to test all 16 keys in the key group (second step)
• Negligible computational complexity ($2^{-32}$) for false positives
• Overall computational complexity: $2^{124} (0.3 + 7.12) = 2^{126.89}$ AES executions
• Data complexity: only 16 chosen plaintexts!
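The overall complexity figure can be checked with a short calculation. The sketch below uses only the numbers from the slide; the comparison against $2^{128}$ for exhaustive search is added here for context, and the variable names are illustrative.

```python
import math

GROUPS_LOG2 = 124        # number of key groups is 2^124
PRECOMPUTE_COST = 0.3    # AES-128 runs per group, first step
TEST_COST = 7.12         # AES-128 runs per group, second step


def total_log2(groups_log2: float, per_group_cost: float) -> float:
    """log2 of (2^groups_log2 * per_group_cost)."""
    return groups_log2 + math.log2(per_group_cost)


biclique_log2 = total_log2(GROUPS_LOG2, PRECOMPUTE_COST + TEST_COST)
# Gain over exhaustive search, which needs 2^128 AES executions.
advantage = 2 ** (128 - biclique_log2)
```

Here `biclique_log2` rounds to 126.89, and `advantage` equals 16 / 7.42, a gain of a little over a factor of 2 compared with testing 16 keys by 16 full AES runs.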
Implementation
• FPGA target platform: RIVYERA Computing Cluster
  – 128 Xilinx Spartan-3 XC3S5000 high-performance FPGAs
  – Equivalent computing power of 640 million system gates
• ASIC target technology: NANGATE
  – 45 nm Generic Library
Optimized Brute-Force Attack on AES-128
• Highly pipelined architecture for highest possible speed (11-stage pipeline within each AES round)
• Composite field inverters over $GF(((2^2)^2)^2)$ for S-boxes
• Register-based (RAMless) design
  – suitable for both FPGA and ASIC implementation
[Figure: block diagram; labels: Key Gen, Fixed Plaintext, Round-1 to Round-10 with round keys K1 to K9 and states S1 to S10, ORACLE, Output, Byte Match = FF?]
Optimized Brute-Force Attack on AES-128
• Design implemented in two flavors:
  – All identical rounds (for a fair comparison with respect to the original biclique advantage figures)
  – Partial matching in the last three rounds (for better area utilization; makes no difference for FPGA)
• Smaller and faster than the fastest reported design (362 KGE vs 660 KGE and 2.5 GHz vs 2 GHz)
[Figure: block diagram; labels: Key Gen, Fixed Plaintext, Round-1 to Round-10 with round keys K1 to K9 and states S1 to S10, ORACLE, Output, Byte Match = FF?]
Optimized Brute-Force Attack on AES-128
* Pipeline register cost negligible for FPGA implementation – already part of the slice!
Optimized Brute-Force Attack on AES-128
[Figure: composite field inverter; the 8-bit input is split into nibbles [7:4] and [3:0], which pass through $GF((2^2)^2)$ multipliers and a $GF((2^2)^2)$ inverter to form the output]
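The idea behind the composite field inverter can be illustrated in software: an 8-bit inversion is reduced to a handful of 4-bit multiplications and a single 4-bit inversion. The sketch below is not the authors' circuit. It assumes a 4-bit subfield modulo $x^4 + x + 1$ (implemented directly rather than as $GF((2^2)^2)$), picks the extension constant `LAM` by search, and omits the basis change and affine map needed to obtain the actual AES S-box. All function names are illustrative.

```python
def gf16_mul(a: int, b: int) -> int:
    """Multiply in GF(2^4) modulo x^4 + x + 1 (an assumed polynomial)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13  # reduce by x^4 + x + 1
    return r


def gf16_inv(a: int) -> int:
    """Invert in GF(2^4) as a^14; maps 0 to 0."""
    r = 1
    for _ in range(14):
        r = gf16_mul(r, a)
    return r


# Choose LAM so that y^2 + y + LAM has no root in GF(2^4), i.e. is irreducible.
LAM = next(lam for lam in range(1, 16)
           if all(gf16_mul(t, t) ^ t ^ lam for t in range(16)))


def tower_mul(a: int, b: int) -> int:
    """Multiply bytes viewed as ah*y + al over GF(2^4), with y^2 = y + LAM."""
    ah, al, bh, bl = a >> 4, a & 0xF, b >> 4, b & 0xF
    hh = gf16_mul(ah, bh)
    high = hh ^ gf16_mul(ah, bl) ^ gf16_mul(al, bh)
    low = gf16_mul(hh, LAM) ^ gf16_mul(al, bl)
    return (high << 4) | low


def tower_inv(a: int) -> int:
    """Invert a byte using only 4-bit multiplications and one 4-bit inversion."""
    ah, al = a >> 4, a & 0xF
    # delta is the norm of a: ah^2 * LAM + ah * al + al^2
    delta = gf16_mul(gf16_mul(ah, ah), LAM) ^ gf16_mul(ah, al) ^ gf16_mul(al, al)
    d_inv = gf16_inv(delta)
    return (gf16_mul(ah, d_inv) << 4) | gf16_mul(ah ^ al, d_inv)
```

In hardware, the appeal of this decomposition is that the narrow subfield operations are small combinational blocks that pipeline well, which fits a register-based design without lookup-table RAMs.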
Optimized Brute-Force Attack on AES-128

FPGA Performance
Slice Utilization | % FPGA Utilization | Maximum Freq (MHz) | Keys tested/sec/FPGA
26949 / 33278 | 80.98 | 263.16 | 526 x 10^6

ASIC Performance
Core Area (GE) | Maximum Freq (MHz) | Average Power (mW) | Keys tested/mW
362181 | 2480 | 622.937 | 3.98 x 10^6
Biclique Attack on AES-128
Starting Point: Conceptual design
• Maps theory one-to-one to implementation
• Based on precomputation of all base and biclique states
• Not feasible for hardware implementation
  – Requires too many RAMs
  – Interconnection and control logic too complex to allow an area- and speed-efficient design
[Figure: block diagram; labels: Plaintext + Key, MixColumns, S, Key RAM(s) 0 and 1, State RAM(s) 0 and 1, Plaintext Memory, Regular (Full) Rounds (Round-4 to Round-8), Partial Rounds (Round-9, Round-10), round keys K3 to K9, states S3 to S10, ORACLE, Match]
Biclique Attack on AES-128
New Approach: Recomputation
• On-the-fly calculation of base and biclique states
• Pipeline registers act as state storage media
  – No additional RAMs/registers required (virtual storage)
• Similar to the optimized brute-force attack in structure
  – Simpler control logic and interconnections
[Figure: block diagram; labels: Key Gen, Ptxt Gen, K, P, Biclique Rounds (Round-1 to Round-3), Regular (Full) Rounds (Round-4 to Round-8), Partial Rounds (Round-9, Round-10), round keys K1 to K9, states S1 to S10, ORACLE, Match]
Biclique Attack on AES-128
First “Biclique” Round:
• Serial AES implementation
• 8-bit (!) datapath
• Single S-Box
Biclique Attack on AES-128
Second “Biclique” Round:
• Slightly modified serial AES implementation
• Still 8-bit (!) datapath
• Two S-Boxes
• Limited additional storage (shift registers) for biclique states
Biclique Attack on AES-128
Third “Biclique” Round:
• Serial AES implementation on 4 separate paths
• Still 8-bit (!) datapath (on each path)
• Four S-Boxes
• Slightly more complex control logic
• More registers for double-buffering of biclique states (still shift registers with minimal cost)
• Only covers the “SubBytes” stage of a full AES round; the rest is implemented as in a regular round
Biclique Attack on AES-128

FPGA Performance
Slice Utilization | % FPGA Utilization | Maximum Freq* (MHz) | Keys tested/sec/FPGA
30720 / 33278 | 92.31 | 236.22 | 945 x 10^6

ASIC Performance
Core Area (GE) | Maximum Freq (MHz) | Average Power (mW) | Keys tested/mW
163912 | 1548 | 211.545 | 7.32 x 10^6

* Slower than the brute-force attack due to the reduced number of pipeline stages
Conclusion
• The fastest brute-force attack implementation on AES-128
• The first biclique attack implementation on AES-128
  – Almost a factor of 2 speed and cost gain
  – Only 16 chosen plaintexts (versus $2^{88}$ in the original biclique attack paper)
• Suitable for both FPGA and ASIC implementation
• Applicable to AES-192 and AES-256 as well
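The "almost a factor of 2" claim can be related to the two performance tables with a short calculation. The sketch below takes the keys tested/sec/FPGA and keys tested/mW values as reported for the brute-force and biclique designs. Reading "cost gain" as the ratio of keys tested per mW is an assumption made here, and the dictionary and function names are illustrative.

```python
# Reported figures from the FPGA and ASIC performance tables.
BRUTE_FORCE = {"keys_per_sec_fpga": 526e6, "keys_per_mw": 3.98e6}
BICLIQUE = {"keys_per_sec_fpga": 945e6, "keys_per_mw": 7.32e6}


def gain(metric: str) -> float:
    """Ratio of the biclique figure to the brute-force figure."""
    return BICLIQUE[metric] / BRUTE_FORCE[metric]


speed_gain = gain("keys_per_sec_fpga")    # FPGA throughput ratio
efficiency_gain = gain("keys_per_mw")     # ASIC keys-per-mW ratio
```

Both ratios come out at roughly 1.8, which is consistent with the stated gain of almost a factor of 2.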