Lightweight Implementations of SHA-3 Candidates on FPGAs

Jens-Peter Kaps, Panasayya Yalla, Kishore Kumar Surapathi, Bilal Habib, Susheel Vadlamudi, Smriti Gurung, John Pham

Cryptographic Engineering Research Group (CERG), http://cryptography.gmu.edu
Department of ECE, Volgenau School of Engineering, George Mason University, Fairfax, VA, USA

12th International Conference on Cryptology in India, Indocrypt 2011

Outline

1. Introduction
2. Methodology
3. Implementations
4. Results

Hash Function Competition

- A hash algorithm reads an arbitrary-length message and produces a fixed-length bit string called the hash value or message digest.
- Main applications: digital signatures, Message Authentication Codes (MAC), Universally Unique Identifiers (UUID/GUID), password tables, and many more.
- NIST competition for a new secure hash algorithm, SHA-3:
  - Announced in Nov 2007, 64 entries submitted.
  - 14 selected for Round 2.
  - Currently in Round 3 → 5 finalists.
- NIST's selection criteria: security, HW/SW speed, scalability.

Motivation: Analyze the performance of the candidates in a constrained FPGA environment ⇒ determine scalability on FPGAs.
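
The fixed-length property can be checked with a minimal sketch. It assumes Python's standard `hashlib` module, whose `sha3_256` is the later standardized SHA-3 and not necessarily identical to any candidate version discussed in this talk. The helper name `digest_bits` is made up for illustration.

```python
import hashlib


def digest_bits(message: bytes) -> int:
    """Return the digest length in bits for an arbitrary-length message."""
    return len(hashlib.sha3_256(message).digest()) * 8


# Messages of very different lengths map to digests of the same length.
short_len = digest_bits(b"")
long_len = digest_bits(b"a" * 100_000)
```

Both calls return the same digest length, which matches the 256-bit digest versions targeted in this work.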

Previous Work on SHA-3 Candidates

- Several Throughput/Area-optimized implementations on FPGAs were published: Gaj et al. [CHES 2010], Matsuo et al. [SHA-3 conference 2010], Baldwin et al. [SHA-3 conference 2010].
- Only two are specifically low-area implementations of SHA-3 finalists: Kerckhof et al. [HASH 2011], Jungk et al. [Reconfig 2011].

Problem: Rating algorithm performance is difficult when implementations
- are on different devices,
- are made with different implementation goals and features,
- vary in both area and throughput, and
- support different I/O interface widths.

Our Goal

- Comprehensive set of lightweight implementations of all Round 2 SHA-3 candidates (except SIMD) and all SHA-3 finalists.
- All optimized for the same target → maximum Throughput-to-Area ratio for a given area budget.
- All use the same standardized interface.
- Implemented on different families for fair comparison with other reported results.

Target details:
- Xilinx Spartan 3, a low-cost FPGA family
- Budget: 400-600 slices, 1 Block RAM (BRAM)
- Implemented 256-bit digest versions only

Assumptions

- Implementing for minimum area alone can lead to unrealistic run-times.
  ⇒ Target: achieve the maximum Throughput/Area ratio for a given area budget.
- Realistic scenarios:
  - System on Chip: only a certain area is available.
  - Standalone: a smaller chip means lower cost, but the limit is the smallest chip available, e.g. 768 slices on the smallest Spartan 3 FPGA.
- Makes a fair comparison of lightweight implementations possible.
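
The optimization target can be sketched as a small selection routine: discard every design point that exceeds the area budget, then pick the one with the best Throughput/Area ratio. The design names and all numbers below are hypothetical placeholders, not results from this work.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Design:
    name: str
    area_slices: int
    throughput_mbps: float

    @property
    def tp_per_area(self) -> float:
        """Throughput/Area ratio in Mbps per slice."""
        return self.throughput_mbps / self.area_slices


def best_within_budget(designs: Iterable[Design], budget_slices: int) -> Optional[Design]:
    """Return the design with the highest TP/A that fits the budget, or None."""
    feasible = [d for d in designs if d.area_slices <= budget_slices]
    if not feasible:
        return None
    return max(feasible, key=lambda d: d.tp_per_area)


# Hypothetical design points for one algorithm at different folding factors.
candidates = [
    Design("folded_x8", area_slices=250, throughput_mbps=40.0),
    Design("folded_x4", area_slices=520, throughput_mbps=130.0),
    Design("folded_x2", area_slices=900, throughput_mbps=300.0),  # over budget
]

best = best_within_budget(candidates, budget_slices=600)
```

With these placeholder numbers the largest design has the best ratio overall but does not fit, so the routine picks the best ratio among the designs that do.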

Interface and Protocol

- Based on the interface and I/O protocol from Gaj et al. [CHES 2010].
- msg_len_ap, seq_len_ap (after padding) are given in 32-bit words.
- msg_len_bp, seq_len_bp (before padding) are given in bits.
- Data bus width: w = 16 bits.

For a message sent as n segments seq_0, ..., seq_{n-1}:

$$\text{msg\_len\_bp} = \sum_{i=0}^{n-2} \text{seq\_len\_ap}_i \cdot 32 + \text{seq\_len\_bp}_{n-1}$$

$$\text{msg\_len\_ap} = \sum_{i=0}^{n-1} \text{seq\_len\_ap}_i \cdot 32$$

[Figure: a) SHA interface: SHA core with ports clk, rst, din and dout (w bits each), src_ready, src_read, dst_ready, dst_write. b) SHA protocol: a stream of w-bit words, either msg_len_ap, msg_len_bp, message, or segment by segment as seq_len_ap_0, seq_0, seq_len_ap_1, seq_1, ..., seq_len_ap_{n-1}, seq_len_bp_{n-1}, seq_{n-1}.]
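
As a worked example of the two length formulas, the following sketch evaluates them for a segmented message. The function name and the sample segment sizes are assumptions chosen for illustration.

```python
from typing import List, Tuple


def message_lengths(seq_len_ap: List[int], last_seq_len_bp: int) -> Tuple[int, int]:
    """Evaluate the slide's formulas for a message sent in n segments.

    seq_len_ap      -- per-segment lengths after padding, in 32-bit words
    last_seq_len_bp -- length of the last segment before padding, in bits
    Returns (msg_len_bp, msg_len_ap) exactly as the sums are written on the
    slide, i.e. both scaled by 32.
    """
    if not seq_len_ap:
        raise ValueError("at least one segment is required")
    # All segments except the last contribute their full padded length.
    msg_len_bp = 32 * sum(seq_len_ap[:-1]) + last_seq_len_bp
    msg_len_ap = 32 * sum(seq_len_ap)
    return msg_len_bp, msg_len_ap


# Three segments of 16 words each; the last one carries 100 message bits.
msg_len_bp, msg_len_ap = message_lengths([16, 16, 16], last_seq_len_bp=100)
```

Here the first two segments contribute 2 · 16 · 32 = 1024 bits, so msg_len_bp is 1124 and the padded total is 1536.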

BLAKE-256 Algorithm

Key features:
- Salt value: a user-dependent constant, 128 bits, all set to 0.
- 8 G functions: XOR, addition, shifting.
- P1, P2: permutations.
- BLAKE scales very well: it can be folded up to 4 times vertically and 4 times horizontally.

[Figure: BLAKE-256 block diagram with 512-bit message block M, constants C, counter T, salt, IV, initialization, 14 rounds of two layers of four G functions with permutations P1 and P2, and 256-bit hash H. Inset: G function on 32-bit words A, B, C, D with inputs CM and rotations by 16, 12, 8, and 7.]
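
The G function can be sketched in software as two structurally identical halves that differ only in their rotation amounts. This is a sketch based on the published BLAKE-256 specification, not on the hardware described in this talk: the right-rotation direction is taken from that specification, and `cm0` and `cm1` stand for the already combined message/constant words (the CM inputs in the figure). The function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def rotr32(x: int, n: int) -> int:
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def half_g(a, b, c, d, cm, rot_d, rot_b):
    """One half of G: two additions, two XORs, two rotations."""
    a = (a + b + cm) & MASK32
    d = rotr32(d ^ a, rot_d)
    c = (c + d) & MASK32
    b = rotr32(b ^ c, rot_b)
    return a, b, c, d


def g(a, b, c, d, cm0, cm1):
    """Full G: the same half applied twice with rotations (16, 12) then (8, 7)."""
    a, b, c, d = half_g(a, b, c, d, cm0, 16, 12)
    return half_g(a, b, c, d, cm1, 8, 7)
```

Because both halves share one structure, a single half-G datapath can be reused twice per G call, which is one way to fold the computation.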

BLAKE-256 Implementation

Implementation:
- Salt: BRAM
- State: DRAM (distributed RAM)
- Quasi-pipelined half G function
- Registers: reduce the critical path

Notes:
- Permutation causes a large controller with 210 addresses.
- BRAM contains constants, message, IV, and intermediate hash.
- Scalability: unfolding leads to worse TP/A.
- Improvement: rescheduling of G results in 290 clock cycles per block versus 350.

[Figure: BLAKE-256 datapath with 16-bit din/dout, dual-port BRAM (Port-A, Port-B), DRAM state storage, registers REG_A, REG_B1, REG_B2, REG_C, REG_1, REG_2, and a half G function on 32-bit words A, B, C, D.]
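
The effect of the rescheduling on throughput can be estimated with the usual relation throughput = block size × clock frequency / cycles per block. The sketch below uses the 512-bit BLAKE-256 block size and the two cycle counts from the slide; the 100 MHz clock is a hypothetical placeholder, not a measured result.

```python
def throughput_mbps(block_bits: int, cycles_per_block: int, f_clk_mhz: float) -> float:
    """Throughput in Mbit/s for long messages, ignoring finalization overhead."""
    return block_bits * f_clk_mhz / cycles_per_block


BLOCK_BITS = 512          # BLAKE-256 message block size
F_CLK_MHZ = 100.0         # hypothetical clock frequency, for illustration only

before = throughput_mbps(BLOCK_BITS, 350, F_CLK_MHZ)
after = throughput_mbps(BLOCK_BITS, 290, F_CLK_MHZ)

# The speedup is independent of the assumed clock frequency.
speedup = after / before
```

The ratio depends only on the cycle counts (350 / 290, roughly 1.2), so at unchanged area and clock frequency the Throughput/Area ratio improves by the same factor.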

Grøstl Algorithm

Key features:
- Based on an AES-like architecture: S-box, shift rows, mix columns.
- Grøstl scales well, like AES: it can be folded up to 8 times vertically.
- Small storage requirements.
- Uses many narrow memory accesses in parallel (8 per column).

[Figure: Grøstl compression function with 512-bit message block M, 512-bit chaining value H_{i-1} (initially IV), permutations P and Q of 10 rounds each (Addp / Addq, S-box, shift rows, mix), new chaining value H_i, and 256-bit hash H.]

Grøstl Implementation

Implementation:
- State P, Q: DRAM
- Shift Rows: realized by how data is accessed from DRAM
- Mix Column: GF multiplier (half multiplier)

Notes:
- Finalization takes as many clock cycles as 1 block.
- BRAM stores only intermediate hash and IV.
- One new column every 3 clock cycles, P & Q interleaved.
- Scalability: the number of clock cycles per column can be reduced by adding S-boxes and/or GF multipliers.

[Figure: Grøstl datapath with 16-bit din/dout, dual-port BRAM (Port-A, Port-B), four groups of 4xDRAM with 8-bit outputs, add constant, four S-boxes, GF multiplier (GFMul), and pipeline registers.]
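
Realizing Shift Rows through memory addressing can be sketched as follows: instead of moving bytes, each row is read at a column address offset by that row's shift amount. The shift vectors are taken from the published Grøstl-256 specification (final-round version) and are an assumption relative to the slide; the function names are made up for illustration.

```python
from typing import List, Sequence

# Per-row left-shift amounts for the 8x8 byte state (from the Grøstl-256 spec).
SHIFT_P = (0, 1, 2, 3, 4, 5, 6, 7)
SHIFT_Q = (1, 3, 5, 7, 0, 2, 4, 6)


def read_shifted_column(state: List[List[int]], col: int, shifts: Sequence[int]) -> List[int]:
    """Fetch column `col` of the shifted state using only address offsets.

    state[row][c] models one narrow (8-bit) memory per row; each row memory
    is read at address (col + shift) mod number_of_columns.
    """
    ncols = len(state[0])
    return [state[row][(col + shifts[row]) % ncols] for row in range(len(state))]


def shift_rows_explicit(state: List[List[int]], shifts: Sequence[int]) -> List[List[int]]:
    """Reference: rotate each row left by its shift amount (moves the data)."""
    return [row[s:] + row[:s] for row, s in zip(state, shifts)]


# Example state in which every byte encodes its own position: 8 * row + col.
state = [[8 * r + c for c in range(8)] for r in range(8)]
column0_p = read_shifted_column(state, 0, SHIFT_P)
```

Each column read touches all eight row memories at different addresses, which is consistent with the eight narrow parallel accesses per column noted for the algorithm.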

JH Algorithm

Key features:
- Grouping: reordering of the bits of the state
- S-box: permutation
- Linear transformation: rotation and XOR
- De-grouping: inverse of grouping
- Permutation, grouping, and de-grouping make scaling difficult
- Folding increases size

[Figure: JH compression function with 512-bit message block M, 1024-bit state H, grouping, 42 rounds of S-box, linear transformation L, and permutation P, and de-grouping.]