CryptoManiac: A Fast Flexible Architecture for Secure Communication
Lisa Wu, Chris Weaver, and Todd Austin
Presented by Dan Amelang
Background
• The rise of the Internet created demand for fast, efficient cryptographic processing
• General-purpose processors are sometimes sufficient, but they can be too power-hungry, too slow, or too busy with other tasks
Design
• Focus on private-key (symmetric) encryption only
• Study a handful of common ciphers (AES, Blowfish, etc.)
• Identify performance bottlenecks
Design
• Problems
  • Insufficient issue bandwidth
  • Insufficient functional unit resources
• Not problems
  • Branch mispredictions
  • Cache misses
• Some ciphers could use more parallelism
Design
• Found common operations: ADD, ROT, MULT, MOD, AND, XOR, SBOX, XBOX
• No need for division, square root, floating point, etc.
• A 32-bit datapath is sufficient
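A minimal Python sketch of several of these 32-bit operations, composed into the round function F of Blowfish (one of the studied ciphers), shows how SBOX lookups, ADD, and XOR chain together. The helper names are hypothetical. MOD and XBOX are left out because the slide does not define their exact semantics, and the S-boxes in the demo are placeholder identity tables, not the real Blowfish constants.

```python
from typing import Sequence

MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def add32(a: int, b: int) -> int:
    """ADD: addition modulo 2**32 (the carry out of bit 31 is dropped)."""
    return (a + b) & MASK32


def mul32(a: int, b: int) -> int:
    """MULT: low 32 bits of the product."""
    return (a * b) & MASK32


def xor32(a: int, b: int) -> int:
    """XOR: bitwise exclusive-or."""
    return (a ^ b) & MASK32


def rotl32(x: int, n: int) -> int:
    """ROT: rotate left by n bits; the rotation amount wraps modulo 32."""
    x &= MASK32
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def sbox(table: Sequence[int], index: int) -> int:
    """SBOX: look up a 32-bit entry using the low 8 bits of index."""
    return table[index & 0xFF]


def blowfish_f(x: int, s: Sequence[Sequence[int]]) -> int:
    """Blowfish F: ((S0[a] + S1[b]) ^ S2[c]) + S3[d], where a..d are the
    bytes of x, most significant first."""
    a, b, c, d = (x >> 24) & 0xFF, (x >> 16) & 0xFF, (x >> 8) & 0xFF, x & 0xFF
    t = add32(sbox(s[0], a), sbox(s[1], b))
    t = xor32(t, sbox(s[2], c))
    return add32(t, sbox(s[3], d))


if __name__ == "__main__":
    # Placeholder identity S-boxes, NOT the real Blowfish tables.
    identity = [list(range(256)) for _ in range(4)]
    print(hex(blowfish_f(0x01020304, identity)))  # 0x4
```

Every step is a 32-bit integer operation or a byte-indexed table read, which is consistent with the slide's observation that no division or floating point is needed.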
CryptoManiac
• 4-wide VLIW
• 32-bit, no cache, simple branch predictor
• Small static RAM: 1K instruction memory (IMEM), 4K data memory
• 1K SBOX cache in the functional units
• Triadic ISA (for instruction combining)
• Composable processing elements
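The triadic ISA can be illustrated with a small model, assuming that a combined instruction reads three source operands and applies two operations, op2(op1(a, b), c), in a single instruction slot. The names `Issuer`, `dyadic`, `triadic`, and `f_core` are hypothetical, and the model only counts instructions; it does not capture CryptoManiac's actual encoding or which operation pairs it can combine.

```python
from typing import Callable, Dict, Tuple

MASK32 = 0xFFFFFFFF

# Two-input primitive operations on 32-bit words.
OPS: Dict[str, Callable[[int, int], int]] = {
    "ADD": lambda a, b: (a + b) & MASK32,
    "XOR": lambda a, b: a ^ b,
    "AND": lambda a, b: a & b,
}


class Issuer:
    """Counts issued instructions; each method call models one instruction."""

    def __init__(self) -> None:
        self.issued = 0

    def dyadic(self, op: str, a: int, b: int) -> int:
        """Conventional instruction: one operation on two sources."""
        self.issued += 1
        return OPS[op](a, b)

    def triadic(self, op1: str, op2: str, a: int, b: int, c: int) -> int:
        """Hypothetical combined instruction: op2(op1(a, b), c), three sources."""
        self.issued += 1
        return OPS[op2](OPS[op1](a, b), c)


def f_core(s1: int, s2: int, s3: int, s4: int, combine: bool) -> Tuple[int, int]:
    """Compute ((s1 + s2) ^ s3) + s4, the part of Blowfish's F after the
    SBOX lookups. Returns (result, instructions issued)."""
    cpu = Issuer()
    if combine:
        t = cpu.triadic("ADD", "XOR", s1, s2, s3)
    else:
        t = cpu.dyadic("XOR", cpu.dyadic("ADD", s1, s2), s3)
    t = cpu.dyadic("ADD", t, s4)
    return t, cpu.issued


if __name__ == "__main__":
    print(f_core(1, 2, 3, 4, combine=False))
    print(f_core(1, 2, 3, 4, combine=True))
```

Under this assumption the post-lookup work drops from three instructions to two with the same result, freeing issue slots, which the design slides identify as a bottleneck.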
Processing Architecture
ISA
Functional Unit
Evaluation
• Verilog model, 250 nm process, 360 MHz
• About 1% of the size and power consumption of the Alpha 21264
• 2-wide, 3-wide, and 4-wide non-combining variants were also modeled
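Why issue width matters can be sketched with a toy greedy list scheduler that packs ready operations into bundles of `width` slots, giving priority to operations on the longest remaining dependence chain. The dependence graph (two independent Blowfish-style F computations), the unit latency, and the absence of functional-unit or SBOX-port limits are all assumptions for illustration. This is not the authors' toolchain, and its cycle counts are not the paper's results.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # op name -> names of the ops it depends on


def make_graph(copies: int = 2) -> Graph:
    """Hypothetical graph: `copies` independent Blowfish-style F computations.
    Each copy has four SBOX lookups feeding ADD -> XOR -> ADD; byte
    extraction is omitted to keep the example small."""
    graph: Graph = {}
    for k in range(copies):
        lookups = [f"sbox{i}_{k}" for i in range(1, 5)]
        for name in lookups:
            graph[name] = []
        graph[f"add1_{k}"] = [lookups[0], lookups[1]]
        graph[f"xor_{k}"] = [f"add1_{k}", lookups[2]]
        graph[f"add2_{k}"] = [f"xor_{k}", lookups[3]]
    return graph


def heights(graph: Graph) -> Dict[str, int]:
    """Length (in ops) of the longest dependence chain starting at each op."""
    succs: Dict[str, List[str]] = {n: [] for n in graph}
    for n, deps in graph.items():
        for d in deps:
            succs[d].append(n)
    memo: Dict[str, int] = {}

    def h(n: str) -> int:
        if n not in memo:
            memo[n] = 1 + max((h(s) for s in succs[n]), default=0)
        return memo[n]

    return {n: h(n) for n in graph}


def schedule(graph: Graph, width: int) -> List[List[str]]:
    """Greedy list scheduling: each cycle, issue up to `width` ready ops.
    An op is ready once all of its dependences issued in an earlier cycle
    (unit latency). Longer remaining chains go first; ties keep the graph's
    insertion order because list.sort is stable."""
    assert width >= 1
    prio = heights(graph)
    issued: Set[str] = set()
    bundles: List[List[str]] = []
    while len(issued) < len(graph):
        ready = [n for n in graph
                 if n not in issued and all(d in issued for d in graph[n])]
        ready.sort(key=lambda n: -prio[n])
        bundle = ready[:width]
        issued.update(bundle)  # becomes visible only to later cycles
        bundles.append(bundle)
    return bundles


if __name__ == "__main__":
    g = make_graph()
    for w in (1, 2, 3, 4):
        print(f"{w}-wide: {len(schedule(g, w))} cycles")
```

On this toy graph the schedule shrinks from 14 cycles at 1-wide to 4 cycles at 4-wide, where the four-operation dependence chain, rather than issue width, becomes the limit.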
Performance
Commercial Cryptographic Accelerators
• None are programmable
• For servers
  • Sun Crypto Accelerator 6000 PCIe card ($1,350)
  • IBM PCI Cryptographic Accelerator (~$2,000)
• For mobile devices
  • VIA PadLock
Recommendations
More Recommendations