Area and Time Tradeoffs in FPGAs: Examining the concept of area/time tradeoffs in FPGA design, pattern matching, and the Advanced Encryption Standard (AES). Richie Ung Trang (z5061606), Harry Gougousidis (z5159917), Henry Veng (z5113239)
Presentation Overview: 1. Background 2. FPGA Design Tradeoffs 3. AES 4. Pattern Matching 5. Timeline 6. Prototype
Motivation – Diverse Set of Vendors and Boards (1. Background): A market full of FPGAs with differing cost, performance, and power-consumption requirements. What are the circuit and architectural design attributes of an FPGA that trade off area and speed? What is the magnitude of these tradeoffs?
Motivation – Narrowing the Gap with ASICs: Learning which attributes affect area/time performance will help FPGAs narrow the gap to ASICs in at least one respect. Figure: ~1/3, ~14x, ~35x
Measuring Time in FPGA Design. Simple approach: take the circuits that make up the critical paths in a collection of benchmark designs and use their delays to create a performance metric.
Measuring Time in FPGA Design. Model approach: use the shortest register-to-register path within the FPGA that contains all unique components, then take a weighted average of the component delays, weighted by how frequently each component appears on critical paths.
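The weighted-average step of the model approach can be sketched in a few lines of Python. This is a minimal illustration, not the method from the underlying study: the function name `weighted_path_delay`, the component names and the picosecond values are hypothetical, and we assume each component's weight is simply its usage count on benchmark critical paths.

```python
def weighted_path_delay(delays_ps, usage_counts):
    """Weighted average delay of the components on a representative path.

    Hypothetical inputs:
      delays_ps    -- component name -> delay measured on the representative path (ps)
      usage_counts -- component name -> how often that component appears on
                      benchmark critical paths (used as the weight)
    """
    total_weight = sum(usage_counts[name] for name in delays_ps)
    if total_weight == 0:
        raise ValueError("at least one component needs a non-zero weight")
    weighted_sum = sum(delays_ps[name] * usage_counts[name] for name in delays_ps)
    return weighted_sum / total_weight


# Illustrative numbers only: a LUT appears 3 times for every routing segment.
print(weighted_path_delay({"lut": 200, "routing": 500}, {"lut": 3, "routing": 1}))  # 275.0
```

Weighting by usage means the components that appear most often on real critical paths have the most influence on the metric, rather than every component on the path counting equally.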
Measuring Area in FPGA Design – SRAM. The SRAM cell is the single most frequently repeated structure in the FPGA, so significant effort is spent optimizing the layout of the six transistors that store a single bit. Figure: Transistor Model
Measuring Area in FPGA Design – SRAM. Figure: Minimum Transistor Width Model
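A minimum-width transistor area model counts area in units of one minimum-width transistor instead of absolute layout area. The sketch below assumes one commonly used form of this model, in which a transistor of width W costs 0.5 + 0.5 * (W / W_min) minimum-width transistor areas (MWTAs); treat the formula and the function names `mwta` and `circuit_area` as assumptions, not as the exact model shown on the slide.

```python
def mwta(width_ratio):
    """Area of one transistor in minimum-width transistor areas (MWTAs).

    Assumed model: area = 0.5 + 0.5 * (W / W_min), so a minimum-width
    transistor (ratio 1.0) costs exactly 1 MWTA."""
    if width_ratio < 1:
        raise ValueError("a transistor cannot be narrower than minimum width")
    return 0.5 + 0.5 * width_ratio


def circuit_area(width_ratios):
    """Total area of a circuit given each transistor's W / W_min ratio."""
    return sum(mwta(r) for r in width_ratios)


# A 6-transistor SRAM bit built entirely from minimum-width transistors.
print(circuit_area([1.0] * 6))  # 6.0
```

Under this assumed model, doubling a transistor's width raises its cost from 1 to 1.5 MWTAs rather than 2, reflecting the fixed overhead (contacts, spacing) that every transistor pays regardless of width.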
Space/Time Tradeoff Results in FPGA Design. Figure: Some Comparative Results
PATTERN MATCHING ON FPGA By Henry Veng
Background: Pattern Matching • Process of finding a particular substring (the pattern) within a string
Background: Uses • Gene detection in DNA sequences • Network Intrusion Detection Systems (IDS)
Background: Network IDS Requirements • Match many patterns against one string • Support as many rules as possible • Operate in real time • Keep up with Internet line speeds
Design Overview • Custom implementation of the Knuth-Morris-Pratt (KMP) algorithm • Pipelining within pattern matching units • Linear array of pattern matching units
Design: KMP Algorithm Characteristics • The input stream only moves forward (characters already read are never re-read) • Good worst-case performance: linear in the combined length of the text and the pattern
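Both properties are visible in a standard software KMP search. The sketch below is the textbook algorithm, not the custom hardware unit from this design, and the function names `failure_table` and `kmp_search` are our own. The text index `i` only ever advances; on a mismatch the failure table moves the matcher back within the pattern instead of re-reading input, which gives O(n + m) worst-case time.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start index of every (possibly overlapping) match of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    fail = failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):  # the text index only moves forward
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # fall back inside the pattern, never re-read the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # keep going so overlapping matches are found
    return matches


print(kmp_search("GET /etc/passwd", "passwd"))  # [9]
```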
Design: Custom KMP Pattern Matching Units • Two comparators and a buffer • Buffer of a specific size • Throughput of one character per clock cycle
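Plain KMP can spend several fallback steps on one character, so a fixed cost per character needs extra structure. In software, one way to get exactly one step per input character is to precompute the KMP automaton (a DFA), so each character triggers a single table lookup. This is a software analogue of the one-character-per-cycle property, not the two-comparator unit itself; `build_kmp_dfa` and `dfa_search` are hypothetical names, and characters outside the given alphabet are assumed to reset the matcher to state 0.

```python
def build_kmp_dfa(pattern, alphabet):
    """Transition table dfa[state][ch] for the KMP automaton; state len(pattern) means 'match'."""
    if not pattern or not set(pattern) <= set(alphabet):
        raise ValueError("pattern must be non-empty and drawn from the alphabet")
    m = len(pattern)
    dfa = [dict.fromkeys(alphabet, 0) for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    restart = 0  # state the automaton would reach after reading pattern[1:j]
    for j in range(1, m + 1):
        dfa[j].update(dfa[restart])        # mismatch: behave like the restart state
        if j < m:
            dfa[j][pattern[j]] = j + 1     # match: advance one state
            restart = dfa[restart][pattern[j]]
    return dfa


def dfa_search(text, dfa):
    """Return match start indices using exactly one table lookup per input character."""
    m = len(dfa) - 1
    state, matches = 0, []
    for i, ch in enumerate(text):
        state = dfa[state].get(ch, 0)  # characters outside the alphabet reset to state 0
        if state == m:
            matches.append(i - m + 1)
    return matches


dfa = build_kmp_dfa("aba", "ab")
print(dfa_search("abababa", dfa))  # [0, 2, 4]
```

The price of the constant per-character cost is a table of (m + 1) x |alphabet| entries, a small instance of the area/time tradeoff this presentation is about.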
Design: Pipelining within Units • Two patterns share the same combinational circuit • Matching for the two patterns occurs out of phase • Lowers the hardware-per-pattern ratio • Allows an increase in clock speed. Figure: Pattern Memory, Combinational Circuit, Pattern Memory
Design: Linear Array of Pattern Matching Units • Input characters must pass through all units • Different patterns are loaded in different units • Allows parallelisation • Units are quickly reconfigurable
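The data movement through a linear array can be simulated cycle by cycle: each cycle a new character enters the first unit and every other unit takes the character its neighbour held on the previous cycle. In this sketch each hypothetical `MatchUnit` uses a simple shift-register comparison instead of the custom KMP unit, purely to keep the example short; the point is how characters flow through the array, not the matching logic inside each unit.

```python
from collections import deque


class MatchUnit:
    """Hypothetical unit: one pattern plus a shift register of the last len(pattern) characters."""

    def __init__(self, pattern):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.window = deque(maxlen=len(pattern))

    def clock(self, ch):
        """Consume one character; return True if the window now equals the pattern."""
        self.window.append(ch)
        return "".join(self.window) == self.pattern


def run_linear_array(text, patterns):
    """Simulate characters shifting through a linear array, one unit per clock cycle."""
    units = [MatchUnit(p) for p in patterns]
    regs = [None] * len(units)                 # character register in front of each unit
    stream = list(text) + [None] * len(units)  # trailing bubbles flush the array
    matches = []
    for cycle, incoming in enumerate(stream):
        regs = [incoming] + regs[:-1]          # every character moves one unit along
        for k, (unit, ch) in enumerate(zip(units, regs)):
            if ch is not None and unit.clock(ch):
                end = cycle - k                # unit k sees text[i] on cycle i + k
                matches.append((unit.pattern, end - len(unit.pattern) + 1))
    return matches


print(run_linear_array("xabcab", ["ab", "cab"]))  # [('ab', 1), ('ab', 4), ('cab', 3)]
```

Because each unit only talks to its neighbour, adding a pattern means adding a unit rather than widening a shared bus; the cost is that the last unit reports its matches one cycle later per unit in front of it.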
Area/Time Tradeoff Metrics • Time: throughput (Mb/s, Gb/s, etc.) • Area: logic cells used
Area/Time Tradeoff
Area/Time Tradeoff: U/Crete • Very high throughput (10.8 Gb/s) but high area cost (532 logic cells per 32-character unit) • Achieved through hardwired comparators, replicated 4 times • Supports ~100 rules, and reconfiguration is very slow
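The two metrics can be related with simple arithmetic. The sketch assumes throughput equals clock rate times characters consumed per cycle times 8 bits per character; the 100 MHz clock in the example is hypothetical, while the 532-cell, 32-character figures come from the U/Crete slide. The function names are our own.

```python
def throughput_gbps(clock_mhz, chars_per_cycle, bits_per_char=8):
    """Throughput in Gb/s, assuming a fixed number of characters consumed per clock cycle."""
    return clock_mhz * 1e6 * chars_per_cycle * bits_per_char / 1e9


def cells_per_char(logic_cells, unit_chars):
    """Area cost per pattern character, in logic cells."""
    return logic_cells / unit_chars


# U/Crete area figure from the slide: 532 logic cells per 32-character unit.
print(cells_per_char(532, 32))  # 16.625

# Hypothetical 100 MHz design consuming one character per cycle,
# compared with the same design replicated to consume four per cycle.
print(throughput_gbps(100, 1), throughput_gbps(100, 4))
```

Replicating comparators multiplies the characters handled per cycle and, to a first approximation, the comparator area as well, which is how a design like U/Crete trades logic cells for throughput.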
Conclusion: Different design decisions can shift the area/time tradeoff.
Advanced Encryption Standard By Harry Gougousidis
FPGAs and Encryption • FPGAs offer more flexibility and potential speedup than most hardware options. • Maximising data throughput requires balancing resource utilisation against time delays. • Encryption takes significant time to process data and is often required on devices with small resource pools.
Introduction to AES • Cryptography is popular in hardware because it uses relatively simple operations that are highly parallelisable. • Fixed data block length (128 bits), a fixed number of transformations, and a single key for both encryption and decryption. • Four transformation operations: a lookup table (SubBytes), matrix multiplication (MixColumns), byte shifting (ShiftRows), and XORing with the round key (AddRoundKey).
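The four transformations are small enough to write out directly. The following Python sketch implements them on a 4x4 byte state using the FIPS-197 names SubBytes, ShiftRows, MixColumns and AddRoundKey. It builds the S-box lookup table from the GF(2^8) inverse and affine map instead of hard-coding 256 entries, omits the key schedule, and is not constant-time, so it is an illustration only. The state is assumed to be stored as a list of four columns.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_entry(x):
    """Multiplicative inverse in GF(2^8) (0 maps to 0), then the AES affine transform."""
    inv = 1
    for _ in range(254):  # x^254 is the inverse of x for any non-zero x
        inv = gmul(inv, x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]  # the lookup table used by SubBytes


def sub_bytes(state):
    return [[SBOX[b] for b in col] for col in state]


def shift_rows(state):
    # Row r is rotated left by r positions; state is a list of 4 columns of 4 bytes.
    return [[state[(c + r) % 4][r] for r in range(4)] for c in range(4)]


def mix_single_column(a):
    # Multiply the column by the fixed MixColumns matrix over GF(2^8).
    return [
        gmul(a[0], 2) ^ gmul(a[1], 3) ^ a[2] ^ a[3],
        a[0] ^ gmul(a[1], 2) ^ gmul(a[2], 3) ^ a[3],
        a[0] ^ a[1] ^ gmul(a[2], 2) ^ gmul(a[3], 3),
        gmul(a[0], 3) ^ a[1] ^ a[2] ^ gmul(a[3], 2),
    ]


def mix_columns(state):
    return [mix_single_column(col) for col in state]


def add_round_key(state, round_key):
    return [[b ^ k for b, k in zip(col, kcol)] for col, kcol in zip(state, round_key)]


def aes_round(state, round_key):
    """One middle encryption round (the final round skips MixColumns)."""
    return add_round_key(mix_columns(shift_rows(sub_bytes(state))), round_key)


print(hex(SBOX[0x53]), [hex(b) for b in mix_single_column([0xDB, 0x13, 0x53, 0x45])])
# 0xed ['0x8e', '0x4d', '0xa1', '0xbc']
```

Only SubBytes needs a table; the other three steps are XORs, byte rewiring and small constant multiplications, which is why the choice of where to store the S-box (BRAM, distributed RAM or CLBs) matters so much for area.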
AES Enhancements • Inter-round: unrolling, pipelining • Intra-round: pipelining, partitioning
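A rough cycle-count model shows how the inter-round and intra-round options trade latency against blocks completed per cycle. This is a simplified sketch under stated assumptions: AES-128 with 10 rounds, independent blocks (for example a non-feedback mode) so that pipeline stages can be kept full, and clock-rate effects ignored. `CycleModel` and the three function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CycleModel:
    latency_cycles: int       # cycles from a block entering to its ciphertext leaving
    cycles_per_block: float   # average cycles between completed blocks


def iterative(rounds=10, stages_per_round=1):
    """One round circuit reused every cycle; intra-round registers add stages.

    With s stages, s independent blocks can share the loop, so one block still
    completes every `rounds` cycles on average (but at a higher clock rate)."""
    return CycleModel(rounds * stages_per_round, rounds)


def partially_unrolled(rounds=10, rounds_per_cycle=2):
    """Several rounds of combinational logic chained inside one clock cycle."""
    if rounds % rounds_per_cycle:
        raise ValueError("rounds_per_cycle must divide the number of rounds")
    cycles = rounds // rounds_per_cycle
    return CycleModel(cycles, cycles)


def unrolled_pipelined(rounds=10, stages_per_round=1):
    """All rounds laid out in hardware with registers between (and inside) rounds:
    a new block can enter every cycle."""
    return CycleModel(rounds * stages_per_round, 1)


for name, model in [("iterative", iterative()),
                    ("partially unrolled x2", partially_unrolled()),
                    ("unrolled + 2-stage rounds", unrolled_pipelined(stages_per_round=2))]:
    print(f"{name}: latency {model.latency_cycles} cycles, "
          f"{model.cycles_per_block} cycles/block")
```

In this model, unrolling alone reduces the cycle count but lengthens each cycle, while pipelining keeps cycles short and raises the number of blocks in flight at the cost of more registers and longer latency.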
AES Implementations • Most transformations are implemented in Configurable Logic Blocks (CLBs). • Byte substitution can be implemented in block RAM (BRAM), distributed RAM, or CLBs, each with a different effect on area and speed. • Conventional processors perform significantly worse than FPGAs. • ASICs run slightly faster than FPGAs but lack their flexibility.
Time/Area Tradeoffs • Full unrolling has a high area cost but gives a large performance increase. • Pipelining and partitioning give a large performance increase for a minimal area increase, but can increase latency substantially. • Performance can be measured by maximum throughput (data per second), latency (time until the first packet), and efficiency (throughput per slice).
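The three performance measures can be computed from a design's block size, clock rate, cycle counts and slice count. The sketch below assumes a 128-bit block and uses hypothetical inputs; the function names are our own, and no figures from the paper are reproduced.

```python
def throughput_mbps(block_bits, clock_mhz, cycles_per_block):
    """Maximum throughput in Mb/s: bits per block x blocks per microsecond."""
    return block_bits * clock_mhz / cycles_per_block


def latency_ns(latency_cycles, clock_mhz):
    """Time until the first block emerges, in nanoseconds (clock period = 1000 / MHz)."""
    return latency_cycles * 1000 / clock_mhz


def efficiency(throughput, slices):
    """Throughput per slice (Mb/s per slice)."""
    return throughput / slices


# Hypothetical iterative design: 128-bit blocks, 100 MHz, 10 cycles per block, 640 slices.
t = throughput_mbps(128, 100, 10)
print(t, latency_ns(10, 100), efficiency(t, 640))  # 1280.0 100.0 2.0
```

Under this model, a fully unrolled, pipelined design raises throughput by cutting cycles per block, but its much larger slice count can lower efficiency, which is consistent with the pattern on the results slide.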
Paper Results • The optimal choice depends on the user's metrics. • Unrolling performed well but had poor efficiency. • Distributed RAM performed better than block RAM, but block RAM was much more efficient. • Transformation partitioning alone gave a significant improvement with good efficiency.
Thanks for listening! Question Time