Enter the Bathysphere: Measuring Complexity-Effectiveness of Future-Generation Silicon Architectures Using FPGAs
Andrew Schwerin, Steven Swanson, Mark Oskin
Simulation Methodology
Have new idea
while (not published):
  Hack simulator
  Run simulator
  Refine idea
• Quick to implement
• Short iteration period
June 03 2
Simulation Drawbacks
• Difficult to validate
• Slow to execute
Common pitfalls:
  – Underestimated delays
  – Subtle bugs
  – Unrepresentative input data
Custom Prototyping
• Validates assumptions
• Expensive
• Time-consuming
• Labor-intensive
Can we have it all?
• Short iteration period?
• Low incremental cost?
• Fast execution?
• Validation?
The Bathysphere
• Deep-submicron exploration vehicle!
• ASIC model
• FPGA implementation substrate
Bathysphere Hardware
• 4 boards
• 16 nodes per board
Each node
• Virtex 1000 FPGA (high-density, 1M-gate FPGA)
• 2 × 1 MB SDRAM
Total
• 64M logic gates
• 128 MB RAM
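The system totals follow directly from the per-board and per-node figures:

```latex
\begin{aligned}
4 \text{ boards} \times 16 \text{ nodes/board} &= 64 \text{ nodes} \\
64 \text{ nodes} \times 1\text{M gates/node} &= 64\text{M logic gates} \\
64 \text{ nodes} \times (2 \times 1\ \text{MB})/\text{node} &= 128\ \text{MB RAM}
\end{aligned}
```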
Design Model
Bathysphere Advantages
• An architecture research methodology
  – Brings physical constraints to the fore
  – Faster than software simulation
  – Cheap: approx. $50k
  – Allows many design iterations
• Not just an emulation system
  – Different from a QuickTurn logic emulator
FPGA-ASIC Mismatch
• The Bathysphere is not an ASIC
  – The flexibility of late-binding functionality comes at a cost
• Must work around FPGA-unfriendly structures
Challenges for the Bathysphere
• Multiported memories
• Content-addressable memories
• Inter-FPGA bandwidth
Problem: Multiported Memories
• Multiported memories are everywhere
• Silicon provides
  – Multiported memories (e.g., register files)
  – Large memories (e.g., caches)
• FPGAs provide
  – Limited on-chip memory resources
    • 16 KB on the Virtex 1000
    • Single-ported or, at most, dual-ported structures
  – Limited bandwidth to external memories
Problem: Multiportedness
How do we use this (single-ported memories: a 1-port RAM with port P1)
to build this (n-ported memories: an n-port RAM with ports P1, P2, …, Pn)?
Let the Tools Handle It?
A memory with:
• 7 write ports
• 512 words
• 64-bit words
Percentage of resources needed to implement it:
• Red box – in 35 nm silicon
• Blue box – in the Bathysphere
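For scale, counting raw storage bits only and ignoring ports, the example memory occupies a quarter of the 16 KB of on-chip RAM cited for the Virtex 1000. This suggests the large FPGA cost comes from the seven write ports rather than from capacity:

```latex
\begin{aligned}
512 \text{ words} \times 64\ \text{bits/word} &= 32{,}768\ \text{bits} = 4\ \text{KB} \\
\frac{4\ \text{KB}}{16\ \text{KB}} &= 25\%
\end{aligned}
```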
Memory Ports vs. Area
[Figure: percent of resources (0–4%) vs. number of ports (1–16) for FPGA write ports, FPGA read ports, ASIC write ports, and ASIC read ports, with the Virtex 1000 capacity marked]
Memory Ports vs. Latency
[Figure: access latency (ns, 0–35) vs. number of ports (1–16) for FPGA write ports, FPGA read ports, ASIC write ports, and ASIC read ports]
Multiportedness Solution
• Time-multiplexing
• Split each logical cycle into µ-cycles
  – Create 2n + 1 µ-cycles
  – One µ-cycle per read*
  – One µ-cycle for logic
  – One µ-cycle per write
• More sophisticated multiplexing is possible
[Figure: a 1-port RAM shared among ports P1, P2, …, Pn to act as an n-port RAM]
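The µ-cycle schedule above can be sketched as a small behavioural model in Python. This is an illustration, not the Bathysphere's hardware: `SinglePortRAM` and `logical_cycle` are hypothetical names, and the model assumes that each of the n ports performs one read and one write per logical cycle (which gives 2n + 1), that every port keeps its write slot even when idle, and that when two ports write the same address the higher-numbered port wins.

```python
from typing import Callable, List, Optional, Tuple

Write = Optional[Tuple[int, int]]  # (address, value), or None if the port is idle


class SinglePortRAM:
    """Hypothetical model of a single-ported memory: one access per µ-cycle."""

    def __init__(self, depth: int) -> None:
        self.mem: List[int] = [0] * depth

    def read(self, addr: int) -> int:
        return self.mem[addr]

    def write(self, addr: int, value: int) -> None:
        self.mem[addr] = value


def logical_cycle(
    ram: SinglePortRAM,
    read_addrs: List[int],
    logic: Callable[[List[int]], List[Write]],
) -> int:
    """Emulate one logical cycle of an n-port RAM; return the µ-cycles used."""
    n = len(read_addrs)
    ucycles = 0

    # n µ-cycles: one read per port. All reads precede all writes, so every
    # port sees the memory state from the start of the logical cycle.
    values = []
    for addr in read_addrs:
        values.append(ram.read(addr))
        ucycles += 1

    # One µ-cycle for the user logic; it does not touch the RAM.
    writes = logic(values)
    ucycles += 1
    if len(writes) != n:
        raise ValueError("logic must return one write slot per port")

    # n µ-cycles: one write slot per port, used or not (fixed schedule).
    # Writes run in port order, so a later port overwrites an earlier one.
    for w in writes:
        if w is not None:
            ram.write(*w)
        ucycles += 1

    return ucycles  # always 2n + 1


ram = SinglePortRAM(depth=8)
ram.write(0, 5)
ram.write(1, 7)
# Swap two entries in one logical cycle, as a 2-port register file could.
used = logical_cycle(ram, [0, 1], lambda v: [(0, v[1]), (1, v[0])])
print(ram.mem[:2], used)  # [7, 5] 5
```

Because every read slot precedes every write slot, each port sees the memory as it was at the start of the logical cycle, which is what lets the swap complete in a single logical cycle. The price is a logical cycle 2n + 1 times as long as a µ-cycle.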
Content-Addressable Memories
• Common in architectures
• Straightforward to build in silicon
• Require parallel access to all memory words
  – Extremely resource-intensive in FPGAs
  – But a small one can be built if needed
  – Or use off-chip RAM to back a hash table
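One way to read the last bullet, sketched in Python under stated assumptions: a hash table stored in a flat array (standing in for off-chip RAM) answers the same question as a CAM ("which entry holds this key?") using a few sequential RAM reads instead of one parallel compare across every word. The class `HashBackedCAM`, the modulo hash, and linear probing are illustrative choices, not the authors' design; deletion is omitted for brevity.

```python
from typing import Iterator, List, Optional, Tuple


class HashBackedCAM:
    """Hypothetical CAM substitute: a linear-probing hash table kept in a
    flat array that stands in for off-chip RAM."""

    def __init__(self, slots: int) -> None:
        self.ram: List[Optional[Tuple[int, int]]] = [None] * slots
        self.reads = 0  # sequential RAM reads performed by lookups

    def _probe(self, key: int) -> Iterator[int]:
        # Modulo hash (low-order key bits), then step to the next slot.
        start = key % len(self.ram)
        for i in range(len(self.ram)):
            yield (start + i) % len(self.ram)

    def insert(self, key: int, value: int) -> None:
        for slot in self._probe(key):
            entry = self.ram[slot]
            if entry is None or entry[0] == key:
                self.ram[slot] = (key, value)
                return
        raise RuntimeError("hash table full")

    def lookup(self, key: int) -> Optional[int]:
        for slot in self._probe(key):
            self.reads += 1
            entry = self.ram[slot]
            if entry is None:
                return None  # an empty slot ends the probe chain: miss
            if entry[0] == key:
                return entry[1]
        return None


cam = HashBackedCAM(slots=16)
cam.insert(0x40, 3)
cam.insert(0x50, 9)  # collides with 0x40: both map to slot 0
hit = cam.lookup(0x50)   # 2 reads
miss = cam.lookup(0x60)  # 3 reads before reaching an empty slot
print(hit, miss, cam.reads)  # 9 None 5
```

A true CAM answers in one step regardless of contents; here lookup latency depends on how full the table is and how keys collide, which is the trade this approach makes for using plentiful off-chip RAM.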
Bathysphere Communication
• 76 I/O pins to the 4 nearest neighbors
• Long-distance communication is relayed through adjoining FPGAs
• Designs must confront communication costs
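To illustrate what relaying through adjoining FPGAs implies, here is a minimal Python sketch of dimension-ordered (X-then-Y) routing. It assumes the FPGAs form a 2D mesh addressed by (x, y) coordinates, which the slide does not specify, and `xy_route` is a hypothetical helper. Each extra hop is one more FPGA that must forward the data over its limited neighbor pins.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) position of an FPGA in an assumed 2D mesh


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Route between FPGAs that only connect to their 4 nearest neighbors:
    move along X first, then along Y. Every intermediate node forwards."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


path = xy_route((0, 0), (2, 1))
hops = len(path) - 1  # equals the Manhattan distance |dx| + |dy|
print(path, hops)  # [(0, 0), (1, 0), (2, 0), (2, 1)] 3
```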
The Bathysphere Methodology
“This is your bathysphere” What would make it useful to you?