On the Sensitivity of FPGA Architectural Conclusions to Experimental Assumptions, Tools, and Techniques
Andy Yan, Rebecca Cheng, Steve Wilton
University of British Columbia, Vancouver, B.C.
stevew@ece.ubc.ca
FPGA Experiments:
Impressive improvement in FPGA technology:
- 1994: 25,000 gates was good
- 2001: 6,000,000 system gates
How did this happen?
- Improvements in process technology
- Improvements in CAD tools
- Improvements in architectures
The key behind this: experimentation
The Danger of Experimentation:
No matter how careful you are:
- You will have to make some assumptions
- You will have to settle on an experimental technique
- You will have to settle on a CAD tool
But what if these assumptions, techniques, and tools affect the conclusions? Can we believe any of these results?
This Talk:
Take a step back and look at some basic experiments:
- What is the best LUT size?
- What is the best switch block topology?
- What is the best cluster size?
- What is the best memory size?
The answers have all been published. But how sensitive are they to the assumptions, tools, and techniques used?
Question 1: What is the best LUT Size?
What is the best LUT size?
Intuitively, in terms of area:
- A smaller LUT takes up less chip area
- But more of them are required to implement a circuit
Intuitively, in terms of delay:
- A smaller LUT is faster
- But the critical path passes through more of them (and also through the routing between them!)
Published results: 4 to 6 inputs per LUT is a good choice
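Baseline Experiment:
Inputs: benchmark circuits and a set of candidate architectures
Benchmark circuits -> Optimize (e.g. SIS) -> Technology map (e.g. Flowmap) -> Place and route (e.g. VPR) -> Area and delay, for each candidate architecture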
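The flow above can be thought of as a chain of swappable stages, which is what makes the sensitivity experiments possible: hold everything fixed and replace exactly one stage. The following is a minimal Python sketch under stated assumptions, not the authors' scripts. The stage callables stand in for tools such as SIS, Flowmap, and VPR. The names `Flow`, `run_sweep`, and `best_architecture` are hypothetical, and so is the use of a geometric mean across circuits.

```python
from dataclasses import dataclass, replace
from math import exp, log
from typing import Callable, Sequence

# Hypothetical stage signatures: each stands in for an external tool
# (e.g. SIS, Flowmap, VPR) and is modelled here as a plain Python callable.
Optimize = Callable[[str], str]
TechMap = Callable[[str, int], str]
PlaceRoute = Callable[[str, int], tuple[float, float]]  # returns (area, delay)


@dataclass(frozen=True)
class Flow:
    optimize: Optimize
    tech_map: TechMap
    place_route: PlaceRoute

    def run(self, circuit: str, lut_size: int) -> tuple[float, float]:
        """Push one circuit through optimize -> map -> place and route."""
        mapped = self.tech_map(self.optimize(circuit), lut_size)
        return self.place_route(mapped, lut_size)


def geomean(values: Sequence[float]) -> float:
    # Assumption: per-circuit results are combined with a geometric mean.
    return exp(sum(log(v) for v in values) / len(values))


def run_sweep(flow: Flow, circuits: Sequence[str],
              lut_sizes: Sequence[int]) -> dict[int, float]:
    """Mean area*delay product for each candidate LUT size."""
    results = {}
    for k in lut_sizes:
        products = [a * d for a, d in (flow.run(c, k) for c in circuits)]
        results[k] = geomean(products)
    return results


def best_architecture(results: dict[int, float]) -> int:
    """The experiment's 'conclusion': the LUT size with the lowest area*delay."""
    return min(results, key=results.get)
```

A variant run differs from the baseline in one stage only, e.g. `variant = replace(baseline, tech_map=other_mapper)`. Comparing `best_architecture` for the two sweeps is the kind of comparison this talk makes.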
How sensitive are the results to the tools?
Same flow, but vary the technology-mapping tool:
Benchmark circuits -> Optimize (e.g. SIS) -> Technology map (e.g. Flowmap) -> Place and route (e.g. VPR) -> Area and delay
Sensitivity to Technology Mapper:
[Figure: critical path delay (s) × area (MTEs) vs. LUT size (2 to 7) for Flowmap (baseline), Cutmap, and Chortle]
Conclusion depends on the technology mapper
How sensitive are the results to the place and route tool?
Same flow, but vary the place and route tool:
Benchmark circuits -> Optimize (e.g. SIS) -> Technology map (e.g. Flowmap) -> Place and route (e.g. VPR) -> Area and delay
Sensitivity to Place and Route Tool:
[Figure: critical path delay (s) × area (MTEs) vs. LUT size (2 to 7) for normal VPR (baseline), Fast, UFP, and routability-driven place and route]
How sensitive are the results to optimization?
Same flow, but vary the optimization step:
Benchmark circuits -> Optimize (e.g. SIS) -> Technology map (e.g. Flowmap) -> Place and route (e.g. VPR) -> Area and delay
Optimization Scripts:
[Figure: area (MTEs) vs. LUT size (2 to 7) for SIS + Flowmap applied once and (SIS + Flowmap)*2, i.e. applied twice]
Optimization of circuits is important!
How sensitive are the results to the benchmark circuits?
Same flow, but vary the benchmark circuits:
Benchmark circuits -> Optimize (e.g. SIS) -> Technology map (e.g. Flowmap) -> Place and route (e.g. VPR) -> Area and delay
Benchmark Circuits:
[Figure: critical path delay (s) × area (MTEs) vs. LUT size (2 to 7) for MCNC and synthesized benchmark circuits]
MCNC circuits behave differently than “real” circuits
Quantifying our Results:
We want a number that indicates how strongly our conclusions are affected by an experimental variation.
Consider an experiment to find the best value of an architectural parameter:
- Run 1: baseline
- Run 2: the same experiment with one experimental parameter varied
Margin = the difference in conclusion between Run 1 and Run 2
Margin: Case 1 (one best architecture marked for both runs)
[Figure: area × delay vs. a sweep of an architectural parameter; X% is annotated on the Run 1 curve and Y% on the Run 2 curve]
Margin = |X - Y|
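Margin: Case 2 (Run 1 and Run 2 each have their own best architecture)
[Figure: area × delay vs. a sweep of an architectural parameter; each run's best architecture is marked, with X% annotated for Run 1 and Y% for Run 2]
Margin = MAX(X, Y)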
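The two cases reduce to a small rule. The slides do not spell out how X and Y are measured from the curves, so this Python sketch takes them as inputs and only encodes the case split. The function names are hypothetical. Reading Case 1 as "both runs pick the same best architecture" and Case 2 as "they pick different ones" is an interpretation of the figures.

```python
def same_best_architecture(run1: dict[int, float], run2: dict[int, float]) -> bool:
    """True if both sweeps (parameter value -> area*delay) have the same minimum."""
    return min(run1, key=run1.get) == min(run2, key=run2.get)


def margin(x_pct: float, y_pct: float, same_best: bool) -> float:
    """Margin between a baseline run and a variant run, in percent.

    x_pct, y_pct: the X% and Y% values read off the Run 1 and Run 2 curves
    (how they are measured is taken as given here).
    same_best: True for Case 1, False for Case 2 (interpretation, see above).
    """
    if same_best:
        return abs(x_pct - y_pct)  # Case 1: Margin = |X - Y|
    return max(x_pct, y_pct)       # Case 2: Margin = MAX(X, Y)
```

For example, with X = 12% and Y = 4%, the margin is 8% if both runs agree on the best architecture and 12% if they do not.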
Quantifying the Sensitivity:
Categorize experimental variations by their margin:
- 0% to 2%: not sensitive
- 2% to 5%: slightly sensitive
- 5% to 10%: sensitive
- 10% to 100%: very sensitive
- > 100%: extremely sensitive
We can have area margins, delay margins, and area × delay margins.
Margin Results: Summary
I'll leave a paper with tabulated results, but here are the variations that had a margin > 5%:
- Using Chortle instead of Flowmap: 76%
- Optimize and tech map circuits twice: 8.5%
- Use routability-driven place and route: 301%
- Synthesized circuits rather than MCNC circuits: 11%
- Multiply minimum channel width by 1.1: 5.4%
- Use Fc=0.3 rather than Fc=0.6: 5.7%
- Use Fc=0.4 rather than Fc=0.6: 11%
- Use Fc=0.7 rather than Fc=0.6: 5.5%
- Use Fc=0.8 rather than Fc=0.6: 11%
- Use segments of length 1 instead of 4: 8.5%
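Applying the sensitivity categories to the margins reported on this slide is a simple lookup. This Python sketch hard-codes those reported margins. The names `sensitivity` and `LUT_SIZE_MARGINS` are hypothetical. The category ranges share their edges (2%, 5%, 10%, 100%), so assigning an edge value to the lower band is an assumption.

```python
# Upper edge of each band, in percent; an edge value falls in the lower band (assumption).
BANDS = [
    (2.0, "not sensitive"),
    (5.0, "slightly sensitive"),
    (10.0, "sensitive"),
    (100.0, "very sensitive"),
]


def sensitivity(margin_pct: float) -> str:
    """Map a margin (%) to its sensitivity category."""
    for upper, label in BANDS:
        if margin_pct <= upper:
            return label
    return "extremely sensitive"


# Margins (%) from the LUT-size summary slide.
LUT_SIZE_MARGINS = {
    "Chortle instead of Flowmap": 76.0,
    "Optimize and tech map twice": 8.5,
    "Routability-driven place and route": 301.0,
    "Synthesized rather than MCNC circuits": 11.0,
    "Minimum channel width x 1.1": 5.4,
    "Fc=0.3 rather than 0.6": 5.7,
    "Fc=0.4 rather than 0.6": 11.0,
    "Fc=0.7 rather than 0.6": 5.5,
    "Fc=0.8 rather than 0.6": 11.0,
    "Length-1 instead of length-4 segments": 8.5,
}

if __name__ == "__main__":
    # Print variations from largest to smallest margin with their category.
    for variation, m in sorted(LUT_SIZE_MARGINS.items(), key=lambda kv: -kv[1]):
        print(f"{m:6.1f}%  {sensitivity(m):20s}  {variation}")
```

Under this banding, four of the ten variations fall in the "very sensitive" range, and the routability-driven flow is the only one that is "extremely sensitive".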
Question 2: What is the best Switch Block Topology?
What is the best Switch Block?
Published switch blocks:
- Disjoint switch block (Xilinx)
- Universal switch block
- Wilton switch block
- Imran switch block (combination of the Wilton and disjoint blocks)
Our FPL paper showed the Imran block was good:
- Unlike the disjoint block, it does not divide the routing fabric into separate domains
- Unlike the Wilton block, it does not require extra transistors in segmented architectures
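The point about the disjoint block dividing the routing fabric can be made concrete by tracking which track indices a signal can ever reach. This Python sketch runs a union-find over track numbers 0..W-1 and assumes every switch block in the array uses the same pattern. The disjoint block keeps a signal on track t at every turn, so the W tracks form W separate domains. For contrast, `rotate_by_one` maps track t to (t + 1) mod W at turns. It is a hypothetical stand-in for a non-disjoint block, not the published Wilton or Imran mapping.

```python
from typing import Callable


def count_domains(width: int, turn: Callable[[int], int]) -> int:
    """Count groups of track indices that are mutually reachable.

    Straight-through connections keep a track's index; `turn` gives the
    track index a signal lands on when it changes direction.
    """
    parent = list(range(width))

    def find(t: int) -> int:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    def union(a: int, b: int) -> None:
        parent[find(a)] = find(b)

    # Each turn connection merges the domains of its two track indices.
    for t in range(width):
        union(t, turn(t) % width)
    return len({find(t) for t in range(width)})


def disjoint(t: int) -> int:
    return t  # disjoint block: same track index on every side


def rotate_by_one(t: int) -> int:
    return t + 1  # hypothetical non-disjoint pattern (reduced mod W by the caller)
```

With W = 8, `count_domains(8, disjoint)` yields 8 isolated domains, while `count_domains(8, rotate_by_one)` merges every track into a single domain.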
Sensitivity to Place and Route Tool:
[Figure: critical path delay (s) × area (MTEs) for the Disjoint, Universal, Wilton, and Imran switch blocks under VPR (baseline), Fast, UFP, and routability-driven place and route]
Margin Results: Summary
We did many experiments, but here are the variations that had a margin > 5%:
- Use the fast option of VPR: 6.8%
- Use routability-driven place and route: 320%
- Synthesized circuits rather than MCNC circuits: 7.5%
- Implement on a double-sized FPGA: 7.5%
- Use segments of length 1 instead of 4: 33%
- All switches buffered (instead of 50/50): 6.8%
Question 3: How Big should each cluster be?
What is the best cluster (LAB) size?
Intuitively:
- A larger cluster (LAB) means more connections can be made locally
- But a larger cluster is slower and has area overhead
Previously published results:
- Between 4 and 10 LUTs per cluster seems to work well
Sensitivity to Place and Route Tool:
[Figure: critical path delay (s) × area (MTEs) vs. cluster size (1 to 10) for VPR (baseline), Fast, UFP, and routability-driven place and route]
The Main Message is This:
Experimental results can be significantly influenced by the assumptions, tools, and techniques used in experimentation.
There are many architecture papers out there:
- Very few really address how sensitive their results are to the experimental assumptions (at UBC, we are guilty of this too)
- The results in this talk show that they should
How sensitive are the results to the architecture?
Same flow, but vary architecture assumptions orthogonal to the parameter being studied (e.g. Fc):
Benchmark circuits -> Optimize (e.g. SIS) -> Technology map (e.g. Flowmap) -> Place and route (e.g. VPR) -> Area and delay
Orthogonal Architecture Assumptions:
[Figure: area (MTEs) vs. LUT size (2 to 7) for Fc=0.3, Fc=0.6 (baseline), and Fc=1.0]
Conclusion does depend on Fc
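Fc is the fraction of the W tracks in an adjacent channel that each logic block pin can connect to, so it directly sets how many connection-block switches each pin costs. This is a rough Python sketch of that count. Two choices are assumptions: rounding to the nearest track with a minimum of one (tools differ) and counting one switch per pin-to-track connection. The function names are hypothetical.

```python
def tracks_per_pin(fc: float, width: int) -> int:
    """Tracks one pin connects to for a given Fc and channel width W."""
    if not 0.0 < fc <= 1.0:
        raise ValueError("Fc must be in (0, 1]")
    # Assumption: round to the nearest track, never fewer than one.
    return max(1, round(fc * width))


def connection_switches(fc: float, width: int, pins: int) -> int:
    """Total pin-to-track switches for `pins` pins (one switch per connection, assumed)."""
    return pins * tracks_per_pin(fc, width)
```

For a 10-track channel, moving from Fc=0.3 to Fc=0.6 doubles the connections per pin from 3 to 6. This is one reason the area curves shift with Fc.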
Sensitivity to Fc:
[Figure: critical path delay (s) × area (MTEs) vs. cluster size (2 to 10) for Fc=0.3, Fc=0.5, and Fc=0.7]