  1. On the Sensitivity of FPGA Architectural Conclusions to Experimental Assumptions, Tools, and Techniques Andy Yan, Rebecca Cheng, Steve Wilton University of British Columbia Vancouver, B.C. stevew@ece.ubc.ca

  2. FPGA Experiments: Impressive improvement in FPGA Technology: 1994: 25,000 Gates was good 2001: 6,000,000 System Gates How did this happen? - Improvements in process technology - Improvements in CAD Tools - Improvements in Architectures The key behind this: Experimentation

  3. The Danger of Experimentation: No matter how careful you are: - You will have to make some assumptions - You will have to settle on an experimental technique - You will have to settle on a CAD tool But what if these assumptions, techniques, & tools impact the conclusions… Can we believe any of these results?

  4. This Talk: Take a step back and look at some basic experiments: - What is the best LUT size? - What is the best switch block topology? - What is the best cluster size? - What is the best memory size? The answers have all been published… But how sensitive are they to the assumptions, tools, and techniques?

  5. Question 1: What is the best LUT Size?

  6. What is the best LUT size? Intuitively, in terms of area: - A smaller LUT takes up less chip area - But more of them are required for a circuit Intuitively, in terms of delay: - A smaller LUT is faster - But the critical path passes through more of them (and also through the routing!) Published results: 4-6 inputs in each LUT is a good choice
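The two opposing trends can be sketched with a toy area model; everything below (the per-LUT area formula, the coverage curve, and all constants) is an illustrative assumption, not data from the talk. The one hard fact is that a k-input LUT stores 2^k configuration bits, so per-LUT area grows exponentially with k, while the number of LUTs needed to cover a fixed circuit shrinks as k grows.

```python
# Toy model of the LUT-size area trade-off (illustrative only; the
# coverage model and constants are assumptions, not measured data).

def lut_area(k, bit_area=1.0, mux_overhead=10.0):
    """Assumed per-LUT area: 2**k SRAM bits plus a fixed overhead."""
    return (2 ** k) * bit_area + mux_overhead

def luts_needed(k, gates=10000, coverage=1.6):
    """Assumed mapping model: each extra input absorbs more logic,
    so the LUT count falls geometrically as k grows."""
    return gates / (coverage ** (k - 2))

def total_area(k):
    """Total chip area devoted to LUTs, in arbitrary units."""
    return luts_needed(k) * lut_area(k)

for k in range(2, 8):
    print(f"k={k}: total area {total_area(k):.0f}")
```

With these assumed constants the total-area curve bottoms out near k = 4, which is consistent with the published 4-6 input range; the exact minimum, of course, depends entirely on the assumed coverage model.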

  7. Baseline Experiment: [flow diagram] Benchmark Circuits + Architectures → Optimize (e.g. SIS) → Technology Map (e.g. Flowmap) → Place and Route (e.g. VPR) → Area and Delay
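The baseline flow can be sketched as a driver script. The run_* helpers below are hypothetical stand-ins for invoking the real tools (SIS, Flowmap, VPR); their names, arguments, and return values are assumptions for illustration, not the tools' actual interfaces.

```python
# Sketch of the baseline experimental flow: optimize, technology-map,
# then place-and-route each benchmark, collecting area and delay.
# The run_* functions are hypothetical stand-ins, not real tool APIs.

def run_sis(circuit):
    """Logic optimization (e.g. SIS) -- stubbed."""
    return circuit + ".opt"

def run_flowmap(circuit, lut_size):
    """Technology mapping into k-input LUTs (e.g. Flowmap) -- stubbed."""
    return f"{circuit}.k{lut_size}"

def run_vpr(netlist, arch):
    """Place and route (e.g. VPR) -- stubbed to return dummy metrics."""
    return {"area_mte": 1.0, "delay_s": 1.0}

def evaluate(circuits, arch, lut_size):
    """Push every benchmark through the flow; report mean area and delay."""
    results = [run_vpr(run_flowmap(run_sis(c), lut_size), arch)
               for c in circuits]
    n = len(results)
    return (sum(r["area_mte"] for r in results) / n,
            sum(r["delay_s"] for r in results) / n)
```

Each sensitivity experiment in the talk amounts to swapping one stage of this pipeline (the mapper, the placer/router, the optimization script, or the benchmark set) and re-running the sweep.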

  8. How Sensitive is this to the Tools? [same flow diagram as slide 7, with the Technology-Mapping tool highlighted as the variable under study]

  9. Sensitivity to Technology Mapper: [chart: Critical Path Delay (s) * Area (MTE's) vs. LUT size, 2 to 7; Flowmap (baseline) curve]

  10. Sensitivity to Technology Mapper: [same chart, with the Cutmap curve added]

  11. Sensitivity to Technology Mapper: [same chart, with the Chortle curve added] The conclusion depends on the technology mapper.

  12. How Sensitive are these results? [same flow diagram, with the Place and Route tool highlighted as the variable under study]

  13. Sensitivity to Place and Route Tool: [chart: Critical Path Delay (s) * Area (MTE's) vs. LUT size, 2 to 7; Normal VPR (baseline) curve]

  14. Sensitivity to Place and Route Tool: [same chart, with the Fast curve added]

  15. Sensitivity to Place and Route Tool: [same chart, with the UFP curve added]

  16. Sensitivity to Place and Route Tool: [same chart, with the Routability-Driven curve added]

  17. How Sensitive is this to the Tools? [same flow diagram, with the Optimization step highlighted as the variable under study]

  18. Optimization Scripts: [chart: Area (MTE's), 3.0x10^6 to 5.5x10^6, vs. LUT size, 2 to 7; SIS + Flowmap curve]

  19. Optimization Scripts: [same chart, with the (SIS + Flowmap)*2 curve added] Optimization of circuits is important!

  20. How Sensitive is this to the Circuits? [same flow diagram, with the benchmark Circuits highlighted as the variable under study]

  21. Benchmark Circuits: [chart: Critical Path Delay (s) * Area (MTE's) vs. LUT size, 2 to 7; Synthesized and MCNC curves] MCNC circuits behave differently than “real” circuits.

  22. Quantifying our Results: We want a number that indicates how strongly our conclusions are affected by an experimental variation. Consider an experiment to find the best value of an architectural parameter. Run 1: the baseline experiment. Run 2: the same experiment with one experimental parameter varied. Margin = the difference in conclusion between Run 1 and Run 2.

  23. Margin: Case 1: [chart: Area * Delay vs. a sweep of an architectural parameter; the Run 1 curve, with its best architecture marked]

  24. Margin: Case 1: [same chart, with the Run 2 curve added; both runs pick the same best architecture; X% marks Run 1 and Y% marks Run 2] Margin = | X – Y |

  25. Margin: Case 2: [chart: Area * Delay vs. the swept parameter; Run 1 and Run 2 each pick a different best architecture, marked X% and Y%] Margin = MAX( X , Y )
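The two cases can be sketched in code. The reading of X and Y below is an assumption inferred from the figures, not quoted from the paper: when both runs agree on the best architecture (Case 1), X and Y are taken as each run's percentage gap between its best and second-best points; when they disagree (Case 2), X and Y are taken as the percentage penalty each run assigns to adopting the other run's best architecture.

```python
# Sketch of the margin computation from slides 23-25. The exact
# definitions of X and Y are an assumed reading of the figures.

def margin(run1, run2):
    """run1, run2: dicts mapping parameter value -> area*delay cost
    (lower is better), over the same sweep of values."""
    best1 = min(run1, key=run1.get)
    best2 = min(run2, key=run2.get)

    def pct_above(curve, a, b):
        # Percentage by which the cost at point a exceeds the cost at b.
        return 100.0 * (curve[a] - curve[b]) / curve[b]

    if best1 == best2:
        # Case 1: same conclusion -- Margin = |X - Y|, with X and Y
        # assumed to be each run's gap to its runner-up point.
        x = min(pct_above(run1, p, best1) for p in run1 if p != best1)
        y = min(pct_above(run2, p, best2) for p in run2 if p != best2)
        return abs(x - y)
    # Case 2: different conclusions -- Margin = MAX(X, Y), with X and Y
    # assumed to be the penalty of adopting the other run's conclusion.
    x = pct_above(run1, best2, best1)
    y = pct_above(run2, best1, best2)
    return max(x, y)
```

For instance, if Run 1 prefers LUT size 4 and Run 2 prefers LUT size 5, the margin is the larger of the two cross-penalties, so a single percentage summarizes how much the experimental variation changed the conclusion.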

  26. Quantifying the Sensitivity: Categorize Experimental Variations by their Margin: 0%-2%: Not Sensitive 2%-5%: Slightly Sensitive 5%-10%: Sensitive 10%-100%: Very Sensitive > 100%: Extremely Sensitive We can have area margins, delay margins, and area * delay margins.
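The categorization above translates directly into a small helper. How values landing exactly on a boundary (2%, 5%, 10%, 100%) are assigned is an assumption here, since the slide's ranges share their endpoints.

```python
def sensitivity_category(margin_pct):
    """Map a margin (in percent) to the talk's sensitivity categories.
    Boundary values go to the lower category (an assumption; the
    slide's ranges overlap at their endpoints)."""
    if margin_pct <= 2:
        return "Not Sensitive"
    if margin_pct <= 5:
        return "Slightly Sensitive"
    if margin_pct <= 10:
        return "Sensitive"
    if margin_pct <= 100:
        return "Very Sensitive"
    return "Extremely Sensitive"
```

For example, the 76% Chortle margin reported on the next slide falls in "Very Sensitive", and the 301% Routability-Driven margin in "Extremely Sensitive".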

  27. Margin Results: Summary: I’ll leave a paper with tabulated results, but here are the variations that had a margin > 5%: Using Chortle instead of Flowmap: 76%; Optimize and Tech Map circuits twice: 8.5%; Use Routability-Driven Place and Route: 301%; Synthesized circuits rather than MCNC circuits: 11%; Multiply minimum channel width by 1.1: 5.4%; Use Fc=0.3 rather than Fc=0.6: 5.7%; Use Fc=0.4 rather than Fc=0.6: 11%; Use Fc=0.7 rather than Fc=0.6: 5.5%; Use Fc=0.8 rather than Fc=0.6: 11%; Use segments of length 1 instead of 4: 8.5%

  28. Question 2: What is the best Switch Block Topology?

  29. What is the best Switch Block? Published Switch Blocks - Disjoint switch block (Xilinx) - Universal switch block - Wilton switch block - Imran switch block (combination of Wilton and Disjoint block) Our FPL paper showed the Imran block was good: - Unlike disjoint, it does not divide routing fabric into segments - Unlike Wilton, it does not suffer from extra transistors in segmented architectures

  30. Sensitivity to Place and Route Tool: [bar chart: Critical Path Delay (s) * Area (MTE's) for the Disjoint, Universal, Wilton, and Imran switch blocks, under VPR (baseline)]

  31. Sensitivity to Place and Route Tool: [same chart, with UFP bars added]

  32. Sensitivity to Place and Route Tool: [same chart, with Routability-Driven bars added]

  33. Sensitivity to Place and Route Tool: [same chart, with Fast bars added]

  34. Margin Results: Summary: We did many experiments, but here are the variations that had a margin > 5%: Use Fast option of VPR: 6.8%; Use Routability-Driven Place and Route: 320%; Synthesized circuits rather than MCNC circuits: 7.5%; Implement on a double-sized FPGA: 7.5%; Use segments of length 1 instead of 4: 33%; All switches buffered (instead of 50/50): 6.8%

  35. Question 3: How Big should each cluster be?

  36. What is the best Cluster (LAB) size? Intuitively: - A larger cluster (LAB) means more local connections - But a larger cluster is slower and has area overhead Previously published results: - Between 4 and 10 LUTs per cluster seems to work well

  37. Sensitivity to Place and Route Tool: [chart: Critical Path Delay (s) * Area (MTE's) vs. cluster size, 1 to 10; VPR (baseline), Fast, UFP, and Routability-Driven curves]

  38. The Main Message is This: Experimental results can be significantly influenced by the assumptions, tools, and techniques used in experimentation There are many architecture papers out there: - Very few really address how sensitive their results are to the experimental assumptions (at UBC, we are guilty of this too) - The results in this talk show that they should

  39. How Sensitive is this to the Architecture? [same flow diagram, with the Orthogonal Architecture Assumptions highlighted as the variable under study]

  40. Orthogonal Architecture Assumptions: [chart: Area (MTE's), 4.5x10^6 to 6.0x10^6, vs. LUT size, 2 to 7; Fc=0.6 (baseline) curve]

  41. Orthogonal Architecture Assumptions: [same chart, with Fc=0.3 and Fc=1.0 curves added] The conclusion does depend on Fc.

  42. Sensitivity to Fc: [chart: Critical Path Delay (s) * Area (MTE) vs. cluster size, 2 to 10; Fc=0.5 curve]

  43. Sensitivity to Fc: [same chart, with the Fc=0.3 curve added]

  44. Sensitivity to Fc: [same chart, with the Fc=0.7 curve added]
