On the Sensitivity of FPGA Architectural Conclusions to Experimental - PowerPoint PPT Presentation

On the Sensitivity of FPGA Architectural Conclusions to Experimental Assumptions, Tools, and Techniques Andy Yan, Rebecca Cheng, Steve Wilton University of British Columbia Vancouver, B.C. stevew@ece.ubc.ca

FPGA Experiments: Impressive improvement in FPGA technology: - 1994: 25,000 gates was good - 2001: 6,000,000 system gates. How did this happen? - Improvements in process technology - Improvements in CAD tools - Improvements in architectures. The key behind this: experimentation.

The Danger of Experimentation: No matter how careful you are: - You will have to make some assumptions - You will have to settle on an experimental technique - You will have to settle on a CAD tool But what if these assumptions, techniques, & tools impact the conclusions… Can we believe any of these results?

This Talk: Take a step back and look at some basic experiments: - What is the best LUT size? - What is the best switch block topology? - What is the best cluster size? - What is the best memory size? The answers have all been published… But how sensitive are they to the assumptions, tools, and techniques?

Question 1: What is the best LUT Size?

What is the best LUT size? Intuitively, in terms of area: - A smaller LUT takes up less chip area - But more of them are required for a circuit. Intuitively, in terms of delay: - A smaller LUT is faster - But the critical path passes through more of them (and also through the routing!). Published results: 4-6 inputs in each LUT is a good choice.

Baseline Experiment: [Flow diagram: Benchmark Circuits and Architectures are the inputs; the flow is Optimize (e.g. SIS) -> Technology Map (e.g. Flowmap) -> Place and Route (e.g. VPR); the outputs are Area and Delay]
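The plots in this talk report critical path delay multiplied by area for each value of the swept architectural parameter, and the "best" architecture is read off as the minimum. A minimal Python sketch of that selection step follows. The function name, the data layout, and every number in `sweep` are assumptions made up for illustration; they are not measurements from the talk.

```python
from typing import Dict, Tuple


def best_architecture(results: Dict[int, Tuple[float, float]]) -> int:
    """Return the parameter value (e.g. LUT size) with the smallest area * delay.

    `results` maps a parameter value to an (area, delay) pair, for example
    area in minimum-width transistor equivalents and delay in seconds.
    """
    if not results:
        raise ValueError("results must not be empty")
    return min(results, key=lambda k: results[k][0] * results[k][1])


# Hypothetical placeholder values, NOT data from the talk:
# LUT size -> (area in MTEs, critical path delay in seconds)
sweep = {
    2: (6.0e6, 60e-9),
    3: (5.0e6, 50e-9),
    4: (4.5e6, 42e-9),
    5: (4.6e6, 40e-9),
    6: (5.0e6, 39e-9),
    7: (5.6e6, 39e-9),
}

print(best_architecture(sweep))
```

Repeating this selection with one tool or assumption changed, and comparing the two outcomes, is the kind of sensitivity experiment the rest of the talk describes.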

How Sensitive is this to the Tools? [Flow diagram: the baseline flow of Optimize (e.g. SIS) -> Technology Map (e.g. Flowmap) -> Place and Route (e.g. VPR), with the Technology-Mapping Tool called out]

Sensitivity to Technology Mapper: [Figure: Critical Path Delay (s) * Area (MTE's) versus LUT Size (2 to 7) for Flowmap (baseline), Cutmap, and Chortle] Conclusion depends on the technology mapper.

How Sensitive are these results? [Flow diagram: the baseline flow of Optimize (e.g. SIS) -> Technology Map (e.g. Flowmap) -> Place and Route (e.g. VPR), with the Place and Route Tool called out]

Sensitivity to Place and Route Tool: [Figure: Critical Path Delay (s) * Area (MTE's) versus LUT Size (2 to 7) for Normal VPR (baseline), Fast, UFP, and Routability-Driven place and route]

How Sensitive is this to the Tools? [Flow diagram: the baseline flow of Optimize (e.g. SIS) -> Technology Map (e.g. Flowmap) -> Place and Route (e.g. VPR), with the Optimization step called out]

Optimization Scripts: [Figure: Area (MTE's) versus LUT Size (2 to 7) for SIS + Flowmap and for (SIS + Flowmap)*2] Optimization of circuits is important!

How Sensitive is this to the Circuits? [Flow diagram: the baseline flow of Optimize (e.g. SIS) -> Technology Map (e.g. Flowmap) -> Place and Route (e.g. VPR), with the Benchmark Circuits called out]

Benchmark Circuits: [Figure: Critical Path Delay (s) * Area (MTE's) versus LUT Size (2 to 7) for Synthesized and MCNC circuits] MCNC circuits behave differently than “real” circuits.

Quantifying our Results: We want a number that indicates how strongly our conclusions are affected by an experimental variation. Consider an experiment to find the best value of an architectural parameter. - Run 1: Baseline - Run 2: Same experiment with one experimental parameter varied. Margin = the difference in conclusion between Run 1 and Run 2.

Margin: Case 1 [Figure: Area * Delay versus a sweep of an architectural parameter, with curves for Run 1 and Run 2, one "Best Architecture" marker, and annotations X% and Y%] Margin = |X - Y|

Margin: Case 2 [Figure: Area * Delay versus a sweep of an architectural parameter, with curves for Run 1 and Run 2, two "Best Architecture" markers, and annotations X% and Y%] Margin = MAX(X, Y)

Quantifying the Sensitivity: Categorize experimental variations by their margin: - 0%-2%: Not Sensitive - 2%-5%: Slightly Sensitive - 5%-10%: Sensitive - 10%-100%: Very Sensitive - > 100%: Extremely Sensitive. We can have area margins, delay margins, and area * delay margins.
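The two margin formulas and the category table can be sketched as a small Python helper. The function names and the `case` argument are assumptions for illustration, and so is the treatment of values that land exactly on a category boundary (for example 5%), which the slides do not specify. How X and Y are measured from the two runs is taken as given here.

```python
def margin(x_pct: float, y_pct: float, case: int) -> float:
    """Margin in percent, following the two cases on the slides.

    Case 1: Margin = |X - Y|
    Case 2: Margin = MAX(X, Y)
    """
    if case == 1:
        return abs(x_pct - y_pct)
    if case == 2:
        return max(x_pct, y_pct)
    raise ValueError("case must be 1 or 2")


def sensitivity(margin_pct: float) -> str:
    """Map a margin (in percent) to the talk's sensitivity category.

    Assumption: a value exactly on a boundary falls in the lower category.
    """
    if margin_pct < 0:
        raise ValueError("margin must be non-negative")
    if margin_pct <= 2:
        return "Not Sensitive"
    if margin_pct <= 5:
        return "Slightly Sensitive"
    if margin_pct <= 10:
        return "Sensitive"
    if margin_pct <= 100:
        return "Very Sensitive"
    return "Extremely Sensitive"


# X and Y below are arbitrary illustrative inputs, not values from the talk.
m = margin(3.0, 8.5, case=1)
print(m, sensitivity(m))
```

Any margin above 100% lands in "Extremely Sensitive", while anything between 10% and 100% is "Very Sensitive".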

Margin Results: Summary. I’ll leave a paper with tabulated results, but here are the variations that had a margin > 5%: - Using Chortle instead of Flowmap: 76% - Optimize and Tech Map circuits twice: 8.5% - Use Routability-Driven Place and Route: 301% - Synthesized circuits rather than MCNC circuits: 11% - Multiply minimum channel width by 1.1: 5.4% - Use Fc=0.3 rather than Fc=0.6: 5.7% - Use Fc=0.4 rather than Fc=0.6: 11% - Use Fc=0.7 rather than Fc=0.6: 5.5% - Use Fc=0.8 rather than Fc=0.6: 11% - Use segments of length 1 instead of 4: 8.5%

Question 2: What is the best Switch Block Topology?

What is the best Switch Block? Published Switch Blocks - Disjoint switch block (Xilinx) - Universal switch block - Wilton switch block - Imran switch block (combination of Wilton and Disjoint block) Our FPL paper showed the Imran block was good: - Unlike disjoint, it does not divide routing fabric into segments - Unlike Wilton, it does not suffer from extra transistors in segmented architectures

Sensitivity to Place and Route Tool: [Figure: bar chart of Critical Path Delay (s) * Area (MTE's) for the Disjoint, Wilton, Universal, and Imran switch blocks under Fast, VPR (baseline), UFP, and Routability-Driven place and route]

Margin Results: Summary. We did many experiments, but here are the variations that had a margin > 5%: - Use Fast option of VPR: 6.8% - Use Routability-Driven Place and Route: 320% - Synthesized circuits rather than MCNC circuits: 7.5% - Implement on double-sized FPGA: 7.5% - Use segments of length 1 instead of 4: 33% - All switches buffered (instead of 50/50): 6.8%

Question 3: How Big should each cluster be?

What is the best Cluster (LAB) size? Intuitively: - A larger cluster (LAB) means more local connections - But a larger cluster is slower and has area overhead. Previously published results: - Between 4 and 10 LUTs per cluster seems to work well.

Sensitivity to Place and Route Tool: [Figure: Critical Path Delay (s) * Area (MTE's) versus Cluster Size (1 to 10) for VPR (baseline), Fast, UFP, and Routability-Driven place and route]

The Main Message is This: Experimental results can be significantly influenced by the assumptions, tools, and techniques used in experimentation. There are many architecture papers out there: - Very few really address how sensitive their results are to the experimental assumptions (at UBC, we are guilty of this too) - The results in this talk show that they should.

How Sensitive is this to the Architecture? [Flow diagram: the baseline flow of Optimize (e.g. SIS) -> Technology Map (e.g. Flowmap) -> Place and Route (e.g. VPR), with the Orthogonal Architecture Assumptions called out]

Orthogonal Architecture Assumptions: [Figure: Area (MTE's) versus LUT Size (2 to 7) for Fc=0.6 (baseline), Fc=0.3, and Fc=1.0] Conclusion does depend on Fc.

Sensitivity to Fc: [Figure: Critical Path Delay (s) * Area (MTE) versus Cluster Size (2 to 10) for Fc=0.5, Fc=0.3, and Fc=0.7]
