FloatApprox : faithfully rounded floating-point function approximators in FPGAs
David B. Thomas
Imperial College London
David Thomas, Imperial College, dt10@ic.ac.uk
FloPoCo : Parameterised primitives
FloatApprox : Parameterised anything
[Diagram labels: Approximation, Input format, interval, Output format]
FloatApprox
• Architecture for FPGA function approximation
  – Deeply pipelined
  – Floating-point in and out
  – Faithfully rounded
• Method and tool for approximating functions
  – Handles most twice-differentiable functions
  – Completely automated: expression -> VHDL
  – Designed for reliability rather than optimality
1. Motivation
2. The FloatApprox approach
   1. Range reduction and approximation method
   2. Evaluation architecture
3. Evaluation in hardware
Floating-point IP: Requirements
• Faithfully rounded
  – Make every bit count
  – Tractable error analysis
• Pipelined for 250 MHz+ clock rate
  – Must be pipelined: RAM and DSPs are multi-cycle
  – HLS tools have retiming built-in
• Working RTL (circuit) implementation
  – A paper can't be synthesised
What floating-point IP is available?
Function       Source     Pipelined  Faithful  RTL
add, mul, div  FloPoCo    Yes        Yes       Yes
log, exp       FloPoCo    Yes        Yes       Yes
sin, cos       FPLibrary  No         Yes       Yes
               Altera     Yes        Yes       Altera flow only
               Xilinx     Yes        ?         Vivado HLS only
log1p          Altera     Yes        Yes       Altera flow only
expm1          Altera     Yes        No        OpenCL only
erf            Altera     Yes        No        OpenCL only
Motivation for FloatApprox
• We currently have : +, -, *, /, log, exp
  – Use existing IP: FloPoCo, Xilinx, Altera, ...
• We should have : log1p, expm1, erf, sin, acos, ...
  – What FloatApprox does badly...
    ... but better than anything else available
• What I want : sqrt(-2 log(x)), 1/(1+exp(-x))
  – What FloatApprox does well
Goals of FloatApprox
• Approximation : FloatApprox as a tool
  – Convert any function f(x) to RTL
  – Able to handle most smooth functions
  – Suitable for automated use
    • Input : data-types, range, function
    • Output : faithfully rounded circuit
• Architecture : FloatApprox as generated IP
  – Pipelined
  – Faithfully rounded
  – Working RTL
FloatApprox : Approximation
• Given a function f_t, how do we create f_a?
• Segment the function so that each segment is:
  1. Contained in one input binade
  2. Monotonically increasing or decreasing in range
  3. Contained in one output binade
  4. FaithfulReal: can be approximated with a real polynomial of degree d
  5. FaithfulFixed: can be faithfully approximated with a fixed-point polynomial of degree d
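As a rough illustration of criterion 1, the sketch below splits a positive input interval at powers of two so that every piece lies in a single binade. It is not the FloatApprox implementation: the function names `binade_exponent` and `split_by_input_binade` are hypothetical, and ordinary Python doubles stand in for the tool's arbitrary-precision arithmetic.

```python
import math

def binade_exponent(x):
    """Return e such that 2**e <= x < 2**(e+1), for finite x > 0."""
    _, exp = math.frexp(x)  # x = m * 2**exp with 0.5 <= m < 1
    return exp - 1

def split_by_input_binade(lo, hi):
    """Split [lo, hi), with 0 < lo < hi, into pieces that each lie in one binade.

    Hypothetical helper: cuts the interval at every power of two it contains.
    """
    if not (0 < lo < hi):
        raise ValueError("expected 0 < lo < hi")
    segments = []
    start = lo
    while start < hi:
        # Upper edge of the binade containing `start`, clipped to `hi`.
        end = min(hi, math.ldexp(1.0, binade_exponent(start) + 1))
        segments.append((start, end))
        start = end
    return segments

print(split_by_input_binade(0.3, 5.0))
```

Negative inputs and zero would need separate handling, which this sketch leaves out.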
Example: Input function over reals
[Plot: example function y(x), x axis from 0 to 16, y axis up to 4.0]
Move to float representation
[Plot: the example function on log2 axes, both axes running from 2^-4 to 2^4; the following steps are shown as successive overlays on this plot]
1 : Segment using input binades
2 : Make segments monotonic
3 : Segment using output binades
4 : Split to degree d polynomials
• Segments form a partition of the input domain
• Each segment's domain and range lie within one binade
• Every segment can be faithfully calculated as a degree d fixed-point polynomial
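The splitting step can be pictured as recursive bisection: keep halving a segment until a low-degree polynomial meets the error budget. The sketch below is a simplified stand-in, not the tool's method. It assumes degree 1 (the chord through the end points), estimates the error by sampling in double precision instead of bounding it with interval arithmetic, and uses the hypothetical names `chord_error` and `split_until_accurate`.

```python
import math

def chord_error(f, lo, hi, samples=64):
    """Largest sampled |f(x) - chord(x)| on [lo, hi] (an estimate, not a bound)."""
    slope = (f(hi) - f(lo)) / (hi - lo)
    worst = 0.0
    for k in range(samples + 1):
        x = lo + (hi - lo) * k / samples
        worst = max(worst, abs(f(x) - (f(lo) + slope * (x - lo))))
    return worst

def split_until_accurate(f, lo, hi, tol):
    """Bisect [lo, hi] until the degree-1 fit is within tol on every piece."""
    if chord_error(f, lo, hi) <= tol:
        return [(lo, hi)]
    mid = 0.5 * (lo + hi)
    return (split_until_accurate(f, lo, mid, tol)
            + split_until_accurate(f, mid, hi, tol))

segments = split_until_accurate(math.sqrt, 1.0, 2.0, 1e-3)
print(len(segments), segments[0], segments[-1])
```

The resulting pieces still partition the original interval, which is the property the first bullet above relies on.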
Real-world issues
• Lots of corner cases to worry about
  – Crossing from negative to positive to NaN is fun
  – Method should be faithful by construction
• Calculations performed using MPFR and Sollya
  – Mostly interval arithmetic via Sollya
  – Occasionally bisection search in MPFR
• Speed of approximation is an issue
  – Single precision usually takes minutes
  – Double precision often takes hours
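One place a bisection search is useful is locating where a monotonic segment crosses an output binade boundary. The sketch below shows the idea in plain Python doubles; the slides say MPFR is used for this, so the precision here is an assumption made for brevity, and `find_crossing` is a hypothetical name.

```python
import math

def find_crossing(f, lo, hi, target):
    """Smallest double x in (lo, hi] with f(x) >= target, as evaluated in doubles.

    Assumes f is monotonically increasing on [lo, hi] and
    f(lo) < target <= f(hi).
    """
    if not (f(lo) < target <= f(hi)):
        raise ValueError("target is not bracketed by f(lo) and f(hi)")
    while True:
        mid = lo + (hi - lo) / 2
        if mid == lo or mid == hi:
            return hi  # lo and hi are now adjacent doubles
        if f(mid) >= target:
            hi = mid
        else:
            lo = mid

# Where does x*x reach the output binade boundary 2.0 on [1, 2]?
x = find_crossing(lambda v: v * v, 1.0, 2.0, 2.0)
print(x)
```

Because f is itself evaluated with rounding, the returned point is only as trustworthy as that evaluation, which is one reason a higher-precision library is the safer choice in practice.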
FloatApprox : Architecture
[Diagram: pipeline from input x to output y]
• Input x : flags | s | expnt | significand
• Segmentation : ⌊S_i⌋ ≤ i ≤ ⌈S_i⌉
• Table-Lookup : flags, s, expnt, c_0, c_1, ..., c_d
• Fixed-Point Polynomial : y = ∑ c_i x^i
• Output y : flags | s | expnt | significand
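The final stage evaluates y = ∑ c_i x^i in fixed point. A common way to model that in software is Horner's rule on scaled integers, sketched below. The slides do not say which evaluation scheme the generated hardware uses, so Horner's rule, the single shared scaling `frac_bits`, and the name `horner_fixed` are all assumptions for illustration.

```python
def horner_fixed(coeffs, x_fixed, frac_bits):
    """Evaluate sum(c_i * x**i) on integers scaled by 2**frac_bits.

    coeffs  : [c_0, c_1, ..., c_d] as scaled integers
    x_fixed : the argument as a scaled integer
    Each product is truncated back to frac_bits fractional bits.
    """
    acc = coeffs[-1]
    for c in reversed(coeffs[:-1]):
        acc = ((acc * x_fixed) >> frac_bits) + c
    return acc

FRAC_BITS = 16
ONE = 1 << FRAC_BITS
# 1 + 2x + 3x^2 at x = 0.5
y = horner_fixed([1 * ONE, 2 * ONE, 3 * ONE], ONE // 2, FRAC_BITS)
print(y / ONE)
```

A real design would pick a separate width for each coefficient and intermediate result so that the accumulated truncation error stays inside the faithful-rounding budget.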
Compile-time configuration
[Diagram: the architecture pipeline (Segmentation, Table-Lookup, Fixed-Point Polynomial) shown beside the segmented function plot on log2 axes]
Evaluation: Input
Evaluation: Segmentation
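In software terms, the input and segmentation stages amount to splitting x into its fields and then finding which segment contains it. The sketch below models that with IEEE-754 single precision and a sorted list of segment boundaries. Both choices are assumptions: the format in the slides carries a separate flags field, the hardware search structure is not described here, and `float32_fields` and `segment_index` are hypothetical names.

```python
import bisect
import struct

def float32_fields(x):
    """Return (sign, biased_exponent, fraction) of x as an IEEE-754 single."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def segment_index(boundaries, x):
    """Index i such that boundaries[i] <= x < boundaries[i + 1].

    `boundaries` is the sorted list of segment edges fixed at compile time.
    """
    if not (boundaries[0] <= x < boundaries[-1]):
        raise ValueError("x lies outside the approximated interval")
    return bisect.bisect_right(boundaries, x) - 1

edges = [0.25, 0.5, 1.0, 2.0]
print(float32_fields(1.0), segment_index(edges, 0.75))
```

The index returned by `segment_index` would then address the coefficient table for that segment.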