Hybrid Dot-Product Design for FP-Enabled FPGAs Bogdan Pasca Intel ARITH 2019, June 10-12, 2019
Context
- FPGAs: interesting neural network training accelerators
- training: mostly dot-products in forward + backward propagation
- current industry standard: bfloat16 multiplications, SP reduction
- bfloat16 vs SP: 2X bandwidth
Goal → find a dot-product implementation that:
- maintains an accuracy comparable to bfloat16+SP
- maximizes the dot-product density for a given FPGA
Density
- FPGA devices have a varied mix of resources
- increasing compute density → make efficient use of the existing mix
- focus on "Core Logic Fabric" and VP DSP blocks
Density - some current devices
Background
- Intel FPGAs: DSP blocks implement SP mult-add
- N-element SP dot-product = N DSPs
  [Figure: chained DSP blocks with 32-bit operands A..H computing AB+CD, EF+GH, and AB+CD+EF+GH]
- soft-logic-only solution
- bfloat16 multiplier → 2/DSP
- SP FP adder → 1/DSP
- N-element dot product: C_DSP = N/2 + N − 1
- adjust ratio: migrate SP FP adders to logic (300-400 ALMs/add)
- solution is too large
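The resource counts on this slide can be tabulated with a short sketch. The function names are illustrative and not from the talk; the only inputs are the slide's own figures (one DSP per SP mult-add, two bfloat16 multipliers per DSP, one SP adder per DSP, 300-400 ALMs per soft SP adder), and N is assumed to be even.

```python
def dsp_sp_only(n):
    """SP dot-product: one DSP (mult-add) per element."""
    return n

def dsp_bf16_mult_sp_add(n):
    """bfloat16 multipliers at 2 per DSP plus N-1 SP adders at 1 per DSP."""
    return n // 2 + (n - 1)

def adders_in_logic(n, alms_per_add=(300, 400)):
    """Move the N-1 SP adders to soft logic; return (DSPs, (min ALMs, max ALMs))."""
    adders = n - 1
    low, high = alms_per_add
    return n // 2, (adders * low, adders * high)

if __name__ == "__main__":
    n = 16
    print(dsp_sp_only(n), dsp_bf16_mult_sp_add(n), adders_in_logic(n))
```

For N = 16 this gives 16 DSPs for the SP-only design, 23 DSPs for bfloat16 multipliers with SP adders in DSPs, and 8 DSPs plus roughly 4500-6000 ALMs once the adders move to logic, which is the "too large" case the slide ends on.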
How do we solve this?
Our implementation
[Figure: inputs A_1..α and B_1..α feed the soft-logic part of the dot-product; inputs A_(α+1)..(α+β), B_(α+1)..(α+β) and ACC feed the hard FP part; signals P_b, P_g, P_l; output P]
- N = α + β
- C_ALM = f(α, w)
- C_DSP = α/4 + β
- Objective: C_ALM / C_DSP ≈ device ALM/DSP ratio
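As a rough illustration of the DSP cost model, the sketch below enumerates the splits N = α + β and evaluates C_DSP = α/4 + β for each. The helper names and the step of 2 are assumptions made for the example; C_ALM = f(α, w) is not modelled because the talk gives it only through measured results.

```python
def c_dsp(alpha, beta):
    """DSP cost from the slide: four 8x8 multipliers per DSP in the soft part,
    one DSP per element in the hard FP part."""
    return alpha / 4 + beta

def splits(n, step=2):
    """All (alpha, beta, C_DSP) with alpha + beta = n, alpha in steps of `step`."""
    return [(a, n - a, c_dsp(a, n - a)) for a in range(0, n + 1, step)]

if __name__ == "__main__":
    for alpha, beta, dsps in splits(16):
        print(f"alpha={alpha:2d} beta={beta:2d} C_DSP={dsps}")
```

The two splits reported later in the talk, (α, β) = (12, 4) and (10, 6), evaluate to 7 and 8.5 DSPs respectively.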
Hard FP part
[Figure: DSP blocks with 32-bit inputs A0..A3, B0..B3 and ACC computing A0B0+A1B1, A3B3+ACC, and A2B2+(A3B3+ACC); signals P_b, P_g, P_l; output P]
- SP accumulation integrated
- P_g will merge with the logic-based dot product
- P_l recirculated, added with P_b using spare adder
Soft FP part
[Figure: 23-stage pipeline. A/B input pairs 0-1 through 10-11 feed three DSP blocks (each used as two 18x18 multipliers) and six dot2 units, followed by an adder tree with CONV and NORM stages. A/B 12-13 and A/B 14-15 with ACC enter through FPDSP blocks. Datapath annotations: 8-bit exponent, w-bit mantissa. Output P.]
Soft FP part - fused
Multipliers
- 1 DSP = 2 × 18x18 = 4 × 8x8 mantissa multipliers (+ALMs)
- skip multiplier normalization
- RN → RZ_w
- extend exponent to avoid overflow/underflow
Adders (except first layer inputs)
- operate on 2's complement mantissas
- mantissa grows by 1 (+1 optional) bit(s) every stage
- mantissa format changes from (SM, 1, wF) → (2C, 1+1+L, w+L)
- after final adder, normalization converts to SP
- intermediary normalization may be introduced for large α
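The fused datapath can be approximated in software. The following is a simplified behavioural sketch, not the author's implementation: it assumes truncation to 8 significand bits as a stand-in for bfloat16 conversion, unbounded exponents (Python integers), alignment by a truncating arithmetic shift at every adder, no intermediate normalization, and 0 ≤ w ≤ 14. All function names are hypothetical.

```python
import math
import struct

def split_bf16(x):
    """Return (sign, mant, exp) with x ~= sign * mant * 2**exp and an 8-bit mant.
    Assumption: truncating to 8 significand bits stands in for bfloat16."""
    if x == 0.0:
        return 1, 0, 0
    frac, e = math.frexp(abs(x))            # abs(x) = frac * 2**e, 0.5 <= frac < 1
    return (1 if x > 0 else -1), int(frac * 256), e - 8

def product(a, b, w):
    """Unnormalized product; magnitude truncated (RZ) to w fraction bits."""
    sa, ma, ea = split_bf16(a)
    sb, mb, eb = split_bf16(b)
    drop = 14 - w                           # exact 8x8 product has 14 fraction bits
    return sa * sb * ((ma * mb) >> drop), ea + eb + drop

def add(x, y, guard=True):
    """Add two (mantissa, exponent) pairs as signed integers, no normalization."""
    (mx, ex), (my, ey) = x, y
    if mx == 0:
        return y
    if my == 0:
        return x
    if guard:                               # the optional extra low-order bit per stage
        mx, ex, my, ey = mx << 1, ex - 1, my << 1, ey - 1
    e = max(ex, ey)                         # align to the larger exponent
    return (mx >> (e - ex)) + (my >> (e - ey)), e

def fused_dot(a, b, w=8, guard=True):
    """Pairwise adder tree over truncated products, normalized once at the end."""
    level = [product(x, y, w) for x, y in zip(a, b)]
    while len(level) > 1:
        nxt = [add(level[i], level[i + 1], guard) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])           # odd element passes through unchanged
        level = nxt
    m, e = level[0]
    # Single final conversion to single precision.
    return struct.unpack("f", struct.pack("f", math.ldexp(m, e)))[0]

if __name__ == "__main__":
    print(fused_dot([1.0, 2.0, 0.5, 1.5], [1.0, 1.0, 2.0, -2.0], w=7))
```

Because mantissas are plain integers here, the per-stage bit growth happens implicitly; a hardware design would size each stage explicitly, as the format change on the slide describes.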
Accuracy (Average)
- w: knob to control the accuracy
- e_c: exponents centered, e_s: the exponent span
- e_c = 0, e_s = 10: inputs generated in (2^{-10} · 2, 2^{10} · 2)

Table: Average relative error comparison between the proposed hybrid dot-product and a typical AI bfloat16+SP implementation for n = 16, α = 12, β = 4, β_g = 2, β_b = 2

Config               Param   Proposed       AI
e_c = 0, e_s = 5     w = 7   1.287601e-02   4.570449e-03
                     w = 8   6.172194e-03
                     w = 9   2.935275e-03
e_c = 0, e_s = 10    w = 7   7.934867e-03   3.402314e-03
                     w = 8   4.120781e-03
                     w = 9   1.864206e-03
e_c = 0, e_s = 20    w = 7   6.672454e-03   2.996574e-03
                     w = 8   3.161355e-03
                     w = 9   1.588372e-03
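An average-relative-error measurement of this kind could be set up roughly as follows. This is a sketch under stated assumptions, not the methodology of the talk: the input generator (uniform integer exponent in [e_c − e_s, e_c + e_s], significand roughly uniform in [1, 2), random sign), the double-precision reference, and every function name are assumptions. `sp_dot` is a plain single-precision dot-product used only to exercise the harness.

```python
import math
import random
import struct

def f32(x):
    """Round a Python float to IEEE single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

def gen_input(rng, e_c, e_s):
    """Assumed generator: exponent centered at e_c with span e_s, random sign."""
    exponent = rng.randint(e_c - e_s, e_c + e_s)
    significand = 1.0 + rng.random()
    return rng.choice((-1.0, 1.0)) * math.ldexp(significand, exponent)

def reference_dot(a, b):
    """Double-precision reference with compensated summation."""
    return math.fsum(x * y for x, y in zip(a, b))

def sp_dot(a, b):
    """Sequential single-precision dot-product (illustrative baseline only)."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = f32(acc + f32(f32(x) * f32(y)))
    return acc

def avg_rel_error(dot_fn, n=16, trials=1000, e_c=0, e_s=10, seed=0):
    """Mean of |dot_fn - reference| / |reference| over random input vectors."""
    rng = random.Random(seed)
    total, used = 0.0, 0
    for _ in range(trials):
        a = [gen_input(rng, e_c, e_s) for _ in range(n)]
        b = [gen_input(rng, e_c, e_s) for _ in range(n)]
        ref = reference_dot(a, b)
        if ref == 0.0:
            continue                        # relative error undefined, skip trial
        total += abs(dot_fn(a, b) - ref) / abs(ref)
        used += 1
    return total / used if used else 0.0

if __name__ == "__main__":
    print(avg_rel_error(sp_dot, e_c=0, e_s=10))
```

Any candidate implementation with the signature `dot_fn(a, b)` can be passed in place of `sp_dot` to compare settings of w under the same inputs.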
Density
r_dot = ALMs / C_DSP

Config                                    Param   ALMs   DSPs   r_dot
n = 16, α = 12, β = 4, β_g = 2, β_b = 2   w = 7   1030   7      147
                                          w = 8   1075          153
                                          w = 9   1141          163
n = 16, α = 10, β = 6, β_g = 4, β_b = 2   w = 7   863    8.5    102
                                          w = 8   894           106
                                          w = 9   948           112
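The r_dot column corresponds to ALMs divided by DSPs (for example 1030 / 7 ≈ 147). The sketch below recomputes it from the tabulated ALM counts and picks the configuration whose ratio is closest to a target device ratio. The target value passed in is a hypothetical parameter, not a figure from the talk, and the function names are illustrative.

```python
# (alpha, beta, w, ALMs) rows taken from the Density table.
TABLE = [
    (12, 4, 7, 1030), (12, 4, 8, 1075), (12, 4, 9, 1141),
    (10, 6, 7, 863), (10, 6, 8, 894), (10, 6, 9, 948),
]

def c_dsp(alpha, beta):
    """DSP cost model from the talk: alpha/4 + beta."""
    return alpha / 4 + beta

def r_dot(alpha, beta, alms):
    """ALMs consumed per DSP by one dot-product unit."""
    return alms / c_dsp(alpha, beta)

def closest_config(device_ratio):
    """Table row whose r_dot is nearest to the device's ALM/DSP ratio."""
    return min(TABLE, key=lambda row: abs(r_dot(row[0], row[1], row[3]) - device_ratio))

if __name__ == "__main__":
    for alpha, beta, w, alms in TABLE:
        print(alpha, beta, w, round(r_dot(alpha, beta, alms), 1))
    print(closest_config(150.0))            # 150.0 is a made-up device ratio
```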
Open questions
- reduction tree topology?
- accuracy?
- resource ratio - account for plumbing?
- integration/synchronization?
- design/portability?
- routability?