
Efficient Floating-Point Logarithm Unit for FPGAs (Nikolaos Alachiotis, PowerPoint presentation)



  1. Efficient Floating-Point Logarithm Unit for FPGAs: Nikolaos Alachiotis, Alexandros Stamatakis. The Exelixis Lab, Dept. of Computer Science, TUM, Munich, Germany

  2. PRESENTATION OVERVIEW ● Introduction ● Approximation Strategy ● Reconfigurable Architecture ● Performance Evaluation ● Conclusion and Future Work


  4. INTRODUCTION ● The Project: Design of HW accelerators for Phylogenetic Inference Programs, which calculate the evolutionary relationships between organisms ● Core function: the Phylogenetic Likelihood Function

  5. INTRODUCTION [Figure: phylogenetic tree labelled with a virtual root, ancestral probability vectors, and tip probability vectors]


  7. INTRODUCTION ● The Project: Design of HW accelerators for Phylogenetic Inference Programs ● The Phylogenetic Likelihood Function: 85% of total execution time ● Log-Likelihood Scores: 2% of total execution time ● Need for a resource-efficient logarithm function

  8. APPROXIMATION STRATEGY ● Based on: O. Vinyals, G. Friedland, “A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy,” Tenth IEEE International Symposium on Multimedia, pp. 61–65, 2008 ● Open-source C implementation: ICSILog 0.6 BETA ● Floating-point number in the IEEE-754 standard: sign | exponent | mantissa, $\text{Number} = \text{sign} \cdot 2^{\text{exponent}} \cdot \text{mantissa}$
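A minimal C sketch of reading the three fields from a double, assuming IEEE-754 binary64 with the bit positions used on the architecture slides (sign: bit 63, exponent: bits 62 downto 52, mantissa: bits 51 downto 0). The slide's formula is shorthand: for a normal number the value is (-1)^sign * 2^(exponent - 1023) * (1 + mantissa / 2^52). The names `fp64_fields` and `split_double` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t), "IEEE-754 binary64 double expected");

/* The three IEEE-754 binary64 fields of a double. */
typedef struct {
    unsigned sign;      /* bit 63 */
    unsigned exponent;  /* bits 62 downto 52: biased exponent field, 0..2047 */
    uint64_t mantissa;  /* bits 51 downto 0: fraction field */
} fp64_fields;

static fp64_fields split_double(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* well-defined way to reinterpret the bits */
    fp64_fields f;
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FFu);
    f.mantissa = bits & ((UINT64_C(1) << 52) - 1);
    return f;
}
```

For example, 12.0 = 1.5 * 2^3 splits into sign 0, exponent field 1026 (3 + 1023), and a mantissa field with only bit 51 set (the fractional 0.5).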


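  10. APPROXIMATION STRATEGY ● $\text{Number} = \text{sign} \cdot 2^{\text{exponent}} \cdot \text{mantissa}$ ● The logarithm is defined only for positive values, so by the multiplicative property of the logarithm: $\log(\text{Number}) = \log(2^{\text{exponent}} \cdot \text{mantissa}) = \log(2^{\text{exponent}}) + \log(\text{mantissa}) = \text{exponent} \cdot \log(2) + \log(\text{mantissa})$ ● $\log(\text{mantissa})$ is read from a lookup table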
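A worked instance of this identity, assuming the natural logarithm and the illustrative input 12 = 2^3 * 1.5 (exponent 3, mantissa 1.5); only ln 1.5 would come from the lookup table:

```latex
\ln 12 = \ln\!\left(2^{3} \cdot 1.5\right)
       = 3 \ln 2 + \ln 1.5
       \approx 2.0794 + 0.4055
       = 2.4849
```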

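  11. APPROXIMATION STRATEGY $\log(\text{Value}) = \text{exponent} \cdot \log(2) + \log(\text{mantissa})$ [Figure: datapath for a double-precision VALUE (sign: bit 63; exponent: bits 62 downto 52; mantissa: bits 51 downto 0). The exponent is multiplied (X) by log(2), the mantissa MSBs (labelled "51-q MSBs") index a LUT, and an adder (+) combines both results into LOG(VALUE), again in sign/exponent/mantissa format]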

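  12. LOGARITHM APPROXIMATION UNIT (LAU) ARCHITECTURE [Figure: LAU block diagram. The input (sign: bit 63; exponent: bits 62 downto 52; mantissa: bits 51 downto 0) feeds a SUB block with the constant 2046, a MAN LUT, and an EXP LUT followed by FP VAL blocks; a MULT by log(2), an ADD, a CASE DETECT block, and 1/0 multiplexers produce log(input) in sign/exponent/mantissa format]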

  13. LOGARITHM APPROXIMATION UNIT (LAU) ARCHITECTURE: INPUT CASE DETECTION ● log(negative number) = NaN ● log(NaN) = NaN ● log(Inf) = Inf ● log(-Inf) = NaN
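The case-detection rules can be sketched as a small classifier that decides whether the LUT datapath result or a special value is output. This is a hypothetical software model: the enum and function names are invented, the zero case (log(±0) = -Inf, as in C99 `log`) is an assumption not listed on the slide, and subnormal inputs are not special-cased.

```c
#include <assert.h>
#include <math.h>

/* Which value the unit should output for a given input. */
typedef enum {
    LOG_NORMAL_PATH,     /* use the LUT / MULT / ADD datapath result */
    LOG_RETURN_NAN,
    LOG_RETURN_INF,
    LOG_RETURN_NEG_INF
} log_case;

static log_case detect_case(double x) {
    if (isnan(x)) return LOG_RETURN_NAN;      /* log(NaN) = NaN */
    if (x == 0.0) return LOG_RETURN_NEG_INF;  /* assumption: C99 log(+-0) = -Inf, not on the slide */
    if (signbit(x)) return LOG_RETURN_NAN;    /* log(negative) = NaN, including log(-Inf) */
    if (isinf(x)) return LOG_RETURN_INF;      /* log(Inf) = Inf */
    return LOG_NORMAL_PATH;                   /* subnormals are not special-cased here */
}

/* Special value to output when the datapath result is bypassed. */
static double log_special_value(log_case c) {
    assert(c != LOG_NORMAL_PATH);
    switch (c) {
    case LOG_RETURN_INF:     return INFINITY;
    case LOG_RETURN_NEG_INF: return -INFINITY;
    default:                 return NAN;
    }
}
```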

  14. LOGARITHM APPROXIMATION UNIT (LAU) ARCHITECTURE: CREATE THE EXPLUT INDEX ● Decimal value of the exponent field → exponent: 0 → -1023, 1 → -1022, …, 1022 → -1, 1023 → 0, 1024 → 1, …, 2046 → 1023



  17. LOGARITHM APPROXIMATION UNIT (LAU) ARCHITECTURE: CREATE THE EXPLUT INDEX ● For an exponent field with decimal value X, the exponent is X - 1023

  18. LOGARITHM APPROXIMATION UNIT (LAU) ARCHITECTURE: CREATE THE EXPLUT INDEX ● EXP LUT index = 1023 - (X - 1023) = 2046 - X, computed by the SUB block from the constant 2046 and the input exponent field
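The index arithmetic of slides 14 to 18 fits in a few lines of C. This sketch models only the arithmetic: the function names are hypothetical, and it does not try to reproduce the hardware table layout (slide 19 gives 1024 EXP LUT entries for double precision, which this naive mapping does not model).

```c
#include <assert.h>

/* Slide 18: index = 1023 - (X - 1023) = 2046 - X, with X the biased exponent field. */
static unsigned explut_index(unsigned x) {
    assert(x <= 2046);  /* X = 2047 encodes Inf/NaN and is handled by case detection */
    return 2046u - x;
}

/* Exponent recovered from an index: X - 1023 = 1023 - index. */
static int exponent_from_index(unsigned index) {
    assert(index <= 2046);
    return 1023 - (int)index;
}

/* Hypothetical "FP VAL" step: the exponent as a single-precision value
   (every integer in [-1023, 1023] is exactly representable as a float). */
static float explut_value(unsigned index) {
    return (float)exponent_from_index(index);
}
```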

  19. LOGARITHM APPROXIMATION UNIT (LAU) ARCHITECTURE: FLOATING-POINT VALUE ● Single-precision values, with single-precision MULT and ADD ● For single-precision inputs, the EXP LUT contains 128 entries to construct a single-precision value ● For double-precision inputs, the EXP LUT contains 1024 entries to construct a single-precision value

  20. LOGARITHM APPROXIMATION UNIT (LAU) ARCHITECTURE: MANTISSA LUT ● Based on the ICSILog 0.6 software

  21. PERFORMANCE EVALUATION: Accuracy versus hardware resources [Figure: average error (×10^3, axis 0 to 0.4) versus resources (number of 18Kb block RAMs, axis 0 to 60)]

  22. PERFORMANCE EVALUATION: Accuracy versus hardware resources (plot as on slide 21, annotated: 6 block RAMs = 4096 LUT entries) ● Log-likelihood score deviation, DP-GNU vs. DP-ICSILog: ● 150 organisms: -39606.3 vs. -39606.6 ● 218 organisms: -134173.8 vs. -134167.5 ● 140 organisms (protein): -124777.2 vs. -124780.1

  23. PERFORMANCE EVALUATION ● Virtex-5 SX95T for mapping and verification ● Xilinx ISE 10.1 and ChipScope Pro Analyzer ● F. de Dinechin, C. Klein, B. Pasca, “Generating high-performance custom floating-point pipelines,” Proc. of FPL 2009

  24. PERFORMANCE EVALUATION: Resource Utilization and Performance, Single Precision [Figure: bar chart of slice registers, slice LUTs, and occupied slices for SP-FPLog vs. SP-LAU (axis 0 to 1200)]

  25. PERFORMANCE EVALUATION: Resource Utilization and Performance, Single Precision [Figure: bar chart of 18k BRAMs, 36k BRAMs, and DSP48Es for SP-FPLog vs. SP-LAU (axis 0 to 6)]

  26. PERFORMANCE EVALUATION: Resource Utilization and Performance, Single Precision (bar chart as on slide 25) ● Latency (clock cycles): FPLog 20, LAU 22 ● Max frequency (MHz): FPLog 244.7, LAU 353.5

  27. PERFORMANCE EVALUATION: Resource Utilization and Performance, Double Precision [Figure: bar chart of slice registers, slice LUTs, and occupied slices for DP-FPLog vs. DP-LAU (axis 0 to 3000)]

  28. PERFORMANCE EVALUATION: Resource Utilization and Performance, Double Precision [Figure: bar chart of 18k BRAMs, 36k BRAMs, and DSP48Es for DP-FPLog vs. DP-LAU (axis 0 to 20)]

  29. PERFORMANCE EVALUATION: Resource Utilization and Performance, Double Precision (bar chart as on slide 28) ● Latency (clock cycles): FPLog 34, LAU 22 ● Max frequency (MHz): FPLog 192.3, LAU 320.6

  30. PERFORMANCE EVALUATION: Resource Utilization and Performance, Double Precision, DP-FPLog with the same accuracy as DP-LAU [Figure: bar chart of slice registers, slice LUTs, and occupied slices for DP-FPLog vs. DP-LAU (axis 0 to 1200)]

  31. PERFORMANCE EVALUATION: Resource Utilization and Performance, Double Precision, DP-FPLog with the same accuracy as DP-LAU [Figure: bar chart of 18k BRAMs, 36k BRAMs, and DSP48Es for DP-FPLog vs. DP-LAU (axis 0 to 3.5)]

  32. PERFORMANCE EVALUATION: Resource Utilization and Performance, Double Precision, DP-FPLog with the same accuracy as DP-LAU (bar chart as on slide 31) ● Latency (clock cycles): FPLog 20, LAU 22 ● Max frequency (MHz): FPLog 239.6, LAU 320.6
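Since the designs differ in both cycle count and maximum clock frequency, the latency in time is cycles * 1000 / f_max (f_max in MHz, result in ns). A small helper (hypothetical name `latency_ns`) makes the comparison explicit:

```c
#include <assert.h>

/* Pipeline latency in nanoseconds for a given cycle count and clock in MHz. */
static double latency_ns(unsigned cycles, double fmax_mhz) {
    assert(fmax_mhz > 0.0);
    return (double)cycles * 1000.0 / fmax_mhz;
}
```

Assuming each unit runs at its reported maximum frequency, the table values give about 81.7 ns (FPLog) vs. 62.2 ns (LAU) in single precision, 176.8 ns vs. 68.6 ns in double precision, and 83.5 ns vs. 68.6 ns for the equal-accuracy DP-FPLog: the LAU needs more cycles than FPLog in two of the three cases but still finishes sooner.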

  33. PERFORMANCE EVALUATION: Performance of LAU vs. SP/DP-ICSILog vs. GNU log vs. MKL log ● Platform: Intel Core 2 Duo T9600 @ 2.8 GHz, 6 MB L2 cache ● Workload: 100,000,000 logarithm calculations [Figure: time in milliseconds (axis 0 to 7000) for single and double precision; series: GNU Log (gnu), MKL Log (icc), SP-ICSILog, DP-ICSILog, SP-LAU, DP-LAU] ● SP-LAU speedup: 11X over GNU log, 1.6X over MKL log ● DP-LAU speedup: 18X over GNU log, 2.5X over MKL log

  34. CONCLUSION and FUTURE WORK: AVAILABILITY ● DP-ICSILog C implementation and SP/DP LAU FPGA core for Virtex-4 and Virtex-5 FPGAs ● http://wwwkrammer.in.tum.de/exelixis/nikos/ipcores.html ● Or OpenCores.org, project name: fp_log, http://www.opencores.org/project,fp_log
