
Evaluating the hardware cost of the posit number system (FPL'19)



  1. Evaluating the hardware cost of the posit number system
     FPL'19, Barcelona, September 9, 2019
     Yohann Uguen, Luc Forget, Florent de Dinechin (Univ Lyon, INSA Lyon, Inria, CITI)

  5. Motivation
     • Posit: a new encoding scheme for real values
     • Posit claim: fewer bits, better results
     • How much does it cost?

  7. Floating point numbers
     A floating point value consists of a significand $1.F$ scaled by a power of two $2^{E}$:

     $v = (-1)^{s} \times 1.F \times 2^{E}$

     | Encoding scheme | Bit layout | Max power of 2 | Number of values in [1, 2) |
     |---|---|---|---|
     | $W_E = 3$, $W_F = 4$ | s e2 e1 e0 f3 f2 f1 f0 | $2^{7} = 128$ | $2^{4} = 16$ |
     | $W_E = 2$, $W_F = 5$ | s e1 e0 f4 f3 f2 f1 f0 | $2^{3} = 8$ | $2^{5} = 32$ |
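The two right-hand columns of the table follow directly from the field widths. The sketch below assumes the slide's simplified model, in which the exponent field is read as a plain unsigned integer with no bias and no special values. The names `fp_value`, `max_power_of_two`, and `values_per_binade` are illustrative and do not come from the slides.

```cpp
#include <cmath>
#include <cstdint>

// Simplified model from the slide: v = (-1)^s * 1.F * 2^E,
// with E read as a plain unsigned integer (assumption: no bias, no specials).
double fp_value(unsigned s, std::uint32_t e, std::uint32_t f, int wf) {
    double significand = 1.0 + std::ldexp(static_cast<double>(f), -wf); // 1.F
    double magnitude = std::ldexp(significand, static_cast<int>(e));    // * 2^E
    return s ? -magnitude : magnitude;
}

// Largest power of two reachable with a we-bit exponent field: 2^(2^we - 1).
double max_power_of_two(int we) {
    return std::ldexp(1.0, (1 << we) - 1);
}

// Number of distinct values in [1, 2): one per fraction pattern, i.e. 2^wf.
std::uint64_t values_per_binade(int wf) {
    return std::uint64_t{1} << wf;
}
```

Under these assumptions, `max_power_of_two(3)` and `values_per_binade(4)` give the first table row (128 and 16), and moving one bit from the exponent to the fraction gives the second row (8 and 32).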

  11. Floating point numbers dilemma
      The choice of $W_E$ and $W_F$ is a trade-off between dynamic range and precision.

      | Format | $W_E$ | $W_F$ | Bit layout |
      |---|---|---|---|
      | IEEE binary16 = FP<5, 10> | 5 | 10 | s e4 e3 e2 e1 e0 f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 |
      | bfloat16 = FP<8, 7> | 8 | 7 | s e7 e6 e5 e4 e3 e2 e1 e0 f6 f5 f4 f3 f2 f1 f0 |
      | DLFloat16 = FP<6, 9> | 6 | 9 | s e5 e4 e3 e2 e1 e0 f8 f7 f6 f5 f4 f3 f2 f1 f0 |

  16. The posit encoding scheme: simple case
      • Word size N
      • Exponent: a variable-length run of r identical bits
      • Remaining bits: fraction bits

      Posit<8> examples:

      | Bits | r | Value |
      |---|---|---|
      | 0 1 0 1 0 0 0 1 | 1 | $1.10001_2 \times 2^{1-1} = 1.53125$ |
      | 0 1 1 1 0 0 0 1 | 3 | $1.001_2 \times 2^{3-1} = 4.5$ |
      | 0 1 1 1 1 1 0 1 | 5 | $1.1_2 \times 2^{5-1} = 24$ |
      | 0 1 1 1 1 1 1 1 | 7 | $1 \times 2^{7-1} = 64$ |
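The four Posit<8> examples can be reproduced with a few lines of software. This is a minimal sketch under the slide's simple-case assumptions: a positive value, no exponent field, and a run of ones after the sign bit. Runs of zeros and negative values are not handled. The function name `decode_simple_posit8` is hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Decode a positive Posit<8> in the slide's simple case (no exponent field):
// sign bit, then a run of r ones, a terminating zero, then fraction bits.
// Only patterns of the form 0 1...1 [0 F] are handled.
double decode_simple_posit8(std::uint8_t bits) {
    int r = 0;
    int pos = 6;                                  // bit just below the sign
    while (pos >= 0 && ((bits >> pos) & 1u)) {    // count the run of ones
        ++r;
        --pos;
    }
    --pos;                                        // skip the terminating zero
    int frac_bits = pos + 1;                      // bits left for the fraction
    double significand = 1.0;
    if (frac_bits > 0) {
        unsigned frac = bits & ((1u << frac_bits) - 1u);
        significand += std::ldexp(static_cast<double>(frac), -frac_bits);
    }
    return std::ldexp(significand, r - 1);        // 1.F * 2^(r-1)
}
```

For instance, `decode_simple_posit8(0x71)` reads the pattern 0 1 1 1 0 0 0 1 and returns 4.5, matching the r = 3 example.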

  19. Posit simple case limitation
      Bill Gates's fortune: $\approx 103.5 \times 10^{9}$ dollars.
      In the simple case, the value saturates at the largest representable posit:
      Posit<32>$(103.5 \times 10^{9}) = 2^{30} \approx 1.1 \times 10^{9}$
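The limit comes from the longest possible run: with N bits, the run has at most r = N − 1 ones, so the largest value is $2^{N-2}$. A small sketch of that bound follows, with hypothetical helper names; it clamps out-of-range values only and does not model rounding of in-range values.

```cpp
#include <algorithm>
#include <cmath>

// Largest value of the simple-case Posit<N>: a run of r = N - 1 ones,
// giving 1 * 2^(r - 1) = 2^(N - 2).
double simple_posit_max(int n) {
    return std::ldexp(1.0, n - 2);
}

// Positive values beyond the largest representable one clamp to it.
// (Rounding of in-range values is not modeled here.)
double clamp_to_simple_posit(double x, int n) {
    return std::min(x, simple_posit_max(n));
}
```

With N = 32 the bound is $2^{30} = 1073741824$, so `clamp_to_simple_posit(103.5e9, 32)` returns that value, about a hundred times smaller than the number being stored.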

  23. Increasing the range
      • Shift the exponent by $W_{ES}$ bits (scale it by $2^{W_{ES}}$).
      • Store the $W_{ES}$ low bits of the exponent before the fraction bits.

      Posit<8,2> examples ($W_{ES} = 2$):

      | Bits | Value |
      |---|---|
      | 0 1 0 1 1 0 0 1 | $1.001_2 \times 2^{0 \times 4 + 3} = 1.001_2 \times 2^{3} = 9$ |
      | 0 1 1 1 0 0 0 1 | $1.1_2 \times 2^{2 \times 4 + 0} = 1.1_2 \times 2^{8} = 384$ |
      | 0 1 1 1 1 1 1 1 | $1 \times 2^{6 \times 4 + 0} = 2^{24} \approx 16.8 \times 10^{6}$ |
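The Posit<8,2> examples can be checked with a small software decoder. The sketch below handles only positive values whose regime is a run of ones. It assumes that exponent bits pushed out of the word by a long regime are read as zeros, which matches the third example. The function name `decode_posit8` is illustrative.

```cpp
#include <cmath>
#include <cstdint>

// Decode a positive Posit<8, WES> whose regime is a run of ones:
// sign | r ones | 0 | WES exponent bits | fraction bits.
// With k = r - 1 the value is 1.F * 2^(k * 2^WES + es).
double decode_posit8(std::uint8_t bits, int wes) {
    int r = 0;
    int pos = 6;                                   // bit just below the sign
    while (pos >= 0 && ((bits >> pos) & 1u)) {     // count the run of ones
        ++r;
        --pos;
    }
    --pos;                                         // skip the terminating zero
    int remaining = (pos >= 0) ? pos + 1 : 0;      // bits after the regime
    int es_bits = (remaining < wes) ? remaining : wes;
    int frac_bits = remaining - es_bits;
    unsigned tail = bits & ((1u << remaining) - 1u);
    // Assumption: exponent bits cut off by a long regime are read as zeros.
    unsigned es = (tail >> frac_bits) << (wes - es_bits);
    unsigned frac = tail & ((1u << frac_bits) - 1u);
    double significand = 1.0 + std::ldexp(static_cast<double>(frac), -frac_bits);
    int scale = (r - 1) * (1 << wes) + static_cast<int>(es);
    return std::ldexp(significand, scale);
}
```

Calling `decode_posit8(0x59, 2)` decodes 0 1 0 1 1 0 0 1 to 9, and passing `wes = 0` falls back to the simple case.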

  25. Overview
      Our goals:
      • Evaluate the hardware cost of posits
      • Compare this cost to standard FP hardware
      • Provide an experimentation framework for posit hardware

      Our tool, Marto (Modern arithmetic tools):
      • Open source HLS library for custom-sized posit arithmetic
      • Handles addition, product, and quire accumulation

      gitlab.inria.fr/lforget/marto

  26. Marto usage example

      IEEE binary32 adder:

          #include "ieeefloats/ieee_dim.hpp"

          // IEEENumber<WE, WF>
          IEEENumber<8, 23> op1;
          IEEENumber<8, 23> op2;
          IEEENumber<8, 23> op3;

          // Compute the IEEE sum
          auto sum = op1 + op2 + op3;
          // ...

      Posit<32, 2> adder:

          #include "posit/posit_dim.hpp"

          // PositNumber<N, WES>
          PositNumber<32, 2> op1;
          PositNumber<32, 2> op2;
          PositNumber<32, 2> op3;

          // Compute the Posit(32,2) sum
          auto sum = op1 + op2 + op3;
          // ...

  28. Variable-size fields are not hardware friendly
      [Figure: operator architecture. Posit operand 1 and posit operand 2 each go through a decoder, producing input 1 and input 2 in an intermediate representation with fixed-size fields. The posit operator computes a result in that representation, and an encoder converts it back into the posit result.]
      Which intermediate representation?

  29. Posit Intermediate Format
      Posit Intermediate Format (PIF): the smallest floating point format that can store any value of a given posit format.
      • Significand stored in 2's complement
      • Extra bits for exact rounding (Round, Sticky)
      • Extra bits for logic simplification (IsNaR, I)

      | Format | $W_E$ | $W_F$ |
      |---|---|---|
      | Posit(8, 0) | 4 | 5 |
      | Posit(16, 1) | 6 | 12 |
      | Posit(32, 2) | 8 | 27 |
      | Posit(64, 3) | 10 | 58 |
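The four rows of the table fit two closed-form widths, $W_E = \lceil \log_2 N \rceil + 1 + W_{ES}$ and $W_F = N - 3 - W_{ES}$. These formulas are inferred from the table values rather than quoted from the slide, so treat the sketch below as a consistency check. The helper names are hypothetical, and the `constexpr` loop needs C++14 or later.

```cpp
// Smallest b such that 2^b >= n.
constexpr int ceil_log2(int n) {
    int bits = 0;
    while ((1 << bits) < n) ++bits;
    return bits;
}

// PIF field widths for Posit<N, WES>, inferred from the table:
// WE = ceil(log2(N)) + 1 + WES and WF = N - 3 - WES.
constexpr int pif_we(int n, int wes) { return ceil_log2(n) + 1 + wes; }
constexpr int pif_wf(int n, int wes) { return n - 3 - wes; }

// Compile-time check against the table rows.
static_assert(pif_we(8, 0) == 4 && pif_wf(8, 0) == 5, "Posit(8, 0)");
static_assert(pif_we(16, 1) == 6 && pif_wf(16, 1) == 12, "Posit(16, 1)");
static_assert(pif_we(32, 2) == 8 && pif_wf(32, 2) == 27, "Posit(32, 2)");
static_assert(pif_we(64, 3) == 10 && pif_wf(64, 3) == 58, "Posit(64, 3)");
```

Because the widths are `constexpr`, they could be used as template arguments when sizing an intermediate type for a custom posit format.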

  32. Posit decoder
      [Figure: posit decoder datapath, the first stage of the operator architecture. Input: the N-bit word PositN, split into the sign s and the N − 1 remaining bits. Blocks: an OR-reduce over the N − 1 bits, and a combined leading zero/one counter and shifter (LZOC + Shift) over N − 2 bits that separates the regime r, the exponent bits ES ($w_{es}$ bits) and the fraction F (N − 3 − $w_{es}$ bits). The count ($\log_2(N)$ bits) and ES feed a "+ Bias" adder. Outputs: S, isNaR, I, E, F.]
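The central block of the decoder is the leading zero/one counter, which measures the regime length so that the shifter can align the exponent and fraction bits. A bit-serial software model is sketched below; real hardware would use a logarithmic tree, and the function name `lzoc` is simply taken from the diagram label.

```cpp
#include <cstdint>

// Leading zero-or-one counter: length of the run of bits equal to the
// most significant bit, looking at the low `width` bits of `word`.
int lzoc(std::uint32_t word, int width) {
    unsigned lead = (word >> (width - 1)) & 1u;   // value of the leading bit
    int count = 0;
    for (int pos = width - 1;
         pos >= 0 && ((word >> pos) & 1u) == lead;
         --pos) {
        ++count;
    }
    return count;
}
```

Applied to the 7 bits below the sign of the Posit<8> pattern 0 1 1 1 0 0 0 1, `lzoc(0x71, 7)` returns 3, the length of the run of ones.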

  34. Posit encoder
      [Figure: posit encoder datapath, the last stage of the operator architecture. Inputs: isNaR, S, E ($\lceil \log_2(N) \rceil + 1 + w_{es}$ bits), F (N − 3 − $w_{es}$ bits), Round, Sticky. Blocks: a "− Bias" subtractor on the exponent, a shifter with sticky computation that rebuilds the variable-length regime from the 01 or 10 pattern, a rounding adder on the N − 1 low bits, and a final NaR selection. Output: the N-bit word PositN.]
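The Round and Sticky bits carried by the intermediate format are what the final adder of the encoder needs in order to round correctly. The slides do not spell out the rounding rule, so the sketch below assumes round-to-nearest with ties to even. The function name `round_nearest_even` is illustrative.

```cpp
#include <cstdint>

// Round-to-nearest, ties-to-even, from a truncated value plus two extra bits:
// `round` is the first discarded bit, `sticky` is the OR of all later ones.
std::uint64_t round_nearest_even(std::uint64_t truncated, bool round, bool sticky) {
    bool lsb = (truncated & 1u) != 0;
    // Round up when above the midpoint, or exactly at it with an odd lsb.
    bool increment = round && (sticky || lsb);
    return truncated + (increment ? 1u : 0u);
}
```

A tie (`round` set, `sticky` clear) leaves an even value unchanged and bumps an odd one, so 4 stays 4 and 5 becomes 6.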

  35. Posit addition: comparison with the state of the art
      Synthesis target: Zynq FPGA.

      | N | Design | LUTs | Delay (ns) |
      |---|---|---|---|
      | 16 | Chaurasiya et al. | 320 | 23 |
      | 16 | Jaiswal et al. | 460 | 21 |
      | 16 | Marto (this work) | 320 | 21 |
      | 32 | Chaurasiya et al. | 981 | 40 |
      | 32 | Jaiswal et al. | 1115 | 29 |
      | 32 | Marto (this work) | 745 | 24 |

      • Chaurasiya et al.: Parametrized Posit Arithmetic Hardware Generator, 2018
      • Jaiswal et al.: PACoGen: A Hardware Posit Arithmetic Core Generator, 2019

  36. Posit product: comparison with the state of the art
      Synthesis target: Zynq FPGA.

      | N | Design | LUTs | Delay (ns) | DSPs |
      |---|---|---|---|---|
      | 16 | Chaurasiya et al. | 218 | 24 | 1 |
      | 16 | Jaiswal et al. | 271 | 19 | 1 |
      | 16 | Marto (this work) | 253 | 18 | 1 |
      | 32 | Chaurasiya et al. | 572 | 33 | 4 |
      | 32 | Jaiswal et al. | 648 | 27 | 4 |
      | 32 | Marto (this work) | 469 | 27 | 4 |

      • Chaurasiya et al.: Parametrized Posit Arithmetic Hardware Generator, 2018
      • Jaiswal et al.: PACoGen: A Hardware Posit Arithmetic Core Generator, 2019

  37. Comparison with floating point adder
      Synthesis target: Kintex 7.

      | N | Format | LUTs | Regs. | Cycles @ 333 MHz |
      |---|---|---|---|---|
      | 16 | Marto posit | 447 | 371 | 17 |
      | 16 | IEEE-754 | 216 | 205 | 12 |
      | 32 | Marto posit | 999 | 975 | 23 |
      | 32 | IEEE-754 | 425 | 375 | 14 |
      | 32 | Xilinx float | 341 | 467 | 9 |
      | 64 | Marto posit | 1759 | 2785 | 36 |
      | 64 | IEEE-754 | 918 | 792 | 17 |
      | 64 | Xilinx double | 641 | 1098 | 11 |

      Posit product: ∼2x slower, requires ∼2x more LUTs
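The LUT overhead can be recomputed from the table by dividing each Marto posit row by the IEEE-754 row of the same width. The sketch below does only that arithmetic. The LUT counts are copied from the table, while the helper `overhead` and the constant names are illustrative.

```cpp
// Cost ratio of a posit operator relative to its IEEE-754 counterpart.
constexpr double overhead(int posit_cost, int ieee_cost) {
    return static_cast<double>(posit_cost) / ieee_cost;
}

// LUT counts copied from the table (Marto posit row vs. IEEE-754 row).
constexpr double lut_overhead_16 = overhead(447, 216);   // N = 16
constexpr double lut_overhead_32 = overhead(999, 425);   // N = 32
constexpr double lut_overhead_64 = overhead(1759, 918);  // N = 64
```

All three ratios land between 1.9 and 2.4, in line with the roughly 2x LUT figure quoted on the slide.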
