Computing correctly rounded logarithms with fixed-point operations
Julien Le Maire, Florent de Dinechin, Jean-Michel Muller and Nicolas Brunie

Outline
- Introduction and context
- Algorithm
- Results and comparisons for libm log
- Bonus: a floating-point in, fixed-point out variant
- Conclusions

Software evaluating elementary functions
This work is about libm functions:
- prototype: floating-point in, floating-point out, e.g. double log(double x);
- implementation:
  - nearly 100% of the implementations and literature use floating-point
  - integer-based implementations: only on processors without an FPU (StrongARM, ST200)
[Timeline, 1960 to 2000. Mainstream floating-point: IEEE-754 (64 bits). Mainstream integer: 32 bits.]

The times they are a-changing
[Timeline, 1960 to 2000. Mainstream floating-point: IEEE-754 (64 bits). Mainstream integer: 32 bits, then 64 bits.]
- At the same time, architectural changes such as unified integer/floating-point registers.
- This work: re-evaluate the idea of implementing floating-point functions using integer arithmetic.

Integer now seems better than floating-point
- Most operations are faster on integers, especially addition:
  - 3-5 cycles in floating point
  - 1 cycle in integer (more or less defines the processor cycle time)
- 64 bits of significand is better than 52:
  - if you can predict the value of the exponent, exponent bits are wasted bits
  - convert precision to speed?
- Modern 64-bit machines offer all the integer instructions we need:
  - addition
  - multiplication 64 × 64 → 128 (mulq)
  - count leading zeroes, shifts (lzcnt, bsr)
- Fast small multiprecision out of the box: mainstream compilers (gcc, clang, icc) support __int128
  - addition 128 + 128 → 128 (add, adc)
  - shift on two registers (shld, shrd)
  - multiplication, etc.
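
The integer building blocks listed on this slide can be reached from C. The following is a minimal sketch, assuming a GCC- or Clang-compatible compiler on a 64-bit target: `unsigned __int128` and `__builtin_clzll` are compiler extensions, and the helper names (`mulhi_u64`, `normalize_u64`, `add_u128`) are hypothetical, not taken from the talk.

```c
#include <stdint.h>

/* High 64 bits of the 128-bit product a*b. If both operands are read as
   unsigned fixed-point fractions in [0,1) (value = a / 2^64), the result
   is the truncated product in the same format. */
static inline uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
    return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Shift x left until its top bit is set and report the shift count.
   x must be nonzero: __builtin_clzll(0) is undefined. */
static inline uint64_t normalize_u64(uint64_t x, int *shift)
{
    *shift = __builtin_clzll(x);
    return x << *shift;
}

/* 128-bit addition, written as plain C on the extended integer type. */
static inline unsigned __int128 add_u128(unsigned __int128 a,
                                         unsigned __int128 b)
{
    return a + b;
}
```

For example, `mulhi_u64(1ULL << 63, 1ULL << 63)` multiplies 0.5 by 0.5 in this fraction format and returns `1ULL << 62`, that is, 0.25.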

No vectorization yet
- Integer SIMD/vector support still lagging behind FP
- until recently, no vector multiplication
- AVX512: 52-bit vector multiplication (recycling mantissa multiplier)
So not all hope is lost.

This work: an experiment
Implementing the floating-point logarithm function using only integer arithmetic
- for performance (previous work was motivated by the lack of FP hardware)
- with state-of-the-art accuracy: correct rounding
Why the log? Because it seemed the easiest function for this.

Main results
- worst-case execution time now within a factor of 3 of the best faithful implementations
  - an improvement of a factor of 5 over the previous state of the art
- average time almost twice as fast as current glibc
- proposal of a floating-point in, fixed-point out variant of the log function

Logarithm, the mathematical version
- $\ln(a \times b) = \ln(a) + \ln(b)$
- $\ln(b^a) = a \times \ln(b)$
- Taylor, for $x$ small: $\ln(1 + x) \approx x - x^2/2 + x^3/3 - \cdots$
[Plot of y = ln(x)]
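
The three identities on this slide combine into a simple evaluation scheme: write $x = m \times 2^e$, so that $\ln(x) = e \times \ln(2) + \ln(1 + (m - 1))$. The sketch below illustrates this in C. It is only an illustration of the identities in double precision, not the fixed-point algorithm of the talk. It assumes a positive, normal, finite IEEE-754 binary64 input, and the function names are hypothetical. The plain Taylor series is accurate only when $m - 1$ is small; a practical implementation would need a further reduction step, which is omitted here.

```c
#include <stdint.h>
#include <string.h>

/* Split a positive, normal, finite double into x = m * 2^e, 1 <= m < 2,
   by integer operations on its IEEE-754 binary64 bit pattern. */
static void split_double(double x, int *e, double *m)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    *e = (int)((bits >> 52) & 0x7FF) - 1023;   /* remove the exponent bias */
    bits = (bits & 0x000FFFFFFFFFFFFFULL)      /* keep the 52 fraction bits */
         | 0x3FF0000000000000ULL;              /* and set the exponent to 0 */
    memcpy(m, &bits, sizeof bits);
}

/* Truncated Taylor series t - t^2/2 + t^3/3 - ... with `terms` terms. */
static double ln1p_taylor(double t, int terms)
{
    double sum = 0.0, power = t;
    for (int k = 1; k <= terms; k++) {
        sum += (k % 2 ? power : -power) / k;   /* alternating signs */
        power *= t;
    }
    return sum;
}

/* ln(m * 2^e) = e * ln(2) + ln(1 + (m - 1)). */
static double ln_sketch(double x, int terms)
{
    int e;
    double m;
    split_double(x, &e, &m);
    return e * 0.6931471805599453 + ln1p_taylor(m - 1.0, terms);
}
```

For instance, 8.08 splits into e = 3 and m close to 1.01, so `ln_sketch(8.08, 12)` adds 3 × ln(2) to a rapidly converging series in 0.01.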

Logarithm, the floating-point version
The floating-point version of the natural logarithm is called log (you will also find log2 and log10 and a few others):
$\forall x \in \mathbb{F}_{64}, \quad \log(x) = \circ(\ln(x))$
[Plot of y = ln(x)]

On-demand accuracy
- Muller and Lefèvre solved the table maker's dilemma for log
- Computing the log with an error ≤ $2^{-113}$ enables correct rounding
[Figure: the real number line with two consecutive floating-point numbers, and the computed logarithm with its error margin]
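
The figure suggests a simple decision rule: an approximation can be rounded safely only if its whole error interval lies on one side of the rounding boundary between two consecutive floating-point numbers. Here is a minimal integer-only sketch of such a test in C, under assumptions that are not from the talk: the approximation `y` and its error bound `err` are expressed in units of the last bit of a fixed-point working format, the target format drops the `drop` low-order bits, and rounding is to nearest. The function name `round_if_safe` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* y    : fixed-point approximation, in units of its last bit
   err  : bound on |y - exact value|, in the same units
   drop : number of low-order bits discarded by rounding (1 <= drop <= 63)
   Returns true and stores the round-to-nearest result (in units of the
   target format) when the interval [y - err, y + err] contains no
   rounding boundary; returns false when a more accurate evaluation
   would be needed. */
static bool round_if_safe(uint64_t y, uint64_t err, unsigned drop,
                          uint64_t *rounded)
{
    const uint64_t half = 1ULL << (drop - 1);        /* boundary offset */
    const uint64_t low  = y & ((half << 1) - 1);     /* discarded bits  */
    const uint64_t dist = low > half ? low - half : half - low;

    if (err >= half || dist <= err)
        return false;                                /* too close to call */

    *rounded = (y >> drop) + (low > half);           /* round to nearest */
    return true;
}
```

With `drop = 8`, the boundary sits at offset 128 within each group of 256 values: `y = 0x1234` with `err = 10` rounds safely to `0x12`, while `y = 0x1285` with the same error bound is rejected because the boundary is only 5 units away.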