  1. Efficient implementation of elementary functions in the medium-precision range
     Fredrik Johansson (LFANT, INRIA Bordeaux)
     22nd IEEE Symposium on Computer Arithmetic (ARITH 22), Lyon, France, June 2015

  4. Elementary functions
     Functions: exp, log, sin, cos, atan
     My goal is to do interval arithmetic with arbitrary precision.
     Input: floating-point number x = a · 2^b, precision p ≥ 2
     Output: m, r with f(x) ∈ [m − r, m + r] and r ≈ 2^(−p) |f(x)|
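The input/output contract above can be sketched in a few lines of Python. This is purely illustrative: the talk's implementation works on GMP's mpn layer, whereas here the standard-library decimal module stands in for arbitrary-precision arithmetic, and no rigorous rounding-error bound is attached to the midpoint.

```python
from decimal import Decimal, getcontext

def exp_ball(x, p):
    # Illustrative sketch (not the talk's mpn-based implementation):
    # return exp(x) as a ball [m - r, m + r] with r ~ 2^-p * |exp(x)|.
    # Note: this mutates the global decimal context precision.
    getcontext().prec = int(p * 0.30103) + 5   # p bits -> decimal digits, plus guard digits
    m = Decimal(x).exp()                       # midpoint
    r = abs(m) * Decimal(2) ** (-p)            # radius ~ 2^-p * |f(x)|
    return m, r
```

For example, `exp_ball("0.5", 64)` returns a midpoint near exp(0.5) ≈ 1.6487 with a radius around 2^(−64) times that value.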

  7. Precision ranges
     Hardware precision (n ≈ 53 bits)
     ◮ Extensively studied - elsewhere!
     Medium precision (n ≈ 100 - 10 000 bits)
     ◮ Multiplication costs M(n) = O(n^2) or O(n^1.6)
     ◮ Argument reduction + rectangular splitting: O(n^(1/3) M(n))
     ◮ In the lower range, software overhead is significant
     Very high precision (n ≫ 10 000 bits)
     ◮ Multiplication costs M(n) = O(n log n log log n)
     ◮ Asymptotically fast algorithms: binary splitting, arithmetic-geometric mean (AGM) iteration: O(M(n) log(n))

  10. Improvements in this work
      1. The low-level mpn layer of GMP is used exclusively
      ◮ Most libraries (e.g. MPFR) use more convenient types, e.g. mpz and mpfr, to implement elementary functions
      2. Argument reduction based on lookup tables
      ◮ Old idea, not well explored in high precision
      3. Faster evaluation of Taylor series
      ◮ Optimized version of Smith's rectangular splitting algorithm
      ◮ Takes advantage of mpn-level functions

  12. Recipe for elementary functions
      exp(x) | sin(x), cos(x) | log(1 + x) | atan(x)
      ↓ Domain reduction using π and log(2)
      x ∈ [0, log 2) | x ∈ [0, π/4) | x ∈ [0, 1) | x ∈ [0, 1)
      ↓ Argument halving r ≈ 8 times: exp(x) = [exp(x/2)]^2, log(1 + x) = 2 log(√(1 + x))
      x ∈ [0, 2^(−r))
      ↓ Taylor series
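The halving step of this recipe can be sketched for exp: divide the argument by 2^r, sum a short Taylor series for the tiny remainder, then square r times. A toy model in Python, with doubles standing in for arbitrary-precision arithmetic:

```python
import math

def exp_by_halving(x, r=8):
    # Sketch of the argument-halving recipe: exp(x) = [exp(x/2)]^2,
    # applied r times.  Doubles stand in for arbitrary precision.
    y = x / 2.0 ** r                                       # y in [0, 2^-r) for x in [0, 1)
    s = sum(y ** k / math.factorial(k) for k in range(8))  # short Taylor sum for exp(y)
    for _ in range(r):                                     # square r times to undo the halving
        s *= s
    return s
```

The drawback, which the next slide addresses, is that each of the r squarings is a full-precision multiplication.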

  13. Better recipe at medium precision
      exp(x) | sin(x), cos(x) | log(1 + x) | atan(x)
      ↓ Domain reduction using π and log(2)
      x ∈ [0, log 2) | x ∈ [0, π/4) | x ∈ [0, 1) | x ∈ [0, 1)
      ↓ Lookup table with 2^r ≈ 2^8 entries: exp(t + x) = exp(t) exp(x), log(1 + t + x) = log(1 + t) + log(1 + x/(1 + t))
      x ∈ [0, 2^(−r))
      ↓ Taylor series

  16. Argument reduction formulas
      What we want to compute: f(x), x ∈ [0, 1)
      Table size: q = 2^r
      Precomputed value: f(t), t = i/q, i = ⌊2^r x⌋
      Remaining value to compute: f(y), y ∈ [0, 2^(−r))
      exp(x) = exp(t) exp(y),                   y = x − i/q
      sin(x) = sin(t) cos(y) + cos(t) sin(y),   y = x − i/q
      cos(x) = cos(t) cos(y) − sin(t) sin(y),   y = x − i/q
      log(1 + x) = log(1 + t) + log(1 + y),     y = (qx − i)/(i + q)
      atan(x) = atan(t) + atan(y),              y = (qx − i)/(ix + q)

  17. Optimizing lookup tables
      m = 2 tables with 2^5 + 2^5 entries give the same reduction as m = 1 table with 2^10 entries

      Function   Precision   m   r   Entries   Size (KiB)
      exp        ≤ 512       1   8   178       11.125
      exp        ≤ 4608      2   5   23+32     30.9375
      sin        ≤ 512       1   8   203       12.6875
      sin        ≤ 4608      2   5   26+32     32.625
      cos        ≤ 512       1   8   203       12.6875
      cos        ≤ 4608      2   5   26+32     32.625
      log        ≤ 512       2   7   128+128   16
      log        ≤ 4608      2   5   32+32     36
      atan       ≤ 512       1   8   256       16
      atan       ≤ 4608      2   5   32+32     36
      Total                                    236.6875
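The m = 2 scheme can be sketched for exp: a first table indexed by the top 5 bits of the argument, a second by the next 5 bits, leaving a remainder below 2^(−10). Again doubles stand in for arbitrary-precision entries:

```python
import math

q = 1 << 5                                     # each table has 2^5 = 32 entries
T1 = [math.exp(i / q) for i in range(q)]       # exp(i / 2^5)
T2 = [math.exp(j / q ** 2) for j in range(q)]  # exp(j / 2^10)

def exp_two_tables(x):
    # Two lookups of 2^5 entries each reduce the argument as much as
    # one lookup in a 2^10-entry table, with far less storage.
    i = int(q * x)
    x -= i / q                 # first reduction:  x now about in [0, 2^-5)
    j = int(q * q * x)
    x -= j / q ** 2            # second reduction: x now about in [0, 2^-10)
    return T1[i] * T2[j] * math.exp(x)
```

The storage trade-off is exactly the one in the table: 32 + 32 entries instead of 1024.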

  20. Taylor series
      Logarithmic series:
      atan(x) = x − x^3/3 + x^5/5 − x^7/7 + ...
      log(1 + x) = 2 atanh(x/(x + 2))
      With x < 2^(−10), need 230 terms for 4600-bit precision
      Exponential series:
      exp(x) = 1 + x + x^2/2! + x^3/3! + x^4/4! + ...
      sin(x) = x − x^3/3! + x^5/5! − ...
      cos(x) = 1 − x^2/2! + x^4/4! − ...
      With x < 2^(−10), need 280 terms for 4600-bit precision
      Above 300 bits: cos(x) = √(1 − sin^2(x))
      Above 800 bits: exp(x) = sinh(x) + √(1 + sinh^2(x))
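The quoted term counts can be sanity-checked by finding when the first series term drops below 2^(−4600) for an argument x < 2^(−10). This is a rough bound (it ignores the size of the tail), so it is an estimate rather than a rigorous count:

```python
import math

def atan_terms(bits=4600, xbits=10):
    # k-th term x^(2k+1)/(2k+1) is at most 2^(-xbits * (2k+1))
    k = 0
    while xbits * (2 * k + 1) < bits:
        k += 1
    return k

def exp_terms(bits=4600, xbits=10):
    # k-th term x^k/k! is at most 2^(-xbits*k - log2(k!))
    k = 0
    while xbits * k + math.lgamma(k + 1) / math.log(2) < bits:
        k += 1
    return k
```

For the logarithmic-type series this gives exactly 230 terms; for the exponential series it lands in the mid-270s, consistent with the slide's rounded figure of 280.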

  26. Evaluating Taylor series using rectangular splitting
      Paterson and Stockmeyer, 1973: evaluate Σ_{i=0}^{n} c_i x^i in O(n) cheap steps + O(n^(1/2)) expensive steps
             (c_0  + c_1 x  + c_2 x^2  + c_3 x^3)  +
      x^4  (c_4  + c_5 x  + c_6 x^2  + c_7 x^3)  +
      x^8  (c_8  + c_9 x  + c_10 x^2 + c_11 x^3) +
      x^12 (c_12 + c_13 x + c_14 x^2 + c_15 x^3)
      ◮ Smith, 1989: elementary and hypergeometric functions
      ◮ Brent & Zimmermann, 2010: improvements to Smith
      ◮ FJ, 2014: generalization to D-finite functions
      ◮ New: optimized algorithm for elementary functions
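The grouping above translates into a short generic routine: inside each block of m terms only cheap multiplications by precomputed small powers of x occur, and one expensive multiplication by x^m connects consecutive blocks via a Horner step. A sketch over plain Python numbers (the talk's version works on mpn fixed-point arithmetic):

```python
def rect_split_eval(coeffs, x, m=4):
    # Paterson-Stockmeyer rectangular splitting: evaluate
    # sum(coeffs[i] * x^i) in blocks of m terms.
    xs = [x ** k for k in range(m + 1)]   # precompute 1, x, ..., x^m
    total = 0
    for start in reversed(range(0, len(coeffs), m)):
        # "cheap" steps: coefficient times precomputed power, summed
        block = sum(c * xs[k] for k, c in enumerate(coeffs[start:start + m]))
        # "expensive" step: one Horner multiplication by x^m per block
        total = total * xs[m] + block
    return total
```

For n terms this uses about n cheap multiplications but only about n/m + m expensive ones, minimized near m ≈ √n.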

  28. Logarithmic series
      Rectangular splitting:
      x + (1/2) x^2 + x^3 ((1/3) + (1/4) x + (1/5) x^2 + x^3 ((1/6) + (1/7) x + (1/8) x^2))
      Improved algorithm with fewer divisions:
      x + (1/60) (30 x^2 + x^3 (20 + 15 x + 12 x^2 + x^3 (10 + (60/56) (8 x + 7 x^2))))
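The point of the regrouping is that all bracketed coefficients become small integers, so divisions are confined to the two constants 1/60 and 60/56. As reconstructed here (the slide's rendering of the formula is partly garbled), the identity with the plain series x + x^2/2 + ... + x^8/8 can be checked with exact rational arithmetic:

```python
from fractions import Fraction as F

def series_naive(x):
    # x + x^2/2 + ... + x^8/8, one division per term
    return sum(F(1, k) * x ** k for k in range(1, 9))

def series_fewer_divisions(x):
    # regrouped form: integer coefficients inside the brackets,
    # divisions only in the constants 1/60 and 60/56
    inner = 10 + F(60, 56) * (8 * x + 7 * x ** 2)
    mid = 20 + 15 * x + 12 * x ** 2 + x ** 3 * inner
    return x + F(1, 60) * (30 * x ** 2 + x ** 3 * mid)
```

At medium precision, replacing per-term divisions by integer multiplications is a substantial saving, since mpn division is much slower than multiplication by a small integer.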

  29. Exponential series
      Rectangular splitting:
      1 + x + (1/2) (x^2 + (1/3) x^3 (1 + (1/4) (x + (1/5) (x^2 + (1/6) x^3 (1 + (1/7) (x + (1/8) x^2))))))
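Here the factorial denominators are absorbed one factor at a time: each nesting level divides by a single small integer 1/k instead of the full 1/k!. As reconstructed here (again the slide's formula is partly garbled), the nested form reproduces 1 + x + x^2/2! + ... + x^8/8! exactly, which can be verified with rational arithmetic:

```python
from fractions import Fraction as F
from math import factorial

def exp_series_naive(x):
    # 1 + x + x^2/2! + ... + x^8/8!
    return sum(F(1, factorial(k)) * x ** k for k in range(9))

def exp_series_nested(x):
    # evaluate the nested regrouping from the innermost bracket outward;
    # each level costs one division by a single small integer
    t = 1 + F(1, 7) * (x + F(1, 8) * x ** 2)
    t = x ** 2 + F(1, 6) * x ** 3 * t
    t = x + F(1, 5) * t
    t = 1 + F(1, 4) * t
    t = x ** 2 + F(1, 3) * x ** 3 * t
    return 1 + x + F(1, 2) * t
```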
