FLOATING-POINT ROUTINES Zvonimir Rakamarić - PowerPoint PPT Presentation

  1. ANALYSIS AND SYNTHESIS OF FLOATING-POINT ROUTINES Zvonimir Rakamarić

  2. FLOATING-POINT COMPUTATIONS ARE UBIQUITOUS

  3. CHALLENGES
   FP is “weird”
     Does not faithfully match math (finite precision)
     Non-associative
     Heterogeneous hardware support
   FP code is hard to get right
     Lack of good understanding
     Lack of good and extensive tool support
   FP software is large and complex
     High-performance computing (HPC) simulations
     Machine learning

  4. FP IS WEIRD
   Finite precision and rounding
     x + y in reals ≠ x + y in floating-point
   Non-associative (see sketch below)
     (x + y) + z ≠ x + (y + z)
   Creates issues with
     Compiler optimizations (e.g., vectorization)
     Concurrency (e.g., reductions)
   Standard completely specifies only +, -, *, /, comparison, remainder, and square root
     Only recommendations for some functions (e.g., trigonometric)
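A minimal Python sketch of the non-associativity bullet, assuming IEEE doubles (which Python floats are); the constants 1e16 and 1.0 are chosen so the small addend gets absorbed:

    x, y, z = 1e16, -1e16, 1.0
    print((x + y) + z)  # 1.0: the big values cancel first
    print(x + (y + z))  # 0.0: 1.0 is absorbed by -1e16 before the cancellation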

  5. FP IS WEIRD cont.
   Heterogeneous hardware support
     x + y*z on Xeon ≠ x + y*z on Xeon Phi
     Fused multiply-add
     Intel’s online article “Differences in Floating-Point Arithmetic Between Intel Xeon Processors and the Intel Xeon Phi Coprocessor”
   Common sense does not (always) work (see sketch below)
     x “is better than” log(e^x)
     (e^x − 1)/x “can be worse than” (e^x − 1)/log(e^x)
     Error cancellation
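The last two bullets are the classic exp(x) − 1 example; a Python sketch for a tiny input (x = 1e-9 is an illustrative choice) shows the naive form losing roughly half its digits while the log(e^x) denominator cancels the rounding error:

    import math

    x = 1e-9
    naive = (math.exp(x) - 1) / x                       # cancellation hurts
    clever = (math.exp(x) - 1) / math.log(math.exp(x))  # errors cancel out
    print(naive)              # roughly 1.00000008...: wrong after ~8 digits
    print(clever)             # roughly 1.0000000005: close to the true value
    print(math.expm1(x) / x)  # reference computed without cancellation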

  6. FLOATING-POINT NUMBERS
   IEEE 754 standard
   Sign (s), mantissa (m), exponent (exp): (-1)^s * 1.m * 2^exp (see sketch below)
   Single precision: 1, 23, 8 bits
   Double precision: 1, 52, 11 bits
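A small Python sketch of that encoding for doubles (decompose is a name introduced here for illustration; it assumes the standard 1/11/52 bit layout):

    import struct

    def decompose(x):
        # Reinterpret the 64 bits of a double: 1 sign, 11 exponent, 52 mantissa
        bits = struct.unpack(">Q", struct.pack(">d", x))[0]
        sign = bits >> 63
        exp = ((bits >> 52) & 0x7FF) - 1023  # remove the exponent bias
        mantissa = bits & ((1 << 52) - 1)    # fraction bits of 1.m
        return sign, exp, mantissa

    s, e, m = decompose(-6.0)
    print(s, e, hex(m))  # 1 2 0x8000000000000, i.e. -6.0 = (-1)^1 * 1.5 * 2^2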

  7. FLOATING-POINT NUMBER LINE
   3 bits for precision
   Between any two powers of 2, there are 2^3 = 8 representable numbers
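The toy format is easy to enumerate in Python (a sketch that assumes values of the form 2^e * (1 + k/8) and ignores subnormals):

    # Between 2^e and 2^(e+1), the representable values are 2^e * (1 + k/8)
    for e in (0, 1):
        print([2**e * (1 + k / 8) for k in range(8)])
    # prints [1.0, 1.125, ..., 1.875] and [2.0, 2.25, ..., 3.75]: 8 per binade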

  8. ROUNDING IS A SOURCE OF ERRORS
  [Figure: real numbers x and y on the real line (-∞, ∞) round to floating-point numbers x̃ and ỹ; the rounding errors are (x̃ − x) and (ỹ − y).]

  9. FLOATING-POINT OPERATIONS
   First normalize to the same exponent
     Smaller exponent -> shift mantissa right
   Then perform the operation
   Losing bits when exponents are not the same!
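A quick Python illustration of those lost bits, assuming IEEE doubles (the spacing between consecutive doubles at 2^53 is 2):

    big = 2.0**53            # ULP of big is 2.0 here
    print(big + 1.0 == big)  # True: 1.0 is shifted out during alignment
    print(big + 2.0 == big)  # False: 2.0 survives the alignment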

  10. UTAH FLOATING-POINT TEAM
  1. Ganesh Gopalakrishnan (prof)
  2. Zvonimir Rakamarić (prof)
  3. Ian Briggs (staff programmer)
  4. Mark Baranowski (PhD)
  5. Rocco Salvia (PhD)
  6. Shaobo He (PhD)
  7. Thanhson Nguyen (PhD)
  Alumni: Alexey Solovyev (postdoc), Wei-Fan Chiang (PhD), Dietrich Geisler (undergrad), Liam Machado (undergrad)

  11. RESEARCH THRUSTS
  Analysis
   Verification of floating-point programs
   Estimation of floating-point errors
    1. Dynamic: best effort, produces lower bound (under-approximation)
    2. Static: rigorous, produces upper bound (over-approximation)
  Synthesis
   Rigorous mixed-precision tuning
  Constraint Solving
   Search-based solving of floating-point constraints
   Solving mixed real and floating-point constraints

  12. RESEARCH THRUSTS
  Analysis
   Verification of floating-point programs
   Estimation of floating-point errors
    1. Dynamic: best effort, produces lower bound (under-approximation)
    2. Static: rigorous, produces upper bound (over-approximation)
  Synthesis
   Rigorous mixed-precision tuning
  Constraint Solving
   Search-based solving of floating-point constraints
   Solving mixed real and floating-point constraints

  13. ERROR ANALYSIS

  14. FLOATING-POINT ERROR
  Input values: x, y
  Finite precision: z_fp = f_fp(x, y)
  Infinite precision: z_inf = f_inf(x, y) ≠ z_fp
  Absolute error: |z_fp − z_inf|
  Relative error: |(z_fp − z_inf) / z_inf|
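These definitions are direct to compute in Python; a sketch that uses exact rational arithmetic as the infinite-precision reference:

    from fractions import Fraction

    x, y = 0.1, 0.2
    z_fp = x + y                       # rounded double addition
    z_inf = Fraction(x) + Fraction(y)  # exact sum of the two doubles
    abs_err = abs(Fraction(z_fp) - z_inf)
    rel_err = abs_err / abs(z_inf)
    print(float(abs_err), float(rel_err))  # tiny but non-zero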

  15. ERROR PLOT FOR MULTIPLICATION
  [Figure: surface plot of absolute error over X and Y input values.]

  16. ERROR PLOT FOR ADDITION
  [Figure: surface plot of absolute error over X and Y input values.]

  17. USAGE SCENARIOS
   Reason about floating-point computations
   Precisely characterize floating-point behavior of libraries
   Support performance-precision tuning and synthesis
   Help decide where error compensation is needed
   “Equivalence” checking

  18. STATIC ANALYSIS http://github.com/soarlab/FPTaylor

  19. CONTRIBUTIONS
   Handles non-linear and transcendental functions
   Tight error upper bounds
     Better than previous work
   Rigorous
     Over-approximation
     Based on our own rigorous global optimizer
   Emits a HOL Light proof certificate
     Verification of the certificate guarantees the estimate
   Tool called FPTaylor publicly available

  20. FPTaylor TOOLFLOW
  Given FP expression and input intervals -> obtain symbolic Taylor form -> obtain the error function -> maximize the error function -> generate certificate in HOL Light

  21. IEEE ROUNDING MODEL
  Consider op(x, y), where x and y are floating-point values and op is a function from floats to reals
  IEEE round-off errors are specified as: op(x, y) ⋅ (1 + e_op) + d_op (see sketch below)
   e_op accounts for normal values, d_op for subnormal values
   Only one of e_op or d_op is non-zero:
    |e_op| ≤ 2^-24, |d_op| ≤ 2^-150 (single precision)
    |e_op| ≤ 2^-53, |d_op| ≤ 2^-1075 (double precision)
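A small Python check of the normal-value half of this model (a sketch: it tests fl(x ⋅ y) = (x ⋅ y)(1 + e) with |e| ≤ 2^-53 on a few random doubles, leaving the subnormal d term aside):

    import random
    from fractions import Fraction

    random.seed(0)
    for _ in range(5):
        x, y = random.uniform(1, 2), random.uniform(1, 2)
        exact = Fraction(x) * Fraction(y)  # real product of two doubles
        rounded = Fraction(x * y)          # what the hardware returns
        e = abs(rounded - exact) / exact
        assert e <= Fraction(1, 2**53)     # |e_op| <= 2^-53 for doubles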

  22. ERROR ESTIMATION EXAMPLE
   Model floating-point computation of F = x / (x + y) using reals as:
    F̃ = (x / ((x + y) ⋅ (1 + e1))) ⋅ (1 + e2), with |e1| ≤ ε1, |e2| ≤ ε2
   Absolute rounding error is then |F̃ − F|
   We have to find the max of this function over:
     Input variables x, y (exponential in the number of inputs)
     Additional variables e1, e2 for the operators (exponential in floating-point routine size!)

  23. SYMBOLIC TAYLOR EXPANSION
   Reduces dimensionality of the optimization problem
   Basic idea:
     Treat each e as a “noise” (error) variable
     Now expand based on Taylor’s theorem
     Coefficients are symbolic
     Coefficients weigh the “noise” correctly and are correlated
   Apply global optimization on the reduced problem
     Our own parallel rigorous global optimizer called Gelpia
     Non-linear reals, transcendental functions

  24. ERROR ESTIMATION EXAMPLE
  F̃ = (x / ((x + y) ⋅ (1 + e1))) ⋅ (1 + e2)
  expands into
  F̃ = F + (∂F̃/∂e1)(0) × e1 + (∂F̃/∂e2)(0) × e2 + M2(e1, e2)
  where M2 summarizes the second and higher order error terms, and |e1| ≤ ε1, |e2| ≤ ε2
  Floating-point error is then bounded by
  |F̃ − F| ≤ |(∂F̃/∂e1)(0)| × ε1 + |(∂F̃/∂e2)(0)| × ε2 + M2(e1, e2)
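Those symbolic coefficients can be reproduced with sympy (an illustration only; FPTaylor computes its Taylor forms internally rather than via sympy):

    import sympy as sp

    x, y, e1, e2 = sp.symbols("x y e1 e2")
    Ft = (x / ((x + y) * (1 + e1))) * (1 + e2)  # model of the rounded F

    # First-order Taylor coefficients at e1 = e2 = 0
    print(sp.simplify(Ft.diff(e1).subs({e1: 0, e2: 0})))  # -x/(x + y)
    print(sp.simplify(Ft.diff(e2).subs({e1: 0, e2: 0})))  # x/(x + y)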

  25. ERROR ESTIMATION EXAMPLE
   Using global optimization, find constant bounds
     M2 can be easily over-approximated
   Greatly reduced problem dimensionality
     Search only over inputs x, y using our Gelpia optimizer
  ∀x, y. |(∂F̃/∂e1)(0)| = |x / (x + y)| ≤ U1
  |F̃ − F| ≤ U1 × ε1 + U2 × ε2 + M2(e1, e2)
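For intuition, a toy version of that optimization step (assumptions: inputs restricted to the box x, y ∈ [1, 2], and hand-checked monotonicity standing in for a rigorous optimizer like Gelpia):

    # Bound |x/(x+y)| over x, y in [1, 2]: the function increases in x and
    # decreases in y, so its maximum on the box is at x = 2, y = 1.
    U1 = U2 = 2.0 / (2.0 + 1.0)  # both coefficients reduce to |x/(x+y)|
    eps = 2.0**-53               # double-precision machine epsilon
    print(U1 * eps + U2 * eps)   # first-order bound, M2 term omitted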

  26. ERROR ESTIMATION EXAMPLE
   Operations are single-precision (32 bits):
    |F̃ − F| ≤ U1 × ε_32bit + U2 × ε_32bit
   Operations are double-precision (64 bits):
    |F̃ − F| ≤ U1 × ε_64bit + U2 × ε_64bit

  27. RESULTS FOR JETENGINE

  28. SUMMARY
   New method for rigorous floating-point round-off error estimation
   Our method is embodied in the new tool FPTaylor
   FPTaylor performs well and returns tighter bounds than previous approaches

  29. SYNTHESIS http://github.com/soarlab/FPTuner

  30. MIXED-PRECISION TUNING
  Goal: given a real-valued expression and an output error bound, automatically synthesize a precision allocation for operations and variables

  31. APPROACH
   Replace machine epsilons with symbolic variables: s1, s2 ∈ {ε_32bit, ε_64bit}
    |F̃ − F| ≤ U1 × s1 + U2 × s2
   Compute a precision allocation that satisfies the given error bound (see sketch below)
   Take care of type casts
   Implemented in the FPTuner tool
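A brute-force toy version of this search (assumptions: the U1 = U2 = 2/3 bounds from the running example, a hypothetical error threshold of 1e-7, and exhaustive enumeration where FPTuner hands the problem to an optimizer):

    from itertools import product

    U = [2 / 3, 2 / 3]  # coefficient bounds U1, U2
    eps = {"single": 2.0**-24, "double": 2.0**-53}
    threshold = 1e-7    # hypothetical user-supplied error bound

    best = None
    for alloc in product(eps, repeat=len(U)):
        err = sum(u * eps[p] for u, p in zip(U, alloc))
        cost = sum(p == "double" for p in alloc)  # doubles are "expensive"
        if err <= threshold and (best is None or cost < best[0]):
            best = (cost, alloc)
    print(best)  # (0, ('single', 'single')): 2 * (2/3) * 2^-24 < 1e-7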

  32. FPTuner TOOLFLOW
  Inputs: routine (real-valued expression); user specifications: error threshold, operator weights, extra constraints
  Pipeline: generic error model (via the Gelpia global optimizer) + efficiency model -> Gurobi optimization problem -> optimal mixed-precision allocation

  33. EXAMPLE: JACOBI METHOD
   Inputs:
     2x2 matrix
     Vector of size 2
   Error bound: 1e-14
   Available precisions: single, double, quad
   FPTuner automatically allocates precisions for all variables and operations
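For reference, a plain double-precision sketch of the kind of routine being tuned (the matrix and vector values are made up; the slides do not give them):

    # Jacobi iteration for a 2x2 system A u = b
    A = [[4.0, 1.0], [2.0, 5.0]]  # hypothetical diagonally dominant matrix
    b = [1.0, 2.0]
    u = [0.0, 0.0]
    for _ in range(50):
        u = [(b[0] - A[0][1] * u[1]) / A[0][0],   # both entries use the
             (b[1] - A[1][0] * u[0]) / A[1][1]]   # previous iterate u
    print(u)  # approaches the exact solution [1/6, 1/3]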

  34. SUMMARY
   Support mixed-precision allocation
   Based on rigorous formal reasoning
   Encoded as an optimization problem
   Extensive empirical evaluation
     Includes real-world energy measurements showing benefits of precision tuning

  35. SOLVING http://github.com/soarlab/OL1V3R
