

1. Tight and rigorous error bounds for basic building blocks of double-word arithmetic
Mioara Joldes, Jean-Michel Muller, Valentina Popescu
JNCF, January 2017
CAMPARY: CudA Multiple Precision ARithmetic librarY


2-4. Floating-point arithmetic

A real number X is approximated in a machine by a rational x = M_x · 2^(e_x - p + 1), where
- M_x is the significand, a signed integer of p digits in radix 2 such that 2^(p-1) <= |M_x| <= 2^p - 1;
- e_x is the exponent, a signed integer (e_min <= e_x <= e_max).

IEEE 754-2008: most common formats
- single (binary32) precision: 1 sign bit, 8 exponent bits, 23 significand bits (s | e | m);
- double (binary64) precision: 1 sign bit, 11 exponent bits, 52 significand bits (s | e | m).

Rounding modes
- 4 rounding modes: RD, RU, RZ, RN;
- correct rounding for +, -, x, /, sqrt;
- portability, determinism.
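The representation x = M_x · 2^(e_x - p + 1) can be inspected directly in any language exposing binary64 floats; a small Python sketch (with p = 53 for double precision):

```python
import math

def decompose(x):
    """Return (M_x, e_x) such that x == M_x * 2**(e_x - 52), p = 53.

    math.frexp returns x = m * 2**e with 0.5 <= |m| < 1, so the integer
    significand is M = m * 2**53 and e_x = e - 1 in the slide's notation.
    """
    m, e = math.frexp(x)
    M = int(m * 2.0**53)          # p-digit signed integer significand
    return M, e - 1

M, ex = decompose(0.1)
# Normal numbers satisfy 2**(p-1) <= |M| <= 2**p - 1 ...
assert 2**52 <= abs(M) <= 2**53 - 1
# ... and the reconstruction is exact (powers of two scale without error).
assert M * 2.0**(ex - 52) == 0.1
```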

5. Applications

-> Computations with increased (multiple) precision in numerical applications.

Chaotic dynamical systems:
- bifurcation analysis;
- computing periodic orbits (e.g., Henon map, Lorenz attractor);
- celestial mechanics (e.g., stability of the solar system).
[Figure: bifurcation plot, x2 vs. x1]

Experimental mathematics:
- computational geometry (e.g., kissing numbers);
- polynomial optimization, etc.


6-7. What is a double-word number?

Definition: a double-word number x is the unevaluated sum x_h + x_l of two floating-point numbers x_h and x_l such that x_h = RN(x).

-> Called double-double when using the binary64 standard format.

Example: pi in double-double
p_h = 11.001001000011111101101010100010001000010110100011000 (binary), and
p_l = 1.0001101001100010011000110011000101000101110000000111 (binary) x 2^-53;
p_h + p_l <-> 107-bit FP approximation of pi.
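A double-double expansion like this one can be generated from any sufficiently accurate rational value by two correct roundings. A sketch; the 128-bit hexadecimal expansion of pi hard-coded below is the standard one, but treat the constant as an assumption of this example:

```python
import math
from fractions import Fraction

# pi to 128 bits, from its well-known hexadecimal expansion
# 3.243F6A8885A308D313198A2E03707344...
PI = 3 + Fraction(int("243F6A8885A308D313198A2E03707344", 16), 16**32)

# float() of a Fraction is correctly rounded, so this is RN at each step.
p_h = float(PI)                   # RN(pi)
p_l = float(PI - Fraction(p_h))   # RN(pi - p_h): the low-order component

assert p_h == math.pi             # p_h is the binary64 nearest to pi
# p_h + p_l carries ~107 bits: the residual is far below ulp(p_l)
assert abs(PI - Fraction(p_h) - Fraction(p_l)) < Fraction(1, 2**105)
```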


8-12. Remark

-> Not the same as the IEEE 754-2008 standard's binary128/quadruple precision.

Double-word (using binary64/double precision):
- "wobbling precision": >= 107 bits of precision;
- exponent range limited by binary64 (11 bits), i.e., -1022 to 1023;
- lack of clearly defined rounding modes;
- manipulated using error-free transforms (next slide).

Binary128/quadruple precision:
- 113 bits of precision;
- larger exponent range (15 bits): -16382 to 16383;
- defined with all rounding modes;
- not implemented in hardware on widely available processors.

13. Error-free transforms

Theorem 1 (2Sum algorithm). Let a and b be FP numbers. Algorithm 2Sum computes two FP numbers s and e that satisfy:
- s + e = a + b exactly;
- s = RN(a + b).
(RN stands for performing the operation in round-to-nearest mode.)

Algorithm 1 (2Sum(a, b))
  s <- RN(a + b)
  t <- RN(s - b)
  e <- RN(RN(a - t) + RN(b - RN(s - t)))
  return (s, e)

-> 6 FP operations (proved to be optimal, unless we have information on the ordering of |a| and |b|).
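Python floats are IEEE binary64 with round-to-nearest, so 2Sum transcribes directly; each arithmetic expression below is one correctly rounded FP operation:

```python
from fractions import Fraction

def two_sum(a, b):
    """Knuth's 2Sum: s = RN(a + b), and s + e == a + b exactly."""
    s = a + b
    t = s - b
    e = (a - t) + (b - (s - t))
    return s, e

s, e = two_sum(1.0, 2.0**-60)
assert s == 1.0                  # the rounded sum absorbs the small term...
assert e == 2.0**-60             # ...which is recovered exactly in e
# error-free: s + e equals a + b as exact rationals
assert Fraction(s) + Fraction(e) == Fraction(1.0) + Fraction(2.0**-60)
```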

14. Error-free transforms

Theorem 2 (Fast2Sum algorithm). Let a and b be FP numbers that satisfy e_a >= e_b (e.g., |a| >= |b|). Algorithm Fast2Sum computes two FP numbers s and e that satisfy:
- s + e = a + b exactly;
- s = RN(a + b).

Algorithm 2 (Fast2Sum(a, b))
  s <- RN(a + b)
  z <- RN(s - a)
  e <- RN(b - z)
  return (s, e)

-> 3 FP operations.
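Fast2Sum is equally direct in Python, as long as the caller guarantees the magnitude ordering; the operands below are sample values chosen so that |a| >= |b| and the rounded sum loses information:

```python
from fractions import Fraction

def fast_two_sum(a, b):
    """Dekker's Fast2Sum; valid only when e_a >= e_b (e.g. |a| >= |b|)."""
    s = a + b
    z = s - a
    e = b - z
    return s, e

a, b = 1.0, 2.0**-53 + 2.0**-58      # |a| >= |b|, and a + b needs > 53 bits
s, e = fast_two_sum(a, b)
# error-free despite the rounding in s
assert Fraction(s) + Fraction(e) == Fraction(a) + Fraction(b)
```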

15. Error-free transforms

Theorem 3 (2ProdFMA algorithm). Let a and b be FP numbers with e_a + e_b >= e_min + p - 1. Algorithm 2ProdFMA computes two FP numbers p and e that satisfy:
- p + e = a · b exactly;
- p = RN(a · b).

Algorithm 3 (2ProdFMA(a, b))
  p <- RN(a · b)
  e <- fma(a, b, -p)
  return (p, e)

-> 2 FP operations;
-> hardware-implemented FMA available in recent processors.
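2ProdFMA relies on a hardware FMA; in Python, math.fma only exists from version 3.13, so as a portable stand-in this sketch uses Dekker's FMA-free product (based on Veltkamp splitting, from the same [DEK71] cited later), which produces the same error-free pair (p, e) barring overflow/underflow:

```python
from fractions import Fraction

SPLITTER = 2.0**27 + 1.0   # Veltkamp splitting constant for p = 53

def split(a):
    """Split a into hi + lo, each fitting in at most 27 significand bits."""
    c = SPLITTER * a
    hi = c - (c - a)
    return hi, a - hi

def two_prod(a, b):
    """Dekker's 2Prod: p = RN(a * b) and p + e == a * b exactly
    (in the absence of overflow/underflow); FMA-free 2ProdFMA equivalent."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e

a, b = 1.0 + 2.0**-30, 1.0 + 2.0**-29   # exact product needs ~60 bits
p, e = two_prod(a, b)
assert Fraction(p) + Fraction(e) == Fraction(a) * Fraction(b)
```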


19. Previous work

- concept introduced by Dekker [DEK71], together with some algorithms for basic operations;
- Linnainmaa [LIN81] suggested similar algorithms, assuming an underlying wider format;
- library written by Briggs [BRI98], no longer maintained;
- QD library written by Bailey et al. [Li.et.al02].

Problems:
1. most algorithms come without a correctness proof and error bound;
2. some error bounds were published without a proof;
3. there are differences between the published and the implemented algorithms.

-> Strong need to "clean up" the existing literature.

Notation:
- p is the precision of the underlying FP format;
- ulp(x) = 2^(floor(log2 |x|) - p + 1), for x != 0;
- u = 2^-p = (1/2) ulp(1) denotes the roundoff error unit.
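The notation can be checked numerically. Since Python 3.9, math.ulp agrees with the slide's formula for normal positive numbers; the log2-based version below is only a sketch (floating-point log2 can misround very close to powers of two):

```python
import math

P = 53                        # binary64 precision
U = 2.0**-P                   # roundoff error unit u

def ulp(x):
    """ulp(x) = 2**(floor(log2 |x|) - p + 1), for x != 0 (normal range)."""
    return 2.0 ** (math.floor(math.log2(abs(x))) - P + 1)

assert ulp(1.0) == math.ulp(1.0) == 2.0**-52
assert U == 0.5 * ulp(1.0)    # u = 2**-p = (1/2) ulp(1)
```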


20-21. Addition: DWPlusFP(x_h, x_l, y)

Algorithm 4
  1: (s_h, s_l) <- 2Sum(x_h, y)
  2: v <- RN(x_l + s_l)
  3: (z_h, z_l) <- Fast2Sum(s_h, v)
  4: return (z_h, z_l)

- implemented in the QD library;
- no previous error bound published;
- relative error bounded by 2·2^-2p / (1 - 2·2^-p) = 2·2^-2p + 4·2^-3p + 8·2^-4p + ..., which is less than 2·2^-2p + 5·2^-3p as soon as p >= 4.

Asymptotically optimal bound: let x_h = 1, x_l = (2^p - 1)·2^-2p, and y = -(1/2)(1 - 2^-p). Then:
- z_h + z_l = 1/2 + 3·2^(-p-1);
- x + y = 1/2 + 3·2^(-p-1) - 2^-2p;
- relative error: 2·2^-2p / (1 + 3·2^-p - 2·2^-2p) ~ 2·2^-2p - 6·2^-3p.
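Combining 2Sum and Fast2Sum (redefined here so the sketch stays self-contained), Algorithm 4 transcribes directly, and the worst case from the slide, instantiated at p = 53, attains an absolute error of exactly 2^-2p:

```python
from fractions import Fraction

def two_sum(a, b):              # error-free: s + e == a + b
    s = a + b
    t = s - b
    e = (a - t) + (b - (s - t))
    return s, e

def fast_two_sum(a, b):         # requires |a| >= |b|
    s = a + b
    z = s - a
    return s, b - z

def dw_plus_fp(xh, xl, y):
    """DWPlusFP (Algorithm 4): add the FP number y to the double-word (xh, xl)."""
    sh, sl = two_sum(xh, y)
    v = xl + sl
    return fast_two_sum(sh, v)

# Worst case from the slide, with p = 53:
xh = 1.0
xl = (2**53 - 1) * 2.0**-106
y = -0.5 * (1.0 - 2.0**-53)
zh, zl = dw_plus_fp(xh, xl, y)

exact = Fraction(xh) + Fraction(xl) + Fraction(y)
err = abs(Fraction(zh) + Fraction(zl) - exact)
assert err == Fraction(1, 2**106)          # absolute error 2**-2p, as claimed
# and the relative error stays below the proved bound 2*2**-2p / (1 - 2*2**-p)
assert err / exact <= Fraction(2, 2**106) / (1 - Fraction(2, 2**53))
```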

22. Sketch of the proof

Lemma 4 (Sterbenz lemma). Let a and b be two positive FP numbers. If a/2 <= b <= 2a, then a - b is a floating-point number, so that RN(a - b) = a - b.

Lemma 5. Let a and b be FP numbers, and let s = RN(a + b). If s != 0, then |s| >= max( (1/2) ulp(a), (1/2) ulp(b) ).
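Sterbenz's lemma can be checked with exact rational arithmetic; a small sketch, where the decimal literals are just sample FP values satisfying a/2 <= b <= 2a:

```python
from fractions import Fraction

def subtraction_is_exact(a, b):
    """True iff the computed a - b equals the mathematical difference
    of the two FP values a and b (i.e., RN(a - b) == a - b)."""
    return Fraction(a) - Fraction(b) == Fraction(a - b)

# a/2 <= b <= 2a holds for these values, so Sterbenz's lemma
# guarantees that the subtraction is exact.
a, b = 1.9, 1.1
assert a / 2 <= b <= 2 * a
assert subtraction_is_exact(a, b)
```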
