Introduction to Computer Arithmetic for Efficient Hardware

Fixed-Point Representations

Widely used in DSPs and digital integrated circuits for higher speed, lower silicon area and lower power consumption compared to floating point.

[Figure: bit-weight layouts, MSB on the left and LSB on the right; s marks the sign bit]
• N16 or Z16: 16 bits, weights $2^{15}$ down to $2^{0}$
• 1Q15: 16 bits, weights $2^{0}$ down to $2^{-15}$
• Q16: 16 bits, weights $2^{-1}$ down to $2^{-16}$
• 8Q16: 24 bits, weights $2^{7}$ down to $2^{-16}$

Typical fixed-point formats: 16, 24, 32 and 48 bits.

Arnaud Tisserand. CNRS – Lab-STICC
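As an illustration of the 1Q15 format, here is a minimal Python sketch of quantization to a 16-bit code and back. It assumes a two's-complement encoding (sign bit of weight $-2^0$), round-to-nearest and saturation at the range limits; none of these choices are stated on the slide, and the function names are hypothetical.

```python
from fractions import Fraction

FRAC_BITS = 15   # 1Q15: 15 fractional bits (assumed two's-complement encoding)
WORD_BITS = 16


def to_1q15(value: float) -> int:
    """Quantize a real value to a signed 16-bit 1Q15 code.

    Rounds to nearest (Python's round() breaks ties to even) and
    saturates to the representable range [-1, 1 - 2**-15].
    """
    scaled = round(value * (1 << FRAC_BITS))
    lowest = -(1 << (WORD_BITS - 1))
    highest = (1 << (WORD_BITS - 1)) - 1
    return max(lowest, min(highest, scaled))


def from_1q15(code: int) -> Fraction:
    """Return the exact rational value encoded by a 1Q15 code."""
    return Fraction(code, 1 << FRAC_BITS)


if __name__ == "__main__":
    for v in (0.5, -0.25, 1 / 3, 1.0):
        code = to_1q15(v)
        print(v, code, float(from_1q15(code)))
```

The value 1.0 is not representable in this format, so the sketch saturates it to the largest code; a hardware design could equally well wrap around, which is a design choice.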

Representation(s) of Numbers and Power Consumption

Impact of the representation of numbers:
• operator speed
• circuit area
• useful and useless activity

Bit transitions per cycle on a 16-bit word ($t_{c2}$: 2's complement, $t_{sm}$: sign/magnitude):

cycle  value  2's complement     t_c2  sign/magnitude     t_sm
0        0    0000000000000000     0   0000000000000000     0
1        1    0000000000000001     1   0000000000000001     1
2       -1    1111111111111111    15   1000000000000001     1
3        8    0000000000001000    15   0000000000001000     3
4      -27    1111111111100101    14   1000000000011011     4
5       27    0000000000011011    15   0000000000011011     1
total                             60                       10

• sign/magnitude (absolute value): $A = (s_a\, a_{n-2} \ldots a_1 a_0) = (-1)^{s_a} \times \sum_{i=0}^{n-2} a_i 2^i$
• 2's complement: $A = (a_{n-1} a_{n-2} \ldots a_1 a_0) = -a_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i$
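The transition counts in the table can be recomputed as Hamming distances between consecutive words. The sketch below assumes 16-bit words, as in the table; the helper names are hypothetical.

```python
def twos_complement(value: int, bits: int = 16) -> int:
    """Encode a signed integer as an unsigned two's-complement word."""
    return value & ((1 << bits) - 1)


def sign_magnitude(value: int, bits: int = 16) -> int:
    """Encode a signed integer as sign bit (MSB) plus absolute value."""
    sign = (1 << (bits - 1)) if value < 0 else 0
    return sign | abs(value)


def transitions(words):
    """Number of bit flips between each pair of consecutive words."""
    return [bin(a ^ b).count("1") for a, b in zip(words, words[1:])]


values = [0, 1, -1, 8, -27, 27]   # value sequence from the table
t_c2 = transitions([twos_complement(v) for v in values])
t_sm = transitions([sign_magnitude(v) for v in values])

if __name__ == "__main__":
    print("2's complement :", t_c2, "total", sum(t_c2))
    print("sign/magnitude :", t_sm, "total", sum(t_sm))
```

For values that oscillate around zero, sign/magnitude flips far fewer bits, which is the point the table makes about useless activity.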

Floating-Point Representation(s)

Radix-$\beta$ floating-point representation of $x$:
• sign $s_x$, 1-bit encoding: $0 \Rightarrow x > 0$ and $1 \Rightarrow x < 0$
• exponent $e_x \in \mathbb{N}$ on $k$ digits and $e_{min} \le e_x \le e_{max}$
• mantissa $m_x$ on $n + 1$ digits
• encoding: $x = (-1)^{s_x} \times m_x \times \beta^{e_x}$ with $m_x = x_0 . x_1 x_2 x_3 \cdots x_n$ and $x_i \in \{0, 1, \ldots, \beta - 1\}$

For accuracy purposes, the mantissa must be normalized ($x_0 \neq 0$). Then $m_x \in [1, \beta)$ and a specific encoding is required for the number 0.

IEEE-754: basic formats

Radix $\beta = 2$; the first bit of the normalized mantissa is always a "1" (non-stored implicit bit).

Number of bits:
format            total  sign  exponent  mantissa
double precision    64     1      11      52 + 1
single precision    32     1       8      23 + 1

[Figure: bit layouts of the double precision and single precision formats, ranks 63 (MSB) down to 0 (LSB)]
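To make the single precision field widths concrete, the following Python sketch splits a value into its 1-bit sign, 8-bit exponent and 23-bit stored mantissa fields using the standard `struct` module. The exponent bias of 127 comes from the IEEE-754 standard rather than from the slide, and the function names are hypothetical.

```python
import struct


def decode_binary32(x: float):
    """Return (sign, biased exponent, stored fraction) of x as binary32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    biased_exp = (bits >> 23) & 0xFF   # 8 bits
    fraction = bits & 0x7FFFFF         # 23 stored bits (implicit leading 1)
    return sign, biased_exp, fraction


def value_of_normal(sign: int, biased_exp: int, fraction: int) -> float:
    """Rebuild a normalized value: (-1)^s * (1 + f / 2^23) * 2^(e - 127)."""
    mantissa = 1 + fraction / 2**23
    return (-1) ** sign * mantissa * 2.0 ** (biased_exp - 127)


if __name__ == "__main__":
    fields = decode_binary32(-2.5)
    print(fields, value_of_normal(*fields))
```

Only normalized values are handled here; zero, subnormals, infinities and NaN use reserved exponent codes and would need separate cases.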

Basic Cells for Addition

Useful circuit element in computer arithmetic: the counter.

An $(m, k)$-counter is a cell that counts the number of 1s on its $m$ inputs (result expressed as a $k$-bit integer):

$\sum_{i=0}^{m-1} a_i = \sum_{j=0}^{k-1} s_j 2^j$

[Figure: $(m, k)$ cell with inputs $a_{m-1} \ldots a_0$ and outputs $s_{k-1} \ldots s_0$]

Standard counters:
• half-adder or HA is a (2,2)-counter
• full-adder or FA is a (3,2)-counter

FA Cell

Arithmetic equation: $2c + s = a + b + d$

Logic equations:
$s = a \oplus b \oplus d$
$c = ab + ad + bd$

Truth table:
a b d | c s
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 0 1
1 0 1 | 1 0
1 1 0 | 1 0
1 1 1 | 1 1

There are many implementations of the FA cell.

[Figure: number of articles about FA in IEEE journals per year, 1990 to 2004]
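The two FA descriptions (arithmetic and logic) can be checked against each other exhaustively. A minimal Python sketch, with a hypothetical function name:

```python
from itertools import product


def full_adder(a: int, b: int, d: int):
    """(3,2)-counter: return (carry, sum) from the logic equations."""
    s = a ^ b ^ d                      # s = a xor b xor d
    c = (a & b) | (a & d) | (b & d)    # c = ab + ad + bd (majority)
    return c, s


# Check the arithmetic equation 2c + s = a + b + d on all 8 input triples.
for a, b, d in product((0, 1), repeat=3):
    c, s = full_adder(a, b, d)
    assert 2 * c + s == a + b + d
```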

Carry Ripple Adder (CRA)

Very simple architecture: $n$ FA cells connected in series.

[Figure: 6-bit CRA, FA cells with inputs $a_i$, $b_i$, carries $r_i$ ($i = 5 \ldots 0$) and outputs $s_6 \ldots s_0$]

Complexity:
• delay $O(n)$
• area $O(n)$

Warning: sometimes a CRA is also called a Carry Propagate Adder (CPA), but CPA also means a non-redundant adder (that propagates).
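A bit-level Python sketch of the carry ripple architecture follows. Bit lists are assumed to be LSB first, and the helper names are hypothetical. The sequential loop over the carry mirrors the $O(n)$ delay of the chain.

```python
def full_adder(a: int, b: int, d: int):
    """(3,2)-counter: return (carry, sum)."""
    return (a & b) | (a & d) | (b & d), a ^ b ^ d


def to_bits(x: int, n: int):
    """n-bit list of x, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Integer value of an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, carry_in: int = 0):
    """Add two n-bit operands with n FA cells in series; returns n+1 bits."""
    carry = carry_in
    s_bits = []
    for a, b in zip(a_bits, b_bits):   # each cell waits for the previous carry
        carry, s = full_adder(a, b, carry)
        s_bits.append(s)
    return s_bits + [carry]


if __name__ == "__main__":
    print(from_bits(ripple_carry_add(to_bits(45, 6), to_bits(27, 6))))
```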

Useless Activity in a Carry Ripple Adder

Very simple architecture: $n$ FA cells connected in series.

[Figure: 6-bit CRA and timing diagram of the operands and carries between clock cycle $i$ and cycle $i+1$, showing the spurious transitions (activity) in each FA cell before the outputs become stable]

Theoretical models (equiprobable and uniform distribution of inputs):
• worst case: $n^2/2$ transitions
• average: $3n/2$ transitions and only $n/2$ useful

Carry-Select Adder

Idea: compute the higher half for the 2 possible input carries (0 and 1) and select the right result once the output carry from the lower half is known.

[Figure: lower part adding $a_L$ and $b_L$ to produce $s_L$; two copies of the higher part adding $a_H$ and $b_H$ with input carries 0 and 1; multiplexers selecting $s_H$ and $s_n$]

Recursive version → $O(\log n)$ delay, but there is a fanout problem...

Carry Lookahead Adder: 4-Bit Example

$c_1 = g_0 + p_0 c_0$
$c_2 = g_1 + p_1 g_0 + p_1 p_0 c_0$
$c_3 = g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 c_0$
$c_4 = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 c_0$

[Figure: lookahead block with inputs $p_3 g_3 \ldots p_0 g_0$ and $c_0$, and outputs $c_4, c_3, c_2, c_1$]
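The four carry equations can be evaluated directly in Python. The slide does not define $g_i$ and $p_i$; this sketch assumes the usual definitions $g_i = a_i b_i$ (generate) and $p_i = a_i \oplus b_i$ (propagate), and the function name is hypothetical.

```python
def cla4_carries(a: int, b: int, c0: int = 0):
    """Return (c1, c2, c3, c4) for 4-bit operands a and b.

    Assumed definitions: g[i] = a_i AND b_i, p[i] = a_i XOR b_i.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return c1, c2, c3, c4


if __name__ == "__main__":
    print(cla4_carries(0b1011, 0b0110))
```

Every carry depends only on the $g_i$, $p_i$ and $c_0$, so all four can be computed in parallel instead of rippling.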

Parallel-Prefix Addition: Standard Architectures

[Figure: 16-bit prefix graphs (bit positions 15 down to 0, one row per logic level) for five architectures: carry ripple, Brent-Kung, Kogge-Stone, Sklansky and Han-Carlson]

Redundant or Constant Time Adders

To speed up addition, one solution consists in "saving" the carries and using them later (this makes sense only in the case of multiple additions).

In 1961, Avizienis suggested representing numbers in radix $\beta$ with digits in $\{-\alpha, -\alpha + 1, \ldots, 0, \ldots, \alpha - 1, \alpha\}$ instead of $\{0, 1, 2, \ldots, \beta - 1\}$, with $\alpha \le \beta - 1$.

Using this representation, if $2\alpha + 1 > \beta$, some numbers have several possible representations at the digit level. For instance, the value 2345 (in the standard representation) can be represented in radix 10 with digits in $\{-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5\}$ by the digit strings 2345, 235(-5) or 24(-5)(-5).

Such a representation is said to be redundant.

In a redundant number system there is a constant-time addition algorithm (without carry propagation) where all computations are done in parallel.
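The redundancy in the radix-10 example can be checked by evaluating each signed-digit string. A minimal Python sketch, with a hypothetical helper name and digits listed most significant first:

```python
def signed_digit_value(digits, radix: int = 10) -> int:
    """Value of a signed-digit string, most significant digit first."""
    value = 0
    for digit in digits:
        value = value * radix + digit
    return value


# The three representations of 2345 quoted in the text.
representations = [
    [2, 3, 4, 5],
    [2, 3, 5, -5],
    [2, 4, -5, -5],
]

if __name__ == "__main__":
    for digits in representations:
        print(digits, signed_digit_value(digits))
```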

Addition using the carry-save representation

Q: How can we speed up addition?
R: Save the carries!

[Figure: one row of 5 FA cells with inputs $x_i$, $y_i$, $z_i$ ($i = 4 \ldots 0$), producing sum bits $s_i$ and carry bits $r_i$ with $r_0 = 0$, and digits $w_5 \ldots w_0$]

$X + Y + Z = S + R = \sum_{i=0}^{n} (s_i + r_i)\, 2^i = W = \sum_{i=0}^{n} w_i 2^i$ with $w_i = s_i + r_i \in \{0, 1, 2\}$

$\begin{pmatrix} s_n & s_{n-1} & \cdots & s_1 & s_0 \\ r_n & r_{n-1} & \cdots & r_1 & r_0 \end{pmatrix}_{cs} = (w_n\, w_{n-1} \ldots w_1\, w_0)_{cs}$

The computation time does not depend on $n$: $T(n) = O(1)$.
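A word-level Python sketch of this carry-save step: every bit position applies one FA independently, so there is no carry chain. Operands are assumed to be non-negative integers and the function name is hypothetical.

```python
def carry_save_add(x: int, y: int, z: int):
    """Reduce three operands to a (sum word S, carry word R) pair.

    Each bit position is an independent FA; the carry produced at
    position i has weight 2^(i+1), hence the left shift, and r_0 = 0.
    """
    s = x ^ y ^ z
    r = ((x & y) | (x & z) | (y & z)) << 1
    return s, r


if __name__ == "__main__":
    s, r = carry_save_add(19, 27, 14)
    print(s, r, s + r)   # S + R equals X + Y + Z
```

A carry-propagate addition of S and R is still needed once at the very end to obtain a non-redundant result.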

Addition of 2 Carry-Save Numbers

[Figure: two rows of FA cells adding the carry-save digits (◦ sum bit, • carry bit) of $x_4 \ldots x_0$ and $y_4 \ldots y_0$ to produce the carry-save digits $w_5 \ldots w_0$]

$X = \sum_{i=0}^{n} x_i 2^i$ with $x_i = x_{s,i} + x_{r,i} = \circ + \bullet$

$Y = \sum_{i=0}^{n} y_i 2^i$ with $y_i = y_{s,i} + y_{r,i} = \circ + \bullet$

$X + Y = W = \sum_{i=0}^{n} w_i 2^i$ with $w_i = w_{s,i} + w_{r,i} = \circ + \bullet$

Carry-Save Trees

Example with 3 inputs: $A$, $B$ and $C$.

[Figure: one row of 6 FA cells with inputs $a_i$, $b_i$, $c_i$ ($i = 5 \ldots 0$), followed by the final addition producing $s_6 \ldots s_0$]

Carry-save reduction tree: $n(h)$ non-redundant inputs can be reduced by an $h$-level carry-save tree, where $n(h) = \lfloor 3\, n(h-1) / 2 \rfloor$ and $n(0) = 2$.

h      1  2  3  4  5   6   7   8   9   10  11
n(h)   3  4  6  9  13  19  28  42  63  94  141
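The recurrence for $n(h)$ is easy to iterate. This short Python sketch (hypothetical function name) regenerates the table row from $n(0) = 2$.

```python
def max_inputs(levels: int) -> int:
    """n(h): number of inputs reducible by an h-level carry-save tree."""
    n = 2                       # n(0) = 2
    for _ in range(levels):
        n = (3 * n) // 2        # n(h) = floor(3 * n(h-1) / 2)
    return n


if __name__ == "__main__":
    print([max_inputs(h) for h in range(1, 12)])
```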

Fast Multipliers

1. Partial products generation $a_i b_j$ (with or without recoding)
   ↪ delay in $O(1)$ (fanout of $a_i$, $b_j$ in $O(\log n)$)
2. Sum of the partial products using a carry-save reduction tree
   ↪ delay in $O(\log n)$
3. Assimilation of the carries using a fast adder
   ↪ delay in $O(\log n)$

[Figure: datapath from $A$ and $B$ ($n$ bits each) through PP generation, reduction ($P$ in carry-save form, $4n$ bits) and assimilation ($P$, $2n$ bits)]

Multiplication: delay $O(\log n)$, area $O(n^2)$.

Power Consumption in Fast Multipliers

[Figure: bar charts of relative power consumption [%] and relative delay [%] for the three steps (PP gen., reduc., assim.); bar labels: 67%, 54%, 31%, 17%, 16%, 15%]

• 30% to 70% of redundant transitions (useless)
• place and route steps based on the internal arrival time
• add a pipeline stage

MAC and FMA

MAC: multiply and accumulate, $P(t) = A \times B + P(t - 1)$.
$A$ and $B$ are $n$-bit values and $P$ is an $m$-bit value with $m \gg n$ (e.g., $16 \times 16 + 40 \to 40$ in some DSPs).

FMA: fused multiply and add, $P = A \times B + C$, where $A$, $B$, $C$ and $P$ can be stored in different registers (recent general purpose processors, e.g., Itanium).

[Figure: datapath with inputs $A$, $B$ and $C$, the generation, reduction and assimilation stages, a register (set, clk) and output $P$]

Squarer

[Figure: 6-bit squarer, partial-product array $a_i a_j$ for $a_5 \ldots a_0$ and its successive simplifications]

Simplification rules for the partial products:
• $a_i a_i = a_i$
• $a_i a_j + a_j a_i = 2 a_i a_j$
• $a_i a_j + a_i = 2 a_i a_j + a_i - a_i a_j = 2 a_i a_j + a_i (1 - a_j) = 2 a_i a_j + a_i \overline{a_j}$

Resulting cost:
• 15 AND + 5 IAND12
• 3 FA + 2 HA
• 1 ADD (9 bits)

Multiplication by Constants (1/2)

Problem: substitute a complete multiplier by an optimized sequence of shifts and additions and/or subtractions.

Example: $p = 111463 \times x$

algo.      p = 111463 × x =                                              #op.
direct     (x≪16)+(x≪15)+(x≪13)+(x≪12)+(x≪9)+(x≪8)+(x≪6)+(x≪5)
           +(x≪2)+(x≪1)+x                                                10 ±
CSD        (x≪17)−(x≪14)−(x≪12)+(x≪10)−(x≪7)−(x≪5)+(x≪3)−x               7 ±
Bernstein  (((t2≪2)+x)≪3)−x                                              5 ±
           where t1 = (((x≪3)−x)≪2)−x and t2 = (t1≪7)+t1
Our        (t2≪12)+(t2≪5)+t1                                             4 ±
           where t1 = (x≪3)−x and t2 = (t1≪2)−x

CSD: canonical signed digit, $111463 = 11011001101100111_2 = 100\bar{1}0\bar{1}0100\bar{1}0\bar{1}0100\bar{1}_2$
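The four shift-and-add sequences can be checked against a plain multiplication. The Python sketch below transcribes each row of the table; the function names are hypothetical, with `slide_scheme` standing for the row labeled "Our".

```python
def direct(x: int) -> int:
    """One shift per nonzero bit of 111463 (10 additions)."""
    return ((x << 16) + (x << 15) + (x << 13) + (x << 12) + (x << 9)
            + (x << 8) + (x << 6) + (x << 5) + (x << 2) + (x << 1) + x)


def csd(x: int) -> int:
    """Canonical signed digit recoding (7 additions/subtractions)."""
    return ((x << 17) - (x << 14) - (x << 12) + (x << 10)
            - (x << 7) - (x << 5) + (x << 3) - x)


def bernstein(x: int) -> int:
    """Bernstein sequence from the table (5 operations)."""
    t1 = (((x << 3) - x) << 2) - x      # 27 * x
    t2 = (t1 << 7) + t1                 # 3483 * x
    return (((t2 << 2) + x) << 3) - x


def slide_scheme(x: int) -> int:
    """Sequence labeled 'Our' in the table (4 operations)."""
    t1 = (x << 3) - x                   # 7 * x
    t2 = (t1 << 2) - x                  # 27 * x
    return (t2 << 12) + (t2 << 5) + t1


if __name__ == "__main__":
    for f in (direct, csd, bernstein, slide_scheme):
        print(f.__name__, f(1))
```

Sharing the intermediate terms `t1` and `t2` is what brings the count down from 10 to 4 operations.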

Multiplication by Constants (2/2)

[Figure: five FIR (1, 5, 5, 1) filter structures labeled A to E, built from delay elements D, with input x[t] and outputs y[t], z[t], z'[t]]

Power savings: 30 up to 60%
operator   init.  [1]   [2]   our
DCT 8b      300    94    73    56
DCT 12b     368   100    84    70
DCT 16b     521   129   114    89
DCT 24b     789   212     -   119

Power savings: 10%
operator            init.  [1]  [2]  our
8 × 8 Had.            56    24    -   24
(16, 11) R.-M.        61    43   31   31
(15, 7) BCH           72    48   47   44
(24, 12, 8) Golay     76     -   47   45

Power savings: up to 40%
operator  init.  [22]  our
8 bits      35    32    24
16 bits     72    70    46

Parks-McClellan filter: remez(25, [0 0.2 0.25 1], [1 1 0 0]).

Error and Accuracy

Question: how many bits are correct?

$x_t = (1.000\,000\,00)_2$   theoretical value
$x_c = (0.111\,111\,11)_2$   value in the circuit
$|x_t - x_c| = (0.000\,000\,01)_2 = 2^{-8}$

Error, $\epsilon$: distance between 2 objects (e.g. $\epsilon = \|f(x) - p(x)\|$)

Accuracy, $\mu$: (fractional) number of bits required to represent values with an error $\le \epsilon$: $\mu = -\log_2 |\epsilon|$

Notation: $\mu$ is expressed in terms of correct or significant bits ([cb], [sb])

Example: error $\epsilon = 0.0000107$ is equivalent to accuracy $\mu = 16.5$ sb

µ [sb]  12       11       10       9       8       7       6       5       4       3       2       1
ε       2^-12    2^-11    2^-10    2^-9    2^-8    2^-7    2^-6    2^-5    2^-4    2^-3    2^-2    2^-1
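The conversion from an error to an accuracy in bits is a one-line formula. A small Python sketch, with a hypothetical function name:

```python
import math


def accuracy_bits(error: float) -> float:
    """Accuracy mu = -log2(|error|), in correct or significant bits."""
    return -math.log2(abs(error))


if __name__ == "__main__":
    print(accuracy_bits(2 ** -8))        # error of the x_t versus x_c example
    print(accuracy_bits(0.0000107))      # example quoted as about 16.5 sb
```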

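Polynomial Approximations

Notation:
• $x$: argument
• $[a, b]$: domain
• $f$: function, computed by an operator mapping $x$ to $f(x)$
• $p$: polynomial, $p(x) \approx f(x)$
• $\epsilon$: approximation error, $\epsilon(x) = f(x) - p(x)$
• $\epsilon_{target}$: maximum allowed error, $\epsilon(x) \le \epsilon_{target}$

[Figure: plot of $f(x)$ and $p(x)$ over $[a, b]$ with values in $[a', b']$, and plot of the error $\epsilon(x)$ bounded by $\epsilon_{target}$]

Question: what is the best $p$?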

Accuracy, Degree and Evaluation Cost

Degree-$d$ minimax approximation polynomials to $\sin(x)$ with $x \in [a, b]$:

[Figure: accuracy $\mu$ [sb] (axis from 4 to 24) versus degree $d$ (1 to 5) for several domains $[a, b]$]

• higher accuracy ⇒ higher degree
• higher degree ⇒ more costly evaluation

Polynomial Evaluation Schemes

scheme   computations                                  # ±   # ×
direct   $p_0 + p_1 x + p_2 x^2 + p_3 x^3$              3     5
Horner   $p_0 + (p_1 + (p_2 + p_3 x)\, x)\, x$          3     3
Estrin   $p_0 + p_1 x + (p_2 + p_3 x)\, x^2$            3     4

Trade-off:
• direct scheme → high operation cost and lower accuracy
• Horner scheme → smallest cost but sequential
• Estrin scheme → some internal parallelism

Question: what is the best evaluation scheme?
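The three schemes for a degree-3 polynomial can be written out so that the multiplication counts in the table are visible. This is a Python sketch with hypothetical function names; coefficients are passed as a tuple `(p0, p1, p2, p3)`.

```python
from fractions import Fraction


def direct(p, x):
    """Direct scheme: 3 additions, 5 multiplications."""
    x2 = x * x                  # 1 multiplication
    x3 = x2 * x                 # 1 multiplication
    return p[0] + p[1] * x + p[2] * x2 + p[3] * x3   # 3 multiplications


def horner(p, x):
    """Horner scheme: 3 additions, 3 multiplications, fully sequential."""
    return p[0] + (p[1] + (p[2] + p[3] * x) * x) * x


def estrin(p, x):
    """Estrin scheme: 3 additions, 4 multiplications.

    p0 + p1*x, p2 + p3*x and x*x are independent and can run in parallel.
    """
    x2 = x * x
    return (p[0] + p[1] * x) + (p[2] + p[3] * x) * x2


if __name__ == "__main__":
    coeffs = (Fraction(1), Fraction(1, 2), Fraction(1, 3), Fraction(1, 4))
    x = Fraction(3, 7)
    print(direct(coeffs, x), horner(coeffs, x), estrin(coeffs, x))
```

With exact rational arithmetic the three results coincide; in finite precision they differ because each scheme rounds different intermediate values.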

Round-off Errors

Round-off errors occur during most computations:
• due to the finite accuracy during the computations
• small for a single operation (fraction of the LSB)
• accumulation of such errors may be a problem in long computation sequences
• need for a sufficient datapath width in order to limit round-off errors

Example: $1/3 = 0.33333333\ldots \to 0.3333$ or $0.3334$ in $1Q_{10}4$ format

[Figure: datapath with + and × operators]

Question: what is the best datapath width?

Rounding Modes and Correct Rounding

Notations:
• $\circledcirc$ is an operation $\pm, \times, \div, \ldots$
• $\diamond$ is the active rounding mode (or quantization mode)

IEEE-754: $\triangle(x)$ towards $+\infty$ (up), $\nabla(x)$ towards $-\infty$ (down), $Z(x)$ towards 0, $N(x)$ towards the nearest

[Figure: real line $\mathbb{R}$ with representable values and midpoints, showing $\nabla(x)$, $\triangle(x)$, $Z(x)$ and $N(x)$ around a value $x$]

Mathematical values: $r_{math} = a \circledcirc_{math} b$
Finite precision values: $r_{finite} = a \circledcirc_{finite} b$
Correct rounding: $r_{finite} = \diamond(a \circledcirc_{math} b)$
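The four rounding modes can be illustrated on a fixed-point grid of step $2^{-l}$. This Python sketch uses exact rationals; the function name and the mode labels are hypothetical, and ties in the nearest mode are broken to even, which is one of the tie rules defined by IEEE-754.

```python
import math
from fractions import Fraction


def round_fixed(x, frac_bits: int, mode: str) -> Fraction:
    """Round x to a multiple of 2**-frac_bits under the given mode."""
    scaled = Fraction(x) * (1 << frac_bits)
    if mode == "up":            # towards +infinity
        k = math.ceil(scaled)
    elif mode == "down":        # towards -infinity
        k = math.floor(scaled)
    elif mode == "zero":        # towards 0
        k = math.trunc(scaled)
    elif mode == "nearest":     # to nearest, ties to even
        k = round(scaled)
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return Fraction(k, 1 << frac_bits)


if __name__ == "__main__":
    a, b = Fraction(1), Fraction(3)
    exact = a / b               # mathematical result of the operation
    for mode in ("up", "down", "zero", "nearest"):
        print(mode, round_fixed(exact, 4, mode))
```

Computing the exact result first and rounding it once is what the correct rounding property requires of a finite precision operator.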

Bounding Round-off Errors

Problem: it is very difficult to get tight bounds.

Solutions:
• worst case: assume 1/2 LSB error for each operation (simple but very pessimistic)
• qualification: exhaustive or selected simulations (simple but only validated bounds for small systems)
• specific tools: formal accurate analysis (and proof); we use Gappa, developed by Guillaume Melquiond

Gappa Overview

• developed by Guillaume Melquiond
• goal: formal verification of the correctness of numerical programs:
  ▸ software and hardware
  ▸ integer, floating-point and fixed-point arithmetic ($\pm$, $\times$, $\div$, $\sqrt{\ }$)
• uses multiple-precision interval arithmetic, forward error analysis and expression rewriting to bound mathematical expressions (rounded and exact operators)
• generates a theorem and its proof, which can be automatically checked using a proof assistant (e.g. Coq or HOL Light)
• reports tight error bounds for given expressions in a given domain
• C++ code and free software licence (CeCILL ≃ GPL)
• publication: ACM Transactions on Mathematical Software, n. 1, vol. 37, 2010, pp: 2:1–20, doi: 10.1145/1644001.1644003
• source code and doc: http://gappa.gforge.inria.fr/

Gappa Example

Degree-2 polynomial approximation to $e^x$ over $[1/2, 1]$ and format 1Q9:

    p0 = 571/512; p1 = 275/512; p2 = 545/512;

    x = fixed<-9,dn>(Mx);

    y1 fixed<-9,dn>= p2 * x + p1;
    p  fixed<-9,dn>= y1 * x + p0;

    Mp = (p2 * Mx + p1) * Mx + p0;

    { Mx in [0.5,1] /\ |Mp - Mf| in [0,0.001385]
      -> |p - Mf| in ? }

Gappa-0.14.0 result (notation: $[a, b]$, $x\{(\approx x)_{10}, \log_2 x\}$, $x\,\mathrm{b}\,y = x \cdot 2^y$):

    Results for Mx in [0.5, 1] and |Mp - Mf| in [0, 0.001385]:
    |p - Mf| in [0, 193518932894171697b-64 {0.0104907, 2^(-6.57475)}]

Still Pending Questions

Question: what is the best (or a good) $p$?
• mathematical $p$: minimax approximations
• implemented $p$: simple selection of representable coefficients
• links to other methods and tools

Question: what is the best (or a good) datapath width?
• basic optimization method
• better heuristics under development...

Question: what is the best (or a good) evaluation scheme?
• Horner or specific scheme examples...
• work still in progress...

Minimax Polynomial Approximations

• approximation error: $\epsilon_{app} = \|f - p\|_\infty = \max_{a \le x \le b} |f(x) - p(x)|$
• the minimax polynomial approximation to $f$ over $[a, b]$ is $p^*$ such that $\|f - p^*\|_\infty = \min_{p \in \mathcal{P}_d} \|f - p\|_\infty$
• $\mathcal{P}_d$: set of polynomials with real coefficients and degree $\le d$
• $p^*$ is computed using an algorithm from Remez (numerically implemented in Maple, Matlab, sollya...)

Problems:
• $p^*$ coefficients in $\mathbb{R}$ ⇒ conversion to finite precision
• during $p^*$ evaluation, some round-off errors add up to $\epsilon_{app}$

Example

$f(x) = 2^x$ and $x \in [0, 1]$

[Figure: plot of $f(x) = 2^x$ over $[0, 1]$, with values from 1 to 2]

d   ε_app          µ [sb]
1   4.31 × 10^-2    4.53
2   2.48 × 10^-3    8.65
3   1.08 × 10^-4   13.18
4   3.71 × 10^-6   18.04
5   1.07 × 10^-7   23.15

Minimax polynomials $p^*$:
• $d = 1$: $p^* = 0.956964333 + 1.000000000 \times x$
• $d = 2$: $p^* = 1.002476056 + x \times (0.651046780 + x \times 0.344001106)$
• $d = 3$: $p^* = 0.999892965 + x \times (0.696457394 + x \times (0.224338364 + x \times 0.079204240))$
• $d = 4$: $p^* = 1.000003704 + x \times (0.692966122 + x \times (0.241638445 + x \times (0.051690358 + x \times 0.013697664)))$
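The accuracies in this example can be estimated numerically from the quoted coefficients. The Python sketch below samples the error on a uniform grid over $[0, 1]$, which only approximates the true maximum, and works in double precision; the names are hypothetical.

```python
import math

# Minimax coefficients (p0, p1, ...) quoted in the example for 2^x on [0, 1].
P_STAR = {
    1: (0.956964333, 1.000000000),
    2: (1.002476056, 0.651046780, 0.344001106),
    3: (0.999892965, 0.696457394, 0.224338364, 0.079204240),
    4: (1.000003704, 0.692966122, 0.241638445, 0.051690358, 0.013697664),
}


def horner(coeffs, x: float) -> float:
    """Evaluate p0 + p1*x + ... with the Horner scheme."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def accuracy_sb(coeffs, samples: int = 10001) -> float:
    """Estimate -log2(max |2^x - p(x)|) on a uniform grid over [0, 1]."""
    worst = max(
        abs(2.0 ** (i / (samples - 1)) - horner(coeffs, i / (samples - 1)))
        for i in range(samples)
    )
    return -math.log2(worst)


if __name__ == "__main__":
    for degree, coeffs in P_STAR.items():
        print(degree, round(accuracy_sb(coeffs), 2))
```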

Finite Precision Coefficients Selection Problem

Example: $f(x) = e^x$ over $[1/2, 1]$ with $d = 2$; the remez function from sollya gives:

$p^* = 1.116019297\ldots + 0.535470348\ldots \times x + 1.065407185\ldots \times x^2$

Question: what are "good" representable values for $p_0$, $p_1$ and $p_2$?

Problem: $p^*$ is the best theoretical approximation to $f$ (i.e. $p_i \in \mathbb{R}$)

Need: find good approximations with "machine-representable" coefficients

Above example with 1Q9 format (all values for domain $[1/2, 1]$):
• $\epsilon_{app} = \|f - p^*\|_\infty \simeq 1.385 \times 10^{-3}$, i.e. $\simeq 9.4$ sb
• $\frac{571}{512} + \frac{137}{256} x + \frac{545}{512} x^2$: 8.1 sb ($\forall i$ use $N(p_i)$)
• $\frac{571}{512} + \frac{275}{512} x + \frac{545}{512} x^2$: 9.3 sb (best selection)

Basic Coefficient Selection Method

Idea: search among all the rounding modes for all the $p^*_i$:
• round up $p_i = \triangle(p^*_i)$, round down $p_i = \triangledown(p^*_i)$
• 2 values per coefficient ⇒ a total of $2^{d+1}$ candidate polynomials (but $d$ is small)
• for each polynomial $p$, evaluate $\epsilon_{app} = \|f - p\|_\infty$, then select the polynomial(s) with the smallest $\epsilon_{app}$

[Figure: binary search tree of height $d + 1$; level $i$ branches on $\triangledown(p_i)$ or $\triangle(p_i)$, and each leaf is evaluated for $\epsilon_{app}$]

Result: $p(x) = \sum_{i=0}^{d} p_i x^i$ where all $p_i$ are representable in the target format.
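The exhaustive search can be sketched in a few lines of Python for the $e^x$ example over $[1/2, 1]$ with 9 fractional bits. The error is estimated on a uniform grid in double precision, which is an approximation of $\|f - p\|_\infty$ rather than a validated bound, and all names are hypothetical.

```python
import math
from itertools import product

FRAC_BITS = 9                                        # 1Q9 coefficients
P_STAR = (1.116019297, 0.535470348, 1.065407185)     # remez output, truncated


def max_error(numerators, samples: int = 2001) -> float:
    """Grid estimate of max |e^x - p(x)| over [1/2, 1].

    numerators holds the integers k_i such that p_i = k_i / 2**FRAC_BITS.
    """
    scale = 1 << FRAC_BITS
    worst = 0.0
    for i in range(samples):
        x = 0.5 + 0.5 * i / (samples - 1)
        p = (numerators[0] + (numerators[1] + numerators[2] * x) * x) / scale
        worst = max(worst, abs(math.exp(x) - p))
    return worst


# Round each coefficient down and up: 2 candidates per coefficient.
candidates = [
    (math.floor(c * 2**FRAC_BITS), math.ceil(c * 2**FRAC_BITS)) for c in P_STAR
]

# Evaluate all 2^(d+1) polynomials and keep the most accurate one.
best = min(product(*candidates), key=max_error)

if __name__ == "__main__":
    print(best, -math.log2(max_error(best)))
```

The search is exponential in $d + 1$, which stays cheap for the small degrees used in hardware operators.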

Example for $f(x) = 2^x$, $x \in [0, 1]$ and $d = 4$

$\epsilon_{app}(p^*)$: 18.04 sb

$p$ is represented by the rounding directions of $(p_0, p_1, p_2, p_3, p_4)$; $\epsilon_{app}(p)$ is given in sb.

p            ε_app(p)    p            ε_app(p)
(▽,▽,▽,▽,▽)   12.00      (▽,▽,▽,▽,△)   13.00
(▽,▽,▽,△,▽)   13.00      (▽,▽,▽,△,△)   14.03
(▽,▽,△,▽,▽)   13.00      (▽,▽,△,▽,△)   14.55
(▽,▽,△,△,▽)   14.99      (▽,▽,△,△,△)   13.00
(▽,△,▽,▽,▽)   13.00      (▽,△,▽,▽,△)   16.13
(▽,△,▽,△,▽)   17.12      (▽,△,▽,△,△)   13.00
(▽,△,△,▽,▽)   15.71      (▽,△,△,▽,△)   13.00
(▽,△,△,△,▽)   13.00      (▽,△,△,△,△)   12.00
(△,▽,▽,▽,▽)   13.00      (△,▽,▽,▽,△)   13.00
(△,▽,▽,△,▽)   13.00      (△,▽,▽,△,△)   13.00
(△,▽,△,▽,▽)   13.00      (△,▽,△,▽,△)   13.00
(△,▽,△,△,▽)   12.99      (△,▽,△,△,△)   12.00
(△,△,▽,▽,▽)   12.99      (△,△,▽,▽,△)   12.98
(△,△,▽,△,▽)   12.91      (△,△,▽,△,△)   12.00
(△,△,△,▽,▽)   12.79      (△,△,△,▽,△)   12.00
(△,△,△,△,▽)   12.00      (△,△,△,△,△)   11.41

[Figure: $\epsilon_{app}$ [sb] of the candidates on an axis from 0 to 20, with the theoretical accuracy levels for $d = 1$, 2, 3 and 4 marked]

Example: $2^x$ over $[0, 1]$ and $\mu \le 12$ sb (1/2)

Let us try with $d = 3$ (max. theoretical accuracy 13.18 sb):

$p^*(x) = 0.999892965 + 0.696457394\, x + 0.224338364\, x^2 + 0.079204240\, x^3$

Coefficients (fractional part) size selection:

l               12      13      14      15      16
ε_app [sb]      12.38   12.45   13.00   13.00   13.02
# polynomials   0       0       2       2       7

Coefficients selection: for $n = k + l = 1 + 14$ bits, we get:

(▽,▽,▽,▽) 11.41   (▽,▽,▽,△) 12.00   (▽,▽,△,▽) 12.00   (▽,▽,△,△) 12.84
(▽,△,▽,▽) 12.00   (▽,△,▽,△) 13.00   (▽,△,△,▽) 13.00   (▽,△,△,△) 12.36
(△,▽,▽,▽) 12.00   (△,▽,▽,△) 12.25   (△,▽,△,▽) 12.23   (△,▽,△,△) 12.23
(△,△,▽,▽) 12.13   (△,△,▽,△) 12.12   (△,△,△,▽) 12.05   (△,△,△,△) 11.64

Example: $2^x$ over $[0, 1]$ and $\mu \le 12$ sb (2/2)

Datapath size selection:

n'                  14      15      16      17      18      19      20
ε_eval direct [sb]  11.24   11.86   12.32   12.62   12.79   12.89   12.94
ε_eval Horner [sb]  11.32   11.93   12.36   12.65   12.81   12.90   12.95

Solution: $d = 3$, $n = k + l = 1 + 14$ and $n' = 16$

Implementation results:

solution    area   period   #cycles   latency   power
wo. tools   1.00   1.00     4         1.00      1.00
w. tools    0.83   0.82     3         0.61      0.68


Introduction to Computer Arithmetic for Efficient Hardware Implementations. Arnaud Tisserand, CNRS, Lab-STICC. CEA-SPEC Seminar, Nov. 2019.
