Rounding errors
Example (demo: “Waiting for 1”). Determine the double-precision machine representation of $0.1$.

Convert the fractional part to binary by repeatedly multiplying by 2 and recording the integer part:

$0.1 \times 2 = 0.2 \rightarrow 0$
$0.2 \times 2 = 0.4 \rightarrow 0$
$0.4 \times 2 = 0.8 \rightarrow 0$
$0.8 \times 2 = 1.6 \rightarrow 1$
$0.6 \times 2 = 1.2 \rightarrow 1$
$0.2 \times 2 = 0.4 \rightarrow 0$
$0.4 \times 2 = 0.8 \rightarrow 0$
$0.8 \times 2 = 1.6 \rightarrow 1$
$0.6 \times 2 = 1.2 \rightarrow 1$
$\vdots$ (the pattern $0011$ repeats forever)

Hence

$0.1 = (0.000110011\,0011\ldots)_2 = (1.100110011\ldots)_2 \times 2^{-4}$

Sign: $s = 0$
Exponent: $m = -4$, stored with bias as $c = m + 1023 = 1019 = (01111111011)_2$
Significand (52 bits, rounded): $f = 100110011\ldots00110011010$

Stored word: $0 \mid 01111111011 \mid 10011\ldots0011\ldots0011010$

Since the binary expansion of $0.1$ never terminates, the 52-bit significand must be rounded: roundoff error in its basic form!
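This can be checked directly in Python: `Decimal` and `Fraction` expose the exact value of the stored double, and `struct` exposes the bit fields (a quick sketch; nothing here is part of the slides):

```python
import struct
from decimal import Decimal
from fractions import Fraction

# The exact value of the double nearest to 0.1 -- not 0.1!
print(Decimal(0.1))   # 0.1000000000000000055511151231257827021181583404541015625

# The same value as an exact ratio: the denominator is a power of two (2^55).
print(Fraction(0.1))  # 3602879701896397/36028797018963968

# The raw bit fields: 0x3FB = 1019 is the biased exponent derived above.
print(struct.pack(">d", 0.1).hex())  # 3fb999999999999a
```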
Machine floating-point numbers

• Not all real numbers can be exactly represented as a machine floating-point number.
• Consider a real number in the normalized floating-point form:

$x = \pm 1.b_1 b_2 b_3 \ldots b_n \ldots \times 2^m$

• The real number $x$ will be approximated by either $x_-$ or $x_+$, the nearest two machine floating-point numbers:

$0 \;\cdots\; x_- \;\; x \;\; x_+ \;\cdots\; +\infty$

Without loss of generality, let's see what happens when trying to represent a positive number:

Exact number: $x = 1.b_1 b_2 b_3 \ldots b_n \ldots \times 2^m$
$x_- = 1.b_1 b_2 b_3 \ldots b_n \times 2^m$ (rounding by chopping)
$x_+ = 1.b_1 b_2 b_3 \ldots b_n \times 2^m + 0.00\ldots01 \times 2^m = x_- + \epsilon_m \times 2^m$
$x = 1.b_1 b_2 b_3 \ldots b_n \ldots \times 2^m$ (exact number), with machine neighbors

$x_- = 1.b_1 b_2 b_3 \ldots b_n \times 2^m$
$x_+ = 1.b_1 b_2 b_3 \ldots b_n \times 2^m + \epsilon_m \times 2^m$

Gap between $x_+$ and $x_-$:

$x_+ - x_- = \epsilon_m \times 2^m$

Examples for single precision ($\epsilon_m = 2^{-23}$):

$x_+$ and $x_-$ of the form $r \times 2^{-10}$: $x_+ - x_- = 2^{-33} \approx 10^{-10}$
$x_+$ and $x_-$ of the form $r \times 2^{1}$: $x_+ - x_- = 2^{-22} \approx 2 \times 10^{-7}$
$x_+$ and $x_-$ of the form $r \times 2^{20}$: $x_+ - x_- = 2^{-3} = 0.125$
$x_+$ and $x_-$ of the form $r \times 2^{60}$: $x_+ - x_- = 2^{37} \approx 10^{11}$

The interval between successive floating-point numbers is not uniform: the interval is smaller when the magnitude of the numbers is smaller, and it is bigger as the numbers get bigger.
Gap between two successive machine floating point numbers

A "toy" number system can be represented as $x = \pm 1.b_1 b_2 \times 2^m$ for $m \in [-4, 4]$ and $b_i \in \{0, 1\}$.

$1.00_2 \times 2^{-4} = 0.0625$   $1.01_2 \times 2^{-4} = 0.078125$   $1.10_2 \times 2^{-4} = 0.09375$   $1.11_2 \times 2^{-4} = 0.109375$
$1.00_2 \times 2^{-3} = 0.125$    $1.01_2 \times 2^{-3} = 0.15625$    $1.10_2 \times 2^{-3} = 0.1875$    $1.11_2 \times 2^{-3} = 0.21875$
$1.00_2 \times 2^{-2} = 0.25$     $1.01_2 \times 2^{-2} = 0.3125$     $1.10_2 \times 2^{-2} = 0.375$     $1.11_2 \times 2^{-2} = 0.4375$
$1.00_2 \times 2^{-1} = 0.5$      $1.01_2 \times 2^{-1} = 0.625$      $1.10_2 \times 2^{-1} = 0.75$      $1.11_2 \times 2^{-1} = 0.875$
$1.00_2 \times 2^{0} = 1$         $1.01_2 \times 2^{0} = 1.25$        $1.10_2 \times 2^{0} = 1.5$        $1.11_2 \times 2^{0} = 1.75$
$1.00_2 \times 2^{1} = 2$         $1.01_2 \times 2^{1} = 2.5$         $1.10_2 \times 2^{1} = 3.0$        $1.11_2 \times 2^{1} = 3.5$
$1.00_2 \times 2^{2} = 4.0$       $1.01_2 \times 2^{2} = 5.0$         $1.10_2 \times 2^{2} = 6.0$        $1.11_2 \times 2^{2} = 7.0$
$1.00_2 \times 2^{3} = 8.0$       $1.01_2 \times 2^{3} = 10.0$        $1.10_2 \times 2^{3} = 12.0$       $1.11_2 \times 2^{3} = 14.0$
$1.00_2 \times 2^{4} = 16.0$      $1.01_2 \times 2^{4} = 20.0$        $1.10_2 \times 2^{4} = 24.0$       $1.11_2 \times 2^{4} = 28.0$
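The positive half of this toy system is small enough to enumerate; a sketch that reproduces the table and shows the non-uniform gaps:

```python
# All positive values 1.b1 b2 x 2^m of the toy system, m in [-4, 4].
values = sorted((1 + b1 / 2 + b2 / 4) * 2.0 ** m
                for m in range(-4, 5)
                for b1 in (0, 1)
                for b2 in (0, 1))
print(len(values))                # 36 values
print(values[:4], values[-4:])    # from 0.0625 ... up to ... 28.0

# Successive gaps double each time the exponent increases by one.
gaps = [b - a for a, b in zip(values, values[1:])]
print(min(gaps), max(gaps))       # 0.015625 (= 2^-6) up to 4.0 (= 2^2)
```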
Rounding

The process of replacing $x$ by a nearby machine number is called rounding, and the error involved is called roundoff error.

$-\infty \;\cdots\; x_- \;\; x \;\; x_+ \;\cdots\; 0 \;\cdots\; x_- \;\; x \;\; x_+ \;\cdots\; +\infty$

Round by chopping (rounding towards zero):
• $x$ is a positive number: $fl(x) = x_-$
• $x$ is a negative number: $fl(x) = x_+$

Round up (ceil): $fl(x) = x_+$ — rounding towards $+\infty$ for positive $x$, towards zero for negative $x$.

Round down (floor): $fl(x) = x_-$ — rounding towards $-\infty$ for negative $x$, towards zero... no: towards $-\infty$ always; for positive $x$ this is rounding towards zero.

Round to nearest: either round up or round down, whichever is closer.
Rounding (roundoff) errors

Consider rounding by chopping:

• Absolute error:
$|fl(x) - x| \le x_+ - x_- = \epsilon_m \times 2^m$
$\Rightarrow\; |fl(x) - x| \le \epsilon_m \times 2^m$

• Relative error:
$\dfrac{|fl(x) - x|}{|x|} \le \dfrac{\epsilon_m \times 2^m}{1.b_1 b_2 b_3 \ldots b_n \ldots \times 2^m}$
$\Rightarrow\; \dfrac{|fl(x) - x|}{|x|} \le \epsilon_m$
Rounding (roundoff) errors

For $x = 1.b_1 b_2 b_3 \ldots b_n \ldots \times 2^m$ between machine numbers $x_-$ and $x_+$:

Single precision: $\dfrac{|x - fl(x)|}{|x|} \le 2^{-23} \approx 1.2 \times 10^{-7}$. Floating-point math consistently introduces relative errors of about $10^{-7}$. Hence, single precision gives you about 7 accurate (decimal) digits.

Double precision: $\dfrac{|x - fl(x)|}{|x|} \le 2^{-52} \approx 2.2 \times 10^{-16}$. Floating-point math consistently introduces relative errors of about $10^{-16}$. Hence, double precision gives you about 16 accurate (decimal) digits.
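These bounds are exposed by the standard machine-epsilon constants; a quick check (assuming NumPy for the single-precision value):

```python
import sys
import numpy as np

print(np.finfo(np.float32).eps)  # 2^-23 ~ 1.19e-07 -> ~7 decimal digits
print(sys.float_info.epsilon)    # 2^-52 ~ 2.22e-16 -> ~16 decimal digits

# A perturbation below the relative-error level is invisible:
print(np.float32(1.0) + np.float32(1e-8) == np.float32(1.0))  # True
print(1.0 + 1e-17 == 1.0)                                     # True
```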
iClicker question

Assume you are working with IEEE single-precision numbers. Find the smallest number $b$ that satisfies $2^8 + b \ne 2^8$.

A) $2^{-1074}$
B) $2^{-1022}$
C) $2^{-52}$
D) $2^{-15}$
E) $2^{-8}$
Demo
Arithmetic with machine numbers
Mathematical properties of FP operations

Not necessarily associative: for some $x, y, z$ the result below is possible:
$(x + y) + z \ne x + (y + z)$

Not necessarily distributive: for some $x, y, z$ the result below is possible:
$z(x + y) \ne zx + zy$

Not necessarily cumulative: repeatedly adding a very small number to a large number may do nothing.
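All three failures are easy to trigger in double precision; a minimal sketch:

```python
# Associativity can fail: each tiny term is absorbed by 1.0 on its own,
# but their exact sum is big enough to survive.
print((1.0 + 1e-16) + 1e-16 == 1.0 + (1e-16 + 1e-16))  # False

# Distributivity can fail:
print(100.0 * (0.1 + 0.2) == 100.0 * 0.1 + 100.0 * 0.2)  # False

# "Cumulative" failure: adding 1e-16 to 1.0 a million times does nothing,
# because every single addition rounds back to 1.0.
s = 1.0
for _ in range(1_000_000):
    s += 1e-16
print(s == 1.0)  # True
```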
Floating point arithmetic (basic idea)

$x = (-1)^{s} \, 1.f \times 2^{m}$, stored as the bit fields $s \mid c \mid f$ (sign, biased exponent, significand).

• First compute the exact result
• Then round the result to make it fit into the desired precision:

$x + y = fl(x + y)$
$x \times y = fl(x \times y)$
Floating point arithmetic

Consider a number system such that $x = \pm 1.b_1 b_2 b_3 \times 2^m$ for $m \in [-4, 4]$ and $b_i \in \{0, 1\}$.

Rough algorithm for addition and subtraction:
1. Bring both numbers onto a common exponent
2. Do "grade-school" operation
3. Round result

• Example 1: No rounding needed
$a = 1.101_2 \times 2^1$
$b = 1.001_2 \times 2^1$
$c = a + b = 10.110_2 \times 2^1 = 1.011_2 \times 2^2$
Floating point arithmetic

Consider a number system such that $x = \pm 1.b_1 b_2 b_3 \times 2^m$ for $m \in [-4, 4]$ and $b_i \in \{0, 1\}$.

• Example 2: Requires rounding
$a = 1.101_2 \times 2^0$
$b = 1.000_2 \times 2^0$
$c = a + b = 10.101_2 \times 2^0 \approx 1.010_2 \times 2^1$

• Example 3:
$a = 1.100_2 \times 2^1$
$b = 1.100_2 \times 2^{-1}$
$c = a + b = 1.100_2 \times 2^1 + 0.011_2 \times 2^1 = 1.111_2 \times 2^1$
Floating point arithmetic

Consider a number system such that $x = \pm 1.b_1 b_2 b_3 b_4 \times 2^m$ for $m \in [-4, 4]$ and $b_i \in \{0, 1\}$.

• Example 4:
$a = 1.1011_2 \times 2^1$
$b = 1.1010_2 \times 2^1$
$c = a - b = 0.0001_2 \times 2^1$, or after normalization: $c = 1.????_2 \times 2^{-3}$

Unfortunately there is no data to indicate what the missing digits should be, so the machine fills them with its best guess, which is often not good (usually so-called spurious zeros). The effect is that the number of significant digits in the result is reduced. This phenomenon is called Catastrophic Cancellation.
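Example 4 can be replayed in exact rational arithmetic to see how little information survives the subtraction (a sketch using `fractions`; the values come from the example above):

```python
from fractions import Fraction

a = Fraction(27, 8)   # 1.1011_2 x 2^1 = 11.011_2 = 3.375
b = Fraction(26, 8)   # 1.1010_2 x 2^1 = 11.010_2 = 3.25
d = a - b
print(d)              # 1/8 = 0.0001_2 x 2^1: only ONE significant bit is left
```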
Cancellation

$a = 1.a_1 a_2 a_3 a_4 a_5 a_6 \ldots a_n \ldots \times 2^{m_1}$
$b = 1.b_1 b_2 b_3 b_4 b_5 b_6 \ldots b_n \ldots \times 2^{m_2}$

Suppose $a \approx b$, and consider single precision (without loss of generality):

$a = 1.a_1 a_2 a_3 \ldots a_{20} a_{21} 1 0 \mid a_{24} a_{25} a_{26} a_{27} \ldots \times 2^m$
$b = 1.a_1 a_2 a_3 \ldots a_{20} a_{21} 1 1 \mid b_{24} b_{25} b_{26} b_{27} \ldots \times 2^m$

Everything after bit 23 is lost due to rounding when $a$ and $b$ are stored. The true difference is

$b - a = 1.??\ldots?? \times 2^{m-23}$

but the machine computes

$fl(b) - fl(a) = 0.0000\ldots0001 \times 2^m = 1.000\ldots00 \times 2^{m-23}$

The trailing bits are not significant: the precision was lost not in computing $fl(b) - fl(a)$, but due to the rounding of $a$ and $b$ from the beginning.
Example of cancellation:
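A concrete double-precision instance (my example, not from the slides): computing $1 - \cos x$ for small $x$ cancels catastrophically, while the identity $1 - \cos x = 2\sin^2(x/2)$ avoids the subtraction entirely:

```python
import math

x = 1e-8
naive = 1.0 - math.cos(x)            # cos(1e-8) rounds to exactly 1.0 -> 0.0
stable = 2.0 * math.sin(x / 2) ** 2  # ~5e-17, close to the true value x^2/2
print(naive, stable)
```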
Loss of significance

Assume $a \gg b$. For example:

$a = 1.a_1 a_2 a_3 a_4 a_5 a_6 \ldots a_n \ldots \times 2^{0}$
$b = 1.b_1 b_2 b_3 b_4 b_5 b_6 \ldots b_n \ldots \times 2^{-8}$

In single precision (without loss of generality):

$fl(a) = 1.a_1 a_2 a_3 a_4 a_5 a_6 \ldots a_{22} a_{23} \times 2^{0}$
$fl(b) = 1.b_1 b_2 b_3 b_4 b_5 b_6 \ldots b_{22} b_{23} \times 2^{-8}$

Aligning both to the common exponent $2^0$:

$\phantom{+\;} 1.a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8 a_9 \ldots a_{22} a_{23} \times 2^{0}$
$+\; 0.00000001 b_1 b_2 b_3 b_4 b_5 \ldots b_{14} b_{15} \times 2^{0}$

In this example, the result $fl(a + b)$ includes only 15 bits of precision from $fl(b)$. Lost precision!
Loss of Significance

How can we avoid this loss of significance? For example, consider the function

$f(x) = \sqrt{x^2 + 1} - 1$

If we want to evaluate the function for values of $x$ near zero, there is a potential loss of significance in the subtraction. For example, if $x = 10^{-3}$ and we use five-decimal-digit arithmetic:

$f(10^{-3}) = \sqrt{(10^{-3})^2 + 1} - 1 = 0$

How can we fix this issue?
Loss of Significance

Re-write the function as $f(x) = \dfrac{x^2}{\sqrt{x^2 + 1} + 1}$ (no subtraction!)

Evaluate now the function for $x = 10^{-3}$ using five-decimal-digit arithmetic:

$f(10^{-3}) = \dfrac{(10^{-3})^2}{\sqrt{(10^{-3})^2 + 1} + 1} = \dfrac{10^{-6}}{2} = 5 \times 10^{-7}$
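Both forms can be compared in double precision (a sketch; the rewrite helps here too, since for small enough $x$ even doubles return 0 from the naive form):

```python
import math

def f_naive(x):
    return math.sqrt(x * x + 1.0) - 1.0

def f_stable(x):
    return x * x / (math.sqrt(x * x + 1.0) + 1.0)

for x in (1e-3, 1e-8):
    print(x, f_naive(x), f_stable(x))  # true value is ~ x^2 / 2
# At x = 1e-8 the naive form collapses to 0.0; the rewrite returns ~5e-17.
```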