algorithms using real numbers
play

Algorithms using real numbers Floating point arithmetic limited - PDF document

Algorithms using real numbers Floating point arithmetic limited precision Algorithms using real numbers computational errors IEEE standard type double in Java Computational errors due to limited precision


  1. Algorithms using “real” numbers • Floating point arithmetic – limited precision Algorithms using “real” numbers – computational errors – IEEE standard – type double in Java • Computational errors due to limited precision arithmetic • floating point arithmetic – typical errors involving summation, convergence, subtraction • typical computational errors due to limited precision arithmetic – how to avoid some of them • Applications of numerical algorithms • advice for constructing algorithms using real numbers – Ex. π -computation – graphical / geometric computations – scientific computations 1 2 Floating point arithmetic: limited precision • A computer can handle only a finite subset of the reals directly Floating point arithmetic: limited precision in hardware • These numbers make up a floating point number system • Each floating point number in F ( β, s, m, M ) has the form F ( β, s, m, M ) characterized by y = ± d 1 .d 2 . . . d s · β e , β ∈ N\{ 1 } a base a number of digits s ∈ N with m ≤ e ≤ M and 0 ≤ d k ≤ β − 1. • The mantissa d 1 .d 2 . . . d s represents d 1 + d 2 · β − 1 + · · · + d s · β − s +1 a smallest exponent m ∈ Z M ∈ Z largest exponent • the exponent part is β e • Each floating point number has the form • The system includes 0 = 0 . 0 . . . 0 · β e . y = ± d 1 .d 2 . . . d s · β e , • All other numbers are normalised: 1 ≤ d 1 ≤ β − 1 with m ≤ e ≤ M and 0 ≤ d k ≤ β − 1. 3 4 Floating point arithmetic: limited precision Floating point arithmetic: computational errors • Example: the number system F (2 , 3 , − 2 , 1) contains 33 numbers: • standard demo system is F (10 , 4 , − 99 , 99) • 1.573 og 0.1824 are representable, but the following are not − 1 . 11 · 2 1 . . . 1 . 573 + 0 . 1824 = 1 . 7554 − 1 . 00 · 2 − 2 1 . 573 − 0 . 1824 = 1 . 3906 0 . 00 × 1 . 573 0 . 1824 = 0 . 2869152 1 . 00 · 2 − 2 1 . 573 / 0 . 1824 = 8 . 6239035 . . . . . . • Exact arithmetic on a finite subset of reals is not possible. 1 . 11 · 2 1 • strategy for rounding/truncating to a representable number is needed − 4 − 3 − 2 − 1 0 1 2 3 4 5 6

  2. Floating point arithmetic: computational errors Floating point arithmetic: computational errors • Algebraic laws invalid: • rounding/truncation defined by function fl : R �→ M , where M (1 . 418 ⊕ 2937) ⊖ 2936 = 2938 ⊖ 2936 = 2 . 000 is machine numbers 1 . 418 ⊕ (2937 ⊖ 2936) = 1 . 418 ⊕ 1 . 000 = 2 . 418 • machine arithmetic operations ⊕ , ⊖ , ⊗ , ⊘ defined by 1 . 418 ⊗ (2001 ⊖ 2000) 1 . 418 ⊗ 1 . 000 = = 1 . 418 x ⊕ y = fl ( x + y ) etc. (1 . 418 ⊗ 2001) ⊖ (1 . 418 ⊗ 2000) = 2837 ⊖ 2836 = 1 . 000 • In demo system F (10 , 4 , − 99 , 99) define fl by truncation, e.g. • The usual associative and distributive laws are not valid for machine arithmetic. Sometimes: 1 . 573 0 . 1824 = fl (1 . 573 + 0 . 1824) = fl (1 . 7554) = 1 . 755 ⊕ 1 . 573 0 . 1824 = fl (1 . 573 0 . 1824) = fl (1 . 3906) = 1 . 390 ⊖ − 1 . 573 0 . 1824 = fl (1 . 573 0 . 1824) = fl (0 . 2869152) = . 2869 ⊗ × ( a ⊕ b ) ⊖ c a ⊕ ( b ⊖ c ) � = 1 . 573 0 . 1824 = fl (1 . 573 / 0 . 1824) = fl (8 . 6239035 . . . ) = 8 . 623 ⊘ a ⊗ ( b ⊖ c ) � = ( a ⊗ b ) ⊖ ( a ⊗ c ) 7 8 Floating point arithmetic in Java IEEE standard for binary floating point arithmetic • Java implementations use IEEE standard for the primitive types • IEEE standard describes two number systems, approximately float and double on most computers Single precision F (2 , 24 , − 126 , 127) • Use modifier strictfp to enforce IEEE standard independent of Double precision F (2 , 53 , − 1022 , 1023) the underlying hardware. • The IEEE systems has more numbers: • Representable values in type double are ± b 1 .b 2 . . . b 53 · 2 e , – closes the representational gap around zero with xtra numbers where b i ∈ { 0 , 1 } and − 1022 ≤ e ≤ 1023 of the form 0 .b 1 b 2 . . . b 52 2 − 1022 ±∞ and NaN – has representations for ±∞ and NaN (= Not a Number) • Java syntax for ±∞ and NaN is: Double.NEGATIVE INFINITY , • The IEEE system has detailed rules for rounding to representable Double.POSITIVE INFINITY and Double.NaN numbers, ex. • Warning: legal java double literals may not be directly – overflow: 1 / 0 or 2 1023 · 2 has result ∞ representable: literal 0.2 is represented as – underflow: 1 / ∞ or 2 − 1022 − 52 / 2 has result 0 fl (0 . 2 10 ) = fl (0 . 00110011 . . . 2 ) = – 0 / 0 or ∞ − ∞ has result NaN 1 . 1001100110011001100110011001100110011001100110011010 · 2 − 3 9 10 Sources of errors in numerical computations Limited precision arithmetic • The limited precision of arithmetic in R (10 , 4 , − 99 , 99) is illustrated by computing • Hardware errors n →∞ (1 + 1 – Pentium bug n ) n e = lim • Limited precision arithmetic – summation: adding numbers of unequal magnitude (1 ⊕ 1 ⊘ 10 k ) 10 k 1 ⊘ 10 k 1 ⊕ (1 ⊘ 10 k ) k – subtraction: catastrophic cancellation 1 1 . 000 E-1 1 . 100 2 . 591 – convergence: may cycle or diverge 2 1 . 000 E-2 1 . 010 2 . 591 – formula: mathematically equivalent formulae have different 3 1 . 000 E-3 1 . 001 2 . 591 computational properties 4 1 . 000 E-4 1 . 000 1 . 000 • Logical errors 5 1 . 000 E-5 1 . 000 1 . 000 11 12

  3. Limited precision arithmetic Limited precision arithmetic • The e -computation in R (2 , 24 , − 126 , 127): (1 ⊕ 1 ⊘ 2 k ) 2 k • The e -computation seems to give a reasonable result, if one stops 1 ⊘ 2 k 1 ⊕ (1 ⊘ 2 k ) k just before the result degenerates to 1? 1 0 . 5 1 . 5 2 . 25 • WRONG: 2 0 . 25 1 . 25 2 . 4414062 3 0 . 125 1 . 125 2 . 5657845 e 4 0 . 0625 1 . 0625 2 . 6379285 R (10 , 4 , − 99 , 99) 2 . 591 . . . . . . . . . . . . R (2 , 24 , − 126 , 127) 2.71 75977 22 2 . 3841858 E-7 1 . 0000002 2 . 7175977 R (2 , 53 , − 1022 , 1023) 2.7182818 08182473 23 1 . 1920929 E-7 1 . 0000001 2 . 7175977 exact 2.71828182845904523536 ... 24 5 . 9604645 E-8 1 . 0 1 . 0 • less than half the significant digits are correct! 25 2 . 9802322 E-8 1 . 0 1 . 0 13 14 Summation • try forward summation: Summation H n = ( · · · ((1 ⊕ 1 2 ) ⊕ 1 3 ) ⊕ · · · ⊕ 1 n ) • The n th harmonic number is defined by ⊕ , ⊖ , ⊗ , ⊘ forward n H n exact n 10 2 . 928 2 . 927 1 i = 1 + 1 2 + 1 3 + · · · + 1 � H n = n 100 5 . 187 5 . 142 i =1 1000 7 . 485 7 . 069 • Associative law is not valid. 10000 9 . 787 7 . 069 • Which order of summation is best? 100000 12 . 09 7 . 069 – from the left? 1000000 14 . 39 7 . 069 – from the right? • The computed sum is constant for n ≥ 1000 since – pairwise in binary tree? 1 • experiment in toy system R (10 , 4 , − 99 , 99) with truncation 1001 = 7 . 069 ⊕ 9 . 990 · 10 − 4 = 7 . 069 7 . 069 ⊕ 15 16 Summation Summation • try backward summation: H n = (1 ⊕ ( 1 2 ⊕ ( 1 3 ⊕ · · · ⊕ 1 • add terms pairwise in a tree structure n ) · · · )) H n = ( · · · ((1 ⊕ 1 2 ) ⊕ ( 1 3 ⊕ 1 4 ⊕ )) · · · ⊕ 1 n ) · · · )) ⊕ n H n exact ⊕ , ⊖ , ⊗ , ⊘ backward forward 10 2 . 928 2 . 928 2 . 927 ⊕ ⊕ 100 5 . 187 5 . 170 5 . 142 1000 7 . 485 7 . 284 7 . 069 ⊕ ⊕ ⊕ ⊕ 10000 9 . 787 8 . 069 7 . 069 100000 12 . 09 8 . 069 7 . 069 1 1 1 1 1 1 1 1 1000000 14 . 39 8 . 069 7 . 069 2 3 4 5 6 7 8 • By adding terms in increasing order, the error is reduced! 17 18

  4. Summation Summation • similar experiment for java double, R (2 , 53 , − 1022 , 1023) n H n exact ⊕ , ⊖ , ⊗ , ⊘ tree-wise backward forward H 100000 error + , − , · , / exact 12 . 090146129863428 10 2 . 928 2 . 928 2 . 928 2 . 927 − 3 · 10 − 15 100 5 . 187 5 . 183 5 . 170 5 . 142 ⊕ , ⊖ , ⊗ , ⊘ “tree-wise” 12 . 0901461298634 31 20 · 10 − 15 ⊕ , ⊖ , ⊗ , ⊘ backward 12 . 0901461298634 08 1000 7 . 485 7 . 477 7 . 284 7 . 069 93 · 10 − 15 10000 9 . 787 9 . 778 8 . 069 7 . 069 ⊕ , ⊖ , ⊗ , ⊘ forward 12 . 090146129863 335 100000 12 . 09 12 . 07 8 . 069 7 . 069 • seems better than R (10 , 4 , − 99 , 99) ? 1000000 14 . 39 14 . 36 8 . 069 7 . 069 • truncation errors add up, but rounding errors may cancel out • By adding equal sized terms the error is diminished further! (though there is no guarantee!) 19 20 Convergence: monotone • Example: find root of polynomial f ( x ) using Newton iteration Summation • compute z = lim k →∞ x k for x k +1 = x k − f ( x k ) f ′ ( x k ) • Problem – compute x 1 + x 2 + · · · + x n • Warning: f ( x k ) y = f ( x ) – ⊕ not associative! • Advice: – Add terms in increasing order α z – Even better: add terms of equal size only x k +1 x k 21 22 Convergence: monotone √ Convergence: monotone • Computing 2 by Newton iteration • Example: squareroot computation, √ a R (10 , 4 , − 99 , 99) R (2 , 53 , − 1022 , 1023) • Find root of f ( x ) = x 2 − a . k x k x k 0 1 . 500 1 . 5000000000000000 • Newton iteration: compute 1 1 . 416 1 . 41 66666666666667 x k +1 = x k − x 2 k − a 2 1 . 414 1 . 41421 56862745099 2 x k 3 1 . 414 1 . 41421356237 46899 using x 0 = 1 4 1 . 414 1 . 414213562373095 1 2( a + 1) 5 1 . 414 1 . 4142135623730950 • Does it hold that 6 1 . 414 1 . 414213562373095 1 k →∞ x k = √ a lim √ exact 2 1 . 414 1 . 4142135623730950 for limited precision arithmetic? • the computation in R (2 , 53 , − 1022 , 1023) seems to be cycling? 23 24

Recommend


More recommend