real number representation

Real Number Representation - PowerPoint PPT Presentation

  1. Real Number Representation

  2. Topics
     • Terminology
     • IEEE standard for floating-point representation
     • Floating-point arithmetic
     • Limitations

  3. Terminology
     • All digits in a number following any leading zeros are significant digits. Each of these numbers has five: 12.345, -0.12345, 0.00012345

  4. Terminology (cont)
     • The scientific notation for real numbers is: $\text{mantissa} \times \text{base}^{\text{exponent}}$
     • In C, the expression 12.456e-2 means $12.456 \times 10^{-2}$

  5. Terminology (cont)
     • The mantissa is always normalized between 1 and the base, $1 \le \text{mantissa} < \text{base}$ (i.e., exactly one significant digit before the point)

       Unnormalized                           Normalized
       $2997.9 \times 10^{5}$                 $2.9979 \times 10^{8}$
       $\text{B1.39FC} \times 16^{11}$        $\text{B.139FC} \times 16^{12}$
       $0.010110110101_2 \times 2^{-1}$       $1.0110110101_2 \times 2^{-3}$
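
The base-2 row of the table can be checked with a short sketch. It is only an illustration: the helper name `normalize_base2` is made up for this example, it handles base 2 only, and it relies on Python's `math.frexp`, which returns a mantissa in [0.5, 1) that then has to be shifted by one place.

```python
import math

def normalize_base2(x):
    """Return (mantissa, exponent) with 1 <= |mantissa| < 2 and x == mantissa * 2**exponent."""
    if x == 0:
        raise ValueError("zero has no normalized form")
    m, e = math.frexp(x)   # frexp gives 0.5 <= |m| < 1
    return m * 2, e - 1    # shift one binary place to get 1 <= |m| < 2

# The unnormalized base-2 entry from the table: 0.010110110101 (binary) x 2**-1
unnormalized = int("010110110101", 2) / 2**12 * 2**-1

mantissa, exponent = normalize_base2(unnormalized)
print(mantissa, exponent)  # mantissa 1.4267578125 is 1.0110110101 in binary
```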

  6. Terminology (cont)
     • The precision of a number is how many digits (or bits) we use to represent it
     • For example, π at increasing precision: 3, 3.14, 3.1415926, 3.1415926535897932384626433832795028

  7. Representing Numbers
     • A real number n is represented by a floating-point approximation n*
     • The computer uses 32 bits (or more) to store each approximation
     • It needs to store
       – the mantissa
       – the sign of the mantissa
       – the exponent (with its sign)

  8. Representing Numbers (cont)
     • The standard way to allocate 32 bits (specified by IEEE Standard 754) is:
       – 23 bits for the mantissa (bits 22 to 0)
       – 1 bit for the mantissa's sign (bit 31)
       – 8 bits for the exponent (bits 30 to 23)

  12. Representing the Mantissa
      • The mantissa has to be in the range $1 \le \text{mantissa} < \text{base}$
      • Therefore
        – If we use base 2, the digit before the point must be a 1
        – So we don't have to worry about storing it
      • We get 24 bits of precision using 23 bits

  13. Representing the Mantissa (cont)
      • 24 bits of precision are equivalent to a little over 7 decimal digits: $24 / \log_2 10 \approx 7.2$

  14. Representing the Mantissa (cont)
      • Suppose we want to represent π: 3.1415926535897932384626433832795...
      • That means that we can only represent it as:
        – 3.141592 (if we truncate)
        – 3.141593 (if we round)
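
A rough way to see the 7-digit limit is to round π to a 32-bit float and compare it with the 64-bit value. This is a sketch, not part of the slides: `to_float32` is a hypothetical helper that uses the standard `struct` module to round-trip a value through the 32-bit format.

```python
import math
import struct

def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float and return it."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

pi32 = to_float32(math.pi)

print(f"{math.pi:.15f}")        # 64-bit approximation of pi
print(f"{pi32:.15f}")           # 32-bit approximation: differs after about 7 digits
print(24 * math.log10(2))       # decimal digits carried by 24 bits (same as 24 / log2(10))
```

The stored 32-bit value is the nearest representable binary fraction, so its decimal expansion goes on past the seventh digit, but those extra digits no longer match π.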

  15. Representing the Exponent
      • The exponent is represented as excess-127. E.g.,

        Actual exponent    Stored value
        -127          ↔    00000000
        -126          ↔    00000001
        ...
        0             ↔    01111111
        +1            ↔    10000000
        ...
        i             ↔    $(i+127)_2$
        ...
        +128          ↔    11111111

  16. Representing the Exponent (cont)
      • The IEEE standard restricts exponents to the range: $-126 \le \text{exponent} \le +127$
      • The exponents -127 and +128 have special meanings:
        – If exponent = -127 (stored as 00000000), the represented value is 0
        – If exponent = +128 (stored as 11111111), the represented value is ∞
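
The excess-127 encoding can be inspected directly. The sketch below assumes Python's `struct` module to get at the raw 32 bits; `exponent_field` is an illustrative name, not something from the slides.

```python
import struct

BIAS = 127  # excess-127

def exponent_field(x):
    """Return the 8-bit stored exponent of x when x is held as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF   # drop the 23 mantissa bits, mask off the sign

for value in (0.5, 1.0, 2.0, 10.375):
    stored = exponent_field(value)
    # stored pattern, then the actual exponent after removing the bias
    print(value, format(stored, "08b"), stored - BIAS)
```

The two reserved patterns show up for the special values: `exponent_field(0.0)` gives 0 and `exponent_field(float("inf"))` gives 255.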

  17. Representing Numbers -- Example 1
      What is 01011011 (8-bit machine)?

        0     101   1011
        sign  exp   mantissa

      • Mantissa: $1.1011_2$
      • Exponent (excess-3 format): $5 - 3 = 2$
      • Value: $1.1011_2 \times 2^{2} = 110.11_2 = 2^{2} + 2^{1} + 2^{-1} + 2^{-2} = 4 + 2 + 0.5 + 0.25 = 6.75$
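
The same decoding steps can be written out as a small function. This is a sketch of the toy 8-bit format used in this example only (1 sign bit, 3 exponent bits in excess-3, 4 mantissa bits); the name `decode_8bit` is made up, and the function deliberately ignores any special meaning for all-zero or all-one exponents.

```python
def decode_8bit(bits):
    """Decode an 8-character bit string: 1 sign bit, 3 exponent bits (excess-3), 4 mantissa bits."""
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:4], 2) - 3         # remove the excess-3 bias
    mantissa = 1 + int(bits[4:], 2) / 2**4   # restore the hidden leading 1
    return sign * mantissa * 2**exponent

print(decode_8bit("01011011"))  # 6.75
```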

  18. Representing Numbers -- Example 2
      Represent -10.375 (32-bit machine)

      $10.375_{10} = 10 + 0.25 + 0.125 = 2^{3} + 2^{1} + 2^{-2} + 2^{-3} = 1010.011_2 \Rightarrow 1.010011_2 \times 2^{3}$

      • Sign: 1
      • Mantissa: 010011
      • Exponent (excess-127 format): $3 + 127 = 130_{10} = 10000010_2$

      Result: 1 10000010 01001100000000000000000
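
The hand-derived bit pattern can be cross-checked against what the machine actually stores. A minimal sketch, assuming Python's `struct` module; `float32_fields` is an illustrative helper name.

```python
import struct

def float32_fields(x):
    """Return the (sign, exponent, mantissa) bit strings of x stored as a 32-bit float."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    s = format(bits, "032b")       # all 32 bits, most significant first
    return s[0], s[1:9], s[9:]

print(" ".join(float32_fields(-10.375)))
```

The three fields printed should match the sign, exponent, and mantissa worked out by hand in this example.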

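  19. Floating Point Overflow
      • Floating point representations can overflow, e.g.,

        $1.111111_2 \times 2^{127} + 1.111111_2 \times 2^{127} = 11.111110_2 \times 2^{127} = 1.1111110_2 \times 2^{128} = \infty$

        (the exponent 128 is above the largest allowed exponent, +127)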

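  20. Floating Point Underflow
      • Floating point numbers can also get too small, e.g.,

        $10.010000_2 \times 2^{-126} \div 11.000000_2 \times 2^{0} = 0.110000_2 \times 2^{-126} = 1.100000_2 \times 2^{-127} = 0$

        (the exponent -127 is below the smallest allowed exponent, -126)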
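
Both effects can be reproduced with a sketch that rounds 64-bit results into the 32-bit format. The helper `to_float32` is hypothetical; it assumes that Python's `struct.pack` raises `OverflowError` for values too large for a 32-bit float, which is mapped to infinity here to mimic the hardware result. One caveat: on IEEE 754 hardware the underflow example does not drop straight to 0, because $1.1_2 \times 2^{-127}$ is kept as a denormalized value; only considerably smaller results round to 0.

```python
import math
import struct

def to_float32(x):
    """Round x to the nearest 32-bit float; results too large become infinity."""
    try:
        return struct.unpack(">f", struct.pack(">f", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

# Overflow: 1.111111 (binary) x 2**127, added to itself
a = (2 - 2.0**-6) * 2.0**127
print(to_float32(a + a))             # inf

# Underflow: 10.01 (binary) x 2**-126 divided by 11.0 (binary)
q = (2.25 * 2.0**-126) / 3.0
print(to_float32(q) == q)            # True: kept exactly as a denormalized value
print(to_float32(2.0**-151))         # 0.0: too small even for a denormalized value
```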

  21. “Normalized” Values
      • Condition
        – exp ≠ 000…0 and exp ≠ 111…1
      • Exponent coded as biased value: E = Exp – Bias
        – Exp: unsigned value denoted by exp
        – Bias: bias value
          · Single precision: 127 (Exp: 1…254, E: -126…127)
          · Double precision: 1023 (Exp: 1…2046, E: -1022…1023)
          · In general: $\text{Bias} = 2^{e-1} - 1$, where e is the number of exponent bits
      • Significand coded with implied leading 1: $M = 1.xxx \ldots x_2$
        – xxx…x: bits of frac
        – Minimum when 000…0 (M = 1.0)
        – Maximum when 111…1 (M = 2.0 – ε)
        – Get extra leading bit for “free”

  22. Denormalized Values
      • Condition
        – exp = 000…0
      • Value
        – Exponent value: E = –Bias + 1
        – Significand value: $M = 0.xxx \ldots x_2$ (xxx…x: bits of frac)
      • Cases
        – exp = 000…0, frac = 000…0
          · Represents value 0
          · Note that there are distinct values +0 and –0
        – exp = 000…0, frac ≠ 000…0
          · Numbers very close to 0.0
          · Lose precision as they get smaller
          · “Gradual underflow”

  23. Special Values
      • Condition
        – exp = 111…1
      • Cases
        – exp = 111…1, frac = 000…0
          · Represents value ∞ (infinity)
          · Result of an operation that overflows
          · Both positive and negative
          · E.g., 1.0/0.0 = −1.0/−0.0 = +∞, 1.0/−0.0 = −∞
        – exp = 111…1, frac ≠ 000…0
          · Not-a-Number (NaN)
          · Represents the case when no numeric value can be determined
          · E.g., sqrt(–1), ∞ − ∞
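
The conditions on exp and frac for normalized, denormalized, and special values amount to a small decision procedure. The sketch below applies them to a single-precision bit string (1 sign bit, 8 exp bits, 23 frac bits); `classify` and its return labels are names chosen for this illustration.

```python
def classify(bits):
    """Classify a 32-character bit string by its exp (8 bits) and frac (23 bits) fields."""
    exp, frac = bits[1:9], bits[9:]
    if exp == "0" * 8:
        # exp = 000...0: zero or denormalized, depending on frac
        return "zero" if frac == "0" * 23 else "denormalized"
    if exp == "1" * 8:
        # exp = 111...1: infinity or NaN, depending on frac
        return "infinity" if frac == "0" * 23 else "NaN"
    return "normalized"

print(classify("0" + "01111111" + "0" * 23))        # normalized (this pattern is 1.0)
print(classify("1" + "0" * 8 + "0" * 23))           # zero (this pattern is -0)
print(classify("0" + "1" * 8 + "1" + "0" * 22))     # NaN
```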

  24. Floating Point Representation
      Most standard floating point representations use:
      • 1 bit for the sign (positive or negative)
      • 8 bits for the range (exponent field)
      • 23 bits for the precision (fraction field)

      Field layout: S (1 bit), exponent (8 bits), fraction (23 bits)

      $N = (-1)^{S} \times 1.\text{fraction} \times 2^{\text{exponent} - 127}, \quad 1 \le \text{exponent} \le 254$
      $N = (-1)^{S} \times 0.\text{fraction} \times 2^{-126}, \quad \text{exponent} = 0$

  25. Floating Point Representation
      Example: How is the number $-6\tfrac{5}{8}$ represented in floating point?

      $-6\tfrac{5}{8} = -\left(4 + 2 + \tfrac{4}{8} + \tfrac{1}{8}\right) = -\left(4 + 2 + \tfrac{1}{2} + \tfrac{1}{8}\right)$
      $= -\left(1 \times 2^{2} + 1 \times 2^{1} + 0 \times 2^{0} + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}\right)$
      $= -110.101_2 = -1.10101_2 \times 2^{2}$

      Thus the exponent is given by: $\text{exponent} - 127 = 2 \Rightarrow \text{exponent} = 129$

      ⇒ 1 10000001 10101000000000000000000

  26. Floating Point Representation (example)
      What is the decimal value of the following floating point number?

      00111101100000000000000000000000

      • exponent = 64+32+16+8+2+1 = (128-8)+3 = 120+3 = 123
      • $N = (-1)^{0} \times 1.0 \times 2^{123-127} = 1.0 \times 2^{-4} = \tfrac{1}{16}$

  27. Floating Point Representation (example)
      What is the decimal value of the following floating point number?

      01000001100101000000000000000000

      • exponent = 128+2+1 = 131
      • $N = (-1)^{0} \times 1.00101_2 \times 2^{131-127} = 1.00101_2 \times 2^{4} = 10010.1_2$
      • $N = 2^{4} + 2^{1} + 2^{-1} = 16 + 2 + \tfrac{1}{2} = 18.5$

  28. Floating Point Representation (example)
      What is the decimal value of the following floating point number?

      11000001000101000000000000000000

      • exponent = 128+2 = 130
      • $N = (-1)^{1} \times 1.00101_2 \times 2^{130-127} = -1.00101_2 \times 2^{3} = -1001.01_2$
      • $N = -\left(2^{3} + 2^{0} + 2^{-2}\right) = -\left(8 + 1 + \tfrac{1}{4}\right) = -9.25$
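
The three decoding examples can be replayed with a direct transcription of the two formulas for N. This is a sketch for checking the arithmetic, with `decode_float32` as an illustrative name; the bit strings are assembled from their sign, exponent, and fraction fields so the field boundaries stay visible.

```python
def decode_float32(bits):
    """Evaluate N for a 32-character bit string: 1 sign, 8 exponent, 23 fraction bits."""
    sign = (-1) ** int(bits[0])
    exponent = int(bits[1:9], 2)
    fraction = int(bits[9:], 2) / 2**23              # the value of 0.fraction
    if 1 <= exponent <= 254:
        return sign * (1 + fraction) * 2.0 ** (exponent - 127)
    if exponent == 0:
        return sign * fraction * 2.0 ** -126
    raise ValueError("exponent 255 encodes infinity or NaN")

examples = [
    "0" + "01111011" + "0" * 23,             # exponent 123
    "0" + "10000011" + "00101" + "0" * 18,   # exponent 131
    "1" + "10000010" + "00101" + "0" * 18,   # exponent 130
]
for bits in examples:
    print(bits, decode_float32(bits))
```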

  29. Floating Point
      What is the largest number that can be represented in 32 bits floating point using the IEEE 754 format above?

      01111111011111111111111111111111

      • exponent = 254
      • $\text{fraction} = 1 \times 2^{-1} + 1 \times 2^{-2} + \ldots + 1 \times 2^{-22} + 1 \times 2^{-23}$
      • $\text{fraction} = 1 \times 2^{0} - 1 \times 2^{-23} = 1 - \tfrac{1}{2^{23}} = 1 - \tfrac{1}{8 \times 1024 \times 1024} = 0.99999988079$
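
The fraction sum can be checked numerically and carried one step further. The sketch below assumes the normalized formula $N = (-1)^{S} \times 1.\text{fraction} \times 2^{\text{exponent} - 127}$ applies (exponent 254 is within 1 to 254) and compares the result with the value Python's `struct` module reads back from the same bit pattern; the variable names are illustrative.

```python
import struct

# Fraction field of 23 one-bits: 2**-1 + 2**-2 + ... + 2**-23
fraction = sum(2.0 ** -k for k in range(1, 24))
print(f"{fraction:.11f}")

# Apply the normalized formula with S = 0 and exponent = 254
largest = (1 + fraction) * 2.0 ** (254 - 127)

# Read the same bit pattern back as a 32-bit float
bits = "0" + "11111110" + "1" * 23
stored = struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

print(largest, stored == largest)
```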
