Real Number Representation
Topics
• Terminology
• IEEE standard for floating-point representation
• Floating point arithmetic
• Limitations
Terminology
• All digits in a number following any leading zeros are significant digits:
    12.345    -0.12345    0.00012345
Terminology (cont)
• The scientific notation for real numbers is:
    $\text{mantissa} \times \text{base}^{\text{exponent}}$
• In C, the expression 12.456e-2 means $12.456 \times 10^{-2}$
Terminology (cont)
• The mantissa is always normalized between 1 and the base (i.e., exactly one significant digit before the point)

    Unnormalized                            Normalized
    $2997.9 \times 10^{5}$                  $2.9979 \times 10^{8}$
    $\mathrm{B1.39FC} \times 16^{11}$       $\mathrm{B.139FC} \times 16^{12}$
    $0.010110110101 \times 2^{-1}$          $1.0110110101 \times 2^{-3}$
Terminology (cont)
• The precision of a number is how many digits (or bits) we use to represent it
• For example:
    3
    3.14
    3.1415926
    3.1415926535897932384626433832795028
Representing Numbers
• A real number n is represented by a floating-point approximation n*
• The computer uses 32 bits (or more) to store each approximation
• It needs to store
  – the mantissa
  – the sign of the mantissa
  – the exponent (with its sign)
Representing Numbers (cont)
• The standard way to allocate 32 bits (specified by IEEE Standard 754) is:
  – 23 bits for the mantissa
  – 1 bit for the mantissa's sign
  – 8 bits for the exponent

    bit 31    bits 30–23    bits 22–0
    sign      exponent      mantissa
Representing the Mantissa
• The mantissa has to be in the range 1 ≤ mantissa < base
• Therefore
  – If we use base 2, the digit before the point must be a 1
  – So we don't have to worry about storing it
• We get 24 bits of precision using 23 bits
Representing the Mantissa (cont)
• 24 bits of precision are equivalent to a little over 7 decimal digits:
    $\dfrac{24}{\log_2 10} \approx 7.2$
Representing the Mantissa (cont)
• Suppose we want to represent π: 3.1415926535897932384626433832795...
• That means that we can only represent it as:
    3.141592 (if we truncate)
    3.141593 (if we round)
Representing the Exponent
• The exponent is represented as excess-127. E.g.,

    Actual exponent    Stored value
    -127               00000000
    -126               00000001
    ...                ...
    0                  01111111
    +1                 10000000
    ...                ...
    i                  $(i + 127)_2$
    ...                ...
    +128               11111111
Representing the Exponent (cont)
• The IEEE standard restricts exponents to the range: –126 ≤ exponent ≤ +127
• The exponents –127 and +128 (stored as 00000000 and 11111111) have special meanings:
  – If exponent = –127, the number represented is 0
  – If exponent = +128, the number represented is ∞
Representing Numbers -- Example 1
What is 01011011 (8-bit machine)?

    0       101    1011
    sign    exp    mantissa

• Mantissa: 1.1011
• Exponent (excess-3 format): 5 - 3 = 2

$1.1011_2 \times 2^{2} \Rightarrow 110.11_2$
$110.11_2 = 2^{2} + 2^{1} + 2^{-1} + 2^{-2} = 4 + 2 + 0.5 + 0.25 = 6.75$
Representing Numbers -- Example 2
Represent -10.375 (32-bit machine)

$10.375_{10} = 10 + 0.25 + 0.125 = 2^{3} + 2^{1} + 2^{-2} + 2^{-3} = 1010.011_2 \Rightarrow 1.010011_2 \times 2^{3}$

• Sign: 1
• Mantissa: 010011
• Exponent (excess-127 format): $3 + 127 = 130_{10} = 10000010_2$

    1 10000010 01001100000000000000000
Floating Point Overflow
• Floating point representations can overflow, e.g.,

$1.111111_2 \times 2^{127} + 1.111111_2 \times 2^{127} = 11.111110_2 \times 2^{127} = 1.1111110_2 \times 2^{128} = \infty$
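Floating Point Underflow
• Floating point numbers can also get too small, e.g.,

$(10.010000_2 \times 2^{-126}) \div (11.000000_2 \times 2^{0}) = 0.110000_2 \times 2^{-126} = 1.100000_2 \times 2^{-127} = 0$

• The exponent -127 is below the minimum of -126, so the result is stored as 0 when denormalized values are not used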
“Normalized”
• Condition
  – exp ≠ 000…0 and exp ≠ 111…1
• Exponent coded as biased value: E = Exp – Bias
  – Exp: unsigned value denoted by exp
  – Bias: bias value
    · Single precision: 127 (Exp: 1…254, E: -126…127)
    · Double precision: 1023 (Exp: 1…2046, E: -1022…1023)
    · In general: $\text{Bias} = 2^{e-1} - 1$, where e is the number of exponent bits
• Significand coded with implied leading 1: $M = 1.xxx{\ldots}x_2$
  – xxx…x: bits of frac
  – Minimum when frac = 000…0 (M = 1.0)
  – Maximum when frac = 111…1 (M = 2.0 – ε)
  – Get extra leading bit for “free”
Denormalized Values
• Condition
  – exp = 000…0
• Value
  – Exponent value E = –Bias + 1
  – Significand value $M = 0.xxx{\ldots}x_2$
    · xxx…x: bits of frac
• Cases
  – exp = 000…0, frac = 000…0
    · Represents value 0
    · Note that there are distinct values +0 and –0
  – exp = 000…0, frac ≠ 000…0
    · Numbers very close to 0.0
    · Lose precision as they get smaller
    · “Gradual underflow”
Special Values
• Condition
  – exp = 111…1
• Cases
  – exp = 111…1, frac = 000…0
    · Represents value ∞ (infinity)
    · Operation that overflows
    · Both positive and negative
    · E.g., 1.0/0.0 = −1.0/−0.0 = +∞, 1.0/−0.0 = −∞
  – exp = 111…1, frac ≠ 000…0
    · Not-a-Number (NaN)
    · Represents case when no numeric value can be determined
    · E.g., sqrt(–1), ∞ − ∞
Floating Point Representation
Most standard floating point representations use:
• 1 bit for the sign (positive or negative)
• 8 bits for the range (exponent field)
• 23 bits for the precision (fraction field)

    1    8           23
    S    exponent    fraction

$N = (-1)^{S} \times 1.\text{fraction} \times 2^{\text{exponent} - 127}, \quad 1 \le \text{exponent} \le 254$
$N = (-1)^{S} \times 0.\text{fraction} \times 2^{-126}, \quad \text{exponent} = 0$
Floating Point Representation
Example: How is the number $-6\tfrac{5}{8}$ represented in floating point?

$-6\tfrac{5}{8} = -\left(4 + 2 + \tfrac{4}{8} + \tfrac{1}{8}\right) = -\left(4 + 2 + \tfrac{1}{2} + \tfrac{1}{8}\right)$
$= -\left(1 \times 2^{2} + 1 \times 2^{1} + 0 \times 2^{0} + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}\right)$
$= -110.101_2 = -1.10101_2 \times 2^{2}$

Thus the exponent is given by:
$\text{exponent} - 127 = 2 \;\Rightarrow\; \text{exponent} = 129$

    ⇒ 1 10000001 10101000000000000000000
Floating Point Representation (example)
What is the decimal value of the following floating point number?

    00111101100000000000000000000000
    = 0 01111011 00000000000000000000000   (S, exponent, fraction)

exponent = 64 + 32 + 16 + 8 + 2 + 1 = (128 - 8) + 3 = 120 + 3 = 123

$N = (-1)^{0} \times 1.0 \times 2^{123 - 127} = 1.0 \times 2^{-4} = \tfrac{1}{16}$
Floating Point Representation (example)
What is the decimal value of the following floating point number?

    01000001100101000000000000000000
    = 0 10000011 00101000000000000000000   (S, exponent, fraction)

exponent = 128 + 2 + 1 = 131

$N = (-1)^{0} \times 1.00101_2 \times 2^{131 - 127} = 1.00101_2 \times 2^{4} = 10010.1_2$
$N = 2^{4} + 2^{1} + 2^{-1} = 16 + 2 + \tfrac{1}{2} = 18.5$
Floating Point Representation (example)
What is the decimal value of the following floating point number?

    11000001000101000000000000000000
    = 1 10000010 00101000000000000000000   (S, exponent, fraction)

exponent = 128 + 2 = 130

$N = (-1)^{1} \times 1.00101_2 \times 2^{130 - 127} = -1.00101_2 \times 2^{3} = -1001.01_2$
$N = -\left(2^{3} + 2^{0} + 2^{-2}\right) = -\left(8 + 1 + \tfrac{1}{4}\right) = -9.25$
Floating Point
What is the largest number that can be represented in 32-bit floating point using the IEEE 754 format above?

    01111111011111111111111111111111
    = 0 11111110 11111111111111111111111   (S, exponent, fraction)

exponent = 254

$\text{fraction} = 1 \times 2^{-1} + 1 \times 2^{-2} + \cdots + 1 \times 2^{-22} + 1 \times 2^{-23}$
$\text{fraction} = 1 \times 2^{0} - 1 \times 2^{-23} = 1 - \dfrac{1}{2^{23}} = 1 - \dfrac{1}{8 \times 1024 \times 1024} = 0.99999988079$