2/10/2020 Today: Floating Point Background: Fractional binary numbers IEEE floating point standard: Definition Floating Point Example and properties Rounding, addition, multiplication CSci 2021: Machine Architecture and Organization Floating point in C February 10th, 2020 Summary Your instructor: Stephen McCamant Based on slides originally by: Randy Bryant, Dave O’Hallaron Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 2 Fractional Binary Numbers Fractional binary numbers 2 i What is 1011.101 2 ? 2 i-1 4 • • • 2 1 b i b i-1 ••• b 2 b 1 b 0 b -1 b -2 b -3 ••• b -j 1/2 1/4 • • • 1/8 Representation 2 -j Bits to right of “binary point” represent fractional powers of 2 Represents rational number: 3 4 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Fractional Binary Numbers: Examples Representable Numbers Value Representation Limitation #1 5 3/4 101.11 2 Can only exactly represent numbers of the form x/2 k 2 7/8 010.111 2 Other rational numbers have repeating bit representations 1 7/16 001.0111 2 Value Representation 0.0101010101[01]… 2 1/3 1/5 0.001100110011[0011]… 2 Observations 0.0001100110011[0011]… 2 Divide by 2 by shifting right (unsigned) 1/10 Multiply by 2 by shifting left Numbers of form 0.111111… 2 are just below 1.0 What if the number of bits is limited? 1/2 + 1/4 + 1/8 + … + 1/2 i + … ➙ 1.0 “Fixed point”: just one setting of binary point within the w bits Limited range of numbers (bad for very small or very large values) 5 6 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 1
2/10/2020 Today: Floating Point IEEE Floating Point Background: Fractional binary numbers IEEE Standard 754 Established in 1985 as uniform standard for floating point arithmetic IEEE floating point standard: Definition Before that, many idiosyncratic formats Example and properties Supported by all major CPUs Rounding, addition, multiplication Floating point in C Driven by numerical concerns Summary Nice standards for rounding, overflow, underflow A lot of work to make fast in hardware Numerical analysts predominated over hardware designers in defining standard Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 7 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 8 Floating Point Representation Precision options Numerical Form: Single precision: 32 bits ( – 1) s M 2 E s exp frac Sign bit s determines whether number is negative or positive Significand M normally a fractional value in range [1.0,2.0). 1 8-bits 23-bits Exponent E weights value by power of two Double precision: 64 bits Encoding s exp frac MSB s is sign bit s 1 11-bits 52-bits exp field encodes E (but is not equal to E) Extended precision: 80 bits (older Intel only) frac field encodes M (but is not equal to M) s exp frac s exp frac 1 15-bits 63 or 64-bits 9 10 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition v = ( – 1) s M 2 E v = ( – 1) s M 2 E “Normalized” (Normal) Values Normalized Encoding Example E = Exp – Bias Value: float F = 15213.0; When: exp ≠ 000…0 and exp ≠ 111…1 15213 10 = 11101101101101 2 = 1.1101101101101 2 x 2 13 Exponent coded as a biased value: E = Exp – Bias Significand Exp : unsigned value of exp field M = 1.1101101101101 2 Bias = 2 k-1 - 1, where k is number of exponent bits frac = 11011011011010000000000 2 Single precision: 127 (Exp: 1…254, E: - 126…127) Double precision: 1023 (Exp: 1…2046, E: - 1022…1023) Exponent E = 13 Bias = 127 Significand coded with implied leading 1: M = 1.xxx…x 2 10001100 2 Exp = 140 = xxx…x : bits of frac field Result: Minimum when frac=000 …0 (M = 1.0) Maximum when frac=111 …1 (M = 2.0 – ε) 0 10001100 11011011011010000000000 Get extra leading bit for “free” s exp frac 11 12 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 2
2/10/2020 v = ( – 1) s M 2 E Denormalized Values Special Values E = 1 – Bias Condition: exp = 111…1 Condition: exp = 000…0 Exponent value: E = 1 – Bias (instead of E = 0 – Bias ) Case: exp = 111…1 , frac = 000…0 Represents value (infinity) Significand coded with implied leading 0: M = 0.xxx…x 2 xxx… x : bits of frac Operation that overflows Both positive and negative Cases E.g., 1.0/0.0 = −1.0/−0.0 = + , 1.0/−0.0 = − exp = 000…0 , frac = 000…0 Represents zero value Case: exp = 111…1 , frac ≠ 000…0 Note distinct values: +0 and – 0 (why?) exp = 000…0 , frac ≠ 000…0 Not-a-Number (NaN) Numbers closest to 0.0 Represents case when no numeric value can be determined Equispaced E.g., sqrt( – 1), − , 0 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 13 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 14 Today: Floating Point Visualization: Floating Point Encodings Background: Fractional binary numbers IEEE floating point standard: Definition Example and properties − + − Normalized − Denorm +Denorm +Normalized Rounding, addition, multiplication NaN Floating point in C NaN 0 +0 Summary 15 16 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Dynamic Range (Positive Only) v = ( – 1) s M 2 E Tiny Floating Point Example n: E = Exp – Bias s exp frac E Value d: E = 1 – Bias 0 0000 000 -6 0 s exp frac 0 0000 001 -6 1/8*1/64 = 1/512 closest to zero 0 0000 010 -6 2/8*1/64 = 2/512 Denormalized 1 4-bits 3-bits … numbers 0 0000 110 -6 6/8*1/64 = 6/512 0 0000 111 -6 7/8*1/64 = 7/512 8-bit Floating Point Representation largest denorm 0 0001 000 -6 8/8*1/64 = 8/512 the sign bit is in the most significant bit smallest norm 0 0001 001 -6 9/8*1/64 = 9/512 the next four bits are the exponent, with a bias of 7 … 0 0110 110 -1 14/8*1/2 = 14/16 the last three bits are the frac 0 0110 111 -1 15/8*1/2 = 15/16 closest to 1 below Normalized 0 0111 000 0 8/8*1 = 1 numbers 0 0111 001 0 9/8*1 = 9/8 closest to 1 above Same general form as IEEE Format 0 0111 010 0 10/8*1 = 10/8 normalized, denormalized … 0 1110 110 7 14/8*128 = 224 representation of 0, NaN, infinity 0 1110 111 7 15/8*128 = 240 largest norm 0 1111 000 n/a inf 17 18 Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 3
Recommend
More recommend