Machine numbers: how are floating-point numbers stored?

Floating-point number representation

What do we need to store when representing floating-point numbers in a computer?

$x = \pm 1.f \times 2^m$

Initially, different floating-point representations were used in computers, which led to inconsistent program behavior across machines. Around the 1980s, computer manufacturers started adopting a standard representation for floating-point numbers: the IEEE (Institute of Electrical and Electronics Engineers) 754 Standard.

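Floating-point number representation

Numerical form: $x = \pm 1.f \times 2^m$, with $m \in [L, U]$ (for example, $m \in [-4, 4]$).

Representation in memory: x = | s | c | f |
- s: sign
- c: exponent, stored as an unsigned integer, $c = m + \text{shift}$ (the exponent $m$ itself is signed)
- f: significand (fraction bits)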
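As a small illustration of the numerical form $x = \pm 1.f \times 2^m$, the sketch below recovers the sign, the significand $1.f$ and the exponent $m$ of a Python float. The helper name `normalized_form` is hypothetical (it is not from the slides), and it relies only on the standard-library `math.frexp`.

```python
import math

def normalized_form(x):
    """Return (sign, significand, m) with |x| = significand * 2**m and 1 <= significand < 2."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("only finite nonzero values have a normalized form")
    sign = -1 if x < 0 else 1
    # math.frexp gives |x| = frac * 2**exp with 0.5 <= frac < 1,
    # so move the binary point one place to obtain a leading 1.
    frac, exp = math.frexp(abs(x))
    return sign, frac * 2, exp - 1

print(normalized_form(-67.125))  # (-1, 1.048828125, 6)
```

Here 1.048828125 is the decimal value of the binary significand 1.000011001, and the exponent is 6.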

Precisions

Finite representation: not all numbers can be represented exactly!

$x = \pm 1.f \times 2^{c - \text{shift}}$

IEEE-754 Single precision (32 bits): x = | s (1 bit) | c (8 bits) | f (23 bits) |
IEEE-754 Double precision (64 bits): x = | s (1 bit) | c (11 bits) | f (52 bits) |

IEEE-754 Single Precision (32-bit)

$x = (-1)^s \, 1.f \times 2^m$, with $c = m + 127$

x = | s: sign (1-bit) | c: exponent (8-bit) | f: significand (23-bit) |

Two exponent patterns are reserved for special cases:
- $c = (00000000)_2 = 0$
- $c = (11111111)_2 = 255$

Therefore $1 \le c \le 254$. With shift $= 127$, $1 \le m + 127 \le 254$, so $-126 \le m \le 127$, i.e. $m \in [-126, 127]$.

IEEE-754 Single Precision (32-bit)

$x = (-1)^s \, 1.f \times 2^m$

$s = 0$: positive sign, $s = 1$: negative sign.

Example: Represent the number $x = -67.125$ using the IEEE Single-Precision Standard.

$67.125 = (1000011.001)_2 = (1.000011001)_2 \times 2^6$

- Sign: $s = 1$ (negative)
- Exponent: $c = m + 127 = 6 + 127 = 133 = (10000101)_2$ (8 bits)
- Significand: $f = 00001100100000000000000$ (23 bits)

Stored pattern: 1 | 10000101 | 00001100100000000000000
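The worked example can be checked mechanically. The sketch below is only an illustration: the helper name `encode_single` is hypothetical, and it uses the standard-library `struct` module to obtain the 32-bit pattern that the machine actually stores for $-67.125$.

```python
import struct

def encode_single(x):
    """Return the 32-bit IEEE-754 single-precision pattern of x as a bit string."""
    # Pack as a big-endian float32, then reinterpret the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return format(bits, "032b")

pattern = encode_single(-67.125)
s, c, f = pattern[0], pattern[1:9], pattern[9:]
print(s, c, f)             # 1 10000101 00001100100000000000000
print(int(c, 2) - 127)     # 6, the exponent m
```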

IEEE-754 Single Precision (32-bit)

$x = (-1)^s \, 1.f \times 2^m$, with $c = m + 127$ and precision $p = 24$

• Machine epsilon ($\epsilon_m$): defined as the distance (gap) between 1 and the next larger floating-point number.
  $1 = 1.00\ldots00 \times 2^0$ and the next number is $1.00\ldots01 \times 2^0$ (23 fraction bits), so
  $\epsilon_m = 2^{-23} \approx 1.2 \times 10^{-7}$
• Smallest positive normalized FP number:
  $\text{UFL} = 2^L = 2^{-126} \approx 1.2 \times 10^{-38}$
• Largest positive normalized FP number:
  $\text{OFL} = 2^{U+1}(1 - 2^{-p}) = 2^{128}(1 - 2^{-24}) \approx 3.4 \times 10^{38}$
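A quick numerical check of these three single-precision constants, as a sketch: Python floats are doubles, so the hypothetical helper `to_single` rounds a value to single precision by a round trip through `struct`. The variable names are illustrative only.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

eps_single = 2.0 ** -23                       # gap between 1 and the next float32
ufl_single = 2.0 ** -126                      # smallest positive normalized float32
ofl_single = 2.0 ** 128 * (1 - 2.0 ** -24)    # largest finite float32

# 1 + eps survives rounding to single precision; a smaller increment is lost.
print(to_single(1 + eps_single) > 1.0)        # True
print(to_single(1 + eps_single / 4) == 1.0)   # True
print(f"{eps_single:.3e} {ufl_single:.3e} {ofl_single:.3e}")
```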

IEEE-754 Double Precision (64-bit)

$x = (-1)^s \, 1.f \times 2^m$, with $c = m + 1023$ and precision $p = 53$

x = | s: sign (1-bit) | c: exponent (11-bit) | f: significand (52-bit) |

$s = 0$: positive sign, $s = 1$: negative sign.

Reserved exponent values for special cases:
- $c = (00000000000)_2 = 0$
- $c = (11111111111)_2 = 2047$

Therefore $1 \le c \le 2046$. With shift $= 1023$, $1 \le m + 1023 \le 2046$, so $-1022 \le m \le 1023$, i.e. $m \in [-1022, 1023]$.
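The exponent ranges for both formats follow the same arithmetic, which can be sketched as below. The function name `exponent_range` is hypothetical, and the formula shift $= 2^{k-1} - 1$ for a $k$-bit exponent field is stated here as a generalization that matches the two shifts given on the slides (127 and 1023).

```python
def exponent_range(exponent_bits):
    """Return (shift, L, U) for a format whose exponent field has the given width."""
    c_max = 2 ** exponent_bits - 1         # all ones: reserved for infinity and NaN
    shift = 2 ** (exponent_bits - 1) - 1   # 127 for 8 bits, 1023 for 11 bits
    lower = 1 - shift                      # smallest usable stored exponent is c = 1
    upper = (c_max - 1) - shift            # largest usable stored exponent is c_max - 1
    return shift, lower, upper

print(exponent_range(8))    # (127, -126, 127)
print(exponent_range(11))   # (1023, -1022, 1023)
```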

IEEE-754 Double Precision (64-bit)

$x = (-1)^s \, 1.f \times 2^m$, with $c = m + 1023$, $m \in [-1022, 1023]$ and $p = 53$

• Machine epsilon ($\epsilon_m$): defined as the distance (gap) between 1 and the next larger floating-point number.
  $1 =$ 0 | 0111…111 | 000000000000…000000000
  $1 + \epsilon_m =$ 0 | 0111…111 | 000000000000…000000001
  $\epsilon_m = 2^{-52} \approx 2.2 \times 10^{-16}$
• Smallest positive normalized FP number:
  $\text{UFL} = 2^L = 2^{-1022} \approx 2.2 \times 10^{-308}$
• Largest positive normalized FP number:
  $\text{OFL} = 2^{U+1}(1 - 2^{-p}) = 2^{1024}(1 - 2^{-53}) \approx 1.8 \times 10^{308}$
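Python floats are IEEE-754 doubles on common platforms, so these constants can be compared against the standard-library `sys.float_info`. This is a sketch with illustrative variable names. Note that $2^{1024}$ itself overflows a double, so OFL is written in the equivalent form $2^{1023}(2 - 2^{-52})$.

```python
import sys

eps_double = 2.0 ** -52
ufl_double = 2.0 ** -1022
# Same value as 2**1024 * (1 - 2**-53), rearranged so no intermediate overflows.
ofl_double = 2.0 ** 1023 * (2 - 2.0 ** -52)

print(eps_double == sys.float_info.epsilon)   # True
print(ufl_double == sys.float_info.min)       # True
print(ofl_double == sys.float_info.max)       # True
print(1.0 + eps_double / 4 == 1.0)            # True: increments well below the gap are lost
```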

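Normalized floating point number scale (single precision)

[Figure: number line from $-\infty$ to $+\infty$ marking $-\text{OFL} \approx -10^{38}$, $-\text{UFL} \approx -10^{-38}$, $0$, $\text{UFL} \approx 10^{-38}$ and $\text{OFL} \approx 10^{38}$.]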

Special Values

$x = (-1)^s \, 1.f \times 2^m$, stored as x = | s | c | f |

The reserved exponent patterns $c = (00\ldots0)_2$ and $c = (11\ldots1)_2$ (8 bits in single precision, 11 bits in double precision) encode special values; the fraction $f$ has 23 or 52 bits.

1) Zero: x = | s | 000…000 | 0000……0000 |
2) Infinity: $+\infty$ ($s = 0$) and $-\infty$ ($s = 1$): x = | s | 111…111 | 0000……0000 |
3) NaN (results from operations with undefined results): x = | s | 111…111 | anything ≠ 00…00 |
4) Subnormal numbers: x = | s | 000…000 | anything ≠ 00…00 |
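The special bit patterns can be assembled by hand and reinterpreted as single-precision numbers. The sketch below uses a hypothetical helper `from_bits32` built on the standard-library `struct` module; the fraction chosen for the NaN is just one of many valid nonzero patterns.

```python
import math
import struct

def from_bits32(s, c, f):
    """Assemble a sign bit, an 8-bit exponent field and a 23-bit fraction into a float."""
    bits = (s << 31) | (c << 23) | f
    return struct.unpack(">f", struct.pack(">I", bits))[0]

pos_zero = from_bits32(0, 0x00, 0)
neg_zero = from_bits32(1, 0x00, 0)
pos_inf = from_bits32(0, 0xFF, 0)
neg_inf = from_bits32(1, 0xFF, 0)
nan = from_bits32(0, 0xFF, 1 << 22)   # all-ones exponent with a nonzero fraction

print(pos_zero, neg_zero, pos_inf, neg_inf, nan)
print(nan == nan)   # False: NaN compares unequal to everything, itself included
```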

Normalized floating point number scale (single precision)

[Figure: the same number line from $-\infty$ to $+\infty$, annotated near zero to contrast the normalized form $1.f \times 2^m$ with the relaxed form $0.f \times 2^m$.]

Subnormal (or denormalized) numbers

• There is a noticeable gap around zero, present in any floating-point system, due to normalization:
  - The smallest possible significand is 1.00
  - The smallest possible exponent is $L$
• Relax the requirement of normalization and allow the leading digit to be zero, only when the exponent is at its minimum ($m = L$):

$x = (-1)^s \, 0.f \times 2^L$

In memory, subnormal numbers are stored with $c = 00\ldots0$; for normalized numbers, $m = c - \text{shift}$.

Subnormal (or denormalized) numbers

IEEE-754 Single precision (32 bits):
$c = (00000000)_2 = 0$, so $x = (-1)^s \, 0.f \times 2^{-126}$
Exponent set to $m = -126$
Smallest positive subnormal FP number: $0.00\ldots01 \times 2^{-126} = 2^{-23} \times 2^{-126} \approx 1.4 \times 10^{-45}$

IEEE-754 Double precision (64 bits):
$c = (00000000000)_2 = 0$, so $x = (-1)^s \, 0.f \times 2^{-1022}$
Exponent set to $m = -1022$
Smallest positive subnormal FP number: $0.00\ldots01 \times 2^{-1022} = 2^{-52} \times 2^{-1022} \approx 4.9 \times 10^{-324}$
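Both smallest subnormal numbers correspond to the bit pattern with every bit zero except the last fraction bit. A minimal sketch, using `struct` to reinterpret the integer 1 as a float in each format (variable names are illustrative):

```python
import math
import struct

# c = 0 and f = 00...01 in each format, i.e. the integer 1 read as a float.
min_sub_single = struct.unpack(">f", struct.pack(">I", 1))[0]
min_sub_double = struct.unpack(">d", struct.pack(">Q", 1))[0]

print(min_sub_single == math.ldexp(1.0, -23 - 126))    # True: 2**-149
print(min_sub_double == math.ldexp(1.0, -52 - 1022))   # True: 2**-1074
print(min_sub_double / 2)                              # 0.0: nothing smaller is representable
```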

Normalized floating point number scale (single precision)

[Figure: number line from $-\infty$ to $+\infty$ with subnormal numbers filling the gap around zero (gradual underflow, with reduced precision).]

Subnormal (or denormalized) numbers

Another special case: x = | s | c = 000…000 | f |

$x = (-1)^s \, 0.f \times 2^L$

Note that this is a special case, and the exponent $m$ is not evaluated as $m = c - \text{shift} = -\text{shift}$. Instead, the exponent is set to the lower bound, $m = L$.

• PROS: More gradual underflow to zero
• CONS:
  - Computations with subnormal numbers are often slow
  - Loss of precision
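Both the benefit and the cost can be observed in double precision. The sketch below assumes the default behavior of CPython on IEEE-754 hardware (subnormals enabled, round-to-nearest-even); the variable names are illustrative.

```python
import math
import sys

ufl = sys.float_info.min   # 2**-1022, smallest positive normalized double

# Gradual underflow: halving UFL gives a subnormal number instead of zero.
x = ufl / 2
print(0 < x < ufl)         # True

# Loss of precision: near the bottom of the subnormal range only a few
# significant bits remain, so halving and doubling is no longer exact.
tiny = math.ldexp(3.0, -1074)    # binary 11 in the last two fraction bits
halved_back = (tiny / 2) * 2     # 1.5 units rounds to 2 units, then doubles to 4
print(halved_back == tiny)       # False

normal = 3.0
print((normal / 2) * 2 == normal)   # True for normalized numbers
```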

IEEE-754 Double Precision

Summary for Single Precision

$x = (-1)^s \, 1.f \times 2^m$, with $m = c - 127$

Stored binary exponent (c)   Significand fraction (f)   Value
00000000                     0000…0000                  zero
00000000                     any f ≠ 0                  $(-1)^s \, 0.f \times 2^{-126}$
00000001                     any f                      $(-1)^s \, 1.f \times 2^{-126}$
⋮                            ⋮                          ⋮
11111110                     any f                      $(-1)^s \, 1.f \times 2^{127}$
11111111                     any f ≠ 0                  NaN
11111111                     0000…0000                  infinity
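The summary table translates almost line by line into a decoder. The following is a minimal sketch, not a reference implementation: `decode_single` is a hypothetical name, and the result is cross-checked against the standard-library `struct` module.

```python
import math
import struct

def decode_single(bits):
    """Decode a 32-bit pattern following the single-precision summary table."""
    s = bits >> 31
    c = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if c == 0xFF:
        # All-ones exponent: NaN if the fraction is nonzero, otherwise infinity.
        return math.nan if f else sign * math.inf
    if c == 0:
        # Zero or subnormal: 0.f * 2**-126, with f read as an integer scaled by 2**-23.
        return sign * math.ldexp(f, -23 - 126)
    # Normalized: 1.f * 2**(c - 127), with the implicit leading 1 restored.
    return sign * math.ldexp((1 << 23) | f, c - 127 - 23)

pattern = 0xC2864000   # 1 | 10000101 | 00001100100000000000000
print(decode_single(pattern))                                    # -67.125
print(struct.unpack(">f", struct.pack(">I", pattern))[0])        # -67.125
```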
