Floating point representation
(Unsigned) Fixed-point representation
Numbers are stored with a fixed number of bits for the integer part and a fixed number of bits for the fractional part. Suppose we have 8 bits to store a real number, where 5 bits store the integer part and 3 bits store the fractional part:

  1 0 1 1 1 . 0 1 1
with bit weights
  2^4 2^3 2^2 2^1 2^0 . 2^-1 2^-2 2^-3

Smallest number: 00000.001_2 = 0.125
Largest number: 11111.111_2 = 31.875
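The decoding above can be sketched in Python (the helper name `fixed_to_decimal` is illustrative, not standard library code):

```python
def fixed_to_decimal(bits, int_bits, frac_bits):
    """Decode an unsigned fixed-point bit string (e.g. '10111011')
    with int_bits integer bits followed by frac_bits fractional bits."""
    assert len(bits) == int_bits + frac_bits
    value = int(bits, 2)             # read all bits as one integer
    return value / (2 ** frac_bits)  # shift the binary point left

# 10111.011_2 from the slide: 16 + 4 + 2 + 1 + 0.25 + 0.125 = 23.375
print(fixed_to_decimal("10111011", 5, 3))   # 23.375
print(fixed_to_decimal("00000001", 5, 3))   # smallest: 0.125
print(fixed_to_decimal("11111111", 5, 3))   # largest: 31.875
```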
(Unsigned) Fixed-point representation
Suppose we have 64 bits to store a real number, where 32 bits store the integer part and 32 bits store the fractional part:

  a_31 ... a_2 a_1 a_0 . b_1 b_2 b_3 ... b_32

x = Σ_{k=0}^{31} a_k × 2^k + Σ_{k=1}^{32} b_k × 2^{-k}
  = a_31 × 2^31 + a_30 × 2^30 + ... + a_0 × 2^0 + b_1 × 2^{-1} + b_2 × 2^{-2} + ... + b_32 × 2^{-32}

Smallest number: a_k = 0 for all k, b_1, b_2, ..., b_31 = 0 and b_32 = 1 → 2^{-32} ≈ 10^{-10}
Largest number: a_k = 1 for all k and b_k = 1 for all k → 2^31 + ... + 2^0 + 2^{-1} + ... + 2^{-32} ≈ 10^9
(Unsigned) Fixed-point representation
[Number-line figure: with 32 integer bits and 32 fractional bits, the representable positive numbers span roughly 10^{-10} (smallest) to 10^9 (largest) around 0.]
(Unsigned) Fixed-point representation
Range: the difference between the largest and smallest numbers possible. More bits for the integer part → increase range.
Precision: the smallest possible difference between any two numbers. More bits for the fractional part → increase precision.

  b_2 b_1 b_0 . b_-1 b_-2 b_-3   OR   b_1 b_0 . b_-1 b_-2 b_-3 b_-4

Wherever we put the binary point, there is a trade-off between the amount of range and precision. It can be hard to decide how much you need of each!
Fix: let the binary point "float".
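The trade-off can be made concrete: for a fixed total of 8 bits, each split of integer vs. fractional bits gives a different (precision, range) pair. A small Python sketch (the helper name `fixed_range` is illustrative):

```python
def fixed_range(int_bits, frac_bits):
    """Precision (smallest gap) and largest value of an unsigned
    fixed-point format with the given bit split."""
    step = 2.0 ** -frac_bits          # precision: gap between neighbors
    largest = 2.0 ** int_bits - step  # range: all bits set to 1
    return step, largest

print(fixed_range(5, 3))   # (0.125, 31.875)   -- more range
print(fixed_range(3, 5))   # (0.03125, 7.96875) -- more precision
```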
Floating-point numbers
A floating-point number can represent numbers of different orders of magnitude (very large and very small) with the same fixed number of digits. In general, in the binary system, a floating-point number can be expressed as

  x = ± q × 2^m

where q is the significand, normally a value in the range [1.0, 2.0), and m is the exponent.
Floating-point numbers
Numerical form:

  x = ± q × 2^m = ± b_0 . b_1 b_2 b_3 ... b_n × 2^m

Fractional part of significand: n digits, b_i ∈ {0, 1}
Exponent range: m ∈ [L, U]
Precision: p = n + 1
"Floating" the binary point

  1011.1_2 = 1×8 + 0×4 + 1×2 + 1×1 + 1×(1/2) = 11.5_10
  10111_2 = 1×16 + 0×8 + 1×4 + 1×2 + 1×1 = 23_10 = 1011.1_2 × 2^1
  101.11_2 = 1×4 + 0×2 + 1×1 + 1×(1/2) + 1×(1/4) = 5.75_10 = 1011.1_2 × 2^{-1}

Move the binary point to the left by one bit position: divide the decimal number by 2.
Move the binary point to the right by one bit position: multiply the decimal number by 2.
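The shift rule is easy to verify in Python, treating the bit string and the position of the binary point separately (a minimal sketch):

```python
# 1011.1_2: read the bits as an integer, then divide by 2^1
# because the binary point sits one place from the right.
x = int("10111", 2) / 2**1
print(x)        # 11.5  (= 1011.1_2)
print(x * 2)    # 23.0  (point moved right: 10111_2)
print(x / 2)    # 5.75  (point moved left: 101.11_2)
```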
Converting floating points
Convert (39.6875)_10 = 100111.1011_2 into floating-point representation:

  (39.6875)_10 = 100111.1011_2 = 1.001111011_2 × 2^5
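The normalization step can be sketched in Python (the function name `normalize` and the 10-digit cutoff are illustrative choices; exact for this example):

```python
import math

def normalize(x, frac_bits=10):
    """Rewrite x > 0 as 1.f x 2^m; frac_bits limits the printed digits."""
    m = math.floor(math.log2(x))   # exponent that puts x in [1, 2)
    q = x / 2**m                   # significand in [1, 2)
    frac = int((q - 1) * 2**frac_bits)  # fractional bits of q
    return f"1.{frac:0{frac_bits}b}_2 x 2^{m}"

print(normalize(39.6875))   # 1.0011110110_2 x 2^5
```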
Normalized floating-point numbers
Normalized floating-point numbers are expressed as

  x = ± 1.b_1 b_2 b_3 ... b_n × 2^m = ± 1.f × 2^m

where f is the fractional part of the significand, m is the exponent, and b_i ∈ {0, 1}.

Hidden bit representation: the first bit to the left of the binary point, b_0 = 1, does not need to be stored, since its value is fixed. This representation "adds" 1 bit of precision (we will show some exceptions later, including the representation of the number zero).
Iclicker question
Determine the normalized floating-point representation 1.f × 2^m of the decimal number x = 47.125 (f in binary representation and m in decimal).

A) 1.01110001_2 × 2^4
B) 1.01110001_2 × 2^5
C) 1.01111001_2 × 2^4
D) 1.01111001_2 × 2^5
Normalized floating-point numbers

  x = ± q × 2^m = ± 1.b_1 b_2 b_3 ... b_n × 2^m = ± 1.f × 2^m

• Exponent range: [L, U]
• Precision: p = n + 1
• Smallest positive normalized FP number: UFL = 2^L
• Largest positive normalized FP number: OFL = 2^{U+1} (1 − 2^{-p})
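These two formulas are easy to check numerically; a small sketch (the helper names `ufl` and `ofl` are illustrative), tested on a small system with n = 2 fractional bits and m ∈ [−4, 4]:

```python
def ufl(L):
    """Smallest positive normalized number: 1.00...0 x 2^L."""
    return 2.0 ** L

def ofl(U, p):
    """Largest normalized number: all-ones significand at the top exponent."""
    return 2.0 ** (U + 1) * (1 - 2.0 ** -p)

# system with n = 2 fractional bits (p = 3) and exponent range [-4, 4]
print(ufl(-4))     # 0.0625
print(ofl(4, 3))   # 28.0
```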
Normalized floating point number scale
[Number-line figure: representable numbers from −∞ to +∞, with a gap around 0.]
Floating-point numbers: Simple example
A "toy" number system can be represented as x = ±1.b_1 b_2 × 2^m for m ∈ [−4, 4] and b_i ∈ {0, 1}.

  1.00_2 × 2^0 = 1.0      1.00_2 × 2^1 = 2.0      1.00_2 × 2^2 = 4.0
  1.01_2 × 2^0 = 1.25     1.01_2 × 2^1 = 2.5      1.01_2 × 2^2 = 5.0
  1.10_2 × 2^0 = 1.5      1.10_2 × 2^1 = 3.0      1.10_2 × 2^2 = 6.0
  1.11_2 × 2^0 = 1.75     1.11_2 × 2^1 = 3.5      1.11_2 × 2^2 = 7.0

  1.00_2 × 2^3 = 8.0      1.00_2 × 2^4 = 16.0     1.00_2 × 2^{-1} = 0.5
  1.01_2 × 2^3 = 10.0     1.01_2 × 2^4 = 20.0     1.01_2 × 2^{-1} = 0.625
  1.10_2 × 2^3 = 12.0     1.10_2 × 2^4 = 24.0     1.10_2 × 2^{-1} = 0.75
  1.11_2 × 2^3 = 14.0     1.11_2 × 2^4 = 28.0     1.11_2 × 2^{-1} = 0.875

  1.00_2 × 2^{-2} = 0.25     1.00_2 × 2^{-3} = 0.125     1.00_2 × 2^{-4} = 0.0625
  1.01_2 × 2^{-2} = 0.3125   1.01_2 × 2^{-3} = 0.15625   1.01_2 × 2^{-4} = 0.078125
  1.10_2 × 2^{-2} = 0.375    1.10_2 × 2^{-3} = 0.1875    1.10_2 × 2^{-4} = 0.09375
  1.11_2 × 2^{-2} = 0.4375   1.11_2 × 2^{-3} = 0.21875   1.11_2 × 2^{-4} = 0.109375

The same steps are performed to obtain the negative numbers. For simplicity, we show only the positive numbers in this example.
x = ±1.b_1 b_2 × 2^m for m ∈ [−4, 4] and b_i ∈ {0, 1}
• Smallest normalized positive number: 1.00_2 × 2^{-4} = 0.0625
• Largest normalized positive number: 1.11_2 × 2^4 = 28.0
• Any number x closer to zero than 0.0625 would UNDERFLOW to zero.
• Any number x outside the range −28.0 and +28.0 would OVERFLOW to infinity.
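The toy system is small enough to enumerate exhaustively; a minimal Python sketch:

```python
# Every positive machine number 1.b1 b2 x 2^m with m in [-4, 4]
values = sorted(
    (1 + b1 / 2 + b2 / 4) * 2.0**m
    for b1 in (0, 1) for b2 in (0, 1) for m in range(-4, 5)
)
print(len(values))   # 36 positive machine numbers
print(values[0])     # 0.0625 (UFL)
print(values[-1])    # 28.0   (OFL)
```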
Machine epsilon
• Machine epsilon (ε_m) is defined as the distance (gap) between 1 and the next larger floating-point number.

x = ±1.b_1 b_2 × 2^m for m ∈ [−4, 4] and b_i ∈ {0, 1}

  1.00_2 × 2^0 = 1
  1.01_2 × 2^0 = 1.25
  ε_m = 0.01_2 × 2^0 = 0.25
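For this toy system the gap follows directly from the number of significand digits; a tiny sketch:

```python
# Flipping the last of the n = 2 significand bits moves
# 1.00_2 x 2^0 to 1.01_2 x 2^0 = 1.25, so eps_m = 2^-n.
n = 2
eps_m = 2.0 ** -n
print(eps_m)       # 0.25
print(1 + eps_m)   # 1.25, the next machine number after 1
```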
Machine numbers: how are floating-point numbers stored?
Floating-point number representation
What do we need to store when representing floating-point numbers in a computer?

  x = ± 1.f × 2^m  →  store the sign (±), the exponent (m), and the significand (f)

Initially, different floating-point representations were used in computers, generating inconsistent program behavior across different machines. Around the 1980s, computer manufacturers started adopting a standard representation for floating-point numbers: the IEEE (Institute of Electrical and Electronics Engineers) 754 standard.
Floating-point number representation
Numerical form: x = (−1)^s 1.f × 2^m

Representation in memory:

  | s (sign) | c (exponent) | f (significand) |

where the stored exponent is shifted: c = m + shift.
Precisions (finite representation: not all numbers can be represented exactly!):

IEEE-754 single precision (32 bits):
  | sign (1-bit) | exponent c = m + 127 (8-bit) | significand f (23-bit) |

IEEE-754 double precision (64 bits):
  | sign (1-bit) | exponent c = m + 1023 (11-bit) | significand f (52-bit) |
Special values: x = (−1)^s 1.f × 2^m, stored as | s | c | f |

1) Zero: | s | 000...000 | 0000......0000 |
2) Infinity: +∞ (s = 0) and −∞ (s = 1): | s | 111...111 | 0000......0000 |
3) NaN (results from operations with undefined results): | s | 111...111 | anything ≠ 00...00 |

Note that the exponents c = 000...000 and c = 111...111 are reserved for these special cases, which limits the exponent range for the other numbers.
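These reserved patterns can be inspected directly with Python's struct module; a short sketch (the helper name `bits32` is illustrative):

```python
import struct

def bits32(x):
    """IEEE-754 single-precision bits of x as 'sign exponent fraction'."""
    (b,) = struct.unpack(">I", struct.pack(">f", x))
    s = f"{b:032b}"
    return f"{s[0]} {s[1:9]} {s[9:]}"

print(bits32(0.0))            # 0 00000000 000...0
print(bits32(float("inf")))   # 0 11111111 000...0
print(bits32(float("nan")))   # exponent all ones, nonzero fraction
```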
IEEE-754 Single Precision (32-bit)
x = (−1)^s 1.f × 2^m, stored as | sign (1-bit) | exponent c = m + 127 (8-bit) | significand f (23-bit) |

s = 0: positive sign, s = 1: negative sign
Reserved exponent values for special cases: c = 11111111_2 = 255 and c = 00000000_2 = 0, therefore 0 < c < 255.
The largest exponent is U = 254 − 127 = 127
The smallest exponent is L = 1 − 127 = −126
IEEE-754 Single Precision (32-bit)
x = (−1)^s 1.f × 2^m
Example: represent the number x = −67.125 using the IEEE single-precision standard.

  67.125 = 1000011.001_2 = 1.000011001_2 × 2^6
  c = 6 + 127 = 133 = 10000101_2

  | 1 | 10000101 | 00001100100000...000 |
   (1-bit) (8-bit)  (23-bit)
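We can sanity-check this bit pattern with Python's struct module (a quick sketch):

```python
import struct

# Pack -67.125 as a big-endian IEEE-754 single and inspect its 32 bits
(b,) = struct.unpack(">I", struct.pack(">f", -67.125))
s = f"{b:032b}"
print(s[0])     # sign: 1
print(s[1:9])   # exponent: 10000101  (133 = 6 + 127)
print(s[9:])    # significand: 00001100100000000000000
```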
IEEE-754 Single Precision (32-bit)
x = (−1)^s 1.f × 2^m, with c = m + 127

• Machine epsilon (ε_m): the distance (gap) between 1 and the next larger floating-point number.
    1       = 0 01111111 00000000000000000000000
    1 + ε_m = 0 01111111 00000000000000000000001
    ε_m = 2^{-23} ≈ 1.2 × 10^{-7}
• Smallest positive normalized FP number: UFL = 2^L = 2^{-126} ≈ 1.2 × 10^{-38}
• Largest positive normalized FP number: OFL = 2^{U+1} (1 − 2^{-p}) = 2^{128} (1 − 2^{-24}) ≈ 3.4 × 10^{38}
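These three constants follow directly from p = 24, L = −126, U = 127; a quick numerical check:

```python
# IEEE-754 single precision: p = 24 (23 stored fraction bits),
# exponent range L = -126, U = 127
eps = 2.0 ** -23                      # machine epsilon
ufl = 2.0 ** -126                     # smallest positive normalized
ofl = 2.0 ** 128 * (1 - 2.0 ** -24)   # largest finite value
print(eps)   # ~1.19e-07
print(ufl)   # ~1.18e-38
print(ofl)   # ~3.40e+38
```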