Fixed Point Real Numbers 16-bit Unsigned with Binary Point: 8 - PowerPoint PPT Presentation

Fixed Point Real Numbers
• 16-bit Unsigned with Binary Point: 8
• XXXXXXXX.XXXXXXXX
• Maximum/Minimum Values
• 11111111.11111111 = 255+255/256
• 00000000.00000000 = 0
• 16-bit Signed with Binary Point: 8
• XXXXXXXX.XXXXXXXX
• Maximum/Minimum Values
• 01111111.11111111 = 127+255/256
• 10000000.00000000 = -128
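The maximum and minimum values on this slide follow from treating the word as an integer divided by 2^BP. A minimal sketch of that rule; the function name `fixed_point_range` is hypothetical and not from the slides:

```python
from fractions import Fraction

def fixed_point_range(width, bp, signed):
    """Return (min, max) of a width-bit word with bp bits after the binary point."""
    scale = 2 ** bp
    if signed:
        lo, hi = -(2 ** (width - 1)), 2 ** (width - 1) - 1
    else:
        lo, hi = 0, 2 ** width - 1
    # The stored integer is interpreted as integer / 2**bp.
    return Fraction(lo, scale), Fraction(hi, scale)

unsigned_min, unsigned_max = fixed_point_range(16, 8, signed=False)
signed_min, signed_max = fixed_point_range(16, 8, signed=True)
print(unsigned_min, unsigned_max)   # 0 and 255 + 255/256
print(signed_min, signed_max)       # -128 and 127 + 255/256
```

Exact fractions are used so the results can be compared directly with the values on the slide.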

Multiplication of Signed FP
• If a has width W_a and binary point bp_a, and b has width W_b and binary point bp_b,
• the output of the multiplier will need width W_a + W_b and a bp of bp_a + bp_b.
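The width and binary point rule for a product can be checked with plain integers. This is a sketch under assumed operand formats, (8:4) for a and (10:6) for b; the formats, the values, and the helper names are illustrative only:

```python
def to_fixed(value, bp):
    """Quantize a real value to an integer with bp fractional bits."""
    return round(value * 2 ** bp)

def product_format(wa, bpa, wb, bpb):
    """Width and binary point of the full-precision signed product."""
    return wa + wb, bpa + bpb

# Hypothetical operands: a is (8:4), b is (10:6).
a_raw = to_fixed(-3.25, 4)       # stored integer for a
b_raw = to_fixed(1.421875, 6)    # stored integer for b
w_out, bp_out = product_format(8, 4, 10, 6)

p_raw = a_raw * b_raw            # integer multiply, as the hardware would do
p_real = p_raw / 2 ** bp_out     # reinterpret using the output binary point
print((w_out, bp_out), p_real)
```

The integer product only means something once it is read with the summed binary point, which is why the output format has to be tracked alongside the data.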

Number Representation
• Previous examples of FIR filters used integer representations for the filter coefficients.
• What if we have coefficients with fractional components?
• Two options:
1. Apply a scaling factor to all the coefficients to get the desired resolution.
2. Use binary point numbers to represent our coefficients.

Example:
• input is 8-bit signed
• filter coefficients b = [1.42 2.05 -3.23 4.71 -3.11 -5.10]
What digital values of b should we use? What is the required output data width?

Example: Scaling factor approach
• Multiply coefficients by 100 and use a 10-bit signed format (-512 to 511).
  b = [142 205 -323 471 -311 -510]
• Determine the maximum output.
  max = abs(-128)*sum(abs(b)) = 251136
• ceil(log2(251136))+1 = 19-bit signed
• 19-bit signed has a range of -262144 to 262143.
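The scaling-factor arithmetic can be reproduced in a few lines. A small sketch that follows the slide's own formula for the output width; the variable names are illustrative:

```python
import math

b = [1.42, 2.05, -3.23, 4.71, -3.11, -5.10]
scale = 100

# Integer coefficients after scaling by 100.
b_scaled = [round(c * scale) for c in b]

# Worst-case output magnitude: largest input magnitude (128 for 8-bit signed)
# times the sum of the coefficient magnitudes.
max_out = 128 * sum(abs(c) for c in b_scaled)

# Slide formula: magnitude bits plus one sign bit.
out_bits = math.ceil(math.log2(max_out)) + 1
print(b_scaled, max_out, out_bits)
```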

Example: Scaling factor approach (continued)
• If the absolute scale of the output is to be retained, the output must be divided by 100 to undo the coefficient scaling.
• How do we divide by 100 in binary? 100 is not a power of two, so the division is not a simple shift.
• Maybe not a good approach in all instances.
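One common workaround, not taken from the slides, is to approximate the division with a multiply by a rounded reciprocal followed by a right shift. The sketch below assumes a 16-bit shift; `RECIP`, `SHIFT`, and `div100_approx` are hypothetical names:

```python
SHIFT = 16
RECIP = round(2 ** SHIFT / 100)   # 655, an approximation of 65536/100 = 655.36

def div100_approx(y):
    """Approximate y / 100 with one multiply and an arithmetic right shift."""
    # Python's >> on negative integers floors, like an arithmetic shift.
    return (y * RECIP) >> SHIFT

worst = 251136                     # maximum output from the scaling example
approx = div100_approx(worst)
exact = worst / 100
print(approx, exact, exact - approx)
```

The result is close but not exact, and the extra multiplier is a cost the binary point approach avoids, which is in line with the slide's caution about this method.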

Example: Binary point approach
• Represent coefficients as 10-bit signed with 6 bits to the right of the binary point.
• This is a design choice.
• XXXX.XXXXXX
• Can handle values from -8+(0/64) to 7+(63/64).

Example: Binary point approach (continued)
• Determine the digital coefficient values.
• b_bp = dec2bin(mod(round(64*b)+1024,1024), 10)
• b_bp = [0001011011 0010000011 1100110001 0100101101 1100111001 1010111010]
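The `dec2bin(mod(round(64*b)+1024,1024))` step is a two's-complement encoding of the quantized coefficients. A Python sketch of the same computation; the `encode` helper is hypothetical, and Python's `round` differs from MATLAB's only on exact ties, which do not occur for these values:

```python
b = [1.42, 2.05, -3.23, 4.71, -3.11, -5.10]
WIDTH, BP = 10, 6

def encode(value, width=WIDTH, bp=BP):
    """Two's-complement bit string of value quantized to a (width:bp) format."""
    raw = round(value * 2 ** bp)
    if not -(2 ** (width - 1)) <= raw < 2 ** (width - 1):
        raise ValueError(f"{value} does not fit in ({width}:{bp})")
    # Wrapping modulo 2**width maps negative values to their two's-complement code.
    return format(raw % 2 ** width, f"0{width}b")

b_bp = [encode(c) for c in b]
print(b_bp)
```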

Example: Binary point approach (continued)
• Determine the maximum output.
  b_bp = round(64*b)
  max = abs(-128)*sum(abs(b_bp)) = 160640
• ceil(log2(160640))+1 = 19-bit signed
• 19-bit signed has a range of -262144 to 262143.
• Final output is 19-bit signed with a bp of 6.
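The bound above can be compared against an actual worst-case input pattern. In this sketch the input samples are chosen to drive the output as negative as possible; the pattern and the variable names are assumptions made for illustration:

```python
import math

b = [1.42, 2.05, -3.23, 4.71, -3.11, -5.10]
BP = 6
b_raw = [round(c * 2 ** BP) for c in b]

# Slide bound: every input sample at magnitude 128.
max_out = 128 * sum(abs(c) for c in b_raw)
out_bits = math.ceil(math.log2(max_out)) + 1

# Most negative reachable output: -128 against positive coefficients,
# +127 against negative ones (8-bit signed input range).
worst_inputs = [-128 if c > 0 else 127 for c in b_raw]
worst_raw = sum(c * x for c, x in zip(b_raw, worst_inputs))
worst_real = worst_raw / 2 ** BP   # interpret with the output bp of 6
print(b_raw, max_out, out_bits, worst_raw, worst_real)
```

The reachable extreme is slightly inside the bound because +128 is not representable in 8-bit signed, so the 19-bit width is safe.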

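IIR Implementation: a_0 = 1
[Block diagram: second-order IIR section with two z^-1 delays and feedback coefficients a_1 and a_2; the feedback is subtracted from the input.]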

IIR Implementation: Pipelining?
[Block diagrams: the second-order IIR section with coefficients a_1 and a_2, redrawn with additional z^-1 registers.]

IIR Implementation
[Block diagram: in_reg feeds a subtractor producing dif, which drives out_reg and the a_1 and a_2 multipliers (m1, m2); registers p1 and p2 form the feedback path.]
//functional description
assign dif = in_reg - p1;
assign m1 = dif*a1;
assign m2 = dif*a2;
always@(posedge clock) begin
  in_reg <= in;
  out_reg <= dif;
  p1 <= m1 + p2;
  p2 <= m2;
end
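The register-transfer description above can be modelled cycle by cycle to confirm that it realizes y[n] = x[n] - a_1 y[n-1] - a_2 y[n-2]. This is a behavioural sketch, not the author's testbench; the coefficients a1 = -1 and a2 = 1 are hypothetical integers chosen so the impulse response stays bounded:

```python
def iir_step(state, x, a1, a2):
    """Advance the four registers by one clock edge and return the new state."""
    in_reg, out_reg, p1, p2 = state
    dif = in_reg - p1            # assign dif = in_reg - p1;
    m1 = dif * a1                # assign m1 = dif*a1;
    m2 = dif * a2                # assign m2 = dif*a2;
    # Non-blocking assignments: every new value uses the pre-edge registers.
    return (x, dif, m1 + p2, m2)

def simulate(xs, a1, a2):
    state = (0, 0, 0, 0)
    outs = []
    for x in xs:
        state = iir_step(state, x, a1, a2)
        outs.append(state[1])    # out_reg after this clock edge
    return outs

def reference(xs, a1, a2):
    """Direct evaluation of y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    ys = []
    for n, x in enumerate(xs):
        y1 = ys[n - 1] if n >= 1 else 0
        y2 = ys[n - 2] if n >= 2 else 0
        ys.append(x - a1 * y1 - a2 * y2)
    return ys

impulse = [1, 0, 0, 0, 0, 0, 0, 0]
hw = simulate(impulse, a1=-1, a2=1)
ref = reference(impulse, a1=-1, a2=1)
print(hw)
print(ref)
```

The modelled output matches the difference equation delayed by one list position, which reflects the extra input and output registers.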

IIR Implementation: DSP Blocks
[Block diagram: the same structure mapped onto two DSP blocks; the a_2 block has a constant 0 on its C input and cascades into the a_1 block.]
//dsp48 structural
always@(posedge clock) begin
  in_reg <= in;
  out_reg <= dif;
end
macc_wrap dsp2 (.C(0),.A(dif),.B(a2),.PCOUT(p2));
macc_wrap dsp1 (.PCIN(p2),.A(dif),.B(a1),.POUT(p1));
assign dif = p1 + in;

Number Representation
• For the IIR filter diagram in the previous slides, there is a requirement that a_0 = 1.
• When a_1 and a_2 are near 1 or are fractional values, we cannot accurately represent them as integers.
• Two options:
1. Add a pre-multiplier to the input to incorporate an a_0 scale term.
2. If we care about the absolute scale, use binary point numbers to represent our coefficients.
• Remember to keep track of binary point locations, especially in the feedback path.

IIR Implementation
[Block diagram: as before, with an a_0 multiplier (m0) between in_reg and the subtractor.]
//functional description
assign dif = m0 - p1;
assign m0 = in_reg*a0;
assign m1 = dif*a1;
assign m2 = dif*a2;
always@(posedge clock) begin
  in_reg <= in;
  out_reg <= dif;
  p1 <= m1 + p2;
  p2 <= m2;
end

IIR Implementation: Keeping track of bp locations
[Block diagram: the IIR section with the a_0 pre-multiplier, annotated with signal formats.]
• We will use (W:BP) notation.
• Assume all values are signed.
• Input is 8-bit signed (8:0).
• Coefficients are (10:6).
• m0 is (18:6).
• Assume p1 is (18:6) for the subtraction.
• dif is (19:6).
• m1, m2 are (29:12).
• p1 is (30:12).
• p1 needs to have a bp of 6 so the subtraction will have equivalent input formats.
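The (W:BP) bookkeeping on this slide follows two rules: a product adds widths and binary points, and a sum or difference of equal-bp operands grows by one bit. A minimal sketch of that bookkeeping; the `Fmt` type and the helper names are hypothetical:

```python
from typing import NamedTuple

class Fmt(NamedTuple):
    w: int    # total width in bits
    bp: int   # bits to the right of the binary point

def mul(a, b):
    """Full-precision signed product."""
    return Fmt(a.w + b.w, a.bp + b.bp)

def add(a, b):
    """Sum or difference; operands must share a binary point."""
    if a.bp != b.bp:
        raise ValueError("align binary points before adding")
    return Fmt(max(a.w, b.w) + 1, a.bp)

x = Fmt(8, 0)
coeff = Fmt(10, 6)

m0 = mul(x, coeff)              # input times a0
p1_aligned = Fmt(18, 6)         # the slide's assumption for the subtraction
dif = add(m0, p1_aligned)
m1 = mul(dif, coeff)            # m2 has the same format
p1 = add(m1, m1)                # p1 = m1 + p2, where p2 carries m2's format
print(m0, dif, m1, p1)
```

Calling `add(m0, p1)` directly raises an error because the binary points differ (6 versus 12), which is the mismatch the slide warns about in the feedback path.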

IIR Implementation
[Block diagram: the IIR section with the a_0 pre-multiplier.]
//functional description
//bp is the binary point of the coefficients
//all operands must be declared signed for >>> to sign-extend
assign dif = m0 - (p1 >>> bp);
assign m0 = in_reg*a0;
assign m1 = dif*a1;
assign m2 = dif*a2;
always@(posedge clock) begin
  in_reg <= in;
  out_reg <= dif;
  p1 <= m1 + p2;
  p2 <= m2;
end

IIR Implementation
[Block diagram: the IIR section with the a_0 pre-multiplier.]
//functional description
assign dif = m0 - p1;
assign m0 = in_reg*a0;
assign m1 = (dif*a1) >>> bp; //or shift
assign m2 = (dif*a2) >>> bp; //here
always@(posedge clock) begin
  in_reg <= in;
  out_reg <= dif;
  p1 <= m1 + p2;
  p2 <= m2;
end
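The two shift placements (on p1 in the feedback path, or on each product) can be compared with a bit-accurate integer model. This is a sketch under assumed coefficients a0 = 1.0, a1 = -0.5, a2 = 0.25, all in the (10:6) format; the coefficient values, the step input, and the function names are illustrative only:

```python
BP = 6

def q(c):
    """Quantize a real coefficient to an integer with BP fractional bits."""
    return round(c * 2 ** BP)

def run(xs, a0, a1, a2, shift_products):
    """Integer model of the registered IIR section; returns out_reg per clock."""
    in_reg = p1 = p2 = 0
    outs = []
    for x in xs:
        m0 = in_reg * a0                     # (18:6)
        if shift_products:
            dif = m0 - p1                    # p1 is already at bp 6
            m1 = (dif * a1) >> BP            # truncate each product
            m2 = (dif * a2) >> BP
        else:
            dif = m0 - (p1 >> BP)            # p1 is at bp 12, shift once
            m1 = dif * a1
            m2 = dif * a2
        outs.append(dif)                     # value latched into out_reg
        in_reg, p1, p2 = x, m1 + p2, m2      # simultaneous register update
    return outs

def reference(xs, a0, a1, a2):
    """Floating-point y[n] = a0*x[n] - a1*y[n-1] - a2*y[n-2]."""
    ys = []
    for n, x in enumerate(xs):
        y1 = ys[n - 1] if n >= 1 else 0.0
        y2 = ys[n - 2] if n >= 2 else 0.0
        ys.append(a0 * x - a1 * y1 - a2 * y2)
    return ys

xs = [100] * 20
ref = reference(xs, 1.0, -0.5, 0.25)
shift_p1 = run(xs, q(1.0), q(-0.5), q(0.25), shift_products=False)
shift_m = run(xs, q(1.0), q(-0.5), q(0.25), shift_products=True)

# Compare in real units; the hardware output lags the reference by one sample.
err_p1 = max(abs(o / 2 ** BP - r) for o, r in zip(shift_p1[1:], ref[:-1]))
err_m = max(abs(o / 2 ** BP - r) for o, r in zip(shift_m[1:], ref[:-1]))
print(err_p1, err_m)
```

Shifting p1 truncates once per sample while shifting the products truncates twice, so the two variants can differ by a few least significant bits; shifting the products keeps the feedback registers narrower.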

IIR Implementation
[Block diagram: three DSP blocks for a_0, a_1, and a_2; the a_2 block has a constant 0 on its C input.]
//dsp48 structural
always@(posedge clock) begin
  out_reg <= dif;
end
macc_wrap dsp0 (.C(p1 >>> bp),.A(in),.B(a0),.POUT(dif));
macc_wrap dsp2 (.C(0),.A(dif),.B(a2),.PCOUT(p2));
macc_wrap dsp1 (.PCIN(p2),.A(dif),.B(a1),.POUT(p1));

IIR Implementation
[Block diagram: three DSP blocks with the feedback coefficients negated (-a_1, -a_2) so that the input node becomes a sum.]
//dsp48 structural
always@(posedge clock) begin
  out_reg <= sum;
end
macc_wrap dsp0 (.C(p1 >>> bp),.A(in),.B(a0),.POUT(sum));
macc_wrap dsp2 (.C(0),.A(sum),.B(-a2),.PCOUT(p2));
macc_wrap dsp1 (.PCIN(p2),.A(sum),.B(-a1),.POUT(p1));

IIR Implementation: Pipelining?
Really only need to pipeline a 2nd-order IIR filter to realize 2 poles. Then we can cascade a number of them to realize M poles.
[Block diagrams: a fourth-order feedback section with coefficients a_1 through a_4 and four z^-1 delays, shown with and without an added z^-4 delay.]

IIR Implementation: Pipelining?
[Block diagram: feedback section with four z^-1 delays in which only the a_2 and a_4 coefficients remain.]
The above diagram has 4 poles and has extra registers for pipelining. The idea is to start with an IIR filter with 4 poles (2 we want to keep and 2 of our choosing that will be canceled). Based on the 2 we want to keep, determine what the 2 additional poles need to be to eliminate the a_1 and a_3 terms. Pre-multiply with a cascaded FIR filter with zeros placed at the locations of the two additional poles.

IIR Implementation: Pipelining?
• Turn to the math...
• Z-domain
$$H(z) = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2}} = \frac{1}{(1 - p_1 z^{-1})(1 - p_2 z^{-1})}$$
• Add some poles but compensate in the numerator with an FIR filter.
$$H(z) = \frac{(1 - p_3 z^{-1})(1 - p_4 z^{-1})}{(1 - p_1 z^{-1})(1 - p_2 z^{-1})(1 - p_3 z^{-1})(1 - p_4 z^{-1})}$$
• Choose p_3 and p_4 to cancel the z^-1 and z^-3 coefficients in the denominator.

IIR Implementation: Pipelining
• Using the original polynomial.
$$H(z) = \frac{1}{1 - a_1 z^{-1} - a_2 z^{-2}} \cdot \frac{1 + a_1 z^{-1} + (-a_2) z^{-2}}{1 - (-a_1) z^{-1} - a_2 z^{-2}}$$
$$H(z) = \frac{1 + a_1 z^{-1} + (-a_2) z^{-2}}{1 - (a_1^2 + 2a_2) z^{-2} + a_2^2 z^{-4}}$$
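The denominator product can be checked by multiplying the two polynomials in z^-1 directly. A short sketch with exact fractions; the coefficient values a1 = 1/2 and a2 = -1/4 are hypothetical, and `poly_mul` is a helper written for this example:

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

a1, a2 = Fraction(1, 2), Fraction(-1, 4)   # hypothetical coefficients

den = [1, -a1, -a2]        # 1 - a1 z^-1 - a2 z^-2
added = [1, a1, -a2]       # same polynomial with a1 negated
new_den = poly_mul(den, added)

# Odd powers cancel, leaving terms in z^-2 and z^-4 only.
expected = [1, 0, -(a1 ** 2 + 2 * a2), 0, a2 ** 2]
print(new_den)
```

Negating a_1 corresponds to placing the added poles at -p_1 and -p_2, which is what removes the z^-1 and z^-3 terms.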

IIR Implementation: Pipelining
[Block diagrams: an FIR section with taps labelled a_1 and a_2 followed by a feedback section with coefficients a_1^2 + 2a_2 and -a_2^2 on z^-2 delays, shown in two register arrangements.]
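That the FIR-plus-feedback structure leaves the response unchanged can be confirmed by comparing impulse responses. The sketch below uses exact fractions and hypothetical coefficients a1 = 1/2, a2 = -1/4; the `iir` helper is a generic difference-equation evaluator written for this example, not part of the slides:

```python
from fractions import Fraction

def iir(num, den, xs):
    """Evaluate y[n] = sum(num[k]*x[n-k]) - sum(den[k]*y[n-k], k >= 1); den[0] is 1."""
    ys = []
    for n in range(len(xs)):
        acc = sum(num[k] * xs[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * ys[n - k] for k in range(1, len(den)) if n - k >= 0)
        ys.append(acc)
    return ys

a1, a2 = Fraction(1, 2), Fraction(-1, 4)   # hypothetical coefficients
impulse = [1] + [0] * 15

# Original second-order section.
original = iir([1], [1, -a1, -a2], impulse)

# Pipelined form: FIR numerator 1 + a1 z^-1 - a2 z^-2, feedback only at
# delays 2 and 4 with coefficients (a1^2 + 2 a2) and -(a2^2).
pipelined = iir([1, a1, -a2],
                [1, 0, -(a1 ** 2 + 2 * a2), 0, a2 ** 2],
                impulse)
print(original[:6])
print(pipelined[:6])
```

Because the feedback now reaches back at least two samples, there is room for an extra register inside the loop.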

IIR Implementation: Pipelining
• Still not fully pipelined.
• Add 4 zeros/poles instead of 2.
[Block diagrams: an FIR section with taps b_1 through b_4 followed by a feedback section with coefficients a_1' and a_2', shown in two register arrangements.]
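The slides do not give formulas for b_1 through b_4 or a_1' and a_2'. One reading, offered here as an assumption, is to pair each original pole p with the factor 1 + p z^-1 + p^2 z^-2, so that each denominator factor becomes 1 - p^3 z^-3 and the feedback only involves z^-3 and z^-6. The sketch below checks that reading with exact fractions and hypothetical coefficient values:

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists in powers of z^-1."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

a1, a2 = Fraction(1, 2), Fraction(-1, 4)   # hypothetical coefficients

den = [1, -a1, -a2]                        # 1 - a1 z^-1 - a2 z^-2

# Assumed FIR numerator (taps after the leading 1 would play the role of
# b_1..b_4): the product of (1 + p z^-1 + p^2 z^-2) over both poles,
# written in terms of a1 and a2.
fir = [1, a1, a1 ** 2 + a2, -a1 * a2, a2 ** 2]

new_den = poly_mul(den, fir)

# Assumed feedback coefficients in the form 1 - a1' z^-3 - a2' z^-6.
a1_prime = a1 ** 3 + 3 * a1 * a2
a2_prime = a2 ** 3
print(new_den, a1_prime, a2_prime)
```

Under this assumption the shortest feedback delay is three samples, which leaves more room for pipeline registers than the two-sample loop of the previous slide.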
