CENG 342 – Digital Systems
Simplified Floating-point Adder
Larry Pyeatt
SDSM&T
Binary Floating Point Representation

A floating-point number consists of three components: a sign bit, an exponent, and a mantissa. Example (decimal): $12.75 = +.1275 \times 10^{2}$, so the sign is +, the exponent is 2, and the mantissa is .1275.

In binary, the number is stored in the normalized form $(-1)^{s} \times .m \times 2^{e}$, where $s$ is the sign, $e$ is the exponent, and $m$ is the mantissa (fraction).

The floating-point adder uses a 13-bit format: 1 bit for the sign, 4 bits for the exponent, and 8 bits for the mantissa.

Assumptions:
- Both the exponent and the fraction are unsigned.
- The representation is normalized: the MSB of the fraction must be 1.
- If the magnitude is smaller than $0.10000000_2 \times 2^{0}$, it is converted to 0.
- Round-off error is ignored: lower bits are simply discarded when they are shifted out.
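The following worked example applies the 13-bit format to the value 12.75, and also gives the range of nonzero normalized magnitudes. It is computed by hand here and is not taken from the original slides.

$$
\begin{aligned}
12.75_{10} &= 1100.11_2 = 0.11001100_2 \times 2^{4} \\
s &= 0, \quad e = 0100_2, \quad m = 11001100_2 \\
\text{13-bit word} &= \underbrace{0}_{s}\;\underbrace{0100}_{e}\;\underbrace{11001100}_{m} \\
\text{smallest} &= 0.10000000_2 \times 2^{0} = 0.5 \\
\text{largest} &= 0.11111111_2 \times 2^{15} = \tfrac{255}{256} \times 32768 = 32640
\end{aligned}
$$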
Major steps

1. Sorting: find the bigger and the smaller number (by magnitude).
2. Alignment: align the two numbers so that they have the same exponent. If necessary, raise the exponent of the smaller number to match the bigger one, shifting its fraction right by the same number of bits.
3. Add/Subtract: add when both numbers have the same sign; otherwise, subtract the smaller from the bigger.
4. Normalization:
   - After a subtraction, the result may have leading zeros. Count the number of leading 0s ($n$), shift the fraction left by $n$ bits, and decrease the exponent by $n$.
   - If, after a subtraction, the result is too small to be normalized, set both the exponent and the fraction to 0.
   - If an addition generates a carry-out bit, shift the mantissa right by 1 bit and increment the exponent.
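As an illustration in the 13-bit binary format, the four steps are traced below for 6 - 2.5. These operands were chosen for this illustration and do not come from the original slides. The subtraction leaves one leading zero, so the result must be normalized.

$$
\begin{aligned}
&\text{Operands: } +0.11000000_2 \times 2^{3}\ (=6), \qquad -0.10100000_2 \times 2^{2}\ (=2.5) \\
&\text{Sort: big} = 0.11000000_2 \times 2^{3}, \quad \text{small} = 0.10100000_2 \times 2^{2} \\
&\text{Align (shift small right by } 3-2=1\text{): } 0.01010000_2 \times 2^{3} \\
&\text{Subtract (signs differ): } 0.11000000_2 - 0.01010000_2 = 0.01110000_2 \\
&\text{Normalize } (n=1)\text{: } 0.11100000_2 \times 2^{3-1} = 0.11100000_2 \times 2^{2} = 3.5
\end{aligned}
$$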
Examples in Decimal
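Two illustrative decimal cases with 4-digit mantissas appear below. They were chosen for this note and are not necessarily the ones on the original slide. The first addition produces a carry-out, so the mantissa is shifted right and the exponent is incremented. The second subtraction leaves one leading zero, so the mantissa is shifted left and the exponent is decremented.

$$
\begin{aligned}
&.9500 \times 10^{2} + .8000 \times 10^{1} = .9500 \times 10^{2} + .0800 \times 10^{2} = 1.0300 \times 10^{2} \;\rightarrow\; .1030 \times 10^{3} \quad (95 + 8 = 103) \\
&.1000 \times 10^{2} - .9000 \times 10^{1} = .1000 \times 10^{2} - .0900 \times 10^{2} = .0100 \times 10^{2} \;\rightarrow\; .1000 \times 10^{1} \quad (10 - 9 = 1)
\end{aligned}
$$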
Entity Declaration

The design follows the same algorithm as the decimal addition examples. The suffixes 'b', 's', 'a' and 'n' in signal names denote the big number, the small number, the aligned number and the normalized number, respectively. The result of the addition/subtraction is held in the signal sum.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fp_adder is
  port (
    sign1, sign2 : in  std_logic;
    exp1, exp2   : in  std_logic_vector(3 downto 0);
    frac1, frac2 : in  std_logic_vector(7 downto 0);
    sign_out     : out std_logic;
    exp_out      : out std_logic_vector(3 downto 0);
    frac_out     : out std_logic_vector(7 downto 0)
  );
end fp_adder;
```
Architecture – Part 1

```vhdl
architecture arch of fp_adder is
  -- suffix b, s, a, n for big, small, aligned, normalized number
  signal signb, signs               : std_logic;
  signal expb, exps, expn           : unsigned(3 downto 0);
  signal fracb, fracs, fraca, fracn : unsigned(7 downto 0);
  signal sum_norm                   : unsigned(7 downto 0);
  signal exp_diff                   : unsigned(3 downto 0);
  signal sum                        : unsigned(8 downto 0);  -- extra bit for carry-out
  signal lead0                      : unsigned(2 downto 0);
begin
  -- 1st stage: sort to find the larger number
  process (sign1, sign2, exp1, exp2, frac1, frac2)
  begin
    -- exponent and fraction both need comparison, so combine them;
    -- both operands are 12 bits wide, so this compares them as unsigned values
    if (exp1 & frac1) > (exp2 & frac2) then
      signb <= sign1;
      signs <= sign2;
      expb  <= unsigned(exp1);
      exps  <= unsigned(exp2);
      fracb <= unsigned(frac1);
      fracs <= unsigned(frac2);
```
Architecture – Part 2

```vhdl
    else
      signb <= sign2;
      signs <= sign1;
      expb  <= unsigned(exp2);
      exps  <= unsigned(exp1);
      fracb <= unsigned(frac2);
      fracs <= unsigned(frac1);
    end if;
  end process;

  -- 2nd stage: align the smaller number
  exp_diff <= expb - exps;
  with exp_diff select
    fraca <=
      fracs                         when "0000",
      "0"       & fracs(7 downto 1) when "0001",
      "00"      & fracs(7 downto 2) when "0010",
      "000"     & fracs(7 downto 3) when "0011",
      "0000"    & fracs(7 downto 4) when "0100",
      "00000"   & fracs(7 downto 5) when "0101",
      "000000"  & fracs(7 downto 6) when "0110",
      "0000000" & fracs(7)          when "0111",
      "00000000"                    when others;

  -- 3rd stage: add/subtract
  sum <= ('0' & fracb) + ('0' & fraca) when signb = signs else
         ('0' & fracb) - ('0' & fraca);
```
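The eight-way `with exp_diff select` table in the 2nd stage implements a right shift of `fracs` by `exp_diff` positions, with zeros filled in from the left. The sketch below writes the same aligner as a stand-alone entity using the numeric_std function `shift_right`. The entity name `fp_align` is hypothetical and not part of the original design. For `unsigned`, `shift_right` fills with zeros and returns all zeros when the shift count is 8 or more, which matches the `others` branch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stand-alone aligner, equivalent to the explicit select table
entity fp_align is
  port (
    fracs    : in  unsigned(7 downto 0);  -- fraction of the smaller operand
    exp_diff : in  unsigned(3 downto 0);  -- expb - exps
    fraca    : out unsigned(7 downto 0)   -- aligned fraction
  );
end fp_align;

architecture arch of fp_align is
begin
  -- logical right shift with zero fill; shifts of 8..15 give "00000000"
  fraca <= shift_right(fracs, to_integer(exp_diff));
end arch;
```

The explicit table keeps the hardware structure visible on the slide. The `shift_right` form is shorter and scales to wider fractions without editing the table.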
Architecture – Part 3

```vhdl
  -- 4th stage: normalize
  -- 4a: count leading 0s
  lead0 <= "000" when (sum(7) = '1') else
           "001" when (sum(6) = '1') else
           "010" when (sum(5) = '1') else
           "011" when (sum(4) = '1') else
           "100" when (sum(3) = '1') else
           "101" when (sum(2) = '1') else
           "110" when (sum(1) = '1') else
           "111";

  -- 4b: shift significand according to the number of leading 0s
  with lead0 select
    sum_norm <=
      sum(7 downto 0)             when "000",
      sum(6 downto 0) & '0'       when "001",
      sum(5 downto 0) & "00"      when "010",
      sum(4 downto 0) & "000"     when "011",
      sum(3 downto 0) & "0000"    when "100",
      sum(2 downto 0) & "00000"   when "101",
      sum(1 downto 0) & "000000"  when "110",
      sum(0)          & "0000000" when others;
```
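Stages 4a and 4b together form a priority encoder followed by a left shift. The sketch below writes the same logic with a loop instead of an explicit table. The entity `lead0_count` and the function `count_lead0` are hypothetical names, not part of the original design. Like the explicit version, it reports 7 leading zeros when bits 7 to 1 are all 0. `shift_left` then produces the same `sum_norm` in every case, including an all-zero sum.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lead0_count is
  port (
    sum      : in  unsigned(7 downto 0);  -- low 8 bits of the add/subtract result
    lead0    : out unsigned(2 downto 0);  -- number of leading 0s (7 if bits 7..1 are 0)
    sum_norm : out unsigned(7 downto 0)   -- sum shifted left by lead0
  );
end lead0_count;

architecture arch of lead0_count is
  -- priority encoder: distance from the MSB to the first '1', scanning downward;
  -- bit 0 is never checked, so the default return value covers it
  function count_lead0(v : unsigned) return natural is
  begin
    for i in v'high downto v'low + 1 loop
      if v(i) = '1' then
        return v'high - i;
      end if;
    end loop;
    return v'length - 1;
  end function;

  signal n : natural range 0 to 7;
begin
  n        <= count_lead0(sum);
  lead0    <= to_unsigned(n, 3);
  sum_norm <= shift_left(sum, n);  -- zero fill from the right
end arch;
```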
Architecture – Part 4

```vhdl
  -- 4c: special conditions
  process (sum, sum_norm, expb, lead0)
  begin
    if sum(8) = '1' then                    -- carry-out: shift fraction right 1 bit
      expn  <= expb + 1;
      fracn <= sum(8 downto 1);
    elsif (lead0 > expb) or (sum = 0) then  -- too small to normalize (including an
      expn  <= (others => '0');             -- exact cancellation): set result to 0
      fracn <= (others => '0');
    else
      expn  <= expb - lead0;
      fracn <= sum_norm;
    end if;
  end process;

  -- form output
  sign_out <= signb;
  exp_out  <= std_logic_vector(expn);
  frac_out <= std_logic_vector(fracn);
end arch;
```
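Before moving to the board, the adder can be checked in simulation. Below is a minimal self-checking testbench sketch. The entity name `fp_adder_tb` and the three test vectors are illustrative. The expected results were worked out by hand from the architecture above:
- 6 - 2.5 = 3.5, which leaves one leading zero after the subtraction.
- 12.75 + 12.75 = 25.5, which produces a carry-out.
- 0.75 - 0.5 = 0.25, which is below the smallest normalized magnitude (0.5) and is flushed to 0.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity fp_adder_tb is
end fp_adder_tb;

architecture sim of fp_adder_tb is
  signal sign1, sign2, sign_out : std_logic := '0';
  signal exp1, exp2, exp_out    : std_logic_vector(3 downto 0) := (others => '0');
  signal frac1, frac2, frac_out : std_logic_vector(7 downto 0) := (others => '0');
begin
  uut: entity work.fp_adder
    port map (
      sign1 => sign1, sign2 => sign2, exp1 => exp1, exp2 => exp2,
      frac1 => frac1, frac2 => frac2,
      sign_out => sign_out, exp_out => exp_out, frac_out => frac_out
    );

  stimulus: process
    -- drive one pair of operands, let the combinational logic settle,
    -- then compare the outputs with the hand-computed result
    procedure check (
      s1    : std_logic; e1 : std_logic_vector(3 downto 0); f1 : std_logic_vector(7 downto 0);
      s2    : std_logic; e2 : std_logic_vector(3 downto 0); f2 : std_logic_vector(7 downto 0);
      s_exp : std_logic; e_exp : std_logic_vector(3 downto 0);
      f_exp : std_logic_vector(7 downto 0); tag : string) is
    begin
      sign1 <= s1;  exp1 <= e1;  frac1 <= f1;
      sign2 <= s2;  exp2 <= e2;  frac2 <= f2;
      wait for 10 ns;
      assert sign_out = s_exp and exp_out = e_exp and frac_out = f_exp
        report tag & ": unexpected result" severity error;
    end procedure;
  begin
    -- 6 - 2.5 = 3.5: one leading zero after the subtraction
    check('0', "0011", "11000000", '1', "0010", "10100000",
          '0', "0010", "11100000", "6 - 2.5");
    -- 12.75 + 12.75 = 25.5: carry-out, exponent incremented
    check('0', "0100", "11001100", '0', "0100", "11001100",
          '0', "0101", "11001100", "12.75 + 12.75");
    -- 0.75 - 0.5 = 0.25: too small to normalize, flushed to zero
    check('0', "0000", "11000000", '1', "0000", "10000000",
          '0', "0000", "00000000", "0.75 - 0.5");
    report "fp_adder_tb: all checks done";
    wait;
  end process;
end sim;
```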
Testing Circuit – Part 1

The floating-point adder needs 13-bit operands, so the two inputs require 26 bits in total. The S3 board provides only 8 switches (sw) and 4 buttons (btn), which is not enough physical inputs to test the circuit. We therefore assign constants or duplicated switch signals to the adder's inputs. The addition result is passed to hexadecimal decoders and shown on the 7-segment LEDs:
- The 4-bit exponent is displayed on the rightmost LED.
- The 4 LSBs of the mantissa are displayed to the left of the exponent.
- The 4 MSBs of the mantissa are displayed to the left of the LSBs.
- The sign is displayed on the leftmost LED.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity fp_adder_test is
  port (
    clk  : in  std_logic;  -- used by the 7-segment LED time-multiplexing module
    sw   : in  std_logic_vector(7 downto 0);
    btn  : in  std_logic_vector(3 downto 0);
    an   : out std_logic_vector(3 downto 0);
    sseg : out std_logic_vector(7 downto 0)
  );
end fp_adder_test;
```
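The test circuit instantiates a `hex_to_sseg` decoder, whose source is not listed in these slides. Below is a minimal sketch of such a decoder. It assumes the port names used in the test circuit's port maps (`hex`, `dp`, `sseg`) and active-low segments. It also assumes the bit order sseg(7) = decimal point and sseg(6 downto 0) = segments a through g. That bit order is an assumption, chosen so that the "11111110" pattern used for the sign lights only the middle bar (segment g).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch of a hex-to-7-segment decoder (assumed layout, active low):
-- sseg(7) = decimal point, sseg(6 downto 0) = segments a, b, c, d, e, f, g
entity hex_to_sseg is
  port (
    hex  : in  std_logic_vector(3 downto 0);
    dp   : in  std_logic;
    sseg : out std_logic_vector(7 downto 0)
  );
end hex_to_sseg;

architecture arch of hex_to_sseg is
begin
  with hex select
    sseg(6 downto 0) <=
      "0000001" when "0000",  -- 0
      "1001111" when "0001",  -- 1
      "0010010" when "0010",  -- 2
      "0000110" when "0011",  -- 3
      "1001100" when "0100",  -- 4
      "0100100" when "0101",  -- 5
      "0100000" when "0110",  -- 6
      "0001111" when "0111",  -- 7
      "0000000" when "1000",  -- 8
      "0000100" when "1001",  -- 9
      "0001000" when "1010",  -- A
      "1100000" when "1011",  -- b
      "0110001" when "1100",  -- C
      "1000010" when "1101",  -- d
      "0110000" when "1110",  -- E
      "0111000" when others;  -- F
  sseg(7) <= dp;  -- decimal point passed through unchanged
end arch;
```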
Testing Circuit – Part 2

```vhdl
architecture arch of fp_adder_test is
  signal sign1, sign2           : std_logic;
  signal exp1, exp2             : std_logic_vector(3 downto 0);
  signal frac1, frac2           : std_logic_vector(7 downto 0);
  signal sign_out               : std_logic;
  signal exp_out                : std_logic_vector(3 downto 0);
  signal frac_out               : std_logic_vector(7 downto 0);
  signal led3, led2, led1, led0 : std_logic_vector(7 downto 0);
begin
  -- set up the fp adder input signals
  sign1 <= '0';
  exp1  <= "1000";
  frac1 <= '1' & sw(1) & sw(0) & "10101";
  sign2 <= sw(7);
  exp2  <= btn;
  frac2 <= '1' & sw(6 downto 0);

  -- instantiate fp adder
  fp_add_unit: entity work.fp_adder
    port map (
      sign1 => sign1, sign2 => sign2, exp1 => exp1, exp2 => exp2,
      frac1 => frac1, frac2 => frac2,
      sign_out => sign_out, exp_out => exp_out,
      frac_out => frac_out
    );
```
Testing Circuit – Part 3

```vhdl
  -- instantiate three hex decoders
  -- exponent is shown on the rightmost LED
  sseg_unit_0: entity work.hex_to_sseg
    port map (hex => exp_out, dp => '0', sseg => led0);
  -- 4 LSBs of fraction
  sseg_unit_1: entity work.hex_to_sseg
    port map (hex => frac_out(3 downto 0), dp => '1', sseg => led1);
  -- 4 MSBs of fraction
  sseg_unit_2: entity work.hex_to_sseg
    port map (hex => frac_out(7 downto 4), dp => '0', sseg => led2);
  -- sign
  led3 <= "11111110" when sign_out = '1' else  -- middle bar
          "11111111";                          -- blank

  -- instantiate 7-segment LED display time-multiplexing module
  disp_unit: entity work.disp_mux
    port map (
      clk => clk, reset => '0',
      in0 => led0, in1 => led1, in2 => led2, in3 => led3,
      an => an, sseg => sseg
    );
end arch;
```
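This example traces one board setting: all switches off (sw = "00000000") and btn = "1000". The values are hand-computed from the input assignments above, not measured on a board.

$$
\begin{aligned}
\text{op1} &= +0.10010101_2 \times 2^{8} = 149, \qquad \text{op2} = +0.10000000_2 \times 2^{8} = 128 \\
\text{sum} &= 10010101_2 + 10000000_2 = 1\,00010101_2 \quad (\text{carry-out}) \\
\text{result} &= +0.10001010_2 \times 2^{9} = 276
\end{aligned}
$$

From left to right, the display would show a blank sign digit, then 8 and A (the fraction), then 9 (the exponent). The result is 276 rather than 277 because the bit shifted out on the carry-out is discarded, since round-off is ignored.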