
CDA 4253/CIS 6930 FPGA System Design: RTL Design Methodology, Hao Zheng



  1. CDA 4253/CIS 6930 FPGA System Design: RTL Design Methodology. Hao Zheng, Comp Sci & Eng, Univ of South Florida

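  2. Structure of a Typical Digital Design
      [Block diagram: a Datapath (Execution Unit) and a Controller (Control Unit). The datapath receives Data Inputs and produces Data Outputs; the controller receives Control Inputs and produces Control Outputs. The controller drives the datapath with Control Signals, and the datapath reports back through Status Signals.]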

  3. Hardware Design with RTL VHDL
      [Block diagram. Data path: RF/scratch pad, memory, ALU, MUL, ..., connected by Bus 1, Bus 2, and Bus 3. Controller: state register (SR), next-state logic, and output logic. Signals shown: control inputs, control signals, status signals, control outputs.]

  4. Steps of the Design Process
      1. Text description
      2. Define interface
      3. Describe the functionality using pseudo-code
      4. Convert pseudo-code to FSM in state diagram
         1. Define states and state transitions
         2. Define datapath operations in each state
      5. Develop VHDL code to implement FSM
      6. Develop testbench for simulation and debugging
      7. Implementation and timing simulation
         • Timing simulation can reveal more bugs than pre-synthesis simulation
      8. Test the implementation on FPGA boards

  5. Min_Max_Average

  6. Pseudocode
      Input: M[i]
      Outputs: max, min, average

      max = 0
      min = MAX    // the maximal constant
      sum = 0
      for i=0 to 31 do
          d = M[i];
          sum = sum + d
          if (d < min) then
              min = d
          endif
          if (d > max) then
              max = d
          endif
      endfor
      average = sum/32

      Data M[i] are stored in memory. Results are stored in the internal registers.
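
The pseudocode above can be checked against a small software reference model before any VHDL is written. The following is a minimal sketch in Python, not part of the original slides; the function name `min_max_average` and the parameter `max_const` (standing in for the constant MAX, assumed here to be the largest value the n-bit data bus can carry) are illustrative assumptions.

```python
def min_max_average(mem, max_const):
    """Software reference model of the Min_Max_Average pseudocode.

    mem       -- list of 32 non-negative integers (the memory M)
    max_const -- ASSUMPTION: stands in for the constant MAX, e.g. 2**n - 1
    """
    assert len(mem) == 32, "the pseudocode iterates over exactly 32 words"
    maximum = 0
    minimum = max_const
    total = 0
    for d in mem:              # for i = 0 to 31: d = M[i]
        total = total + d      # sum = sum + d
        if d < minimum:
            minimum = d
        if d > maximum:
            maximum = d
    average = total >> 5       # sum/32 as a 5-bit right shift (truncating)
    return maximum, minimum, average


if __name__ == "__main__":
    data = list(range(10, 42))             # 32 sample values: 10 .. 41
    print(min_max_average(data, max_const=255))
```

Dividing by 32 is modelled as a right shift because that is how it would typically be done in hardware; the result is truncated, not rounded.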

  7. Circuit Interface
      [Block diagram of MIN_MAX_AVR with ports clk, reset, in_data (n bits), in_addr (5 bits), write, start, done, out_data (n bits), and out_addr (2 bits).]

  8. Interface Table

      Port      Width  Meaning
      clk       1      System clock
      reset     1      System reset – clears internal registers
      in_data   n      Input data bus
      in_addr   5      Address of the internal memory where input data is stored
      write     1      Synchronous write control signal – validity of in_data
      start     1      Starts the computations
      done      1      Asserted when all results are ready
      out_data  n      Output data bus used to read results
      out_addr  2      01 – reading minimum; 10 – reading maximum; 11 – reading average
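
The out_addr encoding in the table amounts to a small read multiplexer on the result registers. A minimal Python sketch of that selection follows; the function name `read_result` is hypothetical, and the value returned for code 00 is an assumption, since the table does not define it.

```python
def read_result(out_addr, minimum, maximum, average):
    """Model of the out_data read mux selected by the 2-bit out_addr."""
    if out_addr == 0b01:
        return minimum         # 01: reading minimum
    if out_addr == 0b10:
        return maximum         # 10: reading maximum
    if out_addr == 0b11:
        return average         # 11: reading average
    # ASSUMPTION: code 00 is not defined in the interface table;
    # returning 0 is only a placeholder.
    return 0


if __name__ == "__main__":
    for code in (0b01, 0b10, 0b11):
        print(format(code, "02b"), read_result(code, 10, 41, 25))
```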

  9. Datapath
      [Pseudocode repeated from slide 6, with min initialized to the constant MAX.]

  10. Datapath
      [Pseudocode repeated from slide 6, next to the datapath diagram: an adder (+) accumulates d into the sum register; two comparators (<, >) compare d with min and d with max, and muxes select the next values of the min and max registers; a /32 block derives average from sum. Outputs: max, min, average.]

  11. State Diagram for Controller
      [Pseudocode repeated from slide 6.]

  12. State Diagram for Controller
      [Pseudocode repeated from slide 6, next to the state diagram.]
      States: init, run, end
      • init, start=0 / stay in init
      • init to run, start=1 / rst<=1, done<=0
      • run to run, i < 32 / i++
      • run to end, i==32 / done<=1
      Output logic: in_addr <= i; out_data <= ...
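
The three-state controller and the datapath it drives can be exercised together in software. The sketch below is a Python model written for illustration, not the course's implementation. It assumes that start is asserted in the first cycle, that the memory read is combinational (M[i] is available in the same cycle in which in_addr <= i is driven), and that MAX is 255; the name `run_min_max_avr` is hypothetical.

```python
def run_min_max_avr(mem, max_const=255):
    """Cycle-by-cycle model of the init/run/end controller plus datapath."""
    state = "init"
    i = 0
    total, minimum, maximum = 0, max_const, 0
    done = 0
    cycles = 0
    start = 1                      # ASSUMPTION: start asserted immediately
    while state != "end":
        cycles += 1
        if state == "init":
            if start:
                # start=1 / rst<=1, done<=0: clear the datapath registers
                i, total, minimum, maximum = 0, 0, max_const, 0
                done = 0
                state = "run"
        elif state == "run":
            if i < 32:
                d = mem[i]         # in_addr <= i (ASSUMPTION: combinational read)
                total += d
                minimum = min(minimum, d)
                maximum = max(maximum, d)
                i += 1             # i < 32 / i++
            else:
                done = 1           # i==32 / done<=1
                state = "end"
    return {"max": maximum, "min": minimum, "average": total >> 5,
            "done": done, "cycles": cycles}


if __name__ == "__main__":
    print(run_min_max_avr(list(range(10, 42))))
```

A synchronous (registered) memory would add read latency that this model does not capture, so the cycle count it reports applies only under the stated assumptions.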

  13. Sorting

  14. Sorting - Example

                Before   |------------ During sorting ------------|   After
                sorting   i=0    i=0    i=0    i=1    i=1    i=2     sorting
      Addr                j=1    j=2    j=3    j=2    j=3    j=3
       0          3        3      2      2      1      1      1        1
       1          2        2      3      3      3      3      2        2
       2          4        4      4      4      4      4      4        3
       3          1        1      1      1      2      2      3        4

      Legend: in each column the slide marks the position of memory indexed by i (M[i]) and the position of memory indexed by j (M[j]).

  15. Pseudocode

      for i=0 to k-2 do
          A = M[i]
          for j=i+1 to k-1 do
              B = M[j]
              if A > B then
                  M[i] = B
                  M[j] = A
                  A = B
              end if
          end for
      end for

      k is a constant, the number of integers to be sorted in memory. M denotes memory. Memory address is either i or j.
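
The sorting pseudocode translates almost line for line into a software reference model, which is useful later as a golden model for the testbench. The Python sketch below is illustrative only; the function name `sort_in_memory` and the optional `trace` list (which records the memory contents before each comparison) are additions that do not appear in the slides.

```python
def sort_in_memory(M, trace=None):
    """Reference model of the sorting pseudocode; sorts the list M in place.

    trace -- optional list; if given, (i, j, snapshot of M) is appended
             before each comparison (hypothetical debugging aid).
    """
    k = len(M)
    for i in range(0, k - 1):          # for i = 0 to k-2
        A = M[i]
        for j in range(i + 1, k):      # for j = i+1 to k-1
            if trace is not None:
                trace.append((i, j, list(M)))
            B = M[j]
            if A > B:
                M[i] = B
                M[j] = A
                A = B                  # keep register A equal to M[i]
    return M


if __name__ == "__main__":
    steps = []
    print(sort_in_memory([3, 2, 4, 1], trace=steps))
    for i, j, snapshot in steps:
        print(f"i={i} j={j} {snapshot}")
```

The assignment A = B after a swap matters: A is a register copy of M[i], so it has to follow the new value written to M[i], otherwise later comparisons in the inner loop would use a stale value.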

  16. Sorting – Interface
      [Block diagram: a Sort block connected to a Memory block. Signals shown: clock, reset, start, done, din (N bits), dout (N bits), addr (k), we.]

  17. Sorting – Datapath
      [Pseudocode repeated from slide 15.]
      • Registers to hold A, B
      • Memory addresses i and j
      • Incrementor
      • Comparator

  18. Sorting – Datapath
      [Pseudocode repeated from slide 15, next to the index-register part of the datapath: register Ri holds i, with an enable input and a +1 incrementer; register Rj holds j, with a +1 incrementer and a mux controlled by sel1.]

  19. Sorting – Datapath
      [Pseudocode repeated from slide 15, next to the data part of the datapath: a mux selects i or j as the memory address addr; registers RA and RB hold A and B and are loaded from din; a mux selects A or B as dout. Mux selects shown: sel2, sel3.]

  20. Sorting – Datapath
      [Pseudocode repeated from slide 15, next to the status-signal logic: three comparators take (j, k-1), (B, A), and (i, k-2) as inputs and produce the status signals end_j, AgtB, and end_i.]

  21. Sorting – Controller
      [Pseudocode repeated from slide 15.]
      • Nested loops by two FSMs: one for the outer loop controls the one for the inner loop.
      • Reuse the FSM for the single for loop in the previous example.

  22. Sorting – Controller
      [Pseudocode repeated from slide 15, next to the state diagram.]
      States: init, outer, inner, end
      • init, start=0 / stay in init
      • init to outer, start=1 / rst<=1, i<=0, done<=0
      • outer to end, end_i=1 / done<=1
      • outer to inner, end_i=0 / we <= 0, sel2 <= 0, sel3 <= 0, ...
      • inner to inner, end_j=0 / j++; …
      • inner to outer, end_j=1 / i++;

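  23. Behavioral Level Design
      [Block diagram: combinational logic takes the inputs and the current register value reg and produces reg_next and the output; a register clocked by clk loads reg_next.]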

  24. FSMD
      [Pseudocode repeated from slide 15.]

  25. FSMD
      [Side by side: the for-loop pseudocode from slide 15 and its rewrite as while loops with an explicit addr variable and line numbers 1 to 19. The rewritten listing appears on its own on slide 26.]

  26. FSMD

       1  i = 0;
       2  while i < k-1 do
       3      addr = i
       4      A = M[addr]
       5      j = i+1
       6      while j < k do
       7          addr = j
       8          B = M[addr]
       9          if A > B then
      10              addr = i
      11              M[addr] = B
      12              addr = j
      13              M[addr] = A
      14              A = B
      15          end if
      16          j = j+1
      17      end while
      18      i = i+1;
      19  end while

  27. FSMD

       1  i = 0;
       2  while i < k-1 do
       3      addr = i
       4      A = M[addr]
       5      j = i+1;
       6      while j < k do
       7          j = j+1
       8          addr = j
       9          B = M[addr]
      10          if A > B then
      11              addr = i
      12              M[addr] = B
      13              addr = j
      14              M[addr] = A
      15              A = B
      16          end if
      17      end while
      18  end while

      Each state is named after the line number it corresponds to.

      Current State  Next State  Cond        Operations
      1              2           start='1'   i <= 0
      2              3           i < k-1     null
      2              18          !(i<k-1)    done <= '1'
      3              6           true        addr <= i, A <= M[addr]; j <= i+1;
      6              7           j < k       null
      6              17          !(j<k)      null
      7              10          true        j++; addr <= j; B <= M[addr];
      10             16          A > B       addr <= i; M[addr] <= B;
      10             16          !(A > B)    null
      16             6           true        null
      17             2           true        null
      ...            ...         ...         ...

  28. FSMD
      [Pseudocode with line numbers 1 to 19, as on slide 26.]

      Current State  Next State  Cond        Operations
      s0             s1          start='1'   i <= 0
      s1             s2          i < k-1     addr <= i, A <= M[addr]; j <= i+1;
      s1             s0          !(i<k-1)    done <= '1'
      s2             s3          j < k       addr <= j; B <= M[addr];
      s2             s1          !(j<k)      i <= i+1
      s3             s2          A > B       addr <= i; M[addr] <= B; addr <= j; M[addr] <= A; A <= B; j <= j+1;
      s3             s2          !(A > B)    j <= j+1;
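
The four-state table above can be sanity-checked by running it as a small interpreter. The Python sketch below is an illustration under explicit assumptions, not the course's implementation: the operations listed for one transition are executed in order within a single step (the table is treated as an abstract FSMD, not as registers updating in parallel on a clock edge), memory reads and writes complete within the step, and start is held at '1'. The name `run_sort_fsmd` is hypothetical.

```python
def run_sort_fsmd(M):
    """Execute the s0..s3 state table on the list M; returns (M, steps)."""
    k = len(M)
    state = "s0"
    i = j = addr = A = B = 0
    done = 0
    start = 1                       # ASSUMPTION: start is held at '1'
    steps = 0                       # number of state transitions taken
    while not done:
        steps += 1
        if state == "s0":
            if start:
                i = 0
                state = "s1"
        elif state == "s1":
            if i < k - 1:
                addr = i
                A = M[addr]
                j = i + 1
                state = "s2"
            else:
                done = 1
                state = "s0"
        elif state == "s2":
            if j < k:
                addr = j
                B = M[addr]
                state = "s3"
            else:
                i = i + 1
                state = "s1"
        elif state == "s3":
            if A > B:
                addr = i
                M[addr] = B         # first memory write
                addr = j
                M[addr] = A         # second memory write
                A = B
            j = j + 1
            state = "s2"
    return M, steps


if __name__ == "__main__":
    print(run_sort_fsmd([3, 2, 4, 1]))
```

In a real RTL implementation the s3 row with A > B performs two memory writes at different addresses, which a single-port memory cannot do in one clock cycle, so that row would likely need to be split into more states; the model here deliberately ignores that.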

  29. Optimization for Performance

  30. Performance Definitions
      • Throughput: the number of inputs processed per unit time.
      • Latency: the amount of time for an input to be processed.
      • Maximizing throughput and minimizing latency are in conflict.
      • Both require timing optimization:
        - Reduce delay of the critical path

  31. Achieving High Throughput: Pipelining
      • Divide data processing into stages
      • Process different data inputs in different stages simultaneously.

      Software version:

          xpower = 1;
          for (i = 0; i < 3; i++)
              xpower = x * xpower;

      Iterative (non-pipelined) hardware version:

          process (clk)
          begin
              if rising_edge(clk) then
                  if start='1' then
                      cnt <= 3;
                      done <= '0';
                  elsif cnt > 0 then
                      cnt <= cnt - 1;
                      xpower <= xpower * x;
                  elsif cnt = 0 then
                      done <= '1';
                  end if;
              end if;
          end process;

      Throughput: 1 data / 3 cycles = 0.33 data / cycle.
      Latency: 3 cycles.
      Critical path delay: 1 multiplier delay
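
The throughput figure for the iterative version can be reproduced with a small cycle-counting model. The Python sketch below is illustrative only. It assumes xpower is set to 1 when a new input is accepted (as in the software version; the VHDL fragment does not show this), and it counts only the three multiply cycles per input, ignoring the start and done handshake cycles, as the slide's figures appear to do. The name `iterative_cube` is hypothetical.

```python
def iterative_cube(inputs):
    """Model of the iterative x**3 unit: one multiplier reused for 3 cycles."""
    results = []
    cycles = 0                      # multiply cycles only (handshake ignored)
    for x in inputs:
        cnt = 3
        xpower = 1                  # ASSUMPTION: xpower starts at 1 per input
        while cnt > 0:              # elsif cnt > 0 then ...
            xpower = xpower * x     # xpower <= xpower * x
            cnt = cnt - 1           # cnt <= cnt - 1
            cycles += 1
        results.append(xpower)
    return results, cycles


if __name__ == "__main__":
    results, cycles = iterative_cube([2, 3, 4])
    print(results, cycles, "throughput =", len(results) / cycles, "data/cycle")
```

Because the single multiplier is busy for three cycles per input, a new input can only be accepted every third cycle, which is where the 1 data / 3 cycles figure comes from.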

  32. Achieving High Throughput: Pipelining

      Software version:

          xpower = 1;
          for (i = 0; i < 3; i++)
              xpower = x * xpower;

      Pipelined hardware version:

          process (clk, rst)
          begin
              if rising_edge(clk) then
                  if start='1' then    -- stage 1
                      x1 <= x;
                      xpower1 <= x;
                      done1 <= start;
                  end if;
                  -- stage 2
                  x2 <= x1;
                  xpower2 <= xpower1 * x1;
                  done2 <= done1;
                  -- stage 3
                  xpower <= xpower2 * x2;
                  done <= done2;
              end if;
          end process;

      Throughput: 1 data / cycle
      Latency: 3 cycles + register delays.
      Critical path delay: 1 multiplier delay
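
The pipelined version can be modelled in the same cycle-counting style to show where the one-result-per-cycle throughput comes from. The Python sketch below is an illustration, not a translation of the VHDL: it assumes a new input is presented on every cycle, and it replaces the done1/done2/done chain with a simple valid marker (None represents an empty stage) that follows each input through the pipeline. The name `pipelined_cube` is hypothetical.

```python
def pipelined_cube(inputs):
    """Model of the 3-stage x**3 pipeline; returns a list of (cycle, result)."""
    s1 = None                       # stage 1 registers: (x1, xpower1)
    s2 = None                       # stage 2 registers: (x2, xpower2)
    s3 = None                       # stage 3 register:  xpower
    outputs = []
    stream = list(inputs) + [None, None]    # two extra cycles to drain
    for cycle, x in enumerate(stream, start=1):
        # All registers update on the same clock edge, so the next values
        # are computed from the current ones before any is overwritten.
        next_s3 = None if s2 is None else s2[1] * s2[0]       # xpower2 * x2
        next_s2 = None if s1 is None else (s1[0], s1[1] * s1[0])
        next_s1 = None if x is None else (x, x)               # xpower1 <= x
        s1, s2, s3 = next_s1, next_s2, next_s3
        if s3 is not None:
            outputs.append((cycle, s3))
    return outputs


if __name__ == "__main__":
    for cycle, value in pipelined_cube([2, 3, 4]):
        print(f"cycle {cycle}: {value}")
```

Each input still needs three clock edges to reach the output, so latency is unchanged, but after the pipeline fills a result emerges on every cycle because the multiplications of different inputs overlap. Stage 1 only registers x, so the design uses two multipliers in place of the single reused one in the iterative version.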

  33. Achieving High Throughput: Pipelining
      • Divide data processing into stages
      • Process different data inputs in different stages simultaneously.
      [Diagram: pipeline stages between din and dout.]
