
Implementing a MIPS processor using SME - Carl-Johannes Johnsen



  1. Implementing a MIPS processor using SME
     Carl-Johannes Johnsen, Department of Computer Science, University of Copenhagen
     August 22, 2017

  2. Background
     The Machine Architecture class at DIKU teaches the theory of computer organization and design. However, it does not teach how to construct specialized hardware, such as one might implement on an FPGA. For some applications, FPGAs are more attractive than general-purpose CPUs: because they are less complex, they do not necessarily carry the same overhead in performance and power usage. However, FPGAs are programmed using Hardware Description Languages, which are very tedious to write. This has changed with SME.

  3. Introduction
     The SME programming model is similar to the CSP model, except that it is globally synchronous and has broadcasting channels and a hidden clock. This makes it more suitable than CSP for generating hardware models. Additionally, SME can be transpiled into VHDL, which can then be synthesized for an FPGA. I am implementing a MIPS processor, as taught in Machine Architecture, using SME, and documenting the process so that it can be used as teaching material for a course on hardware development.
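
The globally synchronous behaviour with a hidden clock can be illustrated outside SME. The C sketch below is not SME code and does not use the SME library. It assumes a simple double-buffered model in which a value written during one tick becomes readable on the next tick. The names `Bus`, `bus_write`, `bus_read` and `bus_tick` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one clocked bus signal: the writer updates `next`,
   readers see `current`, and the hidden clock copies next into current. */
typedef struct {
    uint32_t current; /* value visible to every reader during this tick */
    uint32_t next;    /* value staged by the writer for the next tick   */
} Bus;

/* Stage a value; it is not visible to readers until the clock ticks. */
void bus_write(Bus *bus, uint32_t value) { bus->next = value; }

/* Broadcast read: any number of processes may read the same value. */
uint32_t bus_read(const Bus *bus) { return bus->current; }

/* Called once per clock tick, after all processes have run. */
void bus_tick(Bus *bus) { bus->current = bus->next; }
```

Because every process reads `current` and writes `next`, the order in which the processes run within a tick does not affect the result in this model, which is the property that makes such a design map naturally onto clocked hardware.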

  4. Basic combinatorial
     I started by implementing some basic combinatorial circuits, as this was a simple way to approach SME. The first one I made consisted of four processes, which simulate four basic gates: AND, NOT, OR and XOR.
     [Figure: a Tester process connected through input and output buses to the AND, OR, NOT and XOR processes.]

  5. Basic combinatorial - Full adder example

         // A bus carrying a single bit
         public interface InputA : IBus {
             bool bit { get; set; }
         }
         ...
         public class AndGate : SimpleProcess {
             [InputBus] InputB input1;
             [InputBus] InputC input2;
             [OutputBus] Internal3 output;

             // Runs once per clock tick
             protected override void OnTick() {
                 output.bit = input1.bit && input2.bit;
             }
         }

         public class OrGate : SimpleProcess {
             ...
             protected override void OnTick() {
                 output.bit = input1.bit || input2.bit;
             }
         }

         public class XorGate : SimpleProcess {
             ...
             protected override void OnTick() {
                 output.bit = input1.bit ^ input2.bit;
             }
         }

     [Figure: full adder built from two XOR gates, two AND gates and one OR gate, with inputs InputA, InputB and InputC and outputs Sum and Carry.]
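
The logic of the full adder can be checked independently of SME. The following C sketch assumes the standard full adder wiring (the slide's figure does not spell out every connection), and the names `and_gate`, `or_gate`, `xor_gate`, `AdderResult` and `full_adder` are hypothetical.

```c
#include <stdbool.h>

/* One-bit gates, mirroring the AND, OR and XOR processes. */
bool and_gate(bool a, bool b) { return a && b; }
bool or_gate(bool a, bool b)  { return a || b; }
bool xor_gate(bool a, bool b) { return a != b; }

typedef struct {
    bool sum;
    bool carry;
} AdderResult;

/* Full adder wired from two XOR, two AND and one OR gate
   (assumed standard wiring). */
AdderResult full_adder(bool a, bool b, bool c) {
    bool a_xor_b = xor_gate(a, b);
    AdderResult r;
    r.sum = xor_gate(a_xor_b, c);
    r.carry = or_gate(and_gate(a, b), and_gate(a_xor_b, c));
    return r;
}
```

Calling `full_adder` for all eight input combinations and comparing against the arithmetic sum of the three bits is a compact way to verify the wiring, much like the Tester process does in the SME network.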

  6. Components
     After getting some experience with SME, I started working on the MIPS processor. I started by implementing each of the components as an SME process, so that I could verify them individually.

         public interface ReadA : IBus {
             short addr { get; set; }
         }
         ...

         public interface WriteBus : IBus {
             bool enabled { get; set; }
             short addr { get; set; }
             uint data { get; set; }
         }

         public interface OutputA : IBus {
             uint data { get; set; }
         }
         ...

         public class Register : SimpleProcess {
             [InputBus] ReadA readA;
             [InputBus] ReadB readB;
             [InputBus] WriteBus write;

             [OutputBus] OutputA outputA;
             [OutputBus] OutputB outputB;

             uint[] data = new uint[32];

             protected override void OnTick() {
                 // Register 0 is never written, so it always reads as zero
                 if (write.enabled && write.addr > 0)
                     data[write.addr] = write.data;
                 outputA.data = data[readA.addr];
                 outputB.data = data[readB.addr];
             }
         }
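
To show how such a component can be verified on its own, here is a minimal C model of the same register file behaviour: the write happens before the reads, and register 0 is never written. It is a sketch only; the names `RegisterFile`, `ReadPorts` and `regfile_tick` are hypothetical, and the bounds handling is an addition that the SME code does not contain.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 32

typedef struct {
    uint32_t data[NUM_REGS];
} RegisterFile;

typedef struct {
    uint32_t a;
    uint32_t b;
} ReadPorts;

/* One tick: perform the optional write first, then read both ports,
   so a value written in this tick is already visible on the outputs. */
ReadPorts regfile_tick(RegisterFile *rf,
                       bool write_enabled, uint8_t write_addr,
                       uint32_t write_data,
                       uint8_t read_addr_a, uint8_t read_addr_b) {
    /* Register 0 is never written, so it keeps its initial value. */
    if (write_enabled && write_addr > 0 && write_addr < NUM_REGS)
        rf->data[write_addr] = write_data;

    ReadPorts out;
    /* Addresses are wrapped to the valid range (added safety check). */
    out.a = rf->data[read_addr_a % NUM_REGS];
    out.b = rf->data[read_addr_b % NUM_REGS];
    return out;
}
```

With a zero-initialized `RegisterFile`, writing 42 to register 5 while reading register 5 on port A yields 42 in the same tick, and a write to register 0 leaves it at zero.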

  7. Single Cycle
     With all of the components implemented and verified, 'wiring' up the processes is straightforward, as the buses just have to be named accordingly.
     [Figure: single-cycle datapath with the processes Jump unit, Control unit, PC, Instruction Memory, Splitter, Register file, ALU, Memory, Write buffer, Sign extend, ALU control and Clock.]
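
Two of the smaller components in the figure, the Splitter and Sign extend, can be sketched in a few lines. The C code below is not the author's SME implementation; it assumes the standard MIPS32 instruction field layout, and the names `Fields`, `split_instruction` and `sign_extend16` are hypothetical.

```c
#include <stdint.h>

/* Fields of a 32-bit MIPS instruction (standard MIPS32 layout). */
typedef struct {
    uint8_t  opcode; /* bits 31..26 */
    uint8_t  rs;     /* bits 25..21 */
    uint8_t  rt;     /* bits 20..16 */
    uint8_t  rd;     /* bits 15..11 */
    uint8_t  shamt;  /* bits 10..6  */
    uint8_t  funct;  /* bits  5..0  */
    uint16_t imm;    /* bits 15..0  (I-type immediate) */
    uint32_t target; /* bits 25..0  (J-type target)    */
} Fields;

/* Split an instruction word into all of its possible fields. */
Fields split_instruction(uint32_t instr) {
    Fields f;
    f.opcode = (uint8_t)((instr >> 26) & 0x3F);
    f.rs     = (uint8_t)((instr >> 21) & 0x1F);
    f.rt     = (uint8_t)((instr >> 16) & 0x1F);
    f.rd     = (uint8_t)((instr >> 11) & 0x1F);
    f.shamt  = (uint8_t)((instr >> 6) & 0x1F);
    f.funct  = (uint8_t)(instr & 0x3F);
    f.imm    = (uint16_t)(instr & 0xFFFF);
    f.target = instr & 0x03FFFFFFu;
    return f;
}

/* Sign-extend a 16-bit immediate to 32 bits. */
uint32_t sign_extend16(uint16_t imm) {
    return (imm & 0x8000u) ? (0xFFFF0000u | imm) : imm;
}
```

For example, the word `0x2008FFFF` (`addi $t0, $zero, -1`) splits into opcode 8, rs 0, rt 8 and immediate `0xFFFF`, which sign-extends to `0xFFFFFFFF`.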

  8. Extending the processor
     Then I wanted to extend the processor to handle additional instructions.
     [Figure: the single-cycle datapath from the previous slide (Jump unit, Control unit, PC, Instruction Memory, Splitter, Register file, ALU, Memory, Write buffer, Sign extend, ALU control and Clock), extended with a JAL component.]
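
The figure labels JAL (jump and link) as one of the additions. As a rough illustration of what that instruction has to do, the C sketch below models its effect on the program counter and the link register. It assumes a processor without branch delay slots, so the return address is PC + 4 (MIPS hardware with delay slots links PC + 8 instead). The names `Cpu` and `execute_jal` are hypothetical and this is not the author's SME code.

```c
#include <stdint.h>

#define RA 31 /* $ra, the link register */

typedef struct {
    uint32_t pc;
    uint32_t regs[32];
} Cpu;

/* Execute a J-type JAL instruction.
   Assumption: no branch delay slot, so the link value is PC + 4. */
void execute_jal(Cpu *cpu, uint32_t instr) {
    uint32_t target = instr & 0x03FFFFFFu; /* 26-bit word address */
    uint32_t next_pc = cpu->pc + 4;

    cpu->regs[RA] = next_pc; /* save the return address */
    /* Keep the upper 4 bits of PC + 4, append the target shifted by 2. */
    cpu->pc = (next_pc & 0xF0000000u) | (target << 2);
}
```

Compared with a plain jump, the only extra datapath work is the write of PC + 4 into register 31, which is why the addition shows up next to the write path in the figure.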

  9. Pipelining
     Following the procedure from the Machine Architecture class, I have pipelined the processor and handled the hazards introduced by pipelining with a hazard detection unit and a forwarding unit.
     [Figure: pipelined datapath with the stages IF, ID, EX, MEM and WB, showing Hazard Detection, Jump, Control, PC, Instruction Memory, Register File, ALU, Memory and Forwarding Unit.]
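
The decisions made by the two units can be sketched as small pure functions. The C code below follows the conditions usually taught for the classic five-stage MIPS pipeline; it is an assumption that the author's units use exactly these conditions, and the names `Forward`, `forward_select` and `load_use_stall` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    FWD_NONE = 0,    /* use the value read from the register file */
    FWD_FROM_WB = 1, /* take the value from the MEM/WB register   */
    FWD_FROM_MEM = 2 /* take the ALU result from the EX/MEM register */
} Forward;

/* Forwarding unit: choose the source for one ALU operand whose
   register number is ex_src. The newer result (EX/MEM) wins. */
Forward forward_select(uint8_t ex_src,
                       bool exmem_reg_write, uint8_t exmem_rd,
                       bool memwb_reg_write, uint8_t memwb_rd) {
    if (exmem_reg_write && exmem_rd != 0 && exmem_rd == ex_src)
        return FWD_FROM_MEM;
    if (memwb_reg_write && memwb_rd != 0 && memwb_rd == ex_src)
        return FWD_FROM_WB;
    return FWD_NONE;
}

/* Hazard detection unit: a load in EX followed by an instruction in ID
   that reads the loaded register requires a one-cycle stall. */
bool load_use_stall(bool idex_mem_read, uint8_t idex_rt,
                    uint8_t ifid_rs, uint8_t ifid_rt) {
    return idex_mem_read && (idex_rt == ifid_rs || idex_rt == ifid_rt);
}
```

Register 0 is excluded from forwarding because it always reads as zero, so a pending "write" to it must never override the operand.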

  10. Performance
      To test both the single cycle processor and the pipelined processor, I have written some programs in MIPS assembly. To verify both the number of executed clock ticks and the results of the programs, I have also run them in the MIPS simulator MARS. In the table, # CT is the number of clock ticks and CR is the clock rate.

                                        MARS                          SME
          Program                       # CT  time (ms)  CR (Hz)      # CT        time (ms)   CR (Hz)
          Towers of Hanoi, n = 5        719   585        ∼1229        720 - 1058  516 - 1190  ∼1395 - ∼889
          Quicksort, n = 8              483   582        ∼829         484 - 763   375 - 895   ∼1290 - ∼852
          Fib no optimization, n = 10   220   584        ∼376         221 - 251   191 - 356   ∼1157 - ∼753
          Fib forward, n = 10           98    586        ∼167         100 - 130   119 - 209   ∼840 - ∼622
          Fib hazard, n = 10            84    588        ∼142         86 - 126    113 - 212   ∼761 - ∼594
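
The CR column follows from the other two columns: it is the number of clock ticks divided by the elapsed time. A minimal C sketch of that calculation, with the hypothetical function name `clock_rate_hz`:

```c
/* Simulated clock rate in Hz, from a tick count and a time in ms. */
double clock_rate_hz(unsigned long ticks, double time_ms) {
    return (double)ticks / (time_ms / 1000.0);
}
```

For the MARS run of Towers of Hanoi, 719 ticks in 585 ms gives roughly 1229 Hz, which matches the table.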

  11. Synthesizing
      As mentioned, SME can be transpiled into VHDL, which can be further synthesized, placed and routed onto an FPGA.
      [Figure: workflow with the stages SME simulation, ghdl simulation, Vivado behavioral simulation, Vivado post-impl simulation, Export Hardware, AXI interface, Generate bitstream and SDK.]

  12. Synthesizing - Logic gates
      As when I started working with SME, I wanted to gain some experience with VHDL and Vivado by using a simple network first. I therefore started with the Logic Gates network: I mapped its top-level input and output wires to hardware switches and LEDs, and then generated the bitstream.

  13. Synthesizing - AXI interface
      Then I wanted to be able to communicate with the generated hardware. On the Zynq chip, this is possible through an AXI interface. The Zynq chip on the ZedBoard consists of a dual-core ARM processor and an FPGA, and the AXI interface allows the ARM processor to communicate with the FPGA. Vivado has an AXI interface template consisting of a set of registers, which are exposed to the ARM processor through peripheral memory.

  14. Synthesizing - AXI interface

          /* Assumed headers from the Xilinx SDK template, not shown on the slide */
          #include "platform.h"   /* init_platform(), cleanup_platform() */
          #include "xil_printf.h" /* xil_printf() */

          #include "xparameters.h"
          #include "LogicGates_AXI.h"
          #include "xil_io.h"

          /* Base address of the peripheral and offsets of its registers */
          int base = XPAR_LOGICGATES_AXI_0_S00_AXI_BASEADDR;
          int bit1 = LOGICGATES_AXI_S00_AXI_SLV_REG0_OFFSET;
          int bit2 = LOGICGATES_AXI_S00_AXI_SLV_REG1_OFFSET;
          int and = LOGICGATES_AXI_S00_AXI_SLV_REG2_OFFSET;
          int or = LOGICGATES_AXI_S00_AXI_SLV_REG3_OFFSET;
          int not = LOGICGATES_AXI_S00_AXI_SLV_REG4_OFFSET;
          int xor = LOGICGATES_AXI_S00_AXI_SLV_REG5_OFFSET;

          /* Print the two input bits followed by the four gate outputs */
          void print_regs() {
              xil_printf("%d %d | %d %d %d %d\n",
                  LOGICGATES_AXI_mReadReg(base, bit1),
                  LOGICGATES_AXI_mReadReg(base, bit2),
                  LOGICGATES_AXI_mReadReg(base, and),
                  LOGICGATES_AXI_mReadReg(base, or),
                  LOGICGATES_AXI_mReadReg(base, not),
                  LOGICGATES_AXI_mReadReg(base, xor));
          }

          /* Write the two input bits to the FPGA */
          void write_regs(int bit1_data, int bit2_data) {
              LOGICGATES_AXI_mWriteReg(base, bit1, bit1_data);
              LOGICGATES_AXI_mWriteReg(base, bit2, bit2_data);
          }

          int main() {
              init_platform();

              write_regs(0,0);
              print_regs();

              write_regs(0,1);
              print_regs();

              write_regs(1,0);
              print_regs();

              write_regs(1,1);
              print_regs();

              cleanup_platform();
              return 0;
          }

      Output (bit1 bit2 | and or not xor):

          0 0 | 0 0 1 0
          0 1 | 0 1 1 1
          1 0 | 0 1 0 1
          1 1 | 1 1 0 0

  15. Synthesis - Single Cycle
      Then I compared running the Towers of Hanoi program on the hardware implementation of the Single Cycle processor with running it in the SME simulation.

                    FPGA                     SME
                    # CT      time (ms)      # CT    time (ms)   ∼ Speedup
          n = 5     718       ∼0.1436        719     516         ×3593
          n = 10    22572     ∼4.5144        22574   13012       ×2882
          n = 20    23068776  ∼4613.7552     N/A     N/A         N/A
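
The FPGA times and the speedup column can be reproduced from the tick counts. The C sketch below assumes that the Single Cycle design runs at a 5 MHz clock, which is an inference: the listed FPGA times equal the tick counts divided by 5,000,000 per second. The names `run_time_ms` and `speedup` are hypothetical.

```c
/* Run time in ms for a number of clock ticks at a clock rate in Hz. */
double run_time_ms(unsigned long ticks, double clock_hz) {
    return (double)ticks / clock_hz * 1000.0;
}

/* How many times faster the fast run is compared to the slow run. */
double speedup(double slow_ms, double fast_ms) {
    return slow_ms / fast_ms;
}
```

For n = 5, `run_time_ms(718, 5e6)` gives about 0.1436 ms, and `speedup(516.0, 0.1436)` gives about 3593, in line with the first row of the table.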

  16. Synthesis - Pipelined
      I have not successfully implemented the Pipelined processor with the AXI interface. I do, however, have additional metrics on the different hardware implementations.

          Design             Clock rate (MHz)  Memory (kb)  Utilization  Power (W)
          Logic Gates        N/A               N/A          4 %          0.001
          Logic Gates AXI    100               0.02         1 %          0.006
          Single Cycle       5                 0.19         14 %         0.001
          Single Cycle AXI   5                 1            22 %         0.147
          Pipelined          68.98             0.19         24 %         0.019
          Pipelined BRAM     71.43             64           7 %          0.041

  17. Conclusion
      - Implemented a MIPS processor in SME
      - Extended the accepted instruction set
      - Pipelined the processor
      - Synthesized, placed and routed both processors

  18. Future Work
      - Add an AXI interface to the Pipelined processor
      - See how many cores can be fitted onto one FPGA
      - Make a superscalar processor
      - Run a minimal operating system
