3-3.1 Spiral 3-3 Single Cycle CPU
3-3.2 Learning Outcomes
• I understand how the single-cycle CPU datapath supports each type of instruction
• I understand why each mux is needed to select appropriate inputs to the datapath components
• I know how to design the control signals as a function of the type of instruction
3-3.3 Hardware vs. Software REVIEW
3-3.4 Sorting: Software Implementation
• To perform the algorithm in software means the processor fetches and executes instructions, which in turn cause it to read the data from memory and write it back into its sorted positions
• Sorting 64 elements on a 2.8 GHz Xeon processor
  – 16 microseconds
• Can we do better w/ more HW?
[Figure: processor connected to memory over the A, D, and C lines, alongside a custom (sort) HW block]
3-3.5 Sorting: Hardware Implementation
• Sorting 64 elements on a 2.8 GHz Xeon processor [SW only]
  – 16 microseconds
• Sorting 64 numbers in [old] custom HW
  – CLK period = 30 ns => 6 microseconds total
  – 30 ns is due to the 8 number HW sorter
  – Merging (Select-Val) stages are < 10 ns
  – Can we improve?
• What did we do to reduce CLK period in this design?
[Figure: inputs X0..X7 from memory feed an 8-number HW sorting network (30 ns) producing Y0..Y7, followed by three SelectVal merge stages (10 ns each). Each stage has FIFO/Queue pairs 1a/b and 2a/b, labeled 8, 16, and 32, and the final stage sends 64 values to memory at 1 per clock.]
3-3.6 Sorting: Final Comparison
• Sorting 64 elements on a 2.8 GHz Xeon processor [SW only]
  – 16 microseconds total time
• Sorting 64 numbers in [old] custom HW
  – CLK period = 30 ns => 6 microseconds total = ~2.5x speedup
• Sorting 64 numbers in [old] pipelined HW
  – CLK period = 10 ns => 2 microseconds total = ~8x speedup
  – Processor is freed to do other work
[Figure: processor connected to memory over the A, D, and C lines, alongside a custom (sort) HW block]
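The speedup figures above are ratios of total run times. A minimal sketch of that arithmetic follows, using only the times quoted on the slide; the function name `speedup` is an illustrative assumption.

```python
def speedup(t_baseline_us, t_improved_us):
    """Speedup = baseline time / improved time (both in microseconds)."""
    return t_baseline_us / t_improved_us

SW_ONLY_US = 16.0      # software sort on the Xeon, from the slide
CUSTOM_HW_US = 6.0     # 30 ns clock design, from the slide
PIPELINED_HW_US = 2.0  # 10 ns clock design, from the slide

custom_speedup = speedup(SW_ONLY_US, CUSTOM_HW_US)        # about 2.7
pipelined_speedup = speedup(SW_ONLY_US, PIPELINED_HW_US)  # exactly 8.0
```

The first ratio works out to roughly 2.7, which the slide quotes more loosely as ~2.5x.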
3-3.7 Building hardware to execute software GENERAL PURPOSE HARDWARE
3-3.8 CPU Organization Scope
• We will build a CPU to implement our subset of the MIPS ISA
  – Memory Reference Instructions:
    • Load Word (LW)
    • Store Word (SW)
  – Arithmetic and Logic Instructions:
    • ADD, SUB, AND, OR, SLT
  – Branch and Jump Instructions:
    • Branch if equal (BEQ)
    • Jump unconditional (J)
• These basic instructions exercise a majority of the necessary datapath and control logic for a more complete implementation
3-3.9 Single-Cycle CPU Datapath
[Figure: complete single-cycle datapath. PC -> I-Cache -> Register File -> ALU -> D-Cache, with a PC+4 adder, a Sign Extend unit (16 to 32 bits), a Shift Left 2 unit, and a branch-target adder. Instruction fields: [31:26] to Control; [25:21] to Read Reg. 1 #; [20:16] to Read Reg. 2 #; [20:16] or [15:11] to Write Reg. # through the RegDst mux; [15:0] to Sign Extend; INST[5:0] to ALU control. Control signals: RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp[1:0]. PCSrc selects between PC+4 and the branch target.]
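The control unit in this datapath is a function of the opcode alone. The sketch below writes that function as a lookup table for the R-type, LW, SW, and BEQ cases. The slide does not list the signal values, so the table assumes the conventional textbook settings and mux polarities (for example, MemtoReg = 1 selects the D-Cache read data). J is left out because the figure shows no jump control signal. The names `OPCODES`, `CONTROL`, and `control_signals` are illustrative.

```python
# MIPS opcode values (INST[31:26]) for the instruction classes in this subset.
OPCODES = {"R-type": 0x00, "LW": 0x23, "SW": 0x2B, "BEQ": 0x04}

# None marks a "don't care": the signal cannot affect the result for that
# instruction because RegWrite is 0. Values are an assumed textbook setting.
CONTROL = {
    0x00: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
               MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0x23: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
               MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0x2B: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
               MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0x04: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
               MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

def control_signals(opcode):
    """Return the control signal settings for a 6-bit opcode."""
    return CONTROL[opcode]

lw_signals = control_signals(OPCODES["LW"])
beq_signals = control_signals(OPCODES["BEQ"])
```

Only LW asserts MemRead and MemtoReg together, and only SW and BEQ leave RegWrite at 0, which matches the later slides on why the register file needs a write enable.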
3-3.10 Fetch
• Address in PC is used to fetch the instruction while it is also incremented by 4 to point to the next instruction
• Remember, the PC doesn’t update until the end of the clock cycle / beginning of the next cycle
• Mux provides a path for branch target addresses
[Figure: fetch datapath and timing diagram. The PC (0x00400018) addresses the I-Cache, which returns 0x012a8020 = ADD $16,$9,$10 (opcode 000000, rs 01001, rt 01010, rd 10000, shamt 00000, func 100000). An adder computes PC+4 = 0x0040001c, and a mux chooses between PC+4 and the branch target. Timing: PC steps through 0x400014, 0x400018, 0x40001c while PC+4 steps through 0x400018, 0x40001c, 0x400020.]
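The fetch step can be sketched as a memory lookup plus an independent PC+4 adder. The instruction memory here is modeled as a Python dict holding the single instruction from the figure; `IMEM` and `fetch` are illustrative names, not part of the slides.

```python
# Instruction memory modeled as address -> 32-bit word.
# The one entry is the ADD $16,$9,$10 word shown in the figure.
IMEM = {0x00400018: 0x012A8020}

def fetch(pc, imem):
    """Read the instruction at PC and compute PC+4 in parallel."""
    instruction = imem[pc]
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF  # 32-bit adder, wraps on overflow
    return instruction, pc_plus_4

instruction, pc_plus_4 = fetch(0x00400018, IMEM)
```

In hardware both results are available during the cycle, but the PC register itself only captures its new value at the next clock edge.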
3-3.11 Decode
• Opcode and func. fields are decoded to produce the other control signals
• Execution of an ALU instruction (ADD $3,$1,$2) requires reading 2 register values and writing the result to a third
• RegWrite is an enable signal indicating the write data should be written to the specified register
• Register File is the collection of GPRs. Our register file has 3 “ports” (port = ability to concurrently read or write a register). To see why we need 3, consider an “ADD $3,$1,$2”. We need 2 read ports to read the two operands (i.e. $1 + $2) and 1 write port for the result ($3)
[Figure: the instruction word for ADD $3,$1,$2 split into fields: opcode 000000 (to Control Logic, which produces the control signals), rs 00001 (Read Reg. 1 #), rt 00010 (Read Reg. 2 #), rd 00011 (Write Reg. #), shamt 00000, func 100000. The Register File outputs the values of $1 and $2 on Read data 1 and Read data 2; Write Data carries the result from the add, gated by CLK and RegWrite.]
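Splitting the instruction word into its fields is just shifting and masking. The sketch below rebuilds the ADD $3,$1,$2 word from the field bits shown in the figure and then extracts the fields again; `decode_fields` is an illustrative name.

```python
def decode_fields(word):
    """Split a 32-bit R-format instruction word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits [31:26]
        "rs":     (word >> 21) & 0x1F,  # bits [25:21]
        "rt":     (word >> 16) & 0x1F,  # bits [20:16]
        "rd":     (word >> 11) & 0x1F,  # bits [15:11]
        "shamt":  (word >> 6) & 0x1F,   # bits [10:6]
        "func":   word & 0x3F,          # bits [5:0]
    }

# ADD $3,$1,$2 assembled from the field bits on the slide
# (adjacent string literals are concatenated by Python).
add_word = int("000000" "00001" "00010" "00011" "00000" "100000", 2)
fields = decode_fields(add_word)
```

Here `rs` and `rt` drive the two read ports and `rd` drives the write port, which is why three ports are needed.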
3-3.12 Register File
• 32 registers each storing 32 bits
• Read registers => Muxes to choose desired value
• Write register => Decoder and registers w/ enable
• Each mux chooses which register value to output based on the 5-bit reg. # provided by the instruction
• 5-to-32 decoder converts the 5-bit write reg. # to 1-of-32 output signals to enable that register to capture the write data on the next edge. If RegWrite is 0 the decoder is disabled, making all outputs go to 0 and thus no register updates.
[Figure: registers $0..$31, each with D and EN inputs and a shared CLK. Write data fans out to every D input. A 5-to-32 decoder with inputs A[4:0] = Write Reg. # and EN = RegWrite drives the register enables. Two 32-to-1 muxes, selected by Read Reg #1 and Read Reg #2, produce Read data 1 and Read data 2.]
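A behavioral sketch of this structure: indexing a list stands in for the read muxes, and a one-hot enable list stands in for the 5-to-32 decoder. The class and method names are illustrative assumptions, and the sketch does not model $0 being hardwired to zero as it is in real MIPS.

```python
class RegisterFile:
    """32 x 32-bit registers with 2 read ports and 1 write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        # Each read port is a 32-to-1 mux selected by a 5-bit register number.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write_reg, write_data, reg_write):
        # 5-to-32 decoder: at most one enable is high, and none when
        # RegWrite is 0 (decoder disabled).
        enables = [bool(reg_write) and i == write_reg for i in range(32)]
        for i, enabled in enumerate(enables):
            if enabled:
                self.regs[i] = write_data & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(write_reg=3, write_data=7, reg_write=1)  # $3 captures 7
rf.clock_edge(write_reg=3, write_data=9, reg_write=0)  # ignored: RegWrite = 0
value_3, value_0 = rf.read(3, 0)
```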
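3-3.13 Datapath for ALU instruction
• ALU takes inputs from the register file and performs the add, sub, and, or, and slt operations
• Result is written back to the dest. register
[Figure: ADD $3,$1,$2. Register numbers 1 and 2 go to Read Reg. 1 # and Read Reg. 2 #, and 3 goes to Write Reg. #. The $1 and $2 values from Read data 1 and Read data 2 feed the ALU (operation chosen by ALUop), and the ALU result (the sum) returns to the Register File Write Data input. The ALU also produces a Zero output.]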
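The five ALU operations and the Zero output can be sketched behaviorally as follows. The operation is selected here by a string name for readability; the actual ALU control encoding is not given on the slide, so none is assumed. `alu` and `to_signed` are illustrative names.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def alu(op, a, b):
    """Return (result, zero) for one of the five operations in the subset."""
    if op == "add":
        result = a + b
    elif op == "sub":
        result = a - b
    elif op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "slt":
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError(f"unsupported ALU operation: {op}")
    result &= MASK32                  # keep 32 bits
    return result, int(result == 0)   # zero flag is 1 when result is 0

sum_result = alu("add", 5, 7)
equal_check = alu("sub", 5, 5)        # zero flag set when operands are equal
```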
3-3.14 Memory Access Datapath
• Operands are read from the register file while the offset is sign extended
• ALU calculates the effective address
• Memory access is performed
• If LW, the read data is written back to the register
[Figure, left: LW $4,0xfff8($1). Read Reg. 1 # = 1 and Write Reg. # = 4. The ALU adds the $1 value to the sign-extended offset 0xffff fff8, the sum goes to the D-Cache Addr. input, and the D-Cache Read Data returns to the Register File Write Data input.]
[Figure, right: SW $3,0x1a($1). Read Reg. 1 # = 1 and Read Reg. 2 # = 3. The ALU adds the $1 value to the sign-extended offset 0x0000001a, the sum goes to the D-Cache Addr. input, and the $3 value from Read data 2 goes to the D-Cache Write Data input.]
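The effective address is the base register plus the sign-extended 16-bit offset. The sketch below uses the two offsets from the figure; the base register values are hypothetical and were chosen only so that each address comes out word aligned.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Extend a 16-bit immediate to 32 bits by replicating bit 15."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def effective_address(base, imm16):
    """ALU add of base register and sign-extended offset, modulo 2**32."""
    return (base + sign_extend16(imm16)) & MASK32

lw_offset = sign_extend16(0xFFF8)   # offset of LW $4,0xfff8($1)
sw_offset = sign_extend16(0x001A)   # offset of SW $3,0x1a($1)

# Hypothetical base values for $1, not taken from the slides.
lw_address = effective_address(0x10010008, 0xFFF8)  # base - 8
sw_address = effective_address(0x10010006, 0x001A)  # base + 26
```

Because 0xfff8 has bit 15 set, it extends to 0xfffffff8 and acts as -8, so the LW address lands 8 bytes below the base.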
3-3.15 Branch Datapath
• BEQ requires…
  – ALU for comparison (examine ‘zero’ output)
  – Sign extension unit for branch offset
  – Adder to add PC and offset
• Need a separate adder since the ALU is used to perform the comparison
[Figure: BEQ $1,$2,offset. The $1 and $2 values feed the ALU, whose Zero output is the signal of interest. The word offset from the instruction is sign extended, shifted left by 2 to form a byte offset, and added to PC+4 (the incremented PC) by a separate adder to produce the branch target address sent to the PC.]
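The branch target path (sign extend, shift left 2, add to PC+4) can be sketched as below. The PC+4 value reuses the address from the fetch slide; the two offsets are hypothetical examples.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Extend a 16-bit immediate to 32 bits by replicating bit 15."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def branch_target(pc_plus_4, imm16):
    """Branch target = PC+4 + (sign-extended word offset << 2)."""
    word_offset = sign_extend16(imm16)
    byte_offset = (word_offset << 2) & MASK32   # Shift Left 2 unit
    return (pc_plus_4 + byte_offset) & MASK32   # separate adder

forward = branch_target(0x0040001C, 0x0003)   # 3 instructions ahead
backward = branch_target(0x0040001C, 0xFFFD)  # 3 instructions back
```

The shift converts an offset counted in instructions (words) into one counted in bytes, since every instruction is 4 bytes long.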
3-3.16 Branch Datapath Question
• Is it okay to start adding the branch offset even before determining whether the branch is taken or not?
  – Yes, it does not hurt because the ZERO signal will control whether that Branch Target is used to update the PC or not
[Figure: same BEQ $1,$2,offset datapath as the previous slide, with the ALU's ZERO output routed to the control logic.]
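The point can be sketched as a mux: both candidate addresses are always computed, and a select signal picks one. The sketch assumes the usual arrangement in which PCSrc is the AND of the Branch control signal and the ALU's Zero output; the slide itself only says that ZERO controls the choice. The function names are illustrative.

```python
def pc_src(branch, zero):
    """Assumed select logic: take the branch only for BEQ with equal operands."""
    return branch & zero

def select_next_pc(pc_plus_4, branch_target, branch, zero):
    """PC mux: the unused candidate is simply discarded."""
    return branch_target if pc_src(branch, zero) else pc_plus_4

PC_PLUS_4 = 0x0040001C  # address from the fetch slide
TARGET = 0x00400028     # hypothetical branch target

taken = select_next_pc(PC_PLUS_4, TARGET, branch=1, zero=1)
not_taken = select_next_pc(PC_PLUS_4, TARGET, branch=1, zero=0)
not_a_branch = select_next_pc(PC_PLUS_4, TARGET, branch=0, zero=1)
```

The last case shows why Zero alone would not be enough under this assumption: a SUB whose result happens to be 0 must not redirect the PC.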
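3-3.17 Fetch Datapath Question 1
• Can the adder used to increment the PC be an ALU and be used/shared for ALU instructions like ADD/SUB/etc.?
  – In a single-cycle CPU, every instruction needs both the PC increment and its own ALU operation within the same clock cycle, so resources cannot be shared. Thus we need a separate adder and a separate ALU.
[Figure: the PC register (with CLK and Write inputs) drives the I-Cache / I-MEM read address, which outputs the instruction word. An adder computes “Next” PC = PC + 4 and feeds it back to the PC.]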
3-3.18 Fetch Datapath Question 2
• Do we need the “Write” enable signal on the PC register for our single-cycle CPU?
  – In the single-cycle CPU, the PC is updated EVERY clock cycle (since we execute a new instruction each cycle). Thus we are writing the PC every cycle and don’t need the write signal.
[Figure: the PC register (with CLK and Write inputs) drives the I-Cache / I-MEM read address, which outputs the instruction word. An adder computes “Next” PC = PC + 4 and feeds it back to the PC.]
3-3.19 RegFile Question 1
• Why do we need the write enable signal, RegWrite?
  – We have certain instructions like BEQ or SW that do not cause a register to be updated. Thus we need the ability to NOT change a register.
[Figure: an example ALU instruction word split into fields: opcode 000000 (to Control Logic), rs 00001, rt 00010, rd 00011, shamt 00000, func 100000. The Register File outputs the values of $1 and $2; Write Data carries the result from the add, gated by CLK and RegWrite.]
3-3.20 RegFile Question 2
• Can writes to registers be level-sensitive, or do they have to be edge-sensitive?
  – They must be edge-sensitive since a register may be both source and destination (e.g. add $1,$1,$2). If the write were level-sensitive we would have an uncontrolled feedback loop.
[Figure: an example ALU instruction word split into fields: opcode 000000 (to Control Logic), rs 00001, rt 00010, rd 00011, shamt 00000, func 100000. The Register File outputs the values of $1 and $2; Write Data carries the result from the add, gated by CLK and RegWrite.]
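A rough behavioral sketch of the difference for add $1,$1,$2 follows. It is not a timing-accurate model: the `passes` parameter is a hypothetical stand-in for how many times the result races around the loop while a level-sensitive enable stays high, which in real hardware depends on circuit delays.

```python
MASK32 = 0xFFFFFFFF

def edge_triggered_write(r1, r2):
    """One clock edge: $1 captures the ALU result exactly once."""
    return (r1 + r2) & MASK32

def level_sensitive_write(r1, r2, passes):
    """Transparent latch: the new $1 feeds straight back into the ALU,
    so the add repeats for as long as the enable level is high."""
    for _ in range(passes):
        r1 = (r1 + r2) & MASK32
    return r1

edge_result = edge_triggered_write(5, 3)         # 5 + 3
level_result = level_sensitive_write(5, 3, 4)    # 5 + 3 + 3 + 3 + 3
```

With edge-triggered registers the result is the same regardless of how long the clock period is; in the level-sensitive case the final value depends on the number of passes, which is the uncontrolled feedback the slide warns about.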