CS422 Computer Architecture Spring 2004 Lecture 05, 06 Jan 2004


  1. CS422 Computer Architecture, Spring 2004
     Lecture 05, 06 Jan 2004
     Bhaskaran Raman, Department of CSE, IIT Kanpur
     http://web.cse.iitk.ac.in/~cs422/index.html

  2. DLX
     ● DLX is pronounced “Deluxe”
     ● Has the features of many recent experimental and commercial machines
     ● The name is the average of the numbers in 13 machine names: [ AMD 29K, DECstation 3100, HP 850, IBM 801, Intel i860, MIPS M/120A, MIPS M/1000, Motorola 88K, RISC I, SGI 4D/60, SPARCstation-1, Sun-4/110, Sun-4/260 ] / 13 = 560 = DLX in Roman numerals
     ● Good architectural features (e.g. simplicity); easy to understand
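
The averaging on this slide can be checked with a few lines of Python. This is a minimal sketch: the number assigned to each machine name is an assumption (for example, "29K" and "88K" are counted as 29 and 88, and "RISC I" as the Roman numeral 1), chosen because that reading makes the slide's arithmetic work out.

```python
# Number read off each machine name. The mapping is an assumption made for
# this sketch; the slide only lists the names.
machine_numbers = {
    "AMD 29K": 29,
    "DECstation 3100": 3100,
    "HP 850": 850,
    "IBM 801": 801,
    "Intel i860": 860,
    "MIPS M/120A": 120,
    "MIPS M/1000": 1000,
    "Motorola 88K": 88,
    "RISC I": 1,
    "SGI 4D/60": 60,
    "SPARCstation-1": 1,
    "Sun-4/110": 110,
    "Sun-4/260": 260,
}


def to_roman(n):
    """Convert a positive integer to a Roman numeral string."""
    pairs = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


total = sum(machine_numbers.values())
average = total // len(machine_numbers)
print(total, average, to_roman(average))
```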

  3. DLX Architecture: Registers and Data Types
     ● Has 32 32-bit GPRs: R0...R31
     ● Also, FP registers
       – 32 single precision: F0...F31
       – Or, 16 double precision: F0, F2, ... F30
     ● Value of R0 is always ZERO!
     ● Data types:
       – Integer: bytes, half-words, words
       – FP: single/double precision
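
One way to picture the register rules on this slide is a small register-file model. The sketch below is illustrative only: the class name `RegisterFile` and its methods are hypothetical, and it models just the integer registers plus the naming of the double-precision FP registers.

```python
class RegisterFile:
    """Hypothetical model of the 32 integer registers; R0 always reads as zero."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n):
        return self._regs[n]

    def write(self, n, value):
        # Writes to R0 are silently dropped, so R0 stays zero.
        if n != 0:
            self._regs[n] = value & 0xFFFFFFFF  # keep values to 32 bits


rf = RegisterFile()
rf.write(0, 123)         # ignored
rf.write(1, 2**32 + 5)   # truncated to 32 bits

# The 16 double-precision registers use the even-numbered names only.
double_names = [f"F{n}" for n in range(0, 32, 2)]

print(rf.read(0), rf.read(1), double_names[:3], len(double_names))
```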

  4. DLX Memory Addressing
     ● Uses 32-bit addresses, big-endian mode
     ● Addressing modes:
       – Only immediate and displacement, with 16-bit fields
         ● Register deferred? Place zero in the displacement field
         ● Absolute? Use R0 as the base register
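
The two synthesized modes fall out of displacement addressing directly. A minimal sketch, assuming the 16-bit displacement is sign-extended (as the later implementation slides do for the immediate field); the function name `effective_address` and the register contents are made up for illustration.

```python
def effective_address(regs, base, displacement):
    """Displacement addressing: R[base] + sign-extended 16-bit displacement."""
    disp = displacement & 0xFFFF
    if disp & 0x8000:          # negative when bit 15 is set
        disp -= 0x10000
    return (regs[base] + disp) & 0xFFFFFFFF


regs = [0] * 32                # regs[0] plays the role of R0 (always zero)
regs[5] = 0x1000               # hypothetical base address in R5

displacement_ea = effective_address(regs, 5, 8)      # 8(R5)
deferred_ea = effective_address(regs, 5, 0)          # 0(R5): register deferred
absolute_ea = effective_address(regs, 0, 0x200)      # 0x200(R0): absolute
negative_ea = effective_address(regs, 5, 0xFFFC)     # -4(R5)

print(hex(displacement_ea), hex(deferred_ea), hex(absolute_ea), hex(negative_ea))
```

Note that an absolute address formed this way can only use what fits in the 16-bit displacement field.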

  5. DLX Instruction Format
     ● I-type instruction: loads, stores, all immediates, conditional branch, jump register, jump and link register
       – Opcode (6) | RS1 (5) | RD (5) | Immediate (16)
     ● R-type instruction: register-register ALU operations
       – Opcode (6) | RS1 (5) | RS2 (5) | RD (5) | Func (11)
     ● J-type instruction: jump, jump and link, trap and return
       – Opcode (6) | Offset relative to PC (26)
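
The field widths above translate into simple shift-and-mask decoding. The sketch below assumes bit 0 is the most significant bit of the 32-bit word (consistent with the IR6..10 notation used in the implementation slides) and that RD follows RS2 in the R-type format. The helper names and the opcode value in the example word are arbitrary choices for illustration, not real DLX encodings.

```python
def field(word, first, last):
    """Extract bits first..last of a 32-bit word, with bit 0 as the MSB."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def decode_i(word):
    return {"opcode": field(word, 0, 5), "rs1": field(word, 6, 10),
            "rd": field(word, 11, 15), "imm": field(word, 16, 31)}


def decode_r(word):
    return {"opcode": field(word, 0, 5), "rs1": field(word, 6, 10),
            "rs2": field(word, 11, 15), "rd": field(word, 16, 20),
            "func": field(word, 21, 31)}


def decode_j(word):
    return {"opcode": field(word, 0, 5), "offset": field(word, 6, 31)}


# Example I-type word: opcode 8 (arbitrary), rs1 = 2, rd = 1, immediate = 100.
i_word = (8 << 26) | (2 << 21) | (1 << 16) | 100
# Example R-type word: opcode 0 (arbitrary), rs1 = 1, rs2 = 2, rd = 3, func = 32.
r_word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | 32

print(hex(i_word), decode_i(i_word))
print(hex(r_word), decode_r(r_word))
```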

  6. DLX Operations
     ● Four classes: Load/store, ALU, branch, FP
     ● ALU instructions are register-register
     ● R0 is used to synthesize some operations:
       – Examples: loading a constant, reg-reg move
     ● Compares “set” a register
     ● Jump and link stores the next PC (the return address) in R31
     ● FP operations in single/double precision
     ● FP compares set a bit in a special status reg
     ● FP unit also used for integer multiply/divide!
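
A few of these operations can be mimicked in Python to show how R0 does the work of extra instructions. This is a rough sketch, not an instruction-set reference: the function names only echo the usual mnemonics, the jump target is assumed to be relative to the next PC, and all values are made up.

```python
def write(regs, rd, value):
    """Write a register, keeping R0 at zero."""
    if rd != 0:
        regs[rd] = value


def addi(regs, rd, rs1, imm):
    write(regs, rd, regs[rs1] + imm)


def add(regs, rd, rs1, rs2):
    write(regs, rd, regs[rs1] + regs[rs2])


def slt(regs, rd, rs1, rs2):
    # A compare "sets" the destination register to 1 or 0.
    write(regs, rd, 1 if regs[rs1] < regs[rs2] else 0)


def jal(regs, pc, offset):
    # Jump and link: save the next PC in R31, then return the jump target.
    next_pc = pc + 4
    write(regs, 31, next_pc)
    return next_pc + offset    # assumption: offset is relative to the next PC


regs = [0] * 32
addi(regs, 1, 0, 42)           # load constant: R1 <-- R0 + 42
add(regs, 2, 1, 0)             # reg-reg move:  R2 <-- R1 + R0
slt(regs, 3, 0, 1)             # R3 <-- (R0 < R1)
target = jal(regs, 100, 20)    # R31 <-- 104

print(regs[1], regs[2], regs[3], regs[31], target)
```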

  7. DLX Performance: MIPS vs VAX
     [Figure: bar chart of the MIPS/VAX performance ratio, instruction count (IC) ratio, and CPI ratio (vertical axis 0 to 4) for the SPEC89 benchmarks Spice, Matrix, Nasa7, Fpppp, Tomcatv, Doduc, Espresso, Eqntott, and Li]

  8. Pipelining
     ● It's natural!
     ● Laundry example... (Randy Katz's slides)
     ● DLX has a simple architecture
       – Easy to pipeline
     ● Pipelining speedup:
       – Can be viewed as a reduction in CPI
       – Or, as a reduction in clock cycle
         ● Defining clock cycle as the amount of time between two successive instruction completions
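
The two views of speedup give the same number, which a short calculation makes concrete. This sketch uses the standard relation CPU time = instruction count × CPI × clock cycle time with made-up figures (5 stages, 2 ns per stage), and it ignores pipeline fill time and hazards, so it shows the ideal case only.

```python
def cpu_time(instruction_count, cpi, cycle_time_ns):
    """CPU time = instruction count x cycles per instruction x cycle time."""
    return instruction_count * cpi * cycle_time_ns


IC = 1_000_000        # hypothetical instruction count
STAGES = 5
STAGE_TIME_NS = 2     # hypothetical time per stage

# Ideal pipeline: one instruction completes every stage time.
pipelined = cpu_time(IC, 1, STAGE_TIME_NS)

# View 1: the unpipelined machine takes 5 short cycles per instruction,
# so pipelining reduces CPI from 5 to 1.
unpipelined_multicycle = cpu_time(IC, STAGES, STAGE_TIME_NS)
speedup_cpi_view = unpipelined_multicycle / pipelined

# View 2: the unpipelined machine takes 1 long cycle per instruction,
# so pipelining reduces the clock cycle from 10 ns to 2 ns.
unpipelined_long_cycle = cpu_time(IC, 1, STAGES * STAGE_TIME_NS)
speedup_cycle_view = unpipelined_long_cycle / pipelined

print(speedup_cpi_view, speedup_cycle_view)
```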

  9. A Simple DLX Implementation
     ● Instruction Fetch (IF) cycle:
       – IR <-- M[PC]
       – NPC <-- PC + 4
     ● Instruction Decode (ID) cycle:
       – Done in parallel with register read (fixed field decode)
       – Register/Immediate read:
         ● A <-- R[IR6..10]
         ● B <-- R[IR11..15]
         ● Imm <-- sign-extend(IR16..31)

  10. A Simple DLX Implementation (continued)
     ● Execution/effective address (EX) cycle:
       – Memory reference:
         ● ALUOutput <-- A + Imm
       – Register-register ALU instruction:
         ● ALUOutput <-- A func B
       – Register-immediate ALU instruction:
         ● ALUOutput <-- A op Imm
       – Branch:
         ● ALUOutput <-- NPC + Imm
         ● Cond <-- A op 0 [op is one of == or !=]

  11. A Simple DLX Implementation (continued)
     ● Memory access/branch completion (MEM) cycle:
       – Memory access:
         ● LMD <-- M[ALUOutput]
         ● Or, M[ALUOutput] <-- B
       – Branch: PC <-- (Cond) ? ALUOutput : NPC
     ● Write-back (WB) cycle:
       – Reg-reg ALU opn: R[IR16..20] <-- ALUOutput
       – Reg-imm ALU opn: R[IR11..15] <-- ALUOutput
       – Load instruction: R[IR11..15] <-- LMD
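
The five cycles on the last three slides can be strung together into a tiny interpreter. This is a sketch under several assumptions: the opcode and function numbers are invented for the example (they are not real DLX encodings), only ADD, ADDI, LW, SW, and BEQZ are modelled, memories are Python dictionaries keyed by byte address, and non-branch instructions simply continue at NPC.

```python
# Hypothetical opcode numbers, chosen only for this sketch.
OP_RTYPE, OP_ADDI, OP_LW, OP_SW, OP_BEQZ = 0, 1, 2, 3, 4
MASK32 = 0xFFFFFFFF


def bits(word, first, last):
    """Extract bits first..last, numbering bit 0 as the most significant."""
    return (word >> (31 - last)) & ((1 << (last - first + 1)) - 1)


def sign_extend16(value):
    return value - 0x10000 if value & 0x8000 else value


def encode_i(op, rs1, rd, imm):
    return (op << 26) | (rs1 << 21) | (rd << 16) | (imm & 0xFFFF)


def encode_r(rs1, rs2, rd, func=0):
    return (OP_RTYPE << 26) | (rs1 << 21) | (rs2 << 16) | (rd << 11) | func


def step(pc, regs, imem, dmem):
    """Run one instruction through IF, ID, EX, MEM, WB; return the next PC."""
    # IF: fetch the instruction and compute the next sequential PC.
    ir = imem[pc]
    npc = pc + 4
    # ID: fixed-field decode, done alongside the register reads.
    op = bits(ir, 0, 5)
    a = regs[bits(ir, 6, 10)]
    b = regs[bits(ir, 11, 15)]
    imm = sign_extend16(bits(ir, 16, 31))
    # EX: effective address, ALU result, or branch target and condition.
    cond = False
    if op in (OP_LW, OP_SW, OP_ADDI):
        alu_output = (a + imm) & MASK32
    elif op == OP_RTYPE:
        alu_output = (a + b) & MASK32      # only ADD is modelled here
    elif op == OP_BEQZ:
        alu_output = (npc + imm) & MASK32
        cond = (a == 0)
    else:
        raise ValueError(f"unknown opcode {op}")
    # MEM: memory access or branch completion.
    lmd = None
    if op == OP_LW:
        lmd = dmem[alu_output]
    elif op == OP_SW:
        dmem[alu_output] = b
    next_pc = alu_output if cond else npc
    # WB: write the result back to the register file.
    if op == OP_RTYPE:
        regs[bits(ir, 16, 20)] = alu_output
    elif op == OP_ADDI:
        regs[bits(ir, 11, 15)] = alu_output
    elif op == OP_LW:
        regs[bits(ir, 11, 15)] = lmd
    regs[0] = 0                            # R0 is always zero
    return next_pc


imem = {
    0: encode_i(OP_ADDI, 0, 1, 5),     # R1 <-- 5
    4: encode_i(OP_ADDI, 0, 2, 7),     # R2 <-- 7
    8: encode_r(1, 2, 3),              # R3 <-- R1 + R2
    12: encode_i(OP_SW, 0, 3, 100),    # M[100] <-- R3
    16: encode_i(OP_LW, 0, 4, 100),    # R4 <-- M[100]
    20: encode_i(OP_BEQZ, 0, 0, 4),    # R0 == 0, so branch to NPC + 4 = 28
    24: encode_i(OP_ADDI, 0, 4, 99),   # skipped by the branch
}
regs, dmem, pc = [0] * 32, {}, 0
while pc in imem:
    pc = step(pc, regs, imem, dmem)

print(regs[1:5], dmem, pc)
```

Each instruction here runs to completion before the next one starts, which matches the unpipelined implementation described on these slides.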

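  12. The DLX Data-path
     [Figure: data-path across the IF, ID, EX, MEM, and WB stages, showing the PC, the adder computing PC + 4, NPC, instruction memory, IR, register file, A, B, Imm with sign extension, the Zero? test producing Cond, the ALU and ALU output, data memory, LMD, and the multiplexers]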

  13. Further lectures... ● Pipelining this data-path ● Pipelining issues
