



  1. CS252 Graduate Computer Architecture
     Lecture 1: Introduction
     January 22, 2002
     Prof. David E Culler, Computer Science 252, Spring 2002
     CS252/Culler, 1/22/02, Lec 1.1

     Outline
     • Why Take CS252?
     • Fundamental Abstractions & Concepts
     • Instruction Set Architecture & Organization
     • Administrivia
     • Pipelined Instruction Processing
     • Performance
     • The Memory Abstraction
     • Summary

     Why take CS252?
     • To design the next great instruction set?...well...
       – instruction set architecture has largely converged
       – especially in the desktop / server / laptop space
       – dictated by powerful market forces
     • Tremendous organizational innovation relative to established ISA abstractions
     • Many new instruction sets or equivalent
       – embedded space, controllers, specialized devices, ...
     • Design, analysis, implementation concepts vital to all aspects of EE & CS
       – systems, PL, theory, circuit design, VLSI, comm.
     • Equip you with an intellectual toolbox for dealing with a host of systems design challenges

     Example Hot Developments ca. 2002
     • Manipulating the instruction set abstraction
       – Itanium: translate ISA64 -> micro-op sequences
       – Transmeta: continuous dynamic translation of IA32
       – Tensilica: synthesize the ISA from the application
       – reconfigurable HW
     • Virtualization
       – VMware: emulate full virtual machine
       – JIT: compile to abstract virtual machine, dynamically compile to host
     • Parallelism
       – wide issue, dynamic instruction scheduling, EPIC
       – multithreading (SMT)
       – chip multiprocessors
     • Communication
       – network processors, network interfaces
     • Exotic explorations
       – nanotechnology, quantum computing

     Forces on Computer Architecture
     [Figure: Computer Architecture at the center, surrounded by Technology, Programming Languages, Applications, Operating Systems, and History (A = F / M)]

     Amazing Underlying Technology Change
     [Figure not recoverable]

  2. A take on Moore’s Law
     [Figure: transistors per chip (log scale, 1,000 to 100,000,000) versus year (1970 to 2005). Labeled points: i4004, i8008, i8080, i8086, i80286, i80386, R2000, R3000, Pentium, R10000. Eras marked: Bit-level parallelism, Instruction-level, Thread-level (?)]

     Technology Trends
     • Clock Rate: ~30% per year
     • Transistor Density: ~35%
     • Chip Area: ~15%
     • Transistors per chip: ~55%
     • Total Performance Capability: ~100%
     • by the time you graduate...
       – 3x clock rate (3-4 GHz)
       – 10x transistor count (1 Billion transistors)
       – 30x raw capability
     • plus 16x DRAM density, 32x disk density

     Performance Trends
     [Figure: performance (log scale, 0.1 to 100) versus year (1965 to 1995) for Supercomputers, Mainframes, Minicomputers, and Microprocessors]

     Measurement and Evaluation
     Architecture is an iterative process
     -- searching the space of possible designs
     -- at all levels of computer systems
     [Figure: cycle of Design and Analysis. Creativity feeds in; Cost / Performance Analysis sorts the results into Good Ideas, Mediocre Ideas, and Bad Ideas]

     What is “Computer Architecture”?
     [Figure: layers of abstraction, top to bottom: Application, Operating System, Compiler, Firmware, Instruction Set Architecture, Instr. Set Proc., I/O system, Datapath & Control, Digital Design, Circuit Design, Layout]
     • Coordination of many levels of abstraction
     • Under a rapidly changing set of forces
     • Design, Measurement, and Evaluation

     Coping with CS 252
     • Students with too varied background?
       – In past, CS grad students took written prelim exams on undergraduate material in hardware, software, and theory
       – 1st 5 weeks reviewed background, helped 252, 262, 270
       – Prelims were dropped => some unprepared for CS 252?
     • In class exam on Tues Jan. 29 (30 mins)
       – Doesn’t affect grade, only admission into class
       – 2 grades: Admitted or audit/take CS 152 1st
       – Improve your experience if recapture common background
     • Review: Chapters 1, CS 152 home page, maybe “Computer Organization and Design (COD) 2/e”
       – Chapters 1 to 8 of COD if never took prerequisite
       – If took a class, be sure COD Chapters 2, 6, 7 are familiar
       – Copies in Bechtel Library on 2-hour reserve
     • FAST review this week of basic concepts
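The annual rates on the Technology Trends slide compound, and the "by the time you graduate" multiples follow from them. The sketch below is a rough check, not part of the lecture: it assumes plain annual compounding, and the helper names `combined_rate` and `years_to_reach` are hypothetical.

```python
import math

# Annual growth rates quoted on the Technology Trends slide.
CLOCK_RATE = 0.30
TRANSISTOR_DENSITY = 0.35
CHIP_AREA = 0.15
TRANSISTORS_PER_CHIP = 0.55
TOTAL_CAPABILITY = 1.00


def combined_rate(*rates):
    """Annual rate that results when independent growth factors multiply."""
    factor = 1.0
    for rate in rates:
        factor *= 1.0 + rate
    return factor - 1.0


def years_to_reach(multiple, rate):
    """Years of annual compounding at `rate` needed to grow by `multiple`."""
    return math.log(multiple) / math.log(1.0 + rate)


# Density and area multiply to give transistors per chip.
density_times_area = combined_rate(TRANSISTOR_DENSITY, CHIP_AREA)

# How long each "by the time you graduate" multiple takes at the quoted rate.
years_3x_clock = years_to_reach(3, CLOCK_RATE)
years_10x_transistors = years_to_reach(10, TRANSISTORS_PER_CHIP)
years_30x_capability = years_to_reach(30, TOTAL_CAPABILITY)

print(f"density x area: {density_times_area:.1%} per year")
print(f"3x clock rate: {years_3x_clock:.1f} years")
print(f"10x transistors: {years_10x_transistors:.1f} years")
print(f"30x capability: {years_30x_capability:.1f} years")
```

Under these assumptions, density and area together give about 55% per year, matching the transistors-per-chip figure, and each of the three multiples takes roughly four to five years of compounding.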

  3. The Instruction Set: a Critical Interface
     [Figure: the instruction set as the interface between software (above) and hardware (below)]

     Review of Fundamental Concepts
     • Instruction Set Architecture
     • Machine Organization
     • Instruction Execution Cycle
     • Pipelining
     • Memory
     • Bus (Peripheral Hierarchy)
     • Performance Iron Triangle

     Instruction Set Architecture
     "... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation."
       – Amdahl, Blaauw, and Brooks, 1964
     SOFTWARE
     -- Organization of Programmable Storage
     -- Data Types & Data Structures: Encodings & Representations
     -- Instruction Formats
     -- Instruction (or Operation Code) Set
     -- Modes of Addressing and Accessing Data Items and Instructions
     -- Exceptional Conditions

     Organization
     [Figure: Logic Designer's View, from the ISA Level down to FUs & Interconnect]
     • Capabilities & Performance Characteristics of Principal Functional Units
       – (e.g., Registers, ALU, Shifters, Logic Units, ...)
     • Ways in which these components are interconnected
     • Information flows between components
     • Logic and means by which such information flow is controlled
     • Choreography of FUs to realize the ISA
     • Register Transfer Level (RTL) Description

     Review: MIPS R3000 (core)
     Programmable storage
     ° 2^32 x bytes
     ° 31 x 32-bit GPRs (R0=0)
     ° 32 x 32-bit FP regs (paired DP)
     ° HI, LO, PC
     [Figure: register file r0 (always 0), r1 ... r31, plus PC, lo, hi]
     Data types? Format? Addressing Modes?
     Arithmetic logical
       Add, AddU, Sub, SubU, And, Or, Xor, Nor, SLT, SLTU,
       AddI, AddIU, SLTI, SLTIU, AndI, OrI, XorI, LUI,
       SLL, SRL, SRA, SLLV, SRLV, SRAV
     Memory Access
       LB, LBU, LH, LHU, LW, LWL, LWR
       SB, SH, SW, SWL, SWR
     Control
       J, JAL, JR, JALR
       BEq, BNE, BLEZ, BGTZ, BLTZ, BGEZ, BLTZAL, BGEZAL
     32-bit instructions on word boundary

     Review: Basic ISA Classes
     Accumulator:
       1 address      add A          acc ← acc + mem[A]
       1+x address    addx A         acc ← acc + mem[A + x]
     Stack:
       0 address      add            tos ← tos + next
     General Purpose Register:
       2 address      add A B        EA(A) ← EA(A) + EA(B)
       3 address      add A B C      EA(A) ← EA(B) + EA(C)
     Load/Store:
       3 address      add Ra Rb Rc   Ra ← Rb + Rc
                      load Ra Rb     Ra ← mem[Rb]
                      store Ra Rb    mem[Rb] ← Ra
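The ISA classes in the Basic ISA Classes review can be compared by running the same computation, C = A + B, on a toy model of each. This is a minimal sketch under several assumptions that are not in the slides: the memory addresses, the register names, the `load`/`store` helpers on the accumulator machine, and the `push`/`pop` helpers on the stack machine are all hypothetical. Only the `add` semantics and the load/store transfers follow the slide.

```python
A, B, C = 0, 1, 2  # hypothetical memory addresses for the operands and result


def fresh_memory():
    return {A: 2, B: 3, C: 0}


def run_accumulator(mem, program):
    """1-address machine: the accumulator is an implicit operand."""
    acc = 0
    for op, addr in program:
        if op == "load":       # acc <- mem[A] (assumed helper instruction)
            acc = mem[addr]
        elif op == "add":      # acc <- acc + mem[A], as on the slide
            acc = acc + mem[addr]
        elif op == "store":    # mem[A] <- acc (assumed helper instruction)
            mem[addr] = acc
        else:
            raise ValueError(op)
    return mem


def run_stack(mem, program):
    """0-address machine: add takes both operands from the stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":       # assumed helper instruction
            stack.append(mem[instr[1]])
        elif op == "add":      # tos <- tos + next, as on the slide
            tos = stack.pop()
            nxt = stack.pop()
            stack.append(tos + nxt)
        elif op == "pop":      # assumed helper instruction
            mem[instr[1]] = stack.pop()
        else:
            raise ValueError(op)
    return mem


def run_load_store(mem, regs, program):
    """3-address register machine: only load and store touch memory."""
    for instr in program:
        op = instr[0]
        if op == "load":       # Ra <- mem[Rb]
            _, ra, rb = instr
            regs[ra] = mem[regs[rb]]
        elif op == "add":      # Ra <- Rb + Rc
            _, ra, rb, rc = instr
            regs[ra] = regs[rb] + regs[rc]
        elif op == "store":    # mem[Rb] <- Ra
            _, ra, rb = instr
            mem[regs[rb]] = regs[ra]
        else:
            raise ValueError(op)
    return mem


acc_result = run_accumulator(
    fresh_memory(), [("load", A), ("add", B), ("store", C)])
stack_result = run_stack(
    fresh_memory(), [("push", A), ("push", B), ("add",), ("pop", C)])
ls_result = run_load_store(
    fresh_memory(),
    {"r1": A, "r2": B, "r3": C},  # registers assumed preloaded with addresses
    [("load", "r4", "r1"), ("load", "r5", "r2"),
     ("add", "r6", "r4", "r5"), ("store", "r6", "r3")])
```

All three runs leave the same sum at address C. The programs differ in how many operands each instruction names explicitly, which is the distinction the address counts in the table capture.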

  4. Instruction Formats
     [Figure: three encoding styles, Variable, Fixed, and Hybrid]
     • Addressing modes
       – each operand requires address specifier => variable format
     • code size => variable length instructions
     • performance => fixed length instructions
       – simple decoding, predictable operations
     • With load/store instruction arch, only one memory address and few addressing modes
     • => simple format, address mode given by opcode

     MIPS Addressing Modes & Formats
     • Simple addressing modes
     • All instructions 32 bits wide
       Register (direct)   op rs rt rd      operand in register
       Immediate           op rs rt immed   operand is immed
       Base+index          op rs rt immed   operand in Memory at register + immed
       PC-relative         op rs rt immed   target in Memory at PC + immed
     • Register Indirect?

     Cray-1: the original RISC
     Register-Register
       bits   15-9   8-6   5-3   2-0
              Op     Rd    Rs1   R2
     Load, Store and Branch
       fields: Op, Rd, Rs1, Immediate (the immediate extends into a second 16-bit parcel, bits 15-0)

     VAX-11: the canonical CISC
     Variable format, 2 and 3 address instruction
       Byte   0        1     ...   n     ...   m
              OpCode   A/M         A/M         A/M
     • Rich set of orthogonal address modes
       – immediate, offset, indexed, autoinc/dec, indirect, indirect+offset
       – applied to any operand
     • Simple and complex instructions
       – synchronization instructions
       – data structure operations (queues)
       – polynomial evaluation

     Review: Load/Store Architectures
     [Figure: MEM connected to reg]
     ° 3 address GPR
     ° Register to register arithmetic
     ° Load and store with simple addressing modes (reg + immediate)
     ° Simple conditionals
         compare ops + branch z
         compare&branch
         condition code + branch on condition
     ° Simple fixed-format encoding
         op r r r
         op r r immed
         op offset
     ° Substantial increase in instructions
     ° Decrease in data BW (due to many registers)
     ° Even more significant decrease in CPI (pipelining)
     ° Cycle time, Real estate, Design time, Design complexity

     MIPS R3000 ISA (Summary)
     • Instruction Categories
       – Load/Store
       – Computational
       – Jump and Branch
       – Floating Point
         » coprocessor
       – Memory Management
       – Special
     Registers: R0 - R31, PC, HI, LO
     3 Instruction Formats: all 32 bits wide
       OP   rs   rt   rd   sa   funct
       OP   rs   rt   immediate
       OP   jump target
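Because all three MIPS formats are 32 bits wide with the opcode in the same place, decoding reduces to a few shifts and masks. The sketch below illustrates that point and is not a complete R3000 decoder. The field widths (6-bit op, 5-bit register fields, 5-bit sa, 6-bit funct, 16-bit immediate, 26-bit target) are the standard MIPS encoding. The function name `decode` is hypothetical, and treating every opcode other than 0, 2, and 3 as the immediate format is a simplifying assumption that ignores the REGIMM and coprocessor sub-encodings.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into the fields of its format."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit instruction word")

    op = (word >> 26) & 0x3F   # bits 31-26, same position in every format
    rs = (word >> 21) & 0x1F   # bits 25-21
    rt = (word >> 16) & 0x1F   # bits 20-16

    if op == 0:
        # Register format: OP rs rt rd sa funct
        return {
            "format": "R", "op": op, "rs": rs, "rt": rt,
            "rd": (word >> 11) & 0x1F,
            "sa": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if op in (2, 3):
        # Jump format (J and JAL): OP jump target
        return {"format": "J", "op": op, "target": word & 0x03FFFFFF}

    # Immediate format: OP rs rt immediate (simplification, see lead-in)
    return {"format": "I", "op": op, "rs": rs, "rt": rt,
            "immediate": word & 0xFFFF}


add_fields = decode(0x00221820)  # add $3, $1, $2
lw_fields = decode(0x8C220004)   # lw  $2, 4($1)
j_fields = decode(0x08000100)    # j   with a 26-bit target field of 0x100
```

A variable-format encoding such as the VAX-11's cannot be decoded this way, since the position of each operand specifier depends on the ones before it.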
