
EE 457 Focus on CPU Design Microarchitecture EE 457 Unit 0 - PowerPoint PPT Presentation



  1. Slides 0.1 to 0.4

     EE 457 Unit 0
     Class Introduction
     Basic Hardware Organization

     EE 457
     • Focus on CPU Design
       – Microarchitecture
       – General Digital System Design
     • Focus on Memory Hierarchy
       – Cache
       – Virtual Memory
     • Focus on Computer Arithmetic
       – Fast Adders
       – Fast Multipliers

     Course Info
     • Lecture:
       – Prof. Redekopp (redekopp@usc.edu)
     • Discussion:
       – TA: See website
     • Website:
       http://bytes.usc.edu/ee457
       https://courses.uscden.net/d2l/home
     • Midterm (30%)
     • Final (36%)
     • Homework Assignments (14%): Individual
     • Lab Assignments (20%): Individual and Teams of 2
       – Contact TA

     Prerequisites
     • EE 354L “Introduction to Digital Circuits”
       – Logic design
       – State machine implementation
       – Datapath/control unit implementation
       – Verilog HDL
     • EE 109/354 “Basic Computer Organization”
       – Assembly language programming
       – Basic hardware organization and structures
     • C or similar high-level programming knowledge
     • Familiarity with Verilog HDL

  2. Slides 0.5 to 0.8

     EE 109/354 Required Knowledge
     • You must know and understand the following terms and concepts; please review them as necessary
       – Bit, Nibble (four bit word), Byte, Word (16- or 32-bit value)
       – CPU, ALU, CU (Control Unit), ROM, RAM (RWM), Word length of a computer, System Bus (Address, Data, Control)
       – General Purpose Registers, Instruction Register (IR), Program Counter (PC), Stack, Stack Pointer (SP), Subroutine calls, Flag register (or Condition Code Register or Processor Status Word), Microprogramming
       – Instruction Set, Addressing Modes, Machine Language, Assembly Language, Assembler, High Level Language, Compiler, Linker, Object code, Loader
       – Interrupts, Exceptions, Interrupt Vector, Vectored Interrupts, Traps

     EE 354L Requisite Knowledge
     • You must know and understand the following terms and concepts; please review them as necessary
       – Combinational design of functions specified by truth tables and function tables
       – Design of adders, comparators, multiplexers, decoders, demultiplexers
       – Tri-state outputs and buses
       – Sequential Logic components: D-Latches and D-Flip-Flops, counters, registers
       – State Machine Design: State diagrams, Mealy vs. Moore-style outputs, Input Function Logic, Next State Logic, State Memory, Output Function Logic, power-on reset state
       – State Machine Design using encoded state assignments vs. one-hot state assignment
       – Drawing, interpretation, and analysis of waveform diagrams

     Levels of Architecture
     • System architecture
       – High-level HW org.
     • Instruction Set Architecture
       – A contract or agreement about what the HW will support and how the programmer can write SW for the HW
       – Vocabulary that the HW understands and SW is composed of
     • Microarchitecture
       – HW implementation for executing instructions
       – Usually transparent to SW programs but not program performance
       – Example: Intel and AMD have different microarchitectures but support essentially the same instruction set
     [Figure: layer stack running from SW down to HW, with labels Applications; C / C++ / Java; OS Libraries; Assembly / Machine Code; Programmer’s Model (Instruction Set Architecture); Virtualization Layer; Processor / Memory / I/O; Functional Units (Registers, Adders, Muxes); Logic Gates; Transistors; Voltage / Currents]

     Computer Arithmetic Requisite Knowledge
     • You must know and understand the following terms and concepts; please review them as necessary
       – Unsigned and Signed (2’s complement representation) Numbers
       – Unsigned and signed addition and subtraction
       – Overflow in addition and subtraction
       – Multiplication
       – Booth’s algorithm for multiplications of signed numbers
       – Restoring or Non-Restoring Division for unsigned numbers
       – Hardware implementations for adders and multipliers
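The Computer Arithmetic list names overflow in signed (2's complement) addition as requisite knowledge. As a quick refresher, here is a minimal Python sketch of how an n-bit adder's result and overflow flag can be modeled. The function name `add_twos_complement` and the default 8-bit width are illustrative assumptions, not part of the course material, and the sketch assumes both operands already fit in the chosen width.

```python
def add_twos_complement(a: int, b: int, bits: int = 8):
    """Model an n-bit signed adder; return (result, overflow).

    Hypothetical helper for illustration. Assumes a and b each fit
    in `bits` bits as 2's complement values.
    """
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)

    # Keep only the low n bits of the sum, as the hardware would.
    raw = (a + b) & mask

    # Reinterpret the n-bit pattern as a signed value.
    result = raw - (1 << bits) if raw & sign else raw

    # Signed overflow: operands share a sign, but the result's sign differs.
    overflow = ((a < 0) == (b < 0)) and ((result < 0) != (a < 0))
    return result, overflow


if __name__ == "__main__":
    print(add_twos_complement(100, 50))    # two positives that exceed +127
    print(add_twos_complement(100, -50))   # mixed signs can never overflow
```

The sign-comparison rule used here is one common way to state the overflow condition; comparing the carry into and out of the sign bit is an equivalent hardware formulation.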

  3. Slides 0.9 to 0.12

     Why is Architecture Important
     • Enabling ever more capable computers
     • Different systems require different architectures
       – PC’s
       – Servers
       – Embedded Systems
         • Simple control devices like ATM’s, toys, appliances
         • Media systems like game consoles and MP3 players
         • Robotics

     Digital System Spectrum
     • Key idea: Any “algorithm” can be implemented in HW or SW or some mixture of both
     • A digital system can be located anywhere in a spectrum of:
       – ALL HW: (a.k.a. Application-Specific IC’s)
       – ALL SW: An embedded computer system
     • Advantages of application specific HW
       – Faster, less power
     • Advantages of an embedded computer system (i.e. general purpose HW for executing SW)
       – Reprogrammable (i.e. make a mistake, fix it)
       – Less expensive than a dedicated hardware system (single computer system can be used for multiple designs)
     • MP3 Player: System-on-Chip (SoC) approach
       – Some dedicated HW for intensive MP3 decoding operations
       – Programmable processor for UI & other simple tasks
     [Figure: Computing System Spectrum, from Application Specific Hardware (no software) to General Purpose HW w/ Software; axis labels Flexibility, Design Time; Performance; Cost]
     Photo: http://d2rormqr1qwzpz.cloudfront.net/photos/2014/01/01/56914-moto_x.jpg

     Computer Components
     • Processor
       – Executes the program and performs all the operations
     • Main Memory
       – Stores data and program (instructions)
       – Different forms:
         • RAM = read and write but volatile (lose values when power off)
         • ROM = read-only but non-volatile (maintains values when power off)
       – Significantly slower than the processor speeds
     • Input / Output Devices
       – Generate and consume data from the system
       – MUCH, MUCH slower than the processor
     [Figure: recipe analogy. Instructions: “Combine 2c. Flour”, “Mix in 3 eggs”; Data; Processor (Reads instructions, operates on data)]
     [Figure: Processor (Arithmetic + Logic + Control Circuitry); Software Program; Input Devices; Output Devices; Memory (RAM) holding Program (Instructions) and Data (Operands); Disk Drive holding Data]

     ARCHITECTURE OVERVIEW
     Drivers and Trends
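The Computer Components slide describes a processor as something that reads instructions and operates on data held in memory. A toy Python sketch of that loop is shown below. The three opcodes (`LOAD`, `ADD`, `STORE`), the single accumulator, and the dictionary used as memory are all invented for illustration and do not correspond to any instruction set used in the course.

```python
def run(program, data):
    """Toy processor loop: fetch each instruction, then operate on data.

    `program` is a list of (opcode, operand_name) pairs and `data` is a
    dict standing in for main memory. Both formats are hypothetical.
    """
    acc = 0   # single accumulator register
    pc = 0    # program counter: index of the next instruction

    while pc < len(program):
        op, arg = program[pc]   # fetch
        pc += 1
        if op == "LOAD":        # memory -> register
            acc = data[arg]
        elif op == "ADD":       # register + memory -> register
            acc += data[arg]
        elif op == "STORE":     # register -> memory
            data[arg] = acc
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return data


if __name__ == "__main__":
    memory = {"x": 2, "y": 3, "sum": 0}
    program = [("LOAD", "x"), ("ADD", "y"), ("STORE", "sum")]
    print(run(program, memory))
```

Keeping the program and the data as separate inputs mirrors the slide's split between Program (Instructions) and Data (Operands).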

  4. Slides 0.13 to 0.16

     Moore’s Law, Computer Architecture & Real-Estate Planning
     • Moore’s Law = Number of transistors able to be fabricated on a chip grows exponentially with time
     • Computer architects decide, “What should we do with all of this capability?”
     • Similarly real-estate developers ask, “How do we make best use of the land area given to us?”
     [Figure: USC University Park Development Master Plan]
     http://re.usc.edu/docs/University%20Park%20Development%20Project.pdf

     Architecture Issues
     • Fundamentally, architecture is all about the different ways of answering the question: “What do we do with the ever-increasing number of transistors available to us?”
     • Goal of a computer architect is to take increasing transistor budgets of a chip (i.e. Moore’s Law) and produce an equivalent increase in computational ability

     Transistor Physics
     • Cross-section of transistors on an IC
     • Moore’s Law is founded on our ability to keep shrinking transistor sizes
       – Gate/channel width shrinks
       – Gate oxide shrinks
     • Transistor feature size is referred to as the implementation “technology node”

     Technology Nodes

  5. Slides 0.17 to 0.20

     Growth of Transistors on Chip
     [Chart: Transistor Count (Thousands), log scale from 1 to 1,000,000, vs. Year, 1975 to 2010. Labeled points: Intel 8086 (29K), Intel '286 (134K), Intel '386 (275K), Intel '486 (1.2M), Pentium (3.1M), Pentium Pro (5.5M), Pentium 2 (7M), Pentium 3 (28M), Pentium 4 Northwood (42M), Pentium 4 Prescott (125M), Pentium D (230M), Core 2 Duo (291M)]

     Implications of Moore’s Law
     • What should we do with all these transistors
       – Put additional simple cores on a chip
       – Use transistors to make cores execute instructions faster
       – Use transistors for more on-chip cache memory
         • Cache is an on-chip memory used to store data the processor is likely to need
         • Faster than main-memory which is on a separate chip and much larger (thus slower)

     Pentium 4
     [Die photo with labeled regions: L2 Cache, L1 Data, L1 Instruc.]

     Increase in Clock Frequency
     [Chart: Frequency (MHz), log scale from 1 to 10000, vs. Year, 1975 to 2010. Labeled points: Intel 8086 (8), Intel '286 (12.5), Intel '386 (20), Intel '486 (25), Pentium (60), Pentium Pro (200), Pentium 2 (266), Pentium 3 (700), Pentium 4 Willamette (1500), Core 2 Duo (2400), Pentium D (2800), Pentium 4 Prescott (3600)]
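To make the exponential growth in the transistor chart concrete, the following Python sketch estimates an average doubling period between two chart points. The transistor counts (29K for the Intel 8086, 291M for the Core 2 Duo) come from the chart labels; the introduction years 1978 and 2006 are not printed on the slide and are supplied here as assumptions, as is the helper name `doubling_time_years`.

```python
import math


def doubling_time_years(count_start, year_start, count_end, year_end):
    """Average years per doubling, assuming steady exponential growth."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings


if __name__ == "__main__":
    # Counts from the chart; years are assumed introduction dates.
    t = doubling_time_years(29e3, 1978, 291e6, 2006)
    print(f"{t:.2f} years per doubling")
```

Under these assumptions the result comes out at about two years per doubling, which is consistent with the usual statement of Moore's Law.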

  6. Slides 0.21 to 0.24

     Intel Nehalem Quad Core
     [Die photo]

     Progression to Parallel Systems
     • If power begins to limit clock frequency, how can we continue to achieve more and more operations per second?
       – By running several processor cores in parallel at lower frequencies
       – Two cores @ 2 GHz vs. 1 core @ 4 GHz yield the same theoretical maximum ops./sec.
     • We’ll end our semester by examining (briefly) a few parallel architectures
       – Chip multiprocessors (multicore)
       – Graphics Processor Units (SIMT)

     Flynn’s Taxonomy
     • Categorize architectures based on relationship between program (instructions) and data
     • SISD: Single-Instruction, Single-Data
       – Typical, single-threaded processor
     • SIMD / SIMT: Single Instruction, Multiple Data (Single Instruction, Multiple Thread)
       – Vector Units (e.g. Intel MMX, SSE, SSE2)
       – GPU’s
     • MISD: Multiple Instruction, Single-Data
       – Less commonly used (some streaming architectures may be considered in this category)
     • MIMD: Multiple Instruction, Multiple-Data
       – Multi-threaded processors
       – Typical CMP/Multicore system (Task parallelism with different threads executing)

     GPU Chip Layout
     • 2560 Small Cores
     • Upwards of 7.2 billion transistors
     • 8.2 TFLOPS
     • 320 Gbytes/sec
     Source: NVIDIA
     Photo: http://www.theregister.co.uk/2010/01/19/nvidia_gf100/
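The Progression to Parallel Systems slide claims that two cores at 2 GHz and one core at 4 GHz have the same theoretical maximum operations per second. A short Python sketch of that arithmetic follows. The function name `peak_ops_per_sec` and the simplifying assumption of one operation per core per clock cycle are mine, not the slide's; real cores issue a varying number of operations per cycle.

```python
def peak_ops_per_sec(cores: int, freq_hz: float, ops_per_cycle: int = 1) -> float:
    """Theoretical peak throughput: cores x clock rate x ops per cycle.

    Hypothetical helper. Assumes perfect parallel scaling and a fixed
    number of operations per core per cycle.
    """
    return cores * freq_hz * ops_per_cycle


if __name__ == "__main__":
    two_slow = peak_ops_per_sec(cores=2, freq_hz=2e9)
    one_fast = peak_ops_per_sec(cores=1, freq_hz=4e9)
    print(two_slow == one_fast)
```

This is only an upper bound: a program reaches the two-core figure only if its work can actually be split across both cores.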
