Photo: Hughes Leglise-Bataille, some rights reserved

Digitalteknik och Datorarkitektur 5hp (Digital Technology and Computer Architecture, 5 credits)
Chapter 4: Performance
24 April 2008
karl.marklund@it.uu.se

How do we define (speed) performance?

Response time (aka execution time): the time between the start and the completion of a task. Important to individual users.

Thus, to maximize performance, we need to minimize execution time:

$$\text{performance} = \frac{1}{\text{execution\_time}}$$

Example: Stockholm to Göteborg, ≈ 270 km. Klasse says: "I am n times faster than Nisse."

$$n = \frac{\text{performance}_{\text{klasse}}}{\text{performance}_{\text{nisse}}} = \frac{\text{execution\_time}_{\text{nisse}}}{\text{execution\_time}_{\text{klasse}}}$$

| Vehicle | Speed (km/h) | Time to Göteborg | Passengers | Passengers × km/h |
|---|---|---|---|---|
| Sports car | 230 | 1 h 10 min | 2 | 460 |
| Bus | 100 | 2 h 43 min | 60 | 6000 |

The time to perform a task from start to finish: execution time, response time, latency.

The amount of useful work per unit of time: throughput, bandwidth.
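The latency-versus-throughput distinction in the Stockholm to Göteborg example can be checked numerically. Below is a minimal Python sketch, assuming the slide's figures (≈ 270 km; 230 km/h with 2 passengers; 100 km/h with 60 passengers). The function names are illustrative and not part of the course material.

```python
# Sketch: compare the sports car and the bus from the Stockholm to Göteborg example.
# Distance, speeds and passenger counts are taken from the slide.

DISTANCE_KM = 270

vehicles = {
    # name: (speed in km/h, passengers)
    "sports car": (230, 2),
    "bus": (100, 60),
}


def execution_time_h(speed_kmh: float) -> float:
    """Time in hours for one trip (latency / response time)."""
    return DISTANCE_KM / speed_kmh


def performance(speed_kmh: float) -> float:
    """performance = 1 / execution_time"""
    return 1.0 / execution_time_h(speed_kmh)


def throughput(speed_kmh: float, passengers: int) -> float:
    """Useful work per unit of time, measured here as passengers × km/h."""
    return passengers * speed_kmh


car_speed, car_pax = vehicles["sports car"]
bus_speed, bus_pax = vehicles["bus"]

# n = performance_car / performance_bus = execution_time_bus / execution_time_car
n = performance(car_speed) / performance(bus_speed)

if __name__ == "__main__":
    for name, (speed, pax) in vehicles.items():
        print(f"{name}: {execution_time_h(speed):.2f} h, "
              f"throughput {throughput(speed, pax)} passengers*km/h")
    print(f"The car is {n:.2f} times faster; the bus has the higher throughput.")
```

With these numbers the car is 230/100 = 2.3 times faster on response time, while the bus delivers 6000/460 ≈ 13 times more passenger-kilometres per hour.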
Response time
• How long does it take for my job to run?
• How long does it take to execute a job?
• How long must I wait for the database query?

Throughput
• How many jobs can the machine run at once?
• What is the average execution rate?
• How much work is getting done?

If we upgrade a machine with a new processor, what do we increase?
If we add a new machine to the lab, what do we increase?

It takes 4 months to grow a tomato... but that does not mean we can only grow 3 tomatoes in a year.

In cases where we cannot perform several tasks in parallel:
• Execution time: time units/task
• Throughput: tasks/time unit
• Execution time = 1/Throughput

Elapsed time: counts everything (disk and memory accesses, I/O, etc.). A useful number, but often not good for comparison purposes.

CPU time: doesn't count I/O or time spent running other programs. It can be broken up into system time and user time.

Our focus: user CPU time, the time spent executing the lines of code that are "in" our program.

What determines whether a program runs fast or slowly?
• How large the program is, i.e. the number of lines of code (LOC), which the compiler turns into a number of instructions.
• How often the processor can perform a task, i.e. the clock cycle time... This depends on the hardware!

Instead of reporting execution time in seconds, we often use cycles:

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

Clock "ticks" indicate when to start activities (one abstraction).
• cycle time = seconds per cycle
• clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)

A 4 GHz clock has a cycle time of

$$\frac{1}{4 \times 10^{9}\ \text{Hz}} \times 10^{12}\ \frac{\text{ps}}{\text{s}} = 250\ \text{picoseconds (ps)}$$
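The cycle-time arithmetic can be written out as a short sketch. The 4 GHz clock is the slide's example; the program's cycle count of 10^10 is a hypothetical value chosen only for illustration, and the helper names are assumptions.

```python
def cycle_time_s(clock_rate_hz: float) -> float:
    """cycle time (seconds per cycle) = 1 / clock rate (cycles per second)"""
    return 1.0 / clock_rate_hz


def execution_time_s(cycles: float, clock_rate_hz: float) -> float:
    """seconds/program = cycles/program × seconds/cycle"""
    return cycles * cycle_time_s(clock_rate_hz)


CLOCK_RATE_HZ = 4e9                        # 4 GHz, the slide's example
ps = cycle_time_s(CLOCK_RATE_HZ) * 1e12    # seconds to picoseconds (≈ 250 ps)

# Hypothetical program that needs 10^10 clock cycles
t = execution_time_s(1e10, CLOCK_RATE_HZ)  # ≈ 2.5 s

if __name__ == "__main__":
    print(f"cycle time: {ps:.0f} ps, execution time: {t:.2f} s")
```

Doubling the cycle count, or halving the clock rate, doubles the execution time; that is the whole content of the seconds/program equation.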
Clock Cycles per Instruction (CPI)

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}$$

$$\text{CPU\_time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle\_time}$$

$$\text{clock\_cycle\_time} = \frac{1}{\text{clock\_rate}}, \qquad \text{clock\_rate} = \frac{1}{\text{clock\_cycle\_time}}$$

$$\text{CPU\_time} = \frac{\text{Instruction\_count} \times \text{CPI}}{\text{clock\_rate}}$$

These equations separate the three key factors that affect performance:
• The CPU execution time can be measured by running the program.
• The clock rate is usually given in the documentation.
• The instruction count can be measured using profilers/simulators, without knowing all of the implementation details.
• CPI varies by instruction type and ISA implementation, for which we must know the implementation details...

So, to improve performance (everything else being equal), you can either
• decrease the number of required cycles for a program, or
• decrease the clock cycle time, or, said another way,
• increase the clock rate.

How many cycles are required for a program?
We could assume that the number of cycles equals the number of instructions (figure: 1st, 2nd, ... 6th instruction along a time axis, one per cycle). Is this assumption correct?

No: different instructions take different numbers of cycles.
• Multiplication takes more time than addition.
• Floating-point operations take longer than integer ones.
• Accessing memory takes more time than accessing registers.

Changing the cycle time often changes the number of cycles required for various instructions...

MIPS instruction classes:
• Memory-reference instructions: lw, lb, sw, sb
• Arithmetic-logical instructions: add, sub, and, or, slt
• Control-flow instructions: beq, j

Each instruction goes through a fetch, decode, execute cycle:
• Fetch: use the program counter (PC) to supply the instruction address, fetch the instruction from memory, and update the PC (PC = PC + 4).
• Decode: decode the instruction and read registers.
• Execute: execute the instruction (possibly write registers).

When can signals be read and written? How long does the combinational logic take to reach a stable state?

An edge-triggered methodology (figure: state element 1 → combinational logic → state element 2, all within one clock cycle):
1. read contents of state elements
2. send values through combinational logic
3. write results to one or more state elements

Aha! A state element can be read and written in the same clock cycle!
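Because CPI differs per instruction class, the overall CPI is a count-weighted average. Below is a minimal sketch of CPU time = Instruction_count × CPI / clock_rate, assuming a hypothetical instruction mix over the three MIPS instruction classes; the counts and per-class CPIs are invented for illustration, not measured.

```python
from dataclasses import dataclass


@dataclass
class InstrClass:
    """One instruction class with its dynamic count and cycles per instruction."""
    name: str
    count: int    # number of executed instructions of this class
    cpi: float    # clock cycles per instruction for this class


def instruction_count(mix: list[InstrClass]) -> int:
    return sum(c.count for c in mix)


def total_cycles(mix: list[InstrClass]) -> float:
    # Clock cycles = sum over classes of (count × CPI)
    return sum(c.count * c.cpi for c in mix)


def average_cpi(mix: list[InstrClass]) -> float:
    # Overall CPI is the count-weighted average of the per-class CPIs
    return total_cycles(mix) / instruction_count(mix)


def cpu_time_s(mix: list[InstrClass], clock_rate_hz: float) -> float:
    # CPU time = Instruction_count × CPI / clock_rate
    return instruction_count(mix) * average_cpi(mix) / clock_rate_hz


# Hypothetical instruction mix and per-class CPIs (not measured data)
mix = [
    InstrClass("memory reference (lw, lb, sw, sb)", 300_000, 2.0),
    InstrClass("arithmetic-logical (add, sub, and, or, slt)", 500_000, 1.0),
    InstrClass("control flow (beq, j)", 200_000, 1.5),
]

if __name__ == "__main__":
    print("average CPI:", average_cpi(mix))
    print("CPU time at 1 GHz:", cpu_time_s(mix, 1e9), "s")
```

Here the weighted CPI is 1.4, so at a hypothetical 1 GHz clock the program needs 1.4 million cycles, about 1.4 ms. Lowering the memory-reference CPI from 2.0 to 1.0 would bring the average CPI down to 1.1.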
Determinants of CPU Performance

A given program will require:
• some number of instructions (machine instructions)
• some number of cycles
• some number of seconds

$$\text{CPU\_time} = \text{Instruction\_count} \times \text{CPI} \times \text{clock\_cycle}$$

| | Instruction_count | CPI | clock_cycle |
|---|---|---|---|
| Algorithm | X | X | |
| Programming language | X | X | |
| Compiler | X | X | |
| ISA | X | X | X |
| Processor organization | | X | X |
| Technology | | | X |

We have a vocabulary that relates these quantities:
• cycle time (seconds per cycle)
• clock rate (cycles per second)
• CPI (cycles per instruction): a floating-point intensive application might have a higher CPI
• MIPS (millions of instructions per second): this would be higher for a program using simple instructions

The Instruction Set Architecture, or ISA, of a computer is the interface between the software and the hardware. It is the portion of the computer visible to the programmer/compiler, such as registers, instructions and memory access. In some sense, the instruction set architecture is defined by the set of assembly instructions that can be used and by what they do.

Summary: Evaluating ISAs
• Design-time metrics:
  • Can it be implemented, in how long, at what cost?
  • Can it be programmed? Ease of compilation?
• Static metrics:
  • How many bytes does the program occupy in memory?
• Dynamic metrics:
  • How many instructions are executed? How many bytes does the processor fetch to execute the program?
  • How many clocks are required per instruction?
  • How "lean" a clock is practical?
• Best metric: time to execute the program! It depends on the instruction set, the processor organization, and compilation techniques (figure: axes Inst. Count, CPI, Cycle Time).

Reduced Instruction Set Computer (RISC): a type of microprocessor architecture that utilizes a small, highly optimized set of instructions, rather than the more specialized set of instructions often found in other types of architectures.

Most "popular" instructions? 10 simple instructions often account for 96% of all instructions:

| Instruction | Frequency |
|---|---|
| Load | 22% |
| Conditional branch | 20% |
| Compare | 16% |
| Store | 12% |
| Add | 8% |
| And | 6% |
| Sub | 5% |
| Move register-register | 4% |
| Call | 1% |
| Return | 1% |
| Other | 4% |

We should make sure that these instructions go fast, because they are the common case. It is dubious that it's worth implementing many other sophisticated functions.

By reducing the number of transistors and instructions to only those most frequently used, the computer would get more done in a shorter amount of time.

In this 1984 photograph, Stanford computer scientists, left to right, John Shott, John Hennessy and James D. Meindl brainstorm about the MIPS project, which simplified computing with RISC architecture. Photo: Chuck Painter.
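The remark that MIPS "would be higher for a program using simple instructions" is why execution time, not MIPS, is the best metric. Below is a minimal sketch with hypothetical numbers: two compilations of the same program on an assumed 4 GHz machine, with three instruction classes A, B, C whose CPIs (1, 2, 3) are likewise assumptions.

```python
# Sketch: MIPS = instruction_count / (execution_time × 10^6)
# All numbers are hypothetical, chosen to show that more simple
# instructions can raise the MIPS rating while the program runs longer.

CLOCK_RATE_HZ = 4e9
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}   # hypothetical per-class CPI


def cycles(counts: dict[str, float]) -> float:
    # Total clock cycles = sum over classes of (count × CPI)
    return sum(n * CPI_BY_CLASS[cls] for cls, n in counts.items())


def exec_time_s(counts: dict[str, float]) -> float:
    # CPU time = cycles / clock_rate
    return cycles(counts) / CLOCK_RATE_HZ


def mips(counts: dict[str, float]) -> float:
    ic = sum(counts.values())
    return ic / (exec_time_s(counts) * 1e6)


# Two hypothetical code sequences for the same program (instructions per class)
seq1 = {"A": 5e9, "B": 1e9, "C": 1e9}
seq2 = {"A": 10e9, "B": 1e9, "C": 1e9}

if __name__ == "__main__":
    for name, seq in (("seq1", seq1), ("seq2", seq2)):
        print(f"{name}: {exec_time_s(seq):.2f} s, {mips(seq):.0f} MIPS")
```

Under these assumptions the second sequence rates 3200 MIPS against 2800 MIPS for the first, yet it takes 3.75 s instead of 2.5 s.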
CISC (Complex Instruction Set Computer) is a retroactive definition that was introduced to distinguish the design from RISC microprocessors. In contrast to RISC, CISC chips have a large number of different and complex instructions.

| CISC | RISC |
|---|---|
| Includes multi-clock complex instructions | Single-clock, reduced instructions only |
| Memory-to-memory: "LOAD" and "STORE" incorporated in instructions | Register-to-register: "LOAD" and "STORE" are independent instructions |
| Small code sizes, high cycles per instruction | Low cycles per instruction, large code sizes |
| Transistors used for storing complex instructions | Spends more transistors on memory registers |

CISC processors generally feature variable-length instructions and multiple addressing formats, and have a small number of general-purpose registers. Intel's 80x86 family is the quintessential example of CISC.