CSE 675.02: Introduction to Computer Architecture
Performances of Computer Systems
Presentation C
Gojko Babić, 06/27/2005

Performance
• Measure, report, and summarize
• Make intelligent choices
• See through the marketing hype
• Key to understanding underlying organizational motivation

Why is some hardware better than others for different programs?
What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?)
How does the machine's instruction set affect performance?

Basic Performance Metrics
• Response time: the time between the start and the completion of a task (in time units)
• Throughput: the total amount of tasks done in a given time period (in number of tasks per unit of time)
• Example: car assembly factory:
  – 4 hours to produce a car (response time),
  – 6 cars produced per hour (throughput).
In general, there is no direct relationship between these two metrics:
  – the throughput of the car assembly factory may increase to 18 cars per hour without changing the time to produce one car.
  – How?

Which of these airplanes has the best performance?

Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100             101          630           598
Boeing 747                 470         4150           610
BAC/Sud Concorde           132         4000          1350
Douglas DC-8-50            146         8720           544

• How much faster is the Concorde compared to the 747?
• How much bigger is the 747 than the Douglas DC-8?
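The airplane table mixes a response-time view (speed) with a throughput view (how many passengers are moved per hour). A minimal Python sketch that computes both, assuming throughput is measured in passenger-miles per hour (passengers × mph); the dictionary `AIRPLANES` and all function names are hypothetical, chosen for illustration. It also applies Little's law (average work in progress = throughput × response time) to the car factory.

```python
# Sketch only: "throughput" for an airplane is taken here to be
# passenger-miles per hour (passengers x speed), an assumed definition.

# (passengers, range in miles, speed in mph), copied from the table above
AIRPLANES = {
    "Boeing 737-100": (101, 630, 598),
    "Boeing 747": (470, 4150, 610),
    "BAC/Sud Concorde": (132, 4000, 1350),
    "Douglas DC-8-50": (146, 8720, 544),
}


def speed_ratio(a: str, b: str) -> float:
    """How many times faster plane a flies than plane b (response-time view)."""
    return AIRPLANES[a][2] / AIRPLANES[b][2]


def capacity_ratio(a: str, b: str) -> float:
    """How many times more passengers plane a carries than plane b."""
    return AIRPLANES[a][0] / AIRPLANES[b][0]


def passenger_throughput(name: str) -> int:
    """Passengers x mph = passenger-miles delivered per hour (throughput view)."""
    passengers, _range_mi, speed_mph = AIRPLANES[name]
    return passengers * speed_mph


def work_in_progress(throughput_per_hour: float, response_time_hours: float) -> float:
    """Little's law: average number of items in the system = throughput x response time."""
    return throughput_per_hour * response_time_hours


print(f"Concorde vs 747 speed: {speed_ratio('BAC/Sud Concorde', 'Boeing 747'):.2f}x")
print(f"747 vs DC-8-50 passengers: {capacity_ratio('Boeing 747', 'Douglas DC-8-50'):.2f}x")
for plane in AIRPLANES:
    print(f"{plane}: {passenger_throughput(plane)} passenger-miles/hour")

# Car factory: keep the 4-hour response time but raise throughput from 6 to 18 cars/hour.
print(work_in_progress(6, 4), "->", work_in_progress(18, 4), "cars on the line at once")
```

Under these assumptions the Concorde is about 2.21 times faster than the 747, yet the 747 delivers more passenger-miles per hour (286,700 versus 178,200), so the "best" airplane depends on the metric chosen. For the factory, Little's law suggests one possible answer to "How?": keep 72 cars in progress instead of 24, for example by running three assembly lines in parallel, while each car still takes 4 hours.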
Computer Performance: Introduction
• The computer user is interested in response time (or execution time): the time between the start and completion of a given task (program).
• The manager of a data processing center is interested in throughput: the total amount of work done in a given time.
• The computer user wants response time to decrease, while the manager wants throughput to increase.
• The main factors influencing the performance of a computer system are:
  – processor and memory,
  – input/output controllers and peripherals,
  – compilers, and
  – operating system.

Computer Performance: TIME, TIME, TIME
• Response time (latency)
  – How long does it take for my job to run?
  – How long does it take to execute a job?
  – How long must I wait for the database query?
• Throughput
  – How many jobs can the machine run at once?
  – What is the average execution rate?
  – How much work is getting done?
• If we upgrade a machine with a new processor, what do we increase?
• If we add a new machine to the lab, what do we increase?

Execution Time
• Elapsed time
  – counts everything (disk and memory accesses, I/O, etc.)
  – a useful number, but often not good for comparison purposes
• CPU time
  – doesn't count I/O or time spent running other programs
  – can be broken up into system time and user time
• Our focus: user CPU time
  – time spent executing the lines of code that are "in" our program

Analysis of CPU Time
CPU time depends on the program being executed, including:
  – the number of instructions executed,
  – the types of instructions executed and their frequency of use.
Computers are constructed in such a way that events in hardware are synchronized using a clock. Clock rate is given in Hz (= 1/sec). The clock rate defines the duration of discrete time intervals called clock cycle times or clock cycle periods:
  $\text{Clock cycle time} = \dfrac{1}{\text{Clock rate}}$
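The distinction between elapsed time and user/system CPU time can be observed directly. A minimal Python sketch, assuming a system where `os.times()` reports per-process user and system CPU time (on Windows only those two fields are filled in); `busy_work` and `measure` are hypothetical helpers, and `time.sleep` stands in for waiting on I/O.

```python
import os
import time
from collections.abc import Callable


def busy_work(n: int) -> int:
    """CPU-bound loop: time spent here shows up as user CPU time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(task: Callable[[], object]) -> tuple[float, float, float]:
    """Return (elapsed, user CPU, system CPU) seconds consumed by task()."""
    wall_start, cpu_start = time.perf_counter(), os.times()
    task()
    wall_end, cpu_end = time.perf_counter(), os.times()
    return (
        wall_end - wall_start,
        cpu_end.user - cpu_start.user,
        cpu_end.system - cpu_start.system,
    )


for label, task in (
    ("compute", lambda: busy_work(2_000_000)),
    ("sleep", lambda: time.sleep(0.2)),  # waiting, a stand-in for blocking on I/O
):
    elapsed, user, system = measure(task)
    print(f"{label:8s} elapsed={elapsed:.3f}s user={user:.3f}s system={system:.3f}s")
```

The sleeping task should show roughly 0.2 s of elapsed time but almost no CPU time, while the compute task's time should appear mostly as user CPU time. Exact values vary from run to run and machine to machine, which is one reason elapsed time is a poor basis for comparison.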
Book's Definition of Performance
• For some program running on machine X:
  $\text{Performance}_X = \dfrac{1}{\text{Execution time}_X}$
• "X is n times faster than Y" means:
  $\dfrac{\text{Performance}_X}{\text{Performance}_Y} = n$
• Problem:
  – machine A runs a program in 20 seconds
  – machine B runs the same program in 25 seconds

Clock Cycles
• Instead of reporting execution time in seconds, we often use cycles:
  $\dfrac{\text{seconds}}{\text{program}} = \dfrac{\text{cycles}}{\text{program}} \times \dfrac{\text{seconds}}{\text{cycle}}$
• Clock "ticks" indicate when to start activities (one abstraction):
  [Figure: clock ticks along a time axis]
• cycle time = time between ticks = seconds per cycle
• clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
A 4 GHz clock has a cycle time of
  $\dfrac{1}{4 \times 10^{9}} \times 10^{12} = 250$ picoseconds (ps)

How many cycles are required for a program?
• Could assume that the number of cycles equals the number of instructions:
  [Figure: 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions, one per clock cycle along a time axis]
This assumption is incorrect: different instructions take different amounts of time on different machines. Why? Hint: remember that these are machine instructions, not lines of C code.

How to Improve Performance
  $\dfrac{\text{seconds}}{\text{program}} = \dfrac{\text{cycles}}{\text{program}} \times \dfrac{\text{seconds}}{\text{cycle}}$
So, to improve performance (everything else being equal) you can either (increase or decrease?)
  ________ the # of required cycles for a program, or
  ________ the clock cycle time or, said another way,
  ________ the clock rate.
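The definitions on these slides reduce to a few lines of arithmetic. A short Python sketch under the slides' definitions; the function names are hypothetical, chosen for illustration.

```python
def cycle_time_ps(clock_rate_hz: float) -> float:
    """Clock cycle time in picoseconds: 1 / clock rate, scaled by 10^12 ps per second."""
    return 1e12 / clock_rate_hz


def performance(execution_time_s: float) -> float:
    """Performance_X = 1 / Execution time_X."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """n in "X is n times faster than Y", i.e. Performance_X / Performance_Y."""
    return performance(time_x_s) / performance(time_y_s)


def seconds_per_program(cycles_per_program: float, clock_rate_hz: float) -> float:
    """seconds/program = cycles/program x seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_rate_hz
    return cycles_per_program * seconds_per_cycle


print(cycle_time_ps(4e9))  # 250.0
print(f"A is {times_faster(20, 25):.2f} times faster than B")
```

With these definitions machine A (20 s) is 25/20 = 1.25 times faster than machine B (25 s). Raising the clock rate in `seconds_per_program` while holding the cycle count fixed lowers the time per program, which is one of the three blanks above.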
Different Numbers of Cycles for Different Instructions
[Figure: instructions of different lengths along a time axis]
• Multiplication takes more time than addition
• Floating point operations take longer than integer ones
• Accessing memory takes more time than accessing registers
• Important point: changing the cycle time often changes the number of cycles required for various instructions (more later)

Example
• Our favorite program runs in 10 seconds on computer A, which has a 4 GHz clock. We are trying to help a computer designer build a new machine B that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?

Now That We Understand Cycles
• A given program will require
  – some number of instructions (machine instructions)
  – some number of cycles
  – some number of seconds
• We have a vocabulary that relates these quantities:
  – cycle time (seconds per cycle)
  – clock rate (cycles per second)
  – CPI (cycles per instruction): a floating point intensive application might have a higher CPI
  – MIPS (millions of instructions per second): this would be higher for a program using simple instructions

Performance
• Performance is determined by execution time
• Do any of the other variables equal performance?
  – # of cycles to execute program?
  – # of instructions in program?
  – # of cycles per second?
  – average # of cycles per instruction?
  – average # of instructions per second?
• Common pitfall: thinking one of the variables is indicative of performance when it really isn't.
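The clock-rate example can be worked through with the relation seconds/program = cycles/program × seconds/cycle. A hedged Python sketch; `required_clock_rate_hz` is a hypothetical helper whose parameters mirror the numbers in the example.

```python
def required_clock_rate_hz(
    time_a_s: float, clock_a_hz: float, cycle_factor_b: float, target_time_b_s: float
) -> float:
    """Clock rate machine B needs to run the program in target_time_b_s."""
    cycles_a = time_a_s * clock_a_hz      # cycles/program on A = seconds x cycles/second
    cycles_b = cycle_factor_b * cycles_a  # B needs cycle_factor_b times as many cycles
    return cycles_b / target_time_b_s     # clock rate = cycles / seconds


rate_b = required_clock_rate_hz(10, 4e9, 1.2, 6)
print(f"Target clock rate for B: {rate_b / 1e9:.1f} GHz")
```

Machine A executes 10 s × 4 × 10^9 cycles/s = 40 × 10^9 cycles; machine B needs 1.2 times as many, 48 × 10^9 cycles, and fitting those into 6 seconds requires 48 × 10^9 / 6 = 8 × 10^9 cycles/s, that is, an 8 GHz clock.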
CPU Time Equation
• $\text{CPU time} = \text{Clock cycles for a program} \times \text{Clock cycle time} = \dfrac{\text{Clock cycles for a program}}{\text{Clock rate}}$
  Clock cycles for a program is the total number of clock cycles needed to execute all instructions of a given program.
• $\text{CPU time} = \dfrac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$
  Instruction count is the number of instructions executed, sometimes referred to as the instruction path length.
  CPI, the average number of clock cycles per instruction (for a given execution of a given program), is an important parameter, given as:
  $\text{CPI} = \dfrac{\text{Clock cycles for a program}}{\text{Instruction count}}$

CPI Example
• Suppose we have two implementations of the same instruction set architecture (ISA). For some program,
  Machine A has a clock cycle time of 250 ps and a CPI of 2.0
  Machine B has a clock cycle time of 500 ps and a CPI of 1.2
  What machine is faster for this program, and by how much?
• If two machines have the same ISA, which of our quantities (e.g., clock rate, CPI, execution time, # of instructions, MIPS) will always be identical?

# of Instructions Example
• A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively).
  The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C.
  The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C.
  Which sequence will be faster? How much?
  What is the CPI for each sequence?

MIPS Example
• Two different compilers are being tested for a 4 GHz machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.
  The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
  The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.
• Which sequence will be faster according to MIPS?
• Which sequence will be faster according to execution time?
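The CPU time equation answers the CPI, instruction-count, and MIPS examples directly. A minimal Python sketch, assuming instructions are grouped into classes A, B, and C with 1, 2, and 3 cycles as both examples state; `CYCLES_PER_CLASS`, `SEQ_1`, `COMPILER_1`, and the helper functions are hypothetical names.

```python
# Cycles per instruction for classes A, B, and C, as stated in both examples.
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def clock_cycles(mix: dict[str, int]) -> int:
    """Clock cycles for a program: sum of (instruction count x cycles) over classes."""
    return sum(count * CYCLES_PER_CLASS[cls] for cls, count in mix.items())


def cpi(mix: dict[str, int]) -> float:
    """CPI = clock cycles for a program / instruction count."""
    return clock_cycles(mix) / sum(mix.values())


def cpu_time_s(instruction_count: int, cpi_value: float, clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cpi_value / clock_rate_hz


def mips(instruction_count: int, time_s: float) -> float:
    """Millions of instructions per second."""
    return instruction_count / (time_s * 1e6)


# CPI example: same ISA, so the instruction count cancels; compare CPI x cycle time.
ps_per_instr_a = 2.0 * 250
ps_per_instr_b = 1.2 * 500
print(f"A is {ps_per_instr_b / ps_per_instr_a:.1f} times faster than B")

# Number-of-instructions example
SEQ_1 = {"A": 2, "B": 1, "C": 2}
SEQ_2 = {"A": 4, "B": 1, "C": 1}
for name, seq in (("sequence 1", SEQ_1), ("sequence 2", SEQ_2)):
    print(f"{name}: {clock_cycles(seq)} cycles, CPI = {cpi(seq):.2f}")

# MIPS example on a 4 GHz machine
CLOCK_HZ = 4e9
COMPILER_1 = {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000}
COMPILER_2 = {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000}
for name, mix in (("compiler 1", COMPILER_1), ("compiler 2", COMPILER_2)):
    count = sum(mix.values())
    seconds = cpu_time_s(count, cpi(mix), CLOCK_HZ)
    print(f"{name}: {seconds * 1e3:.2f} ms, {mips(count, seconds):.0f} MIPS")
```

Machine A needs 2.0 × 250 = 500 ps per instruction against B's 1.2 × 500 = 600 ps, so A is 1.2 times faster. Sequence 1 takes 10 cycles (CPI 2.0) and sequence 2 takes 9 cycles (CPI 1.5), so sequence 2 is 10/9 ≈ 1.11 times faster despite having more instructions. The first compiler's code needs 10 million cycles (2.5 ms, 2800 MIPS) and the second's 15 million cycles (3.75 ms, 3200 MIPS): MIPS favors the second compiler, while execution time shows the first is faster.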