Performance, Power CS301 Prof Szajda
Performance Metrics (How do we compare two machines?)
What to Measure? Which airplane has the best performance?
Performance
• One size does not fit all
• Depends on application domain
  - Scientific computing
  - Graphics
  - Databases
  - General-purpose desktop
  - Beware of designing to the benchmark!
• Depends on technology characteristics
  - DRAM speed and capacity, chip size, etc.
Which Metric Do We Use?
• Response or execution time
  - Difference between start and end time
  - Individual user cares most about this
• Throughput
  - Total amount of work done in a given time
  - Frequently used for servers and clusters
• How are these affected by
  - Replacing the processor with a faster version?
  - Adding more processors?
Execution Time • Shorter execution time is better • Allows comparison between 2 machines
Relative Performance
• “X is n times faster than Y”
• Example:
  - Machine A takes 10s to run a program
  - Machine B takes 15s to run the same program
  - What is the performance ratio?
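The ratio asked for in the example can be checked with a short calculation. This is a minimal sketch that takes performance to be the reciprocal of execution time; the function name `performance_ratio` is an illustrative choice, not something from the slides.

```python
def performance_ratio(time_x: float, time_y: float) -> float:
    """Return n such that "X is n times faster than Y".

    Performance is taken as 1 / execution time, so
    n = performance_x / performance_y = time_y / time_x.
    """
    return time_y / time_x


# Machine A: 10 s, Machine B: 15 s (values from the slide)
n = performance_ratio(time_x=10.0, time_y=15.0)
print(f"A is {n:.1f} times faster than B")  # prints "A is 1.5 times faster than B"
```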
Different Time Values
• Execution time
  - Wall-clock, response, or elapsed time
  - Includes everything (processing, I/O, OS overhead, etc.)
  - Determines system performance
• CPU time
  - Time spent executing code for this task only
  - Does not include I/O or time-sharing
  - Comprises user CPU time and system CPU time
  - Different programs are affected differently by CPU and system performance
• man time
  - 90.7u 12.9s 2:39 65%
  - User: 90.7 sec
  - System: 12.9 sec
  - Elapsed time: 2 min 39 sec
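The percentage in the sample `time` output can be reproduced from the other three fields. This is a minimal sketch, assuming the 65% figure is CPU time (user plus system) divided by elapsed time; the function name `cpu_utilization` is illustrative.

```python
def cpu_utilization(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Fraction of elapsed (wall-clock) time spent as CPU time for this task."""
    return (user_s + system_s) / elapsed_s


# Values from the sample output: 90.7u 12.9s 2:39
elapsed = 2 * 60 + 39  # 2 min 39 sec = 159 s
util = cpu_utilization(user_s=90.7, system_s=12.9, elapsed_s=elapsed)
print(f"CPU utilization: {util:.0%}")  # prints "CPU utilization: 65%"
```

The remaining 35% or so of the elapsed time is spent on I/O, waiting, or other tasks sharing the machine.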
Clock Cycles
• Instead of expressing time in seconds, use clock cycles
• Clock
  - Determines when events take place
  - Runs at a constant rate (e.g., 1 GHz)
  - Easy to convert between clock rate and seconds
  - Clock rate = 1 / Clock cycle time
  - 500 MHz = 1 / (2 ns)
  - 1 ns = $10^{-9}$ s
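The conversion between clock rate and cycle time is a single reciprocal. A minimal sketch, with illustrative function names that are not from the slides:

```python
def cycle_time_s(clock_rate_hz: float) -> float:
    """Clock cycle time in seconds for a given clock rate in Hz."""
    return 1.0 / clock_rate_hz


def clock_rate_hz(cycle_time_seconds: float) -> float:
    """Clock rate in Hz for a given clock cycle time in seconds."""
    return 1.0 / cycle_time_seconds


# 500 MHz corresponds to a 2 ns cycle (values from the slide)
period = cycle_time_s(500e6)
print(f"{period * 1e9:.1f} ns")               # prints "2.0 ns"
print(f"{clock_rate_hz(2e-9) / 1e6:.0f} MHz")  # prints "500 MHz"
```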
CPU Clocking
• Operation of digital hardware is governed by a constant-rate clock
[Figure: clock timing diagram showing the clock period, clock cycles, data transfer and computation, and state update]
• Clock period: duration of a clock cycle
  - e.g., 250 ps = 0.25 ns = $250 \times 10^{-12}$ s
• Clock frequency (rate): cycles per second
  - e.g., 4.0 GHz = 4000 MHz = $4.0 \times 10^{9}$ Hz
Chapter 1 — Computer Abstractions and Technology
An Aside
CPU Time
• CPU time = CPU clock cycles × clock cycle time = CPU clock cycles / clock rate
• Performance improved by
  - Reducing the number of clock cycles
  - Increasing the clock rate
• Hardware designer must often trade off clock rate against cycle count
CPU Time Example
• Computer A: 2 GHz clock, 10s CPU time
• Designing Computer B
  - Aim for 6s CPU time
  - Can do a faster clock, but this causes 1.2 × as many clock cycles
• How fast must Computer B's clock be?
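One way to work the example is to recover A's cycle count from its CPU time and clock rate, scale it by 1.2, and divide by B's target time. This is a sketch using the relation CPU time = clock cycles / clock rate; the variable names are illustrative.

```python
# Computer A (values from the slide)
clock_rate_a_hz = 2e9
cpu_time_a_s = 10.0

# Clock cycles the program needs on A: cycles = CPU time x clock rate
cycles_a = cpu_time_a_s * clock_rate_a_hz

# Computer B needs 1.2x as many cycles and must finish in 6 s
cycles_b = 1.2 * cycles_a
cpu_time_b_s = 6.0

# Required clock rate for B: rate = cycles / CPU time
clock_rate_b_hz = cycles_b / cpu_time_b_s
print(f"Computer B needs a {clock_rate_b_hz / 1e9:.1f} GHz clock")  # prints "Computer B needs a 4.0 GHz clock"
```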
Instruction Count and CPI
• Clock cycles = instruction count (IC) × cycles per instruction (CPI)
• CPU time = IC × CPI × clock cycle time = IC × CPI / clock rate
• Instruction count for a program
  - Determined by program, ISA, and compiler
• Average cycles per instruction
  - Determined by CPU hardware
  - If different instructions have different CPI, the average CPI is affected by the instruction mix
CPI Example
• Computer A: cycle time = 250 ps, CPI = 2.0
• Computer B: cycle time = 500 ps, CPI = 1.2
• Same ISA
• Which is faster, and by how much?
  - CPU time A = IC × 2.0 × 250 ps = IC × 500 ps
  - CPU time B = IC × 1.2 × 500 ps = IC × 600 ps
  - A is faster, by a factor of 600 / 500 = 1.2
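Because both machines implement the same ISA, they execute the same instruction count, so it cancels out of the ratio. The sketch below illustrates this with an arbitrary instruction count; `cpu_time_ps` and the value of `ic` are hypothetical choices for illustration.

```python
def cpu_time_ps(instruction_count: int, cpi: float, cycle_time_ps: float) -> float:
    """CPU time in picoseconds: IC x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


ic = 1_000_000  # hypothetical; any positive value gives the same ratio

time_a = cpu_time_ps(ic, cpi=2.0, cycle_time_ps=250.0)
time_b = cpu_time_ps(ic, cpi=1.2, cycle_time_ps=500.0)

ratio = time_b / time_a
print(f"A is {ratio:.1f} times faster than B")  # prints "A is 1.2 times faster than B"
```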
Application Characteristics
• Determine the mix of different instruction types
  - Integer arithmetic
  - Logical operations
  - Floating-point arithmetic
  - Loads and stores
• Different applications have different CPI because of different instruction mixes
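CPI in More Detail
• If different instruction classes take different numbers of cycles:
  - Clock cycles = $\sum_{i=1}^{n} (\text{CPI}_i \times \text{IC}_i)$
• Weighted average CPI:
  - CPI = Clock cycles / IC = $\sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{IC}_i}{\text{IC}} \right)$, where $\text{IC}_i / \text{IC}$ is the relative frequency of class $i$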
CPI Example
• Alternative compiled code sequences using instructions in classes A, B, C

Class               A   B   C
CPI for class       1   2   3
IC in sequence 1    2   1   2
IC in sequence 2    4   1   1

• Sequence 1: IC = 5
  - Clock cycles = 2 × 1 + 1 × 2 + 2 × 3 = 10
  - Avg. CPI = 10/5 = 2.0
• Sequence 2: IC = 6
  - Clock cycles = 4 × 1 + 1 × 2 + 1 × 3 = 9
  - Avg. CPI = 9/6 = 1.5
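The same arithmetic can be written as a small function that works for any instruction mix. This is a sketch; the dictionary layout and the name `summarize` are illustrative, while the numbers come from the table above.

```python
# CPI for each instruction class (from the table)
CPI_PER_CLASS = {"A": 1, "B": 2, "C": 3}


def summarize(ic_per_class: dict[str, int]) -> tuple[int, int, float]:
    """Return (total instruction count, total clock cycles, average CPI)."""
    total_ic = sum(ic_per_class.values())
    cycles = sum(CPI_PER_CLASS[cls] * count for cls, count in ic_per_class.items())
    return total_ic, cycles, cycles / total_ic


seq1 = summarize({"A": 2, "B": 1, "C": 2})
seq2 = summarize({"A": 4, "B": 1, "C": 1})
print(seq1)  # prints "(5, 10, 2.0)"
print(seq2)  # prints "(6, 9, 1.5)"
```

Sequence 2 executes more instructions but needs fewer clock cycles, so instruction count alone does not predict which sequence is faster.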
Performance Summary (The BIG Picture)
• Performance depends on
  - Algorithm: affects IC, possibly CPI
  - Programming language: affects IC, CPI
  - Compiler: affects IC, CPI
  - Instruction set architecture: affects IC, CPI, $T_c$ (clock cycle time)
Amdahl’s Law
• How much speedup do you get from an enhancement?
  - $\text{Speedup} = \frac{\text{Execution time w/o enhancement}}{\text{Execution time w/ enhancement}}$
• Based on
  - Fraction of time the enhancement is used
  - Improvement in enhanced mode
• $\text{Exec}_{new} = \text{Exec}_{old} \times \left( (1 - \text{fraction}_{enh}) + \frac{\text{fraction}_{enh}}{\text{Speedup}_{enh}} \right)$
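Dividing the old execution time by the new one gives the overall speedup directly. A minimal sketch of that calculation follows; the function name and the sample inputs (an enhancement used 50% of the time that is 2x faster) are hypothetical values chosen only for illustration.

```python
def amdahl_speedup(fraction_enh: float, speedup_enh: float) -> float:
    """Overall speedup = Exec_old / Exec_new under Amdahl's Law.

    fraction_enh: fraction of the original execution time that can use the enhancement
    speedup_enh:  speedup of that fraction when the enhancement is in use
    """
    return 1.0 / ((1.0 - fraction_enh) + fraction_enh / speedup_enh)


# Hypothetical inputs: half the time is enhanced, and that half runs 2x faster
overall = amdahl_speedup(fraction_enh=0.5, speedup_enh=2.0)
print(f"Overall speedup: {overall:.3f}")  # prints "Overall speedup: 1.333"
```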
§1.10 Fallacies and Pitfalls
Pitfall: Amdahl’s Law
• Improving an aspect of a computer and expecting a proportional improvement in overall performance
• Example: multiply accounts for 80s/100s
  - How much improvement in multiply performance to get 5 × overall?
  - 5 × overall means a total of 20s, so we would need 20 = 80/n + 20, which no finite n satisfies
  - Can’t be done!
• Corollary: make the common case fast
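The limit in the multiply example can be seen numerically: however large the improvement factor n, the 20s not spent in multiply keeps the overall speedup below 5. This is a sketch with an illustrative function name and an arbitrary selection of n values.

```python
def overall_speedup(n: float) -> float:
    """Overall speedup when the 80 s multiply portion of a 100 s run is made n times faster."""
    new_time = 80.0 / n + 20.0  # improved multiply time + unaffected time
    return 100.0 / new_time


for n in (2, 10, 100, 1_000_000):
    print(f"n = {n:>9}: overall speedup = {overall_speedup(n):.4f}")
```

The printed values approach 5 as n grows but never reach it, since the unaffected 20s sets a floor on the new execution time.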
Review Question • Your machine has a clock rate of 2.4GHz. How long is the clock cycle?
Review Questions
• Suppose you are given the following:
  - Machine A
    - 1 GHz
    - Average CPI = 1.6
    - Instructions = 1.7 billion
  - Machine B
    - 3.3 GHz
    - Average CPI = 6.1
    - Instructions = 2 billion
• Which machine is faster? By how much?
Review Questions
• What is the average CPI for a machine with the following CPIs on an application with the following instruction frequency?

Type         Frequency   CPI
Arithmetic   0.45        1
Memory       0.3         8
Control      0.2         3
Mult/Div     0.05        5
Review Questions • What factors must be included when comparing the relative performance of two machines?
Amdahl’s Law
$\text{Exec}_{new} = \text{Exec}_{old} \times \left( (1 - \text{fraction}_{enh}) + \frac{\text{fraction}_{enh}}{\text{Speedup}_{enh}} \right)$
• Suppose you have an enhancement that makes a functional unit 10x faster.
• Speedup if used 5% of the time?
• Speedup if used 40% of the time?
Review Questions • What is the equation for execution time? • What does Amdahl’s Law say?
Benchmarks
• Programs specifically used to measure performance
• Hope is that they are representative of how the computer will be used
• Examples
  - SPEC Integer and Floating Point
  - MediaBench
  - MineBench
  - TPC
SPEC CPU Benchmark
• Programs used to measure performance
  - Supposedly typical of actual workload
• Standard Performance Evaluation Corp (SPEC)
  - Develops benchmarks for CPU, I/O, Web, …
• SPEC CPU2006
  - Elapsed time to execute a selection of programs
    - Negligible I/O, so focuses on CPU performance
  - Normalize relative to reference machine
  - Summarize as geometric mean of performance ratios
    - CINT2006 (integer) and CFP2006 (floating-point)
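The normalize-then-summarize step can be sketched as follows. Each performance ratio is taken as reference time divided by measured time, and the ratios are combined with a geometric mean. The function names and all timing values here are hypothetical and are not SPEC data.

```python
import math


def performance_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Ratio relative to the reference machine; larger means faster."""
    return reference_time_s / measured_time_s


def geometric_mean(ratios: list[float]) -> float:
    """n-th root of the product of n ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (reference time, measured time) pairs for two benchmark programs
timings = [(100.0, 50.0), (400.0, 50.0)]
ratios = [performance_ratio(ref, measured) for ref, measured in timings]
summary = geometric_mean(ratios)
print(ratios)            # prints "[2.0, 8.0]"
print(f"{summary:.2f}")  # prints "4.00"
```

A geometric mean is used so that the summary does not depend on which machine is chosen as the reference when two machines are compared.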
CINT2006 for Intel Core i7 920
§1.7 The Power Wall
Recent Concern: Power
• In CMOS IC technology:
  - $\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$
  - Power: × 40; voltage: 5V → 1V; frequency: × 1000
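The scaling figures on this slide (× 40, 5V → 1V, × 1000) are consistent with the standard CMOS dynamic power relation, in which power is proportional to capacitive load × voltage² × frequency. The sketch below checks this under the simplifying assumption that capacitive load is unchanged; the function name and that assumption are not from the slides.

```python
def relative_power(voltage_scale: float, frequency_scale: float,
                   capacitance_scale: float = 1.0) -> float:
    """Relative change in dynamic power, with power proportional to C x V^2 x f."""
    return capacitance_scale * voltage_scale ** 2 * frequency_scale


# Voltage drops from 5 V to 1 V while frequency rises by a factor of 1000;
# capacitive load is assumed unchanged (an assumption, not a slide value)
ratio = relative_power(voltage_scale=1.0 / 5.0, frequency_scale=1000.0)
print(f"Power grows by about x{ratio:.0f}")  # prints "Power grows by about x40"
```

The quadratic dependence on voltage is why lowering supply voltage kept power growth far below the growth in clock frequency.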
Tricks to Increase Power
• Attach large cooling devices
• Turn off parts of the chip not used in a given clock cycle
  - Can increase power to 300 watts...
  - ...but these and other ways are all prohibitively expensive for desktop computers. So...
More Recent Approaches: Chip Multiprocessors
• Reasons for change
  - Limited opportunities to improve single-thread performance
  - Power
  - On-chip communication latencies
§1.8 The Sea Change: The Switch to Multiprocessors
Uniprocessor Performance
• Constrained by power, instruction-level parallelism, memory latency
Multiprocessors
• Multicore microprocessors
  - More than one processor per chip
• Requires explicitly parallel programming
  - Compare with instruction-level parallelism
    - Hardware executes multiple instructions at once
    - Hidden from the programmer
  - Hard to do
    - Programming for performance
    - Load balancing
    - Optimizing communication and synchronization
§1.9 Concluding Remarks
• Cost/performance is improving
  - Due to underlying technology development
• Hierarchical layers of abstraction
  - In both hardware and software
• Instruction set architecture
  - The hardware/software interface
• Execution time: the best performance measure
• Power is a limiting factor
  - Use parallelism to improve performance