Lecture 2: Performance Evaluation Methods

Performance definition, benchmark, summarizing performance, Amdahl’s law, and CPI

What Does Performance Mean?

Response time
  • A simulation program finishes in 5 minutes
Throughput
  • A web server serves 5 million requests per second
Other metrics
  • MIPS (million instructions per second)
  • MFLOPS
  • Clock frequency

Quantitative Definitions

Use response time or execution time:
  • Performance is 1/(Execution time)
  • Performance is 1/CPI
  • Performance is IPC (instructions per cycle, discussed later)
Use throughput:
  • Performance is the rate itself, e.g. 5 million requests per second or 5 simulation programs per hour

Performance Comparison

“X is n times faster than Y”:

\[ \frac{\text{Execution time}_y}{\text{Execution time}_x} = \frac{\text{Performance}_x}{\text{Performance}_y} = n \]

n is called the speedup when we are considering an enhancement, optimization, etc.
Some terms
  • Elapsed time vs. CPU time
  • Improve performance: decrease execution time, increase throughput
  • Improve execution time: decrease execution time
  • Degrade performance: the reverse of the above; gives a speedup below 1 (a slowdown)

Performance of Computers

Performance is defined for a given program and a given machine. How about the machine alone? We need benchmark programs:
Real applications: scientific programs, compilers, text-processing software, image processing
Modified applications: providing portability and focus
Kernels: good for isolating the performance of individual features
  • Lmbench: measures latency and bandwidth of memory, file system, networking, etc.
Toy benchmarks
Synthetic benchmarks: matching an average execution profile

Benchmark Suite

A benchmark suite is a collection of benchmarks covering a variety of applications
  • Alleviates the weakness of any single benchmark
  • More representative for computer designers evaluating their designs
Categories of benchmark suites
  • Desktop benchmarks: CPU, memory, and graphics performance
  • Server benchmarks: throughput-oriented, I/O and OS intensive
  • Embedded benchmarks: measuring the ability to meet deadlines and save power
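The "X is n times faster than Y" definition from the Performance Comparison slide can be sketched in a few lines. The function name `speedup` and the timings below are hypothetical, chosen only to illustrate the ratio; they are not measurements from the lecture.

```python
def speedup(time_y: float, time_x: float) -> float:
    """Return n such that X is n times faster than Y.

    n = Execution time_y / Execution time_x = Performance_x / Performance_y
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Hypothetical timings: Y needs 10 s, X needs 5 s for the same program.
n = speedup(time_y=10.0, time_x=5.0)
print(f"X is {n:.2f}x faster than Y")

# Swapping the roles gives a value below 1, i.e. degraded performance.
print(f"Y relative to X: {speedup(time_y=5.0, time_x=10.0):.2f}x")
```

A result below 1 means the "enhancement" made things slower, which is the degraded-performance case listed under "Some terms".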
SPEC CPU Benchmark

SPEC: Standard Performance Evaluation Corporation
CPU-intensive benchmark for evaluating the processor performance of workstations
Four generations: SPEC89, SPEC92, SPEC95, and SPEC2000
Two types of programs: INT and FP
SPEC2000 emphasizes memory system performance

Other SPEC Benchmarks

SPECviewperf and SPECapc: 3D graphics performance
SPEC JVM98: performance of client-side Java virtual machines
SPEC JBB2000: server-client Java application
SPEC WEB99: evaluating WWW servers
SPEC HPC96: parallel and distributed computing

Server Benchmarks

SPEC CPU2000, WEB99, SFS97
TPC: measuring the ability of a system to handle transactions
  • TPC-C: online transaction processing (OLTP) benchmark (for bank systems)
  • TPC-H: ad hoc decision support
  • TPC-R: decision support with standard queries
  • TPC-W: simulating a business-oriented transactional web server

Embedded Benchmark

EEMBC (Embedded Microprocessor Benchmark Consortium) benchmarks
  • Based on kernel performance
  • Five classes: automotive/industrial, consumer, networking, office automation, and telecommunications
Embedded benchmarks are not mature

Summarizing Performance

Given the performance of a set of programs, how do we evaluate the performance of machines?

                  A      B      C
P1 (secs)         1     10     20
P2 (secs)      1000    100     20
Total (secs)   1001    110     40

Which computer is the best one?

Metric 1: Arithmetic Mean

Total execution time / (number of programs):

\[ \frac{1}{n} \sum_{i=1}^{n} \text{Time}_i \]

  • Simple and intuitive
  • Representative if the user runs the programs an equal number of times
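As a minimal sketch, the arithmetic mean can be applied to the A/B/C table from the Summarizing Performance slide. The dictionary layout and the name `arithmetic_mean` are illustrative choices, not part of the lecture.

```python
# Execution times in seconds for programs P1 and P2 on machines A, B, C,
# copied from the table above.
times = {
    "A": [1, 1000],
    "B": [10, 100],
    "C": [20, 20],
}


def arithmetic_mean(values):
    """Total execution time divided by the number of programs."""
    return sum(values) / len(values)


means = {machine: arithmetic_mean(t) for machine, t in times.items()}
for machine, mean in means.items():
    print(f"{machine}: mean execution time = {mean} s")

# Lower mean execution time is better under this metric.
best = min(means, key=means.get)
print(f"Best machine by arithmetic mean: {best}")
```

Under this metric the long-running program P2 dominates the result, which is why the choice of summary metric matters.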
Metric 2: Weighted Arithmetic Mean

Give (different) weights to different programs:

\[ \sum_{i=1}^{n} \text{Weight}_i \times \text{Time}_i, \qquad \sum_{i=1}^{n} \text{Weight}_i = 1 \]

  • Takes into account the frequencies of programs in the workload

Metric 3: Geometric Means

Based on performance relative to a reference machine:

\[ \sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i} \]

Relative performance is consistent across different reference machines:

\[ \frac{\text{Geometric mean}(X_i)}{\text{Geometric mean}(Y_i)} = \text{Geometric mean}\!\left(\frac{X_i}{Y_i}\right) \]

  • If C is 2x faster than B (using B as the reference) and B is 2x faster than A (A as the reference), then C is 4x faster than A (A as the reference)

Example

Recall the previous example

                  A      B      C
P1 (secs)         1     10     20
P2 (secs)      1000    100     20
Total (secs)   1001    110     40

Arithmetic mean: B is 9.1x faster than A, C is 25x faster than A
Geometric mean: A and B are equally fast, and C is only 60% faster than A

Harmonic Mean

Given speedups s_1, s_2, …, s_n, the average speedup by harmonic mean is

\[ \frac{n}{\frac{1}{s_1} + \frac{1}{s_2} + \dots + \frac{1}{s_n}} \]

Why not arithmetic mean?

Amdahl’s Law

We know about performance: defining, measuring, and summarizing
How do we maximize performance gains from the beginning of our design?
Principle: Make the Common Case Fast!

Amdahl’s Law

Predict the overall speedup from the “local speedup” of an enhancement, provided the fraction of time the enhancement can be used is known.
  • “Local speedup” is related to design and optimization objectives, such as doubling the CPU frequency or halving the cache latency
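Returning to the A/B/C example, the sketch below recomputes the arithmetic-mean and geometric-mean comparisons and checks the reference-machine consistency property of the geometric mean. The helper names are illustrative, and `math.prod` assumes Python 3.8 or newer.

```python
import math

# Execution times in seconds for programs P1 and P2 (from the example table).
times = {"A": [1, 1000], "B": [10, 100], "C": [20, 20]}


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    # n-th root of the product of the n values
    return math.prod(values) ** (1.0 / len(values))


am = {m: arithmetic_mean(t) for m, t in times.items()}
gm = {m: geometric_mean(t) for m, t in times.items()}

# "X is n times faster than A" means time_A / time_X = n.
print(f"Arithmetic mean: B is {am['A'] / am['B']:.1f}x, "
      f"C is {am['A'] / am['C']:.1f}x faster than A")
print(f"Geometric mean:  B is {gm['A'] / gm['B']:.2f}x, "
      f"C is {gm['A'] / gm['C']:.2f}x faster than A")

# Consistency: the geometric mean of per-program ratios equals the ratio
# of geometric means, so the choice of reference machine does not matter.
ratios_c_over_a = [c / a for c, a in zip(times["C"], times["A"])]
gm_of_ratios = geometric_mean(ratios_c_over_a)
ratio_of_gms = gm["C"] / gm["A"]
print(f"GM of ratios = {gm_of_ratios:.4f}, ratio of GMs = {ratio_of_gms:.4f}")
```

The two summaries rank the machines very differently on the same data, which is the point of the Example slide.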
Amdahl’s Law

\[ \text{Execution time}_{new} = \text{Execution time}_{old} \times \left( (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right) \]

\[ \text{Speedup}_{overall} = \frac{\text{Execution time}_{old}}{\text{Execution time}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}} \]

Amdahl’s Law Application

Objective: improve the performance of a graphics engine
Choice one: speed up FP square root by 10x
Choice two: speed up all FP instructions by 1.6x
Assume FP square root accounts for 20% and all FP instructions for 50%
Ask: Which choice is better?
The answer is: choice two, by a small margin. Choice one gives 1/(0.8 + 0.2/10) ≈ 1.22; choice two gives 1/(0.5 + 0.5/1.6) ≈ 1.23.
Implication: optimize for the common case first

CPI and IPC

CPI: average number of cycles spent on each instruction

\[ \text{CPI} = \frac{\text{CPU clock cycles for a program}}{\text{Instruction count}} \]

IPC: average number of instructions that can be finished in one cycle

\[ \text{IPC} = \frac{\text{Instruction count}}{\text{CPU clock cycles for a program}} \]

CPU Time Equation

\[ \text{CPU time} = \text{CPU clock cycles} \times \text{cycle time} \]

\[ \text{CPU clock cycles} = \text{Instruction count} \times \text{CPI} \]

\[ \Rightarrow \text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{cycle time} \]

CPU Time Equation Based on Instruction Types

\[ \text{CPU time} = \text{CPU clock cycles} \times \text{Clock cycle time} \]

\[ \text{CPU clock cycles} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i \]

\[ \Rightarrow \text{CPU time} = \sum_{i=1}^{n} \text{IC}_i \times \text{CPI}_i \times \text{Clock cycle time} \]

\[ \text{CPI} = \sum_{i=1}^{n} \text{Instruction frequency}_i \times \text{CPI}_i \]

Make Design Choice Using CPU Time Equation

              FP     FPSQR   Other
Frequency     25%    2%      75%
CPI           4.0    20      1.33

Alternative 1: CPI of FPSQR 20 → 2
Alternative 2: CPI of FP 4 → 2.5
Which one is better? Calculate speedups.
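The two design-choice exercises above can be worked through with a short sketch. The function names are illustrative. For the CPI table, the sketch assumes that the 2% FPSQR frequency is included in the 25% FP frequency (so FP and Other sum to 100%) and that instruction count and clock cycle time are unchanged by either alternative; neither assumption is stated on the slide.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


# Graphics engine: FP square root is 20% of the work, all FP is 50%.
choice_one = amdahl_speedup(fraction_enhanced=0.20, speedup_enhanced=10.0)
choice_two = amdahl_speedup(fraction_enhanced=0.50, speedup_enhanced=1.6)
print(f"Choice one (FPSQR 10x):  {choice_one:.3f}")
print(f"Choice two (all FP 1.6x): {choice_two:.3f}")


def average_cpi(mix):
    """CPI = sum of (instruction frequency_i * CPI_i) over instruction types."""
    return sum(freq * cpi for freq, cpi in mix)


# Assumption: FPSQR (2%) is counted inside FP (25%); Other is 75%.
cpi_original = average_cpi([(0.25, 4.0), (0.75, 1.33)])

# Alternative 1: FPSQR CPI drops from 20 to 2, saving 0.02 * (20 - 2) cycles
# per average instruction.
cpi_alt1 = cpi_original - 0.02 * (20 - 2)

# Alternative 2: average FP CPI drops from 4.0 to 2.5.
cpi_alt2 = average_cpi([(0.25, 2.5), (0.75, 1.33)])

# With instruction count and cycle time fixed, speedup = CPI_old / CPI_new.
speedup_alt1 = cpi_original / cpi_alt1
speedup_alt2 = cpi_original / cpi_alt2
print(f"Original CPI:  {cpi_original:.4f}")
print(f"Alternative 1: CPI {cpi_alt1:.4f}, speedup {speedup_alt1:.3f}")
print(f"Alternative 2: CPI {cpi_alt2:.4f}, speedup {speedup_alt2:.3f}")
```

Under these assumptions, the option that touches the larger fraction of the work comes out slightly ahead in both exercises, even though its local speedup is smaller.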