
Performance, Power, Die Yield (CS301, Prof. Szajda)



  1. Performance, Power, Die Yield CS301 Prof Szajda

  2. Administrative
     • HW #1 assigned
       - Due Wednesday, 9/3 at 5:00 pm

  3. Performance Metrics (How do we compare two machines?)

  4. What to Measure? Which airplane has the best performance?

  5. Performance
     • One size does not fit all
     • Depends on application domain
       - Scientific computing
       - Graphics
       - Databases
       - General-purpose desktop
       - Beware of designing to benchmark!
     • Depends on technology characteristics
       - DRAM speed and capacity, chip size, etc.

  6. Which Metric Do We Use?
     • Response or execution time
       - Difference between start and end time
       - Individual user cares most about this
     • Throughput
       - Total amount of work done in a given time
       - Frequently used for servers and clusters
     • How are these affected by
       - Replacing the processor with a faster version?
       - Adding more processors?

  7. Execution Time
     • Shorter execution time is better
     • Allows comparison between 2 machines

  8. Relative Performance
     • “X is n times faster than Y”
     • Example:
       - Machine A takes 10 s to run a program
       - Machine B takes 15 s to run the same program
       - What is the performance ratio?
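The ratio asked for above can be sketched in a few lines of Python. This assumes the usual convention that performance is the reciprocal of execution time; the function name `relative_performance` is made up for this illustration.

```python
def relative_performance(time_x: float, time_y: float) -> float:
    """Return n such that "X is n times faster than Y".

    Assumes performance = 1 / execution time, so
    n = performance_x / performance_y = time_y / time_x.
    """
    if time_x <= 0 or time_y <= 0:
        raise ValueError("execution times must be positive")
    return time_y / time_x


# Slide example: Machine A takes 10 s, Machine B takes 15 s.
ratio = relative_performance(10.0, 15.0)
print(f"A is {ratio:.1f} times faster than B")  # A is 1.5 times faster than B
```

Note that the slower machine's time goes in the numerator, which is easy to get backwards.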

  9. Different Time Values
     • Execution time
       - Wall-clock, response, or elapsed time
         - Includes everything (processing, I/O, OS overhead, etc.)!
       - Determines system performance
     • CPU time
       - Time spent executing code for this task only
         - Does not include I/O or time-sharing
       - Comprises user CPU time and system CPU time
         - Different programs are affected differently by CPU and system performance
       - man time
         - Sample output: 90.7u 12.9s 2:39 65%
         - User: 90.7 sec
         - System: 12.9 sec
         - Elapsed time: 2 min 39 sec
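A short Python sketch of how the fields in the sample `time` output relate. It assumes the trailing 65% is CPU time (user plus system) as a share of elapsed time, which matches the numbers on the slide but is not stated there; `cpu_share` is a hypothetical helper name.

```python
def cpu_share(user_s: float, system_s: float, elapsed_s: float) -> float:
    """Fraction of elapsed (wall-clock) time spent on the CPU for this task."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return (user_s + system_s) / elapsed_s


# Values from the slide: 90.7u 12.9s 2:39 65%
user_s = 90.7
system_s = 12.9
elapsed_s = 2 * 60 + 39  # 2 min 39 s = 159 s

cpu_time_s = user_s + system_s
share = cpu_share(user_s, system_s, elapsed_s)
print(f"CPU time: {cpu_time_s:.1f} s")  # CPU time: 103.6 s
print(f"CPU share: {share:.0%}")        # CPU share: 65%
```

The remaining 35% of the elapsed time is I/O, waiting, and time given to other tasks, which is why elapsed time and CPU time can rank two machines differently.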

  10. Clock Cycles
      • Instead of expressing time in seconds, use clock cycles
      • Clock
        - Determines when events take place
        - Runs at a constant rate (e.g., 1 GHz)
        - Easy to convert between clock rate and seconds
          - Clock rate = 1 / clock cycle time
          - 500 MHz = 1 / (2 ns)
          - 1 ns = $10^{-9}$ s
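The conversion between clock rate and cycle time can be sketched as below. The helper names `period_from_rate` and `rate_from_period` are invented for this example; the 500 MHz and 2 ns figures are the ones from the slide.

```python
def period_from_rate(rate_hz: float) -> float:
    """Clock cycle time in seconds for a clock rate in hertz."""
    if rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return 1.0 / rate_hz


def rate_from_period(period_s: float) -> float:
    """Clock rate in hertz for a clock cycle time in seconds."""
    if period_s <= 0:
        raise ValueError("clock period must be positive")
    return 1.0 / period_s


# Slide example: 500 MHz corresponds to a 2 ns cycle.
period_s = period_from_rate(500e6)
print(f"{period_s * 1e9:.1f} ns")                 # 2.0 ns
print(f"{rate_from_period(2e-9) / 1e6:.0f} MHz")  # 500 MHz
```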

  11. CPU Clocking n Operation of digital hardware governed by a constant-rate clock Clock period Clock (cycles) Data transfer 
 and computation Update state n Clock period: duration of a clock cycle n e.g., 250ps = 0.25ns = 250 × 10 –12 s n Clock frequency (rate): cycles per second n e.g., 4.0GHz = 4000MHz = 4.0 × 10 9 Hz Chapter 1 — Computer Abstractions and Technology —

  12. CPU Time n Performance improved by n Reducing number of clock cycles n Increasing clock rate n Hardware designer must often trade off clock rate against cycle count Chapter 1 — Computer Abstractions and Technology —

  13. CPU Time Example
      • Computer A: 2 GHz clock, 10 s CPU time
      • Designing Computer B
        - Aim for 6 s CPU time
        - Can use a faster clock, but this requires 1.2 × as many clock cycles
      • How fast must Computer B's clock be?
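One way to work this example is sketched below in Python, using the relation CPU time = clock cycles / clock rate. The function names are hypothetical; the inputs are the figures given on the slide.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles consumed: CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_cpu_time_s: float) -> float:
    """Clock rate needed to finish `cycles` within the target CPU time."""
    if target_cpu_time_s <= 0:
        raise ValueError("target CPU time must be positive")
    return cycles / target_cpu_time_s


# Computer A: 2 GHz clock, 10 s CPU time.
cycles_a = clock_cycles(10.0, 2e9)            # 20 x 10^9 cycles
# Computer B needs 1.2 x as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a                     # 24 x 10^9 cycles
rate_b = required_clock_rate(cycles_b, 6.0)
print(f"Computer B clock: {rate_b / 1e9:.1f} GHz")  # Computer B clock: 4.0 GHz
```

Computer B therefore needs twice A's clock rate to gain a 10/6 speedup, because part of the faster clock is spent on the extra cycles.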

  14. Instruction Count and CPI
      • Instruction count (IC) for a program
        - Determined by program, ISA, and compiler
      • Average cycles per instruction (CPI)
        - Determined by CPU hardware
        - If different instructions have different CPI, the average CPI is affected by the instruction mix

  15. CPI Example n Computer A: Cycle Time = 250ps, CPI = 2.0 n Computer B: Cycle Time = 500ps, CPI = 1.2 n Same ISA n Which is faster, and by how much? A is faster … … by this much Chapter 1 — Computer Abstractions and Technology —

  16. Application Characteristics
      • Determine the mix of different instruction types
        - Integer arithmetic
        - Logical operations
        - Floating point arithmetic
        - Loads and stores
      • Different applications have different CPI because of different instruction mixes

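  17. CPI in More Detail
      • If different instruction classes take different numbers of cycles
        - $\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{IC}_i)$
      • Weighted average CPI
        - $\text{CPI} = \frac{\text{Clock Cycles}}{\text{IC}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{IC}_i}{\text{IC}} \right)$
        - $\text{IC}_i / \text{IC}$ is the relative frequency of instruction class $i$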

  18. CPI Example n Alternative compiled code sequences using instructions in classes A, B, C Class A B C CPI for class 1 2 3 IC in sequence 1 2 1 2 IC in sequence 2 4 1 1 n Sequence 1: IC = 5 n Sequence 2: IC = 6 n Clock Cycles 
 n Clock Cycles 
 = 2 × 1 + 1 × 2 + 2 × 3 
 = 4 × 1 + 1 × 2 + 1 × 3 
 = 10 = 9 n Avg. CPI = 10/5 = 2.0 n Avg. CPI = 9/6 = 1.5 Chapter 1 — Computer Abstractions and Technology —

  19. Performance Summary: The BIG Picture
      • Performance depends on
        - Algorithm: affects IC, possibly CPI
        - Programming language: affects IC, CPI
        - Compiler: affects IC, CPI
        - Instruction set architecture: affects IC, CPI, T_c (clock cycle time)

  20. Amdahl’s Law
      • How much speedup do you get from an enhancement?
        - $\text{Speedup} = \frac{\text{Execution time w/o enhancement}}{\text{Execution time w/ enhancement}}$
      • Based on
        - Fraction of time enhancement used
        - Improvement in enhanced mode
      • $\text{Exec}_{new} = \text{Exec}_{old} \times \left( (1 - \text{fraction}_{enh}) + \frac{\text{fraction}_{enh}}{\text{Speedup}_{enh}} \right)$

  21. Pitfall: Amdahl’s Law (§1.10 Fallacies and Pitfalls)
      • Improving an aspect of a computer and expecting a proportional improvement in overall performance
      • Example: multiply accounts for 80 s of a 100 s run
        - How much improvement in multiply performance to get 5× overall?
        - Can’t be done!
      • Corollary: make the common case fast
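Why the 5× target cannot be reached can be sketched as follows. The sketch assumes the improved time is the affected time divided by the improvement factor plus the unaffected time; `required_factor` is a hypothetical helper, and the 4× case is an extra illustration that is not on the slide.

```python
def improved_time(affected_s: float, unaffected_s: float, factor: float) -> float:
    """Execution time after speeding up only the affected part by `factor`."""
    return affected_s / factor + unaffected_s


def required_factor(affected_s: float, unaffected_s: float, target_s: float):
    """Improvement factor needed to hit target_s, or None if unreachable."""
    remaining_s = target_s - unaffected_s  # time budget left for the affected part
    if remaining_s <= 0:
        return None  # even an infinitely fast affected part would be too slow
    return affected_s / remaining_s


# Slide example: multiply takes 80 s of 100 s, so 20 s is unaffected.
total_s, multiply_s = 100.0, 80.0
other_s = total_s - multiply_s

print(required_factor(multiply_s, other_s, total_s / 5))  # None: 5x overall needs 20 s
print(required_factor(multiply_s, other_s, total_s / 4))  # 16.0: 4x overall needs 25 s
```

The 20 s that multiply does not touch already uses up the entire 20 s budget, so no finite speedup of multiply can deliver 5× overall.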

  22. Review Question
      • Your machine has a clock rate of 2.4 GHz. How long is the clock cycle?

  23. Review Questions
      • Suppose you are given the following:
        - Machine A
          - 1 GHz
          - Average CPI = 1.6
          - Instructions = 1.7 billion
        - Machine B
          - 3.3 GHz
          - Average CPI = 6.1
          - Instructions = 2 billion
      • Which machine is faster? By how much?
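One way to approach this question is sketched below, assuming CPU time = instruction count × CPI / clock rate and that both instruction counts refer to the same task. The function name `cpu_time` is made up for this example.

```python
def cpu_time(instructions: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time in seconds: instruction count x CPI / clock rate."""
    return instructions * cpi / clock_rate_hz


time_a = cpu_time(1.7e9, 1.6, 1.0e9)
time_b = cpu_time(2.0e9, 6.1, 3.3e9)

print(f"A: {time_a:.2f} s")  # A: 2.72 s
print(f"B: {time_b:.2f} s")  # B: 3.70 s

faster, ratio = ("A", time_b / time_a) if time_a < time_b else ("B", time_a / time_b)
print(f"{faster} is {ratio:.2f} times faster")  # A is 1.36 times faster
```

Machine B has the higher clock rate but loses because of its much higher CPI, which is the point of comparing on execution time instead of clock rate.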

  24. Review Questions
      • What is the average CPI for a machine with the following CPIs on an application with the following instruction frequencies?

        Type         Frequency   CPI
        Arithmetic   0.45        1
        Memory       0.3         8
        Control      0.2         3
        Mult/Div     0.05        5
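A worked version of this calculation, assuming the average CPI is the frequency-weighted sum of the per-type CPIs (the symbols $f_i$ and $\text{CPI}_i$ are notation chosen here, not taken from the slide):

```latex
\[
\text{CPI}_{avg} = \sum_i f_i \times \text{CPI}_i
  = 0.45 \times 1 + 0.3 \times 8 + 0.2 \times 3 + 0.05 \times 5
  = 0.45 + 2.4 + 0.6 + 0.25
  = 3.7
\]
```

Memory instructions are only 30% of the mix but contribute 2.4 of the 3.7 cycles.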

  25. Review Questions • What factors must be included when comparing the relative performance of two machines?

  26. Amdahl’s Law
      • $\text{Exec}_{new} = \text{Exec}_{old} \times \left( (1 - \text{fraction}_{enh}) + \frac{\text{fraction}_{enh}}{\text{Speedup}_{enh}} \right)$
      • Suppose you have an enhancement that makes a function 10x faster.
        - Speedup if used 5% of the time?
        - Speedup if used 40% of the time?
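The two cases can be evaluated with a minimal Python sketch of the formula above, where overall speedup is the old execution time divided by the new one. The function name `amdahl_speedup` is an assumption of this example.

```python
def amdahl_speedup(fraction_enh: float, speedup_enh: float) -> float:
    """Overall speedup = Exec_old / Exec_new under Amdahl's Law."""
    if not 0.0 <= fraction_enh <= 1.0:
        raise ValueError("fraction_enh must be between 0 and 1")
    if speedup_enh <= 0:
        raise ValueError("speedup_enh must be positive")
    return 1.0 / ((1.0 - fraction_enh) + fraction_enh / speedup_enh)


# Enhancement makes the function 10x faster.
print(f"{amdahl_speedup(0.05, 10):.2f}")  # 1.05
print(f"{amdahl_speedup(0.40, 10):.2f}")  # 1.56
```

Even at 40% usage, a 10x enhancement yields well under 2x overall, since the untouched 60% dominates the new execution time.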

  27. Review Questions
      • What is the equation for execution time?
      • What does Amdahl’s Law say?

  28. Benchmarks
      • Programs specifically used to measure performance
      • The hope is that they are representative of how the computer will be used
      • Examples
        - SPEC Integer and Floating Point
        - MediaBench
        - MineBench
        - TPC

  29. SPEC CPU Benchmark
      • Programs used to measure performance
        - Supposedly typical of actual workload
      • Standard Performance Evaluation Corp (SPEC)
        - Develops benchmarks for CPU, I/O, Web, …
      • SPEC CPU2006
        - Elapsed time to execute a selection of programs
          - Negligible I/O, so focuses on CPU performance
        - Normalize relative to reference machine
        - Summarize as geometric mean of performance ratios
          - CINT2006 (integer) and CFP2006 (floating-point)
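The normalize-then-summarize step can be sketched in Python as below. The sketch assumes each performance ratio is the reference machine's time divided by the measured time; the timings are invented for illustration and are not SPEC results, and the helper names are hypothetical.

```python
import math


def performance_ratio(reference_time_s: float, measured_time_s: float) -> float:
    """Ratio relative to the reference machine (higher is faster)."""
    return reference_time_s / measured_time_s


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n ratios, computed via logarithms."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical (reference time, measured time) pairs in seconds; not SPEC data.
timings = [(1000.0, 500.0), (800.0, 100.0)]
ratios = [performance_ratio(ref, measured) for ref, measured in timings]
summary = geometric_mean(ratios)

print(ratios)             # [2.0, 8.0]
print(f"{summary:.2f}")   # 4.00
```

The geometric mean of 2 and 8 is 4, not the arithmetic mean of 5, and it gives the same relative ranking of machines whichever machine is used as the reference.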

  30. CINT2006 for Intel Core i7 920

  31. Recent Concern: Power Trends (§1.7 The Power Wall)
      • In CMOS IC technology
        - Power = Capacitive load × Voltage² × Frequency
        - Slide annotations: power × 30, supply voltage 5 V → 1 V, frequency × 1000

  32. Tricks to Increase Power
      • Attach large cooling devices
      • Turn off parts of chips not used in a given clock cycle
        - Can increase power to 300 watts...
        - ...but these and other approaches are all prohibitively expensive for desktop computers. So...

  33. More Recent Approaches: Chip Multiprocessors
      • Reasons for change
        - Limited opportunities to improve single-thread performance
        - Power
        - On-chip communication latencies

  34. Tapering Processor Performance

  35. Uniprocessor Performance (§1.8 The Sea Change: The Switch to Multiprocessors)
      • Constrained by power, instruction-level parallelism, memory latency

  36. Multiprocessors
      • Multicore microprocessors
        - More than one processor per chip
      • Requires explicitly parallel programming
        - Compare with instruction-level parallelism
          - Hardware executes multiple instructions at once
          - Hidden from the programmer
        - Hard to do
          - Programming for performance
          - Load balancing
          - Optimizing communication and synchronization

  37. Concluding Remarks (§1.9)
      • Cost/performance is improving
        - Due to underlying technology development
      • Hierarchical layers of abstraction
        - In both hardware and software
      • Instruction set architecture
        - The hardware/software interface
      • Execution time: the best performance measure
      • Power is a limiting factor
        - Use parallelism to improve performance
