Performance of computer systems
• Many different factors, among which:
– Technology
• Raw speed of the circuits (clock, switching time)
• Process technology (how many transistors on a chip)
– Organization
• What type of processor (e.g., RISC vs. CISC)
• What type of memory hierarchy
• What types of I/O devices
• How many processors in the system
– Software
• O.S., compilers, database drivers, etc.

Moore's Law
[Figure: transistor counts per chip growing exponentially over time. Courtesy Intel Corp.]

4/26/2004 CSE378 Performance. 1

Processor-Memory Performance Gap
[Figure: log-scale plot, 1989–2001, of x86 CPU speed (386, Pentium, Pentium Pro, Pentium III, Pentium IV) pulling away from memory speed — the "memory gap" / "memory wall"]
• Memory latency has decreased 10x over 8 years, but densities have increased 100x over the same period
• x86 CPU speed has increased 100x over 10 years

What are some possible metrics
• Raw speed (peak performance = clock rate)
• Execution time (or response time): time to execute one (suite of) program(s) from beginning to end
– Need benchmarks for integer-dominated programs, scientific code, graphical interfaces, multimedia tasks, desktop apps, utilities, etc.
• Throughput (total amount of work done in a given time)
– Measures utilization of resources (a good metric when there are many users: e.g., large database queries, Web servers)
• Quite often improving execution time will improve throughput, and vice versa
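The latency/throughput distinction above can be made concrete with a small sketch. All numbers here are hypothetical, chosen only to illustrate that adding parallel resources can raise throughput without improving response time:

```python
# Hypothetical server: each query takes 2 s of work, start to finish.
latency_s = 2.0

# With 1 core, throughput is 1/latency; with 4 cores handling
# independent queries in parallel, throughput scales but each
# individual query still takes 2 s (response time is unchanged).
cores = 4
throughput_qps = cores / latency_s   # queries completed per second

print(latency_s, throughput_qps)     # 2.0 s per query, 2.0 queries/s
```

This is why the slide treats execution time and throughput as separate metrics even though improving one often improves the other.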
Execution time metric
• Execution time: inverse of performance
– Performance_A = 1 / Execution_time_A
• Processor A is faster than processor B iff
– Execution_time_A < Execution_time_B, i.e., Performance_A > Performance_B
• Relative performance
– Performance_A / Performance_B = Execution_time_B / Execution_time_A

Measuring execution time
• Wall clock time, response time, elapsed time
• Some systems have a "time" function
– Unix: 13.7u 23.6s 18:37 3% 2069+1821k 13+24io 62pf+0w
• Difficult to make comparisons from one system to another because too many factors differ
• Remainder of this lecture: CPU execution time
– Of interest to microprocessor vendors and designers
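The relative-performance formula above is just a ratio of execution times; a minimal sketch (with made-up benchmark times) is:

```python
def relative_performance(exec_time_a, exec_time_b):
    """Performance_A / Performance_B = Execution_time_B / Execution_time_A."""
    return exec_time_b / exec_time_a

# Hypothetical: A runs a benchmark in 10 s, B runs it in 15 s.
# A is then 1.5x faster than B.
print(relative_performance(10.0, 15.0))  # 1.5
```

Note the inversion: the faster machine's time goes in the denominator, which is the most common slip when quoting "X times faster" claims.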
Definition of CPU execution time
• CPU execution_time = CPU clock_cycles * clock cycle_time
• CPU clock_cycles is program dependent, thus CPU execution_time is program dependent
• clock cycle_time (in nanoseconds, ns) depends on the particular processor
• clock cycle_time = 1 / clock cycle_rate (rate in MHz)
– clock cycle_time = 1 µs ⇔ clock cycle_rate = 1 MHz
– clock cycle_time = 1 ns ⇔ clock cycle_rate = 1 GHz
• Alternate definition
– CPU execution_time = CPU clock_cycles / clock cycle_rate

CPI -- Cycles per instruction
• Definition: CPI is the average number of clock cycles per instruction
– CPU clock_cycles = Number of instr. * CPI, hence
– CPU exec_time = Number of instr. * CPI * clock cycle_time
• Computer architects try to minimize CPI
– or, equivalently, maximize its inverse IPC: the number of instructions per cycle
• CPI in isolation is not a measure of performance
– program dependent, compiler dependent
– but good for assessing architectural enhancements (experiments with the same programs and compilers)
• In an ideal pipelined processor (to be seen soon) CPI = 1
– but pipelines are not ideal, so in practice CPI > 1
– could have CPI < 1 if several instructions execute in parallel (superscalar processors)

Classes of instructions
• For a given processor, some classes of instr. take longer to execute than others
– e.g., floating-point operations take longer than integer operations
• Assign a CPI per class of instr., say CPI_i
• CPU exec_time = Σ (CPI_i * C_i) * clock cycle_time, where C_i is the number of instr. of class i that have been executed
• Note that minimizing the number of instructions does not necessarily improve execution time
• Improving one part of the architecture can improve the CPI of one class of instructions
– One often talks about the contribution to the CPI of a class of instructions

How to measure the average CPI
• Count the instructions executed in each class
• Needs a simulator
– interprets every instruction and counts their number
• or a profiler
– discovers the most often used parts of the program and instruments only those
– or uses sampling
• Use of programmable hardware counters
– modern microprocessors have this feature, but it is limited

Other popular performance measures: MIPS
• MIPS (millions of instructions per second)
– MIPS = Instruction count / (Exec. time * 10^6)
– MIPS = (Instr. count * clock rate) / (Instr. count * CPI * 10^6)
– MIPS = clock rate / (CPI * 10^6)
• MIPS is a rate: the higher the better
• MIPS in isolation is no better than CPI in isolation
– program and/or compiler dependent
– does not take the instruction set into account
– can give "wrong" comparative results

Other metric: MFLOPS
• Similar to MIPS in spirit
• Used for scientific programs/machines
• MFLOPS: millions of floating-point ops per second
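The per-class formula CPU exec_time = Σ (CPI_i * C_i) * clock cycle_time, and the MIPS rating derived from it, can be evaluated directly. The instruction mix and CPI values below are hypothetical, invented for illustration only:

```python
CLOCK_RATE_HZ = 1e9            # hypothetical 1 GHz processor
cycle_time_s = 1 / CLOCK_RATE_HZ

# Hypothetical instruction mix: C_i (counts executed) and CPI_i per class
counts = {"alu": 40e6, "load_store": 30e6, "branch": 20e6, "fp": 10e6}
cpi    = {"alu": 1,    "load_store": 2,    "branch": 2,    "fp": 6}

total_instr  = sum(counts.values())                     # 100e6 instructions
total_cycles = sum(cpi[c] * counts[c] for c in counts)  # Σ CPI_i * C_i = 200e6
avg_cpi      = total_cycles / total_instr               # 2.0
exec_time_s  = total_cycles * cycle_time_s              # 0.2 s
mips         = CLOCK_RATE_HZ / (avg_cpi * 1e6)          # 500 MIPS

print(avg_cpi, exec_time_s, mips)
```

This also shows why improving one class (say, cutting the FP CPI) changes the average CPI without touching the instruction count, as the "Classes of instructions" slide notes.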
Benchmarks
• Benchmark: a workload representative of what a system will be used for
• Industry benchmarks
– SPECint and SPECfp: industry benchmarks updated every few years; currently SPEC CPU2000
– Linpack (Lapack), NASA kernel: scientific benchmarks
– TPC-A, TPC-B, TPC-C and TPC-D: used for databases and data mining
– Other specialized benchmarks (Olden for list processing, SPECweb, SPEC JVM98, etc.)
– Benchmarks for desktop and web applications are not as standard
– Beware! Compilers are super-optimized for the benchmarks

How to report (benchmark) performance
• If you measure execution times, use the arithmetic mean
– e.g., for n benchmarks: (Σ exec_time_i) / n
• If you measure rates, use the harmonic mean
– n / (Σ 1/rate_i), i.e., the inverse of the arithmetic mean of the 1/rate_i

Computer design: make the common case fast
• Amdahl's law (speedup)
• Speedup = (performance with enhancement) / (performance of base case)
– or, equivalently, Speedup = (exec. time of base case) / (exec. time with enhancement)
• For example, applied to parallel processing
– s = fraction of the program that is sequential
– Speedup S is at most 1/s
– That is, if 20% of your program is sequential, the maximum speedup with an infinite number of processors is 5
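The two reporting rules and Amdahl's law above can be sketched in a few lines. The functions and numbers are illustrative; the parallel-processing form of the speedup, 1 / (s + (1 - s)/n), follows from splitting execution time into a sequential part s and a parallel part (1 - s)/n:

```python
def harmonic_mean(rates):
    """Harmonic mean: n / (sum of 1/rate_i) -- use this for rates (e.g., MIPS)."""
    return len(rates) / sum(1 / r for r in rates)

def amdahl_speedup(seq_fraction, n_processors):
    """Speedup = 1 / (s + (1 - s)/n); tends to 1/s as n grows without bound."""
    return 1 / (seq_fraction + (1 - seq_fraction) / n_processors)

# Slide's example: 20% sequential -> speedup capped at 1/0.2 = 5
print(amdahl_speedup(0.2, 4))        # 2.5 with 4 processors
print(amdahl_speedup(0.2, 1e9))      # approaches 5, never exceeds it
```

The harmonic mean is appropriate for rates because it weights each benchmark by the time it takes, not by how fast it runs, so one very fast benchmark cannot dominate the summary.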