CS305 Computer Architecture, Fall 2009, Lecture 04
Bhaskaran Raman
Department of CSE, IIT Bombay
http://www.cse.iitb.ac.in/~br/
http://www.cse.iitb.ac.in/synerg/doku.php?id=public:courses:cs305-fall09:start

Today's Topics
● Performance metrics, CPI
● Performance comparison
● Benchmarks

Performance Comparison
● What performance metric to use?
  ● User cares about response time
  ● Performance is inversely proportional to execution time
● What is execution time?
  ● Response time
  ● CPU time: User time + System time
● System performance vs. CPU performance
● Throughput vs. response time
● We will focus on CPU performance

Which Program's Execution Time?
● Real “workload” is ideal
● Practical options:
  ● Real programs: compilers, office-suite, scientific...
  ● Kernels: key pieces of programs
    – Example: Livermore loops
  ● Toy benchmarks: small programs
    – Examples: Quick-sort, tower of Hanoi...
  ● Synthetic benchmarks: try to capture “average” frequency of instructions in real programs
    – Example: Whetstone, Dhrystone

More on Performance Comparisons...
● Caveat of benchmarks
  ● They are needed
  ● But manufacturers tend to optimize for benchmarks
  ● Need to be updated periodically
● Benchmark suite: collection of programs
  ● E.g. SPEC2000
● Reporting performance
  ● Reproducibility: program version, compiler, flags
  ● SPEC specifies compiler flags for baseline comparison

Some Numerics...

                     Computer A   Computer B   Computer C
Program P1 (secs)             1           10           20
Program P2 (secs)          1000          100           20
Total (secs)               1001          110           40

● Total (or average) execution time is a possible metric
● Weighted execution time is better: $\sum_i W_i \times T_i$
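The weighted sum above can be sketched in a few lines of Python. The execution times are the ones from the table; the skewed weights (P1 accounting for 99% of the runs) are a hypothetical workload chosen only for illustration, and the function name `weighted_time` is likewise an assumption.

```python
# Execution times in seconds, copied from the table above.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def weighted_time(machine_times, weights):
    """Return sum_i W_i * T_i for one machine."""
    return sum(weights[prog] * t for prog, t in machine_times.items())


# Equal weights of 1 reproduce the "Total (secs)" row.
equal = {"P1": 1, "P2": 1}
totals = {m: weighted_time(t, equal) for m, t in times.items()}

# Hypothetical workload (an assumption): P1 is 99% of the runs, P2 is 1%.
skewed = {"P1": 0.99, "P2": 0.01}
weighted = {m: weighted_time(t, skewed) for m, t in times.items()}

best_by_total = min(totals, key=totals.get)
best_by_weighted = min(weighted, key=weighted.get)
```

With equal weights C has the smallest total, while under the assumed 99/1 mix B comes out ahead, which is why the weights should reflect the real workload.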

Normalizing the Performance

      Norm(A)                Norm(B)                Norm(C)
      A      B      C        A      B      C        A      B      C
P1    1      10     20       0.1    1      2        0.05   0.5    1
P2    1      0.1    0.02     10     1      0.2      50     5      1
AM    1      5.05   10.01    5.05   1      1.1      25.03  2.75   1
GM    1      1      0.63     1      1      0.63     1.58   1.58   1

● Normalize such that all programs take the same time, on some machine
● Arithmetic mean predicts performance
● Geometric mean?
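The AM and GM rows of the normalized table can be recomputed with a short sketch like the one below. The raw times come from the earlier table (P1: 1, 10, 20 s; P2: 1000, 100, 20 s); the helper name `normalized_means` is an assumption made for this example.

```python
from math import prod

# Raw execution times in seconds for machines A, B, C.
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}


def normalized_means(times, ref):
    """Normalize every machine's times to machine `ref`, then return
    {machine: (arithmetic_mean, geometric_mean)} of the ratios."""
    result = {}
    for machine, t in times.items():
        ratios = [t[prog] / times[ref][prog] for prog in t]
        am = sum(ratios) / len(ratios)
        gm = prod(ratios) ** (1 / len(ratios))
        result[machine] = (am, gm)
    return result


norm_a = normalized_means(times, "A")
norm_b = normalized_means(times, "B")
norm_c = normalized_means(times, "C")
```

In this data the machine with the lowest arithmetic mean changes with the reference machine (A under Norm(A), B under Norm(B), C under Norm(C)), whereas the geometric-mean ordering stays the same under all three normalizations.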

Summary
● Performance is inversely proportional to execution time
● We are concerned with CPU time of an unloaded machine
● Weighted execution time with weights from the real workload is ideal
● Else, normalize w.r.t. one machine

Amdahl's Law
● Amdahl's law, where $F$ is the fraction of execution time that is enhanced and Speedup is the speedup of that fraction:

$$\text{Overall speedup} = \frac{1}{(1 - F) + \dfrac{F}{\text{Speedup}}}$$

● Diminishing returns
● Limit on overall speedup: $\dfrac{1}{1 - F}$
● Corollary: make the common case fast
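A minimal sketch of Amdahl's law as a function, to make the diminishing returns visible. The value F = 0.5 and the list of speedups are hypothetical inputs chosen for illustration; the function names are assumptions.

```python
def overall_speedup(f, s):
    """Amdahl's law: f is the enhanced fraction of execution time,
    s is the speedup applied to that fraction."""
    return 1.0 / ((1.0 - f) + f / s)


def speedup_limit(f):
    """Upper bound on overall speedup as s grows without bound."""
    return 1.0 / (1.0 - f)


# Hypothetical case: half of the execution time is enhanced.
f = 0.5
curve = [(s, overall_speedup(f, s)) for s in (2, 10, 100, 1000)]
limit = speedup_limit(f)  # the curve approaches this value but never reaches it
```

Each tenfold increase in `s` buys less overall speedup than the previous one, and no value of `s` pushes the result past `speedup_limit(f)`.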

Illustrating Amdahl's Law
● Example: implement faster memory, or faster ALU?
  ● Proposed memory speedup: 10x
  ● Proposed ALU speedup: 3x
● Depends on fraction of instructions
  – Suppose $F_{mem} = 0.2$, $F_{alu} = 0.5$, $F_{other} = 0.3$

$$\text{Speedup with faster memory} = \frac{1}{0.8 + 0.2/10} = 1.22$$

$$\text{Speedup with faster ALU} = \frac{1}{0.5 + 0.5/3} = 1.5$$

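Example continued...
● Fixing $F_{alu} = 0.5$, for what value of $F_{mem}$ is going for a faster memory better?

$$\frac{1}{(1 - F_{mem}) + F_{mem}/10} > 1.5 \;\Rightarrow\; 1 - 0.9\,F_{mem} < \frac{2}{3} \;\Rightarrow\; F_{mem} > \frac{10}{27} \approx 0.37$$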
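The memory-versus-ALU comparison and the break-even fraction can be checked numerically. This is a sketch using the numbers from the example (memory speedup 10x, ALU speedup 3x, fractions 0.2 and 0.5); the variable and function names are assumptions.

```python
def overall_speedup(f, s):
    """Amdahl's law for enhanced fraction f and local speedup s."""
    return 1.0 / ((1.0 - f) + f / s)


MEM_SPEEDUP = 10.0
ALU_SPEEDUP = 3.0

with_fast_mem = overall_speedup(0.2, MEM_SPEEDUP)  # about 1.22
with_fast_alu = overall_speedup(0.5, ALU_SPEEDUP)  # about 1.5

# Break-even F_mem: solve 1 / ((1 - F) + F / 10) = with_fast_alu for F.
# Rearranged: F * (1 - 1/10) = 1 - 1 / with_fast_alu.
break_even_f_mem = (1.0 - 1.0 / with_fast_alu) / (1.0 - 1.0 / MEM_SPEEDUP)
```

Here `break_even_f_mem` evaluates to 10/27, so the faster memory only wins when more than roughly 37% of the work is memory-bound, with the ALU fraction held at 0.5.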

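The CPU Performance Equation

$$\text{CPU time} = \text{Num. of clock cycles} \times \text{Clock cycle time}$$

OR

$$\text{CPU time} = \frac{\text{Num. of clock cycles}}{\text{Clock rate}}$$

For a program,

$$\text{Num. of clock cycles} = \text{Instruction Count} \times \text{Cycles Per Instruction} = IC \times CPI$$

Putting these together:

$$\text{CPU time} = IC \times CPI \times \text{Clock cycle time}$$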

More on the Equation
● This form is convenient
  ● Involves many relevant parameters
  ● Remembering is easy

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Seconds}}{\text{Clock cycle}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Instructions}}{\text{Program}}$$

● Solving for CPI:

$$CPI = \frac{\text{CPU time}}{\text{Clock cycle time} \times IC}$$
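As a quick sanity check of the equation, here is a sketch that computes CPU time from IC, CPI and clock rate, and then recovers CPI from a measured CPU time. The program size (10^9 instructions), CPI of 2 and 1 GHz clock are hypothetical values, and the function names are assumptions.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds: IC x CPI x cycle time, with cycle time = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz


def cpi_from_time(cpu_time_s, instruction_count, clock_rate_hz):
    """Invert the equation: CPI = CPU time / (cycle time x IC)."""
    return cpu_time_s * clock_rate_hz / instruction_count


# Hypothetical program: 1e9 instructions, CPI of 2, 1 GHz clock.
t = cpu_time(1e9, 2.0, 1e9)          # seconds
recovered = cpi_from_time(t, 1e9, 1e9)
```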

Other Convenient Forms of the Equation
● Number of clock cycles can be counted as:

$$\text{CPU clock cycles} = \sum_{i=1}^{n} CPI_i \times IC_i$$

Hence,

$$\text{CPU time} = \sum_{i=1}^{n} CPI_i \times IC_i \times \text{Clock cycle time}$$

● Calculating in terms of CPI:

$$CPI = \frac{\text{CPU time}}{\text{Clock cycle time} \times IC} = \frac{\sum_{i=1}^{n} CPI_i \times IC_i}{IC} = \sum_{i=1}^{n} CPI_i \times \frac{IC_i}{IC}$$
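The two ways of computing the average CPI (from per-class counts, or from per-class fractions) can be compared in a small sketch. The instruction classes, counts and per-class CPIs below are hypothetical numbers invented for illustration.

```python
# Hypothetical per-class instruction counts IC_i and cycles CPI_i.
counts = {"alu": 500, "mem": 300, "branch": 200}
cpi_per_class = {"alu": 1, "mem": 2, "branch": 3}

ic = sum(counts.values())  # total instruction count IC

# CPU clock cycles = sum_i CPI_i * IC_i
total_cycles = sum(cpi_per_class[k] * counts[k] for k in counts)

# CPI computed from total cycles.
cpi_from_counts = total_cycles / ic

# CPI computed from instruction-mix fractions IC_i / IC.
fractions = {k: counts[k] / ic for k in counts}
cpi_from_fractions = sum(cpi_per_class[k] * fractions[k] for k in counts)
```

Both routes give the same average CPI; the fraction-based form only needs the instruction mix, not the absolute counts.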

Usefulness of the Equation
● $F_i$ (the fraction of instructions of type $i$, i.e. $IC_i / IC$) is easier to measure than $IC_i$
● Equivalently, $IC_i$ is measured through $F_i$
● Equation includes relevant parameters such as the cycle time

Measuring the Parameters for the Equation
● Clock cycle time:
  ● Easy for existing architectures
  ● Needs to be estimated in the design process
● Instruction Count:
  ● Requires a compiler
  ● And, simulator/interpreter, or instrumentation code
● CPI for each instruction type:
  ● Easy for simple architectures
  ● Pipelines, caches introduce complications
  ● Need to simulate and measure average CPI

A Design Example
● A design choice for conditional branch instructions:
  ● Choice 1: condition code is set by a compare instruction, checked by the next (branch) instruction
    – 20% of instructions are branches, and another 20% are compares
    – 2 cycles per branch, 1 cycle for all others
    – Clock rate is 25% faster than in Choice 2
  ● Choice 2: single instruction for compare and branch
● Which choice is better?

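Solution for Design Example
Let $IC_1$ be the instruction count under Choice 1 and $C$ the clock cycle time under Choice 2, so that Choice 1 has cycle time $C / 1.25$.

$$\text{CPU time}_1 = IC_1 \times [0.8 \times 1 + 0.2 \times 2] \times \frac{C}{1.25} = \frac{1.2}{1.25} \times IC_1 \times C = 0.96 \times IC_1 \times C$$

$$\text{CPU time}_2 = IC_1 \times [0.6 \times 1 + 0.2 \times 2] \times C = IC_1 \times C$$

● Choice 1 is about 4% faster, even though it executes more instructions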
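The design example can be reworked with the CPU performance equation in a short sketch. Both IC for Choice 1 and the Choice 2 cycle time are normalized to 1.0 (an assumption made for convenience, since only the ratio matters); the variable names are illustrative.

```python
# Normalized units (assumption): IC_1 = 1.0, Choice 2 cycle time C = 1.0.
ic1 = 1.0
c = 1.0
CLOCK_ADVANTAGE = 1.25  # Choice 1 clock rate is 25% faster

# Choice 1: 20% branches at 2 cycles, the other 80% at 1 cycle.
cpi1 = 0.8 * 1 + 0.2 * 2
time1 = ic1 * cpi1 * (c / CLOCK_ADVANTAGE)

# Choice 2: the compares (20% of IC_1) disappear, the branches remain.
ic2 = 0.8 * ic1
branch_fraction2 = 0.2 / 0.8            # branches are now 25% of instructions
cpi2 = (1 - branch_fraction2) * 1 + branch_fraction2 * 2
time2 = ic2 * cpi2 * c

ratio = time1 / time2  # below 1 means Choice 1 is faster
```

Under these numbers `ratio` is 0.96. Choice 1 stays ahead only while its clock advantage exceeds 1.2, since its cycle count is 1.2 times that of Choice 2.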
