CS533: Types of Workloads

CS533
Modeling and Performance Evaluation of Network and Computer Systems
Types of Workloads (Chapter 4)

Types of Workloads
"benchmark v. trans. To subject (a system) to a series of tests in order to obtain prearranged results not available on competitive systems."
– S. Kelly-Bootle, The Devil's DP Dictionary
• Test workload – denotes any workload used in a performance study
• Real workload – one observed on a system while being used
  – cannot be repeated (easily)
  – may not even exist (proposed system)
• Synthetic workload – similar characteristics to real workload
  – can be applied in a repeated manner
  – relatively easy to port
• Benchmark == Workload
  – Benchmarking is the process of comparing 2+ systems with workloads

Outline
• Introduction
• Addition instructions
• Instruction mixes
• Kernels
• Synthetic programs
• Application benchmarks

Addition Instructions
• Early computers had CPU as most expensive component
• Most frequent operation was addition
• Computer with faster addition instruction performed better
• So, run many addition operations as test workload
• Problem
  – More instructions used
  – Some more complicated than others

Instruction Mixes
• Number and complexity of instructions increased
• Could measure instructions individually, but they are used in different amounts
  – Measure relative frequencies of various instructions on real systems
  – Use as weighting factors to get avg instruction time
  → Instruction mixes
• Units are
  – Millions of Instructions Per Second (MIPS)
  – Millions of Floating-Point Ops per Sec (MFLOPS)

Example: Gibson Instruction Mix
(1959, IBM 650 and IBM 704; values are percentages)
 1. Load and Store                  31.2
 2. Fixed-Point Add/Sub              6.1
 3. Compares                         3.8
 4. Branches                        16.6
 5. Float Add/Sub                    6.9
 6. Float Multiply                   3.8
 7. Float Divide                     1.5
 8. Fixed-Point Multiply             0.6
 9. Fixed-Point Divide               0.2
10. Shifting                         4.4
11. Logical And/Or                   1.6
12. Instructions not using regs      5.3
13. Indexing                        18.0
    Total                          100
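As a rough sketch of how an instruction mix is used, the snippet below weights a per-class execution time by the Gibson frequencies to get an average instruction time, then inverts it to get a MIPS figure. The weights come from the Gibson table; the per-class times in `ASSUMED_TIMES_US` are hypothetical values chosen only for illustration, and the function names are likewise made up for this example.

```python
# Gibson instruction mix weights in percent.
# Load and Store is 31.2 here; with that value the 13 weights sum to the
# stated total of 100.
GIBSON_MIX = {
    "load_store": 31.2,
    "fixed_add_sub": 6.1,
    "compare": 3.8,
    "branch": 16.6,
    "float_add_sub": 6.9,
    "float_multiply": 3.8,
    "float_divide": 1.5,
    "fixed_multiply": 0.6,
    "fixed_divide": 0.2,
    "shift": 4.4,
    "logical": 1.6,
    "no_register": 5.3,
    "indexing": 18.0,
}

# HYPOTHETICAL execution times in microseconds for an imaginary machine.
# These numbers are assumptions, not measurements from the source.
ASSUMED_TIMES_US = {
    "load_store": 2.0,
    "fixed_add_sub": 1.0,
    "compare": 1.0,
    "branch": 1.5,
    "float_add_sub": 4.0,
    "float_multiply": 8.0,
    "float_divide": 16.0,
    "fixed_multiply": 6.0,
    "fixed_divide": 12.0,
    "shift": 1.0,
    "logical": 1.0,
    "no_register": 1.0,
    "indexing": 1.5,
}


def average_instruction_time(mix, times):
    """Weighted mean of per-class times, using the mix as weights."""
    total_weight = sum(mix.values())
    return sum(weight * times[name] for name, weight in mix.items()) / total_weight


def mips(avg_time_us):
    """One instruction per microsecond equals one million per second."""
    return 1.0 / avg_time_us


if __name__ == "__main__":
    avg = average_instruction_time(GIBSON_MIX, ASSUMED_TIMES_US)
    print(f"average instruction time: {avg:.3f} us")
    print(f"rate: {mips(avg):.3f} MIPS")
```

Two machines would be compared by plugging in each machine's measured per-class times and keeping the mix fixed.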

Problems with Instruction Mixes
• In modern systems, instruction time is variable depending upon
  – Addressing modes, cache hit rates, pipelining
  – Interference with other devices during processor-memory access
  – Distribution of zeros in multiplier
  – Times a conditional branch is taken
• Mixes do not reflect special hardware such as page table lookups
• Only represents speed of processor
  – Bottleneck may be in other parts of system

Kernels
• Use a set of instructions that makes up a service provided by the processor: a kernel
  – Early on, did not consider I/O, so also called a processing kernel
• Set of operations for a problem
  – Ex: Sieve, Tree Searching, Matrix Inversion
• Some problems, such as zeros and branches, don't apply
• Problem
  – I/O still not considered

Synthetic Programs
• Add I/O request to test load
• Add control loop so can make request as frequently as needed
• Easy to port, distribute
• Can have measurement data built in
• Still, does not necessarily make representative memory or disk accesses
• Often small, so do not exercise virtual memory

Example of Synthetic Workload Generation Program
(Buchholz, 1969)

Application Workloads
• For special-purpose system, may be able to run representative applications as measure of performance
  – Ex: airline reservation
  – Ex: banking
• Make use of entire system (I/O, etc.)
• Issues may be
  – input parameters
  – multiuser
• Only applicable when specific applications are targeted

Popular Benchmarks: Sieve (1 of 2)
• Sieve of Eratosthenes (finds primes)
• Write down all numbers 1 to n
• Strike out multiples of k for k = 2, 3, 5 … sqrt(n)
  – In steps of remaining numbers
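The sieve steps above can be sketched in a few lines. This is one minimal way to write the kernel, not the listing from the slides; the function name `sieve` is an assumption, and starting the strike-out at k*k is a common shortcut (smaller multiples of k were already struck by smaller primes) rather than something the slide specifies.

```python
import math


def sieve(n):
    """Return all primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    # is_candidate[i] stays True while i has not been struck out.
    is_candidate = [True] * (n + 1)
    is_candidate[0] = is_candidate[1] = False
    # Only k up to sqrt(n) needs to be considered.
    for k in range(2, math.isqrt(n) + 1):
        if is_candidate[k]:  # k is one of the "remaining numbers"
            for multiple in range(k * k, n + 1, k):
                is_candidate[multiple] = False
    return [i for i, keep in enumerate(is_candidate) if keep]


if __name__ == "__main__":
    print(sieve(30))
```

As a benchmark kernel, the call would typically be repeated many times with a fixed n and the elapsed time recorded.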

Popular Benchmarks: Sieve (2 of 2)

Popular Benchmarks: Ackermann's Function (1 of 2)
• Assess efficiency of procedure calling mechanisms
• Ackermann's Function has two parameters, is recursive
  – Benchmark is to call Ackermann(3, n) for values of n = 1 to 6
• Return value is 2^(n+3) - 3, can be used to verify implementation
• Number of calls: (512 x 4^(n-1) - 15 x 2^(n+3) + 9n + 37)/3
  – Can be used to compute time per call
• Depth is 2^(n+3) - 4, stack space doubles each time n increases by 1

Popular Benchmarks: Ackermann's Function (2 of 2)
(Simula)

Popular Benchmarks: Whetstone
• Set of 11 modules designed to match observed frequencies in ALGOL programs
  – Array addressing, arithmetic, subroutine calls, parameter passing
  – Ported to Fortran, most popular in C, …
• Many variations of Whetstone, so take care when comparing results
• Problems
  – specific kernel
  – only valid for small, scientific (floating) apps that fit in cache
  – Does not exercise I/O

Popular Benchmarks: LINPACK
• Programs that solve dense systems of linear equations
  – Many float adds and multiplies
  – Core is Basic Linear Algebra Subprograms (BLAS), called repeatedly
• Usually, solve 100x100 system of equations
• Represents mechanical engineering applications on workstations
  – Drafting to finite element analysis
  – High computation speed and good graphics processing

Popular Benchmarks: Dhrystone
• Pun on Whetstone
• Intent to represent systems programming environments
• Most common was in C, but many versions
• Low nesting depth and instructions in each call
• Large amount of time copying strings
• Mostly integer performance with no float operations
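The Ackermann's Function benchmark above can be checked with a short sketch. It assumes the usual two-parameter recursive definition, counts every call, and compares the result and call count with the closed forms given on the slide. The helper names are hypothetical, and n is kept to 4 in the example so the recursion stays well inside Python's default recursion limit.

```python
import time


def ackermann_with_count(m, n):
    """Return (value, number_of_calls) for Ackermann(m, n)."""
    calls = 0

    def ack(m, n):
        nonlocal calls
        calls += 1
        if m == 0:
            return n + 1
        if n == 0:
            return ack(m - 1, 1)
        return ack(m - 1, ack(m, n - 1))

    return ack(m, n), calls


def expected_value(n):
    """Closed form for Ackermann(3, n)."""
    return 2 ** (n + 3) - 3


def expected_calls(n):
    """Closed form for the number of calls made by Ackermann(3, n)."""
    return (512 * 4 ** (n - 1) - 15 * 2 ** (n + 3) + 9 * n + 37) // 3


if __name__ == "__main__":
    for n in range(1, 5):
        start = time.perf_counter()
        value, calls = ackermann_with_count(3, n)
        elapsed = time.perf_counter() - start
        assert value == expected_value(n)
        assert calls == expected_calls(n)
        print(f"n={n} value={value} calls={calls} "
              f"time/call={elapsed / calls * 1e9:.0f} ns")
```

Dividing the elapsed time by the known call count gives the time per call, which is the figure of merit for the procedure-calling mechanism.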

Popular Benchmarks: Lawrence Livermore Loops
• 24 vectorizable, scientific tests
• Floating point operations
  – Physics and chemistry apps have been found to have 40-60% floating point operations
• Relevant for: fluid dynamics, airplane design, weather modeling

Popular Benchmarks: Debit-Credit
• Was de facto standard for Transaction Processing Systems
• Retail bank wanted 1000 branches, 10k tellers, 10000k accounts online with peak load of 100 TPS
• Performance in TPS where 95% of all transactions have 1 second or less of response time (arrival of last bit, sending of first bit)
• Now, Transaction Processing Performance Council (TPC) has made more precise benchmarks
  – TPC-A, TPC-B, TPC-C

Popular Benchmarks: SPEC
• Systems Performance Evaluation Cooperative (SPEC) (http://www.spec.org)
  – Non-profit, leading computer vendors
  – Suite of benchmarks
• CPU2000: CPUINT and CPUFP
  – Making CPU2004
• Graphics
• Systems and Applications:
  – Web, Java Client-Server, Network File System, Mail
• Results database
• Performance compared to baseline machine
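To illustrate what "performance compared to baseline machine" can look like in practice, the sketch below divides a baseline machine's run time by the measured run time for each benchmark and summarizes the ratios with a geometric mean, which is how SPEC's CPU composite scores are formed. The benchmark names and all times are hypothetical, and the function names are assumptions for this example only.

```python
import math


def ratios(reference_times, measured_times):
    """Per-benchmark speed ratio: baseline time / measured time."""
    return {
        name: reference_times[name] / measured_times[name]
        for name in reference_times
    }


def geometric_mean(values):
    """Geometric mean, computed via logs to avoid overflow on long lists."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # HYPOTHETICAL run times in seconds; not taken from any published result.
    baseline = {"bench_a": 1400.0, "bench_b": 1800.0, "bench_c": 1100.0}
    measured = {"bench_a": 350.0, "bench_b": 600.0, "bench_c": 275.0}

    per_benchmark = ratios(baseline, measured)
    for name, ratio in per_benchmark.items():
        print(f"{name}: {ratio:.2f}x baseline")
    print(f"composite: {geometric_mean(per_benchmark.values()):.2f}")
```

A ratio above 1 means the measured system is faster than the baseline on that benchmark. The geometric mean keeps the comparison between two systems the same whichever machine is used as the baseline.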
