Evaluating the Cost of Atomic Operations on Modern Architectures

spcl.inf.ethz.ch @spcl_eth
ATOMICS: PERFORMANCE DIMENSIONS
Cache coherence state?
  Modified: in one cache and dirty
  Exclusive: in one cache and clean
  Shared: in more than one cache and clean
  Invalid: the line holds no valid (garbage) data
Architecture?
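
The four states above are the classic MESI coherence states. A minimal Python sketch that encodes them with the two properties the slide uses (how many caches hold the line, and whether it is dirty); the class name MESI and its helper methods are illustrative, not taken from the talk:

```python
from enum import Enum


class MESI(Enum):
    """Cache coherence states, as defined on the slide (illustrative sketch)."""
    MODIFIED = "M"   # in one cache, dirty (differs from memory)
    EXCLUSIVE = "E"  # in one cache, clean
    SHARED = "S"     # in more than one cache, clean
    INVALID = "I"    # the line holds no valid data

    def single_owner(self) -> bool:
        """True if exactly one cache holds a valid copy of the line."""
        return self in (MESI.MODIFIED, MESI.EXCLUSIVE)

    def is_dirty(self) -> bool:
        """True if the cached copy differs from memory."""
        return self is MESI.MODIFIED

    def is_valid(self) -> bool:
        """True if the line holds usable data."""
        return self is not MESI.INVALID
```

For example, `MESI.SHARED.single_owner()` is `False`: a line in the Shared state has copies in other caches, which matters when a core wants to write it.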

RESEARCH QUESTIONS
How do we model the performance of atomics?
What is the performance difference between various atomics?
What is the influence of various parameters and mechanisms?

LATENCY MODEL
[Diagram: a core and several caches (Cache, Cache, Cache, …); the atomic targets one cache line, and the line's cache coherence state matters]
Read for ownership of the cache line = max(read latency, invalidation latency)
Execute = constant
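
The same model written as equations. This assumes the two steps run one after the other so that their latencies add; the symbols t_atomic, t_RFO, t_exec, t_read and t_inval are introduced here for illustration. The max suggests that reading the line and invalidating other copies overlap, so the slower of the two dominates:

```latex
t_{\mathrm{atomic}} = t_{\mathrm{RFO}} + t_{\mathrm{exec}},
\qquad
t_{\mathrm{RFO}} = \max\left(t_{\mathrm{read}},\; t_{\mathrm{inval}}\right),
\qquad
t_{\mathrm{exec}} = \text{const.}
```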

LATENCY MODEL: EXCLUSIVE OR MODIFIED STATE
Read for ownership of the cache line = read latency (only one cache holds the line, so there are no other copies to invalidate)
Execute = constant
[Plot: observed data, mean of observed data, and model predictions]
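
A minimal Python sketch of how the model specializes by coherence state. The function names and the numeric inputs are hypothetical; latencies are caller-supplied parameters, not measured values from the talk, and the Invalid state is deliberately left out because the slides do not cover it:

```python
def rfo_latency(state: str, read_latency: float, invalidation_latency: float) -> float:
    """Read-for-ownership latency for a line in MESI state 'M', 'E', or 'S'."""
    if state in ("M", "E"):
        # Exactly one cache holds the line: no other copies to invalidate.
        return read_latency
    if state == "S":
        # Reading the line and invalidating the other sharers overlap,
        # so the slower of the two determines the cost.
        return max(read_latency, invalidation_latency)
    raise ValueError(f"state {state!r} is not covered by this sketch")


def atomic_latency(state: str, read_latency: float,
                   invalidation_latency: float, execute_latency: float) -> float:
    """Read for ownership followed by a constant execute step."""
    return rfo_latency(state, read_latency, invalidation_latency) + execute_latency
```

With made-up inputs of 40 (read), 90 (invalidation) and 5 (execute), `atomic_latency("E", 40, 90, 5)` gives 45, while `atomic_latency("S", 40, 90, 5)` gives 95 because the invalidation now dominates.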

LATENCY: HASWELL, EXCLUSIVE

LATENCY: BULLDOZER, EXCLUSIVE
[Plot: FAA vs. CAS]
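
FAA and CAS differ in semantics: fetch-and-add always succeeds, while compare-and-swap only writes if the value still matches what the caller expected. A minimal Python sketch of both, plus the common pattern of building FAA from a CAS retry loop. Python exposes no hardware atomics, so a `threading.Lock` stands in for the indivisibility the hardware provides; the sketch models semantics only, not the latencies plotted on the slide. `AtomicCell` and `faa_via_cas` are hypothetical names:

```python
import threading


class AtomicCell:
    """Toy atomic integer; the lock emulates hardware indivisibility."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        with self._lock:
            return self._value

    def fetch_and_add(self, delta: int) -> int:
        """FAA: add delta unconditionally and return the previous value."""
        with self._lock:
            old = self._value
            self._value += delta
            return old

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """CAS: store new only if the current value equals expected."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def faa_via_cas(cell: AtomicCell, delta: int) -> int:
    """Fetch-and-add built from a CAS retry loop; retries if another thread wins."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + delta):
            return old
```

The design point: a single FAA or CAS is one atomic operation, but an FAA emulated with CAS may need several attempts when threads contend for the same line.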

LATENCY: HASWELL, EXCLUSIVE
Alignment?
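
One likely reason alignment matters so much: an unaligned operand can straddle two cache lines, so the atomic must gain ownership of both lines at once, and on x86 such "split lock" accesses are handled very slowly. A minimal Python sketch of the check, assuming 64-byte cache lines (the line size on both Haswell and Bulldozer); the function names are hypothetical:

```python
CACHE_LINE_BYTES = 64  # assumption: 64-byte cache lines


def crosses_cache_line(address: int, size: int, line: int = CACHE_LINE_BYTES) -> bool:
    """True if an operand of `size` bytes at `address` spans two cache lines."""
    first_line = address // line
    last_line = (address + size - 1) // line
    return first_line != last_line


def is_naturally_aligned(address: int, size: int) -> bool:
    """True if the address is a multiple of the operand size."""
    return address % size == 0
```

An 8-byte operand at address 60 occupies bytes 60 to 67 and crosses into the next line, whereas at address 56 it ends at byte 63 and stays within one line. Natural alignment avoids the problem for any operand size that divides the line size.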

LATENCY: BULLDOZER, EXCLUSIVE
Operand size? 64 bit vs. 128 bit

BANDWIDTH: HASWELL, ATOMICS

CONCLUSIONS: PERFORMANCE INSIGHTS
Different atomics have the same latency in most scenarios
CAS is the fastest in some cases
Unaligned atomics should be avoided at all costs
Atomics are not executed in parallel (low bandwidth), even when there are no data dependencies
Small operand sizes give the best performance
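
To see why the lack of parallel execution limits bandwidth, compare two idealized timing models for n independent atomics. This is a hypothetical back-of-the-envelope sketch: `issue_interval` (how often a new operation could start if operations overlapped) is an assumed parameter, and the numbers are illustrative, not measurements:

```python
def serialized_time(n_ops: int, latency: float) -> float:
    """Each atomic waits for the previous one to finish."""
    return n_ops * latency


def pipelined_time(n_ops: int, latency: float, issue_interval: float) -> float:
    """Independent operations overlap, starting one every issue_interval."""
    if n_ops == 0:
        return 0.0
    return latency + (n_ops - 1) * issue_interval


def throughput(n_ops: int, total_time: float) -> float:
    """Operations completed per unit of time."""
    return n_ops / total_time
```

With 100 operations, a latency of 20 and an issue interval of 1, the serialized model needs 2000 time units (0.05 ops per unit) while the pipelined one needs 119 (about 0.84 ops per unit). When atomics are serialized, throughput stays at one operation per latency no matter how independent the operations are.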


Evaluating the Cost of Atomic Operations on Modern Architectures
Maciej Besta, Hermann Schweizer, Torsten Hoefler

LARGE-SCALE IRREGULAR GRAPH PROCESSING
