
Evaluating the Cost of Atomic Operations on Modern Architectures

spcl.inf.ethz.ch @spcl_eth Evaluating the Cost of Atomic Operations on Modern Architectures. Maciej Besta, Hermann Schweizer, Torsten Hoefler. Large-Scale Irregular Graph Processing.


  2. Atomics: Performance Dimensions. Cache coherence state? Modified: in one cache and dirty. Exclusive: in one cache and clean. Shared: in >1 cache and clean. Invalid: garbage data.

  3. Atomics: Performance Dimensions. Architecture.

  11. Research Questions. How do we model the performance of atomics? What is the performance difference between various atomics? What is the influence of various parameters and mechanisms?

  23. Latency Model. [Diagram: a core and its caches; atomics target a cache line that is in some cache coherence state.] Read for ownership of the cache line = max(read latency, invalidation latency). Execute = constant.

  24. Latency Model: Exclusive or Modified State. Read for ownership of the cache line = max(read latency, invalidation latency). Execute = constant.

  26. Latency Model: Exclusive or Modified State. Read for ownership of the cache line = read latency. Execute = constant. [Plot legend: observed data, mean of observed data, predictions.]
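
The latency model on the preceding slides can be written down as a small function. This is only a sketch: the `Params` struct and its field names are hypothetical, no measured values are implied, and adding the two components to obtain a total is an assumption that the slides do not state explicitly.

```cpp
#include <algorithm>

// Cache coherence states named on the slides.
enum class State { Modified, Exclusive, Shared, Invalid };

// Hypothetical per-machine parameters (any consistent time unit).
// These are placeholders for illustration, not measurements.
struct Params {
    double read_latency;
    double invalidation_latency;
    double execute;  // constant cost of executing the atomic itself
};

// Read for ownership: max(read latency, invalidation latency) in general.
// For a line in Exclusive or Modified state the slides reduce this to the
// read latency alone.
double read_for_ownership(State s, const Params& p) {
    if (s == State::Exclusive || s == State::Modified) {
        return p.read_latency;
    }
    return std::max(p.read_latency, p.invalidation_latency);
}

// Assumption: the total latency is the sum of the two components.
double atomic_latency(State s, const Params& p) {
    return read_for_ownership(s, p) + p.execute;
}
```

With placeholder values such as `Params{100.0, 150.0, 10.0}`, a Shared line is dominated by the invalidation latency, while an Exclusive line pays only the read latency plus the constant.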

  27. Latency: Haswell, Exclusive.

  28. Latency: Bulldozer, Exclusive. FAA, CAS.
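
For readers unfamiliar with the two atomics compared on this slide, here is a minimal sketch of fetch-and-add (FAA) and compare-and-swap (CAS) using the standard C++ `std::atomic` interface. The wrapper names `faa`, `cas`, and `faa_via_cas` are hypothetical, and this is not the benchmark code behind the plots.

```cpp
#include <atomic>
#include <cstdint>

// Fetch-and-add: atomically adds `delta` and returns the previous value.
std::uint64_t faa(std::atomic<std::uint64_t>& x, std::uint64_t delta) {
    return x.fetch_add(delta);
}

// Compare-and-swap: stores `desired` only if `x` still holds `expected`.
// Returns true when the swap happened.
bool cas(std::atomic<std::uint64_t>& x, std::uint64_t expected,
         std::uint64_t desired) {
    return x.compare_exchange_strong(expected, desired);
}

// The same increment expressed as a CAS retry loop. On failure,
// compare_exchange_weak reloads `old` with the current value.
std::uint64_t faa_via_cas(std::atomic<std::uint64_t>& x, std::uint64_t delta) {
    std::uint64_t old = x.load();
    while (!x.compare_exchange_weak(old, old + delta)) {
        // Retry with the refreshed `old`.
    }
    return old;
}
```

A single CAS can fail and leave the value unchanged, whereas FAA always succeeds, which is why lock-free code often wraps CAS in a retry loop as in `faa_via_cas`.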

  29. Latency: Haswell, Exclusive. Alignment?
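
One way to keep an atomic operand aligned is sketched below. It assumes 64-byte cache lines and that the alignment question on this slide concerns operands that straddle a cache line boundary; the names `kCacheLine`, `AlignedCounter`, and `crosses_cache_line` are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// alignas places each counter at the start of its own cache line.
struct alignas(kCacheLine) AlignedCounter {
    std::atomic<std::uint64_t> value{0};
};

// True if an operand of `size` bytes starting at address `addr`
// spans two cache lines.
bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
    return (addr / kCacheLine) != ((addr + size - 1) / kCacheLine);
}
```

For example, an 8-byte operand at offset 60 within a line crosses into the next line, while the same operand at offset 56 does not.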

  30. Latency: Bulldozer, Exclusive. Operand size? 64 bit, 128 bit.

  31. Bandwidth: Haswell, Atomics.
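
A rough way to probe atomic bandwidth is to issue many atomics on independent operands from one thread and time them. The sketch below is an illustration only, not the authors' benchmark; `Padded`, `faa_throughput`, and the 64-byte padding are assumptions.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Each counter sits on its own (assumed 64-byte) cache line, so the
// operations have no data dependencies on each other.
struct alignas(64) Padded {
    std::atomic<std::uint64_t> v{0};
};

// Issues `ops` fetch-and-adds round-robin over `n` independent counters
// and returns the observed rate in operations per second.
double faa_throughput(Padded* counters, std::size_t n, std::size_t ops) {
    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < ops; ++i) {
        counters[i % n].v.fetch_add(1);
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = stop - start;
    return static_cast<double>(ops) / elapsed.count();
}
```

Comparing the rate for one counter against the rate for several independent counters gives a first impression of whether the hardware overlaps independent atomics; the absolute numbers depend on the machine and compiler settings.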

  37. Conclusions: Performance Insights. Unaligned atomics should be avoided at all costs. Different atomics have the same latency in most scenarios. Atomics do not execute in parallel (low bandwidth), even when there are no data dependencies. CAS is the fastest in some cases. Small operand sizes give the best performance.
