NUMA-Friendly Stack (using Delegation and Elimination)
Irina Calciu, Justin Gottschlich, Maurice Herlihy
HotPar '13
Trends for Future Architectures

Uniform Memory Access (UMA)
Non-Uniform Memory Access (NUMA)
[Diagram: multiple NUMA nodes, each containing multiple cores that share a Last Level Cache, connected by an interconnect. Cache coherency is maintained between caches on different NUMA nodes.]
Overview
• Motivation
• Algorithms
• Results
• Conclusions
Delegation
[Diagram: clients on NUMA node 0 and NUMA node 1 send their operations to a single server thread, which applies them to a sequential stack (SEQ STACK).]
Delegation
[Diagram: each client (clients 1–4 on NUMA node 0, clients 5–8 on NUMA node 1) owns a slot; the server loops through all slots, applying posted operations to the SEQ STACK.]
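The delegation scheme on this slide can be written out as a minimal sketch in Python. The names (`Slot`, `DelegationStack`, `delegate`) are illustrative, not from the paper, and threading stands in for real NUMA-aware placement of the server and slots:

```python
import threading

class Slot:
    """One mailbox per client; the server polls these."""
    def __init__(self):
        self.request = None            # ("push", v) or ("pop", None)
        self.response = None
        self.ready = threading.Event()

class DelegationStack:
    def __init__(self, nclients):
        self.stack = []                # SEQ STACK: only the server touches it
        self.slots = [Slot() for _ in range(nclients)]
        self.running = True

    def server_loop(self):
        # Server loops through all slots, applying any pending request.
        while self.running:
            for slot in self.slots:
                req = slot.request
                if req is not None:
                    op, val = req
                    if op == "push":
                        self.stack.append(val)
                        slot.response = ("ok", None)
                    else:  # pop; returns None on an empty stack
                        top = self.stack.pop() if self.stack else None
                        slot.response = ("ok", top)
                    slot.request = None
                    slot.ready.set()

    def delegate(self, client_id, op, val=None):
        # Client posts a message in its own slot and waits for the response.
        slot = self.slots[client_id]
        slot.ready.clear()
        slot.request = (op, val)
        slot.ready.wait()
        return slot.response[1]
```

One server thread runs `server_loop` while client threads call `delegate`; because only the server touches `stack`, the stack itself needs no synchronization.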
Elimination, Rendezvous
Local Rendezvous
[Diagram: matching operations rendezvous locally within each NUMA node (node 0 and node 1), without touching the shared stack.]
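The rendezvous idea, where a push and a pop meet and cancel without touching the stack, can be illustrated with a single exchange slot. This is a minimal sketch with invented names (`EliminationSlot`, `try_push`, `try_pop`), not the paper's implementation:

```python
import threading

class EliminationSlot:
    """A single rendezvous slot where one push and one pop can meet."""
    def __init__(self):
        self.lock = threading.Lock()
        self.value = None              # value parked by a waiting push
        self.taken = threading.Event()

    def try_push(self, value, timeout=0.05):
        # Park the value; succeed if a pop takes it within the timeout.
        with self.lock:
            if self.value is not None:
                return False           # slot occupied by another push
            self.value = value
            self.taken.clear()
        if self.taken.wait(timeout):
            return True                # a pop eliminated us
        with self.lock:
            if self.value is None:     # a pop raced in at the last moment
                return True
            self.value = None          # give up; caller falls back to the stack
            return False

    def try_pop(self):
        # Take a parked value, if one is available.
        with self.lock:
            if self.value is None:
                return False, None
            value, self.value = self.value, None
            self.taken.set()
        return True, value
```

On failure either side falls back to the real stack, so elimination is purely an optimization; restricting which threads share a slot is what makes it "local" to a NUMA node.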
Delegation + Elimination
[Diagram: clients on NUMA node 0 and NUMA node 1, a server thread, and the SEQ STACK; elimination is combined with delegation.]
Delegation + LOCAL Elimination
[Diagram: clients on NUMA node 0 and NUMA node 1, a server thread, and the SEQ STACK; elimination is restricted to clients on the same NUMA node.]
Effect of Elimination
[Plot: throughput (higher is better) for 90% push / 10% pop and for 50% push / 50% pop workloads.]
Effect of Delegation
[Plot: throughput (higher is better) for 90% push / 10% pop and for 50% push / 50% pop workloads.]
Number of Slots
[Plot: throughput (higher is better) versus number of slots, for 90% push / 10% pop and 50% push / 50% pop workloads.]
Workloads: Balanced vs. Unbalanced
[Plot: throughput (higher is better) for 70% push / 30% pop and for 50% push / 50% pop workloads.]
Advantages
• Memory and cache locality
• Reduced bus traffic
• Increased parallelism through elimination
Drawbacks
• Communication cost between clients and the server thread
  o Insignificant compared to the benefits
• Serializing otherwise parallel data structures
  o Parallelism is regained through elimination
• Elimination opportunities decrease as the workload becomes more unbalanced
Open Questions
• Are there other data structures where we can use delegation and elimination?
• Are there data structures where direct access is much better?
• What can we do for those data structures?
Thank you! Questions?
References
• A Scalable Lock-free Stack Algorithm: http://www.inf.ufsc.br/~dovicchi/pos-ed/pos/artigos/p206-hendler.pdf
• Flat Combining and the Synchronization-Parallelism Tradeoff: http://www.cs.bgu.ac.il/~hendlerd/papers/flat-combining.pdf
• Fast and Scalable Rendezvousing: http://www.cs.tau.ac.il/~afek/rendezvous.pdf
Cache to Cache Traffic
[Backup plot: cache-to-cache traffic; an arrow marks the direction of better performance.]

Coefficient of Variation
[Backup plot: coefficient of variation; an arrow marks the direction of better performance.]
Flat Combining
Delegation
SERVER:
  Loop through all slots:
    If slot has message:
      Take message
      Process message
      Send response
CLIENT:
  Find corresponding slot (by NUMA node and cpuid)
  Post message
  Wait for response
  Get response

Delegation
SERVER:
  Loop through all slots:
    If slot has message:
      Take message
      Process message
      Send response
CLIENT:
  Find corresponding slot (by NUMA node and cpuid)
  try_elimination:
    if (eliminate) return
  Post message
  Wait for response:
    Get response
    else try_elimination   (keep trying elimination while waiting)

Delegation
SERVER:
  Loop through all slots:
    If slot has message:
      Take message
      Process message
      Send response
CLIENT:
  Find corresponding slot (by NUMA node and cpuid)
  try_elimination:
    if (eliminate) return
  if (Acquire slot lock):
    Post message
    Wait for response
    Get response
    Release slot lock
  else try_elimination
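The client protocol above, try elimination first, fall back to delegation, can be sketched end to end in Python. All names here (`NumaStack`, `push`, `pop`) are invented for illustration, a `queue.Queue` stands in for the per-client slots, and the per-node elimination table is a simplification of the local rendezvous structure:

```python
import queue
import threading

class NumaStack:
    """Sketch of delegation with a local-elimination fast path."""

    def __init__(self):
        self.requests = queue.Queue()   # stands in for the per-client slots
        self.stack = []                 # SEQ STACK: server-only
        self.elim = {}                  # node -> [value, Event] parked by a push
        self.elim_lock = threading.Lock()
        threading.Thread(target=self._server, daemon=True).start()

    def _server(self):
        # Server loop: take one message, apply it to the sequential stack, reply.
        while True:
            op, val, reply = self.requests.get()
            if op == "push":
                self.stack.append(val)
                reply.put(None)
            else:
                reply.put(self.stack.pop() if self.stack else None)

    def _delegate(self, op, val):
        reply = queue.Queue()
        self.requests.put((op, val, reply))
        return reply.get()

    def push(self, node, value, elim_timeout=0.01):
        # try_elimination: park the value locally and wait for a matching pop.
        done = threading.Event()
        parked = [value, done]
        with self.elim_lock:
            free = node not in self.elim
            if free:
                self.elim[node] = parked
        if free:
            if done.wait(elim_timeout):
                return                  # eliminated by a local pop
            with self.elim_lock:
                if self.elim.get(node) is not parked:
                    return              # a pop raced in and took the value
                del self.elim[node]
        self._delegate("push", value)   # fall back to delegation

    def pop(self, node):
        # try_elimination first: take a locally parked push, if any.
        with self.elim_lock:
            parked = self.elim.pop(node, None)
            if parked is not None:
                parked[1].set()
        if parked is not None:
            return parked[0]
        return self._delegate("pop", None)
```

Passing the caller's NUMA node as an argument mimics restricting elimination to same-node clients; a real implementation would derive it from the cpuid, as on the slides.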
Open Questions
• Performance
• Scalability
• Power