NUMA-Friendly Stack (using Delegation and Elimination)
Irina Calciu Justin Gottschlich Maurice Herlihy
HotPar ‘13
Trends for Future Architectures
Uniform Memory Access (UMA)
Non-Uniform Memory Access (NUMA)
[Diagram: several NUMA nodes, each with multiple cores and a shared Last Level Cache, connected by an interconnect.]
Cache coherency is maintained between caches on different NUMA nodes.
[Diagram: NUMA node 0 and NUMA node 1, each with clients; a single server operates on the sequential stack (SEQ STACK).]
[Diagram: NUMA node 0 and NUMA node 1; Clients 1-4 and Clients 5-8, each group with its own slots. The server loops through all slots and operates on the sequential stack (SEQ STACK).]
[Diagram: NUMA node 0, NUMA node 1, and a STACK.]
[Figures: throughput (higher is better) for 50% push / 50% pop and 90% push / 10% pop workloads.]
[Figure: throughput (higher is better) for 50% push / 50% pop and 70% push / 30% pop workloads.]
http://www.inf.ufsc.br/~dovicchi/pos-ed/pos/artigos/p206-hendler.pdf
http://www.cs.bgu.ac.il/~hendlerd/papers/flat-combining.pdf
http://www.cs.tau.ac.il/~afek/rendezvous.pdf
CLIENT
  Find corresponding slot (by NUMA node and cpuid)
  Post message
  Wait for response
  Get response

SERVER
  Loop through all slots:
    If slot has message:
      Take message
      Process message
      Send response
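The client and server steps above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Slot` class, the `(node, cpu)` dictionary key, the `EMPTY` sentinel, and the `run_demo` helper are all hypothetical names, and Python threads stand in for cores pinned to NUMA nodes.

```python
import threading
import time

EMPTY = object()  # sentinel: no message posted / no response yet


class Slot:
    """Private mailbox between one client and the server (hypothetical layout)."""
    def __init__(self):
        self.request = EMPTY
        self.response = EMPTY


def server_loop(slots, stack, stop):
    while not stop.is_set():
        for slot in slots.values():        # Loop through all slots
            msg = slot.request
            if msg is EMPTY:               # no message in this slot: skip it
                continue
            slot.request = EMPTY           # Take message
            op, value = msg
            if op == "push":               # Process message
                stack.append(value)
                result = None
            else:
                result = stack.pop() if stack else None
            slot.response = result         # Send response
        time.sleep(0)                      # yield so client threads can run


def delegate(slots, node, cpu, op, value=None):
    slot = slots[(node, cpu)]              # Find corresponding slot
    slot.response = EMPTY
    slot.request = (op, value)             # Post message
    while slot.response is EMPTY:          # Wait for response
        time.sleep(0)
    return slot.response                   # Get response


def run_demo(nodes=2, cpus_per_node=4, pushes=20):
    slots = {(n, c): Slot() for n in range(nodes) for c in range(cpus_per_node)}
    stack, stop = [], threading.Event()
    server = threading.Thread(target=server_loop, args=(slots, stack, stop))
    server.start()

    def client(node, cpu):
        for i in range(pushes):
            delegate(slots, node, cpu, "push", (node, cpu, i))

    clients = [threading.Thread(target=client, args=key) for key in slots]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    top = delegate(slots, 0, 0, "pop")     # main thread reuses an idle slot
    stop.set()
    server.join()
    return stack, top
```

Only the server thread ever touches `stack`, which is why a plain sequential list suffices in this sketch; the clients communicate exclusively through their own slots.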
CLIENT
  Find corresponding slot (by NUMA node and cpuid)
  try_elimination:
    if (eliminate) return
    Post message
    Wait for response
    Get response
    else try_elimination

SERVER
  Loop through all slots:
    If slot has message:
      Take message
      Process message
      Send response
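The `try_elimination` step is not spelled out on the slide. One way to picture it, as a minimal sketch and not the authors' design, is a single rendezvous cell where a waiting push hands its value directly to a pop, so neither operation reaches the server. The `EliminationCell` class, its lock-based implementation, the `timeout` parameter, and `demo_pair` are assumptions made for illustration; published elimination schemes typically use compare-and-swap on an array of such cells.

```python
import threading
import time


class EliminationCell:
    """Hypothetical one-cell elimination area: a push offer meets a pop."""
    def __init__(self):
        self._guard = threading.Lock()
        self._offer = None                 # pending push offer, if any

    def try_push(self, value, timeout=0.001):
        """Offer value to a pop; return True if a pop took it in time."""
        offer = {"value": value, "taken": False}
        with self._guard:
            if self._offer is not None:    # another push is already waiting
                return False
            self._offer = offer
        deadline = time.monotonic() + timeout
        while not offer["taken"] and time.monotonic() < deadline:
            time.sleep(0)                  # yield while waiting for a partner
        with self._guard:
            if not offer["taken"]:
                self._offer = None         # withdraw: no pop arrived
            return offer["taken"]

    def try_pop(self):
        """Return (True, value) if a push offer was waiting, else (False, None)."""
        with self._guard:
            offer, self._offer = self._offer, None
            if offer is None:
                return False, None
            offer["taken"] = True
            return True, offer["value"]


def demo_pair():
    """One push and one pop eliminate each other without any stack."""
    cell, outcome = EliminationCell(), {}
    pusher = threading.Thread(
        target=lambda: outcome.update(pushed=cell.try_push(42, timeout=5.0)))
    pusher.start()
    done, value = False, None
    while not done:                        # retry until the offer is visible
        done, value = cell.try_pop()
        time.sleep(0)
    pusher.join()
    return outcome["pushed"], value
```

A client would call `try_push` or `try_pop` first and fall back to posting a message in its slot only when elimination fails. A push immediately followed by a matching pop leaves the stack unchanged, which is why the pair can cancel out locally.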
CLIENT
  Find corresponding slot (by NUMA node and cpuid)
  try_elimination:
    if (eliminate) return
    if (Acquire slot lock)
      Post message
      Wait for response
      Get response
      Release slot lock
    else try_elimination

SERVER
  Loop through all slots:
    If slot has message:
      Take message
      Process message
      Send response
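The slot-lock variant above can be sketched as follows, assuming several clients share one slot. Elimination is passed in as a callable so the sketch stays short; the default `never_eliminate` stub always fails, which reduces the client to a try-lock loop. All names here (`Slot`, `client_op`, `never_eliminate`, `run_demo`) are hypothetical, and the code is an illustration, not the authors' implementation.

```python
import threading
import time

EMPTY = object()  # sentinel: nothing posted / no response yet


class Slot:
    """Mailbox shared by several clients; the lock admits one at a time."""
    def __init__(self):
        self.lock = threading.Lock()
        self.request = EMPTY
        self.response = EMPTY


def never_eliminate(op, value):
    """Stand-in for try_elimination that always reports failure."""
    return False, None


def client_op(slot, op, value=None, try_elimination=never_eliminate):
    while True:
        done, result = try_elimination(op, value)    # try_elimination
        if done:                                     # if (eliminate) return
            return result
        if slot.lock.acquire(blocking=False):        # if (Acquire slot lock)
            try:
                slot.response = EMPTY
                slot.request = (op, value)           # Post message
                while slot.response is EMPTY:        # Wait for response
                    time.sleep(0)
                return slot.response                 # Get response
            finally:
                slot.lock.release()                  # Release slot lock
        time.sleep(0)                # else: yield, then try_elimination again


def server_loop(slots, stack, stop):
    while not stop.is_set():
        for slot in slots:                           # Loop through all slots
            msg = slot.request
            if msg is not EMPTY:                     # If slot has message
                slot.request = EMPTY                 # Take message
                op, value = msg
                if op == "push":                     # Process message
                    stack.append(value)
                    slot.response = None             # Send response
                else:
                    slot.response = stack.pop() if stack else None
        time.sleep(0)


def run_demo(clients_per_slot=4, ops=25):
    slots = [Slot(), Slot()]         # assumption: one shared slot per NUMA node
    stack, popped, stop = [], [], threading.Event()
    server = threading.Thread(target=server_loop, args=(slots, stack, stop))
    server.start()

    def client(node, cid):
        for i in range(ops):
            if cid % 2 == 0:         # even clients push, odd clients pop
                client_op(slots[node], "push", (node, cid, i))
            else:
                value = client_op(slots[node], "pop")
                if value is not None:
                    popped.append(value)

    threads = [threading.Thread(target=client, args=(n, c))
               for n in range(2) for c in range(clients_per_slot)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    stop.set()
    server.join()
    return stack, popped
```

The non-blocking `acquire` matters here: a client that loses the race for the slot does not sleep on the lock but goes back to elimination, where it may pair up with another waiting client.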