

  1. NON-BLOCKING DATA STRUCTURES AND TRANSACTIONAL MEMORY Tim Harris, 14 November 2014

  2. Lecture 6
     - Introduction
     - Amdahl’s law
     - Basic spin-locks
     - Queue-based locks
     - Hierarchical locks
     - Reader-writer locks
     - Reading without locking
     - Flat combining

  3. Overview
     - Building shared memory data structures
       - Lists, queues, hashtables, …
     - Why?
       - Used directly by applications (e.g., in C/C++, Java, C#, …)
       - Used in the language runtime system (e.g., management of work, implementations of message passing, …)
       - Used in traditional operating systems (e.g., synchronization between top/bottom-half code)
     - Why not?
       - Don’t think of “threads + shared data structures” as a default/good/complete/desirable programming model
       - It’s better to have shared memory and not need it…

  4. What do we care about?
     - Ease to write: Does it matter? Who is the target audience? How much effort can they put into it? Is implementing a data structure an undergrad programming exercise? …or a research paper?
     - When can it be used? Between threads in the same process? Between processes sharing memory? Within an interrupt handler? With/without some kind of runtime system support?
     - Correctness: What does it mean to be correct? e.g., if multiple concurrent threads are using iterators on a shared data structure at the same time?
     - How fast is it? How well does it scale? Suppose I have a sequential implementation (no concurrency control at all): is the new implementation 5% slower? 5x slower? 100x slower? How does performance change as we increase the number of threads? When does the implementation add or avoid synchronization?

  5. What do we care about?
     - Ease to write
     - When can it be used?
     - Correctness
     - How fast is it? How well does it scale?

  6. What do we care about?
     1. Be explicit about goals and trade-offs
        - A benefit in one dimension often has costs in another
        - Does a perf increase prevent a data structure being used in some particular setting?
        - Does a technique to make something easier to write make the implementation slower?
        - Do we care? It depends on the setting
     2. Remember, parallel programming is rarely a recreational activity
        - The ultimate goal is to increase perf (time, or resources used)
        - Does an implementation scale well enough to out-perform a good sequential implementation?

  7. Suggested reading
     - “The art of multiprocessor programming”, Herlihy & Shavit – excellent coverage of shared memory data structures, from both practical and theoretical perspectives
     - “Transactional memory, 2nd edition”, Harris, Larus, Rajwar – recently revamped survey of TM work, with 350+ references
     - “NOrec: streamlining STM by abolishing ownership records”, Dalessandro, Spear, Scott, PPoPP 2010
     - “Simplifying concurrent algorithms by exploiting transactional memory”, Dice, Lev, Marathe, Moir, Nussbaum, Olszewski, SPAA 2010
     - Intel “Haswell” spec for SLE (speculative lock elision) and RTM (restricted transactional memory)

  8. Amdahl’s law

  9. Amdahl’s law
     “Sorting takes 70% of the execution time of a sequential program. You replace the sorting algorithm with one that scales perfectly on multi-core hardware. On a machine with n cores, how many cores do you need to use to get a 4x speed-up on the overall algorithm?”

  10. Amdahl’s law, f=70%
      [Chart: speedup vs. #cores (1–16): the speedup achieved (perfect scaling on 70%) flattens out well below the desired 4x speedup line]

  11. Amdahl’s law, f=70%

      speedup(f, c) = 1 / ((1 − f) + f/c)

      f = fraction of code the speedup applies to
      c = number of cores used
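      The formula is easy to check numerically. The helper below is not from the lecture; it is a minimal C sketch of the formula above, printing the achieved speedup for f = 70% and the 1/(1 − f) limit.

      #include <stdio.h>

      /* Amdahl's law: speedup(f, c) = 1 / ((1 - f) + f/c) */
      double speedup(double f, double c) {
          return 1.0 / ((1.0 - f) + f / c);
      }

      int main(void) {
          for (int c = 1; c <= 16; c *= 2)
              printf("f=0.70, c=%2d cores: %.2fx\n", c, speedup(0.70, c));
          /* Limit as c -> infinity is 1/(1-f) = 3.33x, so the desired
             4x overall speedup is unreachable however many cores we use */
          printf("f=0.70, limit: %.2fx\n", 1.0 / (1.0 - 0.70));
          return 0;
      }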

  12. Amdahl’s law, f=70%
      [Chart: as slide 10, with the asymptote marked: the limit as c → ∞ is 1/(1−f) = 3.33, below the desired 4x speedup]

  13. Amdahl’s law, f=10%
      [Chart: the speedup achieved with perfect scaling on the 10% barely rises above 1.0; the Amdahl’s law limit is just 1.11x]

  14. Amdahl’s law, f=98%
      [Chart: speedup vs. #cores, now out to 128 cores; with f=98% the speedup keeps climbing across the whole range]

  15. Amdahl’s law & multi-core
      Suppose that the same h/w budget (space or power) can make us:
      [Diagram: 1 big core, or 4 medium cores, or 16 small cores]

  16. Perf of big & small cores
      Assumption: perf = α √resource
      [Chart: core perf (relative to 1 big core) vs. resources dedicated to the core (1/16 … 1). Total perf: 1 big core = 1 × 1 = 1; 16 small cores = 16 × 1/4 = 4]
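      To make the totals on this slide concrete, here is a small sketch of the same arithmetic (ours, not from the slides), taking α = 1: splitting a fixed budget of 1 across n equal cores gives each core 1/n of the resources and perf √(1/n), so total throughput is n · √(1/n) = √n.

      #include <math.h>
      #include <stdio.h>

      int main(void) {
          int configs[] = {1, 4, 16};            /* 1 big, 4 medium, 16 small */
          for (int i = 0; i < 3; i++) {
              int n = configs[i];
              double per_core = sqrt(1.0 / n);   /* perf = sqrt(resource), alpha = 1 */
              printf("%2d cores: per-core perf %.2f, total perf %.2f\n",
                     n, per_core, n * per_core);
          }
          return 0;                              /* 16 cores: 16 * 0.25 = 4.00 */
      }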

  17. Amdahl’s law, f=98%
      [Chart: perf (relative to 1 big core) vs. #cores: 16 small cores come out ahead of 4 medium, with 1 big core flat at 1.0]

  18. Amdahl’s law, f=75%
      [Chart: perf (relative to 1 big core) vs. #cores: with f=75% the small-core advantage disappears; 16 small cores now do worse than 1 big core]

  19. Amdahl’s law, f=5%
      [Chart: perf (relative to 1 big core) vs. #cores: 1 big core is clearly best; 4 medium cores trail it, and 16 small cores do worst]

  20. Asymmetric chips
      [Diagram: the same budget spent on an asymmetric chip: 1 big core plus 12 small cores (“1+12”)]

  21. Amdahl’s law, f=75%
      [Chart: perf (relative to 1 big core) vs. #cores, now including the 1+12 asymmetric chip, which comes out on top (around 1.4x), ahead of 4 medium, 1 big, and 16 small]

  22. Amdahl’s law, f=5%
      [Chart: with f=5%, 1 big core remains best; 4 medium and the 1+12 asymmetric chip sit in the middle, and 16 small cores do worst]

  23. Amdahl’s law, f=98%
      [Chart: with f=98%, the 1+12 asymmetric chip and 16 small cores lead at around 3x, ahead of 4 medium (about 2x) and 1 big]

  24. Amdahl’s law, f=98%
      [Chart: speedup (relative to 1 big core) vs. #cores for a larger budget: a 1+192 asymmetric chip reaches roughly 8x, well ahead of 256 small cores]

  25. Amdahl’s law, f=98%
      [Chart: as slide 24, annotated: leave the larger core idle in the parallel section]

  26. Basic spin-locks

  27. Test and set (pseudo-code)

      bool testAndSet(bool *b) {
        bool result = *b;  // Read the current contents of the location b points to…
        *b = TRUE;         // …set the contents of *b to TRUE
        return result;
      }

      b is a pointer to a location holding a boolean value (TRUE/FALSE); the read and the write happen as a single atomic step.
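      On real hardware, testAndSet is typically a single atomic instruction (e.g., x86 XCHG). As a hedged illustration, not from the slides, here is one way to express it in portable C11 using <stdatomic.h>; the atomic_bool parameter type is an assumption of this sketch.

      #include <stdatomic.h>
      #include <stdbool.h>

      /* Atomically read the old contents of *b and set *b to true */
      bool testAndSet(atomic_bool *b) {
          return atomic_exchange(b, true);
      }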

  28. Test and set
      Suppose two threads use it at once:
      [Timeline: Thread 1’s testAndSet(b) returns TRUE while Thread 2’s returns FALSE; the atomic read-and-set ensures exactly one caller sees the old value FALSE]

  29. Test and set lock

      lock: FALSE   (FALSE => lock available, TRUE => lock held)

      void acquireLock(bool *lock) {
        // Each call to testAndSet tries to acquire the lock,
        // returning TRUE if it is already held
        while (testAndSet(lock)) {
          /* Nothing */
        }
      }

      void releaseLock(bool *lock) {
        *lock = FALSE;
      }

      NB: all this is pseudo-code, assuming SC memory
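      The slide’s pseudo-code assumes SC memory. As a minimal sketch of the same lock in C11, with explicit acquire/release orderings in place of that assumption (the names spin_acquire/spin_release are ours, not the lecture’s):

      #include <stdatomic.h>
      #include <stdbool.h>

      void spin_acquire(atomic_bool *lock) {
          /* atomic_exchange returns the previous value: keep trying
             until we are the thread that flips false -> true */
          while (atomic_exchange_explicit(lock, true, memory_order_acquire)) {
              /* Nothing */
          }
      }

      void spin_release(atomic_bool *lock) {
          /* The pseudo-code's plain store needs release ordering here */
          atomic_store_explicit(lock, false, memory_order_release);
      }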

  30. Test and set lock
      [Diagram: Thread 1 and Thread 2 both run acquireLock; one thread’s testAndSet flips lock from FALSE to TRUE and succeeds, while the other spins. Code as on slide 29]

  31. What are the problems here?
      testAndSet implementation causes contention

  32. Contention from testAndSet
      [Diagram: two single-threaded cores, each with a private L1 and L2 cache, sharing main memory]

  33. Multi-core h/w – separate L2
      [Diagram: one core executes testAndSet(k); the cache line holding k is fetched from main memory into that core’s L2 and L1 in exclusive mode]

  34. Multi-core h/w – separate L2
      [Diagram: the other core now executes testAndSet(k); the line holding k migrates to its caches, invalidating the first core’s copy]

  35. Multi-core h/w – separate L2
      [Diagram: the cores keep executing testAndSet(k), so the line holding k ping-pongs between their caches]
      Does this still happen in practice? Do modern CPUs avoid fetching the line in exclusive mode on failing TAS?

  36. What are the problems here?
      - testAndSet implementation causes contention
      - No control over locking policy
      - Only supports mutual exclusion: not reader-writer locking
      - Spinning may waste resources while waiting

  37. General problem
      - No logical conflict between two failed lock acquires
      - Cache protocol introduces a physical conflict
      - For a good algorithm: only introduce physical conflicts if a logical conflict occurs
        - In a lock: successful lock-acquire & failed lock-acquire
        - In a set: successful insert(10) & failed insert(10)
      - But not:
        - In a lock: two failed lock acquires
        - In a set: successful insert(10) & successful insert(20)
        - In a non-empty queue: enqueue on the left and remove on the right
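      One standard way to avoid the physical conflict between two failed acquires is test-and-test-and-set (TATAS): spin on an ordinary read, which lets waiting threads share the cache line, and only attempt the atomic exchange when the lock looks free. A hedged sketch in the same C11 style as above; this is the usual textbook refinement, not code from the lecture.

      #include <stdatomic.h>
      #include <stdbool.h>

      void tatas_acquire(atomic_bool *lock) {
          for (;;) {
              /* Spin on a plain load: failed waiters share the line in
                 read mode instead of fighting for exclusive ownership */
              while (atomic_load_explicit(lock, memory_order_relaxed)) {
                  /* Nothing */
              }
              /* Lock looked free: now pay for one atomic exchange */
              if (!atomic_exchange_explicit(lock, true, memory_order_acquire)) {
                  return;   /* we flipped false -> true: lock acquired */
              }
          }
      }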
