Implementing Locks Nima Honarmand (Based on slides by Prof. Andrea - PowerPoint PPT Presentation

Fall 2017 :: CSE 306 Implementing Locks Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau)

Lock Implementation Goals
• We evaluate lock implementations along the following lines:
  • Correctness
    • Mutual exclusion: only one thread in the critical section at a time
    • Progress (deadlock-free): if there are several simultaneous requests, one must be allowed to proceed
    • Bounded wait (starvation-free): each waiting thread must eventually be allowed to enter
  • Fairness: each thread waits for about the same amount of time
    • Also, threads acquire the lock in the same order as they requested it
  • Performance: CPU time is used efficiently

Building Locks
• Locks are variables in shared memory
• Two main operations: acquire() and release()
  • Also called lock() and unlock()
• To check if locked, read the variable and check its value
• To acquire, write the “locked” value to the variable
  • Should only do this if the lock is currently unlocked
  • If already locked, keep reading the value until an unlock is observed
• To release, write the “unlocked” value to the variable

First Implementation Attempt
• Using normal load/store instructions

    #include <stdbool.h>

    bool lock = false;   // shared variable

    void acquire(bool *lock) {
        while (*lock)
            ;            // wait
        *lock = true;
    }

    void release(bool *lock) {
        *lock = false;
    }

• This does not work. Why?
• The final check of the while condition and the write to the lock in acquire() need to happen atomically. With plain loads and stores, two threads can both observe the lock as free and both enter the critical section.
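To make the failure concrete, here is a minimal single-threaded sketch that replays one bad interleaving by hand. It is an illustration only: the function name and the t1/t2 variables are hypothetical, and real threads would hit this interleaving nondeterministically rather than on every run.

```c
#include <stdbool.h>

/* Replays one bad interleaving of the load/store acquire() by hand.
 * Returns how many "threads" believe they hold the lock afterwards. */
int simulate_bad_interleaving(void) {
    bool lock = false;   /* shared variable, initially unlocked */
    int holders = 0;

    bool t1_saw = lock;  /* T1 evaluates while (*lock): sees false */
    bool t2_saw = lock;  /* T2 runs before T1 stores: also sees false */

    if (!t1_saw) { lock = true; holders++; }  /* T1 stores true, enters */
    if (!t2_saw) { lock = true; holders++; }  /* T2 stores true, enters */

    return holders;      /* 2: mutual exclusion is violated */
}
```

Both threads pass the check before either performs the store, so the return value is 2.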

Solution: Use Atomic RMW Instructions
• Atomic instructions guarantee atomicity
  • Perform Read, Modify, and Write atomically (RMW)
• Many flavors in the real world
  • Test and Set
  • Fetch and Add
  • Compare and Swap (CAS)
  • Load Linked / Store Conditional

Example: Test-and-Set
Semantics:

    // Return what was pointed to by addr;
    // at the same time, store newval into addr, atomically.
    int TAS(int *addr, int newval) {
        int old = *addr;
        *addr = newval;
        return old;
    }

Implementation in x86:

    int TAS(volatile int *addr, int newval) {
        int result = newval;
        asm volatile("lock; xchg %0, %1"
                     : "+m" (*addr), "=r" (result)
                     : "1" (newval)
                     : "cc");
        return result;
    }
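For readers not on x86, the same semantics can be sketched with C11 atomics instead of inline assembly. This is a swapped-in alternative, not the slide's implementation; the name tas_c11 is hypothetical.

```c
#include <stdatomic.h>

/* Portable test-and-set: atomically store newval into *addr
 * and return the value that was there before. */
int tas_c11(atomic_int *addr, int newval) {
    return atomic_exchange(addr, newval);
}
```

Calling tas_c11(&flag, 1) on a flag holding 0 returns 0 and leaves 1 behind; a second call returns 1.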

Lock Implementation with TAS (fill in the blanks)

    typedef struct __lock_t {
        int flag;
    } lock_t;

    void init(lock_t *lock) {
        lock->flag = ??;
    }

    void acquire(lock_t *lock) {
        while (????)
            ;  // spin-wait (do nothing)
    }

    void release(lock_t *lock) {
        lock->flag = ??;
    }

Lock Implementation with TAS

    typedef struct __lock_t {
        int flag;   // 0 = unlocked, 1 = locked
    } lock_t;

    void init(lock_t *lock) {
        lock->flag = 0;
    }

    void acquire(lock_t *lock) {
        while (TAS(&lock->flag, 1) == 1)
            ;  // spin-wait (do nothing)
    }

    void release(lock_t *lock) {
        lock->flag = 0;
    }
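A minimal sketch of the same spinlock protecting a shared counter, assuming POSIX threads and C11 atomics in place of the x86 TAS above. The names tas_lock_t, demo_worker and run_tas_demo are hypothetical, chosen only for this illustration.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { atomic_int flag; } tas_lock_t;   /* 0 = free, 1 = held */

void tas_init(tas_lock_t *l)    { atomic_store(&l->flag, 0); }
void tas_acquire(tas_lock_t *l) {
    while (atomic_exchange(&l->flag, 1) == 1)
        ;  /* spin until the old value was 0, i.e. we took the lock */
}
void tas_release(tas_lock_t *l) { atomic_store(&l->flag, 0); }

static tas_lock_t demo_lock;
static long demo_counter;   /* protected by demo_lock */
static int demo_iters;

static void *demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < demo_iters; i++) {
        tas_acquire(&demo_lock);
        demo_counter++;          /* critical section */
        tas_release(&demo_lock);
    }
    return NULL;
}

/* Runs up to 16 threads, each incrementing the counter `iters` times,
 * and returns the final counter value. */
long run_tas_demo(int nthreads, int iters) {
    pthread_t tids[16];
    if (nthreads > 16) nthreads = 16;
    tas_init(&demo_lock);
    demo_counter = 0;
    demo_iters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, demo_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return demo_counter;
}
```

With the lock in place, run_tas_demo(4, 10000) should return exactly 40000; without it, lost updates would typically make the result smaller.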

Evaluating Our Spinlock
• Lock implementation goals
  1) Mutual exclusion: only one thread in critical section at a time
  2) Progress (deadlock-free): if several simultaneous requests, must allow one to proceed
  3) Bounded wait: must eventually allow each waiting thread to enter
  4) Fairness: threads acquire lock in the order of requesting
  5) Performance: CPU time is used efficiently
• Which ones are NOT satisfied by our lock implementation?
  • 3, 4, 5

Our Spinlock is Unfair
[Figure: timeline from 0 to 160 in which threads A and B alternate on the CPU; the slices are labeled lock/unlock and spin.]
• Scheduler is independent of locks/unlocks

Fairness and Bounded Wait
• Use Ticket Locks
• Idea: reserve each thread’s turn to use a lock.
• Each thread spins until its turn.
• Use new atomic primitive: fetch-and-add
• Acquire: grab a ticket using fetch-and-add
  • Spin while the thread’s ticket != turn
• Release: advance to the next turn

Semantics:

    int FAA(int *ptr) {
        int old = *ptr;
        *ptr = old + 1;
        return old;
    }

Implementation:

    // Let’s use GCC’s built-in
    // atomic functions this time around
    __sync_fetch_and_add(ptr, 1)

Ticket Lock Example
Initially, turn = ticket = 0
A lock(): gets ticket 0, spins until turn == 0 → A runs
B lock(): gets ticket 1, spins until turn == 1
C lock(): gets ticket 2, spins until turn == 2
A unlock(): turn++ (turn = 1) → B runs
A lock(): gets ticket 3, spins until turn == 3
B unlock(): turn++ (turn = 2) → C runs
C unlock(): turn++ (turn = 3) → A runs
A unlock(): turn++ (turn = 4)
C lock(): gets ticket 4 → C runs

Ticket Lock Implementation

    typedef struct {
        int ticket;   // next ticket to hand out
        int turn;     // ticket currently allowed to enter
    } lock_t;

    void lock_init(lock_t *lock) {
        lock->ticket = 0;
        lock->turn = 0;
    }

    void acquire(lock_t *lock) {
        int myturn = FAA(&lock->ticket);
        while (lock->turn != myturn)
            ;  // spin
    }

    void release(lock_t *lock) {
        lock->turn += 1;
    }
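The ticket lock can also be sketched with C11 atomics standing in for FAA. In this sketch acquire is split into two hypothetical helpers, ticket_take and ticket_is_turn, so that the ordering from the ticket lock example can be replayed from a single thread; the names are assumptions made for illustration.

```c
#include <stdatomic.h>

typedef struct {
    atomic_int ticket;  /* next ticket to hand out */
    atomic_int turn;    /* ticket currently allowed to enter */
} ticket_lock_t;

void ticket_init(ticket_lock_t *l) {
    atomic_store(&l->ticket, 0);
    atomic_store(&l->turn, 0);
}

/* Atomically grab the next ticket (this is the fetch-and-add step). */
int ticket_take(ticket_lock_t *l) {
    return atomic_fetch_add(&l->ticket, 1);
}

/* True once the holder of `myturn` may enter the critical section. */
int ticket_is_turn(ticket_lock_t *l, int myturn) {
    return atomic_load(&l->turn) == myturn;
}

void ticket_acquire(ticket_lock_t *l) {
    int myturn = ticket_take(l);
    while (!ticket_is_turn(l, myturn))
        ;  /* spin until our ticket comes up */
}

void ticket_release(ticket_lock_t *l) {
    atomic_fetch_add(&l->turn, 1);  /* hand the lock to the next ticket */
}
```

Because tickets are handed out in increasing order and turn only ever advances by one, waiters enter in the order in which they took their tickets.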

Busy-Waiting (Spinning) Performance
• Good when…
  • many CPUs
  • locks held a short time
  • advantage: avoid context switch
• Awful when…
  • one CPU
  • locks held a long time
  • disadvantage: spinning is wasteful

CPU Scheduler Is Ignorant
• …of busy-waiting locks
[Figure: timeline from 0 to 160 in which threads A, B, C and D take turns on the CPU; the slices are labeled lock, unlock and spin.]
• CPU scheduler may run B instead of A even though B is waiting for A

Ticket Lock with yield()

    typedef struct {
        int ticket;
        int turn;
    } lock_t;

    …

    void acquire(lock_t *lock) {
        int myturn = FAA(&lock->ticket);
        while (lock->turn != myturn)
            yield();   // give up the CPU instead of spinning
    }

    void release(lock_t *lock) {
        lock->turn += 1;
    }
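A runnable sketch of the yielding variant, assuming POSIX sched_yield() as the yield() call and C11 atomics as the FAA primitive. The names yield_lock_t, yield_worker and run_yield_demo are hypothetical.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { atomic_int ticket; atomic_int turn; } yield_lock_t;

static yield_lock_t ylock;
static long ycounter;   /* protected by ylock */
static int yiters;

void yield_acquire(yield_lock_t *l) {
    int myturn = atomic_fetch_add(&l->ticket, 1);
    while (atomic_load(&l->turn) != myturn)
        sched_yield();  /* not our turn: let another thread run */
}

void yield_release(yield_lock_t *l) {
    atomic_fetch_add(&l->turn, 1);
}

static void *yield_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < yiters; i++) {
        yield_acquire(&ylock);
        ycounter++;              /* critical section */
        yield_release(&ylock);
    }
    return NULL;
}

/* Runs up to 16 threads and returns the final counter value. */
long run_yield_demo(int nthreads, int iters) {
    pthread_t tids[16];
    if (nthreads > 16) nthreads = 16;
    atomic_store(&ylock.ticket, 0);
    atomic_store(&ylock.turn, 0);
    ycounter = 0;
    yiters = iters;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, yield_worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return ycounter;
}
```

Yielding matters most for a ticket lock when there are more threads than CPUs: a pure spinner would otherwise burn its whole time slice while the thread whose turn it is waits to be scheduled.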

Yielding instead of Spinning
[Figure: two timelines from 0 to 160. "no yield": threads A, B, C and D take turns on the CPU, with slices labeled lock, unlock and spin. "yield": slices labeled lock and unlock for threads A and B, with no spin slices.]

Evaluating Ticket Lock
• Lock implementation goals
  1) Mutual exclusion: only one thread in critical section at a time
  2) Progress (deadlock-free): if several simultaneous requests, must allow one to proceed
  3) Bounded wait: must eventually allow each waiting thread to enter
  4) Fairness: threads acquire lock in the order of requesting
  5) Performance: CPU time is used efficiently
• Which ones are NOT satisfied by our lock implementation?
  • 5 (even with yielding, too much overhead)

Spinning Performance
• Wasted time
  • Without yield: O(threads × time_slice)
  • With yield: O(threads × context_switch_time)
• So even with yield, spinning is slow under high thread contention
• Next improvement: instead of spinning, block and put the thread on a wait queue
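A back-of-the-envelope sketch of the two bounds. The numbers in the usage note (100 waiting threads, a 10 ms time slice, a 5 µs context switch) are illustrative assumptions, not measurements, and wasted_us is a hypothetical helper.

```c
/* Estimated CPU time (in microseconds) wasted by waiters, following the
 * slide's bounds: each of `threads` waiters burns one unit of `cost_us`,
 * where cost_us is a time slice (no yield) or a context switch (yield). */
long wasted_us(long threads, long cost_us) {
    return threads * cost_us;
}
```

Under those assumed numbers, wasted_us(100, 10000) gives 1,000,000 µs (one second) without yield, while wasted_us(100, 5) gives 500 µs with yield. Yielding is far cheaper, but the cost still grows linearly with the number of waiting threads.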

Blocking Locks
• acquire() removes waiting threads from the run queue using a special system call
  • Let’s call it park(): removes the current thread from the run queue
• release() returns waiting threads to the run queue using a special system call
  • Let’s call it unpark(tid): returns thread tid to the run queue
• Scheduler runs any thread that is ready
  • No time wasted on waiting threads when the lock is not available
• Good separation of concerns
  • Keep waiting threads on a wait queue instead of the scheduler’s run queue
• Note: park() and unpark() are made-up syscalls, inspired by Solaris’ lwp_park() and lwp_unpark() system calls

Building a Blocking Lock

    typedef struct {
        int lock;
        int guard;
        queue_t q;
    } lock_t;

    void acquire(lock_t *l) {
        while (TAS(&l->guard, 1) == 1);
        if (l->lock) {
            queue_add(l->q, gettid());
            l->guard = 0;
            park();   // blocked
        } else {
            l->lock = 1;
            l->guard = 0;
        }
    }

    void release(lock_t *l) {
        while (TAS(&l->guard, 1) == 1);
        if (queue_empty(l->q)) l->lock = false;
        else unpark(queue_remove(l->q));
        l->guard = false;
    }

1) What is guard for?
2) Why is it okay to spin on guard?
3) In release(), why not set lock=false when unparking?
4) Is the code correct?
   • Hint: there is a race condition

Race Condition

    Thread 1 in acquire()              Thread 2 in release()

    if (l->lock) {
        queue_add(l->q, gettid());
        l->guard = 0;
                                       while (TAS(&l->guard, 1) == 1);
                                       if (queue_empty(l->q)) l->lock = false;
                                       else unpark(queue_remove(l->q));
        park();

• Problem: guard is not held when calling park()
• Thread 2 can call unpark() before Thread 1 calls park()
• Thread 1 then parks after its wakeup has already been delivered, so it may sleep indefinitely

Solving Race Problem: Final Correct Lock

    typedef struct {
        int lock;
        int guard;
        queue_t q;
    } lock_t;

    void acquire(lock_t *l) {
        while (TAS(&l->guard, 1) == 1);
        if (l->lock) {
            queue_add(l->q, gettid());
            setpark();
            l->guard = 0;
            park();   // blocked
        } else {
            l->lock = 1;
            l->guard = 0;
        }
    }

    void release(lock_t *l) {
        while (TAS(&l->guard, 1) == 1);
        if (queue_empty(l->q)) l->lock = false;
        else unpark(queue_remove(l->q));
        l->guard = false;
    }

• setpark() informs the OS of my plan to park() myself
• If there is an unpark() between my setpark() and park(), park() will return immediately (no blocking)
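Since park(), setpark() and unpark() are made-up syscalls, the following is only a user-level emulation of their intended semantics, built on a pthread mutex and condition variable. The parker_t type and the emu_* names are hypothetical; a real OS would keep this state per thread inside the kernel rather than in a caller-supplied struct.

```c
#include <pthread.h>

typedef struct {
    pthread_mutex_t m;
    pthread_cond_t  cv;
    int wake_pending;   /* set by emu_unpark(), consumed by emu_park() */
} parker_t;

#define PARKER_INIT { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Announce the intent to park: discard wakeups from before this point. */
void emu_setpark(parker_t *p) {
    pthread_mutex_lock(&p->m);
    p->wake_pending = 0;
    pthread_mutex_unlock(&p->m);
}

/* Block until a wakeup arrives; return at once if one has already
 * arrived since emu_setpark(). */
void emu_park(parker_t *p) {
    pthread_mutex_lock(&p->m);
    while (!p->wake_pending)
        pthread_cond_wait(&p->cv, &p->m);
    p->wake_pending = 0;   /* consume the wakeup */
    pthread_mutex_unlock(&p->m);
}

/* Deliver a wakeup; it is remembered even if the target has not parked yet. */
void emu_unpark(parker_t *p) {
    pthread_mutex_lock(&p->m);
    p->wake_pending = 1;
    pthread_cond_signal(&p->cv);
    pthread_mutex_unlock(&p->m);
}
```

The sequence emu_setpark(), emu_unpark(), emu_park() returns without blocking, which is exactly the ordering that caused the lost wakeup in the earlier version of the lock.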
