Duke Systems
Implementing Threads and Synchronization
Jeff Chase
Duke University
Operating Systems: The Classical View
• Programs run as independent processes.
• Each process has a private virtual address space and at least one thread.
• Threads enter the kernel for OS services through protected system calls (...and upcalls, e.g., signals).
• The protected OS kernel mediates access to shared resources.
• The kernel code and data are protected from untrusted processes.
Project 1t
• Pretend that your 1t code is executing within the kernel.
• Pretend that the 1t API methods are system calls.
• Pretend that your kernel runs on a uniprocessor.
  – One core; at most one thread is in running state at a time.
• Pretend that your 1t code has direct access to protected hardware functions (since it is in the kernel).
  – Enable/disable interrupts
  – (You can’t really, because your code is executing in user mode. But we can use Unix signals to simulate timer interrupts, and we can simulate blocking them.)
• It may be make-believe, but you are building the foundation of a classical operating system kernel.
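The idea of simulating timer interrupts with Unix signals can be sketched as follows. This is a minimal sketch, assuming SIGALRM stands in for the timer interrupt; the names timer_install, interrupts_disable, interrupts_enable, and interrupts_are_disabled are hypothetical and are not part of the project API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Counts simulated timer interrupts (assumption: SIGALRM is the "timer"). */
static volatile sig_atomic_t ticks = 0;

/* Simulated interrupt handler: a real kernel might preempt the thread here. */
static void on_tick(int signo) { (void)signo; ticks++; }

/* Hypothetical helper: install the simulated timer interrupt handler. */
int timer_install(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* Block or unblock SIGALRM in the calling thread's signal mask. */
static int set_timer_blocked(int how) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    return sigprocmask(how, &set, NULL);
}

/* Hypothetical stand-ins for "disable/enable interrupts". */
int interrupts_disable(void) { return set_timer_blocked(SIG_BLOCK); }
int interrupts_enable(void)  { return set_timer_blocked(SIG_UNBLOCK); }

/* Returns 1 if the simulated timer interrupt is currently blocked. */
int interrupts_are_disabled(void) {
    sigset_t cur;
    sigprocmask(SIG_BLOCK, NULL, &cur); /* NULL set: query the mask only */
    return sigismember(&cur, SIGALRM);
}
```

In this sketch, a tick that arrives while "interrupts" are disabled stays pending and is delivered once they are re-enabled, which mirrors how a masked hardware interrupt behaves.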
Threads in Project 1
• Threads
  – thread_create(func, arg);
  – thread_yield();
• Locks/Mutexes
  – thread_lock(lockID);
  – thread_unlock(lockID);
• Condition Variables (Mesa monitors)
  – thread_wait(lockID, cvID);
  – thread_signal(lockID, cvID);
  – thread_broadcast(lockID, cvID);
• All functions return an error code: 0 is success, else -1.
Thread control block
[Figure: an address space holding shared code and one stack per thread. TCB1, TCB2, and TCB3 each store a PC, SP, and registers and sit on the ready queue. When Thread 1 is dispatched, TCB1 leaves the ready queue and its PC, SP, and registers are loaded into the CPU, where Thread 1 is running.]
Creating a new thread
• Also called “forking” a thread
• Idea: create initial state, put on ready queue
1. Allocate, initialize a new TCB
2. Allocate a new stack
3. Make it look like thread was going to call a function
  • PC points to first instruction in function
  • SP points to new stack
  • Stack contains arguments passed to function
  • Project 1: use makecontext
4. Add thread to ready queue
Implementing threads
• Thread_fork(func, args)
  – Allocate thread control block
  – Allocate stack
  – Build stack frame for base of stack (stub)
  – Put func, args on stack
  – Put thread on ready list
  – Will run sometime later (maybe right away!)
• stub(func, args): Pintos switch_entry
  – Call (*func)(args)
  – Call thread_exit()
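The fork steps above can be sketched with the ucontext calls that Project 1 mentions. This is a minimal sketch for Linux/glibc, not the project's required design: the TCB layout, the FIFO ready queue, the run_all scheduler loop, and the add_ten example body are all assumptions made for illustration. Threads here run to completion (there is no yield), and returning from stub resumes the scheduler through uc_link, which plays the role of thread_exit().

```c
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

typedef void (*thread_func_t)(void *);

/* Hypothetical thread control block. */
typedef struct tcb {
    ucontext_t ctx;      /* saved PC, SP, and registers */
    char *stack;         /* this thread's private stack */
    thread_func_t func;  /* function the thread will call */
    void *arg;           /* argument passed to func */
    struct tcb *next;    /* ready-queue link */
} tcb_t;

static ucontext_t scheduler_ctx;          /* where finished threads return */
static tcb_t *current;                    /* thread now running */
static tcb_t *ready_head, *ready_tail;    /* FIFO ready queue */

static void ready_enqueue(tcb_t *t) {
    t->next = NULL;
    if (ready_tail) ready_tail->next = t; else ready_head = t;
    ready_tail = t;
}

static tcb_t *ready_dequeue(void) {
    tcb_t *t = ready_head;
    if (t) {
        ready_head = t->next;
        if (!ready_head) ready_tail = NULL;
    }
    return t;
}

/* Base frame of every thread: call func(arg), then fall back to uc_link. */
static void stub(void) {
    current->func(current->arg);
}

/* Steps 1-4: allocate TCB and stack, build the initial context, enqueue. */
int thread_fork(thread_func_t func, void *arg) {
    tcb_t *t = calloc(1, sizeof *t);
    if (!t) return -1;
    t->stack = malloc(STACK_SIZE);
    if (!t->stack || getcontext(&t->ctx) == -1) {
        free(t->stack);
        free(t);
        return -1;
    }
    t->ctx.uc_stack.ss_sp = t->stack;      /* SP points into the new stack */
    t->ctx.uc_stack.ss_size = STACK_SIZE;
    t->ctx.uc_link = &scheduler_ctx;       /* resumed when stub returns */
    t->func = func;
    t->arg = arg;
    makecontext(&t->ctx, stub, 0);         /* PC points at stub */
    ready_enqueue(t);
    return 0;
}

/* Hypothetical scheduler loop: dispatch ready threads in FIFO order. */
void run_all(void) {
    while ((current = ready_dequeue()) != NULL) {
        swapcontext(&scheduler_ctx, &current->ctx);
        free(current->stack);  /* safe: we are back on the scheduler's stack */
        free(current);
    }
}

/* Example thread body: adds 10 to the int that arg points to. */
static void add_ten(void *arg) { *(int *)arg += 10; }
```

Forking only places the thread on the ready queue; nothing runs until the scheduler dispatches it, which matches "will run sometime later".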
CPU Scheduling 101
The OS scheduler makes a sequence of “moves”.
  – Next move: if a CPU core is idle, pick a ready thread t from the ready pool and dispatch it (run it).
  – Scheduler’s choice is “nondeterministic”
  – Scheduler’s choice determines interleaving of execution
[Figure: a running thread leaves the CPU if the timer expires or it calls wait/yield/terminate; it joins either the blocked threads or the ready pool. Wakeup moves blocked threads to the ready pool. GetNextToRun picks a ready thread and SWITCH() dispatches it.]
Thread states and transitions
• If a thread is in the ready state, then the system may choose to run it “at any time”.
• When a thread is running, the system may choose to preempt it at any time.
• From the point of view of the program, dispatch and preemption are nondeterministic: we can’t know the schedule in advance.
• The preempt and dispatch transitions are controlled by the kernel scheduler.
• Sleep and wakeup transitions are initiated by calls to internal sleep/wakeup APIs by a running thread.
[Figure: states running, ready, and blocked. Transitions: dispatch (ready to running), preempt or yield (running to ready), sleep or wait (running to blocked), wakeup (blocked to ready).]
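The state diagram can be written down as a small transition function. This is a minimal sketch, assuming the three states and the transitions named on the slide; the enum names and thread_transition itself are hypothetical.

```c
#include <assert.h>

typedef enum { T_READY, T_RUNNING, T_BLOCKED } thread_state_t;
typedef enum { EV_DISPATCH, EV_PREEMPT, EV_YIELD, EV_SLEEP, EV_WAKEUP } thread_event_t;

/* Applies one transition. Returns 0 and updates *state if the transition
 * is legal from the current state; returns -1 and leaves *state unchanged
 * otherwise. */
int thread_transition(thread_state_t *state, thread_event_t ev) {
    switch (ev) {
    case EV_DISPATCH:                 /* scheduler picks a ready thread */
        if (*state != T_READY) return -1;
        *state = T_RUNNING;
        return 0;
    case EV_PREEMPT:                  /* involuntary: timer expired */
    case EV_YIELD:                    /* voluntary: thread gives up the core */
        if (*state != T_RUNNING) return -1;
        *state = T_READY;
        return 0;
    case EV_SLEEP:                    /* running thread waits for an event */
        if (*state != T_RUNNING) return -1;
        *state = T_BLOCKED;
        return 0;
    case EV_WAKEUP:                   /* some other thread wakes it up */
        if (*state != T_BLOCKED) return -1;
        *state = T_READY;
        return 0;
    }
    return -1;
}
```

Note that a blocked thread cannot be dispatched directly: it must first be woken and pass through the ready state.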
Timer interrupts enable timeslicing
The system clock (timer) interrupts each core periodically, giving control back to the kernel. The kernel may preempt the running thread and switch to another (an involuntary context switch).
[Figure: timeline of a thread running while(1); in user mode after u-start. Each clock interrupt enters kernel mode, which is divided into the kernel “top half” and the kernel “bottom half” (interrupt handlers); the interrupt return resumes the thread in user mode.]
Synchronization: layering
[Figure: layers, top to bottom]
Concurrent Applications
Condition Variables, Semaphores, Locks
Interrupt Disable, Atomic Read/Modify/Write Instructions
Multiple Processors, Hardware Interrupts
Plot summary I
We need hardware support for atomic read-modify-write operations on data item X by a thread T.
• Atomic means that no other code can operate on X while T’s operation is in progress.
  – Can’t allow any interleaving of operations on X!
• Locks provide atomic critical sections, but...
• We need hardware support to implement safe locks!
• In this discussion we continue to presume that thread primitives and locks are implemented in the kernel.
Plot summary II
Options for hardware support for synchronization:
1. Kernel software can disable interrupts on T’s core.
  – Prevents an involuntary context switch (preempt-yield) to another thread on T’s core.
  – Also prevents conflict with any interrupt handler on T’s core.
  – But on multi-core systems, we also must prevent accesses to X by other cores, and disabling interrupts isn’t sufficient.
2. For multi-core systems the solution is spinlocks.
  – Spinlocks are locks that busy-wait in a loop when not free, instead of blocking (a blocking lock is called a mutex).
  – Use hardware-level atomic instructions to build spinlocks.
  – Use spinlocks internally to implement higher-level synchronization (e.g., monitors).
Spinlock: a first try
Spinlocks provide mutual exclusion among cores without blocking.
    int s = 0;              /* global spinlock variable */
    lock() {
      while (s == 1) {};    /* busy-wait until lock is free */
      ASSERT(s == 0);
      s = 1;
    }
    unlock() {
      ASSERT(s == 1);
      s = 0;
    }
Spinlocks are useful for lightly contended critical sections where there is no risk that a thread is preempted while it is holding the lock, i.e., in the lowest levels of the kernel.
Spinlock: what went wrong
Race to acquire. Two (or more) cores see s == 0. Each one exits the busy-wait loop and sets s = 1, so they all believe they hold the lock.
    int s = 0;
    lock() {
      while (s == 1) {};
      s = 1;
    }
    unlock() {
      s = 0;
    }
We need an atomic “toehold”
• To implement safe mutual exclusion, we need support for some sort of “magic toehold” for synchronization.
  – The lock primitives themselves have critical sections to test and/or set the lock flags.
• Safe mutual exclusion on multicore systems requires specific hardware support: atomic instructions
  – Examples: test-and-set, compare-and-swap, fetch-and-add.
  – These instructions perform an atomic read-modify-write of a memory location. We use them to implement locks.
  – If we have any of those, we can build higher-level synchronization objects like monitors or semaphores.
  – Note: we also must be careful of interrupt handlers...
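Two of the atomic instructions named above are exposed portably through C11's stdatomic.h. This is a minimal sketch, assuming a C11 compiler; the wrapper names fetch_and_add, try_lock_cas, and unlock_cas and the two global variables are hypothetical, chosen only to show the read-modify-write behavior.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int counter = 0;    /* shared data item X */
static atomic_int lock_word = 0;  /* 0 = free, 1 = held */

/* Fetch-and-add: atomically adds delta and returns the OLD value. */
int fetch_and_add(int delta) {
    return atomic_fetch_add(&counter, delta);
}

/* Compare-and-swap: atomically changes lock_word from 0 to 1.
 * Returns true only for the one caller that made the change. */
bool try_lock_cas(void) {
    int expected = 0;
    return atomic_compare_exchange_strong(&lock_word, &expected, 1);
}

/* Release the lock word so a later compare-and-swap can succeed. */
void unlock_cas(void) {
    atomic_store(&lock_word, 0);
}
```

Because the compare and the store happen as one indivisible step, two cores can never both observe 0 and both write 1, which is the race in the first-try spinlock.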
Using read-modify-write instructions
• Disabling interrupts
  – Ok for uni-processor, breaks on multi-processor
  – Why?
• Could use atomic load-store to make a lock
  – Inefficient, lots of busy-waiting
• Hardware people to the rescue!
Using read-modify-write instructions
• Modern processor architectures
  – Provide an atomic read-modify-write instruction
• Atomically
  – Read value from memory into register
  – Write new value to memory
• Implementation details
  – Lock memory location at the memory controller
Example: test&set
    test&set (X) {
      tmp = X
      X = 1           /* Set: sets location to 1 */
      return (tmp)    /* Test: returns old value */
    }
• Atomically!
• Slightly different on x86 (Exchange)
  – Atomically swaps value between register and memory
Spinlock implementation
• Use test&set
• Initially, value = 0
    lock () {
      while (test&set(value) == 1) { }
    }
    unlock () {
      value = 0
    }
• What happens if value = 1?
• What happens if value = 0?
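The test&set spinlock above maps closely onto C11's atomic_flag, whose test-and-set operation atomically sets the flag and returns its previous value. This is a minimal sketch, assuming a C11 compiler; the type tas_lock_t and the functions tas_lock, tas_trylock, and tas_unlock are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical spinlock type: flag clear = free, flag set = held. */
typedef struct { atomic_flag held; } tas_lock_t;
#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Spin until test-and-set returns "was clear", meaning we set it. */
void tas_lock(tas_lock_t *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait: another core holds the lock */
    }
}

/* One attempt only: returns 1 if the lock was acquired, 0 if it was held. */
int tas_trylock(tas_lock_t *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Clear the flag so the next test-and-set succeeds. */
void tas_unlock(tas_lock_t *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

In terms of the slide's two questions: if the flag is already set, test-and-set returns the old value 1, leaves the flag set, and the caller keeps spinning; if the flag is clear, it returns 0 and sets the flag in the same atomic step, so exactly one caller leaves the loop holding the lock.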
Atomic instructions: Test-and-Set
    Spinlock::Acquire () {
      while(held);
      held = 1;
    }
One example: tsl, test-and-set-lock (from an old machine)
Wrong:
              load 4(SP), R2     ; load “this”
    busywait: load 4(R2), R3     ; load “held” flag
              bnz R3, busywait   ; spin if held wasn’t zero
              store #1, 4(R2)    ; held = 1
Problem: interleaved load/test/store.
[Figure: two cores each run load, test, store; both load and test the flag before either one stores.]
Right:
              load 4(SP), R2     ; load “this”
    busywait: tsl 4(R2), R3      ; test-and-set this->held
              bnz R3, busywait   ; spin if held wasn’t zero
Solution: TSL atomically sets the flag and leaves the old value in a register.
(bnz means “branch if not zero”)
Threads on cores
    int x;
    worker() {
      while (1) {
        acquire L;
        x++;
        release L;
      };
    }
[Figure: instruction streams of several cores running worker(). Each core spins on tsl L / bnz until it acquires L, then runs load, add, store (x++), zero L (release), and jmp back to the top of the loop, while the other cores keep spinning on tsl L / bnz.]
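The worker loop can be run on real cores with POSIX threads. This is a minimal sketch, assuming a POSIX system and a C11 compiler: the lock L is built on atomic_flag rather than a tsl instruction, the loop is bounded by a hypothetical ITERATIONS constant instead of while (1) so that the program terminates, and run_workers is a hypothetical driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_WORKERS 4
#define ITERATIONS 100000

static atomic_flag L = ATOMIC_FLAG_INIT;  /* the spinlock L */
static long x = 0;                        /* shared counter, guarded by L */

/* Spin with test-and-set until the lock is acquired. */
static void acquire(void) {
    while (atomic_flag_test_and_set_explicit(&L, memory_order_acquire)) { }
}

/* Corresponds to "zero L" in the figure. */
static void release(void) {
    atomic_flag_clear_explicit(&L, memory_order_release);
}

/* Each worker increments x inside the critical section. */
static void *worker(void *unused) {
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        acquire();
        x++;          /* load, add, store: safe only while holding L */
        release();
    }
    return NULL;
}

/* Hypothetical driver: runs the workers and returns the final x,
 * or -1 if a thread could not be created. */
long run_workers(void) {
    pthread_t tids[NUM_WORKERS];
    int started = 0;
    x = 0;
    while (started < NUM_WORKERS &&
           pthread_create(&tids[started], NULL, worker, NULL) == 0) {
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
    }
    return started == NUM_WORKERS ? x : -1;
}
```

With the lock in place, every increment is preserved, so the final value equals NUM_WORKERS times ITERATIONS. Removing the acquire and release calls would allow the load/add/store sequences to interleave and lose updates.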