Moving thread activation policies to userspace using kfutex
Helge Bahmann <hcb@chaoticmind.net>
Google Zürich
Pop quiz: Which class of operations do processes spend >99% of their time in?
Introduction
What are threads?
● Answer I: A "parallelism abstraction"
  A piece of a program running sequentially with respect to itself, and with unspecified parallelism relative to the remainder of the program. A means of expressing "conceptual parallelism".
● Answer II: An "operating system concept"
  A virtualized instance of a CPU, mapped dynamically to physical CPUs. A means of achieving "factual parallelism".
Introduction
Linux event waiting "primitives" (not exhaustive)
● select / poll / epoll_wait / epoll_pwait / ...
● sigsuspend / sigtimedwait / sigwaitinfo
● waitpid
● sleep / usleep / nanosleep
● ioctl(..., DRM_IOCTL_WAIT_VBLANK, ...)
● pthread_mutex_lock / pthread_cond_wait
● ...
Observation: all of these combine event notification with event delivery.
epoll-based notification
[Diagram (two slide builds): kernel-space interaction across the setup, steady state, and processing phases]
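For concreteness, a minimal sketch of the setup and steady-state phases depicted above, using the standard epoll API (sock_fd and handle_event are placeholders; error handling elided):

  #include <sys/epoll.h>

  void handle_event(const epoll_event& ev);  // processing step (placeholder)

  void epoll_loop(int sock_fd) {
    // Setup: build the interest set once.
    int epfd = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = sock_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, sock_fd, &ev);

    // Steady state: every iteration enters the kernel, which both
    // notifies (wakes the thread) and delivers (fills 'events').
    for (;;) {
      epoll_event events[16];
      int n = epoll_wait(epfd, events, 16, /*timeout=*/-1);
      for (int i = 0; i < n; ++i) {
        handle_event(events[i]);
      }
    }
  }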
Common event handling patterns
● "Edge client"
  – Many kinds of event sources (peripherals, user interaction, network, ...)
  – ~1 instance of each
  – Almost no "intended" parallelism
● Service
  – Single dominant kind of event source (usually network)
  – Many instances of it
  – Maximize throughput through parallelism
● Reality is usually somewhere between these extremes
Leader/followers (classical)
● Design constraints:
  – Single (logical) event source
  – Handling any event may take an arbitrary (varying) amount of time
  – Goal: Maximise throughput through parallelism
● Solution:
  – "Leader" dequeues event
  – Promotes new "leader" from pool of followers
  – Handles event
  – Joins pool of followers

Simplest possible implementation relies on the thread activation policy of "mutex" to select the new leader:

  for (;;) {
    std::unique_lock<std::mutex> lock(m);
    Event ev = get_event_from_queue();
    lock.unlock();
    handle_event(ev);
  }
Leader/followers (classical)
[Diagram: user space (thread 1, thread 2) and kernel space (epoll, device queue, driver, irq) interaction]
Leader/followers (classical)
● Literature: more fancy^Wsophisticated leader selection schemes
● This does not change two fundamental facts:
  – The promoted follower is temporarily woken, just to put itself back to sleep again
  – The last active thread cannot become leader again without another pointless wake-up of the current leader to displace it
● Due to thread/CPU affinity: one IPI per operation
● Particularly pathological for #threads = #CPUs
futex
Linux system call for suspending/waking up threads based on an address
● futex(addr, FUTEX_WAIT, value)
  Atomically verifies that *addr == value and puts the calling thread to sleep in "waiting at addr" state. Returns 0 if the thread was put to sleep (and woken later).
● futex(addr, FUTEX_WAKE, count)
  Wakes up at most count threads in "waiting at addr" state.
● futex(addr, FUTEX_REQUEUE, new_addr)
  Changes all threads currently in "waiting at addr" state into "waiting at new_addr" state.
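glibc exposes no futex() wrapper, so the code on the following slides assumes a thin wrapper over the raw system call, roughly like this (a sketch; note that the real FUTEX_REQUEUE operation also takes a wake count, and the requeue count travels in the timeout argument slot):

  #include <linux/futex.h>
  #include <sys/syscall.h>
  #include <unistd.h>
  #include <cstdint>
  #include <ctime>

  // For FUTEX_WAIT: returns 0 after sleeping and being woken, or -1 with
  // errno == EAGAIN if *addr != value. For FUTEX_WAKE: returns the number
  // of threads woken.
  inline long futex(void* addr, int op, uint32_t value,
                    const timespec* timeout = nullptr,
                    void* addr2 = nullptr, uint32_t val3 = 0) {
    return syscall(SYS_futex, addr, op, value, timeout, addr2, val3);
  }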
futex
Implementing a mutex

  class mutex {
   public:
    void lock();
    void unlock();

   private:
    enum state_type {
      unlocked = 0,
      locked = 1,
      locked_contention = 2
    };
    std::atomic<state_type> state_{unlocked};
    ...
  };

  void mutex::lock() {
    state_type current = state_.load();
    for (;;) {
      switch (current) {
        case unlocked: {
          if (state_.compare_exchange_weak(current, locked)) { return; }
          break;
        }
        case locked: {
          if (!state_.compare_exchange_weak(current, locked_contention)) {
            break;
          }
          // fallthrough
        }
        case locked_contention: {
          futex(&state_, FUTEX_WAIT, locked_contention);
          // Re-acquire pessimistically: a woken thread cannot know whether
          // further waiters remain, so it claims the lock in "contended"
          // state to make the eventual unlock issue a wake-up.
          current = unlocked;
          if (state_.compare_exchange_weak(current, locked_contention)) {
            return;
          }
          break;
        }
      }
    }
  }

  void mutex::unlock() {
    if (state_.exchange(unlocked) == locked_contention) {
      futex(&state_, FUTEX_WAKE, 1);
    }
  }
futex
FUTEX_REQUEUE comes into play to avoid a "thundering herd" problem with condition variables.

A "naive" wake-up causes all woken threads to race for the mutex, blocking all but one of them again at just this point. "Requeue" allows changing the woken threads from "waiting at condition variable" state to "waiting at mutex" state, and thus avoids the thundering herd.

  template <typename X>
  class synchronized_queue {
   public:
    template <typename Iter>
    void enqueue_many(Iter begin, Iter end) {
      std::unique_lock<std::mutex> lock(m_);
      queue_.insert(queue_.end(), begin, end);
      c_.notify_all();
      lock.unlock();
    }

    X dequeue() {
      std::unique_lock<std::mutex> lock(m_);
      while (queue_.empty()) { c_.wait(lock); }
      X result = std::move(queue_.front());
      queue_.pop_front();
      return result;
    }

   private:
    std::mutex m_;
    std::condition_variable c_;
    std::list<X> queue_;
  };
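At the system-call level, the requeue step is a single call of roughly the following shape. This sketches the mechanism only, not a complete condition variable (which additionally needs a sequence counter to avoid lost wake-ups); headers as in the wrapper sketch above:

  #include <climits>

  // Wake at most one waiter on 'cond_word' and move all remaining waiters
  // over to 'mutex_word', so they block on the mutex directly instead of
  // stampeding for it.
  void requeue_notify_all(void* cond_word, void* mutex_word) {
    syscall(SYS_futex, cond_word, FUTEX_REQUEUE,
            1,                                    // wake at most one thread
            reinterpret_cast<const timespec*>(
                static_cast<uintptr_t>(INT_MAX)), // requeue up to INT_MAX more
            mutex_word, 0);
  }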
kfutex
● Extension to allow futex signalling from kernel space
  – User space defines...
    ● Address of an atomic variable (doubles as futex location)
    ● Mutation protocol: single parameterized atomic operation
    ● Wake-up criterion: single parameterized test of pre/post value
  – Kernel acts on these directives when signalling a kfutex (see the sketch below)
● Extension to bind kfutex signalling to kernel events
  – e.g. I/O readiness
● Peripherally related: extension for an event ring buffer
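The slides do not show the proposed ABI; purely as an illustration, the user-space directives could be pictured as a descriptor along these lines (every name below is invented, loosely modelled on the encoding used by FUTEX_WAKE_OP):

  #include <cstdint>

  // Hypothetical sketch only: kfutex is a proposed extension and none of
  // these names are real kernel ABI.
  struct kfutex_spec {
    uint32_t* addr;    // atomic variable; doubles as the futex location
    uint32_t  op;      // mutation protocol: one parameterized atomic op,
    uint32_t  oparg;   //   e.g. "add oparg" or "or oparg"
    uint32_t  cmp;     // wake-up criterion: one parameterized test of the
    uint32_t  cmparg;  //   pre/post value, e.g. "wake if oldval == cmparg"
  };

  // On each kernel-side signal, the kernel atomically applies
  //   newval = op(oldval, oparg)
  // and wakes waiters iff cmp(oldval, newval, cmparg) holds.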
kfutex-based notification
[Diagram (two slide builds): user-space/kernel-space interaction across the setup, steady state, and processing phases]
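A sketch of what the steady state might look like from user space, assuming a hypothetical kfutex whose counter word the kernel advances whenever it appends to an event ring buffer (Event, ring_buffer_pop and handle_event are placeholders; futex() is the wrapper sketched earlier):

  #include <atomic>
  #include <cstdint>

  std::atomic<uint32_t> kfutex_word;  // advanced by the kernel per signal

  struct Event;
  Event* ring_buffer_pop();           // lock-free delivery path (placeholder)
  void handle_event(Event* ev);       // (placeholder)

  void event_loop() {
    uint32_t seen = kfutex_word.load();
    for (;;) {
      // Fast path: drain delivered events without entering the kernel.
      while (Event* ev = ring_buffer_pop()) {
        handle_event(ev);
      }
      // Slow path: sleep, but only if no new signal arrived since 'seen';
      // otherwise FUTEX_WAIT returns immediately and we rescan the buffer.
      futex(&kfutex_word, FUTEX_WAIT, seen);
      seen = kfutex_word.load();
    }
  }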
Leader/followers (futex)
● Bind event source to a kfutex
  – "Leader" FUTEX_WAITs on this event futex
  – "Followers" each FUTEX_WAIT on a private signalling futex
● When the leader receives an event
  – it FUTEX_REQUEUEs one of the followers to the event futex
  – it begins handling the event
● When a thread finishes handling an event
  – either: it waits on its private signalling futex
  – or: it FUTEX_REQUEUEs the current leader to its private signalling futex ("demotes" it) and becomes leader itself
● Leader selection policy lives in user space (sketch below)
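A sketch of this protocol under the same hypothetical kfutex binding (the "demote" variant, follower-pool management and promotion races are elided; fetch_event, pop_follower and push_follower are placeholders):

  #include <atomic>
  #include <cstdint>
  // plus the futex wrapper and syscall headers sketched earlier

  std::atomic<uint32_t> event_word;         // bound to the kernel event source

  struct Worker {
    std::atomic<uint32_t> private_word{0};  // private signalling futex
  };

  struct Event;
  Event* fetch_event();                     // delivery path, e.g. ring buffer (placeholder)
  void handle_event(Event* ev);             // (placeholder)
  Worker* pop_follower();                   // follower pool (placeholder)
  void push_follower(Worker* w);            // (placeholder)

  void worker_loop(Worker* self) {
    for (;;) {
      // Leader phase: wait for the kernel to signal the event kfutex.
      uint32_t seen = event_word.load();
      Event* ev;
      while ((ev = fetch_event()) == nullptr) {
        futex(&event_word, FUTEX_WAIT, seen);  // returns early if the word moved
        seen = event_word.load();
      }

      // Promote a new leader: requeue (not wake!) one follower from its
      // private futex onto the event futex; it starts running only when
      // the kernel actually signals the next event.
      if (Worker* next = pop_follower()) {
        syscall(SYS_futex, &next->private_word, FUTEX_REQUEUE,
                0,                                               // wake none
                reinterpret_cast<const timespec*>(uintptr_t{1}), // requeue one
                &event_word, 0);
      }

      handle_event(ev);

      // Follower phase: rejoin the pool, sleep on the private futex.
      push_follower(self);
      futex(&self->private_word, FUTEX_WAIT, self->private_word.load());
    }
  }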
Summary
● kfutex unifies inter-thread and kernel notification
● kfutex separates event notification from event delivery
  – delivery possible through e.g. lock-free ring buffers
● Allows moving activation policy decisions to user space; avoids "useless" task wake-ups
● Efficiency gain by avoiding kernel entry in fast paths
● Kernel implementation complexity to avoid "abuse" of kfutex side effects
  – futex key hash collisions, page pinning
● Synchronization implementation complexity
  – lock-free kernel/user-space synchronization protocol