Problem Statement | Fred Runtime | Evaluation | Wrap Up

User-level Threading: Have Your Cake and Eat It Too
Martin Karsten and Saman Barghi
David R. Cheriton School of Computer Science, University of Waterloo
June 2020, SIGMETRICS 2020

Motivation

application programming paradigms
• network service handling concurrent sessions
event-based programming
• explicit state management
• asynchronous control flow → callback hell
thread-per-session programming
• automatic state management
• synchronous control flow
⇒ performance?

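The contrast between explicit and automatic state management can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the two-message "key, then value" protocol, the class `EventSession`, and the generator `thread_session` are hypothetical, and a generator stands in for a thread that blocks on each read.

```python
class EventSession:
    """Event-based style: the handler is re-entered for every event,
    so the session must record where it is in the protocol."""

    def __init__(self):
        self.state = "WANT_KEY"   # explicit protocol state
        self.key = None
        self.store = {}

    def on_data(self, chunk):
        if self.state == "WANT_KEY":
            self.key = chunk
            self.state = "WANT_VALUE"
            return None
        # state is WANT_VALUE: complete the request
        self.store[self.key] = chunk
        self.state = "WANT_KEY"
        return "STORED"


def thread_session(store):
    """Thread-per-session style: the position in the code is the state,
    and local variables survive across each (simulated) blocking read."""
    reply = None
    while True:
        key = yield reply     # stands in for a blocking read of the key
        value = yield None    # stands in for a blocking read of the value
        store[key] = value
        reply = "STORED"
```

Feeding the same two chunks to both versions produces the same reply; the difference is where the protocol state lives (an explicit field versus the suspended control flow).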
Background

parallel hardware → threads & synchronization
kernel thread caveats
• limit: typically tens of thousands of threads
• (some) execution overhead
• complex scheduling for fairness & control
⇒ user-level threads!
• key aspect: scheduling
• requirement: user-level I/O blocking

Take Away

user-level threads
• similar throughput to event-based programming
• load balancing can sometimes reduce tail latency
kernel threads not that bad either
• up to a limit
Fred Runtime rules!

Table of Contents

1 Problem Statement
2 Fred Runtime
3 Evaluation
4 Wrap Up

Problem Statement

minimum overhead of user-level threading?
roadmap
• build minimum viable user-level threading runtime
• compare to state-of-the-art threading runtimes
• evaluate with a production-grade application

Approach

[Diagram: application on event handling vs. application on thread runtime]
Memcached: in-memory key/value store
• minimum port to thread-per-session
• fully preserved state machine
• no structural benefits

Scheduler

performance: simple and lightweight
scalability: local queueing
effectiveness: load sharing
efficiency: idle-sleep

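Local queueing combined with load sharing can be illustrated with a small single-threaded sketch. It is not the Fred scheduler: the names `Processor` and `next_task` are hypothetical, the steal order around the ring and the end of the queue that is stolen from are arbitrary choices made here, and real runtimes need synchronization that this sketch leaves out.

```python
from collections import deque


class Processor:
    """A worker with its own local ready queue (hypothetical name)."""

    def __init__(self, pid):
        self.pid = pid
        self.ready = deque()


def next_task(procs, me):
    """Pick the next task for processor index `me`.

    Local queue first (scalability through local queueing); if it is
    empty, walk the other processors in ring order and take a task from
    the first non-empty queue (load sharing). Returns None when every
    queue is empty, which is where a runtime would idle-sleep.
    """
    n = len(procs)
    if procs[me].ready:
        return procs[me].ready.popleft()
    for step in range(1, n):
        victim = procs[(me + step) % n]
        if victim.ready:
            return victim.ready.pop()   # assumption: steal from the tail
    return None
```

With three processors and two tasks queued on processor 1, processor 1 takes the first task from its own queue, processor 0 then steals the remaining one, and processor 2 finds nothing to run.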
Inverse Shared Ready Stack

[Diagram labels: Processor 1-3, Ready-Queue 1-3, processor ring (for stealing), benaphore, fred counter, V(), P(), Staging-Queue, waiting processors, "processor ready-stack"]

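The benaphore named in the diagram is a general technique: a counter handles the uncontended case, and a kernel-level semaphore is touched only when someone actually has to wait or be woken. The sketch below shows that general idea only; it does not reproduce the Fred runtime's data structure. The class layout is an assumption, and a lock stands in for the atomic fetch-and-add that a native implementation would use.

```python
import threading


class Benaphore:
    """Counter-based semaphore: block only when the count goes negative."""

    def __init__(self, initial=0):
        self.count = initial                  # negative value = number of waiters
        self._guard = threading.Lock()        # stand-in for an atomic counter
        self._sem = threading.Semaphore(0)    # slow path only

    def V(self):
        """Signal: one more unit available (e.g. one more ready fred)."""
        with self._guard:
            self.count += 1
            wake = self.count <= 0            # someone was waiting
        if wake:
            self._sem.release()

    def P(self):
        """Wait: take one unit, sleeping if none is available."""
        with self._guard:
            self.count -= 1
            block = self.count < 0
        if block:
            self._sem.acquire()
```

In a scheduler, an idle processor would call `P()` before looking for work and go to sleep when nothing is ready, while whoever makes a fred ready calls `V()`; this is one plausible reading of the diagram's labels, not a statement of how Fred implements it.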
I/O Blocking

automatically suspend thread during I/O wait
essential for synchronous control flow
suspend/resume user-level thread
• user-level synchronization primitives
• OS-level notifications

I/O Notifications

[Diagram labels: poller, event loop, OS query (epoll/kqueue), interest set, input, output, freds, I/O Synchronization Vector (indexed by FD)]

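User-level I/O blocking can be sketched with Python generators and the standard `selectors` module, which wraps epoll or kqueue where available. This is an illustrative toy, not the Fred poller: generators stand in for freds, the function names `run` and `echo_upper` are hypothetical, only read-readiness is handled, and the `blocked` dictionary keyed by file descriptor is a loose stand-in for a per-FD synchronization structure.

```python
import selectors
import socket
from collections import deque


def run(tasks):
    """Run generator-based tasks to completion.

    A task yields a socket to say "suspend me until this is readable".
    Assumes at most one waiting task per file descriptor.
    """
    sel = selectors.DefaultSelector()
    ready = deque(tasks)
    blocked = {}                               # fd -> suspended task
    while ready or blocked:
        while ready:
            task = ready.popleft()
            try:
                sock = next(task)              # run until the task would block
            except StopIteration:
                continue                       # task finished
            blocked[sock.fileno()] = task
            sel.register(sock, selectors.EVENT_READ)
        if blocked:
            # Poller step: ask the OS which descriptors are ready
            # and make the corresponding tasks runnable again.
            for key, _events in sel.select():
                sel.unregister(key.fileobj)
                ready.append(blocked.pop(key.fd))
    sel.close()


def echo_upper(sock):
    """Session code written with synchronous control flow."""
    yield sock                                 # "blocking" wait for input
    data = sock.recv(1024)
    sock.sendall(data.upper())
```

Driving it with `socket.socketpair()`, sending `b"ping"` on one end and running `echo_upper` on the other, returns `b"PING"` to the sender. The session code never mentions the event loop; suspension and resumption happen inside `run`.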
Threading Benchmarks

comparison of 9 different threading runtimes
performance & scalability problems
• Arachne, Mordor, µC++
efficiency problems
• Arachne, Boost, Qthreads
• busy-looping scheduler
solid results
• Fred, Libfiber, Pthreads
• Go: higher constant scheduling overhead

Performance

[Figure: throughput (x10^7, 32 cores) vs. duration of each work unit (µs); series: Libfiber, Qthreads, Fred, Pthread, Go, Boost, Arachne, Mordor, µC++]

Efficiency

[Figure: cost of iteration (µs) vs. core count; series: Libfiber, Pthread, Arachne, Qthreads, Go, Mordor, Fred, Boost, µC++]

I/O Benchmarks

I/O stress test for Fred, Go, Libfiber, Pthread
compared to best-in-class event-based server
• Libfiber breaks
• Go and Pthread limited
• only Fred competitive

I/O Scalability

[Figure: request throughput (x1000/sec) vs. cores; series: ULib, Fred (8 poller freds), Pthread, Go, µC++]

Application Benchmarks

only Fred competitive with original Memcached
tail latency results from Arachne paper
• only apply to special case: #RX queues < #cores
• performance of Pthread for low connection count!

Throughput

[Figure: query throughput (x1000/sec) vs. cores; series: Fred, Vanilla, Pthread, Arachne, Fred (shared RQ)]

Throughput - more connections

[Figure: query throughput (x1000/sec) vs. cores; series: Fred, Vanilla, Pthread, Fred (shared RQ), Arachne]

Tail Latency: Arachne Results

[Figure: 99th-percentile read latency (µs) vs. query throughput (x1000); series: Vanilla (pin/rfs), Fred (pin), Arachne, Pthread (rfs)]

Tail Latency: Explanation

original experiment: 8 RX queues for 12 cores
head-of-line blocking?
modified setup: 16 RX queues for 12 cores
tail latency discrepancies largely gone...

Tail Latency: Regular

[Figure: 99th-percentile read latency (µs) vs. query throughput (x1000); series: Vanilla (pin), Fred (pin), Arachne, Pthread]

Tail Latency: Higher Connection Count

1,536 → 7,680 connections
[Figure: 99th-percentile read latency (µs) vs. query throughput (x1000); series: Vanilla (pin), Fred (pin), Arachne, Pthread]

Wrap Up

Fred: nimble user-level threading runtime
comprehensive performance evaluation
user-level threading possible at low overhead
scenarios with improved performance?
Fred currently the best reference platform