WORKLOAD CHARACTERIZATION OF INTERACTIVE CLOUD SERVICES ON BIG AND SMALL SERVER PLATFORMS
Shuang Chen*, Shay Galon**, Christina Delimitrou*, Srilatha Manne**, and José Martínez*
*Cornell University **Cavium Inc.
EXECUTIVE SUMMARY
How can interactive cloud services achieve low tail latency?
• Tail latency is more important and more challenging
• The entire stack, from SW to HW, is involved
Understand how tail latency reacts to application and system changes
• See how current designs work
• Get insights on future designs
Introduction • Characterization • Implications Page 1 of 20
MOTIVATION
LOW LATENCY
Tail latency
• e.g., QoS defined as the 99th percentile within 500 usec
[Figure: five elements, each labeled 0.99; 0.99^5 ≈ 0.95]
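The arithmetic on this slide can be reproduced with a minimal sketch. It assumes that a request depends on n independent components and that each one meets its latency target with probability p; the independence assumption and the function name are illustrative and not taken from the talk.

```python
def prob_all_within_target(p: float, n: int) -> float:
    """Probability that all n independent components meet the latency target.

    p is the per-component probability of meeting the target (e.g. 0.99).
    Independence between components is an assumption of this sketch.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return p ** n


if __name__ == "__main__":
    # Five components at 0.99 each give roughly 0.95 end to end.
    for n in (1, 5, 10, 100):
        print(n, round(prob_all_within_target(0.99, n), 3))
```

Under these assumptions the end-to-end probability drops quickly as n grows, which is one way to read the slide's emphasis on the tail rather than the average.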
LOW TAIL LATENCY REQUIREMENTS
The entire stack from SW to HW is involved
Application
• Application bottleneck
• Different use cases
Resource Manager
• Scalability
Virtualization
• Overhead of virtualization
OS
• SW isolation mechanisms
• Overhead of context switching
Hardware
• HW isolation mechanisms
• Hyperthreading
CATEGORIZE LC APPLICATIONS
By tail-latency requirement
• us: memcached
• ms: web server, in-memory database
• s: persistent database
By statefulness
• Stateful: memcached
• Stateless: web server
SELECTED LC WORKLOADS
[Figure: Memcached and NGINX placed on axes of QoS strictness and statefulness]
NGINX
• Web server
• Stateless
• 99th percentile in tens of ms
Memcached
• Key-value store
• Stateful
• 99th percentile in hundreds of us
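A QoS target stated as a 99th percentile can be checked against measured latencies as in the sketch below. It uses the nearest-rank percentile definition, which is one common convention and an assumption here; the talk does not say how its percentiles were computed, and the function names are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Rounding guards against floating-point noise before taking the ceiling.
    rank = math.ceil(round(pct * len(ordered) / 100, 9))
    return ordered[rank - 1]


def meets_qos(samples_us: Sequence[float], target_us: float, pct: float = 99.0) -> bool:
    """True if the pct-th percentile latency is within the target (same units)."""
    return percentile_nearest_rank(samples_us, pct) <= target_us


if __name__ == "__main__":
    # Hypothetical latencies in microseconds: 99 fast requests and one slow one.
    latencies_us = [100.0] * 99 + [900.0]
    print(meets_qos(latencies_us, target_us=500.0))
```

With one slow request in a hundred the 99th percentile is still the fast value; a second slow request pushes the percentile past the target.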
SERVER ARCHITECTURE
Intel Xeon E5-2699 v4 ($4,115)
• 22 cores, 2 threads/core
• L1 I/D: 32/32KB
• L2: 256KB
• LLC: 55MB, 20 ways
• 14nm
Cavium ThunderX ($785)
• 48 cores, 1 thread/core
• L1 I/D: 78/32KB
• LLC: 16MB, 16 ways
• 28nm
Both platforms
• Memory: 128G DDR4
• NIC: 10Gbps
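The prices and core counts on this slide allow a rough cost-per-hardware-thread comparison. The sketch below only divides the numbers shown on the slide; it says nothing about per-thread performance, and the function name is illustrative.

```python
def cost_per_thread(price_usd: float, cores: int, threads_per_core: int) -> float:
    """Price divided by the number of hardware threads."""
    if cores <= 0 or threads_per_core <= 0:
        raise ValueError("cores and threads_per_core must be positive")
    return price_usd / (cores * threads_per_core)


if __name__ == "__main__":
    # Values as listed on the slide.
    xeon = cost_per_thread(4115, cores=22, threads_per_core=2)
    thunderx = cost_per_thread(785, cores=48, threads_per_core=1)
    print(f"Xeon: ${xeon:.2f} per thread")
    print(f"ThunderX: ${thunderx:.2f} per thread")
```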
STUDIED PARAMETERS
Application
• Application bottleneck
• Different use cases
Resource Manager
• Scalability
Virtualization
• Overhead of virtualization
OS
• SW isolation mechanisms
• Overhead of context switching
Hardware
• HW isolation mechanisms
• Hyperthreading
INPUT LOAD
[Figure: Memcached and NGINX panels comparing Xeon and ThunderX under varying input load; annotations: 5.2x, 5x]
MEMCACHED LATENCY DECOMPOSITION
[Figure labels: IRQ, RX, Kernel, NIC, Syscall, Receive, Send, User]
• Little user-space processing
At 10% of max throughput (annotation: Network delay)
• Xeon: 6 3111 5
• ThunderX: 14 4 5 9 7 24 (2x slower than Xeon)
At 90% of max throughput (annotation: Queuing delay)
• Xeon: 6 782 1,009 3 15
• ThunderX: 7 1,290 1,650 20 24 14
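The jump in latency between 10% and 90% of max throughput is attributed on the slide to queuing delay. A textbook M/M/1 queue gives a feel for why load has this effect. This is a swapped-in illustrative model, not the authors' analysis, and it is not fitted to the measurements above; the function name and parameters are assumptions.

```python
import math


def mm1_sojourn_percentile(service_rate: float, utilization: float, pct: float = 99.0) -> float:
    """pct-th percentile of time in system for an M/M/1 FCFS queue.

    service_rate is in requests per unit time; utilization is the
    arrival rate divided by the service rate and must be below 1.
    """
    if service_rate <= 0:
        raise ValueError("service_rate must be positive")
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    if not 0 < pct < 100:
        raise ValueError("pct must be in (0, 100)")
    arrival_rate = utilization * service_rate
    # In M/M/1 FCFS, time in system is exponential with rate (mu - lambda).
    return -math.log(1 - pct / 100) / (service_rate - arrival_rate)


if __name__ == "__main__":
    low = mm1_sojourn_percentile(service_rate=1.0, utilization=0.1)
    high = mm1_sojourn_percentile(service_rate=1.0, utilization=0.9)
    print(f"p99 at 10% load: {low:.2f} service times")
    print(f"p99 at 90% load: {high:.2f} service times")
    print(f"ratio: {high / low:.1f}")
```

In this model the tail grows as 1 / (1 - utilization), so most of the latency at high load is time spent waiting rather than being served.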
MEMCACHED VALUE SIZE
[Figure: ThunderX and Xeon panels]
• Memory copy
• Network processing and transmission
• ThunderX is more sensitive
NUMBER OF MEMCACHED ITEMS
[Figure: ThunderX and Xeon panels]
• Cache capacity
• ThunderX is more sensitive
SCALABILITY
[Figure: Memcached and NGINX panels]
• Interrupt handling
• Load imbalance
• Lock contention
CONTEXT SWITCHING
[Figure: Memcached on Xeon; Memcached on ThunderX]
• Statically spawned threads vs. dynamically allocated cores
• ThunderX is more sensitive
HYPERTHREADING
Reduce the overhead of context switching
• Allocate two threads on two hyperthreads
Make better use of execution units
• Co-locate different applications
[Figure: Memcached & Nginx on the same hyperthreads; Memcached & Nginx on different hyperthreads]
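Placing two threads on the two hyperthreads of one core requires knowing which logical CPUs are siblings. On Linux this is exposed through sysfs; the sketch below parses that format and pins the calling process to one core's siblings. It is a Linux-only illustration with hypothetical helper names, not the setup used in the talk.

```python
import os
from typing import List


def parse_cpu_list(text: str) -> List[int]:
    """Parse a Linux CPU list such as "0,22" or "0-3,8-11" into sorted CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_thread_siblings(cpu: int) -> List[int]:
    """Logical CPUs that share a physical core with the given CPU (Linux only)."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as handle:
        return parse_cpu_list(handle.read())


def pin_to_core_of(cpu: int) -> None:
    """Restrict the calling process to the hyperthreads of one core (Linux only)."""
    os.sched_setaffinity(0, read_thread_siblings(cpu))


if __name__ == "__main__":
    print(parse_cpu_list("0,22"))
```

Co-locating two different applications on sibling hyperthreads would use the same sibling list, assigning one logical CPU to each process instead of both to one.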
QUESTIONS?
IMPLICATIONS OF THESE STUDIES
Application
• Reduce queuing delays
Resource Manager
• Improve elasticity: lock alternatives, load balance
Virtualization
• Reduce the overhead of virtualization
OS
• Avoid context switching
• Make best use of SW isolation mechanisms
Hardware
• Big vs. small cores
• Make best use of HW features