
  1. Workload Characterization of Interactive Cloud Services on Big and Small Server Platforms
     Shuang Chen*, Shay Galon**, Christina Delimitrou*, Srilatha Manne**, and José Martínez*
     *Cornell University  **Cavium Inc.

  2. Executive Summary
     • How to achieve low tail latency for interactive cloud services?
       ◦ Tail latency is more important and more challenging
       ◦ The entire stack from SW to HW is involved
     • Understand how tail latency reacts to application and system changes
       ◦ See how current designs work
       ◦ Get insights on future designs
     Introduction • Characterization • Implications  Page 1 of 20

  3. Motivation

  4. Low Latency
     • Tail latency
       ◦ e.g., QoS defined as 99th percentile in 500 µs
     [Figure: five values of 0.99 multiplied together, 0.99^5 ≈ 0.95]
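The arithmetic on this slide can be reproduced with a short sketch. It assumes the five 0.99 values are independent per-component probabilities of meeting the latency target, which the slide does not state explicitly; the function name is illustrative.

```python
def prob_all_within_target(per_component_prob: float, components: int) -> float:
    """Probability that every component meets its latency target.

    Assumes the components behave independently (an assumption made for
    this sketch, not a claim from the slides).
    """
    if not 0.0 <= per_component_prob <= 1.0:
        raise ValueError("per_component_prob must be in [0, 1]")
    if components < 0:
        raise ValueError("components must be non-negative")
    return per_component_prob ** components


# Five components, each meeting the target 99% of the time.
print(round(prob_all_within_target(0.99, 5), 2))  # 0.95
```

Under this assumption, the chance that a request sees at least one slow component grows quickly with the number of components it touches.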

  5. Low Tail Latency Requirements
     • The entire stack from SW to HW is involved
       ◦ Application: application bottleneck; different use cases
       ◦ Resource Manager: scalability
       ◦ Virtualization: overhead of virtualization
       ◦ OS: SW isolation mechanisms; overhead of context switching
       ◦ Hardware: HW isolation mechanisms; hyperthreading

  6. Categorize LC Applications
     • By requirement of tail latency
       ◦ µs: memcached
       ◦ ms: web server, in-memory database
       ◦ s: persistent database
     • By statefulness
       ◦ Stateful: memcached
       ◦ Stateless: web server

  7. Selected LC Workloads
     [Figure: NGINX and Memcached placed on axes of QoS strictness and statefulness]
     • NGINX
       ◦ Web server
       ◦ Stateless
       ◦ 99th percentile in tens of ms
     • Memcached
       ◦ Key-value store
       ◦ Stateful
       ◦ 99th percentile in hundreds of µs
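A 99th-percentile QoS target of this kind can be checked against measured latencies roughly as follows. This is a minimal sketch using the nearest-rank percentile definition; the sample latencies and the 500 µs target are hypothetical example values, and the function names are not from the slides.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_qos(samples_us, target_us, pct=99.0):
    """True if the pct-th percentile latency is within the target."""
    return percentile_nearest_rank(samples_us, pct) <= target_us


# Hypothetical request latencies in microseconds: mostly fast, two slow.
latencies_us = [120.0] * 98 + [450.0, 900.0]
print(percentile_nearest_rank(latencies_us, 99))  # 450.0
print(meets_qos(latencies_us, target_us=500.0))   # True
```

The single 900 µs outlier does not violate a 99th-percentile target over 100 samples, but a second one would.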

  8. Server Architecture
     • Intel Xeon E5-2699 v4 ($4,115)
       ◦ 22 cores, 2 threads/core
       ◦ L1 I/D: 32/32KB
       ◦ L2: 256KB
       ◦ LLC: 55MB, 20 ways
       ◦ 14nm
       ◦ Memory: 128G DDR4
       ◦ NIC: 10Gbps
     • Cavium ThunderX ($785)
       ◦ 48 cores, 1 thread/core
       ◦ L1 I/D: 78/32KB
       ◦ LLC: 16MB, 16 ways
       ◦ 28nm
       ◦ Memory: 128G DDR4
       ◦ NIC: 10Gbps

  9. Studied Parameters
     • Application: application bottleneck; different use cases
     • Resource Manager: scalability
     • Virtualization: overhead of virtualization
     • OS: SW isolation mechanisms; overhead of context switching
     • Hardware: HW isolation mechanisms; hyperthreading

  10. Input Load
     [Figure: Memcached and NGINX panels comparing Xeon and ThunderX under varying input load; annotations: 5.2x and 5x]

  11. Memcached Latency Decomposition
     [Figure: per-stage latency breakdown; stage labels: NIC, IRQ, RX, Kernel, Syscall, Receive, Send, User]
     • Little user-space processing
     • At 10% of max throughput: network delay; ThunderX 2x slower than Xeon
       ◦ Xeon: 6 3111 5
       ◦ ThunderX: 14 4 5 9 7 24
     • At 90% of max throughput: queuing delay
       ◦ Xeon: 6 782 1,009 3 15
       ◦ ThunderX: 7 1,290 1,650 20 24 14
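Why queuing delay takes over near saturation can be illustrated with a textbook M/M/1 queue. This is a swapped-in model chosen for illustration, not the authors' measurement method or a model of Memcached; the normalized service rate and the function name are assumptions.

```python
import math


def mm1_sojourn_percentile(utilization, service_rate=1.0, pct=99.0):
    """pct-th percentile of time in system for an M/M/1 FCFS queue.

    In this model the time in system is exponentially distributed with
    rate (service_rate - arrival_rate), so the percentile has a closed form.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    if not 0.0 < pct < 100.0:
        raise ValueError("pct must be in (0, 100)")
    arrival_rate = utilization * service_rate
    return -math.log(1.0 - pct / 100.0) / (service_rate - arrival_rate)


# Compare the modeled 99th percentile at 10% and 90% of max throughput.
low = mm1_sojourn_percentile(0.10)
high = mm1_sojourn_percentile(0.90)
print(f"modeled p99 grows by a factor of about {high / low:.1f}")
```

In this idealized model the tail grows in proportion to 1 / (1 - utilization), so the same per-request work yields a much longer tail at 90% load than at 10%.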

  12. Studied Parameters

  13. Memcached Value Size
     [Figure panels: ThunderX, Xeon]
     • Memory copy
     • Network processing and transmission
     • ThunderX is more sensitive

  14. Number of Memcached Items
     [Figure panels: ThunderX, Xeon]
     • Cache capacity
     • ThunderX is more sensitive

  15. Studied Parameters

  16. Scalability
     [Figure panels: Memcached, NGINX]
     • Interrupt handling
     • Load imbalance
     • Lock contention

  17. Studied Parameters

  18. Context Switching
     [Figure panels: Memcached on Xeon, Memcached on ThunderX]
     • Statically spawned threads vs. dynamically allocated cores
     • ThunderX is more sensitive
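One reading of this slide is that a server started with a fixed number of worker threads ends up with several threads per core when fewer cores are allocated to it later, which forces context switches. The sketch below only counts threads per core under a round-robin placement; the placement policy, the thread and core counts, and the function names are assumptions for illustration, not the behavior of a real scheduler.

```python
def threads_per_core(num_threads, allocated_cores):
    """Spread threads over cores round-robin; return {core_id: thread_count}."""
    if num_threads < 0:
        raise ValueError("num_threads must be non-negative")
    if not allocated_cores:
        raise ValueError("at least one core must be allocated")
    counts = {core: 0 for core in allocated_cores}
    for thread_index in range(num_threads):
        core = allocated_cores[thread_index % len(allocated_cores)]
        counts[core] += 1
    return counts


def is_oversubscribed(num_threads, allocated_cores):
    """True if any core has to time-share between two or more threads."""
    return max(threads_per_core(num_threads, allocated_cores).values()) > 1


# Hypothetical case: 8 statically spawned threads, allocation shrinks to 4 cores.
print(threads_per_core(8, [0, 1, 2, 3]))     # {0: 2, 1: 2, 2: 2, 3: 2}
print(is_oversubscribed(8, [0, 1, 2, 3]))    # True
print(is_oversubscribed(8, list(range(8))))  # False
```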

  19. Studied Parameters

  20. Hyperthreading
     • Reduce the overhead of context switching
       ◦ Allocate two threads on two hyperthreads
     • Make better use of execution units
       ◦ Co-locate different applications
     [Figure panels: Memcached & Nginx on the same hyperthreads; Memcached & Nginx on different hyperthreads]
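Placing two threads on the two hyperthreads of one core first requires knowing which logical CPUs share a physical core. A minimal sketch, assuming a Linux host that exposes `thread_siblings_list` under sysfs; the helper names and the example CPU numbers are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as '0,22' or '0-1,4' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def hyperthread_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Logical CPUs that share a physical core with `cpu` (Linux only)."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    return parse_cpu_list(path.read_text())


# Parsing works anywhere; reading sysfs needs a Linux machine.
print(parse_cpu_list("0,22\n"))  # [0, 22]
print(parse_cpu_list("0-1,4"))   # [0, 1, 4]
```

On Linux, the resulting CPU ids could then be passed to `os.sched_setaffinity` to restrict a process to those hyperthreads; whether that helps or hurts depends on what the sibling threads run.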

  21. Implications of These Studies
     [Figure labels, stack layers: Application, Resource Manager, Virtualization, OS, Hardware]
     • Reduce queuing delays
     • Improve elasticity
       ◦ Lock alternatives
       ◦ Load balance
     • Reduce the overhead of virtualization
     • Avoid context switching
     • Make best use of SW isolation mechanisms
     • Big vs. small cores
     • Make best use of HW features
     Questions?
