Core-Aware Scheduling: Balancing Application Parallelism with Core Availability

Henry Qin. Advisor: John Ousterhout. February 2, 2016.


  1. Core-Aware Scheduling: Balancing Application Parallelism with Core Availability. Henry Qin. Advisor: John Ousterhout. February 2, 2016. 1 / 15

  2. Introduction. Motivation: inefficient core and thread management. Hard to get high throughput in low-latency services. Difficult to match application parallelism to available cores. Proposal: Core-Aware Scheduling. Thread scheduling moves to user level. Kernel allocates cores to applications. 2 / 15

  3. Outline. Motivation; Proposal for Core-Aware Scheduling; Related Work; Current Status; Request for Feedback. 3 / 15

  4. A Throughput Problem. RAMCloud write requests must make replication requests to backup servers and wait for them to return. RAMCloud uses polling to avoid expensive kernel thread switches, and kernel bypass to avoid system calls. When the master runs out of CPU cores, it must cease processing requests. 4 / 15

  5. Core Exhaustion Bottleneck. New write request, no cores to write to local log. [Figure labels: Master, Replication RPC, Write to Log, Backup (x3)] 5 / 15

  6. What happens under load? Backups are slower to respond, since they coexist with masters. Write requests wait even longer for backups, spinning cores for even longer. 6 / 15
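
The bottleneck on the last three slides can be put in rough numbers. The sketch below assumes a simple model that is not in the talk: each write occupies one core for its local service time plus the time it spins waiting for backups, so throughput is capped at the core count divided by per-request core occupancy. The function name and every number are hypothetical and chosen only to show the trend.

```python
def max_write_throughput(cores: int, service_us: float, wait_us: float) -> float:
    """Upper bound on writes/second when each write holds a core while polling.

    Assumed model (not from the slides): a write occupies one core for
    service_us of local work plus wait_us spent spinning on backup replies.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    occupancy_us = service_us + wait_us
    if occupancy_us <= 0:
        raise ValueError("per-request core occupancy must be positive")
    return cores * 1_000_000 / occupancy_us


# Hypothetical numbers: backups answer in 8 us when idle, 18 us under load.
light_load = max_write_throughput(cores=4, service_us=2.0, wait_us=8.0)
heavy_load = max_write_throughput(cores=4, service_us=2.0, wait_us=18.0)
print(light_load, heavy_load)  # 400000.0 200000.0
```

Under this model, slower backups lower the throughput ceiling even though the master does no extra useful work, which is the effect the slide describes.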

  8. Match application parallelism to available cores. Application servers can have many threads running, such as log cleaners, worker threads, and failure detection threads. We want to neither overcommit nor undercommit cores. Overcommitting cores leads to undesirable kernel multiplexing, because there are multiple kernel threads per core. Undercommitting cores leads to idle cores. When the log cleaner needs to run, we would like to scale down the number of worker threads so that we do not exceed available cores. 8 / 15
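
A minimal sketch of the sizing rule on this slide: give worker threads whatever cores the background threads (such as the log cleaner) do not currently need. Both function names, the `min_workers` floor, and the example core counts are assumptions made for illustration; the talk does not specify a policy.

```python
def worker_thread_target(available_cores: int, background_demand: int,
                         min_workers: int = 1) -> int:
    """Workers to run so that workers + background threads fit the cores.

    The min_workers floor is an assumption; when it applies, the result
    can still overcommit, which classify() below makes visible.
    """
    if available_cores < 1:
        raise ValueError("available_cores must be at least 1")
    if background_demand < 0:
        raise ValueError("background_demand cannot be negative")
    return max(min_workers, available_cores - background_demand)


def classify(runnable_threads: int, cores: int) -> str:
    """Label a thread count relative to the cores that are available."""
    if runnable_threads > cores:
        return "overcommitted"   # kernel must multiplex threads on a core
    if runnable_threads < cores:
        return "undercommitted"  # some cores sit idle
    return "matched"


# Hypothetical 8-core server: the log cleaner wakes up and needs 2 cores.
print(worker_thread_target(8, 0))  # 8
print(worker_thread_target(8, 2))  # 6
print(classify(6 + 2, 8))          # matched
```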

  9. Core-Aware Scheduling: Kernel Core Allocator. Kernel scheduler class which allocates cores to applications on request. In general, the kernel never preempts a thread running on the cores it has allocated to the process. Allows the kernel to safely multiplex latency-sensitive applications with CPU-bound batch jobs. Latency-sensitive applications can request only as many cores as they need, and give up cores when they no longer need them. 9 / 15
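
To make the request and give-up cycle concrete, here is a toy user-space model of such an allocator. It is not the proposed kernel scheduler class: the `CoreAllocator` name, its methods, and the lowest-numbered-first grant order are all hypothetical, and the model only tracks which cores are dedicated and which remain in a shared pool for batch jobs.

```python
from typing import Dict, Set


class CoreAllocator:
    """Toy model: cores are either dedicated to one app or in a shared pool."""

    def __init__(self, total_cores: int) -> None:
        self.free: Set[int] = set(range(total_cores))
        self.owned: Dict[str, Set[int]] = {}

    def request(self, app: str, count: int) -> Set[int]:
        """Dedicate up to `count` free cores to `app`; may grant fewer."""
        granted = set(sorted(self.free)[:count])
        self.free -= granted
        self.owned.setdefault(app, set()).update(granted)
        return granted

    def release(self, app: str, count: int) -> Set[int]:
        """Return up to `count` of the app's cores to the shared pool."""
        held = self.owned.get(app, set())
        returned = set(sorted(held)[:count])
        held -= returned
        self.free |= returned
        return returned

    def shared_pool(self) -> Set[int]:
        """Cores nobody holds; batch jobs would be multiplexed on these."""
        return set(self.free)


alloc = CoreAllocator(total_cores=4)
print(alloc.request("latency_app", 3))  # {0, 1, 2}
print(alloc.shared_pool())              # {3}
print(alloc.release("latency_app", 1))  # {0}
print(alloc.shared_pool())              # {0, 3}
```

Because a dedicated core never appears in the shared pool, nothing in this model can preempt the thread running on it, which mirrors the guarantee stated on the slide.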

  10. Core-Aware Scheduling: Userland Scheduler. Fast context switches enable practical core multiplexing in a low-latency system. Manage thread priorities and parallelism level based on application-specified policies. User-level scheduler requests dedicated cores from the OS, and always knows exactly how many cores it has. 10 / 15
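
The idea of scheduling threads at user level can be illustrated with a small cooperative dispatcher. This is a sketch under stated assumptions, not the talk's design: it uses Python generators as stand-ins for user-level threads, runs on a single core, and applies one hypothetical policy (strict priority, round-robin among equals). The `UserScheduler` and `worker` names are invented for the example.

```python
import heapq
from typing import Generator, List, Tuple

Task = Generator[None, None, None]


class UserScheduler:
    """Cooperative dispatcher: a `yield` hands control back without the kernel."""

    def __init__(self) -> None:
        self._ready: List[Tuple[int, int, Task]] = []
        self._seq = 0  # tie-breaker: equal priorities run round-robin

    def spawn(self, task: Task, priority: int = 0) -> None:
        """Queue a task; a lower priority number runs first."""
        heapq.heappush(self._ready, (priority, self._seq, task))
        self._seq += 1

    def run(self) -> None:
        """Run tasks until all of them have finished."""
        while self._ready:
            priority, _, task = heapq.heappop(self._ready)
            try:
                next(task)  # resume the task until its next yield
            except StopIteration:
                continue    # task finished; do not requeue it
            self.spawn(task, priority)


def worker(name: str, steps: int, log: List[str]) -> Task:
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # user-level "context switch" back to the dispatcher


log: List[str] = []
sched = UserScheduler()
sched.spawn(worker("a", 2, log))
sched.spawn(worker("b", 2, log))
sched.spawn(worker("cleaner", 1, log), priority=1)
sched.run()
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'cleaner:0']
```

A real implementation would run one such dispatcher per dedicated core and switch stacks in native code; the sketch only shows where an application-specified policy would plug in.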

  11. Preempted Questions. How will you handle system calls for blocking IO? Why is thread pinning insufficient? 11 / 15

  17. Related Work. Scheduler Activations inspired this work but is not sufficiently core-aware, because the kernel makes too many scheduling decisions. Linux cgroups do not support the dedicated allocation of specific cores. Capriccio does not support multicore. Go does not address the core allocation problem; it has no mechanism to communicate with the kernel for dedicated cores. Cilk requires user threads to be non-blocking. OpenMP supports neither core allocation nor explicit management of thread scheduling. 12 / 15

  18. Current Status. Implemented a simple user-level dispatcher. Measured a single-direction context switch with no cache pollution at 9 ns on an Intel(R) Xeon(R) CPU X3470 @ 2.93GHz. 13 / 15
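
For scale, the 9 ns figure can be converted to clock cycles. The arithmetic below assumes the core was running at its nominal 2.93 GHz clock during the measurement, which the slide does not state; `ns_to_cycles` is a helper name introduced here.

```python
def ns_to_cycles(duration_ns: float, clock_ghz: float) -> float:
    """Convert nanoseconds to clock cycles (1 GHz is one cycle per ns)."""
    return duration_ns * clock_ghz


cycles = ns_to_cycles(9.0, 2.93)
print(round(cycles, 2))  # 26.37
```

So one switch costs roughly 26 cycles under that assumption.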

  24. Request for Feedback. Do you know of a threading system that solves these problems of core allocation and fast context switching practically and cleanly? Have you ever measured the core utilization over short time intervals (ms and s) on your large-scale systems? Do you have dedicated hardware or shared machines? How do you decide on the number of OS threads for an application? What is the relationship between this number and the number of cores on the machine? 14 / 15

  25. Thank You! If we did not talk at the poster session, please find me at the reception! Send mail to hq6@cs.stanford.edu. Questions? 15 / 15
