1. User-Level Interprocess Communication for Shared Memory Multiprocessors
Brian N. Bershad, Thomas E. Anderson, Edward D. Lazowska, Henry M. Levy
Presented by: Dan Lake

2. Introduction
• IPC is central to operating system design
• Advantages of decomposed systems:
  • Failure isolation (address space boundaries)
  • Extensibility (new modules can be added)
  • Modularity (interfaces are enforced)
• The kernel is traditionally responsible for IPC
• Kernel-based IPC has problems:
  • Architectural performance barriers (LRPC 70%)
  • Interaction of kernel IPC and user-level threads:
    • Strong interdependencies
    • The cost of partitioning these facilities is high

3. Solution for Shared Memory Multiprocessors
• URPC (User-Level Remote Procedure Call)
• Separate the three components of IPC:
  a) Data transfer
  b) Thread management
  c) Processor reallocation
• Goals:
  • Move (a) and (b) to user level
  • Limit the kernel to performing only (c)
  • Eliminate the kernel from cross-address-space communication

4. Message Passing
• Logical channels of pair-wise shared memory
• Channels are created and mapped once for every client/server pairing
• Channels are bidirectional
• A test-and-set lock (TSL) controls access in either direction (see the sketch below)
• Just as secure as going through the kernel
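A minimal sketch of what one of these channels might look like, using C11 atomics for the test-and-set lock. The struct layout, queue geometry, and every name here are illustrative assumptions, not the paper's actual data structures:

    #include <stdatomic.h>
    #include <stddef.h>
    #include <string.h>

    #define QUEUE_SLOTS 16
    #define MSG_BYTES   64

    /* One direction of a channel: a ring buffer guarded by a TSL.
     * Each lock must be initialized with ATOMIC_FLAG_INIT when the
     * channel is first mapped. */
    typedef struct {
        atomic_flag lock;       /* test-and-set lock for this direction */
        unsigned head, tail;    /* consumer / producer indices */
        char msgs[QUEUE_SLOTS][MSG_BYTES];
    } msg_queue;

    /* A bidirectional channel, mapped once per client/server pairing. */
    typedef struct {
        msg_queue request;      /* client -> server */
        msg_queue reply;        /* server -> client */
    } urpc_channel;

    /* Enqueue one message; returns 0 on success, -1 if the queue is
     * full. Non-blocking, so the user-level scheduler can run another
     * thread instead of waiting. */
    int channel_send(msg_queue *q, const void *msg, size_t len)
    {
        int rc = -1;
        while (atomic_flag_test_and_set(&q->lock))
            ;                   /* spin: the critical section is tiny */
        if (q->tail - q->head < QUEUE_SLOTS && len <= MSG_BYTES) {
            memcpy(q->msgs[q->tail % QUEUE_SLOTS], msg, len);
            q->tail++;
            rc = 0;
        }
        atomic_flag_clear(&q->lock);
        return rc;
    }

Because each side can only touch memory it already shares with its one partner, a misbehaving peer can do no more damage than it could under kernel-mediated message passing.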

5. Data & Security
• Applications access URPC procedures through a stub layer
• Stubs unmarshal data into procedure parameters
• Stubs copy data in and out; applications make no direct use of shared memory
• Arguments are passed in buffers that are allocated and pair-wise mapped during binding
• Data queues are monitored by application-level thread management (a stub sketch follows below)
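A sketch of what such a stub might look like for a procedure int add(int a, int b), continuing the hypothetical channel code above. The message layout is an assumption of this sketch, not the paper's API:

    /* channel_recv is sketched under slide 6 (Thread Management). */
    void channel_recv(msg_queue *q, void *msg, size_t len);

    typedef struct { int a, b; } add_args;
    typedef struct { int result; } add_reply;

    /* The stub copies arguments into the pair-wise mapped buffer and
     * copies the result back out, so application code never touches
     * the shared region directly. */
    int add_stub(urpc_channel *ch, int a, int b)
    {
        add_args args = { a, b };     /* marshal: copy arguments in */
        channel_send(&ch->request, &args, sizeof args);

        add_reply reply;
        channel_recv(&ch->reply, &reply, sizeof reply);
        return reply.result;          /* unmarshal: copy result out */
    }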

6. Thread Management
• LRPC: client threads always cross address spaces into the server
• URPC: always try to reschedule another thread within the same address space
• Switching threads within the same address space requires less overhead than processor reallocation
• Calls are synchronous from the programmer's point of view, but asynchronous to the thread-management level (see the receive-path sketch below)
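A sketch of the receive path makes this concrete, continuing the hypothetical code above; try_dequeue, thread_block_on, and thread_schedule_next are invented user-level threading primitives, not names from the paper:

    extern int  try_dequeue(msg_queue *q, void *msg, size_t len);
    extern void thread_block_on(msg_queue *q);
    extern void thread_schedule_next(void);

    void channel_recv(msg_queue *q, void *msg, size_t len)
    {
        for (;;) {
            /* Reply already queued: return synchronously to caller. */
            if (try_dequeue(q, msg, len) == 0)
                return;
            /* No reply yet: switch to another ready thread in this
             * address space -- far cheaper than asking the kernel to
             * reallocate the processor to the server. */
            thread_block_on(q);
            thread_schedule_next();
        }
    }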

7. Processor Reallocation
• Switching the processor between threads of different address spaces
• Requires privileged kernel mode to access protected mapping registers
• Carries significant overhead, as pointed out in the LRPC paper
• URPC strives to avoid processor reallocation
• This avoidance can lead to substantial performance gains

8. Optimistic Scheduling Policy
• Assumptions:
  • The client has other work to do
  • The server will soon have a processor available to service a message

9. Sample Execution Timeline
• [Figure: timeline of the optimistic reallocation scheduling policy; surviving labels read "pending outgoing messages detected", "FCMgr", "processor donated", and "underpowered".]

10. Why the Optimistic Approach Doesn't Always Hold
• The approach works less well when the application:
  • Runs as a single thread
  • Is real-time
  • Has high-latency I/O
  • Makes priority invocations
• URPC solves some of these problems by allowing forced processor reallocation even if there is still work to do

11. Kernel Handles Processor Reallocation
• URPC handles this through a call named "Processor.Donate"
• This passes control of an idle processor down to the kernel, and then back up to a specified address in the receiving space (a hypothetical sketch follows below)
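A hypothetical C rendering of that kernel interface and of the client-side decision to use it; the types and helper functions are guesses made for illustration:

    typedef struct address_space address_space_t;

    /* The one operation URPC leaves in the kernel: pass control of an
     * otherwise-idle processor down into the kernel and back up at a
     * known entry address in the receiving space. */
    extern void processor_donate(address_space_t *recipient,
                                 void (*entry)(void *), void *arg);

    extern int  pending_messages(msg_queue *q);   /* hypothetical */
    extern int  runnable_threads(void);           /* hypothetical */
    extern void server_dispatch(void *arg);       /* hypothetical */

    /* A client with pending outgoing messages but no runnable threads
     * of its own might donate its processor to the server. */
    void maybe_donate(urpc_channel *ch, address_space_t *server)
    {
        if (pending_messages(&ch->request) && !runnable_threads())
            processor_donate(server, server_dispatch, ch);
    }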

12. Voluntary Return of Processors
• The policy of a URPC server process: "…Upon receipt of a processor from a client address, return the processor when all outstanding messages from the client have generated replies, or when the server determines that the client has become 'underpowered'…" (a sketch of this policy follows below)
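The quoted policy could look something like this on the server side, continuing the sketch above; serve_one_request, client_underpowered, and processor_return are hypothetical, and this is one plausible reading of the policy, not the paper's code:

    extern void serve_one_request(urpc_channel *ch);
    extern int  client_underpowered(urpc_channel *ch);
    extern void processor_return(urpc_channel *ch);

    /* Entry point the kernel upcalls into when a client donates a
     * processor (see Processor.Donate above). */
    void server_dispatch(void *arg)
    {
        urpc_channel *ch = arg;
        /* Keep generating replies while client messages are
         * outstanding and the client still has processing power of
         * its own; stop early if the client becomes underpowered. */
        while (pending_messages(&ch->request) && !client_underpowered(ch))
            serve_one_request(ch);
        processor_return(ch);   /* voluntary return to the client */
    }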

13. Parallels to the User-Level Threads Paper
• Even though URPC implements a policy/protocol, there is no way to enforce it, which has the potential to lead to some interesting side effects
• This is similar to some of the problems discussed in the user-level threads paper
• For example, a server thread could conceivably continue to hold a donated processor and handle requests from other clients

14. What This Leads To…
• Starvation
• URPC handles this by directly reallocating processors only to balance load
• The system also needs the notion of preemptive reallocation
• Preemptive reallocation must also adhere to two invariants:
  • No higher-priority thread waits while a lower-priority thread runs
  • No processor idles when there is work for it to do (even if the work is in another address space)

15. Performance
• [Table II from the paper; not reproduced here. Note: the Table II results are independent of load.]

16. Performance
• Latency is proportional to the number of threads per CPU (T = threads per address space, C = client processors, S = server processors)
• T = C = S = 1: call latency is 93 microseconds
• T = 2, C = 1, S = 1: latency increases to 112 microseconds, but throughput rises 75% (the benefit of parallelism)
  • Call latency is effectively reduced to 53 microseconds (see the arithmetic note below)
• C = 1, S = 0 gives the worst performance
• In both cases, C = 2, S = 2 yields the best performance
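The 53-microsecond figure follows from the reported 75% throughput gain, assuming total call time is amortized evenly across the overlapped calls (a back-of-the-envelope step, not shown on the slide):

    effective latency ≈ 93 μs / 1.75 ≈ 53 μs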

17. Performance
• Worst-case URPC call latency for a single thread is 375 μs
• On similar hardware, LRPC call latency is 157 μs
• Reasons:
  • URPC requires two-level scheduling
  • URPC's low-level scheduling (processor reallocation) goes through the kernel, essentially the path LRPC takes
  • This is a small price considering the possible gains, and it is necessary to support high-level scheduling
