Solros: A Data-Centric Operating System Architecture for Heterogeneous Computing
Changwoo Min, Woonhak Kang, Mohan Kumar, Sanidhya Kashyap, Steffen Maass, Heeseung Jo, Taesoo Kim
Virginia Tech, eBay, Georgia Tech, Chonbuk National University
April 26, 2018
Changwoo Min | Solros: Data-Centric OS | April 26, 2018

Cambrian Explosion of Processor Architecture
- Specialization of general-purpose processors
- Generalization of co-processors
- Specialization of co-processors

Blazingly fast IO Devices
- Blazingly fast storage/memory
- Blazingly fast network
How to exploit the full potential of such hardware devices without pain?
- System-wide performance
- Ease of programming

Outline
1. Heterogeneous Computing Architectures
2. Solros: Split-Kernel Approach
   - Solros Architecture
   - Operating System Services
3. Evaluation

Host-Centric Approach
- Host OS controls co-processors and IO devices
- Examples: OpenCL, CUDA
[Figure: host processor (application, OS, memory, cores), co-processor (application, memory, cores), and I/O device (SSD / NIC); control and data paths marked ① to ③]
Problem
- Redundant data communication
- Complex to program and hard to optimize

Coprocessor-Centric Architecture
- Co-processors control IO devices
- Examples: Xeon Phi (Linux), GPUfs [ASPLOS13], GPUNet [OSDI14]
[Figure: host processor (application, OS, memory, cores), co-processor (application, OS, memory, cores), and I/O device (SSD / NIC); control and data paths marked ① and ②]
Problem
- Significant effort required for porting the IO stack to the co-processor
- Not completely exploiting powerful host processors

Outline
1. Heterogeneous Computing Architectures
2. Solros: Split-Kernel Approach
   - Solros Architecture
   - Operating System Services
3. Evaluation

Solros
Goal
- Ease of programming
- Best use of processor architecture
- System-wide optimization
Challenge
- The co-processor needs an IO abstraction
- The IO stack is branch-divergent and difficult to parallelize
- System-wide optimization needs system-wide information

Solros Architecture
Split-Kernel Architecture
- Data-plane OS
  - Runs on a co-processor
  - Provides IO abstraction
  - Delegates actual IO operations to a control-plane OS
- Control-plane OS
  - Runs on a host processor
  - Runs actual IO stack
  - Performs system-wide coordination

Solros Architecture
- Control-plane OS: actual OS service + system-wide coordination
- Data-plane OS: thin communication layer to host processor
[Figure: co-processor (application, OS stub, memory, cores), host processor (application, OS proxy, policy, memory, cores), and I/O device (SSD / NIC); control and data paths marked ① to ③]
- Co-processor has OS abstraction with minimal effort
- Best use of each of the fat and lean processors
- Efficient global coordination among devices (policy)

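The stub/proxy split on this slide can be illustrated with a minimal sketch. It is not Solros code: the class names `ControlPlaneProxy` and `DataPlaneStub`, the in-memory `files` dict standing in for the real IO stack, and the Python queue standing in for the PCIe transport are all assumptions made for illustration. The point is only that the co-processor side exposes an IO call while the host side performs it.

```python
import queue
import threading


class ControlPlaneProxy:
    """Host side (hypothetical): owns the real IO stack, here just a dict."""

    def __init__(self, files):
        self._files = files
        self._requests = queue.Queue()  # stands in for the transport channel
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def submit(self, op, path, reply):
        self._requests.put((op, path, reply))

    def _serve(self):
        while True:
            op, path, reply = self._requests.get()
            if op == "stop":
                break
            if op == "read":
                reply.put(self._files.get(path))  # the actual IO happens here
            else:
                reply.put(None)  # unsupported operation in this sketch

    def stop(self):
        self._requests.put(("stop", None, None))
        self._thread.join()


class DataPlaneStub:
    """Co-processor side (hypothetical): offers read() but does no IO itself."""

    def __init__(self, proxy):
        self._proxy = proxy

    def read(self, path):
        reply = queue.Queue(maxsize=1)
        self._proxy.submit("read", path, reply)  # delegate to the host
        return reply.get()                       # block until the host replies
```

An application on the co-processor would call `DataPlaneStub.read(path)` as if it were a local OS service; a policy component on the host could sit inside `_serve` to reorder or throttle requests, which this sketch leaves out.
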
Operating System Services
1. Transport service
2. Filesystem service
3. Network service

Transport Service
High-performance data transfer among devices is challenging:
- Uniform data transfer among devices
- High contention in massively-parallel co-processor
- Asymmetric performance between host processor and co-processor
Our approach
- Uniform data transfer ⇒ system-mapped PCIe window
- High contention ⇒ combining, replication, interleaving, etc.
- Asymmetric performance ⇒ flexibly configurable (host DMA engine vs. co-processor DMA engine)
See details in the paper

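To make the "combining" idea concrete, here is a small sketch of a combining pattern in general, not the transport design from the paper. Many threads publish requests, and whichever thread gets the combiner lock applies everyone's pending requests as one batch, so contended threads do not each touch the shared resource. The class name `Combiner` and the `apply_batch` callback are hypothetical.

```python
import threading


class Combiner:
    """Hypothetical combining helper: one thread serves a batch for all."""

    def __init__(self, apply_batch):
        self._apply_batch = apply_batch        # called with a list of items
        self._pending = []                     # published, not yet served
        self._pending_lock = threading.Lock()
        self._combiner_lock = threading.Lock()
        self.batches = 0                       # number of batches applied

    def submit(self, item):
        done = threading.Event()
        with self._pending_lock:
            self._pending.append((item, done))
        while not done.is_set():
            if self._combiner_lock.acquire(blocking=False):
                try:
                    # This thread is the combiner: take all pending requests.
                    with self._pending_lock:
                        batch, self._pending = self._pending, []
                    if batch:
                        self._apply_batch([i for i, _ in batch])
                        self.batches += 1
                        for _, event in batch:
                            event.set()
                finally:
                    self._combiner_lock.release()
            else:
                # Another thread is combining and may serve this request.
                done.wait(0.001)
```

With this shape, `apply_batch` runs only while the combiner lock is held, so it can update a shared structure (for example, a queue head) without further synchronization.
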
Filesystem Service
- Peer-to-peer operation
- Buffered operation
[Figure: co-processor (application, file system stub), host processor (file system proxy, file system), PCIe, DMA engine, and SSD; control and data paths, step ① marked]
- Zero-copy of data between co-processor memory and SSD
- Minimal data transfer

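The difference between the two operation modes can be sketched as a toy model. This is an illustration under stated assumptions, not the Solros file system: `fs_proxy_read`, the dict-based `ssd_blocks`, `coproc_mem`, and `page_cache` are hypothetical stand-ins, and real transfers would be DMA operations rather than dict assignments. The function returns how many bytes were staged in host memory, which is zero on the peer-to-peer path.

```python
def fs_proxy_read(ssd_blocks, block_ids, coproc_mem, page_cache=None):
    """Hypothetical host-side read path.

    With page_cache=None the read is peer-to-peer: blocks go from the SSD
    straight into co-processor memory. With a page cache the read is
    buffered: blocks are staged in host memory first.
    Returns the number of bytes that passed through host memory.
    """
    host_bytes = 0
    for block_id in block_ids:
        if page_cache is not None:
            # Buffered: SSD -> host page cache -> co-processor memory.
            if block_id not in page_cache:
                page_cache[block_id] = ssd_blocks[block_id]
            host_bytes += len(page_cache[block_id])
            coproc_mem[block_id] = page_cache[block_id]
        else:
            # Peer-to-peer: SSD -> co-processor memory, host memory untouched.
            coproc_mem[block_id] = ssd_blocks[block_id]
    return host_bytes
```

In both modes the host still handles the control path (resolving which blocks to read); only the data path differs.
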