The Multikernel
A new OS architecture for scalable multicore systems
Andrew Baumann¹, Paul Barham², Pierre-Evariste Dagand³, Tim Harris², Rebecca Isaacs², Simon Peter¹, Timothy Roscoe¹, Adrian Schüpbach¹, Akhilesh Singhania¹
¹ Systems Group, ETH Zurich
² Microsoft Research, Cambridge
³ ENS Cachan Bretagne
Systems Group | Department of Computer Science | ETH Zurich
SOSP, 12th October 2009
Introduction
How should we structure an OS for future multicore systems?
◮ Scalability to many cores
◮ Heterogeneity and hardware diversity
12.10.2009 The Multikernel: A new OS architecture for scalable multicore systems 2
System diversity
[Figure: block diagrams of AMD Opteron (Istanbul), Sun Niagara T2 and Intel Nehalem (Beckton)]
The interconnect matters
[Figure: today’s 8-socket Opteron]
The interconnect matters
[Figure: tomorrow’s 8-socket Nehalem]
The interconnect matters
[Figure: on-chip interconnects]
Core diversity
◮ Within a system:
  ◮ Programmable NICs
  ◮ GPUs
  ◮ FPGAs (in CPU sockets)
◮ On a single die:
  ◮ Performance asymmetry
  ◮ Streaming instructions (SIMD, SSE, etc.)
  ◮ Virtualisation support
Summary
◮ Increasing core counts, increasing diversity
◮ Unlike HPC systems, a general-purpose OS cannot be optimised for its hardware at design time
The multikernel model
◮ It’s time to rethink the default structure of an OS:
  ◮ Shared-memory kernel on every core
  ◮ Data structures protected by locks
  ◮ Anything else is a device
◮ Proposal: structure the OS as a distributed system
◮ Design principles:
  1. Make inter-core communication explicit
  2. Make OS structure hardware-neutral
  3. View state as replicated
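To make "structure the OS as a distributed system" concrete, here is a minimal Python sketch under stated assumptions: a toy, deterministic interconnect where each core runs its own kernel instance with private state and an inbox, and the only way to affect another core is to send it a message. `CoreKernel`, `Machine`, `send` and the `"set"`/`"ack"` message kinds are hypothetical names chosen for illustration, not Barrelfish interfaces.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Message:
    src: int
    kind: str
    payload: object = None


@dataclass
class CoreKernel:
    """Hypothetical per-core OS node: private state plus an inbox. Nothing is shared."""
    core_id: int
    state: dict = field(default_factory=dict)
    inbox: deque = field(default_factory=deque)


class Machine:
    """Toy interconnect (assumption): send() is the only cross-core operation."""

    def __init__(self, ncores):
        self.cores = [CoreKernel(i) for i in range(ncores)]

    def send(self, src, dst, kind, payload=None):
        self.cores[dst].inbox.append(Message(src, kind, payload))

    def step(self):
        """Let every core drain its inbox once; return how many messages were handled."""
        handled = 0
        for core in self.cores:
            while core.inbox:
                self.handle(core, core.inbox.popleft())
                handled += 1
        return handled

    def handle(self, core, msg):
        if msg.kind == "set":
            key, value = msg.payload
            core.state[key] = value  # only the owning core ever writes its own state
            self.send(core.core_id, msg.src, "ack", key)
        elif msg.kind == "ack":
            core.state.setdefault("acks", []).append((msg.src, msg.payload))

    def run(self):
        # Keep delivering until no core has pending messages.
        while self.step():
            pass
```

For example, `m = Machine(4); m.send(0, 2, "set", ("runqueue_len", 3)); m.run()` leaves the value only in core 2's state and records core 2's acknowledgement in core 0's state.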
Outline
◮ Introduction
  ◮ Motivation
  ◮ Hardware diversity
◮ The multikernel model
  ◮ Design principles
  ◮ The model
◮ Barrelfish
◮ Evaluation
  ◮ Case study: Unmap
1. Make inter-core communication explicit
◮ All communication with messages (no shared state)
◮ Decouples system structure from inter-core communication mechanism
  ◮ Communication patterns explicitly expressed
  ◮ Naturally supports heterogeneous cores, non-coherent interconnects (PCIe)
◮ Better match for future hardware
  ◮ ...with cheap explicit message passing (e.g. Tile64)
  ◮ ...without cache-coherence (e.g. Intel 80-core)
◮ Allows split-phase operations
  ◮ Decouple requests and responses for concurrency
◮ We can reason about it
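Split-phase operations separate issuing a request from waiting for its result, so a core can keep several requests in flight and do useful work in between. The Python sketch below is a minimal illustration, assuming one client and one server thread connected by a pair of `queue.Queue` objects standing in for message channels; `Channel`, `SplitPhaseClient`, `start`, `finish` and the squaring operation are hypothetical.

```python
import queue
import threading


class Channel:
    """Hypothetical request/response channel between one client and one server core."""

    def __init__(self):
        self.requests = queue.Queue()
        self.responses = queue.Queue()


def server(chan):
    # Server core: answer requests until a None sentinel arrives.
    while True:
        req = chan.requests.get()
        if req is None:
            break
        req_id, arg = req
        chan.responses.put((req_id, arg * arg))  # toy operation: square the argument


class SplitPhaseClient:
    def __init__(self, chan):
        self.chan = chan
        self.next_id = 0
        self.completed = {}  # responses that arrived before anyone waited for them

    def start(self, arg):
        """Phase 1: issue the request and return immediately with its id."""
        req_id = self.next_id
        self.next_id += 1
        self.chan.requests.put((req_id, arg))
        return req_id

    def finish(self, req_id):
        """Phase 2: block only when this particular result is needed."""
        while req_id not in self.completed:
            rid, result = self.chan.responses.get()
            self.completed[rid] = result
        return self.completed.pop(req_id)


def run_demo(args):
    chan = Channel()
    srv = threading.Thread(target=server, args=(chan,))
    srv.start()
    client = SplitPhaseClient(chan)
    ids = [client.start(a) for a in args]  # all requests are now in flight
    overlap = sum(range(1000))             # the client keeps working meanwhile
    results = [client.finish(i) for i in ids]
    chan.requests.put(None)
    srv.join()
    return results, overlap
```

A blocking RPC would instead call `start` and `finish` back to back, leaving the client idle for the full round trip.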
Message passing vs. shared memory: experiment
Shared memory (move the data to the operation):
◮ Each core updates the same memory locations (no locking)
◮ Cache-coherence protocol migrates modified cache lines
  ◮ Processor stalled while line is fetched or invalidated
  ◮ Limited by latency of interconnect round-trips
◮ Performance depends on data size (cache lines) and contention (number of cores)
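The shape of the shared-memory variant can be sketched as follows. This is only an outline of the access pattern, written in Python with threads standing in for cores; CPython's interpreter overhead and global interpreter lock mean it cannot reproduce the cache-line migration costs measured on the AMD machine. The 64-byte line size, the function name `shm_experiment` and its parameters are assumptions for illustration.

```python
import threading

CACHE_LINE = 64  # bytes; a typical x86 cache-line size (assumption for this sketch)


def shm_experiment(num_cores, num_lines, iterations):
    """Shared-memory variant: every worker writes the same cache lines directly.

    No locks are taken, mirroring the experiment; the stored values are
    irrelevant, only the access pattern matters. core ids must be < 256.
    Returns the shared lines and the total number of line updates made.
    """
    shared = [bytearray(CACHE_LINE) for _ in range(num_lines)]  # contended data
    counts = [0] * num_cores  # per-worker tallies, each slot written by one thread only

    def worker(core_id):
        for _ in range(iterations):
            for line in shared:
                # On real hardware this store pulls the line to this core in
                # exclusive state, invalidating every other core's copy.
                line[0] = core_id
                counts[core_id] += 1

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(num_cores)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared, sum(counts)
```

Raising `num_lines` corresponds to larger updates (more lines to migrate) and raising `num_cores` to higher contention, the two factors the slide names.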
Shared memory results
4 × 4-core AMD system
[Figure: latency (cycles × 1000) vs. number of cores (2 to 16) for SHM1, SHM2, SHM4 and SHM8 (1, 2, 4 and 8 cache lines); annotation: "Stalled cycles (no locking!)"]
Message passing vs. shared memory: experiment
Message passing (move the operation to the data):
◮ A single server core updates the memory locations
◮ Each client core sends RPCs to the server
  ◮ Operation and results described in a single cache line
  ◮ Block while waiting for a response (in this experiment)
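The message-passing variant can be outlined the same way: one server thread owns the data and applies every update locally, while client threads send small RPC messages and block for the reply. As a simplification (assumption), all clients share one request queue instead of having per-client channels, and `queue.Queue` itself uses locks internally, so this models only the structure of the experiment, not its performance. `msg_experiment` and its parameters are hypothetical.

```python
import queue
import threading

CACHE_LINE = 64  # bytes; assumed cache-line size


def msg_experiment(num_clients, num_lines, iterations):
    """Message-passing variant: only the server thread ever touches the data."""
    requests = queue.Queue()  # simplification: one shared request queue

    def server(data):
        # Single server core: apply each requested update locally, then reply.
        while True:
            req = requests.get()
            if req is None:
                return
            client_id, reply = req
            for line in data:
                line[0] = client_id
            reply.put(len(data))  # the result fits in one small message

    data = [bytearray(CACHE_LINE) for _ in range(num_lines)]
    srv = threading.Thread(target=server, args=(data,))
    srv.start()

    totals = [0] * num_clients  # each slot written by one client thread only

    def client(client_id):
        reply = queue.Queue(maxsize=1)
        for _ in range(iterations):
            requests.put((client_id, reply))  # RPC: describe the operation
            totals[client_id] += reply.get()  # block until the server answers

    clients = [threading.Thread(target=client, args=(c,)) for c in range(num_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    requests.put(None)  # shut down the server
    srv.join()
    return data, sum(totals)
```

Because the data never leaves the server, its cache lines can stay local to the server core; the cross-core traffic is one request and one reply per operation, independent of how many lines are updated.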
Message passing vs. shared memory: tradeoff
4 × 4-core AMD system
[Figure: latency (cycles × 1000) vs. number of cores (2 to 16) for SHM1, SHM2, SHM4, SHM8, MSG1 and MSG8, plus a "Server" curve showing the actual cost of the update at the server]
◮ Messaging faster for ≥ 4 cores and ≥ 4 cache lines
◮ "Spare" cycles if RPC was split-phase
2. Make OS structure hardware-neutral
◮ Separate OS structure from hardware
◮ Only hardware-specific parts:
  ◮ Message transports (highly optimised / specialised)
  ◮ CPU / device drivers
◮ Adaptability to changing performance characteristics
  ◮ Late-bind protocol and message transport implementations
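Late binding means OS code is written against a transport interface, and the concrete transport is chosen only when a channel is set up, based on where the endpoints live. The Python sketch below assumes two hypothetical transports: a plain FIFO for endpoints on the same core, and one that ships payloads as fixed-size, cache-line-sized frames for cross-core channels. `Transport`, `LocalTransport`, `CacheLineTransport` and `bind_transport` are illustrative names, not Barrelfish APIs.

```python
from abc import ABC, abstractmethod
from collections import deque

CACHE_LINE = 64  # bytes; assumed frame size for the cross-core transport


class Transport(ABC):
    """Interface the hardware-neutral OS code is written against."""

    @abstractmethod
    def send(self, payload: bytes) -> None: ...

    @abstractmethod
    def recv(self) -> bytes: ...


class LocalTransport(Transport):
    """Hypothetical same-core transport: a plain FIFO of whole messages."""

    def __init__(self):
        self.q = deque()

    def send(self, payload):
        self.q.append(payload)

    def recv(self):
        return self.q.popleft()


class CacheLineTransport(Transport):
    """Hypothetical cross-core transport: a length frame, then cache-line-sized frames."""

    def __init__(self):
        self.frames = deque()

    def send(self, payload):
        self.frames.append(len(payload).to_bytes(CACHE_LINE, "little"))
        for off in range(0, len(payload), CACHE_LINE):
            self.frames.append(payload[off:off + CACHE_LINE].ljust(CACHE_LINE, b"\0"))

    def recv(self):
        length = int.from_bytes(self.frames.popleft(), "little")
        nframes = (length + CACHE_LINE - 1) // CACHE_LINE
        data = b"".join(self.frames.popleft() for _ in range(nframes))
        return data[:length]  # strip the padding from the last frame


def bind_transport(src_core, dst_core):
    """Late binding: pick the implementation at channel setup, not at compile time."""
    if src_core == dst_core:
        return LocalTransport()
    return CacheLineTransport()
```

Code that only calls `send` and `recv` stays unchanged if a new transport (for example, one for a non-coherent interconnect) is added to `bind_transport` later.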
3. View state as replicated
◮ Potentially shared state accessed as if it were a local replica
  ◮ Scheduler queues, process control blocks, etc.
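One simple way to give each core a local replica is to read locally and propagate every update as a message to all replicas, applying updates in a single agreed order. The Python sketch below assumes a hypothetical `Sequencer` that numbers updates, with each `Replica` buffering out-of-order arrivals until it can apply them in sequence; this is one illustrative consistency scheme, not necessarily the protocol the multikernel uses.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One core's local copy of some OS state (e.g. a process table)."""
    core_id: int
    table: dict = field(default_factory=dict)
    applied: int = 0  # sequence number of the last update applied
    pending: dict = field(default_factory=dict)  # early arrivals, keyed by sequence number

    def deliver(self, seq, key, value):
        # Messages may arrive out of order; apply them strictly in sequence.
        self.pending[seq] = (key, value)
        while self.applied + 1 in self.pending:
            k, v = self.pending.pop(self.applied + 1)
            self.table[k] = v
            self.applied += 1

    def read(self, key):
        # Reads never leave the core: they hit the local replica.
        return self.table.get(key)


class Sequencer:
    """Hypothetical update path: numbers each update and addresses it to every replica."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.seq = 0

    def update(self, key, value):
        self.seq += 1
        # Returns the messages to send; delivery order is up to the interconnect.
        return [(r, self.seq, key, value) for r in self.replicas]
```

The trade-off is that reads are cheap and local, while each update costs one message per replica.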